Data Classification

View data classification insights across your Microsoft 365 content. Data classification provides visibility into sensitive information types, sensitivity labels, retention labels, and trainable classifiers applied to your organization’s data.

Note: Data classification features are available with Microsoft 365 E3 and above. Advanced classifiers and auto-labeling require an E5 license or the Compliance add-on.

Classification Overview

Category	Description
Sensitive Info Types	Content matching built-in or custom sensitive information type patterns (for example, Social Security numbers and credit card numbers)
Sensitivity Labels	Documents and emails with applied sensitivity labels (Confidential, Internal, Public)
Retention Labels	Content with retention classifications controlling lifecycle and deletion
Trainable Classifiers	Content identified by machine learning models (contracts, resumes, financial statements)

Content Explorer

Browse classified content by type, label, or location. The Content Explorer provides:

Aggregate counts by classification type without accessing actual content
Drill-down by location (Exchange, SharePoint, OneDrive)
Filter by specific sensitive info type or label
View distribution across departments and sites

Note: Viewing actual item content in Content Explorer requires the Content Explorer Content Viewer role. The Content Explorer List Viewer role shows counts only.

Activity Explorer

Track classification-related activities across the organization:

Activity	Description
Label applied	Sensitivity or retention label applied to content
Label changed	Label upgraded or downgraded on content
Label removed	Label removed from content
DLP match	Content matched a DLP policy rule
Auto-label applied	Label applied automatically by policy
Label downgrade justified	User provided justification for lowering sensitivity
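
As a rough illustration of how exported activity data could be summarized, the sketch below tallies events by activity type and lists the users behind a given activity. The record fields (`activity`, `user`, `location`) and the sample data are hypothetical; this page does not specify an export schema.

```python
from collections import Counter

# Hypothetical activity records. The field names are assumptions made for
# illustration, not a documented export format.
events = [
    {"activity": "Label applied", "user": "alice", "location": "SharePoint"},
    {"activity": "Label changed", "user": "bob", "location": "OneDrive"},
    {"activity": "Label downgrade justified", "user": "bob", "location": "OneDrive"},
    {"activity": "DLP match", "user": "carol", "location": "Exchange"},
    {"activity": "Label applied", "user": "carol", "location": "Exchange"},
]


def count_by_activity(records):
    """Return a Counter mapping each activity type to its number of events."""
    return Counter(r["activity"] for r in records)


def users_with_activity(records, activity):
    """Return the sorted, de-duplicated users who triggered the given activity."""
    return sorted({r["user"] for r in records if r["activity"] == activity})


counts = count_by_activity(events)
downgraders = users_with_activity(events, "Label downgrade justified")
```

A `Counter` returns 0 for activity types that never occurred, so a report can list every activity from the table above without special-casing missing ones.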

Sensitive Information Types

Built-in patterns for detecting sensitive data:

Financial Data

Credit card numbers
Bank account identifiers (IBAN, SWIFT codes)
Tax identification numbers

Personal Identifiers

Social Security numbers
Passport numbers
Driver’s license numbers

Health Information

Health records (PHI)
Medical record numbers
Health insurance IDs

Custom Types

Organization-specific patterns (employee IDs, project codes)
Regular expression-based detection
Keyword dictionaries
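
To make the custom-type idea concrete, here is a minimal sketch that combines a regular expression with a keyword dictionary used as supporting evidence. The employee ID format (`EMP-` plus six digits), the keyword list, and the 50-character proximity window are all assumptions chosen for illustration; they are not the product's detection engine or its defaults.

```python
import re

# Hypothetical employee ID format: "EMP-" followed by exactly six digits.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

# Hypothetical keyword dictionary used as supporting evidence.
KEYWORDS = {"employee", "badge", "personnel"}


def detect_employee_ids(text, window=50):
    """Return (match, confidence) pairs for each employee ID found in text.

    A match is "high" confidence when a dictionary keyword appears within
    `window` characters of it, and "low" confidence otherwise.
    """
    results = []
    for m in EMPLOYEE_ID.finditer(text):
        start = max(0, m.start() - window)
        context = text[start:m.end() + window].lower()
        words = set(re.findall(r"[a-z]+", context))
        confidence = "high" if words & KEYWORDS else "low"
        results.append((m.group(), confidence))
    return results
```

For example, `detect_employee_ids("Employee badge EMP-123456 issued.")` reports the ID with high confidence because "employee" and "badge" appear nearby, while the same ID with no supporting keyword is reported with low confidence.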

Trainable Classifiers

Machine learning classifiers for content categorization:

Contracts — Legal agreements and contracts
Financial statements — Balance sheets, income statements
Resumes — Job applications and CVs
Source code — Programming code files
Harassment — Potentially harassing communications
Profanity — Profane language content
Custom classifiers — Train on your organization’s specific content types

Auto-Apply Label Policies

Automatically classify content using:

Keyword conditions — Apply labels to content containing specific keywords using KQL
Sensitive information types — Apply labels when content contains sensitive data patterns
Trainable classifiers — Use ML to identify and label content types automatically
Cloud attachments — Auto-label files shared as cloud attachments in email and Teams
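
The conditions above can be pictured as an ordered list of rules in which the first match decides the label. The sketch below is only a simplified model of that idea: the rule order, the label names, the simplified SSN pattern, and the "Project Falcon" keyword are hypothetical, and a plain substring test stands in for a real KQL keyword query.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    label: str                      # label to apply when the rule matches
    matches: Callable[[str], bool]  # condition evaluated against item text


# Simplified SSN-like pattern (format check only, no validity checks).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical policy: rules are evaluated in order, most sensitive first.
RULES = [
    Rule("Confidential", lambda text: bool(SSN.search(text))),
    # Substring match used here as a stand-in for a KQL keyword condition.
    Rule("Internal", lambda text: "project falcon" in text.lower()),
]


def auto_label(text: str, rules=RULES) -> Optional[str]:
    """Return the label of the first matching rule, or None if none match."""
    for rule in rules:
        if rule.matches(text):
            return rule.label
    return None
```

Ordering the rules from most to least sensitive means an item that satisfies several conditions receives the stricter label.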

Classification Analytics

Metric	Description
Total classified items	Number of items with at least one classification applied
Top sensitive info types	Most frequently detected sensitive data patterns
Label coverage	Percentage of content with sensitivity or retention labels
Classification trend	Volume of new classifications over time
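
The first three metrics are simple aggregates, and the following sketch shows one way they could be computed from per-item records. The record layout (`sensitivity_label`, `retention_label`, `sensitive_types`) and the sample items are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical per-item records; the field names are illustrative assumptions.
items = [
    {"id": 1, "sensitivity_label": "Confidential", "retention_label": None,
     "sensitive_types": ["Credit card number"]},
    {"id": 2, "sensitivity_label": None, "retention_label": "7 years",
     "sensitive_types": []},
    {"id": 3, "sensitivity_label": None, "retention_label": None,
     "sensitive_types": ["Social Security number", "Credit card number"]},
    {"id": 4, "sensitivity_label": None, "retention_label": None,
     "sensitive_types": []},
]


def total_classified(records):
    """Count items with at least one label or detected sensitive info type."""
    return sum(
        1 for r in records
        if r["sensitivity_label"] or r["retention_label"] or r["sensitive_types"]
    )


def label_coverage(records):
    """Percentage of items carrying a sensitivity or retention label."""
    if not records:
        return 0.0
    labeled = sum(
        1 for r in records if r["sensitivity_label"] or r["retention_label"]
    )
    return 100.0 * labeled / len(records)


def top_sensitive_types(records, n=3):
    """Return the n most frequently detected sensitive info types with counts."""
    counts = Counter(t for r in records for t in r["sensitive_types"])
    return counts.most_common(n)
```

In this sample, item 3 counts toward total classified items but not toward label coverage, because it has detected sensitive info types and no label.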

Best Practices

Review classification reports regularly — Understand where sensitive data resides in your environment
Tune sensitive info type thresholds — Adjust confidence levels to balance detection accuracy with false positives
Deploy auto-labeling gradually — Start in simulation mode, review results, then enable enforcement
Use trainable classifiers for complex content — When keyword or pattern matching is insufficient, leverage ML classifiers
Set library defaults — Use default labels on SharePoint libraries for consistent classification

Warning: High false positive rates in classification indicate that sensitive info type patterns or classifier thresholds need tuning. Monitor accuracy regularly.
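
One way to reason about threshold tuning is to review a sample of matches by hand and compare precision and recall at different confidence cutoffs. The sketch below assumes a hypothetical reviewed sample of (confidence score, confirmed sensitive) pairs; the scores and review outcomes are invented for illustration.

```python
# Hypothetical reviewed sample: (confidence score 0-100, reviewer confirmed?).
reviewed = [
    (95, True), (90, True), (85, False), (75, True),
    (70, False), (65, False), (60, True),
]


def precision_recall(samples, threshold):
    """Return (precision, recall) when flagging items scored >= threshold.

    precision = confirmed flagged items / all flagged items
    recall    = confirmed flagged items / all confirmed items in the sample
    """
    flagged = [confirmed for score, confirmed in samples if score >= threshold]
    true_pos = sum(flagged)
    total_pos = sum(confirmed for _, confirmed in samples)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / total_pos if total_pos else 0.0
    return precision, recall
```

With this sample, a threshold of 90 flags only confirmed items but finds half of them, while a threshold of 60 finds all confirmed items at the cost of three false positives out of seven flagged items. Raising the threshold trades missed detections for fewer false positives.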

API Reference

GET /api/compliance/classification/overview — Get classification overview and summary statistics
GET /api/compliance/classification/activity — Get classification activity data
GET /api/compliance/classification/sensitive-types — List sensitive information type matches
GET /api/compliance/classification/labels — Get label application statistics
GET /api/compliance/classification/classifiers — List trainable classifier results
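
A minimal client sketch for the endpoints above, using only the Python standard library. The host name, the bearer-token authentication scheme, and the assumption that responses are JSON are not stated on this page; treat them as placeholders to replace with your deployment's actual values.

```python
import json
import urllib.request

# Placeholder host; an assumption, not a documented base URL.
BASE_URL = "https://compliance.example.com"

# Paths copied from the API reference above.
ENDPOINTS = {
    "overview": "/api/compliance/classification/overview",
    "activity": "/api/compliance/classification/activity",
    "sensitive-types": "/api/compliance/classification/sensitive-types",
    "labels": "/api/compliance/classification/labels",
    "classifiers": "/api/compliance/classification/classifiers",
}


def build_request(name, token, base_url=BASE_URL):
    """Build a GET request for one of the classification endpoints.

    Bearer-token authentication is an assumption for this sketch.
    """
    url = base_url.rstrip("/") + ENDPOINTS[name]
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_json(name, token, base_url=BASE_URL):
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(build_request(name, token, base_url)) as resp:
        return json.load(resp)
```

Separating `build_request` from `fetch_json` lets you inspect the URL and headers without making a network call, for example `build_request("labels", token).full_url`.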