Data Classification
View data classification insights across your Microsoft 365 content. Data classification provides visibility into sensitive information types, sensitivity labels, retention labels, and trainable classifiers applied to your organization’s data.
Note: Data classification features are available with Microsoft 365 E3 and above. Advanced classifiers and auto-labeling require Microsoft 365 E5 or a compliance add-on license.
Classification Overview
| Category | Description |
|---|---|
| Sensitive Info Types | Content matching built-in or custom sensitive information type patterns (SSN, credit cards, etc.) |
| Sensitivity Labels | Documents and emails with applied sensitivity labels (Confidential, Internal, Public) |
| Retention Labels | Content with retention classifications controlling lifecycle and deletion |
| Trainable Classifiers | Content identified by machine learning models (contracts, resumes, financial statements) |
Content Explorer
Browse content that has been classified by type, label, or location. The Content Explorer provides:
- Aggregate counts by classification type without accessing actual content
- Drill-down by location (Exchange, SharePoint, OneDrive)
- Filter by specific sensitive info type or label
- View distribution across departments and sites
Note: Viewing the actual content of an item in Content Explorer requires the Content Explorer Content Viewer role. The Content Explorer List Viewer role shows counts and item listings only, not item content.
Activity Explorer
Track classification-related activities across the organization:
| Activity | Description |
|---|---|
| Label applied | Sensitivity or retention label applied to content |
| Label changed | Label upgraded or downgraded on content |
| Label removed | Label removed from content |
| DLP match | Content matched a DLP policy rule |
| Auto-label applied | Label applied automatically by policy |
| Label downgrade justified | User provided justification for lowering sensitivity |
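
The "Label changed" activity covers both upgrades and downgrades, so a review script needs a label ordering to tell them apart. The following is a minimal sketch, assuming a hypothetical event record shape (`activity`, `old_label`, `new_label`) and a priority order built from the example labels Public, Internal, and Confidential; neither the record fields nor the ordering are defined by this page.

```python
# Hypothetical priority order: a higher number means more sensitive.
LABEL_PRIORITY = {"Public": 0, "Internal": 1, "Confidential": 2}


def change_direction(old_label, new_label):
    """Classify a label change as 'upgrade', 'downgrade', or 'unchanged'."""
    old = LABEL_PRIORITY[old_label]
    new = LABEL_PRIORITY[new_label]
    if new > old:
        return "upgrade"
    if new < old:
        return "downgrade"
    return "unchanged"


def find_downgrades(events):
    """Return the 'Label changed' events that lowered sensitivity."""
    return [
        e for e in events
        if e["activity"] == "Label changed"
        and change_direction(e["old_label"], e["new_label"]) == "downgrade"
    ]
```

Downgrades found this way can then be compared against "Label downgrade justified" activities to see which changes came with a user justification.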
Sensitive Information Types
Built-in patterns for detecting sensitive data:
Financial Data
- Credit card numbers
- Bank account and bank identifiers (IBAN, SWIFT code)
- Tax identification numbers
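
Pattern-based detection of credit card numbers is commonly paired with a checksum so that arbitrary digit runs are not flagged. The sketch below illustrates that general idea with a digit pattern plus the Luhn checksum; it is an illustration of the technique, not a description of how the built-in type is implemented, and the function names are hypothetical.

```python
import re

# Candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def find_card_numbers(text):
    """Return digit-only strings for candidates that pass the checksum."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = "".join(c for c in match.group() if c.isdigit())
        if luhn_valid(digits):
            found.append(digits)
    return found
```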
Personal Identifiers
- Social Security numbers
- Passport numbers
- Driver’s license numbers
Health Information
- Health records (PHI)
- Medical record numbers
- Health insurance IDs
Custom Types
- Organization-specific patterns (employee IDs, project codes)
- Regular expression-based detection
- Keyword dictionaries
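
A custom type typically combines a regular expression with supporting keywords found near the match. The following minimal sketch shows that combination for a hypothetical employee ID format (`EMP-` followed by six digits); the format, the keyword list, the proximity window, and the confidence names are all assumptions for illustration.

```python
import re

# Hypothetical employee ID format: "EMP-" followed by exactly six digits.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")
# Hypothetical supporting keywords that raise confidence when found nearby.
KEYWORDS = ("employee id", "staff number", "badge")
WINDOW = 50  # characters on either side of a match to search for keywords


def find_employee_ids(text):
    """Return (match, confidence) pairs for each candidate employee ID."""
    results = []
    for m in EMPLOYEE_ID.finditer(text):
        start = max(0, m.start() - WINDOW)
        end = min(len(text), m.end() + WINDOW)
        nearby = text[start:end].lower()
        has_keyword = any(k in nearby for k in KEYWORDS)
        results.append((m.group(), "high" if has_keyword else "low"))
    return results
```

A match with a nearby keyword is reported at higher confidence than a bare pattern match, which is the lever that threshold tuning later adjusts.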
Trainable Classifiers
Machine learning classifiers for content categorization:
- Contracts — Legal agreements and contracts
- Financial statements — Balance sheets, income statements
- Resumes — Job applications and CVs
- Source code — Programming code files
- Harassment — Potentially harassing communications
- Profanity — Profane language content
- Custom classifiers — Train on your organization’s specific content types
Auto-Apply Label Policies
Automatically classify content using:
- Keyword conditions — Apply labels to content containing specific keywords using KQL
- Sensitive information types — Apply labels when content contains sensitive data patterns
- Trainable classifiers — Use ML to identify and label content types automatically
- Cloud attachments — Auto-label files shared as cloud attachments in email and Teams
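
To make the condition types above concrete, here is a minimal sketch of a rule evaluator with a simulation switch. The item shape (`id`, `text`, `detected_types`, `label`), the `AutoLabelRule` class, and the first-match-wins ordering are assumptions for illustration; keyword matching here is a plain substring test, not KQL.

```python
from dataclasses import dataclass


@dataclass
class AutoLabelRule:
    """Hypothetical rule: apply `label` when any condition matches."""
    label: str
    keywords: tuple = ()
    sensitive_types: tuple = ()

    def matches(self, item):
        text = item["text"].lower()
        if any(k.lower() in text for k in self.keywords):
            return True
        return any(t in item["detected_types"] for t in self.sensitive_types)


def run_policy(items, rules, simulate=True):
    """Return (item id, label) proposals; write labels only if not simulating."""
    proposed = []
    for item in items:
        for rule in rules:
            if rule.matches(item):
                proposed.append((item["id"], rule.label))
                if not simulate:
                    item["label"] = rule.label
                break  # first matching rule wins
    return proposed
```

Running with `simulate=True` first yields the list of labels that would be applied without changing any item, which mirrors the review step before enforcement.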
Classification Analytics
| Metric | Description |
|---|---|
| Total classified items | Number of items with at least one classification applied |
| Top sensitive info types | Most frequently detected sensitive data patterns |
| Label coverage | Percentage of content with sensitivity or retention labels |
| Classification trend | Volume of new classifications over time |
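
The first three metrics can be derived from per-item records. The sketch below assumes a hypothetical record shape with `sensitive_types`, `sensitivity_label`, and `retention_label` fields, and it ignores trainable classifier results for brevity.

```python
from collections import Counter


def classification_metrics(items):
    """Compute summary metrics from item records (hypothetical shape)."""
    total = len(items)
    classified = 0
    labeled = 0
    type_counts = Counter()
    for item in items:
        has_label = bool(item["sensitivity_label"] or item["retention_label"])
        has_types = bool(item["sensitive_types"])
        if has_label or has_types:
            classified += 1
        if has_label:
            labeled += 1
        # Count each type once per item, not once per occurrence.
        type_counts.update(set(item["sensitive_types"]))
    coverage = 100.0 * labeled / total if total else 0.0
    return {
        "total_classified": classified,
        "label_coverage_pct": coverage,
        "top_types": type_counts.most_common(3),
    }
```

Note that label coverage is measured against all items, while the classified count also includes items that match a sensitive info type but carry no label.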
Best Practices
- Review classification reports regularly — Understand where sensitive data resides in your environment
- Tune sensitive info type thresholds — Adjust confidence levels to balance detection accuracy with false positives
- Deploy auto-labeling gradually — Start in simulation mode, review results, then enable enforcement
- Use trainable classifiers for complex content — When keyword or pattern matching is insufficient, leverage ML classifiers
- Set library defaults — Use default labels on SharePoint libraries for consistent classification
Warning: High false positive rates in classification indicate that sensitive info type patterns or classifier thresholds need tuning. Monitor accuracy regularly.
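
One way to decide on a threshold is to review a sample of matches by hand and compare precision and recall at each candidate confidence level. The sketch below assumes a list of `(confidence, is_true_positive)` pairs from such a review; the function name and the sample confidence values are illustrative. Recall here is relative to the true positives in the reviewed sample, so it does not account for sensitive items the pattern never matched.

```python
def precision_recall_at(threshold, reviewed):
    """Return (precision, recall) for matches at or above `threshold`.

    `reviewed` is a list of (confidence, is_true_positive) pairs
    taken from a manually reviewed sample of matches.
    """
    total_true = sum(1 for _, ok in reviewed if ok)
    kept = [ok for conf, ok in reviewed if conf >= threshold]
    true_kept = sum(kept)
    precision = true_kept / len(kept) if kept else 0.0
    recall = true_kept / total_true if total_true else 0.0
    return precision, recall


# Illustrative reviewed sample: confidence score and reviewer verdict.
REVIEWED = [
    (85, True), (85, True),
    (75, True), (75, False),
    (65, False), (65, False), (65, True),
]

for level in (65, 75, 85):
    p, r = precision_recall_at(level, REVIEWED)
    print(f"threshold {level}: precision {p:.2f}, recall {r:.2f}")
```

Raising the threshold removes false positives at the cost of missing some true matches, which is the balance the tuning practice above refers to.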
API Reference
| Endpoint | Description |
|---|---|
| GET /api/compliance/classification/overview | Get classification overview and summary statistics |
| GET /api/compliance/classification/activity | Get classification activity data |
| GET /api/compliance/classification/sensitive-types | List sensitive information type matches |
| GET /api/compliance/classification/labels | Get label application statistics |
| GET /api/compliance/classification/classifiers | List trainable classifier results |
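
A minimal client sketch for the endpoints listed above, using only the Python standard library. The host name, the bearer-token authentication, and the JSON response format are assumptions; this page specifies only the paths and the GET method.

```python
import json
import urllib.request

# Hypothetical host; replace with the base URL of your deployment.
BASE_URL = "https://compliance.example.com"

# Paths taken from the API reference above.
ENDPOINTS = {
    "overview": "/api/compliance/classification/overview",
    "activity": "/api/compliance/classification/activity",
    "sensitive-types": "/api/compliance/classification/sensitive-types",
    "labels": "/api/compliance/classification/labels",
    "classifiers": "/api/compliance/classification/classifiers",
}


def build_url(name, base_url=BASE_URL):
    """Join the base URL and the path for a named endpoint."""
    return base_url.rstrip("/") + ENDPOINTS[name]


def fetch(name, token, base_url=BASE_URL):
    """GET a named endpoint and decode the body as JSON (assumed format)."""
    request = urllib.request.Request(
        build_url(name, base_url),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Accept": "application/json",
        },
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `fetch("overview", token)` would request the summary statistics, provided the service accepts the assumed authentication header.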