Data Classification

View data classification insights across your Microsoft 365 content. Data classification provides visibility into sensitive information types, sensitivity labels, retention labels, and trainable classifiers applied to your organization’s data.

Note: Data classification features are available with Microsoft 365 E3 and above. Advanced classifiers and auto-labeling require E5 or Compliance add-on licensing.

Classification Overview

| Category | Description |
| --- | --- |
| Sensitive Info Types | Content matching built-in or custom sensitive information type patterns (SSN, credit cards, etc.) |
| Sensitivity Labels | Documents and emails with applied sensitivity labels (Confidential, Internal, Public) |
| Retention Labels | Content with retention labels that control its lifecycle and deletion |
| Trainable Classifiers | Content identified by machine learning models (contracts, resumes, financial statements) |

Content Explorer

Browse content that has been classified by type, label, or location. The Content Explorer provides:

  • Aggregate counts by classification type without accessing actual content
  • Drill-down by location (Exchange, SharePoint, OneDrive)
  • Filter by specific sensitive info type or label
  • View distribution across departments and sites

Note: Viewing actual item content in Content Explorer requires the Content Explorer Content Viewer role. The Content Explorer List Viewer role can list items and their locations but cannot open their content.

Activity Explorer

Track classification-related activities across the organization:

| Activity | Description |
| --- | --- |
| Label applied | Sensitivity or retention label applied to content |
| Label changed | Label upgraded or downgraded on content |
| Label removed | Label removed from content |
| DLP match | Content matched a DLP policy rule |
| Auto-label applied | Label applied automatically by policy |
| Label downgrade justified | User provided justification for lowering sensitivity |
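
Whether a label change counts as an upgrade or a downgrade depends on the relative priority of the old and new labels. The following is a minimal sketch, assuming a hypothetical priority order built from the example labels (Public, Internal, Confidential). The function name and the label order are illustrative and do not represent a documented event schema:

```python
# Hypothetical priority order; a higher number means more sensitive.
LABEL_PRIORITY = {"Public": 0, "Internal": 1, "Confidential": 2}


def classify_label_change(old_label, new_label):
    """Name the activity for a label transition. None stands for "no label"."""
    if old_label == new_label:
        return "No change"
    if old_label is None:
        return "Label applied"
    if new_label is None:
        return "Label removed"
    old_rank = LABEL_PRIORITY[old_label]
    new_rank = LABEL_PRIORITY[new_label]
    return "Label upgraded" if new_rank > old_rank else "Label downgraded"
```

A downgrade result is the case where a justification prompt would typically be expected.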

Sensitive Information Types

Built-in patterns for detecting sensitive data:

Financial Data

  • Credit card numbers
  • Bank account identifiers (IBAN, SWIFT codes)
  • Tax identification numbers
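
Credit card detection usually pairs a digit pattern with a checksum so that arbitrary long digit strings are not flagged. The sketch below shows the Luhn checksum, the standard check for payment card numbers. The regular expression is a simplified assumption for illustration, not the built-in pattern definition:

```python
import re

# Simplified assumption: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return bool(digits) and total % 10 == 0


def find_card_numbers(text):
    """Return candidate card numbers in `text` that also pass the checksum."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

The checksum step is what lets a detector discard most order numbers and phone-like strings that happen to match the digit pattern.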

Personal Identifiers

  • Social Security numbers
  • Passport numbers
  • Driver’s license numbers

Health Information

  • Protected health information (PHI) in health records
  • Medical record numbers
  • Health insurance IDs

Custom Types

  • Organization-specific patterns (employee IDs, project codes)
  • Regular expression-based detection
  • Keyword dictionaries
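
A custom type commonly combines a regular expression with supporting keywords found near the match, and reports higher confidence when both are present. The sketch below assumes a hypothetical employee ID format (`EMP-` followed by six digits), a hypothetical keyword list, a 50-character proximity window, and illustrative confidence values. None of these come from a real tenant configuration:

```python
import re

EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")           # hypothetical ID format
KEYWORDS = ("employee id", "staff number", "badge")  # hypothetical keyword dictionary
WINDOW = 50  # characters searched on each side of a match (assumed value)


def detect_employee_ids(text):
    """Return (match, confidence) pairs; the confidence values are illustrative."""
    lowered = text.lower()
    results = []
    for match in EMPLOYEE_ID.finditer(text):
        start = max(0, match.start() - WINDOW)
        context = lowered[start:match.end() + WINDOW]
        has_keyword = any(keyword in context for keyword in KEYWORDS)
        results.append((match.group(), 85 if has_keyword else 65))
    return results
```

For example, `detect_employee_ids("Employee ID: EMP-004217")` reports the match with the higher confidence value because a supporting keyword appears within the window.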

Trainable Classifiers

Machine learning classifiers for content categorization:

  • Contracts — Legal agreements and contracts
  • Financial statements — Balance sheets, income statements
  • Resumes — Job applications and CVs
  • Source code — Programming code files
  • Harassment — Potentially harassing communications
  • Profanity — Profane language content
  • Custom classifiers — Train on your organization’s specific content types

Auto-Apply Label Policies

Automatically classify content using:

  • Keyword conditions: Apply labels to content that matches keyword queries written in KQL (Keyword Query Language)
  • Sensitive information types: Apply labels when content contains sensitive data patterns
  • Trainable classifiers: Use ML to identify and label content types automatically
  • Cloud attachments: Auto-label files shared as cloud attachments in email and Teams
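
For keyword conditions, a KQL query typically joins alternative terms with `OR` and wraps multi-word phrases in double quotes. The helper below is a small sketch for assembling such a query string. The function name is hypothetical, and it does not escape quotes embedded in a term:

```python
def kql_any_of(terms):
    """Build a KQL query matching any of the given terms or phrases."""
    parts = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # skip empty entries
        # Multi-word phrases need double quotes to match as a unit.
        parts.append(f'"{term}"' if " " in term else term)
    return " OR ".join(parts)
```

Calling `kql_any_of(["confidential", "project falcon"])` yields `confidential OR "project falcon"`, which can then be pasted into the keyword condition of a policy.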

Classification Analytics

| Metric | Description |
| --- | --- |
| Total classified items | Number of items with at least one classification applied |
| Top sensitive info types | Most frequently detected sensitive data patterns |
| Label coverage | Percentage of content with sensitivity or retention labels |
| Classification trend | Volume of new classifications over time |
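
To make the metric definitions concrete, here is a sketch that derives three of them from a list of item records. The record shape (a `label` key and a `sensitive_types` list) is an assumption made for illustration and is not the schema returned by any documented endpoint:

```python
from collections import Counter


def summarize(items):
    """Compute summary metrics from item records (hypothetical record shape)."""
    total = len(items)
    labeled = sum(1 for item in items if item.get("label"))
    classified = sum(
        1 for item in items if item.get("label") or item.get("sensitive_types")
    )
    type_counts = Counter(
        t for item in items for t in item.get("sensitive_types", [])
    )
    return {
        "total_classified_items": classified,
        # Label coverage: labeled items as a percentage of all items.
        "label_coverage_pct": round(100 * labeled / total, 1) if total else 0.0,
        "top_sensitive_info_types": type_counts.most_common(3),
    }
```

Note that an item can count as classified without being labeled, so total classified items and label coverage can diverge.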

Best Practices

  • Review classification reports regularly: Understand where sensitive data resides in your environment
  • Tune sensitive info type thresholds: Adjust confidence levels to balance detection coverage against false positives
  • Deploy auto-labeling gradually: Start in simulation mode, review results, then enable enforcement
  • Use trainable classifiers for complex content: When keyword or pattern matching is insufficient, use ML classifiers
  • Set library defaults: Use default labels on SharePoint libraries for consistent classification

Warning: A high false positive rate indicates that sensitive info type patterns or classifier thresholds need tuning. Monitor accuracy regularly.
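
One way to monitor accuracy is to review a sample of matches by hand and compute precision and recall at each candidate confidence threshold. The sketch below assumes the reviewed sample is a list of `(confidence, is_truly_sensitive)` pairs. That shape and the function name are illustrative:

```python
def precision_recall(reviewed, threshold):
    """Return (precision, recall) for matches at or above `threshold`.

    reviewed: list of (confidence, is_truly_sensitive) pairs from manual review.
    """
    tp = sum(1 for conf, truth in reviewed if conf >= threshold and truth)
    fp = sum(1 for conf, truth in reviewed if conf >= threshold and not truth)
    fn = sum(1 for conf, truth in reviewed if conf < threshold and truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Raising the threshold generally improves precision (fewer false positives) at the cost of recall (more missed items), so comparing both numbers across thresholds shows where the trade-off is acceptable.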

API Reference

  • GET /api/compliance/classification/overview — Get classification overview and summary statistics
  • GET /api/compliance/classification/activity — Get classification activity data
  • GET /api/compliance/classification/sensitive-types — List sensitive information type matches
  • GET /api/compliance/classification/labels — Get label application statistics
  • GET /api/compliance/classification/classifiers — List trainable classifier results
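
The endpoints above can be called with any HTTP client. The sketch below uses Python's standard library. The base URL, the bearer-token authentication, and the JSON response format are all assumptions, since this page does not specify them:

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # hypothetical host; replace with your deployment
ENDPOINTS = {"overview", "activity", "sensitive-types", "labels", "classifiers"}


def build_request(name, token):
    """Build a GET request for one of the classification endpoints."""
    if name not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {name}")
    url = f"{BASE_URL}/api/compliance/classification/{name}"
    headers = {
        "Authorization": f"Bearer {token}",  # assumed auth scheme
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch(name, token):
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(build_request(name, token), timeout=30) as resp:
        return json.load(resp)
```

For example, `fetch("overview", token)` would request the summary statistics, provided the assumed host and authentication match your environment.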