Usage Guide
How to authenticate and call the DataGuard API from Python.
Overview
DataGuard is ADEK's data classification and governance API. It identifies data points in natural-language queries, classifies them against ADEK's data governance framework (Open, Confidential, Sensitive, Secret), and answers policy questions grounded in uploaded governance documents.
Three main capabilities are exposed via the API:
- Classify: identify and classify data points from free-text queries.
- Policy Search: ask governance and policy questions in plain English.
- Bulk Classify: submit batches of data points for background classification.
Prerequisites
Install the required Python packages:
pip install requests msal
- requests: HTTP client for calling the API.
- msal: Microsoft Authentication Library; only needed in production (Azure AD).
Authentication
DataGuard supports two authentication modes depending on the environment.
Staging: API Key
In staging, pass your API key in the X-API-Key header:
import requests

BASE_URL = "https://your-container-app.azurecontainerapps.io"
API_KEY = "your-api-key"

headers = {
    "Content-Type": "application/json",
    "X-API-Key": API_KEY,
}
For admin routes (under /api/v1/admin/), use the admin API key instead of the standard key.
Production: Azure AD (Easy Auth)
In production, API keys are disabled. Authenticate with Azure AD using MSAL to obtain an access token, then call the API through the Easy Auth layer.
import msal
import requests

# Azure AD app registration details (get from your administrator)
TENANT_ID = "your-tenant-id"
CLIENT_ID = "your-client-app-id"          # YOUR app registration
CLIENT_SECRET = "your-client-secret"
API_APP_ID = "your-dataguard-api-app-id"  # DataGuard's app registration

BASE_URL = "https://your-container-app.azurecontainerapps.io"
SCOPES = [f"api://{API_APP_ID}/.default"]

# Confidential client (service-to-service / daemon flow)
app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)

def get_access_token():
    result = app.acquire_token_for_client(scopes=SCOPES)
    if "access_token" in result:
        return result["access_token"]
    raise RuntimeError(f"Token error: {result.get('error_description')}")

token = get_access_token()
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",
}
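MSAL's ConfidentialClientApplication already caches tokens in memory when you reuse the app object, but long-running callers may want explicit expiry handling. The TokenProvider class below is an illustrative sketch, not part of MSAL or the DataGuard API; its acquire callable would wrap app.acquire_token_for_client() in practice, and the injectable design keeps the caching logic testable without a network.

```python
import time

class TokenProvider:
    """Cache an access token and refresh it shortly before expiry.

    `acquire` is any callable returning (token, expires_in_seconds);
    in production it would wrap app.acquire_token_for_client().
    """

    def __init__(self, acquire, skew_seconds=300):
        self._acquire = acquire
        self._skew = skew_seconds
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Refresh when missing or within `skew_seconds` of expiry.
        if self._token is None or time.time() >= self._expires_at - self._skew:
            self._token, expires_in = self._acquire()
            self._expires_at = time.time() + expires_in
        return self._token

    def headers(self):
        return {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.get()}",
        }
```

A caller would then build headers per request via provider.headers() instead of holding one token for the life of the process.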
Bulk classification routes require the DataGuard.BulkOperator or DataGuard.SuperAdmin app role assigned to your service principal in Azure AD.
Classify
Classify data points mentioned in a natural-language query.
Request
| Field | Type | Description |
|---|---|---|
| query | string | Natural-language description of the data to classify |
Example
response = requests.post(
    f"{BASE_URL}/api/v1/classify",
    headers=headers,
    json={"query": "employee salary and national ID number"},
)
response.raise_for_status()
result = response.json()

print(f"Status: {result['pipeline_status']}")
print(f"Composite level: {result['composite_level']}")
for item in result["results"]:
    print(f"  {item['data_point']}: {item['classification_level']}")
    print(f"  Confidence: {item['confidence']} : {item['confidence_reason']}")
    print(f"  Summary: {item['summary']}")
Response Fields
| Field | Type | Description |
|---|---|---|
| pipeline_status | string | classified, partial, or error |
| composite_level | string | Highest classification level across all results |
| classified_count | int | Number of successfully classified data points |
| vague_terms | list | Terms too vague to classify |
| results | list | Per-data-point classification results |
| correlation_id | string | Request trace ID for debugging |
| duration_ms | int | Processing time in milliseconds |
Each Result Item
| Field | Type | Description |
|---|---|---|
| data_point | string | Identified data point name |
| classification_level | string | Open, Confidential, Sensitive, or Secret |
| confidence | string | High, Medium, or Low |
| confidence_reason | string | Explanation of the confidence score |
| summary | string | Plain-English classification summary |
| owner | string | Data owner / responsible party |
| status | string | classified, ambiguous, expanded, or not_found |
| interpretations_note | list | Alternate interpretations when ambiguous |
| expanded_results | list | Sub-data-points when the query is too broad |
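The status field above determines how each item should be handled: expanded items carry sub-results, and ambiguous items carry interpretation notes. The helper below is an illustrative sketch, not part of any DataGuard client; it assumes that entries inside expanded_results share the same keys as top-level result items, which the API tables here do not spell out.

```python
def summarize_results(results):
    """Flatten classify results into (data_point, level, note) rows.

    Recurses into `expanded_results` for broad queries and joins
    `interpretations_note` entries for ambiguous items.
    """
    rows = []
    for item in results:
        status = item.get("status")
        if status == "expanded":
            # Assumption: nested items have the same shape as top-level ones.
            rows.extend(summarize_results(item.get("expanded_results", [])))
        elif status == "ambiguous":
            notes = "; ".join(item.get("interpretations_note", []))
            rows.append((item["data_point"], "ambiguous", notes))
        elif status == "classified":
            rows.append((item["data_point"], item["classification_level"],
                         item.get("summary", "")))
        else:  # not_found (or any unexpected status)
            rows.append((item["data_point"], status, ""))
    return rows
```

Callers can then render the flat rows in a table or report regardless of how deeply the API expanded the original query.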
Policy Search
Ask governance and policy questions. Returns answers grounded in ADEK's uploaded policy documents with citations.
Request
| Field | Type | Description |
|---|---|---|
| query | string | A governance or policy question in plain English |
Example
response = requests.post(
    f"{BASE_URL}/api/v1/policy-search",
    headers=headers,
    json={"query": "What is the retention period for financial records?"},
)
response.raise_for_status()
result = response.json()

print(f"Answer: {result['answer']}")
for cite in result.get("citations", []):
    print(f"  Source: {cite['document_name']}")
Response Fields
| Field | Type | Description |
|---|---|---|
| answer | string | AI-generated answer grounded in policy documents |
| status | string | success or error |
| citations | list | Source documents (each has document_name and url) |
| reference_image_url | string | URL to a reference image, if applicable |
| template_url | string | URL to a relevant template, if applicable |
| error | string | Error message (empty on success) |
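Putting the fields above together, a small formatter can render the answer with its citations as markdown. This is a hedged sketch rather than an official helper; it assumes each citation carries document_name and url exactly as listed in the table.

```python
def format_policy_answer(result):
    """Render a policy-search response dict as a markdown snippet."""
    lines = [result["answer"], ""]
    for cite in result.get("citations", []):
        # Citation shape per the response-field table: document_name + url.
        lines.append(f"- [{cite['document_name']}]({cite.get('url', '')})")
    if result.get("template_url"):
        lines.append(f"Template: {result['template_url']}")
    return "\n".join(lines)
```

This keeps the grounding visible: every answer ships with clickable sources, which matters when the text is AI-generated.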
Bulk Classification
Submit a batch of data points for background classification. This is a three-step process: submit, poll, then download.
Submit the Batch
The submit endpoint accepts the batch and returns a job_id for tracking (HTTP 202 Accepted).

| Field | Type | Description |
|---|---|---|
| items | list | Array of objects, each with data_point (required) and description (optional) |
items = [
    {"data_point": "Employee Salary", "description": "Monthly gross salary"},
    {"data_point": "National ID Number"},
    {"data_point": "Office Phone Number"},
]

response = requests.post(
    f"{BASE_URL}/api/v1/admin/bulk-classify",
    headers=headers,
    json={"items": items},
)
response.raise_for_status()
job = response.json()
job_id = job["job_id"]
print(f"Submitted: {job_id} ({job['total_items']} items)")
Poll for Completion
Poll the job status endpoint until the job finishes; status is one of pending, running, completed, or failed.

import time

while True:
    resp = requests.get(
        f"{BASE_URL}/api/v1/jobs/{job_id}",
        headers=headers,
    )
    status = resp.json()
    print(f"  Status: {status['status']} : {status.get('message', '')}")
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(5)  # poll every 5 seconds
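For unattended runs, the polling loop is safer with a deadline. The wait_for_job helper below is an illustrative sketch, not part of the API: fetch_status is any callable returning the job-status dict (in practice, a wrapper around GET /api/v1/jobs/{job_id}), and the injectable sleep makes the logic testable without waiting.

```python
import time

def wait_for_job(fetch_status, timeout_s=600, interval_s=5, sleep=time.sleep):
    """Poll until the job reaches a terminal state or the deadline passes.

    `fetch_status` returns the job-status dict; raises TimeoutError if
    the job is still pending/running after `timeout_s` seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status["status"] in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status['status']} after {timeout_s}s")
        sleep(interval_s)
```

A caller could pass lambda: requests.get(f"{BASE_URL}/api/v1/jobs/{job_id}", headers=headers).json() as the fetcher.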
Download Results
download = requests.get(
    f"{BASE_URL}/api/v1/admin/bulk-classify/{job_id}/results",
    headers=headers,
).json()

if download.get("download_url"):
    results = requests.get(download["download_url"]).json()
    print(f"Classified: {results['classified_count']}/{results['total_items']}")
    for r in results["results"]:
        print(f"  {r['data_point']}: {r['classification_level']} ({r['confidence']})")
Result Item Fields
| Field | Type | Description |
|---|---|---|
| data_point | string | The data point name |
| status | string | classified, ambiguous, not_found, or error |
| classification_level | string | Open, Confidential, Sensitive, or Secret |
| confidence | string | High, Medium, or Low |
| confidence_reason | string | Explanation of the confidence score |
| summary | string | Classification summary |
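As an illustration, downloaded result items can be exported to CSV using the fields above. write_results_csv is a hypothetical helper, not part of the API; any field missing from an item is written as an empty cell, and extra fields are ignored.

```python
import csv

def write_results_csv(results, path):
    """Write bulk-classification result items to a CSV file.

    Columns follow the result-item field table; unknown keys on an
    item are dropped, missing keys become empty cells.
    """
    fields = ["data_point", "status", "classification_level",
              "confidence", "confidence_reason", "summary"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for item in results:
            writer.writerow(item)
```

After the download step, write_results_csv(results["results"], "classified.csv") would produce a spreadsheet-ready file.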
Job Status
Check the status of any background job (bulk classification, index rebuilds).
Response Fields
| Field | Type | Description |
|---|---|---|
| job_id | string | Job identifier |
| status | string | pending, running, completed, failed, or not_found |
| job_type | string | Type of background job |
| created_at | string | ISO 8601 UTC timestamp |
| started_at | string | When processing began |
| completed_at | string | When processing finished |
| message | string | Human-readable status message |
| documents_processed | int | Number of items processed so far |
| error | string | Error message (empty on success) |
Error Handling
All endpoints return standard HTTP error codes with a JSON body containing a detail field.
try:
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError as e:
    error = e.response.json().get("detail", str(e))
    print(f"HTTP {e.response.status_code}: {error}")
Common Error Codes
| Code | Meaning | Action |
|---|---|---|
| 401 | Not authenticated | Check API key or refresh Azure AD token |
| 403 | Missing required role | Request the needed role from your administrator |
| 422 | Invalid request body | Check the JSON payload matches the expected schema |
| 500 | Server error | Retry, then contact the DataGuard team with the correlation_id |
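Since the table above recommends retrying 500s, here is a minimal retry sketch with exponential backoff. post_with_retry is illustrative, not an official client feature: do_post is any callable returning a requests-style response object (one with a status_code attribute), and 4xx errors are deliberately never retried because they indicate a caller problem.

```python
import time

def post_with_retry(do_post, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry transient 5xx responses with exponential backoff.

    Returns the first non-5xx response, or the last response if all
    attempts fail. Client errors (4xx) are returned immediately.
    """
    for attempt in range(max_attempts):
        response = do_post()
        if response.status_code < 500:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return response
```

Usage might look like post_with_retry(lambda: requests.post(url, headers=headers, json=payload)), followed by the raise_for_status handling shown above.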
Roles & Permissions
Azure AD app roles control access to different endpoint tiers in production.
| Role | Access |
|---|---|
| DataGuard.SuperAdmin | Full access to all endpoints (classify, policy, bulk, admin UI) |
| DataGuard.BulkOperator | Bulk classification routes + all standard routes |
| DataGuard.UIController | Admin UI and admin API access (review queue, indexes, analytics) |
| No role (authenticated) | Standard routes only (classify, policy-search, job status) |
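Azure AD puts granted app roles in the access token's roles claim, which is useful when debugging 403s. The sketch below decodes the JWT payload locally without verifying the signature; token_roles is an illustrative helper, and because nothing is validated it must never be used for authorization decisions, only for inspection.

```python
import base64
import json

def token_roles(access_token):
    """Return the `roles` claim from a JWT's (unverified) payload.

    Debugging aid only: no signature or expiry validation is done.
    """
    payload_b64 = access_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("roles", [])
```

If a bulk route returns 403, printing token_roles(token) shows whether DataGuard.BulkOperator was actually granted to your service principal.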
Quick Reference
| Endpoint | Method | Path | Auth |
|---|---|---|---|
| Classify | POST | /api/v1/classify | Any authenticated |
| Policy Search | POST | /api/v1/policy-search | Any authenticated |
| Bulk Classify | POST | /api/v1/admin/bulk-classify | BulkOperator / SuperAdmin |
| Job Status | GET | /api/v1/jobs/{job_id} | Any authenticated |
| Bulk Results | GET | /api/v1/admin/bulk-classify/{job_id}/results | BulkOperator / SuperAdmin |
Environment Modes
| Environment | Auth Method | Notes |
|---|---|---|
| Development | None | Local development only, no auth checks |
| Staging | API key (X-API-Key) | Use admin key for bulk routes |
| Production | Azure AD (Easy Auth) | API keys disabled; use MSAL Bearer token |