Usage Guide
How to authenticate and call the DataGuard API from Python.
Overview
DataGuard is ADEK's data classification and governance API. It identifies data points in natural-language queries, classifies them against ADEK's data governance framework (Open, Confidential, Sensitive, Secret), and answers policy questions grounded in uploaded governance documents.
Three main capabilities are exposed via the API:
- Classify: identify and classify data points from free-text queries.
- Policy Search: ask governance and policy questions in plain English.
- Bulk Classify: submit batches of data points for background classification.
Prerequisites
Install the required Python packages:
pip install requests msal
- requests: HTTP client for calling the API.
- msal: Microsoft Authentication Library; only needed in production (Azure AD).
Authentication
DataGuard supports two authentication modes depending on the environment.
Staging: API Key
In staging, pass your API key in the X-API-Key header:
import requests

BASE_URL = "https://your-container-app.azurecontainerapps.io"
API_KEY = "your-api-key"

headers = {
    "Content-Type": "application/json",
    "X-API-Key": API_KEY,
}
For admin routes (under /api/v1/admin/), use the admin API key instead of the standard key.
Production: Azure AD (Easy Auth)
In production, API keys are disabled. Authenticate with Azure AD using MSAL to obtain an access token, then call the API through the Easy Auth layer.
import msal
import requests

# Azure AD app registration details (get from your administrator)
TENANT_ID = "your-tenant-id"
CLIENT_ID = "your-client-app-id"          # YOUR app registration
CLIENT_SECRET = "your-client-secret"
API_APP_ID = "your-dataguard-api-app-id"  # DataGuard's app registration

BASE_URL = "https://your-container-app.azurecontainerapps.io"
SCOPES = [f"api://{API_APP_ID}/.default"]

# Confidential client (service-to-service / daemon flow)
app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)

def get_access_token():
    result = app.acquire_token_for_client(scopes=SCOPES)
    if "access_token" in result:
        return result["access_token"]
    raise RuntimeError(f"Token error: {result.get('error_description')}")

token = get_access_token()
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",
}
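MSAL's ConfidentialClientApplication already caches tokens in memory when you reuse the app object, but long-running callers may want explicit expiry handling. The TokenProvider class below is an illustrative sketch, not part of MSAL or the DataGuard API; its acquire callable would wrap app.acquire_token_for_client() in practice, and the injectable design keeps the caching logic testable without a network.

```python
import time

class TokenProvider:
    """Cache an access token and refresh it shortly before expiry.

    `acquire` is any callable returning (token, expires_in_seconds);
    in production it would wrap app.acquire_token_for_client().
    """

    def __init__(self, acquire, skew_seconds=300):
        self._acquire = acquire
        self._skew = skew_seconds
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Refresh when missing or within `skew_seconds` of expiry.
        if self._token is None or time.time() >= self._expires_at - self._skew:
            self._token, expires_in = self._acquire()
            self._expires_at = time.time() + expires_in
        return self._token

    def headers(self):
        return {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.get()}",
        }
```

A caller would then build headers per request via provider.headers() instead of holding one token for the life of the process.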
Bulk classification routes require the DataGuard.BulkOperator or DataGuard.SuperAdmin app role assigned to your service principal in Azure AD.
Classify
Classify data points mentioned in a natural-language query.
Request
| Field | Type | Description |
|---|---|---|
| query | string | Natural-language description of the data to classify |
Example
response = requests.post(
    f"{BASE_URL}/api/v1/classify",
    headers=headers,
    json={"query": "employee salary and national ID number"},
)
response.raise_for_status()
result = response.json()

print(f"Status: {result['pipeline_status']}")
print(f"Composite level: {result['composite_level']}")
for item in result["results"]:
    print(f"  {item['data_point']}: {item['classification_level']}")
    print(f"  Confidence: {item['confidence']} : {item['confidence_reason']}")
    print(f"  Summary: {item['summary']}")
Response Fields
| Field | Type | Description |
|---|---|---|
| pipeline_status | string | classified, partial, or error |
| composite_level | string | Highest classification level across all results |
| classified_count | int | Number of successfully classified data points |
| vague_terms | list | Terms too vague to classify |
| results | list | Per-data-point classification results |
| correlation_id | string | Request trace ID for debugging |
| duration_ms | int | Processing time in milliseconds |
Each Result Item
| Field | Type | Description |
|---|---|---|
| data_point | string | Identified data point name |
| classification_level | string | Open, Confidential, Sensitive, or Secret |
| confidence | string | High, Medium, or Low |
| confidence_reason | string | Explanation of the confidence score |
| summary | string | Plain-English classification summary |
| owner | string | Data owner / responsible party |
| status | string | classified, ambiguous, expanded, or not_found |
| interpretations_note | list | Alternate interpretations when ambiguous |
| expanded_results | list | Sub-data-points when the query is too broad |
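The status field above determines how each item should be handled: expanded items carry sub-results, and ambiguous items carry interpretation notes. The helper below is an illustrative sketch, not part of any DataGuard client; it assumes that entries inside expanded_results share the same keys as top-level result items, which the API tables here do not spell out.

```python
def summarize_results(results):
    """Flatten classify results into (data_point, level, note) rows.

    Recurses into `expanded_results` for broad queries and joins
    `interpretations_note` entries for ambiguous items.
    """
    rows = []
    for item in results:
        status = item.get("status")
        if status == "expanded":
            # Assumption: nested items have the same shape as top-level ones.
            rows.extend(summarize_results(item.get("expanded_results", [])))
        elif status == "ambiguous":
            notes = "; ".join(item.get("interpretations_note", []))
            rows.append((item["data_point"], "ambiguous", notes))
        elif status == "classified":
            rows.append((item["data_point"], item["classification_level"],
                         item.get("summary", "")))
        else:  # not_found (or any unexpected status)
            rows.append((item["data_point"], status, ""))
    return rows
```

Callers can then render the flat rows in a table or report regardless of how deeply the API expanded the original query.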
Policy Search
Ask governance and policy questions. Returns answers grounded in ADEK's uploaded policy documents with citations.
Request
| Field | Type | Description |
|---|---|---|
| query | string | A governance or policy question in plain English |
Example
response = requests.post(
    f"{BASE_URL}/api/v1/policy-search",
    headers=headers,
    json={"query": "What is the retention period for financial records?"},
)
response.raise_for_status()
result = response.json()

print(f"Answer: {result['answer']}")
for cite in result.get("citations", []):
    print(f"  Source: {cite['document_name']}")
Response Fields
| Field | Type | Description |
|---|---|---|
| answer | string | AI-generated answer grounded in policy documents |
| status | string | success or error |
| citations | list | Source documents (each has document_name and url) |
| reference_image_url | string | URL to a reference image, if applicable |
| template_url | string | URL to a relevant template, if applicable |
| error | string | Error message (empty on success) |
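Putting the fields above together, a small formatter can render the answer with its citations as markdown. This is a hedged sketch rather than an official helper; it assumes each citation carries document_name and url exactly as listed in the table.

```python
def format_policy_answer(result):
    """Render a policy-search response dict as a markdown snippet."""
    lines = [result["answer"], ""]
    for cite in result.get("citations", []):
        # Citation shape per the response-field table: document_name + url.
        lines.append(f"- [{cite['document_name']}]({cite.get('url', '')})")
    if result.get("template_url"):
        lines.append(f"Template: {result['template_url']}")
    return "\n".join(lines)
```

This keeps the grounding visible: every answer ships with clickable sources, which matters when the text is AI-generated.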
Bulk Classification
Submit a batch of data points for background classification. This is a three-step process: submit, poll, then download.
Submit the Batch
The submit endpoint accepts the batch and returns a job_id for tracking (HTTP 202 Accepted).

| Field | Type | Description |
|---|---|---|
| items | list | Array of objects, each with data_point (required) and description (optional) |
items = [
    {"data_point": "Employee Salary", "description": "Monthly gross salary"},
    {"data_point": "National ID Number"},
    {"data_point": "Office Phone Number"},
]

response = requests.post(
    f"{BASE_URL}/api/v1/admin/bulk-classify",
    headers=headers,
    json={"items": items},
)
response.raise_for_status()
job = response.json()
job_id = job["job_id"]
print(f"Submitted: {job_id} ({job['total_items']} items)")
Poll for Completion
Poll the job status endpoint until the job finishes; status is one of pending, running, completed, or failed.

import time

while True:
    resp = requests.get(
        f"{BASE_URL}/api/v1/jobs/{job_id}",
        headers=headers,
    )
    status = resp.json()
    print(f"  Status: {status['status']} : {status.get('message', '')}")
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(5)  # poll every 5 seconds
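For unattended runs, the polling loop is safer with a deadline. The wait_for_job helper below is an illustrative sketch, not part of the API: fetch_status is any callable returning the job-status dict (in practice, a wrapper around GET /api/v1/jobs/{job_id}), and the injectable sleep makes the logic testable without waiting.

```python
import time

def wait_for_job(fetch_status, timeout_s=600, interval_s=5, sleep=time.sleep):
    """Poll until the job reaches a terminal state or the deadline passes.

    `fetch_status` returns the job-status dict; raises TimeoutError if
    the job is still pending/running after `timeout_s` seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status["status"] in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status['status']} after {timeout_s}s")
        sleep(interval_s)
```

A caller could pass lambda: requests.get(f"{BASE_URL}/api/v1/jobs/{job_id}", headers=headers).json() as the fetcher.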
Download Results
download = requests.get(
    f"{BASE_URL}/api/v1/admin/bulk-classify/{job_id}/results",
    headers=headers,
).json()

if download.get("download_url"):
    results = requests.get(download["download_url"]).json()
    print(f"Classified: {results['classified_count']}/{results['total_items']}")
    for r in results["results"]:
        print(f"  {r['data_point']}: {r['classification_level']} ({r['confidence']})")
Result Item Fields
| Field | Type | Description |
|---|---|---|
| data_point | string | The data point name |
| status | string | classified, ambiguous, not_found, or error |
| classification_level | string | Open, Confidential, Sensitive, or Secret |
| confidence | string | High, Medium, or Low |
| confidence_reason | string | Explanation of the confidence score |
| summary | string | Classification summary |
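As an illustration, downloaded result items can be exported to CSV using the fields above. write_results_csv is a hypothetical helper, not part of the API; any field missing from an item is written as an empty cell, and extra fields are ignored.

```python
import csv

def write_results_csv(results, path):
    """Write bulk-classification result items to a CSV file.

    Columns follow the result-item field table; unknown keys on an
    item are dropped, missing keys become empty cells.
    """
    fields = ["data_point", "status", "classification_level",
              "confidence", "confidence_reason", "summary"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for item in results:
            writer.writerow(item)
```

After the download step, write_results_csv(results["results"], "classified.csv") would produce a spreadsheet-ready file.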
Job Status
Check the status of any background job (bulk classification, index rebuilds).
Response Fields
| Field | Type | Description |
|---|---|---|
| job_id | string | Job identifier |
| status | string | pending, running, completed, failed, or not_found |
| job_type | string | Type of background job |
| created_at | string | ISO 8601 UTC timestamp |
| started_at | string | When processing began |
| completed_at | string | When processing finished |
| message | string | Human-readable status message |
| documents_processed | int | Number of items processed so far |
| error | string | Error message (empty on success) |
Error Handling
All endpoints return standard HTTP error codes with a JSON body containing a detail field.
try:
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError as e:
    error = e.response.json().get("detail", str(e))
    print(f"HTTP {e.response.status_code}: {error}")
Common Error Codes
| Code | Meaning | Action |
|---|---|---|
| 401 | Not authenticated | Check API key or refresh Azure AD token |
| 403 | Missing required role | Request the needed role from your administrator |
| 422 | Invalid request body | Check the JSON payload matches the expected schema |
| 500 | Server error | Retry, then contact the DataGuard team with the correlation_id |
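Since the table above recommends retrying 500s, here is a minimal retry sketch with exponential backoff. post_with_retry is illustrative, not an official client feature: do_post is any callable returning a requests-style response object (one with a status_code attribute), and 4xx errors are deliberately never retried because they indicate a caller problem.

```python
import time

def post_with_retry(do_post, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry transient 5xx responses with exponential backoff.

    Returns the first non-5xx response, or the last response if all
    attempts fail. Client errors (4xx) are returned immediately.
    """
    for attempt in range(max_attempts):
        response = do_post()
        if response.status_code < 500:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return response
```

Usage might look like post_with_retry(lambda: requests.post(url, headers=headers, json=payload)), followed by the raise_for_status handling shown above.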
Roles & Permissions
Azure AD app roles control access to different endpoint tiers in production.
| Role | Access |
|---|---|
| DataGuard.SuperAdmin | Full access to all endpoints (classify, policy, bulk, admin UI) |
| DataGuard.BulkOperator | Bulk classification routes + all standard routes |
| DataGuard.UIController | Admin UI and admin API access (review queue, indexes, analytics) |
| No role (authenticated) | Standard routes only (classify, policy-search, job status) |
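Azure AD puts granted app roles in the access token's roles claim, which is useful when debugging 403s. The sketch below decodes the JWT payload locally without verifying the signature; token_roles is an illustrative helper, and because nothing is validated it must never be used for authorization decisions, only for inspection.

```python
import base64
import json

def token_roles(access_token):
    """Return the `roles` claim from a JWT's (unverified) payload.

    Debugging aid only: no signature or expiry validation is done.
    """
    payload_b64 = access_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("roles", [])
```

If a bulk route returns 403, printing token_roles(token) shows whether DataGuard.BulkOperator was actually granted to your service principal.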
Quick Reference
| Endpoint | Method | Path | Auth |
|---|---|---|---|
| Classify | POST | /api/v1/classify | Any authenticated |
| Policy Search | POST | /api/v1/policy-search | Any authenticated |
| Bulk Classify | POST | /api/v1/admin/bulk-classify | BulkOperator / SuperAdmin |
| Job Status | GET | /api/v1/jobs/{job_id} | Any authenticated |
| Bulk Results | GET | /api/v1/admin/bulk-classify/{job_id}/results | BulkOperator / SuperAdmin |
Environment Modes
| Environment | Auth Method | Notes |
|---|---|---|
| Development | None | Local development only, no auth checks |
| Staging | API key (X-API-Key) | Use admin key for bulk routes |
| Production | Azure AD (Easy Auth) | API keys disabled; use MSAL Bearer token |