Automated Mechanisms to Discover AI/ML Models Across Environments
Organizations must implement automated mechanisms to continuously discover AI/ML models across
cloud, on-prem, SaaS, and runtime environments. Below are practical approaches used in real-world scenarios.
1. Cloud-Native Discovery
- Integrate with AWS SageMaker, Azure ML, GCP Vertex AI
- Use APIs to list models and endpoints
- Schedule periodic discovery scans
- Auto-tag assets with owner, environment, and risk
Tools: AWS Config, Azure Resource Graph, GCP Asset Inventory
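The listing-and-tagging loop above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes the AWS SageMaker ListModels API shape (`Models`, `ModelName`, `ModelArn`, `NextToken`), and the client is injected so that in production you would pass `boto3.client("sagemaker")`.

```python
def discover_sagemaker_models(sm_client):
    """Page through the SageMaker ListModels API and build tagged inventory records.

    sm_client is injected for testability; in production this would be
    boto3.client("sagemaker") (assumption: boto3 is available and configured).
    """
    models, token = [], None
    while True:
        kwargs = {"NextToken": token} if token else {}
        resp = sm_client.list_models(**kwargs)
        for m in resp.get("Models", []):
            # Auto-tag each discovered asset with source and environment.
            models.append({
                "name": m["ModelName"],
                "arn": m.get("ModelArn"),
                "source": "aws-sagemaker",
                "environment": "cloud",
            })
        token = resp.get("NextToken")
        if not token:
            return models
```

The same pattern (paginated list call plus a normalizing record builder) applies to Azure ML and Vertex AI; only the client and field names change.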
2. MLOps Platform Integration
- Integrate with MLflow, Kubeflow, Databricks
- Sync model registry and experiment tracking
- Track model versions and datasets
- Capture deployment status
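Registry sync can be sketched against the MLflow client API. This assumes the `search_registered_models()` call and the `RegisteredModel` / `ModelVersion` attributes (`name`, `latest_versions`, `version`, `current_stage`, `run_id`) exposed by `mlflow.tracking.MlflowClient`; the client is injected so the sketch has no hard MLflow dependency.

```python
def sync_model_registry(client):
    """Pull registered models and their latest versions into a flat inventory.

    In production, client = mlflow.tracking.MlflowClient(tracking_uri)
    (assumption: an MLflow registry is reachable).
    """
    inventory = []
    for rm in client.search_registered_models():
        for mv in rm.latest_versions:
            inventory.append({
                "model": rm.name,
                "version": mv.version,
                "stage": mv.current_stage,   # captures deployment status
                "run_id": mv.run_id,         # links back to experiment tracking
                "source": "mlflow",
            })
    return inventory
```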
3. Code Repository Scanning
- Scan GitHub, GitLab, Bitbucket repositories
- Detect model files (.pkl, .onnx, .h5)
- Identify ML libraries (TensorFlow, PyTorch)
- Detect model loading patterns
joblib.load("fraud_model.pkl")
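A repository scanner combining all three detections (model files, ML libraries, loading patterns) can be sketched with the standard library. The extension list and regexes below are illustrative assumptions, not an exhaustive ruleset.

```python
import os
import re

MODEL_EXTS = {".pkl", ".onnx", ".h5"}  # extend as needed (.pt, .joblib, ...)
LOAD_PATTERNS = re.compile(
    r"(joblib\.load|pickle\.load|torch\.load|load_model|InferenceSession)\s*\("
)
ML_IMPORTS = re.compile(r"^\s*(?:import|from)\s+(tensorflow|torch|sklearn|keras)\b", re.M)

def scan_repo(root):
    """Walk a checked-out repository and flag model artifacts and ML code."""
    findings = []
    for dirpath, _, files in os.walk(root):
        for fname in files:
            path = os.path.join(dirpath, fname)
            ext = os.path.splitext(fname)[1].lower()
            if ext in MODEL_EXTS:
                findings.append({"path": path, "type": "model-file"})
            elif ext == ".py":
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        text = f.read()
                except OSError:
                    continue
                if LOAD_PATTERNS.search(text) or ML_IMPORTS.search(text):
                    findings.append({"path": path, "type": "ml-code"})
    return findings
```

In practice this runs against clones pulled via the GitHub/GitLab/Bitbucket APIs on a schedule, or as a pre-merge CI check.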
4. API & Endpoint Discovery
- Scan API gateways (Kong, Apigee, AWS API Gateway)
- Detect endpoints like /predict, /generate
- Monitor exposure of AI services
GenAI applications often exist only as APIs and are not formally registered.
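Flagging AI-style routes from a gateway export can be sketched as a path filter. The route dicts are assumed to come from the gateway's admin API (for Kong, GET /routes is one example source); the path regex is an illustrative assumption.

```python
import re

# Paths that typically indicate an inference or GenAI endpoint (assumed list).
AI_PATH = re.compile(r"/(predict|inference|generate|embeddings?|completions|chat)\b", re.I)

def flag_ai_routes(routes):
    """routes: iterable of {"name": ..., "path": ...} exported from an API gateway.

    Returns the subset whose path looks like an AI/ML endpoint, so unregistered
    GenAI services surface in the inventory.
    """
    return [r for r in routes if AI_PATH.search(r.get("path", ""))]
```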
5. Runtime / Infrastructure Scanning
- Monitor Kubernetes, Docker, and VM workloads
- Detect GPU usage patterns
- Identify ML serving frameworks (TensorFlow Serving, TorchServe)
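Runtime detection can be sketched over pod specs in the shape returned by `kubectl get pods -o json` (the Kubernetes Python client returns equivalent objects). The serving-image substrings are illustrative assumptions; `nvidia.com/gpu` is the standard Kubernetes resource name for NVIDIA GPUs.

```python
# Image substrings that indicate a model-serving container (assumed list).
SERVING_IMAGES = ("tensorflow/serving", "pytorch/torchserve", "tritonserver")

def flag_ml_workloads(pods):
    """Flag pods that run a known ML serving framework or request GPUs.

    pods: list of pod dicts in kubectl-JSON shape
    (metadata.name, spec.containers[].image / resources.limits).
    """
    findings = []
    for pod in pods:
        for c in pod["spec"]["containers"]:
            image = c.get("image", "")
            gpu = c.get("resources", {}).get("limits", {}).get("nvidia.com/gpu")
            if any(s in image for s in SERVING_IMAGES) or gpu:
                findings.append({
                    "pod": pod["metadata"]["name"],
                    "image": image,
                    "gpu": bool(gpu),
                })
    return findings
```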
6. SaaS & Shadow AI Discovery
- Monitor usage of ChatGPT, Copilot, Gemini
- Use CASB/SSE tools (Netskope, Defender for Cloud Apps)
- Detect unauthorized API key usage
Shadow AI is one of the highest-risk areas in AI security.
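Where no CASB is available, a first-pass shadow AI sweep over proxy or egress logs can be sketched as below. The domain list is an illustrative assumption; the key regex targets OpenAI-style secret keys, which begin with `sk-`.

```python
import re

# Known AI provider endpoints to watch for (assumed, non-exhaustive list).
AI_DOMAINS = ("api.openai.com", "api.anthropic.com",
              "generativelanguage.googleapis.com")
# OpenAI-style secret keys start with "sk-" followed by a long token.
KEY_PATTERN = re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b")

def scan_proxy_logs(lines):
    """Return 1-based line numbers where AI service traffic or leaked
    API keys appear, as candidates for shadow AI follow-up."""
    hits = []
    for i, line in enumerate(lines, 1):
        if any(d in line for d in AI_DOMAINS) or KEY_PATTERN.search(line):
            hits.append(i)
    return hits
```

A CASB/SSE tool does this at scale with TLS inspection; the sketch only shows the matching logic.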
7. Data Pipeline Inspection
- Scan Airflow, Spark, and ETL pipelines
- Detect inference steps like model.predict()
- Track feature engineering workflows
model.predict(data)
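Detecting inference steps like the call above can be done statically with Python's `ast` module, which avoids false positives from comments and strings that a plain grep would hit. The method-name set is an illustrative assumption.

```python
import ast

# Method names that typically indicate an inference step (assumed set).
INFERENCE_METHODS = {"predict", "predict_proba", "transform"}

def find_inference_calls(source):
    """Return (line, method) pairs for every x.predict(...)-style call
    in a pipeline or DAG source file."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in INFERENCE_METHODS):
            calls.append((node.lineno, node.func.attr))
    return calls
```

Pointing this at Airflow DAG files or Spark job sources surfaces models that never passed through a registry.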
8. CMDB & Asset Correlation
- Integrate with ServiceNow or asset management tools
- Link models with applications and datasets
- Build AI asset relationship graphs
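The relationship graph can be sketched with a simple adjacency structure; a production system would feed the same links into a CMDB such as ServiceNow or a graph database. All names in the usage example are hypothetical.

```python
from collections import defaultdict

class AIAssetGraph:
    """Minimal relationship graph linking models to applications and datasets."""

    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, model, asset):
        """Record that a model depends on or serves the given asset."""
        self.edges[model].add(asset)

    def related(self, model):
        """All assets linked to a model, e.g. for blast-radius analysis."""
        return sorted(self.edges[model])
```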
Key Takeaway
AI discovery requires a combination of cloud scanning, code analysis, API monitoring, and runtime inspection.
The biggest risk comes from unknown or shadow AI systems; discovering them is critical.