Anthropic-Cybersecurity-Skills/skills/detecting-model-extraction-attacks/references/standards.md
mukul975 8cae0648ec Add 55 new skills across 3 new domains + 6 undercovered areas (762 -> 817)
Demand-driven expansion targeting the fastest-growing 2025-2026 threat and
skills categories (ISC2/WEF/CrowdStrike/Mandiant signals):

- AI Security (NEW domain, 12 skills): LLM red-teaming with garak/PyRIT,
  prompt injection (direct/indirect/RAG), MCP tool-poisoning, agentic tool
  invocation, guardrails, model/data poisoning, system-prompt leakage,
  embedding/vector weaknesses, model extraction, continuous red-teaming
- Supply Chain Security (NEW domain, 5 skills): SBOMs, dependency confusion,
  malicious-npm triage, typosquatting, SLSA/Sigstore provenance
- Hardware & Firmware Security (NEW domain, 4 skills): CHIPSEC/UEFI audit,
  Secure Boot bypass, TPM measured-boot attestation, ESP bootkit hunting
- Identity (10): Entra ID/ROADtools, GraphRunner, AADInternals, ADCS/Certipy,
  shadow credentials, coercion, BloodHound CE, device-code phishing, SSO abuse
- Cloud-native (8): Stratus, Pacu, CloudFox, container escape, K8s RBAC,
  Falco, Trivy, kube-bench
- Offensive C2 (6): Sliver, Havoc, NetExec, DPAPI, NTLM relay ESC8, redirectors
- DFIR (6): Hayabusa, Chainsaw, KAPE, Velociraptor, EZ Tools, Plaso
- Backfill (4): OpenCTI, MISP, honeytokens, post-quantum crypto migration

Each skill follows the repo taxonomy (SKILL.md + references/{standards,api-reference}.md
+ scripts/agent.py + LICENSE), with researched real tool commands (no placeholders),
complete frontmatter, and ATT&CK/ATLAS + NIST CSF mappings. Updates README domain
table, skill count, and index.json.
2026-06-22 19:08:16 +02:00


Standards and References — Detecting Model Extraction Attacks

MITRE ATLAS Techniques

| ID | Name | Tactic | Rationale |
|----|------|--------|-----------|
| AML.T0024 | Exfiltration via AI Inference API | Exfiltration | Parent technique: abusing the inference API to steal model value or training data. |
| AML.T0024.000 | Infer Training Data Membership | Exfiltration | Membership inference: determine whether a record was in the training set (privacy leak). |
| AML.T0024.001 | Invert AI Model | Exfiltration | Model inversion: reconstruct training inputs from confidence scores. |
| AML.T0024.002 | Extract ML Model | Exfiltration | Model stealing: train a surrogate from query/response pairs to clone the model. |
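
Because AML.T0024.002 depends on harvesting many query/response pairs, one common detection signal is a client that sends an unusually large number of mostly distinct inputs within a time window. The following is a minimal sketch of that idea, not a complete detector. The function name `flag_extraction_suspects`, the log shape (pairs of client ID and input hash), and both thresholds are hypothetical and would need tuning against real traffic.

```python
from collections import defaultdict

# Hypothetical thresholds, chosen only for illustration; tune on real traffic.
MAX_QUERIES_PER_WINDOW = 1000
MIN_UNIQUE_RATIO = 0.9


def flag_extraction_suspects(query_log,
                             max_queries=MAX_QUERIES_PER_WINDOW,
                             min_unique_ratio=MIN_UNIQUE_RATIO):
    """Return sorted client IDs whose traffic in one window looks like harvesting.

    query_log: iterable of (client_id, input_hash) pairs for a single window.
    A client is flagged when it exceeds max_queries AND at least
    min_unique_ratio of its queries are distinct inputs.
    """
    totals = defaultdict(int)
    uniques = defaultdict(set)
    for client_id, input_hash in query_log:
        totals[client_id] += 1
        uniques[client_id].add(input_hash)

    suspects = []
    for client_id, total in totals.items():
        unique_ratio = len(uniques[client_id]) / total
        if total > max_queries and unique_ratio >= min_unique_ratio:
            suspects.append(client_id)
    return sorted(suspects)
```

Requiring both conditions keeps heavy but repetitive clients (for example, a cache-less dashboard re-sending the same inputs) from being flagged on volume alone. Volume and diversity are only two of several possible signals, so a flagged client is a lead for review rather than proof of extraction.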

NIST AI RMF

| ID | Subcategory | Rationale |
|----|-------------|-----------|
| MEASURE-2.7 | AI system security and resilience are evaluated and documented | Extraction and inference testing measures and documents the model's resilience to inference-API abuse. |

Official Resources

Key Research

  • Tramèr et al., "Stealing Machine Learning Models via Prediction APIs" (USENIX Security 2016)
  • Shokri et al., "Membership Inference Attacks Against Machine Learning Models" (IEEE S&P 2017)
  • Orekondy et al., "Knockoff Nets: Stealing Functionality of Black-Box Models" (CVPR 2019)