Anthropic-Cybersecurity-Skills/skills/detecting-data-and-model-poisoning/references/api-reference.md
mukul975 8cae0648ec Add 55 new skills across 3 new domains + 6 undercovered areas (762 -> 817)
2026-06-22 19:08:16 +02:00


API Reference — Data and Model Poisoning Detection

Adversarial Robustness Toolbox (ART)

Install: pip install adversarial-robustness-toolbox

| API | Description |
|---|---|
| `from art.estimators.classification import KerasClassifier` | Wrap a Keras model for ART (also `PyTorchClassifier`, `TensorFlowV2Classifier`) |
| `from art.defences.detector.poison import ActivationDefence` | Activation-clustering poisoning detector (Chen et al., 2018) |
| `ActivationDefence(classifier, x_train, y_train)` | Construct the defense |
| `defence.detect_poison(nb_clusters=2, nb_dims=10, reduce="PCA")` | Returns `(report, is_clean_lst)`; `is_clean_lst[i] == 0` => poisoned |
| `from art.defences.detector.poison import SpectralSignatureDefense` | Spectral-signature poisoning detector |
| `SpectralSignatureDefense(classifier, x, y, expected_pp_poison=0.05, batch_size=128, eps_multiplier=1.5)` | Construct |
| `defence.detect_poison()` | Returns `(report, is_clean_lst)` |
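The calls in the table above can be combined roughly as follows. This is a minimal sketch, not a definitive pipeline: the function names `poisoned_indices`, `run_activation_defence`, and `run_spectral_defence` are hypothetical helpers introduced here, and `classifier` is assumed to be an already-wrapped ART estimator (for example a `KerasClassifier`). ART is imported lazily so the pure-Python helper stays usable without it installed.

```python
from typing import Any, List, Sequence, Tuple


def poisoned_indices(is_clean_lst: Sequence[int]) -> List[int]:
    """Return positions flagged 0, i.e. samples the detector marked as poisoned."""
    return [i for i, flag in enumerate(is_clean_lst) if int(flag) == 0]


def run_activation_defence(classifier: Any, x_train: Any, y_train: Any) -> Tuple[Any, List[int]]:
    """Hypothetical wrapper around ART's activation-clustering detector."""
    # Lazy import: only needed when the detector is actually run.
    from art.defences.detector.poison import ActivationDefence

    defence = ActivationDefence(classifier, x_train, y_train)
    report, is_clean_lst = defence.detect_poison(nb_clusters=2, nb_dims=10, reduce="PCA")
    return report, poisoned_indices(is_clean_lst)


def run_spectral_defence(classifier: Any, x: Any, y: Any) -> Tuple[Any, List[int]]:
    """Hypothetical wrapper around ART's spectral-signature detector."""
    from art.defences.detector.poison import SpectralSignatureDefense

    defence = SpectralSignatureDefense(
        classifier, x, y,
        expected_pp_poison=0.05,  # assumed upper bound on the poisoned fraction
        batch_size=128,
        eps_multiplier=1.5,
    )
    report, is_clean_lst = defence.detect_poison()
    return report, poisoned_indices(is_clean_lst)
```

Samples flagged by both detectors are usually reasonable candidates for manual review first; treating the intersection as higher-confidence is a triage heuristic assumed here, not something ART prescribes.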

Cleanlab

Install: pip install cleanlab

| API | Description |
|---|---|
| `from cleanlab.filter import find_label_issues` | Find mislabeled samples |
| `find_label_issues(labels, pred_probs, return_indices_ranked_by="self_confidence")` | Ranked indices of label issues |
| `from cleanlab.outlier import OutOfDistribution` | Outlier / OOD detection |
| `from cleanlab import Datalab` | End-to-end data audit (label, outlier, near-duplicate) |
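As an illustration of how the ranked indices from `find_label_issues` might be consumed, the sketch below wraps the call and tallies flagged samples per given label. The names `ranked_label_issues` and `issues_per_class` are hypothetical helpers, `labels` is assumed to be an integer array of shape `(N,)`, and `pred_probs` is assumed to be an `(N, K)` array of out-of-sample (for example cross-validated) predicted probabilities.

```python
from collections import Counter
from typing import Any, Dict, Sequence


def ranked_label_issues(labels: Any, pred_probs: Any) -> Any:
    """Hypothetical wrapper: indices of likely label errors, ranked by self-confidence."""
    # Lazy import so the summary helper below works without cleanlab installed.
    from cleanlab.filter import find_label_issues

    return find_label_issues(
        labels,
        pred_probs,
        return_indices_ranked_by="self_confidence",
    )


def issues_per_class(labels: Sequence[int], issue_indices: Sequence[int]) -> Dict[int, int]:
    """Count flagged samples per given label."""
    return dict(Counter(int(labels[i]) for i in issue_indices))
```

A heavy concentration of flagged samples in a single class may be worth a closer look as possible targeted label flipping, though ordinary annotation noise can produce the same pattern.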

safetensors (safe serialization)

Install: pip install safetensors

| API | Description |
|---|---|
| `from safetensors.numpy import load_file` | Load weights without executing pickle |
| `from safetensors.torch import load_file` | PyTorch variant |
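One way to use `load_file` defensively is to refuse anything that is not a `.safetensors` file before loading. The guard below is a sketch under that assumption: `load_weights_safely` and `PICKLE_SUFFIXES` are hypothetical names, and the suffix list mirrors the pickle-based extensions named under Integrity commands.

```python
from pathlib import Path
from typing import Any, Dict

# Extensions treated as pickle-based in this reference (assumed list).
PICKLE_SUFFIXES = {".pt", ".bin", ".pkl"}


def load_weights_safely(path: str) -> Dict[str, Any]:
    """Load a .safetensors file as NumPy arrays; reject pickle-based formats."""
    suffix = Path(path).suffix.lower()
    if suffix in PICKLE_SUFFIXES:
        raise ValueError(f"refusing to load pickle-based artifact: {path}")
    if suffix != ".safetensors":
        raise ValueError(f"unexpected weight format: {path}")

    # Lazy import: the checks above run even if safetensors is not installed.
    from safetensors.numpy import load_file

    return load_file(path)
```

A file extension is only a hint, so this check complements digest verification and does not replace it.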

Integrity commands

| Command | Purpose |
|---|---|
| `sha256sum model.safetensors` | Compute weight digest to compare to published value |
| `find ./models -name "*.pt" -o -name "*.bin" -o -name "*.pkl"` | Locate unsafe pickle-based artifacts |
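Where shell tools are unavailable, the same two checks can be approximated with the Python standard library. This is a sketch: `sha256_file`, `digest_matches`, and `find_pickle_artifacts` are hypothetical helper names, and the expected digest is assumed to come from a trusted, published source.

```python
import hashlib
import hmac
from pathlib import Path
from typing import List, Union

UNSAFE_SUFFIXES = {".pt", ".bin", ".pkl"}  # same patterns as the find command


def sha256_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare the file digest with a published value, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_file(path), expected_hex.strip().lower())


def find_pickle_artifacts(root: Union[str, Path]) -> List[Path]:
    """Recursively list files whose suffix suggests a pickle-based format."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in UNSAFE_SUFFIXES
    )
```

For example, `digest_matches("model.safetensors", published_digest)` returns `False` on any mismatch, which should be treated as a reason not to load the file.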

External References