Anthropic-Cybersecurity-Skills/skills/assessing-vector-and-embedding-weaknesses/references/api-reference.md
mukul975 8cae0648ec Add 55 new skills across 3 new domains + 6 undercovered areas (762 -> 817)
2026-06-22 19:08:16 +02:00

API and Command Reference

sentence-transformers (embedding generation)

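| Call | Purpose |
|------|---------|
| `SentenceTransformer("all-MiniLM-L6-v2")` | Load an embedding model (384-dimensional output) |
| `model.encode(texts)` | Encode a list of strings; returns a NumPy array with one embedding per row |
| `model.encode(text, normalize_embeddings=True)` | Return L2-normalized vectors, so the dot product equals cosine similarity |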

scikit-learn similarity

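| Call | Purpose |
|------|---------|
| `cosine_similarity(a, b)` | Pairwise cosine similarity matrix between the rows of `a` and the rows of `b` (from `sklearn.metrics.pairwise`) |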
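
To make the relationship between L2 normalization and cosine similarity concrete, here is a minimal standard-library sketch of what a pairwise cosine matrix computes. The helper names `l2_normalize` and `cosine_matrix` are hypothetical and written for illustration only; they are not part of scikit-learn or sentence-transformers, and a real assessment would normally use the library calls listed above.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]

def cosine_matrix(a, b):
    """Pairwise cosine similarity: result[i][j] compares a[i] with b[j]."""
    a_unit = [l2_normalize(row) for row in a]
    b_unit = [l2_normalize(row) for row in b]
    # After normalization, the plain dot product is the cosine similarity.
    return [[sum(x * y for x, y in zip(u, v)) for v in b_unit] for u in a_unit]

# Toy 2-dimensional vectors (illustrative values, not real embeddings).
queries = [[1.0, 0.0], [1.0, 1.0]]
corpus = [[2.0, 0.0], [0.0, 3.0]]
sims = cosine_matrix(queries, corpus)
```

The result has one row per query and one column per corpus vector. Vector magnitude does not affect the score, which is why `[1.0, 0.0]` and `[2.0, 0.0]` are treated as identical in direction.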

Qdrant client (qdrant-client)

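| Call | Purpose |
|------|---------|
| `QdrantClient(url="http://localhost:6333")` | Connect |
| `client.get_collection(name)` | Inspect vector size and distance metric |
| `client.count(name)` | Corpus size (number of points) |
| `client.search(collection_name, query_vector, limit, query_filter)` | k-NN search with optional filter (deprecated in recent qdrant-client releases in favor of `client.query_points`) |
| `client.upsert(name, points=[PointStruct(id=..., vector=..., payload=...)])` | Insert/update points (`PointStruct` fields are passed as keyword arguments) |
| `Filter(must=[FieldCondition(key=..., match=MatchValue(value=...))])` | Metadata filter (tenant isolation) |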
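
The effect of a `must` metadata filter on k-NN results can be illustrated without a running vector database. The sketch below is a brute-force, in-memory stand-in, not the Qdrant API: `filtered_knn`, the `points` layout, and the `tenant` payload key are all assumptions made for illustration. It shows why omitting the filter lets a foreign tenant's nearest neighbour into the result set.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def filtered_knn(points, query_vector, limit, must=None):
    """Brute-force k-NN over in-memory points (hypothetical helper).

    `must` maps payload keys to required values. Only points matching every
    pair are scored, so the filter is applied before ranking.
    """
    must = must or {}
    candidates = [
        p for p in points
        if all(p["payload"].get(key) == value for key, value in must.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda p: cosine(p["vector"], query_vector),
        reverse=True,
    )
    return ranked[:limit]

# Toy corpus with two tenants (illustrative data only).
points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"tenant": "a"}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"tenant": "b"}},
    {"id": 3, "vector": [0.0, 1.0], "payload": {"tenant": "a"}},
]

unfiltered = filtered_knn(points, [1.0, 0.0], limit=2)
scoped = filtered_knn(points, [1.0, 0.0], limit=2, must={"tenant": "a"})
unfiltered_ids = [p["id"] for p in unfiltered]
scoped_ids = [p["id"] for p in scoped]
```

Here the unfiltered query returns tenant `b`'s point because it is geometrically close to the query, while the scoped query returns only tenant `a` rows even though one of them is a poor match. An assessment would run the same comparison against the real client, with and without the filter.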

Chroma (chromadb)

| Call | Purpose |
|------|---------|
| `chromadb.Client()` / `chromadb.PersistentClient(path)` | Connect (in-memory / on-disk) |
| `collection.query(query_embeddings=[...], n_results=k, where={...})` | k-NN with metadata filter |
| `collection.add(ids, embeddings, metadatas, documents)` | Insert |

Pinecone (pinecone; formerly published as pinecone-client)

| Call | Purpose |
|------|---------|
| `Pinecone(api_key=...)` | Connect |
| `index.query(vector=..., top_k=k, namespace="tenant", filter={...})` | k-NN; namespace = tenant boundary |
| `index.upsert(vectors=[(id, vec, meta)], namespace=...)` | Insert |

Assessment metrics

| Metric | Meaning |
|--------|---------|
| Inversion cosine | Cosine similarity between the reconstructed candidate and the target vector; high = recoverable. |
| Membership delta | top-1 score(in-corpus query) − top-1 score(control query); large positive = membership leak. |
| Poison dominance | Fraction of unrelated queries that return the poison chunk in their top_k results. |
| Cross-tenant count | Number of foreign-tenant rows returned to a tenant query (should be 0). |
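
The metrics in the table above are simple enough to compute directly from search results. The following is a minimal sketch under stated assumptions: the function names, the shape of the inputs (lists of result IDs, lists of payload dicts), and the `tenant` payload key are hypothetical choices for illustration, not something defined by any of the libraries listed here.

```python
import math

def inversion_cosine(reconstructed, target):
    """Cosine similarity between a reconstructed candidate and the target vector."""
    dot = sum(x * y for x, y in zip(reconstructed, target))
    norm = (math.sqrt(sum(x * x for x in reconstructed))
            * math.sqrt(sum(y * y for y in target)))
    return dot / norm if norm else 0.0

def membership_delta(in_corpus_top1, control_top1):
    """Top-1 score of an in-corpus query minus top-1 score of a control query."""
    return in_corpus_top1 - control_top1

def poison_dominance(results_per_query, poison_id):
    """Fraction of queries whose top_k result IDs include the poison chunk.

    `results_per_query` holds one list of result IDs per unrelated query.
    """
    if not results_per_query:
        return 0.0
    hits = sum(1 for ids in results_per_query if poison_id in ids)
    return hits / len(results_per_query)

def cross_tenant_count(rows, tenant_id, tenant_key="tenant"):
    """Number of returned payloads that do not belong to `tenant_id`."""
    return sum(1 for row in rows if row.get(tenant_key) != tenant_id)

# Illustrative inputs only; these are not measured results.
delta = membership_delta(0.97, 0.61)
dominance = poison_dominance(
    [["poison", "d1"], ["d2", "d3"], ["d4", "poison"], ["poison", "d5"]],
    poison_id="poison",
)
leaks = cross_tenant_count(
    [{"tenant": "a"}, {"tenant": "b"}, {"tenant": "a"}],
    tenant_id="a",
)
```

Rows with a missing tenant key are counted as foreign here, on the assumption that an untagged row reaching a tenant query is itself a finding. What counts as a "large" delta or dominance value depends on the embedder and corpus, so thresholds are left to the assessor.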

vec2text (research baseline)

| Call | Purpose |
|------|---------|
| `vec2text.load_pretrained_corrector("gtr-base")` | Load inversion corrector for a compatible embedder |
| `vec2text.invert_embeddings(embeddings, corrector)` | Reconstruct text from embeddings |