Anthropic-Cybersecurity-Skills/skills/extracting-iocs-from-malware-samples/references/api-reference.md
mukul975 27c6414ca5 Add folder anatomy (scripts/agent.py + references/api-reference.md) for 648 cybersecurity skills
Complete skill folder anatomy across all cybersecurity skills:
- scripts/agent.py: 80-150 line Python agents using real libraries (impacket,
  boto3, azure-mgmt-*, kubernetes, pefile, yara, scapy, shodan, stix2, etc.)
- references/api-reference.md: real API documentation with method signatures
- LICENSE: MIT license for all skill folders
2026-03-10 21:02:12 +01:00

API Reference: Malware IOC Extraction Agent

Dependencies

| Library | Version | Purpose |
|---------|---------|---------|
| pefile | >=2023.2 | PE file parsing for imphash, sections, imports |
| yara-python | >=4.3 | YARA rule scanning against malware samples |
| requests | >=2.28 | VirusTotal API v3 IOC validation |

CLI Usage

python scripts/agent.py \
  --sample /cases/malware.exe \
  --yara-rules /rules/malware.yar \
  --vt-key YOUR_VT_API_KEY \
  --output-dir /cases/analysis/ \
  --output ioc_report.json

Functions

compute_hashes(file_path) -> dict

Computes MD5, SHA-1, SHA-256 and file size for the sample.
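
As an illustration of the hashing step, here is a minimal sketch that uses only the standard library. The name `compute_hashes_sketch`, the `chunk_size` parameter, and the `size` key are assumptions made for this example; the agent's actual implementation and key names may differ.

```python
import hashlib
import os


def compute_hashes_sketch(file_path, chunk_size=65536):
    """Return MD5, SHA-1, SHA-256 hex digests and the file size in bytes."""
    digests = {
        "md5": hashlib.md5(),
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
    }
    # Read in chunks so large samples are never loaded into memory at once.
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    result = {name: digest.hexdigest() for name, digest in digests.items()}
    result["size"] = os.path.getsize(file_path)  # hypothetical key name
    return result
```

Feeding all three digests in a single pass avoids reading the sample three times.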

extract_pe_metadata(file_path) -> dict

Parses PE headers via pefile: imphash, compile timestamp, section entropy, import table.
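
Section entropy is usually a Shannon entropy over the section's raw bytes, on a scale from 0 to 8 bits per byte. The sketch below shows that calculation in plain Python so it can be run without a PE file. `shannon_entropy` is a hypothetical helper, not a function from the agent, which may instead rely on pefile's own per-section entropy method.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Values close to 8 are commonly read as a hint that a section is packed or encrypted, while plain code and text tend to score lower.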

extract_strings(file_path, min_length) -> list

Extracts ASCII and Unicode strings from the binary, keeping only those at least min_length characters long (the documented minimum is 4).
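
A minimal sketch of this kind of string extraction follows. It assumes "Unicode" means UTF-16LE, the usual wide-string encoding in Windows binaries, and it takes raw bytes rather than a file path so it is easy to try out. The name `extract_strings_sketch` is hypothetical.

```python
import re


def extract_strings_sketch(data: bytes, min_length: int = 4) -> list:
    """Return printable ASCII and UTF-16LE strings of at least min_length chars."""
    # Runs of printable ASCII bytes.
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    # Runs of printable ASCII characters each followed by a null byte (UTF-16LE).
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return found
```

To use it on a sample, read the file in binary mode and pass the bytes, for example `extract_strings_sketch(open(path, "rb").read())`.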

extract_network_iocs(strings) -> dict

Extracts IPv4 addresses, domains, URLs, and email addresses from the strings using regular expressions. Addresses in private IP ranges are filtered out.
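
The sketch below illustrates the approach for IPv4 addresses and URLs only. The IPv4 pattern is an illustrative complete pattern written for this example, not a copy of the agent's (which this reference abbreviates), and the private-range check uses the standard `ipaddress` module as one plausible way to filter. `extract_ips_and_urls` is a hypothetical name.

```python
import ipaddress
import re

# Illustrative IPv4 pattern: four octets in 0-255, no leading zeros.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_RE = re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b")
URL_RE = re.compile(r"https?://[^\s<>\"'{}]+")


def extract_ips_and_urls(strings):
    """Collect public IPv4 addresses and URLs from an iterable of strings."""
    ips, urls = set(), set()
    for text in strings:
        urls.update(URL_RE.findall(text))
        for candidate in IPV4_RE.findall(text):
            # Drop RFC 1918 and other non-public ranges.
            if not ipaddress.ip_address(candidate).is_private:
                ips.add(candidate)
    return {"ips": sorted(ips), "urls": sorted(urls)}
```

Note that `is_private` also covers loopback and some reserved ranges, which may be broader than the filter the agent applies.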

extract_host_iocs(strings) -> dict

Identifies Windows file paths, registry keys, and mutex names from strings.
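
One possible way to recognize these artifacts is with a few regular expressions, sketched below. All three patterns and the function name are assumptions for illustration. In particular, treating only `Global\` and `Local\` prefixed names as mutexes is a simplification, and the agent may use different heuristics.

```python
import re

# Hypothetical patterns; real-world rules are usually more elaborate.
FILE_PATH_RE = re.compile(r'[A-Za-z]:\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*')
REGISTRY_RE = re.compile(r"(?:HKEY_[A-Z_]+|HKLM|HKCU|HKCR|HKU)\\[^\r\n]+")
MUTEX_RE = re.compile(r"(?:Global|Local)\\[A-Za-z0-9_\-{}.]+")


def extract_host_iocs_sketch(strings):
    """Sort strings into file paths, registry keys, and mutex-like names."""
    result = {"file_paths": [], "registry_keys": [], "mutexes": []}
    patterns = (
        ("file_paths", FILE_PATH_RE),
        ("registry_keys", REGISTRY_RE),
        ("mutexes", MUTEX_RE),
    )
    for text in strings:
        for key, pattern in patterns:
            for match in pattern.finditer(text):
                if match.group() not in result[key]:  # de-duplicate, keep order
                    result[key].append(match.group())
    return result
```

Because Windows paths may contain spaces, the path pattern can pick up trailing text when a string holds more than the path itself.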

run_yara_scan(file_path, rules_path) -> list

Compiles and runs YARA rules against the sample. Returns matched rule names, tags, and string offsets.

validate_ioc_virustotal(ioc_value, ioc_type, api_key) -> dict

Queries the VirusTotal API v3 for an IP address, domain, or file hash and returns the malicious and suspicious detection counts.
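
The request and response handling can be sketched without making a network call. The example assumes the VirusTotal v3 object endpoints (`ip_addresses`, `domains`, `files`), the `x-apikey` header, and the `last_analysis_stats` attribute in the response. The `ioc_type` values `"ip"`, `"domain"`, and `"hash"` and both function names are hypothetical.

```python
VT_BASE = "https://www.virustotal.com/api/v3"
# Hypothetical ioc_type values mapped to VirusTotal v3 collections.
VT_ENDPOINTS = {"ip": "ip_addresses", "domain": "domains", "hash": "files"}


def build_vt_request(ioc_value, ioc_type, api_key):
    """Return the (url, headers) pair for a VirusTotal v3 lookup."""
    endpoint = VT_ENDPOINTS[ioc_type]  # raises KeyError on unsupported types
    return f"{VT_BASE}/{endpoint}/{ioc_value}", {"x-apikey": api_key}


def summarize_vt_response(payload):
    """Pull malicious/suspicious counts out of a decoded JSON response."""
    stats = (
        payload.get("data", {})
        .get("attributes", {})
        .get("last_analysis_stats", {})
    )
    return {
        "malicious": stats.get("malicious", 0),
        "suspicious": stats.get("suspicious", 0),
    }
```

In practice the pair would be passed to something like `requests.get(url, headers=headers, timeout=30)` and the decoded `.json()` result handed to `summarize_vt_response`.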

defang_ioc(value) -> str

Defangs IOCs by replacing http with hxxp and . with [.].
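
The two replacements described above can be sketched as follows. The function name is hypothetical.

```python
def defang_ioc_sketch(value: str) -> str:
    """Make an IOC non-clickable: http -> hxxp, then . -> [.]"""
    return value.replace("http", "hxxp").replace(".", "[.]")


print(defang_ioc_sketch("http://evil.example.com/a"))
# hxxp://evil[.]example[.]com/a
```

A plain `replace` rewrites every occurrence of `http`, including one inside a path or hostname, so a stricter version might rewrite only the URL scheme.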

export_stix_bundle(iocs, sha256) -> dict

Builds a STIX 2.1 indicator bundle with file hash, IP, and domain patterns.
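
To show what such a bundle looks like, the sketch below builds the indicator objects as plain dictionaries with the standard library instead of the `stix2` package. The function names, the indicator `name` values, and the assumed input shape (`iocs` with `ips` and `domains` lists) are illustrative; the agent may populate more fields.

```python
import uuid
from datetime import datetime, timezone


def make_indicator(pattern, name):
    """Build one STIX 2.1 indicator object with the required properties."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": now,
    }


def build_bundle_sketch(iocs, sha256):
    """Wrap file-hash, IP, and domain indicators in a bundle dictionary."""
    objects = [make_indicator(f"[file:hashes.'SHA-256' = '{sha256}']", "Sample SHA-256")]
    for ip in iocs.get("ips", []):
        objects.append(make_indicator(f"[ipv4-addr:value = '{ip}']", f"IP {ip}"))
    for domain in iocs.get("domains", []):
        objects.append(make_indicator(f"[domain-name:value = '{domain}']", f"Domain {domain}"))
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}
```

The result can be serialized with `json.dumps(bundle, indent=2)` for sharing with a threat intelligence platform.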

export_csv(iocs, hashes, output_path)

Writes IOCs to CSV format (type, value, context, confidence) for SIEM ingestion.
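
A minimal sketch of the CSV layout follows, using the four columns named above. It assumes the IOCs have already been flattened into one dictionary per row; `export_csv_sketch` and that row shape are hypothetical, and the agent's `export_csv` takes different arguments.

```python
import csv


def export_csv_sketch(rows, output_path):
    """Write rows with type, value, context, and confidence columns."""
    fieldnames = ["type", "value", "context", "confidence"]
    # newline="" lets the csv module control line endings on every platform.
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

A header row followed by one IOC per line is a shape most SIEM import tools can map to fields directly.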

run_extraction(sample_path, output_dir, yara_rules, vt_key) -> dict

Orchestrates the full extraction pipeline and generates all output files.

Regex Patterns

| Pattern | Target |
|---------|--------|
| `\b(?:(?:25[0-5]\|...)\.){3}...\b` | IPv4 addresses |
| `\b[a-zA-Z0-9]...\.[a-zA-Z]{2,}+\b` | Domain names |
| `https?://[^\s<>"'{}]+` | URLs |
| `[a-zA-Z0-9_.+-]+@...` | Email addresses |

Output Schema

{
  "hashes": {"md5": "...", "sha256": "...", "sha1": "..."},
  "pe_metadata": {"imphash": "...", "compile_time": "...", "sections": []},
  "network_iocs": {"ips": [], "domains": [], "urls": []},
  "host_iocs": {"file_paths": [], "registry_keys": [], "mutexes": []},
  "yara_matches": [{"rule": "APT28_dropper", "tags": ["apt"]}],
  "summary": {"ips": 3, "domains": 5, "yara_hits": 1}
}