Complete skill folder anatomy across all cybersecurity skills: - scripts/agent.py: 80-150 line Python agents using real libraries (impacket, boto3, azure-mgmt-*, kubernetes, pefile, yara, scapy, shodan, stix2, etc.) - references/api-reference.md: real API documentation with method signatures - LICENSE: MIT license for all skill folders
2.6 KiB
API Reference: Malware IOC Extraction Agent
Dependencies
| Library | Version | Purpose |
|---|---|---|
| pefile | >=2023.2 | PE file parsing for imphash, sections, imports |
| yara-python | >=4.3 | YARA rule scanning against malware samples |
| requests | >=2.28 | VirusTotal API v3 IOC validation |
CLI Usage
python scripts/agent.py \
--sample /cases/malware.exe \
--yara-rules /rules/malware.yar \
--vt-key YOUR_VT_API_KEY \
--output-dir /cases/analysis/ \
--output ioc_report.json
Functions
compute_hashes(file_path) -> dict
Computes MD5, SHA-1, SHA-256 and file size for the sample.
extract_pe_metadata(file_path) -> dict
Parses PE headers via pefile: imphash, compile timestamp, section entropy, import table.
extract_strings(file_path, min_length) -> list
Extracts ASCII and Unicode strings (min 4 chars) from the binary.
extract_network_iocs(strings) -> dict
Regex extraction of IPs, domains, URLs, emails from strings. Filters private IP ranges.
extract_host_iocs(strings) -> dict
Identifies Windows file paths, registry keys, and mutex names from strings.
run_yara_scan(file_path, rules_path) -> list
Compiles and runs YARA rules against the sample. Returns matched rule names, tags, and string offsets.
validate_ioc_virustotal(ioc_value, ioc_type, api_key) -> dict
Queries VirusTotal API v3 for IP, domain, or file hash. Returns malicious/suspicious counts.
defang_ioc(value) -> str
Defangs IOCs by replacing http with hxxp and . with [.].
export_stix_bundle(iocs, sha256) -> dict
Builds a STIX 2.1 indicator bundle with file hash, IP, and domain patterns.
export_csv(iocs, hashes, output_path)
Writes IOCs to CSV format (type, value, context, confidence) for SIEM ingestion.
run_extraction(sample_path, output_dir, yara_rules, vt_key) -> dict
Orchestrates the full extraction pipeline and generates all output files.
Regex Patterns
| Pattern | Target |
|---|---|
\b(?:(?:25[0-5]|...)\.){3}...\b |
IPv4 addresses |
\b[a-zA-Z0-9]...\.[a-zA-Z]{2,}+\b |
Domain names |
https?://[^\s<>"'{}]+ |
URLs |
[a-zA-Z0-9_.+-]+@... |
Email addresses |
Output Schema
{
"hashes": {"md5": "...", "sha256": "...", "sha1": "..."},
"pe_metadata": {"imphash": "...", "compile_time": "...", "sections": []},
"network_iocs": {"ips": [], "domains": [], "urls": []},
"host_iocs": {"file_paths": [], "registry_keys": [], "mutexes": []},
"yara_matches": [{"rule": "APT28_dropper", "tags": ["apt"]}],
"summary": {"ips": 3, "domains": 5, "yara_hits": 1}
}