
# API Reference: Malware IOC Extraction Agent

## Dependencies

| Library | Version | Purpose |
|---------|---------|---------|
| pefile | >=2023.2 | PE file parsing for imphash, sections, imports |
| yara-python | >=4.3 | YARA rule scanning against malware samples |
| requests | >=2.28 | HTTP client for VirusTotal API v3 IOC validation |

## CLI Usage

```bash
python scripts/agent.py \
  --sample /cases/malware.exe \
  --yara-rules /rules/malware.yar \
  --vt-key YOUR_VT_API_KEY \
  --output-dir /cases/analysis/ \
  --output ioc_report.json
```
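
The flags in the command above could be declared with `argparse` roughly as follows. This is a sketch only: which flags are required and what their defaults are is assumed here, not taken from `scripts/agent.py`, and `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the example command."""
    parser = argparse.ArgumentParser(description="Malware IOC extraction agent")
    parser.add_argument("--sample", required=True, help="Path to the malware sample")
    # Assumption: YARA scanning and VirusTotal validation are optional steps.
    parser.add_argument("--yara-rules", help="Path to a YARA rule file")
    parser.add_argument("--vt-key", help="VirusTotal API key")
    # Assumption: these defaults are illustrative only.
    parser.add_argument("--output-dir", default=".", help="Directory for generated files")
    parser.add_argument("--output", default="ioc_report.json", help="JSON report file name")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--sample", "/cases/malware.exe"])
print(args.sample, args.yara_rules, args.output)
```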

## Functions

### `compute_hashes(file_path) -> dict`
Computes the MD5, SHA-1, and SHA-256 digests and the file size of the sample.
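
A minimal sketch of how such a function might look with the standard `hashlib` module. The chunked read, the `chunk_size` parameter, and the `file_size` key name are assumptions for illustration, not the agent's confirmed implementation.

```python
import hashlib
import os


def compute_hashes(file_path: str, chunk_size: int = 65536) -> dict:
    """Hash the file in chunks so large samples are not loaded into memory at once."""
    md5, sha1, sha256 = hashlib.md5(), hashlib.sha1(), hashlib.sha256()
    with open(file_path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            sha256.update(chunk)
    return {
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
        "sha256": sha256.hexdigest(),
        "file_size": os.path.getsize(file_path),  # assumed key name
    }
```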

### `extract_pe_metadata(file_path) -> dict`
Parses the PE headers with pefile and returns the imphash, compile timestamp, per-section entropy, and import table.

### `extract_strings(file_path, min_length) -> list`
Extracts ASCII and Unicode strings of at least `min_length` characters from the binary (the documented minimum is 4).
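
One common way to do this is a pair of byte-level regular expressions, sketched below. It assumes that "Unicode" means UTF-16LE text limited to printable ASCII code points, as is typical for Windows binaries, and that `min_length` defaults to 4; neither detail is confirmed by the agent's source.

```python
import re


def extract_strings(file_path: str, min_length: int = 4) -> list:
    """Return printable ASCII runs followed by UTF-16LE runs found in the file."""
    with open(file_path, "rb") as handle:
        data = handle.read()
    # Runs of printable ASCII bytes (0x20 to 0x7e).
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    # Printable characters interleaved with NUL bytes (assumed UTF-16LE encoding).
    wide_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)
    strings = [match.decode("ascii") for match in ascii_pattern.findall(data)]
    strings += [match.decode("utf-16-le") for match in wide_pattern.findall(data)]
    return strings
```

Reading the whole file into memory keeps the sketch short; a very large sample may call for `mmap` or chunked scanning instead.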

### `extract_network_iocs(strings) -> dict`
Extracts IPs, domains, URLs, and email addresses from the strings using regular expressions. Private IP ranges are filtered out.
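
The sketch below illustrates the idea with the standard `re` and `ipaddress` modules. Only the URL pattern is copied from the Regex Patterns table; the IPv4, domain, and email patterns are plausible stand-ins for the abbreviated table entries, and the returned key names are assumptions.

```python
import ipaddress
import re

# Assumed full patterns; the reference table abbreviates them.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_RE = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}\b"
)
URL_RE = re.compile(r"https?://[^\s<>\"'{}]+")  # as listed in the table
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+")


def extract_network_iocs(strings: list) -> dict:
    """Collect unique network indicators, dropping non-public IPv4 addresses."""
    text = "\n".join(strings)
    # is_private also covers loopback, link-local, and other reserved ranges.
    public_ips = {
        ip for ip in IPV4_RE.findall(text)
        if not ipaddress.ip_address(ip).is_private
    }
    return {
        "ips": sorted(public_ips),
        # Note: this pattern also matches file names such as "kernel32.dll",
        # so real use needs an extra filter (for example a TLD allowlist).
        "domains": sorted(set(DOMAIN_RE.findall(text))),
        "urls": sorted(set(URL_RE.findall(text))),
        "emails": sorted(set(EMAIL_RE.findall(text))),
    }
```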

### `extract_host_iocs(strings) -> dict`
Identifies Windows file paths, registry keys, and mutex names from strings.

### `run_yara_scan(file_path, rules_path) -> list`
Compiles and runs YARA rules against the sample. Returns matched rule names, tags, and string offsets.

### `validate_ioc_virustotal(ioc_value, ioc_type, api_key) -> dict`
Queries the VirusTotal API v3 for an IP address, domain, or file hash and returns the malicious and suspicious detection counts.
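
The request and response handling can be sketched without touching the network. The endpoint paths, the `x-apikey` header, and the `last_analysis_stats` field follow the public VirusTotal API v3; the `ioc_type` labels (`ip`, `domain`, `hash`) and both helper names are hypothetical.

```python
VT_BASE_URL = "https://www.virustotal.com/api/v3"
# Assumed ioc_type labels mapped to the VirusTotal v3 collection names.
VT_ENDPOINTS = {"ip": "ip_addresses", "domain": "domains", "hash": "files"}


def build_vt_request(ioc_value: str, ioc_type: str, api_key: str) -> tuple:
    """Return the (url, headers) pair for a VirusTotal v3 lookup."""
    url = f"{VT_BASE_URL}/{VT_ENDPOINTS[ioc_type]}/{ioc_value}"
    return url, {"x-apikey": api_key}


def summarize_vt_response(payload: dict) -> dict:
    """Pull the malicious and suspicious counts out of a decoded JSON response."""
    stats = payload.get("data", {}).get("attributes", {}).get("last_analysis_stats", {})
    return {
        "malicious": stats.get("malicious", 0),
        "suspicious": stats.get("suspicious", 0),
    }


url, headers = build_vt_request("8.8.8.8", "ip", "YOUR_VT_API_KEY")
print(url)
```

With `requests`, the lookup itself would then be along the lines of `requests.get(url, headers=headers, timeout=30).json()`, passed to `summarize_vt_response`.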

### `defang_ioc(value) -> str`
Defangs IOCs by replacing `http` with `hxxp` and `.` with `[.]`.
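
Taken literally, the two replacements described above amount to the following sketch. The real function may handle more cases, such as `@` in email addresses, which this version does not attempt.

```python
def defang_ioc(value: str) -> str:
    """Make an indicator non-clickable by rewriting the scheme and the dots."""
    return value.replace("http", "hxxp").replace(".", "[.]")


print(defang_ioc("http://evil.example.com/gate"))  # hxxp://evil[.]example[.]com/gate
```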

### `export_stix_bundle(iocs, sha256) -> dict`
Builds a STIX 2.1 indicator bundle with file hash, IP, and domain patterns.
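
A minimal sketch of such a bundle using only the standard library. It assumes `iocs` carries `ips` and `domains` lists (as in the output schema's `network_iocs`); the indicator names and the `_make_indicator` helper are illustrative. The object properties and pattern syntax follow STIX 2.1.

```python
import uuid
from datetime import datetime, timezone


def _make_indicator(name: str, pattern: str, timestamp: str) -> dict:
    """Build one STIX 2.1 indicator with its required properties."""
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": timestamp,
        "modified": timestamp,
        "name": name,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": timestamp,
    }


def export_stix_bundle(iocs: dict, sha256: str) -> dict:
    """Wrap the sample hash, IPs, and domains as indicators in one bundle."""
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    now = now.replace("+00:00", "Z")  # STIX timestamps use the trailing Z form
    objects = [
        _make_indicator("Sample SHA-256", f"[file:hashes.'SHA-256' = '{sha256}']", now)
    ]
    for ip in iocs.get("ips", []):
        objects.append(_make_indicator(f"IP {ip}", f"[ipv4-addr:value = '{ip}']", now))
    for domain in iocs.get("domains", []):
        objects.append(
            _make_indicator(f"Domain {domain}", f"[domain-name:value = '{domain}']", now)
        )
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}
```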

### `export_csv(iocs, hashes, output_path)`
Writes IOCs to a CSV file with the columns type, value, context, and confidence for SIEM ingestion.
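
The four-column layout might be produced with the standard `csv` module as sketched here. The shape of `iocs` (category name mapped to a list of values) and the `context` and `confidence` strings are assumptions chosen for illustration.

```python
import csv


def export_csv(iocs: dict, hashes: dict, output_path: str) -> None:
    """Write one row per indicator using the documented column order."""
    rows = []
    for algo in ("md5", "sha1", "sha256"):
        if algo in hashes:
            # Assumed context and confidence labels for sample hashes.
            rows.append((algo, hashes[algo], "sample hash", "high"))
    for category, values in iocs.items():
        for value in values:
            # Assumed labels for indicators recovered from strings.
            rows.append((category, value, "extracted from strings", "medium"))
    # newline="" lets the csv module control line endings on every platform.
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(("type", "value", "context", "confidence"))
        writer.writerows(rows)
```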

### `run_extraction(sample_path, output_dir, yara_rules, vt_key) -> dict`
Orchestrates the full extraction pipeline and generates all output files.

## Regex Patterns

| Pattern | Target |
|---------|--------|
| `\b(?:(?:25[0-5]\|...)\.){3}...\b` | IPv4 addresses |
| `\b[a-zA-Z0-9]...\.[a-zA-Z]{2,}+\b` | Domain names |
| `https?://[^\s<>"'{}]+` | URLs |
| `[a-zA-Z0-9_.+-]+@...` | Email addresses |

## Output Schema

```json
{
  "hashes": {"md5": "...", "sha256": "...", "sha1": "..."},
  "pe_metadata": {"imphash": "...", "compile_time": "...", "sections": []},
  "network_iocs": {"ips": [], "domains": [], "urls": []},
  "host_iocs": {"file_paths": [], "registry_keys": [], "mutexes": []},
  "yara_matches": [{"rule": "APT28_dropper", "tags": ["apt"]}],
  "summary": {"ips": 3, "domains": 5, "yara_hits": 1}
}
```