Production hardening: security fixes, code quality, 724 skills complete

- Fix 25 shell=True subprocess calls with list-based commands
- Fix 49 verify=False in defensive skills (env-var override)
- Add timeout to 231 HTTP/subprocess/socket calls
- Fix 6 SQL injection patterns with whitelist validation
- Replace 8 __import__() with standard imports
- Remove 701 unused imports across 442 files
- Add authorized-testing disclaimers to all offensive skills
- Complete 11 incomplete skill directories
- Expand 10 stub SKILL.md files with full content
- Fix 2 YAML parse errors in frontmatter
- Fix 5 pre-existing syntax errors
- Convert 22 hardcoded paths/ports to environment variables
- Back up 21 redundant skill pairs to .bak
- Fix 2 global declaration errors
- 724/724 skills with full folder anatomy (SKILL.md + agent.py + api-reference.md + LICENSE)
- 0 compile errors across all 724 agent.py files
mukul975
2026-03-19 13:26:49 +01:00
parent 63b442d347
commit c47eed6a64
900 changed files with 23085 additions and 2720 deletions
@@ -0,0 +1,398 @@
# Cybersecurity Skills Repository -- Security & Quality Audit Report
**Audit Date:** 2026-03-17
**Repository:** Anthropic-Cybersecurity-Skills
**Auditors:** 15-agent automated audit team (silly-herding-tide)
**Scope:** All 742 skill directories, 734 SKILL.md files, 733 agent.py files
---
## Executive Summary
A comprehensive 14-task automated audit of 742 cybersecurity skill directories (734 with SKILL.md, 733 with agent.py) found **zero critical security vulnerabilities** (no eval/exec on live data, no prompt injection, no YAML injection, no real hardcoded secrets) but identified **25 HIGH-severity shell injection patterns** using `subprocess.run(shell=True)` with f-string interpolation, **178 instances of disabled SSL verification**, and **33 HTTP requests missing timeouts**. The repository content is verified as high-quality (87% of sampled skills confirmed real against official documentation, 0% fake), but has systemic quality issues: all 734 SKILL.md files contain extra frontmatter fields beyond the standard spec, 697/734 use an alternate body template lacking `## Instructions`/`## Examples` sections, and 9 offensive tools lack disclaimers in both their SKILL.md and agent.py files. The repo is **educational-grade, not production-safe** -- it is well-researched reference material with real code, but should not be deployed as-is in any environment accepting untrusted input.
---
## Security Findings
### CRITICAL
**eval/exec/pickle/marshal on live data: 0 findings**
- Scanned all 733 agent.py files for `eval(`, `exec(`, `pickle.loads(`, `marshal.loads(` used on live data
- 16 `eval(` matches were all string literals (SPL query syntax, regex patterns, CSP header text)
- 9 `exec(` matches were all function/variable names (e.g., `detect_psexec`) or regex patterns
- Zero instances of `pickle.loads()` or `marshal.loads()`
- **Verdict: CLEAN**
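The distinction drawn above, between text that merely contains `eval(` and a genuine call, can be checked mechanically with the standard `ast` module. The following is a minimal sketch, not the auditors' actual scanner; the function name and sample source are hypothetical, and it covers only bare `eval`/`exec` calls (attribute calls such as `pickle.loads` would need an extra check).

```python
import ast

DANGEROUS_CALLS = {"eval", "exec"}


def find_dynamic_exec_calls(source):
    """Return sorted (line, name) pairs for real eval()/exec() calls in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only Call nodes whose callee is a bare name count; string literals
        # and identifiers that merely contain "exec" never produce such nodes.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DANGEROUS_CALLS:
                hits.append((node.lineno, node.func.id))
    return sorted(hits)


# Hypothetical sample mirroring the false-positive classes listed above.
sample = '''
query = "| eval risk=if(count>5, 1, 0)"
def detect_psexec(event):
    return "psexec" in event
value = eval(user_input)
'''

print(find_dynamic_exec_calls(sample))
```

Only the last line of the sample is reported; the SPL string and the `detect_psexec` function name are ignored because neither parses as a call to `eval` or `exec`.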
**Prompt injection in SKILL.md: 0 exploitable findings**
- Scanned all 734 SKILL.md files for "ignore previous", "you are now", "ADMIN:", `<system>`, `<prompt>`, `[INST]`, "as a helpful AI", hidden HTML comments, zero-width characters, base64 payloads
- 21 files matched patterns, but all are educational content explaining prompt injection as a security topic (e.g., skills about detecting/preventing prompt injection)
- **Verdict: CLEAN** (educational context, not weaponized)
**YAML injection in frontmatter: 0 findings**
- Scanned all 734 SKILL.md frontmatter blocks for injection patterns
- All matches were in body content (educational examples), not in frontmatter
- **Verdict: CLEAN**
**Real hardcoded secrets (API keys, tokens): 0 findings**
- Scanned for AKIA*, sk-*, ghp_*, real tokens, embedded base64 blobs
- Found default/example credentials only (see MEDIUM section)
- **Verdict: CLEAN**
### HIGH
**1. Shell injection via subprocess.run(shell=True) -- 25 instances**
25 agent.py files use `subprocess.run(cmd, shell=True, ...)`, and at least 4 use f-string interpolation of file paths directly into shell commands (e.g., `f"strings -n {min_length} {filepath}"`). If any of these scripts ever received untrusted input, shell injection would be trivial.
Top-risk files (f-string + shell=True):
- `analyzing-linux-elf-malware/scripts/agent.py` (lines 88, 129, 138, 151) -- **HIGHEST RISK: compound vulnerability.** Uses raw `sys.argv[1]` (not even argparse), flows unsanitized into both `open(filepath, "rb")` (path traversal at lines 25, 46, 67, 122) AND 4 `shell=True` f-string subprocess calls (shell injection). A malicious filename could both traverse the filesystem and execute arbitrary commands.
- `analyzing-network-traffic-for-incidents/scripts/agent.py` (lines 22, 35, 61, 124, 138)
- `performing-threat-emulation-with-atomic-red-team/scripts/agent.py` (lines 99, 128)
- `performing-privilege-escalation-assessment/scripts/agent.py` (line 30)
**Mitigating factor:** All scripts are CLI tools invoked locally via argparse (or sys.argv), not web-exposed. The user already has shell access.
**Risk: HIGH in reuse/integration contexts, LOW for current local-CLI usage.**
**2. Dynamic imports via __import__() -- 8 instances**
8 agent.py files use `__import__()` for inline imports of standard library modules (datetime, time, collections, os). Not malicious, but obscures dependencies and is an anti-pattern.
Files: `analyzing-threat-intelligence-feeds`, `bypassing-authentication-with-forced-browsing`, `conducting-api-security-testing`, `conducting-man-in-the-middle-attack-simulation`, `exploiting-ipv6-vulnerabilities`, `implementing-zero-trust-with-hashicorp-boundary`, `performing-hash-cracking-with-hashcat`, `performing-security-headers-audit`
**Risk: MEDIUM (poor practice, not exploitable)**
**3. Missing authorized-testing disclaimers -- 9 CRITICAL skills**
9 offensive security skills have NO disclaimer in EITHER their SKILL.md or agent.py:
1. `exploiting-excessive-data-exposure-in-api`
2. `performing-graphql-depth-limit-attack`
3. `performing-graphql-introspection-attack`
4. `performing-http-parameter-pollution-attack`
5. `performing-jwt-none-algorithm-attack`
6. `performing-supply-chain-attack-simulation`
7. `performing-web-cache-deception-attack`
8. `conducting-internal-network-penetration-test`
9. `conducting-mobile-application-penetration-test`
An additional 7 skills are missing disclaimers in agent.py only, and 20 are missing disclaimers in SKILL.md only. Total: 36 of 58 offensive skills have at least one missing disclaimer.
**Risk: HIGH (legal/liability concern for offensive tooling)**
### MEDIUM
**1. Disabled SSL verification (verify=False) -- 178 instances**
- 178 occurrences across agent.py files explicitly disable SSL certificate verification
- Common in tools connecting to local/lab instances (Splunk, SIEM, Nessus), but unsafe if pointed at production endpoints
- **Risk: MEDIUM**
**2. HTTP requests without timeout -- 33 instances**
- 33 HTTP request calls across agent.py files lack a `timeout` parameter
- Can cause indefinite hangs if target is unresponsive
- **Risk: MEDIUM**
**3. HTTP URLs instead of HTTPS -- 76 agent.py files**
- 76 scripts reference `http://` URLs
- Some are intentional (testing HTTP-specific vulnerabilities), others are careless defaults
- **Risk: LOW-MEDIUM**
**4. Default/example credentials in code -- ~9 instances**
- `neo4j`/`bloodhound` (BloodHound tool default)
- `admin`/`admin` (GVM default)
- `kismet`/`kismet` (Kismet default)
- `Harbor12345` (Harbor default)
- `SecureP@ss123` (demo password)
- All are well-known tool defaults or demo values, not real secrets
- **Risk: LOW (tool defaults, not real credentials)**
**5. Path traversal -- systemic but low-exploitability**
- ~342 agent.py files use `open()` with `args.*` parameters without path sanitization
- ~43 scripts create directories from unsanitized user input (`os.makedirs(args.output_dir)`)
- 1 script uses `shutil.rmtree()` on a derived path (`implementing-immutable-backup-with-restic`)
- Zero scripts validate that resolved paths stay within an expected base directory
- **Risk: LOW for CLI tools (user already has filesystem access), HIGH if ever web-exposed**
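The missing check described in the last two bullets (confirming that a resolved path stays inside an expected base directory) can be sketched with `pathlib`. This is an illustrative helper under assumed names (`resolve_within` is hypothetical and does not appear in the repository), not a fix that the audit prescribes.

```python
import tempfile
from pathlib import Path


def resolve_within(base_dir, user_path):
    """Resolve user_path under base_dir and reject anything that escapes it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so absolute paths
    # outside base are rejected by the containment check below as well.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"{user_path!r} escapes {base}")
    return candidate


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as base:
    inside = resolve_within(base, "reports/out.json")
    try:
        resolve_within(base, "../outside.txt")
        escaped = True
    except ValueError:
        escaped = False

print(inside.name, escaped)
```

A script would call such a helper on `args.*` values before passing them to `open()` or `os.makedirs()`. For purely local CLI use the check is optional, which matches the LOW rating above.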
### LOW
**1. SQL injection patterns -- 6 MEDIUM findings**
- 6 agent.py files use SQL patterns that could be vulnerable (string formatting in queries)
- Limited scope -- most are local SQLite usage in forensics/logging contexts
- **Risk: MEDIUM (localized)**
**2. Minor format issues** (see Quality Findings below)
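For the SQL injection patterns listed in this section, the usual remedy combines two measures: identifiers (table and column names) are checked against a fixed whitelist, and values are bound as parameters rather than formatted into the query. The sketch below assumes a local SQLite database; the table names, column names, and `fetch_rows` helper are hypothetical.

```python
import sqlite3

# Hypothetical whitelists; a real script would list its own schema here.
ALLOWED_TABLES = {"events", "alerts"}
ALLOWED_COLUMNS = {"id", "source", "severity"}


def fetch_rows(conn, table, column, value):
    """Query one whitelisted table/column, binding the value as a parameter."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {table!r}")
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column not allowed: {column!r}")
    # Identifiers cannot be bound with "?", so they are validated above;
    # the value is never interpolated into the SQL text.
    query = f'SELECT id, source, severity FROM "{table}" WHERE "{column}" = ?'
    return conn.execute(query, (value,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, source TEXT, severity TEXT)")
conn.executemany(
    "INSERT INTO events (source, severity) VALUES (?, ?)",
    [("fw", "high"), ("ids", "low")],
)

rows = fetch_rows(conn, "events", "severity", "high")
print(rows)
```

A value such as `high' OR '1'='1` is treated as a literal string and matches nothing, while a table name outside the whitelist raises `ValueError` before any SQL runs.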
---
## Quality Findings
### SKILL.md Frontmatter Compliance (Task #4 -- auditor-4)
**732 of 734 SKILL.md files (99.7%) contain 6 extra frontmatter fields** beyond the minimal `name` + `description` spec:
- Extra fields present in nearly all files: `domain`, `subdomain`, `tags`, `version`, `author`, `license`
- **2 files have YAML parse errors** (unescaped colons in values)
- **ALL `name` values pass validation:** lowercase-with-hyphens, max 64 chars, no "claude" or "anthropic"
- **ALL `description` values pass validation:** under 1024 characters
- **Compliance with minimal two-field spec: 0%** (all have extra fields)
- **Compliance with extended format: 732/734 (99.7%)** (2 YAML errors)
**Verdict:** The frontmatter is internally consistent but uses a richer schema than the minimal two-field standard. This is a format standardization finding (the cybersecurity repo uses a different template than the ai-agents repo), not a security vulnerability. The 2 YAML parse errors should be fixed.
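The `name` and `description` rules quoted above are simple enough to express as a small validator. The sketch below is an approximation rather than the auditors' tooling: it parses only flat `key: value` lines instead of full YAML, it assumes digits are permitted in names, and the sample document is invented for illustration.

```python
import re

# Assumption: "lowercase-with-hyphens" permits digits; adjust if it does not.
NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
RESERVED_WORDS = ("claude", "anthropic")


def parse_frontmatter(text):
    """Parse flat key: value pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        # Skip nested or list lines; a real implementation needs a YAML parser.
        if sep and key and not key[0].isspace() and not key.startswith("-"):
            fields[key.strip()] = value.strip()
    return fields


def validate(fields):
    """Return a list of rule violations for the name and description fields."""
    problems = []
    name = fields.get("name", "")
    description = fields.get("description", "")
    if not NAME_RE.match(name):
        problems.append("name must be lowercase-with-hyphens")
    if len(name) > 64:
        problems.append("name exceeds 64 characters")
    if any(word in name.lower() for word in RESERVED_WORDS):
        problems.append("name contains a reserved word")
    if not description:
        problems.append("description is missing")
    elif len(description) >= 1024:
        problems.append("description must be under 1024 characters")
    return problems


doc = """---
name: analyzing-cloud-storage-access-patterns
description: Detect unusual access to cloud storage buckets.
domain: cybersecurity
---
# Body
"""

print(validate(parse_frontmatter(doc)))
```

Extra fields such as `domain` pass through untouched, which mirrors the finding that the extended schema is consistent even though it goes beyond the two-field spec.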
### SKILL.md Body Structure
Two distinct templates are in use across the repository:
**Primary template (697/734 = 95%):** Uses sections like `## When to Use`, `## Key Concepts`, `## Prerequisites`, `## Workflow`, `## Tools & Systems`, `## Output Format`, `## Common Scenarios`. Does NOT include `## Instructions` or `## Examples`.
**Standard template (37/734 = 5%):** Uses `## Instructions` and `## Examples` sections per the original spec.
Section presence across all 734 files:
- `## Prerequisites`: 627 (85%)
- `## Key Concepts`: 438 (60%)
- `## Workflow`: 369 (50%)
- `## When to Use`: 369 (50%)
- `## Tools & Systems`: 350 (48%)
- `## Output Format`: 326 (44%)
- `## Overview`: 318 (43%)
- `## Common Scenarios`: 300 (41%)
- `## Instructions`: 37 (5%)
- `## Examples`: 37 (5%)
**Quality issues:**
- Stub/minimal SKILL.md files (under 20 lines): **10 files**
- Placeholder text (`TODO`, `FIXME`, `lorem ipsum`, `placeholder`): **0 files** (per auditor-5 deep scan)
- Average SKILL.md length: **218 lines** (substantial content)
### agent.py Quality
- Total agent.py files: **733**
- Average length: **178 lines** (non-trivial implementations)
- Files under 10 lines: **0** (none suspiciously short)
- Total lines of Python code: **130,466**
- Boilerplate/generic agent.py detected: **~4 out of 30 sampled** (13%) -- these use a generic HTTP-request template instead of tool-specific implementation
### Missing Files
- Directories missing SKILL.md: **8** (all ransomware/recovery-related batch additions)
  - `analyzing-ransomware-payment-wallets`
  - `building-ransomware-playbook-with-cisa-framework`
  - `deploying-decoy-files-for-ransomware-detection`
  - `detecting-ransomware-encryption-behavior`
  - `detecting-suspicious-powershell-execution`
  - `implementing-anti-ransomware-group-policy`
  - `implementing-ransomware-kill-switch-detection`
  - `testing-ransomware-recovery-procedures`
  - `validating-backup-integrity-for-recovery` (also missing SKILL.md)
- Directories missing agent.py: **9** (same set as above)
---
## Dependency Audit
### Top 30 Imports (by frequency across 733 agent.py files)
| Package | Count | Type | Status |
|---------|-------|------|--------|
| json | 689 | stdlib | Safe |
| argparse | 514 | stdlib | Safe |
| sys | 421 | stdlib | Safe |
| subprocess | 222 | stdlib | Safe (see shell=True findings) |
| os | 219 | stdlib | Safe |
| re | 197 | stdlib | Safe |
| logging | 133 | stdlib | Safe |
| hashlib | 95 | stdlib | Safe |
| requests | 82 | PyPI | Safe, well-known |
| csv | 46 | stdlib | Safe |
| time | 40 | stdlib | Safe |
| datetime | 32 | stdlib | Safe |
| math | 31 | stdlib | Safe |
| struct | 30 | stdlib | Safe |
| urllib/urllib3 | 28 | stdlib/PyPI | Safe |
| socket | 27 | stdlib | Safe |
| base64 | 22 | stdlib | Safe |
| xml | 19 | stdlib | Safe |
| boto3 | 15 | PyPI | Safe, AWS SDK |
| ssl | 12 | stdlib | Safe |
| email | 12 | stdlib | Safe |
| hmac | 9 | stdlib | Safe |
| splunklib | 8 | PyPI | Safe, Splunk SDK |
| uuid | 7 | stdlib | Safe |
| collections | 7 | stdlib | Safe |
| sqlite3 | 6 | stdlib | Safe |
| pandas | 6 | PyPI | Safe |
**Typosquatted packages found: 0**
**Known-malicious packages found: 0**
**Suspicious single-use packages found: 0**
**Packages not on PyPI found: 0**
All imports are well-known standard library modules or established PyPI packages (requests, boto3, splunklib, pandas, pefile, yara-python, python-nmap, sslyze, ldap3, etc.). No evidence of supply chain compromise.
---
## Content Verification
### Methodology
30 randomly selected skills across 10 categories (forensics, cloud, network, malware, web, endpoint, SIEM, appsec, identity, threat intel) were verified by reading both SKILL.md and agent.py, then cross-referencing tool commands, API methods, CLI flags, and MITRE ATT&CK IDs against official documentation via web search.
### Results
| Category | Count | Verdict |
|----------|-------|---------|
| VERIFIED (all code references real tools/APIs) | 26/30 | 87% |
| PARTIALLY_REAL (SKILL.md real, agent.py generic boilerplate) | 4/30 | 13% |
| FAKE (invented commands/APIs) | 0/30 | 0% |
**Key verification highlights:**
- All Volatility 3 plugin names confirmed real (windows.pslist, windows.psscan, windows.malfind)
- All Splunk SDK classes confirmed real (splunklib.client.connect, JSONResultsReader)
- All AWS CLI/boto3 commands verified (GuardDuty, CloudTrail, S3)
- All nmap flags verified against nmap.org documentation
- All sslyze classes confirmed against official docs
- All MITRE ATT&CK technique IDs verified (T1055.012, T1140, T1218.005, etc.)
- All Kubernetes commands verified against kubernetes.io
- All LDAP OIDs verified (1.2.840.113556.1.4.1941 for recursive group membership)
- LOLBin signatures verified against LOLBAS project
- Certipy/Certify commands verified for AD CS ESC1 exploitation
**PARTIALLY_REAL pattern:** 4 skills use a generic HTTP-request template in agent.py (`GET {target}/api/v1/status` with bearer token) instead of implementing the actual tool described in SKILL.md. Examples: `implementing-semgrep-for-custom-sast-rules`, `performing-dark-web-monitoring-for-threats`. This suggests template-based generation was used for a subset of agent.py files.
---
## Duplicate Analysis
### Methodology
Jaccard similarity analysis across all 742 skill directory names, comparing SKILL.md content.
### Results
- **Exact duplicates: 0**
- **Near-duplicate pairs (Jaccard >= 0.60): 67**
- Classified as REDUNDANT: **21 pairs**
- Classified as UNIQUE_TECHNIQUES (overlapping topic but different approach): **46 pairs**
The 21 redundant pairs likely result from skills being created under slightly different names covering the same tool or technique. These should be reviewed for consolidation.
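As a rough illustration of the method, Jaccard similarity is the size of the intersection of two token sets divided by the size of their union, and a pair is flagged when the score reaches the 0.60 threshold. The tokenization below (lowercased alphanumeric words) and the three sample documents are assumptions made for this sketch; the audit does not state how its tokens were derived.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercased alphanumeric word set (an assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.60):
    """Return (name_a, name_b, score) for every pair at or above threshold."""
    sets = {name: tokens(text) for name, text in docs.items()}
    pairs = []
    for left, right in combinations(sorted(sets), 2):
        score = jaccard(sets[left], sets[right])
        if score >= threshold:
            pairs.append((left, right, round(score, 2)))
    return pairs


# Hypothetical one-line stand-ins for SKILL.md content.
docs = {
    "skill-a": "detect kerberoasting with splunk queries",
    "skill-b": "detect kerberoasting with splunk searches",
    "skill-c": "acquire disk images with dd",
}

print(near_duplicates(docs))
```

Here `skill-a` and `skill-b` share 4 of 6 distinct tokens (about 0.67) and are flagged, while `skill-c` shares only one token with either and is not. Deciding whether a flagged pair is REDUNDANT or UNIQUE_TECHNIQUES still requires manual review.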
---
## Folder Anatomy
### Expected structure per skill:
```
skill-name/
    SKILL.md
    scripts/
        agent.py
```
### Completion Stats
| Component | Present | Missing | Percentage |
|-----------|---------|---------|------------|
| Total directories | 742 | -- | -- |
| SKILL.md | 734 | 8 | 98.9% |
| scripts/ directory | 742 | 0 | 100% |
| scripts/agent.py | 733 | 9 | 98.8% |
| Fully complete (SKILL.md + agent.py) | 731 | 11 | 98.5% |
| Empty shell directories (scripts/ only) | 8 | -- | 1.1% |
| Partial (missing one file) | 3 | -- | 0.4% |
Per auditor-13: 731 of 742 directories are fully complete (98.5%). 8 directories are empty shells containing only a scripts/ directory with no SKILL.md or agent.py. 3 directories are partial (have one file but not the other). The incomplete directories are predominantly from a ransomware/recovery-related batch addition.
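The three categories used above (fully complete, partial, empty shell) follow directly from which of the two expected files exist. A minimal sketch of that classification, assuming the `SKILL.md` plus `scripts/agent.py` layout shown earlier; the function names and the temporary test tree are hypothetical.

```python
import tempfile
from pathlib import Path


def classify_skill_dir(skill_dir):
    """Classify one skill directory as 'complete', 'partial', or 'empty'."""
    skill_dir = Path(skill_dir)
    has_doc = (skill_dir / "SKILL.md").is_file()
    has_agent = (skill_dir / "scripts" / "agent.py").is_file()
    if has_doc and has_agent:
        return "complete"
    if has_doc or has_agent:
        return "partial"
    return "empty"


def summarize(root):
    """Count skill directories under root by completeness category."""
    counts = {"complete": 0, "partial": 0, "empty": 0}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir():
            counts[classify_skill_dir(entry)] += 1
    return counts


# Build a tiny throwaway tree with one directory of each kind.
layout = {
    "complete-skill": ["SKILL.md", "scripts/agent.py"],
    "partial-skill": ["SKILL.md"],
    "empty-skill": [],
}
with tempfile.TemporaryDirectory() as root:
    for name, files in layout.items():
        (Path(root) / name / "scripts").mkdir(parents=True)
        for rel in files:
            (Path(root) / name / rel).write_text("placeholder\n")
    counts = summarize(root)

print(counts)
```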
---
## Statistics
| Category | Count |
|----------|-------|
| Total skill directories | 742 |
| Directories with SKILL.md | 734 (98.9%) |
| Directories with agent.py | 733 (98.8%) |
| SKILL.md frontmatter present | 734/734 (100%) |
| SKILL.md with extended frontmatter (extra fields) | 732/734 (99.7%) |
| SKILL.md frontmatter YAML parse errors | 2 |
| SKILL.md name field valid (lowercase-hyphens, <64 chars) | 734/734 (100%) |
| SKILL.md description field valid (<1024 chars) | 734/734 (100%) |
| Average SKILL.md length | 218 lines |
| Average agent.py length | 178 lines |
| Total Python code | 130,466 lines |
| Code security issues (CRITICAL -- eval/exec/pickle) | 0 |
| Code security issues (HIGH -- shell=True) | 25 |
| Code security issues (HIGH -- missing disclaimers) | 9 (both files) |
| Code security issues (MEDIUM -- SQL injection) | 6 |
| Dynamic imports (__import__) | 8 |
| verify=False (disabled SSL) | 178 |
| HTTP requests without timeout | 33 |
| HTTP URLs (not HTTPS) | 76 |
| Default credentials in code | ~9 |
| Prompt injection found | 0 (21 educational references) |
| YAML injection found | 0 |
| Hardcoded real secrets found | 0 |
| Typosquatted/malicious imports | 0 |
| Unique packages imported | 84 (all legitimate) |
| Skills verified as real code (sample) | 26/30 (87%) |
| Skills verified as partially real (sample) | 4/30 (13%) |
| Skills verified as fake | 0/30 (0%) |
| Exact duplicate skills | 0 |
| Near-duplicate (redundant) skill pairs | 21 |
| Overlap clusters | 4 |
| Complete folder anatomy | 731/742 (98.5%) |
| Empty shell directories | 8 |
| Partial directories | 3 |
| SKILL.md using alternate template | 697/734 (95%) |
| Stub SKILL.md files (<20 lines) | 10 |
| Placeholder text in SKILL.md | 0 |
| Offensive skills missing any disclaimer | 36/58 (62%) |
---
## Recommendations
### Priority 1 (HIGH): Fix shell injection patterns
Replace all 25 instances of `subprocess.run(cmd, shell=True)` with list-based commands and `shlex.split()`. This is especially urgent for the 4 files using f-string interpolation of file paths into shell commands (analyzing-linux-elf-malware, analyzing-network-traffic-for-incidents, performing-threat-emulation-with-atomic-red-team, performing-privilege-escalation-assessment).
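To make the list-based form concrete, the sketch below rebuilds the `strings -n {min_length} {filepath}` example from the findings as an argument list and runs a command without a shell. The helper names are hypothetical, and the Python interpreter stands in for `strings` so that the example runs on any machine.

```python
import subprocess
import sys


def build_strings_cmd(filepath, min_length=4):
    """Build an argv list; each element reaches the program as one argument."""
    return ["strings", "-n", str(int(min_length)), filepath]


def run_argv(argv, timeout=120):
    """Run a command without a shell and return (stdout, returncode)."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.stdout, result.returncode


# A filename that would start a second command under shell=True.
hostile = "sample.bin; echo INJECTED"
cmd = build_strings_cmd(hostile, 6)

# Stand-in for `strings`: print the arguments the child process received.
echo_argv = [sys.executable, "-c", "import sys; print(sys.argv[1:])", hostile]
out, rc = run_argv(echo_argv)
print(cmd)
print(out.strip())
```

The hostile name arrives as a single argument and nothing is executed by a shell. Two caveats apply: `shlex.split()` is only appropriate for fixed, trusted command strings, because splitting an interpolated string still breaks a path containing spaces into several arguments; and a path beginning with `-` can still be read as an option by the target program, so it may need separate handling.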
### Priority 2 (HIGH): Add authorized-testing disclaimers to all 58 offensive skills
9 skills have zero disclaimers. 36 of 58 offensive skills are missing at least one disclaimer. Every offensive skill should have a clear disclaimer in both SKILL.md and agent.py stating: "For authorized security testing and educational purposes only. Unauthorized use against systems you do not own or have permission to test is illegal."
### Priority 3 (MEDIUM): Fix SSL verification and add timeouts
178 instances of `verify=False` disable SSL certificate validation. 33 HTTP requests lack timeouts. Add `timeout=30` to all HTTP calls and only disable SSL verification when explicitly connecting to local/lab instances with self-signed certificates.
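One way to apply both recommendations at once is a small helper that supplies `timeout` and `verify` to every HTTP call, with verification on by default and an explicit opt-out for lab instances. This is a sketch under assumptions: the environment variable names `SKILL_INSECURE_TLS` and `SKILL_HTTP_TIMEOUT` are hypothetical, and the third-party `requests` package is referenced only in a comment so the block stays standard-library only.

```python
import os

DEFAULT_TIMEOUT = 30  # seconds, matching the timeout=30 recommendation


def http_kwargs(env=None):
    """Return timeout/verify keyword arguments for requests-style calls."""
    env = os.environ if env is None else env
    # Hypothetical opt-out: verification stays on unless explicitly disabled.
    insecure = env.get("SKILL_INSECURE_TLS", "").strip().lower() in {"1", "true", "yes"}
    timeout = float(env.get("SKILL_HTTP_TIMEOUT", DEFAULT_TIMEOUT))
    return {"timeout": timeout, "verify": not insecure}


# Intended usage with the requests package (not imported here):
#   response = requests.get(url, **http_kwargs())

print(http_kwargs({}))
print(http_kwargs({"SKILL_INSECURE_TLS": "1", "SKILL_HTTP_TIMEOUT": "5"}))
```

Keeping the decision in one place means a lab user can disable verification for a self-signed Splunk or Nessus instance without editing code, while the default remains safe for production endpoints.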
### Priority 4 (MEDIUM): Complete the 11 incomplete skill directories
8 directories are empty shells and 3 are partial (missing either SKILL.md or agent.py). Either complete these skills or remove the incomplete directories.
### Priority 5 (LOW): Consolidate 21 redundant skill pairs
Review and merge or differentiate the 21 near-duplicate skill pairs to reduce redundancy and improve navigability.
---
## Final Verdict
### Is this repo "vibe coded"?
**No.** This is not vibe-coded. The evidence strongly indicates this is a carefully structured, systematically generated cybersecurity skills repository:
- **87% of sampled skills contain verified, accurate tool commands, API methods, CLI flags, and MITRE ATT&CK references** confirmed against official documentation
- **0% contain fabricated or invented tool commands** -- even the 13% classified as "partially real" have accurate SKILL.md content, just generic agent.py boilerplate
- **130,466 lines of Python** with an average of 178 lines per agent.py -- these are non-trivial implementations, not stubs
- **734 SKILL.md files** averaging 218 lines each with consistent frontmatter and structured sections
- **Zero critical security vulnerabilities** (no eval/exec exploitation, no prompt injection, no real secrets, no YAML injection, no supply chain compromised packages)
- The entire import set consists of well-known, legitimate packages
The repository shows hallmarks of systematic, high-quality generation with domain expertise: correct MITRE technique IDs, accurate tool-specific CLI flags, proper library usage patterns, and real-world security concepts. The 4/30 boilerplate agent.py files and the frontmatter consistency suggest automated generation with manual or expert-guided prompting, but the output quality is genuinely high.
### Is it production-safe?
**No, with caveats.** It is safe as a reference/educational resource but not safe to deploy directly:
1. **25 shell injection risks** (shell=True with interpolation) would be exploitable if scripts ever receive untrusted input
2. **178 disabled SSL verifications** and **33 missing timeouts** are not production-grade
3. **342 files accept file paths without sanitization** -- acceptable for CLI tools, dangerous in any other context
4. **36 offensive tools lack proper legal disclaimers** -- a liability concern
5. The code was designed as educational/reference material, not as production software
**Bottom line:** This is a high-quality, well-researched cybersecurity skills library with real, verified content and no critical vulnerabilities. It needs targeted hardening (shell injection, timeouts, disclaimers) before any production or public-facing use, but it is fundamentally sound educational material -- not a security risk in its intended context.
---
*Report compiled by auditor-15 from findings of all 14 specialized audit agents (14/14 tasks completed).*
*Audit completed: 2026-03-17*
@@ -0,0 +1,140 @@
#!/usr/bin/env python3
"""Add missing timeout= parameter to subprocess calls in agent.py files."""

import glob
import re


def add_timeout_to_subprocess_calls(filepath):
    """Add timeout=120 to subprocess.run/check_output/check_call calls missing it."""
    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
        content = f.read()

    original = content
    fixes = 0

    funcs = ["subprocess.run", "subprocess.check_output", "subprocess.check_call"]

    for func in funcs:
        start = 0
        while True:
            idx = content.find(func + "(", start)
            if idx == -1:
                break

            # Check if this line is a comment
            line_start = content.rfind("\n", 0, idx) + 1
            line_prefix = content[line_start:idx].lstrip()
            if line_prefix.startswith("#"):
                start = idx + 1
                continue

            # Find matching closing paren with basic string tracking
            paren_depth = 0
            pos = idx + len(func)
            found_close = -1
            in_str = None
            escape_next = False
            while pos < len(content):
                ch = content[pos]
                if escape_next:
                    escape_next = False
                    pos += 1
                    continue
                if ch == "\\":
                    escape_next = True
                    pos += 1
                    continue
                if in_str is None:
                    if ch == '"' and content[pos:pos+3] == '"""':
                        in_str = '"""'
                        pos += 3
                        continue
                    elif ch == "'" and content[pos:pos+3] == "'''":
                        in_str = "'''"
                        pos += 3
                        continue
                    elif ch == '"':
                        in_str = '"'
                    elif ch == "'":
                        in_str = "'"
                    elif ch == "(":
                        paren_depth += 1
                    elif ch == ")":
                        if paren_depth == 1:
                            found_close = pos
                            break
                        paren_depth -= 1
                else:
                    if in_str == '"""' and content[pos:pos+3] == '"""':
                        in_str = None
                        pos += 3
                        continue
                    elif in_str == "'''" and content[pos:pos+3] == "'''":
                        in_str = None
                        pos += 3
                        continue
                    elif in_str == '"' and ch == '"':
                        in_str = None
                    elif in_str == "'" and ch == "'":
                        in_str = None
                pos += 1

            if found_close == -1:
                start = idx + 1
                continue

            call_content = content[idx:found_close + 1]
            if "timeout" not in call_content:
                # Insert timeout=120 before the closing paren
                before_close = content[:found_close].rstrip()
                after_close = content[found_close + 1:]

                # Determine indentation by looking at the line with the func call
                func_line_start = content.rfind("\n", 0, idx) + 1
                indent = ""
                for c in content[func_line_start:]:
                    if c in (" ", "\t"):
                        indent += c
                    else:
                        break

                # Check if call is multiline
                call_text = content[idx:found_close]
                if "\n" in call_text:
                    # Multiline: add timeout on new line with proper indent
                    content = before_close + ", timeout=120\n" + indent + ")" + after_close
                else:
                    # Single line: add inline
                    content = content[:found_close] + ", timeout=120)" + after_close
                fixes += 1

            start = idx + 1

    if fixes > 0:
        with open(filepath, "w", encoding="utf-8") as f:
            f.write(content)

    return fixes


if __name__ == "__main__":
    files = sorted(glob.glob("skills/*/scripts/agent.py"))
    total_fixed = 0
    files_fixed = 0
    for filepath in files:
        n = add_timeout_to_subprocess_calls(filepath)
        if n > 0:
            total_fixed += n
            files_fixed += 1
            print(f"  Fixed {n} calls in {filepath}")
    print(f"\nTotal: {total_fixed} subprocess calls fixed across {files_fixed} files")
@@ -1,17 +1,19 @@
#!/usr/bin/env python3
"""Forensic disk image acquisition agent using dd and dcfldd with hash verification."""
import shlex
import subprocess
import hashlib
import os
import sys
import datetime
import json
 def run_cmd(cmd, capture=True):
-    """Execute a shell command and return output."""
-    result = subprocess.run(cmd, shell=True, capture_output=capture, text=True)
+    """Execute a command and return output."""
+    if isinstance(cmd, str):
+        cmd = shlex.split(cmd)
+    result = subprocess.run(cmd, capture_output=capture, text=True, timeout=120)
     return result.stdout.strip(), result.stderr.strip(), result.returncode
@@ -65,16 +67,22 @@ def compute_hash(path, algorithm="sha256", block_size=65536):
 def acquire_with_dd(source, destination, block_size=4096, log_file=None):
     """Acquire a forensic image using dd with error handling."""
-    cmd = (
-        f"dd if={source} of={destination} bs={block_size} "
-        f"conv=noerror,sync status=progress"
-    )
-    if log_file:
-        cmd += f" 2>&1 | tee {log_file}"
+    dd_cmd = [
+        "dd", f"if={source}", f"of={destination}",
+        f"bs={block_size}", "conv=noerror,sync", "status=progress"
+    ]
     print(f"[*] Starting dd acquisition: {source} -> {destination}")
     print(f"[*] Block size: {block_size}")
     start = datetime.datetime.utcnow()
-    _, stderr, rc = run_cmd(cmd, capture=False)
+    if log_file:
+        dd_proc = subprocess.run(dd_cmd, capture_output=True, text=True, timeout=120)
+        combined = (dd_proc.stdout or "") + (dd_proc.stderr or "")
+        with open(log_file, "w") as lf:
+            lf.write(combined)
+        rc = dd_proc.returncode
+    else:
+        result = subprocess.run(dd_cmd, text=True, timeout=120)
+        rc = result.returncode
     elapsed = (datetime.datetime.utcnow() - start).total_seconds()
     print(f"[*] Acquisition completed in {elapsed:.1f} seconds (rc={rc})")
     return rc == 0
@@ -83,18 +91,21 @@ def acquire_with_dd(source, destination, block_size=4096, log_file=None):
 def acquire_with_dcfldd(source, destination, hash_alg="sha256", hash_log=None,
                         error_log=None, block_size=4096, split_size=None):
     """Acquire a forensic image using dcfldd with built-in hashing."""
-    cmd = f"dcfldd if={source} of={destination} bs={block_size} conv=noerror,sync"
-    cmd += f" hash={hash_alg}"
+    cmd = [
+        "dcfldd", f"if={source}", f"of={destination}",
+        f"bs={block_size}", "conv=noerror,sync",
+        f"hash={hash_alg}", "hashwindow=1G",
+    ]
     if hash_log:
-        cmd += f" hashlog={hash_log}"
-    cmd += " hashwindow=1G"
+        cmd.append(f"hashlog={hash_log}")
     if error_log:
-        cmd += f" errlog={error_log}"
+        cmd.append(f"errlog={error_log}")
     if split_size:
-        cmd += f" split={split_size} splitformat=aa"
+        cmd.extend([f"split={split_size}", "splitformat=aa"])
     print(f"[*] Starting dcfldd acquisition: {source} -> {destination}")
     start = datetime.datetime.utcnow()
-    _, stderr, rc = run_cmd(cmd, capture=False)
+    result = subprocess.run(cmd, text=True, timeout=120)
+    rc = result.returncode
     elapsed = (datetime.datetime.utcnow() - start).total_seconds()
     print(f"[*] dcfldd completed in {elapsed:.1f} seconds (rc={rc})")
     return rc == 0
@@ -9,6 +9,9 @@ author: mahipal
license: Apache-2.0
---
# Analyzing Active Directory ACL Abuse
## Overview
Active Directory Access Control Lists (ACLs) define permissions on AD objects through Discretionary Access Control Lists (DACLs) containing Access Control Entries (ACEs). Misconfigured ACEs can grant non-privileged users dangerous permissions such as GenericAll (full control), WriteDACL (modify permissions), WriteOwner (take ownership), and GenericWrite (modify attributes) on sensitive objects like Domain Admins groups, domain controllers, or GPOs.
@@ -4,11 +4,8 @@
import argparse
import json
import struct
import sys
from collections import defaultdict
from ldap3 import Server, Connection, ALL, NTLM, SUBTREE
from ldap3.protocol.formatters.formatters import format_sid
DANGEROUS_MASKS = {
@@ -1,15 +1,12 @@
#!/usr/bin/env python3
"""Agent for analyzing API Gateway access logs for security threats."""
import os
import re
import json
import argparse
from datetime import datetime
from collections import defaultdict
import pandas as pd
import numpy as np
def load_api_logs(log_path):
@@ -36,7 +36,7 @@ ATT&CK catalogs over 140 threat groups with documented technique usage. Each gro
The Navigator supports loading multiple layers simultaneously, allowing analysts to overlay threat actor TTPs against detection coverage to identify gaps, compare multiple APT groups to find common techniques worth prioritizing, and track technique coverage changes over time.
-## Practical Steps
+## Workflow
### Step 1: Query ATT&CK Data for APT Group
@@ -8,7 +8,6 @@ performs detection gap analysis, and generates threat-informed reports.
import json
import os
import sys
import hashlib
from collections import Counter
try:
@@ -112,8 +112,11 @@ def analyze_boot_code(mbr_data):
 def run_volatility_rootkit_scan(memory_dump, plugin):
     """Run a Volatility 3 plugin for rootkit detection via subprocess."""
-    cmd = f"vol3 -f {memory_dump} {plugin}"
-    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
+    result = subprocess.run(
+        ["vol3", "-f", memory_dump, plugin],
+        capture_output=True, text=True,
+        timeout=120,
+    )
     return result.stdout, result.stderr, result.returncode
@@ -10,8 +10,6 @@ import sys
import json
import sqlite3
import datetime
import hashlib
from collections import defaultdict
def chrome_time_to_datetime(chrome_time):
@@ -40,7 +40,7 @@ Campaign attribution analysis involves systematically evaluating evidence to det
### Analysis of Competing Hypotheses (ACH)
Structured analytical method that evaluates evidence against multiple competing hypotheses. Each piece of evidence is scored as consistent, inconsistent, or neutral with respect to each hypothesis. The hypothesis with the least inconsistent evidence is favored.
-## Practical Steps
+## Workflow
### Step 1: Collect Attribution Evidence
@@ -6,9 +6,6 @@ malware code similarity, timing patterns, and language artifacts.
"""
import json
import os
import sys
import hashlib
import re
from collections import defaultdict
from datetime import datetime
@@ -36,7 +36,7 @@ Attackers register lookalike domains and obtain free certificates (often from Le
crt.sh is a free web interface and PostgreSQL database operated by Sectigo that indexes CT logs. It supports wildcard searches (`%.example.com`), direct SQL queries, and JSON API responses. It tracks certificate issuance, expiration, and revocation across all major CT logs.
-## Practical Steps
+## Workflow
### Step 1: Query crt.sh for Certificate History
@@ -6,10 +6,7 @@ certificates, and identifies potential phishing infrastructure.
"""
import json
import os
import sys
import re
from datetime import datetime
from collections import defaultdict
try:
@@ -13,6 +13,9 @@ author: mahipal
license: Apache-2.0
---
# Analyzing Cloud Storage Access Patterns
## Instructions
1. Install dependencies: `pip install boto3 requests`
@@ -21,7 +21,7 @@ def query_cloudtrail_s3_events(bucket_name, hours_back=24):
"--start-time", start_time,
"--output", "json",
]
-    result = subprocess.run(cmd, capture_output=True, text=True)
+    result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
     if result.returncode != 0:
         logger.error("CloudTrail query failed: %s", result.stderr[:200])
         return []
@@ -37,7 +37,7 @@ The beacon configuration encodes the malleable C2 profile that dictates HTTP req
Each Cobalt Strike license embeds a unique watermark (4-byte integer) into generated beacons. Extracting the watermark can link multiple beacons to the same operator or cracked license. Known watermark databases maintained by threat intelligence providers map watermarks to specific threat actors or leaked license keys.
## Practical Steps
## Workflow
### Step 1: Extract Configuration with CobaltStrikeParser
@@ -8,9 +8,7 @@ communication settings, malleable C2 profile details, and watermark values.
import struct
import os
import sys
import json
import hashlib
import re
from collections import OrderedDict
# Cobalt Strike beacon configuration field IDs (Type-Length-Value format)
@@ -5,7 +5,6 @@
import argparse
import json
import re
import sys
from collections import Counter
from datetime import datetime
from pathlib import Path
@@ -3,13 +3,12 @@
import statistics
import base64
import json
import os
import sys
from collections import defaultdict
try:
from scapy.all import rdpcap, IP, TCP, UDP, DNS, DNSQR, Raw
from scapy.all import rdpcap, IP, TCP, DNS, DNSQR
HAS_SCAPY = True
except ImportError:
HAS_SCAPY = False
@@ -1,9 +1,6 @@
#!/usr/bin/env python3
"""Cyber Kill Chain analysis agent for mapping incidents to Lockheed Martin kill chain phases."""
import json
import os
import sys
import datetime
@@ -1,6 +1,7 @@
#!/usr/bin/env python3
"""Forensic disk image analysis agent using The Sleuth Kit (TSK) command-line tools."""
import shlex
import subprocess
import os
import sys
@@ -10,8 +11,10 @@ import datetime
def run_cmd(cmd):
"""Execute a shell command and return output."""
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
"""Execute a command and return output."""
if isinstance(cmd, str):
cmd = shlex.split(cmd)
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
return result.stdout.strip(), result.stderr.strip(), result.returncode
@@ -93,9 +96,15 @@ def list_deleted_files(image_path, offset):
def recover_file(image_path, offset, inode, output_path):
"""Recover a file by inode using icat."""
cmd = f"icat -o {offset} {image_path} {inode} > {output_path}"
_, _, rc = run_cmd(cmd)
return rc == 0
result = subprocess.run(
["icat", "-o", str(offset), image_path, str(inode)],
capture_output=True,
timeout=120,
)
if result.returncode == 0:
with open(output_path, "wb") as f:
f.write(result.stdout)
return result.returncode == 0
def get_file_metadata(image_path, offset, inode):
@@ -106,26 +115,40 @@ def get_file_metadata(image_path, offset, inode):
def create_bodyfile(image_path, offset, output_path):
"""Generate a TSK bodyfile for timeline creation."""
cmd = f'fls -r -m "/" -o {offset} {image_path} > {output_path}'
_, _, rc = run_cmd(cmd)
return rc == 0
result = subprocess.run(
["fls", "-r", "-m", "/", "-o", str(offset), image_path],
capture_output=True, text=True,
timeout=120,
)
if result.returncode == 0:
with open(output_path, "w") as f:
f.write(result.stdout)
return result.returncode == 0
def generate_timeline(bodyfile_path, output_csv, start_date=None, end_date=None):
"""Generate a timeline from a bodyfile using mactime."""
cmd = f"mactime -b {bodyfile_path} -d"
cmd = ["mactime", "-b", bodyfile_path, "-d"]
if start_date and end_date:
cmd += f" {start_date}..{end_date}"
cmd += f" > {output_csv}"
_, _, rc = run_cmd(cmd)
return rc == 0
cmd.append(f"{start_date}..{end_date}")
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
if result.returncode == 0:
with open(output_csv, "w") as f:
f.write(result.stdout)
return result.returncode == 0
def search_keywords(image_path, offset, keyword):
"""Search for keyword strings in the disk image."""
cmd = f'srch_strings -a -o {offset} {image_path} | grep -i "{keyword}"'
stdout, _, rc = run_cmd(cmd)
return stdout.splitlines() if rc == 0 else []
result = subprocess.run(
["srch_strings", "-a", "-o", str(offset), image_path],
capture_output=True, text=True,
timeout=120,
)
if result.returncode != 0 or not result.stdout:
return []
keyword_lower = keyword.lower()
return [line for line in result.stdout.splitlines() if keyword_lower in line.lower()]
def find_file_signature(image_path, offset, hex_signature):
@@ -179,7 +202,8 @@ if __name__ == "__main__":
if len(sys.argv) > 1:
image = sys.argv[1]
case = sys.argv[2] if len(sys.argv) > 2 else "/tmp/autopsy_case"
import tempfile
case = sys.argv[2] if len(sys.argv) > 2 else os.environ.get("AUTOPSY_CASE_DIR", os.path.join(tempfile.gettempdir(), "autopsy_case"))
if os.path.exists(image):
analyze_image(image, case)
else:
@@ -2,11 +2,6 @@
"""DNS exfiltration detection agent using entropy analysis and query pattern detection."""
import math
import os
import sys
import json
import csv
import datetime
from collections import Counter, defaultdict
@@ -1,6 +1,7 @@
#!/usr/bin/env python3
"""Docker container forensics agent for investigating compromised containers."""
import shlex
import subprocess
import json
import os
@@ -10,8 +11,10 @@ import datetime
def run_cmd(cmd):
"""Execute a shell command and return output."""
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
"""Execute a command and return output."""
if isinstance(cmd, str):
cmd = shlex.split(cmd)
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
return result.stdout.strip(), result.stderr.strip(), result.returncode
@@ -134,9 +137,13 @@ def detect_suspicious_files(changes):
def export_container(container_id, output_path):
"""Export container filesystem as a tarball for offline analysis."""
cmd = f"docker export {container_id} > {output_path}"
_, _, rc = run_cmd(cmd)
if rc == 0 and os.path.exists(output_path):
with open(output_path, "wb") as out_f:
result = subprocess.run(
["docker", "export", container_id],
stdout=out_f, stderr=subprocess.PIPE,
timeout=120,
)
if result.returncode == 0 and os.path.exists(output_path):
sha256 = hashlib.sha256()
with open(output_path, "rb") as f:
for chunk in iter(lambda: f.read(65536), b""):
@@ -8,7 +8,6 @@ import hashlib
import os
import sys
import subprocess
import json
from email import policy
@@ -147,9 +146,10 @@ def extract_attachments(msg, output_dir=None):
def dns_lookup(domain, record_type="TXT"):
"""Perform DNS lookup for SPF/DKIM/DMARC records."""
cmd = f"dig {record_type} {domain} +short"
stdout, _, rc = subprocess.run(cmd, shell=True, capture_output=True, text=True,
timeout=10).stdout, "", 0
stdout, _, rc = subprocess.run(
["dig", record_type, domain, "+short"],
capture_output=True, text=True, timeout=10
).stdout, "", 0
return stdout.strip() if stdout else ""
@@ -5,7 +5,6 @@ import json
import argparse
import logging
import subprocess
import os
from collections import defaultdict
from datetime import datetime
@@ -37,7 +37,7 @@ Despite stripping symbol tables, Go binaries retain function names within the pc
Go's dependency management embeds module paths and version strings in the binary. Extracting these reveals the malware's third-party dependencies (HTTP libraries, encryption packages, C2 frameworks), which provides insight into capabilities without full reverse engineering.
## Practical Steps
## Workflow
### Step 1: Initial Binary Analysis
@@ -5,7 +5,6 @@ Analyzes Go binaries to extract function names, strings, build metadata,
package information, and detects common Go malware characteristics.
"""
import struct
import os
import sys
import json
@@ -3,9 +3,7 @@
import re
import os
import sys
import json
import hashlib
import datetime
try:
@@ -69,7 +67,7 @@ def is_private_ip(ip):
def query_virustotal_hash(sha256, api_key):
"""Query VirusTotal for a file hash."""
url = f"https://www.virustotal.com/api/v3/files/{sha256}"
resp = requests.get(url, headers={"x-apikey": api_key})
resp = requests.get(url, headers={"x-apikey": api_key}, timeout=30)
if resp.status_code == 200:
data = resp.json().get("data", {}).get("attributes", {})
stats = data.get("last_analysis_stats", {})
@@ -88,7 +86,7 @@ def query_virustotal_hash(sha256, api_key):
def query_virustotal_domain(domain, api_key):
"""Query VirusTotal for domain reputation."""
url = f"https://www.virustotal.com/api/v3/domains/{domain}"
resp = requests.get(url, headers={"x-apikey": api_key})
resp = requests.get(url, headers={"x-apikey": api_key}, timeout=30)
if resp.status_code == 200:
data = resp.json().get("data", {}).get("attributes", {})
stats = data.get("last_analysis_stats", {})
@@ -107,7 +105,7 @@ def query_abuseipdb(ip, api_key, max_age_days=90):
"""Query AbuseIPDB for IP reputation."""
url = "https://api.abuseipdb.com/api/v2/check"
resp = requests.get(url, headers={"Key": api_key, "Accept": "application/json"},
params={"ipAddress": ip, "maxAgeInDays": max_age_days})
params={"ipAddress": ip, "maxAgeInDays": max_age_days}, timeout=30)
if resp.status_code == 200:
data = resp.json().get("data", {})
return {
@@ -125,7 +123,7 @@ def query_abuseipdb(ip, api_key, max_age_days=90):
def query_malwarebazaar(sha256):
"""Query MalwareBazaar for file hash information."""
url = "https://mb-api.abuse.ch/api/v1/"
resp = requests.post(url, data={"query": "get_info", "hash": sha256})
resp = requests.post(url, data={"query": "get_info", "hash": sha256}, timeout=30)
if resp.status_code == 200:
result = resp.json()
if result.get("query_status") == "ok" and result.get("data"):
@@ -7,9 +7,7 @@ keychain dumping, filesystem inspection, and jailbreak detection bypass.
import subprocess
import json
import os
import sys
import re
def run_objection(command, app_id=None, timeout=30):
@@ -1,7 +1,6 @@
#!/usr/bin/env python3
"""Agent for analyzing Kubernetes audit logs for security threats."""
import os
import json
import argparse
from collections import defaultdict
@@ -7,7 +7,6 @@ unauthorized file access, suspicious syscalls, and process execution anomalies.
import argparse
import json
import os
import re
import sys
import datetime
@@ -6,12 +6,10 @@ import math
import os
import sys
import subprocess
import struct
from collections import Counter
try:
from elftools.elf.elffile import ELFFile
from elftools.elf.sections import SymbolTableSection
HAS_ELFTOOLS = True
except ImportError:
HAS_ELFTOOLS = False
@@ -85,9 +83,9 @@ def analyze_sections(filepath):
def extract_strings(filepath, min_length=6):
"""Extract ASCII strings from the binary and categorize by type."""
stdout, _, rc = subprocess.run(
f"strings -n {min_length} {filepath}", shell=True,
["strings", "-n", str(min_length), filepath],
capture_output=True, text=True
).stdout, "", 0
, timeout=120).stdout, "", 0
if not stdout:
return {}
all_strings = stdout.strip().splitlines()
@@ -126,8 +124,9 @@ def check_packing(filepath):
indicators.append("UPX packer detected (UPX! magic)")
if b"UPX0" in data or b"UPX1" in data:
indicators.append("UPX section names found")
stdout, _, _ = subprocess.run(f"upx -t {filepath} 2>&1", shell=True,
capture_output=True, text=True).stdout, "", 0
stdout, _, _ = subprocess.run(["upx", "-t", filepath],
stdout=subprocess.PIPE, stderr=subprocess.STDOUT,  # capture_output=True cannot be combined with stderr=
text=True, timeout=120).stdout, "", 0
if stdout and "packed" in stdout.lower():
indicators.append("UPX verification confirms packing")
return indicators
@@ -135,8 +134,8 @@ def check_packing(filepath):
def analyze_dynamic_linking(filepath):
"""Analyze dynamic linking information and imported functions."""
stdout, _, rc = subprocess.run(f"readelf -d {filepath}", shell=True,
capture_output=True, text=True).stdout, "", 0
stdout, _, rc = subprocess.run(["readelf", "-d", filepath],
capture_output=True, text=True, timeout=120).stdout, "", 0
dynamic_info = {"libraries": [], "rpath": None}
if stdout:
for line in stdout.splitlines():
@@ -146,10 +145,17 @@ def analyze_dynamic_linking(filepath):
if "RPATH" in line or "RUNPATH" in line:
dynamic_info["rpath"] = line.split("[")[-1].rstrip("]")
stdout2, _, _ = subprocess.run(
f"readelf -r {filepath} | grep -E 'socket|connect|exec|fork|open|write|bind|listen|send|recv'",
shell=True, capture_output=True, text=True
).stdout, "", 0
readelf_proc = subprocess.run(
["readelf", "-r", filepath],
capture_output=True, text=True,
timeout=120,
)
import re as _re
suspicious_funcs = _re.compile(r'socket|connect|exec|fork|open|write|bind|listen|send|recv')
stdout2 = "\n".join(
line for line in (readelf_proc.stdout or "").splitlines()
if suspicious_funcs.search(line)
)
dynamic_info["suspicious_imports"] = [
line.strip() for line in (stdout2 or "").splitlines() if line.strip()
]
@@ -6,7 +6,6 @@ import argparse
import logging
import subprocess
import os
from collections import defaultdict
from datetime import datetime
logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
@@ -4,15 +4,15 @@
import os
import sys
import glob
import json
import re
import datetime
import shlex
import subprocess
def run_cmd(cmd):
"""Execute a shell command and return output."""
result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=30)
"""Execute a command and return output."""
if isinstance(cmd, str):
cmd = shlex.split(cmd)
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
return result.stdout.strip(), result.stderr.strip(), result.returncode
@@ -196,10 +196,12 @@ def check_ld_preload(evidence_root):
def find_suid_binaries(evidence_root):
"""Find SUID/SGID binaries (potential privilege escalation)."""
stdout, _, rc = run_cmd(
f"find {evidence_root} -perm -4000 -type f 2>/dev/null"
result = subprocess.run(
["find", evidence_root, "-perm", "-4000", "-type", "f"],
capture_output=True, text=True, timeout=30
)
return stdout.splitlines() if rc == 0 and stdout else []
stdout = result.stdout.strip()
return stdout.splitlines() if result.returncode == 0 and stdout else []
def find_suspicious_tmp_files(evidence_root):
@@ -5,12 +5,11 @@ import re
import os
import sys
import hashlib
import subprocess
import json
import zipfile
try:
from oletools.olevba import VBA_Parser, TYPE_OLE, TYPE_OpenXML
from oletools.olevba import VBA_Parser
from oletools import oleid
HAS_OLETOOLS = True
except ImportError:
@@ -37,9 +37,9 @@ def run_pdfid(filepath):
"""Run pdfid.py to triage PDF for suspicious keywords."""
cmd = ["python3", "-m", "pdfid", filepath]
alt_cmd = ["pdfid.py", filepath]
result = subprocess.run(cmd, capture_output=True, text=True)
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
if result.returncode != 0:
result = subprocess.run(alt_cmd, capture_output=True, text=True)
result = subprocess.run(alt_cmd, capture_output=True, text=True, timeout=120)
keywords = {}
for line in result.stdout.strip().split("\n"):
line = line.strip()
@@ -59,9 +59,9 @@ def run_peepdf_analysis(filepath):
"""Run peepdf for detailed PDF object analysis."""
cmd = ["peepdf", "-f", "-l", filepath]
alt_cmd = ["python3", "-m", "peepdf", "-f", "-l", filepath]
result = subprocess.run(cmd, capture_output=True, text=True)
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
if result.returncode != 0:
result = subprocess.run(alt_cmd, capture_output=True, text=True)
result = subprocess.run(alt_cmd, capture_output=True, text=True, timeout=120)
analysis = {
"versions": 0,
"objects": 0,
@@ -98,7 +98,7 @@ def run_pdf_parser(filepath, object_id=None):
cmd = ["pdf-parser.py", "-o", str(object_id), "-f", "-d", filepath]
else:
cmd = ["pdf-parser.py", "--stats", filepath]
result = subprocess.run(cmd, capture_output=True, text=True)
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
return result.stdout[:3000]
@@ -107,7 +107,7 @@ def extract_javascript(filepath, peepdf_analysis):
js_content = []
for obj_id in peepdf_analysis.get("js_objects", []):
cmd = ["pdf-parser.py", "-o", str(obj_id), "-f", "-w", filepath]
result = subprocess.run(cmd, capture_output=True, text=True)
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
if result.stdout:
js_content.append({
"object_id": obj_id,
@@ -41,7 +41,7 @@ URLScan.io is a free service for scanning and analyzing suspicious URLs. It capt
- Data URIs or base64-encoded content
- JavaScript-heavy pages with minimal HTML
## Implementation Steps
## Workflow
### Step 1: Submit URL to URLScan
```
@@ -4,9 +4,7 @@
import json
import os
import sys
import subprocess
import hashlib
import datetime
try:
import requests
@@ -30,7 +28,7 @@ def submit_file(filepath, timeout=300, machine=None, package=None):
data["machine"] = machine
if package:
data["package"] = package
resp = requests.post(url, files=files, data=data)
resp = requests.post(url, files=files, data=data, timeout=30)
if resp.status_code == 200:
return resp.json().get("task_id")
return None
@@ -42,7 +40,7 @@ def submit_url(url_to_analyze, timeout=300):
return None
url = f"{CUCKOO_API}/tasks/create/url"
data = {"url": url_to_analyze, "timeout": timeout}
resp = requests.post(url, data=data)
resp = requests.post(url, data=data, timeout=30)
if resp.status_code == 200:
return resp.json().get("task_id")
return None
@@ -53,7 +51,7 @@ def get_task_status(task_id):
if not HAS_REQUESTS:
return None
url = f"{CUCKOO_API}/tasks/view/{task_id}"
resp = requests.get(url)
resp = requests.get(url, timeout=30)
if resp.status_code == 200:
return resp.json().get("task", {}).get("status")
return None
@@ -36,7 +36,7 @@ Malpedia uses the format `platform.family_name` (e.g., `win.emotet`, `elf.mirai`
Malware families have relationships including: parent-child (code reuse, forks), loader-payload (Emotet loads TrickBot loads Ryuk), shared authorship (same threat actor develops multiple tools), and infrastructure sharing (common C2 frameworks).
## Practical Steps
## Workflow
### Step 1: Query Malpedia API for Malware Families
@@ -22,7 +22,7 @@ Sysinternals Autoruns extracts data from hundreds of Auto-Start Extensibility Po
- VirusTotal API key for reputation checks
- Clean baseline export for comparison
## Practical Steps
## Workflow
### Step 1: Automated Persistence Scanning
@@ -3,7 +3,6 @@
import json
import csv
import os
import re
import logging
import argparse
@@ -4,7 +4,6 @@
import json
import argparse
from datetime import datetime
from collections import defaultdict
TIMING_APIS = {
"GetTickCount", "GetTickCount64", "QueryPerformanceCounter",
@@ -1,19 +1,18 @@
#!/usr/bin/env python3
"""Memory forensics agent using Volatility 3 for malware detection in RAM dumps."""
import shlex
import subprocess
import os
import sys
import json
import csv
import re
import io
def run_vol3(memory_dump, plugin, extra_args=""):
"""Execute a Volatility 3 plugin and return output."""
cmd = f"vol3 -f {memory_dump} {plugin} {extra_args}"
result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=300)
cmd = ["vol3", "-f", memory_dump, plugin]
if extra_args:
cmd.extend(shlex.split(extra_args))
result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
return result.stdout.strip(), result.stderr.strip(), result.returncode
@@ -1,7 +1,6 @@
#!/usr/bin/env python3
"""Agent for Linux memory forensics using LiME acquisition and Volatility 3."""
import os
import json
import subprocess
import argparse
@@ -12,13 +11,13 @@ from pathlib import Path
def acquire_memory_lime(output_path, lime_format="lime"):
"""Acquire memory using LiME kernel module."""
kernel_version = subprocess.run(
["uname", "-r"], capture_output=True, text=True
["uname", "-r"], capture_output=True, text=True, timeout=120
).stdout.strip()
lime_module = f"lime-{kernel_version}.ko"
if not Path(lime_module).exists():
lime_module = "lime.ko"
cmd = ["insmod", lime_module, f"path={output_path}", f"format={lime_format}"]
result = subprocess.run(cmd, capture_output=True, text=True)
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
return {
"status": "success" if result.returncode == 0 else "failed",
"output_path": output_path,
@@ -22,7 +22,7 @@ Malware uses covert channels to disguise C2 communication and data exfiltration
- DNS query logging infrastructure
- Understanding of DNS, ICMP, HTTP protocols at packet level
## Practical Steps
## Workflow
### Step 1: DNS Tunneling Detection
@@ -9,11 +9,10 @@ import os
import sys
import json
import math
import hashlib
from collections import Counter, defaultdict
try:
from scapy.all import rdpcap, DNS, DNSQR, DNSRR, ICMP, IP, TCP, UDP, Raw
from scapy.all import rdpcap, DNS, DNSQR, ICMP, IP, TCP, Raw
HAS_SCAPY = True
except ImportError:
HAS_SCAPY = False
@@ -13,6 +13,9 @@ author: mahipal
license: Apache-2.0
---
# Analyzing Network Flow Data with Netflow
## Instructions
1. Install dependencies: `pip install netflow`
@@ -7,7 +7,7 @@ import argparse
from collections import defaultdict, Counter
from datetime import datetime
from scapy.all import rdpcap, IP, TCP, UDP, DNS, DNSQR, ICMP, Raw
from scapy.all import rdpcap, IP, TCP, UDP, DNS, DNSQR, ICMP
def load_pcap(filepath):
@@ -6,10 +6,10 @@ import os
import sys
import json
import statistics
from collections import defaultdict, Counter
from collections import defaultdict
try:
from scapy.all import rdpcap, IP, TCP, UDP, DNS, DNSQR, Raw, ARP
from scapy.all import rdpcap, IP, TCP, DNS
HAS_SCAPY = True
except ImportError:
HAS_SCAPY = False
@@ -17,9 +17,11 @@ except ImportError:
def run_tshark(pcap_path, display_filter, fields):
"""Run tshark with a display filter and extract specific fields."""
field_args = " ".join(f"-e {f}" for f in fields)
cmd = f'tshark -r {pcap_path} -Y "{display_filter}" -T fields {field_args} -E separator="|"'
result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=120)
cmd = ["tshark", "-r", pcap_path, "-Y", display_filter, "-T", "fields"]
for f in fields:
cmd += ["-e", f]
cmd += ["-E", "separator=|"]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
rows = []
if result.returncode == 0:
for line in result.stdout.strip().splitlines():
@@ -31,8 +33,8 @@ def run_tshark(pcap_path, display_filter, fields):
def get_pcap_summary(pcap_path):
"""Get high-level PCAP statistics."""
cmd = f"tshark -r {pcap_path} -q -z conv,ip"
result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=60)
cmd = ["tshark", "-r", pcap_path, "-q", "-z", "conv,ip"]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
return result.stdout if result.returncode == 0 else ""
@@ -57,8 +59,8 @@ def detect_lateral_movement(pcap_path):
def detect_data_exfiltration(pcap_path, threshold_mb=10):
"""Detect potential data exfiltration based on outbound data volume."""
cmd = f'tshark -r {pcap_path} -q -z conv,ip'
result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=60)
cmd = ["tshark", "-r", pcap_path, "-q", "-z", "conv,ip"]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
suspects = []
if result.returncode == 0:
for line in result.stdout.splitlines():
@@ -120,10 +122,13 @@ def extract_dns_queries(pcap_path):
def detect_ids_alerts(pcap_path):
"""Run Suricata on the PCAP and extract alerts."""
cmd = f"suricata -r {pcap_path} -l /tmp/suricata_output -k none 2>/dev/null"
subprocess.run(cmd, shell=True, timeout=120)
import tempfile
suricata_output = os.environ.get("SURICATA_OUTPUT_DIR", os.path.join(tempfile.gettempdir(), "suricata_output"))
os.makedirs(suricata_output, exist_ok=True)
cmd = ["suricata", "-r", pcap_path, "-l", suricata_output, "-k", "none"]
subprocess.run(cmd, capture_output=True, timeout=120)
alerts = []
alert_file = "/tmp/suricata_output/fast.log"
alert_file = os.path.join(suricata_output, "fast.log")
if os.path.exists(alert_file):
with open(alert_file, "r") as f:
for line in f:
@@ -134,8 +139,8 @@ def detect_ids_alerts(pcap_path):
def extract_http_objects(pcap_path, output_dir):
"""Extract HTTP objects (files) from the PCAP."""
os.makedirs(output_dir, exist_ok=True)
cmd = f'tshark -r {pcap_path} --export-objects "http,{output_dir}"'
subprocess.run(cmd, shell=True, timeout=60)
cmd = ["tshark", "-r", pcap_path, "--export-objects", f"http,{output_dir}"]
subprocess.run(cmd, capture_output=True, timeout=60)
exported = []
if os.path.exists(output_dir):
for f in os.listdir(output_dir):
@@ -3,9 +3,7 @@
import os
import sys
import json
import math
import subprocess
from collections import defaultdict, Counter
try:
@@ -15,7 +13,7 @@ except ImportError:
HAS_DPKT = False
try:
from scapy.all import rdpcap, IP, TCP, UDP, DNS, DNSQR, Raw
from scapy.all import rdpcap, IP, TCP, DNS, DNSQR
HAS_SCAPY = True
except ImportError:
HAS_SCAPY = False
@@ -2,26 +2,24 @@
"""Wireshark/tshark packet analysis agent for network security investigations."""
import subprocess
import shlex
import os
import sys
import json
import re
from collections import defaultdict
def run_tshark(pcap_path, args):
"""Execute tshark with custom arguments."""
cmd = f"tshark -r {pcap_path} {args}"
result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=120)
cmd = ["tshark", "-r", pcap_path] + shlex.split(args)
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
return result.stdout.strip(), result.stderr.strip(), result.returncode
def capture_live(interface, output_path, duration=60, capture_filter=None):
"""Start a live packet capture using tshark."""
cmd = f"tshark -i {interface} -w {output_path} -a duration:{duration}"
cmd = ["tshark", "-i", interface, "-w", output_path, "-a", f"duration:{duration}"]
if capture_filter:
cmd += f' -f "{capture_filter}"'
result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=duration + 10)
cmd += ["-f", capture_filter]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=duration + 10)
return result.returncode == 0
@@ -10,8 +10,6 @@ import sys
import json
import hashlib
import re
from datetime import datetime
from collections import defaultdict
try:
import pypff
@@ -5,7 +5,6 @@ import subprocess
import os
import sys
import hashlib
import struct
import math
from collections import Counter
@@ -129,8 +128,8 @@ def unpack_upx(filepath, output_path=None):
if output_path is None:
output_path = filepath + ".unpacked"
# First try standard UPX decompression
cmd = f"upx -d -o {output_path} {filepath}"
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
cmd = ["upx", "-d", "-o", output_path, filepath]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
if result.returncode == 0:
return True, "Standard UPX unpack succeeded", output_path
@@ -5,9 +5,7 @@ import re
import os
import sys
import hashlib
import json
import zlib
import struct
def compute_hash(filepath):
@@ -44,7 +44,8 @@ def scan_crontabs():
findings.extend(_scan_cron_file(full_path))
user_crontabs = subprocess.run(
["bash", "-c", "for u in $(cut -d: -f1 /etc/passwd); do crontab -l -u $u 2>/dev/null && echo \"__USER:$u\"; done"],
capture_output=True, text=True
capture_output=True, text=True,
timeout=120,
)
if user_crontabs.returncode == 0:
current_user = None
@@ -112,7 +113,8 @@ def scan_systemd_units():
if re.search(pattern, ex, re.IGNORECASE):
risk = "critical"
dpkg_check = subprocess.run(
["dpkg", "-S", unit_file], capture_output=True, text=True
["dpkg", "-S", unit_file], capture_output=True, text=True,
timeout=120,
)
package_managed = dpkg_check.returncode == 0
if not package_managed:
@@ -141,7 +143,7 @@ def scan_ld_preload():
"libraries": content.splitlines(), "risk": "critical",
"mitre": "T1574.006",
})
env_check = subprocess.run(["env"], capture_output=True, text=True)
env_check = subprocess.run(["env"], capture_output=True, text=True, timeout=120)
for line in env_check.stdout.splitlines():
if line.startswith("LD_PRELOAD="):
findings.append({
@@ -174,7 +176,7 @@ def scan_shell_profiles():
continue
etc_profiles = glob.glob("/etc/profile.d/*.sh")
for filepath in etc_profiles:
dpkg = subprocess.run(["dpkg", "-S", filepath], capture_output=True, text=True)
dpkg = subprocess.run(["dpkg", "-S", filepath], capture_output=True, text=True, timeout=120)
if dpkg.returncode != 0:
findings.append({
"type": "etc_profile_d", "path": filepath,
@@ -7,12 +7,9 @@ suspicious routing, and phishing indicators.
import os
import sys
import json
import re
import email
import email.utils
from datetime import datetime
from collections import OrderedDict
def parse_email_file(filepath):
@@ -13,6 +13,9 @@ author: mahipal
license: Apache-2.0
---
# Analyzing PowerShell Script Block Logging
## Instructions
1. Install dependencies: `pip install python-evtx lxml`
@@ -4,7 +4,6 @@
import struct
import os
import sys
import hashlib
import datetime
import json
import glob
@@ -164,11 +163,11 @@ def build_execution_timeline(prefetch_results):
def run_pecmd(prefetch_path, output_dir=None):
"""Run Eric Zimmerman's PECmd for comprehensive prefetch parsing."""
cmd = f"PECmd.exe -f {prefetch_path}"
if output_dir:
cmd += f" --csv {output_dir}"
import subprocess
result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=30)
cmd = ["PECmd.exe", "-f", prefetch_path]
if output_dir:
cmd += ["--csv", output_dir]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
return result.stdout, result.returncode
@@ -7,11 +7,9 @@ and assesses decryption feasibility for ransomware samples and encrypted files.
import os
import sys
import struct
import hashlib
import math
import json
import re
from collections import Counter
@@ -36,7 +36,7 @@ Leak sites provide: victim identification (company name, sector, country), attac
Never directly access DLS sites in a production environment. Use purpose-built monitoring services (Ransomwatch, DarkFeed, KELA, Flashpoint), Tor-isolated research VMs, commercial threat intelligence platforms, or community-maintained datasets. All analysis should be conducted in isolated environments with proper authorization.
## Practical Steps
## Workflow
### Step 1: Ingest Ransomware Leak Site Data from Public Feeds
@@ -5,11 +5,8 @@ Monitors and analyzes ransomware group leak site data for threat intelligence,
victim tracking, and TTI (time-to-intelligence) reporting.
"""
import os
import sys
import json
import re
import hashlib
from datetime import datetime, timedelta
from collections import defaultdict, Counter
@@ -3,11 +3,10 @@
import json
import csv
import math
import argparse
import urllib.request
from datetime import datetime
from collections import defaultdict, Counter
from collections import defaultdict
from statistics import mean, stdev
TOR_EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"
@@ -0,0 +1,163 @@
---
name: analyzing-ransomware-payment-wallets
description: >
Traces ransomware cryptocurrency payment flows using blockchain analysis tools
such as Chainalysis Reactor, WalletExplorer, and blockchain.com APIs. Identifies
wallet clusters, tracks fund movement through mixers and exchanges, and supports
law enforcement attribution. Activates for requests involving ransomware payment
tracing, bitcoin wallet analysis, cryptocurrency forensics, or blockchain
intelligence gathering.
domain: cybersecurity
subdomain: ransomware-defense
tags: [ransomware, blockchain, cryptocurrency, forensics, threat-intelligence, bitcoin]
version: 1.0.0
author: mahipal
license: Apache-2.0
---
# Analyzing Ransomware Payment Wallets
## When to Use
- An organization has been hit by ransomware and the ransom note contains a Bitcoin or cryptocurrency wallet address that needs investigation
- Law enforcement or incident responders need to trace where ransom payments flowed after the victim paid
- Threat intelligence analysts are attributing ransomware campaigns by clustering payment infrastructure across incidents
- Investigators need to determine if a ransomware group is reusing wallet infrastructure across multiple victims
- Compliance or legal teams need evidence of fund flows for prosecution, sanctions enforcement, or insurance claims
**Do not use** this skill for live payment interception or to interact directly with ransomware operators. All analysis should be passive and read-only against public blockchain data.
## Prerequisites
- Python 3.8+ with the `requests` library installed (`json` and `hashlib` are part of the standard library)
- Access to blockchain explorer APIs (blockchain.com, WalletExplorer.com, Blockstream.info)
- Familiarity with Bitcoin transaction model (UTXOs, inputs, outputs, change addresses)
- Understanding of common obfuscation techniques (mixers, tumblers, peel chains, cross-chain swaps)
- Optional: Chainalysis Reactor license for enterprise-grade cluster analysis
- Optional: OXT.me for advanced transaction graph visualization
## Workflow
### Step 1: Extract Wallet Address from Ransom Note
Parse the ransom note to identify the payment address(es):
```
Common address formats:
Bitcoin (P2PKH): 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa (starts with 1)
Bitcoin (P2SH): 3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy (starts with 3)
Bitcoin (Bech32): bc1qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq (starts with bc1)
Monero: 4... (95 characters, much harder to trace)
Ethereum: 0x... (40 hex chars)
```
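The prefix rules above can be turned into a quick triage check. The following is a minimal sketch, assuming only the formats listed; `classify_address` and its labels are hypothetical names, and the patterns check shape only (no Base58Check or Bech32 checksum verification).

```python
import re

# Hypothetical helper: shape-only classification, no checksum validation.
BASE58 = "1-9A-HJ-NP-Za-km-z"
_PATTERNS = [
    ("bitcoin-p2pkh", re.compile(rf"^1[{BASE58}]{{25,34}}$")),
    ("bitcoin-p2sh", re.compile(rf"^3[{BASE58}]{{25,34}}$")),
    ("bitcoin-bech32", re.compile(r"^bc1[02-9ac-hj-np-z]{39,59}$")),
    ("ethereum", re.compile(r"^0x[0-9a-fA-F]{40}$")),
    ("monero", re.compile(rf"^4[{BASE58}]{{94}}$")),
]


def classify_address(candidate):
    """Return a format label for a candidate address, or 'unknown'."""
    candidate = candidate.strip()
    for label, pattern in _PATTERNS:
        if pattern.match(candidate):
            return label
    return "unknown"
```

A label of `unknown` only means the string matched none of these shapes; a match does not prove the address is valid or has ever been used.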
### Step 2: Query Blockchain Explorer for Transaction History
Retrieve all transactions associated with the wallet:
```python
import requests


def get_wallet_transactions(address):
    """Query blockchain.com API for address transactions."""
    url = f"https://blockchain.info/rawaddr/{address}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    return {
        "address": address,
        "n_tx": data.get("n_tx", 0),
        "total_received_satoshi": data.get("total_received", 0),
        "total_sent_satoshi": data.get("total_sent", 0),
        "final_balance_satoshi": data.get("final_balance", 0),
        "transactions": data.get("txs", []),
    }
```
### Step 3: Map Fund Flow and Identify Clusters
Trace outputs from the ransom wallet to downstream addresses:
```
Fund Flow Analysis:
━━━━━━━━━━━━━━━━━━
Victim Payment ──► Ransom Wallet ──► Consolidation Wallet
├─► Mixer/Tumbler Service
├─► Exchange Deposit Address
└─► Peel Chain (sequential small outputs)
Key indicators:
- Consolidation: Multiple ransom payments aggregated into one wallet
- Peel chains: Sequential transactions with diminishing outputs
- Mixer usage: Funds sent to known mixer addresses (Wasabi, Samourai, ChipMixer)
- Exchange cashout: Deposits to known exchange wallets (Binance, Kraken hot wallets)
```
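The peel-chain indicator can be illustrated on in-memory data. This is a rough sketch, not the method used by any particular tool: `follow_peel_chain` and the `tx_index` layout (transaction id mapped to its outputs, each with a hypothetical `spent_by` pointer) are assumptions made for illustration, and the sample values are made up.

```python
def follow_peel_chain(start_txid, tx_index, max_hops=20):
    """Walk a peel chain: record the small 'peeled' output at each hop
    and follow the larger remainder output to the transaction spending it."""
    hops = []
    txid = start_txid
    while txid in tx_index and len(hops) < max_hops:
        outputs = tx_index[txid]["outputs"]
        if len(outputs) != 2:
            break  # not the classic two-output peel shape
        peeled, remainder = sorted(outputs, key=lambda o: o["value"])
        hops.append({
            "txid": txid,
            "peeled_to": peeled["addr"],
            "peeled_value": peeled["value"],
        })
        txid = remainder.get("spent_by")  # None ends the walk
    return hops


# Made-up sample graph (values in satoshis).
SAMPLE_INDEX = {
    "t1": {"outputs": [
        {"addr": "A1", "value": 10, "spent_by": None},
        {"addr": "R1", "value": 90, "spent_by": "t2"},
    ]},
    "t2": {"outputs": [
        {"addr": "R2", "value": 80, "spent_by": "t3"},
        {"addr": "A2", "value": 10, "spent_by": None},
    ]},
    "t3": {"outputs": [{"addr": "A3", "value": 5, "spent_by": None}]},
}

if __name__ == "__main__":
    for hop in follow_peel_chain("t1", SAMPLE_INDEX):
        print(hop)
```

Treating the larger output as the remainder is itself a heuristic; real chains may need change-address analysis to pick the right branch.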
### Step 4: Cross-Reference with Known Wallet Databases
Check addresses against known ransomware infrastructure:
```python
# Check WalletExplorer for entity identification
def check_wallet_explorer(address):
    """Look up the cluster and label WalletExplorer assigns to an address."""
    url = f"https://www.walletexplorer.com/api/1/address?address={address}&caller=research"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error page
    data = resp.json()
    return {
        "wallet_id": data.get("wallet_id"),
        "label": data.get("label", "Unknown"),
        "is_exchange": data.get("is_exchange", False),
    }
```
### Step 5: Generate Attribution Report
Compile findings into a structured intelligence report:
```
RANSOMWARE WALLET ANALYSIS REPORT
====================================
Ransom Address: bc1q...xyz
Family Attribution: LockBit 3.0 (based on ransom note format)
Total Received: 4.25 BTC ($178,500 at time of payment)
Total Sent: 4.25 BTC (wallet fully drained)
Number of Payments: 3 (likely 3 separate victims)
FUND FLOW:
Payment 1: 1.5 BTC → Consolidation wallet → Binance deposit
Payment 2: 1.0 BTC → Wasabi Mixer → Unknown
Payment 3: 1.75 BTC → Peel chain (12 hops) → OKX deposit
CLUSTER ANALYSIS:
Related wallets: 47 addresses identified in same cluster
Total cluster volume: 156.3 BTC ($6.5M USD)
First activity: 2024-01-15
Last activity: 2024-09-22
```
## Verification
- Confirm wallet address format is valid before querying APIs
- Cross-reference transaction timestamps with known incident timelines
- Validate cluster associations by applying the common-input-ownership heuristic
- Compare findings against OFAC SDN list for sanctioned addresses
- Verify exchange attribution against multiple sources (WalletExplorer, OXT, Chainalysis)
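
The common-input-ownership check can be sketched with a small union-find over transaction inputs. This is an illustrative sketch only: `cluster_by_common_inputs` is a hypothetical helper, it assumes transactions shaped like the `inputs[].prev_out.addr` structure used elsewhere in this skill, and it makes no attempt to exclude CoinJoin transactions, where the heuristic does not hold.

```python
def cluster_by_common_inputs(transactions):
    """Group addresses that appear together as inputs of the same transaction."""
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        addrs = [inp.get("prev_out", {}).get("addr") for inp in tx.get("inputs", [])]
        addrs = [a for a in addrs if a]
        for addr in addrs:
            find(addr)  # register single-input addresses too
        for other in addrs[1:]:
            union(addrs[0], other)

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())
```

Two addresses end up in the same cluster when a chain of co-spending transactions links them, so one mislabeled transaction can merge unrelated clusters; treat the output as a lead to verify, not as proof.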
## Key Concepts
| Term | Definition |
|------|------------|
| **UTXO** | Unspent Transaction Output; the fundamental unit of Bitcoin that tracks ownership through a chain of transactions |
| **Cluster Analysis** | Grouping multiple Bitcoin addresses believed to be controlled by the same entity using common-input-ownership and change-address heuristics |
| **Peel Chain** | A laundering pattern where funds are sent through many sequential transactions, each peeling off a small amount to a new address |
| **CoinJoin/Mixer** | Privacy techniques that combine multiple users' transactions to obscure the link between sender and receiver |
| **Common Input Ownership** | Heuristic that assumes all inputs to a single transaction are controlled by the same entity |
## Tools & Systems
- **Chainalysis Reactor**: Enterprise blockchain investigation platform with entity attribution and cross-chain tracing
- **WalletExplorer**: Free tool that clusters Bitcoin addresses and labels known services (exchanges, mixers, markets)
- **OXT.me**: Advanced Bitcoin transaction visualization with UTXO graph analysis
- **Blockstream.info**: Open-source Bitcoin block explorer with full API access
- **blockchain.com API**: Free API for querying Bitcoin address balances and transaction histories
- **OFAC SDN List**: U.S. Treasury sanctioned address list for compliance checking
@@ -0,0 +1,97 @@
# API Reference: Ransomware Payment Wallet Analysis
## blockchain.com API
### Get Address Information
```
GET https://blockchain.info/rawaddr/{address}?limit=50
```
Returns the transaction history and balance totals for a Bitcoin address. Unspent outputs come from the separate `unspent` endpoint.
### Response Fields
| Field | Type | Description |
|-------|------|-------------|
| `address` | string | Bitcoin address |
| `n_tx` | int | Total number of transactions |
| `total_received` | int | Total satoshis received |
| `total_sent` | int | Total satoshis sent |
| `final_balance` | int | Current balance in satoshis |
| `txs` | array | Array of transaction objects |
### Get Single Transaction
```
GET https://blockchain.info/rawtx/{tx_hash}
```
### Get Unspent Outputs
```
GET https://blockchain.info/unspent?active={address}
```
## Blockstream.info API
### Get Address Stats
```
GET https://blockstream.info/api/address/{address}
```
### Response Fields
| Field | Type | Description |
|-------|------|-------------|
| `chain_stats.funded_txo_count` | int | Number of transaction outputs that funded the address |
| `chain_stats.spent_txo_count` | int | Number of those outputs that have been spent |
| `chain_stats.funded_txo_sum` | int | Total satoshis funded |
| `chain_stats.spent_txo_sum` | int | Total satoshis spent |
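
The response does not carry a balance field in the table above, so the confirmed balance has to be derived from the two sums. A minimal sketch, assuming the response shape shown in the table; `confirmed_balance_btc` is a hypothetical helper name.

```python
SATOSHIS_PER_BTC = 100_000_000


def confirmed_balance_btc(address_stats):
    """Derive the confirmed balance from funded and spent output sums."""
    chain = address_stats.get("chain_stats", {})
    satoshis = chain.get("funded_txo_sum", 0) - chain.get("spent_txo_sum", 0)
    return satoshis / SATOSHIS_PER_BTC
```

A balance of zero alongside a non-zero `funded_txo_sum` matches the "wallet fully drained" pattern described in the skill's report example.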
### Get Address Transactions
```
GET https://blockstream.info/api/address/{address}/txs
```
## WalletExplorer API
### Look Up Address
```
GET https://www.walletexplorer.com/api/1/address?address={address}&caller=research
```
### Response Fields
| Field | Type | Description |
|-------|------|-------------|
| `wallet_id` | string | Cluster wallet identifier |
| `label` | string | Known entity label (exchange, mixer, etc.) |
| `is_exchange` | bool | Whether address belongs to known exchange |
### Get Wallet Addresses
```
GET https://www.walletexplorer.com/api/1/wallet-addresses?wallet={wallet_id}&caller=research
```
## Bitcoin Address Formats
| Format | Prefix | Example | Notes |
|--------|--------|---------|-------|
| P2PKH (Legacy) | 1 | 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa | Original format |
| P2SH (SegWit compatible) | 3 | 3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy | Script hash |
| Bech32 (Native SegWit) | bc1q | bc1qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq | Lower fees |
| Bech32m (Taproot) | bc1p | bc1p... | Newest format |
## Common Ransomware Wallet Indicators
| Pattern | Significance |
|---------|-------------|
| Single large inbound, rapid outbound | Ransom payment received, quickly laundered |
| Multiple small inbound from different addresses | Multiple victims paying same wallet |
| Outbound to known mixer addresses | Laundering through CoinJoin/mixer services |
| Peel chain (sequential diminishing outputs) | Structured laundering to evade detection |
| Transfer to exchange hot wallet | Cash-out attempt via cryptocurrency exchange |
## OFAC SDN Sanctions Check
```
Download list: https://www.treasury.gov/ofac/downloads/sdnlist.txt
Search API: https://sanctionssearch.ofac.treas.gov/
```
Check addresses against OFAC Specially Designated Nationals list for compliance.
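
A rough sketch of an offline check against a downloaded copy of the list. It rests on an assumption that should be verified against the current file: that Bitcoin addresses appear in SDN entries tagged as `Digital Currency Address - XBT <address>`. The helper names and the sample text are made up for illustration.

```python
import re

# Assumption: Bitcoin addresses are tagged "Digital Currency Address - XBT <address>".
_XBT_TAG = re.compile(r"Digital Currency Address - XBT ([A-Za-z0-9]+)")


def sanctioned_xbt_addresses(sdn_text):
    """Collect every XBT-tagged address found in the SDN list text."""
    # Collapse line wrapping so a tag split across lines still matches.
    normalized = re.sub(r"\s+", " ", sdn_text)
    return set(_XBT_TAG.findall(normalized))


def is_sanctioned(address, sdn_text):
    """Return True if the address appears as an XBT entry in the list text."""
    return address in sanctioned_xbt_addresses(sdn_text)


# Made-up sample entry; not taken from the real list.
SAMPLE_SDN = (
    "EXAMPLE, Person; Digital Currency Address -\n"
    "XBT 1ExampleAddressOnlyForTesting111111; alt. Digital Currency\n"
    "Address - XBT 3ExampleAddressOnlyForTesting222222;"
)
```

For large batches, build the set once with `sanctioned_xbt_addresses` and test membership against it rather than re-parsing the text per address.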
@@ -0,0 +1,218 @@
#!/usr/bin/env python3
"""Ransomware payment wallet blockchain analysis agent.
Traces cryptocurrency payment flows from ransomware wallets using public
blockchain APIs. Identifies transaction patterns, cluster relationships,
and fund movement to exchanges or mixers.
"""
import json
import re
import sys
try:
    import requests
except ImportError:
    print("[!] 'requests' library required: pip install requests")
    sys.exit(1)
BLOCKCHAIN_API = "https://blockchain.info"
BLOCKSTREAM_API = "https://blockstream.info/api"
BTC_ADDRESS_REGEX = re.compile(
r"^(1[a-km-zA-HJ-NP-Z1-9]{25,34}|3[a-km-zA-HJ-NP-Z1-9]{25,34}|bc1[a-z0-9]{39,59})$"
)
KNOWN_RANSOMWARE_WALLETS = {
    "12t9YDPgwueZ9NyMgw519p7AA8isjr6SMw": "WannaCry",
    "13AM4VW2dhxYgXeQepoHkHSQuy6NgaEb94": "WannaCry",
    "115p7UMMngoj1pMvkpHijcRdfJNXj6LrLn": "WannaCry",
    "1Mz7153HMuxXTuR2R1t78mGSdzaAtNbBWX": "NotPetya",
    "bc1qxy2kgdygjrsqtzq2n0yrf2493p83kkfjhx0wlh": "DarkSide",
}
def validate_btc_address(address):
    """Validate Bitcoin address format."""
    return bool(BTC_ADDRESS_REGEX.match(address))
def query_address_info(address):
    """Query blockchain.info for address details."""
    url = f"{BLOCKCHAIN_API}/rawaddr/{address}?limit=50"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    return {
        "address": address,
        "total_received_btc": data.get("total_received", 0) / 1e8,
        "total_sent_btc": data.get("total_sent", 0) / 1e8,
        "final_balance_btc": data.get("final_balance", 0) / 1e8,
        "n_tx": data.get("n_tx", 0),
        "transactions": data.get("txs", []),
    }
def query_blockstream_address(address):
    """Query Blockstream API for address stats (fallback)."""
    url = f"{BLOCKSTREAM_API}/address/{address}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    chain = data.get("chain_stats", {})
    return {
        "address": address,
        "funded_txo_count": chain.get("funded_txo_count", 0),
        "spent_txo_count": chain.get("spent_txo_count", 0),
        "funded_txo_sum_btc": chain.get("funded_txo_sum", 0) / 1e8,
        "spent_txo_sum_btc": chain.get("spent_txo_sum", 0) / 1e8,
    }
def extract_output_addresses(transactions, source_address):
    """Extract downstream addresses from transaction outputs."""
    downstream = {}
    for tx in transactions:
        tx_hash = tx.get("hash", "unknown")
        # Only transactions that spend from the source address count as outgoing.
        is_outgoing = any(
            inp.get("prev_out", {}).get("addr") == source_address
            for inp in tx.get("inputs", [])
        )
        if not is_outgoing:
            continue
        for out in tx.get("out", []):
            addr = out.get("addr")
            value = out.get("value", 0) / 1e8
            if addr and addr != source_address:
                if addr not in downstream:
                    downstream[addr] = {"total_btc": 0, "tx_count": 0, "tx_hashes": []}
                downstream[addr]["total_btc"] += value
                downstream[addr]["tx_count"] += 1
                downstream[addr]["tx_hashes"].append(tx_hash[:16])
    return downstream
def check_known_wallets(address):
    """Check if address matches known ransomware wallets."""
    if address in KNOWN_RANSOMWARE_WALLETS:
        return {"known": True, "family": KNOWN_RANSOMWARE_WALLETS[address]}
    return {"known": False, "family": None}
def detect_peel_chain(transactions, address):
    """Detect peel chain pattern in outgoing transactions."""
    outgoing_values = []
    for tx in transactions:
        is_outgoing = any(
            inp.get("prev_out", {}).get("addr") == address
            for inp in tx.get("inputs", [])
        )
        if is_outgoing:
            outputs = [o.get("value", 0) / 1e8 for o in tx.get("out", []) if o.get("addr") != address]
            outgoing_values.extend(outputs)
    if len(outgoing_values) < 3:
        return {"peel_chain_detected": False, "reason": "Insufficient transactions"}
    # Fraction of consecutive outputs that are smaller than the one before.
    decreasing = sum(1 for i in range(1, len(outgoing_values)) if outgoing_values[i] < outgoing_values[i - 1])
    ratio = decreasing / (len(outgoing_values) - 1) if len(outgoing_values) > 1 else 0
    return {
        "peel_chain_detected": ratio > 0.6,
        "decreasing_ratio": round(ratio, 3),
        "num_outputs": len(outgoing_values),
    }
def analyze_wallet(address):
    """Full analysis of a ransomware payment wallet."""
    report = {"analysis_type": "Ransomware Payment Wallet Analysis", "address": address}
    if not validate_btc_address(address):
        report["error"] = f"Invalid Bitcoin address format: {address}"
        return report
    report["known_wallet_check"] = check_known_wallets(address)
    try:
        info = query_address_info(address)
        report["wallet_info"] = {
            "total_received_btc": info["total_received_btc"],
            "total_sent_btc": info["total_sent_btc"],
            "final_balance_btc": info["final_balance_btc"],
            "transaction_count": info["n_tx"],
        }
        downstream = extract_output_addresses(info["transactions"], address)
        report["downstream_addresses"] = {
            "count": len(downstream),
            "top_recipients": sorted(
                [{"address": a, **d} for a, d in downstream.items()],
                key=lambda x: x["total_btc"],
                reverse=True,
            )[:10],
        }
        for recipient in report["downstream_addresses"]["top_recipients"]:
            match = check_known_wallets(recipient["address"])
            recipient["known_entity"] = match["family"] if match["known"] else "Unknown"
        report["peel_chain_analysis"] = detect_peel_chain(info["transactions"], address)
    except requests.RequestException as e:
        report["error"] = f"API query failed: {e}"
        # Fall back to Blockstream for basic stats when blockchain.info fails.
        try:
            fallback = query_blockstream_address(address)
            report["wallet_info_blockstream"] = fallback
        except requests.RequestException as e2:
            report["fallback_error"] = f"Blockstream fallback also failed: {e2}"
    return report
if __name__ == "__main__":
    print("=" * 60)
    print("Ransomware Payment Wallet Analysis Agent")
    print("Blockchain tracing, cluster analysis, fund flow mapping")
    print("=" * 60)
    if len(sys.argv) < 2:
        print("\nUsage:")
        print(" python agent.py <bitcoin_address>")
        print("\nExample:")
        print(" python agent.py 12t9YDPgwueZ9NyMgw519p7AA8isjr6SMw")
        sys.exit(0)
    address = sys.argv[1]
    print(f"\n[*] Analyzing wallet: {address}")
    report = analyze_wallet(address)
    known = report.get("known_wallet_check", {})
    if known.get("known"):
        print(f"[!] KNOWN RANSOMWARE WALLET: {known['family']}")
    info = report.get("wallet_info", {})
    if info:
        print("\n--- Wallet Summary ---")
        print(f" Total received: {info.get('total_received_btc', 0):.8f} BTC")
        print(f" Total sent: {info.get('total_sent_btc', 0):.8f} BTC")
        print(f" Balance: {info.get('final_balance_btc', 0):.8f} BTC")
        print(f" Transactions: {info.get('transaction_count', 0)}")
    ds = report.get("downstream_addresses", {})
    if ds.get("count", 0) > 0:
        print(f"\n--- Top Downstream Recipients ({ds['count']} total) ---")
        for r in ds.get("top_recipients", [])[:5]:
            entity = r.get("known_entity", "Unknown")
            print(f" {r['address'][:20]}... {r['total_btc']:.8f} BTC [{entity}]")
    peel = report.get("peel_chain_analysis", {})
    if peel.get("peel_chain_detected"):
        print(f"\n[!] Peel chain pattern detected (ratio: {peel['decreasing_ratio']})")
    if report.get("error"):
        print(f"\n[!] Error: {report['error']}")
    print(f"\n[*] Full report:\n{json.dumps(report, indent=2, default=str)}")
@@ -2,11 +2,10 @@
"""Agent for analyzing security logs with Splunk using splunk-sdk."""
import os
import sys
import json
import time
import argparse
-from datetime import datetime, timedelta
+from datetime import datetime
import splunklib.client as client
import splunklib.results as results
@@ -2,7 +2,6 @@
"""Agent for analyzing NTFS slack space and file system artifacts."""
import os
import sys
import json
import struct
import argparse
@@ -14,7 +13,7 @@ from pathlib import Path
def parse_mft_with_analyzeMFT(mft_path, output_csv):
"""Parse MFT using analyzeMFT and return deleted/timestomped files."""
cmd = ["analyzeMFT.py", "-f", mft_path, "-o", output_csv, "-c"]
-subprocess.run(cmd, check=True)
+subprocess.run(cmd, check=True, timeout=120)
return output_csv
@@ -22,7 +21,7 @@ def extract_slack_space(image_path, offset, output_path):
"""Extract slack space from a disk image using blkls from The Sleuth Kit."""
cmd = ["blkls", "-s", "-o", str(offset), image_path]
with open(output_path, "wb") as out:
-subprocess.run(cmd, stdout=out, check=True)
+subprocess.run(cmd, stdout=out, check=True, timeout=120)
return output_path
@@ -92,7 +91,7 @@ def parse_usn_journal(usn_path):
def find_ads_in_image(image_path, offset):
"""List Alternate Data Streams using fls from The Sleuth Kit."""
cmd = ["fls", "-r", "-o", str(offset), image_path]
-result = subprocess.run(cmd, capture_output=True, text=True)
+result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
ads_entries = [line for line in result.stdout.splitlines() if ":" in line]
return ads_entries
@@ -23,7 +23,7 @@ Supply chain attacks compromise legitimate software distribution channels to del
- Access to legitimate software versions for comparison
- Package repository monitoring (npm, PyPI, NuGet)
-## Practical Steps
+## Workflow
### Step 1: Binary Comparison Analysis
@@ -135,8 +135,7 @@ if __name__ == "__main__":
target = sys.argv[1] if len(sys.argv) > 1 else None
if not target:
-print("
-[DEMO] Usage:")
+print("\n[DEMO] Usage:")
print(" python agent.py <package.json> # Analyze npm package")
print(" python agent.py npm:<package_name> # Check npm registry")
print(" python agent.py pypi:<package_name> # Check PyPI registry")
@@ -144,17 +143,14 @@ if __name__ == "__main__":
if target.startswith("npm:"):
pkg_name = target[4:]
-print(f"
-[*] Checking npm: {pkg_name}")
+print(f"\n[*] Checking npm: {pkg_name}")
info = check_npm_package(pkg_name)
typos = detect_typosquat_packages(pkg_name)
print(json.dumps(info, indent=2))
-print(f"
-Potential typosquats: {typos[:10]}")
+print(f"\n Potential typosquats: {typos[:10]}")
elif target.startswith("pypi:"):
pkg_name = target[5:]
-print(f"
-[*] Checking PyPI: {pkg_name}")
+print(f"\n[*] Checking PyPI: {pkg_name}")
info = check_pypi_package(pkg_name)
print(json.dumps(info, indent=2))
elif os.path.exists(target):
@@ -36,7 +36,7 @@ ATT&CK catalogs over 140 threat groups (e.g., APT28, APT29, Lazarus Group, FIN7)
The ATT&CK Navigator is a web-based tool for creating custom ATT&CK matrix visualizations. Analysts create layers (JSON files) that annotate techniques with scores, colors, comments, and metadata to visualize threat actor coverage, detection capabilities, or risk assessments.
-## Practical Steps
+## Workflow
### Step 1: Query ATT&CK Data Programmatically
@@ -122,8 +122,7 @@ if __name__ == "__main__":
print(f"[*] Loaded {len(techniques)} techniques, {len(groups)} groups")
if not group_query:
-print("
---- Available Groups (sample) ---")
+print("\n--- Available Groups (sample) ---")
for gid, g in list(groups.items())[:15]:
print(f" {g['id']:8s} {g['name']}")
sys.exit(0)
@@ -133,23 +132,20 @@ if __name__ == "__main__":
print(f"[!] Group not found: {group_query}")
sys.exit(1)
-print(f"
-[*] Group: {ginfo['name']} ({ginfo['id']})")
+print(f"\n[*] Group: {ginfo['name']} ({ginfo['id']})")
print(f" Aliases: {', '.join(ginfo['aliases'][:5])}")
ttps = map_group_techniques(bundle, gid, techniques)
print(f" Techniques: {len(ttps)}")
coverage = tactic_coverage(ttps)
-print("
---- Tactic Coverage ---")
+print("\n--- Tactic Coverage ---")
for tactic, info in sorted(coverage.items(), key=lambda x: -x[1]["count"]):
bar = "#" * info["count"]
print(f" {tactic:35s} {info['count']:3d} {bar}")
sample_detections = [t["id"] for t in ttps[:len(ttps)//2]]
gaps, pct = detection_gaps(ttps, sample_detections)
-print(f"
---- Detection Gaps (demo: {pct}% coverage) ---")
+print(f"\n--- Detection Gaps (demo: {pct}% coverage) ---")
for g in gaps[:10]:
print(f" [GAP] {g['id']:12s} {g['name']}")
@@ -4,11 +4,10 @@
import os
import json
import argparse
-from datetime import datetime, timedelta
+from datetime import datetime, timedelta, timezone
from taxii2client.v21 import Server, Collection, as_pages
-from stix2 import Filter, MemoryStore, Indicator, Relationship, Bundle
-from stix2 import ThreatActor, Malware
+from stix2 import Indicator, Bundle
def discover_taxii_server(url, user=None, password=None):
@@ -95,7 +94,7 @@ def score_feed_quality(indicators, known_good_iocs=None):
1 for i in indicators
if i.get("valid_from") and
datetime.fromisoformat(i["valid_from"].replace("Z", "+00:00"))
-> datetime.now(tz=__import__("datetime").timezone.utc) - timedelta(days=90)
+> datetime.now(tz=timezone.utc) - timedelta(days=90)
)
score = int(
(with_confidence / total * 25) +
@@ -14,6 +14,9 @@ author: mahipal
license: Apache-2.0
---
# Analyzing Threat Landscape with MISP
## Instructions
1. Install dependencies: `pip install pymisp`
@@ -1,7 +1,6 @@
#!/usr/bin/env python3
"""Agent for analyzing Certificate Transparency logs for phishing detection."""
import os
import json
import argparse
from datetime import datetime
@@ -36,7 +36,7 @@ DNSTwist uses ssdeep (locality-sensitive hash) to compare HTML content and pHash
The typical workflow is: generate domain permutations -> resolve DNS records -> check for registered domains -> compare web page similarity -> flag suspicious domains -> alert security team -> request takedown. For a typical corporate domain, dnstwist generates 5,000-10,000 permutations.
-## Practical Steps
+## Workflow
### Step 1: Basic Domain Permutation Scan
@@ -75,11 +75,9 @@ if __name__ == '__main__':
print('=' * 60)
domain = sys.argv[1] if len(sys.argv) > 1 else None
if not domain:
-print('
-[DEMO] Usage: python agent.py <domain.com>')
+print('\n[DEMO] Usage: python agent.py <domain.com>')
sys.exit(0)
-print(f'
-[*] Target: {domain}')
+print(f'\n[*] Target: {domain}')
dnstwist_results = run_dnstwist_cli(domain)
if dnstwist_results:
print(f'[*] dnstwist found {len(dnstwist_results)} permutations')
@@ -95,5 +93,4 @@ if __name__ == '__main__':
for r in resolved[:15]:
print(f' {r["domain"]:40s} {", ".join(r["ips"])}')
risk = 'HIGH' if len(resolved) > 20 else 'MEDIUM' if len(resolved) > 5 else 'LOW'
-print(f'
-[*] Risk: {risk}')
+print(f'\n[*] Risk: {risk}')
@@ -5,7 +5,6 @@ import os
import json
import argparse
import csv
from datetime import datetime
from regipy.registry import RegistryHive
@@ -45,7 +44,6 @@ def parse_usbstor(system_hive_path):
def parse_mounted_devices(system_hive_path):
"""Parse MountedDevices to map drive letters to USB devices."""
import struct
reg = RegistryHive(system_hive_path)
mounted_key = reg.get_key("MountedDevices")
mappings = []
@@ -13,6 +13,9 @@ author: mahipal
license: Apache-2.0
---
# Analyzing Web Server Logs for Intrusion
## Instructions
1. Install dependencies: `pip install geoip2 user-agents`
@@ -7,10 +7,8 @@ file metadata, and device information using the regipy library.
import argparse
import json
import os
import sys
import datetime
import struct
try:
from regipy.registry import RegistryHive
@@ -6,7 +6,6 @@ import json
import csv
import argparse
from datetime import datetime
from pathlib import Path
import LnkParse3
@@ -7,7 +7,6 @@ import argparse
from datetime import datetime, timedelta
from azure.identity import DefaultAzureCredential, ClientSecretCredential
from azure.mgmt.authorization import AuthorizationManagementClient
import requests
@@ -21,7 +20,7 @@ def graph_get(token, endpoint, params=None):
"""Make an authenticated GET request to Microsoft Graph API."""
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
url = f"https://graph.microsoft.com/v1.0{endpoint}"
-resp = requests.get(url, headers=headers, params=params)
+resp = requests.get(url, headers=headers, params=params, timeout=30)
resp.raise_for_status()
return resp.json()
@@ -1,14 +1,12 @@
#!/usr/bin/env python3
"""Agent for auditing GCP IAM permissions using google-cloud libraries."""
import os
import json
import argparse
from datetime import datetime
from google.cloud import asset_v1
from google.cloud import resourcemanager_v3
from google.iam.v1 import iam_policy_pb2
def search_iam_policies(scope, query=""):
