mirror of
https://github.com/mukul975/Anthropic-Cybersecurity-Skills.git
synced 2026-07-05 23:38:57 +03:00
281 lines
12 KiB
Markdown
281 lines
12 KiB
Markdown
---
|
|
name: reverse-engineering-malware-with-ghidra
|
|
description: >
|
|
Reverse engineers malware binaries using NSA's Ghidra disassembler and decompiler to
|
|
understand internal logic, cryptographic routines, C2 protocols, and evasion techniques
|
|
at the assembly and pseudo-C level. Activates for requests involving malware reverse
|
|
engineering, disassembly analysis, decompilation, binary analysis, or understanding
|
|
malware internals.
|
|
domain: cybersecurity
|
|
subdomain: malware-analysis
|
|
tags: [malware, reverse-engineering, Ghidra, disassembly, decompilation]
|
|
version: 1.0.0
|
|
author: mahipal
|
|
license: MIT
|
|
---
|
|
|
|
# Reverse Engineering Malware with Ghidra
|
|
|
|
## When to Use
|
|
|
|
- Static and dynamic analysis have identified suspicious functionality that requires deeper code-level understanding
|
|
- You need to reverse engineer C2 communication protocols, encryption algorithms, or custom obfuscation
|
|
- Understanding the exact exploit mechanism or vulnerability targeted by a malware sample
|
|
- Extracting hardcoded configuration data (C2 addresses, encryption keys, campaign IDs) embedded in compiled code
|
|
- Developing precise YARA rules or detection signatures based on unique code patterns
|
|
|
|
**Do not use** for initial triage of unknown samples; perform static analysis with PEStudio and behavioral analysis with Cuckoo first.
|
|
|
|
## Prerequisites
|
|
|
|
- Ghidra 11.x installed (download from https://ghidra-sre.org/) with JDK 17+
|
|
- Analysis VM isolated from production network (Windows or Linux host)
|
|
- Familiarity with x86/x64 assembly language and Windows API conventions
|
|
- PDB symbol files for Windows system DLLs to improve decompilation accuracy
|
|
- Ghidra scripts repository (ghidra_scripts) for automated analysis tasks
|
|
- Secondary reference: IDA Free or Binary Ninja for cross-validation of analysis results
|
|
|
|
## Workflow
|
|
|
|
### Step 1: Create Project and Import Binary
|
|
|
|
Set up a Ghidra project and import the malware sample:
|
|
|
|
```
|
|
1. Launch Ghidra: ghidraRun (Linux) or ghidraRun.bat (Windows)
|
|
2. File -> New Project -> Non-Shared Project -> Select directory
|
|
3. File -> Import File -> Select malware binary
|
|
4. Ghidra auto-detects format (PE, ELF, Mach-O) and architecture
|
|
5. Accept default import options (or specify base address if known)
|
|
6. Double-click imported file to open in CodeBrowser
|
|
7. When prompted, run Auto Analysis with default analyzers enabled
|
|
```
|
|
|
|
**Headless analysis for automation:**
|
|
```bash
|
|
# Run Ghidra headless analysis with decompiler
|
|
/opt/ghidra/support/analyzeHeadless /tmp/ghidra_project MalwareProject \
|
|
-import suspect.exe \
|
|
-postScript ExportDecompilation.py \
|
|
-scriptPath /opt/ghidra/scripts/ \
|
|
-deleteProject
|
|
```
|
|
|
|
### Step 2: Identify Key Functions and Entry Points
|
|
|
|
Navigate the binary to locate critical code sections:
|
|
|
|
```
|
|
Navigation Strategy:
|
|
━━━━━━━━━━━━━━━━━━━
|
|
1. Start at entry point (OEP) - follow execution from _start/WinMain
|
|
2. Check Symbol Tree for imported functions (Window -> Symbol Tree)
|
|
3. Search for cross-references to suspicious APIs:
|
|
- VirtualAlloc/VirtualAllocEx (memory allocation for injection)
|
|
- CreateRemoteThread (remote thread injection)
|
|
- CryptEncrypt/CryptDecrypt (encryption operations)
|
|
- InternetOpen/HttpSendRequest (C2 communication)
|
|
- RegSetValueEx (persistence via registry)
|
|
4. Use Search -> For Strings to find embedded URLs, IPs, and paths
|
|
5. Check the Functions window sorted by size (large functions often contain core logic)
|
|
```
|
|
|
|
**Ghidra keyboard shortcuts for efficient navigation:**
|
|
```
|
|
G - Go to address
|
|
Ctrl+E - Search for strings
|
|
X - Show cross-references to current location
|
|
Ctrl+Shift+F - Search memory for byte patterns
|
|
L - Rename label/function
|
|
; - Add comment
|
|
T - Retype variable
|
|
Ctrl+L - Retype return value
|
|
```
|
|
|
|
### Step 3: Analyze Decompiled Code
|
|
|
|
Use Ghidra's decompiler to understand function logic:
|
|
|
|
```c
|
|
// Example: Ghidra decompiler output for a decryption routine
|
|
// Analyst renames variables and adds types for clarity
|
|
|
|
void decrypt_config(BYTE *encrypted_data, int data_len, BYTE *key, int key_len) {
|
|
// XOR decryption with rolling key
|
|
for (int i = 0; i < data_len; i++) {
|
|
encrypted_data[i] = encrypted_data[i] ^ key[i % key_len];
|
|
}
|
|
return;
|
|
}
|
|
|
|
// Analyst actions in Ghidra:
|
|
// 1. Right-click parameters -> Retype to correct types (BYTE*, int)
|
|
// 2. Right-click variables -> Rename to meaningful names
|
|
// 3. Add comments explaining the algorithm
|
|
// 4. Set function signature to propagate types to callers
|
|
```
|
|
|
|
### Step 4: Trace C2 Communication Logic
|
|
|
|
Follow the network communication code path:
|
|
|
|
```
|
|
Analysis Steps for C2 Protocol Reverse Engineering:
|
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
1. Find InternetOpenA/WinHttpOpen call -> trace to wrapper function
|
|
2. Follow data flow from encrypted config -> URL construction
|
|
3. Identify HTTP method (GET/POST), headers, and body format
|
|
4. Locate response parsing logic (JSON parsing, custom binary protocol)
|
|
5. Map the C2 command dispatcher (switch/case or jump table)
|
|
6. Document the command set (download, execute, exfiltrate, update, uninstall)
|
|
```
|
|
|
|
**Ghidra Script for extracting C2 configuration:**
|
|
```python
|
|
# Ghidra Python script: extract_c2_config.py
|
|
# Run via Script Manager in Ghidra
|
|
|
|
from ghidra.program.model.data import StringDataType
|
|
from ghidra.program.model.symbol import SourceType
|
|
|
|
# Search for XOR decryption patterns
|
|
listing = currentProgram.getListing()
|
|
memory = currentProgram.getMemory()
|
|
|
|
# Find references to InternetOpenA
|
|
symbol_table = currentProgram.getSymbolTable()
|
|
for symbol in symbol_table.getExternalSymbols():
|
|
if "InternetOpen" in symbol.getName():
|
|
refs = getReferencesTo(symbol.getAddress())
|
|
for ref in refs:
|
|
print("C2 init at: {}".format(ref.getFromAddress()))
|
|
```
|
|
|
|
### Step 5: Analyze Encryption and Obfuscation
|
|
|
|
Identify and document cryptographic routines:
|
|
|
|
```
|
|
Common Malware Encryption Patterns:
|
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
XOR Cipher: Loop with XOR operation, often single-byte or rolling key
|
|
RC4: Two loops (KSA + PRGA), 256-byte S-box initialization
|
|
AES: Look for S-box constants (0x63, 0x7C, 0x77...) or calls to CryptEncrypt
|
|
Base64: Lookup table with A-Za-z0-9+/= characters
|
|
Custom: Combination of arithmetic operations (ADD, SUB, ROL, ROR with XOR)
|
|
|
|
Identification Tips:
|
|
- Search for constants: AES S-box, CRC32 table, MD5 init values
|
|
- Look for loop structures operating on byte arrays
|
|
- Check for Windows Crypto API usage (CryptAcquireContext -> CryptCreateHash -> CryptEncrypt)
|
|
- FindCrypt Ghidra plugin automatically identifies crypto constants
|
|
```
|
|
|
|
### Step 6: Document Findings and Create Detection Signatures
|
|
|
|
Produce actionable intelligence from reverse engineering:
|
|
|
|
```bash
|
|
# Generate YARA rule from unique code patterns found in Ghidra
|
|
cat << 'EOF' > malware_family_x.yar
|
|
rule MalwareFamilyX_Decryptor {
|
|
meta:
|
|
description = "Detects MalwareX decryption routine"
|
|
author = "analyst"
|
|
date = "2025-09-15"
|
|
strings:
|
|
// XOR decryption loop with hardcoded key
|
|
$decrypt = { 8A 04 0E 32 04 0F 88 04 0E 41 3B CA 7C F3 }
|
|
// C2 URL pattern after decryption
|
|
$c2_pattern = "/gate.php?id=" ascii
|
|
condition:
|
|
uint16(0) == 0x5A4D and $decrypt and $c2_pattern
|
|
}
|
|
EOF
|
|
```
|
|
|
|
## Key Concepts
|
|
|
|
| Term | Definition |
|
|
|------|------------|
|
|
| **Disassembly** | Converting machine code bytes into human-readable assembly language instructions; Ghidra's Listing view shows disassembled code |
|
|
| **Decompilation** | Lifting assembly code to pseudo-C representation for easier analysis; Ghidra's Decompile window provides this view |
|
|
| **Cross-Reference (XREF)** | Reference showing where a function or data address is called from or used; essential for tracing code execution flow |
|
|
| **Control Flow Graph (CFG)** | Visual representation of all possible execution paths through a function; reveals branching logic and loops |
|
|
| **Original Entry Point (OEP)** | The actual start address of the malware code after unpacking; packers redirect execution through an unpacking stub first |
|
|
| **Function Signature** | The return type, name, and parameter types of a function; applying correct signatures improves decompiler output quality |
|
|
| **Ghidra Script** | Python or Java automation script executed within Ghidra to perform batch analysis, pattern searching, or data extraction |
|
|
|
|
## Tools & Systems
|
|
|
|
- **Ghidra**: NSA's open-source software reverse engineering suite with disassembler, decompiler, and scripting support for multiple architectures
|
|
- **IDA Pro/Free**: Industry-standard interactive disassembler; IDA Free provides x86/x64 cloud-based decompilation
|
|
- **Binary Ninja**: Commercial reverse engineering platform with modern UI and extensive API for plugin development
|
|
- **x64dbg**: Open-source x64/x32 debugger for Windows used alongside Ghidra for dynamic debugging of malware
|
|
- **FindCrypt (Ghidra Plugin)**: Plugin that identifies cryptographic constants and algorithms in binary code
|
|
|
|
## Common Scenarios
|
|
|
|
### Scenario: Reversing Custom C2 Protocol
|
|
|
|
**Context**: Behavioral analysis shows encrypted traffic to an external IP on a non-standard port. Network signatures cannot detect variants because the protocol is proprietary. Deep reverse engineering is needed to understand the protocol structure.
|
|
|
|
**Approach**:
|
|
1. Import the unpacked sample into Ghidra and run full auto-analysis
|
|
2. Locate socket/WinHTTP API calls and trace backwards to the calling function
|
|
3. Identify the encryption routine called before data is sent (follow data flow from send/HttpSendRequest)
|
|
4. Reverse the encryption (XOR key extraction, RC4 key derivation, AES key location)
|
|
5. Map the command structure by analyzing the response parsing function (switch/case on command IDs)
|
|
6. Document the protocol format (header structure, command bytes, encryption method)
|
|
7. Create a protocol decoder script for network monitoring tools
|
|
|
|
**Pitfalls**:
|
|
- Not running the full auto-analysis before starting manual analysis (missing function boundaries and type propagation)
|
|
- Ignoring indirect calls through function pointers or vtables (use cross-references to data holding function addresses)
|
|
- Spending time on library code that Ghidra's Function ID (FID) or FLIRT signatures should have identified
|
|
- Not saving Ghidra project progress frequently (analysis state can be lost on crashes)
|
|
|
|
## Output Format
|
|
|
|
```
|
|
REVERSE ENGINEERING ANALYSIS REPORT
|
|
=====================================
|
|
Sample: unpacked_payload.exe
|
|
SHA-256: abc123def456...
|
|
Architecture: x86 (32-bit PE)
|
|
Ghidra Project: MalwareX_Analysis
|
|
|
|
FUNCTION MAP
|
|
0x00401000 main() - Entry point, initializes config
|
|
0x00401200 decrypt_config() - XOR decryption with 16-byte key
|
|
0x00401400 init_c2() - WinHTTP initialization, URL construction
|
|
0x00401800 c2_beacon() - HTTP POST beacon with system info
|
|
0x00401C00 cmd_dispatcher() - Switch on 12 command codes
|
|
0x00402000 inject_process() - Process hollowing into svchost.exe
|
|
0x00402400 persist_registry() - HKCU Run key persistence
|
|
0x00402800 exfil_data() - File collection and encrypted upload
|
|
|
|
C2 PROTOCOL
|
|
Method: HTTPS POST to /gate.php
|
|
Encryption: RC4 with derived key (MD5 of bot_id + campaign_key)
|
|
Bot ID Format: MD5(hostname + username + volume_serial)
|
|
Beacon Interval: 60 seconds with 10% jitter
|
|
Command Set:
|
|
0x01 - Download and execute file
|
|
0x02 - Execute shell command
|
|
0x03 - Upload file to C2
|
|
0x04 - Update configuration
|
|
0x05 - Uninstall and remove traces
|
|
|
|
ENCRYPTION DETAILS
|
|
Algorithm: RC4
|
|
Key Derivation: MD5(bot_id + "campaign_2025_q3")
|
|
Hardcoded Seed: "campaign_2025_q3" at offset 0x00405A00
|
|
|
|
EXTRACTED IOCs
|
|
C2 URLs: hxxps://update.malicious[.]com/gate.php
|
|
hxxps://backup.evil[.]net/gate.php (failover)
|
|
Campaign ID: campaign_2025_q3
|
|
RC4 Key Material: [see encryption details above]
|
|
```
|