Anthropic-Cybersecurity-Skills/skills/analyzing-pdf-malware-with-pdfid/references/api-reference.md
mukul975 27c6414ca5 Add folder anatomy (scripts/agent.py + references/api-reference.md) for 648 cybersecurity skills
Complete skill folder anatomy across all cybersecurity skills:
- scripts/agent.py: 80-150 line Python agents using real libraries (impacket,
  boto3, azure-mgmt-*, kubernetes, pefile, yara, scapy, shodan, stix2, etc.)
- references/api-reference.md: real API documentation with method signatures
- LICENSE: MIT license for all skill folders
2026-03-10 21:02:12 +01:00


# API Reference: PDF Malware Analysis Tools
## PDFiD - PDF Keyword Scanner
### Syntax
```bash
pdfid.py document.pdf
pdfid.py -n document.pdf # Suppress keywords with zero counts (--nozero)
pdfid.py -e document.pdf # Extra data (entropy)
pdfid.py -f document.pdf # Force scan (ignore header)
```
### Suspicious Keywords
| Keyword | Risk | Description |
|---------|------|-------------|
| `/JS` | HIGH | JavaScript code |
| `/JavaScript` | HIGH | JavaScript action |
| `/AA` | HIGH | Additional Actions (auto-execute) |
| `/OpenAction` | HIGH | Action on document open |
| `/Launch` | HIGH | Launch external application |
| `/EmbeddedFile` | MEDIUM | Embedded file object |
| `/AcroForm` | MEDIUM | Interactive form |
| `/JBIG2Decode` | HIGH | JBIG2 exploit vector (CVE-2009-0658) |
| `/RichMedia` | MEDIUM | Flash/multimedia content |
| `/XFA` | MEDIUM | XML Forms (script capable) |
| `/ObjStm` | LOW | Object streams (can hide objects) |
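
The table can be turned into a rough triage helper. The sketch below counts the keywords in raw PDF bytes and reports those at a given risk level. It is a simplified illustration, not PDFiD's implementation: the function names and the sample bytes are hypothetical, and it does not decode hex-escaped names (for example `/J#53`), which PDFiD does handle.

```python
import re

# Risk levels copied from the table above.
RISK = {
    "/JS": "HIGH", "/JavaScript": "HIGH", "/AA": "HIGH",
    "/OpenAction": "HIGH", "/Launch": "HIGH", "/JBIG2Decode": "HIGH",
    "/EmbeddedFile": "MEDIUM", "/AcroForm": "MEDIUM",
    "/RichMedia": "MEDIUM", "/XFA": "MEDIUM", "/ObjStm": "LOW",
}

def count_keywords(data: bytes) -> dict:
    """Count each keyword as a whole name, so /JS does not match /JSX."""
    counts = {}
    for name in RISK:
        # Approximate the end of a PDF name by requiring that
        # no letter or digit follows the keyword.
        pattern = re.escape(name.encode()) + rb"(?![A-Za-z0-9])"
        counts[name] = len(re.findall(pattern, data))
    return counts

def flagged(data: bytes, level: str = "HIGH") -> list:
    """Return the keywords of the given risk level that occur at least once."""
    counts = count_keywords(data)
    return sorted(k for k, n in counts.items() if n and RISK[k] == level)

# Hypothetical, hand-written sample; not a complete PDF.
sample = (
    b"%PDF-1.7\n1 0 obj\n<< /OpenAction 2 0 R /AcroForm 3 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)
print(flagged(sample))
print(flagged(sample, "MEDIUM"))
```

A keyword hit is only a lead: benign PDFs often contain `/AcroForm` or `/ObjStm`, so the flagged objects still need to be inspected.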
### Output Format
```
PDF Header: %PDF-1.7
obj 45
endobj 45
stream 12
/JS 2
/JavaScript 1
/OpenAction 1
/EmbeddedFile 0
```
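
Output in this shape is easy to post-process. The following sketch parses the sample above into a dictionary and applies one common heuristic: JavaScript combined with an automatic trigger. The function names are hypothetical, and the parser assumes plain `keyword count` lines; any other lines or annotated counts in real PDFiD output are skipped.

```python
SAMPLE_OUTPUT = """\
PDF Header: %PDF-1.7
obj 45
endobj 45
stream 12
/JS 2
/JavaScript 1
/OpenAction 1
/EmbeddedFile 0
"""

def parse_pdfid_output(text: str) -> dict:
    """Map each keyword to its integer count."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Count lines look like "<keyword> <integer>"; the header line
        # and anything else that does not fit is ignored.
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts

def js_with_auto_trigger(counts: dict) -> bool:
    """True when JavaScript appears together with /OpenAction or /AA."""
    has_js = counts.get("/JS", 0) > 0 or counts.get("/JavaScript", 0) > 0
    has_trigger = counts.get("/OpenAction", 0) > 0 or counts.get("/AA", 0) > 0
    return has_js and has_trigger

counts = parse_pdfid_output(SAMPLE_OUTPUT)
print(counts)
print(js_with_auto_trigger(counts))
```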
## pdf-parser.py - PDF Object Parser
### Syntax
```bash
pdf-parser.py document.pdf # List all objects
pdf-parser.py -o 5 document.pdf # Show object 5
pdf-parser.py -s "/JS" document.pdf # Search for keyword
pdf-parser.py -f document.pdf # Filter streams
pdf-parser.py -c document.pdf # Show content of objects without streams or with unfiltered streams
pdf-parser.py -o 5 -d stream5.bin document.pdf # Dump stream of object 5 to a file (-d takes a filename)
pdf-parser.py --object 5 --filter document.pdf # Decompress stream
```
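
For streams that use `/FlateDecode`, the decompression step can be reproduced with Python's standard `zlib` module. This is a minimal sketch for that one filter only, assuming the raw stream bytes have already been extracted (for example from a dump file); `inflate_stream` and the sample payload are hypothetical.

```python
import zlib

def inflate_stream(raw: bytes) -> bytes:
    """Decompress the body of a FlateDecode stream (zlib data)."""
    try:
        return zlib.decompress(raw)
    except zlib.error:
        # Truncated streams are common in malformed files. A streaming
        # decompressor returns the prefix it can recover from truncated
        # input; badly corrupted data will still raise zlib.error here.
        return zlib.decompressobj().decompress(raw)

# Round trip with a stand-in for a compressed JavaScript stream.
payload = b"app.alert('hello');" * 20
compressed = zlib.compress(payload)
print(inflate_stream(compressed) == payload)

# Cutting off the trailing bytes makes zlib.decompress fail,
# so the fallback path is used instead.
recovered = inflate_stream(compressed[:-4])
print(len(recovered), "bytes recovered from the truncated stream")
```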
## peepdf - Interactive PDF Analysis
### Syntax
```bash
peepdf -i document.pdf # Interactive mode
peepdf -f document.pdf # Force analysis
peepdf -l document.pdf # Loose mode
```
### Interactive Commands
```
info # Document summary
tree # Object tree
object 5 # Show object
stream 5 # Show stream content
js_analyse object 5 # Analyze the JavaScript in object 5
extract js > output.js # Extract JavaScript
```
## Known PDF Exploit CVEs
| CVE | Component | Description |
|-----|-----------|-------------|
| CVE-2009-0658 | JBIG2Decode | Buffer overflow in JBIG2 decoder |
| CVE-2009-0927 | Collab.getIcon | JavaScript method exploit |
| CVE-2008-2992 | util.printf | Stack buffer overflow via crafted format string |
| CVE-2010-0188 | LibTIFF | TIFF image processing overflow |
| CVE-2013-0640 | XFA | XML Forms Architecture exploit |
| CVE-2018-4990 | JPEG2000 (JP2K) | Double free in JPEG2000 image handling |
## YARA Rules for PDF Malware
### Example Rule
```yara
rule PDF_Suspicious {
    meta:
        description = "PDF with JavaScript and auto-execution"
    strings:
        $pdf = "%PDF-"
        $js = "/JS" nocase
        $openaction = "/OpenAction"
        $launch = "/Launch"
    condition:
        // "and" binds tighter than "or", so group the alternatives explicitly
        $pdf at 0 and (($js and $openaction) or $launch)
}
```
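
In YARA, `and` binds more tightly than `or`, so the grouping in the condition decides whether the `%PDF-` header check also applies to the `/Launch` branch. The sketch below restates the condition in plain Python, assuming the intended reading is "PDF header and (JavaScript with `/OpenAction`, or `/Launch`)". It only illustrates the boolean logic and does not replace YARA; `matches_rule` is a hypothetical name.

```python
def matches_rule(data: bytes) -> bool:
    is_pdf = data.startswith(b"%PDF-")        # $pdf at 0
    has_js = b"/js" in data.lower()           # $js ... nocase
    has_openaction = b"/OpenAction" in data   # $openaction
    has_launch = b"/Launch" in data           # $launch
    # The header check guards both branches.
    return is_pdf and ((has_js and has_openaction) or has_launch)

print(matches_rule(b"%PDF-1.7 << /OpenAction 1 0 R /JS (x) >>"))
print(matches_rule(b"plain text that mentions /Launch"))
print(matches_rule(b"%PDF-1.4 << /JS (x) >>"))
```

Without the extra parentheses, the second input would match, because `/Launch` alone would satisfy the condition in any file type.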
## Python PDF Libraries
### PyPDF2
```python
from PyPDF2 import PdfReader

reader = PdfReader("document.pdf")
print(len(reader.pages))  # Number of pages
for page in reader.pages:
    print(page.extract_text())
```
### pikepdf
```python
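import pikepdf

with pikepdf.open("document.pdf") as pdf:
    # pdf.objects yields the objects themselves, not object numbers
    for obj in pdf.objects:
        # Only dictionaries and streams can carry a /JS key
        if isinstance(obj, (pikepdf.Dictionary, pikepdf.Stream)) and "/JS" in obj:
            print(f"JavaScript in object {obj.objgen[0]}")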
```