Anthropic-Cybersecurity-Skills/skills/analyzing-pdf-malware-with-pdfid/references/api-reference.md
mukul975 27c6414ca5 Add folder anatomy (scripts/agent.py + references/api-reference.md) for 648 cybersecurity skills
Complete skill folder anatomy across all cybersecurity skills:
- scripts/agent.py: 80-150 line Python agents using real libraries (impacket,
  boto3, azure-mgmt-*, kubernetes, pefile, yara, scapy, shodan, stix2, etc.)
- references/api-reference.md: real API documentation with method signatures
- LICENSE: MIT license for all skill folders
2026-03-10 21:02:12 +01:00

API Reference: PDF Malware Analysis Tools

PDFiD - PDF Keyword Scanner

Syntax

pdfid.py document.pdf
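pdfid.py -n document.pdf     # Suppress keywords with a zero count
pdfid.py -a document.pdf     # Show all names found, not only the default keywords
pdfid.py -e document.pdf     # Extra data (entropy, dates, bytes after last %%EOF)
pdfid.py -f document.pdf     # Force scan even without a valid %PDF header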

Suspicious Keywords

| Keyword | Risk | Description |
|---|---|---|
| /JS | HIGH | JavaScript code |
| /JavaScript | HIGH | JavaScript action |
| /AA | HIGH | Additional Actions (auto-execute) |
| /OpenAction | HIGH | Action on document open |
| /Launch | HIGH | Launch external application |
| /EmbeddedFile | MEDIUM | Embedded file object |
| /AcroForm | MEDIUM | Interactive form |
| /JBIG2Decode | HIGH | JBIG2 exploit vector (CVE-2009-0658) |
| /RichMedia | MEDIUM | Flash/multimedia content |
| /XFA | MEDIUM | XML Forms (script capable) |
| /ObjStm | LOW | Object streams (can hide objects) |
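
To illustrate how the keyword table might be applied, here is a minimal Python sketch that counts these names in raw PDF bytes and lists the high-risk ones that occur. It is not PDFiD's implementation: the function names are hypothetical, the risk levels are copied from the table above, and the scan does not look inside compressed streams. It does decode `#xx` hex escapes in names, since `/J#53` is another spelling of `/JS`.

```python
import re

# Risk levels copied from the table above.
KEYWORD_RISK = {
    "/JS": "HIGH",
    "/JavaScript": "HIGH",
    "/AA": "HIGH",
    "/OpenAction": "HIGH",
    "/Launch": "HIGH",
    "/JBIG2Decode": "HIGH",
    "/EmbeddedFile": "MEDIUM",
    "/AcroForm": "MEDIUM",
    "/RichMedia": "MEDIUM",
    "/XFA": "MEDIUM",
    "/ObjStm": "LOW",
}

# A PDF name is "/" followed by characters that are neither whitespace nor delimiters.
_NAME = re.compile(rb"/[^\s\x00()<>\[\]{}/%]+")
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(name: bytes) -> bytes:
    """Decode #xx hex escapes, e.g. b'/J#53' becomes b'/JS'."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def count_keywords(data: bytes) -> dict:
    """Count exact occurrences of each table keyword in the raw bytes."""
    counts = {keyword: 0 for keyword in KEYWORD_RISK}
    for match in _NAME.finditer(data):
        name = normalize_name(match.group(0)).decode("latin-1")
        if name in counts:
            counts[name] += 1
    return counts


def high_risk_hits(counts: dict) -> list:
    """Return the HIGH-risk keywords that occur at least once."""
    return [k for k, n in counts.items() if n and KEYWORD_RISK[k] == "HIGH"]
```

A typical call would be `high_risk_hits(count_keywords(open("document.pdf", "rb").read()))`. Matching whole names rather than substrings keeps `/JavaScript` from also being counted as `/JS`.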

Output Format

PDF Header: %PDF-1.7
 obj                   45
 endobj                45
 stream                12
 /JS                    2
 /JavaScript            1
 /OpenAction            1
 /EmbeddedFile          0
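
For scripted triage, the text report above can be turned into a dictionary. The following is a rough sketch, assuming the report keeps the `name count` layout shown here. The function name is hypothetical, and the optional `(n)` suffix after a count (used for obfuscated names) is handled on the assumption that it follows the count directly.

```python
import re

# One keyword line: name, whitespace, count, optional "(n)" obfuscation count.
_COUNT_LINE = re.compile(r"^\s*(\S+)\s+(\d+)(?:\((\d+)\))?\s*$")

SAMPLE_OUTPUT = """PDF Header: %PDF-1.7
 obj                   45
 endobj                45
 stream                12
 /JS                    2
 /JavaScript            1
 /OpenAction            1
 /EmbeddedFile          0
"""


def parse_pdfid_output(text: str) -> dict:
    """Parse a PDFiD-style text report into a header string and keyword counts."""
    result = {"header": None, "counts": {}}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("PDF Header:"):
            result["header"] = stripped.split(":", 1)[1].strip()
            continue
        match = _COUNT_LINE.match(line)
        if match:
            result["counts"][match.group(1)] = int(match.group(2))
    return result


if __name__ == "__main__":
    report = parse_pdfid_output(SAMPLE_OUTPUT)
    # List the keywords that actually occur in the document.
    print([name for name, count in report["counts"].items() if count > 0])
```

Lines that do not fit the pattern, such as a version banner, are skipped rather than raising an error.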

pdf-parser.py - PDF Object Parser

Syntax

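pdf-parser.py document.pdf                         # List all objects
pdf-parser.py -o 5 document.pdf                    # Show object 5
pdf-parser.py -s "/JS" document.pdf                # Search objects (not streams) for a string
pdf-parser.py -f document.pdf                      # Pass streams through their filters (e.g. FlateDecode)
pdf-parser.py -o 5 -c document.pdf                 # Show content of an object with no stream or an unfiltered stream
pdf-parser.py -o 5 -f -d stream5.bin document.pdf  # Dump the decoded stream of object 5 to stream5.bin
pdf-parser.py --object 5 --filter document.pdf     # Decompress the stream of object 5 (long form of -o 5 -f)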
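
When many objects need dumping, the command line can be driven from Python. This is a minimal sketch: the helper names are hypothetical, and the path to `pdf-parser.py` is an assumption to adjust for your installation. It relies only on the `-o`, `-f`, and `-d` options, where `-d` takes the output filename.

```python
import subprocess
import sys


def build_dump_command(pdf_path, object_id, out_path, parser="pdf-parser.py"):
    """Build the argument list that dumps one decoded stream to a file."""
    return [
        sys.executable, parser,
        "-o", str(object_id),  # select the indirect object
        "-f",                  # pass the stream through its filters
        "-d", out_path,        # file that receives the stream content
        pdf_path,
    ]


def dump_stream(pdf_path, object_id, out_path, parser="pdf-parser.py"):
    """Run pdf-parser.py and return the completed process for inspection."""
    command = build_dump_command(pdf_path, object_id, out_path, parser)
    # No shell is involved, so unusual characters in file names are passed as-is.
    return subprocess.run(command, capture_output=True, text=True, check=False)
```

Passing the arguments as a list avoids shell quoting problems, which matters when sample file names come from an untrusted source.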

peepdf - Interactive PDF Analysis

Syntax

peepdf -i document.pdf              # Interactive mode
peepdf -f document.pdf              # Force analysis
peepdf -l document.pdf              # Loose mode

Interactive Commands

info                    # Document summary
tree                    # Object tree
object 5                # Show object
stream 5                # Show stream content
js_analyse object 5     # Analyze the JavaScript in object 5
extract js > output.js  # Extract JavaScript

Known PDF Exploit CVEs

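| CVE | Component | Description |
|---|---|---|
| CVE-2009-0658 | JBIG2Decode | Buffer overflow in the JBIG2 decoder |
| CVE-2009-0927 | Collab.getIcon | Stack buffer overflow in the Collab.getIcon JavaScript method |
| CVE-2008-2992 | util.printf | Stack buffer overflow triggered by a crafted format string |
| CVE-2010-0188 | LibTIFF | Overflow in TIFF image processing |
| CVE-2013-0640 | XFA | Memory corruption exploited through XML Forms Architecture content |
| CVE-2018-4990 | JPEG2000 | Double free when parsing a crafted embedded JPEG2000 image |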

YARA Rules for PDF Malware

Example Rule

rule PDF_Suspicious {
    meta:
        description = "PDF with JavaScript auto-execution or a Launch action"
    strings:
        $pdf = "%PDF-"
        $js = "/JS" nocase
        $openaction = "/OpenAction"
        $launch = "/Launch"
    condition:
        $pdf at 0 and (($js and $openaction) or $launch)
}
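
In YARA, `and` binds more tightly than `or`, so how the condition is parenthesized decides whether `/Launch` alone is enough to match. The sketch below mirrors both groupings with plain Python byte searches. It does not use YARA itself, and the function names are hypothetical.

```python
def _flags(data: bytes):
    """Evaluate the rule's four strings against the data."""
    pdf = data.startswith(b"%PDF-")   # $pdf at 0
    js = b"/js" in data.lower()       # $js, nocase
    openaction = b"/OpenAction" in data
    launch = b"/Launch" in data
    return pdf, js, openaction, launch


def matches_ungrouped(data: bytes) -> bool:
    """$pdf at 0 and ($js and $openaction) or $launch"""
    pdf, js, openaction, launch = _flags(data)
    return pdf and (js and openaction) or launch


def matches_grouped(data: bytes) -> bool:
    """$pdf at 0 and (($js and $openaction) or $launch)"""
    pdf, js, openaction, launch = _flags(data)
    return pdf and ((js and openaction) or launch)


if __name__ == "__main__":
    not_a_pdf = b"plain text that mentions /Launch"
    print(matches_ungrouped(not_a_pdf), matches_grouped(not_a_pdf))
```

With the ungrouped form, any file containing `/Launch` matches even without a PDF header. The grouped form requires the header in every case.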

Python PDF Libraries

PyPDF2

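# PyPDF2 is no longer maintained; the same API continues in the pypdf package.
from PyPDF2 import PdfReader

reader = PdfReader("document.pdf")
print(len(reader.pages))          # Page count
for page in reader.pages:
    print(page.extract_text())    # Visible text only; scripts and actions are not included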

pikepdf

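import pikepdf

# pdf.objects yields the objects themselves, not object numbers.
with pikepdf.open("document.pdf") as pdf:
    for obj in pdf.objects:
        # Action dictionaries carry their script under the /JS key.
        if isinstance(obj, pikepdf.Dictionary) and "/JS" in obj:
            print(f"JavaScript in object {obj.objgen[0]}")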