# API Reference: Performing AI-Driven OSINT Correlation

## CLI Usage

```bash
# Correlate Sherlock + theHarvester results
python agent.py --target "targetdomain.com" \
  --sherlock sherlock-results.csv \
  --harvester harvester-results.json \
  -o correlation_report.json

# Full multi-source correlation
python agent.py --target "john.doe" \
  --sherlock sherlock.csv \
  --harvester harvester.json \
  --spiderfoot spiderfoot.json \
  --breach breach-results.json \
  -o report.json \
  --markdown intelligence-profile.md

# Normalize only (no correlation)
python agent.py --sherlock sherlock.csv --harvester harvester.json \
  --normalize-only -o normalized.json

# Load pre-normalized generic findings
python agent.py --generic normalized_findings.json -o report.json
```

## Supported Data Sources

| Source | Flag | Input Format | Data Extracted |
|--------|------|-------------|----------------|
| Sherlock | `--sherlock` | CSV or text | Usernames, social profile URLs, platforms |
| theHarvester | `--harvester` | JSON | Emails, hostnames, IP addresses |
| SpiderFoot | `--spiderfoot` | JSON | Mixed OSINT findings (200+ module types) |
| Breach/HIBP | `--breach` | JSON | Breach names, dates, data classes |
| Generic | `--generic` | JSON array | Any pre-normalized findings |

## Input File Formats

### Sherlock CSV Format

```csv
username,name,url_user,exists,http_status
johndoe,GitHub,https://github.com/johndoe,Claimed,200
johndoe,Twitter,https://twitter.com/johndoe,Claimed,200
```

### theHarvester JSON Format

```json
{
  "emails": ["john@targetdomain.com", "admin@targetdomain.com"],
  "hosts": ["mail.targetdomain.com", "vpn.targetdomain.com"],
  "ips": ["203.0.113.10", "203.0.113.11"]
}
```

### SpiderFoot JSON Format

```json
[
  {"type": "EMAILADDR", "data": "john@targetdomain.com", "module": "sfp_hunter"},
  {"type": "IP_ADDRESS", "data": "203.0.113.10", "module": "sfp_dnsresolve"},
  {"type": "SOCIAL_MEDIA", "data": "https://github.com/johndoe", "module": "sfp_github"}
]
```

### Breach/HIBP JSON Format

```json
[
  {
    "Name": "ExampleBreach",
    "BreachDate": "2023-06-15",
    "DataClasses": ["Email addresses", "Passwords", "Usernames"]
  }
]
```

## Correlation Confidence Scoring

| Factor | Weight | Description |
|--------|--------|-------------|
| Exact email match | 0.95 | Same email found across multiple sources |
| Breach email match | 0.90 | Email found in breach database |
| Exact username match | 0.85 | Same username across multiple platforms |
| Same IP infrastructure | 0.70 | Shared IP address or hosting |
| Domain match | 0.60 | Shared domain registration or hosting |
| Similar username | 0.45 | Partial username overlap with shared metadata |
| Temporal co-registration | 0.40 | Accounts created within a similar timeframe |

Cross-source corroboration increases confidence: +0.15 per additional source, capped at 0.95.

## Report Output Schema

```json
{
  "meta": {
    "target": "targetdomain.com",
    "generated_at": "2026-03-19T12:00:00+00:00",
    "sources_used": ["sherlock", "theHarvester", "spiderfoot", "breach_database"],
    "total_findings": 247,
    "total_entities": 12
  },
  "identifiers": {
    "usernames": ["johndoe", "jdoe"],
    "emails": ["john@targetdomain.com"],
    "domains": ["targetdomain.com"],
    "ip_addresses": ["203.0.113.10"],
    "urls": ["https://github.com/johndoe"]
  },
  "entities": [
    {
      "identifier": "johndoe",
      "identifier_type": "user",
      "confidence": 0.92,
      "sources": ["sherlock", "theHarvester", "breach_database"],
      "source_count": 3,
      "linked_accounts": [
        {"source": "sherlock", "platform": "GitHub", "url": "https://github.com/johndoe"}
      ],
      "flags": ["Exposed in 2 breach(es)"],
      "risk_level": "high"
    }
  ],
  "risk_summary": {
    "high_risk": 2,
    "medium_risk": 5,
    "low_risk": 5
  }
}
```

## Markdown Report Output

The `--markdown` flag generates an intelligence profile in Markdown containing:

- Target metadata and source summary
- Risk summary table
- Entity profiles with linked accounts, confidence scores, and risk flags

## OSINT Tool Commands (Data Collection)

```bash
# Sherlock: enumerate username across platforms
sherlock "targetuser" --output sherlock.csv --csv

# theHarvester: harvest emails and subdomains
theHarvester -d targetdomain.com -b all -f harvester.json

# SpiderFoot: passive scan via REST API
curl -s http://localhost:5001/api/scan/start \
  -d "scanname=recon&scantarget=targetdomain.com&usecase=passive"

# HIBP: check email breach exposure
curl -s -H "hibp-api-key: ${HIBP_KEY}" -H "User-Agent: OSINT-Agent" \
  "https://haveibeenpwned.com/api/v3/breachedaccount/target@example.com" \
  -o breach.json
```

## References

- Sherlock Project: https://github.com/sherlock-project/sherlock
- theHarvester: https://github.com/laramies/theHarvester
- SpiderFoot: https://github.com/smicallef/spiderfoot
- HIBP API: https://haveibeenpwned.com/API/v3
- Maltego: https://www.maltego.com/
- LOLBAS Project: https://lolbas-project.github.io/
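
As a rough illustration of the Correlation Confidence Scoring rules, the sketch below combines a factor weight with the cross-source bonus. The weights, the +0.15 bonus, and the 0.95 cap come from the scoring table; the function name, the factor keys, and the choice to start from the strongest matching factor and add the bonus to it are assumptions made for illustration, not necessarily how `agent.py` computes the score.

```python
# Weights copied from the Correlation Confidence Scoring table.
# The dictionary keys are hypothetical labels chosen for this sketch.
FACTOR_WEIGHTS = {
    "exact_email_match": 0.95,
    "breach_email_match": 0.90,
    "exact_username_match": 0.85,
    "same_ip_infrastructure": 0.70,
    "domain_match": 0.60,
    "similar_username": 0.45,
    "temporal_co_registration": 0.40,
}
SOURCE_BONUS = 0.15    # added per additional corroborating source
CONFIDENCE_CAP = 0.95  # upper bound stated in the scoring rules


def score_confidence(factors, source_count):
    """Return a confidence in [0, 0.95] for one entity.

    Assumption: the strongest matching factor is the base score, and each
    source beyond the first adds SOURCE_BONUS before the cap is applied.
    """
    if source_count < 1:
        raise ValueError("source_count must be at least 1")
    if not factors:
        return 0.0
    base = max(FACTOR_WEIGHTS[name] for name in factors)
    bonus = SOURCE_BONUS * (source_count - 1)
    return round(min(base + bonus, CONFIDENCE_CAP), 2)


if __name__ == "__main__":
    # A domain match seen in one source, then corroborated by a second.
    print(score_confidence(["domain_match"], 1))
    print(score_confidence(["domain_match"], 2))
    # An exact username match across three sources reaches the cap.
    print(score_confidence(["exact_username_match"], 3))
```

Under these assumptions a weak signal such as a domain match needs corroboration from several sources to approach the cap, while a strong signal such as an exact email match starts there.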