# API Reference: Semgrep Custom SAST Rules

## Libraries Used

| Library | Purpose |
|---------|---------|
| `subprocess` | Execute semgrep CLI scans |
| `json` | Parse semgrep JSON output |
| `yaml` (PyYAML, third-party) | Read and write custom Semgrep rule files |
| `pathlib` | Handle source code and rule file paths |

## Installation

```bash
# Python package
pip install semgrep

# Homebrew (macOS)
brew install semgrep

# Docker
docker pull semgrep/semgrep:latest
```
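
After installing, it can help to confirm from Python that the CLI is reachable before wiring it into a script. The sketch below assumes the executable is named `semgrep` and is on `PATH`; `semgrep_version` is a hypothetical helper name, not part of Semgrep.

```python
import shutil
import subprocess


def semgrep_version():
    """Return the installed Semgrep version string, or None if unavailable."""
    exe = shutil.which("semgrep")  # assumption: the CLI is installed as `semgrep`
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = semgrep_version()
print(version if version else "semgrep not found on PATH")
```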

## CLI Reference

### Core Commands

```bash
# Scan with auto-detected rules
semgrep scan --config auto --json --output results.json /path/to/code

# Scan with specific rulesets from Semgrep Registry
semgrep scan --config p/python --config p/owasp-top-ten /path/to/code

# Scan with a custom rule file
semgrep scan --config my-rules.yaml /path/to/code

# Scan with multiple configs
semgrep scan --config p/security-audit --config ./custom-rules/ /path/to/code
```

### Key CLI Flags

| Flag | Description |
|------|-------------|
| `--config`, `-c` | Rule source: registry key, YAML file, or directory |
| `--json` | Output results in JSON format |
| `--sarif` | Output in SARIF format (for CI/CD integration) |
| `--output`, `-o` | Write results to file |
| `--severity` | Filter by severity: `INFO`, `WARNING`, `ERROR` |
| `--include` | Only scan files matching glob pattern |
| `--exclude` | Skip files matching glob pattern |
| `--lang`, `-l` | Language of an inline pattern passed with `--pattern`/`-e`; not a general file filter (use `--include`/`--exclude` for that) |
| `--max-target-bytes` | Skip files larger than N bytes |
| `--timeout` | Maximum time in seconds to run one rule on one file (default: 5; `0` disables the limit) |
| `--jobs`, `-j` | Number of parallel jobs |
| `--verbose`, `-v` | Show detailed scan progress |
| `--metrics off` | Disable anonymous metrics (cannot be combined with `--config auto`, which needs metrics enabled) |
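
The flags above compose into an argument list naturally. The following is a minimal sketch of one way to assemble a `semgrep scan` command in Python; `build_scan_command` and its parameter names are hypothetical, and only flags listed in the table are used.

```python
def build_scan_command(target, configs, output_format="json", output_file=None,
                       include=(), exclude=(), timeout=None, jobs=None):
    """Build an argv list for `semgrep scan` (hypothetical helper)."""
    cmd = ["semgrep", "scan"]
    # One --config per rule source: registry pack, YAML file, or directory.
    for config in configs:
        cmd += ["--config", str(config)]
    if output_format == "json":
        cmd.append("--json")
    elif output_format == "sarif":
        cmd.append("--sarif")
    elif output_format is not None:
        raise ValueError(f"unsupported output format: {output_format!r}")
    if output_file is not None:
        cmd += ["--output", str(output_file)]
    for pattern in include:
        cmd += ["--include", pattern]
    for pattern in exclude:
        cmd += ["--exclude", pattern]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]
    cmd.append(str(target))  # scan target goes last
    return cmd


command = build_scan_command(
    "src", ["p/python", "./custom-rules/"], exclude=["tests/*"], jobs=4
)
print(" ".join(command))
```

Returning a list rather than a shell string keeps the command safe to pass to `subprocess.run` without `shell=True`.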

## Custom Rule Syntax

### Basic Pattern Rule

```yaml
rules:
  - id: hardcoded-password
    pattern: password = "..."
    message: "Hardcoded password detected — use environment variables"
    languages: [python]
    severity: ERROR
    metadata:
      cwe: ["CWE-798: Use of Hard-coded Credentials"]
      owasp: ["A07:2021 - Identification and Authentication Failures"]
```
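
A small target file makes it easier to check what the rule does and does not match. The sketch below is a hypothetical test fixture for `hardcoded-password`: the `# ruleid:` and `# ok:` comments follow Semgrep's rule-testing annotation convention, and the function names and the `APP_PASSWORD` variable are made up for illustration. The exact test command differs between releases, so check the help output of your version.

```python
import os


def load_password_insecure():
    # ruleid: hardcoded-password
    password = "hunter2"
    return password


def load_password_from_env():
    # ok: hardcoded-password
    password = os.environ.get("APP_PASSWORD", "")
    return password
```

The second function is expected to pass because the right-hand side is a call, not a string literal.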

### Pattern Operators

```yaml
rules:
  - id: sql-injection-format-string
    patterns:
      - pattern: |
          cursor.execute($QUERY % ...)
      - pattern-not: |
          cursor.execute("..." % ())
    message: "SQL injection via string formatting — use parameterized queries"
    languages: [python]
    severity: ERROR

  - id: unsafe-deserialization
    pattern-either:
      - pattern: pickle.loads(...)
      - pattern: pickle.load(...)
      - pattern: yaml.load(..., Loader=yaml.Loader)
      - pattern: yaml.unsafe_load(...)
    message: "Unsafe deserialization — may allow remote code execution"
    languages: [python]
    severity: ERROR

  - id: missing-timeout-requests
    patterns:
      - pattern: requests.$METHOD(...)
      - pattern-not: requests.$METHOD(..., timeout=..., ...)
    message: "HTTP request without timeout — may hang indefinitely"
    languages: [python]
    severity: WARNING
```
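
To make the motivation for `sql-injection-format-string` concrete, here is a self-contained sketch using the standard-library `sqlite3` module. The table, data, and input string are invented for the demonstration. The first query is the kind of call the rule is meant to flag; the second is the parameterized form its message recommends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (name TEXT, role TEXT)")
cursor.executemany(
    "INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "user")]
)

# Attacker-controlled value that closes the quote and adds an always-true clause.
user_input = "nobody' OR '1'='1"

# Vulnerable: the input is spliced into the SQL text with %.
cursor.execute("SELECT name FROM users WHERE name = '%s'" % user_input)
leaked = cursor.fetchall()

# Safer: the driver binds the value, so it is compared as plain data.
cursor.execute("SELECT name FROM users WHERE name = ?", (user_input,))
safe = cursor.fetchall()

print("formatted query rows:", len(leaked))
print("parameterized query rows:", len(safe))
conn.close()
```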

### Metavariable Patterns

```yaml
rules:
  - id: eval-user-input
    patterns:
      - pattern: |
          $INPUT = request.$METHOD(...)
          ...
          eval($INPUT)
    message: "User input passed to eval(): arbitrary code execution risk"
    languages: [python]
    severity: ERROR
```

## Python Integration

```python
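import json
import subprocess


def run_semgrep(target_path, config="auto", severity=None):
    """Run `semgrep scan` and return the list of findings from its JSON output."""
    cmd = ["semgrep", "scan", "--config", config, "--json"]
    # `--config auto` requires metrics to stay enabled, so only opt out otherwise.
    if config != "auto":
        cmd.extend(["--metrics", "off"])
    if severity:
        cmd.extend(["--severity", severity])
    cmd.append(str(target_path))

    result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)
    try:
        output = json.loads(result.stdout)
    except json.JSONDecodeError as exc:
        # No parseable report usually means the scan itself failed.
        raise RuntimeError(
            f"semgrep exited with {result.returncode}: {result.stderr.strip()}"
        ) from exc
    return output.get("results", [])


def summarize_findings(results):
    """Group findings by severity, keeping rule ID, location, and message."""
    by_severity = {"ERROR": [], "WARNING": [], "INFO": []}
    for r in results:
        extra = r.get("extra", {})
        sev = extra.get("severity", "INFO")
        # setdefault avoids a KeyError if a rule reports another severity label.
        by_severity.setdefault(sev, []).append({
            "rule": r["check_id"],
            "file": r["path"],
            "line": r["start"]["line"],
            "message": extra.get("message", ""),
        })
    return by_severity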
```
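
A common next step is to turn the findings list into a pass/fail decision for a pipeline. The sketch below is one possible gate over the `results` list returned by `run_semgrep`; `should_fail_build` and `SEVERITY_RANK` are hypothetical names, and treating unknown severity labels as blocking is an assumption chosen to fail closed.

```python
# Ordering of the three severities used in this document, lowest to highest.
SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "ERROR": 2}


def should_fail_build(results, threshold="ERROR"):
    """Return True if any finding is at or above the threshold severity."""
    limit = SEVERITY_RANK[threshold]
    highest = max(SEVERITY_RANK.values())
    for finding in results:
        severity = finding.get("extra", {}).get("severity", "INFO")
        # Assumption: an unrecognized label is treated as the highest severity.
        if SEVERITY_RANK.get(severity, highest) >= limit:
            return True
    return False


# Hand-written sample findings, trimmed to the field the gate reads.
sample_results = [
    {"check_id": "missing-timeout-requests", "extra": {"severity": "WARNING"}},
]
print(should_fail_build(sample_results))             # threshold ERROR
print(should_fail_build(sample_results, "WARNING"))  # threshold WARNING
```

In a CI script, the boolean could drive `sys.exit(1)` so the job fails only on findings that matter to the team.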

## Semgrep Registry Rule Packs

| Pack | Description |
|------|-------------|
| `p/python` | Python-specific security and correctness rules |
| `p/javascript` | JavaScript/TypeScript rules |
| `p/owasp-top-ten` | OWASP Top 10 vulnerability patterns |
| `p/security-audit` | Broad security audit rules across languages |
| `p/secrets` | Secret and credential detection |
| `p/ci` | Rules optimized for CI/CD pipelines |
| `p/docker` | Dockerfile security best practices |
| `p/terraform` | Terraform IaC security rules |

## Output Format

```json
{
  "results": [
    {
      "check_id": "python.lang.security.audit.eval-detected",
      "path": "app/views.py",
      "start": {"line": 42, "col": 5},
      "end": {"line": 42, "col": 28},
      "extra": {
        "message": "Detected eval() usage — avoid with untrusted input",
        "severity": "ERROR",
        "metadata": {
          "cwe": ["CWE-95"],
          "owasp": ["A03:2021 - Injection"]
        }
      }
    }
  ],
  "errors": [],
  "paths": {
    "scanned": ["app/views.py"]
  }
}
```
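
The report can be consumed with the standard library alone. The following sketch parses a trimmed copy of the example above and produces one `path:line:col` line per finding plus a per-rule count; `format_finding` and `count_by_rule` are hypothetical helpers, and only the `results` fields shown in the example are assumed to exist.

```python
import json
from collections import Counter

# Trimmed sample report; values mirror the example above.
SAMPLE_REPORT = """
{
  "results": [
    {
      "check_id": "python.lang.security.audit.eval-detected",
      "path": "app/views.py",
      "start": {"line": 42, "col": 5},
      "end": {"line": 42, "col": 28},
      "extra": {"message": "Detected eval() usage", "severity": "ERROR"}
    }
  ],
  "errors": []
}
"""


def format_finding(finding):
    """Render a finding as `path:line:col: SEVERITY rule-id`."""
    start = finding["start"]
    severity = finding["extra"]["severity"]
    return (
        f'{finding["path"]}:{start["line"]}:{start["col"]}: '
        f'{severity} {finding["check_id"]}'
    )


def count_by_rule(report):
    """Count findings per rule ID."""
    return Counter(f["check_id"] for f in report.get("results", []))


report = json.loads(SAMPLE_REPORT)
lines = [format_finding(f) for f in report["results"]]
rule_counts = count_by_rule(report)
for line in lines:
    print(line)
```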