# API Reference: Semgrep Custom SAST Rules
## Libraries Used
| Library | Purpose |
|---------|---------|
| `subprocess` | Execute semgrep CLI scans |
| `json` | Parse semgrep JSON output |
| `yaml` | Read and write custom Semgrep rule files |
| `pathlib` | Handle source code and rule file paths |
## Installation
```bash
# Python package
pip install semgrep
# Homebrew (macOS)
brew install semgrep
# Docker
docker pull semgrep/semgrep:latest
```
## CLI Reference
### Core Commands
```bash
# Scan with auto-detected rules
semgrep scan --config auto --json --output results.json /path/to/code
# Scan with specific rulesets from Semgrep Registry
semgrep scan --config p/python --config p/owasp-top-ten /path/to/code
# Scan with a custom rule file
semgrep scan --config my-rules.yaml /path/to/code
# Scan with multiple configs
semgrep scan --config p/security-audit --config ./custom-rules/ /path/to/code
```
### Key CLI Flags
| Flag | Description |
|------|-------------|
| `--config`, `-c` | Rule source: registry key, YAML file, or directory |
| `--json` | Output results in JSON format |
| `--sarif` | Output in SARIF format (for CI/CD integration) |
| `--output`, `-o` | Write results to file |
| `--severity` | Filter by severity: `INFO`, `WARNING`, `ERROR` |
| `--include` | Only scan files matching glob pattern |
| `--exclude` | Skip files matching glob pattern |
| `--lang`, `-l` | Language of an inline pattern passed with `-e`/`--pattern` (rule files declare `languages` themselves) |
| `--max-target-bytes` | Skip files larger than N bytes |
| `--timeout` | Maximum seconds to spend running one rule on one file (default: 5; `0` disables the limit) |
| `--jobs`, `-j` | Number of parallel jobs |
| `--verbose`, `-v` | Show detailed scan progress |
| `--metrics off` | Disable anonymous metrics |
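
The flags above combine into a single argument list. The following is a minimal sketch, assuming a hypothetical helper named `build_semgrep_cmd` (it is not part of Semgrep); it only assembles the list and never runs the CLI.

```python
def build_semgrep_cmd(target, configs, severities=(), include=(), exclude=(),
                      output=None, jobs=None, metrics_off=True):
    """Build a `semgrep scan` argument list (hypothetical helper, runs nothing)."""
    cmd = ["semgrep", "scan", "--json"]
    # --config, --severity, --include and --exclude may each be repeated.
    for flag, values in (("--config", configs), ("--severity", severities),
                         ("--include", include), ("--exclude", exclude)):
        for value in values:
            cmd += [flag, value]
    if output is not None:
        cmd += ["--output", str(output)]
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]
    # `--config auto` relies on metrics, so leave them on in that case.
    if metrics_off and "auto" not in configs:
        cmd += ["--metrics", "off"]
    cmd.append(str(target))
    return cmd


print(build_semgrep_cmd("src/", ["p/python", "./custom-rules/"], severities=["ERROR"]))
```

Passing the result to `subprocess.run` as a list avoids `shell=True` and the quoting problems that come with it.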
## Custom Rule Syntax
### Basic Pattern Rule
```yaml
rules:
  - id: hardcoded-password
    pattern: password = "..."
    message: "Hardcoded password detected; use environment variables"
    languages: [python]
    severity: ERROR
    metadata:
      cwe: ["CWE-798: Use of Hard-coded Credentials"]
      owasp: ["A07:2021 - Identification and Authentication Failures"]
```
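
Before handing a rule to Semgrep, it can help to check that the fields used in the rule above are present. The following is a rough sketch, assuming a hypothetical `rule_problems` helper and a rule that has already been loaded into a Python dict. It covers search-mode rules only and is not a substitute for `semgrep --validate`.

```python
REQUIRED_FIELDS = ("id", "message", "languages", "severity")
# Search-mode rules need at least one of these top-level pattern keys.
PATTERN_KEYS = ("pattern", "patterns", "pattern-either", "pattern-regex")


def rule_problems(rule):
    """Return a list of human-readable problems found in one rule dict."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in rule]
    if not any(key in rule for key in PATTERN_KEYS):
        problems.append("missing pattern key: one of " + ", ".join(PATTERN_KEYS))
    return problems


rule = {
    "id": "hardcoded-password",
    "pattern": 'password = "..."',
    "message": "Hardcoded password detected; use environment variables",
    "languages": ["python"],
    "severity": "ERROR",
}
print(rule_problems(rule))
```

An empty list means the basic shape is right; it says nothing about whether the pattern itself parses.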
### Pattern Operators
```yaml
rules:
  - id: sql-injection-format-string
    patterns:
      - pattern: |
          cursor.execute($QUERY % ...)
      - pattern-not: |
          cursor.execute("..." % ())
    message: "SQL injection via string formatting; use parameterized queries"
    languages: [python]
    severity: ERROR
  - id: unsafe-deserialization
    pattern-either:
      - pattern: pickle.loads(...)
      - pattern: pickle.load(...)
      - pattern: yaml.load(..., Loader=yaml.Loader)
      - pattern: yaml.unsafe_load(...)
    message: "Unsafe deserialization; may allow remote code execution"
    languages: [python]
    severity: ERROR
  - id: missing-timeout-requests
    patterns:
      - pattern: requests.$METHOD(...)
      - pattern-not: requests.$METHOD(..., timeout=..., ...)
    message: "HTTP request without timeout; may hang indefinitely"
    languages: [python]
    severity: WARNING
```
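
Rules like these are easier to trust when paired with a test fixture. The sketch below follows Semgrep's rule-testing annotations: `# ruleid: <id>` goes on the line before code that should match, and `# ok: <id>` on the line before code that should not. The function names are illustrative only. The fixture is assumed to sit next to the rule file with the same base name, so that `semgrep --test` can pair them.

```python
import json
import pickle


def load_untrusted(blob: bytes):
    # ruleid: unsafe-deserialization
    return pickle.loads(blob)


def load_safe(text: str):
    # ok: unsafe-deserialization
    return json.loads(text)
```

The first function should be reported by the `unsafe-deserialization` rule through its `pickle.loads(...)` branch; the second should not, since none of the `pattern-either` alternatives mention `json`.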
### Metavariable Patterns
```yaml
rules:
  - id: eval-user-input
    patterns:
      - pattern: |
          $INPUT = request.$METHOD(...)
          ...
          eval($INPUT)
    message: "User input passed to eval(); code injection risk"
    languages: [python]
    severity: ERROR
```
## Python Integration
```python
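import json
import subprocess


def run_semgrep(target_path, config="auto", severity=None):
    """Run a Semgrep scan and return the list of findings."""
    cmd = ["semgrep", "scan", "--config", config, "--json"]
    # `--config auto` requires metrics, so only opt out for explicit configs.
    if config != "auto":
        cmd.extend(["--metrics", "off"])
    if severity:
        cmd.extend(["--severity", severity])
    cmd.append(str(target_path))
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)
    # A fatal error can leave stdout empty; surface stderr instead of a JSON error.
    if not result.stdout.strip():
        raise RuntimeError(f"semgrep produced no JSON output: {result.stderr.strip()}")
    output = json.loads(result.stdout)
    return output.get("results", [])


def summarize_findings(results):
    """Group findings by severity."""
    by_severity = {"ERROR": [], "WARNING": [], "INFO": []}
    for r in results:
        sev = r.get("extra", {}).get("severity", "INFO")
        # setdefault tolerates severities other than the three listed above.
        by_severity.setdefault(sev, []).append({
            "rule": r["check_id"],
            "file": r["path"],
            "line": r["start"]["line"],
            "message": r["extra"]["message"],
        })
    return by_severity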
```
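
A common next step is to turn the findings into a pass/fail decision for a pipeline. The following is a minimal sketch, assuming a hypothetical `should_fail` helper and the three severities listed in the flag table. Treating unknown severities as `ERROR` is a conservative choice of this example, not Semgrep behavior.

```python
SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "ERROR": 2}


def should_fail(results, threshold="ERROR"):
    """Return True if any finding is at or above `threshold`."""
    limit = SEVERITY_RANK[threshold]
    for finding in results:
        severity = finding.get("extra", {}).get("severity", "INFO")
        # Assumption: severities outside the three above are treated as ERROR.
        if SEVERITY_RANK.get(severity, SEVERITY_RANK["ERROR"]) >= limit:
            return True
    return False
```

In a CI script this could be wired up as `sys.exit(1 if should_fail(run_semgrep("src/")) else 0)`.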
## Semgrep Registry Rule Packs
| Pack | Description |
|------|-------------|
| `p/python` | Python-specific security and correctness rules |
| `p/javascript` | JavaScript/TypeScript rules |
| `p/owasp-top-ten` | OWASP Top 10 vulnerability patterns |
| `p/security-audit` | Broad security audit rules across languages |
| `p/secrets` | Secret and credential detection |
| `p/ci` | Rules optimized for CI/CD pipelines |
| `p/docker` | Dockerfile security best practices |
| `p/terraform` | Terraform IaC security rules |
## Output Format
```json
{
  "results": [
    {
      "check_id": "python.lang.security.audit.eval-detected",
      "path": "app/views.py",
      "start": {"line": 42, "col": 5},
      "end": {"line": 42, "col": 28},
      "extra": {
        "message": "Detected eval() usage; avoid with untrusted input",
        "severity": "ERROR",
        "metadata": {
          "cwe": ["CWE-95"],
          "owasp": ["A03:2021 - Injection"]
        }
      }
    }
  ],
  "errors": [],
  "stats": {
    "findings": 3,
    "errors": 0,
    "total_time": 2.45
  }
}
```
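
The structure above maps naturally onto a small typed record. The following sketch assumes only the fields shown in the sample output; the `Finding` class and both helper functions are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str
    path: str
    line: int
    severity: str
    cwe: tuple


def parse_findings(raw_json):
    """Convert Semgrep JSON text into a list of Finding records."""
    findings = []
    for item in json.loads(raw_json).get("results", []):
        extra = item.get("extra", {})
        findings.append(Finding(
            rule=item["check_id"],
            path=item["path"],
            line=item["start"]["line"],
            severity=extra.get("severity", "INFO"),
            cwe=tuple(extra.get("metadata", {}).get("cwe", [])),
        ))
    return findings


def format_finding(finding):
    """Render one finding in a compiler-style `path:line:` form."""
    return f"{finding.path}:{finding.line}: [{finding.severity}] {finding.rule}"


SAMPLE = """
{"results": [{"check_id": "python.lang.security.audit.eval-detected",
              "path": "app/views.py",
              "start": {"line": 42, "col": 5},
              "end": {"line": 42, "col": 28},
              "extra": {"message": "Detected eval() usage",
                        "severity": "ERROR",
                        "metadata": {"cwe": ["CWE-95"]}}}],
 "errors": []}
"""

for finding in parse_findings(SAMPLE):
    print(format_finding(finding))
```

The `path:line:` prefix is a format many editors and CI log viewers turn into clickable links.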