Add 30 new production-grade cybersecurity skills: AI security, supply chain, firmware, cloud-native, compliance, deception, crypto, threat hunting, purple team, OT, privacy

mukul975
2026-03-19 19:14:23 +01:00
parent d43cc7a766
commit d833f0eab9
125 changed files with 47874 additions and 334 deletions
@@ -0,0 +1,314 @@
# API Reference: GDPR DSAR Workflow Automation
## PIIPatternMatcher
Scans text for PII using compiled regex patterns with confidence scoring and contextual boosting.
### Constructor
```python
PIIPatternMatcher(custom_patterns=None)
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `custom_patterns` | `dict` or `None` | Additional regex patterns to include in scanning |
### Methods
#### `scan_text(text, min_confidence=0.5)`
Scan a string for PII matches.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `text` | `str` | required | Text to scan for PII |
| `min_confidence` | `float` | `0.5` | Minimum confidence threshold (0.0-1.0) |
**Returns:** `list[dict]` -- Each match contains `type`, `value`, `description`, `confidence`, `gdpr_category`, `position`.
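The following is a minimal, self-contained sketch of how regex scanning with confidence scoring and contextual boosting could work. It is not the module's implementation: the two-pattern `PATTERNS` subset, the `context` keyword lists, the `window` and `boost` parameters, and the use of a start offset for `position` are all assumptions made for illustration.

```python
import re

# Hypothetical two-pattern subset with assumed regexes and context keywords.
PATTERNS = {
    "email": {
        "regex": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
        "description": "Email address",
        "confidence": 0.95,
        "gdpr_category": "contact_information",
        "context": ("email", "contact"),
    },
    "ipv4": {
        "regex": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
        "description": "IPv4 address",
        "confidence": 0.60,
        "gdpr_category": "online_identifier",
        "context": ("address", "host"),
    },
}


def scan_text(text, min_confidence=0.5, window=30, boost=0.1):
    """Return PII matches whose (possibly boosted) confidence meets the threshold."""
    matches = []
    for name, spec in PATTERNS.items():
        for m in spec["regex"].finditer(text):
            confidence = spec["confidence"]
            # Contextual boost: a keyword shortly before the match raises confidence.
            nearby = text[max(0, m.start() - window):m.start()].lower()
            if any(word in nearby for word in spec["context"]):
                confidence = min(1.0, round(confidence + boost, 2))
            if confidence >= min_confidence:
                matches.append({
                    "type": name,
                    "value": m.group(0),
                    "description": spec["description"],
                    "confidence": confidence,
                    "gdpr_category": spec["gdpr_category"],
                    "position": m.start(),  # assumed: character offset of the match
                })
    return sorted(matches, key=lambda r: r["position"])


results = scan_text("Contact jane@example.com from host 10.0.0.5")
```

Raising `min_confidence` trades recall for precision: an unboosted `ipv4` hit at 0.60 is dropped at a threshold of 0.9, while an email match survives.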
#### `scan_file(file_path, min_confidence=0.5)`
Scan a file on disk for PII matches.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `file_path` | `str` | required | Absolute path to the file |
| `min_confidence` | `float` | `0.5` | Minimum confidence threshold |
**Returns:** `dict` with `file`, `size_bytes`, `matches`, `match_count`, `pii_types_found`.
### Built-in PII Patterns
| Pattern Name | Description | Confidence | GDPR Category |
|-------------|-------------|------------|---------------|
| `email` | Email address | 0.95 | contact_information |
| `phone_international` | International phone number | 0.70 | contact_information |
| `uk_phone` | UK phone number | 0.80 | contact_information |
| `ssn_us` | US Social Security Number | 0.85 | government_id |
| `nino_uk` | UK National Insurance Number | 0.90 | government_id |
| `credit_card` | Credit/debit card number | 0.85 | financial_data |
| `iban` | International Bank Account Number | 0.80 | financial_data |
| `ipv4` | IPv4 address | 0.60 | online_identifier |
| `date_of_birth` | Date of birth (DD/MM/YYYY) | 0.65 | demographic_data |
| `uk_postcode` | UK postcode | 0.75 | location_data |
| `passport_uk` | UK passport number (9 digits) | 0.40 | government_id |
| `eu_vat` | EU VAT number | 0.50 | financial_data |
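As a worked example of how the threshold interacts with the base confidences in the table above, the sketch below lists which pattern names would be suppressed at a given `min_confidence`. It assumes the comparison is inclusive (`>=`) and ignores contextual boosting; the helper names are hypothetical.

```python
# Base confidences copied from the Built-in PII Patterns table.
BUILTIN_CONFIDENCE = {
    "email": 0.95,
    "phone_international": 0.70,
    "uk_phone": 0.80,
    "ssn_us": 0.85,
    "nino_uk": 0.90,
    "credit_card": 0.85,
    "iban": 0.80,
    "ipv4": 0.60,
    "date_of_birth": 0.65,
    "uk_postcode": 0.75,
    "passport_uk": 0.40,
    "eu_vat": 0.50,
}


def patterns_reported(min_confidence=0.5):
    """Pattern names whose base confidence meets the threshold (assumed inclusive)."""
    return sorted(n for n, c in BUILTIN_CONFIDENCE.items() if c >= min_confidence)


def patterns_suppressed(min_confidence=0.5):
    """Pattern names whose base confidence falls below the threshold."""
    return sorted(n for n, c in BUILTIN_CONFIDENCE.items() if c < min_confidence)


suppressed_by_default = patterns_suppressed()
```

Under these assumptions, only `passport_uk` falls below the default threshold of 0.5, so a caller who needs bare 9-digit passport candidates would have to pass a lower `min_confidence`.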
---
## PIIDiscoveryEngine
Discovers PII across structured (database) and unstructured (files) data sources.
### Constructor
```python
PIIDiscoveryEngine(custom_patterns=None)
```
### Methods
#### `scan_database(connection_string, search_identifiers, tables=None)`
Generate parameterized SQL queries for PII discovery in databases.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `connection_string` | `str` | required | Database connection string (redacted in output) |
| `search_identifiers` | `dict` | required | Key-value pairs to search for (e.g., `{"email": "user@example.com"}`) |
| `tables` | `list[str]` or `None` | auto | Tables to scan; defaults to common tables |
**Returns:** `dict` with `source_type`, `connection`, `tables_scanned`, `queries_generated`, `queries`.
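A rough sketch of parameterized query generation with connection-string redaction is shown below. The default table list, the `%s` placeholder style (as used by psycopg-style drivers), the identifier whitelist, and the function names are assumptions rather than the module's actual behavior.

```python
import re

DEFAULT_TABLES = ["users", "customers", "orders"]  # hypothetical defaults
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def redact_connection(connection_string):
    """Mask the password component of a URL-style connection string."""
    return re.sub(r"(://[^:/@]+):[^@]*@", r"\1:***@", connection_string)


def build_queries(search_identifiers, tables=None):
    """Build one parameterized SELECT per (table, column) pair.

    Values travel separately in `params`; only validated identifiers are
    interpolated into the SQL text.
    """
    queries = []
    for table in tables or DEFAULT_TABLES:
        if not _IDENTIFIER.match(table):
            raise ValueError(f"unsafe table name: {table!r}")
        for column, value in search_identifiers.items():
            if not _IDENTIFIER.match(column):
                raise ValueError(f"unsafe column name: {column!r}")
            queries.append({
                "table": table,
                "sql": f'SELECT * FROM "{table}" WHERE "{column}" = %s',
                "params": (value,),
            })
    return queries


queries = build_queries({"email": "user@example.com"}, tables=["users"])
connection = redact_connection("postgresql://user:pass@localhost/appdb")
```

Keeping the searched value out of the SQL string is what makes the query safe to log alongside the redacted connection string.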
#### `scan_files(directories, search_identifiers, file_extensions=None, max_file_size_mb=50)`
Scan files in directories for PII matching identifiers.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `directories` | `list[str]` | required | Directory paths to scan |
| `search_identifiers` | `dict` | required | Identifiers to search for |
| `file_extensions` | `list[str]` or `None` | common types | File extensions to include |
| `max_file_size_mb` | `int` | `50` | Skip files larger than this |
**Returns:** `dict` with `files_scanned`, `files_with_matches`, `matches`, `raw_text_matches`.
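The directory walk described above can be sketched as follows. This is an illustrative stand-in, not the module's code: the default extension list and the per-match fields (`file`, `line`, `identifier`) are assumptions, and the `raw_text_matches` key is omitted for brevity.

```python
import os


def scan_files(directories, search_identifiers, file_extensions=None, max_file_size_mb=50):
    """Walk directories and report lines containing any searched identifier value."""
    extensions = tuple(file_extensions or [".txt", ".csv", ".json", ".log"])  # assumed defaults
    size_limit = max_file_size_mb * 1024 * 1024
    needles = [str(value).lower() for value in search_identifiers.values()]
    result = {"files_scanned": 0, "files_with_matches": 0, "matches": []}

    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in sorted(files):
                path = os.path.join(root, name)
                if not name.lower().endswith(extensions):
                    continue  # extension not in the include list
                if os.path.getsize(path) > size_limit:
                    continue  # skip files larger than the limit
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    hits = [
                        {"file": path, "line": number, "identifier": needle}
                        for number, line in enumerate(handle, start=1)
                        for needle in needles
                        if needle in line.lower()
                    ]
                result["files_scanned"] += 1
                if hits:
                    result["files_with_matches"] += 1
                    result["matches"].extend(hits)
    return result
```

Reading with `errors="replace"` keeps the scan going when a file contains bytes that are not valid UTF-8, at the cost of possibly missing identifiers stored in other encodings.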
#### `scan_with_ner(text_corpus, entity_types=None, confidence_threshold=0.7)`
Scan text using Named Entity Recognition (spaCy NER with regex fallback).
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `text_corpus` | `list[str]` | required | List of file paths to scan |
| `entity_types` | `list[str]` or `None` | common types | NER entity types to detect |
| `confidence_threshold` | `float` | `0.7` | Minimum confidence for results |
**Supported Entity Types:** `PERSON`, `EMAIL`, `PHONE_NUMBER`, `LOCATION`, `DATE_OF_BIRTH`, `ORG`, `GPE`
**Returns:** `dict` with `files_processed`, `total_entities`, `results`, `model_used`.
#### `consolidate_results(*result_sets)`
Merge results from database, file, and NER scans into a unified record set.
**Returns:** `dict` with `total_records`, `source_count`, `sources`, `records`.
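One plausible way to merge result sets is sketched below. The input shape (a `source_type` string plus a `records` list per result set) and the de-duplication on `(type, value)` are assumptions; the real method may consume the scan outputs differently.

```python
def consolidate_results(*result_sets):
    """Merge scan results into one record list, tagging each record with its source."""
    records, sources, seen = [], [], set()
    for result in result_sets:
        source = result.get("source_type", "unknown")
        if source not in sources:
            sources.append(source)
        for record in result.get("records", []):
            key = (record.get("type"), record.get("value"))
            if key in seen:
                continue  # assumed: the first source to report a value wins
            seen.add(key)
            records.append({**record, "source": source})
    return {
        "total_records": len(records),
        "source_count": len(sources),
        "sources": sources,
        "records": records,
    }


db_result = {"source_type": "database",
             "records": [{"type": "email", "value": "a@example.com"}]}
file_result = {"source_type": "files",
               "records": [{"type": "email", "value": "a@example.com"},
                           {"type": "ipv4", "value": "10.0.0.5"}]}
merged = consolidate_results(db_result, file_result)
```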
#### `full_scan(search_identifiers, sources=None, db_connection="", directories=None)`
Run a complete PII discovery scan across all source types.
**Returns:** Consolidated `dict` from all scans.
---
## DataMapper
Maps discovered PII to GDPR Article 15 disclosure categories.
### Constructor
```python
DataMapper(data_inventory_path=None)
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `data_inventory_path` | `str` or `None` | Path to JSON data inventory for overrides |
### Methods
#### `map_to_article15(pii_records, data_subject_id)`
Map PII records to Article 15 required categories including processing purposes, legal basis, retention periods, and recipients.
**Returns:** `dict` with `categories`, `supplementary_info`, `article_15_reference`.
### Article 15 Categories Mapped
| Category | Article Reference | Contents |
|----------|-------------------|----------|
| Processing Purposes | Art. 15(1)(a) | Why data is processed |
| Data Categories | Art. 15(1)(b) | Types of personal data |
| Recipients | Art. 15(1)(c) | Who receives the data |
| Retention Period | Art. 15(1)(d) | How long data is kept |
| Data Subject Rights | Art. 15(1)(e)-(f) | Rights to rectification, erasure, restriction, and objection; right to lodge a complaint with a supervisory authority |
| Data Source | Art. 15(1)(g) | Where data was collected from |
| Automated Decisions | Art. 15(1)(h) | Profiling and automated decision-making |
| International Transfers | Art. 15(2) | Safeguards for cross-border transfers |
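To make the output shape concrete, here is a hedged sketch that groups discovered records by `gdpr_category` and attaches the article references from the table above. The snake_case keys, the `None` placeholders in `supplementary_info`, and the `data_subject_id` field in the result are assumptions; the actual mapper presumably fills purposes, legal basis, retention periods, and recipients from the data inventory.

```python
from collections import defaultdict

# Article references taken from the table above; key names are hypothetical.
ARTICLE_15_REFERENCES = {
    "processing_purposes": "Art. 15(1)(a)",
    "data_categories": "Art. 15(1)(b)",
    "recipients": "Art. 15(1)(c)",
    "retention_period": "Art. 15(1)(d)",
    "data_subject_rights": "Art. 15(1)(e)-(f)",
    "data_source": "Art. 15(1)(g)",
    "automated_decisions": "Art. 15(1)(h)",
    "international_transfers": "Art. 15(2)",
}


def map_to_article15(pii_records, data_subject_id):
    """Group PII values by GDPR category and attach the Article 15 references."""
    by_category = defaultdict(list)
    for record in pii_records:
        by_category[record.get("gdpr_category", "uncategorised")].append(record["value"])
    return {
        "data_subject_id": data_subject_id,
        "categories": {name: sorted(set(values)) for name, values in by_category.items()},
        # Placeholders to be completed from a data inventory (not modeled here).
        "supplementary_info": {
            key: None for key in ARTICLE_15_REFERENCES if key != "data_categories"
        },
        "article_15_reference": ARTICLE_15_REFERENCES,
    }


mapped = map_to_article15(
    [
        {"value": "a@example.com", "gdpr_category": "contact_information"},
        {"value": "a@example.com", "gdpr_category": "contact_information"},
        {"value": "10.0.0.5", "gdpr_category": "online_identifier"},
    ],
    "DS-1",
)
```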
---
## ExemptionReviewer
Reviews DSAR data against applicable GDPR/UK GDPR exemptions.
### Methods
#### `review_exemptions(mapped_data, exemption_checks=None)`
Flag applicable exemptions for DPO review.
**Returns:** `dict` with `exemption_count`, `exemptions`, `review_status`.
#### `apply_redactions(mapped_data, approved_exemptions)`
Apply approved exemption redactions to the mapped data.
**Returns:** Redacted `dict` with `redaction_log`.
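The difference between the `redact` and `withhold` actions can be illustrated with the sketch below. The shape of `mapped_data` (a `categories` dict of value lists) and of each approved exemption (`type`, `category`, `action`) is assumed for the example and may not match the real structures.

```python
import copy


def apply_redactions(mapped_data, approved_exemptions):
    """Return a redacted copy of mapped_data plus a log of what was changed."""
    redacted = copy.deepcopy(mapped_data)  # never mutate the caller's data
    log = []
    for exemption in approved_exemptions:
        category = exemption["category"]
        if category not in redacted["categories"]:
            continue
        if exemption["action"] == "withhold":
            # Withhold: drop the whole category from the disclosure.
            del redacted["categories"][category]
        else:
            # Redact: keep the category but mask each value.
            count = len(redacted["categories"][category])
            redacted["categories"][category] = ["[REDACTED]"] * count
        log.append({
            "category": category,
            "type": exemption["type"],
            "action": exemption["action"],
        })
    redacted["redaction_log"] = log
    return redacted


original = {"categories": {"contact_information": ["a@example.com", "b@example.com"],
                           "legal_advice": ["memo text"]}}
approved = [
    {"type": "third_party_data", "category": "contact_information", "action": "redact"},
    {"type": "legal_professional_privilege", "category": "legal_advice", "action": "withhold"},
]
redacted = apply_redactions(original, approved)
```

Working on a deep copy leaves the unredacted mapping available for the audit trail and for DPO review.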
### Supported Exemption Types
| Type | Legal Basis | Action |
|------|-------------|--------|
| `third_party_data` | Art. 15(4) / DPA 2018 Sch. 2 Para 16 | redact |
| `legal_professional_privilege` | DPA 2018 Sch. 2 Para 19 | withhold |
| `trade_secrets` | Recital 63 GDPR | redact |
| `crime_prevention` | DPA 2018 Sch. 2 Para 2 | withhold |
| `management_forecasting` | DPA 2018 Sch. 2 Para 22 | withhold |
| `negotiations` | DPA 2018 Sch. 2 Para 23 | withhold |
| `regulatory_function` | DPA 2018 Sch. 2 Paras 7-11 | withhold |
---
## DSARResponseGenerator
Generates compliant DSAR response packages per GDPR Article 15.
### Constructor
```python
DSARResponseGenerator(template_dir=None, organization_name="Organization",
                      dpo_email="dpo@organization.com", controller_name="Data Protection Officer")
```
### Methods
#### `generate_response(dsar_id, data_subject, mapped_data, format="json", request_date=None)`
Generate a complete response package with cover letter, data export, supplementary info, and audit metadata.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `dsar_id` | `str` | required | DSAR reference ID |
| `data_subject` | `str` | required | Name of the data subject |
| `mapped_data` | `dict` | required | Output from DataMapper/ExemptionReviewer |
| `format` | `str` | `"json"` | Export format: `json` or `csv` |
| `request_date` | `str` or `None` | today | Date the request was received |
**Returns:** `dict` with `documents` list containing filename, type, and content for each document.
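The `format` switch can be sketched with the standard library alone. The helper name `export_records` and the flat list-of-dicts input are assumptions; the real generator also produces a cover letter, supplementary information, and audit metadata, which are not modeled here.

```python
import csv
import io
import json


def export_records(records, format="json"):
    """Serialize a list of flat record dicts as JSON or CSV text."""
    if format == "json":
        return json.dumps(records, indent=2, sort_keys=True)
    if format == "csv":
        # Union of all keys, sorted for a stable column order.
        fieldnames = sorted({key for record in records for key in record})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)  # keys missing from a record become empty cells
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {format!r}")


records = [{"type": "email", "value": "a@example.com"}]
csv_text = export_records(records, format="csv")
json_text = export_records(records, format="json")
```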
#### `save_response_package(response, output_dir)`
Save all response documents to disk.
**Returns:** `list[str]` of saved file paths.
---
## DSARWorkflowEngine
Manages the complete DSAR lifecycle: intake, tracking, deadlines, and compliance.
### Constructor
```python
DSARWorkflowEngine(config_path=None)
```
### Methods
#### `register_dsar(requester_name, requester_email, request_channel, request_text, identity_docs=None)`
Register a new DSAR and start the 30-day compliance clock, which approximates the one-month response period of Art. 12(3).
**Returns:** `dict` with `dsar_id`, `deadline`, `status`, `identity_verified`.
#### `update_status(dsar_id, new_status, notes="")`
Update DSAR processing status.
**Valid Statuses:** `received`, `identity_verification`, `verification_failed`, `in_progress`, `pii_discovery`, `exemption_review`, `dpo_review`, `response_generation`, `response_sent`, `closed`, `refused`.
#### `apply_extension(dsar_id, reason)`
Apply a 2-month extension for complex requests per Art. 12(3).
#### `pause_clock(dsar_id, reason)`
Pause the response clock (e.g., awaiting identity verification).
#### `days_remaining(dsar_id)`
Calculate remaining days until DSAR deadline. **Returns:** `int`.
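The deadline arithmetic behind the methods above might look like the sketch below. It assumes the base clock is exactly 30 days from receipt, the extension adds two calendar months to the current deadline, and a pause pushes the deadline back by the paused duration; the function names are hypothetical and the engine may compute these differently.

```python
import calendar
from datetime import date, timedelta


def add_months(start, months):
    """Add calendar months, clamping the day to the target month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def initial_deadline(received_on):
    """30-day clock starting on the day the request is received (assumed)."""
    return received_on + timedelta(days=30)


def extended_deadline(deadline):
    """Two further months for complex requests, per Art. 12(3)."""
    return add_months(deadline, 2)


def resume_after_pause(deadline, paused_on, resumed_on):
    """Shift the deadline by however long the clock was paused."""
    return deadline + (resumed_on - paused_on)


def days_remaining(deadline, today):
    """Negative values indicate an overdue DSAR."""
    return (deadline - today).days


received = date(2026, 3, 19)
deadline = initial_deadline(received)     # 2026-04-18
extended = extended_deadline(deadline)    # 2026-06-18
```

Passing `today` explicitly instead of calling `date.today()` inside the helper keeps the calculation deterministic and easy to test.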
#### `get_overdue_dsars()`
Get all DSARs past their deadline. **Returns:** `list[dict]`.
#### `generate_dashboard()`
Generate a DSAR processing dashboard summary. **Returns:** `dict` with status breakdown and overdue info.
---
## DSARAuditLogger
Maintains JSONL audit trails for the DSAR processing lifecycle.
### Constructor
```python
DSARAuditLogger(log_path="dsar_audit_logs")
```
### Methods
#### `log_event(dsar_id, event_type, details=None)`
Log a DSAR processing event to the JSONL audit file.
#### `get_audit_trail(dsar_id)`
Retrieve the complete audit trail. **Returns:** `list[dict]`.
#### `generate_compliance_report(dsar_id)`
Generate a compliance report with pass/fail checks for all processing steps.
**Returns:** `dict` with `compliance_checks`, `timeline`, `overall_compliance` (`COMPLIANT` or `REVIEW_REQUIRED`).
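A JSONL audit trail is an append-only file with one JSON object per line. The sketch below shows that pattern with standalone functions; the entry fields (`timestamp`, `dsar_id`, `event_type`, `details`) and the explicit `log_file` argument are assumptions, since the logger's on-disk layout under `log_path` is not documented here.

```python
import json
import os
from datetime import datetime, timezone


def log_event(log_file, dsar_id, event_type, details=None):
    """Append one event as a single JSON line."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dsar_id": dsar_id,
        "event_type": event_type,
        "details": details or {},
    }
    with open(log_file, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def get_audit_trail(log_file, dsar_id):
    """Read the log back and keep only the entries for one DSAR, in write order."""
    if not os.path.exists(log_file):
        return []
    with open(log_file, encoding="utf-8") as handle:
        entries = [json.loads(line) for line in handle if line.strip()]
    return [entry for entry in entries if entry["dsar_id"] == dsar_id]
```

Appending rather than rewriting means earlier entries are never modified by later writes, which suits an audit record.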
---
## CLI Usage
```bash
# Full automated pipeline
python agent.py --action full_pipeline \
    --requester-name "Jane Smith" \
    --requester-email "jane.smith@example.com" \
    --scan-dirs /var/log/app /data/exports \
    --db-connection "postgresql://user:pass@localhost/appdb" \
    --output-dir dsar_output \
    --format json

# Scan text for PII
python agent.py --action scan_pii \
    --scan-text "Contact jane@example.com or call +44 20 7946 0958"

# Scan files only
python agent.py --action scan_files \
    --scan-dirs /data/exports /var/log \
    --requester-email "jane@example.com"

# Generate dashboard
python agent.py --action dashboard
```
### CLI Arguments
| Argument | Default | Description |
|----------|---------|-------------|
| `--action` | `full_pipeline` | Action to perform |
| `--requester-name` | `Test Subject` | Data subject name |
| `--requester-email` | `test@example.com` | Data subject email |
| `--request-channel` | `email` | Request channel |
| `--scan-dirs` | `[]` | Directories to scan |
| `--db-connection` | `""` | Database connection string |
| `--output-dir` | `dsar_output` | Output directory |
| `--config` | `dsar_config.json` | Configuration file path |
| `--format` | `json` | Output format (`json` or `csv`) |
| `--min-confidence` | `0.5` | Minimum PII confidence threshold |
| `--scan-text` | `""` | Direct text to scan for PII |
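For readers wiring these options into their own tooling, the table above maps onto `argparse` roughly as follows. This is a reconstruction from the documented names and defaults, not the parser in `agent.py`; in particular, `nargs="*"` for `--scan-dirs` and the `float` type for `--min-confidence` are inferred from the usage examples.

```python
import argparse


def build_parser():
    """Argument parser mirroring the documented CLI options and defaults."""
    parser = argparse.ArgumentParser(description="GDPR DSAR workflow automation (sketch)")
    parser.add_argument("--action", default="full_pipeline", help="Action to perform")
    parser.add_argument("--requester-name", default="Test Subject")
    parser.add_argument("--requester-email", default="test@example.com")
    parser.add_argument("--request-channel", default="email")
    # Accepts zero or more directories, as in `--scan-dirs /data/exports /var/log`.
    parser.add_argument("--scan-dirs", nargs="*", default=[])
    parser.add_argument("--db-connection", default="")
    parser.add_argument("--output-dir", default="dsar_output")
    parser.add_argument("--config", default="dsar_config.json")
    parser.add_argument("--format", choices=["json", "csv"], default="json")
    parser.add_argument("--min-confidence", type=float, default=0.5)
    parser.add_argument("--scan-text", default="")
    return parser


args = build_parser().parse_args(
    ["--action", "scan_files", "--scan-dirs", "/data/exports", "/var/log"]
)
```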