mirror of
https://github.com/mukul975/Anthropic-Cybersecurity-Skills.git
synced 2026-06-15 23:44:56 +03:00
Add 30 new production-grade cybersecurity skills: AI security, supply chain, firmware, cloud-native, compliance, deception, crypto, threat hunting, purple team, OT, privacy
@@ -0,0 +1,201 @@
|
||||
|
||||
Apache License
|
||||
Version 2.0, January 2004
|
||||
http://www.apache.org/licenses/
|
||||
|
||||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
||||
|
||||
1. Definitions.
|
||||
|
||||
"License" shall mean the terms and conditions for use, reproduction,
|
||||
and distribution as defined by Sections 1 through 9 of this document.
|
||||
|
||||
"Licensor" shall mean the copyright owner or entity authorized by
|
||||
the copyright owner that is granting the License.
|
||||
|
||||
"Legal Entity" shall mean the union of the acting entity and all
|
||||
other entities that control, are controlled by, or are under common
|
||||
control with that entity. For the purposes of this definition,
|
||||
"control" means (i) the power, direct or indirect, to cause the
|
||||
direction or management of such entity, whether by contract or
|
||||
otherwise, or (ii) ownership of fifty percent (50%) or more of the
|
||||
outstanding shares, or (iii) beneficial ownership of such entity.
|
||||
|
||||
"You" (or "Your") shall mean an individual or Legal Entity
|
||||
exercising permissions granted by this License.
|
||||
|
||||
"Source" form shall mean the preferred form for making modifications,
|
||||
including but not limited to software source code, documentation
|
||||
source, and configuration files.
|
||||
|
||||
"Object" form shall mean any form resulting from mechanical
|
||||
transformation or translation of a Source form, including but
|
||||
not limited to compiled object code, generated documentation,
|
||||
and conversions to other media types.
|
||||
|
||||
"Work" shall mean the work of authorship, whether in Source or
|
||||
Object form, made available under the License, as indicated by a
|
||||
copyright notice that is included in or attached to the work
|
||||
(an example is provided in the Appendix below).
|
||||
|
||||
"Derivative Works" shall mean any work, whether in Source or Object
|
||||
form, that is based on (or derived from) the Work and for which the
|
||||
editorial revisions, annotations, elaborations, or other modifications
|
||||
represent, as a whole, an original work of authorship. For the purposes
|
||||
of this License, Derivative Works shall not include works that remain
|
||||
separable from, or merely link (or bind by name) to the interfaces of,
|
||||
the Work and Derivative Works thereof.
|
||||
|
||||
"Contribution" shall mean any work of authorship, including
|
||||
the original version of the Work and any modifications or additions
|
||||
to that Work or Derivative Works thereof, that is intentionally
|
||||
submitted to the Licensor for inclusion in the Work by the copyright owner
|
||||
or by an individual or Legal Entity authorized to submit on behalf of
|
||||
the copyright owner. For the purposes of this definition, "submitted"
|
||||
means any form of electronic, verbal, or written communication sent
|
||||
to the Licensor or its representatives, including but not limited to
|
||||
communication on electronic mailing lists, source code control systems,
|
||||
and issue tracking systems that are managed by, or on behalf of, the
|
||||
Licensor for the purpose of discussing and improving the Work, but
|
||||
excluding communication that is conspicuously marked or otherwise
|
||||
designated in writing by the copyright owner as "Not a Contribution."
|
||||
|
||||
"Contributor" shall mean Licensor and any individual or Legal Entity
|
||||
on behalf of whom a Contribution has been received by the Licensor and
|
||||
subsequently incorporated within the Work.
|
||||
|
||||
2. Grant of Copyright License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
copyright license to reproduce, prepare Derivative Works of,
|
||||
publicly display, publicly perform, sublicense, and distribute the
|
||||
Work and such Derivative Works in Source or Object form.
|
||||
|
||||
3. Grant of Patent License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
(except as stated in this section) patent license to make, have made,
|
||||
use, offer to sell, sell, import, and otherwise transfer the Work,
|
||||
where such license applies only to those patent claims licensable
|
||||
by such Contributor that are necessarily infringed by their
|
||||
Contribution(s) alone or by combination of their Contribution(s)
|
||||
with the Work to which such Contribution(s) was submitted. If You
|
||||
institute patent litigation against any entity (including a
|
||||
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
||||
or a Contribution incorporated within the Work constitutes direct
|
||||
or contributory patent infringement, then any patent licenses
|
||||
granted to You under this License for that Work shall terminate
|
||||
as of the date such litigation is filed.
|
||||
|
||||
4. Redistribution. You may reproduce and distribute copies of the
|
||||
Work or Derivative Works thereof in any medium, with or without
|
||||
modifications, and in Source or Object form, provided that You
|
||||
meet the following conditions:
|
||||
|
||||
(a) You must give any other recipients of the Work or
|
||||
Derivative Works a copy of this License; and
|
||||
|
||||
(b) You must cause any modified files to carry prominent notices
|
||||
stating that You changed the files; and
|
||||
|
||||
(c) You must retain, in the Source form of any Derivative Works
|
||||
that You distribute, all copyright, patent, trademark, and
|
||||
attribution notices from the Source form of the Work,
|
||||
excluding those notices that do not pertain to any part of
|
||||
the Derivative Works; and
|
||||
|
||||
(d) If the Work includes a "NOTICE" text file as part of its
|
||||
distribution, then any Derivative Works that You distribute must
|
||||
include a readable copy of the attribution notices contained
|
||||
within such NOTICE file, excluding any notices that do not
|
||||
pertain to any part of the Derivative Works, in at least one
|
||||
of the following places: within a NOTICE text file distributed
|
||||
as part of the Derivative Works; within the Source form or
|
||||
documentation, if provided along with the Derivative Works; or,
|
||||
within a display generated by the Derivative Works, if and
|
||||
wherever such third-party notices normally appear. The contents
|
||||
of the NOTICE file are for informational purposes only and
|
||||
do not modify the License. You may add Your own attribution
|
||||
notices within Derivative Works that You distribute, alongside
|
||||
or as an addendum to the NOTICE text from the Work, provided
|
||||
that such additional attribution notices cannot be construed
|
||||
as modifying the License.
|
||||
|
||||
You may add Your own copyright statement to Your modifications and
|
||||
may provide additional or different license terms and conditions
|
||||
for use, reproduction, or distribution of Your modifications, or
|
||||
for any such Derivative Works as a whole, provided Your use,
|
||||
reproduction, and distribution of the Work otherwise complies with
|
||||
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
APPENDIX: How to apply the Apache License to your work.
|
||||
|
||||
To apply the Apache License to your work, attach the following
|
||||
boilerplate notice, with the fields enclosed by brackets "[]"
|
||||
replaced with your own identifying information. (Don't include
|
||||
the brackets!) The text should be enclosed in the appropriate
|
||||
comment syntax for the file format. Please do not remove or change
|
||||
the license header comment from a contributed file except when
|
||||
necessary.
|
||||
|
||||
Copyright 2026 mukul975
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
@@ -0,0 +1,286 @@
|
||||
---
name: implementing-gdpr-data-subject-access-request
description: >
  Automates GDPR Data Subject Access Request (DSAR) workflows including identity verification,
  PII discovery across databases and files using regex and NER, data mapping, response
  templating per Article 15 requirements, deadline tracking, and audit logging. Covers
  ICO/EDPB guidance compliance, exemption handling, and scalable batch processing. Use when
  building or auditing DSAR response capabilities under GDPR/UK GDPR.
domain: cybersecurity
subdomain: privacy-compliance
tags: [gdpr, dsar, privacy, pii-discovery, data-subject-rights, compliance, article-15]
version: "1.0"
author: mukul975
license: Apache-2.0
---
|
||||
|
||||
# Implementing GDPR Data Subject Access Request (DSAR) Workflow
|
||||
|
||||
## When to Use
|
||||
|
||||
- When building automated DSAR processing pipelines for GDPR/UK GDPR compliance
|
||||
- When implementing PII discovery across structured and unstructured data sources
|
||||
- When creating response templates that satisfy Article 15 disclosure requirements
|
||||
- When auditing existing DSAR handling for regulatory compliance gaps
|
||||
- When scaling DSAR processing from manual to automated workflows
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.8+ with required dependencies (spacy, presidio-analyzer, jinja2)
|
||||
- Access to data sources where personal data resides (databases, file shares, logs)
|
||||
- Understanding of GDPR Article 15 requirements and ICO/EDPB guidance
|
||||
- Appropriate authorization and data protection officer (DPO) approval
|
||||
- Test environment with synthetic or anonymized data for validation
|
||||
|
||||
## Background
|
||||
|
||||
### GDPR Article 15 - Right of Access
|
||||
|
||||
Under GDPR Article 15, data subjects have the right to obtain from the controller:
|
||||
|
||||
1. **Confirmation** that their personal data is being processed
2. **A copy** of all personal data held about them
3. **Supplementary information** including:
   - Purposes of processing
   - Categories of personal data
   - Recipients or categories of recipients
   - Retention periods or criteria to determine them
   - Right to rectification, erasure, restriction, or objection
   - Right to lodge a complaint with a supervisory authority
   - Source of the data (if not collected directly from the subject)
   - Existence of automated decision-making, including profiling
   - Safeguards applied to transfers to third countries or international organisations (Art. 15(2))
|
||||
|
||||
### Timeline Requirements
|
||||
|
||||
- **Standard deadline**: One calendar month from receipt of a valid request (Art. 12(3))
- **Complex extension**: Up to 2 further months for complex or numerous requests; the requester must be told, with reasons, within the first month
- **Clock pause**: ICO guidance permits pausing the clock while waiting for information needed to verify identity or clarify the request
- **Format**: Commonly used electronic form if the request was made electronically (unless the data subject requests otherwise)
- **Cost**: Free of charge; a reasonable fee or a refusal is possible only where a request is manifestly unfounded or excessive
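
The deadline rules above can be sketched with the standard library alone. This is a minimal illustration, not the logic inside `DSARWorkflowEngine`: the helper names `add_calendar_months` and `dsar_deadline` are hypothetical, and the sketch assumes that "one calendar month" means the corresponding date in the following month, falling back to the last day of that month when no such date exists. It does not move deadlines that land on weekends or public holidays.

```python
import calendar
from datetime import date


def add_calendar_months(start: date, months: int) -> date:
    """Return the corresponding date `months` later, clamped to the month end."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def dsar_deadline(received: date, extended: bool = False) -> date:
    """One month by default; one plus two further months when extended."""
    return add_calendar_months(received, 3 if extended else 1)


print(dsar_deadline(date(2025, 1, 31)))                 # 2025-02-28
print(dsar_deadline(date(2025, 1, 31), extended=True))  # 2025-04-30
```

Clamping to the month end avoids the drift that a fixed 30-day offset would introduce for requests received late in a month.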
|
||||
|
||||
### ICO/EDPB Guidance Key Points
|
||||
|
||||
- No formal format is required for DSARs: verbal, written, and social media requests are all valid
- A request need not mention "subject access request" or cite Article 15
- Identity verification must be proportionate to the risk
- Exemptions exist for legal privilege, third-party data, and trade secrets
- The EDPB's 2024 Coordinated Enforcement Framework action focused on right of access compliance
|
||||
|
||||
## Instructions
|
||||
|
||||
### Step 1: DSAR Intake and Verification
|
||||
|
||||
Implement a request intake system that captures the request through any channel,
|
||||
verifies the requester's identity, and starts the compliance clock.
|
||||
|
||||
```python
from agent import DSARWorkflowEngine

engine = DSARWorkflowEngine(config_path="dsar_config.json")

# Register a new DSAR
request = engine.register_dsar(
    requester_name="Jane Smith",
    requester_email="jane.smith@example.com",
    request_channel="email",
    request_text="I would like a copy of all personal data you hold about me.",
    identity_docs=["passport_verified"],
)
print(f"DSAR ID: {request['dsar_id']}, Deadline: {request['deadline']}")
```
|
||||
|
||||
### Step 2: PII Discovery Across Data Sources
|
||||
|
||||
Scan databases, files, and logs using regex patterns and NER to find all
|
||||
personal data associated with the data subject.
|
||||
|
||||
```python
from agent import PIIDiscoveryEngine

pii_engine = PIIDiscoveryEngine()

# Scan structured data (database)
db_results = pii_engine.scan_database(
    connection_string="postgresql://user:pass@localhost/appdb",
    search_identifiers={"email": "jane.smith@example.com", "name": "Jane Smith"},
)

# Scan unstructured data (files, logs)
file_results = pii_engine.scan_files(
    directories=["/var/log/app", "/data/exports", "/data/documents"],
    search_identifiers={"email": "jane.smith@example.com", "name": "Jane Smith"},
)

# Scan with NER for contextual PII detection
ner_results = pii_engine.scan_with_ner(
    text_corpus=file_results["raw_text_matches"],
    entity_types=["PERSON", "EMAIL", "PHONE_NUMBER", "LOCATION", "DATE_OF_BIRTH"],
)

all_pii = pii_engine.consolidate_results(db_results, file_results, ner_results)
print(f"Found {all_pii['total_records']} PII records across {all_pii['source_count']} sources")
```
|
||||
|
||||
### Step 3: Data Mapping and Classification
|
||||
|
||||
Map discovered PII to processing purposes, legal bases, and retention periods
|
||||
as required by Article 15.
|
||||
|
||||
```python
from agent import DataMapper

mapper = DataMapper(data_inventory_path="data_inventory.json")

# Map PII to Article 15 categories
mapped_data = mapper.map_to_article15(
    pii_records=all_pii,
    data_subject_id="jane.smith@example.com",
)

# Output includes processing purposes, recipients, retention for each data category
for category in mapped_data["categories"]:
    print(f"Category: {category['name']}")
    print(f"  Purpose: {category['processing_purpose']}")
    print(f"  Legal basis: {category['legal_basis']}")
    print(f"  Retention: {category['retention_period']}")
    print(f"  Recipients: {', '.join(category['recipients'])}")
```
|
||||
|
||||
### Step 4: Exemption Review
|
||||
|
||||
Apply exemptions where lawful (third-party data, legal privilege, trade secrets)
|
||||
before compiling the response.
|
||||
|
||||
```python
from agent import ExemptionReviewer

reviewer = ExemptionReviewer()

# Check for applicable exemptions
review_result = reviewer.review_exemptions(
    mapped_data=mapped_data,
    exemption_checks=[
        "third_party_data",
        "legal_professional_privilege",
        "trade_secrets",
        "crime_prevention",
        "management_forecasting",
    ],
)

# Apply redactions where exemptions apply
redacted_data = reviewer.apply_redactions(mapped_data, review_result["exemptions"])
print(f"Applied {review_result['exemption_count']} exemptions")
```
|
||||
|
||||
### Step 5: Response Generation
|
||||
|
||||
Generate a compliant DSAR response package with cover letter, data export,
|
||||
and supplementary information document.
|
||||
|
||||
```python
from agent import DSARResponseGenerator

generator = DSARResponseGenerator(template_dir="templates/")

# Generate complete response package
response = generator.generate_response(
    dsar_id=request["dsar_id"],
    data_subject="Jane Smith",
    mapped_data=redacted_data,
    format="json",  # or "csv" (the formats listed in the API reference)
)

# Package includes: cover letter, data export, supplementary info, audit log
for doc in response["documents"]:
    print(f"Generated: {doc['filename']} ({doc['type']})")
```
|
||||
|
||||
### Step 6: Audit Trail and Compliance Logging
|
||||
|
||||
Maintain a complete audit trail of the DSAR lifecycle for accountability.
|
||||
|
||||
```python
from agent import DSARAuditLogger

logger = DSARAuditLogger(log_path="dsar_audit_logs/")

# Log complete DSAR lifecycle
logger.log_event(request["dsar_id"], "request_received", {
    "channel": "email",
    "identity_verified": True,
})
logger.log_event(request["dsar_id"], "pii_discovery_complete", {
    "records_found": all_pii["total_records"],
    "sources_scanned": all_pii["source_count"],
})
logger.log_event(request["dsar_id"], "response_sent", {
    "format": "json",
    "documents_count": len(response["documents"]),
    "exemptions_applied": review_result["exemption_count"],
})

# Generate compliance report
compliance_report = logger.generate_compliance_report(request["dsar_id"])
```
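
One way to make such an audit trail tamper-evident is to chain each entry to the previous one with a hash. The sketch below is an illustration under stated assumptions, not how `DSARAuditLogger` is implemented: `append_event` and `verify_chain` are hypothetical helpers, and the log is held in memory rather than written to `dsar_audit_logs/`.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def append_event(log: list, dsar_id: str, event: str, details: dict) -> dict:
    """Append an entry whose hash covers its content and the previous hash."""
    entry = {
        "dsar_id": dsar_id,
        "event": event,
        "details": details,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if entry["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_event(audit_log, "DSAR-0001", "request_received", {"channel": "email"})
append_event(audit_log, "DSAR-0001", "response_sent", {"format": "json"})
print(verify_chain(audit_log))
```

The chain only detects modification; it does not prevent it, so the log still needs access controls and off-host copies.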
|
||||
|
||||
## Examples
|
||||
|
||||
### Complete DSAR Processing Pipeline
|
||||
|
||||
```python
from agent import DataMapper, DSARWorkflowEngine, PIIDiscoveryEngine, DSARResponseGenerator

# Full automated pipeline
engine = DSARWorkflowEngine(config_path="dsar_config.json")
pii = PIIDiscoveryEngine()
mapper = DataMapper(data_inventory_path="data_inventory.json")
gen = DSARResponseGenerator(template_dir="templates/")

# 1. Intake
req = engine.register_dsar(
    requester_name="John Doe",
    requester_email="john.doe@example.com",
    request_channel="web_form",
    request_text="Please provide all my data under GDPR Article 15.",
    identity_docs=["email_verified", "account_match"],
)

# 2. Discover
results = pii.full_scan(
    search_identifiers={"email": "john.doe@example.com"},
    sources=["database", "files", "logs"],
)

# 3. Map to Article 15 categories (generate_response expects mapped data)
mapped = mapper.map_to_article15(
    pii_records=results,
    data_subject_id="john.doe@example.com",
)

# 4. Generate response
# Note: run the Step 4 exemption review before sending a real response.
response = gen.generate_response(
    dsar_id=req["dsar_id"],
    data_subject="John Doe",
    mapped_data=mapped,
)

# 5. Track deadline
engine.update_status(req["dsar_id"], "response_sent")
print(f"DSAR {req['dsar_id']} completed, {engine.days_remaining(req['dsar_id'])} days remaining")
```
|
||||
|
||||
### PII Regex Pattern Testing
|
||||
|
||||
```python
from agent import PIIPatternMatcher

matcher = PIIPatternMatcher()

# Test individual patterns
test_text = "Contact jane.smith@example.com or call +44 20 7946 0958. SSN: 123-45-6789"
matches = matcher.scan_text(test_text)
for m in matches:
    print(f"  [{m['type']}] '{m['value']}' (confidence: {m['confidence']})")
```
|
||||
|
||||
## References
|
||||
|
||||
- GDPR Article 15: https://gdpr-info.eu/art-15-gdpr/
|
||||
- ICO Subject Access Request Guidance: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/subject-access-requests/
|
||||
- EDPB Guidelines 01/2022 on Right of Access: https://www.edpb.europa.eu/system/files/2023-04/edpb_guidelines_202201_data_subject_rights_access_v2_en.pdf
|
||||
- GDPR Article 12 (DSAR Modalities): https://gdpr-info.eu/art-12-gdpr/
|
||||
- Regulation (EU) 2025/2518 (Procedural Rules): Cross-border GDPR enforcement procedural rules
|
||||
@@ -0,0 +1,314 @@
|
||||
# API Reference: GDPR DSAR Workflow Automation
|
||||
|
||||
## PIIPatternMatcher
|
||||
|
||||
Scans text for PII using compiled regex patterns with confidence scoring and contextual boosting.
|
||||
|
||||
### Constructor
|
||||
```python
PIIPatternMatcher(custom_patterns=None)
```
|
||||
|
||||
| Parameter | Type | Description |
|-----------|------|-------------|
| `custom_patterns` | `dict` or `None` | Additional regex patterns to include in scanning |
|
||||
|
||||
### Methods
|
||||
|
||||
#### `scan_text(text, min_confidence=0.5)`
|
||||
Scan a string for PII matches.
|
||||
|
||||
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `text` | `str` | required | Text to scan for PII |
| `min_confidence` | `float` | `0.5` | Minimum confidence threshold (0.0-1.0) |
|
||||
|
||||
**Returns:** `list[dict]` -- Each match contains `type`, `value`, `description`, `confidence`, `gdpr_category`, `position`.
|
||||
|
||||
#### `scan_file(file_path, min_confidence=0.5)`
|
||||
Scan a file on disk for PII matches.
|
||||
|
||||
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `file_path` | `str` | required | Absolute path to the file |
| `min_confidence` | `float` | `0.5` | Minimum confidence threshold |
|
||||
|
||||
**Returns:** `dict` with `file`, `size_bytes`, `matches`, `match_count`, `pii_types_found`.
|
||||
|
||||
### Built-in PII Patterns
|
||||
|
||||
| Pattern Name | Description | Confidence | GDPR Category |
|-------------|-------------|------------|---------------|
| `email` | Email address | 0.95 | contact_information |
| `phone_international` | International phone number | 0.70 | contact_information |
| `uk_phone` | UK phone number | 0.80 | contact_information |
| `ssn_us` | US Social Security Number | 0.85 | government_id |
| `nino_uk` | UK National Insurance Number | 0.90 | government_id |
| `credit_card` | Credit/debit card number | 0.85 | financial_data |
| `iban` | International Bank Account Number | 0.80 | financial_data |
| `ipv4` | IPv4 address | 0.60 | online_identifier |
| `date_of_birth` | Date of birth (DD/MM/YYYY) | 0.65 | demographic_data |
| `uk_postcode` | UK postcode | 0.75 | location_data |
| `passport_uk` | UK passport number (9 digits) | 0.40 | government_id |
| `eu_vat` | EU VAT number | 0.50 | financial_data |
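
To show how a pattern table like this can drive a scan, here is a minimal sketch. The regexes, the `PATTERNS` dict, and the `scan` and `luhn_valid` helpers are illustrative assumptions; the patterns actually shipped in `agent` are not shown in this reference and may differ. Only the confidence values are taken from the table above. The Luhn checksum is added here as one plausible way to cut false positives for card numbers.

```python
import re

# Illustrative patterns only; not the patterns used by PIIPatternMatcher.
PATTERNS = {
    "email": (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), 0.95),
    "ssn_us": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.85),
    "credit_card": (re.compile(r"\b\d(?:[ -]?\d){12,18}\b"), 0.85),
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits of `number`, ignoring separators."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0


def scan(text: str, min_confidence: float = 0.5) -> list:
    """Return matches at or above `min_confidence`, in pattern order."""
    matches = []
    for name, (pattern, confidence) in PATTERNS.items():
        if confidence < min_confidence:
            continue
        for m in pattern.finditer(text):
            if name == "credit_card" and not luhn_valid(m.group()):
                continue  # digit run that fails the checksum
            matches.append({
                "type": name,
                "value": m.group(),
                "confidence": confidence,
                "position": m.start(),
            })
    return matches


found = scan("Contact jane.smith@example.com or call +44 20 7946 0958. SSN: 123-45-6789")
for match in found:
    print(match["type"], match["value"])
```

Low-confidence patterns such as `passport_uk` (0.40) fall below the default `min_confidence` of 0.5, so a caller would need to lower the threshold to see them.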
|
||||
|
||||
---
|
||||
|
||||
## PIIDiscoveryEngine
|
||||
|
||||
Discovers PII across structured (database) and unstructured (files) data sources.
|
||||
|
||||
### Constructor
|
||||
```python
PIIDiscoveryEngine(custom_patterns=None)
```
|
||||
|
||||
### Methods
|
||||
|
||||
#### `scan_database(connection_string, search_identifiers, tables=None)`
|
||||
Generate parameterized SQL queries for PII discovery in databases.
|
||||
|
||||
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `connection_string` | `str` | required | Database connection string (redacted in output) |
| `search_identifiers` | `dict` | required | Key-value pairs to search for (e.g., `{"email": "user@example.com"}`) |
| `tables` | `list[str]` or `None` | auto | Tables to scan; defaults to common tables |
|
||||
|
||||
**Returns:** `dict` with `source_type`, `connection`, `tables_scanned`, `queries_generated`, `queries`.
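
As a rough illustration of what "parameterized" means here, the sketch below builds `(sql, params)` pairs and runs them against an in-memory SQLite database. The schema, the `SEARCHABLE_COLUMNS` allow-list, and `build_queries` are hypothetical; the real method's query shapes and default tables are not documented in this reference.

```python
import sqlite3

# Hypothetical allow-list of tables and identifier columns. Table and column
# names come only from this dict, never from user input.
SEARCHABLE_COLUMNS = {
    "users": ["email", "name"],
    "orders": ["email"],
}


def build_queries(search_identifiers: dict) -> list:
    """Return (sql, params) pairs; identifier values are bound, not interpolated."""
    queries = []
    for table, columns in SEARCHABLE_COLUMNS.items():
        for column in columns:
            if column in search_identifiers:
                sql = f"SELECT * FROM {table} WHERE {column} = ?"
                queries.append((sql, (search_identifiers[column],)))
    return queries


# Demo data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, email TEXT, item TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'jane.smith@example.com', 'Jane Smith')")
conn.execute("INSERT INTO orders VALUES (10, 'jane.smith@example.com', 'book')")
conn.execute("INSERT INTO orders VALUES (11, 'other@example.com', 'pen')")

hits = []
for sql, params in build_queries({"email": "jane.smith@example.com"}):
    hits.extend(conn.execute(sql, params).fetchall())
print(len(hits))  # 2
```

Binding the identifier values keeps a requester-supplied email or name from being interpreted as SQL. The `?` placeholder is SQLite's style; other drivers use different placeholder syntax.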
|
||||
|
||||
#### `scan_files(directories, search_identifiers, file_extensions=None, max_file_size_mb=50)`
|
||||
Scan files in directories for PII matching identifiers.
|
||||
|
||||
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `directories` | `list[str]` | required | Directory paths to scan |
| `search_identifiers` | `dict` | required | Identifiers to search for |
| `file_extensions` | `list[str]` or `None` | common types | File extensions to include |
| `max_file_size_mb` | `int` | `50` | Skip files larger than this |
|
||||
|
||||
**Returns:** `dict` with `files_scanned`, `files_with_matches`, `matches`, `raw_text_matches`.
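
A directory scan with these parameters might look roughly like the following. This is a simplified sketch, not the implementation in `agent`: the function name `scan_files_sketch`, the default extension list, and the case-insensitive substring match are assumptions, and only three of the documented return keys are produced.

```python
import os
import tempfile


def scan_files_sketch(directories, search_identifiers,
                      file_extensions=(".txt", ".log", ".csv", ".json"),
                      max_file_size_mb=50):
    """Walk directories and record lines containing any identifier value."""
    max_bytes = max_file_size_mb * 1024 * 1024
    needles = {key: value.lower() for key, value in search_identifiers.items()}
    result = {"files_scanned": 0, "files_with_matches": 0, "matches": []}
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                if not name.lower().endswith(tuple(file_extensions)):
                    continue  # extension not in scope
                if os.path.getsize(path) > max_bytes:
                    continue  # too large to scan
                result["files_scanned"] += 1
                file_hit = False
                with open(path, "r", encoding="utf-8", errors="replace") as fh:
                    for line_no, line in enumerate(fh, start=1):
                        lowered = line.lower()
                        for key, needle in needles.items():
                            if needle in lowered:
                                result["matches"].append(
                                    {"file": path, "line": line_no, "identifier": key}
                                )
                                file_hit = True
                result["files_with_matches"] += int(file_hit)
    return result


# Demo against a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "app.log"), "w", encoding="utf-8") as fh:
        fh.write("login ok\nuser=Jane.Smith@example.com\n")
    with open(os.path.join(tmp, "blob.bin"), "w", encoding="utf-8") as fh:
        fh.write("jane.smith@example.com")
    out = scan_files_sketch([tmp], {"email": "jane.smith@example.com"})
print(out["files_scanned"], out["files_with_matches"])  # 1 1
```

Reading line by line keeps memory use flat for large logs, and `errors="replace"` lets the scan continue past bytes that are not valid UTF-8.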
|
||||
|
||||
#### `scan_with_ner(text_corpus, entity_types=None, confidence_threshold=0.7)`
|
||||
Scan text using Named Entity Recognition (spaCy NER with regex fallback).
|
||||
|
||||
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `text_corpus` | `list[str]` | required | List of file paths to scan |
| `entity_types` | `list[str]` or `None` | common types | NER entity types to detect |
| `confidence_threshold` | `float` | `0.7` | Minimum confidence for results |
|
||||
|
||||
**Supported Entity Types:** `PERSON`, `EMAIL`, `PHONE_NUMBER`, `LOCATION`, `DATE_OF_BIRTH`, `ORG`, `GPE`
|
||||
|
||||
**Returns:** `dict` with `files_processed`, `total_entities`, `results`, `model_used`.
|
||||
|
||||
#### `consolidate_results(*result_sets)`
|
||||
Merge results from database, file, and NER scans into a unified record set.
|
||||
|
||||
**Returns:** `dict` with `total_records`, `source_count`, `sources`, `records`.
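
The merge step can be pictured as grouping records by type and normalized value while remembering every source in which they were found. The sketch below assumes each result set carries a `source_type` and a `records` list of `{"type", "value"}` dicts; that input shape and the `consolidate` helper are assumptions for illustration, since the per-scan return values documented above use different keys.

```python
def consolidate(*result_sets: dict) -> dict:
    """Merge records by (type, normalized value), tracking where each was found."""
    merged = {}
    sources = []
    for result in result_sets:
        source = result.get("source_type", "unknown")
        if source not in sources:
            sources.append(source)
        for record in result.get("records", []):
            key = (record["type"], record["value"].strip().lower())
            entry = merged.setdefault(
                key, {"type": record["type"], "value": record["value"], "found_in": []}
            )
            if source not in entry["found_in"]:
                entry["found_in"].append(source)
    records = list(merged.values())
    return {
        "total_records": len(records),
        "source_count": len(sources),
        "sources": sources,
        "records": records,
    }


db = {"source_type": "database",
      "records": [{"type": "email", "value": "jane.smith@example.com"}]}
files = {"source_type": "files",
         "records": [{"type": "email", "value": "Jane.Smith@example.com "},
                     {"type": "phone", "value": "+44 20 7946 0958"}]}
combined = consolidate(db, files)
print(combined["total_records"], combined["source_count"])  # 2 2
```

Keeping a `found_in` list rather than dropping duplicates preserves the information needed later to say where each item of personal data is held.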
|
||||
|
||||
#### `full_scan(search_identifiers, sources=None, db_connection="", directories=None)`
|
||||
Run a complete PII discovery scan across all source types.
|
||||
|
||||
**Returns:** Consolidated `dict` from all scans.
|
||||
|
||||
---
|
||||
|
||||
## DataMapper
|
||||
|
||||
Maps discovered PII to GDPR Article 15 disclosure categories.
|
||||
|
||||
### Constructor
|
||||
```python
DataMapper(data_inventory_path=None)
```
|
||||
|
||||
| Parameter | Type | Description |
|-----------|------|-------------|
| `data_inventory_path` | `str` or `None` | Path to JSON data inventory for overrides |
|
||||
|
||||
### Methods
|
||||
|
||||
#### `map_to_article15(pii_records, data_subject_id)`
|
||||
Map PII records to Article 15 required categories including processing purposes, legal basis, retention periods, and recipients.
|
||||
|
||||
**Returns:** `dict` with `categories`, `supplementary_info`, `article_15_reference`.
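
A data inventory lookup of this kind could be sketched as follows. The inventory layout, the `UNMAPPED` fallback, and `map_records` are hypothetical; the real `data_inventory.json` schema is not described in this reference. The output keys (`name`, `processing_purpose`, `legal_basis`, `retention_period`, `recipients`) mirror those read in the Step 3 example of the skill.

```python
import json

# Hypothetical inventory content; a real one would be loaded from a JSON file.
INVENTORY_JSON = """
{
  "contact_information": {
    "processing_purpose": "Account management",
    "legal_basis": "Contract (Art. 6(1)(b))",
    "retention_period": "Duration of account plus 2 years",
    "recipients": ["Email service provider"]
  }
}
"""

# Fallback for categories missing from the inventory, so gaps stay visible.
UNMAPPED = {
    "processing_purpose": "UNMAPPED: needs DPO review",
    "legal_basis": "UNMAPPED: needs DPO review",
    "retention_period": "UNMAPPED: needs DPO review",
    "recipients": [],
}


def map_records(pii_records: list, inventory: dict) -> dict:
    """Group records by GDPR category and attach inventory metadata."""
    categories = {}
    for record in pii_records:
        name = record["gdpr_category"]
        if name not in categories:
            categories[name] = {"name": name, **inventory.get(name, UNMAPPED), "records": []}
        categories[name]["records"].append(record["value"])
    return {"categories": list(categories.values())}


inventory = json.loads(INVENTORY_JSON)
mapped = map_records(
    [
        {"gdpr_category": "contact_information", "value": "jane.smith@example.com"},
        {"gdpr_category": "government_id", "value": "123-45-6789"},
    ],
    inventory,
)
for category in mapped["categories"]:
    print(category["name"], "->", category["processing_purpose"])
```

Surfacing unmapped categories explicitly, instead of silently omitting them, makes inventory gaps a review item rather than a disclosure gap.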
|
||||
|
||||
### Article 15 Categories Mapped
|
||||
|
||||
| Category | Article Reference | Contents |
|----------|-------------------|----------|
| Processing Purposes | Art. 15(1)(a) | Why data is processed |
| Data Categories | Art. 15(1)(b) | Types of personal data |
| Recipients | Art. 15(1)(c) | Who receives the data |
| Retention Period | Art. 15(1)(d) | How long data is kept |
| Data Subject Rights | Art. 15(1)(e)-(f) | Rights to rectify, erase, restrict, object; right to lodge a complaint |
| Data Source | Art. 15(1)(g) | Where data was collected from |
| Automated Decisions | Art. 15(1)(h) | Profiling and automated decision-making |
| International Transfers | Art. 15(2) | Safeguards for cross-border transfers |
|
||||
|
||||
---
|
||||
|
||||
## ExemptionReviewer
|
||||
|
||||
Reviews DSAR data against applicable GDPR/UK GDPR exemptions.
|
||||
|
||||
### Methods
|
||||
|
||||
#### `review_exemptions(mapped_data, exemption_checks=None)`
|
||||
Flag applicable exemptions for DPO review.
|
||||
|
||||
**Returns:** `dict` with `exemption_count`, `exemptions`, `review_status`.
|
||||
|
||||
#### `apply_redactions(mapped_data, approved_exemptions)`
|
||||
Apply approved exemption redactions to the mapped data.
|
||||
|
||||
**Returns:** Redacted `dict` with `redaction_log`.
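
For the `third_party_data` case, a redaction pass could work along these lines. This is a narrow sketch that only handles email addresses in free text; `redact_third_party_emails` and the placeholder string are hypothetical and not part of `ExemptionReviewer`. The redaction log deliberately records positions rather than the redacted values.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_third_party_emails(text: str, subject_email: str):
    """Replace every email except the data subject's; return (text, redaction_log)."""
    redaction_log = []

    def _replace(match):
        if match.group().lower() == subject_email.lower():
            return match.group()  # the subject's own data is disclosed
        redaction_log.append({"position": match.start(), "exemption": "third_party_data"})
        return "[REDACTED: third party]"

    return EMAIL_RE.sub(_replace, text), redaction_log


redacted, log = redact_third_party_emails(
    "From jane.smith@example.com to bob@example.org",
    subject_email="jane.smith@example.com",
)
print(redacted)  # From jane.smith@example.com to [REDACTED: third party]
```

Automated redaction like this is best treated as a first pass whose output still goes to the DPO review that `review_exemptions` flags.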
|
||||
|
||||
### Supported Exemption Types
|
||||
|
||||
| Type | Legal Basis | Action |
|------|-------------|--------|
| `third_party_data` | Art. 15(4) / DPA 2018 Sch. 2 Para 16 | redact |
| `legal_professional_privilege` | DPA 2018 Sch. 2 Para 19 | withhold |
| `trade_secrets` | Recital 63 GDPR | redact |
| `crime_prevention` | DPA 2018 Sch. 2 Para 2 | withhold |
| `management_forecasting` | DPA 2018 Sch. 2 Para 22 | withhold |
| `negotiations` | DPA 2018 Sch. 2 Para 23 | withhold |
| `regulatory_function` | DPA 2018 Sch. 2 Paras 7-11 | withhold |
|
||||
|
||||
---
|
||||
|
||||
## DSARResponseGenerator
|
||||
|
||||
Generates compliant DSAR response packages per GDPR Article 15.
|
||||
|
||||
### Constructor
|
||||
```python
DSARResponseGenerator(template_dir=None, organization_name="Organization",
                      dpo_email="dpo@organization.com", controller_name="Data Protection Officer")
```
|
||||
|
||||
### Methods
|
||||
|
||||
#### `generate_response(dsar_id, data_subject, mapped_data, format="json", request_date=None)`
|
||||
Generate a complete response package with cover letter, data export, supplementary info, and audit metadata.
|
||||
|
||||
| Parameter | Type | Default | Description |
|
||||
|-----------|------|---------|-------------|
|
||||
| `dsar_id` | `str` | required | DSAR reference ID |
|
||||
| `data_subject` | `str` | required | Name of the data subject |
|
||||
| `mapped_data` | `dict` | required | Output from DataMapper/ExemptionReviewer |
|
||||
| `format` | `str` | `"json"` | Export format: `json` or `csv` |
|
||||
| `request_date` | `str` or `None` | today | Date the request was received |
|
||||
|
||||
**Returns:** `dict` with `documents` list containing filename, type, and content for each document.
|
||||
|
||||
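
To make the shape of one entry in the `documents` list more concrete, here is a small sketch of a data export in both supported formats. It is an assumption-laden illustration, not the generator's code: the helper `export_mapped_data`, the filename pattern, the `"data_export"` type label, and the flat `{field: value}` input are all hypothetical.

```python
import csv
import io
import json


def export_mapped_data(dsar_id, mapped_data, fmt="json"):
    """Serialise a flat {field: value} mapping into one document entry."""
    if fmt == "json":
        content = json.dumps(mapped_data, indent=2, sort_keys=True)
    elif fmt == "csv":
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["field", "value"])
        for field, value in sorted(mapped_data.items()):
            writer.writerow([field, value])
        content = buffer.getvalue()
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Assumed entry layout: filename, type, content (as described above).
    return {
        "filename": f"{dsar_id}_data_export.{fmt}",
        "type": "data_export",
        "content": content,
    }


subject_data = {"email": "jane@example.com", "name": "Jane Smith"}
doc_json = export_mapped_data("DSAR-0001", subject_data)
doc_csv = export_mapped_data("DSAR-0001", subject_data, fmt="csv")
```
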
#### `save_response_package(response, output_dir)`
Save all response documents to disk.

**Returns:** `list[str]` of saved file paths.

---

## DSARWorkflowEngine

Manages the complete DSAR lifecycle: intake, tracking, deadlines, and compliance.

### Constructor
```python
DSARWorkflowEngine(config_path=None)
```

### Methods

#### `register_dsar(requester_name, requester_email, request_channel, request_text, identity_docs=None)`
Register a new DSAR and start the 30-day compliance clock.

**Returns:** `dict` with `dsar_id`, `deadline`, `status`, `identity_verified`.

#### `update_status(dsar_id, new_status, notes="")`
Update DSAR processing status.

**Valid Statuses:** `received`, `identity_verification`, `verification_failed`, `in_progress`, `pii_discovery`, `exemption_review`, `dpo_review`, `response_generation`, `response_sent`, `closed`, `refused`.

#### `apply_extension(dsar_id, reason)`
Apply a 2-month extension for complex requests per Art. 12(3).

#### `pause_clock(dsar_id, reason)`
Pause the response clock (e.g., while awaiting identity verification).

#### `days_remaining(dsar_id)`
Calculate remaining days until DSAR deadline. **Returns:** `int`.

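
The deadline arithmetic behind `register_dsar`, `apply_extension`, `pause_clock`, and `days_remaining` can be sketched as follows. This is a hedged illustration rather than the engine's actual logic: the `DeadlineClock` class, the `add_months` helper, the explicit `today` arguments, and in particular the `resume` method (the reference above documents only pausing) are assumptions.

```python
import calendar
from datetime import date, timedelta


def add_months(day, months):
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = day.month - 1 + months
    year, month = day.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


class DeadlineClock:
    """Hypothetical stand-in for the per-DSAR deadline state."""

    def __init__(self, received, base_days=30):
        self.deadline = received + timedelta(days=base_days)
        self.paused_on = None

    def apply_extension(self, months=2):
        self.deadline = add_months(self.deadline, months)

    def pause(self, today):
        self.paused_on = today

    def resume(self, today):
        # Push the deadline out by however long the clock was stopped.
        if self.paused_on is not None:
            self.deadline += today - self.paused_on
            self.paused_on = None

    def days_remaining(self, today):
        reference = self.paused_on or today  # frozen while paused
        return (self.deadline - reference).days


clock = DeadlineClock(received=date(2026, 1, 1))   # deadline 2026-01-31
before_pause = clock.days_remaining(date(2026, 1, 11))
clock.pause(date(2026, 1, 11))
while_paused = clock.days_remaining(date(2026, 1, 20))
clock.resume(date(2026, 1, 16))                    # five days stopped
clock.apply_extension()                            # two further calendar months
```

Passing `today` explicitly keeps the sketch deterministic; a real engine would more likely read the current date itself.
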
#### `get_overdue_dsars()`
Get all DSARs past their deadline. **Returns:** `list[dict]`.

#### `generate_dashboard()`
Generate a DSAR processing dashboard summary. **Returns:** `dict` with status breakdown and overdue info.

---

## DSARAuditLogger

Maintains JSONL audit trails for the DSAR processing lifecycle.

### Constructor
```python
DSARAuditLogger(log_path="dsar_audit_logs")
```

### Methods

#### `log_event(dsar_id, event_type, details=None)`
Log a DSAR processing event to the JSONL audit file.

#### `get_audit_trail(dsar_id)`
Retrieve the complete audit trail. **Returns:** `list[dict]`.

#### `generate_compliance_report(dsar_id)`
Generate a compliance report with pass/fail checks for all processing steps.

**Returns:** `dict` with `compliance_checks`, `timeline`, `overall_compliance` (`COMPLIANT` or `REVIEW_REQUIRED`).

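
A JSONL audit trail stores one JSON object per line, so events can be appended without rewriting the file and read back line by line. The sketch below shows that pattern with module-level functions. It is an assumption, not the `DSARAuditLogger` source: the entry fields (`timestamp`, `dsar_id`, `event_type`, `details`), the single-file layout, and the extra `log_file` parameter are hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_event(log_file, dsar_id, event_type, details=None):
    """Append one event as a single JSON line."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dsar_id": dsar_id,
        "event_type": event_type,
        "details": details or {},
    }
    with open(log_file, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def get_audit_trail(log_file, dsar_id):
    """Read the file back and keep only the entries for one DSAR."""
    path = Path(log_file)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        entries = [json.loads(line) for line in handle if line.strip()]
    return [entry for entry in entries if entry["dsar_id"] == dsar_id]


with tempfile.TemporaryDirectory() as tmp:
    audit_file = Path(tmp) / "audit.jsonl"
    log_event(audit_file, "DSAR-0001", "received", {"channel": "email"})
    log_event(audit_file, "DSAR-0002", "received")
    log_event(audit_file, "DSAR-0001", "identity_verification")
    trail = get_audit_trail(audit_file, "DSAR-0001")
    missing = get_audit_trail(Path(tmp) / "absent.jsonl", "DSAR-0001")
```
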

---

## CLI Usage

```bash
# Full automated pipeline
python agent.py --action full_pipeline \
    --requester-name "Jane Smith" \
    --requester-email "jane.smith@example.com" \
    --scan-dirs /var/log/app /data/exports \
    --db-connection "postgresql://user:pass@localhost/appdb" \
    --output-dir dsar_output \
    --format json

# Scan text for PII
python agent.py --action scan_pii \
    --scan-text "Contact jane@example.com or call +44 20 7946 0958"

# Scan files only
python agent.py --action scan_files \
    --scan-dirs /data/exports /var/log \
    --requester-email "jane@example.com"

# Generate dashboard
python agent.py --action dashboard
```

### CLI Arguments

| Argument | Default | Description |
|----------|---------|-------------|
| `--action` | `full_pipeline` | Action to perform |
| `--requester-name` | `Test Subject` | Data subject name |
| `--requester-email` | `test@example.com` | Data subject email |
| `--request-channel` | `email` | Request channel |
| `--scan-dirs` | `[]` | Directories to scan |
| `--db-connection` | `""` | Database connection string |
| `--output-dir` | `dsar_output` | Output directory |
| `--config` | `dsar_config.json` | Configuration file path |
| `--format` | `json` | Output format (`json` or `csv`) |
| `--min-confidence` | `0.5` | Minimum PII confidence threshold |
| `--scan-text` | `""` | Direct text to scan for PII |

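
The argument table above maps naturally onto an `argparse` parser. The sketch below reproduces the documented flags and defaults; it is an approximation written from the table, not the parser in `agent.py`, and details such as `nargs="*"` for `--scan-dirs` and the `choices` restriction on `--format` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser whose flags and defaults follow the table above."""
    parser = argparse.ArgumentParser(description="DSAR agent (illustrative parser)")
    parser.add_argument("--action", default="full_pipeline")
    parser.add_argument("--requester-name", default="Test Subject")
    parser.add_argument("--requester-email", default="test@example.com")
    parser.add_argument("--request-channel", default="email")
    # Assumed: several directories may follow a single --scan-dirs flag.
    parser.add_argument("--scan-dirs", nargs="*", default=[])
    parser.add_argument("--db-connection", default="")
    parser.add_argument("--output-dir", default="dsar_output")
    parser.add_argument("--config", default="dsar_config.json")
    parser.add_argument("--format", choices=["json", "csv"], default="json")
    parser.add_argument("--min-confidence", type=float, default=0.5)
    parser.add_argument("--scan-text", default="")
    return parser


defaults = build_parser().parse_args([])
args = build_parser().parse_args(
    ["--action", "scan_files",
     "--scan-dirs", "/data/exports", "/var/log",
     "--requester-email", "jane@example.com"]
)
```
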