mirror of
https://github.com/phuryn/pm-skills.git
synced 2026-06-11 14:04:56 +03:00
---
description: Generate realistic dummy datasets for testing (CSV, JSON, SQL inserts, or Python scripts)
argument-hint: "<description of the data you need>"
---

# /generate-data -- Test Data Generator

Create realistic dummy datasets for development, testing, demos, or prototyping. Outputs ready-to-use files in your preferred format.

## Invocation

```
/generate-data 1000 users with names, emails, plan tier, signup date, and activity score
/generate-data E-commerce orders dataset: products, customers, timestamps, amounts
/generate-data Sample data matching this schema: [paste table definition]
```

## Workflow

### Step 1: Define the Dataset

Understand:
- What entities? (users, orders, events, products, etc.)
- What columns? (with data types and constraints)
- How many rows?
- Any relationships between tables?
- Any specific distributions? (e.g., "80% should be on the free plan")
- Any realistic constraints? (emails should be unique, dates should be chronological)

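The questions above can be captured in a small spec before any data is generated. The following is a minimal sketch, not part of the dummy-dataset skill itself: `ColumnSpec`, `TableSpec`, and `validate` are hypothetical names introduced here for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ColumnSpec:
    """One column: its type plus optional constraints and distribution."""
    name: str
    dtype: str                                    # e.g. "int", "str", "date"
    unique: bool = False                          # e.g. emails should be unique
    references: str | None = None                 # "table.column" for a foreign key
    distribution: dict[str, float] | None = None  # category -> share of rows


@dataclass
class TableSpec:
    """One entity (users, orders, ...) with its row count and columns."""
    name: str
    rows: int
    columns: list[ColumnSpec] = field(default_factory=list)


def validate(spec: TableSpec) -> list[str]:
    """Return a list of problems found in the spec (empty means OK)."""
    problems = []
    if spec.rows <= 0:
        problems.append(f"{spec.name}: row count must be positive")
    for col in spec.columns:
        if col.distribution is not None:
            total = sum(col.distribution.values())
            if abs(total - 1.0) > 1e-9:
                problems.append(f"{spec.name}.{col.name}: shares sum to {total}, not 1")
    return problems


# Example spec for "1000 users, 80% on the free plan" (values are illustrative).
users = TableSpec(
    name="users",
    rows=1000,
    columns=[
        ColumnSpec("user_id", "int", unique=True),
        ColumnSpec("email", "str", unique=True),
        ColumnSpec("plan", "str", distribution={"free": 0.8, "pro": 0.15, "enterprise": 0.05}),
        ColumnSpec("signup_date", "date"),
    ],
)
print(validate(users))
```

Writing the spec down first makes it easy to confirm row counts, constraints, and distribution shares with the user before the generator runs.
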
### Step 2: Generate the Data

Apply the **dummy-dataset** skill:

- Create a Python script that generates the dataset
- Use realistic-looking data (not random strings): proper names, valid email formats, real-seeming dates
- Respect constraints: unique IDs, foreign key relationships, chronological ordering
- Apply specified distributions
- Execute the script and produce the output file

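As one possible shape for the generator script, the sketch below uses only the Python standard library. The name lists, the `generate_users` and `generate_orders` helpers, the 2024 date range, and the seed are all assumptions made for illustration; a real script might use a dedicated fake-data library instead.

```python
import random
from datetime import date, timedelta

# Small illustrative name pools; all values are made up.
FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Elif", "Farid", "Grace", "Hugo"]
LAST_NAMES = ["Alvarez", "Brooks", "Chen", "Dubois", "Eriksen", "Flores", "Gupta", "Haddad"]
PLANS = ["free", "pro", "enterprise"]
PLAN_WEIGHTS = [0.80, 0.15, 0.05]  # "80% should be on the free plan"


def generate_users(n, rng):
    """Return n user rows with unique IDs and emails, sorted by signup date."""
    start = date(2024, 1, 1)
    signup_dates = sorted(start + timedelta(days=rng.randrange(365)) for _ in range(n))
    users = []
    for user_id, signup in enumerate(signup_dates, start=1):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        users.append({
            "user_id": user_id,
            "name": f"{first} {last}",
            # The ID suffix keeps emails unique; example.com is reserved for documentation.
            "email": f"{first}.{last}{user_id}@example.com".lower(),
            "plan": rng.choices(PLANS, weights=PLAN_WEIGHTS)[0],
            "signup_date": signup.isoformat(),
        })
    return users


def generate_orders(users, n, rng):
    """Return n order rows whose user_id always points at an existing user."""
    orders = []
    for order_id in range(1, n + 1):
        user = rng.choice(users)
        signup = date.fromisoformat(user["signup_date"])
        orders.append({
            "order_id": order_id,
            "user_id": user["user_id"],  # foreign key into users
            # Orders never predate the customer's signup.
            "order_date": (signup + timedelta(days=rng.randrange(90))).isoformat(),
        })
    return orders


rng = random.Random(42)  # fixed seed so re-running reproduces the same dataset
users = generate_users(1000, rng)
orders = generate_orders(users, 3000, rng)
print(users[0])
print(orders[0])
```

Generating parent rows first and drawing child rows from them is what keeps the foreign keys valid; the fixed seed is what lets the user re-run the script and get the same file.
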
### Step 3: Deliver

Output in the requested format (or ask):
- **CSV**: Most common, works everywhere
- **JSON**: For API testing or frontend development
- **SQL INSERT**: For populating test databases
- **Python script**: For reproducible generation (user can tweak and re-run)

```
## Generated Dataset: [Description]

**Rows**: [count]
**Columns**: [list]
**Format**: [CSV / JSON / SQL / Python]

### Schema
| Column | Type | Constraints | Distribution |
|--------|------|-----------|-------------|

### Sample (first 5 rows)
[Preview of the data]

### Files
- [data file]
- [generator script, if applicable]
```

Save data file and generator script to the user's workspace.

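The CSV, JSON, and SQL INSERT formats can all be produced from the same list of row dictionaries. This is a hedged sketch using the standard `csv` and `json` modules; `to_csv`, `to_json`, `sql_literal`, and `to_sql_inserts` are hypothetical helpers, and the SQL quoting shown is only adequate for generated test data, not untrusted input.

```python
import csv
import io
import json


def to_csv(rows):
    """Serialize rows (a list of dicts with identical keys) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize rows as a pretty-printed JSON array."""
    return json.dumps(rows, indent=2)


def sql_literal(value):
    """Render a Python value as a SQL literal, doubling single quotes in strings."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        # Boolean literal support varies by database; adjust if needed.
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_sql_inserts(table, rows):
    """Serialize rows as one INSERT statement per row (table name is not escaped)."""
    columns = ", ".join(rows[0])
    statements = []
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row.values())
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return "\n".join(statements)


# Two made-up rows, including a name with an apostrophe to exercise quoting.
rows = [
    {"user_id": 1, "name": "Grace O'Neil", "plan": "free"},
    {"user_id": 2, "name": "Hugo Chen", "plan": "pro"},
]
print(to_csv(rows))
print(to_json(rows))
print(to_sql_inserts("users", rows))
```

Keeping the rows in one in-memory structure and serializing at the end means the user can ask for a different format without regenerating the data.
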
### Step 4: Offer Follow-ups

- "Want me to **add more columns** or **increase the dataset size**?"
- "Should I **create related tables** (e.g., orders for these users)?"
- "Want me to **write test scenarios** that use this data?"
- "Should I **create SQL queries** to analyze this dataset?"

## Notes

- Always provide the generator script so the user can regenerate with different parameters
- For demo datasets, make the data tell a story (e.g., seasonal trends, a retention problem, a power user segment)
- Respect realistic cardinality: 1000 users don't have 1000 unique cities
- For financial data, use realistic price distributions, not uniform random
- Never include real personal data: all names, emails, and identifiers must be fake
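
The cardinality and price-distribution notes above can be illustrated with weighted sampling and a log-normal draw. The city names, weights, and the `mu`/`sigma` values below are invented for illustration and not calibrated against any real data.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical city list with population-like weights: a few large cities
# dominate, so 1000 users share a small set of values.
CITIES = ["Springfield", "Riverton", "Lakeside", "Fairview", "Georgetown", "Ashland"]
CITY_WEIGHTS = [40, 25, 15, 10, 6, 4]

cities = rng.choices(CITIES, weights=CITY_WEIGHTS, k=1000)

# Log-normal prices: many low-to-mid amounts and a long tail of expensive ones.
# mu=3.0 and sigma=0.6 are illustrative parameters only.
prices = [round(rng.lognormvariate(3.0, 0.6), 2) for _ in range(1000)]

print("unique cities:", len(set(cities)))
print("median price:", statistics.median(prices))
print("mean price:", round(statistics.mean(prices), 2))
```

A log-normal draw is one common choice for right-skewed amounts; the mean landing above the median is the kind of shape that a uniform draw would not produce.
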