Pawel Huryn
2026-03-02 00:36:23 +01:00
parent 61004d0c4e
commit 77dbdfa1b9
118 changed files with 11087 additions and 0 deletions
@@ -0,0 +1,19 @@
{
"name": "pm-data-analytics",
"version": "1.0.0",
"description": "Data analytics skills for PMs: SQL query generation and cohort analysis. Analyze user data, generate queries, and identify retention patterns.",
"author": {
"name": "Paweł Huryn",
"email": "pawel@productcompass.pm",
"url": "https://www.productcompass.pm"
},
"keywords": [
"product-management",
"data-analytics",
"sql",
"cohort-analysis",
"retention"
],
"homepage": "https://www.productcompass.pm",
"license": "MIT"
}
@@ -0,0 +1,39 @@
# pm-data-analytics
Data analytics skills for PMs: SQL query generation and cohort analysis. Analyze user data, generate queries, and identify retention patterns.
## Overview
This plugin provides 3 skills and 3 commands for product managers.
## Skills
- **ab-test-analysis** — Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and actionable ...
- **cohort-analysis** — Perform cohort analysis on user engagement data.
- **sql-queries** — Generate SQL queries from natural language descriptions.
## Commands
- `/pm-data-analytics:analyze-cohorts` — Perform cohort analysis on user data — retention curves, feature adoption, and engagement trends
- `/pm-data-analytics:analyze-test` — Analyze A/B test results — statistical significance, sample size validation, and ship/extend/stop recommendations
- `/pm-data-analytics:write-query` — Generate SQL queries from natural language — supports BigQuery, PostgreSQL, MySQL, and more
## Installation
```bash
/install pm-data-analytics
```
Or use directly:
```bash
cc --plugin-dir /path/to/pm-data-analytics
```
## Author
Paweł Huryn — [The Product Compass Newsletter](https://www.productcompass.pm)
## License
MIT
@@ -0,0 +1,99 @@
---
description: Perform cohort analysis on user data — retention curves, feature adoption, and engagement trends
argument-hint: "<data file or description of what to analyze>"
---
# /analyze-cohorts -- Cohort Analysis
Analyze user retention and engagement patterns by cohort. Upload your data or describe what you need, and get retention curves, feature adoption trends, and actionable insights.
## Invocation
```
/analyze-cohorts [upload a CSV of user activity data]
/analyze-cohorts Monthly retention for users who signed up in Jan-Jun, grouped by acquisition channel
/analyze-cohorts Help me set up a cohort analysis for our onboarding redesign
```
## Workflow
### Step 1: Accept Data or Define Analysis
Two paths:
- **With data**: User uploads a CSV/spreadsheet with user-level data (user_id, signup_date, activity_date, event_type, etc.)
- **Without data**: User describes the analysis they need → generate the SQL query and analysis framework
### Step 2: Define Cohorts
Ask:
- What defines a cohort? (signup week/month, acquisition channel, plan tier, first feature used)
- What is the retention event? (login, core action, any activity, purchase)
- What time granularity? (daily, weekly, monthly)
- What time range?
### Step 3: Analyze
Apply the **cohort-analysis** skill:
**If data is provided:**
- Process the data using Python (pandas) to create cohort tables
- Calculate retention rates per cohort per period
- Generate retention curves
- Identify patterns: improving/declining cohorts, seasonal effects, anomalies
- Compare feature adoption across cohorts
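As a rough illustration of the data path above, the sketch below builds a monthly retention table from a tiny activity log. The step names pandas; this version uses only the standard library so it runs without dependencies, and the same grouping carries over to a DataFrame. The `events` rows, the `month_index` helper, and the `retention_table` function are hypothetical names chosen for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical activity log: (user_id, signup_date, activity_date).
events = [
    ("u1", date(2025, 1, 5), date(2025, 1, 5)),
    ("u1", date(2025, 1, 5), date(2025, 2, 10)),
    ("u2", date(2025, 1, 20), date(2025, 1, 21)),
    ("u3", date(2025, 2, 3), date(2025, 2, 3)),
    ("u3", date(2025, 2, 3), date(2025, 3, 15)),
    ("u4", date(2025, 2, 14), date(2025, 2, 14)),
]

def month_index(d):
    """Put months on a single axis so period offsets are plain subtraction."""
    return d.year * 12 + d.month

def retention_table(rows):
    """Return {cohort: {period: share of the cohort active in that period}}."""
    cohort_users = defaultdict(set)   # cohort -> every user in it
    active = defaultdict(set)         # (cohort, period) -> users active then
    for user, signup, activity in rows:
        cohort = signup.strftime("%Y-%m")
        period = month_index(activity) - month_index(signup)
        cohort_users[cohort].add(user)
        active[(cohort, period)].add(user)
    table = defaultdict(dict)
    for (cohort, period), users in sorted(active.items()):
        table[cohort][period] = len(users) / len(cohort_users[cohort])
    return dict(table)

table = retention_table(events)
for cohort, periods in table.items():
    print(cohort, {p: f"{r:.0%}" for p, r in periods.items()})
```

Counting distinct users per (cohort, period) rather than raw events keeps a very active user from inflating the rate.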
**If describing an analysis:**
- Design the cohort analysis framework
- Generate SQL queries to extract the data
- Create a template spreadsheet for the analysis
- Define the metrics and visualization approach
### Step 4: Generate Report
```
## Cohort Analysis: [Description]
**Date**: [today]
**Cohort definition**: [e.g., signup month]
**Retention event**: [e.g., completed a project]
**Granularity**: [weekly/monthly]
### Retention Table
| Cohort | Size | Week 1 | Week 2 | Week 3 | ... | Week 12 |
|--------|------|--------|--------|--------|-----|---------|
### Key Findings
1. **[Finding]** — [supporting data]
2. ...
### Cohort Comparison
- **Best-performing cohort**: [which, why]
- **Worst-performing cohort**: [which, why]
- **Trend**: [improving/declining/stable over time]
### Retention Benchmarks
| Period | Your Rate | Industry Benchmark | Gap |
|--------|----------|-------------------|-----|
### Recommendations
1. [What to investigate or change based on findings]
2. ...
### Follow-Up Queries
[SQL queries for deeper investigation]
```
If data was provided, save the analysis as both a markdown report and a CSV/spreadsheet.
### Step 5: Offer Next Steps
- "Want me to **segment this further** by another dimension?"
- "Should I **set up metrics alerts** based on these retention thresholds?"
- "Want me to **design experiments** to improve retention for the weakest cohort?"
## Notes
- Cohort analysis is only as good as the retention event definition — push for a meaningful action, not just "logged in"
- Early cohorts often look different due to founding user bias — note this when comparing
- If retention is calculated using a Python script, save the script so the user can re-run with new data
- Seasonal effects can masquerade as trends — flag if cohort differences might be calendar-driven
@@ -0,0 +1,109 @@
---
description: Analyze A/B test results — statistical significance, sample size validation, and ship/extend/stop recommendations
argument-hint: "<test results as data, screenshot, or description>"
---
# /analyze-test -- A/B Test Analysis
Evaluate experiment results with statistical rigor and translate findings into a clear product decision: ship, extend, or stop.
## Invocation
```
/analyze-test Control: 4.2% conversion (n=5000), Variant: 4.8% conversion (n=5100)
/analyze-test [upload a CSV of test results]
/analyze-test [screenshot from your experimentation platform]
```
## Workflow
### Step 1: Accept Test Data
Accept in any format:
- Summary statistics (conversion rates, sample sizes per variant)
- Raw event data (CSV with user_id, variant, converted, timestamp)
- Screenshot from an experimentation platform (Optimizely, LaunchDarkly, etc.)
- Description of the experiment and results
### Step 2: Validate Test Design
Before analyzing results, check:
- Was sample size sufficient? (run a power analysis)
- Was the test run long enough? (capture weekly cycles, minimum 1-2 business cycles)
- Was randomization clean? (check for sample ratio mismatch)
- Were there any external factors during the test period?
Flag issues if found — results from a flawed test can be misleading.
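Two of the checks above lend themselves to a short script. The following is a minimal sketch, assuming a two-sided test on conversion rates with the usual normal approximation. The function names, the 4.2% baseline, and the 0.6 percentage point minimum detectable effect are illustrative assumptions, not part of the plugin.

```python
import math
from statistics import NormalDist

def required_sample_size(baseline, mde_abs, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sided test on two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    n = (z_alpha + z_beta) ** 2 * 2 * baseline * (1 - baseline) / mde_abs ** 2
    return math.ceil(n)

def srm_p_value(n_control, n_variant, expected_control_share=0.5):
    """Chi-squared goodness-of-fit test (1 degree of freedom) for sample ratio mismatch."""
    total = n_control + n_variant
    exp_control = total * expected_control_share
    exp_variant = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_variant - exp_variant) ** 2 / exp_variant)
    # With 1 degree of freedom the chi-squared survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

needed = required_sample_size(baseline=0.042, mde_abs=0.006)
srm_p = srm_p_value(5000, 5100)
print(f"needed per variant: {needed}, SRM p-value: {srm_p:.3f}")
```

With these assumed inputs the required sample comes out well above the roughly 5,000 users per variant in the first invocation example, so that test would be flagged as underpowered for an effect of that size. For SRM, teams often apply a strict cutoff (p < 0.001 is a common choice), but the threshold is a convention rather than something this command prescribes.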
### Step 3: Analyze Results
Apply the **ab-test-analysis** skill:
- **Statistical significance**: Calculate the p-value and compare it with the chosen significance level (for example, 0.05)
- **Effect size**: Absolute and relative difference between variants
- **Practical significance**: Is the effect large enough to matter for the business?
- **Confidence interval**: What's the range of plausible true effects?
- **Segment analysis**: If data allows, check for differential effects by user segment
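The first four items can be sketched as a two-proportion z-test using only the standard library. This is one reasonable approach under a normal approximation, not necessarily the exact method the skill applies. The conversion counts are derived from the first invocation example (4.8% of 5,100 is rounded to 245 conversions), and `two_proportion_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def two_proportion_test(conv_c, n_c, conv_v, n_v, alpha=0.05):
    """Two-sided z-test and confidence interval for a difference in conversion rates."""
    p_c, p_v = conv_c / n_c, conv_v / n_v
    diff = p_v - p_c
    # Pooled standard error for the test statistic (null hypothesis: no difference).
    pooled = (conv_c + conv_v) / (n_c + n_v)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_v))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval on the difference.
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "absolute_diff": diff,
        "relative_lift": diff / p_c,
        "z": z,
        "p_value": p_value,
        "ci": (diff - z_crit * se, diff + z_crit * se),
    }

# 4.2% of 5000 is 210 conversions; 4.8% of 5100 is rounded to 245.
result = two_proportion_test(210, 5000, 245, 5100)
print({k: v for k, v in result.items()})
```

For these inputs the relative lift is roughly 14%, yet the p-value is about 0.14 and the 95% interval for the difference spans zero, which is a case where a promising-looking lift is not yet statistically significant.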
### Step 4: Generate Analysis
```
## A/B Test Analysis: [Test Name]
**Date**: [today]
**Test duration**: [X days/weeks]
**Total sample**: [N users]
### Results Summary
| Variant | Sample | Metric | Rate | 95% CI |
|---------|--------|--------|------|--------|
| Control | [n] | [metric] | [X%] | [X% - Y%] |
| Variant | [n] | [metric] | [X%] | [X% - Y%] |
### Statistical Analysis
- **Relative lift**: [+X%] ([CI range])
- **P-value**: [X]
- **Statistically significant**: [Yes/No] at 95% confidence
- **Minimum detectable effect**: [X%] (what the test was powered to detect)
### Sample Size Check
- **Required sample**: [N] per variant (for [X%] MDE at 80% power)
- **Actual sample**: [N] per variant
- **Verdict**: [Sufficiently powered / Underpowered / Overpowered]
### Decision
**Recommendation: [SHIP / EXTEND / STOP]**
[Clear explanation of why, considering both statistical and practical significance]
### Business Impact Estimate
If shipped to 100% of users:
- **Expected impact**: [metric change per month/quarter]
- **Revenue impact**: [if applicable]
- **Confidence**: [How certain we are about this estimate]
### Caveats
- [Any concerns about the test validity]
- [Segments where results differ]
- [Novelty effects or other biases to consider]
### Follow-Up
- [What to test next based on learnings]
- [Monitoring plan if shipping the variant]
```
### Step 5: Offer Next Steps
- "Want me to **design a follow-up experiment** based on these findings?"
- "Should I **run the analysis for specific segments**?"
- "Want me to **generate the SQL** to monitor this metric post-launch?"
## Notes
- Statistical significance ≠ practical significance — a 0.1% lift can be significant with enough data but not worth shipping
- Always check for sample ratio mismatch before trusting results
- Novelty effects can inflate short-term results — recommend monitoring for 2-4 weeks post-launch
- If the test is underpowered, the right answer is usually "extend", not "no effect"
- For revenue metrics, use confidence intervals to estimate best-case and worst-case business impact
- If data is provided as CSV, generate the full analysis using Python with scipy.stats
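The note on revenue metrics can be illustrated with simple arithmetic: scale both ends of the confidence interval by traffic and value per conversion. The sketch below assumes the interval is on the absolute conversion-rate difference. The traffic and revenue figures are invented for illustration, and `impact_range` is a hypothetical helper.

```python
def impact_range(ci_low, ci_high, monthly_users, revenue_per_conversion):
    """Translate a CI on the absolute conversion-rate difference into monthly revenue."""
    worst = ci_low * monthly_users * revenue_per_conversion
    best = ci_high * monthly_users * revenue_per_conversion
    return worst, best

# Assumed inputs: CI of -0.2 to +1.4 percentage points, 100k users, 50 per conversion.
worst, best = impact_range(-0.002, 0.014, monthly_users=100_000,
                           revenue_per_conversion=50.0)
print(f"worst case: {worst:,.0f}  best case: {best:,.0f}")
```

A range that includes a negative worst case is a useful prompt to treat the expected impact as uncertain rather than quoting only the point estimate.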
@@ -0,0 +1,84 @@
---
description: Generate SQL queries from natural language — supports BigQuery, PostgreSQL, MySQL, and more
argument-hint: "<what you want to know, in plain English>"
---
# /write-query -- SQL Query Generator
Describe what data you need in plain English and get an optimized SQL query. Supports multiple dialects and can read your schema from uploaded files.
## Invocation
```
/write-query Show me daily active users for the last 30 days, broken down by plan tier
/write-query Find users who signed up last month but never completed onboarding
/write-query [upload a schema diagram] What's the conversion rate from trial to paid by cohort?
```
## Workflow
### Step 1: Understand the Question
Parse the user's natural language request to identify:
- What data is being requested (metrics, dimensions, filters)
- Time range and granularity
- Grouping and ordering preferences
- Output expectations (raw data, aggregated, ranked)
### Step 2: Determine Schema
If a schema is available (uploaded diagram, DDL, or description):
- Map the request to specific tables and columns
- Identify necessary joins
If no schema is provided:
- Ask for the database type (BigQuery, PostgreSQL, MySQL, etc.)
- Infer a reasonable schema from the question and ask the user to confirm
- Use common SaaS data model conventions as defaults
### Step 3: Generate Query
Apply the **sql-queries** skill:
- Write the SQL query in the correct dialect
- Optimize for readability and performance
- Include comments explaining key logic
- Add CTEs for complex queries to improve readability
- Handle edge cases (NULLs, timezone considerations, duplicate handling)
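To make these points concrete, here is a sketch of one possible answer to the second invocation example (users who signed up in a period but never completed onboarding), run against an in-memory SQLite database so it can be checked end to end. The `users` and `events` tables, their columns, and the `onboarding_completed` event name are assumed for illustration. A real schema and dialect would change the details.

```python
import sqlite3

QUERY = """
-- Users who signed up in the window but have no onboarding_completed event.
WITH signups AS (
    SELECT id, email
    FROM users
    WHERE created_at >= :start_date AND created_at < :end_date  -- half-open range
),
onboarded AS (
    SELECT DISTINCT user_id            -- de-duplicate repeated completion events
    FROM events
    WHERE event_type = 'onboarding_completed'
)
SELECT s.id, s.email
FROM signups AS s
LEFT JOIN onboarded AS o ON o.user_id = s.id
WHERE o.user_id IS NULL                -- keep only users with no matching event
ORDER BY s.id;
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT);
CREATE TABLE events (user_id INTEGER, event_type TEXT, occurred_at TEXT);
INSERT INTO users VALUES
    (1, 'a@example.com', '2026-01-03'),
    (2, 'b@example.com', '2026-01-17'),
    (3, 'c@example.com', '2026-02-02');
INSERT INTO events VALUES
    (1, 'onboarding_completed', '2026-01-04'),
    (1, 'onboarding_completed', '2026-01-05'),
    (2, 'login', '2026-01-18');
""")
rows = conn.execute(
    QUERY, {"start_date": "2026-01-01", "end_date": "2026-02-01"}
).fetchall()
print(rows)
```

The CTEs keep each step readable on its own, and the half-open date range avoids double counting events that fall exactly on a boundary.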
### Step 4: Present and Iterate
```
## SQL Query: [What It Does]
**Dialect**: [BigQuery / PostgreSQL / MySQL / etc.]
**Tables used**: [list]
### Query
[SQL code block with comments]
### What This Returns
[Description of the output: columns, rows, expected result shape]
### Assumptions
- [Schema assumptions made]
- [Business logic assumptions]
### Notes
- [Performance considerations for large datasets]
- [Edge cases handled or flagged]
```
Offer:
- "Want me to **modify this** — add filters, change grouping, extend the time range?"
- "Should I **create a companion query** for a related metric?"
- "Want me to **build a dashboard** around this query?"
- "Need a **cohort analysis** version of this?"
## Notes
- Always include comments in the SQL — PMs share queries with analysts who need to understand intent
- Default to readable over clever — CTEs over nested subqueries
- Flag queries that might be slow on large datasets and suggest optimization
- If the request is ambiguous (e.g., "active users"), ask the user to define the metric precisely
- Offer to generate the query in multiple dialects if the user is unsure which database they're using
@@ -0,0 +1,82 @@
---
name: ab-test-analysis
description: "Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and actionable recommendations. Use when evaluating experiment results, checking if a test is significant, interpreting A/B test data, or deciding whether to ship a variant. Triggers: A/B test, AB test, experiment results, statistical significance, test analysis, split test, which variant won."
---
## A/B Test Analysis
Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.
### Context
You are analyzing A/B test results for **$ARGUMENTS**.
If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.
### Instructions
1. **Understand the experiment**:
- What was the hypothesis?
- What was changed (the variant)?
- What is the primary metric? Any guardrail metrics?
- How long did the test run?
- What is the traffic split?
2. **Validate the test setup**:
- **Sample size**: Is the sample large enough for the expected effect size?
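- Use the formula: n = ((Z_α/2 + Z_β)² × 2 × p × (1-p)) / MDE², where n is the sample per variant, p is the baseline conversion rate, MDE is the absolute minimum detectable effect, and Z_β is the normal quantile for the target power (about 0.84 for 80% power)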
- Flag if the test is underpowered (<80% power)
- **Duration**: Did the test run for at least 1-2 full business cycles?
- **Randomization**: Any evidence of sample ratio mismatch (SRM)?
- **Novelty/primacy effects**: Was there enough time to wash out initial behavior changes?
3. **Calculate statistical significance**:
- **Conversion rate** for control and variant
- **Relative lift**: (variant - control) / control × 100
- **p-value**: Using a two-tailed z-test or chi-squared test
- **Confidence interval**: 95% CI for the difference
- **Statistical significance**: Is p < 0.05?
- **Practical significance**: Is the lift meaningful for the business?
If the user provides raw data, generate and run a Python script to calculate these.
4. **Check guardrail metrics**:
- Did any guardrail metrics (revenue, engagement, page load time) degrade?
- A winning primary metric with degraded guardrails may not be a true win
5. **Interpret results**:
| Outcome | Recommendation |
|---|---|
| Significant positive lift, no guardrail issues | **Ship it** — roll out to 100% |
| Significant positive lift, guardrail concerns | **Investigate** — understand trade-offs before shipping |
| Not significant, positive trend | **Extend the test** — need more data or larger effect |
| Not significant, flat | **Stop the test** — no meaningful difference detected |
| Significant negative lift | **Don't ship** — revert to control, analyze why |
6. **Provide the analysis summary**:
```
## A/B Test Results: [Test Name]
**Hypothesis**: [What we expected]
**Duration**: [X days] | **Sample**: [N control / M variant]
| Metric | Control | Variant | Lift | p-value | Significant? |
|---|---|---|---|---|---|
| [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
| [Guardrail] | ... | ... | ... | ... | ... |
**Recommendation**: [Ship / Extend / Stop / Investigate]
**Reasoning**: [Why]
**Next steps**: [What to do]
```
Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.
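The outcome table in step 5 can be sketched as a small function. This is one hedged reading of the table: the `recommend` name and its parameters are hypothetical, and treating a non-significant negative result the same as "flat" is an assumption, since the table does not list that case.

```python
def recommend(p_value, lift, guardrails_ok, alpha=0.05):
    """Map test results onto the outcome table's recommendations."""
    significant = p_value < alpha
    if significant and lift > 0:
        return "Ship it" if guardrails_ok else "Investigate"
    if significant and lift < 0:
        return "Don't ship"
    if lift > 0:
        return "Extend the test"      # positive trend, not yet significant
    return "Stop the test"            # flat or negative, not significant (assumed)

for case in [(0.01, 0.05, True), (0.01, 0.05, False), (0.14, 0.14, True),
             (0.60, 0.0, True), (0.02, -0.03, True)]:
    print(case, "->", recommend(*case))
```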
---
### Further Reading
- [A/B Testing 101 + Examples](https://www.productcompass.pm/p/ab-testing-101-for-pms)
- [Testing Product Ideas: The Ultimate Validation Experiments Library](https://www.productcompass.pm/p/the-ultimate-experiments-library)
- [Are You Tracking the Right Metrics?](https://www.productcompass.pm/p/are-you-tracking-the-right-metrics)
@@ -0,0 +1,114 @@
---
name: cohort-analysis
description: "Perform cohort analysis on user engagement data. Identifies retention patterns, feature usage trends, and suggests qualitative follow-up research. Use when analyzing user retention by cohort, studying feature adoption over time, or investigating engagement patterns. Triggers: cohort analysis, retention analysis, user cohorts, engagement trends, cohort data."
---
# Cohort Analysis & Retention Explorer
## Purpose
Analyze user engagement and retention patterns by cohort to identify trends in user behavior, feature adoption, and long-term engagement. Combine quantitative insights with qualitative research recommendations.
## How It Works
### Step 1: Read and Validate Your Data
- Accept CSV, Excel, or JSON data files with user cohort information
- Verify data structure: cohort identifier, time periods, engagement metrics
- Check for missing values and data quality issues
- Summarize key statistics (cohort sizes, date ranges, metrics available)
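A minimal sketch of these validation checks, using only the standard library. The column names are taken from Example 1 under Usage Examples. The sample rows and the `summarize` helper are invented for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ("cohort_month", "user_id", "weeks_active")

# Hypothetical sample; the second row is missing weeks_active on purpose.
SAMPLE_CSV = """cohort_month,user_id,weeks_active
2025-10,u1,4
2025-10,u2,
2025-11,u3,7
"""

def summarize(csv_text, required=REQUIRED_COLUMNS):
    """Check structure and missing values, then report cohort sizes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    rows = list(reader)
    missing_columns = [c for c in required if c not in header]
    missing_values = {
        c: sum(1 for r in rows if not (r.get(c) or "").strip())
        for c in required if c in header
    }
    cohort_users = {}
    for r in rows:
        cohort_users.setdefault(r.get("cohort_month"), set()).add(r.get("user_id"))
    return {
        "rows": len(rows),
        "missing_columns": missing_columns,
        "missing_values": missing_values,
        "cohort_sizes": {c: len(u) for c, u in cohort_users.items()},
    }

summary = summarize(SAMPLE_CSV)
print(summary)
```

Reporting missing columns separately from missing values makes it clear whether the file has the wrong shape or just has gaps.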
### Step 2: Generate Quantitative Analysis
- Calculate cohort retention rates and engagement trends
- Identify retention curves, drop-off patterns, and anomalies
- Compute feature adoption rates across cohorts
- Calculate month-over-month or period-over-period changes
- Generate Python analysis scripts using pandas and numpy if requested
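Period-over-period change is the simplest of these calculations to show. The sketch below compares each cohort's month-1 retention with the previous cohort's, in both absolute and relative terms. The retention figures and the `period_over_period` helper are made up for the example.

```python
# Hypothetical month-1 retention rates keyed by signup cohort.
month1_retention = {"2025-01": 0.42, "2025-02": 0.45, "2025-03": 0.39}

def period_over_period(series):
    """Return {label: (absolute change, relative change)} versus the previous entry."""
    changes = {}
    labels = sorted(series)
    for prev, curr in zip(labels, labels[1:]):
        delta = series[curr] - series[prev]
        # Relative change is undefined when the previous value is zero.
        relative = delta / series[prev] if series[prev] else None
        changes[curr] = (delta, relative)
    return changes

changes = period_over_period(month1_retention)
for label, (delta, relative) in changes.items():
    print(f"{label}: {delta:+.1%} points, {relative:+.1%} relative")
```

Keeping the absolute change (percentage points) next to the relative change avoids the common confusion between "retention fell 6 points" and "retention fell 13%".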
### Step 3: Create Visualizations
- Generate retention heatmaps (cohorts vs. time periods)
- Create line charts showing cohort progression
- Build comparison charts for feature adoption
- Visualize drop-off points and engagement trends
- Output as interactive charts or static images
### Step 4: Identify Insights & Patterns
- Spot one or more significant patterns:
- Early churn in specific cohorts
- Late-stage engagement changes
- Feature adoption clusters
- Seasonal or temporal trends
- Highlight surprising findings and deviations
- Compare cohort performance to establish baselines
### Step 5: Suggest Follow-Up Research
- Recommend qualitative research methods:
- Targeted user interviews with churning users
- Feature usage surveys with engaged cohorts
- Session replays of key interaction patterns
- Win/loss analysis for high vs. low retention cohorts
- Design follow-up quantitative studies
- Suggest A/B tests or feature experiments
## Usage Examples
**Example 1: Upload CSV Data**
```
Upload cohort_engagement.csv with columns: cohort_month, weeks_active,
user_id, feature_x_usage, engagement_score
Request: "Analyze retention patterns and identify why Q4 2025 cohorts
underperform compared to Q3"
```
**Example 2: Describe Data Format**
```
"I have monthly user cohorts from Jan-Dec 2025. Each row shows:
cohort date, user ID, purchase frequency, and support tickets.
Analyze which cohorts show best long-term retention."
```
**Example 3: Feature Adoption Analysis**
```
Upload feature_usage.xlsx with cohort adoption data.
Request: "Compare adoption curves for our new feature across cohorts.
Which cohorts adopted fastest? Any patterns?"
```
## Key Capabilities
- **Data Reading**: Import CSV, Excel, JSON, SQL query results
- **Retention Analysis**: Calculate and visualize retention rates over time
- **Cohort Comparison**: Compare metrics across cohort groups
- **Anomaly Detection**: Flag unusual patterns or drop-offs
- **Python Scripts**: Generate reusable analysis code for ongoing analysis
- **Visualizations**: Create heatmaps, charts, and interactive dashboards
- **Research Design**: Suggest targeted follow-up studies and interview approaches
- **Statistical Summary**: Provide quantitative metrics and correlation analysis
## Tips for Best Results
1. **Include time dimension**: Provide data across multiple time periods
2. **Define cohort clearly**: Make cohort grouping explicit (signup month, feature launch date, etc.)
3. **Provide context**: Explain product changes, launches, or events during the period
4. **Multiple metrics**: Include retention, engagement, feature usage, revenue, etc.
5. **Sufficient data**: At least 3-4 cohorts for meaningful pattern identification
6. **Request specific output**: Ask for visualizations, Python scripts, or research recommendations
## Output Format
You'll receive:
- **Data Summary**: Cohort overview and data quality assessment
- **Quantitative Findings**: Key metrics, retention rates, and trend analysis
- **Visualizations**: Charts showing retention curves, adoption patterns
- **Pattern Identification**: 2-3 significant insights from the data
- **Research Recommendations**: Specific qualitative and quantitative follow-ups
- **Analysis Scripts** (if requested): Python code for reproducible analysis
- **Next Steps**: Prioritized actions based on findings
---
### Further Reading
- [Cohort Analysis 101: How to Reduce Churn and Make Better Product Decisions](https://www.productcompass.pm/p/cohort-analysis)
- [The Product Analytics Playbook: AARRR, HEART, Cohorts & Funnels for PMs](https://www.productcompass.pm/p/the-product-analytics-playbook-aarrr)
- [Are You Tracking the Right Metrics?](https://www.productcompass.pm/p/are-you-tracking-the-right-metrics)
@@ -0,0 +1,87 @@
---
name: sql-queries
description: "Generate SQL queries from natural language descriptions. Supports BigQuery, PostgreSQL, MySQL, and other SQL dialects. Reads database schemas from uploaded diagrams or documentation. Use when writing SQL queries, analyzing databases, building reports, or exploring data. Triggers: SQL query, write SQL, database query, BigQuery, data report, generate query."
---
# SQL Query Generator
## Purpose
Transform natural language requirements into optimized SQL queries across multiple database platforms. This skill helps product managers, analysts, and engineers generate accurate queries without manual syntax work.
## How It Works
### Step 1: Understand Your Database Schema
- If you provide a schema file (SQL, documentation, or diagram description), I will read and analyze it
- Extract table names, column definitions, data types, and relationships
- Identify primary keys, foreign keys, and indexing strategies
### Step 2: Process Your Request
- Clarify the exact data you need to retrieve or analyze
- Confirm the SQL dialect (BigQuery, PostgreSQL, MySQL, Snowflake, etc.)
- Ask for any additional requirements (filters, aggregations, sorting)
### Step 3: Generate Optimized Query
- Write efficient SQL that leverages your database structure
- Include comments explaining complex logic
- Add performance considerations for large datasets
- Provide alternative approaches if applicable
### Step 4: Explain and Test
- Explain the query logic in plain English
- Suggest how to test or validate results
- Offer tips for performance optimization
- If you want, generate a test script or sample data
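One way to validate a generated query is to run it against a few hand-built rows where the right answer is known in advance. The sketch below does this for the Users and Sessions tables described in Example 2 under Usage Examples, using an in-memory SQLite database. The sample rows are invented, and treating `duration` as seconds is an assumption because the example does not state a unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT);
CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id INTEGER,
                       timestamp TEXT, duration REAL);
INSERT INTO users VALUES
    (1, 'a@example.com', '2025-12-01'),
    (2, 'b@example.com', '2025-12-15');
INSERT INTO sessions VALUES
    (1, 1, '2026-01-05 10:00:00', 100),
    (2, 1, '2026-01-20 09:30:00', 300),
    (3, 2, '2026-01-31 23:59:59', 60),
    (4, 2, '2026-02-01 00:00:00', 999);  -- outside January, must be excluded
""")

QUERY = """
-- Average session duration per user for January 2026.
SELECT u.id, AVG(s.duration) AS avg_duration
FROM users AS u
JOIN sessions AS s ON s.user_id = u.id
WHERE s.timestamp >= '2026-01-01' AND s.timestamp < '2026-02-01'
GROUP BY u.id
ORDER BY u.id;
"""

results = conn.execute(QUERY).fetchall()
print(results)
```

The row dated exactly at midnight on 1 February is there deliberately: boundary rows are where date filters most often go wrong, so a test set should include at least one.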
## Usage Examples
**Example 1: Query from Schema File**
```
Upload your database_schema.sql file and say:
"Generate a query to find users who signed up in the last 30 days
and had at least 5 active sessions"
```
**Example 2: Query from Diagram Description**
```
"Here's my database: Users table (id, email, created_at), Sessions table
(id, user_id, timestamp, duration). Generate a query for average session
duration per user in January 2026."
```
**Example 3: Complex Analysis Query**
```
"Create a BigQuery query to analyze our revenue by region and customer tier,
including year-over-year growth rates."
```
## Key Capabilities
- **Multi-Dialect Support**: Works with BigQuery, PostgreSQL, MySQL, Snowflake, SQL Server
- **File Reading**: Reads schema files, SQL dumps, and data documentation
- **Query Optimization**: Suggests indexes, partitioning, and performance improvements
- **Explanation**: Breaks down queries for learning and documentation
- **Testing**: Can generate test queries and sample data scripts
- **Script Generation**: Creates executable SQL scripts for your database
## Tips for Best Results
1. **Provide context**: Share your database schema or structure
2. **Be specific**: Clearly describe what data you need and any filters
3. **Mention database**: Specify which SQL dialect you're using
4. **Include constraints**: Mention data volume, time ranges, and performance needs
5. **Request format**: Ask for the query result format if you need specific output
## Output Format
You'll receive:
- **SQL Query**: Production-ready SQL code with comments
- **Explanation**: What the query does and how it works
- **Performance Notes**: Optimization tips and considerations
- **Test Script** (if requested): Sample data and validation queries
---
### Further Reading
- [The Product Analytics Playbook: AARRR, HEART, Cohorts & Funnels for PMs](https://www.productcompass.pm/p/the-product-analytics-playbook-aarrr)
- [How to Become a Technology-Literate PM](https://www.productcompass.pm/p/how-to-become-a-technology-literate)