Overview
An end-to-end customer intelligence pipeline that turns 286K raw e-commerce transactions into cohort-level retention and lifetime-value insights.
What it does:
- Segments customers into 12 monthly cohorts based on first purchase date
- Computes 3 distinct retention metrics (Activity Rate, Classic Retention, Rolling Retention)
- Calculates Customer Lifetime Value per cohort
- Applies Wilson confidence intervals and IQR outlier detection
- Generates 10 publication-quality visualizations with a professional dark theme
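
The cohort, retention, and statistical steps listed above can be sketched in standard-library Python. This is an illustrative sketch, not the pipeline's actual code: the function names (`classic_retention`, `wilson_interval`, `iqr_outliers`), the `(customer_id, order_date)` input shape, the 95% level (`z = 1.96`), and the 1.5 x IQR fences are all assumptions. "Classic retention" is taken here to mean the share of a cohort that purchases in exactly month N after its first purchase, which may differ from the project's definition.

```python
import math
import statistics
from collections import defaultdict
from datetime import date


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def month_index(d):
    """Map a date to a running month count so month offsets are a subtraction."""
    return d.year * 12 + d.month


def classic_retention(orders):
    """orders: list of (customer_id, order_date) for valid orders only.

    Returns {((year, month), offset): (active_customers, cohort_size)}.
    """
    # Cohort = calendar month of each customer's FIRST purchase (assumed rule).
    first = {}
    for cid, d in orders:
        if cid not in first or d < first[cid]:
            first[cid] = d
    cohort_size = defaultdict(int)
    for d in first.values():
        cohort_size[(d.year, d.month)] += 1
    # Distinct customers active in each (cohort, months-since-first-purchase) cell.
    active = defaultdict(set)
    for cid, d in orders:
        f = first[cid]
        active[((f.year, f.month), month_index(d) - month_index(f))].add(cid)
    return {key: (len(ids), cohort_size[key[0]]) for key, ids in active.items()}


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Made-up sample data, for illustration only.
orders = [
    ("a", date(2021, 1, 5)),
    ("b", date(2021, 1, 20)),
    ("a", date(2021, 2, 3)),
    ("c", date(2021, 2, 10)),
]
table = classic_retention(orders)
active, size = table[((2021, 1), 1)]  # January cohort, one month later
low, high = wilson_interval(active, size)
```

With only two customers in the sample January cohort, the resulting interval is very wide, which is the reason to report an interval rather than a bare percentage for small cohorts.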
Key corrections applied to the original dataset:
- Filtered out the 50.9% of orders that were invalid (canceled or refunded)
- Changed the cohort definition from account-creation date to first-purchase date
- Derived the true quantity through formula verification rather than an arbitrary subtraction
- Stripped 14 PII columns (SSN, names, emails, phone numbers)
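
As a rough illustration of the order-filtering and PII-stripping corrections, the sketch below drops canceled or refunded orders and removes PII fields from each remaining row. The `status` key, the status labels, and the column names in `PII_COLUMNS` are hypothetical stand-ins; the dataset's real schema, including the full list of 14 PII columns, is not reproduced here.

```python
# Hypothetical labels and column names, chosen only for this sketch.
INVALID_STATUSES = {"canceled", "refunded"}
PII_COLUMNS = {"ssn", "first_name", "last_name", "email", "phone"}


def clean_orders(rows):
    """Keep valid orders only and drop PII fields from each kept row."""
    cleaned = []
    for row in rows:
        status = str(row.get("status", "")).strip().lower()
        if status in INVALID_STATUSES:
            continue  # invalid orders must not count toward cohorts or CLV
        cleaned.append(
            {key: value for key, value in row.items() if key.lower() not in PII_COLUMNS}
        )
    return cleaned


# Made-up rows, for illustration only.
rows = [
    {"order_id": 1, "status": "complete", "email": "x@example.com", "total": 20.0},
    {"order_id": 2, "status": "Canceled", "email": "y@example.com", "total": 35.0},
    {"order_id": 3, "status": "refunded", "ssn": "000-00-0000", "total": 12.5},
]
cleaned = clean_orders(rows)
```

Filtering before any cohort logic runs matters here: if canceled orders were kept, a customer's "first purchase" could be an order that never completed.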
Architecture: A modular Python pipeline with 5 source modules, a CLI orchestrator, and structured logging. It runs end-to-end in about 10 seconds.
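
One way such an orchestrator could be laid out is sketched below: an ordered list of stages, an `argparse` front end, and key=value log lines per stage. Everything named here (the stage names, the `--stages` and `--log-level` flags, the `ctx` dictionary) is hypothetical and does not describe the project's actual modules or command-line interface.

```python
import argparse
import logging
import time

log = logging.getLogger("pipeline")


def _stage(name):
    """Build a placeholder stage that only records that it ran."""
    def run_stage(ctx):
        ctx.setdefault("done", []).append(name)
    return run_stage


# Hypothetical stage names; real stages would call into the source modules.
STAGES = [(name, _stage(name)) for name in ("load", "clean", "cohorts", "plots")]
STAGE_NAMES = [name for name, _ in STAGES]


def run(selected, ctx=None):
    """Run the selected stages in pipeline order, logging timing for each."""
    ctx = {} if ctx is None else ctx
    for name, fn in STAGES:
        if name not in selected:
            continue
        start = time.perf_counter()
        fn(ctx)
        log.info("stage=%s status=ok seconds=%.3f", name, time.perf_counter() - start)
    return ctx


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run the analysis pipeline (sketch).")
    parser.add_argument("--stages", nargs="+", choices=STAGE_NAMES, default=STAGE_NAMES)
    parser.add_argument("--log-level", default="INFO")
    args = parser.parse_args(argv)
    logging.basicConfig(
        level=args.log_level,
        format="%(asctime)s level=%(levelname)s %(message)s",
    )
    run(args.stages)
    return 0
```

Passing `argv` explicitly, as in `main(["--stages", "load", "clean"])`, keeps the entry point testable without touching `sys.argv`.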