
Cohort Analysis: Customer Retention & Revenue Analytics

Oct - Dec 2025
Python, pandas, NumPy, matplotlib, seaborn, SciPy

Overview

An end-to-end customer intelligence pipeline that turns 286K raw e-commerce transactions into cohort-level retention, lifetime-value, and revenue metrics.

What it does:

  • Segments customers into 12 monthly cohorts based on first purchase date
  • Computes 3 distinct retention metrics (Activity Rate, Classic Retention, Rolling Retention)
  • Calculates Customer Lifetime Value per cohort
  • Applies Wilson confidence intervals and IQR outlier detection
  • Generates 10 publication-quality visualizations with a professional dark theme
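The cohort and retention steps above can be sketched roughly as follows. This is an illustration, not the project's actual code: the column names `customer_id` and `order_date`, the function names, and the definitions used here are assumptions. Classic retention is taken as the share of a cohort active in exactly month k, and rolling retention as the share active in month k or any later month, which are common conventions. The Wilson interval is written out from its standard formula.

```python
import math
from statistics import NormalDist

import pandas as pd


def assign_cohorts(orders: pd.DataFrame) -> pd.DataFrame:
    """Label each order with the customer's first-purchase month ("cohort")
    and the number of calendar months since that month ("period")."""
    out = orders.copy()
    first_date = out.groupby("customer_id")["order_date"].transform("min")
    out["cohort"] = first_date.dt.strftime("%Y-%m")
    # An integer month index keeps the month arithmetic simple.
    month = out["order_date"].dt.year * 12 + out["order_date"].dt.month
    first_month = first_date.dt.year * 12 + first_date.dt.month
    out["period"] = month - first_month
    return out


def retention_tables(orders: pd.DataFrame):
    """Return (classic, rolling) retention matrices: cohorts x periods.

    Cells beyond a cohort's observation window come out as 0 here;
    a real report would likely mask them instead.
    """
    df = assign_cohorts(orders)
    size = df.groupby("cohort")["customer_id"].nunique()
    periods = list(range(int(df["period"].max()) + 1))

    # Classic: distinct customers who ordered in exactly period k.
    active = (
        df.groupby(["cohort", "period"])["customer_id"].nunique()
        .unstack(fill_value=0)
        .reindex(columns=periods, fill_value=0)
    )
    classic = active.div(size, axis=0)

    # Rolling: customers whose last observed order falls in period k or later.
    last_seen = df.groupby(["cohort", "customer_id"])["period"].max().reset_index()
    last_counts = (
        last_seen.groupby(["cohort", "period"]).size()
        .unstack(fill_value=0)
        .reindex(columns=periods, fill_value=0)
    )
    surviving = last_counts.iloc[:, ::-1].cumsum(axis=1).iloc[:, ::-1]
    rolling = surviving.div(size, axis=0)
    return classic, rolling


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))
```

Calling `wilson_interval(retained, cohort_size)` on each cell of the classic matrix gives an uncertainty band that stays inside [0, 1] even for small cohorts, which is the usual reason to prefer it over the normal approximation.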

Key corrections over the original dataset:

  • Filtered out the 50.9% of orders that were invalid (canceled or refunded)
  • Fixed cohort definition from account-creation date to first-purchase date
  • Derived true quantity using formula verification instead of arbitrary subtraction
  • Stripped 14 PII columns (SSN, names, emails, phone numbers)
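As a rough sketch of two of the corrections above (dropping invalid orders and stripping PII), something like the following would work. The `status` column, its labels, and the PII column names are hypothetical placeholders, not the dataset's real schema, and only four of the fourteen PII columns are shown.

```python
import pandas as pd

# Assumed status labels for invalid orders; the real dataset may differ.
INVALID_STATUSES = {"canceled", "refunded"}
# Hypothetical PII column names, for illustration only.
PII_COLUMNS = ["ssn", "full_name", "email", "phone"]


def clean_orders(raw: pd.DataFrame):
    """Return (cleaned orders, share of rows dropped as invalid)."""
    status = raw["status"].str.strip().str.lower()
    # Rows with a missing status are kept, since NaN never matches isin().
    valid = raw.loc[~status.isin(INVALID_STATUSES)]
    # errors="ignore" tolerates PII columns that are already absent.
    cleaned = valid.drop(columns=PII_COLUMNS, errors="ignore")
    dropped_share = 1 - len(valid) / len(raw) if len(raw) else 0.0
    return cleaned.reset_index(drop=True), dropped_share
```

Returning the dropped share alongside the data makes it easy to log and sanity-check a figure like the 50.9% quoted above on every run.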

Architecture: Modular Python pipeline with 5 source modules, CLI orchestrator, and structured logging. Runs end-to-end in ~10 seconds.
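One way a CLI orchestrator with structured logging could be wired together is sketched below, using only the standard library. The stage names, the `--only` flag, and the placeholder stage bodies are all hypothetical; the project's real modules and options are not shown on this page.

```python
import argparse
import logging
import time

log = logging.getLogger("cohort_pipeline")


def _placeholder(name):
    """Build a stand-in stage that only records that it ran."""
    def stage(ctx):
        ctx.setdefault("completed", []).append(name)
    return stage


# Hypothetical stage names; each would call into one source module.
STAGE_NAMES = ("load", "clean", "cohorts", "metrics", "plots")
STAGES = [(name, _placeholder(name)) for name in STAGE_NAMES]


def run_pipeline(stages, ctx=None):
    """Run stages in order, logging key=value pairs and timings."""
    ctx = {} if ctx is None else ctx
    for name, stage in stages:
        started = time.perf_counter()
        log.info("stage=%s status=start", name)
        stage(ctx)
        elapsed = time.perf_counter() - started
        log.info("stage=%s status=done seconds=%.3f", name, elapsed)
    return ctx


def main(argv=None):
    parser = argparse.ArgumentParser(description="Cohort analysis pipeline (sketch)")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    parser.add_argument("--only", nargs="*", choices=STAGE_NAMES,
                        help="run only the named stages")
    args = parser.parse_args(argv)
    logging.basicConfig(
        level=args.log_level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    selected = [s for s in STAGES if not args.only or s[0] in args.only]
    run_pipeline(selected)
    return 0
```

A script entry point would call `main()` and pass its return value to `sys.exit`. Keeping stages as plain `(name, callable)` pairs means per-stage timings like the ~10 second total above fall out of the same loop that runs them.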

Cohort Analysis: Customer Retention & Revenue Analytics | Abir Barman