Customer Lifetime Value Prediction System

Jan - Mar 2026
Python, Pandas, NumPy, Scikit-learn, LightGBM, XGBoost

Overview

This project builds a production-grade machine learning system to predict Customer Lifetime Value (CLTV) in the insurance domain.

The system processes structured customer data, engineers domain-specific features, and trains a stacked ensemble: LightGBM, XGBoost, and CatBoost base models whose predictions are combined by a Ridge meta-learner to predict CLTV for each customer.
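A minimal sketch of the stacking idea, assuming scikit-learn's StackingRegressor. The project's own training code is not shown here, so the synthetic data and the three tree-based stand-ins for LightGBM, XGBoost, and CatBoost are assumptions made to keep the example self-contained.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customer table (assumption: the real data is not shown).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stand-ins for the three boosted base models named above (assumption).
base_models = [
    ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("et", ExtraTreesRegressor(n_estimators=50, random_state=0)),
]

# With cv=5 the Ridge meta-learner is trained on out-of-fold base predictions,
# so it never sees a prediction made on a row the base model was fit on.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)
stack.fit(X_train, y_train)

test_r2 = stack.score(X_test, y_test)       # R² on held-out rows
meta_weights = stack.final_estimator_.coef_  # one Ridge weight per base model
```

The Ridge weights give a rough view of how much each base model contributes to the final prediction. LightGBM, XGBoost, and CatBoost each ship a scikit-learn-compatible regressor that could be placed in the estimators list the same way.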

It is designed as a complete end-to-end ML pipeline rather than a standalone model, covering preprocessing, feature engineering, training, evaluation, inference, and reporting.

Problem Statement

Insurance companies struggle to identify which customers generate the most long-term value. Without CLTV prediction:

  • Marketing budgets are allocated inefficiently across low-value customers
  • High-value customers churn without targeted intervention
  • Pricing and underwriting decisions ignore long-term profitability

This results in wasted acquisition spend, preventable revenue loss, and suboptimal business strategy.

The core problem is the lack of a data-driven system to quantify and predict customer lifetime value at an individual level.

Results and Impact

The stacked ensemble achieved an R² of 0.1605, a 43% improvement over the individual base models.
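For readers less familiar with the metric, R² compares the model's squared error with that of always predicting the mean, so a score of 0.1605 corresponds to explaining about 16% of the variance in the target. The sketch below shows the standard definition. The sample values are hypothetical and are not taken from the project.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical CLTV values, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [150.0, 150.0, 350.0, 350.0]

score = r2_score(actual, predicted)                     # 1 - 10000 / 50000
baseline = r2_score(actual, [sum(actual) / 4] * 4)      # predicting the mean
```

Predicting the mean for every customer scores exactly 0, which is the reference point any reported R² should be read against.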

Business Impact

  • Enables customer segmentation into value tiers (Top 10%, mid-tier, low-value)
  • Improves marketing ROI through value-based targeting (estimated 15–25% gain)
  • Reduces high-value customer churn through proactive retention strategies (10–20% potential reduction)
  • Increases cross-sell conversion by identifying high-potential customers (20–35% improvement)
  • Optimizes acquisition strategies by evaluating channels based on customer value, not volume
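The tier segmentation in the first bullet can be sketched as a rank-based split of predicted CLTV. The function name, the customer ids, and the 50% cutoff for the low-value tier are assumptions; the source only names the top 10% boundary.

```python
def assign_value_tiers(predicted_cltv, top_share=0.10, low_share=0.50):
    """Map each customer id to 'top', 'mid', or 'low' by predicted CLTV rank.

    top_share follows the "Top 10%" tier named above; low_share is an
    assumed cutoff, since the source does not define the low-value tier.
    """
    ranked = sorted(predicted_cltv, key=predicted_cltv.get, reverse=True)
    n = len(ranked)
    n_top = max(1, round(n * top_share))
    n_low = round(n * low_share)

    tiers = {}
    for position, customer_id in enumerate(ranked):
        if position < n_top:
            tiers[customer_id] = "top"
        elif position >= n - n_low:
            tiers[customer_id] = "low"
        else:
            tiers[customer_id] = "mid"
    return tiers


# Hypothetical predictions for ten customers (C01 lowest, C10 highest).
predictions = {f"C{i:02d}": 1000.0 * i for i in range(1, 11)}
tiers = assign_value_tiers(predictions)
```

Ranking on predicted value rather than using fixed currency thresholds keeps the tier sizes stable even when the overall scale of predictions shifts between model versions.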

Key Insight

Multi-policy customers generate 2.4× higher lifetime value than single-policy customers, providing a clear direction for retention and upselling strategies.
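A ratio like this can be computed by comparing average lifetime value across the two groups. The sketch below assumes a comparison of means (the source does not say whether means or medians were used), and the field names and toy records are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import mean


def multi_policy_value_ratio(customers):
    """Mean CLTV of multi-policy customers divided by that of single-policy ones."""
    multi = [c["cltv"] for c in customers if c["num_policies"] > 1]
    single = [c["cltv"] for c in customers if c["num_policies"] == 1]
    return mean(multi) / mean(single)


# Toy records, not project data: field names and values are illustrative.
customers = [
    {"num_policies": 1, "cltv": 4000.0},
    {"num_policies": 1, "cltv": 6000.0},
    {"num_policies": 2, "cltv": 11000.0},
    {"num_policies": 3, "cltv": 13000.0},
]

ratio = multi_policy_value_ratio(customers)  # 12000 / 5000
```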

Key Highlights

  • Leakage-safe cross-validation pipeline
  • Advanced feature engineering (22 derived features)
  • Multi-model ensemble with Ridge stacking
  • Config-driven and reproducible system
  • Business-focused outputs for decision-making
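The first and fourth highlights can be illustrated together. In this sketch, assuming scikit-learn, preprocessing lives inside a Pipeline so it is refit on each training fold only, and every tunable value comes from a single config dictionary. The config keys, the evaluate helper, and the synthetic data are assumptions; a plain Ridge model stands in for the full ensemble to keep the example short.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical config: in a config-driven setup this might be loaded from a file.
CONFIG = {"seed": 42, "n_splits": 5, "ridge_alpha": 1.0}


def evaluate(X, y, config):
    """Return per-fold R² scores for a scaler + Ridge pipeline."""
    # The scaler is part of the pipeline, so its statistics are learned from
    # the training portion of each fold and never from the validation rows.
    model = Pipeline([
        ("scale", StandardScaler()),
        ("ridge", Ridge(alpha=config["ridge_alpha"])),
    ])
    folds = KFold(
        n_splits=config["n_splits"], shuffle=True, random_state=config["seed"]
    )
    return cross_val_score(model, X, y, cv=folds, scoring="r2")


# Synthetic data, seeded from the same config for reproducibility.
X, y = make_regression(
    n_samples=200, n_features=6, noise=5.0, random_state=CONFIG["seed"]
)
scores = evaluate(X, y, CONFIG)
rerun = evaluate(X, y, CONFIG)  # same config, same folds, same scores
```

Fitting a scaler or encoder on the full dataset before splitting is a common source of leakage; keeping those steps inside the cross-validated pipeline avoids it by construction.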

This project demonstrates a real-world ML system that bridges technical modeling and business impact.

Customer Lifetime Value Prediction System | Abir Barman