Overview
This project builds a production-grade machine learning system to predict Customer Lifetime Value (CLTV) in the insurance domain.
The system processes structured customer data, engineers domain-specific features, and trains a stacked ensemble (LightGBM, XGBoost, and CatBoost base models combined by Ridge stacking) to predict CLTV for each customer.
It is designed as a complete end-to-end ML pipeline rather than a standalone model, covering preprocessing, feature engineering, training, evaluation, inference, and reporting.
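As a rough illustration of the stacking idea, the sketch below builds out-of-fold predictions from base learners and fits a ridge meta-learner on them. The base learners are deliberately tiny stand-ins (a mean predictor and a one-feature least-squares line), not LightGBM, XGBoost, or CatBoost, and every function name and the toy data are assumptions made for this example rather than the project's code.

```python
from statistics import fmean

def kfold(n, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    for lo, hi in zip(bounds, bounds[1:]):
        valid = list(range(lo, hi))
        train = [i for i in range(n) if not lo <= i < hi]
        yield train, valid

def fit_mean(X, y):
    """Stand-in base learner: always predicts the training mean."""
    m = fmean(y)
    return lambda row: m

def fit_one_feature(j):
    """Stand-in base learner factory: least-squares line on feature j."""
    def fit(X, y):
        xs = [row[j] for row in X]
        mx, my = fmean(xs), fmean(y)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        slope = cov / var if var else 0.0
        return lambda row: my + slope * (row[j] - mx)
    return fit

def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * pv for a, pv in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_ridge(Z, y, alpha=1.0):
    """Ridge without intercept: w = (Z'Z + alpha*I)^-1 Z'y (alpha > 0)."""
    d = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) + (alpha if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(d)]
    return solve(A, b)

def stack(X, y, base_fits, k=3, alpha=1.0):
    """Return (base models refit on all data, meta-learner weights)."""
    n = len(y)
    oof = [[0.0] * len(base_fits) for _ in range(n)]
    for train, valid in kfold(n, k):
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        for m, fit in enumerate(base_fits):
            model = fit(Xt, yt)
            for i in valid:
                oof[i][m] = model(X[i])  # row i was held out from this fit
    weights = fit_ridge(oof, y, alpha)
    models = [fit(X, y) for fit in base_fits]  # refit on all rows for inference
    return models, weights

def predict(models, weights, row):
    """Blend base-model predictions with the meta-learner weights."""
    return sum(w * m(row) for w, m in zip(weights, models))

# Toy data, purely illustrative.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0], [6.0, 1.0]]
y = [2.0 * a + b for a, b in X]
models, weights = stack(X, y, [fit_mean, fit_one_feature(0)], k=3)
print(predict(models, weights, [7.0, 0.0]))
```

The meta-learner only ever sees predictions made on rows the base model was not trained on, which is what keeps the stacking weights from rewarding overfit base models.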
Problem Statement
Insurance companies struggle to identify which customers generate the most long-term value. Without CLTV prediction:
- Marketing budgets are allocated inefficiently across low-value customers
- High-value customers churn without targeted intervention
- Pricing and underwriting decisions ignore long-term profitability
This results in wasted acquisition spend, preventable revenue loss, and suboptimal business strategy.
The core problem is the lack of a data-driven system to quantify and predict customer lifetime value at an individual level.
Results and Impact
The stacked ensemble achieved an R² of 0.1605, a 43% improvement over the individual base models.
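For reference, R² compares a model's squared error with that of always predicting the mean. The sketch below computes it from scratch and, assuming the 43% figure is a relative gain in R² (the text does not say which metric it refers to), backs out the base-model R² that reading would imply. The function names and toy values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def implied_baseline(stacked_r2, relative_gain):
    """Baseline R² implied by a relative gain (0.43 means +43%)."""
    return stacked_r2 / (1.0 + relative_gain)

# Toy targets and predictions, not project data.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [150.0, 200.0, 250.0, 400.0]
print(r_squared(y_true, y_pred))

# Under the relative-gain assumption only:
print(round(implied_baseline(0.1605, 0.43), 4))
```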
Business Impact
- Enables customer segmentation into value tiers (top 10%, mid-tier, low-value)
- Improves marketing ROI through value-based targeting (estimated +15–25%)
- Reduces high-value customer churn through proactive retention strategies (estimated 10–20% reduction)
- Increases cross-sell conversion by identifying high-potential customers (estimated 20–35% improvement)
- Optimizes acquisition strategies by evaluating channels based on customer value, not volume
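The tiering in the first bullet could look roughly like the following. Only the top-10% cutoff comes from the text; the boundary between mid-tier and low-value (the bottom 50% here), the tier labels, and the function name are assumptions for illustration.

```python
def assign_tiers(predictions, top_share=0.10, low_share=0.50):
    """Label each customer 'top', 'mid', or 'low' by rank of predicted CLTV."""
    n = len(predictions)
    # Customer indices ordered from highest to lowest predicted value.
    order = sorted(range(n), key=lambda i: predictions[i], reverse=True)
    n_top = max(1, round(n * top_share)) if n else 0
    n_low = int(n * low_share)
    tiers = [""] * n
    for rank, i in enumerate(order):
        if rank < n_top:
            tiers[i] = "top"
        elif rank >= n - n_low:
            tiers[i] = "low"
        else:
            tiers[i] = "mid"
    return tiers

# Toy predicted CLTV values, not project output.
predicted = [500, 1200, 300, 9000, 750, 400, 2100, 650, 980, 310]
tiers = assign_tiers(predicted)
print(list(zip(predicted, tiers)))
```

Ranking on predictions rather than fixed currency thresholds keeps tier sizes stable when the prediction scale shifts between model versions.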
Key Insight
Multi-policy customers generate 2.4× higher lifetime value than single-policy customers, providing a clear direction for retention and upselling strategies.
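A ratio of this kind can be checked with a simple group comparison. The sketch assumes each record carries a policy count and a CLTV value and compares group means; the field names are hypothetical, the toy numbers are not the project's data, and the text does not state whether the 2.4× figure is based on means or another statistic.

```python
def multi_vs_single_ratio(customers):
    """Mean CLTV of multi-policy customers divided by that of single-policy ones."""
    multi = [c["cltv"] for c in customers if c["num_policies"] > 1]
    single = [c["cltv"] for c in customers if c["num_policies"] == 1]
    if not multi or not single:
        raise ValueError("need at least one customer in each group")
    return (sum(multi) / len(multi)) / (sum(single) / len(single))

# Toy records for illustration only.
toy = [
    {"num_policies": 1, "cltv": 1000.0},
    {"num_policies": 1, "cltv": 3000.0},
    {"num_policies": 2, "cltv": 5000.0},
    {"num_policies": 3, "cltv": 7000.0},
]
print(multi_vs_single_ratio(toy))  # 3.0 for this toy data
```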
Key Highlights
- Leakage-safe cross-validation pipeline
- Advanced feature engineering (22 derived features)
- Multi-model ensemble with Ridge stacking
- Config-driven and reproducible system
- Business-focused outputs for decision-making
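To make the "leakage-safe" and "config-driven" points concrete, here is a minimal sketch in which a small config object fixes the fold count and random seed, and a scaler is fitted on each training fold only before being applied to the held-out fold. The class, field, and function names are assumptions; the project's actual configuration format is not described here.

```python
import random
from dataclasses import dataclass
from statistics import fmean, pstdev

@dataclass(frozen=True)
class CVConfig:
    """Hypothetical settings that make the split reproducible."""
    n_folds: int = 5
    seed: int = 42

def shuffled_folds(n, cfg):
    """Split indices 0..n-1 into cfg.n_folds folds, reproducibly."""
    idx = list(range(n))
    random.Random(cfg.seed).shuffle(idx)
    return [idx[i::cfg.n_folds] for i in range(cfg.n_folds)]

def fit_scaler(values):
    """Learn mean and standard deviation from training values only."""
    mu = fmean(values)
    sigma = pstdev(values) or 1.0  # guard against constant columns
    return mu, sigma

def scaled_folds(values, cfg):
    """For each fold, yield (train_scaled, valid_scaled) using train-only stats."""
    folds = shuffled_folds(len(values), cfg)
    for k, valid_idx in enumerate(folds):
        train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
        mu, sigma = fit_scaler([values[i] for i in train_idx])
        train = [(values[i] - mu) / sigma for i in train_idx]
        valid = [(values[i] - mu) / sigma for i in valid_idx]
        yield train, valid

cfg = CVConfig(n_folds=3, seed=0)
values = [float(v) for v in range(12)]
for train, valid in scaled_folds(values, cfg):
    print(len(train), len(valid))
```

Because the held-out fold never contributes to the scaler's statistics, the validation score is not flattered by information from the rows being scored. The same pattern applies to any fitted transform, such as target encodings or imputation values.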
This project demonstrates a real-world ML system that bridges technical modeling and business impact.