Deep Learning for Credit Card Fraud Detection

Comparing supervised, unsupervised, and hybrid models on the ULB dataset — with drift analysis and SHAP interpretability.

Logistic Regression, Feed-Forward Network, Isolation Forest, Autoencoder, Ensemble, Chronological Split, Concept Drift, Class Imbalance, Bootstrap CI, SHAP, PyTorch, scikit-learn, pandas, matplotlib
PR curves for all six models with 95% bootstrap confidence intervals. ENS achieves the highest PR-AUC (0.772); unsupervised models fall near the no-skill baseline.

Abstract

Payment card fraud across the European Economic Area reached EUR 4.2 billion in 2024. This project evaluates six model configurations on the ULB dataset: logistic regression (LR), a feed-forward network (FFN), an isolation forest (IF), an autoencoder (AE), and two hybrid extensions (W1B and ENS). All are evaluated under a strict chronological split that simulates deployment conditions. PR-AUC is adopted as the primary metric because of the extreme class imbalance (1:578). The FFN ensemble (ENS) achieves the highest PR-AUC (0.772) while remaining statistically indistinguishable from logistic regression, suggesting that deep learning's advantages are operational rather than purely predictive.
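The evaluation protocol described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the `test_fraction` default, the number of resamples, and the use of scikit-learn's average precision as the PR-AUC estimate are all assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import average_precision_score


def chronological_split(times, test_fraction=0.15):
    """Split by time: every test row is at least as late as every train row.

    `test_fraction` is a hypothetical default, not a value taken from the project.
    """
    order = np.argsort(times, kind="stable")
    cut = int(len(order) * (1.0 - test_fraction))
    return order[:cut], order[cut:]


def bootstrap_pr_auc(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Average precision with a percentile bootstrap interval.

    Average precision stands in for PR-AUC here (an assumption).
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    rng = np.random.default_rng(seed)
    point = average_precision_score(y_true, scores)

    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        if y_true[idx].sum() == 0:
            continue  # no positives in this resample, metric undefined
        stats.append(average_precision_score(y_true[idx], scores[idx]))

    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(point), float(lo), float(hi)
```

With so few positives in a held-out set, the interval is typically wide, which is one reason two models with different point estimates can still be statistically indistinguishable.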

Live Demo

Interactive fraud detection system — submit a transaction to see the risk score, model decision, and SHAP feature breakdown in real time.

Highlights

SHAP feature importance
SHAP importance mirrors KS separability (ρ = +0.727), confirming the FFN learned genuinely discriminative features.
Error analysis
23% of fraud cases were missed by all models simultaneously, a data-level floor that no architecture can resolve alone.
Drift mitigation
Only score-level ensembling improved performance under drift; sliding-window and time-weighted retraining made it worse because of data scarcity.
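For readers unfamiliar with score-level ensembling, the idea is to combine the outputs of already-trained models rather than retrain them. The sketch below uses rank averaging, which is only one possible combination rule and is an assumption here; the page does not say which rule the project uses. The function name `rank_average` and its `weights` parameter are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def rank_average(score_lists, weights=None):
    """Combine per-model scores by averaging their normalized ranks.

    score_lists: sequence of shape (n_models, n_samples).
    Ranking first puts models with different score scales on a common footing.
    """
    scores = np.asarray(score_lists, dtype=float)
    n_samples = scores.shape[1]
    # rankdata assigns average ranks to ties; dividing maps ranks into (0, 1].
    ranks = np.vstack([rankdata(s) / n_samples for s in scores])
    return np.average(ranks, axis=0, weights=weights)
```

Because no model is refit, this kind of combination needs no additional labelled fraud cases, which matters when recent data are scarce.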

Results

Model  Paradigm         PR-AUC  F1     Recall@P=0.9
LR     Supervised ML    0.692   0.753  0.673
FFN    Supervised DL    0.766   0.813  0.731
IF     Unsupervised ML  0.054   0.133  0.000
AE     Unsupervised DL  0.069   0.129  0.000
W1B    Semi-supervised  0.770   0.821  0.750
ENS    Ensemble         0.772   0.821  0.750

Test set: n = 42,722, 52 fraud cases. Threshold targets Precision ≥ 0.90.
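The Recall@P=0.9 column can be read as the highest recall available at any threshold whose precision is at least 0.90. A minimal sketch of that computation is shown below; the function name and the convention of returning 0.0 when no threshold qualifies are assumptions for illustration, and the page does not state on which data split the project selects its threshold.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def recall_at_precision(y_true, scores, min_precision=0.90):
    """Return (recall, threshold) for the best threshold meeting min_precision."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # The last curve point (precision=1, recall=0) has no threshold; drop it.
    precision, recall = precision[:-1], recall[:-1]

    ok = precision >= min_precision
    if not ok.any():
        return 0.0, None  # no threshold reaches the target precision

    # Among qualifying thresholds, keep the one with the highest recall.
    best = int(np.argmax(np.where(ok, recall, -1.0)))
    return float(recall[best]), float(thresholds[best])
```

Under this convention, a model that never reaches 0.90 precision at any threshold is reported with a recall of 0.0, which is one way a 0.000 entry can arise.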

Read More

Full analysis with all 16 figures, drift experiments, and SHAP deep-dive:

Read the analysis note →