
Is My Data Safe to Use in a Credit Model?

  • Writer: Leland Burns & Jim McGuire
  • Nov 10
  • 3 min read

Most lenders we work with already have a sizable pile of data: application fields, bank data, bureaus, device signals, platform behavior. But when it comes to building a credit model, not all data is good data. Some of it carries long-term risk, not just to your model but to your entire business. Here’s a quick guide to evaluating your data.


Here’s a snapshot of what we often see on our first call with a lender:


  • Application data: Self-reported income, employer name, product type, purpose of loan

  • Device & behavioral data: Time-on-page, keyboard usage, app version, IP location

  • Bank transaction data: Direct from a link or aggregators like Plaid or Belvo

  • Credit bureau data: Traditional bureaus, alternative bureaus, internal repayment history

  • Derived features: Hand-engineered ratios, rollups, scoring attempts from prior efforts


The data varies slightly across geographies and product types, but the evaluation framework is consistent. You must examine each feature for the following three issues.


1. Leakage


Data leakage occurs when your model has access to information that wouldn’t be available at the time of the decision, or worse, that accidentally reflects the outcome itself. And it’s everywhere.


Some of the most common leaky variables we’ve seen:


  • Bank transaction features based on data pulled after approval

  • Fraud scores or “internal flags” added post-decision but merged into training data

  • Roll-up fields like “total charged-off balance” that include future behavior

  • Fields created by analysts for a quick experiment, then never stripped out


Leakage isn’t always obvious. Some fields are “technically” available at decision time but are contaminated by manual processes, reverse causality, or feedback loops. We’ve had clients come to us with models that looked excellent in validation but completely collapsed in production. In nearly every case, leaky inputs were to blame.


Catching leakage early saves months of rework. Our approach involves structured tests that check for leakage throughout the development process (a rough sketch of two of these checks follows the list):


  • Compare raw vs derived features

  • Cross-reference time stamps and data creation events

  • Look for suspiciously strong univariate predictors

  • Rebuild candidate variables using only features provably available at decision time
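

To make two of these checks concrete, here is a minimal sketch in Python (pandas and scikit-learn) of the timestamp cross-reference and the scan for suspiciously strong univariate predictors. The DataFrame layout, the column names (decision_ts, one "_ts" column per sourced feature, a binary default label), and the 0.80 AUC threshold are illustrative assumptions, not a fixed recipe.

import pandas as pd
from sklearn.metrics import roc_auc_score

def timestamp_check(df: pd.DataFrame, feature_ts_cols: list[str]) -> pd.DataFrame:
    """Flag features whose data-creation timestamp falls after the decision timestamp."""
    rows = []
    for col in feature_ts_cols:
        # Share of records where the feature was created only after the decision was made.
        pct_after = (df[col] > df["decision_ts"]).mean()
        rows.append({"feature_ts": col, "pct_created_after_decision": pct_after})
    return pd.DataFrame(rows).sort_values("pct_created_after_decision", ascending=False)

def univariate_auc_check(df: pd.DataFrame, feature_cols: list[str],
                         label_col: str = "default", threshold: float = 0.80) -> pd.DataFrame:
    """Flag single numeric features that predict the outcome suspiciously well on their own."""
    rows = []
    for col in feature_cols:
        mask = df[col].notna()
        if df.loc[mask, label_col].nunique() < 2:
            continue  # AUC is undefined when only one outcome class is present
        auc = roc_auc_score(df.loc[mask, label_col], df.loc[mask, col])
        auc = max(auc, 1 - auc)  # direction-agnostic: a very low AUC is just as suspicious
        rows.append({"feature": col, "auc": auc, "suspicious": auc >= threshold})
    return pd.DataFrame(rows).sort_values("auc", ascending=False)

A feature with a near-perfect standalone AUC is rarely a gift; far more often it is a field that already knows the outcome.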


2. Stability


Even clean, non-leaky features can break down if they’re not stable. Some features have high predictive power—but only for one channel, one segment, or one month. Why?


  • Availability varies by channel (e.g., mobile app vs call center)

  • Null rates spike over time due to upstream system changes

  • Means shift when you expand into a new geography or customer base

  • Segment risk curves flip (a top-decile risk group becomes middle of the pack)


A favorite example: one client had a feature measuring “device change since last login.” It seemed predictive, but broke completely when users started switching devices more often. The model degraded, and no one understood why.


To avoid surprises in production, we stress-test features using the following (a short availability and PSI sketch follows the list):


  • Availability audits (what % of time is this field usable?)

  • Population Stability Index (PSI)

  • Segmented risk plots

  • Holdout testing across cohorts and vintages
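

As an illustration of the first two items, here is a minimal sketch of an availability audit and a Population Stability Index calculation in Python (pandas and NumPy). The quantile binning, the small floor that avoids division by zero, and the commonly cited 0.1 / 0.25 PSI thresholds are illustrative assumptions rather than a fixed production setup.

import numpy as np
import pandas as pd

def availability(series: pd.Series) -> float:
    """Share of records where the field is actually usable (non-null)."""
    return series.notna().mean()

def psi(expected: pd.Series, actual: pd.Series, bins: int = 10) -> float:
    """PSI = sum((actual% - expected%) * ln(actual% / expected%)) over quantile bins."""
    # Build bin edges from the development sample so both populations share them.
    edges = np.unique(expected.quantile(np.linspace(0, 1, bins + 1)).values)
    exp_pct = pd.cut(expected, edges, include_lowest=True).value_counts(normalize=True, sort=False)
    act_pct = pd.cut(actual, edges, include_lowest=True).value_counts(normalize=True, sort=False)
    # Floor each bucket share to avoid log(0) and division by zero in empty bins.
    exp_pct = exp_pct.clip(lower=1e-6)
    act_pct = act_pct.clip(lower=1e-6)
    return float(((act_pct - exp_pct) * np.log(act_pct / exp_pct)).sum())

A common rule of thumb: PSI below 0.1 suggests the feature’s distribution is stable, 0.1 to 0.25 warrants a closer look, and anything above 0.25 usually means the population has shifted enough to revisit the feature.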


We also tag all features as “Robust”, “Use with caution”, or “Do not use.”


3. Compliance


Even if your features pass the statistical and operational tests, they might still introduce legal risk.


Regulators care about whether features could act as proxies for protected characteristics, whether you can explain the input and justify its impact on decisions, and whether you’ve disclosed its use in your privacy policy or T&Cs.


Examples we’ve helped clients evaluate:


  • Geolocation from IP address: Legal in some places, high risk in others

  • Device metadata (e.g., screen size, battery life): Commonly used, often unexamined

  • Education level or employment industry: Sometimes helpful, often correlated with race or income


We flag high-risk features early and, when needed, help clients run bias and fairness audits, draft explanations for compliance review, map inputs to adverse action codes, and engage regulators proactively.
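

As one concrete starting point for a bias and fairness audit, here is a minimal sketch of an adverse impact ratio check in Python (pandas): each group’s approval rate divided by the approval rate of the most-approved group, screened against the familiar four-fifths rule. The column names (approved, group) and the offline audit sample are hypothetical assumptions.

import pandas as pd

def adverse_impact_ratio(df: pd.DataFrame,
                         group_col: str = "group",
                         outcome_col: str = "approved") -> pd.Series:
    """Approval rate of each group relative to the most-approved group."""
    # The group label would come from an offline audit sample, never from production inputs.
    rates = df.groupby(group_col)[outcome_col].mean()
    return rates / rates.max()

# Ratios below 0.8 (the four-fifths rule) are typically escalated for review:
# flags = adverse_impact_ratio(audit_df) < 0.8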


What Happens When You Get This Right?


Done well, these checks don’t slow down the process—they speed it up and de-risk your launch.


Some recent examples:


  • A Latin American lender avoided a major rollout failure when we spotted that their bureau inputs were contaminated with post-decision updates. We rebuilt their training set and saved them from deploying a model that would have collapsed on day one.

  • A U.S. fintech had two high-performing features that flagged device changes. We identified one as leaky (based on post-decision behavior) and the other as reliable. They kept performance—and cut the risk.

  • An early-stage startup had beautiful model validation results. We showed them that 25% of their features wouldn’t be available in production. They rebuilt the model—and their trust in the process grew.


In lending, data has to be more than just predictive. It must be safe, stable, and explainable. At Ensemblex, we help lenders maximize the value of their data without headaches.
