Are You Overfitting to a Weird Economic Period?

Leland Burns & Jim McGuire
Apr 27

It’s a question we hear often during model builds:

“Are we overfitting to a weird period in the data?”

Sometimes the concern is macro — COVID, stimulus, rate shocks.

Other times it’s more internal — a change in product, channel, or geography.

Either way, the underlying issue is the same:

Does the data we trained on actually reflect the environment we’re about to operate in?

That’s not always an easy question to answer. But it’s one you have to ask.

What “Overfitting” Means in This Context

In a textbook sense, overfitting means a model has learned noise in its training data, so it performs noticeably better on that data than it does on unseen data.

But in credit modeling, the more subtle version shows up over time.

A model can:

Perform well on training data
Perform well on in-time test data
Still degrade in production

Not just in AUC — but in business outcomes:

Approval strategies miss targets
Loss rates drift higher than expected
Pricing assumptions break down

That’s often a sign the model has learned patterns tied to a specific period — not patterns that generalize.

Why This Happens More Than You Think

Time-based overfitting isn’t just about macro shocks. It can come from a wide range of changes.

1. Macro and Market Conditions

COVID is the obvious example:

Stimulus programs changed borrower behavior
Lender policies shifted dramatically
Bureau data itself behaved differently

Interest rate changes can have similar second-order effects:

Different borrower selection into products
Changes in repayment behavior
Shifts in underlying risk

Even robust datasets can behave differently under these conditions.

2. Changes in Your Own Business

Sometimes the bigger risk isn’t the economy — it’s you.

Common examples:

Expanding into new geographies
Adding new retail or channel partners
Launching new products
Changing underwriting or marketing strategy

In one case we saw, a lender performed strongly with a narrow set of retail partners but struggled as it expanded. The original model had effectively learned a very specific customer profile that didn’t generalize.

That’s not a modeling bug. It’s a data reality.

There’s No Perfect Fix — But There Is a Process

One of the most important things to acknowledge:

There’s rarely a clean solution.

You’re almost always balancing tradeoffs:

More data vs more relevant data
Longer history vs cleaner regimes
Stability vs recency

The goal isn’t perfection. It’s awareness and control.

What We Do to Guard Against It

1. Out-of-Time Validation

This is the most important check.

We hold out the most recent portion of data — and don’t touch it during model development. That dataset becomes a proxy for “tomorrow.”

If performance drops meaningfully there relative to the in-time test set, it’s a clear signal that something isn’t generalizing.
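
As a rough illustration, an out-of-time check can be as simple as splitting on origination date and comparing a ranking metric on each side of the cutoff. The sketch below is a minimal example using only the Python standard library. The field names (orig_date, score, bad), the cutoff, and the rows themselves are assumptions made up for this example, and AUC is just one metric you might compare.

```python
from datetime import date

def auc(labels, scores):
    """Probability that a random bad outscores a random good (ties count half)."""
    bads = [s for y, s in zip(labels, scores) if y == 1]
    goods = [s for y, s in zip(labels, scores) if y == 0]
    if not bads or not goods:
        raise ValueError("AUC needs at least one bad and one good")
    # Pairwise comparison is O(n * m): fine for a sketch, slow for large samples.
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bads for g in goods)
    return wins / (len(bads) * len(goods))

def out_of_time_split(rows, cutoff):
    """Rows on or after `cutoff` are held out and never used for fitting or tuning."""
    development = [r for r in rows if r["orig_date"] < cutoff]
    holdout = [r for r in rows if r["orig_date"] >= cutoff]
    return development, holdout

def auc_of(rows):
    return auc([r["bad"] for r in rows], [r["score"] for r in rows])

# Made-up rows for illustration only; "score" is the model's predicted risk.
rows = [
    {"orig_date": date(2022, 1, 15), "score": 0.9, "bad": 1},
    {"orig_date": date(2022, 3, 1), "score": 0.2, "bad": 0},
    {"orig_date": date(2022, 6, 10), "score": 0.7, "bad": 1},
    {"orig_date": date(2022, 8, 5), "score": 0.1, "bad": 0},
    {"orig_date": date(2023, 2, 1), "score": 0.6, "bad": 0},
    {"orig_date": date(2023, 3, 1), "score": 0.4, "bad": 1},
    {"orig_date": date(2023, 4, 1), "score": 0.8, "bad": 1},
    {"orig_date": date(2023, 5, 1), "score": 0.3, "bad": 0},
]

development, holdout = out_of_time_split(rows, cutoff=date(2023, 1, 1))
print(f"development AUC: {auc_of(development):.2f}")
print(f"out-of-time AUC: {auc_of(holdout):.2f}")
```

The important design choice is the split itself: it is by time, not random, so the holdout can only contain behavior the model never saw during development.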

2. Feature-Level Stability Checks

We don’t just evaluate the model — we evaluate the inputs.

For each key feature, we ask:

Is the distribution stable over time?
Does its relationship with risk hold?
Are reporting definitions changing?

A simple example: bureau inquiry counts have changed meaning over time due to reporting shifts. A feature that once signaled risk may gradually lose that meaning.
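
One common way to put a number on the first question is the Population Stability Index (PSI), which compares how a feature's values fall into bins in a reference period versus a recent one. The sketch below is one possible version, not a description of any specific tooling. The bin edges, the sample values, and the small floor used to avoid dividing by zero are all assumptions for illustration. PSI only speaks to distribution shift, so the relationship with risk still needs its own check.

```python
import math

def bin_shares(values, edges, floor=1e-6):
    """Share of values in each bin; bin i holds values above exactly i of the edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(v > e for e in edges)] += 1
    # Floor empty bins so the log ratio below stays finite.
    return [max(c / len(values), floor) for c in counts]

def psi(reference, recent, edges):
    """Population Stability Index: sum over bins of (new - ref) * ln(new / ref)."""
    ref = bin_shares(reference, edges)
    new = bin_shares(recent, edges)
    return sum((n - r) * math.log(n / r) for r, n in zip(ref, new))

# Made-up inquiry counts for two periods, for illustration only.
inquiries_reference = [0, 0, 1, 1, 2, 2, 3, 3]
inquiries_recent = [0, 1, 2, 2, 3, 3, 4, 5]
edges = [0, 1, 2]  # bins: <=0, (0, 1], (1, 2], >2

print(f"PSI: {psi(inquiries_reference, inquiries_recent, edges):.3f}")
```

Identical distributions give a PSI of zero, and the value grows as the two periods diverge. What counts as "too much" drift is a judgment call that depends on the feature and the portfolio.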

3. Segment-Level Performance

We test models across meaningful slices:

Channels
Products
Geographies
Customer types

A model that looks strong overall can break down in specific segments — often where the business is evolving.
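
A simple version of this check compares predicted and observed bad rates within each slice. The sketch below assumes each row carries a predicted probability of default (pd), an observed outcome (bad), and a segment label such as channel. Those field names and the sample rows are hypothetical, and calibration is only one of several metrics worth slicing.

```python
from collections import defaultdict

def segment_calibration(rows, segment_key):
    """Per segment: row count, mean predicted PD, and observed bad rate."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[segment_key]].append(r)
    report = {}
    for segment, members in groups.items():
        n = len(members)
        report[segment] = {
            "n": n,
            "predicted": sum(m["pd"] for m in members) / n,
            "observed": sum(m["bad"] for m in members) / n,
        }
    return report

# Made-up rows for illustration only.
rows = [
    {"channel": "legacy_partner", "pd": 0.1, "bad": 0},
    {"channel": "legacy_partner", "pd": 0.1, "bad": 0},
    {"channel": "legacy_partner", "pd": 0.2, "bad": 0},
    {"channel": "legacy_partner", "pd": 0.2, "bad": 1},
    {"channel": "new_partner", "pd": 0.1, "bad": 1},
    {"channel": "new_partner", "pd": 0.1, "bad": 1},
]

for segment, stats in segment_calibration(rows, "channel").items():
    print(segment, stats)
```

Small segments will be noisy, so it helps to read the gap between predicted and observed alongside the row count rather than on its own.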

4. Baseline Comparisons

We almost always benchmark against a stable reference (e.g., a bureau score).

Not because it’s perfect — but because it provides context.

If both the baseline and the new model shift similarly over time, the issue is more likely the data environment than the model.

If only the new model degrades, that’s a stronger signal of overfit.
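
The comparison logic can be sketched in a few lines. The example below assumes you already have a performance metric (AUC here) per period for both the new model and the baseline, computed with the same orientation so that higher means better ranking of risk. The function name, the period labels, and the tolerance of 0.02 are hypothetical choices for illustration, not recommended thresholds.

```python
def compare_degradation(model_auc, baseline_auc, tolerance=0.02):
    """Compare first-to-last-period AUC drops for the model and a baseline.

    Both arguments map period label -> AUC, with labels that sort chronologically.
    """
    periods = sorted(model_auc)
    first, last = periods[0], periods[-1]
    model_drop = model_auc[first] - model_auc[last]
    baseline_drop = baseline_auc[first] - baseline_auc[last]
    excess = model_drop - baseline_drop
    if model_drop <= tolerance:
        reading = "stable"
    elif excess <= tolerance:
        reading = "both degraded: look at the data environment"
    else:
        reading = "model degraded more than baseline: possible overfit"
    return {
        "model_drop": model_drop,
        "baseline_drop": baseline_drop,
        "excess": excess,
        "reading": reading,
    }

# Made-up AUC values for illustration only.
result = compare_degradation(
    model_auc={"2022H1": 0.75, "2023H1": 0.68},
    baseline_auc={"2022H1": 0.68, "2023H1": 0.67},
)
print(result["reading"])
```

The labels are prompts for investigation rather than verdicts. A baseline such as a bureau score has its own blind spots, so a shared decline narrows the search without ending it.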

5. Targeted Diagnostics (When Needed)

When we suspect a specific issue — like COVID-era distortion — we go deeper:

Compare feature importance across time periods
Use tools like SHAP to see what’s driving predictions
Analyze how relationships change between “normal” and “disrupted” periods

This helps distinguish between:

A model that’s broken
And a world that’s changed
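
To make the first diagnostic concrete, the sketch below compares each feature's share of total importance between two periods. It assumes the importance values (for example, mean absolute SHAP value per feature) have already been computed separately for a "normal" and a "disrupted" sample with whatever tooling you use, and that they are non-negative. The feature names and numbers are hypothetical.

```python
def importance_shares(importance):
    """Rescale non-negative importances so they sum to 1."""
    total = sum(importance.values())
    return {name: value / total for name, value in importance.items()}

def importance_shift(normal, disrupted):
    """Change in importance share per feature, largest absolute change first."""
    before = importance_shares(normal)
    after = importance_shares(disrupted)
    features = sorted(set(before) | set(after))
    shifts = [(f, after.get(f, 0.0) - before.get(f, 0.0)) for f in features]
    return sorted(shifts, key=lambda item: abs(item[1]), reverse=True)

# Made-up importance values for illustration only.
normal_period = {"utilization": 6.0, "inquiries": 3.0, "dti": 1.0}
disrupted_period = {"utilization": 4.0, "inquiries": 1.0, "dti": 5.0}

for feature, change in importance_shift(normal_period, disrupted_period):
    print(f"{feature}: {change:+.2f}")
```

A large swing in a feature's share does not by itself say whether the model or the world changed, but it points to which relationships deserve a closer look.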

Real-World Tradeoffs

Example: Modeling Through COVID

In one indirect auto project, we had no clean way to avoid COVID-era data:

Going further back introduced outdated dynamics
Excluding the period reduced sample size and relevance

So we included it — but with heavy scrutiny:

Validated across pre- and post-COVID periods
Stress-tested assumptions
Built in conservative guardrails

The model held up — but only because we treated the data with caution, not blind trust.

Example: Expanding a Business Beyond Its Roots

In another case, a lender expanding into new geographies saw performance deteriorate quickly.

Their model wasn’t wrong — it was just trained on a narrow, highly specific customer base.

We adjusted:

Training strategy
Policy design
Segmentation approach

The fix wasn’t just technical. It required acknowledging that the future didn’t look like the past.

Final Thoughts

Overfitting to a time period isn’t always obvious. And it’s not always avoidable.

But it is manageable — if you approach model development with the right mindset:

Be skeptical of in-sample performance
Validate against the future, not just the past
Understand your data, not just your metrics
And most importantly, align the model with how the business is evolving

Because the real risk isn’t that your model is “wrong.”

It’s that it’s perfectly tuned to a world that no longer exists.