What Underwriting Approach Is Right for My Business: Expert System, Decision Tree, Logistic Regression, or Gradient Boost?
- Brandon Homuth
- Jun 2
- 4 min read
Choosing the right underwriting model is one of the most strategic decisions a lender makes. The right approach depends on many factors: your company's data maturity, lending objectives, tolerance for complexity, need for explainability, and regulatory expectations in your market.
Let’s explore four common modeling techniques—expert systems, decision trees, logistic regression, and gradient boost models—and when each might be right for your business.
1. Expert Systems: When You’re Just Getting Started
Best for: Pre-market or early-stage lenders with little to no relevant data.
Expert systems are rule-based models—think "if this, then that." They rely on domain expertise rather than data to define a sequence of conditions that must be met to reach a final decision. Jim McGuire, Head Data Scientist at Ensemblex, explains:
“The most appropriate point for doing an expert system is when you have little to no data relevant to what you're trying to make a decision on… It’s good for putting intuitive guardrails on your lending.”
Expert systems provide a structured, explainable way to start lending responsibly. As your loan book grows, however, an expert system will begin to limit your business: its blunt rules produce suboptimal predictions and bias the data available for training follow-on models.
Key risk: Expert systems make hard cuts on your applicant pool, which can bias future model training data.
Think of the rules as guardrails that should be gradually relaxed over time as you collect more data.
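For illustration, here is a minimal sketch of what such guardrails might look like in code. The field names and thresholds are hypothetical placeholders, not recommended cutoffs.

```python
# Minimal sketch of an expert-system style rules screen.
# All field names and thresholds below are hypothetical placeholders.

RULES = [
    ("credit_score_too_low", lambda app: app["credit_score"] < 620),
    ("dti_too_high",         lambda app: app["debt_to_income"] > 0.45),
    ("income_unverified",    lambda app: not app["income_verified"]),
]

def decide(application: dict) -> dict:
    """Apply each guardrail rule; decline if any rule fires, and record why."""
    reasons = [name for name, rule in RULES if rule(application)]
    return {"approve": not reasons, "decline_reasons": reasons}

# Example usage
print(decide({"credit_score": 640, "debt_to_income": 0.30, "income_verified": True}))
# {'approve': True, 'decline_reasons': []}
```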
2. Decision Trees: The Analytical Bridge
Best for: Lenders beginning to collect data but not yet ready for production-grade statistical models.
Decision trees look like expert systems, but they’re built from data. They split applicants into branches based on statistically meaningful patterns. Think of a decision tree as a very rudimentary modeling technique that can bolster your expert system as you evolve toward more sophisticated models.
Guideline: Begin using trees analytically once you’ve collected a few thousand loan records, including at least a few hundred defaults. This helps improve your rules-based system and may surface additional insights.
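As a rough sketch of that analytical step, a shallow tree fit on your loan records can be printed out and compared against your hand-written rules. The synthetic data below is only a stand-in for a real loan book.

```python
# Sketch: fit a shallow decision tree on loan outcomes to sanity-check expert rules.
# The synthetic DataFrame below stands in for your own loan records.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 5000
loans = pd.DataFrame({
    "credit_score": rng.integers(500, 800, n),
    "debt_to_income": rng.uniform(0.05, 0.60, n),
})
# Toy outcome: default risk rises as credit score falls and DTI rises.
z = 0.02 * (loans["credit_score"] - 650) - 3.0 * (loans["debt_to_income"] - 0.35) + 2.0
loans["defaulted"] = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(z))).astype(int)

features = ["credit_score", "debt_to_income"]
# Keep the tree shallow so its splits stay readable next to your hand-written cutoffs.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=100, random_state=0)
tree.fit(loans[features], loans["defaulted"])

print(export_text(tree, feature_names=features))
```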
3. Logistic Regression: The Workhorse of Traditional Credit Modeling
Best for: Mid-stage lenders with growing data sets.
Logistic regression models have long been a cornerstone of underwriting. They assign weights (coefficients) to input variables like income or credit score to calculate a probability of default. These models are easy to explain and monitor, a plus in highly regulated industries, and they're often distilled into even simpler “scorecards” for ease of analysis and deployment. However, logistic regression assumes a linear relationship between each input and the log-odds of default, so it struggles to capture non-linearities and interactions unless they are engineered into the features by hand. For established lenders with rich data, logistic models typically underperform more sophisticated AI methods.
When to use it: Once you have several thousand records, logistic regression can offer an interpretable, production-ready model.
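A minimal sketch of a probability-of-default model built this way, using scikit-learn with synthetic stand-in data rather than a real loan book:

```python
# Sketch: a probability-of-default model with logistic regression.
# make_classification stands in for a real loan dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=8000, n_features=6, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Scaling keeps the coefficients on a comparable footing across features.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# The coefficients are the "weights" a scorecard is distilled from.
print(model.named_steps["logisticregression"].coef_)
# Probability of default for the first few holdout applicants.
print(model.predict_proba(X_test)[:5, 1])
```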
4. Gradient Boosted Models (GBMs): The High-Power Tool for Mature Lenders
Best for: Data-rich lenders who want the most predictive power and are ready to manage the complexity.
Gradient boosted models (like XGBoost) are powerful ensemble machine learning algorithms. They build successive decision trees to correct the errors of prior ones, improving prediction accuracy. This power comes at a cost: complexity and risk of overfitting.
“Gradient boost is a powerful tool like a chainsaw,” Ensemblex co-founder Shawn Budde notes. “If you know what you're doing, you can cut the tree down just right. If not, you can cut your arm off.”
But used correctly, with careful hyperparameter tuning, curated feature sets, clean training/validation splits, variable monotonicity constraints, and explainability tools like SHAP, GBMs can outperform simpler models by a wide margin.
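A minimal sketch of that recipe, assuming recent versions of the xgboost and shap packages and using synthetic data in place of a real loan book; the constraint values and hyperparameters are illustrative, not tuned recommendations:

```python
# Sketch: a gradient boosted PD model with monotonicity constraints and SHAP explanations.
# Assumes the xgboost and shap packages; synthetic data stands in for real loans.
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=20000, n_features=4, n_informative=4,
                           n_redundant=0, weights=[0.92], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Modest depth and learning rate plus early stopping guard against overfitting;
# monotone_constraints forces risk to move in one direction per feature.
model = XGBClassifier(
    n_estimators=500,
    max_depth=3,
    learning_rate=0.05,
    monotone_constraints="(1,-1,0,0)",  # illustrative: one constraint per feature
    early_stopping_rounds=25,
    eval_metric="auc",
)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

# SHAP attributes each prediction to its inputs, which supports explainability reviews.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
print(shap_values.shape)  # one attribution per applicant per feature
```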
Regulatory tip: GBMs are increasingly accepted by regulators, especially when paired with explainability tools. SHAP is not a fringe technique; it’s a mainstream expectation in many jurisdictions.
What About Combining Models?
At Ensemblex, we find great value in combining multiple submodels. We can build models for different drivers of profitability, such as a default model, an early-repayment model, and a propensity-to-repeat model, and combine them into a single decision that gives a more nuanced prediction. We even find that combining different algorithms on the same target outcome, such as logistic regression and a GBM, can improve on the predictions of either model alone.
Ensembling helps you get the most out of your data when you have multiple data sources that vary in risk profile, stability, and coverage. By building separate submodels, you can isolate risks, adapt and iterate easily, and use the full set of rows available in each dataset.
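As a simple illustration of combining algorithms on the same target, the sketch below blends a logistic regression and a gradient boosted model via stacking; scikit-learn's GradientBoostingClassifier and synthetic data stand in for a production GBM and a real loan book.

```python
# Sketch: ensembling a logistic regression and a gradient boosted model on one target.
# Synthetic data and GradientBoostingClassifier are stand-ins for real loans and XGBoost.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=10000, n_features=8, weights=[0.9], random_state=0)

submodels = [
    ("logit", LogisticRegression(max_iter=1000)),
    ("gbm", GradientBoostingClassifier(random_state=0)),
]
# The final estimator learns how much weight to give each submodel's prediction.
ensemble = StackingClassifier(estimators=submodels, final_estimator=LogisticRegression())

for name, est in submodels + [("ensemble", ensemble)]:
    auc = cross_val_score(est, X, y, cv=3, scoring="roc_auc").mean()
    print(f"{name}: AUC ~ {auc:.3f}")
```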
We're such fans of ensembling, we named ourselves Ensemblex!
The Bottom Line
| Model | Best For | Data Needed | Strength | Watch Out For |
| --- | --- | --- | --- | --- |
| Expert System | New lenders | None | Easy to build, explainable | Hard-coded, may bias data |
| Decision Tree | Early analytics | ~1K–5K rows | Tests expert rules | Not suitable for production |
| Logistic Regression | Scaling lenders | 5K+ rows, 500+ bads | Simple, explainable | Can’t model complex interactions easily |
| Gradient Boosted Trees | Mature lenders | 10K+ rows | High predictive power | Needs expertise, risk of overfitting |
| Ensemble | Sophisticated use cases | Varies | Stability + power | More development and data required |