S1E8

Beyond Algorithms: Tips for Building Excellent AI Models

October 23, 2024

Please, don’t ask ChatGPT to build your credit model. Using machine learning (ML) and AI well is still a struggle for many lenders. Underneath all the buzz about AI, alternative data, and ever-smarter models, there's a lot of hard work and prudence required to implement these techniques safely and to manage the resulting models.

Josh Campos, Head of Enterprise Data Science, AI & ML Engineering at DriveTime, is an expert on just that. He joins Shawn to discuss the practical side of ML and AI in credit risk management. They touch on best practices for data science teams, monitoring and updating models, using alternative data, and why LLMs should come with a warning on the label.

Find Josh and Ensemblex on LinkedIn.

Hosted by: Shawn Budde

Guest: Josh Campos

Produced by: Meagan LeBlanc

Theme Music by: Brad Frank

Transcript

Shawn

Hello, this is Shawn Budde of Ensemblex, and this is The Ensemblex Exchange Podcast. Today I'm talking with Josh Campos of DriveTime about starting a lending business from zero and the use of AI models in underwriting.

Josh is a senior leader with over 15 years of experience in data science, machine learning, and AI, currently serving as the head of enterprise data science at DriveTime, Bridgecrest, and GoFi. He has built top-tier data science organizations, ML engineering frameworks, architecture, and numerous models for fraud, credit risk, analytics, and pricing.

Most recently, Josh has been leading the efforts to build an AI framework for the development and implementation of generative AI technologies. Josh holds a BA in risk management, and an MSc in advanced data analytics from the University of North Texas. In 2022, Josh was named one of Amplify's future FinTech leaders and ones to watch by Money20/20. Thank you for joining me today.

Josh

Thank you. Glad to be here.

Shawn

So Josh, when we were working together, you had some land you guys used to kind of escape to, and as I recall, you're really into hiking and backpacking. Can you tell us a little bit about that?

Josh

Yeah, I absolutely love the outdoors. That's like my biggest passion. Love going out there and, you know, getting some miles in. I've done a lot of backpacking before, pretty much anywhere in the US, and I've done a bunch of different hikes around the world. I'm actually super excited: in about a month and a half, my wife and I are doing a big hike in the Dolomites. So, about a hundred miles through the Dolomites, and we're super excited about it. So, that's me. Every time that, you know, I'm not working, I'm just trying to get out there and trying to put some miles in and enjoy the outdoors and nature.

Shawn

Yeah. The Dolomites, I don't even know if a lot of people know, this is basically Northern Italy, right? Roughly on the border with probably Switzerland. It feels to me like there's just more and more people going. It feels to me like Machu Picchu 20 years ago: I don't think I'd ever heard of it, and then some people went. And that's kind of the way I feel about the Dolomites. All of a sudden it seems to be a place that people are going to. So, I'll be curious to hear how it went. So, before this trip, what's been maybe your coolest or most unusual trip that you've done so far?

Josh

Ooh, that's a tough one, but I do have one in the top five that I think I would absolutely do all over again. So, we did Banff National Park. We did a place called the Sawtooth Range, which is like a back range in the park. So that was another one that was over a hundred miles. It was grueling. I mean, I'm talking like we saw snow, we saw rain, we saw sun and then everything in between, like rock slides and going over, you know, ice-filled mountain passes and bears. And I mean, it's a wild place. And I think it was, yeah, definitely a top experience, you know, just because of so much wilderness that you get to see in just one place and one trip. So I definitely would redo it, and I'd encourage anyone out there that's trying to do a trip to Banff to definitely do the Sawtooth Range. It's beautiful.

Shawn

And you've been doing this your whole life?

Josh

I actually got into it after, you know, I want to say 2019. That's when I got into it.

Shawn

Oh, wow!

Josh

Yeah, we had a storm in Dallas that knocked the power out and we didn't have any food or water or electricity. So, it actually inspired me to go out there and be like, we should be better prepared, right? Like, we should have some food. And so I was researching all of these things. I actually found that a lot of, you know, emergency preparation gear also happens to be backpacking gear. And I'm like, what is this backpacking thing? Right. And so as I got into it, I discovered, hey, this is a whole culture. This is a whole thing that people do. We just went for it and went to a few places and we just absolutely fell in love with it. We've been doing it ever since.

Shawn

That's interesting. After I left my last job, before Ensemblex, I decided to go on a backpacking trip. It was, it was organized, so I didn't have to do all the logistics and, well I haven't backpacked since. So, I guess it's, we'll say it's a personal preference. I totally get that, but that's really cool to hear.

Josh

Awesome.

Shawn

Well, why don't we dive in? So, you, a couple of years ago, led the data science efforts to launch a completely new lender, GoFi. Can you tell us kind of a little bit about GoFi and what that experience was like?

Josh

Yeah, so starting a lender is super challenging, especially from scratch. There's so many different things that you just don't know. So, you're having to make guided assumptions and be smart about what you do. I would say in the data science realm, one of the hardest things is building a legitimate team, a team that's strong. And that means finding the talent and retaining the talent, which can be actually quite challenging. When we actually did it, it was when the job market was incredibly hot. So, it was very challenging to actually find those individuals, but we did. Super, super excited about the people that we found. They're still with us and they're doing amazing things. So, I want to say we got lucky. Luck had a little bit of a part in it, but also, you know, looking for people that were engaged, that were passionate. So yeah, we have a pretty decent group of people.

Shawn

So, we were able to kind of help you in the construction of the initial underwriting model. And I'd love to hear about how do you get comfortable with a model built for a business that didn't exist, right, when you built the model and how did you get comfortable that you were going to be able to use it and rely on it?

Josh

Yeah, I think first I want to acknowledge, I mean, working with you was awesome. Truly. It was really cool. I mean, I remember getting in there with Jim and Leland and just working through a lot of the assumptions, iterating, going through, you know, trying to understand what we were trying to do. I think the key that I would call out is what you do when you don't know. A lot of lenders have their own data sets that they can look back to and say, okay, this is exactly what's happening.

For us, we didn't have that. So, we had to create assumptions about what we would see. And that was the key, and that's where a lot of the creativity from the team came into the picture. Being able to say, okay, these are our assumptions of the population that we would see, right? These are our types of consumers. And then being able to work backwards and say, how do we design the right data in order to be able to create a model to identify those people? So it's kind of funny, right? A lot of lenders, especially when they go upstream, perhaps into a segment of the credit spectrum that they haven't seen before, or maybe into a brand new market, usually struggle with that. But you're modeling, right? You're just trying to find data that explains what you're seeing, right? So it's about being creative in how you actually find and create and design that cohort of people that you expect your model to also see when you actually go live. So, a lot of the time that we spent together was finding that and being able to kind of refine those assumptions over time.

Shawn

Yeah, so Josh, I mean, I guess how closely were you able to assess the population you were likely to see through the door and maybe how have your assumptions changed from the development of this model, I think about two years ago until what you're seeing through the door at this point.

Josh

Yeah, I mean, we were able to get pretty comfortable. You know, the credit spectrum is a credit spectrum. You know, if anyone's been in the space long enough, I think we've all tried all kinds of different ways to cut it up and to define it, creating personas. I don't know how many times I've done that. And you usually find really fun things that are interesting, but, you know, I will say, it always changes depending on the dimension that is your business. So, if you're in auto versus if you're in unsecured versus mortgage, it's all going to change. It's all going to, you know, depend on a specific tranche of that population. So, it's really important that if you're out there trying to build something, you've got to make sure that the thing's matching up to what your business is, you know, what you're actually trying to underwrite.

I know that sounds pretty obvious, but you'd be surprised, right? You'd be surprised, sometimes a lot of models out there will have a lot of information that has no bearing on the outcome in your business. So being able to craft, across all these different dimensions, what the final data set is for any modeling exercise is, I would argue, probably the most important part of the whole thing. You know, fitting a model is fitting a model. But if you don't have the right cohort of individuals that you're expecting to see, it just won't work.

One of the things that I strongly believe in is building systems, right? Especially when it comes to the credit spectrum, you usually see, okay, fine, these traditional cuts where you have people that are prime, super prime, non-prime, whatever, right? But then there's the economy, and it keeps moving underneath. That increases affordability, decreases affordability. It creates credit crunches, it changes the velocity of cash out there. There's so many factors behind this one measurement that we all use. So for me, when you are developing a model, you're not just thinking about it, at least for me, as just a single point in time, as I said.

You want to build actually something that is more like a, you know, like a sequence, like a flow, right? That can update with the assumptions as the assumptions are changing. So we see all kinds of new, fun behaviors emerging in the data all the time, right? I think a lot of people have seen a bunch of them come out recently. That's not new and it won't stop being the case. We'll continue to see those things come out. That's why I strongly believe in having models that, you know, update over time, where you're able to understand the assumptions and you're able to have a good, solid, you know, model risk management program behind the scenes that allows you to also review those assumptions, understand what's happening, inform the business about what's happening, and be ready to go and deploy that model.

Another thing that I've seen is where that cycle is not quite as quick as you would like it to be. And that's where you find these situations. Like, you see an emerging pattern. You're like, this is something, but now you've got to send your data science team to go build the model. Well, by the time you do it and you fix it, it's probably already baked. Right? So for me, it's more about system building as opposed to, like, figure out the assumptions today, you know, because they'll get stale.

Shawn

Right, so creating the right environment, the right ecosystem, so that you're getting the feedback loop in there, you're tracking the model, population stability indexes. I think those are a lot of the things that we all know we should do, but I frequently find organizations really don't bother, because it never feels like the most pressing thing.
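For readers who have not worked with the population stability index (PSI) mentioned here, the following is a minimal sketch of how it is often computed: bucket the scores from the development sample and from a recent sample, then compare the share of each bucket. The function name `psi`, the bucket edges, and the sample scores are all illustrative assumptions, not anything described in the conversation.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between a development sample (expected)
    and a recent sample (actual), using fixed bucket edges."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def shares(values: Sequence[float]) -> list:
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bucket index = number of edges the value is at or above.
            counts[sum(1 for e in edges if v >= e)] += 1
        # Floor empty buckets at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Hypothetical scores: the recent applicants skew lower than development.
dev_scores = [520, 560, 600, 640, 680, 720]
recent_scores = [500, 510, 530, 560, 600, 640]
bucket_edges = [550, 650]  # assumed cut points, chosen for illustration

print(round(psi(dev_scores, dev_scores, bucket_edges), 4))
print(round(psi(dev_scores, recent_scores, bucket_edges), 4))
```

Identical samples give a PSI of zero, and the value grows as the recent population drifts away from the one the model was built on. A commonly cited rule of thumb treats values above roughly 0.1 as worth a look and above roughly 0.25 as a significant shift, though any threshold should be set by the lender's own model risk policy.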

So, these kinds of consumers might not be well represented on the bureau or have an accurate perspective. Can you give kind of some concept of how one approaches working with those kinds of consumers?

Josh

Yeah. This is probably the most exciting part about modeling. I think, like, the getting creative part. And what I mean by that is, what is modeling but finding the data that explains the variance of things, you know, that explains the outcome. And it's funny to me sometimes, right? Because if you know that a certain behavior would not be present in the credit bureau, then there should be no expectation that any model could do anything to actually help you make that prediction. And so, the way that I think about it, I equate bank data with, you know, all kinds of alternative data, right? Like if you look at your traditional credit bureaus, and there are so many wonderful companies out there, honestly, building this type of stuff. I know there's a few out there that I think are doing great. There are others that actually provide the data raw that you can actually use, for instance, bank data information, so you can make affordability assessments, or you can also build your own. I've also done that in the past. I can tell you there's a massive tradeoff between building it yourself and buying it off the shelf. There are different slivers of the population, and depending on where you are as a business, you almost have to make that determination.

If you're kind of like high up in the credit spectrum, you might actually get away with like off the shelf tools, right? You're going to be able to make a pretty good distinction between these two individuals. But if you're like non-prime, that's pretty tough, right? So you're actually going to be finding that you need to find like the very specific interactions between, you know, affordability and some of the things that we actually have registered in the credit bureau.

So all in all, I want to say that it is definitely about finding the balance between the two and for sure, not just assuming that, “Hey, well, the model is just going to fit between these, what I know and what I don't know.” Because that usually doesn't work. This is where lenders can get in trouble, if they are not able to distinguish this, if they're not able to find the actual data that explains these outcomes. So, I actually think it's super valuable. I think if you're a lender, you should absolutely have an alternative data program where you're evaluating both affordability as well as other things that are, you know, FCRA-compliant features, if it is a credit model. So that to me is a massive must-have in a credit risk management program.

Shawn

Yeah, so you mentioned affordability. Is that the old DTI or are you thinking about that more from a cash flow standpoint?

Josh

Well, it's always both and everything, right? It's contextual to you making a true assessment of the person's ability to repay the loan. Right. So, a hundred percent, cash flow modeling. There's providers that will give you the data raw. There are people that actually can build these things for you. You can build it yourself.

But I think it's a little bit deeper. I think traditionally we've always looked at measurements like debt to income as a good indicator. And that is true. That's also super important and holistically meaningful to the application. But now, being able to also assess true affordability and understand whether or not someone can even afford a certain payment, that's really, really important. So I think that's a program every lender should have. So, more of a call to action, I guess, but absolutely, it's super important to have this.
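To make the distinction between debt to income and cash-flow affordability concrete, here is a small sketch under stated assumptions. The `Applicant` fields, the helper functions, and every dollar figure are hypothetical; in practice the spending figure would come from something like bank transaction data, and the definitions a lender uses may differ.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    gross_monthly_income: float
    net_monthly_income: float       # take-home pay after taxes
    monthly_debt_payments: float    # existing obligations, e.g. from the bureau
    monthly_essential_spend: float  # rent, food, utilities (assumed from bank data)


def debt_to_income(a: Applicant, new_payment: float) -> float:
    """Classic DTI: all debt payments, including the new loan, over gross income."""
    return (a.monthly_debt_payments + new_payment) / a.gross_monthly_income


def residual_cash_flow(a: Applicant, new_payment: float) -> float:
    """Cash left each month after debts, essentials, and the new payment."""
    return (a.net_monthly_income - a.monthly_debt_payments
            - a.monthly_essential_spend - new_payment)


applicant = Applicant(
    gross_monthly_income=5000.0,
    net_monthly_income=4000.0,
    monthly_debt_payments=1000.0,
    monthly_essential_spend=2800.0,
)
new_payment = 450.0

print(f"DTI: {debt_to_income(applicant, new_payment):.2f}")
print(f"Residual cash flow: {residual_cash_flow(applicant, new_payment):.2f}")
```

In this made-up case the DTI is 0.29, which many lenders would consider acceptable, while the residual cash flow is negative by 250 a month. That gap is the kind of thing a cash-flow view can surface that a ratio on gross income cannot.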

Shawn

Yeah. So, to switch gears a little bit, you know, Josh, in the time since we built this model, I don't know, you may have been aware of it, but everybody became aware of ChatGPT.

Josh

What's that?

Shawn

And so there's been a, yeah, well, maybe almost everybody has heard of ChatGPT. So there's a ton of interest in generative AI and LLMs, large language models. So I'm curious, from your perspective, from the perspective of a lender, is this all hype? Or do you think there's real value in the lending industry?

Josh

Yeah, that's a good question. I don't think…I keep hearing this, like, the hype, no hype. I don't think this is hype at all. Honestly, this is not. I mean, obviously, based on the intro you gave me, I lead a team in generative AI as well. I can tell you that first and foremost, all of this is super new to a lot of people. And if I can answer the question about hype, no hype, I think I have an idea, I have a guess of why I think this is happening.

It used to be, I don't know, some of the core components of what's being used, with some exceptions, obviously, there's new novel advancements in the architecture of some of these models. But a lot of the core components have existed for a long time. Right? So, this isn't quite new. I think there's one particular aspect, that now we can actually interact with these tools, right? I can just type words and it can tell me things. Now I can even talk to it. Scarlett Johansson apparently is now the spokesperson for OpenAI. Don't quote me on that. But so, it's really funny, you know, now you're able to actually interact with it. So, it can be so easily trivialized, right? And a lot of people kind of look at it and say, you know what? I actually think that... you know, this is the greatest invention of all time. And there are other people that actually say, hey, this is super hype. We're at the very top of the hype curve, et cetera. I think we're at the top of one hype curve. If I may, I think LLMs themselves are reaching an asymptote. You're actually beginning to see a lot of this, right? Where a lot of the LLMs that are actually being released today are just slightly better than the last, right? They're just being able to score a little bit better. But honestly, when you see what we already have, they're already powerful in and of themselves. We can interact with them. We can have a conversation. There have been articles where, you know, these LLMs are being tested up against people, people are talking to them in blind studies where people can't tell if it's a human or if it's a machine. So, they already asymptote because language, they asymptote to our human language. And so, I think they're as good as they can be. We're like 90% of the way there, right? Just don't quote me on that number.

But I think because of that, right, people are starting to just be like, you know, it's nothing, it's hype. Actually, I think the application of this technology is not hyped at all. I actually think it's quite new, because what we've been kind of turning over for the past few years has been, check out this LLM. Check out this thing that can generate text, it could generate audio or video. But people still haven't caught on to the idea of, okay, how do we, as businesses, how do we take that and apply it directly to what we're trying to do? And that's why when I say it's not hyped, that's what I mean. I don't think the majority of businesses out there have begun the application of this technology. A lot of them are starting, hopefully, and some of them are saying, this is going to blow over. It's not.

If you're a lender, imagine if you can augment your call center with this, it’s a totally viable way of running certain parts of your business. Right? So, there's no hype here. It just depends how you want to perceive it. Right? But I think we haven't even scratched the surface of the application of this technology, but I do think that we've already reached a little bit of a top on the technology itself, of language itself.

Shawn

Yeah, that's interesting. So, I mean, I think part of the challenge is, and I've said this before, but you have machine learning, you have artificial intelligence, it kind of sounds like you just cut the humans out of this. And as you mentioned, you have a team working on this, so there's obviously something the human has to do. Like you can't just say, make my call center more efficient. What is the role of a data science team in bridging this tool into a real business application?

Josh

Yeah, super great question. You know, when it comes to, again, I'm just speaking in super broad terms, but if your data science team is primarily a team of statisticians, it is slightly challenging to move to this particular field, like say that you wanted to create a chat bot to answer consumer questions on your website, something simple.

And I say that because the skill set needed there, maybe people are familiar, there's a couple of frameworks out there, open-source frameworks. There's Microsoft-specific frameworks, et cetera. But let's just say something like an open-source framework, like LangChain, right? That particular skill set would be more adjacent to someone that's experienced in traditional application development or DevOps or some sort of software engineer type of role. They're actually very easily transferable types of skills and knowledge. So if you come from a purely statistical background, it's not impossible, but there's definitely a switching cost, right? That's what you would need to then use as a framework to then go and utilize some of these large language models. The model itself is only a thing that you would then need to put on some framework to make it interact with different things in your business. So that's what I'm saying, it's part of how people are perceiving the different layers behind this technology. It doesn't just stop at the model itself, there's so much more underneath if you actually want to use it within your business.

Shawn

Yeah, that's fantastic. I think, as you say that, it reminds me, you know, if I go back to Zest AI, we were using tools that, to your point, had been around for a long time, right? Many of them for decades. It's just all of a sudden we had the compute power to actually be able to use those tools. Back then, as I recall, we'd launch a model on a virtual machine and let it chunk away for like two days on what is probably done on your laptop now in half an hour, 15 minutes, right? So the change isn't necessarily in the technique so much as the compute power to take full advantage of that technique, and we may be approaching the end of the LLM technique in terms of what it can do with a certain set of tools, but there's a lot more to be done, I think, especially in applying this tool in organizations. And I think, you know, as a lender, there's parts of that that frighten me. I certainly see ChatGPT hallucinate on me and, you know, hallucinations in lending are very, very bad. You know, certain hallucinations in a call center environment are probably less bad, as long as it doesn't, I don't know, become wildly sexist or go the way of the Microsoft model from 10 years ago.

Josh

And Shawn, it goes without saying, right, that these are called large language models, not credit risk models. So I know most people out there would never dare, right? But I'm just saying, they don't do well with math or math-specific tasks. You know, just the safety warning at the back of the bottle, right? Don't use this for credit risk modeling.

You know, hallucination is, it's a very real risk of large language models. So, you know, usually as you would think about this, you know, think about applications that, you know, don't impact the consumer goodness, right? So like be safe, like this stuff is useful, but it has its use cases. So.

Shawn

Yeah, appreciate that. And I appreciate your time. This was really great. So thank you for joining me today. And everyone else, you can follow Ensemblex and Josh Campos, that's C-A-M-P-O-S, on LinkedIn. You can also visit us at Ensemblex.com and you can find The Ensemblex Exchange Podcast on all major platforms. Thank you all for listening.

Josh

Thanks, Shawn.