Summary of Results

As originally hypothesized, increasing the amount of training data led to better performance. However, it also came at a cost: the time to run each model (a stand-in for resources) increased exponentially.

We hope to use our findings to open interesting conversations with policy analysts and bank representatives about which mortgage risk models are feasible to rely on.

Models at First Glance: Visualizing Predictive Performance

First, let's evaluate how each model iteration performed in terms of ROC AUC and precision.

As a reminder: ROC AUC summarizes how well the model ranks positive cases above negative ones across all decision thresholds, while precision measures what fraction of the cases the model flags as positive actually are positive.

By measuring both of these metrics, we can get a nuanced understanding of how well the model performs overall and how often it raises false positives.
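To make the two metrics concrete, here is a minimal pure-Python sketch that computes both by hand on a tiny set of made-up labels and scores (the data below is illustrative only, not from our models):

```python
# Illustrative sketch: precision and ROC AUC computed by hand.
def precision(y_true, y_pred):
    # Precision = true positives / all positive predictions.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

def roc_auc(y_true, scores):
    # AUC = probability that a randomly chosen positive case is scored
    # above a randomly chosen negative case (ties count as half).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 6 loans, model scores, and a 0.5 decision threshold.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.7, 0.3, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(round(precision(y_true, y_pred), 3))  # 0.667: 2 of 3 flagged loans are true positives
print(round(roc_auc(y_true, scores), 3))    # 0.778: positives usually outrank negatives
```

In practice a library routine (e.g. scikit-learn's metrics) would be used, but the hand-rolled version makes the false-positive trade-off visible.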

[Figure: AUC for each model — AUC increases linearly from model to model.]
[Figure: Precision for each model — precision increases exponentially from model to model.]

Main Idea: As the amount of data (instances or features) increases, performance improves. Specifically, AUC increases linearly while precision increases exponentially.

Here are some highlights:

Models at Second Glance: Cost using Time as a Proxy

To get a fuller picture of each model's cost, we also measured the time to run it.

Here, time is the total taken to initialize, train, and evaluate each model, measured in minutes on the graph.

We can treat the time to run a model as representative (a proxy) of its overall cost in resources, computing, and money.
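The measurement described above can be sketched as a single wall-clock window around the three phases. This is a hypothetical harness, not our actual pipeline; the stand-in model and callbacks below are placeholders:

```python
import time

def timed_run(init_fn, train_fn, eval_fn):
    # Measure initialization + training + evaluation in one window,
    # reporting minutes to match the units on the graph.
    start = time.perf_counter()
    model = init_fn()
    train_fn(model)
    result = eval_fn(model)
    minutes = (time.perf_counter() - start) / 60.0
    return result, minutes

# Toy stand-ins so the sketch runs end to end.
result, minutes = timed_run(
    init_fn=lambda: {"weights": [0.0]},
    train_fn=lambda m: m.update(weights=[0.5]),
    eval_fn=lambda m: {"auc": 0.5},
)
print(f"ran in {minutes:.4f} min, result={result}")
```

Using one timer around all three phases (rather than timing training alone) matches the idea of time as a proxy for total cost.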

[Figure: Time to run each model — runtime increases exponentially from model to model.]

Main Idea: Time (and therefore cost) goes up exponentially as we feed the model more data to train on.

Here are some other highlights: