Model Capacity

While studying the book Deep Learning by Ian Goodfellow, I came across this concept of model capacity, and it was really intuitive in helping me understand the models representation of a given problem.

This ties to the concept of overfitting and underfitting

Capacity


Put simply, the capacity of the model is the complexity of the model, or the ability to represent complex relationships.

A model with high capacity can represent very complex relationships, while a model with low capacity can represent not so complex relationships

How it ties to Overfitting and Underfitting


Models perform best when their capacity approximately captures the complexity of the task they need to perform.

If the task or data set has a very low complexity, but the model has a very high capacity, the model will overfit, as it will memorize and represent all the features in the training set that may not be in the testing set.

Conversely, if the task or data set is highly complex, but the capacity of the model is low, the model will underfit, as it cannot sufficiently represent the complex relationships and features in the training data

Vapnik–Chervonenkis (VC) dimension


The VC dimension is a measurement of a model’s capacity, given that the task at hand is a binary classification problem.

The numerical value of a VC dimension tells us the largest set of data point the model can perfectly classify (or shatter). Thus, the larger the VC dimension, the more data points the model is able to perfectly classify, which also means that the model is inherently more complex.

The usage of a VC dimension usually just tells us how complex an algorithm is. A model with a higher VC dimension requires more data to properly train due to the higher complexity.

Conclusion


The take away here is that, overly complicated models are not always better as they can overfit. A simple decision tree or random forest can be perfect for data with low complexity.

A VC dimension is simply a quantification of how complex a model is. Higher VC = Able to separate more data points = more complex = requires more training data.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s