Recently, I have been struggling to understand something, which is related to the notion of "overfitting". In learning theory, It usually refers to the situation in which one try to infer a general concept (e.g., regressor or classifier) from finite observations (aka data). The concept is said to overfit the observation if it performs well on the set of observations, but performs worse on the previously unseen observations. The ability to generalize to the future observation is known as a "generalization" ability of the concept we have inferred. In practice, we would prefer the concept that not only performs well on the current data, but also do so on the data we might observe in the future (we generally assume that the data come from the same distribution).

When I think of overfitting, it is unavoidable to refer to "generalisation". In fact, as we can see above, we give the definition of overfitting based on the generalization ability of the concept. The notion of overfitting is also closely related to the notion of ill-posedness in the inverse problem. The is in fact the motivation of the regularisation problem we often encounter in the machine learning.

In the past few weeks, I have to deal with estimation problem. In principle, it is different from regression or classification problem(and regularization problem as well). However, there seems to be a connection between estimation problem and regularization problem, that still puzzle me. In estimation theory, the problem is mostly unsupervised, so it is not clear how to define "overfitting" in this case. Can we look at overfitting based on something beyond generalisation?

So the question I would like to ask in this post is "what else can we see/consider as overfitting?" If you have good examples, please feel free to leave comments.