Limitations of Cross-Validation

Currently, cross-validation is widely accepted in the data science community and remains the standard procedure for estimating performance and, ultimately, for selecting the best model (Tang, 2008). However, cross-validation does present some challenges. One main drawback is its computational cost, especially in methods such as k-fold cross-validation: since the algorithm must be retrained from scratch k times, evaluation requires roughly k times the computation of a single training run (Lau, 2020).
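The cost described above can be seen directly in a minimal sketch of k-fold cross-validation. The `fit` and `score` arguments below are hypothetical stand-ins for a real learner and metric; the point is simply that the loop calls `fit` once per fold, so total training cost grows linearly with k.

```python
def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k roughly equal, disjoint folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_validate(data, k, fit, score):
    """Average score over k folds; note the model is retrained k times."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])   # full retraining, once per fold
        scores.append(score(model, [data[i] for i in test_idx]))
    return sum(scores) / k
```

For instance, with `fit` computing a mean predictor and `score` returning a negative squared error, `cross_validate(data, 5, fit, score)` performs five independent trainings; doubling k doubles that work.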

Another limitation involves unseen data. In cross-validation, the test set plays the role of unseen data used to evaluate the model's performance. This works in theory; in practice, however, no test set can be a comprehensive sample of all the unseen data a model may face. Since no one can predict the kind of data the model will encounter in the future, there is always a level of uncertainty (Tang, 2008).

Take, for example, a model built to predict an individual's risk of contracting a specific disease. If the model is trained on data from a research study involving only a particular population group (for example, men aged 60 to 65), its predictive performance on the general population may differ dramatically from the cross-validation accuracy.

Finally, not only must the datasets be controlled independently across different runs, but there must also be no overlap between the data used for learning and the data used for testing (Tang, 2008). An algorithm can typically make more accurate predictions on data it has seen during the learning phase than on data it has not, which leads to an overestimation of performance (Tang, 2008). For this reason, practitioners strictly forbid any overlap between the training set and the validation set (Tang, 2008).
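The overestimation caused by train/test overlap can be illustrated with a toy example. The memorising 1-nearest-neighbour classifier below (a hypothetical learner chosen for illustration) scores perfectly on any point it has already seen, so evaluating it on data that overlaps the training set makes it look far better than it is on genuinely unseen points.

```python
def fit_1nn(points):
    """'Training' is just memorising the labelled (value, label) points."""
    return list(points)

def predict_1nn(model, x):
    """Return the label of the memorised point closest to x."""
    return min(model, key=lambda p: abs(p[0] - x))[1]

def accuracy(model, points):
    return sum(predict_1nn(model, x) == y for x, y in points) / len(points)

train = [(0.0, "a"), (1.0, "a"), (4.0, "b"), (5.0, "b")]
held_out = [(2.4, "a"), (3.0, "a")]  # disjoint points near the class boundary

model = fit_1nn(train)
leaky_score = accuracy(model, train)      # evaluated on seen data: perfect score
honest_score = accuracy(model, held_out)  # disjoint test set: a lower, realistic score
```

Here `leaky_score` is 1.0 because every training point is its own nearest neighbour, while `honest_score` is lower: the inflated figure is exactly the overestimation the overlap rule guards against.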