Abstract:
Multicollinearity, the presence of high intercorrelations among two or more independent variables in a prediction model, may occur when stacking is applied to the predictions of the base models. To investigate and compare the performance of different methods when stacking is applied to small data sets, three approaches are examined in this study. First, Principal Component Analysis (PCA) is applied to the predictions of the base models before stacking. Second, stacking is performed with tree-based models, which can handle multicollinearity without PCA. Third, the two approaches are combined: PCA is applied to the predictions of the base models, and tree-based models are then used for stacking. These methods are evaluated on small-scale data sets with different numbers of predictors, as well as on a noisy data set. The results show that the inferences do not differ across data sets of different scales. It is also observed that stacking does not produce a significant improvement in accuracy, complexity, or stability measures.
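As an illustration of the first approach (PCA applied to the base-model predictions before the stacking step), the following is a minimal NumPy sketch under stated assumptions. It is not the study's implementation. The base models (two ridge regressions and a k-nearest-neighbours regressor), the 5-fold out-of-fold scheme, the synthetic data, and the linear meta-learner fitted on the principal-component scores are all hypothetical choices made for illustration. The study's tree-based meta-learners are not shown. The correlation matrix of the level-one predictions exposes the multicollinearity, and the PCA scores are mutually uncorrelated by construction.

```python
from functools import partial

import numpy as np


def ridge_fit_predict(X_tr, y_tr, X_te, alpha):
    # Hypothetical base model: closed-form ridge regression; centering leaves the intercept unpenalized.
    x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X_tr.shape[1]), Xc.T @ (y_tr - y_mean))
    return (X_te - x_mean) @ w + y_mean


def knn_fit_predict(X_tr, y_tr, X_te, k):
    # Hypothetical base model: mean target of the k nearest training rows (Euclidean distance).
    d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return y_tr[idx].mean(axis=1)


# Assumed set of base models; any regressors with the same call shape would do.
BASE_MODELS = [
    partial(ridge_fit_predict, alpha=0.1),
    partial(ridge_fit_predict, alpha=10.0),
    partial(knn_fit_predict, k=5),
]


def out_of_fold_predictions(X, y, n_folds=5, seed=0):
    # Level-one data: column j holds base model j's predictions for rows it was not trained on.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    Z = np.empty((len(y), len(BASE_MODELS)))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        for j, model in enumerate(BASE_MODELS):
            Z[test_idx, j] = model(X[train_idx], y[train_idx], X[test_idx])
    return Z


def pca_scores(Z, n_components):
    # PCA via SVD of the centered level-one matrix; returns the component scores.
    mean = Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z - mean, full_matrices=False)
    components = Vt[:n_components]
    return (Z - mean) @ components.T, mean, components


# Synthetic small data set (assumption, for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=60)

Z = out_of_fold_predictions(X, y)
corr = np.corrcoef(Z, rowvar=False)  # high off-diagonal values signal multicollinearity
scores, mean, components = pca_scores(Z, n_components=2)

# Assumed meta-learner: ordinary least squares on the uncorrelated PC scores.
A = np.column_stack([np.ones(len(y)), scores])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
stacked_fit = A @ coef
```

A linear meta-learner is used here only because it is the setting in which correlated inputs cause the most trouble. A tree-based meta-learner, as in the study's second and third approaches, could be fitted on either `Z` or `scores` in its place.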