A study of Machine Learning for Prediction in Panel Data

Authors

  • Shenglong Huang
  • Lingyun Hu

DOI:

https://doi.org/10.62051/ijcsit.v3n2.33

Keywords:

Machine learning, RNN, XGBoost, Panel data, Prediction

Abstract

Machine learning is increasingly used for prediction with panel data. This paper applies the RNN and XGBoost algorithms to quarterly GDP panel data for 31 provinces and cities in China, covering the first quarter of 2005 through the fourth quarter of 2023, and compares the two methods. The forecasting study takes into account the influence of geographic location in the panel data. The results show significant regional differences in the forecasting performance of the RNN algorithm: the eastern coastal and inland provinces with stronger economies perform better. For example, the training-set correlation coefficient for Guangdong province is as high as 0.8052, followed by Anhui and Hubei. The Qinghai-Xizang region, however, performs poorly and is at risk of overfitting. The XGBoost algorithm, by contrast, shows high correlation coefficients on both the training and test sets in most provinces and cities, especially in Beijing, Tianjin and Hebei, where the correlation coefficients are above 0.9, indicating good prediction results. In terms of mean squared error (MSE), the training-set MSE is generally smaller than the test-set MSE, indicating that the predicted values for some regions deviate substantially from the actual values. In terms of mean absolute percentage error (MAPE), the MAPE of most provinces and cities is below 1%, indicating that the relative prediction error is small.
Comprehensive analysis shows that XGBoost handles nonlinear relationships and complex feature interactions well and is especially suitable for capturing nonlinear geolocation features, while RNN may be more effective for temporal geolocation features. XGBoost has strong fitting ability and is suitable for sparse data, while RNN needs more data to learn effective representations. XGBoost requires less hyperparameter tuning, while RNN is more sensitive to hyperparameters.
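The three evaluation measures named in the abstract (correlation coefficient, MSE and MAPE) can be computed per region along the following lines. This is a minimal sketch, not the authors' code: the function names, the `(region, actual, predicted)` row layout and the sample numbers are all assumptions made for illustration, and the correlation coefficient is assumed to be Pearson's r.

```python
import math
from collections import defaultdict


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def evaluate_by_region(rows):
    """Group (region, actual, predicted) rows and compute the metrics per region."""
    grouped = defaultdict(lambda: ([], []))
    for region, actual, predicted in rows:
        grouped[region][0].append(actual)
        grouped[region][1].append(predicted)
    return {
        region: {"r": pearson_r(a, p), "mse": mse(a, p), "mape": mape(a, p)}
        for region, (a, p) in grouped.items()
    }


# Made-up numbers for illustration only; these are not data from the paper.
rows = [
    ("A", 100.0, 101.0), ("A", 110.0, 109.0), ("A", 120.0, 121.0),
    ("B", 50.0, 55.0), ("B", 60.0, 57.0), ("B", 70.0, 77.0),
]
metrics = evaluate_by_region(rows)
for region, m in sorted(metrics.items()):
    print(region, m)
```

Running the same function separately on training-set and test-set rows would give the kind of per-province comparison the abstract describes, where a training-set MSE well below the test-set MSE points to possible overfitting.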

Published

19-07-2024

Section

Articles

How to Cite

Huang, S., & Hu, L. (2024). A study of Machine Learning for Prediction in Panel Data. International Journal of Computer Science and Information Technology, 3(2), 302-313. https://doi.org/10.62051/ijcsit.v3n2.33