What Drives Real GDP Growth?
Econometrics is the application of statistical and computational techniques to the study of economic data. It differs from classical economics in that it is based on empirical findings rather than theories. One benefit of this approach is that empirical findings are easier to produce than theories. The drawback is that the findings lack sound economic theory to support them. This article represents a first step towards empirically deriving a factor model which tries to explain what drives real GDP growth. It is the first part of a series that asks three questions:
What socioeconomic factors drive long term real GDP growth? (Part 1)
What better ways are there of grouping countries together than geographically? (Part 2)
Which countries of the world are going to experience real GDP growth from now till 2019? (Part 3)
Update - I have completed the second part of this three-part series, entitled Monte Carlo K-Means Clustering of Countries with Python. The article discusses the foundations of clustering, measures of clustering quality, different clustering algorithms, and then brings it all together in Python to cluster the countries of the world. I'm still working on part three.
For a more theoretical and really easy to understand explanation of how economies work in practice, I highly recommend the video entitled "How The Economic Machine Works" by hedge fund manager Ray Dalio.
Constructing the Data Sets
The first challenge with any empirical research is that it needs to be based on data. As such, this research started off with the construction of several data sets using Quandl.com. Three data sets containing absolute values were constructed: one from ~2004, another from ~2009, and lastly one from ~2014. From these three data sets, two more data sets (the ones with which we are concerned) were produced, containing the change in each socioeconomic indicator over two five-year periods, namely 2004 to 2009 and 2009 to 2014. These data sets cover 67 socioeconomic indicators.
There are many problems with using data, but the two biggest are that there is often not enough data to produce a representative sample and that, when there is, the data is often incomplete and noisy.
In this study I replaced missing values with the mean value over each socioeconomic indicator. A more quantitative approach would have been to construct regression trees for each indicator and use them to predict the most likely value for each missing value. That said, the raw data sets were ~90% complete so any bias is probably negligible.
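The mean-imputation step described above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the country labels, indicator names, and values below are made up, and the original data sets are not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are countries, columns are indicators.
# NaN marks a missing observation.
data = pd.DataFrame(
    {
        "unemployment": [5.0, np.nan, 7.0, 12.0],
        "urban_population": [80.0, 60.0, np.nan, 40.0],
    },
    index=["A", "B", "C", "D"],
)

# Fraction of cells that are actually observed (the article reports ~90%
# for the real data; this toy example is deliberately sparser).
completeness = data.notna().to_numpy().mean()

# Replace each missing value with the mean of its own indicator (column).
# DataFrame.mean() skips NaN values by default.
imputed = data.fillna(data.mean())

print(f"completeness: {completeness:.0%}")
print(imputed)
```

Mean imputation leaves each indicator's average unchanged but shrinks its variance, which is why it is only reasonable when few values are missing.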
Real GDP Growth (Real Economic Growth)
The dependent variable we are interested in is the real GDP growth of each country during the first and second five-year periods. Real GDP growth is inflation-adjusted GDP growth. In this study the difference between nominal and real GDP growth is profound, and real GDP growth was deemed to be the more appropriate measure. The two are shown side by side in the following images.
As most of you will have noticed, during the second period, 2009 to 2014, the world was in the midst of a recession. As such, the two data sets are quite different. Interestingly, only 47 countries out of 188 grew by as much or more from 2009 to 2014 as they did from 2004 to 2009, and of these only 13 were in the top 33% of countries in terms of average real GDP growth. These countries were, in order, Mongolia, Sierra Leone, Ghana, Iraq, Burkina Faso, Congo, Singapore, Georgia, Philippines, Belize, Montenegro, Chad and Paraguay.
Below I have included an interactive visualization of each country's average percentage growth rate over five years based on the data from 2004 to 2014. Visualizations are a form of exploratory data analysis which can help put quantitative results in context. The rest of this article describes how real GDP growth relates to socioeconomic indicators.
In the above chart green indicates growth, red indicates either negative or low growth, and white indicates intermediate growth somewhere between the two.
Two popular correlation measures are the Spearman correlation, which is computed using ranks and measures the monotonic relationship between two variables (i.e. whether or not they increase or decrease with one another), and the Pearson correlation, which measures the linear relationship between the two variables. Both measures are scaled between -1 and 1, where -1 and 1 indicate a perfect negative or positive correlation respectively, and 0 indicates no correlation.
Using the Spearman and Pearson correlations together can help identify non-linear relationships in the data set. To be more specific, suppose we compute both correlations between some dependent variable Y and some independent variable X. If the Spearman correlation is noticeably stronger than the Pearson correlation, that indicates that there may exist some non-linear transformation of X, f(X), whose Pearson correlation with Y is stronger than that of X itself, and equal to 1 in the limiting case where the Spearman correlation is 1. To illustrate this, consider the following table, which I produced in Excel.
In this table I have calculated the Pearson and Spearman correlations between a variable x and a set of transformations of x to illustrate the difference. The transformations included,
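- Equal: x itself, unchanged
- Power: x raised to a fixed power
- Natural Log: the natural logarithm of x
- Uniform Random: values drawn from a uniform distribution, unrelated to x, and
- Power with Uniform Noise: the power transformation of x plus uniform random noise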
As can be seen from the bottom two rows, there is a big difference between the Pearson and Spearman correlations between the original values of x and non-linear transformations of x. Computing both the Pearson and Spearman correlation matrices and comparing them is a good way to pick up non-linear relationships in the data which you may have missed had you just chosen to look at the linear correlations.
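The same comparison can be reproduced outside of Excel. The sketch below uses `scipy.stats.pearsonr` and `scipy.stats.spearmanr`; the range of x, the exponent of 4, the noise scale, and the random seed are all assumptions chosen for illustration, not the values used in the original table.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(42)  # assumed seed, for repeatability only
x = np.linspace(1.0, 10.0, 50)   # assumed input values

# Illustrative versions of the five transformations listed above.
# The exponent and the noise range are assumptions.
transforms = {
    "equal": x,
    "power": x ** 4,
    "natural log": np.log(x),
    "uniform random": rng.uniform(0.0, 1.0, size=x.size),
    "power with uniform noise": x ** 4 + rng.uniform(0.0, 200.0, size=x.size),
}

results = {}
for name, y in transforms.items():
    pearson = pearsonr(x, y)[0]    # linear relationship
    spearman = spearmanr(x, y)[0]  # monotonic (rank-based) relationship
    results[name] = (pearson, spearman)
    print(f"{name:26s} Pearson={pearson:+.3f}  Spearman={spearman:+.3f}")
```

For the strictly increasing transformations (power and natural log) the ranks of x are preserved, so the Spearman correlation is 1 while the Pearson correlation falls short of 1. That gap is the signal of a non-linear relationship.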
Getting back to the problem at hand, I used correlation analysis in two ways. Firstly, by computing the correlation matrices between the socioeconomic indicators, it is possible to see which indicators are highly correlated with one another. From this one can determine whether using dimensionality reduction would be beneficial. Secondly, I used correlation analysis to narrow down the number of socioeconomic indicators from 67 to 19 relevant ones.
Looking at the change in the correlation matrices between the two periods can give us a clue as to how stable the correlations are over time and whether or not they are spurious. In this instance it was found that the correlations are reasonably stable over time, changing by 0.1 on average (for both Spearman and Pearson) with a standard deviation of about 0.1 as well.
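One way to quantify this kind of stability is sketched below. The helper name `correlation_shift`, the indicator names, and the random data are all hypothetical; the real study would use the 2004 to 2009 and 2009 to 2014 data sets in place of `period_1` and `period_2`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed
cols = ["indicator_a", "indicator_b", "indicator_c"]  # hypothetical names

# Stand-ins for the two five-year data sets (rows are countries).
period_1 = pd.DataFrame(rng.normal(size=(100, 3)), columns=cols)
period_2 = pd.DataFrame(rng.normal(size=(100, 3)), columns=cols)


def correlation_shift(first, second, method):
    """Mean and standard deviation of the absolute change in the
    pairwise correlations between two data sets."""
    diff = (second.corr(method=method) - first.corr(method=method)).abs()
    # Use only the upper triangle: the matrix is symmetric and the
    # diagonal (each indicator with itself) never changes.
    upper = diff.to_numpy()[np.triu_indices(len(diff), k=1)]
    return upper.mean(), upper.std()


for method in ("pearson", "spearman"):
    mean_shift, std_shift = correlation_shift(period_1, period_2, method)
    print(f"{method}: mean change {mean_shift:.3f}, std {std_shift:.3f}")
```

A small mean change relative to the size of the correlations themselves suggests the relationships persist across periods rather than being artifacts of one sample.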
Another way to test a correlation is to calculate its statistical significance using Student's t-test. This gives the probability, for a given sample size N, of observing a correlation at least as strong as the computed one purely by chance if there were no true correlation. The probability is looked up from Student's t-distribution. Using this we can remove socioeconomic indicators whose correlations to real GDP growth are statistically insignificant.
Lastly, to the remaining socioeconomic indicators we can apply a general rule of thumb which states that correlation coefficients with an absolute value of less than 0.2 are insignificant. In this study I averaged the absolute values of the Spearman and Pearson correlations over both data sets to reduce the number of socioeconomic indicators to the following:
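The significance test for a correlation coefficient r computed from N observations uses the statistic t = r * sqrt((N - 2) / (1 - r^2)), which follows Student's t-distribution with N - 2 degrees of freedom when there is no true correlation. A minimal sketch follows; the function name `correlation_p_value` is hypothetical, and the sample size of 188 is borrowed from the number of countries mentioned earlier purely as an example.

```python
import math
from scipy import stats


def correlation_p_value(r, n):
    """Two-sided p-value for the null hypothesis of zero correlation,
    given a sample correlation r (with |r| < 1) and sample size n."""
    t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))
    # sf is the survival function (1 - CDF); doubling it gives the
    # two-sided tail probability.
    return 2.0 * stats.t.sf(abs(t_stat), df=n - 2)


# Illustrative values only.
for r in (0.1, 0.2, 0.4):
    print(f"r = {r:.1f}, N = 188, p = {correlation_p_value(r, 188):.4f}")
```

With a sample this large even modest correlations can be statistically significant, which is why the test is best combined with a threshold on the size of the correlation itself.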
| Socioeconomic Indicator | Average Absolute Correlation | Standard Deviation |
|---|---|---|
| Exchange Rate At PPP | 0.3746 | 0.0832 |
| Access to Improved Sanitation | 0.3204 | 0.0627 |
| Unemployment (negatively correlated) | 0.3131 | 0.0584 |
| Access to Improved Water | 0.3118 | 0.0763 |
| Population Aged 15 To 64 | 0.2713 | 0.0479 |
| Age Dependency Ratio Old (negatively correlated) | 0.2118 | 0.0177 |
| Percentage of Population Urban | 0.2090 | 0.0557 |
Please note, I also removed the other economic indicators related to GDP for fear of bias.
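The filtering step can be sketched as follows. The indicator names and correlation values are invented for illustration, and the use of the sample standard deviation across the four correlations is an assumption about how the table's last column was computed.

```python
import pandas as pd

# Hypothetical correlations of three indicators with real GDP growth,
# measured with both methods over both five-year periods.
correlations = pd.DataFrame(
    {
        "pearson_2004_2009": [0.35, -0.30, 0.05],
        "spearman_2004_2009": [0.40, -0.33, 0.10],
        "pearson_2009_2014": [0.30, -0.28, -0.02],
        "spearman_2009_2014": [0.45, -0.35, 0.08],
    },
    index=["indicator_a", "indicator_b", "indicator_c"],
)

# Average the absolute correlations so that negative relationships
# count as much as positive ones.
absolute = correlations.abs()
summary = pd.DataFrame(
    {
        "average_abs_correlation": absolute.mean(axis=1),
        "standard_deviation": absolute.std(axis=1),  # assumed: sample std
    }
)

# Apply the 0.2 rule of thumb and rank the survivors.
selected = summary[summary["average_abs_correlation"] >= 0.2].sort_values(
    "average_abs_correlation", ascending=False
)
print(selected)
```

Here `indicator_c` is dropped because its average absolute correlation falls below 0.2, while the negatively correlated `indicator_b` is kept.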
From these socioeconomic indicators four themes stand out:
1. Real GDP Growth is positively linked to Government Activities, Foreign Trade, and Investments
2. Real GDP Growth is positively linked to the Purchasing Power Parity of a country's currency
3. Real GDP Growth is positively linked to a large, employed, healthy workforce (between the ages of 15 and 64) which has access to water, sanitation, and excess capital.
4. Real GDP Growth is positively linked to increased productivity and technology.
That having been said, correlation does not imply causation, so the direction of any of these relationships could be reversed. For example, the link between real GDP growth and increased productivity and technology could mean that productivity and technology drive growth, or that growth drives productivity and technology adoption. I think that all of the above-mentioned relationships are cyclical in nature and feed off each other.
Quantitatively, these factors could be further optimized using Principal Component Analysis (PCA). PCA is used to produce uncorrelated factors, which often work better in predictive models such as regression models and neural networks. This will be the focus of the third part of this article series. The second part of this series will investigate ways in which clustering algorithms such as K-means clustering can help us reason about the world better.
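To make the idea of uncorrelated factors concrete, here is a minimal PCA sketch using only NumPy's singular value decomposition. The data is synthetic (two deliberately correlated columns and one independent column) and the seed is an assumption; it is not the method or data of the planned third article.

```python
import numpy as np

rng = np.random.default_rng(1)  # assumed seed

# Hypothetical data: 200 observations of 3 indicators, the first two of
# which are strongly correlated with each other.
base = rng.normal(size=200)
X = np.column_stack(
    [base, base + 0.1 * rng.normal(size=200), rng.normal(size=200)]
)

# Standardize each indicator so that scale differences do not dominate.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# The rows of Vt are the principal directions; projecting Z onto them
# gives the principal components.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
components = Z @ Vt.T
explained_variance_ratio = S**2 / np.sum(S**2)

# The components are uncorrelated: off-diagonal entries are ~0.
component_corr = np.corrcoef(components, rowvar=False)

print("explained variance ratio:", np.round(explained_variance_ratio, 3))
print("component correlations:\n", np.round(component_corr, 3))
```

Because the first two indicators carry nearly the same information, most of the variance ends up in the first component, which is the sense in which PCA reduces dimensionality.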
In conclusion, real GDP growth is explained by both economic and social development indicators. As such, investors looking to allocate their capital globally should pick countries which have governments with coherent plans to invest capital, a positive economic outlook in terms of trade and currency, and improving social and demographic circumstances.
This is a personal blog. As such the opinions expressed here are my own and do not necessarily represent those of my employer. All information on this blog is for educational purposes and is not intended to provide financial advice.