What Drives Real GDP Growth?
Econometrics is the application of statistical and computational techniques to the study of economic data. It differs from classical economics in that it is based on empirical findings rather than theories. One benefit of this approach is that empirical findings are easier to produce than theories. The drawback is that the findings lack sound economic theory to support them. This article represents a first step towards empirically deriving a factor model which tries to explain what drives real GDP growth. The series addresses three questions:
What socioeconomic factors drive long term real GDP growth? (Part 1)
What better ways are there of grouping countries together than geographically? (Part 2)
Which countries of the world are going to experience real GDP growth from now till 2019? (Part 3)
For a more theoretical and very easy to understand explanation of how economies work in practice, I highly recommend the video entitled "How The Economic Machine Works" by hedge fund manager Ray Dalio.
Constructing the Data Sets
The first challenge with any empirical research is that it needs to be based on data. As such, this research started off with the construction of the following data sets using Quandl.com. Three data sets containing absolute values were constructed: one from ~2004, another from ~2009, and lastly one from ~2014. From these three data sets two more data sets (with which we are concerned) were produced containing the change in each socioeconomic indicator over two five-year periods, namely 2004 to 2009 and 2009 to 2014. The 67 socioeconomic indicators in these data sets included:
 Estimated Control of Corruption
 Estimated Government Effectiveness
 Access to Improved Sanitation Facilities
 Access to Improved Water Sources
 Life Expectancy at Birth
 Estimated Political Stability
 Estimated Rule Of Law
 Percent Under Nourished People
 Estimated Accountability
 Fertility Rate
 Health Expenditure
 Youth Unemployment Rate
 Unemployment Rate
 Enrollment Rate in Primary Education
 Enrollment Rate in Secondary Education
 Enrollment Rate in Tertiary Education
 Net Debt
 Current Account Balance
 Exchange Rate At PPP
 Imports
 Exports
 GDP
 Household Consumption Expenditures
 Real Interest Rates
 Total Investment
 Total Reserves
 Trade Balance
 Agriculture Share of GDP
 Armed Forces Personnel
 Armed Forces Personnel Percent Labour Force
 Change In Inventories
 Deposit Interest Rate
 Exports Share of GDP
 GDP As Share World GDP at PPP
 GDP Per Capita
 GDP Per Capita At PPP
 Deficit Surplus Percent GDP
 Government Revenues
 Government Revenues Percent GDP
 Government Spending
 Government Spending Percent GDP
 Imports Share of GDP
 Industrial Production
 Industry Percent GDP
 Lending Interest Rate
 Lending Risk Premium
 Military Expenditure
 National Savings
 Net Debt Percent GDP
 Net Tax Revenue Percent GDP
 Real GDP Growth
 GDP Growth (change in Real GDP)
 Real GDP
 Services Percent GDP
 Economic Growth
 Population
 Population Younger than 15
 Population Aged 15 To 64
 Population Greater Than 64
 Population Growth Rate
 Percentage of Population Rural
 Percentage of Population Urban
 Age Dependency Ratio
 Age Dependency Ratio Old
 Age Dependency Ratio Young
There are many problems with using data, but the two biggest are that there is often not enough data to produce a representative sample and, when there is, the data is often incomplete and noisy.
In this study I replaced missing values with the mean value over each socioeconomic indicator. A more quantitative approach would have been to construct regression trees for each indicator and use them to predict the most likely value for each missing value. That said, the raw data sets were ~90% complete so any bias is probably negligible.
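As a rough illustration of the mean replacement described above, here is a minimal sketch in plain Python. The function name `impute_with_mean`, the row layout (one dictionary per country) and the sample values are all assumptions made for this example; they are not the code or data used in the study.

```python
from statistics import mean

def impute_with_mean(rows):
    """Replace missing values (None) with the mean of each indicator.

    `rows` is a list of dictionaries, one per country, mapping an
    indicator name to a number or None. This layout is hypothetical.
    """
    indicators = list(rows[0].keys())
    # Mean of the observed (non-missing) values for every indicator.
    # Note: mean() raises StatisticsError if an indicator has no observed values.
    column_means = {
        name: mean(r[name] for r in rows if r[name] is not None)
        for name in indicators
    }
    # Build new rows so the input is left untouched.
    return [
        {name: (r[name] if r[name] is not None else column_means[name])
         for name in indicators}
        for r in rows
    ]

# Made-up values, purely for illustration.
sample = [
    {"life_expectancy": 70.0, "fertility_rate": 2.0},
    {"life_expectancy": None, "fertility_rate": 3.0},
    {"life_expectancy": 80.0, "fertility_rate": None},
]
filled = impute_with_mean(sample)
print(filled[1]["life_expectancy"])  # mean of 70.0 and 80.0
print(filled[2]["fertility_rate"])   # mean of 2.0 and 3.0
```

Mean replacement keeps each indicator's average unchanged but shrinks its variance, which is one reason a model-based approach such as regression trees may be preferable when more values are missing.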
Real GDP Growth (Real Economic Growth)
The dependent variable which we are interested in is the real GDP growth of each country during the first and second five-year periods. Real GDP growth is inflation-adjusted GDP growth. In this study the difference between GDP growth and real GDP growth is profound, and real GDP growth was deemed to be the more correct measure. The two are shown side by side in the following images.
What most of you will have noticed is that during the second period, 2009 to 2014, the world was in the midst of a recession. As such, the data sets are quite different. Interestingly, only 47 countries out of 188 grew by as much or more from 2009 to 2014 as they did from 2004 to 2009, and of these only 13 were in the top 33% of countries in terms of average real GDP growth. These countries, in order, were Mongolia, Sierra Leone, Ghana, Iraq, Burkina Faso, Congo, Singapore, Georgia, Philippines, Belize, Montenegro, Chad and Paraguay.
Below I have included an interactive visualization of each country's average percentage growth rate over five years based on the data from 2004 to 2014. Visualizations are a form of exploratory data analysis which can help contextualize quantitative results. The rest of this article describes how real GDP growth relates to socioeconomic indicators.
[Interactive visualization: average five-year real GDP growth rate by country, 2004 to 2014]
In the above chart green indicates growth, red indicates either negative growth or low growth, and white indicates intermediate growth somewhere in between red and green.
Correlation Analysis
Theoretical Underpinnings
Two popular correlation measures are the Spearman correlation, $r_s$, which is computed using ranks and measures the monotonic relationship between two variables, i.e. whether or not they increase or decrease with one another, and the Pearson correlation, $r_p$, which measures the linear relationship between the two variables. Both measures are scaled between $-1$ and $1$, where $\pm 1$ indicates a perfect positive or negative correlation and $0$ indicates no correlation.
Using $r_s$ and $r_p$ together can help identify nonlinear relationships in the data set. To be more specific, if we compute the correlations for some set of dependent variables $Y$ and some set of independent variables $X$ and find that $r_s = 1$ while $r_p < 1$, then that indicates that there may exist some nonlinear transformation of $X$, $f(X)$, such that the Pearson correlation between $f(X)$ and $Y$ is equal to 1. To illustrate this consider the following table I produced in Excel,
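In this table I have calculated the Pearson and Spearman correlations between a variable $X$ and a set of nonlinear transformations of $X$ to illustrate the difference. The transformations included,
 Equal: $Y = X$
 Power: $X$ raised to a fixed power
 Natural Log: $Y = \ln(X)$
 Uniform Random: a transformation based on uniformly distributed random numbers, and
 Power with Uniform Noise: the power transformation with uniform noise added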
As can be seen from the bottom two rows, there is a big difference between the Pearson and Spearman correlations between the original values of $X$ and nonlinear transformations of $X$. Computing both the Pearson and Spearman correlation matrices and comparing them is a good way to pick up nonlinear relationships in the data which you may have missed had you just chosen to look at the linear correlations.
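The gap between the two measures can be reproduced with a few lines of plain Python. This is only a sketch: the helper names `pearson`, `ranks` and `spearman`, the sample values 1 to 20 and the cubic transformation are assumptions chosen for illustration, and they are not the figures from the Excel table.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: Y is a monotonic but nonlinear function of X.
x = [float(i) for i in range(1, 21)]
y = [v ** 3 for v in x]

print("Pearson :", pearson(x, y))    # noticeably below 1
print("Spearman:", spearman(x, y))   # 1 up to rounding, because Y is monotonic in X

# Applying the transformation f(X) = X^3 recovers a linear relationship.
f_x = [v ** 3 for v in x]
print("Pearson after transform:", pearson(f_x, y))
```

A Spearman value well above the Pearson value is the cue to go looking for a transformation of the indicator before fitting a linear model.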
Practical Application
Getting back to the problem at hand, I used correlation analysis in two ways. Firstly, by computing the correlation matrices between the socioeconomic indicators, it is possible to see which indicators are highly correlated with one another. From this one can determine whether using dimensionality reduction would be beneficial. Secondly, I used correlation analysis to narrow down the number of socioeconomic indicators from 67 to the 19 most relevant ones.
Looking at the change in the correlation matrices can give us a clue as to how stable the correlations are over time and whether or not they were spurious. In this instance it was found that correlations are reasonably stable over time deviating by 0.1 on average (for both Spearman and Pearson) with a deviation of about 0.1 as well.
Another way to test a correlation is to calculate its statistical significance using Student's t-test. This measures the probability, given some sample size N, that a computed correlation has occurred randomly. The probability is looked up from Student's t-distribution. Using this we can remove socioeconomic indicators whose correlations to real GDP growth are statistically insignificant.
Lastly, to the remaining socioeconomic indicators we can apply a general rule of thumb which states that correlation coefficients with an absolute value of less than 0.2 are insignificant. In this study I averaged the absolute values of the Spearman and Pearson correlations over both data sets to reduce the number of socioeconomic indicators to the following,
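For readers who want to see the arithmetic, the standard test statistic for a correlation $r$ computed from $N$ observations is $t = r\sqrt{(N-2)/(1-r^2)}$ with $N-2$ degrees of freedom. The sketch below computes it in plain Python. Two things here are assumptions of mine: the sample size of 188 is taken from the number of countries mentioned earlier, and the p-value uses a normal approximation to Student's t-distribution (reasonable only for large N) because the Python standard library has no t-distribution.

```python
import math
from statistics import NormalDist

def correlation_t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation (n - 2 degrees of freedom).

    Undefined for r = +1 or r = -1, where the denominator is zero.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def approximate_p_value(r, n):
    """Two-sided p-value using a normal approximation to Student's t (large n only)."""
    t = correlation_t_statistic(r, n)
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

n_countries = 188  # assumed sample size, one observation per country
for r in (0.05, 0.2, 0.59):
    t = correlation_t_statistic(r, n_countries)
    p = approximate_p_value(r, n_countries)
    print(f"r = {r:.2f}  t = {t:.3f}  approx. p = {p:.4f}")
```

With a sample of this size even modest correlations come out as statistically significant, which is why a significance test on its own is a fairly weak filter.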
[table nl="#"] Socioeconomic Indicator,Average Correlation,Standard Deviation
Government Revenues,0.5906,0.0407
Government Spending,0.5580,0.0543
Exchange Rate At PPP,0.3746,0.0832
Population Size,0.3684,0.1031
Cellphone Subscriptions,0.3216,0.1657
Access to Improved Sanitation,0.3204,0.0627
Unemployment (negatively correlated),0.3131,0.0584
Access to Improved Water,0.3118,0.0763
Population Total,0.3010,0.0676
Internet Users,0.2972,0.1487
Household Expenditures,0.2746,0.2205
Population Aged 15 To 64,0.2713,0.0479
Imports,0.2612,0.1648
Total Investment,0.2580,0.0848
Industrial Production,0.2351,0.1041
Health Expenditure,0.2349,0.1011
Age Dependency Ratio Old (negatively correlated),0.2118,0.0177
Exports,0.2092,0.1372
Percentage of Population Urban,0.2090,0.0557
[/table]
Please note, I also removed the other economic indicators related to GDP for fear of bias.
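The screening step that produced the table above can be sketched as follows. This is an illustration only: the function name `screen_indicators`, the indicator names and the correlation values are made up, and the use of the population standard deviation (`pstdev`) is an assumption, since the article does not say which form of standard deviation was used.

```python
from statistics import mean, pstdev

def screen_indicators(correlations, threshold=0.2):
    """Keep indicators whose average absolute correlation reaches the threshold.

    `correlations` maps an indicator name to its correlations with real GDP
    growth, e.g. Pearson and Spearman for each of the two five-year data sets.
    Returns {name: (average absolute correlation, standard deviation)}.
    """
    kept = {}
    for name, values in correlations.items():
        magnitudes = [abs(v) for v in values]
        average = mean(magnitudes)
        if average >= threshold:
            kept[name] = (average, pstdev(magnitudes))
    return kept

# Hypothetical inputs: [Pearson 04-09, Spearman 04-09, Pearson 09-14, Spearman 09-14]
made_up = {
    "indicator_a": [0.55, 0.60, 0.58, 0.63],
    "indicator_b": [-0.30, -0.35, -0.28, -0.31],  # negatively correlated, still kept
    "indicator_c": [0.05, -0.10, 0.12, 0.08],     # below the 0.2 rule of thumb
}
for name, (average, spread) in screen_indicators(made_up).items():
    print(f"{name}: average = {average:.4f}, standard deviation = {spread:.4f}")
```

Averaging absolute values means the sign of the relationship is lost in the table, which is why the negatively correlated indicators are flagged by hand.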
Conclusion
From these socioeconomic indicators four themes stand out:
1. Real GDP Growth is positively linked to Government Activities, Foreign Trade, and Investments
2. Real GDP Growth is positively linked to the Purchasing Power Parity of a country's currency
3. Real GDP Growth is positively linked to a large, employed, healthy workforce (between the ages of 15 and 64) which has access to water, sanitation, and excess capital.
4. Real GDP Growth is positively linked to increased productivity and technology.
That having been said, correlation does not imply causation, so the direction of any of these relationships could be reversed. For example, the statement "real GDP growth is driven by increased productivity and technology" might be incorrect because it is real GDP growth which drives increased productivity and technology adoption. I think that all of the above-mentioned relationships are cyclical in nature and feed off of each other.
Quantitatively these factors could be further optimized using Principal Component Analysis (PCA). PCA is used to produce uncorrelated factors which often work better in predictive models such as regression models and neural networks. This will be the focus of the third part of this article series. The second part of this series will investigate ways in which clustering algorithms such as K-means clustering can help us reason about the world better.
In conclusion, real GDP growth is explained by both economic and social development indicators. As such, investors looking to allocate their capital globally should pick countries which have governments with coherent plans to invest capital, a positive economic outlook in terms of trade and currency, and improving social and demographic circumstances.
Disclaimer
This is a personal blog. As such the opinions expressed here are my own and do not necessarily represent those of my employer. All information on this blog is for educational purposes and is not intended to provide financial advice.

hi, what program you used to produce the visualization of map graph for real gdp growth of each country.
thanks!
j 
Stuart, these are awesome posts. I'm slowly reading through them all. I hope you keep it up. You ever write a book, it'll be a "shut up and take my money" moment.