Open Market Data
Most financial analysis is based on historical data, so it is important to understand where to get it and what to do with it. For many years this data was either unavailable or locked away behind paywalls. Luckily for us, we are in the midst of an open source data movement. Open market data is market data that is freely available for everyone to use and republish as they wish, without restrictions from copyright, patents, or other mechanisms of control. This page highlights some of the best sources of open source data for algorithmic trading, quantitative finance, and machine learning. If you know of any more sources, please mention them in the comments below.
Useful blog posts and tutorials with code
Quandl.com is the Google of data. Quandl has proprietary and open source data covering many topics including economics, countries, demographics, companies, market prices, and housing data.
- Building a data dashboard in Excel using the Quandl plugin (Video, Blog.Quandl.com)
- Using Quandl open source data in R (Blog.Quandl.com)
- How to download futures data from Quandl (QuantStart.com)
- How to perform regression analysis on Quandl data sets (StuartReid.co.za)
- How to integrate Quandl data sets into Quantopian trading strategies (Quantopian.com)
- Pre-processing Quandl data for time series analysis (QuantAtRisk.com)
- Applied VaR decomposition using Quandl data (QuantAtRisk.com)
- How to get an N-asset portfolio covariance matrix using Quandl data (QuantAtRisk.com)
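As a rough illustration of the last item above, the sketch below computes a sample covariance matrix of log returns for several assets. It does not reproduce the code from any of the linked posts and does not call the Quandl API. The tickers AAA, BBB, and CCC and their prices are made up, and the helper names `log_returns` and `covariance_matrix` are hypothetical. It assumes the price series have already been downloaded and aligned by date.

```python
import math


def log_returns(prices):
    """Convert a list of prices into a list of one-period log returns."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def covariance_matrix(return_series):
    """Sample covariance matrix (n - 1 denominator) of equal-length return series."""
    n_obs = len(return_series[0])
    if n_obs < 2 or any(len(r) != n_obs for r in return_series):
        raise ValueError("need at least two observations and equal-length series")
    means = [sum(r) / n_obs for r in return_series]
    centered = [[x - m for x in r] for r, m in zip(return_series, means)]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n_obs - 1) for cj in centered]
        for ci in centered
    ]


# Made-up closing prices for three hypothetical assets, already aligned by date.
prices = {
    "AAA": [100.0, 101.0, 99.5, 102.0, 103.5],
    "BBB": [50.0, 50.5, 50.2, 51.0, 51.4],
    "CCC": [20.0, 19.8, 20.1, 19.9, 19.7],
}

returns = [log_returns(p) for p in prices.values()]
cov = covariance_matrix(returns)

# Print one row of the covariance matrix per asset.
for name, row in zip(prices, cov):
    print(name, [round(v, 6) for v in row])
```

With real data, the `prices` dictionary would be filled from whichever source you download from. Series of different lengths or with missing dates need to be aligned first, otherwise the covariances are not meaningful.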
Yahoo Finance is a website that provides financial information including stock quotes, stock exchange rates, corporate press releases, and financial reports.
- Extracting stock prices from Yahoo! Finance using Python (SimplyPython.Wordpress.com)
- Direct scraping Yahoo! Finance data using Python (SimplyPython.Wordpress.com)
- How to download S&P 500 tickers and data using Python and Yahoo! (TheAlgoEngineer.com)
- On the quality of historical stock prices from Yahoo Finance (TheSystematicInvestor.com)
- The basics of statistical mean reversion testing - uses Yahoo Finance (QuantStart.com)
- Forecasting financial time series - uses Yahoo Finance (QuantStart.com)
- Guide to using GetSymbolsExtra in QuantMod for Yahoo data (TheSystematicInvestor.com)
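Because the quality of free historical prices comes up in the list above, here is a minimal sketch of a few sanity checks you might run on a downloaded price file before using it. It is not taken from any of the linked posts. The `Date,Close` column layout, the sample rows, the helper names `load_closes` and `quality_report`, and the thresholds of 5 days and 25% are all assumptions chosen for illustration. Real exports differ by provider.

```python
import csv
import io
from datetime import date

# Made-up sample in an assumed "Date,Close" layout; real exports differ by provider.
SAMPLE_CSV = """Date,Close
2015-01-02,100.0
2015-01-05,101.5
2015-01-06,50.6
2015-01-07,0.0
2015-01-20,50.9
"""


def load_closes(text):
    """Parse CSV text into a date-sorted list of (date, close) tuples."""
    rows = csv.DictReader(io.StringIO(text))
    data = [(date.fromisoformat(r["Date"]), float(r["Close"])) for r in rows]
    return sorted(data)


def quality_report(data, max_gap_days=5, max_abs_return=0.25):
    """Flag non-positive prices, long calendar gaps, and large one-period moves."""
    issues = []
    for d, close in data:
        if close <= 0:
            issues.append((d, "non-positive price"))
    for (d0, p0), (d1, p1) in zip(data, data[1:]):
        gap = (d1 - d0).days
        if gap > max_gap_days:
            issues.append((d1, "gap of %d days" % gap))
        # Only compare prices when both are valid, to avoid dividing by zero.
        if p0 > 0 and p1 > 0 and abs(p1 / p0 - 1) > max_abs_return:
            issues.append((d1, "large move"))
    return issues


issues = quality_report(load_closes(SAMPLE_CSV))
for when, problem in issues:
    print(when.isoformat(), problem)
```

A flagged row is not necessarily an error. A large move may be a genuine price change or an unadjusted stock split, and a long gap may be a market holiday, so the report is a prompt for manual inspection rather than an automatic filter.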
In addition to these two sources, the following also provide a lot of meaningful market data:
- Amazon Web Services Data provides data which can be seamlessly integrated into AWS cloud-based applications
- The World Bank provides free and open access to data about development in countries around the globe
- Google Finance offers similar information to Yahoo Finance and has some cool features for portfolios
- The UCI Machine Learning Repository provides useful data sets for testing machine learning algorithms on different types of problems
- MLData also provides open source machine learning data sets for testing different algorithms
- DataBib offers a searchable directory of open source research data repositories
That said, many of these data sets are aggregated on Quandl.com for convenience.