Machine Learning Software
Machine learning has many near-synonyms and closely related fields, including, but not limited to, computational statistics, data mining, artificial intelligence, computational intelligence, and, most recently, deep learning (which is better seen as a specific instance of machine learning). Put simply, machine learning is the construction of algorithms which enable models to learn the hidden patterns in data. Popular machine learning techniques include decision tree learning, association rule learning, neural networks, support vector machines, clustering, Bayesian networks, meta-heuristic algorithms, and more. Because implementing your own models is typically quite time consuming, there are a number of packages available which implement these techniques for us. Some of the machine learning software I have used is listed below.
For a much more comprehensive list check out this GitHub list of machine learning frameworks!
Data Mining / Statistical Analysis Software
scikit-learn provides Python tools which support classification, regression, clustering, dimensionality reduction, model selection, and data analysis.
Weka is a collection of machine learning algorithms for data mining tasks. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
Gretl is a cross-platform software package for econometric analysis. It supports a wide variety of estimators, time series methods, and limited dependent variable models.
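To give a feel for what using one of these packages looks like, here is a minimal sketch of a classification task in scikit-learn. It assumes scikit-learn is installed; the function name `train_iris_tree`, the choice of the bundled Iris dataset, and the tree depth are my own illustrative choices, not anything the library prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_iris_tree(seed=0):
    """Fit a small decision tree on Iris and return (model, held-out accuracy).

    The name and defaults here are hypothetical, chosen for illustration only.
    """
    X, y = load_iris(return_X_y=True)
    # Hold out a quarter of the samples so the score reflects unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # A shallow tree keeps the learned rules easy to inspect.
    model = DecisionTreeClassifier(max_depth=3, random_state=seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    model, accuracy = train_iris_tree()
    print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping `DecisionTreeClassifier` for another estimator usually leaves the rest of the code unchanged, since scikit-learn estimators share the same `fit` and `score` interface.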
I am a big proponent of Python because of its readability, scalability (especially when coupled with systems like Apache Spark), and the depth of functionality offered by Python packages. The following packages are essential for the statistical analysis of data, which is a fundamental aspect of machine learning.
SciPy is designed for mathematics, science, and engineering applications. The SciPy package is the core package which brings the others together.
Pandas provides data structures and data analysis tools including operations for manipulating numerical tables (matrices) and doing (financial) time series analysis.
NumPy is used for scientific computing; it provides an N-dimensional array object, linear algebra routines, Fourier transforms, and random number generators (useful for quants).
SymPy is a library for symbolic mathematics; it has features related to polynomials, combinatorics, equation solving, discrete math, matrices, and more.
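As a rough illustration of how these packages fit together for basic statistical analysis, the sketch below uses Pandas to summarize simple returns from a price series and NumPy to fit a straight line by least squares. It assumes both packages are installed; the function names `summarize_returns` and `fit_line` and the sample numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def summarize_returns(prices):
    """Return the mean, sample standard deviation, and count of simple returns."""
    series = pd.Series(prices, dtype=float)
    # Simple return: (p_t - p_{t-1}) / p_{t-1}; the first entry is NaN and is dropped.
    returns = (series.diff() / series.shift(1)).dropna()
    return {
        "mean": float(returns.mean()),
        "std": float(returns.std(ddof=1)),
        "n": int(returns.size),
    }


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept using NumPy."""
    slope, intercept = np.polyfit(np.asarray(x, dtype=float),
                                  np.asarray(y, dtype=float), deg=1)
    return float(slope), float(intercept)


if __name__ == "__main__":
    print(summarize_returns([100, 110, 121]))
    print(fit_line([0, 1, 2], [1, 3, 5]))
```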
Artificial Neural Networks Software
Encog is Java-based and supports data pre-processing, support vector machines, neural networks, Bayesian networks, hidden Markov models, and genetic algorithms (including genetic programming).
PyBrain is a modular Machine Learning Library for Python. It contains algorithms for neural networks, for reinforcement learning (and combinations), for unsupervised learning, and evolution.
Neuroph is a lightweight Java neural network framework for common neural network architectures. It contains open source Java libraries with classes which correspond to basic NN concepts.
DEAP (Distributed Evolutionary Algorithms in Python) is a framework for rapid prototyping and testing of evolutionary algorithms. It works with parallelization mechanisms such as multiprocessing and the SCOOP framework for efficient parallel execution.
JOptimizer is a Java-based open source library for constrained optimization. It includes gradient descent algorithms, linear programming, quadratic programming, etc. (Great for benchmarks.)
OAT (the Optimization Algorithm Toolkit) is a Java-based library with implementations of evolutionary algorithms, swarm intelligence (including ant colony optimization), and other computational intelligence (CI) algorithms.
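For readers who have not met evolutionary algorithms before, the following is a minimal sketch of the idea in plain Python. It does not use the API of DEAP, OAT, or any other package listed above; the function names `sphere` and `evolve`, the parameter defaults, and the choice of truncation selection with uniform crossover and Gaussian mutation are all my own assumptions for illustration.

```python
import random


def sphere(x):
    """Toy objective to minimize: sum of squares, with its minimum at the origin."""
    return sum(v * v for v in x)


def evolve(fitness, dim=3, pop_size=20, generations=50, sigma=0.3, seed=0):
    """Minimize `fitness` with a very small evolutionary algorithm.

    Returns the best individual found and the best fitness per generation.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(dim)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        # Truncation selection: the better half survives unchanged (elitism),
        # so the best fitness can never get worse between generations.
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover picks each gene from one parent,
            # then Gaussian mutation perturbs it slightly.
            child = [rng.choice(pair) + rng.gauss(0, sigma)
                     for pair in zip(a, b)]
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve(sphere)
    print("first best fitness:", history[0])
    print("final best fitness:", history[-1])
```

The libraries above offer far richer operators, parallel evaluation, and bookkeeping, but the loop of select, recombine, mutate, and repeat is the common core.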