In the real world we have plenty of data sources (e.g. finance, bioinformatics, environmental monitoring, multimedia, etc.). Often we want to leverage this data to predict which event will occur or which strategy to choose. Simply put: what are those measurements hiding from us? Can we predict based on the observed data?
The field of Machine Learning (ML) [WIKI] in Computer Science addresses exactly this question from many different viewpoints and under different assumptions. In ML jargon we want to find a classifier (or a predictor, in statistical parlance) that will serve as a black box spitting out predictions on the data we want to classify and categorize. The problem is that the box is not that black: it has to be built first.
There is a plethora of different methods to learn a classifier from given data, ranging from the simplest and dumbest to the very sophisticated and complex. Without trying to be exhaustive, one could mention Nearest Neighbor, Naive Bayes, Logistic Regression, Support Vector Machines, Neural Networks, Random Forests and many others.
In the same real world, we are often given either very little data or really huge datasets. That’s how the world works, folks. In this blog entry we will talk about recent advances in Machine Learning that attempt to answer the second challenge: learning and classification from very large datasets.
I implemented the Extreme Learning Machine (ELM) classifier during the Raspberry Pi [WWW] coding event at the University of Rouen [WWW], because it suits low-power, low-memory hardware. The method takes a Neural Network approach but does not require expensive and time-consuming back-propagation training or the similar iterative techniques found in the classic literature. It implements a single-hidden-layer feedforward neural network (SLFN) and comes with several theoretical and practical advantages:
- scales to large and even very large datasets;
- has very few, easy-to-tune parameters;
- can be cast naturally in classification or regression contexts;
- very fast training & classification times;
- can model highly nonlinear data with kernels;
- generalizes the LS-SVM and P-SVM algorithms.
The first main point is that learning such an SLFN is possible without iterative tuning! In recent decades, gradient descent, back-propagation and least-squares solutions for RBF networks have been very popular and closely studied. They have nice properties but suffer from multiple problems:
- learning algorithms are seemingly different with no clear connections;
- they have specific parameters that need an expert to tune;
- prone to overfitting;
- can be trapped in local minima/maxima;
- often very computationally expensive.
The second main point is that the hidden node parameters need not be tuned by a learning algorithm: they can be generated randomly, and only the output weights are computed from the data! This is backed by both theoretical and practical evaluation. It has been shown on multiple public datasets that the ELM algorithm provides performance better than or comparable to LS-SVM at a much lower computational cost!
For more technical information on the inner workings and evaluations, see [WWW], which has PDFs and slides. This code implements the regularized version of the ELM algorithm from [PDF].
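To make the two main points concrete, here is a minimal sketch in pure Python. It is not the MATLAB or C++ code from this post. The names `elm_train`, `elm_predict`, `hidden_layer` and `solve` are made up for illustration, and several choices are assumptions: a sigmoid activation, random weights drawn uniformly from [-1, 1], and the usual ridge-regularized least-squares form for the output weights, `(I/C + H'H) beta = H'T`, with `C` as the regularization parameter. A real implementation would use a linear algebra library rather than hand-written elimination.

```python
import math
import random


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(a) + list(b) for a, b in zip(A, B)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * u for v, u in zip(M[r], M[col])]
    return [row[n:] for row in M]


def hidden_layer(X, W, b):
    """Sigmoid activations of the hidden layer, one row per sample."""
    return [
        [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + bj)))
         for w, bj in zip(W, b)]
        for x in X
    ]


def elm_train(X, T, n_hidden, C=1.0, seed=0):
    """Train a regularized ELM (hypothetical helper, assumed formulation)."""
    rng = random.Random(seed)
    n_in, n_out = len(X[0]), len(T[0])
    # Hidden-node parameters are drawn at random and never tuned.
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = hidden_layer(X, W, b)
    # Output weights come from a single regularized least-squares solve:
    # (I / C + H^T H) beta = H^T T
    A = [[sum(h[i] * h[j] for h in H) + (1.0 / C if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    B = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(n_out)]
         for i in range(n_hidden)]
    return W, b, solve(A, B)


def elm_predict(model, X):
    """Return one score per class for every sample; the largest score wins."""
    W, b, beta = model
    n_out = len(beta[0])
    return [[sum(hj * bj[k] for hj, bj in zip(h, beta)) for k in range(n_out)]
            for h in hidden_layer(X, W, b)]


# Toy usage: two classes on one attribute, targets encoded as +1 / -1.
X = [[-1.0], [-0.5], [0.5], [1.0]]
T = [[1, -1], [1, -1], [-1, 1], [-1, 1]]
model = elm_train(X, T, n_hidden=20, C=1000.0)
labels = [max(range(2), key=lambda k: s[k]) for s in elm_predict(model, X)]
```

Note that training contains no loop over epochs: the only expensive step is one linear solve whose size depends on the number of hidden neurons, not on the number of samples.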
Now back to the main reason: we want code that runs on our data, quickly returns a trained model, and is just as fast at classifying any future data. I started from a publicly available MATLAB implementation [HERE] and tidied it up into two small, clean functions. I then developed a C++ library that can be integrated into any project. In my case, I interfaced the code with MATLAB through the MEX interface.
You can download the pure MATLAB code and the library with pre-compiled MEX files [HERE]. The code consists of two parts: a training function and a prediction function. The library is provided as template code that simply needs to be included in the source code that uses it.
We have also set up a Git repository [HERE]. Feel free to branch, commit back, comment, etc.
The only dependency needed to compile the MATLAB MEX files or a C++ project using the library is the Eigen3 linear algebra framework [WWW]. The package provides pre-compiled MEX files for 64-bit Mac OS X and 64-bit Ubuntu systems.
Before launching training and prediction, make sure your data attributes are normalized to the range [-1, 1]!
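One common way to do this is per-attribute min-max scaling. The sketch below is an illustration, not part of the downloadable package, and the helper names `fit_minmax` and `scale_to_range` are made up. It assumes that the minimum and maximum are computed on the training data only and then reused for any future data, and that a constant attribute is simply mapped to 0.

```python
def fit_minmax(X):
    """Return per-attribute minima and maxima of the training samples."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale_to_range(X, lo, hi):
    """Map each attribute linearly so that [lo, hi] becomes [-1, 1]."""
    return [
        [2.0 * (v - l) / (h - l) - 1.0 if h > l else 0.0  # constant column -> 0
         for v, l, h in zip(row, lo, hi)]
        for row in X
    ]


# Toy usage: fit on the training samples, then reuse lo/hi for new samples.
train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
lo, hi = fit_minmax(train)
train_scaled = scale_to_range(train, lo, hi)
new_scaled = scale_to_range([[2.5, 25.0]], lo, hi)
```

Future samples that fall outside the training range will land slightly outside [-1, 1]; whether to clip them is a design choice left open here.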
Want an idea of how fast the algorithm is on a large dataset? The algorithm provides state-of-the-art classification results on the Covertype database [WWW], which features 581,012 samples with 54 dimensions. Training on a random half of the database with 25 hidden neurons took only 0.72 seconds on a single core of an Intel Xeon E3-1245 v2! Approximately the same amount of time was needed to classify the remaining half of the data.
On low-power, low-memory devices such as the Raspberry Pi, which handle *much* smaller datasets, learning and classification will still be carried out in a fraction of a second.