10 Years of Open Source Machine Learning

Thomas Rory Stone, Ph.D.
Mar 15, 2016

Over the past few years the field of Machine Learning has entered the general parlance, from free massive open online courses to image recognition benchmarks being broken and decades-old Atari games being mastered.

During the same period, developers have witnessed the release of several popular open source frameworks and libraries. The chart below shows different open source machine learning projects by initial commit date and programming language. The size of each marker represents the popularity of a project, based on its number of GitHub stargazers.

Visualising 10 Years of open source machine learning projects on Github

The earliest project released on GitHub was the Shogun Machine Learning Toolbox, originally initiated in 1999 by Soeren Sonnenburg and Gunnar Raetsch. The most common programming language is C++, used most recently by Google’s TensorFlow. Developed by researchers and engineers working on the Google Brain team, TensorFlow has quickly become the most popular open source machine learning project on GitHub. scikit-learn, an extensive machine learning toolkit for Python, is the next most popular, with a very active contributor community. PredictionIO, the only project written in Scala, is currently the third most popular open source machine learning project (now fourth, behind Caffe). I should have published this blog post a fortnight ago 😭

You can view all the repositories above on GitHub’s machine learning showcase, which includes a total of 23 repositories in 10 different programming languages.

Clearly, different projects serve different purposes and have a different focus. For example, John Langford’s Vowpal Wabbit targets fast out-of-core and online learning, while projects such as Caffe, developed by the Berkeley Vision and Learning Center, focus on deep learning. There are also notable projects which are not included in GitHub’s showcase, such as Deeplearning4j, another deep learning library, this one for Java, and Torch, which is maintained by research scientists at Google, Twitter and Facebook.

The first line of code for PredictionIO was committed in January 2013, which turned out to be a busy year in the world of open source machine learning! In the interim, whilst teams at Google DeepMind and Facebook AI Research (FAIR) have been learning Go (the ancient Chinese board game, not the programming language), our core team and community of contributors have been making it easier for developers and data scientists to build and deploy machine learning in production to power smarter applications.

Lee Sedol resigns and AlphaGo leads 3–0 (the final result was 4–1)

Looking at these contributions from academia, industry and startups over the past decade, one thing is very clear: Machine Learning has a very bright future! They come from leading North American research labs at Stanford, Berkeley, CMU, MIT, Toronto, NYU and many more, from computer science departments across the Americas, Europe, Asia and far beyond, and from increasingly active tech companies such as Salesforce, Google, Facebook, Microsoft, Amazon, IBM, Yahoo! and Baidu. Nowadays every undergraduate computer science student and software developer is interested in learning about Machine Learning.

So… what are you waiting for? Check out PredictionIO on GitHub today!

Nota Bene: Salesforce is hiring!

Thomas Rory Stone, Ph.D.

Founding Partner @kintsugiad. Previously Partner @AIseedVC, Lecturer @UCL, Co-founder @PredictionIO (acquired by Salesforce) and Ph.D. @UCLCS