Top 100 Machine Learning Projects 2018

This time around, I have gathered the best resources to help you master Machine Learning. All the projects are open source, taken from their repositories on GitHub. I picked the top 100 out of more than 10,000 projects, based on their star counts and on having activity within the past year. This list is dedicated to helping you master Machine Learning.

The average star count across all the projects is 8,554.

I grouped those 100 projects into 6 (six) categories based on what the projects actually do. The categories are Tutorial, Framework, Algorithm, Accompany, Deployment, and Resource. I list all of them in the table of contents below. For ease of use, you can press CTRL+F to find what you want.

Tutorial

The most basic tutorial for the popular Machine Learning framework. This tutorial alone has gathered more than 20k stars, so you can tell how popular it is. The content goes pretty deep, starting from basic TensorFlow operations, moving through common Machine Learning algorithms like Linear Regression and Nearest Neighbor, and ending with the complex DCGAN and how to train a neural network in a multi-GPU setting.
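To give you a taste of the two classic algorithms named above, here is a minimal sketch in plain Python. It is not taken from the tutorial and does not use TensorFlow; the function names `fit_line` and `nearest_neighbor` are my own, chosen just for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def nearest_neighbor(train, query):
    """Return the label of the training point closest to `query`.

    `train` is a list of (point, label) pairs; points are tuples of numbers.
    """
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    _, label = min(train, key=lambda item: squared_distance(item[0], query))
    return label


slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
label = nearest_neighbor([((0, 0), "a"), ((5, 5), "b")], (4, 4))
```

The data here lies exactly on the line y = 2x + 1, so the fit recovers that line, and the query point (4, 4) sits closer to the point labelled "b".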

This repo is also hugely popular. The content itself is quite simple: it is a study plan. But that plan is something. It is carefully thought out and planned. It has around 14 chapters, each with more than 7 lessons. You can follow the plan in exactly the order the author wrote it, or you can pick whatever you want to study first. It is quite feasible.

A step-by-step guide directly in IPython Notebooks. That means you can just clone the repository and run the notebooks to get started as soon as possible. The repo includes tutorials for popular Machine Learning frameworks such as TensorFlow, Theano, Keras, and Caffe.

It also includes the popular Scikit-Learn tutorials, which you can use for building models on structured data. There are also tutorials on NumPy, Matplotlib, and Pandas, three essential libraries for playing with data in Python.

Another popular guide using IPython Notebooks. With this tutorial, you will dive into the idea of Machine Learning itself. Its noteworthy tagline is “Play to learn”, which means you are expected to have fun while learning. It could help you a lot in your progress!

Once you are hooked on the game, they want you to immerse yourself in it. They give you many useful resources that can help you along the way.

A huge tutorial of about 1,000 lines. It boasts the largest curated list of lessons. You can learn pretty much everything here, from the basic definitions of Artificial Intelligence, Statistics, Machine Learning, and Deep Learning to how to ensemble several Machine Learning models into one better model. This is quite useful when you care a lot about getting better accuracy from your model.
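If ensembling sounds abstract, here is a minimal sketch of one of the simplest forms, majority voting. It assumes each "model" is just a Python function that returns a class label; the three threshold classifiers are made up for the example.

```python
from collections import Counter


def majority_vote(models, x):
    """Ask every model for a label and return the most common answer."""
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical weak classifiers that disagree near their thresholds.
models = [
    lambda x: 1 if x > 2 else 0,
    lambda x: 1 if x > 4 else 0,
    lambda x: 1 if x > 6 else 0,
]

high = majority_vote(models, 5)  # two of the three models vote 1
low = majority_vote(models, 3)   # two of the three models vote 0
```

Using an odd number of binary classifiers avoids ties, which keeps the sketch simple.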

Another tutorial for the popular framework. It covers almost everything, from the basic data types you can work with and the ever-useful scopes to feeding data to TensorFlow for training and testing. It also covers advanced topics like Beam Search, KL-divergence, and Leaky ReLU. Personally, I would start with this tutorial before the others.
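Two of those names are less scary than they sound. Here is a minimal sketch of Leaky ReLU and KL-divergence in plain Python, not the tutorial's TensorFlow code. The default slope `alpha=0.01` is an assumption of mine; it is a commonly used value, not something the tutorial prescribes.

```python
import math


def leaky_relu(x, alpha=0.01):
    """Pass positive values through; scale negative values by a small slope."""
    return x if x > 0 else alpha * x


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Terms where p is zero contribute nothing, so they are skipped.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


same = kl_divergence([0.5, 0.5], [0.5, 0.5])       # identical distributions
different = kl_divergence([0.5, 0.5], [0.9, 0.1])  # mismatched distributions
```

KL-divergence is zero when the two distributions are identical and grows as they drift apart. Note that it is not symmetric: swapping `p` and `q` generally gives a different number.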

Now, this is different. Another popular framework besides TensorFlow, PyTorch is geared towards everyone who wants to take advantage of dynamic graphs, which makes the source code more “pythonic” than TensorFlow. This tutorial has 3 (three) levels of difficulty: Basic, which covers the basics of PyTorch; Intermediate, which covers deeper topics like CNN and RNN; and Advanced, which covers more complex things like Auto-encoders.

This is special. It is actually the repository for a Massive Open Online Course of the same name by Jeremy Howard, a respected expert in Machine Learning. It contains all the course resources you need, as well as the code examples he made for it. This tutorial is highly recommended if you want to really understand the choice of algorithm and what is working behind the scenes.

An exciting tutorial by Siraj Raval. It contains the curriculum needed to learn everything required to build your own ML model in 3 months. He guides you thoroughly, starting from the easiest parts in Linear Algebra, the needed Calculus, Probability, and Algorithms, and increasing the difficulty as the months go on. And obviously you will be a “wizard” after mastering this thing.

A must-try for beginners. This guide is intended for complete beginners who don’t have any background in AI. The author says that you don’t need a PhD to create a neural network. And obviously, you don’t need to be someone who breaks through the current technology; you can use what already exists instead. The goal of the tutorial is to make you aware of how simple it is and how much of the hype was wrong.

Randy Olson created this repository as a companion to his blog. It is geared more towards data analysis and covers several great problems, such as optimizing a road trip or analyzing Twitter followers, that we can easily relate to our everyday lives.

A mind map that summarizes the concepts of Machine Learning so that you can understand them more easily. It has several great mind-map visualizations that can clear your cloudy thoughts, such as the Machine Learning process, data processing, and models. It also includes the much-hated mathematical terms, but you need to at least know about them. LOL.

This is a traditional Machine Learning tutorial. If you want to know the steps from data analytics through to model making, you will love this tutorial. It explains how to pre-process the data, choose the best parameters, and evaluate the model. You will also learn how to visualize the data to make it easier to understand.

A curated list of lessons and tutorials for Data Science, Machine Learning, and NLP in Python. It even covers the basics of the Python programming language, and goes on to cover how to use notebooks for experiments and how to use the popular Scikit-Learn library to create your model. This tutorial covers more traditional Machine Learning topics than the others.

Teachable Machine, an exciting tutorial for beginners to understand the concept of Machine Learning. You can just drag and drop training data and see how your machine does its learning. It is an amazing piece of web work by a Google team. After trying the tutorial, I am absolutely sure you will know what kind of thing Machine Learning is.

While the other tutorials do their best to explain what Machine Learning is, this particular tutorial tells you what a particular algorithm is doing behind the scenes. It has great visualizations that will get you hooked on the lesson.

I personally recommend this lesson to help you understand what you are doing, instead of just creating a model and praying to God that it runs well.

Framework

That’s all for the top tutorials I ranked. Now for the main course, the things you will absolutely need to create your own Machine Learning model: the frameworks. Let’s get started.

The most popular Machine Learning framework on earth. It has a whopping 108k stars and more than 1.6k contributors. According to its description, TensorFlow is an open source library for numerical computation using data flow graphs. That means you construct a computational graph, and your input data flows through the graph to give you the result.

The flexibility of the graph itself is what makes TensorFlow famous. In addition, it has a great deployment story via TensorFlow Serving, which supports Continuous Deployment out of the box. Beyond that, there is TensorFlow Lite, which can turn your model into a mobile-ready one that you can deploy in Android and iOS apps. To complete the roster, TensorBoard makes your training observable in a convenient way.

Keras is definitely your go-to framework when you are just entering this field. It is dubbed the Deep Learning framework for humans, and you can’t ask for more than that. It simplifies its back end (TensorFlow, Theano, and CNTK are supported) to a terrifying degree.

Using Keras, you can just state your needs: a dense layer, a recurrent layer, or a convolution layer. Define your loss function and optimization algorithm, and voila, you have finished building your model architecture. Next, you can just feed in your data and watch the machine train.

Keras is now officially included inside TensorFlow. So you can just install TensorFlow and use Keras from within it, or combine it with other functionality that Keras doesn’t have to make a better model.

The most popular traditional Machine Learning library. It went public in 2007, way earlier than most of the Deep Learning frameworks in this list. This library includes lots of algorithms that you can use on your structured data, for example Logistic Regression, Decision Tree, and ensembles like Random Forest.

I personally recommend this framework if your data is structured, or even sparse. Training on such data can be faster with this framework than with TensorFlow, with only negligible differences in the results.

Good ol’ OpenCV. Although the way we do computer vision has shifted to CNNs, this library remains popular. Why though? Because the basic functionality of the library is still useful, for example reading an image or converting an image to grayscale. And the out-of-the-box integration with NumPy is definitely the best thing we get from this old framework.

Caffe makes your deep learning research much easier. The only thing you need to create a deep learning model is a config file, and you are done. This amazing piece of technology was developed at UC Berkeley and is used widely all over the world.

If you want to build a custom convolutional neural network, whatever performance level you are after, use Caffe. You will feel more productive.

PyTorch, my personal favorite. When it comes to computational graphs, everyone compares TensorFlow and PyTorch: TensorFlow with its static graph vs PyTorch with its dynamic one.

Coding neural nets in TensorFlow can be confusing, especially for beginners, since you need to declare your graph first, before you can feed any data through it. So you need to imagine what your graph will look like. In contrast, PyTorch creates your graph as you execute the code, which means you can change your graph anytime you want; hence, dynamic graph.
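The difference is easier to see in code. The sketch below is a toy illustration in plain Python, not the TensorFlow or PyTorch API: the classes `Placeholder`, `Add`, and `Mul` are hypothetical stand-ins I made up to mimic the "declare first, feed data later" style, while `dynamic_model` mimics the "compute as you go" style.

```python
class Placeholder:
    """A named input whose value is supplied only when the graph runs."""

    def __init__(self, name):
        self.name = name

    def run(self, feed):
        return feed[self.name]


class Add:
    """A graph node that adds the results of two other nodes."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def run(self, feed):
        return self.left.run(feed) + self.right.run(feed)


class Mul:
    """A graph node that multiplies the results of two other nodes."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def run(self, feed):
        return self.left.run(feed) * self.right.run(feed)


# Static style: declare the whole graph first. No data is involved yet.
x = Placeholder("x")
y = Placeholder("y")
graph = Add(Mul(x, x), y)
# Only now do we feed data through the already-fixed graph.
static_result = graph.run({"x": 3, "y": 4})


# Dynamic style: values are computed line by line, so ordinary Python
# control flow can change the shape of the computation on every call.
def dynamic_model(x_value, y_value):
    out = x_value * x_value
    if out > 5:
        out = out + y_value
    return out


dynamic_result = dynamic_model(3, 4)
```

Both styles compute 3 * 3 + 4 here, but in the dynamic version you can print or inspect `out` at any line, and the `if` branch is decided by the data itself.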

I don’t know whether this bothered the TensorFlow team or not, but as of a recent TensorFlow release, they have added a feature called eager execution, a way to run dynamic graphs in TensorFlow. They don’t want to lose, I think.

With the demand for deep learning frameworks so high, and so many programming languages scattered out there, MXNet comes to the rescue. While the frameworks above are mostly used from Python, MXNet supports 8 (eight) languages: C++, Scala, R, JavaScript, Python, Julia, Go, and even Matlab!

It is currently incubating at Apache, so you may want to wait a while before using this framework in production.

Microsoft’s CNTK, one of the best Deep Learning frameworks out there. Like TensorFlow and Theano, CNTK is one of the pioneers of the computational graph. It has a huge user base and is used actively at Microsoft. It can also be used as a Keras backend, which means you can use its power without digging too deep inside. You will want to try this one!

Another framework maintained by Apache. PredictionIO is a framework for end-to-end usage, meaning it covers data collection, training, and inference inside one single framework. And with Spark and lots of database choices, PredictionIO is truly your one-stop solution if you want to create a complete system.

An industrial-strength Natural Language Processing framework. spaCy is your choice if you want to create an NLP product such as a chat bot. The team behind it has designed spaCy for real use from day one. As I write this post, spaCy supports more than 20 languages and is ready to release some more! It boasts the fastest syntactic parser in the world. So, you definitely want to use this for your project.

As you might expect, this one is a deep learning framework for Java. With the huge number of projects using Java, you can embed deep learning techniques directly into your existing projects using this framework. What is more exciting than that? ND4J, your Java equivalent of NumPy. You can do what you do in Python with NumPy, which means vectorization is now part of your arsenal too.

Andrej Karpathy’s framework. It is for experimental use only, but with this library you can train your neural network and run inference directly inside your browser. You might want to try this if you are comfortable with JavaScript. If not, you are better off choosing Keras.

Another Deep Learning framework for JavaScript. This one, though, uses TensorFlow for the back end, so you can expect TensorFlow-level performance for training and inference. With this framework, you can easily use a network pretrained in TensorFlow, and of course retrain it here.

The so-called Keras alternative. Although not as popular as Keras, quite a lot of people use this framework because of its low complexity when combined with legacy TensorFlow code. This is especially true for people who have worked with the low-level TensorFlow API before and decide to move to a higher-level abstraction.

The next generation of Caffe, built with Python users in mind. This release boasts of being a lightweight library compared with the huge bulk of TensorFlow. Because of PyTorch’s popularity and ease of use, Caffe2 has now moved into PyTorch as a sub-project.

While people in the West talk about TensorFlow every day, our comrades in China don’t. Enter Paddle, developed by Baidu’s AI scientists to help them build their AI systems. As we know, many amazing AI products come from Baidu, such as the self-driving cars of Baidu Apollo. So we can reasonably guess that they use Paddle as their main framework.

Now, it’s Apple’s turn. Turi Create is a framework that simplifies the development of custom Machine Learning models. Although not as flexible as other choices, this framework contains some interesting algorithms you might want to use, such as Object Detection and Neural Style Transfer. And as you might expect, it works seamlessly with iOS devices. So, iPhone users should be happy with this.

While creating a chat bot seems like a lot of work, ChatterBot basically shatters that thought. It is a really easy way to create a conversational engine, and it lets you generate responses based on a collection of known conversations. And you know what, it has a language-independent architecture, which means you can train it on basically any language and it will do its job.

Pattern is a web mining tool created with Python. It supports Natural Language Processing out of the box. Basically, you can tell the tool to crawl your favorite website and get a text analysis report too. It also gives you some Machine Learning algorithms, such as K-Nearest Neighbor, SVM, and Neural Network, to analyze your text further.

An amazing Gradient Boosting library created by Microsoft. It boasts faster speed and lower memory consumption compared to other Gradient Boosting libraries. And the good news for you: it supports GPU training and inference out of the box.

Enough with PHP, we want to use Go! Go ahead. There is another piece of equipment ready for you: GoLearn, a Machine Learning framework for Go. You can expect the common algorithms to be included here, from the usual Logistic Regression to Neural Networks.

This particular framework specializes in several unique techniques. You will see Online Learning, Hashing, AllReduce, Reductions, Learning2Search, Active, and Interactive Learning. Microsoft Research and, before that, Yahoo! Research have backed this library.

You saw Python-based frameworks. You also saw PHP- and Go-based frameworks. Now, this is different. dlib is a library for Machine Learning directly in C++. While high-level languages like Python and PHP make for faster development, low-level languages like C++ boast control. It means you control almost everything, including the memory. So with dlib you can do exactly that: control the low-level computation and get the best possible outcome!

Turi Create gives you a basic starter for doing Machine Learning on iOS and OSX. But what will you do if you want more control over the algorithms? Swift AI comes to the rescue. Although it is still at a very early phase, the results are quite promising. Right now, the vanilla (dense) neural network is available for you to use, and other architectures will come along the way.

A special framework built for accelerating AI research. It contains many models, including their weights, as well as the datasets required to train them. So if you want to train for the ImageNet competition, for example, you can just name it in a parameter and immediately train your net.

A great library from Tencent. It is a high-performance neural network library specialized for inference on mobile devices. Guess what? ncnn has no third-party dependencies, is cross-platform, and claims to run faster than any other open source framework on mobile CPUs. If you plan to create a smart app, this will be a good choice for you.

Heard about AutoML? An AI that can make a better AI? TPOT actually comes close to that title too. It is a Machine Learning library that optimizes your Machine Learning pipeline to give you better accuracy, or whatever metric you choose. It uses a Genetic Algorithm to choose the best Machine Learning pipeline for you based on the data and the training results. So you don’t have to tweak your hyperparameters yourself; you can let the tool choose them for you.
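To show the idea behind a Genetic Algorithm, here is a heavily simplified sketch that evolves a single number instead of a whole pipeline. It is not how TPOT is implemented. The `evolve` function, the `toy_score` function, and the pretend best learning rate of 0.3 are all assumptions made up for this example; in real life the score would come from training and validating a model.

```python
import random


def evolve(score, bounds, generations=30, pop_size=20, seed=0):
    """Search for the value in `bounds` that maximizes `score`."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    low, high = bounds
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: average two parents. Mutation: add Gaussian noise.
            child = (a + b) / 2 + rng.gauss(0, 0.1 * (high - low))
            children.append(min(high, max(low, child)))
        population = parents + children
    return max(population, key=score)


def toy_score(learning_rate):
    """Hypothetical validation score that peaks at a learning rate of 0.3."""
    return -(learning_rate - 0.3) ** 2


best = evolve(toy_score, (0.0, 1.0))
```

Because the best candidates survive every generation, the best score never gets worse from one generation to the next. The population slowly clusters around the peak.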

Another library for humans. Airbnb built this great library for you, especially if you want to create a good item recommender for your website. Using the algorithms inside the framework, you can also rank images! Yes, images! The library is written in Java and Scala, so it is well suited to production-level systems.

You might wonder, why isn’t there any framework geared towards IoT? Here it is: tiny-dnn, a really tiny library written in C++14 and built specifically for IoT and embedded devices. This library is dependency-free, which means it really is tiny. Check it out!

While other high-level abstractions such as Keras and TFLearn suffer in performance, TensorLayer doesn’t. It boasts something it calls zero-cost abstraction, meaning it still has the full power of TensorFlow even with a high-level abstraction on top. You might be interested in this one!

A.K.A. Destiny, abbreviated from Deep Scalable Sparse Tensor Network Engine. From the name alone, we know it should process sparse data better than the others. Amazon itself uses it for product recommendations at production level. You could not ask for anything better than this.

This is your choice for building Deep Learning models in the browser. Although it supports Deep Learning, it focuses mainly on Reinforcement Learning. So it fundamentally supports advanced Reinforcement Learning techniques such as replay buffers and actor-critic networks.
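In case you wonder what a replay buffer is, here is a minimal sketch in Python. It is not this library's API (which is JavaScript); the class and method names are my own. The idea is simply a fixed-size memory of past experiences that you sample from at random during training.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest item once it is full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.push(step, 0, 1.0, step + 1, False)

batch = buffer.sample(2)
```

After five pushes into a buffer of capacity three, only the three most recent experiences (steps 2, 3, and 4) remain.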

At its core, Ray is a parallelizing framework, or you can call it a highly distributed framework. It comes with Ray Tune, which optimizes your hyperparameter choices, and RLlib, which helps you build Reinforcement Learning models, so Ray can be your alternative to the more popular frameworks.

A PyTorch alternative. This library focuses on dynamic computational graphs to make your code more “pythonic”. As for features, there are no big differences between the two, so you can safely choose either one. And FYI, both of them still struggle with deployment, as there is no equivalent of TensorFlow Serving for them.

Statistical Machine Intelligence and Learning Engine. With its fast and comprehensive features, this library can be your choice, especially if you are building a system in Java or Scala. You can treat it as the Java alternative to Scikit-Learn. It covers everything from Classification, Regression, and Nearest Neighbor to Genetic Algorithms.

So, you are a game developer, or intend to be one. This is the library for you if you create your games with Unity. This open source plugin gives you Reinforcement Learning agents out of the box. You can use them to control NPC behaviors, or even to test your game builds and see the decisions made inside the game.

A specific, really specific framework for Bayesian statistical modelling. This framework focuses on Markov chain Monte Carlo and Variational Inference. If you want to create something in this sub-field, this is the framework you will choose first.

Now, now. The Machine Learning framework for .NET has arrived. This library is geared towards software engineers without prior knowledge of Machine Learning. Still, it is sufficient, as Regression and Classification, the two most common tasks, are already included.

Similar to MXNet, H2O focuses on cross-language design. You can use R, Scala, Java, and Python. And the algorithms included here are no joke. You will see GBM, Random Forest, Deep Neural Networks, and even ensembles. Similar to PredictionIO, H2O gives you a way to collect data in a production-level system.

Unlike other frameworks, Guess.js helps you collect data directly from your website. The goal of this framework is to provide predictive analytics about your website’s users. Of course, later you will want to create a better experience for them. For example, you may want to predict which page a user will read next, or what content a user will want to view next.
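One simple way to predict the next page is to count which page most often follows the current one in past visits. The sketch below does that in Python. It is an assumption of mine about how such a predictor could work, not Guess.js code, and the session data and function names are made up.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each page is followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequent follower of `page`, or None if unseen."""
    if page not in transitions:
        return None
    return transitions[page].most_common(1)[0][0]


# Hypothetical browsing sessions, one list of visited paths per visitor.
sessions = [
    ["/", "/blog", "/post"],
    ["/", "/blog", "/about"],
    ["/", "/blog", "/post"],
]
transitions = build_transitions(sessions)
next_page = predict_next(transitions, "/blog")
```

With a prediction like this, a site could prefetch the likely next page so it loads faster when the user clicks.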

Ok, this is interesting. This thing predicts your location based on your Wi-Fi signal. It even works over small distances like 2-10 meters. So basically, your computer will know whether you are at desk no. 1 or no. 2. And besides location, it also knows your altitude. WTF?

You have heard about AutoML and TPOT. Now comes Auto-sklearn. This is your choice for choosing hyperparameters automatically if you use Scikit-Learn. The API looks the same as Scikit-Learn’s, so you can just drop it into your code.

A low-level Machine Learning framework for Go. It should work as a replacement for TensorFlow in the Go language. As the need for deployment is high, you will want Gorgonia even more if you work extensively with Go stacks.

Algorithm

After the extensive lists of tutorials and frameworks, this is the algorithm section. I will list the most popular algorithm repositories that stand out on their own on GitHub. Let’s check them out!

An open source algorithm for face recognition with Deep Neural Networks. This repo has a huge 10k stars, which means it is very popular. It is built on top of Torch, the Lua-based Deep Learning framework (the Lua predecessor of PyTorch). You can use this library right away for your face recognition system, or even for face verification to replace your old fingerprint system.

My personal favorite, which also has 10k stars. As you can read from the title, it basically turns your JPG design file into code, REAL HTML CODE! With CSS, of course. The creator did a really good job of introducing something “taboo” to the software engineering field. The most frightening part is yet to come though, so frontend engineers, stay calm and composed.

This repo is basically the source of the previous algorithm. Tony Beltramelli is the guy responsible for bringing us this “doom”. LOL. It covers the basic things he did to achieve this feat, and how to use Beam Search to choose the right HTML tag for your code.
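Beam Search keeps the few most promising partial sequences at every step instead of committing to the single best token. Here is a minimal sketch in Python. It is not the repo's code; `toy_model` and its probabilities are made up to show a case where the greedy choice loses.

```python
import math


def beam_search(next_probs, steps, beam_width=2):
    """Return the `beam_width` best sequences as (tokens, log_probability).

    `next_probs(prefix)` must return a dict mapping each possible next
    token to its probability given the tokens chosen so far.
    """
    beams = [([], 0.0)]
    for _ in range(steps):
        candidates = [
            (tokens + [token], score + math.log(prob))
            for tokens, score in beams
            for token, prob in next_probs(tokens).items()
        ]
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_model(prefix):
    """Hypothetical tag model: <div> looks best first but leads nowhere good."""
    table = {
        (): {"<div>": 0.6, "<p>": 0.4},
        ("<div>",): {"text": 0.5, "<img>": 0.5},
        ("<p>",): {"text": 0.9, "<img>": 0.1},
    }
    return table[tuple(prefix)]


best_tokens, best_score = beam_search(toy_model, steps=2, beam_width=2)[0]
greedy_tokens, greedy_score = beam_search(toy_model, steps=2, beam_width=1)[0]
```

With a beam width of 1 (greedy), the search commits to `<div>` and ends with probability 0.30. With a beam width of 2, it also keeps `<p>` alive and finds the better sequence with probability 0.36.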

Ah, this is interesting. It is an amazing algorithm, once again brought to us by the CMU guys. It detects multiple joints (up to 25) for each human detected. And it can detect multiple humans! Go straight to its repository and feel the awesomeness!

Ah, Generative Adversarial Networks are always amazing. They can generate many interesting images. And this particular repository can help you change your horse into a zebra and vice versa. Or if you dislike your snowy road, you can change it to a summer one. A nice move!

The alternative to the super popular OpenFace. You can basically choose either of them for your dataset, and I think there should be no big differences between the two. I happen to have made a tutorial about it here.

Basically, you can turn one photo into another. This could be really useful in some cases. For example, if you want to map a village inside a jungle, this algorithm can turn the satellite photo of that village into a map. So convenient!

Flappy Bird, your favorite mobile game. It frustrated millions of its players, including me. As a result of that frustration, one of its fans created this repo. You might feel stupid when you see this machine play the famous mobile game. It clears the game easily, while we are left cursing.

Accompany

You have seen amazing tutorials, frameworks, and algorithms. You might want to dive straight into the amazing world of Machine Learning. The tools here will help you process your data better.

Data visualization: one of the most important tools you want to have in order to increase your productivity and understanding. You can easily embed this visualization tool into your Jupyter notebook to make everything smoother.

Trust! The most basic thing you need when deploying a Machine Learning system for your users. How do you make your users trust your Machine Learning predictions? Of course, you need to trust them first. So, how do you come to trust your Machine Learning predictions? Lime solves that issue: it explains why a particular input produces a particular prediction. You might need it now!
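To get a feel for what "explaining a prediction" means, here is a much simpler technique than Lime, called feature occlusion: replace one feature at a time with a baseline value and see how much the prediction changes. This is not Lime's algorithm or API (Lime fits a local surrogate model around the input). The `toy_model` and its weights are made up for the example.

```python
def occlusion_importance(predict, x, baseline=0.0):
    """Measure how much the prediction drops when each feature is removed."""
    base_prediction = predict(x)
    importances = []
    for index in range(len(x)):
        perturbed = list(x)
        perturbed[index] = baseline  # "switch off" one feature
        importances.append(base_prediction - predict(perturbed))
    return importances


def toy_model(features):
    """Hypothetical model: feature 0 helps, feature 1 hurts, feature 2 is ignored."""
    return 2.0 * features[0] - 1.0 * features[1] + 0.0 * features[2]


importances = occlusion_importance(toy_model, [1.0, 1.0, 1.0])
```

A positive number means the feature pushed the prediction up, a negative number means it pushed the prediction down, and zero means the model did not care about it for this input.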

What happens if you want to create an object detection system? You need to provide your training images. But images alone are not necessarily sufficient for training. You need to draw a bounding box around each item you want to detect. How troublesome! Yes, it is. So you might need this UI application to help you annotate your images.

Another problem arises when you try to build your object detection system. It seems like you lack data. What should you do? The easiest thing to do is image augmentation. But how? This particular tool can help you achieve that feat. You can alter your images using some presets, add them back to your dataset, and you are good to go.
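The simplest augmentation is a horizontal flip. For object detection, remember that the bounding box has to be flipped too. Here is a minimal sketch in plain Python, not this tool's API; it assumes an image is a list of pixel rows and a box is `(x_min, y_min, x_max, y_max)` in pixel coordinates.

```python
def flip_horizontal(image):
    """Mirror an image left to right. The image is a list of pixel rows."""
    return [list(reversed(row)) for row in image]


def flip_box_horizontal(box, width):
    """Mirror a bounding box inside an image that is `width` pixels wide."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge, and vice versa.
    return (width - x_max, y_min, width - x_min, y_max)


image = [
    [1, 2, 3],
    [4, 5, 6],
]
box = (0, 0, 1, 2)  # covers the leftmost column

flipped_image = flip_horizontal(image)
flipped_box = flip_box_horizontal(box, width=3)
```

After the flip, the box that covered the leftmost column covers the rightmost one, and the original image plus the flipped copy give you two training samples from one photo.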

Deployment

Your app is in a Kubernetes cluster. You want to deploy your machine learning inside the same cluster. How do you do it? Kubeflow comes to the rescue. It can deploy TensorFlow to your Kubernetes cluster out of the box. Now you have it ready for production!

Not necessarily a tool for deployment, ONNX opens up an ecosystem that makes models from one Machine Learning framework accessible to the others. This is pretty good if you have several frameworks in production that need some interaction between them.

At last, the most important thing in building a Machine Learning system is inference for users. You need users for your system to be useful. Without them, you are just showing off. TensorFlow Serving makes your pipeline easy, as it tackles all of your inference needs. It is fast and scalable. You will need it!

Resource

The final section of this long article: Resource. You might already know what to do with your Machine Learning project. But sometimes you want more, more resources. This section gives you those and rounds the list up to the top 100 Machine Learning projects.

Maybe you are interested in the academic papers behind the Deep Learning algorithms you found above. This repo gives you that. But not only that, it also gives you a roadmap to help you understand the field better.

This is a cheat sheet. It contains the most important things you need to know as a Data Scientist or Machine Learning expert. It covers frameworks such as TensorFlow and Keras, and even big data processing like Spark RDD.
