Starting a machine learning project sounds really exciting. But when it comes to picking the right kind of net to use, things can get a little confusing.
First, you need to decide whether you’re going to build a classifier or you want to find a pattern in your data. What kind of learning are you looking for?
- Unsupervised Learning is used for pattern recognition.
- Supervised Learning uses labeled data for building a classifier.
- Deep Belief Network is used for image recognition and API development.
- Convolutional Net is used for object recognition.
- Recurrent Net is used for speech recognition.
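To make the first two options more concrete, here is a minimal sketch in plain Python. It assumes toy one-dimensional data, and the function names (`train_centroids`, `classify`, `two_means`) are made up for illustration, not taken from any library. The supervised part learns from labeled values; the unsupervised part has to find the groups on its own.

```python
def train_centroids(labeled_points):
    # Supervised: labels are given, so we average the values of each labeled group.
    sums, counts = {}, {}
    for value, label in labeled_points:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(centroids, value):
    # Predict the label whose centroid lies closest to the value.
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, iterations=10):
    # Unsupervised: no labels, so we search for two groups ourselves
    # (a 1-D k-means with k=2, started from the smallest and largest value).
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:  # all values identical: nothing to split
            break
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


labeled = [(1.0, "small"), (1.5, "small"), (2.0, "small"),
           (8.0, "large"), (9.0, "large"), (10.0, "large")]
centroids = train_centroids(labeled)
print(classify(centroids, 3.0))  # small

unlabeled = [value for value, _ in labeled]
clusters = two_means(unlabeled)
print(clusters)  # (1.5, 9.0)
```

Both parts end up with the same two group centers here, but only the supervised one knows what the groups are called.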
Once you decide which type of machine learning net you need to develop, you can look for a machine learning library or framework that supports the required learning algorithms.
- Libraries, Frameworks, Platforms: What’s the difference?
- Most Popular Machine Learning Libraries
- Top Machine Learning Frameworks
- Microsoft Cognitive Toolkit (CNTK)
- Oryx 2
- Bottom Line
Libraries, Frameworks, Platforms: What’s the difference?
If you aren’t a developer or you’re unfamiliar with the software industry, those three terms may confuse you. Sometimes even specialists have a hard time distinguishing libraries from frameworks. And the line becomes even more blurred when you’re stepping into such terra incognita as Machine Learning. Nevertheless, there are some concepts that will help you distinguish one from another.
A software library is a premade set of functions and models that you can call through your own programs. So instead of reinventing the wheel, you can just integrate a library that already hosts one.
Libraries are typically created by highly qualified software teams. However, several machine learning libraries, like Scikit-learn, were created by individuals, pioneering specialists. In particular, this happens due to the young age of the industry. ML has plenty of great libraries available, several of which were created by key specialists in the field.
Popular libraries are regularly maintained by those specialists and the community, which ensures that your program will stay up to date while using them. The most used libraries are continuously upgraded and enhanced by core contributors, which ensures the integrity of the code.
On the other hand, frameworks are much more complex and abstract organisms. As opposed to libraries, which are integrated into your program, frameworks provide a standardized environment to build and deploy applications. Your app can be built on top of a particular framework. The framework serves as an abstraction that provides tons of generic functionality. You can selectively change how those functions behave by adding your own code.
A software platform is a software environment where libraries, frameworks, and computing programs are executed. It can be a browser or an operating system. Software platforms can put some limitations on a program or can be incompatible with a particular app, as happens with the Android and iOS platforms. Meanwhile, the industry leaders in Machine Learning are trying to make their products compatible with as many libraries as possible, as they try to take a larger share of the market and become the #1 ML development platform.
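The difference is easier to see in code. The sketch below is a toy illustration only: `math` plays the role of a library that our program calls, while `MiniFramework` is a hypothetical, made-up class standing in for a framework that owns the main loop and calls our code when it decides to.

```python
import math  # a library: our code decides when to call it


def hypotenuse(a, b):
    # Library style: we call math.sqrt ourselves; control stays in our program.
    return math.sqrt(a * a + b * b)


class MiniFramework:
    """Hypothetical toy framework: it owns the loop and calls our hooks."""

    def __init__(self):
        self.handlers = {}

    def on(self, event_name):
        # Decorator that registers user code under an event name.
        def register(func):
            self.handlers[event_name] = func
            return func
        return register

    def run(self, events):
        # Framework style: the framework decides when our code runs.
        results = []
        for name, payload in events:
            handler = self.handlers.get(name)
            if handler is not None:  # events without a handler are skipped
                results.append(handler(payload))
        return results


app = MiniFramework()


@app.on("triangle")
def handle_triangle(sides):
    # Our customization: the framework calls this for every "triangle" event.
    return hypotenuse(*sides)


results = app.run([("triangle", (3, 4)), ("unknown", None)])
print(results)  # [5.0]
```

With the library, our program is in charge. With the framework, we only fill in the parts it lets us customize.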
Most Popular Machine Learning Libraries
GitHub commits and contributors to different ML tools
Have you ever tried to code your own Machine Learning algorithm or a Deep Network? If yes, you might already have used one of the libraries listed below. Their purpose is to simplify the development process. Several of them are more specialized, suitable for particular fields or tasks; others, like TensorFlow, are highly customizable and can be used in almost any field.
If you’re building a commercial app that requires the use of a machine learning algorithm, your best bet is to use commercial-grade libraries like TensorFlow, Torch, and Caffe.
Google’s TensorFlow is perhaps the most advanced and the most easily accessible library nowadays. It is open-source and supported by a vast, highly professional community of engineers. What’s more important, TensorFlow is actually used by Google itself: it was developed as part of the Google Brain project and is already successfully used to train neural networks that operate in various Google services.
In essence, this library is designed to make machine learning more accessible so that research models can be more easily applied to commercial-grade applications. Even though TensorFlow was designed to support neural networks, it can support any domain where computation can be modeled as a data flow graph.
Much like Theano, it is based on the concept of a computational graph. In a computational graph, nodes represent either persistent data or a math operation, and edges represent the flow of data between nodes. The data that flows through these edges is a multi-dimensional array, known as a tensor. That is where the library’s name comes from.
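The idea of a computational graph can be sketched in a few lines of plain Python. This is not TensorFlow or Theano code; the `Node` class and the `constant`, `add`, and `scale` helpers are hypothetical names invented for this example, and plain lists stand in for tensors.

```python
class Node:
    """A graph node: either persistent data (a constant) or a math operation."""

    def __init__(self, op=None, inputs=(), value=None):
        self.op = op          # callable for operation nodes, None for data nodes
        self.inputs = inputs  # edges: the nodes whose outputs flow into this one
        self.value = value    # stored tensor for data nodes

    def evaluate(self):
        if self.op is None:
            return self.value
        # Pull the tensors along the incoming edges, then apply the operation.
        return self.op(*(node.evaluate() for node in self.inputs))


def constant(value):
    return Node(value=value)


def add(a, b):
    # Element-wise addition of two equal-length vectors (rank-1 tensors).
    return Node(op=lambda x, y: [p + q for p, q in zip(x, y)], inputs=(a, b))


def scale(a, factor):
    # Multiply every element of a vector by a fixed factor.
    return Node(op=lambda x: [factor * p for p in x], inputs=(a,))


# Building the graph does not compute anything yet.
x = constant([1.0, 2.0, 3.0])
w = constant([0.5, 0.5, 0.5])
y = scale(add(x, w), 2.0)

# Evaluation makes the tensors flow through the graph.
result = y.evaluate()
print(result)  # [3.0, 5.0, 7.0]
```

Separating the description of the computation from its execution is what lets a real library optimize the graph or run it on different hardware.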
For an open-source library, it has impressively informative and comprehensive documentation, in addition to an active topic on StackOverflow. Plus, the community is developing more language interfaces to connect new libraries.
If you need a library for machine vision or a forecasting application, then Caffe may be a good choice. This library lets you build your own deep nets with a sophisticated set of layer configuration options. You can even access premade nets that were uploaded to a community website.
Caffe was originally designed for machine vision tasks, so it’s well-suited for convolutional nets. However, recent versions of the library provide support for speech and text, reinforcement learning, and recurrent nets for sequence processing.
Since the library is written in C++ with CUDA, it can easily switch between a CPU and a GPU as needed. Matlab and Python interfaces are also available for Caffe.
The library allows the creation of machine learning nets with different types of layers, such as a vision layer, a loss layer, and an activation layer. Such a multi-layer approach allows you to develop extremely complex nets for your applications.
If you need a commercial-quality deep learning library with plenty of extensions and support, then you should take a look at Torch. It was developed by a group of specialists from Google, Facebook, and Twitter.
It grew out of a library for LuaJIT, a popular implementation of the Lua programming language. As a result, it provides a powerful vectorized implementation of the math behind Deep Learning algorithms. In addition, there are many libraries that extend the functionality of Torch for various applications.
To some extent, Torch even allows you to set up, train, and run a deep net by configuring its hyper-parameters. Once configured, such a network can be called by your application on a routine basis.
There are also special features that could be useful for your project. For example, the “CuTorch” library provides GPU support, and the “NN” library allows you to stack different nets together.
If you want to use a net for educational purposes, then you should take a look at the Theano library. It provides an important set of functions for building machine learning nets that will train quickly even on personal computers.
Theano is a Python library that lets you define and evaluate mathematical expressions with vectors and matrices. In short, Theano allows you to quickly train enormous nets on a single machine. The drawback is that you’ll need to build a machine learning net from the ground up.
This library doesn’t provide complete functionality for creating a specific type of deep net. Instead, you’ll need to code every aspect of the net, such as layers, models, activations, and training methods, by yourself.
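To give a feeling for what coding every aspect by yourself means, here is a minimal sketch in plain Python. It deliberately does not use Theano; it is a single hand-written neuron with a sigmoid activation, trained by stochastic gradient descent on a toy dataset (the logical OR function). All names and hyper-parameter values are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Activation function, written by hand.
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    # A single "layer" with one neuron: weighted sum followed by the activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def train(samples, epochs=2000, learning_rate=0.5, seed=0):
    # Training method: plain stochastic gradient descent on cross-entropy loss.
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # For sigmoid + cross-entropy, the gradient w.r.t. z is (prediction - target).
            error = predict(weights, bias, inputs) - target
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


# Toy data: the logical OR function.
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train(or_samples)
predictions = [round(predict(weights, bias, x)) for x, _ in or_samples]
print(predictions)  # [0, 1, 1, 1]
```

Even this tiny example needs its own activation, forward pass, and training loop, which is exactly the work that higher-level wrappers take off your hands.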
Fortunately, there are some additional libraries that may help you with the development of the net: the Blocks platform provides a wrapper for each Theano function, and Passage is suited for text analysis.
Top Machine Learning Frameworks
Keras is more like a high-level interface that can run on top of many popular libraries like TensorFlow, Microsoft Cognitive Toolkit, Theano, and Apache MXNet. Keras has quickly grown in popularity, in many ways thanks to its motto: to make drafting new deep learning models as easy as writing new methods in Python. It actually delivers on that, which makes it a perfect example of open source machine learning software.
If you’re struggling to make your way through the tons of intricate methods and options present in TensorFlow, Keras is here to help you. It allows you to quickly create common types of neuron layers, select metrics, the error function, and the optimization method, and finally train the model.
The power of Keras lies in its modular architecture, which allows users to add new neural blocks on the go and customize their ML networks in various ways.
Despite the recent popularity of deep learning technology, many ten-year-old technologies are still rocking. This is exactly the case with the Scikit-learn framework, which has been known as an industry standard for years.
The framework was developed in an academic environment and thus has almost every machine learning algorithm possible, from linear and logistic regressors to SVM classifiers and random forests. Maybe that is why it was adopted by Spotify, Evernote, e-commerce giant Birchbox, and Booking.com for product recommendations and customer service.
Moreover, thanks to its age and academic environment, Scikit-learn is one of the most well-documented ML frameworks.
Microsoft Cognitive Toolkit (CNTK)
CNTK is a machine learning framework developed by the Microsoft Research division. Its most popular field of application lies in speech recognition, but it is also well-suited for text and image training tasks.
Thanks to its well-developed architecture, CNTK supports a huge variety of machine learning algorithms, including the most popular: CNN, LSTM, RNN, Sequence-to-Sequence, and Feed Forward. On top of that, CNTK can work on both CPU and GPU hardware setups, which is a must in the modern world. You can work with languages like C++ and Python and either use the built-in training models or build your own.
Thanks to all these features, Microsoft Cognitive Toolkit is perceived as one of the easiest ML frameworks to work with in the field.
Oryx 2
Oryx 2 is built on Apache Spark and Apache Kafka and specializes in real-time, large-scale machine learning. The app is written in Java, using Apache Spark, Hadoop, Kafka, and a few other libraries.
It can be used as a framework for building applications and is most useful when it comes to complete implementations of the batch, speed, and serving layers for three machine learning use cases. These are ready to deploy out of the box or to be used as the basis for a custom application.
It also includes packaged, end-to-end applications for collaborative filtering, classification, regression, and clustering.
Bottom Line
When the patterns get really complex, neural nets start to outperform all of their competition. Whether you’re in fintech or in marketing, at some point you may want your computer to recognise VERY complex patterns – then you’ll need to use neural networks and machine learning.
The ML libraries and frameworks are here to help you get into the new field and grasp such a complex subject as machine learning. And we’re here to help you with that!