10 GitHub Repositories for AutoML

September 15, 2022 • 12 min read

Artificial intelligence and machine learning have been two of the most exciting topics of the last two decades. Machine learning and data science engineers need extensive research and hard work to understand and run their models effectively.

While the details differ from one practitioner to another, the traditional machine learning steps include:

Data Acquisition
Data Exploration
Data Preparation
Feature Engineering
Model Selection
Model Training
Hyperparameter Tuning
Predictions

While 8 steps may not seem like much when building a machine learning model, any single one of them can take quite some time to perfect!
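To make the steps above concrete, here is a minimal sketch that walks through them by hand using only the Python standard library. The synthetic data, the extra feature, and the choice of a k-nearest-neighbours classifier are all assumptions made for illustration; they are not taken from any of the frameworks discussed in this article.

```python
import random

random.seed(0)

# 1. Data acquisition: synthetic 2-D points, labelled 1 when x + y > 1 (made-up data).
points = [(random.random(), random.random()) for _ in range(200)]
data = [((x, y), int(x + y > 1.0)) for x, y in points]

# 2-3. Data exploration and preparation: check class balance, then split the rows.
positives = sum(label for _, label in data)
print(f"{positives} positive rows out of {len(data)}")
train, valid, test = data[:120], data[120:160], data[160:]

# 4. Feature engineering: append the sum of both coordinates as a third feature.
def add_features(row):
    (x, y), label = row
    return (x, y, x + y), label

train, valid, test = ([add_features(r) for r in part] for part in (train, valid, test))

# 5-6. Model selection and training: k-nearest neighbours, which "trains"
# simply by memorising the training rows.
def knn_predict(train_rows, features, k):
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    nearest = sorted(train_rows, key=squared_distance)[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)  # majority vote; k is odd, so there are no ties

def accuracy(train_rows, eval_rows, k):
    hits = sum(knn_predict(train_rows, f, k) == label for f, label in eval_rows)
    return hits / len(eval_rows)

# 7. Hyperparameter tuning: choose k by accuracy on the validation split.
candidate_ks = [1, 3, 5, 7, 9]
best_k = max(candidate_ks, key=lambda k: accuracy(train, valid, k))

# 8. Predictions: score the held-out test split with the tuned model.
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

Even in this toy version, every step involves a decision (how to split, which features to add, which values of k to try), which is exactly the work AutoML frameworks try to take over.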

The problem is exacerbated when non-expert machine learning practitioners go through these steps for the first time; the process will usually require more time and resources to complete, and even then the final result may not be as expected.

AutoML comes in handy by automating a big chunk of the model creation process for experts and non-experts alike.

What is Automated Machine Learning (AutoML)?

Automated Machine Learning, more commonly referred to as AutoML, is machine learning made easier. AutoML relies on frameworks that process data and build models automatically, making machine learning more accessible to people who are not machine learning experts.

It focuses on accelerating the research of artificial intelligence and improving the efficiency of machine learning models.

While the traditional machine learning process has the practitioner work through all 8 steps mentioned previously, AutoML leaves only two of them to the practitioner:

Data Acquisition is the process of collecting, filtering, and cleaning the data before storing it in a data warehouse.
Predictions refer to the actual output that a given model returns. A well-trained model will most likely return accurate final predictions.

AutoML frameworks cover the other 6 steps: data exploration, data preparation, feature engineering, model selection, model training, and tuning of the final model.

The Advantages of AutoML

Improves Work Efficiency
Produces Better Final Results
Minimizes Errors
Scales Machine Learning

Popular AutoML Frameworks

Now that we’ve discussed what AutoML is and gone over a few of its advantages, we will cover the top 10 AutoML frameworks, where to find them, and what functionalities they offer.

1. Google AutoML

Google AutoML is one of the most famous frameworks available, landing it the number one spot on our list. Google has launched many AutoML products, such as AutoML Vision, AutoML Natural Language, and more.

2. Auto-sklearn

Users who have dabbled in machine learning before may be familiar with the name sklearn. Built on top of the popular scikit-learn library, Auto-sklearn is an open-source machine learning framework that handles the automation of machine learning tasks.

The Auto-sklearn framework is capable of performing its own model selection, hyperparameter tuning, and characterization, a unique feature of the Auto-sklearn framework.

By performing its own model selection, Auto-sklearn automatically searches for the best algorithm to handle the user’s given problem.

Auto-sklearn's second functionality is hyperparameter tuning. As one of the final steps of building any machine or deep learning model, users should find the best hyperparameters to optimize the result. This task requires a lot of time and can easily be automated by such frameworks.

The unique and final benefit of using Auto-sklearn is its ability to perform automatic characterization. Characterization is the process of transforming raw data into usable information.
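The idea of searching over algorithms and their hyperparameters at the same time can be sketched in a few lines of plain Python. This is not Auto-sklearn's API or its actual search strategy; it is a simple random search over two made-up model families, with the data, the `SEARCH_SPACE` table, and all function names invented for illustration.

```python
import random

def make_data(n, rng):
    """Made-up 1-D regression data: y = 3x + 2 plus a little noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 2 + rng.gauss(0, 0.1)) for x in xs]

def fit_constant(train, params):
    """Baseline model: always predict the mean target."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean

def fit_ridge_line(train, params):
    """Least-squares line with an L2 penalty `alpha` on the slope."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sxy / (sxx + params["alpha"])
    return lambda x: my + slope * (x - mx)

# Each model family pairs a fit function with a hyperparameter sampler.
SEARCH_SPACE = {
    "constant": (fit_constant, lambda rng: {}),
    "ridge_line": (fit_ridge_line, lambda rng: {"alpha": 10 ** rng.uniform(-3, 2)}),
}

def mse(model, rows):
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

def random_search(train, valid, n_trials, rng):
    """Sample (model family, hyperparameters) pairs; keep the lowest validation error."""
    best = None
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        fit, sample_params = SEARCH_SPACE[name]
        params = sample_params(rng)
        score = mse(fit(train, params), valid)
        if best is None or score < best[0]:
            best = (score, name, params)
    return best

rng = random.Random(0)
data = make_data(200, rng)
best_score, best_name, best_params = random_search(data[:150], data[150:], n_trials=30, rng=rng)
print(best_name, best_params, round(best_score, 4))
```

A production framework replaces the random sampling with a smarter search and the toy models with real estimators, but the loop of "propose a configuration, train, score on held-out data, keep the best" stays the same.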

3. TPOT

TPOT, short for Tree-based Pipeline Optimization Tool, is one of the first open-source AutoML packages for Python. It focuses on optimizing machine learning pipelines using genetic programming.

The main goal of TPOT is to automate the building of ML pipelines by combining a flexible expression tree representation of pipelines with stochastic search algorithms such as genetic programming.

Note that TPOT works on top of the scikit-learn library, which must be installed first.
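To give a feel for evolutionary pipeline search, here is a deliberately simplified sketch. It is not TPOT's implementation: TPOT evolves tree-shaped pipelines, whereas this example evolves a fixed three-stage pipeline, and the stage names and the `STAGE_SCORES` table are hypothetical stand-ins for a real cross-validated score.

```python
import random

# Hypothetical choices for each pipeline stage (names are illustrative only).
STAGES = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["none", "top_5", "top_10"],
    "model": ["tree", "forest", "logistic"],
}

# Stand-in for a cross-validated score; a real system would train each pipeline.
STAGE_SCORES = {
    "none": 0.0, "standard": 0.05, "minmax": 0.03,
    "top_5": 0.02, "top_10": 0.04,
    "tree": 0.70, "forest": 0.80, "logistic": 0.75,
}

def evaluate(pipeline):
    """Fitness of a pipeline: the sum of its (made-up) stage scores."""
    return sum(STAGE_SCORES[choice] for choice in pipeline.values())

def random_pipeline(rng):
    return {stage: rng.choice(options) for stage, options in STAGES.items()}

def crossover(a, b, rng):
    """Child takes each stage from one of its two parents."""
    return {stage: rng.choice([a[stage], b[stage]]) for stage in STAGES}

def mutate(pipeline, rng, rate=0.2):
    """Occasionally replace a stage with a random alternative."""
    return {
        stage: rng.choice(STAGES[stage]) if rng.random() < rate else choice
        for stage, choice in pipeline.items()
    }

def evolve(generations=10, population_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: the better half survives unchanged (elitism).
        population.sort(key=evaluate, reverse=True)
        parents = population[: population_size // 2]
        # Variation: refill the population with mutated offspring.
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = parents + children
    return max(population, key=evaluate)

best_pipeline = evolve()
print(best_pipeline, round(evaluate(best_pipeline), 2))
```

Because the best half of each generation is carried over unchanged, the best score can never get worse from one generation to the next; mutation and crossover only add new candidates to try.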

4. AutoKeras

AutoKeras is an open-source AutoML library for deep learning models, originally developed by the DATA Lab at Texas A&M University.

AutoKeras helps non-expert machine and deep learning enthusiasts run and train their models with minimum effort. With its goal of making machine learning accessible to everyone, AutoKeras is an excellent tool for beginners.

5. Ludwig

Ludwig is an open-source AutoML framework focused on assembling and training deep learning models using a simple configuration file system.

Users provide a configuration file that defines the inputs and outputs of a given model along with their respective data types, and Ludwig builds its deep learning model from those definitions.
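As a rough illustration of the configuration idea, the sketch below declares one text input and one category output as a Python dictionary, which Ludwig accepts in place of a YAML file. The column names `review_text` and `sentiment`, the file name `reviews.csv`, and the helper functions are assumptions made for this example; the Ludwig import is kept inside a function so the rest of the snippet runs even where Ludwig is not installed.

```python
# Declarative model definition: which columns go in, which come out, and their types.
# The column names are hypothetical.
config = {
    "input_features": [{"name": "review_text", "type": "text"}],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

def feature_names(config, section):
    """Illustrative helper: list the column names declared in one section."""
    return [feature["name"] for feature in config[section]]

def train_with_ludwig(config, dataset_path):
    """Train a model from the config; requires the `ludwig` package."""
    from ludwig.api import LudwigModel  # imported lazily on purpose

    model = LudwigModel(config)
    model.train(dataset=dataset_path)
    return model

print("inputs:", feature_names(config, "input_features"))
print("outputs:", feature_names(config, "output_features"))

# Example call, assuming a CSV file with the two columns above exists:
# model = train_with_ludwig(config, "reviews.csv")
```

Note that nothing in the configuration describes layers or training loops; the declared data types are what Ludwig uses to pick suitable encoders and decoders.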

6. MLBox

MLBox is on the rise and quickly becoming one of the top automated machine learning frameworks.

According to the official MLBox documentation, it offers the following benefits:

Fast reading and distributed data preprocessing/cleaning/formatting.
Highly robust feature selection and leak detection.
Accurate hyper-parameter optimization in high-dimensional space.
State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM, and more).
Prediction with model interpretation.

7. AutoGluon

Intended for expert and non-expert machine learning practitioners alike, AutoGluon focuses on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data.

According to the AutoGluon online documentation, AutoGluon enables a user to:

Quickly prototype deep learning and classical ML solutions for raw data with only a few lines of code.
Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge.
Leverage automatic hyperparameter tuning, model selection/ensembling, architecture search, and data processing.
Easily improve/tune bespoke models and data pipelines, or customize AutoGluon for a particular use-case.

8. Microsoft Neural Network Intelligence (NNI)

Microsoft Neural Network Intelligence, also known as NNI, is a toolkit designed for automating feature engineering, neural architecture search, hyperparameter tuning, and model compression for deep learning.

The NNI tool supports frameworks such as PyTorch, TensorFlow, scikit-learn, XGBoost, LightGBM, and more. Its main benefit is neural architecture search: NNI supports both multi-trial (grid search, regularized evolution, policy-based RL, etc.) and one-shot (DARTS, ENAS, FBNet, etc.) neural architecture search.

This tool offers several hyperparameter tuning algorithms such as Bayesian optimization, exhaustive search, and heuristic search. Check the NNI README file on GitHub for more information on what else this tool provides.
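Of the tuning strategies mentioned above, exhaustive (grid) search is the simplest to show. The following sketch does not use NNI's API; the search space and the `objective` function are hypothetical stand-ins for a real training run that returns a validation loss.

```python
from itertools import product

# Hypothetical search space: every combination will be tried.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
}

def objective(params):
    """Made-up validation loss; lower is better. A real tuner would train a model here."""
    return (params["learning_rate"] - 0.01) ** 2 + (params["batch_size"] - 32) ** 2 * 1e-6

def grid_search(space, objective):
    """Evaluate every point of the grid and return the best one plus the trial count."""
    names = list(space)
    trials = [dict(zip(names, values)) for values in product(*(space[n] for n in names))]
    return min(trials, key=objective), len(trials)

best_params, n_trials = grid_search(search_space, objective)
print(f"tried {n_trials} configurations, best: {best_params}")
```

The number of trials is the product of the option counts, so the cost grows quickly as hyperparameters are added. That growth is the main reason tuners also offer Bayesian and heuristic strategies, which try to reach a good configuration in far fewer trials.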

9. TransmogrifAI

TransmogrifAI is designed to help developers accelerate their machine learning productivity. TransmogrifAI runs on top of Apache Spark.

As the TransmogrifAI README file on GitHub puts it, “Through automation, it achieves accuracies close to hand-tuned models with almost 100x reduction in time.”

Like the other AutoML frameworks mentioned, TransmogrifAI is capable of choosing the best-suited algorithm for a user’s chosen dataset.

10. H2O AutoML

H2O AutoML is an open-source framework created by H2O.ai that supports both R and Python.

It also supports the most widely used statistical and machine learning algorithms, including gradient boosted machines, generalized linear models, and deep learning.

The H2O AutoML interface accommodates new machine learning users by asking for as few parameters as possible. A user’s main task when using H2O AutoML is to provide the dataset.
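A minimal sketch of what "provide the dataset" can look like with the H2O Python package follows. The function names `feature_columns` and `run_h2o_automl`, the file `train.csv`, and the target column `label` are assumptions made for this example, and the sketch assumes a classification problem. The H2O imports are kept inside the function because H2O needs a Java runtime and a local cluster to start.

```python
def feature_columns(columns, target):
    """Illustrative helper: every column except the target is used as a predictor."""
    return [column for column in columns if column != target]

def run_h2o_automl(csv_path, target, max_models=10, seed=1):
    """Train several models with H2O AutoML and return the leaderboard."""
    import h2o                      # imported lazily; requires the `h2o` package
    from h2o.automl import H2OAutoML

    h2o.init()                      # starts or connects to a local H2O cluster
    frame = h2o.import_file(csv_path)
    frame[target] = frame[target].asfactor()  # assumption: the target is categorical

    aml = H2OAutoML(max_models=max_models, seed=seed)
    aml.train(x=feature_columns(frame.columns, target), y=target, training_frame=frame)
    return aml.leaderboard

# Example call, assuming train.csv exists and has a column named "label":
# leaderboard = run_h2o_automl("train.csv", target="label")
# print(leaderboard.head())
```

Apart from the data and the name of the target column, the only settings passed here are a cap on the number of models and a random seed, which matches the framework's goal of asking for as few parameters as possible.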

Other Useful AutoML Tools

1. Hypertunity

Hypertunity is a lightweight package designed for optimizing a model’s hyperparameters. Its components are modular, simple, and extensible to allow for seamless scheduling implementations.

Hypertunity supports Bayesian optimization with GPyOpt, a Slurm-compatible scheduler, and real-time visualization with TensorBoard (via the HParams plugin).

2. Dragonfly

Dragonfly is an open-source AutoML tool designed specifically for scalable Bayesian optimization.

Bayesian optimization is used to optimize black-box functions that are highly expensive to evaluate and that lie beyond the reach of ordinary optimization methods.

Dragonfly allows new users to solve scalable Bayesian optimization problems with minimal prior knowledge.

3. Ray Tune

Another hyperparameter optimization tool, Ray Tune is the tuning library of Ray, a unified framework for scaling AI and Python applications.

Ray enables simple scaling of AI workloads with distributed data processing, distributed training, scalable hyperparameter tuning, scalable reinforcement learning, and scalable programmable serving.

4. AutoGL (Auto Graph Learning)

AutoGL is a unique AutoML framework that focuses on making machine learning on graph datasets easy and simple.

Its dataset module maintains datasets for graph-based machine learning and builds on the datasets in PyTorch Geometric or Deep Graph Library.

GitHub Repositories for Auto Machine Learning

As machine learning and deep learning advance, the sharply growing demand for machine learning experts is going unmet.

This is where automated machine learning tools and techniques come in handy, allowing new users to build fully functional and highly optimized models with greater ease than ever before.

In short, when searching for the right automated machine learning tool, focus on what you’re trying to achieve with your model and on the exact parts of the machine learning process that you want automated. We recommend trying a couple of the AutoML tools mentioned above yourself and then sticking with the few that you find efficient and easy to use.

Many other AutoML frameworks are available and might have made the list above under different circumstances. If you are interested in checking out other popular frameworks on GitHub, please see “popular AutoML frameworks on GitHub”.

If you are looking to start your machine learning journey, SabrePC offers compelling workstations and servers to get you started. With an assortment of GPUs and CPUs to choose from, our sales engineers can help spec out a system tailor fit for your needs.
