Examples

Note that all the IPython notebooks used in the example are available here: https://github.com/ritabratamaiti/RapidML/tree/master/Examples

ASD (Autism Spectrum Disorder) Detection

GitHub: https://github.com/ritabratamaiti/Autism-Detection-API

This project uses RapidML to detect ASD cases in adults. The training data consists of patients' responses to the AQ-10 questionnaire. RapidML selects, trains, serializes, and packages a high-accuracy classifier. The files directory generated by RapidML, which contains the packaged model, is then uploaded to a WSGI server (see the various deployment options here: http://flask.pocoo.org/docs/1.0/deploying/).

PythonAnywhere (https://www.pythonanywhere.com) was used as the WSGI host in this project. The builder_script.py script uses RapidML as follows:

import RapidML
import os
import pandas as pd


# The training data is the Autism Screening Adult Data Set from the UCI Machine
# Learning Repository: https://archive.ics.uci.edu/ml/datasets/Autism+Screening+Adult
df = pd.read_csv('out.csv')
df = df.drop(columns=['Unnamed: 0'])  # drop the stray index column written out with the CSV
df.head()

# Select, train, serialize and package the classifier; output goes to the 'ASDapi' directory
ml_model = RapidML.rapid_classifier(df, name='ASDapi')

Note: The training data is the Autism Screening Adult Data Set from the UCI Machine Learning Repository, available here: https://archive.ics.uci.edu/ml/datasets/Autism+Screening+Adult

The code generates the following output.

RapidML, Version: 0.1, Author: Ritabrata Maiti


       .---.        .-----------
      /     \  __  /    ------
     / /     \(  )/    -----
    //////   ' \/ `   ---
   //// / // :    : ---
  // /   /  /`    '--
 //          //..\
        ====UU====UU====
            '//||\\`
              ''``

Warning: xgboost.XGBClassifier is not available and will not be used by TPOT.
Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.
Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.
Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.

Using the RapidML Classifier; Experimental, For Issues Contact Author: ritabratamaiti@hiretrex.com
Label Encoding is being done....

Training....

Generation 1 - Current best internal CV score: 1.0
Generation 2 - Current best internal CV score: 1.0
Generation 3 - Current best internal CV score: 1.0
Generation 4 - Current best internal CV score: 1.0
Generation 5 - Current best internal CV score: 1.0

Best pipeline: DecisionTreeClassifier(input_matrix, criterion=entropy, max_depth=2, min_samples_leaf=4, min_samples_split=6)

Sample Output from input dataframe:
1,1,0,1,0,0,1,1,0,1,6,35.0,f,White-European,no,yes,United States,no,Self,NO

The generated model, scripts, and serialized files are stored in the ASDapi directory. This directory is then uploaded to a WSGI server to make cloud predictions.
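Once the ASDapi directory has been deployed, predictions can be requested over HTTP. The snippet below is a minimal sketch of such a request using the requests library; the deployment URL and the query-parameter name (ip) are assumptions, so check the API.py file generated by RapidML for the actual endpoint and argument names.

import requests

# One row of AQ-10 responses and demographic fields, comma-separated, in the
# same column order as the training dataframe (the class label is omitted)
features = "1,1,0,1,0,0,1,1,0,1,6,35.0,f,White-European,no,yes,United States,no,Self"

resp = requests.get(
    "https://yourusername.pythonanywhere.com/",  # hypothetical PythonAnywhere URL
    params={"ip": features},                     # hypothetical query-parameter name
)
print(resp.text)  # prediction returned by the packaged classifier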

Note: This is a complete project, and some parts (such as the creation of the Android application) are outside the scope of the RapidML documentation. Please visit the project on GitHub for more details.

Boston House Prices

Let’s say we are building a machine learning model to run on the cloud and predict housing prices in an area, using parameters such as crime rates, business development, and pollution metrics. We will use the Boston House Prices dataset because of its wide availability and use in machine learning academia. The dataset description is available here: https://www.kaggle.com/c/boston-housing

Note: We will be using sklearn.datasets for easy loading of the Boston-housing dataset within Python.

Since we are predicting prices, it is clearly a regression problem. We will be using RapidML.rapid_regressor_arr for this task.

# coding: utf-8

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import RapidML

# Load the Boston housing data and hold out 25% of it for testing
housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
                                                    train_size=0.75, test_size=0.25)

# Search for, train and package a regression pipeline on the training split
model = RapidML.rapid_regressor_arr(X_train, y_train)

# Score the exported TPOT pipeline on the held-out test set
print(model.m_tpot.score(X_test, y_test))

The following output is generated.

RapidML, Version: 0.1, Author: Ritabrata Maiti

(The ASCII-art banner and the xgboost warnings, identical to those in the previous example, are omitted here.)

Using RapidML Regressor with arrays, Inputs will not be label encoded.; Experimental, For Issues Contact Author: ritabratamaiti@hiretrex.com

Training....

Generation 1 - Current best internal CV score: -11.913707598413463
Generation 2 - Current best internal CV score: -11.913707598413463
Generation 3 - Current best internal CV score: -11.913707598413463
Generation 4 - Current best internal CV score: -11.913707598413463
Generation 5 - Current best internal CV score: -11.404014702360742

Best pipeline: GradientBoostingRegressor(input_matrix, alpha=0.75, learning_rate=0.1, loss=huber, max_depth=3, max_features=1.0, min_samples_leaf=5, min_samples_split=4, n_estimators=100, subsample=0.6000000000000001)
-10.908425630183695

As shown in this example, a test-set score of -10.908425630183695 has been achieved; TPOT reports regression scores as negative mean squared error by default, so values closer to zero are better. Do note that a different pipeline may be generated on a separate run of the program, so the score may fluctuate by a small margin (approximately 1% or so).
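As a sanity check, the reported score can be reproduced by hand. The short sketch below assumes TPOT's default regression scoring of negative mean squared error and reuses the model and test split from the script above.

from sklearn.metrics import mean_squared_error

# Negated MSE of the pipeline's predictions on the held-out test set; this should
# roughly match the score printed above, assuming the default TPOT scoring
preds = model.m_tpot.predict(X_test)
print(-mean_squared_error(y_test, preds))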

In the directory RapidML_files, a model file and an API.py script have been generated; these can be uploaded to a WSGI server (with Flask support) to perform cloud predictions.

Using RapidML to Build a Neural Network (for Recognizing Hand-written Digits)

This example demonstrates the versatility of RapidML by using UDMs (user-defined models). Note that we will be using matplotlib to visualize the digit images. In this example, we use RapidML.rapid_udm_arr to supply a neural network classifier (sklearn.neural_network.MLPClassifier) as the machine learning model. We use the digits dataset from sklearn.datasets and train the neural network on half of the data; the other half is used for testing and visualization.

The output produced by the notebook's cells is reproduced below; the full notebook is available in the Examples repository linked at the top of this page.
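For orientation, here is a minimal sketch of what the training cell might look like. The argument order of RapidML.rapid_udm_arr is an assumption (check the RapidML API reference and the notebook itself), and the alpha=1 setting is taken from the MLPClassifier parameters printed in the output; the later cells that print the classification report and confusion matrix and plot the digit images are omitted.

from sklearn import datasets
from sklearn.neural_network import MLPClassifier
import RapidML

# Load the digits dataset and flatten each 8x8 image into a 64-feature row
digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Train on the first half of the data; the second half is kept for testing
X_train, y_train = data[:n_samples // 2], digits.target[:n_samples // 2]

# Hand the user-defined scikit-learn model to RapidML
# (the argument order shown here is an assumption)
nn = MLPClassifier(alpha=1)
ml_model = RapidML.rapid_udm_arr(nn, X_train, y_train)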

Using RapidML with User Defined Models and Arrays, Inputs will not be label encoded; note that the model provided by the user should be a Scikit_learn model and not a TPOT object.; Experimental, For Issues Contact Author: ritabratamaiti@hiretrex.com

Training....
C:\Users\Ritabrata Maiti\Anaconda3\lib\site-packages\sklearn\neural_network\multilayer_perceptron.py:564: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
  % self.max_iter, ConvergenceWarning)
Classification report for classifier MLPClassifier(activation='relu', alpha=1, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False):
             precision    recall  f1-score   support

          0       0.99      0.97      0.98        88
          1       0.95      0.92      0.94        91
          2       0.99      0.98      0.98        86
          3       0.96      0.85      0.90        91
          4       0.99      0.89      0.94        92
          5       0.93      0.96      0.94        91
          6       0.91      0.99      0.95        91
          7       0.95      0.99      0.97        89
          8       0.93      0.94      0.94        88
          9       0.86      0.96      0.91        92

avg / total       0.95      0.94      0.94       899


Confusion matrix:
[[85  0  0  0  1  0  2  0  0  0]
 [ 0 84  0  1  0  1  0  0  0  5]
 [ 1  0 84  1  0  0  0  0  0  0]
 [ 0  0  1 77  0  3  0  4  6  0]
 [ 0  0  0  0 82  0  6  0  0  4]
 [ 0  0  0  0  0 87  1  0  0  3]
 [ 0  1  0  0  0  0 90  0  0  0]
 [ 0  0  0  0  0  1  0 88  0  0]
 [ 0  3  0  0  0  0  0  0 83  2]
 [ 0  0  0  1  0  2  0  1  0 88]]
[Image: matplotlib visualization of digit images from the test set]

Note: If you wish to use the model as a Flask API, remember to flatten each image so that the data forms a (samples, features) matrix, and then convert it to a URL argument. This method has not undergone complete testing and is not guaranteed to work. Alternatively, the API.py file can be modified to accept an image directly and flatten it within the script itself.
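For example, the preprocessing described above might look like the sketch below. The image file name, deployment URL, and query-parameter name are placeholders, and the 0-16 scaling assumes the model was trained on sklearn's digits data.

import numpy as np
import requests
from PIL import Image

# Load a hand-written digit and match the 8x8 greyscale format of the digits dataset
img = Image.open("digit.png").convert("L").resize((8, 8))

# Scale pixel values to the 0-16 range used by sklearn's digits data and flatten
# (invert with `16 - arr` if the image has dark ink on a light background)
arr = np.asarray(img, dtype=float) / 255.0 * 16
features = ",".join(str(round(v, 2)) for v in arr.flatten())

# Send the flattened features as a URL argument; the endpoint and parameter name
# are hypothetical and should be adapted to the generated API.py
resp = requests.get(
    "https://yourusername.pythonanywhere.com/",
    params={"ip": features},
)
print(resp.text)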