Data Science From Scratch To Production MVP Style: API

April 14, 2019
scikit-learn ml data engineering

Serializing

If you’re not familiar with serialization and deserialization: “serialization” is the process of taking an object in memory and converting it into a form that can be written to a file or sent over the network, while “deserialization” reverses that process, turning the file back into an object that can be used in code again.
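To make the round trip concrete, here is a minimal sketch. It uses the standard library's pickle as a stand-in for dill (dill exposes the same dump/load/dumps/loads interface), and the dictionary acting as a "model" is made up purely for illustration.

```python
import pickle  # stdlib stand-in; dill offers the same dumps/loads interface

# A toy "model": plain coefficients in a dict (hypothetical example data).
model = {"coef": [0.5, -1.2], "intercept": 3.0}

# Serialize: in-memory object -> bytes that could be written to a file
# or sent over the network.
payload = pickle.dumps(model)

# Deserialize: bytes -> a new object that is equal to the original.
restored = pickle.loads(payload)

print(type(payload).__name__)
print(restored == model)   # same contents
print(restored is model)   # but a distinct object in memory
```

The restored object compares equal to the original but is a separate copy, which is what lets a model be trained in one process and used in another.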

We’ll leverage this to package up our model so that it can be deployed alongside our API without being married to it, allowing us to separate the notion of the API from the model.

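# Jupyter magic: runs the earlier pipeline script, assumed to define pipe, X, and y
%run ml-pipeline.py

import dill
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
model = LinearRegression().fit(X_train, y_train)

# Serialize the fitted pipeline and model so the API can load them later
with open("api/artifact/pipe.dill", 'wb') as f:
    dill.dump(pipe, f)
with open("api/artifact/model.dill", 'wb') as f:
    dill.dump(model, f)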

API

Now we’re ready to write a small generic API that can be passed a pipeline and model to be interacted with over HTTP using REST.

import dill
import pandas as pd
from flask import Flask, request


app = Flask(__name__)

# Deserialize the pipeline and model once, when the API starts
with open("api/artifact/pipe.dill", 'rb') as f:
    pipe = dill.load(f)

with open("api/artifact/model.dill", 'rb') as f:
    model = dill.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Parse the JSON body and flatten it into a one-row DataFrame
    raw_json = request.get_json(force=True)
    flat_table_df = pd.json_normalize(raw_json)
    # Apply the same preprocessing used in training, then predict
    processed = pipe.transform(flat_table_df)
    return str(model.predict(processed)[0])

One thing to point out is that our API only deserializes pipe.dill and model.dill at launch. This is a benefit because responses are faster after the initial boot, but the API must be restarted whenever a newer pipe.dill or model.dill file is provided.
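The load-once behaviour can be illustrated with a small sketch. It again uses pickle as a stand-in for dill, and the `save` and `load` helpers, the temporary path, and the versioned dictionaries are all hypothetical names invented for this example rather than part of the API.

```python
import os
import pickle  # stdlib stand-in for dill
import tempfile

def save(obj, path):
    """Serialize obj to path (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load(path):
    """Deserialize and return the object stored at path (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical artifact location in a throwaway directory.
artifact_path = os.path.join(tempfile.mkdtemp(), "model.pkl")

save({"version": 1}, artifact_path)
model = load(artifact_path)          # what the API does once, at launch

save({"version": 2}, artifact_path)  # a newer artifact lands on disk
stale = model["version"]             # the in-memory copy has not changed

model = load(artifact_path)          # loading again simulates a restart
fresh = model["version"]

print(stale, fresh)
```

Until the file is read again, the running process keeps serving the object it loaded at startup, which is why a restart is needed to pick up a new artifact.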

The above code creates a single endpoint that can be interacted with over HTTP by making a POST request with a JSON body. An example using curl would look like:

!curl --request POST -H "Content-Type: application/json" --data '{"temperature_celsius": 5.004}' "127.0.0.1:5000/predict"
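The same request can be sketched in Python using only the standard library. The URL assumes the Flask development server is running locally on its default port 5000, and `send` is a hypothetical helper name; the request object is built up front so it can be inspected before anything goes over the network.

```python
import json
import urllib.request

# Assumes the API is running locally on Flask's default development port.
URL = "http://127.0.0.1:5000/predict"
payload = {"temperature_celsius": 5.004}

# Build a POST request with a JSON body, mirroring the curl command.
req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

def send(request):
    """Send the request and return the response body as text.

    Hypothetical helper; it only works while the API is running.
    """
    with urllib.request.urlopen(request) as resp:
        return resp.read().decode("utf-8")

# Uncomment once the API is running:
# print(send(req))
```

Because the endpoint returns the prediction as plain text, `send` simply decodes the body rather than parsing JSON.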
