Tuning Support Vector Machines - Visualized

June 2, 2019
scikit-learn SVM classification ml

Introduction

In this post I’m going to repeat the experiment from our XGBoost post, but for Support Vector Machines. If you haven’t read that one, I encourage you to check it out first!

Support Vector Machines are one of my favourite machine learning algorithms because they’re elegant and intuitive (if explained in the right way). All this algorithm tries to do is draw a line through the dataset that separates the classes with as little error as possible.

SVM Intuition

Imagine you had a whole bunch of chocolate M&M’s on your counter top. Also, suppose that you only have two colors of M&M’s for this example: red and blue. A linear support vector machine would be equivalent to trying to separate the M&M’s with a ruler, in such a way that you get the best color separation possible.

During the demonstrations below, keep this analogy in mind. The datasets we show can be thought of as the M&M piles. There are three types of datasets and they’re designed to be separated effectively by different types of support vector machines.

Below you’re going to see multiple lines and multiple color bands - this is because we’ve tasked the support vector machines to assign a probability of the datapoint being a blue dot or a red dot (Blue M&M or Red M&M). The different shades represent varying degrees of probability between 0 and 1.
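As a minimal sketch of where those probability bands come from (using a toy `make_moons` dataset rather than the exact data plotted below), fitting an `SVC` with `probability=True` exposes per-class probabilities via `predict_proba`:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy two-class dataset, standing in for one of the M&M piles below.
X, y = make_moons(noise=0.3, random_state=0)

# probability=True enables predict_proba (via internal cross-validation).
clf = SVC(kernel='rbf', probability=True, random_state=0).fit(X, y)

# One row per point: [P(class 0), P(class 1)] - each row sums to 1,
# and these values determine the shade a point's region gets.
proba = clf.predict_proba(X[:3])
print(proba)
print(proba.sum(axis=1))  # [1. 1. 1.]
```

The shading in the plots is just this probability evaluated at every point of a fine grid.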

The parameter C in each sub-experiment tells the support vector machine how heavily to penalize misclassifications during training. A large C (e.g. C=1.0) has little tolerance for errors and tries hard to classify every training point correctly, while a small C (e.g. C=0.2) tolerates more errors in exchange for a wider, simpler margin. In most real-world datasets there is no perfect separating boundary, so some tolerance is usually necessary to avoid overfitting the algorithm.
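To make the effect of C concrete, here is a small sketch on a hypothetical two-blob toy dataset (not the data used below), comparing a strict and a lenient linear SVM. A looser C widens the margin and leaves more training points inside it, so more of them end up as support vectors:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs - an easy pile of M&M's.
X, y = make_blobs(n_samples=100, centers=2, random_state=6)

for C in (100.0, 0.01):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # n_support_ counts the support vectors per class.
    print(f"C={C}: {clf.n_support_.sum()} support vectors")
```

The strict model (large C) leans on only a handful of boundary points, while the lenient model (small C) lets many more points influence the line.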

Effects of Changing the SVM Hyperparameters

Our first step is to import our libraries and declare our plotting function because we’re going to be re-using this code a lot.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.svm import SVC
import warnings
warnings.filterwarnings("ignore")



def plot_decision_bounds(names, classifiers):
    
    '''
    This function takes a list of classifier variants and their names, and
    plots the decision boundary each one produces on three datasets that
    call for different decision boundary solutions.

    Parameters:
        names: list, list of names for labelling the subplots.
        classifiers: list, list of classifier variants for building decision boundaries.

    Returns:
        None
    '''
    
    h = .02  # step size in the mesh
    X, y = make_classification(n_features=2, 
                               n_redundant=0, 
                               n_informative=2,
                               random_state=1, 
                               n_clusters_per_class=1)
    
    rng = np.random.RandomState(2)
    X += 2 * rng.uniform(size=X.shape)
    linearly_separable = (X, y)
    
    datasets = [make_moons(noise=0.3, random_state=0),
                make_circles(noise=0.25, factor=0.5, random_state=1),
                linearly_separable
                ]
    
    figure = plt.figure(figsize=(13, 8))
    i = 1
    # iterate over datasets
    for ds_cnt, ds in enumerate(datasets):
        # preprocess dataset, split into training and test part
        X, y = ds
        X = StandardScaler().fit_transform(X)
        X_train, X_test, y_train, y_test = \
            train_test_split(X, y, test_size=.4, random_state=42)
    
        x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
        y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                             np.arange(y_min, y_max, h))
    
        # just plot the dataset first
        cm = plt.cm.cool
        cm_bright = ListedColormap(['#FF0000', '#0000FF'])
        ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
        if ds_cnt == 0:
            ax.set_title("Input data")
        # Plot the training points
        ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright,
                   edgecolors='k')
        # and testing points
        ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6,
                   edgecolors='k')
        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks(())
        ax.set_yticks(())
        i += 1
    
        # iterate over classifiers
        for name, clf in zip(names, classifiers):
            ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
            clf.fit(X_train, y_train)
            score = clf.score(X_test, y_test)
    
            # Plot the decision boundary. For that, we will assign a color to each
            # point in the mesh [x_min, x_max]x[y_min, y_max].
            if hasattr(clf, "decision_function"):
                Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
            else:
                Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
    
            # Put the result into a color plot
            Z = Z.reshape(xx.shape)
            ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)
    
            # Plot also the training points
            ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright,
                       edgecolors='k')
            # and testing points
            ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
                       edgecolors='k', alpha=0.6)
    
            ax.set_xlim(xx.min(), xx.max())
            ax.set_ylim(yy.min(), yy.max())
            ax.set_xticks(())
            ax.set_yticks(())
            if ds_cnt == 0:
                ax.set_title(name)
            ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'),
                    size=15, horizontalalignment='right')
            i += 1
    
    plt.tight_layout()
    plt.show()
    
    return

Changing the Degree Parameter for Poly Kernel SVM

names = ['Degree=1', 'Degree=2', 'Degree=3', 'Degree=4']
classifiers = [SVC(probability=True, kernel='poly', degree=1, C=0.8),
               SVC(probability=True, kernel='poly', degree=2, C=0.8),
               SVC(probability=True, kernel='poly', degree=3, C=0.8),
               SVC(probability=True, kernel='poly', degree=4, C=0.8)]
               
plot_decision_bounds(names, classifiers);

[Figure: decision boundaries for polynomial kernels of degree 1–4 on the three datasets]

We can see visually from the results above what we talked about earlier - that the amount of “bend” in our ruler determines how well we can separate our pile of M&M’s.

Using the RBF Kernel with different C Values

Recall that using the RBF kernel is like suspending our pile of M&M’s in the air and trying to separate them with a sheet of paper, instead of using a ruler while they’re all flat on the counter top. The effect you see below is a 2-D projection of how that surface slices through the 3-D pile of M&M’s.
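Under the hood, that “sheet of paper” comes from the RBF kernel function K(x, z) = exp(-gamma * ||x - z||^2), which measures how close two points are. A quick sketch, checking the formula by hand against scikit-learn’s `rbf_kernel` helper:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 1.0]])
z = np.array([[1.0, 3.0]])
gamma = 0.5

# K(x, z) = exp(-gamma * squared Euclidean distance)
manual = np.exp(-gamma * np.sum((x - z) ** 2))
lib = rbf_kernel(x, z, gamma=gamma)[0, 0]

print(manual, lib)  # both = exp(-0.5 * 5) = exp(-2.5) ≈ 0.0821
```

Nearby points get a kernel value near 1 and distant points a value near 0, which is what lets the boundary wrap around local clusters.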

We can see here that the effect the C-value has is very much dependent on the dataset. This highlights the importance of visualizing your data at the beginning of a machine learning project so that you can see what you’re dealing with!

names = ['C=1.0', 'C=0.8', 'C=0.6', 'C=0.2']
classifiers = [SVC(probability=True, kernel='rbf', C=1.0),
               SVC(probability=True, kernel='rbf', C=0.8),
               SVC(probability=True, kernel='rbf', C=0.6),
               SVC(probability=True, kernel='rbf', C=0.2)]
               
plot_decision_bounds(names, classifiers);

[Figure: RBF-kernel decision boundaries for C = 1.0, 0.8, 0.6, 0.2 on the three datasets]

Using the sigmoid Kernel with different C Values

The sigmoid kernel is another kernel that gives the algorithm a different family of bend patterns to work with during training. The effect is visualized below.
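For reference, the sigmoid kernel is K(x, z) = tanh(gamma * x·z + coef0). A minimal sketch checking the formula against scikit-learn’s `sigmoid_kernel` helper (the gamma and coef0 values here are illustrative):

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[0.5, -1.0]])
gamma, coef0 = 0.5, 1.0

# K(x, z) = tanh(gamma * <x, z> + coef0)
manual = np.tanh(gamma * x @ z.T + coef0)[0, 0]
lib = sigmoid_kernel(x, z, gamma=gamma, coef0=coef0)[0, 0]

print(manual, lib)  # both = tanh(0.25) ≈ 0.2449
```

Note that `SVC`’s `degree` parameter only applies to the polynomial kernel and is ignored here.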

names = ['C=1.0', 'C=0.8', 'C=0.6', 'C=0.2']
classifiers = [SVC(kernel='sigmoid', C=1.0),
               SVC(kernel='sigmoid', C=0.8),
               SVC(kernel='sigmoid', C=0.6),
               SVC(kernel='sigmoid', C=0.2)]
               
plot_decision_bounds(names,classifiers);

[Figure: sigmoid-kernel decision boundaries for C = 1.0, 0.8, 0.6, 0.2 on the three datasets]

Summary

Looking at the accuracy scores in the corner of each subplot, no single kernel or C value dominates across all three datasets - the best configuration depends on the shape of the data, which is exactly why visualizing your data first pays off.
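Since the best kernel and C value vary by dataset, in practice you would search over them rather than pick by eye. A small sketch (on a toy `make_moons` dataset, with an illustrative parameter grid) using scikit-learn’s `GridSearchCV`:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(noise=0.3, random_state=0)

# Cross-validate every kernel/C combination from the experiments above.
param_grid = {'kernel': ['poly', 'rbf', 'sigmoid'],
              'C': [0.2, 0.6, 0.8, 1.0]}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 2))
```

This automates exactly the comparison we just did visually, using mean cross-validated accuracy as the score.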

I hope you enjoyed this post!
