Visualizing `XGBoost` Hyperparameters

May 26, 2019
hyperparameters xgboost classification ml

Goal: Understand - through visualization - the effect of changing the primary hyperparameters on the decision boundaries produced by extreme gradient boosting machines with the xgboost library.

Introduction

If you are like me, complex concepts are best grasped when you can see their effect visually. There are a lot of different machine learning algorithms, and even more hyperparameters that need to be tuned. Although methods like randomized search and grid search remove some of the need to understand how the parameters work, it’s still a good idea to have an intuitive sense of their effect in case you’re not getting the performance you desire.
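
If you do lean on automated search, here is roughly what that looks like for XGBoost: a minimal sketch using scikit-learn’s RandomizedSearchCV on a toy dataset. The search ranges below are illustrative assumptions covering the hyperparameters explored in this post, not tuned recommendations.

from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

# Toy data just to make the sketch runnable.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Illustrative search ranges -- assumptions, not recommendations.
param_distributions = {
    'max_depth': randint(1, 11),
    'gamma': uniform(0, 5),
    'subsample': uniform(0.5, 0.5),      # draws values in [0.5, 1.0]
    'n_estimators': randint(10, 200),
    'learning_rate': uniform(0.01, 0.5),
    'min_child_weight': uniform(0.01, 1.0),
}

search = RandomizedSearchCV(XGBClassifier(), param_distributions,
                            n_iter=25, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)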

In part 1 of this series, I’m going to focus on XGBoost. I’ve spent a lot of time with XGBoost, and its performance on most problems is exceptional. Most tuning guides and best practices you find on the internet provide numerical heuristics and rules of thumb for tuning the parameters. But what effect do these parameters actually have on the decision boundary? All else being equal, what does increasing gamma do to the boundary on different datasets? We’re going to explore this in detail in this post.

Attribution: The plotting solution used in this tutorial was borrowed from the great classifier comparison tutorial on the sklearn website. In this series I adapt it to visualize the effect of hyperparameter tuning on a variety of top-performing algorithms.

I’m going to change each parameter in isolation and plot the effect on the decision boundary. All hyperparameters will be set to their defaults, except for the parameter in question. We’ll do this for max_depth, gamma, subsample, n_estimators, learning_rate, and min_child_weight.

Our first step is to import our libraries and declare our plotting function because we’re going to be re-using this code a lot.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from xgboost import XGBClassifier
import warnings
warnings.filterwarnings("ignore")



def plot_decision_bounds(names,classifiers):
    
    '''
    This function takes in a list of classifier variants and names and plots the decision boundaries
    for each on three different datasets that require different decision boundary solutions.
    
    Parameters: 
        names: list, list of names for labelling the subplots.
        classifiers: list, list of classifier variants for building decision boundaries.
        
    Returns: 
    
        None
    '''
    
    h = .02  # step size in the mesh
    X, y = make_classification(n_features=2, 
                               n_redundant=0, 
                               n_informative=2,
                               random_state=1, 
                               n_clusters_per_class=1)
    
    rng = np.random.RandomState(2)
    X += 2 * rng.uniform(size=X.shape)
    linearly_separable = (X, y)
    
    datasets = [make_moons(noise=0.3, random_state=0),
                make_circles(noise=0.25, factor=0.5, random_state=1),
                linearly_separable
                ]
    
    figure = plt.figure(figsize=(13, 8))
    i = 1
    # iterate over datasets
    for ds_cnt, ds in enumerate(datasets):
        # preprocess dataset, split into training and test part
        X, y = ds
        X = StandardScaler().fit_transform(X)
        X_train, X_test, y_train, y_test = \
            train_test_split(X, y, test_size=.4, random_state=42)
    
        x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
        y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                             np.arange(y_min, y_max, h))
    
        # just plot the dataset first
        cm = plt.cm.cool
        cm_bright = ListedColormap(['#FF0000', '#0000FF'])
        ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
        if ds_cnt == 0:
            ax.set_title("Input data")
        # Plot the training points
        ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright,
                   edgecolors='k')
        # and testing points
        ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6,
                   edgecolors='k')
        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks(())
        ax.set_yticks(())
        i += 1
    
        # iterate over classifiers
        for name, clf in zip(names, classifiers):
            ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
            clf.fit(X_train, y_train)
            score = clf.score(X_test, y_test)
    
            # Plot the decision boundary. For that, we will assign a color to each
            # point in the mesh [x_min, x_max]x[y_min, y_max].
            if hasattr(clf, "decision_function"):
                Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
            else:
                Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
    
            # Put the result into a color plot
            Z = Z.reshape(xx.shape)
            ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)
    
            # Plot also the training points
            ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright,
                       edgecolors='k')
            # and testing points
            ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
                       edgecolors='k', alpha=0.6)
    
            ax.set_xlim(xx.min(), xx.max())
            ax.set_ylim(yy.min(), yy.max())
            ax.set_xticks(())
            ax.set_yticks(())
            if ds_cnt == 0:
                ax.set_title(name)
            ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'),
                    size=15, horizontalalignment='right')
            i += 1
    
    plt.tight_layout()
    plt.show()
    
    return
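
Every run below keeps all hyperparameters at their defaults except the one being varied, so it is worth checking what those defaults actually are for your installed version of xgboost (they differ between releases). A quick way to inspect them:

from xgboost import XGBClassifier

# Print the library defaults for the parameters explored in this post.
# Exact values depend on your xgboost version; newer releases may report
# None for parameters left unset, meaning the core library default is used.
defaults = XGBClassifier().get_params()
for param in ['max_depth', 'gamma', 'subsample', 'n_estimators',
              'learning_rate', 'min_child_weight']:
    print(param, '=', defaults[param])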

Effects of XGBoost Parameters on Decision Boundaries

max_depth

The first parameter we are going to visualize is max_depth. This parameter controls how deep the algorithm is allowed to build each tree. In general, the deeper a tree gets, the more resolution it learns from the data - along with more of the noise. Deep trees therefore tend to overfit, although the other parameters we look at below can mitigate this to some extent. The idea behind XGBoost is to iteratively build “weak learners” that correct the errors of the previous trees, and weak learners tend to be shallow trees. Because of this, going too deep can negate some of the benefits of the ensemble approach.

names = ['Max Depth = 1', 'Max Depth = 3', 'Max Depth = 10', 'Max Depth = 20']
classifiers = [XGBClassifier(max_depth=1),
               XGBClassifier(max_depth=3),
               XGBClassifier(max_depth=10),
               XGBClassifier(max_depth=20)
               ]

plot_decision_bounds(names,classifiers);

[Figure: decision boundaries for max_depth = 1, 3, 10, 20 across the three datasets]

gamma

gamma (also known as min_split_loss) is the minimum loss reduction required to make a further split on a leaf node; larger values make the algorithm more conservative and the resulting boundary simpler.

names = ['gamma = 0', 'gamma = 0.5', 'gamma = 2', 'gamma = 10']
classifiers = [XGBClassifier(gamma=0),
               XGBClassifier(gamma=0.5),
               XGBClassifier(gamma=2),
               XGBClassifier(gamma=10)
               ]

plot_decision_bounds(names,classifiers);

[Figure: decision boundaries for gamma = 0, 0.5, 2, 10 across the three datasets]

subsample

subsample is the fraction of the training rows sampled for each boosting round; values below 1.0 add randomness that can help reduce overfitting.

names = ['subsample = 1.0', 'subsample = 0.9', 'subsample = 0.5',
         'subsample = 0.1']
classifiers = [XGBClassifier(subsample = 1.0),
               XGBClassifier(subsample = 0.9),
               XGBClassifier(subsample = 0.5),
               XGBClassifier(subsample = 0.1)
               ]

plot_decision_bounds(names,classifiers)

[Figure: decision boundaries for subsample = 1.0, 0.9, 0.5, 0.1 across the three datasets]

n_estimators

n_estimators is the number of boosting rounds, i.e. the number of trees added to the ensemble; more trees let the model fit the training data more closely.

names = ['n_estimators = 2', 'n_estimators = 5', 'n_estimators = 25',
         'n_estimators = 100']
classifiers = [XGBClassifier(n_estimators = 2),
               XGBClassifier(n_estimators = 5),
               XGBClassifier(n_estimators = 25),
               XGBClassifier(n_estimators = 100)
               ]

plot_decision_bounds(names,classifiers)

[Figure: decision boundaries for n_estimators = 2, 5, 25, 100 across the three datasets]

learning_rate

learning_rate (eta) shrinks the contribution of each new tree; smaller values generally require more trees but tend to generalize better.

names = ['learning_rate = 0.001', 'learning_rate = 0.01', 'learning_rate = 0.1',
         'learning_rate = 0.5']
classifiers = [XGBClassifier(learning_rate = 0.001),
               XGBClassifier(learning_rate = 0.01),
               XGBClassifier(learning_rate = 0.1),
               XGBClassifier(learning_rate = 0.5)
               ]

plot_decision_bounds(names,classifiers)

[Figure: decision boundaries for learning_rate = 0.001, 0.01, 0.1, 0.5 across the three datasets]

min_child_weight

min_child_weight is the minimum sum of instance weight (hessian) required in a child node for a split to be kept; larger values make the model more conservative.

names = ['min_child_weight = 0.01', 'min_child_weight = 0.05',
         'min_child_weight = 0.25', 'min_child_weight = 0.75']
classifiers = [XGBClassifier(min_child_weight = 0.01),
               XGBClassifier(min_child_weight = 0.05),
               XGBClassifier(min_child_weight = 0.25),
               XGBClassifier(min_child_weight = 0.75)
               ]

plot_decision_bounds(names,classifiers)

[Figure: decision boundaries for min_child_weight = 0.01, 0.05, 0.25, 0.75 across the three datasets]
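
As a final exercise, it can be instructive to combine several of these knobs at once and compare against the defaults, re-using the plot_decision_bounds helper defined above. The specific combinations below are illustrative, not tuned recommendations.

# Compare the default model against two hand-picked (illustrative) variants.
names = ['defaults', 'shallow + regularized', 'deep + aggressive']
classifiers = [XGBClassifier(),
               XGBClassifier(max_depth=2, gamma=1, subsample=0.8,
                             learning_rate=0.1, n_estimators=100),
               XGBClassifier(max_depth=10, gamma=0, subsample=1.0,
                             learning_rate=0.5, n_estimators=200)
               ]

plot_decision_bounds(names, classifiers)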

Summary

We’ve walked through XGBoost’s primary hyperparameters - max_depth, gamma, subsample, n_estimators, learning_rate, and min_child_weight - and visualized how each one reshapes the decision boundary on three toy datasets. Keeping these visual intuitions in mind should make the usual tuning heuristics feel a lot less arbitrary. Thanks for reading!
