Hyperparameter Selection: An Exploration
Hyperparameter selection process for model tuning.
Introduction
Hyperparameters express “higher-level” properties of the model such as its complexity or how fast it should learn. Hyperparameters are usually fixed before the actual training process begins, and cannot be learned directly from the data in the standard model training process. Model developers will predefine these hyperparameters by testing different values, training different models, and choosing the values that test better.
Examples of hyperparameters are: - number of leaves in a tree-based model - number of clusters in a k-means algorithm - number of hidden layers in a neural network - penalization incurred by additional model complexity - learning rate
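To make the distinction concrete, here is a small sketch (using scikit-learn estimators, chosen for illustration) showing that hyperparameters like the ones above are fixed when the model object is constructed, before any data is seen, while learned parameters are only set later by fitting:

```python
# Hyperparameters are passed to the constructor and fixed before training;
# .fit() later learns the model parameters from data.
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

tree = DecisionTreeClassifier(max_leaf_nodes=8)  # number of leaves in a tree model
km = KMeans(n_clusters=3, n_init=10)             # number of clusters in k-means

print(tree.get_params()['max_leaf_nodes'])
print(km.get_params()['n_clusters'])
```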
In this notebook I will perform hyperparameter selection for a model, including a visualization of the hyperparameter grid and a revision after seeing unintuitive results.
# Image display packages
from IPython.display import Image
from IPython.core.display import HTML
from warnings import filterwarnings
#Loading Necessary Background Packages
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as np
#settings
filterwarnings('ignore')
plt.style.use('fivethirtyeight')
plt.rcParams['lines.linewidth'] = 1
%matplotlib inline
# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target
Hyperparameter Optimization
We’ll be using cross validation to initially find the optimal hyperparameters for our SVM kernel classifier.
When training our SVM classifier with the Radial Basis Function (RBF) kernel, two hyperparameters must be considered: C and gamma. Proper choice of these two hyperparameters is critical to the SVM classifier’s performance. We use cross validation and a separate train/test split because we do not want to overfit our hyperparameters to our data: the hyperparameters are chosen on a different dataset than the one we evaluate the final model on.
The parameter C, common to many ML algorithms, trades off misclassification of training examples against simplicity of the decision surface. A low C makes the decision boundary smooth (a simpler model, at the cost of some training accuracy), while a high C aims to classify every training example correctly at the cost of a more complex boundary. See below for an example of this in action.
gamma defines how far the influence of a single training example reaches. Consider it a ‘sphere of influence’: the larger gamma is, the closer other examples must be to be affected.
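A quick sketch of these two effects (the specific C and gamma values here are deliberately extreme, chosen for illustration): a tiny C keeps the boundary so smooth that training accuracy suffers, while a huge C paired with a large gamma lets the model essentially memorize the training set.

```python
# Fit two RBF SVMs on iris with extreme hyperparameter settings and
# compare training accuracy: low C / low gamma gives a smooth, simple
# boundary; high C / high gamma gives a flexible boundary that can
# memorize the training data.
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

loose = SVC(kernel='rbf', C=0.01, gamma=0.1).fit(X, y)  # smooth boundary
tight = SVC(kernel='rbf', C=1e6, gamma=10).fit(X, y)    # flexible boundary

print('low C / low gamma training accuracy:  ', loose.score(X, y))
print('high C / high gamma training accuracy:', tight.score(X, y))
```

Note that high training accuracy in the second case is exactly the kind of overfitting that cross validation is meant to guard against.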
We will be splitting our data into training and testing datasets, like we did before. We will find our optimal hyperparameters on the training set before evaluating the final model on the test set.
# split into test and train
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=0)
# load in SVC - a kernel classifier function from the
# scikit-learn library
from sklearn.svm import SVC
# create a default instance of the classifier
classifier = SVC()
I will create a dictionary of possible gammas and C’s for our cross validator to run through. I looked through literature and examples of the SVM rbf kernel to get a sense of what others were using. For an initial search, a logarithmic grid with basis 10 is used.
C_range = np.logspace(-2, 10, 13)
gamma_range = np.logspace(-9, 3, 13)
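For a sense of scale, the two ranges above span 13 logarithmically spaced values each, so the full grid covers 169 (C, gamma) combinations:

```python
# Peek at the search grid: 13 values of C from 1e-2 to 1e10 and
# 13 values of gamma from 1e-9 to 1e3, evenly spaced in log space.
import numpy as np

C_range = np.logspace(-2, 10, 13)
gamma_range = np.logspace(-9, 3, 13)

print('C range:     {} ... {}'.format(C_range[0], C_range[-1]))
print('gamma range: {} ... {}'.format(gamma_range[0], gamma_range[-1]))
print('grid size:   {} combinations'.format(len(C_range) * len(gamma_range)))
```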
I will be using the GridSearchCV package to run hyperparameter cross validation on the training dataset. GridSearchCV’s default settings will run 3-fold cross validation, but I will be doing 5-fold cross validation.
# import cross validation packages
from sklearn.model_selection import GridSearchCV, KFold
# setup 5 fold cross validation to be fed into the GridSearchCV package
cv_5 = KFold(5)
# create a parameter range to test over
parameters = {'kernel':['rbf'], 'gamma':gamma_range, 'C':C_range}
# create an instance of the GridSearchCV cross-validator - using our classifier
# and choice or parameters
cross_validator = GridSearchCV(classifier, parameters, cv=cv_5)
# fit the data
cross_validator.fit(X_train, y_train)
best_gamma = cross_validator.best_estimator_.gamma
best_C = cross_validator.best_estimator_.C
print('Best Gamma: {}, Best C: {}'.format(best_gamma, best_C))
Best Gamma: 1e-07, Best C: 100000000.0
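With the best hyperparameters in hand, a natural next step is to refit the classifier and score it on the held-out test set. The sketch below is self-contained (it re-creates the split rather than reusing the objects above, and hard-codes the C and gamma values reported by the grid search):

```python
# Refit an RBF SVM with the selected hyperparameters and evaluate it
# on the held-out test set. C=1e8 and gamma=1e-7 are the values the
# grid search above reported as best.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

best_model = SVC(kernel='rbf', C=1e8, gamma=1e-7).fit(X_train, y_train)
print('Test accuracy: {:.3f}'.format(best_model.score(X_test, y_test)))
```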