All machine learning models have a set of hyperparameters: arguments chosen before training that influence accuracy and performance. The best hyperparameter values differ from dataset to dataset, and it is up to the practitioner to find them.
The way to find the best hyperparameter values for your dataset is trial and error, which is the main concept behind hyperparameter optimization. Hyperparameter optimization involves searching through a range of organized or random values to find the combination that delivers the best performance on your data. Two prominent techniques are Grid Search and Random Search. Here’s an overly simplified graphic with both Grid and Random using the same number of combinations on a two-dimensional plane:
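The same contrast can be sketched in a few lines of plain Python. This is only an illustration: the helper names `grid_points` and `random_points`, the 3 x 3 layout, and the [0, 1] range are assumptions made for the example, not part of any library.

```python
import itertools
import random

def grid_points(values_x, values_y):
    """Every (x, y) pair built from two fixed lists of candidate values."""
    return list(itertools.product(values_x, values_y))

def random_points(n_points, low, high, seed=0):
    """n_points pairs drawn uniformly at random from the square [low, high] x [low, high]."""
    rng = random.Random(seed)
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n_points)]

grid = grid_points([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])  # 3 x 3 = 9 combinations
rand = random_points(len(grid), 0.0, 1.0)              # same budget of 9 combinations

# The grid only ever tries 3 distinct values on each axis,
# while the random sample tries a different value in every combination.
print(len({x for x, _ in grid}), "distinct x values in the grid")
print(len({x for x, _ in rand}), "distinct x values in the random sample")
```

With the same budget, the random sample probes more distinct values of each individual hyperparameter, which helps when only one or two of them really matter.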
Using the Scikit-Learn library, we will instantiate the two methods and run them to find the best-fitting parameters.
Grid Search
When performing hyperparameter optimization, we first need to define a parameter space or parameter grid, where we include a set of possible hyperparameter values that can be used to build the model. Here we use the grid space for a Random Forest Classifier. Other algorithms would have different grid spaces.
grid_space = {
    'max_depth': [3, 5, 10, None],
    'n_estimators': [10, 100, 200],
    'max_features': [1, 3, 5, 7],
    'min_samples_leaf': [1, 2, 3],
    'min_samples_split': [2, 3, 4],  # integer values must be at least 2
}
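As a rough check on how large a grid like this is, the combinations can be counted before any model is trained. The sketch below uses scikit-learn's `ParameterGrid` on a grid with the same number of values per hyperparameter as the one above; the variable names and the 3-fold setting are assumptions made for the example.

```python
from sklearn.model_selection import ParameterGrid

# Same number of candidate values per hyperparameter as the grid above.
grid_space = {
    'max_depth': [3, 5, 10, None],
    'n_estimators': [10, 100, 200],
    'max_features': [1, 3, 5, 7],
    'min_samples_leaf': [1, 2, 3],
    'min_samples_split': [2, 3, 4],
}

n_combinations = len(ParameterGrid(grid_space))  # 4 * 3 * 4 * 3 * 3
cv_folds = 3                                     # assumed number of cross-validation folds
n_fits = n_combinations * cv_folds               # one model fit per combination per fold

print(n_combinations, "combinations,", n_fits, "model fits")
```

Adding one more hyperparameter with four candidate values would multiply both numbers by four.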
The grid search technique then arranges these hyperparameters in a matrix-like structure, and the model is trained on every combination of hyperparameter values. Because there are many combinations to test, Grid Search can take some time to complete and is generally slower than Random Search due to sheer volume. Storing the score of every combination also requires more memory.
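from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
rf = RandomForestClassifier()  # base estimator to tune
grid = GridSearchCV(rf, param_grid=grid_space, cv=3, scoring='accuracy')  # 3-fold CV, scored by accuracy
model_grid = grid.fit(X, y)  # X, y: your feature matrix and labels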
The model with the best performance is then selected.
print('Best parameters are: '+str(model_grid.best_params_))
print('Best score is: '+str(model_grid.best_score_))
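For readers who want to run the grid search from start to finish, here is a minimal self-contained sketch. The synthetic dataset from `make_classification`, the deliberately tiny `small_grid`, and the fixed `random_state` values are assumptions chosen so the example finishes quickly; they are not the article's data or grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for your own X and y (assumption for this example).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# A deliberately small grid: 2 * 2 * 2 = 8 combinations.
small_grid = {
    'max_depth': [3, None],
    'n_estimators': [10, 50],
    'min_samples_split': [2, 4],
}

rf = RandomForestClassifier(random_state=0)
grid = GridSearchCV(rf, param_grid=small_grid, cv=3, scoring='accuracy')
model_grid = grid.fit(X, y)  # fit() returns the fitted search object

print('Best parameters are: ' + str(model_grid.best_params_))
print('Best score is: ' + str(model_grid.best_score_))  # mean cross-validated accuracy
```

The score of every combination, not just the winner, is available in `model_grid.cv_results_` if you want to inspect how close the runners-up were.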
Random Search
While grid search looks at every possible combination of hyperparameters to find the best model, random search selects and tests only a random subset of combinations.
This technique randomly samples from a space of hyperparameters instead of conducting an exhaustive search. For some parameters we use SciPy's randint distribution, which draws a random integer from a given range each time a combination is sampled.
import numpy as np
from scipy.stats import randint
rs_space = {
    'max_depth': list(np.arange(10, 100, step=10)) + [None],
    'n_estimators': np.arange(10, 500, step=50),
    'max_features': randint(1, 7),  # random integer from 1 to 6 (upper bound is exclusive)
    'criterion': ['gini', 'entropy'],
    'min_samples_leaf': randint(1, 4),  # random integer from 1 to 3
    'min_samples_split': np.arange(2, 10, step=2)
}
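To see what sampling from such a space looks like, scikit-learn's `ParameterSampler` can draw a few combinations without training anything. The sketch below uses a reduced space; `demo_space`, the choice of five draws, and the seed are assumptions for illustration.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# A reduced search space mixing distributions and plain lists.
demo_space = {
    'max_features': randint(1, 7),      # integers 1..6, upper bound excluded
    'min_samples_leaf': randint(1, 4),  # integers 1..3
    'criterion': ['gini', 'entropy'],   # lists are sampled uniformly
}

# Draw 5 random combinations; random_state makes the draws repeatable.
samples = list(ParameterSampler(demo_space, n_iter=5, random_state=0))
for params in samples:
    print(params)
```

Each printed dictionary is one candidate that a random search would train and score.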
Now we can specify the number of runs of random search with the n_iter argument. Lower numbers will be faster but may result in less-than-ideal scores. More runs will start to approach the coverage of a full exhaustive grid search and tend to give better results. You can get lucky though! And if a good-enough accuracy is sufficient for your purposes, the time saved can be valuable.
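from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
rf = RandomForestClassifier()
# n_iter: number of sampled combinations; n_jobs=-1: use all CPU cores
rf_random = RandomizedSearchCV(rf, rs_space, n_iter=500, scoring='accuracy', n_jobs=-1, cv=3)
model_random = rf_random.fit(X, y)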
Then we can print the best parameters and their score:
print('Best parameters are: '+str(model_random.best_params_))
print('Best score is: '+str(model_random.best_score_))
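A minimal self-contained version of the random search might look like the following. The synthetic dataset, the smaller ranges, `n_iter=10`, and the fixed `random_state` values are assumptions picked to keep the run short; they are not recommendations for real data.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for your own X and y (assumption for this example).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Smaller ranges than in the article so the example runs in seconds.
rs_space = {
    'max_depth': [3, 5, 10, None],
    'n_estimators': randint(10, 60),    # integers 10..59
    'max_features': randint(1, 7),      # integers 1..6
    'min_samples_leaf': randint(1, 4),  # integers 1..3
}

rf = RandomForestClassifier(random_state=0)
rf_random = RandomizedSearchCV(
    rf, rs_space, n_iter=10, scoring='accuracy', cv=3, random_state=0
)
model_random = rf_random.fit(X, y)  # trains 10 sampled combinations x 3 folds

print('Best parameters are: ' + str(model_random.best_params_))
print('Best score is: ' + str(model_random.best_score_))
```

Raising `n_iter` trades run time for a better chance of landing on a strong combination.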
Should You Use Grid Search or Random Search?
Choosing between grid search and random search is a valid question, so here are some important considerations.
- Known or Unknown Range
- Use Grid Search when you have a ballpark range of parameter values that should yield a favorable result. Because grid search is exhaustive, keep the range small to reduce the time spent optimizing.
- Use Random Search on a broad range of values if you are unsure which range would perform well. Keep the number of iterations reasonable, neither too large nor too small, so you can achieve a good result in acceptable time.
- Speed
- Grid Search can take a long time to compute, especially on large, complex data with many parameters to tune. The number of combinations multiplies with every hyperparameter added, so with 3 or more hyperparameters the run time grows quickly.
- Random Search computation time depends on the number of iterations. More iterations can yield a better result but take longer and still may not find the very best combination. Too few iterations can leave performance on the table.
- Use Both!
- You can use grid and random search together. Random search can find good-enough values over a broad range. You can then build a small range around those values and run grid search to check whether a better value lies nearby. The smaller range keeps the computational cost of grid search down while still yielding a strong result.
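The combined approach can be sketched as a two-stage search. Everything specific here is an assumption for illustration: the synthetic dataset, the two tuned hyperparameters, the hypothetical `neighbors` helper that builds a range of plus or minus one around a value, and the fixed seeds.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in for your own X and y (assumption for this example).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=20, random_state=0)

# Stage 1: coarse random search over broad ranges.
broad_space = {
    'max_depth': randint(2, 20),        # integers 2..19
    'min_samples_leaf': randint(1, 10), # integers 1..9
}
coarse = RandomizedSearchCV(
    rf, broad_space, n_iter=10, cv=3, scoring='accuracy', random_state=0
).fit(X, y)
best = coarse.best_params_

def neighbors(value, lowest):
    """Hypothetical helper: the value and its integer neighbors, never below `lowest`."""
    value = int(value)
    return sorted({max(lowest, value - 1), value, value + 1})

# Stage 2: exhaustive grid search in a narrow window around the stage 1 winner.
narrow_grid = {
    'max_depth': neighbors(best['max_depth'], 1),
    'min_samples_leaf': neighbors(best['min_samples_leaf'], 1),
}
fine = GridSearchCV(rf, param_grid=narrow_grid, cv=3, scoring='accuracy').fit(X, y)

print('Random search best:', best, coarse.best_score_)
print('Grid search best:  ', fine.best_params_, fine.best_score_)
```

The second stage trains at most nine combinations here, so it stays cheap while still checking the immediate neighborhood of the random search result.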
If you are having a hard time with the performance of both grid and random search, you might want to consider better hardware. This can include a faster processor or additional GPUs. SabrePC offers custom deep learning workstations and machine learning workstations for running these algorithms. Visit our Deep Learning and AI Workstations page for our featured platforms including Intel Xeon W or AMD Threadripper PRO 7000WX. If you have any questions, feel free to contact us today.