Bayesian optimization frameworks

Bayesian optimization (BO) is a method for optimizing expensive black-box functions by means of a probabilistic surrogate model and an acquisition function that specifies a trade-off between exploration (improving the surrogate) and exploitation (finding the minimizer of the surrogate and thus of the black-box function).
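
The loop described above fits in a few lines. This is a minimal illustration, not taken from any of the packages discussed below. It assumes a 1-D search space discretized to a grid on [0, 1], a zero-mean GP with a squared-exponential kernel whose hyperparameters are fixed (lengthscale 0.3, no ML fitting), and a lower confidence bound acquisition with an arbitrarily chosen `kappa=2`. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.3):
    """Squared-exponential kernel matrix between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_train = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k_train)
    # alpha = K^{-1} y, computed via the Cholesky factor
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross.T @ alpha
    v = np.linalg.solve(chol, k_cross)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def lower_confidence_bound(mean, std, kappa=2.0):
    """Confidence-bound acquisition for minimization: lower values are more promising."""
    return mean - kappa * std


def bayes_opt(objective, n_init=3, n_iter=10, seed=0):
    """Minimize a 1-D black-box objective on [0, 1]; returns the best point found."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)
    x = rng.uniform(0.0, 1.0, n_init)              # initial design
    y = np.array([objective(xi) for xi in x])
    for _ in range(n_iter):
        # Standardize observations so the unit-variance prior is a sensible scale.
        y_scaled = (y - y.mean()) / (y.std() + 1e-9)
        mean, std = gp_posterior(x, y_scaled, candidates)
        # Trade off exploitation (low mean) and exploration (high std).
        x_next = candidates[np.argmin(lower_confidence_bound(mean, std))]
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))        # evaluate the black box
    best = int(np.argmin(y))
    return x[best], y[best]


x_best, y_best = bayes_opt(lambda x: (x - 0.3) ** 2)
```

Swapping `lower_confidence_bound` for another acquisition, or `gp_posterior` for another surrogate, leaves the loop itself unchanged, which mirrors the surrogate/acquisition split described above.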

Typically, Gaussian processes (GPs) or tree-based models (random forests (RF), gradient-boosted trees (GBT), ...) are used as surrogate models. These models have their own parameters, which either need to be estimated via maximum likelihood (ML) or integrated out. The ML point estimate is typically found with a gradient-based local optimizer combined with a global optimization heuristic, such as multiple restarts, to address the non-convexity of the problem. For integrating out the model parameters, either Markov chain Monte Carlo (MCMC) or approximate variational inference (VI) methods are used, which additionally require a prior distribution over the parameters.
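
For a GP surrogate, the two options can be written down in standard notation (assumed here, not taken from a specific package): data $\mathcal{D} = (X, \mathbf{y})$ with $n$ observations, a zero-mean GP with kernel $k_\theta$, and $\theta$ collecting the kernel parameters and the noise variance $\sigma_n^2$. The ML point estimate maximizes the log marginal likelihood, which is in general multimodal in $\theta$ (hence the restarts):

$$\hat\theta = \arg\max_\theta \log p(\mathbf{y} \mid X, \theta), \qquad \log p(\mathbf{y} \mid X, \theta) = -\tfrac{1}{2}\,\mathbf{y}^\top K_\theta^{-1} \mathbf{y} - \tfrac{1}{2} \log \lvert K_\theta \rvert - \tfrac{n}{2} \log 2\pi, \qquad K_\theta = k_\theta(X, X) + \sigma_n^2 I$$

Integrating the parameters out instead averages the predictions over the parameter posterior, which is where the prior $p(\theta)$ enters. With $S$ samples $\theta^{(s)}$ from MCMC (or from a VI approximation $q(\theta)$), this becomes a simple average:

$$p(f_* \mid \mathcal{D}) = \int p(f_* \mid \mathcal{D}, \theta)\, p(\theta \mid \mathcal{D})\, d\theta \approx \frac{1}{S} \sum_{s=1}^{S} p(f_* \mid \mathcal{D}, \theta^{(s)}), \qquad p(\theta \mid \mathcal{D}) \propto p(\mathbf{y} \mid X, \theta)\, p(\theta)$$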

Search spaces define the possible inputs. When BO is applied to tuning the hyperparameters of machine learning models, the inputs include continuous, discrete and categorical variables and may include conditional variables (think of having to choose beta1 and beta2 only when Adam is selected as the optimizer). When BO is applied to the optimization of real-world processes, there are typically some (in-)equality constraints that need to be satisfied.
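
A minimal sketch of how such a mixed, conditional and constrained search space can be represented and sampled. The parameter names, ranges and the constraint `beta1 < beta2` are made up for illustration, and the dictionary format is hypothetical (each framework below has its own). Constraints are handled here by rejection sampling, the simplest possible approach.

```python
import math
import random

# Hypothetical search space for tuning a neural network (illustrative only).
SPACE = {
    "learning_rate": ("log-uniform", 1e-5, 1e-1),   # continuous, sampled on a log scale
    "num_layers": ("integer", 1, 4),                # discrete (inclusive bounds)
    "optimizer": ("categorical", ["sgd", "adam"]),  # categorical
}

# Conditional parameters: only active when the parent parameter takes a given value.
CONDITIONAL = {
    ("optimizer", "adam"): {
        "beta1": ("uniform", 0.8, 0.99),
        "beta2": ("uniform", 0.9, 0.9999),
    },
}


def sample_param(spec, rng):
    """Draw one value according to a (kind, ...) specification tuple."""
    kind = spec[0]
    if kind == "uniform":
        return rng.uniform(spec[1], spec[2])
    if kind == "log-uniform":
        return math.exp(rng.uniform(math.log(spec[1]), math.log(spec[2])))
    if kind == "integer":
        return rng.randint(spec[1], spec[2])
    if kind == "categorical":
        return rng.choice(spec[1])
    raise ValueError(f"unknown parameter kind: {kind}")


def satisfies_constraints(config):
    """Made-up inequality constraint, only checked when both parameters are active."""
    if "beta1" in config:
        return config["beta1"] < config["beta2"]
    return True


def sample_config(rng, max_tries=1000):
    """Rejection sampling: draw configurations until one is feasible."""
    for _ in range(max_tries):
        config = {name: sample_param(spec, rng) for name, spec in SPACE.items()}
        # Activate child parameters whose condition holds.
        for (parent, value), children in CONDITIONAL.items():
            if config[parent] == value:
                config.update({n: sample_param(s, rng) for n, s in children.items()})
        if satisfies_constraints(config):
            return config
    raise RuntimeError("no feasible configuration found")


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(200)]
```

Rejection sampling becomes wasteful when the feasible region is small, which is one reason frameworks differ in how (and whether) they support constraints.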

Acquisition functions either have an analytic form or need to be approximated via Monte Carlo (MC) methods. We want to find the optimum of the acquisition function over the search space, since that is the point at which the black-box function is evaluated next. This is typically done using a gradient-free or gradient-based local optimizer, or via MC sampling, depending on the nature of the acquisition function. The search space, including its constraints, needs to be handled by this optimizer. In multi-objective BO, the black-box function has multiple outputs, and the acquisition function needs to guide the search towards exploring the Pareto front. A comparison of acquisition functions is given here: single-objective, multi-objective.
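
Expected improvement (EI) is an acquisition with an analytic form that can also be approximated by MC. For minimization at a point whose posterior is Gaussian with mean $\mu$ and standard deviation $\sigma$, given the best observed value $f^*$, it is $\mathrm{EI} = (f^* - \mu)\Phi(z) + \sigma\varphi(z)$ with $z = (f^* - \mu)/\sigma$. The sketch below (hypothetical function names, standard library only) computes it both ways and picks the next point from a small, made-up candidate set by maximizing EI.

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """Analytic EI for minimization when the posterior at a point is N(mean, std**2)."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    return (best - mean) * normal_cdf(z) + std * normal_pdf(z)


def expected_improvement_mc(mean, std, best, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the same expectation E[max(best - y, 0)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y = rng.gauss(mean, std)  # posterior sample at this point
        total += max(best - y, 0.0)
    return total / n_samples


# Made-up candidates (x, posterior mean, posterior std) and the best value so far.
candidates = [(0.1, 0.5, 0.1), (0.5, 0.2, 0.3), (0.9, 0.0, 0.05)]
best_so_far = 0.1
x_next = max(candidates, key=lambda c: expected_improvement(c[1], c[2], best_so_far))[0]
print(x_next)  # 0.9
```

In practice, the fixed candidate list is replaced by an optimizer over the search space, and MC estimates are what make acquisitions without a closed form (e.g. for batches of points) usable.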

There are a number of general-purpose BO frameworks (i.e. not focused only on hyperparameter tuning) available in Python and R. In this post, the main frameworks are compared in terms of supported features and of development and support activity.

Feature comparison

All packages in this comparison support GP surrogates with the analytic acquisitions EI/PI/CB (expected improvement, probability of improvement, confidence bound) over continuous search spaces. Beyond that, no package currently provides all options in terms of surrogates, acquisitions and search spaces, so the right choice really depends on the type of problem you want to apply BO to.

Name	Surrogates	Hyperparameter handling	Single-objective acquisitions	Multi-objective acquisitions	Search space
mlrMBO	GP, RF (mlr)	ML	EI, CB, AEI, EQI, AdaCB	DIB	continuous, integer, categorical + constraints
pyGPGO	GP (native), GBT, RF, ET (scikit-learn)	ML, MCMC (pymc3)	EI, PI, CB	-	continuous, integer
scikit-optimize	GP, RF, GBT (scikit-learn)	ML	EI, PI, CB	-	continuous, discrete, categorical + constraints
GPyOpt	GP (GPy)	ML, MCMC	EI, PI, CB	-	continuous, discrete, categorical + constraints
GPflowOpt	GP (GPflow)	ML	EI, PI, CB, MES, PF	HVPI	continuous
BoTorch	GP (GPyTorch), extensible	ML, MCMC (Pyro)	EI, PI, CB, qMES, qKG, extensible	custom scalarizations, HVEI	continuous + linear constraints
Emukit	GP (GPy), extensible	ML	EI, PI, CB, ES, MES, PF	-	continuous, integer, categorical + constraints
Dragonfly	GP (native)	ML, posterior sampling	EI, CB, PI, TTEI, TS	scalarization with CB/TS	continuous, categorical

Activity comparison

The following table gives an impression of the popularity and development activity of the frameworks, based on GitHub statistics at the time of writing. Ranked by closed issues, the top three are scikit-optimize, BoTorch (together with Ax, which builds on BoTorch) and mlrMBO; scikit-optimize and BoTorch also lead in stars, while mlrMBO has the most commits.


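import util  # local helper module of this notebook (not included here) that collects GitHub statistics per repository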

df = util.compare_repos([
    'mlr-org/mlrMBO',
    'SheffieldML/GPyOpt',
    'scikit-optimize/scikit-optimize',
    'GPflow/GPflowOpt',
    'pytorch/botorch',
    'amzn/emukit',
    'dragonfly/dragonfly',
    'josejimenezluna/pyGPGO',
    'facebook/Ax'
])
df.sort_values(by='closed_issues', ascending=False)


name	stars	forks	contributors	commits	open_issues	closed_issues	created	last_commit	license
scikit-optimize	1836	349	55	1500	150	769	2016-03-20	2020-05-18	BSD-3-Clause
botorch	1630	137	28	656	24	437	2018-07-30	2020-06-24	MIT
mlrMBO	162	40	14	1600	84	410	2013-10-23	2020-06-15	NOASSERTION
Ax	1179	113	46	608	21	318	2019-02-09	2020-06-29	MIT
emukit	218	64	20	282	31	278	2018-09-04	2020-06-03	Apache-2.0
GPyOpt	621	190	35	504	89	235	2014-08-13	2020-03-19	BSD-3-Clause
GPflowOpt	213	51	5	426	22	97	2017-04-28	2018-09-12	Apache-2.0
dragonfly	498	57	8	393	17	49	2018-04-20	2020-03-13	MIT
pyGPGO	182	47	2	292	7	18	2016-11-23	2019-06-15	MIT

GPyOpt

GPy [code] [doc]
GPyOpt [code] [doc]

GPyOpt is a BO package built on top of the hugely popular GPy library for flexible GP modeling. Both are developed at the University of Sheffield. Together with mlrMBO, GPyOpt has been around the longest. However, development of GPyOpt has somewhat stalled (last commit in March 2020 at the time of writing).

Scikit-Optimize

[code] [doc]

Scikit-Optimize is an actively developed and well-polished BO package based on the GP and tree-based models in Scikit-Learn. It supports neither integrating out the model parameters nor multi-objective optimization. Compared to GPy, the GP modeling in Scikit-Learn is rather rudimentary. Mixed search spaces and constraints are supported, as well as external, delayed and batched evaluations. For hyperparameter tuning of Scikit-Learn models, it provides BayesSearchCV as a drop-in replacement for GridSearchCV/RandomizedSearchCV.
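
External, delayed and batched evaluations are usually organized through an ask/tell interface (scikit-optimize's `Optimizer` class has `ask` and `tell` methods). The following sketch only illustrates that control flow: the class and its methods are hypothetical, and proposals are drawn at random as a stand-in for maximizing an acquisition function over a surrogate.

```python
import random


class AskTellOptimizer:
    """Hypothetical ask/tell optimizer over [low, high] (random proposals as a stand-in)."""

    def __init__(self, low, high, seed=0):
        self.low, self.high = low, high
        self.rng = random.Random(seed)
        self.pending = []          # proposed points whose results have not arrived yet
        self.xs, self.ys = [], []  # completed evaluations

    def ask(self, n_points=1):
        # A BO package would condition proposals on self.xs/self.ys and on self.pending.
        points = [self.rng.uniform(self.low, self.high) for _ in range(n_points)]
        self.pending.extend(points)
        return points

    def tell(self, x, y):
        # Results may arrive in any order and long after the corresponding ask().
        self.pending.remove(x)
        self.xs.append(x)
        self.ys.append(y)

    def best(self):
        i = min(range(len(self.ys)), key=self.ys.__getitem__)
        return self.xs[i], self.ys[i]


def objective(x):
    return (x - 0.3) ** 2


opt = AskTellOptimizer(0.0, 1.0)
batch = opt.ask(4)                # batched proposal, e.g. four parallel experiments
for x in reversed(batch[:2]):     # only two results are back, and out of order
    opt.tell(x, objective(x))
more = opt.ask(2)                 # ask again while two evaluations are still pending
for x in batch[2:] + more:
    opt.tell(x, objective(x))
x_best, y_best = opt.best()
```

How pending points are taken into account when proposing new ones is exactly where a real BO package differs from this random stand-in.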

GPflowOpt (with TF & GPflow)

GPflow [code] [doc] [paper]
GPflowOpt [code] [doc] [paper]

GPflowOpt is a package built on top of GPflow, which in turn uses TensorFlow for fast linear algebra computations with GPU support and automatic differentiation. This makes it more extensible, as different models and acquisition functions can be implemented without having to define gradients for the optimizer by hand. The top-level API is inspired by GPy.

Development of GPflowOpt appears to have stopped in 2018 (last commit on 2018-09-12).

BoTorch (with PyTorch, GPyTorch & Pyro)

BoTorch is a package built on top of GPyTorch (GP modeling), Pyro (MCMC and variational inference) and PyTorch as its computation framework. Hence, it benefits from GPU support and auto-differentiation. BoTorch is extremely flexible, e.g. in its first-class support for custom acquisition functions, which is made possible by MC integration (quasi-MC together with the reparametrization trick) and auto-differentiation for gradient-based optimization. BoTorch supports:

GP models (multi-fidelity, multi-task, ...), variational neural networks
MC handling of model parameters is in principle supported via GPyTorch and Pyro, but is not yet implemented
analytic acquisitions (EI, PI, CB) and MC acquisitions (knowledge gradient, max-value entropy search, posterior variance), cost awareness and custom acquisitions
multi-objective optimization via passing a custom PyTorch function that scalarizes the objectives to the corresponding MC acquisition
only continuous search spaces; categorical / ordinal variables need to be encoded beforehand
parameter constraints (linear inequality constraints) and outcome constraints
batched proposals
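
The MC acquisitions and the scalarized multi-objective optimization listed above rest on the reparametrization trick: posterior samples are written as $\mathbf{y} = \boldsymbol\mu + L\mathbf{z}$, with $L$ the Cholesky factor of the posterior covariance and fixed base samples $\mathbf{z}$, so the MC estimate is a deterministic function of $\boldsymbol\mu$ and $L$ that auto-differentiation can handle. The NumPy sketch below estimates batch expected improvement (qEI) for minimization. It is not BoTorch code: names are hypothetical, no gradients are computed, plain pseudo-random base samples replace quasi-MC ones, objectives are assumed independent, and the weighted sum is an arbitrary example of a user-supplied scalarization.

```python
import numpy as np


def posterior_samples(mean, covs, base_samples):
    """Reparametrized joint samples y = mean + L z for each objective.

    mean: (q, m) posterior means at q batch points for m objectives
    covs: (m, q, q) posterior covariance per objective (objectives independent)
    base_samples: (S, q, m) fixed standard-normal draws, reused across calls
    """
    q, m = mean.shape
    samples = np.empty_like(base_samples)
    for j in range(m):
        chol = np.linalg.cholesky(covs[j] + 1e-9 * np.eye(q))
        samples[:, :, j] = mean[:, j] + base_samples[:, :, j] @ chol.T
    return samples


def q_expected_improvement(samples, objective, best):
    """MC estimate of qEI: mean over samples of max(best - min over batch, 0)."""
    values = objective(samples)  # (S, q): scalar outcome per batch point
    improvement = np.maximum(best - values.min(axis=1), 0.0)
    return improvement.mean()


def identity(y):
    return y[..., 0]  # single objective: take the only output


weights = np.array([0.7, 0.3])  # arbitrary example weights


def weighted_sum(y):
    return y @ weights  # (S, q, 2) -> (S, q)


rng = np.random.default_rng(0)

# Single objective, single point: approaches the analytic EI (about 0.0763).
base_1 = rng.standard_normal((100_000, 1, 1))
s1 = posterior_samples(np.array([[0.2]]), np.array([[[0.09]]]), base_1)
ei_1 = q_expected_improvement(s1, identity, best=0.1)

# Two objectives, batch of two points, scalarized by a weighted sum.
base_2 = rng.standard_normal((100_000, 2, 2))
mean_2 = np.array([[0.2, 0.5], [0.4, 0.1]])
covs_2 = np.array([np.diag([0.09, 0.04]), np.diag([0.01, 0.05])])
qei_2 = q_expected_improvement(posterior_samples(mean_2, covs_2, base_2), weighted_sum, best=0.2)
```

Because the base samples stay fixed, evaluating the estimate at different batch locations gives a smooth function of those locations; in PyTorch the same computation yields gradients for the acquisition optimizer.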

Ax

[code] [doc] [test]

Ax is a high-level framework for Bayesian and bandit optimization. For BO, Ax relies mostly on BoTorch from the same developers; hence the entire feature set of BoTorch is available in principle, but needs to be implemented in Ax first. For bandit optimization, Thompson sampling is used. Ax provides:

a service-like API for managing and storing experiments as JSON files (local) or in SQL databases; the service runs only locally
transform pipelines to handle encoding of categorical and ordinal variables, scaling and log-transforms
limited support for parameter constraints of type $x_1 \leq x_2$ and $\sum_i x_i \leq c$ as well as output constraints
limited support for multi-objective optimization: only weighted sum scalarization and no tooling around it
examples for human-in-the-loop optimization
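
Both parameter constraint types above are linear, so they can be written as rows of a system $A\mathbf{x} \leq \mathbf{b}$; for example, $x_1 \leq x_2$ becomes $x_1 - x_2 \leq 0$. A minimal sketch with hypothetical helper names (translating the rows into any specific library's constraint format is not shown):

```python
def order_constraint(names, lower, upper):
    """x[lower] <= x[upper], written as the row (+1 at lower, -1 at upper) <= 0."""
    row = [0.0] * len(names)
    row[names.index(lower)] = 1.0
    row[names.index(upper)] = -1.0
    return row, 0.0


def sum_constraint(names, members, bound):
    """Sum of x[i] over the given members <= bound."""
    row = [1.0 if name in members else 0.0 for name in names]
    return row, bound


def is_feasible(x, constraints, tol=1e-12):
    """Check A x <= b row by row."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= b + tol for row, b in constraints)


names = ["x1", "x2", "x3"]
constraints = [order_constraint(names, "x1", "x2"), sum_constraint(names, names, 1.0)]
print(is_feasible([0.1, 0.3, 0.5], constraints))  # True
print(is_feasible([0.4, 0.2, 0.1], constraints))  # False: x1 > x2
print(is_feasible([0.3, 0.4, 0.5], constraints))  # False: sum is 1.2 > 1
```

Written this way, the constraints can be passed on to a backend that supports linear inequality constraints, such as the BoTorch support listed above.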

Overall, I think Ax still needs to mature.

Emukit

[code] [doc] [paper]

Emukit is a high-level framework for Bayesian optimization and Bayesian quadrature. It is intended to be independent of the modeling framework, but provides first-class support only for GPy; similar built-in support for other frameworks is apparently not planned. Mixed search spaces and constraints are supported; multi-objective optimization currently is not.

Emukit provides abstraction layers for the individual components of Bayesian optimization, so that algorithms can be implemented independently of the concrete modeling framework. While the idea is intriguing, it gives up concepts like auto-differentiation that make GPflowOpt and BoTorch powerful. Instead, by focusing on GPy, Emukit seems to end up as a replacement for GPyOpt.
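
The abstraction-layer idea can be sketched with a structural interface: an acquisition function is written once against a minimal model interface and works with any surrogate that implements it. The `Model` protocol, the toy nearest-neighbour surrogate and the acquisition class are hypothetical and only loosely in the spirit of Emukit's design, not its actual API. The acquisition uses only `predict`; gradients with respect to the inputs would have to be supplied separately by each backend, which is where the auto-differentiation point above comes in.

```python
from typing import List, Protocol, Sequence, Tuple


class Model(Protocol):
    """Minimal framework-independent surrogate interface (hypothetical)."""

    def set_data(self, xs: Sequence[float], ys: Sequence[float]) -> None: ...

    def predict(self, xs: Sequence[float]) -> Tuple[List[float], List[float]]: ...


class NearestNeighbourModel:
    """Toy surrogate: mean = y of the nearest observation, variance = distance to it."""

    def __init__(self) -> None:
        self.xs: List[float] = []
        self.ys: List[float] = []

    def set_data(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)

    def predict(self, xs):
        means, variances = [], []
        for x in xs:
            dist, y = min((abs(x - xi), yi) for xi, yi in zip(self.xs, self.ys))
            means.append(y)
            variances.append(dist)
        return means, variances


class LowerConfidenceBound:
    """Acquisition written only against the Model interface (minimization)."""

    def __init__(self, model: Model, kappa: float = 1.0) -> None:
        self.model = model
        self.kappa = kappa

    def evaluate(self, xs):
        means, variances = self.model.predict(xs)
        return [m - self.kappa * v ** 0.5 for m, v in zip(means, variances)]


model = NearestNeighbourModel()
model.set_data([0.0, 1.0], [1.0, 0.0])
values = LowerConfidenceBound(model).evaluate([0.0, 0.4, 1.0])
```

A GP from any library could be wrapped behind the same two methods without touching `LowerConfidenceBound`.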

Dragonfly

[code] [doc] [paper]

Dragonfly is a package developed at Carnegie Mellon University. It has native implementations of GPs with the typical kernels, an optimizer (DOO), and MCMC samplers copied from pymc3 and pgmpy (Metropolis, Slice, NUTS, HMC). Multi-objective optimization is supported via random scalarizations; constraints are not. Note that Thompson sampling from GPs seems to be incorrectly implemented, as points are sampled without keeping track of (i.e. conditioning on) the previously sampled points.
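
To illustrate what the Thompson sampling note is about: a Thompson sample is a single function drawn from the GP posterior, so its values at different candidate points must be drawn jointly (or each new value conditioned on the ones already drawn). The sketch below is not Dragonfly's code. For illustration only, it assumes a zero posterior mean and a squared-exponential posterior covariance on a grid, and compares a joint draw with the flawed variant that samples each point independently from its marginal.

```python
import numpy as np


def rbf_cov(xs, lengthscale=0.2):
    """Squared-exponential covariance on a 1-D grid (stand-in for a GP posterior)."""
    d = xs[:, None] - xs[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def thompson_joint(mean, cov, rng):
    """Draw one posterior sample path jointly over all candidates; return its argmin."""
    chol = np.linalg.cholesky(cov + 1e-8 * np.eye(len(mean)))
    path = mean + chol @ rng.standard_normal(len(mean))
    return int(np.argmin(path)), path


def thompson_independent(mean, cov, rng):
    """Flawed variant: each candidate drawn from its marginal, ignoring correlations."""
    path = mean + np.sqrt(np.diag(cov)) * rng.standard_normal(len(mean))
    return int(np.argmin(path)), path


def roughness(path):
    """Mean absolute difference between neighbouring grid values."""
    return float(np.mean(np.abs(np.diff(path))))


grid = np.linspace(0.0, 1.0, 101)
mean, cov = np.zeros_like(grid), rbf_cov(grid)
rng = np.random.default_rng(0)
i_joint, joint_path = thompson_joint(mean, cov, rng)
i_indep, indep_path = thompson_independent(mean, cov, rng)
print(roughness(joint_path), roughness(indep_path))  # small vs. large
```

The joint draw is a smooth function consistent with the posterior, whereas the independent draws form a jagged path whose argmin is mostly noise.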

Other projects

fmfn/BayesianOptimization - Small BO package using GPs from Scikit-Learn
Cornell-MOE - Relatively old Python / C++ package for BO
ProcessOptimizer - Fork of Scikit-Optimize for optimizing real-world processes. Provides classes for composable constraints. However, only rejection sampling is implemented.
Phoenics - BO package using kernel density estimates and a specific multi-objective acquisition called CHIMERA
TS-EMO - Matlab implementation of the TS-EMO algorithm
