Compromise-free Bayesian neural networks

Kavli Affiliate: Anthony Lasenby

| First 5 Authors: Kamran Javid, Will Handley, Mike Hobson, Anthony Lasenby

| Summary:

We conduct a thorough analysis of the relationship between the out-of-sample
performance and the Bayesian evidence (marginal likelihood) of Bayesian neural
networks (BNNs), and also examine the performance of ensembles of BNNs, both
using the Boston housing dataset. Using the state-of-the-art in nested
sampling, we numerically sample the full (non-Gaussian and multimodal) network
posterior and obtain numerical estimates of the Bayesian evidence, considering
network models with up to 156 trainable parameters. The networks have between
zero and four hidden layers, either $\tanh$ or ReLU activation functions, and
with and without hierarchical priors. The ensembles of BNNs are obtained by
determining the posterior distribution over networks: the posterior samples of
the individual BNNs are re-weighted by the associated Bayesian evidence values.
We find a good correlation between out-of-sample performance and evidence, as
well as a remarkable symmetry between the evidence-versus-model-size and
out-of-sample-performance-versus-model-size planes. Networks with ReLU
activation functions have consistently higher evidences than those with
$\tanh$ activation functions, and this is reflected in their out-of-sample
performance. Ensembling over architectures further improves performance
relative to the individual BNNs.
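
To make the evidence-weighted ensembling step concrete, here is a minimal sketch (not the authors' code; the function name and array layout are assumptions, and a flat prior over network architectures is assumed) of how per-network posterior predictive draws could be pooled with weights proportional to each network's evidence $Z_i$:

```python
import numpy as np

def evidence_weighted_ensemble(pred_samples, log_evidences, rng=None):
    """Illustrative sketch: pool posterior predictive draws from several BNNs,
    re-weighting each network's draws by its Bayesian evidence.

    pred_samples  : list of arrays, one per network, each of shape
                    (n_draws_i, n_test) -- posterior predictive draws.
    log_evidences : array of log-evidence estimates (log Z_i), one per network.
    Returns a pooled array of predictive draws for the ensemble.
    """
    rng = np.random.default_rng() if rng is None else rng
    log_z = np.asarray(log_evidences, dtype=float)
    # Posterior over networks (flat model prior assumed): P(M_i | D) proportional to Z_i.
    weights = np.exp(log_z - log_z.max())
    weights /= weights.sum()

    n_total = sum(len(s) for s in pred_samples)
    pooled = []
    for samples, w in zip(pred_samples, weights):
        # Resample each network's draws in proportion to its posterior weight.
        n_keep = int(round(w * n_total))
        if n_keep == 0:
            continue
        idx = rng.integers(0, len(samples), size=n_keep)
        pooled.append(np.asarray(samples)[idx])
    return np.concatenate(pooled, axis=0)
```

The ensemble's mean prediction and credible intervals can then be read off the pooled draws, e.g. with `np.mean` and `np.percentile` along the draw axis.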

| Search Query: ArXiv Query: search_query=au:"Anthony Lasenby"&id_list=&start=0&max_results=10
