Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data


Michael C. Hughes, Emily B. Fox, and Erik B. Sudderth


Applications of Bayesian nonparametric methods require learning and inference algorithms which efficiently explore models of unbounded complexity. We develop new Markov chain Monte Carlo methods for the beta process hidden Markov model (BP-HMM), enabling discovery of shared activity patterns in large video and motion capture databases. By introducing split-merge moves based on sequential allocation, we allow large global changes in the shared feature structure. We also develop data-driven reversible jump moves which more reliably discover rare or unique behaviors. Our proposals apply to any choice of conjugate likelihood for observed data, and we show success with multinomial, Gaussian, and autoregressive emission models. Together, these innovations allow tractable analysis of hundreds of time series, where previous inference required clever initialization and lengthy burn-in periods for just six sequences.


The BP-HMM is a generative probabilistic model of sequential data: it explains how a collection of sequences came to be. Each individual sequence is modeled as an HMM whose hidden states are drawn from a global library of possible states (which we also call behaviors or features). Each timestep of a sequence is assigned to one state, and that state provides a model for the data observed at that timestep. For example, in a simple time series of real values, each state's emission model would be a Gaussian with a specific mean and variance.
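To make the generative story concrete, here is a minimal sketch (not the authors' code) of sampling one such sequence: hidden states index into a small behavior library of Gaussian emission models. All parameter values are assumptions chosen purely for illustration.

```python
import numpy as np

# Assumed toy behavior library: one Gaussian emission model per behavior.
behavior_means = np.array([-2.0, 0.0, 3.0])   # mean for each behavior
behavior_stds = np.array([0.7, 1.0, 0.5])     # std deviation for each behavior

# Assumed transition probabilities among the behaviors (each row sums to 1).
trans = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])

def sample_sequence(T, trans, means, stds, seed=0):
    """Sample T hidden states and Gaussian observations from a toy HMM."""
    rng = np.random.default_rng(seed)
    K = len(means)
    z = rng.integers(K)               # initial state, uniform for simplicity
    states, obs = [], []
    for _ in range(T):
        states.append(int(z))                       # record the behavior label
        obs.append(rng.normal(means[z], stds[z]))   # emit from that behavior's Gaussian
        z = rng.choice(K, p=trans[z])               # step the Markov chain
    return states, obs

states, obs = sample_sequence(100, trans, behavior_means, behavior_stds)
```

In the full BP-HMM, each sequence has its own transition parameters and uses only the subset of the global library indicated by its feature vector; this sketch fixes both for brevity.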

The BP-HMM is a nonparametric model, which means the size of the global behavior library is learned from data, so we need not specify in advance how many behaviors are relevant. The central inference problem is to explore possible sets of behaviors and determine which are most likely.
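The beta process prior on feature assignments has the Indian Buffet Process as its marginal, which gives intuition for how the library size adapts to the data: popular behaviors are reused, and each new sequence may also invent a few brand-new ones. A hedged sketch (the concentration value and helper name are assumptions, not from the paper):

```python
import numpy as np

def sample_ibp(num_sequences, alpha, seed=0):
    """Indian Buffet Process sketch: returns a binary matrix F where
    F[i][k] = 1 means sequence i uses behavior k.  The number of columns
    (behaviors) is not fixed in advance; it grows with the data."""
    rng = np.random.default_rng(seed)
    counts = []       # how many sequences use each existing behavior
    used_sets = []    # behaviors chosen by each sequence
    for i in range(num_sequences):
        # Reuse each existing behavior with probability proportional to its popularity.
        used = {k for k, m in enumerate(counts) if rng.random() < m / (i + 1)}
        for k in used:
            counts[k] += 1
        # Invent a Poisson-distributed number of brand-new behaviors.
        for _ in range(rng.poisson(alpha / (i + 1))):
            counts.append(1)
            used.add(len(counts) - 1)
        used_sets.append(used)
    K = len(counts)
    return [[1 if k in used else 0 for k in range(K)] for used in used_sets]

F = sample_ibp(10, alpha=2.0)   # alpha is an assumed hyperparameter value
```

Inference over the BP-HMM must explore exactly this kind of variable-size feature matrix, which is why purely local updates can get stuck.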

The fundamental contribution of this work is to improve inference algorithms for fitting this model to real data. Most previous techniques rely on very local updates, such as Gibbs samplers, which change a small subset of variables while holding all others fixed. We develop new methods that change many variables at once; for example, a single move can create a new behavior and add it to every sequence. This yields much faster mixing toward the true posterior distribution.
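The flavor of such a global move can be illustrated with a deliberately simplified Metropolis-Hastings merge step on a toy Gaussian model. This is only for intuition: a real BP-HMM split-merge move compares full model posteriors and includes the sequential-allocation proposal probabilities, while the sketch below compares likelihoods only, with a known unit variance (both assumptions).

```python
import math
import random

random.seed(0)

def log_lik(data, mean, var=1.0):
    """Gaussian log-likelihood with known unit variance (toy assumption)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
               for x in data)

def merge_move(cluster_a, cluster_b):
    """Toy merge proposal: collapse two behaviors into one, and accept with
    a Metropolis-Hastings-style ratio of merged vs. split configurations."""
    merged = cluster_a + cluster_b
    log_split = (log_lik(cluster_a, sum(cluster_a) / len(cluster_a))
                 + log_lik(cluster_b, sum(cluster_b) / len(cluster_b)))
    log_merged = log_lik(merged, sum(merged) / len(merged))
    accept_prob = min(1.0, math.exp(log_merged - log_split))
    return random.random() < accept_prob, accept_prob

# Two behaviors with nearly identical means: the merge is almost always accepted.
_, p_same = merge_move([0.1, -0.2, 0.05], [0.0, 0.15, -0.1])
# Two well-separated behaviors: the merge is essentially never accepted.
_, p_diff = merge_move([-5.0, -5.1, -4.9], [5.0, 5.1, 4.9])
```

The key property is that the entire reassignment is accepted or rejected as one block, so redundant behaviors can be eliminated in a single step rather than by a long chain of local moves.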

Matlab Code

Full Text [PDF]

Selected References

E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky, "Sharing Features among Dynamical Systems with Beta Processes," Proceedings of Neural Information Processing Systems, Vancouver, Canada, December 2009.

S. Jain and R. M. Neal, "Splitting and Merging Components of a Nonconjugate Dirichlet Process Mixture Model," Bayesian Analysis 2(3), 2007.

Mike Hughes,
Nov 9, 2012, 1:38 PM