Research

My PhD research applies techniques from statistical machine learning (particularly hierarchical Bayesian models and nonparametric extensions) to create automated methods for extracting meaningful information from multimedia. I study theoretical methods for efficient inference in graphical models (e.g. MCMC sampling, variational methods) as well as practical ways to apply these methods to real-world data such as video clips or text documents.
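As a concrete (and deliberately tiny) illustration of the MCMC side of this work, here is a sketch of Gibbs sampling for a toy two-cluster Gaussian mixture with known variances and equal mixing weights. Everything here (the data, priors, and number of sweeps) is made up for illustration; it is not code from any of my projects.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: two well-separated 1D Gaussian clusters with known unit variance.
    x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
    N, K = x.size, 2
    sigma2, tau2 = 1.0, 10.0       # likelihood variance, prior variance on cluster means

    mu = rng.normal(0.0, 1.0, K)   # unknown cluster means
    z = rng.integers(K, size=N)    # unknown cluster assignments

    for sweep in range(200):
        # 1) Resample each assignment from its conditional posterior
        #    (equal mixing weights, so only the Gaussian likelihood matters).
        logp = -0.5 * (x[:, None] - mu[None, :]) ** 2 / sigma2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = (rng.random(N)[:, None] > np.cumsum(p, axis=1)).sum(axis=1)

        # 2) Resample each mean from its conjugate Gaussian posterior.
        for k in range(K):
            xk = x[z == k]
            prec = xk.size / sigma2 + 1.0 / tau2
            mu[k] = rng.normal(xk.sum() / sigma2 / prec, np.sqrt(1.0 / prec))

    print("posterior draw of the cluster means:", np.sort(mu))

Each sweep alternates between resampling the assignments and the cluster means from their conditional posteriors; the models I actually work with replace this toy mixture with hierarchical and nonparametric priors.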

I'm part of the Learning, Inference, and Vision group at Brown, advised by Prof. Erik Sudderth.

PUBLICATIONS

in AISTATS 2015

We develop a new objective function for the hierarchical Dirichlet process (HDP) topic model that allows merge and delete moves to remove ineffective topics during training.

We develop new learning algorithms for scalable variational inference, including new birth and merge moves: training starts with just one cluster and grows the model as needed.
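(What follows is not the algorithm from this work; it is just a minimal scikit-learn sketch of the "start with one cluster and grow as needed" idea, using EM-fit Gaussian mixtures and BIC as a crude stand-in for the variational objective and the birth/merge moves. The synthetic data and all settings are illustrative.)

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Synthetic data with three well-separated 2D clusters.
    X = np.vstack([rng.normal(c, 0.3, size=(150, 2)) for c in (-2.0, 0.0, 2.5)])

    # Start with a single cluster and greedily add components
    # while the model-selection score (BIC) keeps improving.
    best = GaussianMixture(n_components=1, random_state=0).fit(X)
    while True:
        cand = GaussianMixture(n_components=best.n_components + 1,
                               random_state=0).fit(X)
        if cand.bic(X) >= best.bic(X):   # no improvement: stop growing
            break
        best = cand

    print("selected number of clusters:", best.n_components)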

We develop new data-driven inference methods that enable unsupervised behavior discovery in hundreds of motion capture sequences.


Nonparametric Metadata-Dependent Relational Model [PDF]

We find community structure in social & ecological networks, using metadata like age or organism type to improve community discovery.

in POCV 2012 (a workshop at CVPR)

Across many videos, we identify short segments showing the same human activity (motion and appearance), without predefining relevant activities or even their number.


TECH REPORTS



Sampling From Truncated Normal


A detailed analysis of algorithms for sampling from a truncated normal distribution. Email me if you're interested in the code.
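For a flavor of the topic, here is a minimal inverse-CDF sampler (a sketch, not the report's code), cross-checked against scipy.stats.truncnorm. This simple approach is fine when the truncation interval sits near the mean, but it loses accuracy when the interval lies far out in the tail, which is exactly where more careful algorithms are needed.

    import numpy as np
    from scipy.stats import norm, truncnorm

    def truncnorm_invcdf(mu, sigma, lo, hi, size, rng):
        """Inverse-CDF sampler: draw uniforms over the CDF mass between the bounds."""
        a, b = (lo - mu) / sigma, (hi - mu) / sigma       # standardized bounds
        u = rng.uniform(norm.cdf(a), norm.cdf(b), size)
        return mu + sigma * norm.ppf(u)

    rng = np.random.default_rng(0)
    x = truncnorm_invcdf(mu=0.0, sigma=1.0, lo=1.0, hi=3.0, size=5000, rng=rng)

    # Cross-check against scipy's reference implementation
    # (bounds are already standardized here since loc=0 and scale=1).
    y = truncnorm.rvs(1.0, 3.0, loc=0.0, scale=1.0, size=5000, random_state=0)
    print(x.mean(), y.mean())   # both should be close to the true truncated mean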

CLASS PROJECTS



Using machine learning, can we take as input the "clicks" and "clacks" produced by a user typing at a keyboard and recover the typed text? I review modern techniques and make several suggestions. I also collected my own dataset of keyboard audio recordings paired with ground-truth text. Please email me if you're interested.
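The rough shape of such a pipeline, as a generic sketch (not the project's code; the file names, threshold, window lengths, and label file below are all placeholders): detect keystroke onsets from the short-time energy of the recording, cut a fixed window around each onset, describe it with a normalized magnitude spectrum, and train an off-the-shelf classifier to map windows to keys.

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import find_peaks
    from sklearn.svm import SVC

    # Placeholder recording of a typing session plus per-keystroke labels.
    sr, audio = wavfile.read("typing_session.wav")
    audio = audio.astype(np.float64)
    if audio.ndim > 1:                                  # mix down to mono if needed
        audio = audio.mean(axis=1)
    audio /= np.abs(audio).max()

    # 1) Detect keystroke onsets as peaks in short-time energy.
    win = int(0.01 * sr)                                # 10 ms energy window
    energy = np.convolve(audio ** 2, np.ones(win) / win, mode="same")
    peaks, _ = find_peaks(energy, height=0.05, distance=int(0.1 * sr))

    # 2) One fixed-length spectral feature vector per detected keystroke.
    half = int(0.05 * sr)                               # 50 ms on each side of the peak
    feats = []
    for p in peaks:
        clip = audio[max(p - half, 0): p + half]
        spec = np.abs(np.fft.rfft(clip, n=2 * half))
        feats.append(spec / (spec.sum() + 1e-12))
    X = np.array(feats)

    # 3) Train a per-key classifier on labeled keystrokes
    #    (in practice, evaluate on a held-out recording).
    labels = np.load("keystroke_labels.npy")            # one character label per peak
    clf = SVC(kernel="rbf").fit(X, labels)
    print("predicted keys:", "".join(clf.predict(X)))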

I compare local space-time descriptors (HOG, HOF) with static scene descriptors (GIST, dense SIFT). Surprisingly, for 1/3 of all tested actions, the global cues perform better.
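A rough sketch of one such comparison (not the project's code; the data files are placeholders, and a coarsely downsampled frame stands in here for a global scene descriptor like GIST): extract each descriptor per frame and compare cross-validated classification accuracy.

    import numpy as np
    from skimage.feature import hog
    from skimage.transform import resize
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score

    # Placeholder arrays: grayscale frames (N, H, W) and per-frame action labels.
    frames = np.load("action_frames.npy")
    labels = np.load("action_labels.npy")

    # Local gradient descriptor (HOG) vs. a crude global "scene" descriptor.
    X_hog = np.array([hog(f, orientations=9, pixels_per_cell=(16, 16),
                          cells_per_block=(2, 2)) for f in frames])
    X_glob = np.array([resize(f, (16, 16), anti_aliasing=True).ravel() for f in frames])

    for name, X in [("HOG", X_hog), ("global", X_glob)]:
        acc = cross_val_score(LinearSVC(), X, labels, cv=5).mean()
        print(name, "accuracy:", round(acc, 3))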