As a rising undergraduate senior, I happily became a summer intern at MIT Lincoln Laboratory. I joined a team of three full-time staff members led by Dr. Jason Thornton, tasked with assessing and improving current techniques for automated video surveillance of wide-area critical infrastructure sites. My group's overall goal that summer was to develop an automated multi-camera tracking system that could follow a target across an entire facility. Working largely on my own with occasional guidance, I tackled the problem of predicting in which camera a moving target will appear next after exiting a given view. This is especially difficult when views do not overlap, which is common in real-world deployments. The process of inferring the relative positions of cameras along common pedestrian routes is known as activity topology estimation.
Aaron Yahr, Technical staff, MIT Lincoln Laboratory
Makris et al. provide a novel method for recovering the topology of a non-overlapping camera network. Their method establishes that two cameras view a common route by detecting peaks in the cross-correlation of their respective entry and exit timing signals. When a linked camera pair is detected, both a transition probability and a transition-time distribution are recovered. They showed successful recovery of the ground-truth topology in a six-camera network for transition gaps as large as 10 seconds.
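To illustrate the core of the cross-correlation idea, here is a minimal Python sketch. It is a simplification for illustration, not Makris et al.'s code: it assumes each camera's exit and entry events have already been binned into per-frame counts, and the `detect_link` rule and its `z` parameter are hypothetical choices.

```python
import numpy as np


def transition_cross_correlation(exits_a, entries_b, max_lag):
    """Cross-correlate exit events at camera A with entry events at camera B.

    exits_a, entries_b: equal-length 1-D sequences of per-frame event counts.
    Returns r, where r[k] is the mean product of the mean-centered signals
    when each exit at frame t is paired with the entry signal at frame t + k.
    """
    # np.array copies the input, so centering below never mutates the caller's data.
    a = np.array(exits_a, dtype=float)
    b = np.array(entries_b, dtype=float)
    # Remove the means so that two busy but unrelated cameras do not show
    # a positive correlation at every lag.
    a -= a.mean()
    b -= b.mean()
    n = len(a)
    r = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        r[k] = np.dot(a[: n - k], b[k:]) / (n - k)
    return r


def detect_link(r, z=4.0):
    """Hypothetical rule: link the pair if the correlation peak stands more
    than z standard deviations above the mean over all lags."""
    peak = int(np.argmax(r))
    return bool(r[peak] > r.mean() + z * r.std()), peak
```

For a detected link, the lag of the peak estimates the typical transition time, and the spread of `r` around the peak gives a rough picture of the transition-time distribution.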
To enable camera-to-camera transition prediction, I built software that recovers the topology of a camera network. I extended Makris et al.'s cross-correlation approach for inferring camera-to-camera transition-time distributions so that it requires only foreground motion detection data rather than the output of a single-camera object tracker.
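The tracker-free input can be sketched as follows. This is a simplified illustration, not necessarily the exact pipeline used in this project: it assumes background subtraction yields a boolean foreground mask per frame, that each access region is given as a boolean image mask, and that `min_fraction` is a hypothetical occupancy threshold. A frame where a region flips from empty to occupied counts as an onset, and the reverse as an offset.

```python
import numpy as np


def region_events(fg_masks, region, min_fraction=0.05):
    """Turn per-frame foreground masks into onset/offset events for one
    access region, with no object tracker involved.

    fg_masks: boolean array of shape (frames, height, width).
    region:   boolean array of shape (height, width) marking the access region.
    min_fraction: hypothetical fraction of region pixels that must be
                  foreground for the region to count as occupied.
    Returns (onsets, offsets): per-frame 0/1 arrays.
    """
    fg = np.asarray(fg_masks, dtype=bool)
    reg = np.asarray(region, dtype=bool)
    # Fraction of the region's pixels that are foreground in each frame.
    fraction = fg[:, reg].mean(axis=1)
    occupied = (fraction >= min_fraction).astype(int)
    # Prepend "empty" so a region occupied in frame 0 registers an onset.
    change = np.diff(occupied, prepend=0)
    onsets = (change == 1).astype(int)
    offsets = (change == -1).astype(int)
    return onsets, offsets
```

Offsets from an exit region in one camera and onsets from an entry region in another can then serve as the timing signals for cross-correlation.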
A major focus of my project was evaluating my system's performance in sparse and dense traffic scenarios on networks of four to five cameras. I concluded that, given only minutes of footage, my method could infer roughly half of the ground-truth topological links in sparse-traffic facilities and could detect some wide-gap links even in dense traffic. Most link-detection errors were attributable to poor background subtraction or heavy occlusion. Pushed by my group leader to quantify exactly how much these estimated links helped with the track handoff problem, I showed that in sparse-traffic scenes my procedure could predict a given target's transition to a neighboring, non-overlapping camera with nearly 100% accuracy. I also showed that the predicted time window captured very few confuser objects (one or two on average) besides the target, making the correspondence decision tractable.
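The handoff evaluation can be sketched in a few lines. Purely for illustration, the sketch assumes the learned transition-time distribution is summarized by a mean and a standard deviation, and that the prediction window spans `k` standard deviations around the mean; these names and the window shape are hypothetical. A handoff counts as correct when the true target enters the neighboring camera inside the window, and every other entry inside the window is a confuser.

```python
import numpy as np


def handoff_window(exit_time, mean_gap, std_gap, k=2.0):
    """Predicted arrival window at the neighboring camera (seconds).

    Hypothetical model: the window spans k standard deviations on either
    side of the mean transition time learned for this camera pair.
    """
    center = exit_time + mean_gap
    return center - k * std_gap, center + k * std_gap


def evaluate_handoff(exit_time, true_entry_time, other_entry_times,
                     mean_gap, std_gap, k=2.0):
    """Return (hit, confusers): whether the target's real entry falls inside
    the window, and how many other entries also fall inside it."""
    lo, hi = handoff_window(exit_time, mean_gap, std_gap, k)
    hit = lo <= true_entry_time <= hi
    others = np.asarray(other_entry_times, dtype=float)
    confusers = int(np.count_nonzero((others >= lo) & (others <= hi)))
    return hit, confusers
```

Averaging `hit` over many exits gives the handoff accuracy, and averaging `confusers` gives the expected number of candidates the correspondence step must rule out.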
First, while my evaluation concluded that the cross-correlation approach of Makris et al. performed reasonably given the limited footage (only 8 minutes) and the poor quality of background subtraction, I knew it was probably not the most theoretically rigorous method for statistical topology inference. I soon found a more principled approach in Tieu et al., out of MIT's CSAIL vision laboratory.
Building on Makris et al.'s approach to non-overlapping cameras, Tieu et al. propose an information-theoretic correspondence detection procedure with several advantages over cross-correlation, since mutual information measures general statistical dependence rather than only linear correlation. I suspect that the mutual-information approach would outperform cross-correlation in a head-to-head comparison, though this remains to be tested.
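To make the contrast with cross-correlation concrete, the sketch below scores each candidate lag by the mutual information between a binary exit signal at one camera and a lagged binary entry signal at another. It only illustrates mutual information as a dependence measure; Tieu et al.'s actual procedure is more involved, and the plug-in estimator and function names here are illustrative assumptions.

```python
import numpy as np


def binary_mutual_information(x, y):
    """Plug-in estimate of I(X; Y), in bits, for two equal-length 0/1 sequences."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    # Empirical joint distribution over the four (x, y) outcomes.
    joint = np.zeros((2, 2))
    np.add.at(joint, (x, y), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of X, shape (2, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of Y, shape (1, 2)
    independent = px @ py                  # joint distribution if X and Y were independent
    nz = joint > 0                         # 0 * log 0 terms contribute nothing
    return float(np.sum(joint[nz] * np.log2(joint[nz] / independent[nz])))


def lagged_mutual_information(exits_a, entries_b, max_lag):
    """MI between exits at camera A at frame t and entries at camera B at frame t + k."""
    a = np.asarray(exits_a, dtype=int)
    b = np.asarray(entries_b, dtype=int)
    n = len(a)
    return np.array(
        [binary_mutual_information(a[: n - k], b[k:]) for k in range(max_lag + 1)]
    )
```

The lag with the highest mutual information plays the same role as the cross-correlation peak, but a strong score no longer depends on the dependence being linear.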
Second, a fundamental limitation of both Makris et al.'s and Tieu et al.'s methods in practice is scalability. Both approaches require intensive statistical calculations for every pair of access regions, a cost that grows quadratically with the number of regions and quickly becomes intractable for the hundreds of cameras found in real-world facilities. Addressing this weakness, van den Hengel et al. propose an inexpensive topology estimation method explicitly designed to scale. Their crucial insight is that most camera pairs never observe the same path at all, and ruling a pair out is much simpler than fully estimating its transition likelihood and time distribution. The authors demonstrate feasible computational performance on a network of a hundred cameras, though they leave accuracy against ground-truth topology unevaluated. I suggest that combining this coarse exclusion process with the finer accuracy of the full statistical methods, applied only to the pairs that survive exclusion, might yield a scalable way to recover topology that supports effective tracking in networks of hundreds or thousands of cameras.
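The scaling problem is easy to quantify: N access regions yield N(N - 1) ordered pairs, so 400 regions already mean 159,600 pairs, each needing a full statistical test in the correlation-based methods. The sketch below illustrates a cheap exclusion test in the spirit of van den Hengel et al. (a simplification, not their published algorithm): a pair (i, j) is contradicted whenever region i is occupied while region j stays empty for the whole window of plus or minus `pad` frames. The input layout, `pad`, and the `max_exclusions` tolerance are hypothetical; `pad` would need to cover the largest transition gap of interest.

```python
import numpy as np


def exclusion_counts(occupancy, pad):
    """counts[i, j] = number of frames where region i is occupied while
    region j is empty for every frame in t - pad .. t + pad.

    occupancy: boolean array of shape (frames, regions).
    pad: half-width of the tolerance window in frames (hypothetical value).
    """
    occ = np.asarray(occupancy, dtype=bool)
    # Dilate each region's occupancy in time: seen_nearby[t, j] is True if
    # region j was occupied at any frame within pad of t.
    seen_nearby = occ.copy()
    for s in range(1, pad + 1):
        seen_nearby[s:] |= occ[:-s]   # occupied s frames earlier
        seen_nearby[:-s] |= occ[s:]   # occupied s frames later
    # One integer matrix product counts contradictions for all pairs at once.
    return occ.T.astype(int) @ (~seen_nearby).astype(int)


def surviving_links(occupancy, pad, max_exclusions=0):
    """Boolean matrix of pairs that are NOT excluded; max_exclusions is a
    hypothetical tolerance for noisy foreground detection."""
    keep = exclusion_counts(occupancy, pad) <= max_exclusions
    np.fill_diagonal(keep, False)  # a region is not linked to itself
    return keep
```

Only pairs that survive exclusion would then be passed to the more expensive cross-correlation or mutual-information estimate, so the costly step runs on a small fraction of all pairs.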
Scalable topology recovery would bring us closer to a functional facility-wide tracking system, which could dramatically improve the protection of secure facilities by allowing officers to track suspects throughout a facility in real time, even across large gaps in coverage. Effective camera-to-camera route prediction will be a crucial step towards this goal.
Improving topology estimation remains a critical requirement for achieving facility-wide tracking systems. A reliable multi-camera tracker would revolutionize security at airports and other critical infrastructure sites. Officers could follow active suspects in real time, even across gaps in coverage, and could reconstruct past activity to determine a suspect's point of entry and identify collaborators. Automation would dramatically improve the speed and accuracy of post-incident investigations and help prevent nascent attacks from materializing.
Finally, an operational multi-camera tracker could serve as a test bed for deeper analysis of surveillance video. For example, investigators could build a question-answering system (“Who left the south exit after 5PM?”) on top of the tracker. Access to such a test bed would spur progress in academic research and could improve facility security as well.
Tieu, Dalley, & Grimson. Inference of Non-Overlapping Camera Network Topology by Measuring Statistical Dependence. International Conference on Computer Vision, 2005.
van den Hengel, Dick, & Hill. Activity Topology Estimation for Large Networks of Cameras. Proceedings of the IEEE International Conference on Video and Signal Based Surveillance, 2006.