PROJECT BACKGROUND
As a rising undergraduate senior, I happily became a summer intern at MIT Lincoln
Laboratory. I joined a team of three full-time
staff led by Dr. Jason Thornton, tasked with assessing and improving current techniques
for automated video surveillance of wide-area critical infrastructure sites. My group’s overall goal that summer was
to invent an automated multi-camera tracking system that could follow a target across an entire facility. Working
alone with occasional guidance, I tackled the problem of predicting which
camera a moving target will appear in next after exiting a given view. This is
especially difficult when views do not overlap, which often happens in
real-world implementations. The process of inferring the relative positions of cameras along common pedestrian routes is known as activity topology estimation.
MENTORS
Jason Thornton, Technical staff, MIT Lincoln Laboratory
Aaron Yahr, Technical staff, MIT Lincoln Laboratory
PREVIOUS RESEARCH
Makris et al. [1] provide a novel method for
recovering topology of a non-overlapping camera network. Their method
establishes that two cameras view a common route by detecting patterns in the
cross-correlation of their respective entry and exit timing signals. When a related camera pair is detected, both a
transition probability and a time distribution are recovered. They showed
successful detection of ground truth topology in a six-camera network for
transition gaps as large as 10 seconds.
METHODS
To predict camera-to-camera transitions, I built software that could recover the topology of a camera network. I extended Makris et al.’s cross-correlation
approach for inferring camera-to-camera transition time
distributions so that it required only foreground motion detection data rather
than a single-camera object tracker.
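The cross-correlation idea can be illustrated with a minimal sketch. This is not the software I built at Lincoln Laboratory. It assumes each camera's activity has already been reduced to a per-second count of exits or entries, and the function names, the `max_lag` parameter, and the peak rule (peak above mean plus `threshold` standard deviations) are illustrative assumptions.

```python
from typing import List, Tuple


def cross_correlation(exits: List[int], entries: List[int], max_lag: int) -> List[int]:
    """For each lag in 0..max_lag, sum exits[t] * entries[t + lag] over all valid t."""
    n = min(len(exits), len(entries))
    return [
        sum(exits[t] * entries[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def detect_link(
    exits: List[int], entries: List[int], max_lag: int, threshold: float = 2.0
) -> Tuple[bool, int]:
    """Return (linked, best_lag).

    Assumed rule: the pair is linked when the correlation peak exceeds
    the mean score by more than `threshold` standard deviations.
    """
    scores = cross_correlation(exits, entries, max_lag)
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    peak = max(scores)
    best_lag = scores.index(peak)
    linked = std > 0 and peak > mean + threshold * std
    return linked, best_lag


# Synthetic example: a target exits camera A every 10 s and
# enters camera B 3 s later.
exits_a = [1 if t % 10 == 0 else 0 for t in range(50)]
entries_b = [1 if t % 10 == 3 else 0 for t in range(50)]
print(detect_link(exits_a, entries_b, max_lag=10))
```

In this synthetic case the only nonzero score falls at a lag of 3 seconds, so the pair is reported as linked with that transition time. Real foreground motion signals are far noisier, which is why the peak rule matters.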
EVALUATION
A major focus of my project was
evaluating my system’s performance on sparse and dense traffic scenarios for
networks of four to five cameras. I
concluded that given only minutes of footage my topology inference method could
successfully infer roughly half of ground truth topological links in
sparse-traffic facilities and detect some wide-gap links even in dense traffic.
Most errors in link detection were attributable to poor background subtraction
or heavy occlusion. Pushed by my group leader to prove exactly how well these
estimated links improved the track handoff problem, I showed that for sparse
traffic scenes my procedure could predict a given target’s transition to a neighboring,
non-overlapping camera with near 100% accuracy.
I also showed that the predicted time window captured very few confuser
objects (one or two on average) in addition to the target, making the
correspondence decision a tractable problem.
FUTURE RESEARCH
In the final stages of the project, I found other existing work that indicated promising future directions for topology estimation research.
First, while my evaluation concluded that the cross-correlation approach of Makris et al. [1] performed reasonably well given the limited amount of footage (only 8 minutes) and the poor quality of background subtraction, I knew that it was perhaps not the most theoretically rigorous method for statistical topology inference. I soon discovered a more principled approach in Tieu et al. [2] out of MIT's CSAIL vision laboratory. Building on Makris et al.'s approach to non-overlapping cameras, Tieu et al. [2]
propose an information-theoretic correspondence detection procedure with
several advantages over cross-correlation. I would suggest that the new approach based on mutual information might perform better in a head-to-head comparison, though I would like to see an actual test.
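To give a flavor of what a mutual-information score looks like, here is a toy sketch. It is not the procedure of Tieu et al. [2]. It simply scores each candidate lag by a plug-in estimate of mutual information between two discretized activity signals, and all names and the lag search are my own illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def best_lag_by_mi(exits: List[int], entries: List[int], max_lag: int) -> Tuple[int, float]:
    """Return the lag in 0..max_lag with the highest mutual information, and its score."""
    n = min(len(exits), len(entries))
    scores = [
        mutual_information(exits[: n - lag], entries[lag:n])
        for lag in range(max_lag + 1)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Unlike a raw correlation sum, a dependence score of this kind is zero for independent signals regardless of how busy each camera is, which is one intuition for why an information-theoretic measure may be more robust.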
Second, a fundamental limitation of both [1] and [2] in practice remains scalability. Both approaches require intensive statistical calculations for all pairs of access regions, a cost that grows quadratically with network size and can become intractable given the hundreds of cameras
available in real-world facilities. Addressing this weakness, van den Hengel
et al. [3] propose an inexpensive topology estimation method explicitly
designed to be scalable. The crucial idea in [3] is that most camera pairs will not observe the same path at all, and that this relation is much simpler to estimate than a full statistical estimate of the transition likelihood and time distribution. The authors demonstrate feasible computational
performance for a hundred-camera network, though successful estimation of
ground truth topology remains unevaluated. I suggest that combining this coarse exclusion process with the fine accuracy available from [2] might provide a scalable method for recovering topology that supports effective tracking in a network of a hundred or a thousand cameras.
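The suggested combination can be sketched as a two-stage loop. This is only an outline under stated assumptions: one access region per camera, and two hypothetical callbacks, `may_share_route` (a cheap exclusion test in the spirit of [3]) and `estimate_link` (an expensive estimator such as [2]), neither of which is implemented here.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, Sequence, Tuple


def unordered_pair_count(n_regions: int) -> int:
    """Number of unordered region pairs an exhaustive method must examine."""
    return n_regions * (n_regions - 1) // 2


def two_stage_topology(
    regions: Sequence[Hashable],
    may_share_route: Callable[[Hashable, Hashable], bool],  # hypothetical cheap filter
    estimate_link: Callable[[Hashable, Hashable], float],   # hypothetical costly estimator
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Run the costly estimator only on pairs the cheap filter cannot exclude."""
    links = {}
    for a, b in combinations(regions, 2):
        if may_share_route(a, b):
            links[(a, b)] = estimate_link(a, b)
    return links


for n in (6, 100, 1000):
    print(n, unordered_pair_count(n))  # 15, 4950, and 499500 pairs
```

The pair counts show why exhaustive estimation becomes costly: going from 6 to 1000 cameras multiplies the work by more than thirty thousand, so any filter that discards most pairs cheaply pays for itself quickly.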
IMPACT
As its primary impact, my work at Lincoln
Laboratory provided government decision-makers with an assessment
of the capabilities and shortcomings of current surveillance technology. More
broadly, my work contributes toward efforts to produce a functional tracking system, which could dramatically improve protection of secure
facilities by allowing officers to track suspects throughout the facility in real
time, even across large gaps in coverage.
Effective camera-to-camera route prediction will be a crucial step
towards this goal.
Improving
topology estimation remains a critical requirement for achieving facility-wide
tracking systems. A reliable
multi-camera tracker would revolutionize security at airports and other
critical infrastructure sites. Officers
could follow active suspects in real-time even across gaps in coverage and
could reconstruct past activity to determine point-of-entry and identify
collaborators. Automation would
dramatically improve speed and accuracy of post-incident investigations and
help prevent nascent attacks from materializing.
Finally, an operational multi-camera
tracker could serve as a test bed for deeper analysis of surveillance
video. For example, investigators could
build a question answering system (“Who left the south exit after 5PM?”) on top
of the existing tracker. Access to such
a test bed would inspire progress in academic research and could improve
facility security as well.
REFERENCES
[1] Dimitrios Makris, Tim Ellis, James Black. Bridging the Gaps between Cameras. CVPR 2004: 205-210.
[2] Tieu, Dalley, & Grimson. Inference of Non-Overlapping Camera Network Topology by Measuring Statistical Dependence. International Conference on Computer Vision, 2005.
[3] van den Hengel, Dick, & Hill. Activity Topology Estimation for Large Networks of Cameras. Proceedings of the IEEE International Conference on Video and Signal Based Surveillance, 2006.