This was my final project for Brown CS's graduate student seminar "Topics in Computer Systems Security", taught by Roberto Tamassia.
This report summarizes an investigation to reproduce and extend known vulnerabilities in the acoustic emanations of keyboards. We obtain new benchmark datasets with realistic typing speeds (>= 225 char/min), some of which use special characters (shift, backspace, etc.). We find that at realistic typing speeds, detecting key press events from a continuous audio stream is difficult, and that algorithms suggested by previous work do not perform as well as described. Using ground truth event detections, we then describe the information content of key press events via FFTs and attempt both supervised and unsupervised attacks. We find supervised attacks to have high recovery rates (character accuracy > 80%), and unsupervised attacks using 2nd-order HMMs to have reasonable recovery rates (70% character accuracy), upon which later human or machine processing can certainly improve. We conclude by summarizing two possible extensions: nonparametric statistical language models whose vocabulary can grow with observed data (which allows learning frequently used passwords), and hierarchical HMMs for adapting text recovery even when the target switches contexts (e.g. between typing email and writing code).
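To give a feel for how an HMM lets a language model correct noisy acoustic guesses, here is a minimal sketch of Viterbi decoding. It is not the report's implementation: it uses a first-order HMM rather than the 2nd-order model described above, and the two-key alphabet, the acoustic cluster labels (0 and 1), and all probabilities are made-up values for illustration only.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely hidden key sequence for obs under a first-order HMM."""
    if not obs:
        return []
    # best[t][s]: best log-probability of any path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at time t
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def logs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: two keys, two acoustic clusters.
states = ["a", "b"]
log_start = logs({"a": 0.5, "b": 0.5})
# Toy "language model": the two keys strongly tend to alternate.
log_trans = logs({"a": {"a": 0.1, "b": 0.9}, "b": {"a": 0.9, "b": 0.1}})
# Toy acoustic model: cluster 0 mostly means "a", cluster 1 mostly means "b".
log_emit = logs({"a": {0: 0.8, 1: 0.2}, "b": {0: 0.3, 1: 0.7}})

decoded = viterbi([0, 1, 1], states, log_start, log_trans, log_emit)
print("".join(decoded))  # prints "aba"
```

Classifying each sound on its own would give "abb" here, since cluster 1 looks more like "b". The transition probabilities pull the last key to "a", which is the same mechanism by which a language model can repair acoustically ambiguous key presses.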
I have roughly 1-2 hours' worth of audio recordings of raw keyboard typing sounds, combined with known ground truth key labels. Please email me if you're interested; I plan to make them publicly available soon.
Attached at the bottom of this page.