This is a reprise of the talk I gave at ISMIR 2003 on some of the key insights behind the Shazam service.
The audio search algorithm is noise- and distortion-resistant, computationally efficient, and massively
scalable, capable of quickly identifying a short segment of music captured through a
mobile phone microphone in the presence of foreground voices and other dominant noise,
and through voice codec compression, out of a database of several million tracks. The
algorithm uses a combinatorially hashed time-frequency constellation analysis of the
audio, yielding unusual properties such as transparency, in which multiple tracks mixed
together may each be identified. Furthermore, for applications such as radio monitoring,
search times on the order of a few milliseconds per query are attained, even on a massive
music database.
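
To make "combinatorially hashed constellation" concrete, here is a minimal sketch of the idea, not the production Shazam implementation. It assumes the spectrogram peaks (the "constellation") have already been extracted as (time frame, frequency bin) pairs. Each peak is paired with a few later peaks to form a hash, and a match is scored by counting hashes that agree on a single time offset. That scoring rule is an assumption here, since the abstract does not spell it out. The names `pair_hashes`, `build_index`, `identify`, `FAN_OUT`, and `MAX_DT` are hypothetical.

```python
from collections import Counter, defaultdict

FAN_OUT = 3   # assumed: how many later peaks each anchor peak is paired with
MAX_DT = 50   # assumed: largest anchor-to-target gap, in time frames


def pair_hashes(peaks):
    """Yield ((f1, f2, dt), t1) for each anchor peak paired with later peaks."""
    peaks = sorted(peaks)  # sort by time frame, then frequency bin
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > MAX_DT or paired == FAN_OUT:
                break
            if dt > 0:  # skip peaks in the same time frame
                paired += 1
                yield (f1, f2, dt), t1


def build_index(tracks):
    """Map each hash to the (track_id, anchor_time) pairs where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in pair_hashes(peaks):
            index[h].append((track_id, t))
    return index


def identify(index, query_peaks):
    """Return (track_id, offset, score) for the best-aligned track, or None."""
    votes = Counter()
    for h, t_query in pair_hashes(query_peaks):
        for track_id, t_db in index.get(h, ()):
            # True matches all share the same database-minus-query offset.
            votes[(track_id, t_db - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), score = votes.most_common(1)[0]
    return track_id, offset, score


# Toy data: two "tracks" given directly as constellation peaks.
tracks = {
    "a": [(0, 10), (5, 20), (9, 15), (14, 30), (20, 12), (26, 25)],
    "b": [(0, 11), (4, 22), (10, 17), (15, 33), (21, 9), (27, 40)],
}
index = build_index(tracks)

# Query: track "a" starting 5 frames in, plus one spurious noise peak (7, 99).
query = [(0, 20), (4, 15), (9, 30), (15, 12), (21, 25), (7, 99)]
result = identify(index, query)
```

In this toy run the noise peak produces hashes that match nothing in the index, while the surviving hashes all vote for track "a" at an offset of 5 frames. Because votes are tallied per track and offset, the same scheme hints at how several mixed tracks could each accumulate their own peak in the vote count.
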
Avery Wang has degrees in Mathematics and Electrical Engineering from Stanford University, specializing in digital signal processing algorithms. He wrote his dissertation on the auditory source separation problem at CCRMA under Julius Smith. He also spent two years at the Ruhr-Universität Bochum with Christof von der Malsburg at the Institut für Neuroinformatik on a Fulbright scholarship. He co-founded Shazam Entertainment in 2000 and is the principal creator of the audio search technology.