Abstract
Multimedia fingerprinting, also known as robust or perceptual hashing, aims to represent multimedia
signals through compact and perceptually significant descriptors (hash values). In this paper, we
examine the probability of collision of a certain general class of robust hashing systems that, in its binary
alphabet version, encompasses a number of existing robust audio hashing algorithms. Our analysis relies
on modelling the fingerprint (hash) symbols by means of Markov chains, which is generally realistic
due to the hash synchronization properties usually required in multimedia identification. We provide
theoretical expressions for the performance, and show that the use of M-ary alphabets is advantageous with
respect to binary alphabets. We show how these general expressions explain the performance of Philips
fingerprinting, whose probability of collision had previously been estimated only through heuristics.