Abstract
The Fourier/time transformation (FTT) was proposed by Ernst Terhardt (1985, 1992, 1998) as a tool for the analysis and representation of audio signals such as speech and music. Terhardt (1985) introduced the FTT in the context of a revised interpretation of the Fourier transform (FT), with the aim of developing a transform suited to time/frequency analysis comparable to that of the mammalian auditory system. This chapter re-examines the FTT and, for comparison, briefly surveys other methods relevant to musical acoustics and psychoacoustics, namely the short-time Fourier transform (STFT), autoregressive spectral modeling (AR) and the wavelet transform (WT), illustrating them with some examples. The different approaches to time/frequency analysis are also assessed with respect to the so-called uncertainty product Δt Δf.
Notes
- 1.
Settings for the analysis performed with the Praat software (Boersma and Weenink 2011) were a time window of 30 ms with a Gaussian weighting, a time step of 2 ms from one frame to the next, an analysis bandwidth of 2 kHz and a frequency step of 2 Hz. The sound sample of 11.17 s was processed in 5253 (overlapping) frames.
- 2.
- 3.
Applying no specific windowing function means a rectangular window is chosen for which the so-called Equivalent Noise Bandwidth (ENBW [Bins], see DeFatta et al. 1988, 262ff.) is 1.0.
- 4.
There are several definitions of ‘linear’. In electronics, linear refers to circuits (such as LRC filters) in which linear relations exist between physical magnitudes (inductance, capacitance, resistance, gain) and in which all voltages and currents are proportional to the electromotive force driving the system (cf. Küpfmüller 1968, 12f.). In signals and systems theory, Bachmann (1992, 9) defines linearity as follows: superposition at the input has the same effect as superposition at the output.
- 5.
Analysis for 0–1 kHz was performed with the Sonogram software (Hiroshi Momose 1991); settings were FFT+Wigner, time window 2048 pts, Hanning weighting, time increment 85 pts = 1.77 ms; LPC, sideband suppression 80 Hz, dynamic range of analysis and graph representation −20 to −1 dB.
- 6.
Historic organ of St. Bartholomäus, Mittelnkirchen, Altes Land, built by Arp Schnitger, Jacob Albrecht and Johann Matthias Schreiber 1688–1753. The Quintadena 16′ pipe rank is in the Hauptwerk of the organ.
- 7.
Built by Joris du Mery 1742–1748.
- 8.
FFT: 8192, Hanning, Hop ratio 0.25, zero pad factor 2.0. Analysis performed with Spectro 3.01 (Perry Cook, Gary Scavone).
- 9.
The code used for the analysis was written in MATLAB by Can Karadogan and Florian Keiler while they were working in the Department of Signal Processing and Communications of the Helmut-Schmidt-Universität Hamburg. The AR tool was developed for a joint project on transients in the sound of musical instruments (cf. Keiler et al. 2003).
- 10.
Fourier transforms of the steady-state part of the sound show that partial frequencies for higher harmonic partials are not exactly at integer ratios. Moreover, frequencies for partials including the fundamental fluctuate over time as can be seen from increasing values for variance of frequencies in longer FFT transforms (e.g., 65536). However, ACF analysis clearly gives a single ‘pitch’ for this pipe tone corresponding to 143 Hz.
- 11.
- 12.
The same consideration was made in “running” autocorrelation algorithms, which typically “slide” along a time signal and include a weighting function to successively discard past sample values so that ACF in fact is computed from an “effective time window” of N samples up to the sample point t moving with time. As to the equivalence of “running” ACF and FTT, see Terhardt 1998, 94f.
- 13.
A more detailed analytic formulation of the FTT is given by Mummert 1997.
- 14.
For example, one CB included in the table given by Zwicker and Terhardt (1980, p. 152) ranges from 920 to 1080 Hz with f_c = 1000 Hz and is 160 Hz wide; divided by 25, the frequency step would be 160/25 = 6.4 Hz, as compared to the jnd at 1000 Hz, which is ca. 3 Hz.
- 15.
The ENBW for the Blackman window is 1.73 bins in DFT and the 3.0 dB bandwidth is 1.68 bins.
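The ENBW figures quoted in notes 3 and 15 can be checked numerically. The following is a minimal sketch, assuming NumPy and the common textbook definition ENBW = N · Σw² / (Σw)² in DFT bins; the function name `enbw_bins` and the window length N = 4096 are illustrative choices, not taken from the chapter.

```python
import numpy as np

def enbw_bins(window):
    """Equivalent noise bandwidth of a window, expressed in DFT bins.

    Uses ENBW = N * sum(w^2) / (sum(w))^2 (assumed standard definition).
    """
    w = np.asarray(window, dtype=float)
    return len(w) * np.sum(w ** 2) / np.sum(w) ** 2

N = 4096  # illustrative window length (an assumption, not from the chapter)

# "No specific windowing function" corresponds to a rectangular window.
rect_enbw = enbw_bins(np.ones(N))

# Blackman window as implemented by NumPy (symmetric form).
blackman_enbw = enbw_bins(np.blackman(N))

print(f"rectangular: {rect_enbw:.2f} bins")
print(f"Blackman:    {blackman_enbw:.2f} bins")
```

For long windows the two values should come out close to the 1.0 and 1.73 bins given in the notes; for short windows the Blackman figure deviates slightly because of the finite length.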
References
Bachmann, W. (1992). Signalanalyse. Grundlagen und mathematische Verfahren. Braunschweig: Vieweg.
Beauchamp, J. (2007). Analysis and synthesis of musical instrument sounds. In J. Beauchamp (Ed.), Analysis, synthesis, and perception of musical sounds (pp. 1–89). New York: Springer.
Bilsen, F. & Kievits, I. (1989). The minimum integration time of the auditory system. Preprint 2746, AES Convention Hamburg March 1989.
Boersma, P. & Weenink, D. (2011). Praat. Doing phonetics by computer (version 5232). Amsterdam: University of Amsterdam, Institute of Phonetics.
Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonic-to-noise ratio of a sampled sound. Proceedings of Institute of Phonetics, University of Amsterdam (Vol. 17, pp. 97–110).
Bracewell, R. (1978). Fourier transform (2nd ed.). New York: McGraw-Hill.
Bregman, A. (1990). Auditory scene analysis. Cambridge: MIT Press.
Bürck, W., Kotowski, P. & Lichte, H. (1935). Der Aufbau des Tonhöhenbewußtseins. Elektrische Nachrichtentechnik, 12, 326–333.
Cohen, L. (1995). Time-frequency analysis. Upper Saddle River, N.J.: Prentice-Hall.
de Boer, E. (1976). On the “residue” and auditory pitch perception. In W. D. Keidel & W. D. Neff (Eds.), Handbook of sensory physiology (Vol. 3, pp. 479–583). New York: Springer.
de Cheveigné, A. (2005). Pitch perception models. In C. Plack, A. Oxenham, R. Fay, & A. Popper (Eds.), Pitch. Neural coding and perception (pp. 169–230). New York: Springer.
DeFatta, D., Lucas, J., & Hodgkiss, W. (1988). Digital signal processing. A system design approach. New York: Wiley.
Dellomo, M., & Jacyna, G. (1991). Wigner transforms, Gabor coefficients, and Weyl-Heisenberg wavelets. Journal of Acoustical Society of America, 89, 2355–2361.
Dutilleux, P., Grossmann, A., & Kronland-Martinet, R. (1988). Application of the wavelet transform to the analysis, transformation and synthesis of musical sound. Preprint 2727, AES Convention 85, November 1988.
Eddins, D., & Green, D. (1995). Temporal integration and temporal resolution. In B. C. J. Moore (Ed.), Hearing (pp. 207–242). San Diego: Academic Press.
Evangelista, G. (1997). Wavelet representations of musical signals. In C. Roads, St. Pope, A. Piccialli, G. de Poli (Eds.), Musical signal processing (pp. 127–153). Lisse: Swets and Zeitlinger.
Flandrin, P. (1999). Time-Frequency/Time-Scale Analysis. San Diego: Academic Press.
Gabor, D. (1946). Theory of communication. Journal of Institution of Electrical Engineering, 93, 429–457.
Gafori, F. (1496/1967/1968). Practica Musicae. Milan (Reprint Farnborough, Hants.: Gregg Pr. 1967; English translation and transcription of musical examples by Clement Miller, American Institute of Musicology 1968).
Greenwood, D. (1990). A cochlear frequency-position function for several species—29 years later. Journal of Acoustical Society of America, 87, 2592–2605.
Heldmann, K. (1993). Wahrnehmung, gehörgerechte Analyse und Merkmalsextraktion technischer Schalle. Ph.D. thesis, Technical University of Munich.
Hut, R., Boone, M., & Gisolf, A. (2006). Cochlear modeling as time-frequency analysis tool. Acustica, 92, 629–636.
Jurado, C., & Moore, B. (2010). Frequency selectivity for frequencies below 100 Hz: Comparison with mid-frequencies. Journal of Acoustical Society of America, 128, 3585–3596.
Keiler, F., Karadogan, C., Zölzer, U., & Schneider, A. (2003). Analysis of transient musical sounds by auto-regressive modeling. Proceedings of the 6th International Conference on Digital Audio Effects (DAFx-03) (pp. 301–304). London: St. Marys.
Kostek, B. (2005). Perception-based data processing in acoustics. Berlin: Springer.
Kral, A., & Majérnik, V. (1996). Neural networks simulating the frequency discrimination of hearing for non-stationary short tone stimuli. Biological Cybernetics, 74, 359–366.
Küpfmüller, K. (1968). Die Systemtheorie der elektrischen Nachrichtenübertragung (3rd ed.). Stuttgart: Hirzel.
Mammano, F., & Nobili, R. (1993). Biophysics of the cochlea: Linear approximation. Journal of Acoustical Society of America, 93, 3320–3332.
Markel, J., & Gray, A. (1976). Linear prediction of speech. Berlin: Springer.
Marple, S. L. (1987). Digital spectral analysis. Englewood Cliffs, N.J.: Prentice-Hall.
Meddis, R., & O’Mard, L. (1997). A unitary model of pitch perception. Journal of Acoustical Society of America, 102, 1811–1820.
Meddis, R., & O’Mard, L. (2006). Virtual pitch in a computational physiological model. Journal of Acoustical Society of America, 120, 3861–3869.
Meddis, R. & Lopez-Poveda, E. (2010). Auditory periphery: From pinna to auditory nerve. In R. Meddis et al. (Eds.), Computational models of the auditory system (pp. 7–38). New York: Springer.
Meddis, R., Lopez-Poveda, E., Fay, R., & Popper, A. (Eds.). (2010). Computational models of the auditory system. New York: Springer.
Messner, G. (2011). Du krächzt wie ein Rabe…, singst wie eine Nachtigall… In A. Schmidhofer & St. Jena (Eds.), Klangfarbe. Vergleichend-systematische und musikhistorische Perspektiven (pp. 205–217). Frankfurt/M.: P. Lang (plus sound examples on a CD in the book).
Mertins, A. (1996). Signaltheorie. Stuttgart: Teubner.
Mertins, A. (1999). Signal analysis. Chichester: Wiley.
Meyer, E., & Guicking, D. (1974). Schwingungslehre. Braunschweig: Vieweg.
Momose, H. (1991). Sonogram. Davis, CA: University of Cal.
Moore, B. (1995). Frequency analysis and masking. In B. Moore (Ed.), Hearing (pp. 161–205). San Diego: Academic Press.
Moore, B. (2008). An introduction to the psychology of hearing (5th ed.). Bingley: Emerald.
Mummert, M. (1997). Sprachcodierung durch Konturierung eines gehörangepaßten Spektrogramms und ihre Anwendung zur Datenreduktion. Ph.D. thesis, Technical University of Munich.
Netten, S., & Duifhuis, H. (1983). Modelling an active, nonlinear cochlea. In E. de Boer & M. Viergever (Eds.), Mechanics of hearing (pp. 143–151). Delft: Delft University Pr.
Nobili, R., & Mammano, F. (1999). Biophysics of the cochlea II: Stationary nonlinear phenomenology. Journal of Acoustical Society of America, 99, 2244–2255.
Oertel, D., Fay, R., & Popper, A. (Eds.). (2002). Integrative functions in the mammalian auditory pathway. New York: Springer.
Papoulis, A. (1962). The Fourier Integral and its applications. New York: McGraw-Hill.
Patterson, R., Nimmo-Smith, I., Weber, D., & Milroy, R. (1982). The deterioration of hearing with age: Frequency selectivity, the critical ratio, the audiogram, and speech threshold. Journal of the Acoustical Society of America, 72, 1788–1803.
Patterson, R., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C., & Allerhand, M. (1992). Complex sounds and auditory images. Advances in the Biosciences, 83, 429–443.
Pickles, J. (2008). An introduction to the physiology of hearing (3rd ed.). Bingley: Emerald.
Pressnitzer, D., Patterson, R., & Krumbholz, K. (2001). The lower limit of melodic pitch. Journal of the Acoustical Society of America, 109, 2074–2084.
Rodet, X., & Schwarz, D. (2007). Spectral envelopes and additive+residual analysis/synthesis. In J. Beauchamp (Ed.), Analysis, Synthesis, and Perception of Musical Sounds (pp. 174–227). New York: Springer.
Rossing, T. (1982). The science of sound. CA: Addison-Wesley.
Rücker, C. (1997). Berechnung von Erregungsverteilungen aus FTT-Spektren. Fortschritte der Akustik, DAGA 1997, pp. 484–485.
Russo, M., Rožić, N., & Stella, M. (2011). Biophysical cochlear model: Time-frequency analysis and signal reconstruction. Acustica, 97, 632–640.
Schlang, M. & Mummert, M. (1990). Die Bedeutung der Fensterfunktion für die Fourier-t-Transformation als gehörgerechte Spektralanalyse. Fortschritte der Akustik, DAGA 1990, Bad Honnef 1990, pp. 1043–1046.
Schneider, A. (1997). Tonhöhe, Skala, Klang. Akustische, tonometrische und psychoakustische Studien auf vergleichender Grundlage. Bonn: Orpheus-Verlag für Syst. Musikwiss.
Schneider, A. (2001). Complex inharmonic sounds, perceptual ambiguity, and musical imagery. In R. I. Godøy & H. Jørgensen (Eds.), Musical imagery (pp. 95–116). Lisse: Swets and Zeitlinger.
Schneider, A. & Frieler, K. (2009). Perception of harmonic and inharmonic sounds: Results from ear models. In S. Ystad, R. Kronland-Martinet & K. Jensen (Eds.), Computer music modeling and retrieval. Genesis of meaning in sound and music (pp. 18–44). Berlin: Springer.
Schneider, A., von Ruschkowski, A., & Bader, R. (2009). Klangliche Rauhigkeit, ihre Wahrnehmung und Messung. In R. Bader (Ed.), Musical acoustics, neurocognition and psychology of music (pp. 103–148). Frankfurt: P. Lang.
Schneider, A., & Tsatsishvili, V. (2011). Perception of musical intervals at very low frequencies: Some experimental findings. In A. Schneider & A. von Ruschkowski (Eds.), Systematic musicology: Empirical and theoretical studies (pp. 99–125). Frankfurt: P. Lang.
Solbach, L., Wöhrmann, R., & Kliewer, J. (1998). The complex-valued continuous wavelet transform as a preprocessor for auditory scene analysis. In D. F. Rosenthal & H. G. Okuno (Eds.), Computational auditory scene analysis (pp. 273–292). Mahwah, N.J.: Erlbaum.
Snyder, B. (2000). Music and memory. Cambridge, MA: MIT Press.
Terhardt, E. (1985). Fourier transformation of time signals: Conceptual revision. Acustica, 57, 242–256.
Terhardt, E. (1992). From speech to language: On auditory information processing. In M. E. H. Schouten (Ed.), The auditory processing of speech. From sounds to words (pp. 363–380). New York: Mouton de Gruyter.
Terhardt, E. (1998). Akustische Kommunikation. Berlin: Springer.
Vormann, M. (1995). Psychoakustische Modellierung der virtuellen Tonhöhe. Diploma thesis (Physics), Carl von Ossietzky University, Oldenburg.
Vormann, M., & Weber, R. (1995). Gehörgerechte Darstellung von instationären Umweltgeräuschen mittels Fourier-Time-Transformation (FTT). Fortschritte der Akustik, DAGA 1995, pp. 1191–1194.
Winer, J., & Schreiner, C. (Eds.). (2011). The Auditory Cortex. New York: Springer.
Yen, N. (1987). Time and frequency representation of acoustic signals by means of the Wigner distribution function: Implementation and interpretation. Journal of the Acoustical Society of America, 81, 1841–1850.
Zhu, X., & Kim, J. (2006). Application of analytic wavelet transform to analysis of highly impulsive noises. Journal of Sound and Vibration, 294, 841–855.
Zwicker, E., & Terhardt, E. (1980). Analytical expressions for critical-band rate and critical bandwidth. Journal of Acoustical Society of America, 68, 1523–1525.
Zwicker, E., & Fastl, H. (1999). Psychoacoustics. Facts and models (2nd ed.). Berlin: Springer.
© 2013 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Schneider, A., Mores, R. (2013). Fourier-Time-Transformation (FTT), Analysis of Sound and Auditory Perception. In: Bader, R. (eds) Sound - Perception - Performance. Current Research in Systematic Musicology, vol 1. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00107-4_13
DOI: https://doi.org/10.1007/978-3-319-00107-4_13
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00106-7
Online ISBN: 978-3-319-00107-4
eBook Packages: Engineering, Engineering (R0)