Re: your mail ("Argiris A. Kranidiotis" )

> I was asked for some references on books (preferably textbooks or
> conference proceedings) that connect physiology/neurophysiology
> and music perception. I am working in psychoacoustics but I am
> not familiar with this specific topic. Someone pointed towards
> Helmholtz' "Lehre von den Tonempfindungen" but I thought there
> should be something more recent.
>
> I would be glad if anyone could help me.
>
> Stefan
Dear Stefan,
Some time ago I put together a mini-FAQ about psychoacoustics. I think
the most interesting part of it is the bibliography at the end.
Here it is:
CUT HERE-----CUT HERE-----CUT HERE-----CUT HERE-----CUT HERE-----CUT HERE-----
______________________________________________________
| |
| HUMAN AUDIO PERCEPTION FREQUENTLY ASKED QUESTIONS |
| version 2.0 June 4 , 1994 |
|______________________________________________________|
I n t r o d u c t i o n
---------------------------
It all started with a recent UseNet posting of mine. From the volume of mail I
received, it seems to be a very interesting subject. I decided to release
an edited version of all the answers I have received so far in the form of a
F.A.Q. (Frequently Asked Questions).
This version is preliminary. It is still *VERY* incomplete. With your help I
will try to make it as complete as possible. Please read on to see what
additional information is needed...
The main topic remains the same:
Given two spectra (STFFTs, Short Time Fast Fourier Transforms, for example)
we try to estimate a psychoacoustic distance between them (i.e. a
timbral metric). This involves some additional data:
1) Equal loudness curves (Fletcher-Munson). Originally published in
J.A.S.A. (Journal of the Acoustical Society of America) in 1933. Please
send me your data/approximations/formulae. More information is still
needed on this subject.
2) Bark frequency scale (Critical Bands). I have found some
approximations in the range 0..5 kHz. Again, more precise information is
needed.
3) "Masking" effects. Useful introductory information can be found in the
MPEG Audio compression FAQ (available via anonymous FTP at sunsite.unc.edu,
at IUMA archive).
4) Other psychoacoustic data?
______________________________________________________________________________
-MANY THANKS to all those kind people who contributed to this text (they
are too many to list).
-My comments are put in square brackets [ ... ].
-A recent version of this text is available via anonymous FTP at:
svr-ftp.eng.cam.ac.uk ( maintained by Tony Robinson <ajr(at)eng.cam.ac.uk> )
Directory: /pub/comp.speech/info , Filename: HumanAudioPerception.
Please note that this FAQ is *NOT* restricted to speech topics.
Argiris A. Kranidiotis
University Of Athens
Informatics Department
akra(at)zeus.di.uoa.ariadne-t.gr
______________________________________________________________________________
Equal loudness curves
______________________________________________________________________________
From: Various people
------------------------------------------------------------------------
-Fletcher-Munson curves (the most popular answer).
Peak sensitivity at 3,300 Hz, falling off below 40 Hz and above 10 kHz.
-"An Introduction to the Psychology of Hearing" by Moore, 3rd edition
(the most popular reference).
From: Vincent Pagel <Vincent.Pagel(at)loria.fr>
------------------------------------------------------------------------
[...]
It's a family of curves [Fletcher Munson curves --AK] a bit like this:
dB ^|
|| |
| \ |
| | |
| \ /
| | /
| \________ ______/
| \___/
|
|
|_________________________________________________> Frequency (Hz)
400 2500 6000 10000 20000
PERCEPTUALLY all the sounds corresponding to the points on the curve have
the same intensity: this means that the ear has a large range where it is
nearly linear (1000 to 8000 Hz), achieving its best result in a small
domain (around 3000 Hz if my memory serves).
[ the curve has a minimum at 3,300 Hz -- AK ]
The sensitivity drops dramatically above 10000 Hz and below 500 Hz.
You can draw different equal loudness curves depending on the
intensity you begin with (e.g. if the intensity at 2500 Hz is 50 dB you
get one curve, but if you start at 2500 Hz with 70 dB you get another equal
loudness curve). Generally, equal loudness curves have nearly the same
shape, which does not depend too much on the starting point.
To my knowledge there is no mathematical formula given to approximate equal
loudness curves, but with the data in the book by Moore it should not be
very difficult to find an approximation.
From: Angelo Campanella <acampane(at)magnus.acs.ohio-state.edu>
------------------------------------------------------------------------
Obtain the ISO "Zero Phons" standard threshold of human hearing.
-The standard was ISO 389-1975 "Audiometer Standard Reference Zero".
-The US Equivalent is ANSI S3.6 - 1969.
The following numbers apply:
These are dB re 20 micropascals for a sound of pure tone or very narrow
band noise:
--------------------------------------------------------------------------
Audio Frequency (Hz)  125  250  500  1000  2000  3000  4000  6000  8000
=========================================================================
Human (monaural) threshold of hearing for a normal young adult with
undisturbed hearing, dB re 20 micropascals: [...]
Binaural hearing is 10 to 15 dB better, since the brain has a magnificent
capability to correlate the simultaneous listening of both ears.
From: walkow(at)compsci.bristol.ac.uk (Tomasz Walkowiak)
------------------------------------------------------------------------
The equal loudness curve can be approximated by:
E(w)=1.151*SQRT( (w^2+144*10^0^4)) )
From: Robinson et al.: Br.J.A.Phys. 7, 166-181, 1956.
This approximation is for a Nyquist frequency equal to 5 kHz (a sampling
frequency of 10 kHz), so w = 2*Pi*f/10kHz, for 0<f<5kHz. Therefore E(w) is
defined for 0<w<Pi. E(w) is linear and is usually applied to the power
spectrum.
______________________________________________________________________________
Bark scale / Critical Bands
______________________________________________________________________________
From: [...]g(at)netcom.com (Filiz Basbug)
------------------------------------------------------------------------
From a paper given by David Lubman at Inter-Noise '92 (Toronto), the critical
band rate (z) in Bark can be determined by
z=[13*arctan(0.76*f)+3.5*arctan(f^2/56.25)]
where f is in kHz and the angles returned from the arctangent expressions
are in radians. When z is an integer, f is the dividing line frequency
between two critical bands.
If the frequency corresponding to a particular Bark (z) is desired, use the
following:
f=[(exp(0.219*z)/352)+0.1]*z-0.032*exp[-0.15*(z-5)^2]
where f is in kHz.
Finally, the critical bandwidth (df) can be calculated for a given center
frequency (f) by
df=25+75*[1+1.4*f^2]^0.69
where f is in kHz and df is in Hz.
There are no explicitly stated limits on the variables, but according to
the table that Mr. Lubman generated from the formulas, 1<=z<=24 for Bark,
and 20<=f<=15500 for frequency, except 50<=f<=13500 for the center
frequencies. (df) ranges from 100 Hz to 3500 Hz.
Also note that these formulas are generally accepted approximations but, as
far as I know, are not yet standardized. I believe they have all been
empirically derived.
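The conversions above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, frequencies are passed in Hz and converted to kHz internally, and the bandwidth function assumes the widely quoted form 25 + 75*(1 + 1.4*f^2)^0.69, which may or may not be exactly what Mr. Lubman's paper uses.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate z (Bark) for a frequency given in Hz."""
    f = f_hz / 1000.0  # the formula expects kHz
    return 13.0 * math.atan(0.76 * f) + 3.5 * math.atan(f * f / 56.25)

def bark_to_hz(z):
    """Approximate frequency in Hz for a critical-band rate z (Bark)."""
    f_khz = ((math.exp(0.219 * z) / 352.0 + 0.1) * z
             - 0.032 * math.exp(-0.15 * (z - 5.0) ** 2))
    return 1000.0 * f_khz

def critical_bandwidth_hz(f_hz):
    """Critical bandwidth in Hz around centre frequency f_hz.

    Assumption: the commonly cited approximation
    25 + 75 * (1 + 1.4 * f^2)^0.69 with f in kHz.
    """
    f = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f * f) ** 0.69

if __name__ == "__main__":
    # Print frequency, Bark value, round trip back to Hz, and bandwidth.
    for f in (100, 500, 1000, 4000, 10000):
        z = hz_to_bark(f)
        print(f, round(z, 2), round(bark_to_hz(z)),
              round(critical_bandwidth_hz(f)))
```

Note that the two conversion formulas are independent approximations, so converting Hz to Bark and back does not return exactly the starting frequency.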
Calculation of the psychoacoustic loudness of steady-state sounds is defined
in ISO 532, ISO Rec. 675, and DIN 45631.
An extension to non-steady sounds was defined by Zwicker but is not yet
standardized (as of 1992).
______________________________________________________________________________
Masking effects
______________________________________________________________________________
From: Vincent Pagel <Vincent.Pagel(at)loria.fr>
--------------------------------------------------------------------------
[...]
About curves corresponding to the masking effect:
Those curves show the minimal intensity a sound with a given frequency must
have to be perceived, when played simultaneously with a sound having a
constant frequency during th[...] masking effect of a 500 Hz frequency ....
you'll play it at, for example, 50 dB .... and at the same time you'll play
another frequency and adjust the level of the second frequency to find the
limen where it is perceived. For example a sound[...]z ).
______________________________________________________________________________
Psychoacoustic norm / Timbral Metric
______________________________________________________________________________
From: Fahey(at)psyvax.psy.utexas.edu (Richard Fahey)
--------------------------------------------------------------------------
These curves [Fletcher-Munson again...--AK] may be used to normalize
spectra for loudness at different frequencies (changing dB into phons).
[...] can be made more psychologically real by changing the frequency
scale to the Bark scale, and using an auditory filter to smear the
spectrum.
The distance between two spectra represented in ways similar to this can be
calculated as a Euclidean distance, and compared with psychoacoustic data.
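A rough Python sketch of this idea follows. It is not Mr. Fahey's procedure: the function names are invented for the example, the loudness (phon) normalization step is left out, the bands are simply taken 1 Bark wide, and the 3-point smoothing is only a stand-in for a real auditory filter. Both spectra are assumed to be sampled on the same frequency grid and given as linear power.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate in Bark (arctangent approximation, f in kHz)."""
    f = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f) + 3.5 * math.atan(f * f / 56.25)

def bark_band_power(freqs_hz, power, n_bands=24):
    """Sum linear power into bands that are each 1 Bark wide."""
    bands = [0.0] * n_bands
    for f, p in zip(freqs_hz, power):
        bands[min(int(hz_to_bark(f)), n_bands - 1)] += p
    return bands

def smear(bands):
    """Hypothetical 3-point smoothing standing in for an auditory filter."""
    n = len(bands)
    out = []
    for i in range(n):
        below = bands[max(i - 1, 0)]
        above = bands[min(i + 1, n - 1)]
        out.append(0.25 * below + 0.5 * bands[i] + 0.25 * above)
    return out

def to_db(bands, floor=1e-12):
    """Band levels in dB; empty bands are clamped to a small floor."""
    return [10.0 * math.log10(max(p, floor)) for p in bands]

def bark_distance(freqs_hz, power_a, power_b):
    """Euclidean distance between two smeared Bark-band spectra (in dB)."""
    la = to_db(smear(bark_band_power(freqs_hz, power_a)))
    lb = to_db(smear(bark_band_power(freqs_hz, power_b)))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(la, lb)))

if __name__ == "__main__":
    freqs = [100.0 * k for k in range(1, 51)]   # 100 Hz .. 5 kHz
    a = [1.0 / (k * k) for k in range(1, 51)]
    # Spectrum b has 3 dB more power above 2 kHz.
    b = [p * (2.0 if f > 2000 else 1.0) for f, p in zip(freqs, a)]
    print(round(bark_distance(freqs, a, b), 2))
```

Whether such a distance agrees with listeners' judgments still has to be checked against psychoacoustic data, as noted above.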
From: James Beauchamp <beaucham(at)uxh.cso.uiuc.edu>
------------------------------------------------------------------------
[...] we are comparing two time-varying spectra which are very similar to
one another.
This would be used to measure the efficiency of a particular synthesis
technique. Our first guess was to use :
               SUM(k=1 to n) (A2(t,k) - A1(t,k))^2
e(t) = sqrt( --------------------------------------- )
                            [...]
where k is the partial number, t is time, and A1(t,k) and A2(t,k) are the
kth partial amplitudes
vs. time for signals s1(t) and s2(t). Then the average error over time is
given by
e_ave = (1/DUR) SUM(t=0 to DUR) e(t)
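As an illustration, the two error measures might be coded as below. The denominator of e(t) cannot be recovered from the text, so this sketch ASSUMES normalization by the energy of the reference spectrum, SUM(k) A1(t,k)^2; the function names and the frame layout (one list of partial amplitudes per time step) are likewise just choices made for the example.

```python
import math

def spectral_error(a1, a2):
    """e(t) for one time frame.

    a1, a2: lists of partial amplitudes A1(t,k) and A2(t,k), k = 1..n.
    Assumption: the squared difference is normalized by sum(A1^2).
    """
    num = sum((y - x) ** 2 for x, y in zip(a1, a2))
    den = sum(x * x for x in a1)
    return math.sqrt(num / den) if den > 0 else 0.0

def average_error(frames1, frames2):
    """e_ave: the mean of e(t) over all time frames."""
    errs = [spectral_error(a1, a2) for a1, a2 in zip(frames1, frames2)]
    return sum(errs) / len(errs)

if __name__ == "__main__":
    original = [[1.0, 0.5, 0.25], [0.9, 0.4, 0.2]]
    synthesis = [[1.0, 0.45, 0.2], [0.9, 0.4, 0.1]]
    print(average_error(original, synthesis))
```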
The theory is that given two syntheses of signal s1, namely s2 and s3, s2
is a better synthesis of s1 than is s3 if e_ave_2 < e_ave_3. This
formulation seems to work fairly well, but it really fails when a synthesis
has weak upper partials not found in the original. The weak upper partials
contribute very little to the error calculation, but make a big difference
in the perceived result. Therefore, it would probably be much better to
add up the amplitudes within critical bands than to give all frequencies
equal weights as we have been doing, and also to use an
amplitude-to-loudness (in sones) translation. (Usually, S = K*A^0.6).
The problem with equalizing the A(k,t) using the Fletcher-Munson curves is
that one doesn't really know the absolute level of a given sound prior to
playing it back, except in a lab testing situation, perhaps. Thus, the
difference result would vary with playback level, an uncomfortable
situation.
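The suggested refinement could look roughly like the following sketch, which adds up partial amplitudes inside 1-Bark-wide bands and then applies S = K*A^0.6 before comparing. Everything beyond that formula is an assumption made for illustration: the function names, the use of the arctangent Bark approximation for the band edges, and the normalization by the reference energy. Because K cancels in the ratio, its value does not matter here, and absolute playback level is ignored.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate in Bark (arctangent approximation, f in kHz)."""
    f = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f) + 3.5 * math.atan(f * f / 56.25)

def band_loudness(freqs_hz, amps, k=1.0, n_bands=24):
    """Add partial amplitudes per 1-Bark band, then apply S = K * A**0.6."""
    bands = [0.0] * n_bands
    for f, a in zip(freqs_hz, amps):
        bands[min(int(hz_to_bark(f)), n_bands - 1)] += a
    return [k * b ** 0.6 for b in bands]

def relative_error(ref, test):
    """sqrt(sum((test-ref)^2) / sum(ref^2)); the normalization is assumed."""
    num = sum((y - x) ** 2 for x, y in zip(ref, test))
    den = sum(x * x for x in ref)
    return math.sqrt(num / den) if den > 0 else 0.0

def loudness_error(freqs_hz, a1, a2, k=1.0):
    """Relative error between band loudness patterns of two spectra."""
    return relative_error(band_loudness(freqs_hz, a1, k),
                          band_loudness(freqs_hz, a2, k))

if __name__ == "__main__":
    freqs = [440.0 * n for n in range(1, 11)]   # ten harmonics of 440 Hz
    a1 = [1.0, 0.5, 0.25] + [0.0] * 7           # original: three partials
    a2 = a1[:7] + [0.01] * 3                    # synthesis: weak extra partials
    print("amplitude error:", relative_error(a1, a2))
    print("loudness error: ", loudness_error(freqs, a1, a2))
```

In this toy case the weak upper partials barely move the plain amplitude error but contribute noticeably more to the band loudness error, which is the behaviour the paragraph above asks for.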
From: Richard Parncutt <parncutt(at)sound.music.mcgill.ca>
-------------------------------------------------------------------------
The psychoacoustic distance between two steady state complex sounds (or its
converse, perceived similarity) is [...], and the degree to
which the sounds have pitches in common (where by "pitch" I mean PERCEIVED
pitch in the psychoacoustic sense).
Terhardt (1972) distinguished two kinds of pitch. Spectral pitches
correspond to individual audi[...] approximately harmonic pattern,
suggesting the presence of an (embedded)
harmonic-complex tone. Most pitches perceived in everyday and musical
sounds are virtual pitches. The relative perceptual salience of pitches
may be estimated by the algorithm of Terhardt et al. (1982).
Parncutt (1989) defined the pitch commonality of two comple[...] they have
perceived pitches in common, depending on the
number and salience of coinciding pitches (by comparison to non-coinciding
pitches). Calculated pitch commonality values correlate well with
similarity judgments of pairs of complex sounds that differ relatively
little in loudness and timbre (Parncutt, 1989, 1[...]oretic accounts of
the strength of harmonic relationship between
musical tones and chords (Parncutt, 1989).
From: Christopher John Rolfe <rolfe(at)sfu.ca>
-------------------------------------------------------------------------
Metric [...] Cognitive science, however, points out that perceptual
space may be non-Euclidean. In other words, there is NO simple metric.
______________________________________________________________________________
References / Books
______________________________________________________________________________
"Loudness: its definition, measurement, and calculation", Journal of the
Acoustical Society of America, 1933, vol 5, p 9.
Author: Fry R.B.
PhD Dissertation, Duke Unive[...]
Author: [...], Stevens S.S.
Title: Voice Level: Autophonic Scale, Perceived Loudness, and Effects of
Sidetone
Journal: JASA
Volume: 33
Number: 2
Page(s): 160-167
Date: 1961
Author: Peterson G E, McKinney N P
Title: The measurement of speech power
Journal: Phonetica
Volume: 7
Page(s): 65-84
Date: 1961
Author: Schlauch R.S., Wier C.C.
Title: A Method for Relating Loudness-Matching and Intensity-Discrimination
Data
Journal: Journal of Speech and Hearing Research
Volume: 30
Page(s): 13-20
Date: 1987
Author: Small AM, Brandt JF, Cox PG
Title: [...?] function of signal duration
Journal: JASA
Volume: 34
Page(s): 513-514
Date: 1962
Author: Stevens S.S.
Title: Calculation of the Loudness of Complex Noise
Journal: JASA
Volume: 28
Number: 5
Page(s): 807-832
Date: 1956
Handel, S. (1989). "Listening: an introduction to the perception of
auditory events." MIT, Cambridge, MA
Dooling, R. J. and Hulse, S. H. (ed.) (1989). The comparative
psychology of audition: Perceiving complex sounds. Erlbaum, Hillsdale, NJ.
McAdams, S. and Bigand, E. (ed.) (1993). Thinking in sound: the
cognitive psychology of human audition. Oxford Univ. Press, NY
Sloboda, J. A. (1985). The musical mind: The cognitive psychology of
music. Clarendon, Oxford
Proceedings of IEEE, V. 81, No 10 ,"Signal Compression Based on Models
of Human Perception".
Grey, J.M. "Multidimensional Perceptual Scaling of Musical Timbres"
Journal of the Acoustical Society of America, 63, 1493-1500.
Repp, B.H (1984) "Categorical perception: Issues, methods, findings"
In N.J. Lass (ed.) Speech and Language: Advances in Basic
Research and Practice. Vol. 10. 1249-1257.
Moore and Glasberg, JASA 74(3) 1983. "Suggested formulae for calculating
auditory-filter bandwidths and excitation patterns"
Bladon and Lindblom, JASA 69(5) 1981. "Modeling the judgement of vowel
quality differences"
J. R. Pierce, The Science of Musical Sound (Freeman, New York, 1983).
J. G. Roederer, Introduction to the Physics and Psychophysics of Music
(Springer-Verlag, New York, 1975).
S. S. Stevens, "Measurement of Loudness", JASA 27 (1955): 815
S. S. Stevens, "Neural Events and Psychophysical Law", _Science 170_
(1970): 1043
E. Zwicker, G. Flottorp, and S. S. Stevens, "Critical Bandwidth in Loudness
Summation", JASA 29 (1957): 548
Author: Hynek Hermansky
Institution: Speech Technology Laboratory, Divisio[...]s, Inc., 3888 State Street,
Santa Barbara, CA 93105, USA
Title: Perceptual linear predictive (PLP) analysis of speech
Journal: JASA
Volume: 87
Number: 4
Page(s): 1738-1752
Date: 1990
Gersho et al (Bark Spectral Distance).
IEEE Journal Selected areas of Communications Sept. (?) 1992
Name: "An Introduction to the Physiology of Hearing"
Author: James O. Pickles,Dept. of Physiology,Uni. Birmingham,England.
Publisher: Academic Press,1982.
ISBN 0-12-554750-1 (hardback)
ISBN 0-12-554 (paperback).
"An Introduction to the Psychology of Hearing" by B. Moore, 3rd Edition.
Terhardt, E. (1972). [...] complex tones). Acustica, 26, 173-199.
Terhardt, E., Stoll, G., & Seewann, M. (1982). Algorithm for
extraction of pitch and pitch salience from complex tonal signals.
Journal of the Acoustical Society of America, 71, 679-688.
[ The following papers are from Richard Parncutt
(parncutt(at)sound.music.mcgill.ca) -- AK ]
Bigand, E., Parncutt, R., & Lerdahl, F. (under review). Perception of
musical tension in short chord sequences: The influence of harmonic
function, sensory dissonance, horizontal motion, and musical
training. Perception and Psychophysics.
Parncutt, R. (1993). Pitch properties of chords of octave-spaced
tones. Contemporary Music Review, 9, 35-50.
Parncutt, R. (1989). Harmony: A Psychoacoustical Approach.
Springer-Verlag, Berlin. (Springer Series in Information Sciences,
Vol. 19. Eds.: T.S. Huang & M.R. Schroeder. ISBN 3-540-51279-9. 218
pages, 22 figs.)
Stoll, G., & Parncutt, R. (1987). Harmonic relationship in similarity
judgments of nonsimultaneous complex tones. Acustica, 63, 111-119.
Terhardt, E., Stoll., G., Schermbach, R., & Parncutt, R. (1986).
Tonhoehenmehrdeutigkeit, Tonverwandschaft und Identifikation von
Sukzessivintervallen (Pitch ambiguity, harmonic relationship, and
melodic interval identification). Acustica, 61, 57-66.
______________________________________________________________________________
--
____________________________ __________________________________
/ /\ / /\
/ Argiris A. Kranidiotis _/ /\ / E-mail (InterNet): _/ /\
/ University Of Athens / \/ / / \/
/ Informatics Department /\ / akra(at)di.uoa.ariadne-t.gr /\
/___________________________/ / /_________________________________/ /
\___________________________\/ \_________________________________\/
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \