Jim Gilsinan IV - Senior Thesis
Yong Jiu Fa Yin: A Simple Mandarin Chinese Tone Recognizer
Thesis Final Draft
Pitch Tracking
Pitch (fundamental frequency) in the samples below was tracked with the "Speech Filing System" for Win32
from the University College London Department of Phonetics and Linguistics.
The sample image directly below shows
the four tones of Mandarin (marked 1, 2, 3, and 4) in both waveform and pitch-track view.
Code
A simple tone estimator that compares a specified pitch-track file containing any number
of syllables with a specified tone pitch template file containing the four tones in order. It normalizes
each syllable's pitch track around the mean pitch, expands the track to a standard length with linear
interpolation, and then compares it with the template, returning the best match under the
specified statistical regression and the specified number of cepstral regions. Currently implemented
are mean-squared distance analysis (no cepstral regions) and best-fit linear analysis.
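The normalize, expand, and compare steps described above can be sketched as follows. This is a minimal illustration, not the thesis code: the function names, the standard length of 32 points, and the choice to normalize by subtracting the mean (rather than dividing by it) are all assumptions, and pitch tracks are taken to be plain lists of frequency values rather than files.

```python
def normalize(track):
    """Center a pitch track on zero by subtracting its mean pitch (assumed scheme)."""
    mean = sum(track) / len(track)
    return [p - mean for p in track]


def resample(track, length):
    """Stretch or shrink a track to `length` points (length >= 2) by linear interpolation."""
    if len(track) == 1:
        return [track[0]] * length
    out = []
    for i in range(length):
        pos = i * (len(track) - 1) / (length - 1)  # fractional index into track
        lo = int(pos)
        hi = min(lo + 1, len(track) - 1)
        frac = pos - lo
        out.append(track[lo] * (1 - frac) + track[hi] * frac)
    return out


def mean_squared_distance(a, b):
    """Average squared difference between two equal-length tracks."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_tone(syllable, templates, length=32):
    """Return the tone number (1-4) whose template is closest to the syllable."""
    s = resample(normalize(syllable), length)
    scores = []
    for tone, template in enumerate(templates, start=1):
        t = resample(normalize(template), length)
        scores.append((mean_squared_distance(s, t), tone))
    return min(scores)[1]


# Hypothetical templates in tone order: level, rising, dipping, falling (values in Hz).
templates = [
    [200, 200, 200, 200, 200],
    [150, 175, 200, 225, 250],
    [200, 170, 150, 160, 200],
    [250, 225, 200, 175, 150],
]
guess = best_tone([160, 180, 200, 220, 240], templates)
```

Because both tracks are centered before comparison, a speaker's overall pitch level does not affect the match; only the contour does.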
Make: make, make make_template
Tone Usage: ./tone {test pitch track file} {template pitch track file} {regression} {cepstral regions per syllable}
Regressions:
- -1 = straw man shape analysis (template and cepstral args ignored)
- 0 = mean-squared distance (cepstral arg ignored)
- 1 = best fit linear
- 2 and up = best fit polynomial
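For the best fit linear option, one plausible reading is that each syllable is split into the requested number of regions, a least-squares line is fitted to each region, and the fitted lines are compared with those of the template. The sketch below follows that reading only; the equal-width regions, the comparison by slope alone, and every function name are assumptions rather than a description of the actual program. It also assumes each track has at least as many points as regions.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys sampled at x = 0, 1, 2, ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx if sxx else 0.0  # a single-point region has no slope
    return slope, mean_y - slope * mean_x


def region_slopes(track, regions):
    """Split a track into `regions` near-equal parts and fit a line to each."""
    n = len(track)
    bounds = [i * n // regions for i in range(regions + 1)]
    return [fit_line(track[bounds[i]:bounds[i + 1]])[0] for i in range(regions)]


def slope_distance(track, template, regions):
    """Sum of squared slope differences, region by region (smaller is closer)."""
    a = region_slopes(track, regions)
    b = region_slopes(template, regions)
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

With two regions, a dipping contour such as `[5, 4, 3, 2, 1, 1, 2, 3, 4, 5]` yields one negative and one positive slope, which a single line over the whole syllable could not capture.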
Template Usage: ./make_template {output file} {sample file} {sample file} [{sample file} ...]
Combined template files
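How the sample files are combined into a template is not stated here. One simple possibility, shown purely as an assumption, is to normalize each sample's tone tracks, bring them to a common length, and average them point by point. The names and the in-memory list format below are hypothetical; the real make_template reads and writes pitch-track files.

```python
def normalize(track):
    """Center a track on zero by subtracting its mean pitch (assumed scheme)."""
    mean = sum(track) / len(track)
    return [p - mean for p in track]


def resample(track, length):
    """Linearly interpolate a track to `length` points (length >= 2)."""
    if len(track) == 1:
        return [track[0]] * length
    out = []
    for i in range(length):
        pos = i * (len(track) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(track) - 1)
        frac = pos - lo
        out.append(track[lo] * (1 - frac) + track[hi] * frac)
    return out


def make_template(samples, length=32):
    """Average several samples into one template.

    `samples` is a list of samples; each sample is a list of four pitch
    tracks in tone order (tone 1 first). Returns four averaged tracks.
    """
    template = []
    for tone in range(4):
        tracks = [resample(normalize(s[tone]), length) for s in samples]
        template.append([sum(col) / len(tracks) for col in zip(*tracks)])
    return template
```

Averaging after normalization keeps one unusually high or low speaker from shifting the whole template.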
Samples from the Oregon Graduate Institute Multi Language Telephone Speech Corpus
Samples, Male
Provided are wav files of all speech samples. These wav files are modified versions of
the original wav files compiled from sessions with native Mandarin-speaking volunteers.
Specifically, the noise level has been reduced and some of the more egregious exhales at the
ends of syllables have been cut out to increase pitch tracker clarity. All edits were made
with Cool Edit Pro.
Also provided are selected waveform and pitch-track jpeg files, as explained above.
The wav files might require a right click / save as to play;
then again, they might not, depending on your browser.
Zhongjue Chen
Syllables
Files contain four of the same syllable with each of the tones: First Tone, High; Second Tone,
Rising; Third Tone, Falling then Rising; Fourth Tone, Falling.
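The four contours listed above suggest a very crude classifier that looks only at where a pitch track starts, ends, and bottoms out. The sketch below illustrates that idea; it is not claimed to be the "straw man shape analysis" option of the tone program, and the function name and the 5% flatness tolerance are assumptions. Pitch values are assumed to be positive frequencies in Hz.

```python
def classify_shape(track, flat_tolerance=0.05):
    """Guess a tone (1-4) from the overall shape of a pitch track.

    Differences smaller than flat_tolerance times the mean pitch count as level.
    """
    start, end = track[0], track[-1]
    tol = flat_tolerance * (sum(track) / len(track))
    low = min(track)
    low_index = track.index(low)
    dips_inside = 0 < low_index < len(track) - 1
    if dips_inside and start - low > tol and end - low > tol:
        return 3  # falling then rising
    if end - start > tol:
        return 2  # rising
    if start - end > tol:
        return 4  # falling
    return 1  # high, level
```

For example, `classify_shape([200, 170, 150, 160, 200])` falls into the third branch because the minimum lies in the interior and both ends sit well above it.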
- "Bao", voiced plosive onset, [ao] vowel rhyme, no coda.
[ wav | jpg ]
- "Chen", voiceless affricate onset, [e] vowel rhyme, [n] syllable coda.
[ wav | jpg ]
- "Er", [er] vowel rhyme / coda.
[ wav | jpg | pitch track ]
- "Hai", [ai] vowel rhyme.
[ wav | jpg | pitch track ]
- "Jing", voiced affricate onset, [i] vowel rhyme, [ng] coda.
[ wav | jpg | pitch track ]
- "Lei", liquid onset, [ei] vowel rhyme.
[ wav | jpg | pitch track ]
- "Ma", nasal onset, [a] vowel rhyme.
[ wav | jpg ]
- "Pu", voiceless plosive onset, [u] vowel rhyme.
[ wav | jpg ]
- "Xie", sibilant onset, [ie] vowel rhyme.
[ wav | jpg ]
- "Yu", glide onset, [u] vowel rhyme.
[ wav | jpg | pitch track ]
Phrases
- Sentence with four tones of "Ma."
[ wav | jpg ]
- "Ni Hao Ma"
[ wav | jpg ]
- "Xin Nian Kuai Le"
[ wav | jpg ]
- "Zai Jian"
[ wav | jpg ]
Samples, Female
Tsiyun (Cherry) Fu
- "Bao", voiced plosive onset, [ao] vowel rhyme, no coda.
[ wav | jpg | pitch track ]
- "Chen", voiceless affricate onset, [e] vowel rhyme, [n] syllable coda.
[ wav | jpg | pitch track ]
- "Er", [er] vowel rhyme / coda.
[ wav | jpg | pitch track ]
- "Hai", [ai] vowel rhyme.
[ wav | jpg | pitch track ]
- "Jing", voiced affricate onset, [i] vowel rhyme, [ng] coda.
[ wav | jpg ]
- "Lei", liquid onset, [ei] vowel rhyme.
[ wav | jpg | pitch track ]
- "Ma", nasal onset, [a] vowel rhyme.
[ wav | jpg | pitch track ]
- "Pu", voiceless plosive onset, [u] vowel rhyme.
[ wav | jpg ]
- "Xie", sibilant onset, [ie] vowel rhyme.
[ wav | jpg ]
- "Yu", glide onset, [u] vowel rhyme.
[ wav | jpg ]
Phrases
Copyright © 2000-2001 Jim Gilsinan IV ( gilsinan@post.harvard.edu )
Last Update 30 March 2001