Mmm whatcha say? Uncovering distal and proximal context effects in first and second-language word perception using psychophysical reverse correlation

1School of Computing Science, 3Department of Linguistics, Simon Fraser University, Canada
2Université de Franche-Comté, SUPMICROTECH, CNRS, Institut FEMTO-ST, France
INTERSPEECH 2024

Abstract

Acoustic context effects, where surrounding changes in pitch, rate or timbre influence the perception of a sound, are well documented in speech perception, but how they interact with language background remains unclear. Using a reverse-correlation approach, we systematically varied the pitch and speech rate in phrases around different pairs of vowels for second language (L2) speakers of English (/i/-/ɪ/) and French (/u/-/y/), thus reconstructing, in a data-driven manner, the prosodic profiles that bias their perception. Testing English and French speakers (n=25), we showed that vowel perception is in fact influenced by conflicting effects from the surrounding pitch and speech rate: a congruent proximal effect 0.2s pre-target and a distal contrastive effect up to 1s before; and found that L1 and L2 speakers exhibited strikingly similar prosodic profiles in perception. We provide a novel method to investigate acoustic context effects across stimuli, timescales, and acoustic domains.

Method Overview

[Figure: stimulus choices]

We selected vowel pairs known to be difficult for speakers of the second language. Using the CLEESE toolbox, we applied random pitch and time-stretch modifications both within a single word and across the entire phrase. The phrases were chosen to limit contextual bias toward either word. Participants had either French or English as a first language and moderate to advanced proficiency in the second language. Each participant listened to 250 of these manipulated phrases and, for each one, selected which of two words they heard, e.g., "peel" or "pill" for English. To control selection bias, we generated intermediate vowel sounds using ASR and gradual formant modification.
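The reverse-correlation logic behind this design can be illustrated with a minimal sketch. It is not the study's analysis code: it assumes Gaussian random offsets per time segment, a simulated listener with a made-up linear template (`true_kernel`), and a first-order kernel computed as the mean profile of trials answered with one word minus the mean profile of trials answered with the other. All function and variable names are hypothetical, and no CLEESE calls are used.

```python
import random


def simulate_trials(n_trials, n_segments, true_kernel, rng):
    """Return random prosody profiles and the simulated listener's binary choices."""
    profiles, responses = [], []
    for _ in range(n_trials):
        # One random pitch (or stretch) offset per time segment.
        # Gaussian offsets are an assumption made for this illustration.
        profile = [rng.gauss(0.0, 1.0) for _ in range(n_segments)]
        # Hypothetical listener: linear template plus internal noise.
        evidence = sum(p * k for p, k in zip(profile, true_kernel))
        evidence += rng.gauss(0.0, 1.0)
        profiles.append(profile)
        responses.append(1 if evidence > 0 else 0)  # 1 = "peel", 0 = "pill"
    return profiles, responses


def first_order_kernel(profiles, responses):
    """Per segment: mean profile when word 1 was chosen minus mean when word 0 was chosen."""
    chosen = [p for p, r in zip(profiles, responses) if r == 1]
    other = [p for p, r in zip(profiles, responses) if r == 0]
    if not chosen or not other:
        raise ValueError("both responses must occur at least once")
    n_segments = len(profiles[0])
    return [
        sum(p[i] for p in chosen) / len(chosen)
        - sum(p[i] for p in other) / len(other)
        for i in range(n_segments)
    ]


rng = random.Random(0)
# Made-up template: a contrastive early segment and a congruent later one.
true_kernel = [-0.5, 0.0, 1.0, 0.0]
profiles, responses = simulate_trials(2000, 4, true_kernel, rng)
kernel = first_order_kernel(profiles, responses)
print([round(k, 2) for k in kernel])
```

With enough trials, the recovered kernel has the same sign pattern as the template that generated the choices, which is what allows a prosodic profile to be reconstructed from responses alone.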

[Figure: vowel modification]

Audio Examples

English Phrases

French Phrases

Results

Except for "pill" among English first-language speakers, we reproduced the known linguistic effect of duration within the word for both English and French first-language speakers: "peel" is longer than "pill", and "poule" is longer than "pull".
For duration in the surrounding phrase, we found a distal contrastive effect (opposite in direction to the duration effect within the vowel) together with a strong proximal congruent effect (same direction as the duration effect within the vowel) 200-300 ms before the vowel.
Pitch has weaker effects than duration, and this difference is more pronounced for French first-language speakers.
Second-language speakers often show stronger effects within the vowel, suggesting that they rely more on prosody to parse difficult phonemes.
An asterisk (*) marks a time point at which the pitch or duration preference that biases listeners toward one word differs significantly from the preference for the other word.
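The statistical test behind the asterisks is not described on this page. As one plausible illustration only, the sketch below flags time points where per-participant kernel values differ from zero using a one-sample t-test with 25 participants. The group effects, noise level, and function names are made up, and no correction for multiple comparisons is applied.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 24 degrees of freedom (n = 25).
T_CRIT_DF24 = 2.064


def one_sample_t(values):
    """t statistic for testing whether the mean of `values` differs from zero."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values) / standard_error


def significant_points(kernels, t_crit):
    """For each time point, flag whether |t| across participants exceeds t_crit."""
    n_points = len(kernels[0])
    flags = []
    for i in range(n_points):
        t = one_sample_t([kernel[i] for kernel in kernels])
        flags.append(abs(t) > t_crit)
    return flags


rng = random.Random(1)
# Made-up group-level kernel values at three time points.
group_effect = [0.0, -0.4, 0.8]
# One simulated kernel per participant: group effect plus individual noise.
kernels = [[mu + rng.gauss(0.0, 0.5) for mu in group_effect] for _ in range(25)]
flags = significant_points(kernels, T_CRIT_DF24)
print(flags)
```

Testing each time point separately inflates the false-positive rate as the number of time points grows, so a real analysis would usually add a correction such as a cluster-based permutation test.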

[Figure: results]

Acknowledgements

This work was supported by NSERC Discovery Grant 06908-2019, the France Canada Research Fund, the Mitacs Globalink Research Award, and the Fondation Pour l’Audition (FPA RD2021-12). The authors thank P. Maublanc, R. Guha, and A. Adl Zarrabi for their valuable discussions; V. Yang, B. Burkanova, C. Zhang, and M. Durana for their help running our study; and the Rajan Family for their support. This work has been conducted in the framework of the EIPHI Graduate school (ANR17-EURE-0002 contract).

BibTeX

@misc{tuttösí2024mmm,
        title={Mmm whatcha say? Uncovering distal and proximal context effects in first and second-language word perception using psychophysical reverse correlation}, 
        author={Paige Tuttösí and H. Henny Yeung and Yue Wang and Fenqi Wang and Guillaume Denis and Jean-Julien Aucouturier and Angelica Lim},
        year={2024},
        eprint={2406.05515},
        archivePrefix={arXiv},
        primaryClass={cs.SD}
  }

Title inspiration: Jason Derulo - Whatcha Say