Mattes Ohlenbusch, Christian Rollwage, Simon Doclo
Many hearables contain an in-ear microphone, which may be used to capture the user's own voice. However, because the hearable occludes the ear canal, the in-ear microphone mostly records body-conducted speech, which typically suffers from band-limitation effects and amplification at low frequencies. Since the occlusion effect is determined by the ratio between the air-conducted and body-conducted components of own voice, the own voice transfer characteristics between the outer face of the hearable and the in-ear microphone depend on the speech content and the individual talker. In this paper, we propose a speech-dependent model of the own voice transfer characteristics based on phoneme recognition, assuming a linear time-invariant relative transfer function for each phoneme. We consider both individual models and models averaged over several talkers. Experimental results based on recordings with a prototype hearable show that the proposed speech-dependent model simulates in-ear signals more accurately than a speech-independent model in terms of technical measures, especially under utterance mismatch and talker mismatch. Additionally, simulation results show that talker-averaged models generalize better to different talkers than individual models.
Journal paper: https://doi.org/10.1051/aacus/2024032
arXiv preprint: https://arxiv.org/abs/2310.06554
Dataset of German own voice recordings: https://doi.org/10.5281/zenodo.10844599
recorded outer microphone
recorded in-ear microphone
simulated in-ear (speech-independent individual)
simulated in-ear (speech-independent talker-averaged)
simulated in-ear (speech-dependent individual)
simulated in-ear (speech-dependent talker-averaged)
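
The difference between a speech-independent and a speech-dependent model, as described in the abstract, can be illustrated with a minimal sketch. It is not the estimation procedure from the paper: it assumes a plain least-squares estimate of one relative transfer function (RTF) per phoneme in the STFT domain, uses synthetic noise-free data, and takes per-frame phoneme labels as given rather than obtaining them from a phoneme recognizer. All function names (`estimate_rtf`, `estimate_phoneme_rtfs`, `simulate_in_ear`) are hypothetical.

```python
import numpy as np


def estimate_rtf(outer, inner, eps=1e-12):
    """Least-squares RTF per frequency bin from STFT frames (frames x bins)."""
    cross = np.sum(inner * np.conj(outer), axis=0)
    power = np.sum(np.abs(outer) ** 2, axis=0)
    return cross / (power + eps)


def estimate_phoneme_rtfs(outer, inner, labels):
    """One RTF per phoneme label, estimated from the frames carrying that label."""
    return {
        int(p): estimate_rtf(outer[labels == p], inner[labels == p])
        for p in np.unique(labels)
    }


def simulate_in_ear(outer, labels, rtfs, fallback):
    """Filter each outer-microphone frame with the RTF of its phoneme label."""
    sim = np.empty_like(outer)
    for t, p in enumerate(labels):
        sim[t] = rtfs.get(int(p), fallback) * outer[t]
    return sim


# Synthetic example (assumption): two phonemes with different true RTFs.
rng = np.random.default_rng(0)
n_frames, n_bins = 200, 65
outer = rng.standard_normal((n_frames, n_bins)) \
    + 1j * rng.standard_normal((n_frames, n_bins))
labels = rng.integers(0, 2, size=n_frames)
true_rtfs = {
    0: np.linspace(1.0, 0.1, n_bins),
    1: np.linspace(0.5, 0.05, n_bins) * np.exp(1j * 0.3),
}
inner = np.stack([true_rtfs[int(p)] * outer[t] for t, p in enumerate(labels)])

# Speech-independent model: a single RTF estimated over all frames.
global_rtf = estimate_rtf(outer, inner)
sim_indep = global_rtf * outer

# Speech-dependent model: one RTF per phoneme, selected frame by frame.
phoneme_rtfs = estimate_phoneme_rtfs(outer, inner, labels)
sim_dep = simulate_in_ear(outer, labels, phoneme_rtfs, fallback=global_rtf)

err_indep = np.mean(np.abs(sim_indep - inner) ** 2)
err_dep = np.mean(np.abs(sim_dep - inner) ** 2)
print(f"speech-independent error: {err_indep:.4f}")
print(f"speech-dependent error:   {err_dep:.2e}")
```

In this toy setting the single RTF can only capture an average of the two phoneme-specific transfer characteristics, so its simulation error stays above that of the per-phoneme model. A talker-averaged variant could be sketched the same way by averaging the per-phoneme RTFs estimated for several talkers, again as an assumption rather than the paper's exact procedure.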