Mattes Ohlenbusch, Christian Rollwage, Simon Doclo, Jan Rennies
Own voice pickup technology for hearable devices facilitates communication in noisy environments. Own voice reconstruction (OVR) systems enhance the quality and intelligibility of the recorded noisy own voice signals. Since the disturbances affecting the recorded own voice signals depend on individual factors, personalized OVR systems may outperform generic ones. In this paper, we propose personalizing OVR systems through data augmentation and fine-tuning, and compare them with their generic counterparts. We investigate the influence of personalization on speech quality as assessed by objective metrics and conduct a subjective listening test to evaluate quality under various conditions. In addition, we assess the prediction accuracy of the objective metrics by comparing predicted quality with subjectively measured quality. Our findings suggest that personalized OVR provides benefits over generic OVR only for some talkers. Our results also indicate that objective metrics do not always accurately predict performance differences between systems. In particular, certain disturbances lead to a consistent over-estimation of quality compared to the subjective ratings.
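The comparison between predicted and subjectively measured quality could be sketched as follows. This is a minimal illustration, not the evaluation procedure used in the paper: the function names (`pearson`, `spearman`, `mean_bias`) are hypothetical, the choice of correlation and signed bias as agreement measures is an assumption, and both score lists are assumed to be mapped to a common scale beforehand, with one entry per processing condition.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Linear correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Rank correlation: do both scores order the conditions the same way?"""
    return pearson(ranks(x), ranks(y))


def mean_bias(predicted, subjective):
    """Mean signed error; a positive value means quality is over-estimated.

    Assumes both lists are already on the same scale (an assumption of
    this sketch, not something stated in the source).
    """
    return mean(p - s for p, s in zip(predicted, subjective))


if __name__ == "__main__":
    # Placeholder numbers for illustration only, NOT results from the paper.
    predicted = [2.0, 3.5, 4.0, 4.5]
    subjective = [1.5, 3.0, 3.0, 4.0]
    print(pearson(predicted, subjective))
    print(spearman(predicted, subjective))
    print(mean_bias(predicted, subjective))
```

A high rank correlation together with a clearly positive bias for a subset of conditions would correspond to the situation described above, where a metric orders systems plausibly overall but over-estimates quality for certain disturbances.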
Dataset of German own voice recordings: https://doi.org/10.5281/zenodo.10844599
Transfer function measurements for simulating environmental noise at hearable microphones: https://doi.org/10.5281/zenodo.11196867
| Processing condition | Low predicted benefit from personalization | High predicted benefit from personalization |
| --- | --- | --- |
| Clean outer microphone | | |
| Noisy outer microphone | | |
| Noisy in-ear microphone | | |
| EBEN | | |
| MWF | | |
| Generic data augmentation | | |
| Generic data augmentation, generic fine-tuning | | |
| Generic data augmentation, personalized fine-tuning | | |
| Personalized data augmentation, personalized fine-tuning | | |