IDLab | Internet technology and Data science Lab

AI on Speech/Audio data

IDLab Ghent’s activities on speech and audio data processing revolve around:

Audio and speech enhancement for communication devices and hearing aids
Passive and active monitoring of machines and industrial systems for fault detection, fault prediction and preventive maintenance
Improved audio reproduction for virtual reality applications
Extracting verbal information (what is being said) and non-verbal information (who is speaking and how they are speaking) from speech and audio

Our focus and expertise lie in combining domain knowledge (in the form of appropriate models) with the strengths of data-driven deep learning methods. This allows us to pair the discriminative power of deep learning with the better generalisability of model-based approaches, leading to robust, practical algorithms.

IDLab Ghent’s speech and audio data expertise can be applied to any domain where single-channel or multichannel time-series recordings need to be captured, enhanced and analysed. Signal processing and the extraction of signals from noisy recordings are broadly applicable techniques, for example in biomedical signal processing. Some example areas are:

Industry 5.0: Predictive maintenance, machine monitoring
Portable and wearable communication devices: Speech and audio capture and analysis for communications
Infrastructure: Underwater acoustics
Media: Increased immersion in virtual reality, improved intelligibility
Education & therapy: Addressing problems in speech education by analysing children's speech

In addition, speech and audio data are intrinsically time series, so our research expertise is at least partially transferable to other domains that deal with time series.