AI on Speech/Audio data

IDLab Ghent’s activities on speech and audio data processing revolve around:

  • Audio and speech enhancement for communication devices and hearing aids
  • Passive and active monitoring of machines and industrial systems for fault detection/prediction/preventive maintenance
  • Improved audio reproduction for virtual reality applications
  • Extracting verbal (what is being said) and non-verbal (who is speaking and how they are speaking) information from speech and audio


Our focus and expertise lie in the marriage of domain knowledge (in the form of appropriate models) with the strength of data-driven deep learning methods. This lets us combine the discriminative power of deep learning with the better generalisability of model-based approaches, leading to robust, practical algorithms.

IDLab Ghent’s speech and audio data expertise can be applied to any domain where single- or multi-channel recordings need to be captured, enhanced and analysed. Signal processing and the extraction of signals from noisy recordings are broadly applicable techniques, for example in biomedical signal processing. Some example areas are:

  • Industry 5.0: Predictive maintenance, machine monitoring
  • Portable and wearable communication devices: Speech and audio capture and analysis for communications
  • Infrastructure: underwater acoustics
  • Media: increased immersion in virtual reality, increased intelligibility
  • Education & therapy: addressing speech education problems by analysing children's speech


Speech and audio data are also intrinsically time series, so this research expertise is at least partially transferable to other domains that deal with time series.

Copyright © 2025 IDLab. All rights reserved.