IDLab | Internet technology and Data science Lab

AI on Textual Data

AI-driven generation, analysis and information extraction of textual data is invaluable in almost every domain. In HR, it can automatically extract keywords; in healthcare, it can scan scientific literature for pairs of compounds and adverse reactions; and chatbots or conversational agents are perhaps the hot topic of 2023. Beyond this, the notion of text can be generalized to any sequence of symbols with contextual meaning. This opens the door to even more valuable applications, such as protein language models that generate meaningful biological sequences. IDLab focuses on:

Conversational agents: conversation flow induction (e.g., to bootstrap application-specific chatbots from non-annotated conversational data) and procedural assistance (e.g., executing recipes)
Information extraction: document-level entity and relation extraction, coreference resolution, and entity linking, especially for time-varying knowledge bases (e.g., media/news, legal, health)
Efficient and robust training on little data (e.g., for controllable sentence transformations, counterfactual generation, and adopting causality models)

This research track is set apart by: a focus on causality, in collaboration with Stanford University, to support explainable AI; domain-tailored solutions, such as NLP for healthcare applications; coverage of both traditional human text and protein language applications; use of the latest neural network architectures (e.g., transformers in BERT-like models); and the creation of new, valuable training datasets.

IDLab Ghent has expertise in (applied) research on machine learning for textual data, mainly within the following domains:

Healthcare
e.g., extraction of adverse reactions from medical literature and real-world data, and protein language modelling

HR
e.g., automatic processing of vacancies/CVs, … (such as skill extraction and matching)

Education
e.g., automatic question generation (factual questions, language-learning exercises, distractor generation for multiple-choice questions)

Media
e.g., classification of news articles