Ablation study: What level of linguistic detail is needed for word-level modelling?


External Member

Saliha Muradoglu, Supervisor


As NLP (natural language processing) tools are extended to new languages, one of the main bottlenecks is the availability of labelled data. This issue is particularly acute for low-resource languages, so the question of annotation detail and quality is important. How much detail is needed for supervised learning? Is there a minimum number of labels needed to capture linguistic patterns?


In this project, you will explore the importance of labels/tags for word-level modelling (morphology and phonology) by performing an ablation study. You will train several ML (machine learning) models for word-level phenomena and contextualise your findings within the existing literature. You may also choose to use information-theoretic metrics to quantify the informativeness of each tag.
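
As a rough illustration of the information-theoretic option, the sketch below scores each tag by how much the conditional entropy of the output rises when that tag is removed from the annotation. It is only one possible formulation, not a prescribed method for the project. The toy data, the tag names (pos, tense, num), the suffix labels, and the function names are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical toy data: (tag bundle, output label). The label stands in for
# whatever a word-level model would predict, e.g. an inflectional suffix.
DATA = [
    ({"pos": "V", "tense": "PST", "num": "SG"}, "-ed"),
    ({"pos": "V", "tense": "PST", "num": "PL"}, "-ed"),
    ({"pos": "V", "tense": "PRS", "num": "SG"}, "-s"),
    ({"pos": "V", "tense": "PRS", "num": "PL"}, "-0"),
]


def conditional_entropy(data, keep):
    """Empirical H(label | tags restricted to `keep`), in bits."""
    groups = defaultdict(Counter)
    for tags, label in data:
        # The context is the tag bundle with every tag outside `keep` dropped.
        context = tuple(sorted((k, v) for k, v in tags.items() if k in keep))
        groups[context][label] += 1
    total = len(data)
    h = 0.0
    for counts in groups.values():
        n = sum(counts.values())
        for c in counts.values():
            # Weight each context by its relative frequency n / total.
            h -= (n / total) * (c / n) * log2(c / n)
    return h


def ablation_scores(data):
    """Increase in H(label | tags) when each tag is ablated in turn."""
    all_tags = {k for tags, _ in data for k in tags}
    base = conditional_entropy(data, all_tags)
    return {
        tag: conditional_entropy(data, all_tags - {tag}) - base
        for tag in sorted(all_tags)
    }


if __name__ == "__main__":
    for tag, score in ablation_scores(DATA).items():
        print(f"{tag}: {score:.2f} bits")
```

On this toy data, removing tense costs 1.0 bit, removing num costs 0.5 bits, and removing pos costs nothing because it never varies. In an actual ablation study the same loop would wrap model training and evaluation rather than an entropy count, with the entropy scores serving as a cheap point of comparison.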


  • Experience with Python.
  • Strong interest and skills in linguistics, NLP, and language.
  • Experience in NLP/computational linguistics is preferable.
  • Completed coursework in Document Analysis (COMP4650), machine learning, AI, or data science.


machine learning, natural language processing, computational linguistics, language

Updated: 10 August 2021 / Responsible Officer: Dean, CECS / Page Contact: CECS Marketing