Natural Language Processing for Small Languages

People

Supervisor

Research areas

External Member

Dr Danielle Barth, ANU College of Asia and the Pacific

Description

We want to understand which groups of people are similar or different, and how and why. This project uses machine learning to classify text collected in Papua New Guinea (PNG) as part of a language documentation project for an endangered Indigenous language. Machine learning will be used to classify speakers into groups (younger vs older, male vs female, etc.) and to classify texts into types (conversation vs narrative, etc.). Which n-grams are most strongly associated with each group? What kind of model works best with this small dataset? The project will contribute to the anthropological and sociolinguistic understanding of a small community in PNG, and will provide valuable insight into applying machine learning to non-English data in a small and variable dataset, which is a very common kind of data.
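As a rough illustration of the n-gram question above, the sketch below scores character n-grams by a smoothed log-ratio of their relative frequencies in two groups of texts. It is only one possible starting point and not the project's prescribed method. The function names, the smoothing constant `alpha`, and the toy strings are all assumptions made for illustration; the strings are made-up placeholders, not real language data.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, whitespace-normalised string."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def log_ratio(texts_a, texts_b, n=3, alpha=0.5):
    """Smoothed log-ratio of each n-gram's relative frequency in group A vs group B.

    Positive scores lean towards group A, negative scores towards group B.
    alpha is an additive smoothing constant (an assumed value, not tuned).
    """
    counts_a = Counter(g for t in texts_a for g in char_ngrams(t, n))
    counts_b = Counter(g for t in texts_b for g in char_ngrams(t, n))
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    scores = {}
    for gram in vocab:
        p_a = (counts_a[gram] + alpha) / total_a
        p_b = (counts_b[gram] + alpha) / total_b
        scores[gram] = math.log(p_a / p_b)
    return scores


if __name__ == "__main__":
    # Placeholder strings standing in for transcribed speech from two speaker groups.
    group_a = ["aba aba ka", "aba ka ka"]
    group_b = ["tuk tuk mo", "mo tuk"]
    scores = log_ratio(group_a, group_b)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    print("Most associated with group A:", ranked[:3])
    print("Most associated with group B:", ranked[-3:])
```

Character n-grams are used here because they need no tokeniser, which may help for a language with few existing tools. Word n-grams could be substituted by changing `char_ngrams`. With a dataset this small, the smoothing constant will noticeably affect the ranking, so any scores should be read as exploratory.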

External supervisor

Dr Danielle Barth, ANU College of Asia and the Pacific

Requirements

Familiarity with machine learning. Good coding skills in Python are a plus!

Gain

Gain a good understanding of machine learning models for natural language processing, and learn how to implement and apply these techniques in a research project.

Keywords

Natural Language Processing, Machine Learning, Small data

Updated: 1 June 2019 / Responsible Officer: Dean, CECS / Page Contact: CECS Marketing