Comparing Classifiers for Sentiment Analysis on Commit Messages

People

Supervisor

Description

Negative affective states, among others, have recently received particular attention because of their detrimental impact on developers' productivity and on their ability to react to undesirable facts. In particular, frustration and anger may lead to poor outcomes and negative learning performance.

A prior work (currently under revision) generated a large dataset of GitHub commit messages for Java, Python and C (~2.4 million commits) to explore frustration; multiple libraries were combined to assess a range of sentiment-driven characteristics and detect frustration. In this project, you will work with this dataset to determine which model best improves on the performance of the prior implementation. You will have to run at least ME, SVM, LR, CNN, and three BERT variants (RoBERTa, ALBERT and DistilBERT). The models' results will be checked against manually verified samples. You will not be developing new algorithms, but you will be assessing the performance and accuracy of existing ones.
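As a rough illustration of the comparison step, the sketch below scores the labels produced by each model against a manually verified sample and ranks the models by F1. It is a minimal sketch only: the label names ("frustrated", "neutral"), the six-item sample, the example predictions and the helper score_against_gold are all hypothetical and are not taken from the prior work; real predictions would come from the trained models.

```python
from typing import Dict, List


def score_against_gold(gold: List[str], predicted: List[str],
                       positive: str = "frustrated") -> Dict[str, float]:
    """Score one model's labels against manually verified labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one labelled sample")

    pairs = list(zip(gold, predicted))
    # Confusion counts for the positive class.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    # Guard against division by zero when a model never predicts the class.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical manually verified labels for six commit messages.
gold = ["frustrated", "neutral", "frustrated",
        "neutral", "neutral", "frustrated"]

# Hypothetical model outputs (assumption: one label per commit message).
predictions = {
    "SVM": ["frustrated", "neutral", "neutral",
            "neutral", "neutral", "frustrated"],
    "LR": ["frustrated", "frustrated", "frustrated",
           "neutral", "neutral", "neutral"],
}

results = {name: score_against_gold(gold, pred)
           for name, pred in predictions.items()}
best = max(results, key=lambda name: results[name]["f1"])

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
print("best by F1:", best)
```

Reporting precision, recall and F1 alongside accuracy is a design choice rather than a requirement stated here: if frustrated commits are rare in the dataset, accuracy alone can make a model that rarely predicts frustration look better than it is.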

This project can be undertaken in a single semester, but you will have to work fast and effectively, reporting progress every week.

 

Requirements

  • Programming knowledge. Python is a must, as all algorithms are in Python. Appropriate use of Jupyter notebooks is highly desirable.
  • A laptop that can handle large datasets. Although you can run the predictions on each dataset separately (we have three, one per programming language), you may need to combine them to analyse the results.
  • Demonstrated academic writing/speaking skills.
  • Excellent attention to detail.
  • Willingness to follow a systematic process and provide updates as requested.
Please contact me via email with a detailed resume and your comments (1 page only) on why you are interested in this project.
 
Anybody is welcome to apply. However, female or female-identifying candidates are especially encouraged to apply.
 

Keywords

  • Mining Software Repositories
  • Machine Learning
  • Sentiment Analysis
  • Algorithm Comparison

 

Updated:  10 August 2021/Responsible Officer:  Dean, CECS/Page Contact:  CECS Marketing