This page highlights some of the work I produced while conducting research at the University of Oulu.
textpype is a work-in-progress Python library for building pipelines that evaluate combinations of ML classification algorithms, NLP preprocessing steps, and data sampling techniques on different datasets. It is partly based on a paper presented at MSR 2021.
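To illustrate the idea of evaluating combinations, here is a minimal standard-library sketch. It is not textpype's API: every name in it (`evaluate_grid`, the toy preprocessors, samplers and classifiers) is hypothetical and exists only to show how a grid of preprocessing step × sampling technique × classifier could be enumerated and scored.

```python
from collections import Counter
from itertools import product
import random


# Hypothetical preprocessing steps: each maps a string to a list of tokens.
def lowercase_tokens(text):
    return text.lower().split()


def raw_tokens(text):
    return text.split()


# Hypothetical sampling techniques: each returns a (texts, labels) pair.
def no_sampling(texts, labels):
    return list(texts), list(labels)


def random_oversample(texts, labels, seed=0):
    """Duplicate random minority-class items until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        for _ in range(target - count):
            out_texts.append(rng.choice(pool))
            out_labels.append(label)
    return out_texts, out_labels


# Toy classifiers standing in for real ML models.
class MajorityClassifier:
    def fit(self, docs, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, docs):
        return [self.label for _ in docs]


class KeywordClassifier:
    """Predicts the label whose training tokens overlap most with the input."""

    def fit(self, docs, labels):
        self.vocab = {}
        for tokens, label in zip(docs, labels):
            self.vocab.setdefault(label, Counter()).update(tokens)
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, docs):
        preds = []
        for tokens in docs:
            scores = {lab: sum(c[t] for t in tokens) for lab, c in self.vocab.items()}
            best = max(scores, key=scores.get)
            # Fall back to the majority label when no token is known.
            preds.append(best if scores[best] > 0 else self.default)
        return preds


def evaluate_grid(train, test, preprocessors, samplers, classifiers):
    """Return accuracy for every (preprocessor, sampler, classifier) combination."""
    train_texts, train_labels = train
    test_texts, test_labels = test
    results = {}
    for (p_name, prep), (s_name, sample), (c_name, make_clf) in product(
        preprocessors.items(), samplers.items(), classifiers.items()
    ):
        texts, labels = sample(train_texts, train_labels)
        clf = make_clf().fit([prep(t) for t in texts], labels)
        preds = clf.predict([prep(t) for t in test_texts])
        correct = sum(p == y for p, y in zip(preds, test_labels))
        results[(p_name, s_name, c_name)] = correct / len(test_labels)
    return results
```

Calling `evaluate_grid` with dictionaries of named components returns one accuracy per combination, which can then be sorted or tabulated to compare configurations.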
FinnishSentiment is a package for conducting sentiment analysis of Finnish text using logistic regression. The data used for training the model is based on almost 2000 manually classified tweets about COVID-19, and the paper behind it is currently under review. There is also a second repository containing additional scripts for replicating all the results of the paper.
20-MAD is a dataset that was shared and presented during MSR 2020. The data itself is hosted at OSF. The main code for data extraction and processing is hosted in a GitHub repository as an R package, while a second repository glues together different packages, documents the data, and contains a few additional scripts. A Docker image is also available for replication purposes.
Natural Language or Not (NLoN)
RSentiStrength and RSenti4SD
TextFeatures is an R package for generating features to feed to machine learning models for text classification.
EmoticonFindeR is a package for detecting emoticons and emojis in text data.
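As a rough illustration of what detecting emoticons and emojis can involve, the following Python sketch uses a regular expression. It is not EmoticonFindeR's implementation or interface: the function name `find_emoticons`, the pattern, and the chosen Unicode ranges are assumptions made for this example, and they cover only common Western-style emoticons and a subset of emoji code points.

```python
import re

# Assumed pattern, for illustration only:
# - emoticon: eyes (: ; =), optional nose (- or '), one or more mouth characters;
#   the lookbehind keeps "http://" and "12:30" from being matched.
# - emoji: a single code point from two common emoji/symbol ranges.
TOKEN_RE = re.compile(
    r"(?P<emoticon>(?<![\w:;=])[:;=][-']?[)(\]\[DPp/\\|]+(?!\w))"
    r"|(?P<emoji>[\U0001F300-\U0001FAFF\u2600-\u27BF])"
)


def find_emoticons(text):
    """Return (kind, symbol) pairs in order of appearance in the text."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]
```

For instance, `find_emoticons("Nice work ;-D")` yields one `("emoticon", ";-D")` pair, while a plain URL produces no matches. Multi-code-point emoji sequences (skin tones, flags, joined emoji) are outside the scope of this sketch.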