Maëlick Claes - Portfolio

A portfolio of various data/ML/NLP software I produced.

Portfolio

This page highlights some of the work I produced while conducting research at the University of Oulu.

textpype

textpype is a (WIP) Python library for producing a pipeline/workflow for evaluating combinations of various ML classification algorithms, NLP preprocessing steps and data sampling techniques on different datasets. It is partly based on a paper presented at MSR 2021.
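
As a rough illustration of what evaluating "combinations" means here, the sketch below enumerates every pairing of dataset, preprocessing step, sampling technique and classifier using only the Python standard library. It is not textpype's actual API: the option names and the `evaluate` function are hypothetical placeholders.

```python
from itertools import product

# Hypothetical placeholder names; none of these come from textpype itself.
datasets = ["dataset_a", "dataset_b"]
preprocessing = ["lowercase", "lowercase+stemming"]
samplers = ["none", "oversample"]
classifiers = ["logistic_regression", "naive_bayes"]


def evaluate(dataset, preprocess, sampler, classifier):
    """Stand-in for training and scoring one configuration."""
    # A real run would fit the classifier and return a metric such as F1.
    return None


# Enumerate every combination of the four dimensions.
configurations = list(product(datasets, preprocessing, samplers, classifiers))

# Map each configuration to its (placeholder) evaluation result.
results = {config: evaluate(*config) for config in configurations}
```

With two options per dimension this yields 2 × 2 × 2 × 2 = 16 configurations, which shows how quickly the search space grows as options are added.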

FinnishSentiment

FinnishSentiment is a package for conducting sentiment analysis of Finnish text using logistic regression. The model is trained on almost 2000 manually classified tweets about COVID-19, and the paper behind it is currently under review. There is also a second repository containing additional scripts for replicating all the results of the paper.

20-MAD

20-MAD is a dataset that was shared and presented during MSR 2020. The data itself is hosted at OSF. The main code for data extraction and processing is hosted in a GitHub repository as an R package, while a second repository glues together the different packages, documents the data and contains a few additional scripts. A Docker image is also available for replication purposes.

Natural Language or Not (NLoN)

NLoN is an R package to automatically detect whether a line of text is natural language or not. This work is the result of a paper presented at MSR 2018.

RSentiStrength and RSenti4SD

RSentiStrength and RSenti4SD are two simple R packages for running two sentiment analysis tools: SentiStrength and Senti4SD.

TextFeatures

TextFeatures is an R package for generating features to feed to machine learning models for text classification.

EmoticonFindeR

EmoticonFindeR is a package for detecting emoticons and emojis in text data.