What have I done?

Academic projects

Cross-Linguistic Prosodic Priming (2019)

Term paper written with a fellow student for an undergraduate laboratory linguistics course. The paper reports on an experiment investigating whether stress patterns can be primed cross-linguistically.

Lexical Variation and Gender in Informal Social Media Environments (2018)

Term paper written for an undergraduate sociolinguistics course. Data on preferences in acronym and abbreviation usage on social media was collected via survey, along with demographic information about each respondent. Lexical variation across gender was then analyzed. Selected for presentation at the 2019 edition of CLAUSE̥ (Canadian Linguistics Annual Undergraduate Symposium).


Personal projects

Papers

An expanded demographically informed analysis of Toontown Rewritten chat message data (2024)

A more extensive study expanding on my previous demographic NLP research. A second, larger demographic corpus was compiled by hand and merged with the original corpus from my 2023 study, and the analysis was carried out in Python. GitHub repository here.

A preliminary demographically informed analysis of Toontown Rewritten chat message data (2023)

Initial study integrating demographic and NLP research on Toontown Rewritten into a unified investigation. A demographic corpus was assembled by hand over approximately six months, and the analysis was performed in R. GitHub repository here.

Predicting different leg colour in Toontown Rewritten with supervised classification (2022)

Various supervised classification algorithms were used to predict whether members of the dataset had legs of a different colour from the rest of their body. GitHub repository here.

An exploration of Toontown Rewritten demographics (2022)

Large study investigating the demographics of a sample of the population of the MMORPG Toontown Rewritten. Data was collected by hand in the summer of 2021, then cleaned, analyzed, and visualized in Python. GitHub repository here.


Projects

Toontown Rewritten LDA and NMF (2022) and Toontown Rewritten LDA topic modeling with gensim (2024)

Successive iterations of topic modeling with latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF) on a text corpus of chat messages from Toontown Rewritten. The 2022 project is purely linguistic, whereas the 2024 project uses my extended demographic corpus.

Dota 2 corpus analysis (2022)

Exploration and visualization in Python of a large corpus of in-game chat messages from Dota 2, using measures of token frequency, polarity, and message length.

Toontown Rewritten corpus analysis (2022)

A simple non-demographic linguistic analysis exploring a corpus of in-game chat messages.

Dota hero clustering with k-means (2022)

K-means clustering was used to group Dota heroes into three clusters based on their base attributes. High accuracy was achieved after exploratory analysis and feature selection.