Software & Corpora

Tools and datasets I've developed or helped build.

Shiny app · R

MaxEnt with Hidden Structure in R

An interactive Shiny application that helps linguists fit phonological grammars (sets of constraint weights) with a Maximum Entropy model. It is built on the HGR model developed by Staubs (2011), which finds solutions to learning problems with or without hidden structure, generates probability distributions over forms, and runs online learning simulations.
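For readers unfamiliar with how a Maximum Entropy grammar turns weights into a distribution over forms, here is a minimal sketch in Python. It is not code from the app or from HGR. The function name `maxent_probabilities`, the candidate labels, and all numbers are hypothetical, and the sketch assumes the common formulation in which a candidate's probability is proportional to the exponential of its negative weighted violation sum.

```python
import math

def maxent_probabilities(violations, weights):
    """Map each candidate to its probability under the given weights."""
    # Penalty of a candidate: weighted sum of its constraint violations.
    penalties = {
        cand: sum(w * v for w, v in zip(weights, viols))
        for cand, viols in violations.items()
    }
    # Normalize exp(-penalty) over all candidates for the same input.
    z = sum(math.exp(-p) for p in penalties.values())
    return {cand: math.exp(-p) / z for cand, p in penalties.items()}

# Hypothetical tableau: two candidates, two constraints (made-up values).
violations = {"cand_a": [1, 0], "cand_b": [0, 1]}
weights = [2.0, 1.0]
probs = maxent_probabilities(violations, weights)
```

Because `cand_a` violates the more heavily weighted constraint, it receives the lower probability. Learning, in this setting, means adjusting `weights` until the predicted distribution matches the observed one.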

Advisor: Joe Pater.

Documentation & user guide →    Launch the app ↗

Corpus · IPA-transcribed

Moroccan Arabic Plurals Corpus

A corpus of 1,166 singular–plural noun pairs in Moroccan Arabic, derived from the Darija Open Dataset (DODa, Outchakoucht & Es-Samaali 2021). Each noun is transcribed in IPA and annotated with its plural form, gloss, pattern (template), and plural type (sound or broken).

Access the corpus ↗