Corpora & annotation

Corpus


“a helluva lot of text, stored on a computer” (Leech 1992)

Annotating the CHLG

I was responsible for the syntactic annotation of the Corpus of Historical Low German, a Penn-style treebank of Middle Low German.

I also recently experimented with different schemes to add a layer of pragmatic annotation to the corpus, and published a new version with annotations for givenness.

Handling ambiguity & uncertainty

Together with colleagues at Konstanz, I explored how representation problems such as ambiguity, uncertainty, error and bias can be adequately handled in corpus annotation, both to improve reliability and to inform the wider discussion of the reproducibility crisis.