Abstract
We suggest a simple procedure for the extraction of Wikipedia sub-domains,
propose a plain-text (human and machine readable) corpus exchange format,
reflect on the interactions of Wikipedia markup and linguistic analysis,
and report initial experimental results in parsing and treebanking a domain-specific
subset of Wikipedia content.