Abstract
We suggest a simple procedure for the extraction of Wikipedia sub-domains,
propose a plain-text (human and machine readable) corpus exchange format,
reflect on the interactions of Wikipedia markup and linguistic analysis,
and report initial experimental results in parsing and treebanking a domain-specific
subset of Wikipedia content.