Abstract
This paper presents ongoing work on creating an extensive multilingual parallel corpus of movie
subtitles. The corpus currently contains roughly 23,000 pairs of aligned subtitles
covering about 2,700 movies in 29 languages. Subtitles consist mainly of transcribed
speech, sometimes in a very condensed form. Insertions, deletions and paraphrases are very
frequent, which
...