Discovering the language of wine reviews: A text mining account
Lefever, Els; Hendrickx, Iris; Croijmans, Ilja; Van Den Bosch, Antal; Majid, Asifa; Isahara, Hitoshi; Maegaard, Bente; Piperidis, Stelios; Cieri, Christopher; Declerck, Thierry; Hasida, Koiti; Mazo, Helene; Choukri, Khalid; Goggi, Sara; Mariani, Joseph; Moreno, Asuncion; Calzolari, Nicoletta; Odijk, Jan; Tokunaga, Takenobu
(2019)
LREC 2018 - 11th International Conference on Language Resources and Evaluation, pp. 3297 - 3302
(Part of book)
Abstract
It is widely held that smells and flavors are impossible to put into words. In this paper we test this claim by seeking predictive patterns in wine reviews, which ostensibly aim to provide guides to perceptual content. Wine reviews have previously been critiqued as random and meaningless. We collected an English corpus of wine reviews with their structured metadata, and applied machine learning techniques to automatically predict the wine's color, grape variety, and country of origin. To train the three supervised classifiers, three different information sources were incorporated: lexical bag-of-words features, domain-specific terminology features, and semantic word embedding features. In addition, using regression analysis we investigated basic review properties, i.e., review length, average word length, and their relationship to the scalar values of price and review score. Our results show that wine experts do share a common vocabulary to describe wines and they use this in a consistent way, which makes it possible to automatically predict wine characteristics based on the review text alone. This means that odors and flavors may be more expressible in language than typically acknowledged.
Keywords: Classification, Supervised learning, Terminology extraction, Wine reviews, Wine vocabulary, Linguistics and Language, Education, Library and Information Sciences, Language and Linguistics
ISBN: 9791095546009
Publisher: European Language Resources Association (ELRA)
(Peer reviewed)