HateCheck: Functional Tests for Hate Speech Detection Models

Röttger, Paul; Vidgen, Bertie; Nguyen, Dong; Waseem, Zeerak; Margetts, Helen; Pierrehumbert, Janet

DOI: https://doi.org/10.18653/v1/2021.acl-long.4

(2021) Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 41-58

(Part of book)

Abstract

Detecting online hate is a difficult task that even state-of-the-art models struggle with. Typically, hate speech detection models are evaluated by measuring their performance on held-out test data using metrics such as accuracy and F1 score. However, this approach makes it difficult to identify specific model weak points. It also ...

Publisher: Association for Computational Linguistics

(Peer reviewed)
