Belz, Anya; Thomson, Craig; Reiter, Ehud; Abercrombie, Gavin; Alonso-Moral, Jose M.; Arvan, Mohammad; Cheung, Jackie; Cieliebak, Mark; Clark, Elizabeth; Deemter, Kees van; Dinkar, Tanvi; Dušek, Ondřej; Eger, Steffen; Fang, Qixiang; Gatt, Albert; Gkatzia, Dimitra; González-Corbelle, Javier; Hovy, Dirk; Hürlimann, Manuela; Ito, Takumi; Kelleher, John D.; Klubicka, Filip; Lai, Huiyuan; Lee, Chris van der; Miltenburg, Emiel van; Li, Yiru; Mahamood, Saad; Mieskes, Margot; Nissim, Malvina; Parde, Natalie; Plátek, Ondřej; Rieser, Verena; Romero, Pablo Mosteiro; Tetreault, Joel; Toral, Antonio; Wan, Xiaojun; Wanner, Leo; Watson, Lewis; Yang, Diyi
(Association for Computational Linguistics, 2023-05-01)
We report our efforts to identify a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more or less reproducible. We present our results ...