What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

Mario Giulianelli, Joris Baan, Wilker Aziz, Raquel Fernández, Barbara Plank

Research output: Conference article in proceedings · Research · peer-review


In Natural Language Generation (NLG) tasks, for any input, multiple communicative goals are plausible, and any goal can be put into words, or produced, in multiple ways. We characterise the extent to which human production varies lexically, syntactically, and semantically across four NLG tasks, connecting human production variability to aleatoric or data uncertainty. We then inspect the space of output strings shaped by a generation system's predicted probability distribution and decoding algorithm to probe its uncertainty. For each test input, we measure the generator's calibration to human production variability. Following this instance-level approach, we analyse NLG models and decoding strategies, demonstrating that probing a generator with multiple samples and, when possible, multiple references, provides the level of detail necessary to gain understanding of a model's representation of uncertainty.
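As an illustration of the instance-level comparison the abstract describes, the sketch below contrasts the spread of a generator's samples with the spread of human references for one input. The distance measure (unigram Jaccard distance) and the toy data are hypothetical stand-ins, not the paper's actual metrics or datasets:

```python
from itertools import combinations

def unigram_distance(a: str, b: str) -> float:
    """Lexical distance as 1 minus the Jaccard overlap of unigram sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)

def mean_pairwise_distance(texts: list[str]) -> float:
    """Average distance over all unordered pairs: a simple variability score."""
    pairs = list(combinations(texts, 2))
    return sum(unigram_distance(a, b) for a, b in pairs) / len(pairs)

# Toy instance: multiple human references vs. multiple model samples
references = [
    "the cat sat on the mat",
    "a cat is sitting on a mat",
    "there is a cat on the mat",
]
samples = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "a cat sat on a mat",
]

human_var = mean_pairwise_distance(references)
model_var = mean_pairwise_distance(samples)
print(f"human variability: {human_var:.3f}, model variability: {model_var:.3f}")
```

Comparing the two scores per input gives a rough sense of whether the generator under- or over-represents the variability found in human productions; in this toy case the repeated samples make the model look less variable than the references.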
Original language: English
Title of host publication: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Number of pages: 22
Place of publication: Singapore
Publisher: Association for Computational Linguistics
Publication date: Dec 2023
Pages: 14349–14371
Publication status: Published - Dec 2023


  • Natural Language Generation
  • Communicative goals
  • Human production variability
  • Aleatoric uncertainty
  • Decoding algorithm
  • Lexical variability
  • Syntactic variability
  • Semantic variability
  • Model calibration
  • Uncertainty representation

