Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial

Stefan Debortoli, Oliver Müller, Iris Junglas, Jan vom Brocke

Research output: Journal Article or Conference Article in Journal / Journal article / Research / peer-review


It is estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video); and much of it is expressed in rich and ambiguous natural language. Traditionally, the analysis of natural language has prompted the use of qualitative data analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase the use of probabilistic topic modeling via Latent Dirichlet Allocation, an unsupervised text-mining technique, in combination with a LASSO multinomial logistic regression to explain user satisfaction with an IT artifact by automatically analyzing more than 12,000 online customer reviews. For fellow information systems researchers, this tutorial provides some guidance for conducting text-mining studies on their own and for evaluating the quality of others.
Original language: English
Article number: 7
Journal: Communications of the Association for Information Systems (CAIS)
Issue number: 1
Number of pages: 28
Publication status: Published - 2016

