TY - JOUR
T1 - Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial
AU - Debortoli, Stefan
AU - Müller, Oliver
AU - Junglas, Iris
AU - vom Brocke, Jan
PY - 2016
Y1 - 2016
N2 - It is estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video), and much of it is expressed in rich and ambiguous natural language. Traditionally, the analysis of natural language has prompted the use of qualitative data analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase the use of probabilistic topic modeling via Latent Dirichlet Allocation, an unsupervised text mining technique, in combination with a LASSO multinomial logistic regression to explain user satisfaction with an IT artifact by automatically analyzing more than 12,000 online customer reviews. For fellow information systems researchers, this tutorial provides some guidance for conducting text mining studies on their own and for evaluating the quality of others.
AB - It is estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video), and much of it is expressed in rich and ambiguous natural language. Traditionally, the analysis of natural language has prompted the use of qualitative data analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase the use of probabilistic topic modeling via Latent Dirichlet Allocation, an unsupervised text mining technique, in combination with a LASSO multinomial logistic regression to explain user satisfaction with an IT artifact by automatically analyzing more than 12,000 online customer reviews. For fellow information systems researchers, this tutorial provides some guidance for conducting text mining studies on their own and for evaluating the quality of others.
KW - Text Mining
KW - Topic Modeling
KW - Latent Dirichlet Allocation
KW - Online Customer Reviews
KW - User Satisfaction
U2 - 10.17705/1CAIS.03907
DO - 10.17705/1CAIS.03907
M3 - Journal article
SN - 1529-3181
VL - 39
JO - Communications of the Association for Information Systems (CAIS)
JF - Communications of the Association for Information Systems (CAIS)
IS - 1
M1 - 7
ER -