Abstract
It is estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video); and much of it is expressed in rich and ambiguous natural language. Traditionally, the analysis of natural language has prompted the use of qualitative data analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase the use of probabilistic topic modeling via Latent Dirichlet Allocation, an unsupervised text mining technique, in combination with a LASSO multinomial logistic regression to explain user satisfaction with an IT artifact by automatically analyzing more than 12,000 online customer reviews. For fellow information systems researchers, this tutorial provides some guidance for conducting text mining studies on their own and for evaluating the quality of others.
Original language | English |
---|---|
Article number | 7 |
Journal | Communications of the Association for Information Systems (CAIS) |
Volume | 39 |
Issue number | 1 |
Number of pages | 28 |
ISSN | 1529-3181 |
Publication status | Published - 2016 |
Keywords
- Text Mining
- Topic Modeling
- Latent Dirichlet Allocation
- Online Customer Reviews
- User Satisfaction