WikiLDA: Towards more effective knowledge acquisition in topic models using Wikipedia
Published in Association for Computing Machinery, Inc
2017
Abstract
To make Latent Dirichlet Allocation (LDA) topics more interpretable, we propose WikiLDA, an extension of LDA that uses Wikipedia concepts. In WikiLDA, we first "sprinkle" (append) onto each document in a corpus its most relevant Wikipedia concepts. We then use the Generalized Pólya Urn (GPU) model to incorporate word-word, word-concept, and concept-concept semantic relatedness into the generative process of LDA. Because the most probable concepts in the inferred topics can be looked up on Wikipedia, the topics are likely to become more interpretable and hence more usable in acquiring domain knowledge from humans for various text mining tasks (e.g. eliciting topic labels for text classification). Empirical results show that projecting documents with WikiLDA into a semantically enriched and coherent topic space improves performance on text-classification-like tasks, especially in domains where the classes are hard to separate. © 2017 Copyright held by the owner/author(s).
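The two steps named in the abstract, sprinkling concepts onto documents and a GPU-style count update, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how relevant concepts are selected or how relatedness is weighted, so the function names (sprinkle, gpu_update), the "CONCEPT:" token prefix, and every relatedness value below are assumptions made purely for illustration. The sketch only shows the general idea that, under a Generalized Pólya Urn scheme, assigning one token to a topic also raises that topic's counts for related tokens.

```python
from collections import defaultdict


def sprinkle(doc_tokens, concepts_for_doc):
    """Append ("sprinkle") concept tokens to a tokenized document.

    The "CONCEPT:" prefix is a hypothetical convention that keeps
    concept tokens distinct from ordinary words.
    """
    return list(doc_tokens) + [f"CONCEPT:{c}" for c in concepts_for_doc]


def gpu_update(topic_counts, topic, token, relatedness, sign=1):
    """Generalized Polya Urn-style update of topic-token counts.

    With sign=+1 the token is added to the topic; with sign=-1 it is
    removed (as a Gibbs sampler would do before resampling). Related
    tokens move by their (assumed) relatedness weight.
    """
    topic_counts[topic][token] += sign * 1.0
    for other, weight in relatedness.get(token, {}).items():
        topic_counts[topic][other] += sign * weight


# Toy data: all values here are invented for illustration.
doc = ["engine", "wheel"]
sprinkled = sprinkle(doc, ["Automobile"])

relatedness = {
    "engine": {"CONCEPT:Automobile": 0.3},                  # word-concept
    "CONCEPT:Automobile": {"engine": 0.3, "wheel": 0.2},    # concept-word
}

# counts[topic][token] holds (fractional) pseudo-counts.
counts = defaultdict(lambda: defaultdict(float))
for tok in sprinkled:
    gpu_update(counts, 0, tok, relatedness)
```

In a full sampler these fractional counts would take the place of the integer topic-word counts that standard collapsed Gibbs sampling for LDA maintains; the sketch leaves the sampling step out.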
About the journal
Journal: Proceedings of the Knowledge Capture Conference, K-CAP 2017
Publisher: Association for Computing Machinery, Inc
Open Access: No
Concepts (17)
  •  Classification (of information)
  •  Information retrieval systems
  •  Knowledge acquisition
  •  Mergers and acquisitions
  •  Natural language processing systems
  •  Semantics
  •  Statistics
  •  Text processing
  •  Domain knowledge
  •  Generative process
  •  Interpretability
  •  Latent Dirichlet allocations
  •  Semantic relatedness
  •  Text classification
  •  Text mining
  •  Topic model
  •  Data mining