Evaluation of topic models by topic coherence and perplexity.


posted on 2023-04-20, 17:25 authored by Peter A. Takizawa

LDA Mallet was used to generate topic models for texts from the class of 2024. Models were generated for topic numbers from 10 to 150 in increments of 10. The quality of the model at each topic number was evaluated by topic coherence and by perplexity on a held-out set of texts. Models with higher topic coherence scores have been found to produce topics that human reviewers judge more interpretable. Models with lower perplexity scores more accurately predict the words in an unseen set of texts.
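The two metrics can be sketched in Python. This is a minimal illustration under stated assumptions, not the MALLET code used for the figure: it assumes the held-out log-likelihood (natural log) and token count of each held-out document are already available from a trained model, and it uses UMass coherence as one common coherence measure, since the caption does not say which coherence measure was computed. The function names are hypothetical.

```python
import math

# Topic numbers evaluated in the figure: 10 to 150 in increments of 10.
TOPIC_NUMBERS = list(range(10, 151, 10))


def perplexity(doc_log_likelihoods, doc_token_counts):
    """Held-out perplexity: exp(-(total log-likelihood) / (total tokens)).

    doc_log_likelihoods: natural-log likelihood of each held-out document
        (assumed to come from the trained topic model).
    doc_token_counts: number of tokens in each held-out document.
    Lower values mean the model predicts unseen words better.
    """
    total_ll = sum(doc_log_likelihoods)
    total_tokens = sum(doc_token_counts)
    return math.exp(-total_ll / total_tokens)


def umass_coherence(top_words, documents):
    """UMass coherence of one topic's top words (higher is better).

    top_words: the topic's most probable words, most probable first.
        Each word is assumed to occur in at least one document.
    documents: iterable of token lists used to count co-occurrences.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # Pair each word with every more probable word ranked above it.
    for i in range(1, len(top_words)):
        for j in range(i):
            co_occurrences = doc_freq(top_words[i], top_words[j])
            score += math.log((co_occurrences + 1) / doc_freq(top_words[j]))
    return score


if __name__ == "__main__":
    docs = [["cell", "membrane", "protein"], ["cell", "protein"], ["heart", "valve"]]
    print(umass_coherence(["cell", "protein", "membrane"], docs))
    print(perplexity([-10.0, -20.0], [5, 5]))
```

As a sanity check, a model that assigned every held-out token probability 1/V would have perplexity exactly V, which is why lower values indicate better prediction. UMass scores are sums of log ratios and are typically negative on real corpora; values closer to zero indicate more coherent topics. Repeating both calculations for each value in `TOPIC_NUMBERS` gives the two curves compared across topic numbers.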

(TIF)
