Thesis title: A comparative study of text classification algorithms, TF-IDF with K-Means clustering and LDA on Dutch online news articles and domain specific forum documents

Thesis language: English

Title page with student name, department, date, and names of first and second assessors present (for public display): Yes

Thesis abstract: In recent years, different machine learning algorithms have proven successful at automated text classification. Building on previous results, this study compares Term Frequency Inverse Document Frequency (TF-IDF) with K-Means clustering against Latent Dirichlet Allocation (LDA) in a text classification task on Dutch text corpora. The corpora used in this study consist of a Dutch news article corpus and a Dutch forum thread corpus. The performance of both classifiers is analyzed in terms of precision, recall, and F1-score. The results of this experiment show that TF-IDF with K-Means clustering outperforms LDA on both corpora. However, further research should determine whether the results remain stable when the number of topics is expanded.
