Proposer
Pierre Le Bras
Title
Heterogeneous Ensemble Topic Models
Goal
To evaluate the quality of ensemble topic models generated from heterogeneous methods
Description
Ensemble approaches to modelling topics from large text corpora have been shown to improve topic coherence and model stability compared to traditional single-model approaches. However, most attempts so far have concentrated on the evaluation of homogeneous ensembles, in which every base model comes from the same method. The emergence of new topic modelling systems (e.g., BERTopic, Top2Vec) offers the possibility of exploring heterogeneous ensembles, which mix these new approaches with classical ones (e.g., NMF, LDA). This project would involve the integration of multiple topic modelling techniques in one system, followed by the computation of ensemble topic models (topical alignment and/or weighted term co-association), and finally the evaluation of several metrics of interest.
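As a rough illustration of the ensemble step, the sketch below shows one simple way topical alignment and weighted term co-association could be computed once each base model has been reduced to lists of top terms. It is not the method of any of the cited papers: the function names, the Jaccard threshold of 0.3, the greedy grouping strategy, and the toy topics are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical top-term lists, one list of topics per base model.
# In the project these would come from, e.g., LDA, NMF and BERTopic.
models = [
    [["goal", "match", "team", "player"], ["vote", "party", "election", "minister"]],
    [["team", "player", "coach", "goal"], ["election", "vote", "poll", "party"]],
    [["player", "team", "league", "goal"], ["market", "stock", "price", "bank"]],
]


def jaccard(a, b):
    """Jaccard similarity between two term lists."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def align_topics(models, threshold=0.3):
    """Greedily group topics from different models whose top terms overlap.

    Each group holds at most one topic per base model. A topic joins the
    most similar eligible group if the similarity reaches the threshold,
    otherwise it starts a new group.
    """
    groups = []  # each group is a list of (model_index, topic_terms)
    for m_idx, topics in enumerate(models):
        for topic in topics:
            best, best_sim = None, threshold
            for group in groups:
                if any(i == m_idx for i, _ in group):
                    continue  # this model already contributed to the group
                sim = max(jaccard(topic, other) for _, other in group)
                if sim >= best_sim:
                    best, best_sim = group, sim
            if best is None:
                groups.append([(m_idx, topic)])
            else:
                best.append((m_idx, topic))
    return groups


def co_association(models, weights=None):
    """Weighted count of how often two terms share a topic across models."""
    if weights is None:
        weights = [1.0] * len(models)
    counts = defaultdict(float)
    for topics, weight in zip(models, weights):
        for topic in topics:
            for t1, t2 in combinations(sorted(set(topic)), 2):
                counts[(t1, t2)] += weight
    return dict(counts)


groups = align_topics(models)
pairs = co_association(models)
for group in groups:
    print(len(group), [terms for _, terms in group])
```

Topics that appear in only one base model end up in a group of size one, which gives a simple stability signal; the per-model `weights` argument is where a quality score for each base model (for example, its coherence) could be plugged in.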
Resources
Belford, M. and Greene, D., 2020. Ensemble topic modeling using weighted term co-associations. Expert Systems with Applications, 161. doi:10.1016/j.eswa.2020.113709.
Chuang, J., Roberts, M.E., Stewart, B.M., Weiss, R., Tingley, D., Grimmer, J. and Heer, J., 2015. TopicCheck: Interactive alignment for assessing topic model stability. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 175-184).
Egger, R. and Yu, J., 2022. A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify Twitter posts. Frontiers in Sociology, 7, p.886498.
Background
Url
Difficulty Level
Challenging
Ethical Approval
None
Number Of Students
2
Supervisor
Pierre Le Bras
Keywords
topic modelling, large language models, ensemble methods
Degrees
Bachelor of Science in Computer Science
Master of Science in Artificial Intelligence
Master of Science in Computing (2 Years)
Master of Science in Data Science
Bachelor of Science in Computing Science