View Proposal
-
Proposer
-
Pierre Le Bras
-
Title
-
Heterogeneous Ensemble Topic Models
-
Goal
-
To evaluate the quality of ensemble topic models generated from heterogeneous methods
-
Description
-
Ensemble approaches to modelling topics from large text corpora have been shown to improve topic coherence and model stability compared to traditional approaches.
However, most attempts so far have concentrated on evaluating homogeneous ensembles. The emergence of new topic modelling systems (e.g., BERTopic, Top2Vec) offers the possibility of exploring heterogeneous ensembles that mix these new approaches with classical ones (e.g., NMF, LDA).
This project would involve integrating multiple topic modelling techniques into one system, then computing ensemble topic models (via topical alignment and/or weighted term co-association), and finally evaluating several metrics of interest.
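As a rough illustration of the two ensemble steps named above, the following Python sketch aligns topics from two base models by Jaccard overlap of their top terms and builds a simple term co-association table. Everything in it is an assumption made for illustration: the function names (`align_topics`, `co_association`) are hypothetical, the topic lists are invented rather than real model output, the alignment is a greedy one-to-one matching, and the co-association is an unweighted count rather than the weighted variant described in the literature.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_topics(model_a, model_b):
    """Greedy one-to-one alignment: pair the most similar unmatched topics first."""
    candidates = sorted(
        ((jaccard(ta, tb), i, j)
         for i, ta in enumerate(model_a)
         for j, tb in enumerate(model_b)),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    used_a, used_b, matches = set(), set(), []
    for sim, i, j in candidates:
        if i not in used_a and j not in used_b:
            matches.append((i, j, sim))
            used_a.add(i)
            used_b.add(j)
    return matches


def co_association(models):
    """Fraction of all base topics in which each pair of terms appears together."""
    counts = defaultdict(int)
    n_topics = 0
    for topics in models:
        for topic in topics:
            n_topics += 1
            for u, v in combinations(sorted(set(topic)), 2):
                counts[(u, v)] += 1
    return {pair: count / n_topics for pair, count in counts.items()}


# Invented top-term lists standing in for the output of two different base models
# (e.g., one classical and one embedding-based); these are not real results.
nmf_topics = [
    ["goal", "match", "team", "league"],
    ["election", "vote", "party", "minister"],
]
bertopic_topics = [
    ["vote", "election", "poll", "party"],
    ["team", "match", "coach", "goal"],
]

matches = align_topics(nmf_topics, bertopic_topics)
coassoc = co_association([nmf_topics, bertopic_topics])

for i, j, sim in matches:
    print(f"topic {i} (model A) <-> topic {j} (model B): Jaccard = {sim:.2f}")
print("co-association of ('goal', 'match'):", coassoc[("goal", "match")])
```

In a full system, the top-term lists would come from the integrated base models, and the co-association values would feed a final clustering or factorisation step before coherence and stability metrics are computed.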
-
Resources
-
Belford, M. and Greene, D., 2020. Ensemble topic modeling using weighted term co-associations. Expert Systems with Applications, 161. doi:10.1016/j.eswa.2020.113709.
Chuang, J., Roberts, M.E., Stewart, B.M., Weiss, R., Tingley, D., Grimmer, J. and Heer, J., 2015. TopicCheck: Interactive alignment for assessing topic model stability. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 175-184).
Egger, R. and Yu, J., 2022. A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify Twitter posts. Frontiers in Sociology, 7, p.886498.
-
Background
-
-
Url
-
-
Difficulty Level
-
Challenging
-
Ethical Approval
-
None
-
Number Of Students
-
2
-
Supervisor
-
Pierre Le Bras
-
Keywords
-
topic modelling, large language models, ensemble methods
-
Degrees
-
Bachelor of Science in Computer Science
Master of Science in Artificial Intelligence
Master of Science in Computing (2 Years)
Master of Science in Data Science
Bachelor of Science in Computing Science