ICA 2007

7th International Conference on
Independent Component Analysis
and Signal Separation

London, UK        9 - 12 September 2007


Paper No: 120

Text Clustering on Latent Thematic Spaces: Variants, Strengths and Weaknesses

Author(s): Xavier Sevillano, Germán Cobo, Francesc Alías, Joan Claudi Socoró

Abstract

Deriving a thematically meaningful partition of an unlabeled document corpus is a challenging task for several reasons, among them the difficulty of determining a priori which document indexing technique is optimal. This work presents an empirical comparison of several latent thematic generative models applied to the text clustering problem. As the results demonstrate, document representations on latent thematic spaces can lead to improved clustering, but none of these models can be guaranteed a priori to be superior. To overcome this limitation, we propose building consensus clusterings from several document representations. Experiments conducted on subsets of two standard text corpora evaluate several clustering strategies applied to latent thematic spaces and highlight the appropriateness of our proposal.
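
The abstract does not say how the consensus clusterings are built, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes that each document representation has already produced one flat partition of the same documents. Documents are merged whenever a majority of those partitions place them in the same cluster (a co-association vote followed by connected components). The function name `consensus`, the `threshold` parameter, and the toy label vectors are all hypothetical.

```python
from itertools import combinations


def consensus(partitions, threshold=0.5):
    """Merge documents that share a cluster in more than `threshold`
    of the given partitions (illustrative co-association voting)."""
    n = len(partitions[0])   # number of documents
    m = len(partitions)      # number of partitions (one per representation)
    parent = list(range(n))  # union-find forest over documents

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        # Count partitions that put documents i and j in the same cluster.
        # Only label equality is used, so label permutations do not matter.
        votes = sum(1 for labels in partitions if labels[i] == labels[j])
        if votes > threshold * m:
            parent[find(i)] = find(j)

    # Relabel the resulting groups 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy input (assumed): six documents clustered on three hypothetical
# representations. The second partition uses permuted labels and
# misplaces document 2; the third splits off document 5.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
labels = consensus(partitions)
print(labels)
```

In this toy case the vote recovers two groups even though no two input partitions agree exactly, which is the intuition behind combining clusterings when no single representation can be trusted in advance.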

Last Updated: 14-Aug-2007