
Publications of Jérôme Darmont

Reference (inproceedings)

X. Wang, J. Ah-Pine, J. Darmont, "SHCoClust, a Scalable Similarity-based Hierarchical Co-clustering Method and its Application to Textual Collections", IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 17), Naples, Italy, July 2017 (ID F-0529).

BibTeX entry

@INPROCEEDINGS{fuzzieee17,
     Author = {Xinyu Wang and Julien Ah-Pine and Jérôme Darmont},
     Title = {SHCoClust, a Scalable Similarity-based Hierarchical Co-clustering Method and its Application to Textual Collections},
     Booktitle = {IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 17), Naples, Italy},
     Month = {July},
     Year = {2017},
     Note = {ID F-0529},
     Abstract = {In comparison with flat clustering methods such as K-means, hierarchical clustering and co-clustering methods are more advantageous: hierarchical clustering can reveal the internal connections of clusters, and co-clustering can yield clusters of both data instances and features. Interested in organizing co-clusters into hierarchies and in discovering cluster hierarchies inside co-clusters, we propose in this paper SHCoClust, a scalable similarity-based hierarchical co-clustering method. Besides combining the above-mentioned advantages, SHCoClust can employ kernel functions, thanks to its use of inner products. Furthermore, since all similarities lie between 0 and 1, the input of SHCoClust can be sparsified by threshold values, so that less memory and less time are required for storage and computation. This grants SHCoClust scalability, i.e., the ability to process relatively large datasets with reduced and limited computing resources. Our experiments demonstrate that SHCoClust significantly outperforms the hierarchical clustering and co-clustering methods on which it is built. In addition, by thresholding the input similarity matrices obtained with linear and Gaussian kernels, SHCoClust maintains its clustering quality even when its input is largely sparsified, with up to 90% memory and time gains.}
}
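
The sparsification step described in the abstract — a kernel similarity matrix with values between 0 and 1, thinned by a threshold to save memory and computation — can be sketched as follows. This is an illustrative sketch only, not the authors' implementation; the function names, the Gaussian kernel parameter, and the threshold value are assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

def gaussian_similarity(X, gamma=0.5):
    """Pairwise Gaussian (RBF) kernel; all values lie in (0, 1]."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def sparsify(S, threshold):
    """Zero out similarities below the threshold and store the result
    sparsely, trading a little information for memory and time gains."""
    S = np.where(S >= threshold, S, 0.0)
    return csr_matrix(S)

# Toy data standing in for a document-term matrix (hypothetical sizes).
X = np.random.RandomState(0).rand(100, 20)
S = gaussian_similarity(X)
S_sparse = sparsify(S, threshold=0.6)
```

With a high enough threshold, most off-diagonal entries vanish, so the sparse matrix is far cheaper to store and to multiply than the dense similarity matrix, which is the source of the scalability claimed in the paper.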
