Book Review: Graph-Based Clustering and Data Visualization Algorithms

Graph-Based Clustering and Data Visualization Algorithms: A review by  Song Chen

ISBN: 978-1-4471-5157-9 (Print) –  978-1-4471-5158-6 (Online)

The book, authored by Ágnes Vathy-Fogarassy and Janos Abonyi presents the topic of graph-based clustering and presents several algorithms. Besides introducing several related methods in representing and clustering a network, the authors also proposed a novel clustering algorithm to cluster and visualize the datasets. The first part of the book talks about vector quantization methods and different ways to represent graphs with vectors and to reduce the dimensionalities. The second part of this book introduces a few clustering algorithms including Neighborhood Based algorithm, Minimal Spanning Tree Clustering and Jarvis-Patrick Clustering algorithms. The last part of this book deals with high dimensional data and introduces a few ways to reduce the dimensionalities.

Although this is a short book, it covers several of the commonly-used algorithms and addresses two of the most important questions in this field: (1) how to represent a network, and (2) how to discover clusters. The authors explain algorithms in detail and provide pseudo code examples to assist comprehension. It is also notable that different datasets were employed to implement the algorithms and to display the result on graphs, which goes further to assist in comprehension. The authors implemented these algorithms in Matlab and make the codes and functions available on their website, for readers.

In introducing different methods of dimensionality reduction, the authors introduced a novel approach that combines the Topology Representation Network-based geodisc distance with the multidimensional scaling method.  Unlike the Isomap algorithm, which cannot model multi-class problems which will easily fall in local minima, the proposed method, Topology Representing Network Map (TRNMap), is a group of unsupervised nonlinear mapping methods shows several benefits. TRNMap is a self-organizing model with no predefined structure. This provides an expressive presentation of high-dimensional data in a low-dimensional vector space. Since it is based on graph distances, it is therefore able to handle the set of data lying on a low-dimensional manifold that is nonlinearly embedded in a higher-dimensional input space. To illustrate this algorithm, the authors implemented it in different datasets and obtained desirable results. In addition, the authors presented two methods of partitioning a graph. One is a hierarchical clustering algorithm using Minimal Spanning Tree (MST) and Gath-Geva fuzzy method. The second method is to use a neighborhood-based fuzzy similarity measure to improve on a k-nearest neighbor graph. Although there are many other clustering algorithms that can be applied to graph clustering, the authors presented detailed illustrations on the two algorithms and applied them on a few sample datasets.

Overall, this is a good book that introduces a few methods in graph-based clustering. This book doesn’t intend to cover most of the existing algorithms on this topic, however it introduces the most commonly-used algorithms and presents them in detail. The author also presented their own novel approach and compared them with similar algorithms. Finally the book includes a good set of references, so this will provide a good source for interested readers to explore further in-depth material.



About Song Chen
Song Chen

Song Chen is currently a Lead Computer Scientist with MITRE Corporation. He holds a PhD degree in Information Systems from University of Maryland Baltimore County and Masters in Business Administration and Health Administration from University of Washington in Seattle. He also received a bachelor degree in Pharmaceutical Chemistry from Peking University. He has many years of experience in using data mining and machine learning algorithms for health-care fraud detection and anomaly detection. His research mainly focuses on applying data analytics to health care datasets and assists with improving the overall quality and effectiveness. His recent research involves using matrix vectorization methods to uncover the unusual relationships within a social network.