Reading Cluto Files in Matlab (Part 1)
Update: I never completed this series, but I do have a CLUTO sparse matrix reader for Matlab. Contact me for information, or look for readCluto on MATLAB Central.
Today, let’s see a Matlab solution to reading and writing CLUTO data files. CLUTO is a clustering toolkit by George Karypis at the University of Minnesota. CLUTO accepts four input file formats:
- Dense Graph
- Sparse Graph
- Dense Matrix
- Sparse Matrix
In each pair (graph or matrix), the dense and sparse variants differ in their header line and in whether the body lists every entry or only the nonzeros. Let’s describe a few CLUTO files.
- Suppose we have a 5-node line graph
v1 <-> v2 <-> v3 <-> v4 <-> v5
In dense graph format, this graph is
5
0 1 0 0 0
1 0 1 0 0
0 1 0 1 0
0 0 1 0 1
0 0 0 1 0
In sparse graph format, this graph is
5 8
2 1
1 1 3 1
2 1 4 1
3 1 5 1
4 1
To wit, the dense graph format is an explicit, row-by-row listing of the graph’s adjacency matrix, preceded by a single header line giving the number of vertices. The sparse graph format is an adjacency-list representation. It is somewhat unusual in that it uses 1-based column (vertex) indices and implicit rows: the vertex a line describes is determined by the line’s position in the file, not by a field on the line.
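To make the relationship between the two graph formats concrete, here is a minimal sketch in Python (not the Matlab code this series is building toward) that converts a dense graph file into sparse graph format. The function name `dense_graph_to_sparse` is hypothetical; the sketch assumes the whole file fits in memory as a string and treats every nonzero entry as an edge.

```python
def dense_graph_to_sparse(text: str) -> str:
    """Convert a CLUTO dense graph file (given as a string) to sparse graph format."""
    tokens = text.split()                 # line layout does not matter for dense input
    n = int(tokens[0])                    # header: number of vertices
    values = [float(t) for t in tokens[1:]]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} matrix entries, found {len(values)}")

    lines = []
    nnz = 0                               # counts each undirected edge twice
    for i in range(n):
        row = values[i * n:(i + 1) * n]
        # Keep only nonzero entries; sparse files number vertices from 1.
        # '.15g' prints 1.0 as "1" and keeps up to 15 significant digits.
        pairs = [f"{j + 1} {w:.15g}" for j, w in enumerate(row) if w != 0]
        nnz += len(pairs)
        lines.append(" ".join(pairs))     # an isolated vertex becomes an empty line
    return "\n".join([f"{n} {nnz}"] + lines) + "\n"
```

Fed the dense 5-node line graph, it reproduces the sparse example line for line, including the `5 8` header.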
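More formally, a sparse graph file has one header line:
<number of vertices> <number of edges*2>
and line i+1 of the file describes vertex i:
adj_1 weight_1 adj_2 weight_2 … adj_d weight_d
where adj_j is the jth vertex adjacent to vertex i, weight_j is the weight of that edge, and d is the degree of vertex i. The adjacency must be symmetric: if vertex j appears on vertex i’s line, vertex i must appear on vertex j’s line with the same weight. Each undirected edge is therefore listed twice, which is why the header counts edges twice (the line graph above has 4 edges, so its header is 5 8).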
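A reader for this format has to honor the implicit rows (so blank lines for isolated vertices matter) and can check both the doubled edge count and the symmetry requirement. The following is a minimal Python sketch under those assumptions; `read_sparse_graph` is a hypothetical name, and it converts CLUTO’s 1-based vertex numbers to 0-based indices.

```python
def read_sparse_graph(text: str) -> list[dict[int, float]]:
    """Parse a CLUTO sparse graph file into 0-based adjacency dicts.

    adj[i][j] is the weight of edge (i, j); both indices are shifted
    from CLUTO's 1-based numbering to 0-based.
    """
    lines = text.splitlines()             # keep blank lines: rows are implicit
    nvtxs, nedges = (int(t) for t in lines[0].split())
    body = lines[1:1 + nvtxs]
    body += [""] * (nvtxs - len(body))    # trailing isolated vertices may be omitted

    adj: list[dict[int, float]] = []
    count = 0
    for i, line in enumerate(body):
        tokens = line.split()
        if len(tokens) % 2:
            raise ValueError(f"vertex {i + 1}: odd number of fields")
        nbrs = {}
        for k in range(0, len(tokens), 2):
            j = int(tokens[k]) - 1        # file uses 1-based vertex numbers
            if not 0 <= j < nvtxs:
                raise ValueError(f"vertex {i + 1}: neighbor {j + 1} out of range")
            nbrs[j] = float(tokens[k + 1])
        count += len(tokens) // 2
        adj.append(nbrs)

    # The header's edge count includes both directions of every edge.
    if count != nedges:
        raise ValueError(f"header says {nedges} entries, found {count}")
    # Symmetry: w(i, j) must equal w(j, i).
    for i, nbrs in enumerate(adj):
        for j, w in nbrs.items():
            if adj[j].get(i) != w:
                raise ValueError(f"asymmetric edge ({i + 1}, {j + 1})")
    return adj
```

A file that lists an edge on only one endpoint’s line, such as `2 1` / `2 1` / (blank), passes the count check but is rejected by the symmetry check.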
The dense and sparse matrix formats follow the same pattern, but because a matrix need not be square, the header gives both dimensions.
- dense matrix header:
<number of rows> <number of columns>
- sparse matrix header:
<number of rows> <number of columns> <number of nonzeros>
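Counting the fields on the first line is enough to pick a parser, with one caveat: a sparse graph header and a dense matrix header both have two fields, so the reader must already know whether it expects a graph or a matrix. A minimal Python sketch of that dispatch; `cluto_format` and `_HEADER_FIELDS` are hypothetical names.

```python
# Header field counts for each (file kind, layout) pair.
_HEADER_FIELDS = {
    ("graph", 1): "dense graph",      # <vertices>
    ("graph", 2): "sparse graph",     # <vertices> <edges*2>
    ("matrix", 2): "dense matrix",    # <rows> <columns>
    ("matrix", 3): "sparse matrix",   # <rows> <columns> <nonzeros>
}


def cluto_format(header_line: str, kind: str) -> str:
    """Classify a CLUTO file from its first line.

    kind must be "graph" or "matrix": the header alone is ambiguous
    because a sparse graph and a dense matrix both use two fields.
    """
    nfields = len(header_line.split())
    try:
        return _HEADER_FIELDS[(kind, nfields)]
    except KeyError:
        raise ValueError(f"unexpected {nfields}-field header for kind {kind!r}") from None
```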
The dense matrix format is a row-by-row listing of every element of the matrix. The sparse matrix format lists only the nonzeros, row by row: line i+1 of the file describes row i as
column_1 value_1 column_2 value_2 … column_d value_d
where column_j is the 1-based column index of the jth nonzero in row i, value_j is its value, and d is the number of nonzeros in that row. A row with no nonzeros is an empty line.
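For the sparse matrix format, a writer and a reader make a natural round-trip pair. This is a minimal Python sketch assuming 1-based columns and one line per row, with blank lines for empty rows; `write_sparse_matrix` and `read_sparse_matrix` are hypothetical names. For brevity the reader returns a dense list of rows; a reader meant for large files would instead collect (row, column, value) triplets and build a sparse structure, which in Matlab would be `sparse(i, j, v, m, n)`.

```python
def write_sparse_matrix(rows: list[list[float]]) -> str:
    """Write a dense list of rows in CLUTO sparse matrix format."""
    nrows = len(rows)
    ncols = len(rows[0]) if rows else 0
    lines, nnz = [], 0
    for row in rows:
        # Only nonzeros are written; column indices are 1-based.
        pairs = [f"{j + 1} {v:.15g}" for j, v in enumerate(row) if v != 0]
        nnz += len(pairs)
        lines.append(" ".join(pairs))     # an all-zero row becomes an empty line
    return "\n".join([f"{nrows} {ncols} {nnz}"] + lines) + "\n"


def read_sparse_matrix(text: str) -> list[list[float]]:
    """Parse a CLUTO sparse matrix file into a dense list of rows (0-based)."""
    lines = text.splitlines()             # keep blank lines: rows are implicit
    nrows, ncols, nnz = (int(t) for t in lines[0].split())
    body = lines[1:1 + nrows]
    body += [""] * (nrows - len(body))    # trailing all-zero rows may be omitted
    dense = [[0.0] * ncols for _ in range(nrows)]
    count = 0
    for i, line in enumerate(body):
        tokens = line.split()
        if len(tokens) % 2:
            raise ValueError(f"row {i + 1}: odd number of fields")
        for k in range(0, len(tokens), 2):
            j = int(tokens[k]) - 1        # convert 1-based column to 0-based
            if not 0 <= j < ncols:
                raise ValueError(f"row {i + 1}: column {j + 1} out of range")
            dense[i][j] = float(tokens[k + 1])
            count += 1
    if count != nnz:
        raise ValueError(f"header says {nnz} nonzeros, found {count}")
    return dense
```

Writing the 4-by-3 matrix `[[0, 2.5, 0], [1, 0, 0], [0, 0, 0], [0, 0, 4]]` produces the header `4 3 3`, with an empty third line for the all-zero row, and reading that text back recovers the same matrix.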
Coming next, we’ll see how to read these files in Matlab using a combination of mex files and scripts.