This is a short tutorial on how to use Mapper. Before we begin: I would really appreciate your feedback on the software as well as on this tutorial. If you want new features or more tutorials, or if you find bugs, please contact me. You can find a copy of the paper and some presentation material here.
You will need the following:
At this point, the code requires the user to specify two data inputs and two parameters. The data required are:
- The inter-point distance matrix. This is an n by n matrix of distances from all points to all points.
- A real-valued function defined for each point. This is a real valued n by 1 array.
The parameters required are:
- The length of the intervals in the range of the filter. This is currently specified through the number of intervals in the range of the filter, i.e. filter samples = 5 sets the length of each interval to (max(f) - min(f))/5. The call in this tutorial passes this value as the fraction 1/filterSamples.
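- The percentage overlap between successive intervals. A value of 0.5 will result in successive intervals which have a 50% overlap (the call in this tutorial passes the equivalent percentage, overlapPct = 50). Note that the percentage overlap does not depend on the length of the intervals. If filter samples is set to m and the overlap, written as a fraction, is p, the total number of intervals will be approximately m/(1 - p) + 1, counting the last interval, which starts at max(f). For m = 5 and p = 0.5 this gives the 11 intervals listed in the Mapper output in this tutorial.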
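The relationship between filter samples, overlap, and the resulting intervals can be checked with a short sketch. This is not the Mapper source; it assumes the intervals start at min(f) and advance by a fixed step of (interval length) x (1 - overlap) until the start passes max(f), which is what the printed ranges in this tutorial suggest. All variable names are illustrative.

```matlab
% Sketch only: enumerate overlapping filter intervals (assumed scheme,
% not taken from the Mapper source code).
fMin = 0; fMax = 2;      % filter range, as in the circle example
m = 5;                   % filter samples
p = 0.5;                 % overlap written as a fraction (50%)

intervalLength = (fMax - fMin) / m;       % length of each interval
stepSize = intervalLength * (1 - p);      % distance between interval starts

% Count the starts by integer index to avoid floating-point drift;
% the small tolerance guards against (fMax - fMin)/stepSize rounding down.
numIntervals = floor((fMax - fMin) / stepSize + 1e-9) + 1;
starts = fMin + (0:numIntervals-1) * stepSize;
intervals = [starts(:), starts(:) + intervalLength];

fprintf('%d intervals\n', numIntervals);
fprintf('[%.2f-%.2f]\n', intervals');     % one [start-end] pair per line
```

With these values the sketch lists 11 intervals, from [0.00-0.40] to [2.00-2.40].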
In this tutorial, we will generate a unit circle in the 2D Euclidean plane. We will use the distance from an arbitrary point on the circle as the filter. After running Mapper, we use Graphviz to visualize the output graph. The function writeDotFile writes the Graphviz input file from the output of Mapper. Finally, the tutorial:
Generate a circle in 2 dimensions (300 points)
X = randn(300, 2);                        % 300 Gaussian samples in 2D
X = X./(sqrt(sum(X.*X,2))*ones(1, 2));    % scale each row to unit length
Find the Inter-point Distance Matrix
d = L2_distance(X',X',1);
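L2_distance is an external helper that is not listed in this tutorial. If it is not available, an n by n Euclidean distance matrix can be built directly. The following is a minimal sketch on three sample points, not the helper's implementation; the sample data are made up for illustration.

```matlab
% Sketch only: pairwise Euclidean distances for an n-by-k point matrix X.
X = [0 0; 3 4; 6 8];                          % three sample points in 2D
sq = sum(X.^2, 2);                            % squared norm of each point (n-by-1)
D2 = bsxfun(@plus, sq, sq') - 2 * (X * X');   % squared distances via |a|^2 + |b|^2 - 2ab
D2 = max(D2, 0);                              % clip tiny negatives caused by rounding
d = sqrt(D2);
d(1:size(d,1)+1:end) = 0;                     % force an exactly zero diagonal
disp(d);
```

The result is symmetric with a zero diagonal, which is the shape of input described above for the inter-point distance matrix.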
Find the filter, here we use the distance from the first point
eccFilter = d(1, :);
scatter(X(:,1), X(:,2), 1000, eccFilter, '.'); axis equal;
Parameters for Mapper
filterSamples = 5;
overlapPct = 50;
Run Mapper
[adja, nodeInfo, levelIdx] = mapper(d, eccFilter, 1/filterSamples,...
overlapPct);
Mapper : Filter Range [0.00-2.00]
Mapper : Interval Length : 0.40
Mapper : Overlap : 50.00
Mapper : magicFudge : 10.00
Mapper : Filter Indices from range [0.00-0.40]
Mapper : Filter Indices from range [0.20-0.60]
Mapper : Filter Indices from range [0.40-0.80]
Mapper : Filter Indices from range [0.60-1.00]
Mapper : Filter Indices from range [0.80-1.20]
Mapper : Filter Indices from range [1.00-1.40]
Mapper : Filter Indices from range [1.20-1.60]
Mapper : Filter Indices from range [1.40-1.80]
Mapper : Filter Indices from range [1.60-2.00]
Mapper : Filter Indices from range [1.80-2.20]
Mapper : Filter Indices from range [2.00-2.40]
Mapper : Finished
Prepare inputs for GraphViz
For each node of the output graph, find the size (~ cardinality of the cluster) and the average function value of points in the cluster.
label{1} = sprintf('Dataset Name : test');
label{2} = sprintf('Filter Samples : %d', filterSamples);
label{3} = sprintf('Overlap Pct : %0.2f', overlapPct);
for i = 1:length(nodeInfo)
    ecc(i) = nodeInfo{i}.filter;             % average filter value of the cluster
    setSize(i) = length(nodeInfo{i}.set);    % number of points in the cluster
end
Generate the input to Graphviz
writeDotFile(sprintf('/tmp/t1.dot'), adja, ecc, setSize, label);
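The source of writeDotFile is not shown here. To give an idea of what a Graphviz input file for an undirected graph looks like, the sketch below writes a small one by hand. It is not the tutorial's writeDotFile: the graph, the node sizes, the file name, and the choice of attributes are all hypothetical.

```matlab
% Sketch only: write a small undirected graph to a Graphviz DOT file.
% This is NOT writeDotFile; the file layout chosen here is an assumption.
adjaDemo = [0 1 0; 1 0 1; 0 1 0];    % hypothetical 3-node path graph
sizeDemo = [10 25 12];               % hypothetical cluster sizes, used as labels
dotPath = fullfile(tempdir, 'demo_graph.dot');

fid = fopen(dotPath, 'w');
fprintf(fid, 'graph demo {\n');
for i = 1:size(adjaDemo, 1)
    % One node statement per cluster, labelled with its size.
    fprintf(fid, '  n%d [label="%d"];\n', i, sizeDemo(i));
end
[r, c] = find(triu(adjaDemo, 1));    % upper triangle: each undirected edge once
for k = 1:numel(r)
    fprintf(fid, '  n%d -- n%d;\n', r(k), c(k));
end
fprintf(fid, '}\n');
fclose(fid);
```

A file written this way can be rendered with the same neato command used in this tutorial, pointed at the path stored in dotPath.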
Execute Graphviz
system(sprintf('neato -Tpng /tmp/t1.dot -o /tmp/t1.png'));
imshow('/tmp/t1.png')