\documentclass[twoside]{article}
\usepackage[T1]{fontenc}
\usepackage[latin9]{inputenc}
\usepackage{amssymb, amsmath}
\usepackage{mathrsfs}

\usepackage{esint}

\oddsidemargin  0in \evensidemargin 0in \topmargin -0.5in
\headheight 0.2in \headsep 0.2in
\textwidth   6.5in \textheight 9in 
\parskip 1.5ex  \parindent 0ex \footskip 40pt
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
\newcommand{\noun}[1]{\textsc{#1}}

\begin{document}

\framebox[6.4in]{
\begin{minipage}{6.4in}
  \vspace{1mm}
  \center \makebox[6.2in]{{\bf CS369M: Algorithms for Modern Massive Data Set Analysis \hfill Lecture 12 - 11/04/2009}} 
  \vspace{2mm} \\
  \center \makebox[6.2in]{{\Large Introduction to Graph Partitioning}} 
  \vspace{1mm} \\
  \center \makebox[6.2in]{{\it Lecturer: Michael Mahoney \hfill Scribes:  Noah Youngs and Weidong Shao}}
  \vspace{1mm}
\end{minipage}
} \vspace{2mm} \\
\mbox{{ \it *Unedited Notes}}

\section{Graph Partition}

The graph partitioning problem is to cut a graph into two or more ``good'' pieces. Methods fall into three families:

\begin{enumerate}
\item Spectral methods, either global (e.g., based on Cheeger's inequality) or local.
\item Flow-based methods: the min-cut/max-flow theorem, LP formulations, embeddings, and local improvement.
\item Combinations of spectral and flow-based methods.
\end{enumerate}

Note that not all graphs have good partitions.

Question: Can we certify that there are no good clusters in a graph?

``Good'' clusters have two properties:
\begin{enumerate}
\item Internally (intra-cluster): the nodes are well connected to each other.
\item Externally (inter-cluster): there are relatively few connections to the rest of the graph.
\end{enumerate}
How do we quantify this?

Extreme cases:
\begin{enumerate}
\item The graph splits into two disconnected pieces.
\item The graph splits into $S, \bar S$, where each side is a maximal complete induced subgraph.
\end{enumerate}

\section{Min cut problem}
\underbar{\noun{Define}} Given $G=(V, E)$, a cut is a partition $(S, \bar S)$ of $V$, where $S \subset V$ and $\bar S = V \setminus S$.\\
Given $s, t \in V$, an $(s, t)$ cut is a cut such that $s\in S$ and $t\in \bar S$.\\
The cut set of a cut is $\{(u, v): (u, v) \in E,\ u\in S,\ v\in \bar S \}$.

The min cut problem: find the cut whose cut set has the smallest total edge weight.

\begin{enumerate}
\item Good: there is a polynomial-time algorithm (min cut = max flow).
\item Bad: it often returns a very unbalanced cut.
\item In theory: cut algorithms are used as a subroutine in divide-and-conquer algorithms.
\item In practice: one often wants to ``interpret'' the clusters or partitions.
\end{enumerate}


\section{Max Flow Problem}



\underbar{\noun{Define}} Denote an edge $(u,v)\in E$ by $e_{uv}$.\\
Let $c: E \rightarrow \mathbb{R}^+$ be a capacity (cost) function, written $c_{uv}$ or $c_{e}$.\\
Given a source $s$ and a sink $t$, a flow is a function $f: E \rightarrow \mathbb{R}^+$ such that

\begin{enumerate}
\item $f_{uv} \le c_{uv}$ for all $(u,v) \in E$ (capacity constraints);
\item $\sum_{u:\, (u,v)\in E} f_{uv} = \sum_{w:\, (v,w)\in E} f_{vw}$ for all $v \in V \setminus \{s,t\}$ (conservation of flow).
\end{enumerate}

The value of the flow is
\[ |f| = \sum _v f_{s v} \]

The max flow problem:

\[ \max |f| \]

The capacity of an $(s, t)$ cut is $c(S, \bar S) = \sum_{(u,v)\in E,\ u \in S,\ v \in \bar S} c_{uv}$.

The min cut problem is

\[\min c(S, \bar S)\]


Note: this is a ``single-commodity flow problem'', i.e., there is only one $s$ and one $t$.

Theorem: the maximum value of an $s$-$t$ flow equals the minimum capacity of an $s$-$t$ cut.

Proof idea:

$\max \text{flow} \le \min \text{cut}$ (weak duality)
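
A short sketch of why weak duality holds, under the assumption (not stated in the notes) that no flow enters $s$. For any flow $f$ and any $(s,t)$ cut $(S, \bar S)$, summing the conservation constraints over the vertices of $S$ cancels every edge with both endpoints in $S$, which leaves
\begin{align*}
|f| &= \sum_{(u,v)\in E,\ u\in S,\ v\in \bar S} f_{uv} \;-\; \sum_{(v,u)\in E,\ u\in S,\ v\in \bar S} f_{vu} \\
    &\le \sum_{(u,v)\in E,\ u\in S,\ v\in \bar S} f_{uv}
    \;\le\; \sum_{(u,v)\in E,\ u\in S,\ v\in \bar S} c_{uv} \;=\; c(S, \bar S).
\end{align*}
Since this holds for every flow and every cut, it holds in particular for a maximum flow and a minimum cut.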

Does there exist a cut that achieves equality?\\
Yes. By the strong duality theorem, we can equivalently solve the dual of the max-flow problem, which is the min-cut problem.

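Primal (max flow):

\[ \max |f| \]
subject to
\[ 0 \le f_{uv} \le c_{uv}, \quad (u,v) \in E, \]
together with flow conservation at every $v \in V \setminus \{s,t\}$.

Dual (min cut):

\[\min \sum_{(i, j) \in E} c_{ij} d_{ij} \]
subject to
\[ d_{ij} - p_i + p_j \ge 0, \quad (i,j)\in E \]
\[ p_s=1, \quad p_t=0, \quad p_i \geq 0, \quad i \in V \]
\[ d_{ij} \geq 0, \quad (i,j) \in E \]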
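
As a sanity check on the dual (a sketch added for illustration, not part of the lecture), any $(s,t)$ cut $(S, \bar S)$ gives a feasible dual solution. Set
\[
p_i = \begin{cases} 1 & i \in S \\ 0 & i \in \bar S \end{cases}
\qquad
d_{ij} = \begin{cases} 1 & i \in S,\ j \in \bar S \\ 0 & \text{otherwise.} \end{cases}
\]
Then $p_s = 1$ and $p_t = 0$, and $d_{ij} - p_i + p_j$ equals $0$ when $i \in S, j \in \bar S$, equals $1$ when $i \in \bar S, j \in S$, and equals $0$ when both endpoints lie on the same side, so every constraint holds. The objective value is
\[ \sum_{(i,j)\in E} c_{ij} d_{ij} = \sum_{(i,j)\in E,\ i\in S,\ j \in \bar S} c_{ij}, \]
which is exactly the capacity of the cut. Hence the dual optimum is at most the minimum cut capacity.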


Can we add a ``balance'' condition?

\begin{enumerate}
\item We want a good cut value $E(S, \bar S)$.
\item We want $S$ and $\bar S$ to be balanced, i.e., of the same or approximately the same size.
\end{enumerate}

The answer is ``yes''.

Explicit balance conditions:\\
Graph bisection: min cut s.t. $|S| = |\bar S| = n/2$.\\
$\beta$-balanced cut: min cut s.t. $|S| = \beta n$, $|\bar S| = (1-\beta) n$.

Implicit balance conditions:

\begin{enumerate}
\item input balance constraints
\item expansion: $\frac{E(S, \bar S)}{|S|/n}$ (define this as $h(S)$)
\item sparsity: $\frac{E(S, \bar S)}{|S| |\bar S|}$ (define this as $\mathrm{sp}(S)$)
\item conductance: $\frac{E(S, \bar S)}{\mathrm{Vol}(S)/n}$, with $\mathrm{Vol}(S) = \sum_{i \in S} \deg(v_i)$
\item normalized cut: $\frac{E(S, \bar S)}{\mathrm{Vol}(S)\, \mathrm{Vol}(\bar S)}$ (conductance and normalized cut are used in machine learning)
\item quotient cut: $\frac{E(S, \bar S)}{\min(\mathrm{Vol}(S), \mathrm{Vol}(\bar S))}$
\end{enumerate}
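
To make these measures concrete, here is a small worked example. The graph is chosen purely for illustration and is not from the lecture. Take two triangles joined by a single edge, so $n = 6$, and let $S$ be the vertex set of one triangle. Then $E(S, \bar S) = 1$, $|S| = |\bar S| = 3$, and, taking the volume of a set to be the sum of the degrees of its vertices, $\mathrm{Vol}(S) = \mathrm{Vol}(\bar S) = 2 + 2 + 3 = 7$. With the definitions above:
\[
\text{expansion} = \frac{1}{3/6} = 2, \quad
\text{sparsity} = \frac{1}{3 \cdot 3} = \frac{1}{9}, \quad
\text{conductance} = \frac{1}{7/6} = \frac{6}{7}, \quad
\text{normalized cut} = \frac{1}{7 \cdot 7} = \frac{1}{49}, \quad
\text{quotient cut} = \frac{1}{7}.
\]
For comparison, cutting off a single degree-2 vertex gives $E(S, \bar S) = 2$, $|S| = 1$, $\mathrm{Vol}(S) = 2$, hence sparsity $2/5$ and quotient cut $2/2 = 1$. Both measures prefer the balanced cut between the triangles, even though no balance constraint was imposed explicitly.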
Expansion and sparsity are the ``same'' in the following sense:

\[\min_S h(S) \approx \min_S \mathrm{sp}(S)\]
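
One way to see this, sketched under the assumption that $|S| \le n/2$ (otherwise swap the roles of $S$ and $\bar S$): in that case $n/2 \le |\bar S| \le n$, so
\[
\frac{1}{n} \cdot \frac{E(S, \bar S)}{|S|}
\;\le\; \frac{E(S, \bar S)}{|S|\,|\bar S|}
\;\le\; \frac{2}{n} \cdot \frac{E(S, \bar S)}{|S|}.
\]
Up to a normalization that depends only on $n$, the sparsity of a cut and the ratio $E(S, \bar S)/|S|$ therefore differ by a factor of at most 2, and a set that approximately minimizes one approximately minimizes the other.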

Quotient cuts yield a tight bound in Cheeger's inequality.\\
In practice: bias towards high-degree nodes.


 


Note: quotient cuts achieve balance implicitly; there are no explicit constraints on inter- or intra-cluster connectivity.

$\mathbb{Z}^2$, random geometric graphs, or nice planar graphs yield good quotient cuts.

More generally, the resulting cuts can be
\begin{itemize}
\item very unbalanced, or
\item made up of disconnected clusters.
\end{itemize}

Example: an extremely sparse random graph, i.e., the $G(n, p)$ model with $p \sim \log n/n$. For $p \ge \log n^2 / n$ the graph is an expander.




\section{Graph Partition Algorithms}

\subsection{Local Improvement}

\begin{itemize}
\item Developed in the 70's.
\item Often a greedy improvement.
\item Local minima are a big problem.
\item The usual methods (e.g., simulated annealing) improve solutions by constant factors, which can make a big difference in practice.
\end{itemize}

Kernighan-Lin algorithm: fundamental work, no longer used due to its $\Theta (n^2)$ running time.

Fiduccia-Mattheyses algorithm: linear time, still commonly used.

METIS algorithm of Karypis and Kumar: works very well in practice, especially on low-dimensional graphs.
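
As an illustration of what a local improvement step looks like, here is a sketch of the swap gain used by Kernighan-Lin. The notation $\mathrm{ext}$, $\mathrm{int}$, $D$, and $g$ is introduced here and does not appear in the lecture. For a bisection $(S, \bar S)$ with edge weights $c_{uv}$ and a vertex $a \in S$, let
\[
\mathrm{ext}(a) = \sum_{v \in \bar S} c_{av}, \qquad
\mathrm{int}(a) = \sum_{v \in S} c_{av}, \qquad
D_a = \mathrm{ext}(a) - \mathrm{int}(a),
\]
and define $D_b$ symmetrically for $b \in \bar S$. Swapping $a$ and $b$ reduces the cut weight by
\[ g(a,b) = D_a + D_b - 2 c_{ab}. \]
For example, with unit weights, suppose $a$ has 3 neighbors in $\bar S$ and 1 in $S$, $b$ has 2 neighbors in $S$ and 1 in $\bar S$, and $a, b$ are not adjacent. Then $D_a = 2$, $D_b = 1$, and $g(a,b) = 3$: before the swap 5 cut edges touch $a$ or $b$, and afterwards only 2 do. A greedy pass repeatedly picks the pair with the largest gain, which is also why such methods can get stuck in local minima.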

 





\subsection{Spectral methods}

Developed in the 70's and 80's.\\
Worst-case quality guarantee (Cheeger's inequality).\\
At root, this is a relaxation-and-rounding method related to the QIP formulation:
\[ \min_{x\in \{-1, 1\}^n,\ x \perp \mathbf{1}} \frac{x^T L x }{x^T x } \]
The guarantee is quadratic in the worst case.
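
To connect the quadratic form to cuts, here is a short worked identity. It assumes that $L = D - A$ is the combinatorial Laplacian of an unweighted graph, which the notes do not state explicitly. For $x \in \{-1, 1\}^n$ with $x_i = 1$ if $i \in S$ and $x_i = -1$ if $i \in \bar S$,
\[
x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2 = 4\, E(S, \bar S), \qquad x^T x = n,
\]
because each edge inside $S$ or inside $\bar S$ contributes $0$ and each cut edge contributes $(\pm 2)^2 = 4$. Optimizing the quotient over $\pm 1$ vectors is thus a cut problem. Relaxing $x$ to arbitrary real vectors orthogonal to the all-ones vector turns it into an eigenvalue problem whose optimum is the second smallest eigenvalue of $L$.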

\begin{itemize}

\item Hyperplane rounding:
\begin{itemize}
\item compute an eigenvector;
\item cut according to some rule;
\item post-process with local improvements.
\end{itemize}

\end{itemize}
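
One common instance of the ``cut according to some rule'' step is the sweep cut, sketched here as an illustration rather than as the specific rule used in the lecture. Sort the vertices so that the eigenvector entries satisfy $x_{v_1} \le x_{v_2} \le \dots \le x_{v_n}$, let $S_k = \{v_1, \dots, v_k\}$, and output
\[
\arg\min_{1 \le k < n} \ \frac{E(S_k, \bar S_k)}{\min(\mathrm{Vol}(S_k), \mathrm{Vol}(\bar S_k))}.
\]
In its standard form for the normalized Laplacian, with $\lambda_2$ its second smallest eigenvalue and $\phi(G)$ the minimum of the quotient above over all $S$, Cheeger's inequality reads
\[ \frac{\lambda_2}{2} \;\le\; \phi(G) \;\le\; \sqrt{2 \lambda_2}. \]
The square root on the right-hand side is the source of the quadratic worst-case gap between the spectral relaxation and the best cut.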
\subsection{Flow-based methods}

Developed in the 90's.\\
Consider all pairs: a multi-commodity flow problem.\\
We want to route the commodities such that the constraints are satisfied without bottlenecks.

Idea: bottlenecks in the flow computation correspond to good cuts.

The $k$-commodity problem does not satisfy strong duality, but it does satisfy an approximate min-cut/max-flow theorem, with value gap $O(\log n)$.
\begin{itemize}
\item Relax the flow problem to an LP.
\item Embed the solution in $\ell_1$.
\item Round the solution to $\{0,1\}$; $\Theta(\log n)$ worst case.
\end{itemize}
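
A sketch of the LP relaxation for the uniform all-pairs case follows. It is written in a standard form that is an assumption here and is not taken from the lecture. The variables $d_{ij} = d_{ji}$ are distances between pairs of vertices:
\[
\min \sum_{(i,j) \in E} c_{ij}\, d_{ij}
\quad \text{s.t.} \quad
\sum_{i < j} d_{ij} = 1, \qquad
d_{ij} \le d_{ik} + d_{kj} \ \ \forall i,j,k, \qquad
d_{ij} \ge 0.
\]
Any cut $(S, \bar S)$ gives a feasible solution: set $d_{ij} = \frac{1}{|S|\,|\bar S|}$ if $i$ and $j$ lie on opposite sides and $d_{ij} = 0$ otherwise. This is a scaled cut metric, so the triangle inequalities hold, and exactly $|S|\,|\bar S|$ pairs are separated, so the distances sum to 1. Its objective value is
\[ \frac{1}{|S|\,|\bar S|} \sum_{(i,j)\in E,\ i \in S,\ j \in \bar S} c_{ij}, \]
which for unit capacities is the sparsity $\frac{E(S, \bar S)}{|S|\,|\bar S|}$. The LP optimum is therefore a lower bound on the sparsest cut, and the embedding and rounding steps recover a cut within the logarithmic factor quoted above.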


\subsection{Additional Graph Partitioning Notes}

These methods ``fail'', i.e., achieve their worst case, on the following graphs:
\begin{itemize}
\item Spectral methods fail on graphs with long stringy pieces.
\item Flow-based methods fail on expander graphs: there are $\binom{n}{2}$ pairs, but most pairs are far apart, at distance about $\log n$.
\end{itemize}

Improvements/extensions for large data:
\begin{itemize}
\item There exist hybrid flow-based and local methods (cut around the cut).
\item Local spectral methods:
  \begin{itemize}
  \item find a good cut of a given size around a start node;
  \item the running time depends on the size of the output.
  \end{itemize}
\end{itemize}


 \subsection{Methods that combine spectral and flow}

\begin{itemize}
  \item ARV algorithm (developed a few years ago by Arora, Rao, and Vazirani).
  \item Most hybrid algorithms are theoretical, but some implementations embed in SDP.
  \item Approximate solution (two-player game).
  \item Boosting \& ensemble methods.
\end{itemize}


\section{References}
\begin{enumerate}
\item Schaeffer. ``Graph Clustering''. Computer Science Review 1(1): 27--64, 2007.
\item B. W. Kernighan, S. Lin. ``An Efficient Heuristic Procedure for Partitioning Graphs''. Bell System Technical Journal 49: 291--307, 1970.
\item C. M. Fiduccia, R. M. Mattheyses. ``A Linear-Time Heuristic for Improving Network Partitions''. Design Automation Conference.
\item G. Karypis, V. Kumar. ``A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs''. SIAM Journal on Scientific Computing, 1999.
\end{enumerate}
\end{document}
