[NOTES] Semi-Supervised Classification with Graph Convolutional Networks

What?

Classifying nodes in a graph where labels are only available for a small subset of nodes.

Why?

Usually smoothing label information over graphs is done via graph-based regularization (see ¹ ² ³). This can be an issue as it assumes that connected nodes are likely to share the same label, which is not always the case. This methods encodes actual node similarity w.r.t to the vertex features.

How?

Proposes learning a feature map $f(X,A)$ , as a function of both the labels and the adjacency, so as to encode the node similarity and graph structure at the same time. These features are then optimised as loss terms, so the whole process is learned end-to-end.

This feature map is learned using Graph Convolutional Networks. The authors provide a derivation of the propagation model through first-order approximation of localized spatial filters ⁴ ⁵. The main take away is that $k$ applications of a linear filter result in convolving the $k^{th}$ -order neighbourhood, as if is was a filter of degree $k$ . This is what makes the method tractable.

The authors also provide the propagation model as a derivation of the Weisfeiler-Lehman algorithm ⁶ in Appendix A.

Evaluation

Document classification in citation networks with low label rates: Cora, PubMed, Citeseer
Semi-supervised entity classification in bipartite graphs extracted from knowledge graphs: NELL
Random graphs

The authors achieve high classification accuracies for all the proposed tasks, and compare different propagation models.

Comments

Graph-based methods are a very exciting domain, with applications in domains like network analysis, power-grid balancing, computer graphics, relational networks (social, citation, political, …), video tracking, … Seeing that the semi-supervised setting is natural is very encouraging for applications in these domains (note that some limitations were presented in F.Huszar’s blog post, linked below)
It is a powerful formulation for graphs where either the structure of the graph or the nodes have partial or missing information: inference can still be run.
Can be extended to do graph-level (or subgraph) classification with a pooling layer (such as ⁷ ⁸)
Seems like there were quite a few memory issues in implementation

Questions

How to handle mini-batches? Since the Adjacency matrix is needed, information about k-neighbours is needed for each iteration.
- [EDIT] The authors have pointed out that this issue has been addressed in frameworks like GraphSAGE ⁹
The authors note that the method does not naturally support directed graphs, but propose to represent the original graph as a bipartite graph to model directed edges and edge features.
The semi-supervised setting could be better defined, taking cues from the Realistic Evaluation of Semi-Supervised Learning Algorithms by Oliver et al. ¹⁰.
- What are the class distributions for unlabeled data? (Only label sparsity seems to be reported)
- How does the proportion of unlabeled data influence the classification quality (and the label propagation model)?
- How were the different sets sampled (1k labeled examples for test)?

Resources

Code

Tensorflow Implementation

Discussion

References

Learning with local and global consistency, by Zhou et al. link ↩
Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, by Belkin et al. link ↩
Deep learning via semi-supervised embedding, by Weston et al. link ↩
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, by Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst link ↩
Wavelets on graphs via spectral graph theory, by Hammond et al. link ↩
On the Combinatorial Power of the Weisfeiler-Lehman Algorithm, by Marten Fürer link ↩
Gated Graph Sequence Neural Networks, by Yujia Li et al. link ↩
Convolutional Networks on Graphs for Learning Molecular Fingerprints, by David Duvenaud et al. link ↩
Inductive Representation Learning on Large Graphs, by Hamilton et al. link ↩
Realistic Evaluation of Semi-Supervised Learning Algorithms, by Oliver et al. link ↩