What?
Classifying nodes in a graph where labels are only available for a small subset of nodes.
Why?
Usually smoothing label information over graphs is done via graphbased regularization (see ^{1} ^{2} ^{3}). This can be an issue as it assumes that connected nodes are likely to share the same label, which is not always the case. This methods encodes actual node similarity w.r.t to the vertex features.
How?
Proposes learning a feature map , as a function of both the labels and the adjacency, so as to encode the node similarity and graph structure at the same time. These features are then optimised as loss terms, so the whole process is learned endtoend.
This feature map is learned using Graph Convolutional Networks. The authors provide a derivation of the propagation model through firstorder approximation of localized spatial filters ^{4} ^{5}. The main take away is that applications of a linear filter result in convolving the order neighbourhood, as if is was a filter of degree . This is what makes the method tractable.
The authors also provide the propagation model as a derivation of the WeisfeilerLehman algorithm ^{6} in Appendix A.
Evaluation
 Document classification in citation networks with low label rates: Cora, PubMed, Citeseer
 Semisupervised entity classification in bipartite graphs extracted from knowledge graphs: NELL
 Random graphs
The authors achieve high classification accuracies for all the proposed tasks, and compare different propagation models.
Comments
 Graphbased methods are a very exciting domain, with applications in domains like network analysis, powergrid balancing, computer graphics, relational networks (social, citation, political, …), video tracking, … Seeing that the semisupervised setting is natural is very encouraging for applications in these domains (note that some limitations were presented in F.Huszar’s blog post, linked below)
 It is a powerful formulation for graphs where either the structure of the graph or the nodes have partial or missing information: inference can still be run.
 Can be extended to do graphlevel (or subgraph) classification with a pooling layer (such as ^{7} ^{8})
 Seems like there were quite a few memory issues in implementation
Questions
 How to handle minibatches? Since the Adjacency matrix is needed, information about kneighbours is needed for each iteration.
 [EDIT] The authors have pointed out that this issue has been addressed in frameworks like GraphSAGE ^{9}
 The authors note that the method does not naturally support directed graphs, but propose to represent the original graph as a bipartite graph to model directed edges and edge features.
 The semisupervised setting could be better defined, taking cues from the Realistic Evaluation of SemiSupervised Learning Algorithms by Oliver et al. ^{10}.
 What are the class distributions for unlabeled data? (Only label sparsity seems to be reported)
 How does the proportion of unlabeled data influence the classification quality (and the label propagation model)?
 How were the different sets sampled (1k labeled examples for test)?
Resources
Code
Discussion
 /r/MachineLearning
 Ferenc Huszar post
 Thomas Kipf (author) post
 /r/MachineLearning discussion of @fhuszar’s post
 HN Discussion
 Discussion on Twitter with the Thomas Kipf, addressing some of the comments highlighted here
References

Learning with local and global consistency, by Zhou et al. link ↩

Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, by Belkin et al. link ↩

Deep learning via semisupervised embedding, by Weston et al. link ↩

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, by Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst link ↩

Wavelets on graphs via spectral graph theory, by Hammond et al. link ↩

On the Combinatorial Power of the WeisfeilerLehman Algorithm, by Marten Fürer link ↩

Gated Graph Sequence Neural Networks, by Yujia Li et al. link ↩

Convolutional Networks on Graphs for Learning Molecular Fingerprints, by David Duvenaud et al. link ↩

Inductive Representation Learning on Large Graphs, by Hamilton et al. link ↩

Realistic Evaluation of SemiSupervised Learning Algorithms, by Oliver et al. link ↩