Classifying nodes in a graph where labels are only available for a small subset of nodes.
Usually smoothing label information over graphs is done via graph-based regularization (see 1 2 3). This can be an issue as it assumes that connected nodes are likely to share the same label, which is not always the case. This methods encodes actual node similarity w.r.t to the vertex features.
Proposes learning a feature map , as a function of both the labels and the adjacency, so as to encode the node similarity and graph structure at the same time. These features are then optimised as loss terms, so the whole process is learned end-to-end.
This feature map is learned using Graph Convolutional Networks. The authors provide a derivation of the propagation model through first-order approximation of localized spatial filters 4 5. The main take away is that applications of a linear filter result in convolving the -order neighbourhood, as if is was a filter of degree . This is what makes the method tractable.
The authors also provide the propagation model as a derivation of the Weisfeiler-Lehman algorithm 6 in Appendix A.
- Document classification in citation networks with low label rates: Cora, PubMed, Citeseer
- Semi-supervised entity classification in bipartite graphs extracted from knowledge graphs: NELL
- Random graphs
The authors achieve high classification accuracies for all the proposed tasks, and compare different propagation models.
- Graph-based methods are a very exciting domain, with applications in domains like network analysis, power-grid balancing, computer graphics, relational networks (social, citation, political, …), video tracking, … Seeing that the semi-supervised setting is natural is very encouraging for applications in these domains (note that some limitations were presented in F.Huszar’s blog post, linked below)
- It is a powerful formulation for graphs where either the structure of the graph or the nodes have partial or missing information: inference can still be run.
- Can be extended to do graph-level (or subgraph) classification with a pooling layer (such as 7 8)
- Seems like there were quite a few memory issues in implementation
- How to handle mini-batches? Since the Adjacency matrix is needed, information about k-neighbours is needed for each iteration.
- The authors note that the method does not naturally support directed graphs, but propose to represent the original graph as a bipartite graph to model directed edges and edge features.
- The semi-supervised setting could be better defined, taking cues from the Realistic Evaluation of Semi-Supervised Learning Algorithms by Oliver et al. 10.
- What are the class distributions for unlabeled data? (Only label sparsity seems to be reported)
- How does the proportion of unlabeled data influence the classification quality (and the label propagation model)?
- How were the different sets sampled (1k labeled examples for test)?
- Ferenc Huszar post
- Thomas Kipf (author) post
- /r/MachineLearning discussion of @fhuszar’s post
- HN Discussion
- Discussion on Twitter with the Thomas Kipf, addressing some of the comments highlighted here