Since Θ^(q) is skew-symmetric, H^(q) is Hermitian. When q = 0, we have Θ^(0) = 0 and so H^(0) = A_s. This effectively corresponds to treating the graph as undirected. For q ≠ 0, the phase of H^(q)(u, v) encodes edge direction, and the value of H^(q)(u, v) separates four possible cases: no edge, edge from u to v, edge from v to u, and undirected edge. If there is no edge, we have H^(q)(u, v) = 0. In the case of a directed edge, the Hermitian adjacency matrix is complex valued, and changing the direction of an edge corresponds to complex conjugation. For example, in the case where q = .25, if there is an edge from u to v but not from v to u, we have

H^(.25)(u, v) = i/2 = −H^(.25)(v, u).

Thus, in this setting, an edge from u to v is treated as the opposite of an edge from v to u. On the other hand, if (u, v), (v, u) ∈ E (which can be interpreted as a single undirected edge), then H^(q)(u, v) = H^(q)(v, u) = 1, and we see the phase, Θ^(q)(u, v) = 0, encodes the lack of direction in the edge. For the rest of this paper, we will assume that q lies between these two extreme values, i.e., 0 ≤ q ≤ .25. We define the unnormalized and normalized magnetic Laplacians, respectively, by

L_U^(q) ≔ D_s − H^(q) = D_s − A_s ⊙ exp(iΘ^(q)),

L_N^(q) ≔ I − D_s^(−1/2) H^(q) D_s^(−1/2) = I − (D_s^(−1/2) A_s D_s^(−1/2)) ⊙ exp(iΘ^(q)),

where D_s is the diagonal degree matrix of A_s and ⊙ denotes entrywise multiplication.
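To make the construction concrete, here is a minimal NumPy sketch (not the authors' implementation) that builds H^(q) and both magnetic Laplacians from a dense adjacency matrix. It assumes the conventions used above, namely A_s = (A + A^⊤)/2 and Θ^(q) = 2πq(A − A^⊤); the function and variable names are illustrative.

```python
import numpy as np

def magnetic_laplacians(A, q=0.25):
    """A: dense adjacency matrix of a directed graph (A[u, v] = 1 iff (u, v) in E)."""
    N = A.shape[0]
    A_s = 0.5 * (A + A.T)                      # symmetrized adjacency
    Theta = 2 * np.pi * q * (A - A.T)          # skew-symmetric phase matrix
    H = A_s * np.exp(1j * Theta)               # Hermitian adjacency H^(q)
    d = A_s.sum(axis=1)                        # symmetrized degrees
    L_U = np.diag(d) - H                       # unnormalized magnetic Laplacian
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_N = np.eye(N) - d_inv_sqrt @ H @ d_inv_sqrt   # normalized magnetic Laplacian
    return L_U, L_N

# A single directed edge u -> v with q = 0.25 gives H[u, v] = i/2 and H[v, u] = -i/2.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
L_U, L_N = magnetic_laplacians(A, q=0.25)
```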
Note that when G is undirected, L_U^(q) and L_N^(q) reduce to the standard undirected Laplacians.
L_U^(q) and L_N^(q) are Hermitian. Theorem 1 ( Section 5 of the supplement ) shows they are positive semidefinite and thus are diagonalized by an orthonormal basis of complex eigenvectors u_1, …, u_N associated to real, nonnegative eigenvalues λ_1, …, λ_N. Similar to the traditional normalized Laplacian, Theorem 2 ( Section 5 of the supplement ) shows the eigenvalues of L_N^(q) lie in [0, 2], and we may write L_N^(q) = UΛU^†, where U is the N × N matrix whose k-th column is u_k, Λ is the diagonal matrix with Λ(k, k) = λ_k, and U^† is the conjugate transpose of U (a similar formula holds for L_U^(q)). Furthermore, recall L = BB^⊤, where B is the signed incidence matrix. Similarly, Theorem 3 ( Section 5 of the supplement ) shows that L_U^(q) = B^(q)(B^(q))^†, where B^(q) is a modified incidence matrix. The magnetic Laplacian encodes geometric information in its eigenvectors and eigenvalues. In the directed star graph ( Section 6 of the supplement ), for example, directional information is contained in the eigenvectors only, whereas the eigenvalues are invariant to the direction of the edges. On the other hand, for the directed cycle graph, the magnetic Laplacian encodes the directed nature of the graph solely in its spectrum. In general, both the eigenvectors and eigenvalues may contain important information, which we leverage in MagNet.
Most graph neural network architectures can be described as being either spectral or spatial . Spatial networks such as [ 43 , 21 , 1 , 14 ] typically extend convolution to graphs by performing a weighted average of features over neighborhoods 𝒩( u ) = { v : ( u , v ) ∈ E }. These neighborhoods are well-defined even when E is not symmetric, so spatial methods typically have natural extensions to directed graphs. However, such simplistic extensions may miss important information in the directed graph. For example, filters defined using 𝒩( u ) are not capable of assimilating the equally important information contained in { v : ( v , u ) ∈ E }. Alternatively, these methods may also use the symmetrized adjacency matrix, but they cannot learn to balance directed and undirected approaches.
In this section, we show how to extend spectral methods to directed graphs using the magnetic Laplacian introduced in Section 2 . To highlight the flexibility of our approach, we show how three spectral graph neural network architectures can be adapted to incorporate the magnetic Laplacian. Our approach is very general, and so for most of this section, we will perform our analysis for a general complex Hermitian, positive semidefinite matrix. However, we view the magnetic Laplacian as our primary object of interest (and use it in all of our experiments in Section 5 ) because of the large body of literature studying its spectral properties and applying it to data science (see Section 4 ).
In this section, we let 𝓛 denote a Hermitian, positive semidefinite matrix, such as the normalized or unnormalized magnetic Laplacian introduced in Section 2 , on a directed graph G = (V, E), |V| = N. We let u_1, …, u_N be an orthonormal basis of eigenvectors for 𝓛 and let U be the N × N matrix whose k-th column is u_k. We define the directed graph Fourier transform for a signal x : V → ℂ by x̂ ≔ U^†x, so that x̂(k) = ⟨u_k, x⟩ = Σ_{v∈V} ū_k(v) x(v). We regard the eigenvectors u_1, …, u_N as the generalizations of discrete Fourier modes to directed graphs. Since U is unitary, we have the Fourier inversion formula

x = Ux̂,  i.e.,  x(v) = Σ_{k=1}^N x̂(k) u_k(v).   (2)
In Euclidean space, convolution corresponds to pointwise multiplication in the Fourier basis. Thus, we define the convolution of x with a filter y in the Fourier domain by declaring that the Fourier transform of y ∗ x is given pointwise by ŷ(k)x̂(k). By ( 2 ), this implies y ∗ x = U Diag(ŷ) U^† x, and so we say Y is a convolution matrix if

Y = UΣU^†   (3)

for a diagonal matrix Σ. This is the natural generalization of the class of convolutions used in [ 6 ].
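Continuing the sketch above (illustrative names, not the authors' code), the Fourier transform, its inverse, and a spectral convolution matrix can be computed directly from the eigendecomposition of L_N:

```python
import numpy as np

lam, U = np.linalg.eigh(L_N)          # real eigenvalues and a unitary eigenvector matrix
x = np.random.randn(L_N.shape[0])     # a graph signal

x_hat = U.conj().T @ x                # Fourier transform: x_hat = U^dagger x
x_rec = U @ x_hat                     # Fourier inversion (2): x = U x_hat

sigma = np.random.randn(L_N.shape[0])         # entries of a diagonal spectral filter
Y = U @ np.diag(sigma) @ U.conj().T           # convolution matrix (3): Y = U Sigma U^dagger
y_conv_x = Y @ x                              # convolution of x with the filter
```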
Next, following [ 13 ] (see also [ 22 ]), we show that a spectral network can be implemented in the spatial domain via polynomials of 𝓛 by having Σ be a polynomial of Λ in ( 3 ). This reduces the number of trainable parameters to prevent overfitting, avoids explicit diagonalization of the matrix 𝓛 (which is expensive for large graphs), and improves stability to perturbations [ 27 ]. As in [ 13 ], we define a normalized eigenvalue matrix, with entries in [−1, 1], by Λ̃ ≔ (2/λ_max)Λ − I, and assume

Σ = Σ_{k=0}^{K} θ_k T_k(Λ̃)

for some real-valued θ_0, …, θ_K, where T_k is the Chebyshev polynomial defined by T_0(x) = 1, T_1(x) = x, and T_k(x) = 2xT_{k−1}(x) − T_{k−2}(x) for k ≥ 2. With Σ chosen in this way, one has

Yx = Σ_{k=0}^{K} θ_k T_k(𝓛̃)x,   (4)

where, analogous to Λ̃, we define 𝓛̃ ≔ (2/λ_max)𝓛 − I. It is important to note that, due to the complex Hermitian structure of 𝓛̃, the value Yx(u) aggregates information both from the values of x on 𝒩_k(u), the k-hop neighborhood of u, and the values of x on {v : dist(v, u) ≤ k}, which consists of those vertices that can reach u in k hops. While in an undirected graph these two sets of vertices are the same, that is not the case for general directed graphs. Furthermore, due to the difference in phase between an edge (u, v) and an edge (v, u), the filter matrix Y is also capable of aggregating information from these two sets in different ways. This capability is in contrast to any single, symmetric, real-valued matrix, as well as any matrix that encodes just 𝒩(u).
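The filter in ( 4 ) can be applied without diagonalizing 𝓛 by iterating the Chebyshev recurrence. A minimal sketch, assuming L_N and x from the earlier snippets and illustrative θ values:

```python
import numpy as np

def cheb_filter(L, x, theta, lam_max=2.0):
    """Compute Yx = sum_k theta[k] * T_k(L_tilde) x with L_tilde = (2/lam_max) L - I."""
    N = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(N)
    Tx_prev, Tx = x, L_tilde @ x                    # T_0(L~)x and T_1(L~)x
    out = theta[0] * Tx_prev
    if len(theta) > 1:
        out = out + theta[1] * Tx
    for k in range(2, len(theta)):
        Tx_next = 2.0 * (L_tilde @ Tx) - Tx_prev    # T_k = 2 L~ T_{k-1} - T_{k-2}
        out = out + theta[k] * Tx_next
        Tx_prev, Tx = Tx, Tx_next
    return out

y = cheb_filter(L_N, x, theta=[0.5, -0.3, 0.1])     # a K = 2 filter
```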
To obtain a network similar to [ 24 ], we set K = 1, assume 𝓛 = L_N^(q), use λ_max ≤ 2 to make the approximation λ_max ≈ 2, and set θ_1 = −θ_0. With this, we obtain

Yx = θ_0(I + (D_s^(−1/2) A_s D_s^(−1/2)) ⊙ exp(iΘ^(q)))x.
As in [ 24 ], we substitute (D̃_s^(−1/2) Ã_s D̃_s^(−1/2)) ⊙ exp(iΘ^(q)) in place of I + (D_s^(−1/2) A_s D_s^(−1/2)) ⊙ exp(iΘ^(q)). This renormalization helps avoid instabilities arising from vanishing/exploding gradients and yields

Yx = θ_0((D̃_s^(−1/2) Ã_s D̃_s^(−1/2)) ⊙ exp(iΘ^(q)))x,   (5)

where Ã_s ≔ A_s + I and D̃_s is the diagonal degree matrix of Ã_s.
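A sketch of the resulting K = 1 renormalized filter matrix, under the same assumptions and illustrative names as above (θ_0 is a trainable scalar in the network, fixed here for illustration):

```python
import numpy as np

def renormalized_magnetic_filter(A, q=0.25, theta0=1.0):
    """Return theta0 * (D~_s^{-1/2} A~_s D~_s^{-1/2}) ⊙ exp(i Theta^(q)) as in (5)."""
    N = A.shape[0]
    A_s_tilde = 0.5 * (A + A.T) + np.eye(N)        # A~_s = A_s + I (self-loops added)
    Theta = 2 * np.pi * q * (A - A.T)              # phase is unchanged by the self-loops
    d = A_s_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return theta0 * (D_inv_sqrt @ A_s_tilde @ D_inv_sqrt) * np.exp(1j * Theta)
```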
In theory, the matrix exp( i Θ^(q) ) is dense. However, in practice, one only needs to compute a small fraction of its entries. In most real-world datasets, the symmetrized adjacency matrix will be sparse. Since the Hermitian adjacency matrix is constructed via pointwise multiplication between the symmetrized adjacency matrix and the phase matrix, it is only necessary to compute the phase matrix for entries (u, v) where A_s(u, v) ≠ 0. Thus, the efficiency of the proposed algorithm is comparable to standard GCN algorithms, and it can leverage any existing developments, such as [ 18 ], that increase the efficiency of standard GCNs (although the computational complexity of our method does differ by a factor of four because of the cost of complex-valued multiplication).
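For large sparse graphs, the phase therefore only needs to be evaluated on the support of A_s. A sketch with SciPy sparse matrices (illustrative names, not the authors' code):

```python
import numpy as np
import scipy.sparse as sp

def sparse_hermitian_adjacency(A_dense, q=0.25):
    """Build H^(q) as a sparse matrix, evaluating exp(i Theta) only where A_s is nonzero."""
    A = sp.csr_matrix(A_dense)
    A_s = 0.5 * (A + A.T)                     # sparse symmetrized adjacency
    skew = A - A.T                            # sparse skew-symmetric part
    rows, cols = A_s.nonzero()                # phase needed only on the support of A_s
    phase = np.exp(1j * 2 * np.pi * q * np.asarray(skew[rows, cols]).ravel())
    vals = np.asarray(A_s[rows, cols]).ravel() * phase
    return sp.csr_matrix((vals, (rows, cols)), shape=A.shape)
```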
Let L be the number of convolution layers in our network, and let X^(0) be an N × F_0 input feature matrix with columns x_1^(0), …, x_{F_0}^(0). Since our filters are complex, we use a complex version of ReLU defined by σ(z) = z if −π/2 ≤ arg(z) < π/2, and σ(z) = 0 otherwise (where arg(z) is the complex argument of z ∈ ℂ). Let F_ℓ be the number of channels in layer ℓ, and for 1 ≤ ℓ ≤ L, 1 ≤ i ≤ F_{ℓ−1}, and 1 ≤ j ≤ F_ℓ, we let Y_{ij}^(ℓ) be a convolution matrix defined in the sense of either ( 3 ), ( 4 ), or ( 5 ). Define the ℓ-th layer feature matrix X^(ℓ) with columns x_1^(ℓ), …, x_{F_ℓ}^(ℓ) as

x_j^(ℓ) = σ( Σ_{i=1}^{F_{ℓ−1}} Y_{ij}^(ℓ) x_i^(ℓ−1) + b_j^(ℓ) ),   (6)

with 1 ≤ j ≤ F_ℓ and b_j^(ℓ) a learned bias applied at every vertex. In matrix form we write X^(ℓ) = Z^(ℓ)(X^(ℓ−1)), where Z^(ℓ) is a hidden layer of the form ( 6 ). In the numerical experiments reported in Section 5 , we utilize formulation ( 4 ) with 𝓛 = L_N^(q). In most cases we set K = 1, for which

Z^(ℓ)(X^(ℓ−1)) = σ( X^(ℓ−1) W_0^(ℓ) + 𝓛̃ X^(ℓ−1) W_1^(ℓ) + B^(ℓ) ),

where W_0^(ℓ) and W_1^(ℓ) are learned weight matrices corresponding to the filter weights in ( 4 ), and B^(ℓ)(v, j) = b_j^(ℓ) for each v ∈ V.
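A minimal sketch of the complex ReLU and a single K = 1 layer of this form (W0, W1, and b are illustrative parameters; M stands for 𝓛̃, or for the renormalized matrix from the earlier snippet):

```python
import numpy as np

def complex_relu(Z):
    """sigma(z) = z if -pi/2 <= arg(z) < pi/2, else 0 (applied entrywise)."""
    ang = np.angle(Z)
    return Z * ((ang >= -np.pi / 2) & (ang < np.pi / 2))

def magnet_layer(X, M, W0, W1, b):
    """One K = 1 convolution layer: sigma(X W0 + M X W1 + b)."""
    return complex_relu(X @ W0 + M @ X @ W1 + b)
```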
After the convolutional layers, we unwind the complex N × F_L matrix X^(L) into a real-valued N × 2F_L matrix, apply a linear layer consisting of right-multiplication by a 2F_L × n_c weight matrix W^(L+1) (where n_c is the number of classes), and apply softmax. In our experiments, we set L = 2 or 3. When L = 2, our network applied to node classification, as illustrated in Figure 1 , is given by

softmax( unwind( Z^(2)( Z^(1)( X^(0) ) ) ) W^(3) ).
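Putting the pieces together for node classification, a sketch that reuses magnet_layer from the previous snippet (the params dictionary and its keys are illustrative):

```python
import numpy as np

def softmax(Z, axis=-1):
    Z = Z - Z.max(axis=axis, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=axis, keepdims=True)

def magnet_node_classifier(X0, M, params):
    """Two convolution layers, unwind into real features, linear layer, softmax."""
    X1 = magnet_layer(X0, M, params["W0_1"], params["W1_1"], params["b1"])
    X2 = magnet_layer(X1, M, params["W0_2"], params["W1_2"], params["b2"])
    X_unwound = np.concatenate([np.real(X2), np.imag(X2)], axis=1)   # N x 2F_L real matrix
    return softmax(X_unwound @ params["W3"])                          # N x n_c class scores
```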
For link prediction, we apply the same method through the unwind layer and then concatenate the rows corresponding to pairs of nodes to obtain edge features.
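For example, edge features can be formed from the unwound node features as follows (a sketch; X_unwound and edge_pairs are illustrative names):

```python
import numpy as np

def edge_features(X_unwound, edge_pairs):
    """Concatenate the unwound features of the two endpoints of each candidate edge.

    X_unwound: N x 2F_L real matrix from the unwind layer.
    edge_pairs: m x 2 integer array of (u, v) node index pairs.
    """
    u, v = edge_pairs[:, 0], edge_pairs[:, 1]
    return np.concatenate([X_unwound[u], X_unwound[v]], axis=1)
```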
In Section 4.1 , we describe other graph neural networks designed specifically for directed graphs. Notably, none of these methods encode directionality with complex numbers, instead opting for real-valued, symmetric matrices. In Section 4.2 , we review prior work on the magnetic Laplacian, which has been studied for several decades and has lately garnered interest in the network science and graph signal processing communities. However, to the best of our knowledge, this is the first work to use it to construct a graph neural network. We also note there are numerous approaches to graph signal processing on directed graphs. Many of these rely on a natural analog of Fourier modes, typically defined through either a factorization of a graph shift operator or by solving an optimization problem. For further review, we refer the reader to [ 30 ].
In [ 29 ], the authors construct a directed Laplacian via identities involving the random-walk matrix and its stationary distribution Π. When G is undirected, one can use the fact that Π is proportional to the degree vector to verify that this directed Laplacian reduces to the standard normalized graph Laplacian. However, this method requires G to be strongly connected, unlike MagNet. The authors of [ 42 ] use a first-order proximity matrix A_F (equivalent to A_s here), as well as two second-order proximity matrices: an in-proximity matrix whose (u, v) entry is nonzero if there exists a w such that (w, u), (w, v) ∈ E, and an out-proximity matrix defined analogously. These three matrices collectively describe and distinguish the neighborhood of each vertex and those vertices that can reach a vertex in a single hop. The authors construct three different Laplacians and use a fusion operator to share information across channels. Similarly, inspired by [ 3 ], the authors of [ 33 ] consider several different symmetric Laplacian matrices corresponding to a number of different graph motifs.
The method of [ 41 ] builds upon the ideas of both [ 29 ] and [ 42 ] and considers a directed Laplacian similar to the one used in [ 29 ], but with a PageRank matrix in place of the random-walk matrix. This allows for applications to graphs which are not strongly connected. Similar to [ 42 ], they use higher-order receptive fields (analogous to the second-order adjacency matrices discussed above) and an inception module to share information between receptive fields of different orders. We also note [ 25 ], which uses an approach based on PageRank in the spatial domain. There are also some related methods for directed graphs that are not based on the graph Laplacian, such as the directed graph embedding [ 39 ], and directed message passing for molecular graphs [ 26 ].
The magnetic Laplacian has been studied since at least [ 28 ]. The name originates from its interpretation as a quantum mechanical Hamiltonian of a particle under magnetic flux. Early works focused on d -regular graphs, where the eigenvectors of the magnetic Laplacian are equivalent to those of the Hermitian adjacency matrix. The authors of [ 20 ], for example, show that using a complex-valued Hermitian adjacency matrix rather than the symmetrized adjacency matrix reduces the number of small, non-isomorphic cospectral graphs. Topics of current research into Hermitian adjacency matrices include clustering tasks [ 12 ] and the role of the parameter q [ 32 ].
The magnetic Laplacian is also the subject of ongoing research in graph signal processing [ 19 ], community detection [ 17 ], and clustering [ 10 , 16 , 15 ]. For example, [ 16 ] uses the phase of the eigenvectors to construct eigenmap embeddings analogous to [ 2 ]. The role of q is highlighted in the works of [ 16 , 17 , 20 , 32 ], which show how particular choices of q may highlight various graph motifs. In our context, this indicates that q should be carefully tuned via cross-validation. Lastly, we note that numerous other directed graph Laplacians have been studied and applied to data science [ 7 , 8 , 35 ]. However, as alluded to in Section 2 , these methods typically do not use complex Hermitian matrices.
We construct a directed stochastic block model (DSBM) as follows. First we divide N vertices into n_c equally sized clusters C_1, …, C_{n_c}. We define {α_{i,j}} to be a collection of probabilities, 0 < α_{i,j} ≤ 1 with α_{i,j} = α_{j,i}, and for an unordered pair u ≠ v create an undirected edge between u and v with probability α_{i,j} if u ∈ C_i, v ∈ C_j. To turn this undirected graph into a directed graph, we define {β_{i,j}} to be a collection of probabilities such that 0 ≤ β_{i,j} ≤ 1 and β_{i,j} + β_{j,i} = 1. For each undirected edge {u, v}, we assign that edge a direction by the rule that the edge points from u to v with probability β_{i,j} if u ∈ C_i and v ∈ C_j, and points from v to u otherwise. If α_{i,j} is constant, then the only way to determine the clusters will be from the directional information.
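A sketch of DSBM sampling as described above (illustrative names; it assumes N is divisible by n_c):

```python
import numpy as np

def sample_dsbm(N, n_c, alpha, beta, seed=None):
    """alpha, beta: n_c x n_c arrays with alpha symmetric and beta[i, j] + beta[j, i] = 1."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_c), N // n_c)        # equally sized clusters
    A = np.zeros((N, N))
    for u in range(N):
        for v in range(u + 1, N):
            i, j = labels[u], labels[v]
            if rng.random() < alpha[i, j]:              # create an undirected edge {u, v}?
                if rng.random() < beta[i, j]:           # orient it u -> v with prob. beta[i, j]
                    A[u, v] = 1.0
                else:                                   # otherwise orient it v -> u
                    A[v, u] = 1.0
    return A, labels
```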
In Figure 3 , we plot the performance of MagNet and other methods on variations of the DSBM. In each of these, we set n c = 5 and the goal is to classify the vertices by cluster. We set N = 2500, except in Figure 3d where N = 500. In Figure 3a , we plot the performance of our model on the DSBM with α i , j ≔ α * = .1, .08, and .05 for i ≠ j , which varies the density of inter-cluster edges, and set α i , i = .1. Here we set β i , i = .5 and β i , j = .05 for i > j . This corresponds to the ordered meta-graph in Figure 2a . Figure 3b also uses the ordered meta-graph, but here we fix α i , j = .1 for all i , j , and set β i , j = β *, for i > j , and allow β * to vary from .05 to .4, which varies the net flow (related to flow imbalance in [ 23 ]) from one cluster to another. The results in Figure 3c utilize a cyclic meta-graph structure as in Figure 2b (without the gray noise edges). Specifically, we set α i , j = .1 if i = j or i = j ± 1 mod 5 and α i , j = 0 otherwise. We define β i , j = β *, β j , i = 1 − β * when j = ( i − 1) mod 5, and β i , j = 0 otherwise. In Figure 3d we add noise to the cyclic structure of our meta-graph by setting α i , j = .1 for all i , j and β i , j = .5 for all ( i , j ) connected by a gray edge in Figure 2b (keeping β i , j the same as in Figure 3c for the blue edges).
Texas , Wisconsin , and Cornell are WebKB datasets modeling links between websites at different universities [ 36 ]. We use these datasets for both link prediction and node classification with nodes labeled as student, project, course, staff, and faculty in the latter case. Telegram [ 5 ] is a pairwise influence network between 245 Telegram channels with 8,912 links. To the best of our knowledge, this dataset has not previously been studied in the graph neural network literature. Labels are generated using the method discussed in [ 5 ], with a total of four classes. The datasets Chameleon and Squirrel [ 37 ] represent links between Wikipedia pages related to chameleons and squirrels. We use these datasets for link prediction. Likewise, WikiCS [ 31 ] is a collection of Computer Science articles, which we also use for link prediction (see the tables in Section 7 of the supplement ). Cora-ML and CiteSeer are popular citation networks with node labels corresponding to scientific subareas. We use the versions of these datasets provided in [ 4 ]. Further details are given in the supplementary material .
Node classification is performed in a semi-supervised setting (i.e., with access to the test data, but not the test labels, during training). For the datasets Cornell , Texas , Wisconsin , and Telegram we use a 60%/20%/20% training/validation/test split which, because of the small graph size, might be viewed as more akin to supervised learning. For Cora-ML and CiteSeer , we use the same split as [ 41 ]. For all of these datasets we use 10 random data splits. For the DSBM datasets, we generated 5 graphs randomly for each type and each set of parameters, each with 10 different random node splits. We use 20% of the nodes for validation, and we vary the proportion of training samples based on the classification difficulty, using 2%, 10%, and 60% of nodes per class for the ordered, cyclic, and noisy cyclic DSBM graphs, respectively, during training, and the rest for testing. Hyperparameters were selected using one of the five generated graphs and then applied to the other four.
In the main text, there are two types of link prediction tasks conducted for performance evaluation. The first is to predict the edge direction for pairs of vertices u, v for which either (u, v) ∈ E or (v, u) ∈ E. The second is existence prediction: the model is asked to predict whether (u, v) ∈ E for ordered pairs of vertices (u, v). For both tasks, we removed 15% of edges for testing and 5% for validation, and used the remaining edges for training; graph connectivity was maintained during splitting. Ten splits were generated randomly for each graph, and the input features are the in-degrees and out-degrees of the nodes. In the supplement, we report on two additional link prediction tasks based on a three-class classification setup: (u, v) ∈ E, (v, u) ∈ E, or (u, v), (v, u) ∉ E. Full details are provided in the supplementary material .
In all experiments, we use the normalized magnetic Laplacian and implement MagNet with convolution defined as in ( 4 ), meaning that our network may be viewed as the magnetic Laplacian generalization of ChebNet. The hyperparameter q and the other network hyperparameters are selected by cross-validation. Since complex tensors are still in beta in PyTorch, we did not use them; instead, we stored each complex tensor as two real tensors (one for the real part, one for the imaginary part) and carried out complex multiplication using the standard formula (a + ib)(c + id) = (ac − bd) + i(bc + ad), where a, b, c, d may be real numbers or real matrices. We compare with multiple baselines in three categories: (i) spectral methods: ChebNet [ 13 ], GCN [ 24 ]; (ii) spatial methods: APPNP [ 25 ], SAGE [ 21 ], GIN [ 45 ], GAT [ 43 ]; and (iii) methods designed for directed graphs: DGCN [ 42 ] and two variants of [ 41 ], a basic version (DiGraph) and a version with higher-order inception blocks (DiGraphIB). All baselines in the experiments have two graph convolutional layers, except for node classification on the DSBM with cyclic meta-graphs ( Figures 3c , 3d , and 2b ), for which we also tested three layers during the hyperparameter search. For ChebNet, we use the symmetrized adjacency matrix. For the spatial networks, we apply both the symmetrized and asymmetric adjacency matrices for node classification and report the better of the two results. The supplement provides full details, as well as results for two other types of baselines: (i) BiGCN, BiSAGE, and BiGAT, which are obtained by applying GCN, SAGE, and GAT to both the original adjacency matrix and the transposed adjacency matrix; and (ii) a k-nearest-neighbors classifier based on the eigenvector with the smallest eigenvalue of the magnetic Laplacian [ 17 ].
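For reference, a sketch of complex multiplication carried out with two real tensors, as described above (tensor names are illustrative):

```python
import torch

def complex_matmul(X_re, X_im, Y_re, Y_im):
    """(a + ib)(c + id) = (ac - bd) + i(bc + ad), applied to real tensor pairs."""
    out_re = X_re @ Y_re - X_im @ Y_im
    out_im = X_im @ Y_re + X_re @ Y_im
    return out_re, out_im
```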
We see that MagNet performs well across all tasks. As indicated in Table 1 , our cross-validation procedure selects q = 0 for node classification on the citation networks Cora-ML and CiteSeer . This means we achieved the best performance when regarding directional information as noise, suggesting symmetrization-based methods are appropriate for node classification on citation networks. This matches our intuition. For example, in Cora-ML , the task is to classify research papers by scientific subarea. If the topic of a given paper is "machine learning," then it is likely to both cite and be cited by other machine learning papers. For all other datasets, we find the optimal value of q is nonzero, indicating that directional information is important. Our network exhibits the best performance on three out of six of these datasets and is a close second on Texas and Telegram . We also achieve an improvement of at least four percentage points over both ChebNet and GCN on the four datasets for which q > 0. These networks are similar to ours but use the classical graph Laplacian; the comparison therefore isolates the effect of the magnetic Laplacian and shows that it is a valuable tool for encoding directional information. MagNet also compares favorably to non-spectral methods on the WebKB networks ( Cornell , Texas , Wisconsin ). Indeed, MagNet obtains a ~ 4% improvement on Cornell and a ~ 2.5% improvement on Wisconsin , while on Texas it has the second-best accuracy, close behind SAGE. We also see that the other directed methods have relatively poor performance on the WebKB networks, perhaps because these graphs are fairly small and have very few training samples. To make this analysis more quantitative, we computed, for each method, the absolute difference (in percentage points) between its classification accuracy and that of the top-performing method on each dataset, averaged over the six datasets. Lower scores are better, and a score of zero would indicate a method that is the top performer on every dataset. As reported in Table 1 , MagNet achieves the best score, 1.1 percentage points.
| Type | Method | Cornell | Texas | Wisconsin | Cora-ML | CiteSeer | Telegram | Score |
|---|---|---|---|---|---|---|---|---|
| Spectral | ChebNet | 79.8±5.0 | 79.2±7.5 | 81.6±6.3 | 80.0±1.8 | 66.7±1.6 | 70.2±6.8 | 6.94 |
| | GCN | 59.0±6.4 | 58.7±3.8 | 55.9±5.4 | 82.0±1.1 | 66.0±1.5 | 73.4±5.8 | 19.16 |
| Spatial | APPNP | 58.7±4.0 | 57.0±4.8 | 51.8±7.4 | **82.6±1.4** | 66.9±1.8 | 67.3±3.0 | 18.75 |
| | SAGE | 80.0±6.1 | **84.3±5.5** | 83.1±4.8 | 82.3±1.2 | 66.0±1.5 | 66.4±6.4 | |
| | GIN | 57.9±5.7 | 65.2±6.5 | 58.2±5.1 | 78.1±2.0 | 63.3±2.5 | 86.4±4.3 | 16.53 |
| | GAT | 57.6±4.9 | 61.1±5.0 | 54.1±4.2 | 81.9±1.0 | 67.3±1.3 | 72.6±7.5 | 16.39 |
| Directed | DGCN | 67.3±4.3 | 71.7±7.4 | 65.5±4.7 | 81.3±1.4 | 66.3±2.0 | **90.4±5.6** | 8.55 |
| | DiGraph | 66.8±6.2 | 64.9±8.1 | 59.6±3.8 | 79.4±1.8 | 62.6±2.2 | 82.0±3.1 | 15.70 |
| | DiGraphIB | 64.4±9.0 | 64.9±13.7 | 64.1±7.0 | 79.3±1.2 | 61.1±1.7 | 64.1±7.0 | 16.36 |
| This paper | MagNet | **84.3±7.0** | 83.3±6.1 | **85.7±3.2** | 79.8±2.5 | **67.5±1.8** | 87.6±2.9 | 1.1 |
| | Best q | 0.25 | 0.15 | 0.05 | 0.0 | 0.0 | 0.15 | - |
On the DSBM datasets, as illustrated in Figure 3 (see also the tables in Section 7 of the supplement ), we see that MagNet generally performs quite well and is the best performing network in the vast majority of cases. The networks DGCN and DiGraphIB rely on second-order proximity matrices. As demonstrated in Figure 3c , these methods are well suited for networks with a cyclic meta-graph structure since nodes in the same cluster are likely to have common neighbors. MagNet, on the other hand, does not use second-order proximity, but can still learn the clusters by stacking multiple layers together. This improves MagNet's ability to adapt to directed graphs with different underlying topologies, as illustrated in Figure 3d , where the network has an approximately cyclic meta-graph structure. In this setting, MagNet continues to perform well, but the performance of DGCN and DiGraphIB deteriorates significantly. Interestingly, MagNet performs well on the DSBM cyclic meta-graph ( Figure 3c ) with q ≈ .1, whereas q ≥ .2 is preferred for the other three DSBM tests; we leave a more in-depth investigation for future work. Further details are available in Section 8 of the supplement .
For link prediction, we achieve the best performance on seven out of eight tests as shown in Table 2 . We also note that Table 2 reports optimal non-zero q values for each task. This indicates that incorporating directional information is important for link prediction, even on citation networks such as Cora and CiteSeer . This matches our intuition, since there is a clear difference between a paper with many citations and one with many references. More results on different datasets, and closely related tasks (including three-class classification), are provided in Section 7 in the supplement .
| | Direction prediction | | | | Existence prediction | | | |
|---|---|---|---|---|---|---|---|---|
| | Cornell | Wisconsin | Cora-ML | CiteSeer | Cornell | Wisconsin | Cora-ML | CiteSeer |
| ChebNet | 71.0±5.5 | 67.5±4.5 | 72.7±1.5 | 68.0±1.6 | 80.1±2.3 | 82.5±1.9 | 80.0±0.6 | 77.4±0.4 |
| GCN | 56.2±8.7 | 71.0±4.0 | 79.8±1.1 | 68.9±2.8 | 75.1±1.4 | 75.1±1.9 | 81.6±0.5 | 76.9±0.5 |
| APPNP | 69.5±9.0 | 75.1±3.5 | 83.7±0.7 | 77.9±1.6 | 74.9±1.5 | 75.7±2.2 | 82.5±0.6 | 78.6±0.7 |
| SAGE | 75.2±11.0 | 72.0±3.5 | 68.2±0.8 | 68.7±1.5 | 79.8±2.4 | 77.3±2.9 | 75.0±0.0 | 74.1±1.0 |
| GIN | 69.3±6.0 | 74.8±3.7 | 83.2±0.9 | 76.3±1.4 | 74.5±2.1 | 76.2±1.9 | 82.5±0.7 | 77.9±0.7 |
| GAT | 67.9±11.1 | 53.2±2.6 | 50.0±0.1 | 50.6±0.5 | 77.9±3.2 | 74.6±0.0 | 75.0±0.0 | 75.0±0.0 |
| DGCN | 80.7±6.3 | 74.5±7.2 | 79.6±1.5 | 78.5±2.3 | 80.0±3.9 | **82.8±2.0** | 82.1±0.5 | 81.2±0.4 |
| DiGraph | 79.3±1.9 | 82.3±4.9 | 80.8±1.1 | 81.0±1.1 | 80.6±2.5 | **82.8±2.6** | 81.8±0.5 | **82.2±0.6** |
| DiGraphIB | 79.8±4.8 | 82.0±4.9 | 83.4±1.1 | 82.5±1.3 | 80.5±3.6 | 82.4±2.2 | 82.2±0.5 | 81.0±0.5 |
| MagNet | **82.9±3.5** | **83.3±3.0** | **86.5±0.7** | **84.8±1.2** | **81.1±3.3** | **82.8±2.2** | **82.7±0.7** | 79.9±0.6 |
| Best q | 0.20 | 0.10 | 0.20 | 0.15 | 0.25 | 0.05 | 0.05 | 0.05 |
We have introduced MagNet, a neural network for directed graphs based on the magnetic Laplacian. This network can be viewed as the natural extension of spectral graph convolutional networks to the directed graph setting. We demonstrate the effectiveness of our network, and the importance of incorporating directional information via a complex Hermitian matrix, for link prediction and node classification on both real and synthetic datasets. Interesting avenues of future research would be using multiple q ’s along different channels, exploring the role of different normalizations of the magnetic Laplacian, and incorporating the magnetic Laplacian into other network architectures.
Our method has natural extensions to weighted, directed graphs when all edges are directed. However, it is not clear how best to extend it to weighted mixed graphs (with both directed and undirected edges). Our network does not incorporate an attention mechanism and, similar to many other networks, is not scalable to large graphs in its current form (although this may be addressed in future work). All of our data is publicly available for research purposes and does not contain personally identifiable information or offensive content. The method presented here has no greater or lesser impact on society than other graph neural network algorithms.
We would like to thank Jie Zhang who pointed out that our definition of the magnetic Laplacian differed by an entry-wise complex conjugation from the most commonly used definition in the literature. Y.H. thanks her supervisors Mihai Cucuringu and Gesine Reinert for their guidance.
This work was supported by the National Institutes of Health [grant NIGMS-R01GM135929 to M.H. and supporting X.Z, N.B.]; the University of Oxford [the Clarendon scholarship to Y.H.]; the National Science Foundation [grant DMS-1845856 to M.H.]; and the Department of Energy [grant DE-SC0021152 to M.H.].
1 Our definition of Θ coincides with that used in [ 19 ]. However, another definition (differing by a minus sign) also appears in the literature. The resulting magnetic Laplacians have the same eigenvalues, and the corresponding eigenvectors are complex conjugates of one another. Therefore, this difference does not affect the performance of our network, since our final layer separates the real and imaginary parts before multiplying by a trainable weight matrix (see Section 3 for details on the network structure).