Introduction

Graph Neural Networks (GNNs) have emerged as a transformative paradigm in machine learning and artificial intelligence. The ubiquitous presence of interconnected data in various domains, from social networks and biology to recommendation systems and cybersecurity, has fueled the rapid evolution of GNNs. These networks have displayed remarkable capabilities in modeling and understanding complex relationships, making them pivotal in solving real-world problems that traditional machine-learning models struggle to address. GNNs’ unique ability to capture intricate structural information inherent in graph-structured data is significant. This information often manifests as dependencies, connections, and contextual relationships essential for making informed predictions and decisions. Consequently, GNNs have been adopted and extended across various applications, redefining what is possible in machine learning.

In this comprehensive review, we embark on a journey through the multifaceted landscape of Graph Neural Networks, encompassing an array of critical aspects. Our study is motivated by the ever-increasing literature and diverse perspectives within the field. We aim to provide researchers, practitioners, and students with a holistic understanding of GNNs, serving as an invaluable resource to navigate the intricacies of this dynamic field. The scope of this review is extensive, covering fundamental concepts that underlie GNNs, various architectural designs, techniques for training and inference, prevalent challenges and limitations, the diversity of datasets utilized, and practical applications spanning a myriad of domains. Furthermore, we delve into the intriguing future directions that GNN research will likely explore, shedding light on the exciting possibilities.

In recent years, deep learning (DL) has been called the gold standard in machine learning (ML). It has also steadily evolved into the most widely used computational technique in ML, producing excellent results on various challenging cognitive tasks, sometimes matching or even outperforming human ability. One benefit of DL is its capacity to learn from enormous amounts of data [1]. GNN variants such as graph convolutional networks (GCNs), graph attention networks (GATs), and GraphSAGE have shown groundbreaking performance on various deep learning tasks in recent years [2].

A graph is a data structure that consists of nodes (also called vertices) and edges. Mathematically, it is defined as G = (V, E), where V denotes the set of nodes and E denotes the set of edges. Edges in a graph can be directed or undirected, depending on whether directional dependencies exist between nodes. A graph can represent various data structures, such as social networks, knowledge graphs, and protein–protein interaction networks. Graphs are non-Euclidean spaces, meaning that the distance between two nodes in a graph is not necessarily equal to the distance between their coordinates in a Euclidean space. This makes applying traditional neural networks to graph data difficult, as they are typically designed for Euclidean data.
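
To make the definition concrete, here is a minimal Python sketch (the node names and edges are purely hypothetical) that stores a small undirected graph G = (V, E) as an edge list and derives an adjacency-list view from it.

```python
# A small undirected graph G = (V, E) represented in two common ways.
V = ["A", "B", "C", "D"]                      # nodes (vertices)
E = [("A", "B"), ("A", "C"), ("B", "D")]      # undirected edges

# Adjacency-list view: each node maps to the set of its neighbors.
adjacency = {v: set() for v in V}
for u, w in E:
    adjacency[u].add(w)
    adjacency[w].add(u)   # add both directions because the edges are undirected

print(adjacency["A"])     # {'B', 'C'}
```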

Graph neural networks (GNNs) are a type of deep learning model that can be used to learn from graph data. GNNs use a message-passing mechanism to aggregate information from neighboring nodes, allowing them to capture the complex relationships in graphs. GNNs are effective for various tasks, including node classification, link prediction, and clustering.

Organization of paper

The paper is organized as follows:

  1. The primary focus of this research is to comprehensively examine Concepts, Architectures, Techniques, Challenges, Datasets, Applications, and Future Directions within the realm of Graph Neural Networks.

  2. The paper delves into the Evolution and Motivation behind the development of Graph Neural Networks, including an analysis of the growth of publication counts over the years.

  3. It provides an in-depth exploration of the Message Passing Mechanism used in Graph Neural Networks.

  4. The study presents a concise summary of GNN learning styles and GNN models, complemented by an extensive literature review.

  5. The paper thoroughly analyzes the Advantages and Limitations of GNN models when applied to various domains.

  6. It offers a comprehensive overview of GNN applications, the datasets commonly used with GNNs, and the array of Python libraries that support GNN models.

  7. In addition, the research identifies and addresses specific research gaps, outlining potential future directions in the field.

"Introduction" section describes the Introduction to GNN. "Background study" section provides background details in terms of the Evolution of GNN. "Research motivation" section describes the research motivation behind GNN. Section IV describes the GNN message-passing mechanism and the detailed description of GNN with its Structure, Learning Styles, and Types of tasks. "GNN Models and Comparative Analysis of GNN Models" section describes the GNN models with their literature review details and comparative study of different GNN models. "Graph Neural Network Applications" section describes the application of GNN. And finally, future direction and conclusions are defined in "Future Directions of Graph Neural Network" and "Conclusions" sections, respectively. Figure 1 gives the overall structure of the paper.

Fig. 1: The overall structure of the paper

Background study

As shown in Fig. 2 below, the evolution of GNNs started in 2005. Over the past five years, research in this area has grown rapidly and in great detail, and graph neural networks are now used by researchers in practically every field, such as NLP, computer vision, and healthcare.

Fig. 2: Year-wise publication count of GNN (2005–2022)

Graph neural network research evolution

Graph neural networks (GNNs) were first proposed in 2005, but only recently have they begun to gain traction. GNNs were first introduced by Gori [2005] and Scarselli [2004, 2009]. A node is naturally defined by its attributes and the nodes connected to it in the graph. A GNN aims to learn a state embedding \({h}_{v}\in {R}^{s}\) that encapsulates each node's neighborhood information. The state embedding \({h}_{v}\), an s-dimensional vector of node v, can be used to generate an output \({o}_{v}\), such as the predicted distribution of the node label [30]. Thomas Kipf and Max Welling introduced the graph convolutional network (GCN) in 2017. A GCN layer defines a first-order approximation of a localized spectral filter on graphs. GCNs can be thought of as convolutional neural networks expanded to handle graph-structured data.

Graph neural network evolution

As shown in Fig. 3 below, research on graph neural networks (GNNs) began in 2005 and is still ongoing. GNNs can handle a broad class of graphs and can be used for node-focused tasks, edge-focused tasks, graph-focused tasks, and many other applications. In 2005, Marco Gori introduced the concept of GNNs, defining them as an extension of recursive neural networks [4]. Franco Scarselli also explained the concepts for ranking web pages with the help of GNNs in 2005 [5]. In 2006, Swapnil Gandhi and Anand Padmanabha Iyer of Microsoft Research introduced distributed deep graph learning at scale, which defines a deep graph neural network [6]. They explained new concepts such as GCN, GAT, etc. [1]. Pucci and Gori used GNN concepts in recommendation systems.

Fig. 3: Graph Neural Network Evolution

In 2007, Chun Guang Li, Jun Guo, and Hong-gang Zhang used a semi-supervised learning concept with GNNs [7]. They proposed a pruning method to enhance the basic GNN and resolve the problem of choosing the neighborhood scale parameter. In 2008, Ziwei Zhang introduced a new concept, Eigen-GNN [8], which works well with several GNN models. In 2009, Abhijeet V introduced the GNN concept in fuzzy networks [9], proposing a granular reflex fuzzy min–max neural network for classification. In 2010, DK Chaturvedi explained the concept of GNNs for soft computing techniques [10], and GNNs began to be used widely in many applications. In the same year, Tanzima Hashem discussed privacy-preserving group nearest neighbor queries [11]. The first initiative to use GNNs for knowledge graph embedding was R-GCN, which suggests a relation-specific transformation in the message-passing phase to deal with various relations.

Similarly, from 2011 to 2017, researchers continued to explore new GNN concepts, and the volume of published work has increased steadily from 2018 onwards. Our paper shows that GNN models such as GCN, GAT, R-GCN, and so on are helpful across many tasks [12].

Literature review

Table 1 describes the literature survey on graph neural networks, including the application area, the dataset used, the model applied, and the performance evaluation. The literature covers the years 2018 to 2023.

Table 1 GNN papers with their performance

Research motivation

We employ grid data structures for processing image inputs, typically using an n*n filter; the result is computed by applying an aggregation or maximum function. This process works effectively because of the inherent fixed structure of images: we position the grid over the image, move the filter across it, and derive the output vector, as depicted on the left side of Fig. 4. In contrast, this approach is unsuitable when working with graphs. Graphs lack a predefined structure for data storage, and there is no inherent knowledge of node-to-neighbor relationships, as illustrated on the right side of Fig. 4. To overcome this limitation, we focus on graph convolution.

Fig. 4: CNN In Euclidean Space (Left), GNN In Euclidean Space (Right)

In the context of GCNs, convolutional operations are adapted to handle graphs’ irregular and non-grid-like structures. These operations typically involve aggregating information from neighboring nodes to update the features of a central node. CNNs are primarily used for grid-like data structures, such as images. They are well-suited for tasks where spatial relationships between neighboring elements are crucial, as in image processing. CNNs use convolutional layers to scan small local receptive fields and learn hierarchical representations. GNNs are designed for graph-structured data, where edges connect entities (nodes). Graphs can represent various relationships, such as social networks, citation networks, or molecular structures. GNNs perform operations that aggregate information from neighboring nodes to update the features of a central node. CNNs excel in processing grid-like data with spatial dependencies; GNNs are designed to handle graph-structured data with complex relationships and dependencies between entities.

Limitation of CNN over GNN

Graph Neural Networks (GNNs) draw inspiration from Convolutional Neural Networks (CNNs). Before delving into the intricacies of GNNs, it is essential to understand why Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) may not suffice for effectively handling data structured as graphs. As illustrated in Fig. 5, Convolutional Neural Networks (CNNs) are designed for data that exhibits a grid structure, such as images. Conversely, Recurrent Neural Networks (RNNs) are tailored to sequences, like text.

Fig. 5: Convolution can be performed with an n*n mask if the input is an image (left); convolution cannot be achieved with an n*n mask if the input is a graph (right)

Typically, we use arrays for storage when working with text data. Likewise, for image data, matrices are the preferred choice. However, as depicted in Fig. 5, arrays and matrices fall short when dealing with graph data. In the case of graphs, we require a specialized technique known as Graph Convolution. This approach enables deep neural networks to handle graph-structured data directly, leading to a graph neural network.

Fig. 5 illustrates that, when the input is an image, we can employ masking techniques and apply filtering operations to transform the data into vector form. Conversely, as shown in the right image, traditional masking methods are not applicable when graph data is the input.

Graph neural network

Graph Neural Networks, or GNNs, are a class of neural networks tailored for handling data organized in graph structures. Graphs are mathematical representations of nodes connected by edges, making them ideal for modeling relationships and dependencies in complex systems. GNNs have the inherent ability to learn and reason about graph-structured data, enabling diverse applications. In this section, we first explain the message-passing mechanism of GNNs ("Message passing mechanism in graph neural network" section) and then describe graphs in terms of their structure, types, and learning styles ("Description of GNN taxonomy" section).

Message passing mechanism in graph neural network

A GNN is an optimizable transformation on all graph attributes (nodes, edges, and global context) that preserves graph symmetries (permutation invariance). Because a GNN does not alter the connectivity of the input graph, the output can be described with the same adjacency list and the same number of feature vectors as the input graph. However, the output graph has updated embeddings because the GNN has transformed each node, edge, and global-context representation.

In Fig. 6, circles are nodes, and empty boxes show aggregation of neighbor/adjacent nodes. The model aggregates messages from A's local graph neighbors (i.e., B, C, and D). In turn, the messages coming from neighbors are based on information aggregated from their respective neighborhoods, and so on. This visualization shows a two-layer version of a message-passing model. Notice that the computation graph of the GNN forms a tree structure by unfolding the neighborhood around the target node [17]. Graph neural networks (GNNs) are neural models that capture the dependence of graphs via message passing between the nodes of graphs [30].

Fig. 6: How a single node aggregates messages from its adjacent neighbor nodes

The message-passing mechanism of Graph Neural Networks is shown in Fig. 7. We take as input a graph with a set of node features \(X\in {R}^{d\times \left|V\right|}\) and use this information to produce node embeddings \({z}_{u}\). We also review how the GNN framework can embed subgraphs and whole graphs.

Fig. 7: Message passing mechanism in GNN

At each iteration, every node collects information from its local neighborhood, and as these iterations progress, each node embedding incorporates information from increasingly distant parts of the graph. After the first iteration (k = 1), each node embedding contains information from its 1-hop neighborhood, i.e., from the nodes that can be reached via a path of length 1 in the graph [31]. After the second iteration (k = 2), each node embedding contains information from its 2-hop neighborhood; generally, after k iterations, each node embedding contains information from its k-hop neighborhood. The "information" carried by these messages has two main parts: structural information about the graph (e.g., node degrees) and feature-based information.

In the message-passing mechanism of a graph neural network, each node stores its message in the form of a feature vector, and at each step the neighbors update this information in the form of feature vectors [1]. This process aggregates the information: for example, when the grey node is connected to the blue node, both features are aggregated to form a new feature vector, updating the values to include the new message.

$${h}_{u}^{\left(k+1\right)}={UPDATE}^{\left(k\right)}\left({h}_{u}^{\left(k\right)}, {AGGREGATE}^{\left(k\right)}\left(\left\{{h}_{v}^{\left(k\right)}, {\forall }_{v} \in N\left(u\right)\right\}\right)\right)$$
(4.1)
$$= {UPDATE}^{\left(k\right)}\left({h}_{u}^{\left(k\right)},{m}_{N\left(u\right)}^{(k)}\right)$$
(4.2)

Equations 4.1 and 4.2 show that h denotes the node embedding (message), u the node index, and k the iteration number. AGGREGATE and UPDATE are arbitrary differentiable functions (i.e., neural networks), and \({m}_{N(u)}^{(k)}\) is the "message" aggregated from u's graph neighborhood N(u). We employ superscripts to identify the embeddings and functions at different message-passing iterations. At each iteration k of the GNN, the AGGREGATE function receives as input the set of embeddings of the nodes in u's graph neighborhood N(u) and generates a message \({m}_{N(u)}^{(k)}\) based on this aggregated neighborhood information. The UPDATE function then combines the message \({m}_{N(u)}^{(k)}\) with the previous embedding \({h}_{u}^{(k)}\) of node u to generate the updated embedding \({h}_{u}^{(k+1)}\).
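
To make Eqs. 4.1 and 4.2 concrete, the following NumPy sketch performs one message-passing iteration with a simple sum as AGGREGATE and a single linear-plus-ReLU layer as UPDATE. The toy adjacency list, the embedding dimension, and the (random) weight matrices are hypothetical choices for illustration, not part of any particular GNN model.

```python
import numpy as np

# Toy graph: adjacency list N(u) and initial node embeddings h_u^(k).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
h = np.random.rand(4, 8)        # 4 nodes, 8-dimensional embeddings

W_self = np.random.rand(8, 8)   # trainable parameters (random here, learned in practice)
W_neigh = np.random.rand(8, 8)

def aggregate(u):
    # AGGREGATE^(k): sum the embeddings of u's neighbors (the message m_N(u)).
    return h[neighbors[u]].sum(axis=0)

def update(h_u, m_u):
    # UPDATE^(k): combine the node's own embedding with the aggregated message.
    return np.maximum(0, h_u @ W_self + m_u @ W_neigh)   # ReLU non-linearity

h_next = np.stack([update(h[u], aggregate(u)) for u in range(4)])   # h^(k+1)
```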

Description of GNN taxonomy

As shown in Fig. 8 below, we have divided our GNN taxonomy into three parts [30]:

Fig. 8: Graph Neural Network Taxonomy

1. Graph Structures
2. Graph Types
3. Graph Learning Tasks

Graph structure

The two scenarios shown in Fig. 9 are structural and non-structural. In structural scenarios, the graph structure is stated explicitly in the application, as in molecular and physical systems, knowledge graphs, and other such objects.

Fig. 9: Graph Structure

Graphs are implicit in non-structural scenarios, so we must first construct the graph from the task at hand, for example, a fully connected word graph for text or a scene graph for images.
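
As a small, purely illustrative example of this construction step, the sketch below builds a fully connected word graph for a short sentence using the networkx library; the sentence and the choice of a complete graph over its words are assumptions for demonstration only.

```python
import itertools
import networkx as nx

sentence = "graph neural networks model relations"   # hypothetical input text
words = sentence.split()

# Fully connected word graph: every pair of distinct words is joined by an edge.
G = nx.Graph()
G.add_nodes_from(words)
G.add_edges_from(itertools.combinations(words, 2))

print(G.number_of_nodes(), G.number_of_edges())      # 5 nodes, 10 edges
```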

Graph types

Complex graph types may carry additional information about nodes and edges. Graphs are typically divided into five categories, as shown in Fig. 10.

Fig. 10: Types of Graphs

Directed/undirected graphs

A directed graph is characterized by edges with a specific direction, indicating the flow from one node to another. Conversely, in an undirected graph, the edges lack a designated direction, allowing nodes to interact bidirectionally. As illustrated in Fig. 11 (left side), the directed graph exhibits directed edges, while in Fig. 11 (right side), the undirected graph conspicuously lacks directional edges. In undirected graphs, it's important to note that each edge can be considered to comprise two directed edges, allowing for mutual interaction between connected nodes.

Fig. 11: Directed/Undirected Graph

Static/dynamic graphs

The term “dynamic graph” pertains to a graph in which the properties or structure of the graph change with time. In dynamic graphs shown in Fig. 12, it is essential to account for the temporal dimension appropriately. These dynamic graphs represent time-dependent events, such as the addition and removal of nodes and edges, typically presented as an ordered sequence or an asynchronous stream.

A noteworthy example of a dynamic graph can be observed in social networks like Twitter. In such networks, a new node is created each time a new user joins, and when a user follows another individual, a following edge is established. Furthermore, when users update their profiles, the respective nodes are also modified, reflecting the evolving nature of the graph. It's worth noting that different deep-learning libraries handle graph dynamics differently. TensorFlow, for instance, employs a static graph, while PyTorch utilizes a dynamic graph.

Fig. 12: Static/Dynamic Graph

Homogeneous/heterogeneous graphs

Homogeneous graphs, shown in Fig. 13 (left), have only one type of node and one type of edge; that is, all nodes and edges are of the same type. An example is an online social network whose nodes represent people and whose edges represent friendship.

Heterogeneous graphs, shown in Fig. 13 (right), have two or more different kinds of nodes and edges. An example of a heterogeneous network is an online social network with various types of edges between nodes of the 'person' type, such as 'friendship' and 'co-worker.' Node and edge types play critical roles in heterogeneous networks and require further consideration.

Fig. 13: Homogeneous (Left), Heterogeneous (Right) Graph

Knowledge graphs

A Knowledge Graph (KG) is a network of entity nodes and relationship edges that can be represented as an array of triples of the form (h, r, t) or (s, r, o), where each triple (h, r, t) denotes a relation r between a head entity h and a tail entity t. From this perspective, a Knowledge Graph can be considered a heterogeneous graph. The Knowledge Graph visually depicts several real-world objects and their relationships [32]. It can be used for many tasks, including information retrieval, knowledge-guided innovation, and question answering [30]. Entities are objects or things that exist in the real world, including individuals, organizations, places, music tracks, and movies. Each relation type similarly describes a particular relationship between entities. Figure 14 shows the knowledge graph for Mr. Sundar Pichai.

Fig. 14: Knowledge graph
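
Because a knowledge graph is just a set of (h, r, t) triples, it can be stored and queried very directly. The sketch below uses a few hypothetical facts loosely inspired by the Fig. 14 example and looks up all tail entities for a given head and relation.

```python
# Knowledge graph as a list of (head, relation, tail) triples -- hypothetical facts.
triples = [
    ("Sundar Pichai", "ceo_of", "Google"),
    ("Google", "headquartered_in", "Mountain View"),
    ("Sundar Pichai", "born_in", "India"),
]

def tails(head, relation):
    # Return every tail entity t such that (head, relation, t) is in the graph.
    return [t for h, r, t in triples if h == head and r == relation]

print(tails("Sundar Pichai", "ceo_of"))   # ['Google']
```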

Transductive/inductive graphs

In a transductive scenario, shown in Fig. 15 (top), the entire graph is given as input, the labels of the validation data are hidden, and the model then predicts the labels for those nodes. With an inductive graph, shown in Fig. 15 (bottom), we also input the entire graph (but only sample it into batches), mask the labels of the validation data, and forecast them. In the transductive setting, the model must predict the labels of the given unlabeled nodes; in the inductive setting, it can generalize to new unlabeled nodes from the same distribution.

Fig. 15: Transductive/Inductive Graphs

Transductive Graph:

  • In the transductive approach, the entire graph is provided as input.

  • This method involves concealing the labels of the validation data.

  • The primary objective is to predict the labels for the validation data.

Inductive Graph:

  • The inductive approach still uses the complete graph, but only a sample within a batch is considered.

  • A crucial step in this process is masking the labels of the validation data.

  • The key aim here is to make predictions for the labels of the validation data.
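
A minimal sketch of the transductive setup described above: the whole graph and all node features are visible to the model, while boolean masks hide the labels of the held-out nodes during training. The array sizes, labels, and mask split are arbitrary illustrative choices.

```python
import numpy as np

num_nodes = 6
features = np.random.rand(num_nodes, 4)         # features of every node are always visible
labels = np.array([0, 1, 0, 1, 1, 0])           # ground-truth labels (hypothetical)

train_mask = np.array([True, True, True, False, False, False])
test_mask = ~train_mask                          # labels hidden during training, predicted later

# Transductive training: the loss is computed only on nodes where train_mask is True,
# even though message passing runs over the entire graph (all 6 nodes and their edges).
train_labels = labels[train_mask]
```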

Graph learning tasks

We perform three tasks with graphs: node classification, link prediction, and graph classification, as shown in Fig. 16.

Fig. 16: Node Level Prediction (e.g., social network) (left), Edge Level Prediction (e.g., next YouTube video?) (middle), Graph Level Prediction (e.g., molecule) (right)

Node-level task

Node-level tasks are primarily concerned with determining the identity or function of each node within a graph. The core objective of a node-level task is to predict specific properties associated with individual nodes. For example, a node-level task in social networks could involve predicting which social group a new member is likely to join based on their connections and the characteristics of their friends' memberships. Node-level tasks are typically used when working with unlabeled data, such as identifying whether a particular individual is a smoker.

Edge-level task (link prediction)

Edge-level tasks revolve around analyzing relationships between pairs of nodes in a graph. An illustrative application of an edge-level task is assessing the compatibility or likelihood of a connection between two entities, as seen in matchmaking or dating apps. Another instance of an edge-level task is evident when using platforms like Netflix, where the task involves predicting the next video to be recommended based on viewing history and user preferences.

Graph-level

In graph-level tasks, the objective is to make predictions about a characteristic or property that encompasses the entire graph. For example, using a graph-based representation, one might aim to predict attributes like the olfactory quality of a molecule or its potential to bind with a disease-associated receptor. The essence of a graph-level task is to provide predictions that pertain to the graph as a whole. For instance, when assessing a newly synthesized chemical compound, a graph-level task might seek to determine whether the molecule has the potential to be an effective drug. The summary of all three learning tasks is shown in Fig. 17.

Fig. 17: Graph Learning Tasks Summary

GNN models and comparative analysis of GNN models

Graph Neural Network (GNN) models represent a category of neural networks specially crafted to process data organized in graph structures. They've garnered substantial acclaim across various domains, primarily due to their exceptional capability to grasp intricate relationships and patterns within graph data. As illustrated in Fig. 18, we've outlined three distinct GNN models. A comprehensive description of these GNN models, specifically Graph Convolutional Networks (GCN), Graph Attention Networks (GAT/GAN), and GraphSAGE, can be found in reference [33]. In the "GNN models" section, we delve into these GNN models' intricacies, and in the "Comparative study of GNN models" section, we provide an in-depth analysis that explores their theoretical and practical aspects.

Fig. 18: GNN Models

GNN models

Graph convolution neural network (GCN)

GCN is one of the basic graph neural network variants, developed by Thomas Kipf and Max Welling. 'Convolution' in GCNs is essentially the same operation as the convolution layers in Convolutional Neural Networks: the input neurons are multiplied by weights called filters or kernels. The filters act as a sliding window across the image, allowing the CNN to learn features from nearby cells. Weight sharing means the same filter is used within the same layer throughout the image; when a CNN is used to identify photos of cats vs. non-cats, the same filter in the same layer is employed to detect the cat's nose and ears, so the same weights (kernels or filters) are applied throughout the image [33]. GCNs were first introduced in “Spectral Networks and Deep Locally Connected Networks on Graphs” [34].

GCNs, which learn features by analyzing neighboring nodes, carry out similar behaviors. The primary difference between CNNs and GNNs is that CNNs are made to operate on regular (Euclidean) ordered data. GNNs, on the other hand, are a generalized version of CNNs with varying numbers of node connections and unordered nodes (irregular, non-Euclidean structured data). GCNs have been applied to solve many problems, for example, image classification [35], traffic forecasting [36], recommendation systems [17], scene graph generation [37], and visual question answering [38].

GCNs are particularly well-suited for tasks that involve data represented as graphs, such as social networks, citation networks, recommendation systems, and more. These networks are an extension of traditional CNNs, widely used for tasks involving grid-like data, such as images. The key idea behind GCNs is to perform convolution operations on the graph data. This enables them to capture and propagate information through the nodes in a graph by considering both a node’s features and those of its neighboring nodes. GCNs typically consist of several layers, each performing convolution and aggregation steps to refine the node representations in the graph. By applying these layers iteratively, GCNs can capture complex patterns and dependencies within the graph data.

Working of graph convolutional network

A Graph Convolutional Network (GCN) is a type of neural network architecture designed for processing and analyzing graph-structured data. GCNs work by aggregating and propagating information through the nodes in a graph. GCN works with the following steps shown in Fig. 19:

  1) Initialization:

Fig. 19: Working of GCN

Each node in the graph is associated with a feature vector. Depending on the application, these feature vectors can represent various attributes or characteristics of the nodes. For example, in a social network, each node might represent a user, and the features could include user profile information.

  2) Convolution Operation:

The core of a GCN is the convolution operation, which is adapted from convolutional neural networks (CNNs). It aims to aggregate information from neighboring nodes. This is done by taking a weighted sum of the feature vectors of neighboring nodes. The graph's adjacency matrix determines the weights. The resulting aggregated information is a new feature vector for each node.

  3) Weighted Aggregation:

The graph's adjacency matrix, typically after normalization, provides weights for the aggregation process. In this context, for a given node, the features of its neighboring nodes are scaled by the corresponding values within the adjacency matrix, and the outcomes are then accumulated. A precise mathematical elucidation of this aggregation step is described in "Equation of GCN" section.

  4) Activation function and learning weights:

The aggregated features are typically passed through an activation function (e.g., ReLU) to introduce non-linearity. The weight matrix W used in the aggregation step is learned during training. This learning process allows the GCN to adapt to the specific graph and task it is designed for.

  5) Stacking Layers:

GCNs are often used in multiple layers. This allows the network to capture more complex relationships and higher-level features in the graph. The output of one GCN layer becomes the input for the next, and this process is repeated for a predefined number of layers.

  6) Task-Specific Output:

The final output of the GCN can be used for various graph-based tasks, such as node classification, link prediction, or graph classification, depending on the specific application.

Equation of GCN

The Graph Convolutional Network (GCN) is based on a message-passing mechanism that can be described using mathematical equations. For a graph with N nodes, the core equation of a simple, first-order GCN layer can be expressed as follows.

Equation 5.1 depicts a GCN layer's design. The normalized graph adjacency matrix A' and the node feature matrix F serve as the layer's inputs. The bias vector b and the weight matrix W are trainable parameters of the layer.

$$Z=\sigma \left({A}^{\prime}FW+b\right)$$
(5.1)

When applied to the feature matrix, the normalized adjacency matrix effectively smooths a node's feature vector based on the feature vectors of its close graph neighbors. This matrix captures the graph structure. A' is normalized so that each neighboring node's contribution is proportional to the graph's connectivity.

The layer definition is completed by applying an element-wise non-linear function, such as ReLU, to A'FW + b. The downstream node classification task requires deep neural architectures to learn a complicated hierarchy of node attributes; to do this, the layer's output matrix Z can be routed into another GCN layer or any other neural network layer.
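
The following NumPy sketch implements Eq. 5.1 for a toy graph. It builds a normalized adjacency matrix A' (here the common symmetric normalization of A + I, one possible choice) and then computes Z = σ(A'FW + b) with ReLU as the non-linearity. The adjacency matrix, feature matrix, and parameter shapes are hypothetical.

```python
import numpy as np

# Toy undirected graph with 4 nodes (hypothetical adjacency matrix A).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# One common normalization: A' = D^{-1/2} (A + I) D^{-1/2}, where I adds self-loops.
A_hat = A + np.eye(4)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

F = np.random.rand(4, 8)      # node feature matrix F (4 nodes, 8 features)
W = np.random.rand(8, 16)     # trainable weight matrix
b = np.zeros(16)              # trainable bias vector

Z = np.maximum(0, A_norm @ F @ W + b)    # Eq. 5.1: Z = sigma(A' F W + b)
print(Z.shape)                           # (4, 16): updated embedding for each node
```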

Summary of graph convolution neural network (GCN) is shown in Table 2.
Table 2 Summary of Graph Convolution Network with the technique used, datasets used, and performance measure (accuracy)

Graph attention network (GAT/GAN)

Graph Attention Network (GAT/GAN) is a new neural network that works with graph-structured data. It uses masked self-attentional layers to address the shortcomings of past methods that depended on graph convolutions or their approximations. By stacking layers, the process makes it possible (implicitly) to assign various nodes in a neighborhood different weights, allowing nodes to focus on the characteristics of their neighborhoods without having to perform an expensive matrix operation (like inversion) or rely on prior knowledge of the graph's structure. GAT concurrently tackles numerous significant limitations of spectral-based graph neural networks, making the model suitable for both inductive and transductive applications.

Working of GAT

The Graph Attention Network (GAT) is a neural network architecture designed for processing and analyzing graph-structured data, as shown in Fig. 20. GATs are a variation of Graph Convolutional Networks (GCNs) that incorporates the concept of attention mechanisms. GAT/GAN works with the following steps, shown in Fig. 21.

  1) Initialization:

Fig. 20: Working of GAT

Fig. 21: How attention coefficients update

As with other graph-based models, GAT starts with nodes in the graph, each associated with a feature vector. These features can represent various characteristics of the nodes.

  2) Self-Attention Mechanism and Attention Computation:

GAT introduces an attention mechanism similar to what is used in sequence-to-sequence models in natural language processing. The attention mechanism allows each node to focus on different neighbors when aggregating information. It assigns different attention coefficients to the neighboring nodes, making the process more flexible. For each node in the graph, GAT computes attention scores for its neighboring nodes. These attention scores are based on the features of the central node and its neighbors. The attention scores are calculated using a weighted sum of the features of the central node and its neighbors.

  3) Weighted Aggregation:

The attention scores determine how much each neighbor’s feature contributes to the aggregation for the central node. This weighted aggregation is carried out for all neighboring nodes, resulting in a new feature vector for the central node.

  4) Multiple Attention Heads and Output Combination:

GAT often employs multiple attention heads in parallel. Each attention head computes its attention scores and aggregation results. These multiple attention heads capture different aspects of the relationships in the graph. The outputs from the multiple attention heads are combined, typically by concatenation or averaging, to create a final feature vector for each node.

  5) Learning Weights and Stacking Layers:

Similar to GCNs, GATs learn weight parameters during training. These weights are learned to optimize the attention mechanisms and adapt to the specific graph and task. GATs can be used in multiple layers to capture higher-level features and complex relationships in the graph. The output of one GAT layer becomes the input for the next layer.

The learning weights capture the importance of node relationships and contribute to information aggregation during the neighborhood aggregation process. The learning process in GNNs also relies on backpropagation and optimization algorithms. The stacking of GNN layers enables the model to capture higher-level abstractions and dependencies in the graph. Each layer refines the node representations based on information from the previous layer.

  6) Task-Specific Output:

The final output of the GAT can be used for various graph-based tasks, such as node classification, link prediction, or graph classification, depending on the application.

Equation for GAT

GAT’s main distinctive feature is gathering data from the one-hop neighborhood [30]. A graph convolution operation in GCN produces the normalized sum of the node properties of neighbors, as shown in Equation 5.2, where \({h}_{i}^{(l+1)}\) defines the current node output, \(\sigma\) denotes the ReLU non-linearity, \(j\varepsilon N\left(i\right)\) the one-hop neighbors, \({\complement }_{i,j}\) the normalization constant, \({W}^{\left(l\right)}\) the weight matrix, and \({h}_{j}^{(l)}\) the previous-layer embedding of node j.

$${h}_{i}^{\left(l+1\right)} = \sigma \left(\sum_{j\varepsilon N\left(i\right)}\frac{1}{{\complement }_{i,j}}{W}^{\left(l\right)}{h}_{j}^{\left(l\right)}\right)$$
(5.2)
Why is GAT better than GCN?

We learned from the Graph Convolutional Network (GCN) that integrating local graph structure and node-level features results in good node classification performance. The way GCN aggregates messages, on the other hand, is structure-dependent, which may limit its use.

How attention coefficients update: the attention layer has 4 parts: [47]
  1) A linear transformation: A shared linear transformation is applied to each node in the following Equation.

$${{\text{z}}}_{{\text{i}}}^{\left({\text{l}}\right)}= {{\text{W}}}^{({\text{l}})} . {{\text{h}}}_{{\text{i}}}^{\left({\text{l}}\right)}$$
(5.3)

where h is the set of node features, W is the weight matrix, and z is the output node embedding.

  2) Attention Coefficients: This step is crucial in the GAT paradigm because every node can now attend to every other node, discarding any structural information. The pair-wise un-normalized attention score between two neighbors is computed next: the 'z' embeddings of the two nodes are concatenated (|| denotes concatenation), a dot product with a learnable weight vector a(l) is taken, and a LeakyReLU is applied [1]. In contrast to the dot-product attention used in the Transformer model, this kind of attention is called additive attention. The nodes are then subjected to self-attention.

$${\text{e}}_{{{\text{ij}}}}^{{\left( {\text{l}} \right)}} = {\text{LeakyReLU}}\left( {{\text{a}}^{ \to } { }^{{\left( {\text{l}} \right)^{{\text{T}}} }} \left( {{\text{z}}_{{\text{i}}}^{{\left( {\text{l}} \right)}} { }||{\text{ z}}_{{\text{j}}}^{{\left( {\text{l}} \right)}} } \right)} \right)$$
(5.4)
  3) Softmax: We utilize the softmax function to normalize the coefficients over all j values, improving their comparability across nodes.

$${\propto }_{{\text{ij}}}^{({\text{l}})}= \frac{\mathrm{ exp}\left({\mathrm{ e}}_{\mathrm{ij }}^{({\text{l}})}\right)}{{\sum }_{{\text{k}}\in {\text{N}}({\text{i}})}{\text{exp}}\left({\mathrm{ e}}_{\mathrm{ik }}^{({\text{l}})}\right)}$$
(5.5)
  4) Aggregation: This process is comparable to GCN. The neighborhood embeddings are combined and scaled based on the attention scores.

$${h}_{i}^{\left(l+1\right)} = \sigma \left(\sum_{j\in N\left(i\right)}{{\propto }_{ij}^{\left(l\right)}z}_{j}^{\left(l\right)}\right)$$
(5.6)
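
To make Eqs. 5.3–5.6 concrete, here is a minimal NumPy sketch of a single-head attention layer on a toy graph. The graph, dimensions, and (random) parameters are hypothetical, and only one attention head is shown.

```python
import numpy as np

neighbors = {0: [1, 2], 1: [0], 2: [0]}     # toy graph (hypothetical)
h = np.random.rand(3, 4)                    # node features h_i^(l): 3 nodes x 4 dims

W = np.random.rand(4, 8)                    # shared linear transformation (Eq. 5.3)
a = np.random.rand(16)                      # learnable attention vector a^(l)

z = h @ W                                   # z_i^(l) = W^(l) h_i^(l)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention(i):
    # Un-normalized scores e_ij (Eq. 5.4), then softmax over j in N(i) (Eq. 5.5).
    e = np.array([leaky_relu(a @ np.concatenate([z[i], z[j]])) for j in neighbors[i]])
    return np.exp(e) / np.exp(e).sum()

def new_embedding(i):
    # Weighted aggregation of neighbor embeddings (Eq. 5.6) with a ReLU non-linearity.
    alpha = attention(i)
    agg = sum(alpha[k] * z[j] for k, j in enumerate(neighbors[i]))
    return np.maximum(0, agg)

h_next = np.stack([new_embedding(i) for i in range(3)])   # h_i^(l+1) for every node
```
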
Summary of graph attention network (GAT) is shown in Table 3.
Table 3 Summary of Graph Attention Network with Application area, technique, datasets used, and performance measure (accuracy)

GraphSAGE

GraphSAGE represents a tangible realization of an inductive learning framework, shown in Fig. 22. It exclusively considers training samples linked to the training set's edges during training. This process consists of two main steps, “Sampling” and “Aggregation.” Subsequently, the node representation vector is combined with the vector from the aggregation step and passed through a fully connected layer with a non-linear activation function. It is important to note that each network layer shares a common aggregator and weight matrix, so the consideration should be on the number of layers or weight matrices rather than the number of aggregators. Finally, a normalization step is applied to the layer's output.

Two major steps:

  1. Sample: describes how to sample a large number of neighbors.

  2. Aggregate: obtains the neighbor node embeddings and then determines how to aggregate these embeddings and update the node's own embedding information.

Fig. 22: Working of Graph SAGE Method

Working of the graphSAGE model:

  1. First, initialize the feature vectors of all nodes in the input graph.

  2. For each node, obtain its sampled neighbor nodes.

  3. Use the aggregation function to aggregate the information of the neighbor nodes.

  4. Combine the aggregated information with the node's own embedding, and update the embedding through a non-linear transformation.

Types of aggregators

In the GraphSAGE method, four types of aggregators are used.

  1) Simple neighborhood aggregator:

$${h}_{v}^{k}=\sigma \left({W}_{k}\sum_{u \in N\left(v\right)}\frac{{h}_{u}^{k-1}}{\left|N\left(v\right)\right|} + {B}_{k}{h}_{v}^{k-1}\right)$$
(5.7)
  2) Mean aggregator:

$${h}_{v}^{k}\leftarrow \sigma \left(W\cdot MEAN\left(\left\{{h}_{v}^{k-1}\right\}\cup \left\{{h}_{u}^{k-1},{\forall }_{u}\in N\left(v\right)\right\}\right)\right)$$
(5.8)
  3) LSTM Aggregator: Applies LSTM to a random permutation of neighbors.

  4) Pooling Aggregator: It applies a symmetric vector function and converts adjacent vectors.

    $${{\text{AGGREGATE}}}_{{\text{k}}}^{{\text{pool}}}={\text{max}}\left(\left\{\upsigma \left({{\text{W}}}_{{\text{pool}}}{{\text{h}}}_{{\text{ui}}}^{{\text{k}}}+{\text{b}}\right),{\forall }_{{\text{ui}}}\in {\text{N}}\left({\text{v}}\right)\right\}\right)$$
    (5.9)
Equation of graphSAGE
$${h}_{v}^{k}= \sigma \left(\left[{W}_{k} \cdot AGG\left(\left\{{h}_{u}^{k-1} , {\forall }_{u} \in N\left(v\right)\right\}\right) , {B}_{k}{h}_{v}^{k-1}\right]\right)$$
(5.10)

Here,

\({W}_{k}, {B}_{k}\): learnable weight matrices.

\({h}_{v}^{0}= {x}_{v}\): the initial (layer-0) embeddings are equal to the node features.

\({h}_{u}^{k-1}\): generalized aggregation of the neighbor embeddings from the previous layer.

\({z}_{v}= {h}_{v}^{k}\): embedding after k layers of neighborhood aggregation.

\(\sigma\): non-linearity (ReLU).
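
The sketch below ties these pieces together for a toy graph: uniform neighbor sampling (the "Sample" step), the mean aggregator of Eqs. 5.7/5.8, the combination with the node's own previous embedding, and the final normalization of the layer output. The graph, sample size, and random weights are hypothetical, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
neighbors = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}   # toy graph (hypothetical)
h = rng.random((4, 4))                                      # h_v^(k-1): 4 nodes x 4 dims

W_k = rng.random((4, 8))   # weight applied to the aggregated neighborhood
B_k = rng.random((4, 8))   # weight applied to the node's own previous embedding

def sample_neighbors(v, size=2):
    # "Sample" step: draw up to `size` neighbors uniformly at random.
    nbrs = neighbors[v]
    return rng.choice(nbrs, size=min(size, len(nbrs)), replace=False)

def graphsage_layer(v):
    # "Aggregate" step: mean of the sampled neighbor embeddings (Eqs. 5.7/5.8),
    # combined with the node's own embedding and passed through a non-linearity.
    agg = h[sample_neighbors(v)].mean(axis=0)
    out = np.maximum(0, agg @ W_k + h[v] @ B_k)
    return out / np.linalg.norm(out)         # normalization of the layer output

h_next = np.stack([graphsage_layer(v) for v in range(4)])   # h_v^(k) for every node
```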

Summary of graphSAGE is shown in Table 4.
Table 4 Summary of GraphSAGE Network with Application area, technique, datasets used, and performance measure (accuracy)

Comparative study of GNN models

Comparison based on practical implementation of GNN models

Table 5 describes the statistics of the different datasets used in the literature for graph-structured input: CORA, Citeseer, and Pubmed. These statistics provide information about the kind of dataset, the number of nodes and edges, the number of classes, the number of features, and the label rate for each dataset. These details are essential for understanding the characteristics and scale of the datasets used in the context of citation networks. A comparison of the GNN models with their equations is shown in Fig. 23.

Fig. 23: Equations of GNN Models

Table 5 Different Dataset Statistics of Citation Network [33]

Table 6 shows the performance of different Graph Neural Network (GNN) models on various datasets, providing accuracy scores for each model and dataset. Additionally, the time taken by some models to compute results is indicated in seconds. This information is crucial for evaluating the performance of these models on specific datasets.

Table 6 Performance metrics of different models with different datasets [33, 47, 48]

A comparison based on theoretical concepts of the GNN models is described in Table 7.

Table 7 Comparison of the GNN model with Advantages, Disadvantages, and application areas: [30]

Graph neural network applications

Graph construction

Graph Neural Networks (GNNs) have a wide range of applications spanning diverse domains, which encompass modern recommender systems, computer vision, natural language processing, program analysis, software mining, bioinformatics, anomaly detection, and urban intelligence, among others. The fundamental prerequisite for GNN utilization is the transformation or representation of input data into a graph-like structure. In the realm of graph representation learning, GNNs excel in acquiring essential node or graph embeddings that serve as a crucial foundation for subsequent tasks [61].

The construction of a graph involves a two-fold process:

  1) Graph Creation: The generation of graphs is essential for depicting the intricate relationships embedded within diverse incoming data. Given the varied nature of input data, different applications adopt different techniques to create meaningful graphs. This process is indispensable for effectively communicating the structural nuances of the data, ensuring the nodes and edges convey their semantic significance, particularly tailored to the specific task at hand.

  2) Learning about graph representations: The subsequent phase involves utilizing the graph representation acquired from the input data. In GNN-based learning for graph representations, some studies employ well-established GNN models like GraphSAGE, GCN, GAT, and GGNN, which offer versatility for various application tasks. However, when faced with specific tasks, it may be necessary to customize the GNN architecture to address particular challenges more effectively.

Different applications that can be considered as graphs

  1) Molecular Graphs: Atoms and electrons serve as the basic building blocks of matter and molecules, organized in three-dimensional structures. While all particles interact, we primarily acknowledge a covalent connection between two stable atoms when they are sufficiently spaced apart. Various atom-to-atom bond configurations exist, including single and double bonds. This three-dimensional arrangement is conveniently and commonly represented as a graph, with atoms representing nodes and covalent bonds representing edges [62].

  2) Graphs of social networks: These networks are helpful research tools for identifying trends in the collective behavior of individuals, groups, and organizations. We may create a graph that represents groupings of people by visualizing individuals as nodes and their connections as edges [63].

  3) Citation networks as graphs: When they publish papers, scientists regularly reference the work of other scientists. Each manuscript can be visualized as a node in a graph of these citation networks, with each directed edge denoting a citation from one publication to another. Additionally, we can include details about each document in each node, such as an abstract's word embedding [64].

  4) Within computer vision: We may want to tag certain things in visual scenes. Then, we can construct graphs by treating these things as nodes and their connections as edges.

GNNs are used to model data as graphs, allowing for the capture of complex relationships and dependencies that traditional machine learning models may struggle to represent. This makes GNNs a valuable tool for tasks where data has an inherent graph structure or where modeling relationships is crucial for accurate predictions and analysis.

Graph neural networks (GNNs) applications in different fields

NLP (natural language processing)

  a) Document Classification: GNNs can be used to model the relationships between words or sentences in documents, allowing for improved document classification and information retrieval.

  b) Text Generation: GNNs can assist in generating coherent and contextually relevant text by capturing dependencies between words or phrases.

  c) Question Answering: GNNs can help in question-answering tasks by representing the relationships between question words and candidate answers within a knowledge graph.

  d) Sentiment Analysis: GNNs can capture contextual information and sentiment dependencies in text, improving sentiment analysis tasks.

Computer vision

  a) Image Segmentation: GNNs can be employed for pixel-level image segmentation tasks by modeling relationships between adjacent pixels as a graph.

  b) Object Detection: GNNs can assist in object detection by capturing contextual information and relationships between objects in images.

  c) Scene Understanding: GNNs are used for understanding complex scenes and modeling spatial relationships between objects in an image.

Bioinformatics

  a) Protein–Protein Interaction Prediction: GNNs can be applied to predict interactions between proteins in biological networks, aiding in drug discovery and understanding disease mechanisms.

  b) Genomic Sequence Analysis: GNNs can model relationships between genes or genetic sequences, helping in gene expression prediction and sequence classification tasks.

  c) Drug Discovery: GNNs can be used for drug–target interaction prediction and molecular property prediction, which is vital in pharmaceutical research.

Table 8 offers a concise overview of various research papers that utilize Graph Neural Networks (GNNs) in diverse domains, showcasing the applications and contributions of GNNs in each study.

Table 8 Different application areas with their proposed methodology of Graph Neural Networks

Table 9 highlights various applications of GNNs in Natural Language Processing, Computer Vision, and Bioinformatics domains, showcasing how GNN models are adapted and used for specific tasks within each field.

Table 9 Different Domains with their Tasks in Graph Neural Networks

Future directions of graph neural network

The contribution of the existing literature to GNN principles, models, datasets, applications, etc., was the main emphasis of this survey. In this section, several potential future study directions are suggested. Significant challenges have been noted, including unbalanced datasets, the effectiveness of current methods, text classification, etc. We have also looked at the remedies to address these problems. We have suggested future and advanced directions to address these difficulties regarding domain adaptation, data augmentation, and improved classification. Table 10 displays future directions.

  1) Imbalanced Datasets: Limited labeled data, domain-dependent data, and imbalanced data are current issues with the available datasets. Transfer learning and domain adaptation are solutions to these issues.

  2) Accuracy of Existing Systems/Models: Deep learning models such as GCN, GAT, and GraphSAGE can be utilized to increase the efficiency and precision of current systems. Additionally, training models on sizable, domain-specific datasets can enhance performance.

  3) Enhancing Text Classification: Text classification poses another significant challenge, which can be effectively addressed by leveraging advanced deep learning methodologies like graph neural networks, contributing to improved text classification accuracy and performance.

Table 10 A list of research gaps and future research directions

Table 10 describes the research gaps and future directions identified in the literature above. These research gaps and future directions highlight the challenges and proposed solutions in the field of text classification and structural analysis.

Table 11 provides an overview of different research papers, their publication years, the applications they address, the graph structures they use, the graph types, the graph tasks, and the specific Graph Neural Network (GNN) models utilized in each study.

Table 11 Summary of Graph Neural Networks with application area, graph structure, type, task, and model used

Conclusions

Graph Neural Networks (GNNs) have witnessed rapid advancements in addressing the unique challenges presented by data structured as graphs, a domain where conventional deep learning techniques, originally designed for images and text, often struggle to provide meaningful insights. GNNs offer a powerful and intuitive approach that finds broad utility in applications relying on graph structures. This comprehensive survey on GNNs offers an in-depth analysis covering critical aspects such as GNN fundamentals, the interplay with convolutional neural networks, GNN message-passing mechanisms, diverse GNN models, practical use cases, and a forward-looking perspective. Our central focus is on elucidating the foundational characteristics of GNNs, a field teeming with contemporary applications that continually enhance our comprehension and utilization of this technology.

The continuous evolution of GNN-based research has underscored the growing need to address issues related to graph analysis, which we aptly refer to as the frontiers of GNNs. In our exploration, we delve into several crucial recent research domains within the realm of GNNs, encompassing areas like link prediction, graph generation, and graph categorization, among others.