NetworKit is implemented as a hybrid of performance-aware code written in C++ (often parallelized using OpenMP) with an interface and additional functionality written in Python. More details and an illustration are provided in the Architecture Section below. NetworKit is distributed as a Python package, ready to use interactively from a Python shell, which is the main usage scenario we envision for domain experts. If you want to know more about our design goals, then take a look at our Design Goals and Principles Section below.
The best way to get an overall picture of a network is to use the Profiling module. Take a look at the Network Profiling Section below. If you are only interested in a small subset of network analysis measures, it might be more convenient to compute them separately instead of using the Profiling module. Check out the Network Analytics Section to get an overview of the most important measures NetworKit supports.
NetworKit also comes with several community detection algorithms that reveal insights into the community structure of a network. For the generation of synthetic networks with specific properties, the toolkit provides several graph generators.
A good (albeit in some parts slightly outdated) introduction to NetworKit and its features is given in the following video.
Please note that the video is more than two years old and is therefore slightly outdated in some parts.
To see the most important features of a network at a glance, NetworKit provides the Profiling module. The module assembles many algorithms into one program, automates analysis tasks and produces a graphical report to be displayed in the Jupyter Notebook or exported to an HTML or LATEX report document. For example, the following is a an excerpt of the Profiling report for the graph MIT8.edgelist (available in the NetworKit repository):
For detailed instructions on how to use the Profiling module take a look at our Profiling Notebook.
NetworKit provides a rich set of network analysis methods. For details on the supported measures take a look at the Technical Report.
Empirically observed complex networks tend to show a heavy tailed degree distribution which follow a power-law with a characteristic exponent. NetworKit provides functions to analyze the degree distribution of a network. For details visit the Degree Distribution Section of the User Guide. The algorithm runs in \(O(n)\).
Degree assortativity measures how well nodes with similar node degrees are connected to each other which can point to important aspects such as a hierarchical network composition. In NetworKit, we implemented Newman’s formulation in linear time (\(O(m)\)) and constant memory requirements.
The diameter of a graph is the maximum length of a shortest path between any two nodes. Many real-world complex networks have a very small and often constant diameter. NetworKit provides a function to calculate the exact diameter as well as several approximation algorithms for large networks. The exact algorithm runs in \(O(n*(n+m))\) or \(O(n*(n*log(n) + m))\) if the network is weighted, where \(n\) and \(m\) are defined as number of nodes and edges respectively.
Clustering coefficients are key figures for the amount of transitivity in networks. NetworKit provides functions for both the global clustering coefficient as well as the local clustering coefficient. NetworKit implements the wedge sampling approximation algorithm. It runs in essentially linear or even constant time, depending on the respective measure. For details on the usage visit the Clustering Coefficient Section of the User Guide.
We compute connected components in linear time using a parallel label propagation scheme in which each node adopts the maximum label in its neighborhood. Take a look at the Connected Components Section in the User Guide.
The core decomposition algorithm implemented in NetworKit uses a bucket data structure for managing remaining node degrees and has a running time which is linear in the number of edges. Visit the Core Decomposition Section of the User Guide for usage details.
Centrality refers to the relative importance of a node or edge within a network. We distribute efficient implementations for betweenness, closeness, degree, Katz, eigenvector centrality and PageRank.
Community detection is the task of identifying groups of nodes in the network which are significantly more densely connected among each other than to the rest of the nodes. Faced with an NP-hard optimization problem, we engineered parallel heuristics which deliver a good tradeoff between quality and running time.
Generative models aim to explain how networks form and evolve specific structural features. Such models and their implementations as generators have at least two important uses: On the one hand, software engineers want generators for synthetic datasets which can be arbitrarily scaled and produce graphs which resemble the real application data. On the other hand, network scientists employ models to increase their understanding of network phenomena. So far, NetworKit provides efficient generators for the following models:
As a Python module, NetworKit enables seamless integration with Python libraries for scientific computing and data analysis, e.g.
pandas for dataframe processing and analytics,
matplotlib for plotting,
scipy for numerical and scientific computing and
networkx for additional network analysis tasks.
Furthermore, NetworKit provides functions to convert graph objects to NetworkX and thereby connects the two modules. One can also use some of the numerous NetworkX functions by importing NetworkX. This opens up a wide range of possibilities which are not yet or will never be implemented within NetworKit. Note however that NetworkX is written mostly in pure Python, its data structures are more memory-intensive and its algorithms do not target very large graphs. You are likely to reach limits of your machine for graphs with millions of edges, while NetworKit aims for good performance for three more orders of magnitude.
With the hybrid approach, we are able to combine the performance of C++ with the easy and interactive environment of Python and Jupyter Notebook. We provide a Python package that can be installed easily via pip (see Install NetworKit via Pip). This makes it very easy to start working with NetworKit interactively. However, the code can also be used as a library for application programming, either at the Python or the C++ level. Throughout the project we use object-oriented and functional concepts. On the C++ level, we make extensive use of closures, using the lambda syntax introduced with C++11. Shared-memory parallelism is realized with OpenMP, providing loop parallelization and synchronization constructs while abstracting away the details of thread creation and handling.
Connecting these native implementations to the Python world is enabled by the Cython toolchain. Among other things, Cython can compile pure Python code to C or C++, circumventing the Python interpreter, and also allows for static type annotations – yielding considerable speedup in combination. Currently we use Cython merely to integrate native code by compiling it into a native Python extension module. As a benefit of Python integration, NetworKit’s functionality can be accessed interactively. Thus, analysis kernels can be freely combined. Furthermore, NetworKit can be seamlessly integrated into the rich Python ecosystem for data analysis. We consider this kind of integration crucial for real-world data analysis workflows.
There is a variety of software packages which provide graph algorithms in general and network analysis capabilities in particular. However, NetworKit aims to balance a specific combination of strengths: