NetworKit is a growing open-source toolkit for large-scale network analysis. Its aim is to provide tools for the analysis of large networks in the size range from thousands to billions of edges. For this purpose, it implements efficient graph algorithms, many of them parallel to utilize multicore architectures. These are meant to compute standard measures of network analysis, such as degree sequences, clustering coefficients, and centrality measures. In this respect, NetworKit is comparable to packages such as NetworkX, albeit with a focus on parallelism and scalability. NetworKit is also a testbed for algorithm engineering and contains novel algorithms from recently published research (see list of Publications).

NetworKit is a Python module. Performance-aware algorithms are written in C++ (often using OpenMP for shared-memory parallelism) and exposed to Python via the Cython toolchain. Python, in turn, lets us work interactively and draw on a rich environment of data analysis tools. Furthermore, NetworKit’s core can be built and used as a native library.

Main Design Goals

Interactive Workflow

NetworKit takes inspiration from other software like R, MATLAB or Mathematica and provides an interactive shell via Python. This allows users to freely combine functions from NetworKit and also use the results with other popular Python packages. In combination with Jupyter Notebook, NetworKit provides an intuitive computing environment for scientific workflows, even on a remote compute server.

High Performance

In NetworKit, algorithms and data structures are selected and implemented with sound software engineering, high performance, and parallelism in mind. Some implementations are among the fastest in published research. For example, community detection in a web graph with 3 billion edges can be performed on a 16-core server in a matter of minutes.

Easy Integration

As a Python module, NetworKit integrates seamlessly with Python libraries for scientific computing and data analysis, e.g. pandas for data frame processing and analytics, matplotlib for plotting, networkx for additional network analysis tasks, or numpy and scipy for numerical and scientific computing. Furthermore, NetworKit aims to support a variety of input/output formats.
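The example that follows reads a graph stored in the METIS format. As a rough illustration of what such a file contains, here is a minimal pure-Python sketch of a parser for the unweighted variant: a header line with the number of nodes and edges, followed by one line per node listing its neighbors with 1-based ids. The function name `parse_metis` and the toy file are hypothetical, weights and other header options are not handled, and this is not how NetworKit's own reader is implemented.

```python
def parse_metis(text):
    """Parse an unweighted METIS graph description.

    Returns (num_nodes, edges), where edges is a sorted list of
    0-based (u, v) pairs with u < v.
    """
    # Lines starting with '%' are comments. Blank lines must be kept,
    # because a blank adjacency line denotes an isolated node.
    lines = [line for line in text.splitlines() if not line.startswith("%")]
    header = lines[0].split()
    num_nodes, num_edges = int(header[0]), int(header[1])

    edges = set()
    for node, line in enumerate(lines[1:num_nodes + 1]):
        for token in line.split():
            neighbor = int(token) - 1  # METIS node ids are 1-based
            # Each undirected edge appears twice in the file; store it once.
            edges.add((min(node, neighbor), max(node, neighbor)))

    if len(edges) != num_edges:
        raise ValueError(
            "header announces %d edges, found %d" % (num_edges, len(edges))
        )
    return num_nodes, sorted(edges)


# Hypothetical toy file: a triangle (nodes 1-3) plus one extra edge (4-5).
toy_file = "\n".join([
    "% toy graph: a triangle plus one extra edge",
    "5 4",
    "2 3",
    "1 3",
    "1 2",
    "5",
    "4",
])
num_nodes, edges = parse_metis(toy_file)
print(num_nodes, edges)
```

The sketch stores each undirected edge once and checks the count against the header, which catches the most common formatting mistakes in hand-written files.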

from networkit import *

# Read the skitter network from disk (METIS format) and print basic statistics.
G = readGraph("skitter.graph", Format.METIS)
print(G.toString())
# Output: 'Graph(name=skitter, n=1696415, m=11095298)'

# Compute the connected components; report their number and the largest size.
cc = components.ConnectedComponents(G)
cc.run()
compSizes = cc.getComponentSizes()
numCC = len(compSizes)
maxCC = max(compSizes.values())
print("#cc = %d,largest = %d"%(numCC,maxCC))
# Output: #cc = 756,largest = 1694616

# Detect communities; NetworKit prints a summary of the solution.
communities = community.detectCommunities(G)
# Output:
# PLM(balanced,pc) detected communities in 17.86 [s]
# solution properties:
# -------------------  -------------
# # communities          1637
# min community size        2
# max community size   233061
# avg. community size    1036.3
# modularity                0.825245
# -------------------  -------------

# Plot the community sizes in descending order on log-log axes.
%matplotlib inline  # IPython/Jupyter magic: render plots in the notebook
import matplotlib.pyplot as plt
sizes = communities.subsetSizes()
sizes.sort(reverse=True)
plt.xscale("log")
plt.xlabel("community id")
plt.yscale("log")
plt.ylabel("size")
plt.plot(sizes)
plt.show()

Using NetworKit is as simple as importing the networkit Python package. In the example above, we first read a network of autonomous systems from disk and print some very basic statistics about it. We then compute the connected components and output their number and the size of the largest one.
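To make the connected-components step concrete, the following is a small pure-Python sketch that computes component sizes with a breadth-first search and returns them as a mapping from component id to size, the same shape the example consumes via `len(...)` and `max(....values())`. The function `component_sizes` and the toy edge list are hypothetical and purely illustrative; NetworKit's implementation is written in C++ and is not shown here.

```python
from collections import deque


def component_sizes(num_nodes, edges):
    """Return {component_id: size} for an undirected graph on nodes 0..num_nodes-1."""
    adjacency = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    component_of = [None] * num_nodes
    sizes = {}
    for start in range(num_nodes):
        if component_of[start] is not None:
            continue  # already reached from an earlier start node
        comp_id = len(sizes)
        component_of[start] = comp_id
        sizes[comp_id] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            sizes[comp_id] += 1
            for neighbor in adjacency[node]:
                if component_of[neighbor] is None:
                    component_of[neighbor] = comp_id
                    queue.append(neighbor)
    return sizes


# Toy graph: a path 0-1-2, an edge 3-4, and the isolated node 5.
toy_edges = [(0, 1), (1, 2), (3, 4)]
sizes = component_sizes(6, toy_edges)
num_cc = len(sizes)
max_cc = max(sizes.values())
print("#cc = %d, largest = %d" % (num_cc, max_cc))
```

On the toy graph this reports three components, the largest of which has three nodes.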

Continuing with the example, we tell NetworKit to detect communities in the skitter network. Thanks to our parallel modularity-driven community detection algorithms, this takes only about 18 seconds on a consumer laptop, even though the network has more than 11 million edges.
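The modularity value in the reported solution properties is the quality score that modularity-driven methods try to maximize. As a hedged illustration, the sketch below evaluates the standard modularity of a given partition, summing over communities the fraction of intra-community edges minus the squared fraction of edge endpoints in that community. It assumes an undirected, unweighted graph with at least one edge and no self-loops or multi-edges; the function name `modularity` and the toy data are hypothetical. It only scores a partition and does not search for one, so it is not the PLM algorithm itself.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition; community_of[v] is the community id of node v."""
    m = len(edges)
    intra_edges = defaultdict(int)  # edges with both endpoints in the community
    degree_sum = defaultdict(int)   # sum of node degrees in the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(toy_edges, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(toy_edges, [0, 0, 0, 0, 0, 0])  # everything in one community
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores 5/14 (about 0.357), while placing all nodes in a single community scores 0, which is why a community detection method guided by modularity would prefer the split.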

Visualizing the sizes of the communities computed in the previous step is very easy due to the seamless integration of NetworKit into the Python ecosystem. We use matplotlib to plot the community sizes, sorted in descending order, on log-log axes. When using Jupyter Notebook, the resulting plot appears directly below the plot command.
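The data behind such a plot is a rank-size list: community sizes sorted in descending order, paired with their rank. The sketch below builds that list in pure Python from a partition given as one community id per node. The function `rank_size_points` and the toy partition are hypothetical. Ranks start at 1 here on the assumption that the plot uses a logarithmic x-axis, where a point at x = 0 cannot be displayed.

```python
import math
from collections import Counter


def rank_size_points(community_of):
    """Return [(rank, size), ...] with sizes in descending order and ranks from 1."""
    sizes = sorted(Counter(community_of).values(), reverse=True)
    return list(enumerate(sizes, start=1))


# Toy partition of ten nodes into four communities.
toy_partition = [0, 0, 0, 0, 1, 1, 2, 2, 2, 3]
points = rank_size_points(toy_partition)

# Coordinates as they would appear on log-log axes.
log_points = [(math.log10(rank), math.log10(size)) for rank, size in points]
print(points)
```

To draw the result with matplotlib, the ranks and sizes can be passed as explicit x and y values to `plt.plot`, so that the largest community sits at rank 1 and stays visible on the logarithmic axis.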