Graphs and Social Networks

Random Graphs, Giant Components

Large, complex networks often have one giant component that contains a significant proportion of the nodes. Erdős and Rényi (1960) have found this to be almost certainly the case with a random graph on $n$ vertices with $\frac{n}{2}$ edges.

Example. A Random Graph

A random graph is a mathematical model of a family of networks, where certain parameters (like the number of nodes and edges) have fixed values, but other aspects (like the actual edges) are randomly assigned. The simplest example of a random graph is in fact the network $G(n,m)$ with fixed numbers $n$ of nodes and $m$ of edges, placed randomly between the vertices. Although a random graph is not a specific object, many of its properties can be described precisely, in the form of expected values, or probability distributions.

Here we will look at degree distributions and the emergence of a giant component in a random graph. Other properties of random graphs and their variants are discussed in [Newman, Chapter 12], in [Latora et el., Chapter 3] and in [Barabasi, Chapter 3].

Random Graphs.

Let us denote by $G(n, m)$ a network with $n$ nodes and $m$ chosen edges, chosen uniformly at random (out of the possible $\binom{n}{2}$). Equivalently, one can choose uniformly at random one network in the set $\mathfrak{G}(n, m)$ of all networks on a given set of $n$ nodes with exactly $m$ edges.

More precisely, one should think of $G(n, m)$ as a probability distribution $P \colon \mathfrak{G}(n, m) \to \mathbb{R}$, that assigns to each network $G \in \mathfrak{G}(n, m)$ the same probability $P(G) = \Omega^{-1}$ for

where $N = \binom{n}{2}$.

Then, for example, the diameter $\langle l \rangle$ of $G(n, m)$ can be determined as the average of the diameters $l(G)$ of the networks $G \in \mathfrak{G}(n, m)$:

In the same way, the number $\langle m \rangle$ of edges of $G(n, m)$ is

as every network $G \in \mathfrak{G}(n, m)$ has exactly $m$ edges, and the average degree $\langle k \rangle$ of the nodes in $G(n, m)$ is

as every network $G$ with $n$ nodes and $m$ edges has average degree $\frac{2m}{n}$.

The Erdős-Rényi model $G(n, p)$

The calculation of other properties of $G(n, m)$ are less straightforward. It turns out that it is easier to work in the related, but different random graph $G(n, p)$. Here $p \in [0, 1]$ is a probability, and $G(n, p)$ is a network on a given set of $n$ nodes, where each of the $N = \binom{n}{2}$ possible edges is present with probability $p$ (and absent with probability $(1-p)$). This model gives a probability distribution $P$ on the set of all networks on the given $n$ nodes, where

if $G$ has exactly $m$ nodes. The networks in this model may have different numbers of edges. In fact $G(n, p)$ could be the empty network with no edges between the $n$ nodes at all (with (small) probability $(1-p)^N$), or the complete graph on $n$ nodes (with (small) probability $p^N$).

Number of Edges. The expected number of edges $\langle m \rangle$ of $G(n, p)$ is determined as follows. First note that the probability of $G$ having exactly $m$ edges is

as each of the $\binom{N}{m}$ graphs $G$ with $m$ edges occurs with equal probability $P(G) = p^m (1-p)^{N-m}$. This is a binomial distribution. The expected number of edges then is

taking into account that the case $m = 0$ contributes $0$ to the sum. Using the absorption law, $\binom{n}{k} = \frac{n}{k} \binom{n-1}{k-1}$, one finally gets


by the Binomial Theorem.

So $\langle m \rangle = N p$, which is not really surprising since $p = \langle m \rangle / N$ is the expected proportion of edges present in $G(n, p)$.

Mean Degree. The expected mean degree $\langle k \rangle$ is derived from the expected number of edges as

This again is no surprise, as any node is connected with probability $p$ to any of its $(n-1)$ potential neighbours in the network. We use the notation

from now on for the mean degree of $G(n, p)$.

Degree Distribution

The probability of a node to have exactly degree $k$ is

reflecting the fact that there are $\binom{n-1}{k}$ ways to choose $k$ other nodes to connect to, and that the probability of being connected to exactly those (and not connected to the remaining $n - 1 - k$ nodes) is $p^k (1-p)^{n-1-k}$. This is again a binomial distribution.

Poisson’s Theorem, also known as the Law of Rare Events, states that, for large values of $n$ (keeping the mean degree $c = (n-1)p$ constant), the binomial distribution can be approximated by a Poisson distribution. This can be seen as follows.

Assuming $n$ large makes $p = c/(n-1)$ small. Taking logarithms,

using the Taylor series for the logarithm,

and ignoring (small) terms of higher order. Thus



This is the Poisson distribution. For large $n$, the degrees in the random graph $G(n, p)$ follow a Poisson distribution.

Clustering Coefficient.

Define the clustering coefficient $C$ of a network $G = (X, E)$ as the probability that two neighbours of a node in the network are connected to each other, that is the proportion of the paths of length $2$ whose vertices form a triangle in the network. The clustering coefficient thus measures the transitivity, or the degree of triadic closure, in a network. (This notion of a clustering coefficient in general is different from the average of the (local) clustering coefficients of the individual nodes.)

In the random graph $G(n, p)$ the probability of any edge is just $p$, hence

For fixed $c$, this number tends to $0$ with $n \to \infty$.

The Giant Component.

For $p = 0$, the largest component of the random graph $G(n, p)$ has size $1$ (as each node is a connected component of its own).

At the other extreme, for $p = 1$, the entire network forms a single component of size $n$.

Somewhere between these extreme ends, the network undergoes a phase transition, from a network with a largest component of constant size (independent of $n$), to a network with largest component of a size proportional to $n$ (growing in size with $n$). A giant component is a component whose size is extensive, i.e., proportional to $n$.

Interactive Example. In this example, a network of $n = 100$ nodes, starting off as an empty network, becomes more and more connected, through the addition of randomly chosen edges. Most of the time, one can observe the emergence of a dominant large component after about $\tfrac12 n = 50$ edges have been added. The growth stops when there is only a single connected component left, usually after adding about 250 edges. Nodes grow in size according to their degree. Press Ctrl-R to reload.

Using properties of the Poisson distribution, it can be shown that in a random graph $G(n, p)$, when the average degree $c = (n-1)p$ increases from $0$ upwards,

  • for $c < 1$, the graph has no giant component, and the individual (small) components are trees;
  • at $c = 1$, the graph has large components and the individual components may contain cycles;
  • for $c > 1$, there is a single giant component which has loops, whereas all other components are trees;
  • for $c$ larger than $\ln n$ the network is a single giant component.

Real networks are not random

Most real networks have a giant component, and the random graph helps to explain this.

However, in many respects a real network differs from a random network:

  • Social networks are small worlds due to short paths and cycles; a random graph consists mostly of trees and cycles, if any, tend to be long.

  • Social networks contain highly connected communities, forming almost complete subgraphs; a random network only becomes highly connected when $p$ is very large and almost all edges are present.

  • The clustering coefficient in a social network is high (as high as $0.20$ in the actors costarring network), whereas the clustering coefficient $C = \tfrac{c}{n-1}$ of the random $G(n, p)$ is small, tending to $0$ for growing $n$ and fixed $c$.

  • Most significantly, the degree distribution of a real network does not follow a Poisson distribution due to the presence of highly connected vertices. More on this later.