## Graphs and Social Networks

# Small Worlds and Triadic Closure

## Small Worlds

Some social networks have become very accessible for systematic investigations, as they manifest themselves in a digital form on the internet. One such example of a social network that is accessible online is maintained by the American Mathematical Society (AMS) as part of MathSciNet, a database of mathematical research publications.

This network is formed by a *collaboration graph*: its **nodes** are
the mathematical researchers, and two researchers are connected by and **edge**
whenever they have jointly authored a publication in the database.
In this network, the **Erdős number** of a mathematician is their
distance from the Erdős node. This number can be computed by Breadth First Search. The AMS offers this as a service on a dedicated
collaboration distance page.

The Hungarian mathematician Paul Erdős (1913-1996) is very well-known for his publication records, with more than 1500 publications, and more than 500 collaborators. It turns out, that most mathematicians have an Erdős number of 4 or 5. In fact, random spot checks seem to indicate that virtually all mathematicians are connected with each other in this network (in a big giant component), and that any two of them are rarely further than distance 6 apart.

This is an instance of a phenomenon that has often been observed:
social networks tend to have very short paths between arbitrary pairs of
people. Such a network is called a **small world network**.
The name refers to the property that the diameter
$\mathrm{diam}(G)$, or the average distance
between nodes in a network $G = (X,E)$ with $n$ nodes is
small, or smaller than one would naively expect.
In popular culture, the phenomenon is known as “six degrees of separation”.
It goes at least back to experiments conducted in the 1960s
by the Yale psychologist Stanley Milgram (who is even better known
for another experiment designed by him).

In a similar way, the **co-star graph** (with actors as nodes, and edges
between two actors whenever they starred in a common film) as
maintained by the IMDb, has been used to define the **Bacon number**
of an actor as their distance to the American actor Kevin Bacon.

The presence of short paths certainly has some relevance in relation to the potential speed of information (or diseases) travelling through the network.

## Triadic Closure

Networks evolve over time, edges come and go. The concept of triadic closure
is concerned with the presence of *triangles* in a network:
if node $A$ is connected to nodes $B$ and $C$, is $B$ connected to $C$?

**Clustering Coefficient.**
A set of nodes in a graph with the property that
any two distinct nodes are joined by an edge is called
a **clique** (in other words, a clique is a subset of the nodes
such that the induced subgraph is complete).

The graph below contains two cliques of four nodes each.

A clique of $m$ nodes has $\binom{m}2 = \frac12 m(m-1)$ edges, e.g., a clique of $4$ nodes has $\frac{4 \cdot 3}2 = 6$ edges.

The clustering coefficient of a node $A$ measures, how close
the set $N(A)$ of friends of $A$ comes to being a clique:
if the induced subgraph on $N(A)$ has $f$ edges, then the
**clustering coefficient** of the node $A$ is defined as the fraction
$f/\binom{m}2$.
This is a number between $0$ and $1$.

Triadic closure increases the clustering coefficient. In a social network, there are incentives for triadic closure to happen. If person $A$ is friends with $B$ and $C$, this provides opportunities for $B$ and $C$ to meet and become friends. On the other hand, $B$ not being friends with $C$ can be a source of stress for $A$, who has to attend to their friends separately. More generally, a lower clustering coefficient can mean that more work is required to maintain a circle of friends.

## Bridges; local Bridges

An edge $AB$ is called a **bridge** if deleting the edge would cause $A$
and $B$ to lie in different connected components. (In other words, if
the edge $AB$ is a bridge then it is the only route from $A$ to $B$).

**Example.** The edge $AB$ in the graph below is clearly a bridge.

This notion might not be very useful for the study of social networks, as cycles and short paths make bridges very rare, or computationally expensive to detect.

More interesting than the *global* notion of a bridge, is the
*local* notion of a local bridge.

An edge $AB$ is a **local bridge** if $N(A) \cap N(B) = \emptyset$,
i.e., if $A$ and $B$ have no friends in common.

**Example.** The edge $AB$ in the graph below is a local bridge.

Deleting a local bridge $AB$ increases the distance between
$A$ and $B$ to at least 3.
This new distance is called the **span** of the local bridge.

A local bridge is the conceptual opposite of triadic closure: an edge $AB$ is a local bridge if and only it is not involved in any triangle. (Proof?)

## Strong and Weak Ties

In many examples it is possible to distinguish between different
levels of *strength* of the links of a network. Here, we are going to
study networks with (only) two types of edges: **strong ties**
(corresponding to close friends, say), and **weak ties**
(corresponding to acquaintances).

The assumption that triadic closure is more likely to happen in the presence of strong links can be formalized as the following property.

**Strong Triadic Closure Property**.
A node $A$ violates the Strong Triadic Closure Property
if it has strong ties with two other nodes $B$ and $C$, and there
is no edge at all between $B$ and $C$. A node $A$ satisfies the
Strong Triadic Closure Property if it does not violate it.

**Example.** In the graph below, the strong edges have a label
($s$) and the weak edges don’t. Each node of this graph satisfies the
Strong Triadic Closure Property: whenever a node has two strongly tied neighbours, there is a tie (weak or strong) between those neighbours.

Triadic closure establishes a connection between the local notion of link strength and the structural notion of local bridges, as follows.

**Proposition.** If a node $A$ in a network satisfies the Strong Triadic
Closure Property and is involved in at least two strong ties, then any
local bridge it is involved in must be a weak tie.

In other words, assuming that all nodes in a network satisfy the Strong Triadic Closure Property and have sufficiently many strong ties, local bridges are necessarily weak ties.

**Proof.** Suppose that node $A$ does satisfy the Strong Triadic Closure
Property and is involved in at least two strong ties. For a
contradiction, suppose the edge $AB$ is a local bridge *and* a strong
tie. As $AB$ is a local bridge, nodes $A$ and $B$ have no friends in
common. Let $AC$ be another strong tie involving $A$. Then the Strong
Triadic Closure Property requires the existence of an edge $BC$, making
$C$ a common friend of $A$ and $B$, contradiction.

Simplifying assumptions (like the Strong Triadic Closure Property) are useful when they lead to statements that are robust in practice, in the sense that qualitative conclusions still hold in approximate forms, even when the assumptions are slightly relaxed.

The surprising strength of weak ties (as experienced in the case of information leading to a new job or other new opportunities typically coming from distant acquaintances) can be partially explained in this framework. Links connecting people to new sources of information tend to be local bridges of a certain “span”, which by the above are necessarily weak ties. The Stanford sociologist Mark Granovetter has pioneered the theoretical study of social networks along such lines in the 1970s.

Today, digital communication networks allow for empirical verifications of theoretical predictions.

The **Twitter** network, for example, has its users as **nodes**, and
one can use its social network features to distinguish strong and weak
ties. For example, a *weak* tie can correspond to one user
*following* another, and a *strong* tie can correspond to at least two
messages directly addressed at another user. An empirical study of
the social network with its strong and weak ties defined in this way,
shows that people have vastly different numbers of followees (between
none and more than 1000), and the number of strong ties tends to
increase with the number of followees. However, as soon as the number
of followees reaches 400, the number of strong ties seems to stabilize
at around 50. An explanation of this phenomenon can be that it
requires little effort to “follow” a large number of Twitter users.
But it requires time and energy to maintain a strong tie relationship,
and there are only so many hours in a day …