## Small Worlds

Some social networks have become very accessible for systematic investigations, as they manifest themselves in a digital form on the internet. One such example of a social network that is accessible online is maintained by the American Mathematical Society (AMS) as part of MathSciNet, a database of mathematical research publications.

This network is formed by a collaboration graph: its nodes are the mathematical researchers, and two researchers are connected by and edge whenever they have jointly authored a publication in the database. In this network, the Erdős number of a mathematician is their distance from the Erdős node. This number can be computed by Breadth First Search. The AMS offers this as a service on a dedicated collaboration distance page.

The Hungarian mathematician Paul Erdős (1913-1996) is very well-known for his publication records, with more than 1500 publications, and more than 500 collaborators. It turns out, that most mathematicians have an Erdős number of 4 or 5. In fact, random spot checks seem to indicate that virtually all mathematicians are connected with each other in this network (in a big giant component), and that any two of them are rarely further than distance 6 apart.

This is an instance of a phenomenon that has often been observed: social networks tend to have very short paths between arbitrary pairs of people. Such a network is called a small world network. The name refers to the property that the diameter $\mathrm{diam}(G)$, or the average distance $\frac1{n^2}\sum_{x, y} d(x, y)$ between nodes in a network $G = (X,E)$ with $n$ nodes is small, or smaller than one would naively expect. In popular culture, the phenomenon is known as “six degrees of separation”. It goes at least back to experiments conducted in the 1960s by the Yale psychologist Stanley Milgram (who is even better known for another experiment designed by him).

In a similar way, the co-star graph (with actors as nodes, and edges between two actors whenever they starred in a common film) as maintained by the IMDb, has been used to define the Bacon number of an actor as their distance to the American actor Kevin Bacon.

The presence of short paths certainly has some relevance in relation to the potential speed of information (or diseases) travelling through the network.

Networks evolve over time, edges come and go. The concept of triadic closure is concerned with the presence of triangles in a network: if node $A$ is connected to nodes $B$ and $C$, is $B$ connected to $C$?

Clustering Coefficient. A set of nodes in a graph with the property that any two distinct nodes are joined by an edge is called a clique (in other words, a clique is a subset of the nodes such that the induced subgraph is complete).

The graph below contains two cliques of four nodes each.

A clique of $m$ nodes has $\binom{m}2 = \frac12 m(m-1)$ edges, e.g., a clique of $4$ nodes has $\frac{4 \cdot 3}2 = 6$ edges.

The clustering coefficient of a node $A$ measures, how close the set $N(A)$ of friends of $A$ comes to being a clique: if the induced subgraph on $N(A)$ has $f$ edges, then the clustering coefficient of the node $A$ is defined as the fraction $f/\binom{m}2$. This is a number between $0$ and $1$.

Triadic closure increases the clustering coefficient. In a social network, there are incentives for triadic closure to happen. If person $A$ is friends with $B$ and $C$, this provides opportunities for $B$ and $C$ to meet and become friends. On the other hand, $B$ not being friends with $C$ can be a source of stress for $A$, who has to attend to their friends separately. More generally, a lower clustering coefficient can mean that more work is required to maintain a circle of friends.

## Bridges; local Bridges

An edge $AB$ is called a bridge if deleting the edge would cause $A$ and $B$ to lie in different connected components. (In other words, if the edge $AB$ is a bridge then it is the only route from $A$ to $B$).

Example. The edge $AB$ in the graph below is clearly a bridge.

This notion might not be very useful for the study of social networks, as cycles and short paths make bridges very rare, or computationally expensive to detect.

More interesting than the global notion of a bridge, is the local notion of a local bridge.

An edge $AB$ is a local bridge if $N(A) \cap N(B) = \emptyset$, i.e., if $A$ and $B$ have no friends in common.

Example. The edge $AB$ in the graph below is a local bridge.

Deleting a local bridge $AB$ increases the distance between $A$ and $B$ to at least 3. This new distance is called the span of the local bridge.

A local bridge is the conceptual opposite of triadic closure: an edge $AB$ is a local bridge if and only it is not involved in any triangle. (Proof?)

## Strong and Weak Ties

In many examples it is possible to distinguish between different levels of strength of the links of a network. Here, we are going to study networks with (only) two types of edges: strong ties (corresponding to close friends, say), and weak ties (corresponding to acquaintances).

The assumption that triadic closure is more likely to happen in the presence of strong links can be formalized as the following property.

Strong Triadic Closure Property. A node $A$ violates the Strong Triadic Closure Property if it has strong ties with two other nodes $B$ and $C$, and there is no edge at all between $B$ and $C$. A node $A$ satisfies the Strong Triadic Closure Property if it does not violate it.

Example. In the graph below, the strong edges have a label ($s$) and the weak edges don’t. Each node of this graph satisfies the Strong Triadic Closure Property: whenever a node has two strongly tied neighbours, there is a tie (weak or strong) between those neighbours.

Triadic closure establishes a connection between the local notion of link strength and the structural notion of local bridges, as follows.

Proposition. If a node $A$ in a network satisfies the Strong Triadic Closure Property and is involved in at least two strong ties, then any local bridge it is involved in must be a weak tie.

In other words, assuming that all nodes in a network satisfy the Strong Triadic Closure Property and have sufficiently many strong ties, local bridges are necessarily weak ties.

Proof. Suppose that node $A$ does satisfy the Strong Triadic Closure Property and is involved in at least two strong ties. For a contradiction, suppose the edge $AB$ is a local bridge and a strong tie. As $AB$ is a local bridge, nodes $A$ and $B$ have no friends in common. Let $AC$ be another strong tie involving $A$. Then the Strong Triadic Closure Property requires the existence of an edge $BC$, making $C$ a common friend of $A$ and $B$, contradiction.

Simplifying assumptions (like the Strong Triadic Closure Property) are useful when they lead to statements that are robust in practice, in the sense that qualitative conclusions still hold in approximate forms, even when the assumptions are slightly relaxed.

The surprising strength of weak ties (as experienced in the case of information leading to a new job or other new opportunities typically coming from distant acquaintances) can be partially explained in this framework. Links connecting people to new sources of information tend to be local bridges of a certain “span”, which by the above are necessarily weak ties. The Stanford sociologist Mark Granovetter has pioneered the theoretical study of social networks along such lines in the 1970s.

Today, digital communication networks allow for empirical verifications of theoretical predictions.

The Twitter network, for example, has its users as nodes, and one can use its social network features to distinguish strong and weak ties. For example, a weak tie can correspond to one user following another, and a strong tie can correspond to at least two messages directly addressed at another user. An empirical study of the social network with its strong and weak ties defined in this way, shows that people have vastly different numbers of followees (between none and more than 1000), and the number of strong ties tends to increase with the number of followees. However, as soon as the number of followees reaches 400, the number of strong ties seems to stabilize at around 50. An explanation of this phenomenon can be that it requires little effort to “follow” a large number of Twitter users. But it requires time and energy to maintain a strong tie relationship, and there are only so many hours in a day …