Modern societies are in many ways highly connected. Certain aspects of this phenomenon are frequently described as networks:
- social networks (of people linked by friendship or social interaction)
- computer networks (the Internet)
- document networks (World Wide Web)
- academic networks (collaborations, citations of scientific publications)
- transportation networks (road, rail, airline, subway)
- networks in biology (e.g., biochemical, ecological, or neural networks)
- financial networks (transactions between traders)
In its simplest form, a network is just a collection of points (called vertices or nodes), some of which are joined in pairs (called edges or links). Many systems of interest are composed of individual parts that are in some way linked together. Such systems can be regarded as networks, and thinking about them in this way can often lead to new and useful insights. Network science studies the pattern of connections between the components of a system (rather than focusing on individual parts, or single connections). Naturally, the structure of the network can have a big impact on the behavior of a system. The connections in a social network, for example, affect how people learn, form opinions, or spread a disease.
A network is a simplified representation of a complex system by vertices and edges. The scientific study of networks is an interdisciplinary undertaking that combines ideas from mathematics, physics, biology, computer science, the social sciences and many other areas. Between these scientific fields, many tools have been developed for analyzing, modeling and understanding networks.
An example of a useful and important type of network measures is that of centrality. It is concerned with the question of how important a particular vertex or edge is in a networked system. Different concepts have suggested for capturing mathematically what it means to be central. A simple measure of the centrality of a vertex is its degree, i.e., the number of edges it is part of.
Another interesting network concept is the small-world effect. It is concerned with the question of how far apart two randomly chosen points in a network typically are. Here, distance is usually measured by the number of edges one would need to cross over, when travelling along a path from one vertex to the other. In real world social networks the distance between people tends to be rather small, and is referred to as the six degrees of separation in popular culture.
A third network concept of practical importance is that of clusters, or communities in a network. Social networks tend to break up into groups of close friends. The same is true for other types of real world networks. The way a network breaks down into communities can reveal details about the organization of a network that are impossible to see without network data.
Which measurements and calculations give useful answers for a particular system depends of course on the specific nature of the system and the questions one wants to ask.
This course is meant as an introduction to Network Science where, within the given time and space constraints, some but certainly not all of its interesting aspects will be discussed. A rough guide is provided by the recent book Networks, Crowds and Markets.
Examples of Networks
Newman (Chapters 2-5) broadly divides the most commonly studied real world networks into four classes:
- technological networks,
- social networks,
- information networks and
- biological networks.
There is of course some overlap between these classes. But it is an interesting exercise to list important examples within each class, and to describe their general structure, and the techniques used to discover and measure this structure in each example.
Technological networks rely on a physical infrastructure. In many cases, this infrastructure has been built over many decades and forms part of the backbone of modern societies. This includes road and other transportation networks, power grids, and the telephone network. A relatively young example in this class of networks is the Internet.
The Internet is the worldwide network of physical data connections between computers and related devices. In this network, the vertices are the computers, and the edges are the physical connections like optical fibre lines.
The actual structure of the Internet is somewhat difficult to determine, due to the large number of nodes and vertices involved, and since its topology is not controlled by a central authority.
Overall, the Internet is composed of three layers, or circles.
The innermost circle is the backbone of the network, providing long-distance high-bandwidth data transport around the globe. The backbone is operated by Network Backbone Providers (NBP) like national governments or big telecommunication companies.
The second circle consists of Internet Service Providers (ISP) such as universities, commercial companies and governments, which link up with each other and NBPs and provide Internet access to the end users.
These end users, businesses, academics and other people using computers form the third and outer circle of the Internet.
This hierarchical description of the Internet however
captures only one small aspect of its structure. More detailed data can
be obtained by monitoring actual journeys of data packages
over the Internet, for example with the Unix
The Internet Mapping Project started in 1998 to collect such data systematically and has resulted in many visual impressions of the structure of the Internet like the one below.
The telephone network
The telephone network is the network with telephones as its vertices and landlines and wireless links as its edges. It is one of the oldest communication networks still in use. The recent advent of mobile phones has resulted in a drastic rise of the number of vertices in the network, but it hasn’t much changed the structure of the telephone network, as mobile phone calls too are transmitted across landlines for most of the way.
Similar to the Internet, the telephone network has basically a three tiered structure consisting of
- a layer of telephones, connected to
- a layer of local exchanges, which are in turn connected to
- a layer of long-distance offices.
The long distance offices are connected amongst themselves, and so are some of the local exchanges. This three-level topology can exploit the fact that most phone calls are local calls.
The exact structure of the telephone network is known, but not easily accessible as this information is mainly owned by the telephone companies.
In contrast to the Internet, the telephone network traditionally is circuit switched (where a collection of lines, or circuits, between points on the network is reserved to a call for the duration of that call) and not packet switched (where the information to be submitted is broken up into small packages, each of which travelling independently across the network to their common destination, where they are reassembled).
A power grid is a network of high-voltage transmission lines that provides long distance transport of electricity within and between countries. The vertices in this network are generating stations and switching stations. The edges are the high-voltage lines between the stations. The topology of a power grid is usually well documented, e.g., in the form of a map published and maintained by the authority overseeing the grid.
Eirgrid is the state-owned company that manages and operates the transmission grid across the island of Ireland. (Click on the image below to get a detailed map.)
Power grids are very complicated systems. They can display surprising behaviour such as cascading failures, leading to large scale power outages. The flow of power in a grid depends on a variety of factors, in addition to the bare topology of the network.
Transportation networks connect geographical locations by physical links such as roads, railway lines or airline routes. Due to their physical manifestation on the face of the earth, such networks are usually easy to visually by a flat drawing, or a planar graph in mathematical terms.
Transportation networks are typically well documented in the form of maps and have been studied for a long time, for example with a view to economical implications of the network structure. Or to guide the users of the network: the London Tube Map omits geographical details in favour of the bare network structure. This now universal design of subway and other public transport maps was pioneered by the British electrical designer Harry Beck in 1931.
Closely related to transportation networks are delivery or distribution networks, like systems of oil or gas pipelines, water and sewerage lines, or the postal network for the delivery of mail. A particular kind of natural delivery network is provided by river networks. This example of a type of network is special in the sense that it contains no loops, and thus resembles a tree. In fact, tree is precisely the term used in graph theory, to describe a connected network without loops. Another property of a river-like network worth pointing out is its natural sense of direction coming from the fact that water only flows downhill.
The vertices of a social network are people, with edges representing some sort of social interaction, like friendship. In sociology, the vertices are often called actors, and the edges are called ties.
A social network’s existence does not depend on social networking
Traditionally, data about the structure of social networks have been compiled by interviewing the people involved. There are certain issues in relation to the accuracy of information gained in this way. People might understand the interviewer’s questions in different ways, they might not always answer truthfully. Friendship, in theory, is a symmetric concept. I am your friend if and only if you are my friend.
Being asked independently, might reveal that I think we are friends while you believe we’re not. That’s why data obtained from interviews has to be used with care. This also applies to data obtained from social media sites, for similar reasons. Links in social networks are not as clear cut as they are in a network that is built on a physical infrastructure. An exception to this observation is maybe given by links that are defined via common affiliations.
An affiliation network has two types of nodes, one type representing people and another type representing groups of people. These groups might be defined by club membership, common interests, events attended, etc. Links only exist between people and groups. Each link involves exactly one node of each type, no two nodes of the same type are ever linked. In mathematical terms, an affiliation network is an example of a bipartite graph, a graph whose vertices are either black or white, say, and whose links connect black with white vertices, and nothing else.
Classical examples of affiliation networks in the sociology literature include the “Southern Women Study” from 1941, relating a set of 18 society women in a particular city to a set of 14 social events they attended, according to reports in the local press. Another study relates the CEOs of Chicago based companies in the 1970s to the clubs they were members of.
Network science techniques can be used to identify groups of closely related people in such a network. For this, the affiliation network can be projected onto the set of people, defining a new network which has only the people as vertices, with a link between two whenever they have a common affiliation. This process is a common change of point of view which not only applies to social networks.
An information network consists of data items which are linked to each other in some way. This happens for example in every relational database. Like social networks, an information network does not need computers to exist. Bits of information (like scientific publications) have been linking to each other (e.g., through citations) long before computers were invented. However, links in digital form are usually easier to follow than vague references to items stored in a different building (or city), and thus support a more intense experience of network structure.
The (World Wide) Web
The WWW is probably the most wide spread and best known examples of an information network. Its nodes are web pages containing information in form of text and pictures, and its edges are the hyperlinks, allowing us to surf, or navigate from page to page.
Hyperlinks run in one direction only, from the page that contains the hyperlink to the page that is referenced. Therefore, the WWW is a directed network, a graph where each edge has a direction.
The WWW was invented around 1990 at the CERN high-energy physics laboratory in Geneva as a way of sharing information between the scientists and their collaborators. It soon became clear that its potential was far greater, leading to an unexpected sudden growth in the number of pages and links.
This fast and uncontrolled growth makes it difficult to analyze the structure of the WWW in detail. On the other hand, the digital nature of this network makes it possible to leave the task of exploring to machines, so-called robots or web crawlers, who search a given web page for hyperlinks, then follow those links, and search the pages they arrive at, over an over again.
Search engines like Google use web crawlers on a massive scale, to classify and index all web pages that can be found, in a way that allows a speedy listing of interesting pages, based on one or more keywords.
A citation network consists of academic papers and the citations between them. Citations are usually listed in a bibliography at the end of a paper, and as such are well-documented. As a citation points from one paper to another (and not back), the resulting network is directed, like the WWW.
There are many reasons why one paper might cite another: to point to additional useful information, to give credit, or to disagree with the content or point out some error in paper.
Bibliometrics is the branch of information science that deals with the statistical study of publications and citations. There are now a number of online services concerned with the collection and distribution of bibliometric data, such as citeseer, scopus, and google scholar. The American Mathematical Society has been collecting reviews of mathematical papers for decades. These days their database of reviews and citations can be accessed online.
It is tempting to use crude and readily available bibliometrical data as sole indicators of the value of scientific work. There is a growing concern within the scientific community about a worrying trend of this kind of evaluation becoming more wide spread.
We’ll look at examples of biological networks later.