University Graph Analysis icon University Graph Analysis
University Project #Data Science#Mathematics

University Graph Analysis with Writeup#

A mathematical and visual analysis of a graph representing the relationships between members of a university faculty.

.rmd knitted with execution results available for download here

Plotting the Graph#

Start by adding the required packages.

# install required libraries
if (!require("igraph")) {
install.packages("igraph")
library(igraph)
}
## Loading required package: igraph
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
if (!require("igraphdata")) {
install.packages("igraphdata")
library(igraphdata)
}
## Loading required package: igraphdata
## Warning: package 'igraphdata' was built under R version 4.3.2

Then, we can get our data from the ‘igraphdata’ package for use with ‘igraph’

# load the data into 'UKfaculty' variable
data(UKfaculty, package="igraphdata")
# listen to the warning given
G = upgrade_graph(UKfaculty)

The data ‘UKfaculty’ consists of 81 nodes, each representing a person at a UK university.

Each node has a ‘Group’ attribute, which is an integer id for their school affiliation.

There are a total of 817 edges, each with a weight.

The weight of each edge represents the ‘closeness’ of the friendship between two faculty members, so lower ‘weight’ means the two are better friends.

Each edge is directed, since a friendship could be one-sided.

It would be helpful to encode more information into the graph, instead of just nodes and edges.

Lets encode the ‘Group’ attribute into the vertices’ color.

# loop through all the vertices
for(i in c(1:length(V(G)))){
# set the vertex color based on 'Group'
if(V(G)[i]$Group == 1){
V(G)[i]$color = "white"
} else if(V(G)[i]$Group == 2){
V(G)[i]$color = "lightgreen"
} else if(V(G)[i]$Group == 3){
V(G)[i]$color = "lightblue"
} else if(V(G)[i]$Group == 4){
V(G)[i]$color = "lightpink"
}
}

After that, we can plot the graph.

plt = function(G, log=TRUE){
# reduce the margin size
par(mar=c(0,0,0,0)+.1)
# consistent graph generation
# doesn't change community structure for later analysis
set.seed(7881954)
# plot the graph with some layout changes to prevent it from being a ball of vertices on top of each other.
plot(G, layout=layout_with_fr(G), # use force-directed graph drawing to minimize crossing edges, and place similar edge lengths nearby
asp=1, edge.arrow.size=0.5, edge.arrow.width=1, vertex.size=10, # now some coloring to encode more info into the graph
vertex.color=V(G)$color, vertex.label.color="black")
# create a legend for the different node colors/groups
if(log){
legend(x = 1, y = 1.1, xjust = 0.5, ncol = 2, c("Group 1", "Group 2", "Group 3", "Group 4"),
pch = 21, , pt.cex = 1.5,
col = "black", pt.bg = c("white", "lightgreen", "lightblue", "lightpink"))
}
}
plt(G)

Analyzing Plotted Graph#

From this graph, we can see three general communities form based on the faculty members ‘Group’ attribute.

While there are friendships between groups, its clear to see that in general the members of one group are more friendly with other members of their group, instead of members of another group.

Additionally, we can see that the members of Group 1 and Group 2 are much closer to the members of their own group, compared to the members of Group 3 which are not as close with each other.

Next we can see that in general, Group 1 and Group 2 are much more friendly with the members of the other group (Group 2 and Group 1 respectively) than they are with members of Group 3.

Also, members of Group 4 are friendly enough with Group 1 (and vice versa) for Group 4 to be included in Group 1’s ‘community’.

Finally, there are some outliers: like node 62, which is more friendly with Groups 1 and 4 than with their own group (Group 3), and node 11 which isn’t part of their groups community, only having 2 distant friendships.

Mathematical Analysis#

Now that we’ve analyzed the graph’s structure when plotting it, it’s time to perform some mathematical analysis on it’s structure.

First, lets start with the graph’s center, AKA the list of nodes in G with the lowest eccentricity (radius).

# init list
center = list()
# loop through all nodes
for(i in V(G)){
# if current node has lowest possible eccentricity in G
if(eccentricity(G)[i] == radius(G)){
# it's a center node, add to list
center[length(center) + 1] = i
}
}
# the number of center nodes is
length(center)
## [1] 61
# the center nodes are:
# to print w/out weird [[i]] stuff
for(i in c(1:length(center))){
print(center[[i]])
}
## [1] 1
## [1] 2
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 9
## [1] 10
## [1] 12
## [1] 13
## [1] 15
## [1] 17
## [1] 18
## [1] 19
## [1] 20
## [1] 21
## [1] 22
## [1] 23
## [1] 25
## [1] 26
## [1] 27
## [1] 29
## [1] 31
## [1] 32
## [1] 33
## [1] 34
## [1] 35
## [1] 37
## [1] 38
## [1] 39
## [1] 41
## [1] 42
## [1] 43
## [1] 44
## [1] 46
## [1] 48
## [1] 49
## [1] 50
## [1] 51
## [1] 52
## [1] 54
## [1] 55
## [1] 57
## [1] 58
## [1] 60
## [1] 61
## [1] 62
## [1] 64
## [1] 65
## [1] 68
## [1] 69
## [1] 70
## [1] 71
## [1] 72
## [1] 74
## [1] 75
## [1] 76
## [1] 77
## [1] 79
## [1] 80
## [1] 81
# with an eccentricity/radius of
radius(G)
## [1] 3

So there’s a total of 61 center nodes, each with an eccentricity of 3.

That means that 61 people in the faculty have at most 3 degrees of separation from all the other faculty members.

In terms of the relationships defined by the edges: for any faculty member that is also a center node, the list of the center node’s friends, the friends of the center node’s friends, and the friends of the friends of the center node’s friends all combined will be the set of all faculty members.

max(eccentricity(G))
## [1] 4

Since the max eccentricity is 4, for all 81 nodes there will be at most 4 degrees of separation from all the other faculty members.

Next we can look at betweenness, a centrality measure for graphs.

Betweenness for a vertex ‘v’ is calculated as the sum of fractions, where each fraction is the sum of shortest paths from vertex ‘s’ to ‘t’ (s != v != t) that pass through vertex ‘v’, divided by the sum of shortest paths from ‘s’ to ‘t’ that don’t pass through ‘v’.

Overall, this gives a measure of what vertices are included in more shortest paths compared to others, giving a measure of what vertices are near the ‘center’ of the graph.

# order returns indices
# pass these indices to V(G) to sort V(G) by betweenness descending
V(G)[order(betweenness(G, normalized = TRUE, directed=TRUE), decreasing = TRUE)]
## + 81/81 vertices, from 6f42903:
## [1] 37 62 5 2 29 38 17 72 58 10 54 52 69 77 1 48 75 6 9 35 27 49 7 24 51
## [26] 22 53 70 4 40 42 21 57 39 31 46 61 15 13 55 80 67 28 43 20 79 81 68 25 34
## [51] 59 30 12 66 33 16 19 76 56 64 65 47 3 26 50 45 44 23 41 71 8 11 14 18 32
## [76] 36 60 63 73 74 78

So using betweenness as a centrality measure, the ‘center’ of the graph is node 37.

That means that node 37 is in the most shortest paths from any other two nodes.

In the context of the faculty members, since node 37 is in more shortest paths, this means that node 37 probably has ‘closer’ (lower edge weight) friendships with other faculty members in general, or that their collection of friends are ‘closer’ comparatively.

Next, we can look at closeness, another centrality measure.

Closeness for a vertex ‘v’ is the mean distance between ‘v’ and all other vertices that ‘v’ is connected with.

# order returns indices
# pass these indices to V(G) to sort V(G) by closeness descending
V(G)[order(closeness(G, mode="all", normalized = TRUE), decreasing = TRUE)]
## + 81/81 vertices, from 6f42903:
## [1] 37 62 54 9 10 29 5 38 18 2 7 40 58 77 55 79 52 69 16 20 32 35 34 23 49
## [26] 57 70 27 61 17 22 48 6 51 42 65 15 41 4 39 76 26 72 43 80 60 59 81 1 21
## [51] 24 28 44 13 53 67 33 45 75 31 12 68 8 25 64 19 14 50 66 3 30 47 78 11 56
## [76] 63 74 36 71 46 73

Here we can see that node 37 is also the graphs ‘center’ when using closeness as a centrality measure.

This tracks with what we concluded when measuring betweenness.

With betweenness we concluded that node 37 was in the most shortest paths between any two nodes, then it follows that the path from most nodes to node 37 will be the shortest path, leading to node 37 having the ‘closest’ mean distance between itself and any other vertex.

Next let’s look at periphery.

The periphery of our graph is the nodes that have the maximum eccentricity.

# graph vertices whose eccentricity is the max value for G
V(G)[eccentricity(G)==max(eccentricity(G))]
## + 20/81 vertices, from 6f42903:
## [1] 3 8 11 14 16 24 28 30 36 40 45 47 53 56 59 63 66 67 73 78

In the context of the faculty members, a person on the periphery of the graph would be someone who isn’t as ‘friendly’ and whose friends would be less ‘friendly’

Next let’s look at the graph’s girth.

The girth of a graph is the length of the shortest possible cycle.

This most likely will be 2 for the faculty graph, so long as there are two people who are mutually friends

girth(G)
## $girth
## [1] 3
##
## $circle
## + 3/81 vertices, from 6f42903:
## [1] 4 1 36

I have no idea why it says 3, since this can easliy be shown as incorrect.

As we can see from the following:

E(G)[.from(1)]
## + 6/817 edges from 6f42903:
## [1] 1->62 1->45 1->36 1->61 1-> 4 1->44
E(G)[.to(1)]
## + 9/817 edges from 6f42903:
## [1] 62->1 61->1 38->1 4->1 45->1 81->1 52->1 44->1 36->1

In ‘G’ there are the edges ‘1->4’ and ‘4->1’, meaning the shortest cycle in ‘G’ is of length 2 not 3.

So the girth of ‘G’ is 2, and there is at least one mutual friendship among the faculty members.

Now lets look at the degree distribution of our graph.

The degree distribution does exactly what you’d expect, it creates a distribution based on the percentage chance that any given vertex will have the associated degree.

dd = degree_distribution(G)*100
plot(dd, ylab="% Chance for Degree", xlab="Vertex Degree")
as.data.frame(dd)
## dd
## 1 0.000000
## 2 0.000000
## 3 1.234568
## 4 0.000000
## 5 1.234568
## 6 1.234568
## 7 3.703704
## 8 0.000000
## 9 3.703704
## 10 2.469136
## 11 8.641975
## 12 2.469136
## 13 6.172840
## 14 7.407407
## 15 3.703704
## 16 2.469136
## 17 1.234568
## 18 2.469136
## 19 2.469136
## 20 4.938272
## 21 4.938272
## 22 2.469136
## 23 3.703704
## 24 0.000000
## 25 3.703704
## 26 1.234568
## 27 3.703704
## 28 2.469136
## 29 1.234568
## 30 2.469136
## 31 1.234568
## 32 0.000000
## 33 0.000000
## 34 1.234568
## 35 0.000000
## 36 3.703704
## 37 1.234568
## 38 2.469136
## 39 2.469136
## 40 1.234568
## 41 0.000000
## 42 0.000000
## 43 0.000000
## 44 1.234568
## 45 1.234568
## 46 0.000000
## 47 0.000000
## 48 0.000000
## 49 0.000000
## 50 0.000000
## 51 0.000000
## 52 0.000000
## 53 0.000000
## 54 0.000000
## 55 1.234568
## 56 0.000000
## 57 0.000000
## 58 0.000000
## 59 0.000000
## 60 0.000000
## 61 0.000000
## 62 0.000000
## 63 1.234568

Note: degree_distribution starts at degree 0, so % chance of degree ‘deg’ is dd[deg+1]

From this we can see several things:

  • Faculty members have the highest chance of being friends with 10 people.
V(G)[degree(G) == 10]
## + 7/81 vertices, from 6f42903:
## [1] 30 32 36 53 56 65 81
  • There is no faculty member with no friends.
V(G)[degree(G) == 0]
## + 0/81 vertices, from 6f42903:
  • The highest number of friends any faculty member has is 62.
max(degree(G)) == 62
## [1] TRUE
V(G)[degree(G) == max(degree(G))]
## + 1/81 vertex, from 6f42903:
## [1] 29
  • The lowest number of friends any faculty member has is 2.
min(degree(G)) == 2
## [1] TRUE
V(G)[degree(G) == min(degree(G))]
## + 1/81 vertex, from 6f42903:
## [1] 11

Now lets look at graph density.

Graph density measures how many edges are there in the graph compared to how many there could possibly be.

With our graph of order 81 (number of vertices), it could have a total of 6480 edges (# vertices * (# vertices -1))

So our graph density is:

graph.density(G)
## [1] 0.1260802

So that means, that of all the possible friendships all the faculty members could make with each other, only around 12.6% of them exist.

Now let’s look at the density of the different groups in the faculty

s1 = induced_subgraph(G, V(G)$Group==1, impl="create_from_scratch")
plt(s1,FALSE)
graph.density(s1)
## [1] 0.3001894
s2 = induced_subgraph(G, V(G)$Group==2, impl="create_from_scratch")
plt(s2,FALSE)
graph.density(s2)
## [1] 0.3561254
s3 = induced_subgraph(G, V(G)$Group==3, impl="create_from_scratch")
plt(s3,FALSE)
graph.density(s3)
## [1] 0.2807018
s4 = induced_subgraph(G, V(G)$Group==4, impl="create_from_scratch")
plt(s4,FALSE)
graph.density(s4)
## [1] 1

From this, we can see that each group in the faculty is tighter knit than the faculty as a whole.

Relatively, there are more friendships within a group than in the faculty as a whole.

Additionally, Group 4 has the highest friendship density, but with only two members, that isn’t particularly notable.

The group with the second highest friendship density is Group 2, with around a 35.6% friendship density.

← Back to Projects