Data Analysis: Recipe DB

4.3 Network Analysis - Community Detection


Community detection is a method for finding groups, or clusters, of densely connected nodes within a network.

We use the Louvain algorithm for community detection in our ingredients network. The algorithm uses a measure called modularity to extract communities. Modularity measures the strength of the division of a network into modules (groups): a network with high modularity has dense connections between nodes within the same module and sparse connections between nodes in different modules. The Louvain algorithm is a greedy optimisation method that tries to maximise the modularity of a partition of the network.
It first moves nodes between communities whenever doing so increases modularity, then merges each community into a single node and repeats the modularity optimisation on the condensed graph, until modularity stops improving.
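
To make the two ideas above concrete, here is a minimal pure-Python sketch of the modularity score and of one aggregation step. It is not the python-louvain implementation: the helper names `modularity` and `aggregate` and the toy ingredient edges are hypothetical, chosen only for illustration, and the graph is assumed to be undirected with positive edge weights.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Weighted modularity Q for undirected edges given as (u, v, weight).

    partition maps each node to a community id. A self-loop (u == v) counts
    as an edge inside u's community.
    """
    total_weight = 0.0             # m: sum of all edge weights
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for u, v, w in edges:
        total_weight += w
        degree[partition[u]] += w
        degree[partition[v]] += w
        if partition[u] == partition[v]:
            internal[partition[u]] += w
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


def aggregate(edges, partition):
    """Collapse every community into one node, summing the edge weights."""
    merged = defaultdict(float)
    for u, v, w in edges:
        a, b = sorted((partition[u], partition[v]))
        merged[(a, b)] += w
    return [(a, b, w) for (a, b), w in merged.items()]


# Hypothetical toy data: two triangles of ingredients joined by one edge.
toy_edges = [
    ('flour', 'butter', 1), ('butter', 'sugar', 1), ('flour', 'sugar', 1),
    ('garlic', 'onion', 1), ('onion', 'tomato', 1), ('garlic', 'tomato', 1),
    ('sugar', 'tomato', 1),
]
two_groups = {'flour': 0, 'butter': 0, 'sugar': 0,
              'garlic': 1, 'onion': 1, 'tomato': 1}
one_group = {name: 0 for name in two_groups}

q_two = modularity(toy_edges, two_groups)   # 5/14, about 0.357
q_one = modularity(toy_edges, one_group)    # 0.0

# Condensed graph: one node per community, internal edges become self-loops.
condensed = aggregate(toy_edges, two_groups)
q_condensed = modularity(condensed, {0: 0, 1: 1})  # same value as q_two

print(q_two, q_one, q_condensed)
```

Splitting the toy graph into its two triangles scores higher than putting every ingredient in one community, and the condensed graph keeps the same modularity as the partition it was built from. That is why the algorithm can keep optimising on the smaller graph.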

For this we use the Python package python-louvain, which is imported as community (aliased here as community_louvain).

In [1]:
import pandas as pd
import community as community_louvain
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import networkx as nx
import csv
In [2]:
# read the weighted ingredient graph from a comma-separated edge list
G = nx.read_edgelist(
    'output_files/ingredient_weights.csv',
    delimiter=',',
    encoding='latin1',
    create_using=nx.Graph(),
    nodetype=str,
    data=(('weight', int),),
)

# compute the best partition
partition = community_louvain.best_partition(G)

Visualising the Community Structure

In [3]:
# draw the graph
pos = nx.spring_layout(G, k=0.25)

plt.figure(figsize=(10, 10))
plt.axis('off')
plt.title('Community Detection - Louvain Algorithm')

# color the nodes according to their partition
# (plt.get_cmap is used because cm.get_cmap is deprecated in recent matplotlib releases)
cmap = plt.get_cmap('Set1', max(partition.values()) + 1)
nx.draw_networkx_nodes(G, pos, partition.keys(), node_size=40, cmap=cmap, node_color=list(partition.values()))
nx.draw_networkx_edges(G, pos, alpha=0.5)

# save the figure before showing it; saving after plt.show() writes an empty image
plt.savefig('plots/community_detection.png')
plt.show()

Saving the Communities

The ingredients and their community numbers are stored in the file 'output_files/community_lists.csv'.

In [4]:
# create the CSV file; the with-block closes it even if writing fails
with open('output_files/community_lists.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(['Name', 'Category'])

    # write one row per ingredient and collect the community numbers
    arr_cats = []
    for x, y in partition.items():
        arr_cats.append(y)
        csv_writer.writerow([x, y])

# sort the ingredients by community number and save the file again
data = pd.read_csv('output_files/community_lists.csv')
data = data.sort_values(["Category"])
data.to_csv('output_files/community_lists.csv', index=False)

print('Number of communities detected:', len(set(arr_cats)))
Number of communities detected: 7
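
As a possible follow-up, the saved file can be read back to list the members of each community. The sketch below assumes only the 'Name' and 'Category' columns written above. The helper name `group_by_community` and the sample rows are hypothetical stand-ins, not contents of the real file.

```python
import csv
import io
from collections import defaultdict


def group_by_community(csv_source):
    """Return {community id: sorted ingredient names} from Name,Category rows."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_source):
        groups[int(row['Category'])].append(row['Name'])
    return {cat: sorted(names) for cat, names in sorted(groups.items())}


# Hypothetical sample standing in for output_files/community_lists.csv.
sample = io.StringIO(
    "Name,Category\n"
    "butter,0\n"
    "flour,0\n"
    "garlic,1\n"
    "onion,1\n"
)
groups = group_by_community(sample)
for cat, names in groups.items():
    print(cat, len(names), names)
```

To run it on the real output, pass an open file handle instead of the sample, for example one obtained with open('output_files/community_lists.csv', newline='').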