Conceptual outline of the knowledge graph building process.
(A) Every document is split into its constituent sentences, and each sentence is scanned for expressions registered in the dictionary. In the figure, two sentences are highlighted and the matching expressions are enclosed in coloured boxes. Each of these expressions is associated with a concept in the dictionary. (B) The concepts co-occurring in a sentence are connected pairwise. A sentence is therefore abstracted as a complete graph in which the occurring concepts are the nodes and each co-occurrence is a link. The weight of a link increases with every additional instance of the same co-occurrence. (C) The sentence graphs are then merged so that each node (concept) appears only once in the resulting graph. In the figure, the «LAM» node (the abbreviation for Lymphangioleiomyomatosis, a rare disease) appears in every sentence graph and the «Lung» node in two of them. (D) The result of the merging is a new graph, which is no longer complete, in which the weight of each link reflects the frequency of the corresponding co-occurrence.
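
Steps (A) to (D) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy `DICTIONARY`, the naive punctuation-based sentence splitter, single-word matching, and all function names are assumptions made for illustration. It also assumes that each pair of concepts contributes at most one unit of weight per sentence.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionary: expression (lowercase) -> concept label.
DICTIONARY = {
    "lam": "LAM",
    "lymphangioleiomyomatosis": "LAM",
    "lung": "Lung",
    "lungs": "Lung",
    "sirolimus": "Sirolimus",
}


def split_sentences(document):
    """(A) Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]


def concepts_in(sentence):
    """(A) Map single-word expressions found in the sentence to their concepts."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {DICTIONARY[t] for t in tokens if t in DICTIONARY}


def sentence_graph(sentence):
    """(B) Complete graph over the sentence's concepts, as weighted edges.

    Edges are stored as alphabetically ordered pairs, so (a, b) and (b, a)
    always map to the same key.
    """
    concepts = sorted(concepts_in(sentence))
    return Counter(combinations(concepts, 2))


def build_graph(documents):
    """(C, D) Merge all sentence graphs; edge weights add up across sentences."""
    merged = Counter()
    for document in documents:
        for sentence in split_sentences(document):
            merged.update(sentence_graph(sentence))
    return merged
```

Representing the merged graph as a `Counter` keyed by concept pairs makes the merge in step (C) a plain sum of edge weights, and each concept appears only once because it is identified by its label. Pairs that never co-occur are simply absent, which is why the merged graph is generally not complete.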