Prediction of viral targets.

(A) The heatmap shows Pearson correlation coefficients between the distributions of degree, betweenness centrality, number of pathways a protein is involved in, protein PageRank index and indispensability of a protein. Notably, degree, protein PageRank, betweenness centrality and pathway participation were most strongly correlated with one another, while indispensability showed the lowest correlation with the other topological measures. (B) For the target sets of Hepatitis, Herpes, HIV, Influenza and other viruses, we randomly sampled sets of non-targeted proteins of equal size. Using the area under the ROC curve (AUC), we observed that the protein PageRank index and pathway participation of a protein discriminated best between targets and non-targets. (C) We then used all five topological measures to predict viral targets with a random forest. Protein PageRank had the highest impact on the classification, a result that was independent of the underlying virus. (D) We randomly sampled, 1,000 times, sets of non-targeted proteins equal in size to the set of HIV targets and determined the AUC of the random forest classification. In particular, we predicted whether a protein was targeted as a function of the three most important (protein PageRank index, degree and pathway appearance) and the least important topological features (betweenness centrality, pathway appearance, control). The two resulting distributions of AUC values differed significantly (Student’s t-test, P < 10⁻²⁰), suggesting that the most important features allowed a significantly better classification.
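The balanced sampling scheme used in (B) and (D) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it ranks proteins by a single topological measure instead of training a random forest, and the names `auc`, `balanced_aucs`, `scores` and `targets` are hypothetical.

```python
import random


def auc(pos_scores, neg_scores):
    """Area under the ROC curve for one score.

    Computed as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def balanced_aucs(scores, targets, n_rounds=1000, seed=0):
    """AUC distribution over repeated equal-size samples of non-targets.

    scores  : dict mapping protein id -> topological measure (assumed input)
    targets : set of protein ids targeted by a virus (assumed input)
    """
    rng = random.Random(seed)
    target_ids = [p for p in sorted(targets) if p in scores]
    non_targets = sorted(p for p in scores if p not in targets)
    pos = [scores[p] for p in target_ids]
    result = []
    for _ in range(n_rounds):
        # Draw as many non-targets as there are targets, without replacement.
        sample = rng.sample(non_targets, len(pos))
        result.append(auc(pos, [scores[p] for p in sample]))
    return result
```

Calling `balanced_aucs` once per topological measure yields one AUC distribution per measure. Two such distributions could then be compared with a two-sample test, as the caption describes for the random forest results in (D).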