10.1371/journal.pone.0146801.g001 Mariano Rodriguez, M. Dolores Salmeron, Alejandro Martin-Malo, Carlo Barbieri, Flavio Mari, Rafael I. Molina, Pedro Costa, Pedro Aljama Brief Description Random Forest. Public Library of Science 2016 rf pth kdigo New Data Analysis System data analysis system mineral bone disease phosphate 1758 adult HD patients correlation coefficient variable mineral metabolism parameters 2016-01-28 12:41:28 Figure https://plos.figshare.com/articles/figure/_Brief_Description_Random_Forest_/1642030 <p>(<b>A)</b> Dataset. (<b>B)</b> Outcome variable. (<b>C and D)</b> Generation of decision trees. (<b>E)</b> Decision trees. (A) The dataset “X” includes all variables (J-1, J-2, …, J-n), such as serum levels of P, Ca, PTH, alkaline phosphatase, etc. The values of these variables are obtained from all patients included in the study: Pt1, Pt2, Pt3, up to Ptn. The first step is to construct “intelligent” decision trees based on the available data. (B) Consider constructing a decision tree that predicts an output variable Y, which in this case is age. Open and closed circles represent patients aged < 65 and > 65 years, respectively. Decision trees are constructed from the values of many variables in many patients. (C) A set of two variables, J-1 (P) and J-3 (PTH), is chosen at random from a subset of the data that is itself chosen at random from the entire dataset. The values of P are plotted against the corresponding PTH values; open and closed circles represent patients aged < 65 and > 65 years, respectively. The PTH value that best separates the two age groups is 700. (D) The first split of the decision tree is therefore at PTH = 700: all (100%) patients with PTH greater than 700 are younger than 65, and 75% of patients with PTH less than 700 are older than 65. 
The next step is to split each of the two groups further according to another variable chosen at random, which could be serum calcium or any other. (E) This process is carried out on a large number of patients and repeated many times on different random subsets of the data. A large number of decision trees are constructed, as many as needed to obtain reliable predictions. Further information may be found in Antonio Criminisi, Jamie Shotton, Duncan Robertson, and Ender Konukoglu, Regression Forests for Efficient Anatomy Detection and Localization in CT Studies, in B. Menze et al. (Eds.): MCV, MICCAI 2010 Workshop, LNCS 6533, pp. 106–117, 2011. © Springer-Verlag Berlin Heidelberg 2011.</p>
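<p>The procedure in panels C to E can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used in the study: the seven patients are invented toy values, each tree is cut down to a single split, and Gini impurity is assumed as the measure of how well a cut separates the age groups (the caption does not name one). All function names (<code>gini</code>, <code>best_split</code>, <code>fit_stump</code>, <code>fit_forest</code>, <code>predict</code>) are hypothetical.</p>

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature):
    """Return (threshold, weighted impurity) of the best cut on one variable."""
    values = sorted({r[feature] for r in rows})
    best = (None, float("inf"))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate cut midway between two observed values
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

def majority(labels, default):
    """Most common label, or `default` if the group is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(rows, labels, n_candidates, rng):
    """One-split tree: draw variables at random and keep the best cut."""
    overall = majority(labels, None)
    best = None
    for f in rng.sample(sorted(rows[0]), n_candidates):
        t, score = best_split(rows, labels, f)
        if t is not None and (best is None or score < best[2]):
            best = (f, t, score)
    if best is None:  # no drawn variable had two distinct values
        return lambda row: overall
    f, t, _ = best
    left = majority([y for r, y in zip(rows, labels) if r[f] <= t], overall)
    right = majority([y for r, y in zip(rows, labels) if r[f] > t], overall)
    return lambda row: left if row[f] <= t else right

def fit_forest(rows, labels, n_trees=25, n_candidates=2, seed=0):
    """Grow each tree on a random resample (with replacement) of the patients."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        trees.append(fit_stump([rows[i] for i in idx],
                               [labels[i] for i in idx], n_candidates, rng))
    return trees

def predict(trees, row):
    """Majority vote over all trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]

# Invented toy data: NOT values from the study.
rows = [
    {"P": 4.0, "Ca": 9.1, "PTH": 200}, {"P": 5.5, "Ca": 8.7, "PTH": 300},
    {"P": 4.8, "Ca": 9.4, "PTH": 400}, {"P": 5.1, "Ca": 8.9, "PTH": 600},
    {"P": 4.4, "Ca": 9.2, "PTH": 800}, {"P": 5.9, "Ca": 8.8, "PTH": 900},
    {"P": 4.6, "Ca": 9.0, "PTH": 1000},
]
labels = [">65", "<65", ">65", ">65", "<65", "<65", "<65"]

threshold, impurity = best_split(rows, labels, "PTH")
forest = fit_forest(rows, labels)
print(threshold, predict(forest, {"P": 5.0, "Ca": 9.0, "PTH": 950}))
```

<p>The toy data were chosen so that the best cut on PTH falls at 700, with every patient above the cut younger than 65 and three of the four below it older than 65, mirroring panel D. A fuller implementation would keep splitting each group on further randomly drawn variables rather than stopping after one split.</p>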