Precision, recall, and F1 scores for each method, averaged over all datasets (T = 2/3).

Note that the DS results are averaged over only 3 datasets and therefore cannot be fairly compared with the results of the other methods.