|
Abstract: We conducted simulation experiments to study SVM weight-based variable ranking and selection methods using two network structures that are often encountered in biological systems and are likely to occur in many other settings as well. We attempted to recover both causally and non-causally relevant variables using SVM weight-based methods under a variety of experimental settings (data-generating network, noise level, sample size, and SVM penalty parameter). Our experiments show that SVMs produce excellent classifiers that nevertheless often assign higher weights to irrelevant variables than to relevant ones. Applying the recursive variable selection technique SVM-RFE does not remedy this problem. More importantly, when the goal is to identify causally relevant variables, SVM weight-based methods can fail by assigning higher weights to, or selecting (in the context of SVM-RFE), variables that are relevant but not causally so. Furthermore, even irrelevant variables can receive higher weights or be selected more frequently than the causally relevant ones. We show that this problem is not tied to the specific variable selection techniques studied; rather, the maximum-margin inductive bias, as typically employed by SVM-based methods, is locally causally inconsistent. New SVM methods may be needed to address this issue, which is an exciting and challenging area of research. |