The debate over antidepressants, especially SSRIs, has lasted for more than a decade, with the controversy revolving mostly around their efficacy over placebo as a treatment for depression. Unsurprisingly, one obtains conflicting results depending on which primary outcome measure is chosen to operationalize 'treatment effect' and where the threshold for a 'clinically significant' effect size is drawn. Moreover, including only published studies in an evidence base assumed to be wholly unbiased artificially inflates the resulting efficacy claims.
The stakes are high: depression is estimated to be among the costliest of all illnesses, accounting for as much as 12% of the global burden of disease. So far, however, the debate has amounted to nitpicking, missing the forest for the trees; and such poor forestry may end up costing more than the already hefty one eighth suggests. What has not been pointed out before is that something more fundamental is wrong here, something only indirectly reflected in the many points of controversy, and responsible for this and similar debates resting in a state of stalemate and confusion. I argue that this something is the wrong statistical paradigm we have embraced in clinical guideline development. I make my case by identifying two classes of controversy in the antidepressant debate: those related to patient preference and those related to model choice. I indicate how these issues could be resolved were a better framework adopted, and why they cannot even be hoped to be resolved within the present paradigm.
Regarding the first class of issues, I argue that we have been unable to agree on an appropriate operationalization of 'treatment effect', on its threshold of 'clinical significance', or on 'severity of depression', because we have discounted decision theory, and with it the incorporation of patient preference into treatment choice. When it comes to establishing conclusions under any given operationalization of the above by valid argument from premises backed by trustworthy evidence, the present depression guidelines offer next to nothing to hold on to. For valid argument we require the language of decision theory; for a purposeful evidence base we require utilities and outcome measures chosen so as to support rational, patient-centered choice. What we need are predictions of all important treatment outcomes for a patient exchangeable with some subset of the patients in the evidence base, not parameter inferences about summary statistics of treatment effect whose size is interpreted from a vantage point dependent on arbitrary choices of primary outcome measure and of a cut-off value for clinical significance.
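To make the decision-theoretic alternative concrete, here is a schematic statement in notation I introduce purely for illustration (it is not drawn from any existing guideline): let $\mathcal{A}$ be the set of candidate treatments, $y$ the vector of outcomes the patient cares about, $U$ the patient's utility function over those outcomes, and $D$ the evidence base. Rational patient-centered choice then selects

$$a^{*} \;=\; \arg\max_{a \in \mathcal{A}} \mathbb{E}_{p(y \mid a, D)}\!\left[U(y)\right] \;=\; \arg\max_{a \in \mathcal{A}} \int U(y)\, p(y \mid a, D)\, dy,$$

where $p(y \mid a, D)$ is the posterior predictive distribution of outcomes for a patient exchangeable with the relevant subset of $D$. Note that no privileged primary outcome measure and no threshold of 'clinical significance' appear anywhere in this formulation; their work is done by $U$ and $y$.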
Regarding the second class of issues, we have not taken proper account of uncertainty in deriving interval estimates for our already ill-founded meta-analytic summary effects either. Which full probability model to entertain for one's inferences, that is, how to account for publication bias, between-trial heterogeneity, inconsistency, poor-quality trials and the like, is a matter of subjective model choice. The seemingly 'objective' protocol applied across the board rests on classical fixed- and random-effects meta-analysis with a tacit assumption of zero bias. Applying such a protocol means placing all our eggs in a broken basket. I show that the currently favored meta-analytic models rest on false assumptions, yielding biased and overconfident interval estimates. Since the evidence statements depend on whether an arbitrarily chosen threshold for 'clinical significance' falls within the fallaciously inferred interval estimate, the evidence statements, which in turn ground the guideline recommendations, are fallacious too. I expose the 'subjectivity' of the allegedly 'objective' evidence statements by conditioning inferences about meta-analytic summary statistics of treatment effect on a set of plausible candidate models, none of which, importantly, can be considered 'objective' in the sense of being a model choice universally preferred over all others. I apply the framework to an evidence base used in the leading guideline for the treatment of depression. I show how the flexibility of the Bayesian framework enables more credible meta-analytic models and better evidence statements with more intuitive interpretations. The Bayesian paradigm also allows the output of multiple candidate models to be merged into a model-averaged posterior predictive distribution, better calibrated for rational choice than any classical parameter estimator could ever hope to be.
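To illustrate the model-choice point, the following is a minimal sketch, not the analysis carried out in this paper: it fits two candidate Bayesian random-effects meta-analyses in Python with PyMC on hypothetical study-level data, one model adding a crude additive offset standing in for publication bias, and then mixes their posterior predictive draws with hand-picked weights to form a model-averaged predictive interval. The data, priors, and weights are all invented for the example.

```python
# A minimal sketch with invented data: two candidate Bayesian random-effects
# meta-analyses (with and without a publication-bias offset) and a
# model-averaged posterior predictive for the effect in a new trial.
import numpy as np
import pymc as pm

# Hypothetical study-level data: standardized mean differences (negative
# favors drug over placebo) and their standard errors. Purely illustrative.
y  = np.array([-0.30, -0.45, -0.12, -0.50, -0.25])
se = np.array([ 0.12,  0.15,  0.10,  0.20,  0.11])

def fit(bias_adjusted: bool):
    with pm.Model():
        mu  = pm.Normal("mu", 0.0, 1.0)        # mean treatment effect
        tau = pm.HalfNormal("tau", 0.5)        # between-trial heterogeneity
        theta = pm.Normal("theta", mu, tau, shape=len(y))  # trial-specific effects
        # Crude additive stand-in for publication bias: published effects
        # assumed to overstate benefit by ~0.15 SMD (an invented prior).
        bias = pm.Normal("bias", -0.15, 0.10) if bias_adjusted else 0.0
        pm.Normal("obs", theta + bias, se, observed=y)
        pm.Normal("theta_new", mu, tau)        # predictive effect in a new trial
        return pm.sample(2000, tune=1000, target_accept=0.9,
                         progressbar=False, random_seed=1)

idata_plain = fit(bias_adjusted=False)
idata_bias  = fit(bias_adjusted=True)

# Model-averaged posterior predictive: mix draws across the candidate models.
# The weight below is hand-picked for illustration; in practice it might come
# from LOO stacking, marginal likelihoods, or elicited model probabilities.
w_bias = 0.6
d_plain = idata_plain.posterior["theta_new"].values.ravel()
d_bias  = idata_bias.posterior["theta_new"].values.ravel()
n = min(d_plain.size, d_bias.size)
rng = np.random.default_rng(0)
mixed = np.where(rng.random(n) < w_bias, d_bias[:n], d_plain[:n])
print("model-averaged 95% predictive interval:", np.percentile(mixed, [2.5, 97.5]))
```

In a real analysis the mixing weight would of course not be hand-picked; it might come from LOO stacking, marginal likelihoods, or elicited prior model probabilities, which is precisely where the subjectivity of model choice becomes explicit rather than hidden.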
The antidepressant debate is therefore not a singular case with problems particular to it, but rather one sad example of classical statistics, and the evidence-based medicine movement it has hijacked, being unable to deal with either decision or uncertainty. What to measure, how to measure it, and where to draw threshold values cannot be answered without accounting for patient preference. Since parameter and model uncertainty are always present, they must be accounted for one way or another, and the most plausible means offered so far is probability calculus, whose coherent application requires assigning distributions to unknowns that classical statistics treats as fixed. Dealing convincingly with both decision and uncertainty is a prerequisite for any viable framework steering the development of clinical guidelines, which ultimately cannot help but concern decision under uncertainty, whether the application is the treatment of depression or of any other ailment found in DSM-IV or ICD-10, for that matter. The wrong paradigm is bankrupting us and poses a serious threat to the credibility of all applied medicine. Something needs to be done.