ISDS reform and air guitar: A response to Grant and Kieff

In issue 2020(2) of EJIL, we published an empirical study concerning the public perception of investment arbitration. Our article presents the results of four experiments that we conducted to assess which factors most strongly affect the public acceptance of investor-State dispute resolution outcomes. In our study, we tested a number of possible factors that influence public perception – i.e. the institutional design of ISDS, the rights investment arbitration protects, and the inclusion or exclusion of domestic investors from the benefits of ISDS.

Thomas D. Grant and F. Scott Kieff have published a fascinating and thought-provoking response to our study. We are grateful to Grant and Kieff for the attention they have dedicated to our article, and for the points they raise, which allow us to rekindle the EJIL debate on the “experimental turn” in international law scholarship. In this post, we will pick up some of Grant and Kieff’s suggestions, in the hope of encouraging them (and any other interested EJIL reader) to experiment with international law.

The hazards of crate digging

Have you ever browsed through a crate of old vinyl records at a flea market? Sometimes, you give in to the urge to buy an old album from some unknown 1980s Italian synthpop duo, just because you like the cover art. You run home and put the record on. Most of the time, it turns out to be a disappointment: the music is almost unlistenable, and the band was unknown for good reasons. You would have been better off with a less obscure choice (a classic Beatles album, perhaps?). Occasionally, however, you hit the jackpot: the obscure album turns out to be a hidden gem, and it quickly becomes one of your personal favourites.

A similar dynamic might be at play when empirical investigators formulate hypotheses. A wide range of independent variables (in our study, features of investment arbitration) may affect the dependent variable (public acceptance of ISDS). How to choose which variables to test? “The more features addressed in an experiment, the more reliable the experimental results”, Grant and Kieff suggest. We agree: the devil is in the detail. Yet, granularity does come at a cost. As we explain in our article, “(w)hile it is rather obvious that the more questions that are asked, the more accurate the data will be, there is a crucial counterpoint in terms of respondent retention: if the respondents’ interest lapses, there may be no data at all”. Even the most eager music collector cannot buy the entire record store, and choices need to be made. But how?

In a debate such as the one concerning investor-State arbitration, the tension between untenured and tenured adjudicators cannot and certainly should not be overlooked, especially considering the prominence it has gained globally. In no way, however, did we posit that the tenured vs. untenured distinction is “the factor that most strongly affects preference”, as Grant and Kieff put it. Our argument is much humbler: some institutional design features (such as the tenured vs. untenured character of the adjudicators) are likely to influence public perception. That does not exclude the relevance of other factors, which may also affect the public acceptance of ISDS outcomes. In fact, one of our experiments (which has unfortunately not caught Grant and Kieff’s eyes) focuses on one such factor: the problem of access to justice and the consequent exclusion of domestic investors from ISDS.

In short, some of the hypotheses we tested (e.g. the tenured vs. untenured tension) are at the forefront of the current ISDS debate, just like the Abbey Road album may beckon to the collector from the window of a record store. Other hypotheses, such as the exclusion of domestic investors, are often forgotten in a dusty crate. We have tried to strike a balance, testing both some well-known and some more obscure hypotheses. Some, however, is the key word here. Our article expressly acknowledges that there are further hypotheses to test and further reform paths to explore, and invites further empirical investigation: “the insights resulting from the experiments should not be taken as definitive evidence but, rather, as the first step towards a systematic empirical investigation of the public perception of investment arbitration. As always with empirical studies, replication is key to the attainment of robust results”. Far from claiming that our study provides the final answer to all ISDS reform questions, we echo the remarks of Dunoff and Pollack on the importance of replication studies. In short: keep calm, and carry on testing—we all seem to agree on this point. What is more, Grant and Kieff put forth some concrete proposals for future studies, which we found particularly stimulating. Let us have a look at them, and reflect on how our Respondents may go about testing their hypotheses.

Proxy music

Grant and Kieff posit that, if the general public responds more favourably to court judgments than to arbitral awards, that is not because of any institutional design difference between the two dispute resolution settings. On the contrary, according to our Respondents, the difference should be explained entirely in light of the personal attributes that the general public assumes “people appointed to courts have”. Just like a Beatlesian fundamentalist may catch a glimpse of the iconic Abbey Road cover photo, and immediately hear the intro to Come Together play in their head, it is sufficient for the general public to read the word “court”, to immediately assume that the adjudicators of that court possess certain “attributes”—so goes Grant and Kieff’s theory. The word “court”, according to them, functions as a proxy for the adjudicators’ personal features, triggering a subliminal, pleasing music in the heads of the respondents. Arbitration, instead, produces no such “proxy music”.

This is a fascinating hypothesis. It is also a distinctively empirical research question. Therefore, we suggest that Grant and Kieff test it through an experiment, before drawing any inference from it. The hypothesis, in this case, is that the general public’s reaction to a certain dispute resolution outcome does not change, irrespective of whether the decision has been issued by a tenured court or an untenured tribunal, and of the presence or absence of party-appointed adjudicators, as long as the individual adjudicators possess (and are perceived to possess) certain “attributes”. What is not altogether clear from Grant and Kieff is: what are these attributes, which untenured arbitrators and tenured judges alike possess, but which the general public only associates with the latter? Testing the hypothesis empirically would provide Grant and Kieff with an opportunity to elaborate on what the “attributes” are exactly, and how to replicate them across institutional settings. That, in turn, could meaningfully inspire ISDS reform proposals.

We would like to add two further clarifications on this point, which may help Grant and Kieff in their own experimental design. First, our experiments do not speak of “courts” in generic, abstract terms. As detailed in our article, we present our respondents with concrete case studies. Some of them mention domestic courts (the German Federal Constitutional Court). Others refer to international courts (the European Court of Human Rights). Yet others mention fictional adjudicative bodies (the “International Economic Court”). Does the “proxy music” remain the same, irrespective of these differences, just because of the word “court”? This takes us to the second clarification: although our article was written in English, not all of our experiments were conducted in the same language. Our questionnaire was available in five languages and, in some of those languages, “court” and “tribunal” translate into the same word. Is it possible that the same word will produce different proxy effects, depending on contextual reputational factors (but not on institutional design features which, Grant and Kieff hypothesize, should be irrelevant)? For now, this very interesting hypothesis remains untested.

Another source of inspiration for Grant and Kieff, when designing their experiment, may come from the world trade system. The recent developments that led to the Appellate Body paralysis indicate that even States (through their representatives) may dislike and stop using a dispute settlement system that is no longer serving their needs, regardless of the “attributes” arbitrators (in this case Appellate Body members) possess. Grant and Kieff might also consider factoring in the involvement of ICJ judges in investment arbitrations, at least until the decision in 2018 to put an end to this practice. Would the same individual be perceived as possessing different attributes when sitting on the ICJ bench, as compared to when delivering an investment treaty award?

The issue of language allows us to address an important point, which has often been raised during the conferences and seminars at which we have had the opportunity to present our study. As we acknowledged, a limitation of our project is its very broad geographical scope: given the wide range of regional contexts in which our participants were located, as well as the uneven level of respondent participation across the globe, our results remain somewhat EU-focused, and not granular enough to “zoom in” on specific national realities. The proxy effects that Grant and Kieff hypothesize may well exist in some regions of the world, but not in others, depending on a number of factors (e.g. the personal “attributes” of the local judiciary that a certain group of respondents is most familiar with). Further empirical studies on the topic, thus, should not only aim at replicating our experiments and extending the focus to additional hypotheses, but also zero in on narrower communities, constituencies, and regions.

Show me water!

When it comes to experimental methods and international law, Dunoff and Pollack offer a precious warning against the risk of conflating internal validity (the ability to isolate the causal impact of an independent variable on a dependent variable, within the confines of a study) and external validity (the ability to generalize experimental results to more complex real-life situations, outside of the confines of the experiment). Internal validity is of little practical use if the causal relationship demonstrated in an experiment has no real-life equivalent. On this point, Grant and Kieff seem to disagree. In fact, they suggest that the survey respondents should have been invited to treat the case studies as stipulative, i.e. excluding the participants’ “extrinsic understandings of the subject matter”. At the same time, our Respondents acknowledge that this is practically impossible, even in the solemn context of a jury trial (let alone a large-scale experimental study). As a result, Grant and Kieff argue that “extrinsic information or bias might misshape” outcomes. In their view, this is problematic for arbitral tribunals, even though we never stipulate that court members/judges possess particular attributes other than being tenured and affiliated to a (sometimes fictional) court.

In the context of investment policy preferences, we find the label of “bias” inapposite. Those factors that Grant and Kieff call “extrinsic information” and “bias” are, in the realm of politics (unlike in a jury trial), a constituency’s way of political worldmaking. This rich set of preferences and opinions plays a key role in real-life political discourse and moves the ballot box dial. When it comes to ISDS, “extrinsic information” will contribute to shaping the general public’s reaction, when an unfavorable award has real-life effects on a given community, and when the taxpayers of an unsuccessful respondent State will need to foot the bill. Then, if we are to ensure that internal and external validity do not remain divorced from each other, we should take those preferences seriously.

In his inspiring book International Law Theories, Andrea Bianchi echoes Stanley Fish in reminding us that “we are never not in a situation”. At the beginning of his book, Bianchi evokes an old parable about two fish swimming in a pond. One fish tells the other that, apparently, they have been spending their whole life immersed in water, without ever realizing it. “Water?” the other fish asks. “What’s that? Show me water!”

We think that, if Grant and Kieff tested their hypothesis about the “proxy effects” of courts and the “attributes” of individual decision-makers, their results would help us understand the “water” in which public perceptions of ISDS develop. To be sure, the results of any such empirical study cannot be uncritically translated into policy recommendations. Our study does not aim to provide an answer to the thorny question of what relevance public opinion should have in the current process of ISDS reform. The answers to this question will fall somewhere on a wide spectrum. At one far end, we find the position whereby public opinion should have no weight whatsoever (perhaps the water is just too murky for the fish to know where they are going?). At the other far end, we find the opposite standpoint: whatever the general public prefers, that should be translated into international investment policy, never mind technical expertise (never mind how clear, sweet and fresh is the water in which ISDS specialists have been immersed for their whole professional existence). We find both of these extreme standpoints difficult to defend. Rather than purporting to offer a definitive solution to this problem, our study has simply aimed at highlighting that, since public opinion is not irrelevant to the legitimacy of international adjudication, the perceptions of civil society should be taken seriously. Given that the ongoing reform of ISDS is exploring certain directions of travel (like the tenured vs. untenured dichotomy), it was of paramount importance to take them into account when designing our experiments. Our experiments are only a first, modest step in shedding light on how the public is likely to respond to reform inputs. We hope that Grant and Kieff, together with other EJIL readers, will take the next step.

Air guitar and air experiments

In a seminal article, Pierre Schlag compares legal scholarship to “air guitar”, i.e. the practice of “imitating rock stars by pretending to play a non-existent guitar”. According to Schlag’s critical account, law review articles mimic the style of legal briefs and judicial opinions, except that…there is no case. There is no client. It is all “air law”. Now that we start to investigate international law through experimental methods, we should avoid making the same mistake. Untested hypotheses remain “air experiments”, and air guitar, Schlag warns us, is “a practice of dubious value”. Sure, no air guitarist has ever struck a false note. Air guitars, though, produce no real sound.

