Reply to Dunoff and Pollack: ‘Experimenting with International Law’

Written by and

In the last issue of the European Journal of International Law we published an experimental study on the ability of international law students and experts to ignore information in the context of treaty interpretation. The same issue included a follow-up article by Jeffrey Dunoff and Mark Pollack. We find Dunoff and Pollack’s practical exercise of critically reading experimental studies important and helpful in moving the broader methodological and theoretical concerns into a concrete discussion of actual studies. In the following sections, we will try to contribute to this effort by reflecting on their assessment of our study.

The Study

Before delving into Dunoff and Pollack’s discussion of our paper, we would like to briefly summarize our study, which one of us also summarized in this EJIL:Live! interview. Our study was designed to empirically test a notion that has been mentioned in the treaty interpretation literature, which suggests that it is practically impossible to ignore the content of preparatory work after exposure, even when a rule prohibits the use of such material. This notion is supported by studies on the difficulty of ignoring information in other legal contexts, such as exposure to inadmissible evidence. To test this notion’s validity, we conducted three experiments that examined the ability of international law students and experts to ignore information about preparatory work while interpreting treaties. Our findings indicate that experts are better able than students to ignore preparatory work when they believe that the Vienna Convention on the Law of Treaties (VCLT) rules on treaty interpretation do not allow the use of such information. This suggests that there is something unique about international law expertise (or legal expertise in general) that enables the experts to resist the effect of exposure to such information.

In addition, we used the study to gather information on contemporary approaches to the debate about the hierarchy of the VCLT interpretation rules. We found that the vast majority of experts support the “traditional approach” to the rules, which prohibits the use of supplementary means of interpretation – including preparatory work – to determine the meaning of the text if interpretation under the General Rule leads to a clear and reasonable result. International human rights law (IHRL) experts provided the main exception to this finding. Although, like other types of experts, the majority of IHRL experts supported the traditional approach, their support rates were significantly lower than those of other types of international law experts. Almost half of the IHRL experts supported the “corrective approach,” which always allows the use of supplementary means to determine the meaning of the text.  

Dunoff and Pollack raise some concerns regarding our findings and conclusions, to which we will reply in the following section. We begin by addressing the concerns regarding external validity and continue to discuss the internal validity of our study.

External Validity

External validity is the ability to generalize from the findings of a study to real-life settings. This is a serious concern in experimental research, and every experimental study should be read while taking into account its limitations. Dunoff and Pollack focus on three issues: (i) the difference between the scenarios used in the study and real cases; (ii) the potential difference between individual and group decision making; and (iii) the ability to generalize from studies of students’ decision making to experts’ decision making. All three issues were discussed and acknowledged in our paper and, in a way, the external validity section of Dunoff and Pollack’s paper rephrases parts of the general discussion and limitations section of our paper.

The first external validity concern involves the difference between the scenarios in the study and actual cases. While the former feature a clear divergence between the text and the preparatory work, the latter are usually more nuanced. The implication of this difference, according to Dunoff and Pollack’s account, is that in actual cases adjudicators might ignore preparatory material more easily. This might be a valid concern if our study suggested that preparatory work affects those experts who try to ignore it. Since we did not find any effect of the preparatory work on such experts, it is hard to see the relevance of this concern to our case. This point demonstrates the importance of taking into account the direction of the findings when discussing external validity concerns.

Second, Dunoff and Pollack refer to the difficulty of generalizing from individual to group decision making, taking into account that in many cases treaty interpretation involves a collective decision-making setting. They do not, however, explain how the potential differences between individual and group decision making can influence our results. We explicitly addressed this concern in our limitations section, by mentioning that we believe that group decision making is not likely to change one’s ability to ignore preparatory work since in other contexts it has a moderating effect on the difficulty of ignoring information. However, we raised the possibility that it might influence the positions in the hierarchy debate. In any case, our study is the first study on the subject, and naturally future studies should examine group decision making, and could contribute to our understanding of the ability to ignore information in the context of legal interpretation.   

Lastly, Dunoff and Pollack question the ability to generalize from students to international judges. We completely agree that this is an important concern in empirical legal studies, and see the difference between students and experts as one of the main contributions of our study. Therefore, it is not clear to us why Dunoff and Pollack classify this point as one of the “three external validity challenges that Shereshevsky and Noah’s studies share with other experimental studies of international law.” As they subsequently rightly acknowledge, our study actually involved both students and international law experts. As we state in our paper, and as Dunoff and Pollack acknowledge, we believe that our findings call for caution in using students as proxies for legal experts when it comes to tasks that are unique to legal expertise.

Nevertheless, we acknowledge in the paper that our study involves a similar external validity limitation that was not discussed by Dunoff and Pollack. It relates to the ability to generalize from the experts who participated in our study to international judges. Because the interpretation of international law is a task that is not unique to judges, our study did not focus on international judges but on international experts more generally. However, judicial interpretation is highly important in international law, and only a small minority of our participants were international judges or arbitrators. While we explain in the paper why we think that our experts might be good proxies for international judges, we believe that future studies that use international judges will be highly beneficial.  

Internal Validity 

Dunoff and Pollack also address the internal validity of our study. Specifically, they raise concerns regarding whether our research design was indeed sufficient to demonstrate the effect of exposure to preparatory work on the decisions of the participants.

The first issue that Dunoff and Pollack address is the decision to design the experiment with one group that was exposed to preparatory work and a control group that was not exposed to preparatory work. They suggest that since interpreters are usually exposed to such materials in practice, our question “possesses limited real-world purchase.” Instead, they suggest that the more relevant question is whether “those who think they are permitted to use these materials for interpretive purposes read the treaty differently than those who think that they are not permitted to use preparatory materials.” This concern is puzzling for several reasons. First, we actually used the suggested research design in Experiment 1B: in this experiment, all participants were exposed to the preparatory work, and the two groups differed in only one respect: whether they were allowed to use the preparatory work or not. Dunoff and Pollack refer to this experiment in their analysis as similar to the first experiment, without acknowledging that Experiment 1B was designed as they suggested. Second, and more importantly, the best way to assess the influence of exposure to preparatory work is to manipulate the exposure to preparatory work. A design in which both groups are exposed to preparatory work, as Dunoff and Pollack recommend, enables one to test the effect of the exposure to preparatory work only in cases where the two groups – those who think that the use of preparatory work is permitted and those who think that it is forbidden – differ from each other. In the case where there is no significant difference between the two groups, it is impossible to infer whether the exposure had no effect on participants’ decisions, or whether the effect was similar in the two groups. 
Moreover, even in cases where such an effect is found, it is impossible to infer whether both groups were influenced by the preparatory work (with a difference in the strength of its influence) or only one of the groups was. The design we used in our studies enables one to test both the influence of the exposure to preparatory work, and the moderating effect of whether the participants believed that its use is allowed or forbidden. Finally, the concern that our study has limited “real-world purchase,” which is a concern of external validity (rather than internal validity), does not seem warranted since their concern regarding the lack of “real-world purchase” is relevant only to the control group. The control group’s aim is only to make sure that we indeed isolate the effect of the exposure and not to demonstrate “real-world purchase.” The relevant group, the experimental group, was exposed to preparatory work similarly to individuals in real cases.   
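
The identification argument above can be illustrated with a minimal sketch. All percentages below are hypothetical numbers invented for illustration; they are not data from our study, and the function and scenario names are assumptions introduced only for this example. The sketch shows that a design in which everyone is exposed yields only the gap between the "allowed" and "forbidden" groups, and that the same gap is compatible with quite different exposure effects, which only a control group can tell apart.

```python
from typing import Dict, Tuple

# (exposure condition, participant's belief about using preparatory work)
Cell = Tuple[str, str]
# Hypothetical % of participants finding a violation in each cell (invented numbers).
Rates = Dict[Cell, int]


def make_rates(unexp_allowed: int, unexp_forbidden: int,
               exp_allowed: int, exp_forbidden: int) -> Rates:
    """Build the four cells of an exposure-by-belief design."""
    return {
        ("unexposed", "allowed"): unexp_allowed,
        ("unexposed", "forbidden"): unexp_forbidden,
        ("exposed", "allowed"): exp_allowed,
        ("exposed", "forbidden"): exp_forbidden,
    }


def belief_gap_among_exposed(rates: Rates) -> int:
    """Allowed-minus-forbidden gap: the only contrast an all-exposed design offers."""
    return rates[("exposed", "allowed")] - rates[("exposed", "forbidden")]


def exposure_effect(rates: Rates, belief: str) -> int:
    """Exposed-minus-unexposed gap within one belief group; needs a control group."""
    return rates[("exposed", belief)] - rates[("unexposed", belief)]


# Four hypothetical worlds, chosen so that pairs of them look identical
# when only the exposed participants are observed.
scenarios: Dict[str, Rates] = {
    "no effect on anyone": make_rates(60, 60, 60, 60),
    "equal effect on both groups": make_rates(60, 60, 40, 40),
    "effect on 'allowed' group only": make_rates(60, 60, 40, 60),
    "effect on both groups, unequal": make_rates(70, 70, 40, 60),
}

for name, rates in scenarios.items():
    print(
        f"{name}: gap among exposed = {belief_gap_among_exposed(rates)}, "
        f"exposure effect (allowed) = {exposure_effect(rates, 'allowed')}, "
        f"exposure effect (forbidden) = {exposure_effect(rates, 'forbidden')}"
    )
```

In this sketch the first two worlds share a zero gap among exposed participants, and the last two share the same non-zero gap, yet the exposure effect on those who believe the use is forbidden differs within each pair. Only the comparison with unexposed participants separates them.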

Dunoff and Pollack’s second internal validity concern is based on what they describe as a “puzzling anomaly” in which, among participants who were not exposed to preparatory work, there was a difference between the decisions of those who reported that they were allowed to use the materials and those who reported that such use was not allowed. According to Dunoff and Pollack, this challenges the attribution of the difference in decisions of participants to exposure to preparatory work. Dunoff and Pollack raise two concerns: the first is the possibility that posing the questions regarding the interpretive position after the decision in the scenarios might have influenced participants’ answers as a result of attempts to justify their decisions post hoc. The second is that the difference between the groups is a result of their broader approaches to legal interpretation.

It is important to first note that Dunoff and Pollack’s concern does not affect the most important findings of the study regarding the ability of international law experts who hold to the traditional approach to better resist the influence of exposure to preparatory work compared to students who hold a similar interpretive attitude. Before addressing the two challenges, we want to suggest a much simpler explanation of the “puzzling anomaly.” Under the VCLT rules, even proponents of the traditional approach allow the use of preparatory work to determine the meaning of the text if interpretation under the General Rule leaves the meaning ambiguous or leads to a manifestly absurd result. Since our goal was to examine the influence of preparatory work when its use is prohibited, our groups were not divided merely according to their general support of the traditional or corrective approach, but rather according to whether they thought that in the specific scenario the use of preparatory work was allowed or prohibited. This means that in addition to those who belonged to the corrective approach, the first group included those who belonged to the traditional approach but believed the text was ambiguous. The other group included only those who held the traditional approach and in addition believed that the meaning was clear. Thus, it is reasonable to assume that the main explanation for the “puzzling anomaly” is not the general interpretive approach, but specifically the position regarding the ambiguity of the specific case. Indeed, the direction of participants’ decisions further supports this hypothesis: those who determined that the text was clear more often found that there was a violation of the relevant norm, in line with the direction that the plain meaning of the text seemed to suggest.

Of course, explaining the anomaly does not excuse us from addressing the two challenges. As to the first challenge regarding the danger of insincere answers, we want to start by demonstrating that this concern does not explain the “puzzling anomaly.” To explain why, let us further develop Dunoff and Pollack’s suggestion of the potential influence of participants’ decisions on their reported attitudes. The only group that potentially faced a contrast between the decision and their broader interpretive attitude consisted of the participants who were exposed to the preparatory work and in addition held to the traditional approach and thought that the text was clear, but realized ex post that their decisions had been influenced by the preparatory work. All other groups were either not exposed to the materials or believed that its use was allowed. However, the “puzzling anomaly” that Dunoff and Pollack address concerns only participants who were not exposed to preparatory work, and thus were not part of those who allegedly falsely reported their positions in the hierarchy debate.

As to the concern itself, we have strong reasons to believe that participants’ reports on their interpretive attitude were not influenced by their decisions in the scenarios. To justify their decisions, these participants should have reported that they believed that they were allowed to use the preparatory work, in contrast to their genuine position. However, there is an easier and less burdensome way for these participants to justify their decisions: to determine that the text is ambiguous. As was explained above, even the traditional approach allows the use of preparatory work when the meaning is ambiguous. This does not require them to report a different interpretive approach than their actual position in the hierarchy debate, but only an ambiguity in the specific case. Thus, if Dunoff and Pollack’s concern is valid, we would have expected to see that participants’ determinations regarding the ambiguity of the text were influenced by exposure to preparatory work. However, as we report in the paper, we examined whether the exposure to preparatory work affected the determinations of ambiguity (in order to examine a potential indirect and unintentional effect of the exposure), but did not find such an effect. In addition, each participant received one scenario with preparatory work and one scenario without it and, as a result, all participants were exposed to the preparatory work before addressing their position in the hierarchy debate. This further decreased the need to report a false general position rather than the actual position regarding the specific case. In addition, the study was completely anonymous; thus the participants’ incentives to falsely report their position seemed very low. To sum up, it does not seem plausible to us that the participants falsely reported their positions only with regard to their general position in the hierarchy debate (which is more demanding for them compared to determinations of ambiguity).

Unlike with the other concerns, we cannot rule out the possibility that there is something qualitatively different between those who hold the traditional approach and those who hold the corrective approach, and further research of this question would be welcome. However, we are skeptical of the significance of such potential differences to the results of our study. If the different decisions were due to the different interpretive attitudes of the participants, we would have expected to see no significant difference between the decision patterns of students and experts that hold the same interpretive approach (or alternatively, we would find significant differences between all groups of students and experts if, in addition to the differences that result from general attitude, students and experts hold qualitatively different interpretive approaches). Nonetheless, we found a significant difference only between the decisions of students and experts who determined that they were not allowed to use preparatory work, and no such difference was found between students and experts who determined that such use was allowed. This suggests that the ability to resist the influence of preparatory work is the main difference between the two groups, rather than qualitatively different interpretive approaches.

We want to conclude by thanking Professors Dunoff and Pollack for taking the time to engage so closely with our paper. It is truly a blessing to know that our scholarship has an audience. We hope that this exchange contributes to Dunoff and Pollack’s overarching goal of enabling international lawyers to become “knowledgeable and critical consumers of experimental research.”



John R Morss says

April 6, 2018

Just a comment on the process for this important debate -- this forum seems to me indeed ideal for such follow up discussion/development. Once in a while the responses and responses to responses in the printed pages of EJIL seem in that sense somewhat misplaced especially as EJIL pages are at such a premium for non-invited contributions (I still feel the pain of cutting 3000 words of glorious prose).