The Data-Driven Future of International Law

Data is not only fueling the economy but has also become an increasingly important driver of empirical legal research. Three reasons are chiefly responsible for this. First, the internet, better search engines and bigger databases today put more international law data, from treaties to disputes and arbitrators, at a scholar’s disposal than ever before. Second, researchers are beginning to treat the primary material of law – legal texts – as data. By conceiving of text as data and transforming it into a numerical representation using natural language processing techniques, scholars can analyze more written material than they could ever read. Third, neighboring disciplines, including legal informatics, computer science and the digital humanities, provide international lawyers with new tools for digesting large amounts of legal data, including machine learning and artificial intelligence.
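To make the idea of "text as data" concrete, here is a minimal sketch of one common way to turn text into numbers: a document-term matrix of word counts. It is an illustration only, not the pipeline used by any contribution to the Special Issue. The function names and the two sample clauses are invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, rows); each row holds one document's word counts."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct word seen in any document, sorted.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words that a document does not contain.
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


# Two invented clauses, used purely for illustration.
clauses = [
    "Each Party shall accord fair and equitable treatment.",
    "Each Party shall accord national treatment.",
]
vocabulary, rows = document_term_matrix(clauses)
```

Each clause becomes a row of counts over a shared vocabulary. Once texts are in this form, they can be compared, clustered or fed into statistical models, which is what allows analysis at a scale no reader could match.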

In a Special Issue of the Journal of International Economic Law, we are beginning to explore this new data-driven frontier in empirical legal scholarship. We have been fortunate to assemble strong contributions that engage with major international economic law debates through a data-driven lens using state-of-the-art empirical techniques. In this blog post, we want to set out the main issues that, we believe, are raised by this new frontier of empirical scholarship.

What is different in data-driven research?

Three aspects distinguish data-driven empirical scholarship from traditional approaches. First, data is no longer only a means to empirically test a theory, but increasingly the starting point for empirical research. Big data science allows for the inductive analysis of large amounts of data to detect patterns and trends that we did not know existed, or did not expect beforehand, by letting “data speak for itself”. Results generated through this “data-first” attitude can then be used to test established theories or build new ones.

Second, the availability of more information and more efficient means for its analysis allows researchers to investigate entire data populations rather than subsamples. Even though datasets will not always be complete, as some investment awards will remain secret and some treaties unpublished, access to more comprehensive data will improve the accuracy of results and reveal patterns visible only in the aggregate.

Third, in analyzing this new wealth of data, researchers are increasingly relying on computing rather than reading or counting. There is a shift away from traditional content analysis, which employs an “army” of research assistants to read and hand-code documents, towards using computers and artificial intelligence to make sense of texts. While machines are better than humans at spotting patterns across large amounts of text (which is why we use them in plagiarism detection software, for instance), they are worse than humans at resolving interpretive ambiguities. Human coders are thus not going to be completely replaced any time soon, but computers and artificial intelligence are beginning to play a larger role in legal document analysis.

The promises of data-driven research

Data-driven empirical research promises to uncover latent patterns in international law data, debunk past myths and forecast the future, while contributing to new theory-building, as the contributions to the JIEL Special Issue show.

Behn et al., for instance, map the network of investment arbitration practitioners and quantify a phenomenon that is otherwise discussed only in the abstract: double-hatting, where a lawyer acts as both arbitrator and counsel in concurrent, unrelated proceedings. Charlotin similarly puts hard figures on an abstract debate. A dataset of 75,000 citations of international courts and tribunals, compiled for this Issue, puts numbers on international law’s fragmentation by measuring the degree of cross-citation between international economic law tribunals and other international adjudicatory institutions.
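As a rough illustration of how double-hatting could be detected in appointment data, the sketch below flags anyone who sits as arbitrator in one case while acting as counsel in a different case whose dates overlap. This is a simplified stand-in, not the method of Behn et al. The `Appointment` record, the role labels and all names and years are hypothetical.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical record layout: who acted in which case, in what role, and when.
Appointment = namedtuple("Appointment", "person case role start end")


def overlaps(a, b):
    """True if the two appointments share at least one year."""
    return a.start <= b.end and b.start <= a.end


def find_double_hatters(appointments):
    """Return the sorted names of people who double-hat across concurrent cases."""
    by_person = {}
    for appt in appointments:
        by_person.setdefault(appt.person, []).append(appt)

    double_hatters = set()
    for person, appts in by_person.items():
        for a, b in combinations(appts, 2):
            different_cases = a.case != b.case
            both_roles = {a.role, b.role} == {"arbitrator", "counsel"}
            if different_cases and both_roles and overlaps(a, b):
                double_hatters.add(person)
    return sorted(double_hatters)


# Invented appointments, used purely for illustration.
appointments = [
    Appointment("Lawyer A", "Case 1", "arbitrator", 2010, 2013),
    Appointment("Lawyer A", "Case 2", "counsel", 2012, 2014),
    Appointment("Lawyer B", "Case 1", "counsel", 2010, 2013),
    Appointment("Lawyer B", "Case 3", "counsel", 2011, 2012),
    Appointment("Lawyer C", "Case 2", "arbitrator", 2012, 2014),
    Appointment("Lawyer C", "Case 4", "counsel", 2016, 2017),
]
double_hatters = find_double_hatters(appointments)
```

In this toy data only Lawyer A qualifies: Lawyer B never sits as arbitrator, and Lawyer C's two roles do not overlap in time. Real data would also need a defensible rule for when proceedings count as concurrent, which is a substantive choice rather than a coding detail.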

Allee et al. debunk a different type of myth by challenging the view that preferential trade agreements (PTAs) and the World Trade Organization (WTO) are competing projects on trade regulation. Using textual similarity metrics, they show that references to WTO agreements and the incorporation of WTO language in PTAs have increased rather than decreased over time. Indeed, the countries most forcefully pursuing PTAs are also the ones that link their treaties most explicitly to the WTO. Daku and Pelc use similar techniques to compare party submissions in WTO disputes with Appellate Body reports in order to track parties’ influence over dispute settlement outcomes. They find that WTO members with less legal capacity also have less of an impact on the precedents that shape WTO jurisprudence.
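The post does not say which textual similarity metrics these studies use. As one simple possibility, the sketch below computes Jaccard similarity over word trigrams: the number of three-word sequences two texts share, divided by the number of distinct three-word sequences found in either. The choice of metric, the function names and the sample sentences are all assumptions made for illustration.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of n-word sequences (as tuples) occurring in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(text_a, text_b, n=3):
    """Shared n-grams divided by all distinct n-grams; 0.0 if neither text has any."""
    a, b = word_ngrams(text_a, n), word_ngrams(text_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two invented treaty-style sentences that differ in a single word.
goods_clause = "Each Party shall accord national treatment to the goods of the other Party"
services_clause = "Each Party shall accord national treatment to the services of the other Party"
similarity = jaccard_similarity(goods_clause, services_clause)
```

A score of 1 means the two texts contain exactly the same trigrams, and a score of 0 means they share none. Because a single changed word disrupts every trigram that contains it, the measure is sensitive to copied passages, which is the kind of borrowing of language that such comparisons aim to detect.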

On the more conceptual side, Derlén and Lindholm show how citation analysis can be used to track the importance of precedent over time using the European Court of Justice’s decisions that shaped the European Single Market as case studies. Broude et al. empirically operationalize the idea of regulatory space in investment agreements and compare the Trans-Pacific Partnership to overlapping treaties on that ground. Finally, Morin et al. argue that the universe of trade agreements can be understood as a complex adaptive system and support that claim by empirically tracing innovation and adoption of environmental provisions in trade agreements.
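To give a sense of what citation analysis over time involves, the following sketch counts how often a given judgment is cited in each year. This is only the simplest possible proxy for the importance of a precedent and is not presented as the approach of Derlén and Lindholm, who may rely on richer network measures. The tuple layout and the judgment labels are hypothetical.

```python
from collections import Counter


def citations_per_year(citations, precedent):
    """Count citations to `precedent` by the year of the citing decision.

    `citations` is an iterable of (citing_case, citing_year, cited_case) tuples.
    """
    counts = Counter(year for _, year, cited in citations if cited == precedent)
    return dict(sorted(counts.items()))


# Invented citation records, used purely for illustration.
citations = [
    ("Judgment D", 1990, "Judgment A"),
    ("Judgment E", 1990, "Judgment A"),
    ("Judgment F", 1995, "Judgment B"),
    ("Judgment G", 2001, "Judgment A"),
]
timeline = citations_per_year(citations, "Judgment A")
```

Plotting such a timeline for several judgments shows which precedents keep being relied upon and which fade, though raw counts say nothing about whether a citation approves, distinguishes or rejects the earlier decision.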

The challenges and limitations of data-driven research

New data and new tools thus provide exciting new opportunities for empirical analysis that would have been impossible at this scale or depth using more traditional methods. At the same time, they also come with challenges and limitations.

Challenges span the entire life-cycle of data-driven research. It is often difficult to obtain machine-readable data at the outset, and those who have it may not be willing to share it. Once data is obtained, most legal researchers lack the methodological training to fully exploit it. And even when research is ready for dissemination, outlets may be hesitant to publish work that is descriptive rather than normative. For data-driven research to thrive, empirical legal scholars thus need to do three things: collaborate more closely in building and disseminating joint datasets; work together with other disciplines, in particular computer science, to benefit from their complementary skillsets; and prepare legal research outlets for more data-intensive scholarship, including by broadening the pool of reviewers and putting in place mandatory data publication conventions.

Even when these challenges are overcome, some limitations remain. Data-driven research is particularly prone to being mistaken for theory-less research, where data not only speaks but also thinks for itself. That is why researchers engaged in data-driven work need to be careful to separate pattern from noise and to complement elaborate quantitative tools with equally elaborate qualitative evidence backed up by sound theory.

Furthermore, data-driven research such as text-as-data analysis or network analysis is especially prone to drawing wrong conclusions from skewed data. Think of a research project that only looks at English-language treaties because of their greater availability. How generalizable are its results? Or consider a network of cross-appointments of arbitrators where links are missing because cases remain secret. Data-driven research is thus particularly sensitive to the quality of the underlying data.

The time is ripe

In spite of these challenges and limitations, we still believe that the time is ripe for a greater role of data-driven research in empirical legal scholarship. Important normative questions, from international law’s fragmentation to the double-hatting of arbitration practitioners and legal innovation in trade agreements, can finally be tackled through empirical research. And data-driven research provides new opportunities not only for legal scholars but also for practitioners, who can benefit from big data research when it is disseminated through dedicated websites and applications.

We thus hope that this Special Issue will help introduce this emerging field and its growing cohort of computer-savvy scholars to a wider range of legal researchers and practitioners.


Rossana Deplano says

July 25, 2017

Wonderful post, thanks!

Dan Joyner says

July 25, 2017

I remember reading the quantitative/qualitative methodology debates from twenty years ago in political science research methodology when I was doing my master's. We're just now starting to have the same debates in international legal scholarship. As in any other academic discipline, empirical, data-driven analysis is a tool that can be used very effectively on some questions and not effectively on others. It makes sense to me that international trade and investment law, an area with a lot of data based on many similar transactions, is a fruitful area for the application of empirical methodology. Other areas with lower "n" values, like use of force law or international humanitarian law, will likely be less effectively studied with empirical methods. Understood within its context of usefulness and the limitations thereof, as the authors make a point of explaining, the broader use of empirical methodology in international legal scholarship should be welcomed.

Kriangsak Kittichaisaree says

July 26, 2017

In my work, the quantitative/qualitative methodology is relevant to the identification of a rule of customary international law, i.e. identifying State practice and the corresponding opinio juris. NGOs like Amnesty International, Human Rights Watch etc. have compiled loads of data, but one needs to put them in their proper context as part of an empirical analysis. Hence the practical usefulness of data-driven research.

A. Fisher says

July 26, 2017

I appreciate the efforts made by this Special Issue of JIEL to bring together works and resources that apply quantitative methods in studying international law. However, I do not share the authors’ eager embrace of the data-driven approach.

First, there is a general caveat against the methodology-driven approach in social science research. Good research starts with the identification of important questions, not expedient datasets or new tools. The authors correctly point out that big data helps uncover patterns that were not apparent previously. But novel patterns do not necessarily lead to meaningful questions. To suggest that a data-driven approach should be widely employed carries the risk of missing the forest for the trees. It is important to keep in mind that the research question comes before the method, not the other way around.

Second, the statistical analysis is conducted in a vacuum, detached from the case law. For example, the article on regulatory space assigns values to the different elements of an investment agreement without referring to their practical relevance in adjudication. In investment law jurisprudence, Fair and Equitable Treatment (FET) tends to take center stage, while MFN is materially less relevant. Yet the article simply assigns similar values to the two to build the dependent variable. While the big data approach makes comparison of treaty texts much easier, it remains a formidable challenge to quantify the highly inconsistent case law. The interpretive techniques, arguably the most powerful weapon of international lawyers, would be even more difficult to fit into a neat regression equation. Doing empirical research on international economic law is more than replacing the seemingly overly doctrinal case law with misleading numbers in rudimentary models.

Wolfgang Alschner says

July 26, 2017

Thank you all for your thoughtful reactions to our post. As many of you rightly point out, theory remains crucial even as we gather more data on international law. Similarly, in-depth qualitative research can get at issues that escape the bird’s eye view of quantitative methods.

I just wanted to briefly react to the intriguing first point raised in A. Fisher’s comment, that research questions come before the method. I agree in principle that the type of research question determines what tools are needed to resolve it. But where do our research questions come from in the first place? One important source is theory. Theory yields hypotheses that we can test empirically. But another source is data. By exploring large amounts of data, we can discover interesting patterns we did not expect, or variation that existing theories cannot explain, which then motivate our research questions. I fully agree that “novel patterns do not necessarily lead to meaningful questions” – but sometimes they do. And it is often only by looking at how the real world behaves that we discover the questions worth asking.

So as international law data becomes more and more abundant, I believe that data-driven exploratory work becomes more important as well, since new data and new methods can give rise to new and interesting research questions.