Since 2014 we, the core team of the Interdisciplinary Research Group Computer Assisted Legal Linguistics (CAL2) under the direction of the linguist Friedemann Vogel and the lawyer Hanjo Hamann, have been developing the first reference corpus of legal language (“Juristisches Referenzkorpus”, JuReko). The project is supported by the Academy of Science (Germany) for three years. Its first aim is to create a large collection of all relevant text types of German law, covering the following three main domains:

  • all statutes of national law (legislation, recorded at a single point in time);
  • decisions and opinions of all federal courts and of a selection of courts at different instances (case law);
  • commentaries, legal papers and articles of academic legal discourse published in the most important and highly ranked law journals.

We cover all areas of law, for example finance law, labour law and constitutional law. JuReko is thus a specialised corpus: it does not sample all texts of a given language, but only those text types that serve analysis in legal linguistics.

With the help of XSL transformations, all texts will be converted to TEI P5 conformant XML. This de facto standard is used because it comes with “comprehensive Guidelines […] and a large helpful community” (Stührenberg 2012: 10). We separate metadata (such as title, reference details, court instance etc.), which is stored in a relational MySQL database, and citation information in footnotes or references (especially in academic texts) from the main texts, which we annotate with part-of-speech information. So far we have collected 43,000 texts of academic legal discourse (~150 M tokens), about 370,000 texts of case law (~800 M tokens) and about 6,300 statutes (~2.3 M tokens).
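The separation of metadata from annotated main text described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: it builds the XML with Python's standard `xml.etree.ElementTree` instead of XSL transformations, uses an in-memory SQLite database as a stand-in for MySQL, and all table names, column names, function names and the sample sentence are hypothetical. The part-of-speech labels follow the STTS tagset, which is also an assumption, as is the choice of `<w pos="…">` for token annotation.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag):
    """Return a namespace-qualified TEI element name."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title, tagged_tokens):
    """Build a minimal TEI-style tree from (token, pos) pairs (hypothetical layout)."""
    root = ET.Element(tei("TEI"))
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Illustrative example"
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = "Hypothetical source"

    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    paragraph = ET.SubElement(body, tei("p"))
    for token, pos in tagged_tokens:
        # One <w> element per token, carrying its part-of-speech label.
        word = ET.SubElement(paragraph, tei("w"), {"pos": pos})
        word.text = token
    return root


def store_metadata(conn, doc_id, title, court_instance):
    """Keep metadata in a relational table, separate from the XML text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(doc_id TEXT PRIMARY KEY, title TEXT, court_instance TEXT)"
    )
    conn.execute(
        "INSERT INTO documents VALUES (?, ?, ?)",
        (doc_id, title, court_instance),
    )
    conn.commit()


# Usage with made-up data: metadata goes to the database, text to TEI-style XML.
conn = sqlite3.connect(":memory:")
store_metadata(conn, "doc-001", "Beispielurteil", "federal court")
tree = build_tei(
    "Beispielurteil",
    [("Die", "ART"), ("Klage", "NN"), ("wird", "VAFIN"), ("abgewiesen", "VVPP")],
)
xml_string = ET.tostring(tree, encoding="unicode")
```

Keeping the metadata in a table keyed by a document identifier means the corpus can be filtered (for example by court instance) without parsing the XML files, while the annotated text itself stays in a single interchange format.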

JuReko will be the first representative data basis for global quantitative analysis and computer assisted legal linguistics, addressing several levels of linguistic description as well as the social structure of law. In September 2015 we also began to expand the core corpus of German law with texts of British case law. In this additional project and pilot study we will compare speech patterns of German and British labour law to explore commonalities and differences in European legal language. From this perspective, JuReko should be the starting point for developing a European Law Corpus as common ground for comparative studies in legal linguistics.

To discuss related questions we held an international conference on “the fabric of law and language” in Heidelberg in 2016 (Academy of Science, Germany).