Select an English corpus (333.6 million words in all):
Europarl (ca. 25.7 mill. words, no password)
Wikipedia A (ca. 35.3 mill. words, no password)
Wikipedia B (ca. 40.7 mill. words, no password)
Wikipedia C (ca. 39.1 mill. words, no password)
BNC-written (50.8 mill. words, 57.6 mill. tokens, password)
BNC-spoken (20.2 mill. words, 23 mill. tokens, password)
Chat corpus (ca. 23.5 mill. words, no password)
KEMPE (8.9 mill. words, 10.7 mill. tokens, no password)
E-mail corpus (2.7 mill. words, 3.3 mill. tokens, password, SDU only)
E-mail openings (110,000 words, 127,000 tokens, password, SDU only)
Enron e-mails A (ca. 27.5 mill. words, 32.1 mill. tokens)
Enron e-mails B (ca. 27.5 mill. words, 32.1 mill. tokens)
Enron e-mails C (ca. 27.5 mill. words, 32.1 mill. tokens)
Case insensitive
Diacritics insensitive