The Czech National Corpus: research infrastructure for empirical language-oriented inquiry


The Czech National Corpus (CNC) project, established in 1994, aims to map the Czech language continuously along all available dimensions: time, region and genre. The CNC builds and makes available large electronic text collections (language corpora) that serve as a basis for research on contemporary Czech (both written and spoken), historical Czech and other languages. It also develops methodology for empirical linguistic research and tools for exploring language corpora.


Since 2012, the CNC has been recognized as a research infrastructure for empirical language-oriented inquiry in many fields of the social sciences and humanities (esp. linguistics, psychology, sociology, history and NLP). Thanks to its large, high-quality language resources, the CNC is a sought-after partner in many international research projects. Beyond these activities, the CNC offers consulting, analyses for research or popularization purposes, data packages for research on Czech, as well as on other languages for contrastive research, and automatic text processing.

Key collaborators

Selected outputs

  • Čermáková, A., Chlumská, L., Malá, M. (eds): Jazykové paralely. NLN. Praha 2016.

  • Petkevič, V.: Morfologická homonymie v současné češtině. NLN. Praha 2014.

  • Čermák, F., Křen, M. (eds): A Frequency Dictionary of Czech: Core Vocabulary for Learners. Routledge. London 2011.


Last change: September 8, 2017 12:04 