Research Grants

 

Analysis of the syntactic complexity of texts of the corpus CzeSL-SGT
SGS08/FF/2023, University of Ostrava (2023)
Miroslav Kubát (team leader)
Radek Čech, Michaela Hanušková, Michaela Nogolová (team members)
 
The project will focus on the analysis of the development of the syntactic complexity of the texts of non-native Czech speakers across different language levels (A1-C1 according to CEFR). Syntactic complexity will be measured by two methods: a) mean dependency distance (MDD) and b) mean length of linear dependency segments (LDS). The research will be based on the student corpus CzeSL-SGT, which contains more than 8 000 texts written by non-native speakers of Czech. This project is closely related to the dissertation of M. Hanušková focused on the quantitative analysis of the texts of non-native speakers of Czech and also to the dissertation of M. Nogolová dealing with the quantitative analysis of syntactic units. 
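As a rough illustration of the first measure, the sketch below computes mean dependency distance for a single sentence. It assumes the common definition (the average absolute difference between the linear positions of a dependent and its head, with the root excluded) and a CoNLL-U-style list of head positions. The function name and the input format are illustrative assumptions, not the project's actual implementation.

```python
def mean_dependency_distance(heads):
    """Return the mean dependency distance of one sentence.

    heads[i] is the 1-based position of the head of token i + 1;
    0 marks the root, which has no head and is skipped.
    (Assumed input format, modelled on the HEAD column of CoNLL-U.)
    """
    distances = [
        abs(head - position)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        return 0.0  # a one-word sentence has no dependency links
    return sum(distances) / len(distances)


# "The dog barked loudly": The -> dog, dog -> barked, barked = root, loudly -> barked
example_heads = [2, 3, 0, 3]
example_mdd = mean_dependency_distance(example_heads)
```

For the example sentence every dependent is adjacent to its head, so the value is 1.0; longer-range links raise the mean.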
 

 

Quantitative Syntactic Stylistics of Contemporary Written Czech
GA22-20632S, Czech Science Foundation GAČR (2022-2024)
Miroslav Kubát (team leader)
Xinying Chen, Radek Čech (team members)
 
The project focuses on syntactic features of different styles in contemporary written Czech. The research is based on the quantitative analysis of SYN2020, a syntactically annotated, representative corpus of contemporary written Czech that consists of 100 million tokens. Various syntactic features are analyzed, such as mean sentence length, sentence types, word order, modality, the distribution of syntactic functions, and indicators of attributivity and subjectivity. Our aim is to enrich previous research on the style of Czech texts with new perspectives. First, we focus on the syntactic part of Czech stylistics, which usually lies outside the main interest of scholars. Second, our analysis is based on quantitative methods.

 

 

Quantitative analysis of texts of CzeSL-SGT corpus
SGS06/FF/2022, University of Ostrava (2022)
Miroslav Kubát (team leader)
Radek Čech, Michaela Hanušková, Michaela Nogolová, Markéta Guńková (team members)

The project will focus on the quantitative analysis of the texts of the CzeSL-SGT corpus in order to obtain data on texts of individual language levels, to model the development of these texts and to analyze the process of learning Czech as a foreign language. This corpus contains over 8000 texts written by learners of Czech as a foreign language at all language levels. We will analyze the texts using the QuitaUP and UDPipe software, which allow us to compute various properties of the texts. In particular, we will be interested in the average length of tokens, the descriptivity of the text, verb distances, sentence length, lexical richness, the number of clauses in a sentence, and syntactic characteristics of dependency trees. This project is the first phase of research towards M. Hanušková's PhD thesis focused on the analysis of texts written by non-native speakers of Czech. The applied methods are also used in the MA theses of M. Nogolová and M. Guńková. The results of the research will be presented at linguistic conferences and in scientific articles.

 

 

Semantic analysis based on Neural Networks approaches
SGS07/UVAFM/2019, University of Ostrava (2019)
Miroslav Kubát (team leader)
Jan Hůla, David Mojžíšek, Kateřina Pelegrinová (team members)
 
The project is thematically related to the series of previous SGS projects which focused on neural-network-based linguistic analysis. The goal of this SGS project is to make a previously developed analysis accessible to a wider audience as an interactive web application. Concretely, the plan is to develop a web application that allows users to measure semantic properties of words of their choice and that visualizes the results of the analysis in comprehensive graphs.
The second part of the project will deal with the construction of a semantic graph based on the similarity of word meanings, which will be obtained with the Word2Vec method. This graph will later be analyzed by the methods used to study complex networks.
In comparison with our previous SGS projects, the analysis will be done on the newer synchronic corpus SYN_V6 (Czech National Corpus).
The proposed project corresponds both to the research of the Institute for Research and Applications of Fuzzy Modeling of the University of Ostrava in the field of neural networks and to the main branch of quantitative textual linguistics at the Department of Czech Language, Faculty of Arts of the University of Ostrava.
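The semantic graph mentioned in this description could, for instance, be built by linking words whose vectors are sufficiently similar. The sketch below is only an assumption about one possible construction: it uses cosine similarity, a fixed threshold, and hand-made two-dimensional toy vectors in place of real Word2Vec embeddings. All names and values are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm


def similarity_graph(vectors, threshold):
    """Link every pair of words whose cosine similarity reaches the threshold.

    vectors maps a word to its embedding; the result maps each word
    to the set of its neighbours (an undirected adjacency structure).
    """
    words = sorted(vectors)
    graph = {word: set() for word in words}
    for i, first in enumerate(words):
        for second in words[i + 1:]:
            if cosine(vectors[first], vectors[second]) >= threshold:
                graph[first].add(second)
                graph[second].add(first)
    return graph


# Toy vectors standing in for Word2Vec embeddings (hypothetical values).
toy_vectors = {"king": [0.9, 0.1], "queen": [0.8, 0.2], "apple": [0.1, 0.9]}
graph = similarity_graph(toy_vectors, threshold=0.9)

# Node degree is one of the simplest complex-network measures.
degrees = {word: len(neighbours) for word, neighbours in graph.items()}
```

With these toy values only "king" and "queen" are linked, so "apple" remains an isolated node with degree 0.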
 

 

SGS01/UVAFM/2018, University of Ostrava (2018)
Miroslav Kubát (team leader)
Radek Čech, Jan Hůla, David Číž, Kateřina Pelegrinová (team members)
 
The project follows up on the previous SGS project Application of Neural Networks in Diachronic and Synchronic Semantic Analysis of Texts. The first analysis showed that this approach has convincing potential. The main goal is to extend the functionality of the developed software and to explore possible applications of the proposed method in linguistic research. Specifically, our method, which is based on the Word2vec technique, measures the context specificity of a lemma (CSL).
 
 
 
SGS02/UVAFM/2017, University of Ostrava (2017)
Radek Čech (team leader)
Miroslav Kubát, Jan Hůla, Vojtěch Molek (team members)
 
The aim of the project is to apply contemporary methods based on neural networks in textology. Semantic changes in a Czech corpus are analyzed from synchronic and diachronic viewpoints. More specifically, (a) we examine the development of political and social discourse from 1990 to 2014, and (b) we investigate the effectiveness of this method for genre classification. The project reflects the research topics of the Department of Czech Language (quantitative linguistics) and the Institute for Research and Applications of Fuzzy Modeling of the University of Ostrava (neural networks).
 
 
 
Implementation of new methods for the teaching of quantitative linguistic subjects at the Department of Czech Language at the Faculty of Arts of the University of Ostrava, and the improvement of the pedagogical and professional competence of the staff of the Department, based on expert consultation at the Department of Philosophy, Sociology, Education and Applied Psychology, University of Padua.
IRP201707, University of Ostrava (2017)
Miroslav Kubát (team leader)
Radek Čech (team member)
 
 
 
QUITA (Quantitative Index Text Analyzer) – Software Measuring Vocabulary Richness and Other Quantitative Features of Texts 
IGA FF_2013_031, Palacký University Olomouc (2013)
Radek Čech (team leader)
Vladimír Matlach, Miroslav Kubát (team members)
 
Quantitative Index Text Analyzer (QUITA) covers the most common indicators, especially those connected with the frequency structure of a text. In addition to computing the indicators, QUITA also provides statistical testing and graphical visualization of the obtained data. QUITA is a versatile tool designed for researchers from various disciplines (linguistics, literary criticism, history, sociology, psychology, politics, biology, etc.). The program offers basic text-processing functions, such as creating word lists, lemmatizing texts, and creating n-grams. It also provides more advanced tools, such as a random text creator and a binary file translator. However, the core of the software is the computation of indicators. Although the authors focused mainly on indicators connected to the frequency structure of a text (e.g., h-point, entropy, repeat rate, adjusted modulus, Gini’s coefficient, lambda), there are also several other characteristics, such as thematic concentration, activity & descriptivity, and writer’s view. More information about the software can be found in the book QUITA – Quantitative Index Text Analyzer and in the diploma thesis Kvantitativně lingvistický software.
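To give a sense of what frequency-structure indicators look like, the sketch below computes two of the indicators named above, entropy and repeat rate, from a list of tokens. It assumes the textbook definitions (Shannon entropy with a base-2 logarithm and repeat rate as the sum of squared relative frequencies). QUITA's own implementation may differ in details such as normalization, and the function name is hypothetical.

```python
from collections import Counter
from math import log2


def frequency_indicators(tokens):
    """Compute simple frequency-structure indicators for a tokenized text.

    Assumed definitions (not taken from QUITA itself):
      entropy     = -sum(p * log2(p)) over relative frequencies p of word types
      repeat rate =  sum(p ** 2)      over the same relative frequencies
    """
    if not tokens:
        raise ValueError("the text must contain at least one token")
    counts = Counter(tokens)
    total = sum(counts.values())
    probabilities = [count / total for count in counts.values()]
    return {
        "tokens": total,
        "types": len(counts),
        "entropy": -sum(p * log2(p) for p in probabilities),
        "repeat_rate": sum(p * p for p in probabilities),
    }


# Two word types, each used twice.
sample = frequency_indicators("a b a b".split())
```

For the sample text the two types are equally frequent, which gives an entropy of 1 bit and a repeat rate of 0.5.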