Research Grants :: Miroslav Kubát

Research Grants

Biography of AI-attributed disinformation: A risky phenomenon through the prism of modern human sciences
(Biografie dezinformace s přívlastkem AI: Rizikový fenomén prizmatem moderních věd o člověku)
Working Package 1: Disinformation texts from the perspective of 21st century linguistics
(Výzkumný záměr 1: Dezinformační texty v perspektivě lingvistiky 21. století)

CZ.02.01.01/00/23_025/0008724, Operační program Jan Amos Komenský (Johannes Amos Comenius Programme) (2025-2028)

Miroslav Kubát (team leader of Working Package 1 - Linguistics)

Michal Místecký, Xinying Chen, Michaela Nogolová, Lenka Vaňková, Martin Mostýn, Milan Pišl, Dominika Beneš Kováčová, Jiří Lukl, Petr Šlechta (team members of Working Package 1 - Linguistics)

The research project analyzes disinformation using progressive methods of contemporary linguistics. It is divided into two sub-projects: VZ1.1 – fake news from the perspective of quantitative linguistics (QL), and VZ1.2 – disinformation texts from the perspective of interlingual comparison. Within VZ1.1, the main objective is to identify intersubjective, interpretatively significant, and publicly communicable linguistic features that distinguish disinformation from real news in the Czech-language environment. VZ1.2 is synergistically linked with VZ1.1. In VZ1.2, the research on disinformation will be expanded to include a comparative perspective. Through comparative analyses of texts written in Czech, German, Spanish, and English within the field of media discourse, verbal and non-verbal features specific to disinformation texts in each language will be identified and described using qualitative methodology. The study will examine whether interlingual comparisons with Czech reveal intercultural differences among the texts. The analysis will focus on various aspects of interlingual comparison: the presentation of (dis)information, related argumentation and emotionalization strategies that enhance the text's persuasiveness, its thematic structure, coherence, use of cohesive devices, cognitive metaphors, phraseology, and multimodality. This approach will not only allow the identification of specific characteristics that distinguish disinformation texts from other texts but also provide a foundation for a deeper understanding of the mechanisms and strategies employed in disinformation campaigns.

Quantitative analysis of syntactic complexity in Czech texts

SGS04/FF/2025, University of Ostrava (2025)

Miroslav Kubát (team leader)

Xinying Chen, Michaela Nogolová, Žaneta Stiborská (team members)

The project focuses on the quantitative analysis of the syntactic complexity of Czech texts, using measures such as mean sentence length, mean clause length, mean dependency distance (MDD) and mean length of linear dependency segments (LDS). The analysis will be applied to three different corpora: the New Year speeches of Czechoslovak and Czech presidents (to investigate the differences between democratic and communist presidents), the works of Karel Čapek (to analyse the effect of genre on syntactic complexity in the literary works of one author), and adapted literature for non-native speakers (to assess syntactic complexity by language level). The project will thus provide insight into the relationship between syntactic complexity and different types of texts. This project is closely related to the PhD theses of M. Nogolová (Between word and clause, a quantitative analysis of syntactic units) and Ž. Stiborská (Reading literacy among students with a different mother tongue).
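As an illustration of one of the measures named above, the following minimal sketch computes the mean dependency distance of a single sentence. It assumes the common formulation in which the distance of a dependency is the absolute difference between the positions of a word and its head, and the root is excluded. The function name and the example parse are hypothetical and are not taken from the project.

```python
def mean_dependency_distance(heads):
    """Return the mean absolute distance between each word and its head.

    heads[i] is the 1-based position of the head of word i + 1;
    0 marks the root, which has no head and is skipped.
    """
    distances = [
        abs(position - head)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        return 0.0  # a one-word sentence has no dependencies
    return sum(distances) / len(distances)


# Hypothetical parse of "The dog chased the cat":
# The -> dog, dog -> chased, chased = root, the -> cat, cat -> chased
example_heads = [2, 3, 0, 5, 3]
print(mean_dependency_distance(example_heads))
```

In practice the head positions would come from a dependency parser rather than being written by hand.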

Analysis of the syntactic complexity of texts of the corpus CzeSL-SGT

SGS08/FF/2023, University of Ostrava (2023)

Miroslav Kubát (team leader)

Radek Čech, Michaela Hanušková, Michaela Nogolová (team members)

The project will focus on the analysis of the development of the syntactic complexity of the texts of non-native Czech speakers across different language levels (A1-C1 according to CEFR). Syntactic complexity will be measured by two methods: a) mean dependency distance (MDD) and b) mean length of linear dependency segments (LDS). The research will be based on the student corpus CzeSL-SGT, which contains more than 8 000 texts written by non-native speakers of Czech. This project is closely related to the dissertation of M. Hanušková focused on the quantitative analysis of the texts of non-native speakers of Czech and also to the dissertation of M. Nogolová dealing with the quantitative analysis of syntactic units.
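The second measure can be sketched in a similar way. The code below rests on an assumed reading of the LDS measure: a segment is a maximal run of adjacent words in which every pair of linear neighbours is linked by a dependency in either direction. Any restriction of segments to clause boundaries is ignored here, and the function name and example parse are hypothetical.

```python
def linear_dependency_segment_lengths(heads):
    """Split a sentence into runs of adjacent, directly linked words.

    heads[i] is the 1-based position of the head of word i + 1; 0 marks the root.
    Returns the length (in words) of each segment, left to right.
    """
    if not heads:
        return []
    lengths = []
    current = 1
    for left in range(1, len(heads)):
        right = left + 1
        # Neighbours are linked if either one is the head of the other.
        linked = heads[left - 1] == right or heads[right - 1] == left
        if linked:
            current += 1
        else:
            lengths.append(current)
            current = 1
    lengths.append(current)
    return lengths


# Hypothetical parse of "The dog chased the cat"
example_heads = [2, 3, 0, 5, 3]
segment_lengths = linear_dependency_segment_lengths(example_heads)
mean_lds = sum(segment_lengths) / len(segment_lengths)
print(segment_lengths, mean_lds)
```

Averaging the segment lengths over all sentences of a text would then give one value per text that can be compared across language levels.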

Quantitative Syntactic Stylistics of Contemporary Written Czech

GA22-20632S, Czech Science Foundation GAČR (2022-2024)

Miroslav Kubát (team leader)
Xinying Chen, Radek Čech (team members)

The project focuses on the syntactic features of different styles in contemporary written Czech. The research is based on a quantitative analysis of the corpus SYN2020, a syntactically annotated, representative corpus of contemporary written Czech consisting of 100 million tokens. Various syntactic features are analyzed, such as mean sentence length, sentence types, word order, modality, the distribution of syntactic functions, and indicators of attributivity and subjectivity. Our aim is to enrich previous research on the style of Czech texts with new perspectives. First, we focus on the syntactic part of Czech stylistics, which usually lies outside scholars' main interest. Second, our analysis is based on quantitative methods.

Quantitative analysis of texts of CzeSL-SGT corpus

SGS06/FF/2022, University of Ostrava (2022)

Miroslav Kubát (team leader)
Radek Čech, Michaela Hanušková, Michaela Nogolová, Markéta Guńková (team members)

The project will focus on the quantitative analysis of the texts of the CzeSL-SGT corpus in order to obtain data on texts at individual language levels, to model the development of these texts, and to analyze the process of learning Czech as a foreign language. This corpus contains over 8000 texts written by learners of Czech as a foreign language at all language levels. We will analyze the texts using the QuitaUP and UDPipe software, which allow us to compute various properties of the texts. In particular, we will be interested in the average length of tokens, the descriptivity of the text, verb distances, sentence length, lexical richness, the number of clauses in a sentence, and the syntactic characteristics of dependency trees. This project is the first phase of research towards M. Hanušková's PhD thesis, which focuses on the analysis of texts written by non-native speakers of Czech. The applied methods are also used in the MA theses of M. Nogolová and M. Guńková. The results of the research will be presented at linguistic conferences and in scientific articles.

Semantic analysis based on Neural Networks approaches

SGS07/UVAFM/2019, University of Ostrava (2019)

Miroslav Kubát (team leader)
Jan Hůla, David Mojžíšek, Kateřina Pelegrinová (team members)

The project is thematically related to the series of previous SGS projects focused on neural-network-based linguistic analysis. The goal of this SGS project is to make a previously developed analysis accessible to a wider audience as an interactive web application. Specifically, the plan is to develop a web application that allows users to measure semantic properties of words of their choice and that visualizes the results of the analysis in comprehensive graphs.

The second part of the project will deal with the construction of a semantic graph based on the similarity of word meanings, which will be obtained with the Word2Vec method. This graph will then be analyzed with methods used to study complex networks.
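A minimal sketch of how such a graph could be built is shown below. It assumes that word vectors are already available and connects two words whenever the cosine similarity of their vectors reaches a threshold. The three-dimensional vectors, the threshold value, and all function names are invented for illustration; real Word2Vec vectors have far more dimensions, and the project's actual construction may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_similarity_graph(vectors, threshold):
    """Return an undirected graph as a dict mapping each word to its neighbours."""
    words = sorted(vectors)
    graph = {word: set() for word in words}
    for index, first in enumerate(words):
        for second in words[index + 1:]:
            if cosine_similarity(vectors[first], vectors[second]) >= threshold:
                graph[first].add(second)
                graph[second].add(first)
    return graph


# Toy vectors (hypothetical values, not trained embeddings)
toy_vectors = {
    "pes": [0.9, 0.1, 0.0],
    "kočka": [0.8, 0.2, 0.1],
    "auto": [0.0, 0.1, 0.9],
}
graph = build_similarity_graph(toy_vectors, threshold=0.8)

# Node degree is one of the simplest complex-network measures.
degrees = {word: len(neighbours) for word, neighbours in graph.items()}
print(degrees)
```

The choice of threshold determines how dense the graph is, so any network measure computed on it should be read with that parameter in mind.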

In comparison with our previous SGS projects, the analysis will be carried out on the newer synchronic corpus SYN_V6 (Czech National Corpus).

The proposed project corresponds both to the research of the Institute for Research and Applications of Fuzzy Modeling of the University of Ostrava in the field of neural networks, as well as to the main branch of quantitative textual linguistics at the Department of Czech Language, Faculty of Arts of the University of Ostrava.

Analysis of Context Specificity of Lemma using Neural Networks

SGS01/UVAFM/2018, University of Ostrava (2018)

Miroslav Kubát (team leader)
Radek Čech, Jan Hůla, David Číž, Kateřina Pelegrinová (team members)

The project follows up on the previous SGS project Application of Neural Networks in Diachronic and Synchronic Semantic Analysis of Texts. The first analysis showed that this approach has considerable potential. The main goal is to extend the functionality of the developed software and to explore possible applications of the proposed method in linguistic research. Specifically, our method measures the context specificity of a lemma (CSL), that is, the degree to which a lemma is tied to specific contexts. The method is based on the Word2Vec technique.

Application of Neural Networks in Diachronic and Synchronic Semantic Analysis of Texts

SGS02/UVAFM/2017, University of Ostrava (2017)

Radek Čech (team leader)
Miroslav Kubát, Jan Hůla, Vojtěch Molek (team members)

The aim of the project is to apply the contemporary methods based on neural networks in textology. Semantic changes in a Czech corpus are analyzed from synchronic and diachronic viewpoints. More specifically, (a) we examine the development of the political and social discourse from 1990 to 2014, and (b) we investigate the effectiveness of this method for genre classification. The project reflects the research topics of the Department of Czech Language (quantitative linguistics) and the Institute for Research and Applications of Fuzzy Modeling of the University of Ostrava (neural networks).

Implementation of new methods for the teaching of quantitative linguistic subjects at the Department of Czech Language at the Faculty of Arts of the University of Ostrava, and the improvement of the pedagogical and professional competence of the staff of the Department, based on expert consultation at the Department of Philosophy, Sociology, Education and Applied Psychology, University of Padua

IRP201707, University of Ostrava (2017)

Miroslav Kubát (team leader)
Radek Čech (team member)

QUITA (Quantitative Index Text Analyzer) – Software Measuring Vocabulary Richness and Other Quantitative Features of Texts

IGA FF_2013_031, Palacký University Olomouc (2013)

Radek Čech (team leader)
Vladimír Matlach, Miroslav Kubát (team members)

Quantitative Index Text Analyzer (QUITA) covers the most common indicators, especially those connected with the frequency structure of a text. In addition to computing the indicators, QUITA also provides statistical testing and graphical visualization of the obtained data. QUITA is a versatile tool designed for researchers from various disciplines (linguistics, literary criticism, history, sociology, psychology, politics, biology, etc.). The program offers basic text processing functions, such as creating word lists, lemmatizing texts, or creating n-grams. It also provides more advanced tools, such as a random text creator or a binary file translator. However, the core of the software is the computation of indicators. Although the authors focused mainly on indicators connected to the frequency structure of a text (e.g., h-point, entropy, repeat rate, adjusted modulus, Gini's coefficient, lambda), several other characteristics are also available, such as thematic concentration, activity and descriptivity, or writer's view. More information about the software can be found in the book QUITA – Quantitative Index Text Analyzer and in the diploma thesis Kvantitativně lingvistický software.
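To give a sense of what three of the frequency-based indicators mentioned above measure, the following sketch computes entropy, repeat rate, and the h-point from a list of tokens. It uses the textbook definitions: entropy as the negative sum of p·log2(p), repeat rate as the sum of squared relative frequencies, and the h-point as the rank at which rank equals frequency, with linear interpolation between neighbouring ranks when no exact match exists. Whether QUITA computes these values in exactly this way is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def frequency_indicators(tokens):
    """Return (ranked frequencies, entropy in bits, repeat rate)."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    total = sum(counts)
    probabilities = [count / total for count in counts]
    entropy = -sum(p * math.log2(p) for p in probabilities)
    repeat_rate = sum(p * p for p in probabilities)
    return counts, entropy, repeat_rate


def h_point(counts):
    """Return the point where rank equals frequency in a descending list.

    If no rank matches its frequency exactly, interpolate linearly between
    the last rank with frequency > rank and the first with frequency < rank.
    Returns None when the frequencies never fall to or below their ranks.
    """
    previous_rank, previous_freq = None, None
    for rank, freq in enumerate(counts, start=1):
        if freq == rank:
            return float(rank)
        if freq < rank:
            numerator = previous_freq * rank - freq * previous_rank
            denominator = rank - previous_rank + previous_freq - freq
            return numerator / denominator
        previous_rank, previous_freq = rank, freq
    return None


tokens = "a a a b b c".split()  # toy text, not real data
counts, entropy, repeat_rate = frequency_indicators(tokens)
print(counts, round(entropy, 4), round(repeat_rate, 4), h_point(counts))
```

A text with many repeated words yields a high repeat rate and low entropy, while a text with a flat frequency distribution shows the opposite pattern.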

Contact

Miroslav Kubát

Faculty of Arts University of Ostrava Havlíčkovo nábřeží 38 Ostrava 702 00 Czech Republic

miroslav.kubat@gmail.com
