Semantic analysis based on Neural Networks approaches
SGS07/UVAFM/2019, University of Ostrava (2019)
Miroslav Kubát, Jan Hůla, David Mojžíšek, Kateřina Pelegrinová
This project is thematically related to a series of previous SGS projects focused on neural-network-based linguistic analysis. Its goal is to make a previously developed analysis accessible to a wider audience as an interactive web application. Concretely, the plan is to develop a web application that measures semantic properties of words chosen by the user and visualizes the results of the analysis in comprehensible graphs.
The second part of the project deals with the construction of a semantic graph based on the similarity of word meanings, which will be obtained with the Word2Vec method. This graph will then be analyzed with methods used to study complex networks.
In contrast to our previous SGS projects, the analysis will be carried out on the newer synchronic corpus SYN_V6 (Czech National Corpus).
The proposed project corresponds both to the research of the Institute for Research and Applications of Fuzzy Modeling of the University of Ostrava in the field of neural networks and to the main branch of quantitative textual linguistics at the Department of Czech Language, Faculty of Arts of the University of Ostrava.
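The graph construction described for the second part of the project can be illustrated with a minimal sketch. It is not the project's implementation: the toy three-dimensional vectors stand in for trained Word2Vec embeddings, the words and the similarity threshold of 0.9 are arbitrary assumptions, and the function names (build_graph, clustering, components) are hypothetical. Two words are linked when the cosine similarity of their vectors reaches the threshold, and the resulting graph is described by standard complex-network measures (degree, local clustering coefficient, connected components).

```python
import math
from itertools import combinations

# Toy vectors standing in for trained Word2Vec embeddings (hypothetical values).
TOY_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "animal": [0.7, 0.3, 0.0],
    "government": [0.0, 0.1, 0.9],
    "minister": [0.1, 0.0, 0.8],
}


def cosine(u, v):
    """Cosine similarity of two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_graph(vectors, threshold):
    """Undirected graph: link two words if their similarity >= threshold."""
    graph = {word: set() for word in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering(graph, node):
    """Local clustering coefficient: share of neighbour pairs that are linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def components(graph):
    """Connected components found by depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result


graph = build_graph(TOY_VECTORS, threshold=0.9)  # threshold is an assumption
degrees = {word: len(neighbours) for word, neighbours in graph.items()}
```

With these toy vectors the animal-related words and the politics-related words end up in separate components. On a real corpus the vectors would come from a trained model, and the choice of threshold (or of a k-nearest-neighbour rule instead) strongly affects the network properties.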
SGS01/UVAFM/2018, University of Ostrava (2018)
Miroslav Kubát, Radek Čech, Jan Hůla, David Číž, Kateřina Pelegrinová
The project follows up on the previous SGS project Application of Neural Networks in Diachronic and Synchronic Semantic Analysis of Texts. The first analysis showed that this approach has considerable potential. The main goal is to extend the functionality of the developed software and to explore possible applications of the proposed method in linguistic research. Specifically, our method, which is based on the Word2Vec technique, measures the Context Specificity of Lemma (CSL), i.e. the degree to which a lemma is context-specific.
SGS02/UVAFM/2017, University of Ostrava (2017)
Radek Čech, Miroslav Kubát, Jan Hůla, Vojtěch Molek
The aim of the project is to apply the contemporary methods based on neural networks in textology. Semantic changes in a Czech corpus are analyzed from synchronic and diachronic viewpoints. More specifically, (a) we examine the development of the political and social discourse from 1990 to 2014, and (b) we investigate the effectiveness of this method for genre classification. The project reflects the research topics of the Department of Czech Language (quantitative linguistics) and the Institute for Research and Applications of Fuzzy Modeling of the University of Ostrava (neural networks).
Implementation of new methods for the teaching of quantitative linguistic subjects at the Department of Czech Language at the Faculty of Arts of the University of Ostrava, and the improvement of the pedagogical and professional competence of the staff of the Department, based on expert consultation at the Department of Philosophy, Sociology, Education and Applied Psychology, University of Padua.
IRP201707, University of Ostrava (2017)
Miroslav Kubát, Radek Čech
QUITA (Quantitative Index Text Analyzer) – Software Measuring Vocabulary Richness and Other Quantitative Features of Texts
IGA FF_2013_031, Palacký University Olomouc (2013)
Radek Čech, Vladimír Matlach, Miroslav Kubát
Quantitative Index Text Analyzer (QUITA) covers the most common indicators, especially those connected with the frequency structure of a text. In addition to computing the indicators, QUITA provides statistical testing and graphical visualization of the obtained data. QUITA is a versatile tool designed for researchers from various disciplines (linguistics, literary criticism, history, sociology, psychology, politics, biology, etc.). The program offers basic text processing functions, such as creating word lists, lemmatizing texts, or creating n-grams, as well as more advanced tools, such as a random text creator or a binary file translator. The core of the software, however, is the computation of indicators. Although the authors focused mainly on indicators connected with the frequency structure of a text (e.g., h-point, entropy, repeat rate, adjusted modulus, Gini’s coefficient, lambda), several other characteristics are also included, such as thematic concentration, activity & descriptivity, or writer’s view. More information about the software can be found in the book QUITA – Quantitative Index Text Analyzer and in the diploma thesis Kvantitativně lingvistický software.
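To give a sense of what frequency-structure indicators look like, the sketch below computes three of those named above (entropy, repeat rate, and h-point) from a list of tokens. It is not QUITA's code. It assumes the usual textbook definitions: entropy as the Shannon entropy of relative word frequencies in bits, repeat rate as the sum of squared relative frequencies, and the h-point as the rank at which rank equals frequency, linearly interpolated when no exact match exists. QUITA may differ in details such as the logarithm base or normalization, and all function names are hypothetical.

```python
import math
from collections import Counter


def rank_frequencies(tokens):
    """Word frequencies sorted in descending order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def entropy(freqs):
    """Shannon entropy of relative frequencies, in bits (base-2 log assumed)."""
    n = sum(freqs)
    return -sum((f / n) * math.log2(f / n) for f in freqs)


def repeat_rate(freqs):
    """Sum of squared relative frequencies."""
    n = sum(freqs)
    return sum((f / n) ** 2 for f in freqs)


def h_point(freqs):
    """Rank r where r == f(r); interpolated if the curve skips the diagonal."""
    for rank, freq in enumerate(freqs, start=1):
        if freq == rank:
            return float(rank)
        if freq < rank:
            # Intersect the segment between the previous and current
            # (rank, frequency) points with the line frequency == rank.
            r1, f1 = rank - 1, freqs[rank - 2]
            r2, f2 = rank, freq
            return (f1 * r2 - f2 * r1) / (r2 - r1 + f1 - f2)
    # Assumption: if every frequency exceeds its rank, fall back to the
    # number of word types.
    return float(len(freqs))


tokens = "a a a a b b b c c d".split()
freqs = rank_frequencies(tokens)  # [4, 3, 2, 1]
indicators = {
    "entropy": entropy(freqs),
    "repeat_rate": repeat_rate(freqs),
    "h_point": h_point(freqs),
}
```

For the toy input the frequencies 4, 3, 2, 1 cross the diagonal between ranks 2 and 3, so the interpolated h-point is 2.5. In practice the tokens would come from a tokenized, and possibly lemmatized, text.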