METADATA AND FORMATTING ISSUES IN THE UZBEK-ENGLISH PARALLEL CORPUS AND EXISTING NLP TOOLS FOR THE UZBEK LANGUAGE

Botir Elov; Ma’rufjon Amirkulov; Malika Suyunova

doi:10.47390/ts-v3i9y2025No3

Keywords:

parallel corpus, metadata, TEI, CoNLL-U, Uzbek language, NLP, morphological analysis, tagging, lemmatization, linguistic resources.

Abstract

This article examines metadata formatting, the use of the TEI and CoNLL-U standards, and the existing Natural Language Processing (NLP) tools for the Uzbek language as they apply to building an Uzbek-English parallel corpus. The paper discusses each stage of corpus development, including text alignment, morphological and syntactic annotation, and structural encoding. It also evaluates the performance of Uzbek morphological analyzers, lemmatizers, and part-of-speech (POS) taggers, emphasizing their practical importance for constructing high-quality bilingual corpora. The results provide a methodological basis for accurately encoding and automatically analyzing parallel corpora and for integrating them into linguistic search and processing systems.
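The abstract names CoNLL-U as one of the encoding standards for the annotated side of the corpus. As a minimal illustrative sketch (not the authors' pipeline), the Python code below reads a CoNLL-U block into its ten columns and checks that the token forms reproduce the sentence's `# text` line, a simple consistency test that is useful when annotated Uzbek sentences are paired with English translations. The Uzbek example sentence, its lemma/UPOS/feature annotation, the `sent_id` value, and the `# text_en` comment used to carry the English translation are assumptions chosen for illustration. The parser uses only the standard library, skips multiword-token ranges and empty nodes, and performs no validation of tags or heads.

```python
from dataclasses import dataclass
from typing import Dict, List

# The ten CoNLL-U columns, in the order fixed by the Universal Dependencies format.
FIELDS = ("id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc")


@dataclass
class Sentence:
    meta: Dict[str, str]          # sentence-level comments of the form "# key = value"
    tokens: List[Dict[str, str]]  # one dict per word line, keyed by FIELDS


def parse_conllu(text: str) -> List[Sentence]:
    """Minimal CoNLL-U reader (sketch): no validation of tags, heads or features."""
    sentences: List[Sentence] = []
    meta: Dict[str, str] = {}
    tokens: List[Dict[str, str]] = []
    for line in text.splitlines() + [""]:  # the extra "" flushes the last sentence
        if not line.strip():
            if tokens:
                sentences.append(Sentence(meta, tokens))
            meta, tokens = {}, []
        elif line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep:  # ignore free-text comments that have no "key = value" form
                meta[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}: {line!r}")
            if "-" in cols[0] or "." in cols[0]:
                continue  # skip multiword-token ranges ("1-2") and empty nodes ("8.1")
            tokens.append(dict(zip(FIELDS, cols)))
    return sentences


def detokenize(sentence: Sentence) -> str:
    """Rebuild surface text from word forms, honouring SpaceAfter=No in MISC."""
    parts = []
    for tok in sentence.tokens:
        parts.append(tok["form"])
        if "SpaceAfter=No" not in tok["misc"].split("|"):
            parts.append(" ")
    return "".join(parts).rstrip()


# Hypothetical annotated Uzbek sentence with an assumed "text_en" translation comment.
SAMPLE = "\n".join([
    "# sent_id = uz-en-0001",
    "# text = Men kitob o‘qiyman.",
    "# text_en = I read a book.",
    "1\tMen\tmen\tPRON\t_\tCase=Nom|Number=Sing|Person=1|PronType=Prs\t3\tnsubj\t_\t_",
    "2\tkitob\tkitob\tNOUN\t_\tNumber=Sing\t3\tobj\t_\t_",
    "3\to‘qiyman\to‘qi\tVERB\t_\tMood=Ind|Number=Sing|Person=1|Tense=Pres\t0\troot\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])

if __name__ == "__main__":
    for sent in parse_conllu(SAMPLE):
        # Flag sentences whose tokens do not reproduce the "# text" line.
        ok = detokenize(sent) == sent.meta.get("text")
        print(sent.meta["sent_id"], "|", sent.meta.get("text_en"), "| text consistent:", ok)
        for tok in sent.tokens:
            print(tok["form"], tok["lemma"], tok["upos"], sep="\t")
```

For the sample sentence the rebuilt text matches the `# text` line because `SpaceAfter=No` on "o‘qiyman" suppresses the space before the final period. A production pipeline would more likely rely on a maintained reader such as the `conllu` Python package and on the official UD validator; the hand-written parser here only makes the column layout explicit.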
