DEVELOPING NEW PARAPHRASE ALGORITHMS ADAPTED FOR THE UZBEK LANGUAGE

Published 2025-06-30
SOCIAL SCIENCES AND HUMANITIES Vol. 80 No. 2 (2025)
Authors:
  • KHAYATOVA Z.M.
  • HAMROYEVA SH.M.

Paraphrase generation in Natural Language Processing (NLP) is well developed for high-resource languages like English but remains underexplored for Uzbek, a low-resource agglutinative language with relatively free word order. The unique morphological structure of Uzbek presents challenges for transformer-based models such as mBART, mT5, and GPT, which struggle with morphological segmentation, syntactic variation, and semantic preservation due to the lack of high-quality annotated datasets. This study proposes a hybrid approach that combines rule-based morphological analyzers (UZLex, O‘zMorphAnalyzer) with deep learning models fine-tuned on Uzbek corpora. To address data scarcity, manual dataset curation and back-translation techniques are employed. The methodology includes morphology-aware tokenization, contextual embeddings, and semantic role labeling, aimed at improving the grammatical correctness and fluency of the generated paraphrases.
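The abstract names morphology-aware tokenization without specifying it. The following is a minimal Python sketch, assuming a naive rule-based suffix stripper: it splits an Uzbek word into a stem and suffix tokens using a small, hand-picked list of case, possessive, and plural suffixes, peeled off in reverse of the stem + plural + possessive + case order. The suffix list, the `segment` and `tokenize` helpers, and the `min_stem` threshold are illustrative assumptions, not the UZLex or O‘zMorphAnalyzer rules used in the study.

```python
import re

# Hypothetical, deliberately incomplete suffix inventory (an assumption for this
# sketch, not the rule set of UZLex or O‘zMorphAnalyzer). Slots are listed from
# the outermost suffix inward, following the order stem + PLURAL + POSSESSIVE + CASE.
SUFFIX_SLOTS = [
    ["ning", "dan", "ni", "ga", "ka", "qa", "da"],  # case
    ["ingiz", "imiz", "ing", "im", "si", "i"],      # possessive (consonant-final stems, plus -si)
    ["lar"],                                        # plural
]

def segment(word, min_stem=2):
    """Strip at most one suffix per slot, outermost first, keeping a stem of >= min_stem chars."""
    stem = word.lower()
    suffixes = []
    for slot in SUFFIX_SLOTS:
        for suffix in sorted(slot, key=len, reverse=True):  # prefer the longest match
            if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem:
                stem = stem[: -len(suffix)]
                suffixes.insert(0, "+" + suffix)
                break
    return [stem] + suffixes

def tokenize(sentence):
    """Split a sentence into words, then each word into stem and suffix tokens."""
    tokens = []
    for word in re.findall(r"[\w‘’']+", sentence):
        tokens.extend(segment(word))
    return tokens

print(tokenize("Maktablarda kitoblarimiz bor."))
# → ['maktab', '+lar', '+da', 'kitob', '+lar', '+imiz', 'bor']
```

Because it has no stem lexicon, this sketch over-segments words whose stems happen to end in a suffix-like string (for example, it splits kishi "person" into kish + -i). Resolving such cases is the job of a lexicon-backed analyzer like those named in the abstract.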

The proposed model is evaluated using BLEU, ROUGE, and BERTScore, alongside human assessments, showing that hybrid models outperform standard neural approaches. The results highlight the importance of integrating linguistic knowledge into NLP systems for low-resource languages. Future work will focus on expanding annotated corpora, improving morphology-sensitive embeddings, and developing domain-specific models for applications in machine translation and automated text processing.
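For orientation, the following is a minimal from-scratch Python sketch of single-reference, sentence-level BLEU: clipped n-gram precision up to 4-grams, uniform weights, a brevity penalty, and no smoothing. It is not the study's evaluation code. Published results usually rely on a standard implementation such as sacreBLEU, and ROUGE and BERTScore are not shown here. The function takes token lists, so it can score either word tokens or morpheme tokens.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference BLEU with uniform weights, brevity penalty, and no smoothing."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        if matched == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision += math.log(matched / total) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision)

reference = "a b c d e".split()
print(sentence_bleu("a b c d".split(), reference))  # exp(-0.25), about 0.7788
```

Scoring morpheme tokens rather than whole words gives partial credit when a paraphrase keeps a stem but changes a suffix (maktabda versus maktabga share the stem maktab but no whole word). This is a design consideration for agglutinative languages, not a result reported by the study.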

KHAYATOVA Z.M.

PhD, second-year postdoctoral student, Tashkent State University of Uzbek Language and Literature named after A. Navoi, Republic of Uzbekistan.

E-mail: khayatovazarnigor@gmail.com, https://orcid.org/0000-0001-6465-6517

HAMROYEVA SH.M.

Professor, DSc, Tashkent State University of Uzbek Language and Literature named after A. Navoi, Republic of Uzbekistan.

E-mail: hamroyeva81@mail.ru, https://orcid.org/0000-0002-5429-4708

Keywords: Paraphrase Generation, Low-Resource Languages, Uzbek NLP, Segmentation, Transformer-Based NLP

How to Cite

DEVELOPING NEW PARAPHRASE ALGORITHMS ADAPTED FOR THE UZBEK LANGUAGE. (2025). Scientific Journal "Bulletin of the K. Zhubanov Aktobe Regional University", 80(2), 231-237. https://doi.org/10.70239/