Paraphrase generation in Natural Language Processing (NLP) is well-developed for high-resource languages like English but remains underexplored for Uzbek, a low-resource agglutinative language with free word order. The unique morphological structure of Uzbek presents challenges for transformer-based models such as mBART, mT5, and GPT, which struggle with morphological segmentation, syntactic variation, and semantic preservation due to the lack of high-quality annotated datasets. This study proposes a hybrid approach that combines rule-based morphological analyzers (UZLex, O‘zMorphAnalyzer) with deep learning models fine-tuned on Uzbek corpora. To address data scarcity, manual dataset curation and back-translation techniques are employed. The methodology includes morphology-aware tokenization, contextual embeddings, and semantic role labeling, ensuring grammatical correctness and fluency in paraphrase generation.
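The back-translation augmentation named above can be illustrated with a minimal sketch: an Uzbek sentence is pivoted through English and translated back, and the result is kept as a paraphrase candidate if it differs from the source. The `translate_uz_to_en` and `translate_en_to_uz` functions here are hypothetical toy stand-ins (tiny lexicons), not the study's actual MT systems.

```python
# Minimal sketch of back-translation for paraphrase data augmentation.
# The two translate_* functions are hypothetical stand-ins for real uz<->en MT models.

def translate_uz_to_en(sentence: str) -> str:
    # Stand-in: a real system would invoke an MT model here.
    lexicon = {"men": "I", "kitob": "a book", "o'qidim": "read"}
    return " ".join(lexicon.get(tok, tok) for tok in sentence.split())

def translate_en_to_uz(sentence: str) -> str:
    # Stand-in for the reverse direction; empty strings drop function words.
    lexicon = {"I": "men", "a": "", "book": "kitobni", "read": "o'qidim"}
    words = [lexicon.get(tok, tok) for tok in sentence.split()]
    return " ".join(w for w in words if w)

def back_translate(uz_sentence: str) -> str:
    """Pivot through English to obtain a paraphrase candidate."""
    pivot = translate_uz_to_en(uz_sentence)
    return translate_en_to_uz(pivot)

source = "men kitob o'qidim"
candidate = back_translate(source)
# Keep only candidates that actually differ from the source sentence.
is_new_paraphrase = candidate != source
```

In a real pipeline the lexicon lookups would be replaced by fine-tuned translation models, and candidates would additionally be filtered for semantic equivalence.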
The proposed model is evaluated using BLEU, ROUGE, and BERTScore, alongside human assessments, showing that hybrid models outperform standard neural approaches. The results highlight the importance of integrating linguistic knowledge into NLP systems for low-resource languages. Future work will focus on expanding annotated corpora, improving morphology-sensitive embeddings, and developing domain-specific models for applications in machine translation and automated text processing.
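The automatic metrics cited above (BLEU, ROUGE) are built on n-gram overlap between a generated paraphrase and a reference. As a minimal sketch of that idea, the following computes unigram (ROUGE-1-style) recall; production evaluation would use established metric implementations rather than this toy function.

```python
# Sketch of unigram overlap scoring, the core idea behind ROUGE-1 recall.
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams recovered in the candidate (clipped counts)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[tok], ref[tok]) for tok in ref)
    return overlap / max(sum(ref.values()), 1)

# Two of three reference tokens appear in the candidate -> recall of 2/3.
score = rouge1_recall("men kitobni o'qidim", "men kitob o'qidim")
```

For an agglutinative language like Uzbek, surface-token overlap penalizes valid morphological variants ("kitob" vs. "kitobni"), which is precisely why the study supplements such metrics with BERTScore and human assessment.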
KHAYATOVA Z.M.
PhD, second-year postdoctoral researcher, Tashkent State University of Uzbek Language and Literature named after A. Navoi, Republic of Uzbekistan.
E-mail: khayatovazarnigor@gmail.com, https://orcid.org/0000-0001-6465-6517
HAMROYEVA SH.M.
DSc, Professor, Tashkent State University of Uzbek Language and Literature named after A. Navoi, Republic of Uzbekistan.
E-mail: hamroyeva81@mail.ru, https://orcid.org/0000-0002-5429-4708