A Study of NLP Research Work in Gujarati
DOI:
https://doi.org/10.15649/2346030X.4445
Keywords:
natural language processing (NLP), Gujarati language, artificial intelligence (AI), Gujarati, machine learning (ML), deep learning (DL)
Abstract
Natural Language Processing (NLP) is an area of Artificial Intelligence (AI) that deals with text, speech, and translation. The number of Internet users who work in vernacular languages grows day by day. For this reason, NLP researchers have been working on regional languages, and Gujarati is one of them. Gujarati is an Indo-Aryan language native to the Indian state of Gujarat and spoken by the Gujarati people. About 62 million people speak Gujarati worldwide, which makes it the 26th most spoken language in the world. Even so, Gujarati is the youngest and least-resourced Indian language in the NLP community. A few pioneering works have nevertheless been carried out in NLP for Gujarati (GNLP), for example on WordNet, morphological analysis, stemming, optical character recognition (OCR), speech recognition, part-of-speech tagging, and machine translation. Many researchers have taken a rule-based approach to GNLP; since then, only a few have tried machine learning, deep learning, and reinforcement learning approaches. This paper presents a critical study of existing GNLP research, covering research papers from 1999 to August 2024. The survey also projects GNLP research through 2030 using the available data. It explores the gaps in current studies and offers suggestions for the newly active research field of GNLP. Readers can use this paper to decide on the development of deep-learning-based GNLP systems that achieve higher accuracy.
License
Copyright 2025 AiBi Revista de Investigación, Administración e Ingeniería

This work is licensed under a Creative Commons Attribution 4.0 International License.
The journal offers open access under a Creative Commons Attribution License.