{"id":1979,"date":"2025-11-23T08:16:55","date_gmt":"2025-11-23T08:16:55","guid":{"rendered":"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/"},"modified":"2025-12-28T21:18:04","modified_gmt":"2025-12-28T21:18:04","slug":"hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai","status":"publish","type":"post","link":"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/","title":{"rendered":"Hindi, Telugu, Bangla, Lao, Persian &#038; More: Unlocking the Future of Low-Resource Languages in AI"},"content":{"rendered":"<h3>Latest 50 papers on low-resource languages: Nov. 23, 2025<\/h3>\n<p>The landscape of AI is rapidly evolving, but a significant portion of the world\u2019s linguistic diversity, especially low-resource languages (LRLs), often remains on the fringes. These languages, spoken by millions but underserved by current AI models, represent a critical frontier for equitable technological development. Recent breakthroughs, as showcased in a collection of cutting-edge research papers, are pushing the boundaries, offering novel approaches to empower LRLs across various AI applications, from speech recognition to multimodal understanding and robust evaluation.### The Big Idea(s) &amp; Core Innovationsof the central themes emerging from this research is the power of <em>data augmentation and innovative modeling strategies<\/em> to overcome scarcity. For instance, the <strong>Indian Institute of Technology Patna<\/strong> and <strong>Allen Institute for AI<\/strong> introduce <a href=\"https:\/\/rishikant24.github.io\/\">HinTel-AlignBench: A Framework and Benchmark for Hindi\u2013Telugu with English-Aligned Samples<\/a>, a semi-automated framework to generate high-quality, culturally grounded datasets for Hindi and Telugu. Their work highlights significant performance gaps between English and Indian languages in multilingual vision-language models (VLMs), underscoring the necessity of such tailored benchmarks.the challenge of limited paired data, <strong>KAIST<\/strong> and <strong>Korea University<\/strong>\u2019s <a href=\"https:\/\/arxiv.org\/pdf\/2511.13036\">uCLIP: Parameter-Efficient Multilingual Extension of Vision-Language Models with Unpaired Data<\/a> proposes a lightweight framework using English as a semantic anchor for cross-modal alignment without requiring paired image-text or text-text supervision. This pivot-based strategy is exceptionally parameter-efficient, reducing trainable parameters by over 99% compared to baselines.speech processing, <strong>Sina Rashidi<\/strong> and <strong>Hossein Sameti<\/strong> from <strong>Sharif University of Technology<\/strong>, in their paper <a href=\"https:\/\/arxiv.org\/pdf\/2511.12690\">Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data<\/a>, demonstrate that combining self-supervised pretraining, discrete units, and synthetic data significantly boosts direct speech-to-speech translation (S2ST) for Persian\u2013English. Similarly, for automatic speech recognition (ASR), <strong>Hung-Yang Sung et al.<\/strong> from <strong>National Taiwan Normal University<\/strong> introduce <a href=\"https:\/\/arxiv.org\/pdf\/2511.06860\">CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition<\/a>. Their two-stage fine-tuning strategy, integrating phonetic and Han-character annotations, leads to a 24.88% relative reduction in character error rate (CER) for Taiwanese Hokkien. Meanwhile, <strong>Zhaolin Li<\/strong> and <strong>Jan Niehues<\/strong> from <strong>Karlsruhe Institute of Technology<\/strong> show the promise of <a href=\"https:\/\/arxiv.org\/pdf\/2505.20445\">In-context Language Learning for Endangered Languages in Speech Recognition<\/a>, enabling LLMs to learn new, low-resource languages with only a few hundred samples, outperforming traditional instruction-based methods.data generation and model adaptation, understanding linguistic nuances is paramount. The <strong>Islamic University of Technology<\/strong> researchers introduce a novel corpus in <a href=\"https:\/\/arxiv.org\/pdf\/2511.13159\">Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis<\/a>, which explicitly differentiates unintentional errors from grammatical constructs in Bangla. Their work highlights the superiority of task-specific fine-tuning (e.g., with BanglaBERT) over general LLMs for such linguistically complex tasks. Similarly, <strong>Rocco Tripodi<\/strong> and <strong>Xiaoyu Liu<\/strong> delve into <a href=\"https:\/\/arxiv.org\/pdf\/2511.09796\">Predicate-Argument Structure Divergences in Chinese and English Parallel Sentences and their Impact on Language Transfer<\/a>, revealing how structural differences create asymmetry in cross-lingual knowledge transfer. For morphologically rich languages like Latin, <strong>Marisa Hudspeth et al.<\/strong>\u2019s <a href=\"https:\/\/arxiv.org\/pdf\/2511.09709\">Contextual morphologically-guided tokenization for Latin encoder models<\/a> demonstrates how incorporating morphological knowledge into tokenization significantly improves downstream performance, especially for out-of-domain texts.cross-lingual generalization and robustness is also a critical focus. <strong>Quang Phuoc Nguyen et al.<\/strong> from <strong>Ontario Tech University<\/strong> and <strong>Stanford University<\/strong> explore <a href=\"https:\/\/arxiv.org\/pdf\/2511.06497\">Rethinking what Matters: Effective and Robust Multilingual Realignment for Low-Resource Languages<\/a>, finding that strategically selected, linguistically diverse subsets of languages can achieve comparable or even superior cross-lingual transfer than using all available languages. Moreover, <strong>Hoyeon Moon et al.<\/strong> introduce <a href=\"https:\/\/arxiv.org\/pdf\/2510.23070\">Quality-Aware Translation Tagging in Multilingual RAG system<\/a> (QTT-RAG), which explicitly evaluates translation quality along three dimensions to improve factual integrity and translation reliability in multilingual RAG systems. This method outperforms existing baselines, especially in LRLs.### Under the Hood: Models, Datasets, &amp; Benchmarksrecent surge in LRL research is heavily reliant on new, purpose-built resources:<strong>HinTel-AlignBench<\/strong>: A comprehensive benchmark for Hindi and Telugu VLMs, including adapted English datasets and native Indic datasets like JEE-Vision and VAANI for cultural and STEM-related tasks. (<a href=\"https:\/\/rishikant24.github.io\/\">https:\/\/rishikant24.github.io\/<\/a>)<strong>LaoBench<\/strong>: The first large-scale, multidimensional benchmark for evaluating LLMs on Lao, covering knowledge application, K12 education, and bilingual translation. (<a href=\"https:\/\/arxiv.org\/pdf\/2511.11334\">https:\/\/arxiv.org\/pdf\/2511.11334<\/a>)<strong>Arabic Little STT Dataset<\/strong>: A collection of Levantine Arabic child speech recordings from classrooms, highlighting performance gaps in ASR for child voices. (<a href=\"https:\/\/huggingface.co\/datasets\/little-stt\/little-stt-dataset\">https:\/\/huggingface.co\/datasets\/little-stt\/little-stt-dataset<\/a>)<strong>LRW-Persian<\/strong>: A large-vocabulary, in-the-wild Persian lip-reading dataset with over 414,000 video samples, supporting cross-lingual transfer. (<a href=\"https:\/\/lrw-persian.vercel.app\">https:\/\/lrw-persian.vercel.app<\/a>)<strong>UA-Code-Bench<\/strong>: The first competitive programming benchmark for evaluating LLM code generation in Ukrainian, featuring 500 problems across five difficulty levels. (<a href=\"https:\/\/huggingface.co\/datasets\/NLPForUA\/ua-code-bench\">https:\/\/huggingface.co\/datasets\/NLPForUA\/ua-code-bench<\/a>)<strong>BanglaMedQA and BanglaMMedBench<\/strong>: Two large-scale Bangla biomedical multiple-choice question datasets, the first of their kind, for evaluating Retrieval-Augmented Generation (RAG) strategies in medical QA. (<a href=\"https:\/\/huggingface.co\/datasets\/ajwad-abrar\/BanglaMedQA\">https:\/\/huggingface.co\/datasets\/ajwad-abrar\/BanglaMedQA<\/a>)<strong>LASTIST<\/strong>: A large-scale Korean stance detection dataset with 563,299 labeled sentences for target-independent stance analysis. (<a href=\"https:\/\/anonymous.4open.science\/r\/LASTIST-3721\/\">https:\/\/anonymous.4open.science\/r\/LASTIST-3721\/<\/a>)<strong>SentiMaithili<\/strong>: A new benchmark dataset for sentiment analysis and justification generation in the low-resource Maithili language, curated by linguistic experts. (<a href=\"https:\/\/arxiv.org\/pdf\/2510.22160\">https:\/\/arxiv.org\/pdf\/2510.22160<\/a>)<strong>SMOL<\/strong>: An open-source dataset with professionally translated parallel data for 115 under-represented languages, including sentence- and document-level translations with factuality ratings. (<a href=\"https:\/\/arxiv.org\/pdf\/2502.12301\">https:\/\/arxiv.org\/pdf\/2502.12301<\/a>)<strong>URIEL+ enhancements<\/strong>: Improved with script vectors for 7,488 languages and Glottolog integration for 18,710 additional languages, reducing feature sparsity for cross-lingual transfer. (<a href=\"https:\/\/github.com\/LeeLanguageLab\/URIELPlus\">https:\/\/github.com\/LeeLanguageLab\/URIELPlus<\/a>)<strong>ORB (OCR-Rotation-Bench)<\/strong>: A new benchmark for evaluating OCR robustness to image rotations, with public release of models, datasets, and code. (<a href=\"https:\/\/ai-labs.olakrutrim.com\/\">https:\/\/ai-labs.olakrutrim.com\/<\/a>)<strong>CAP (Confabulations from ACL Publications)<\/strong>: A multilingual dataset (9 languages) for scientific hallucination detection in LLMs. (<a href=\"https:\/\/arxiv.org\/pdf\/2510.22395\">https:\/\/arxiv.org\/pdf\/2510.22395<\/a>)papers also release code for their innovations, encouraging further exploration: <a href=\"https:\/\/github.com\/redsheep913\/CLiFT-ASR\/\">CLiFT-ASR<\/a>, <a href=\"https:\/\/github.com\/sinarashidi\/S2ST-Transformer\">S2ST-Transformer<\/a>, <a href=\"https:\/\/dinyudin203.github.io\/uCLIP-project\/\">uCLIP<\/a>, <a href=\"https:\/\/github.com\/YYF-Tommy\/LangGPS\">LangGPS<\/a>, <a href=\"https:\/\/github.com\/yongchoooon\/stellar\">STELLAR<\/a>, <a href=\"https:\/\/github.com\/d-gurgurov\/Multilingual-LM-Disitillation\">Multilingual-LM-Disitillation<\/a>, <a href=\"https:\/\/github.com\/grvkamath\/low-resource-syn-ner\">low-resource-syn-ner<\/a>, and <a href=\"https:\/\/github.com\/HoyeonM\/QTT-RAG\">QTT-RAG<\/a>.### Impact &amp; The Road Aheadcollective impact of this research is profound. It\u2019s clear that the future of AI for low-resource languages hinges on a multi-pronged approach: leveraging synthetic data, employing parameter-efficient methods, developing culturally and linguistically nuanced evaluation benchmarks, and prioritizing domain-specific adaptation. The quantification of a \u201clanguage barrier effect\u201d on AI adoption in <a href=\"https:\/\/arxiv.org\/pdf\/2511.02752\">AI Diffusion in Low Resource Language Countries<\/a> by <strong>Microsoft AI for Good Research Lab<\/strong> serves as a stark reminder of the urgency for these advancements.papers not only highlight the limitations of current high-resource-centric models \u2013 from performance regressions in VLMs for Indian languages to struggles with child speech and regional dialects \u2013 but also offer practical, scalable solutions. The exploration of <strong>Language Specific Knowledge (LSK)<\/strong> by <strong>Ishika Agarwal et al.<\/strong> at the <strong>University of Illinois<\/strong> (<a href=\"https:\/\/arxiv.org\/pdf\/2505.14990\">Language Specific Knowledge: Do Models Know Better in X than in English?<\/a>) suggests that dynamically selecting optimal languages for reasoning can yield significant performance boosts, achieving up to 10% relative improvements across datasets. The discovery that \u201calignment, not scale\u201d determines multilingual model stability in humanitarian NLP (<a href=\"https:\/\/arxiv.org\/pdf\/2510.22823\">Cross-Lingual Stability and Bias in Instruction-Tuned Language Models for Humanitarian NLP<\/a> by <strong>Poli Nemkova et al.<\/strong>) provides a crucial guiding principle for future model development.road ahead demands continued investment in diverse datasets, robust multilingual benchmarks (like <strong>PolyMath<\/strong> for mathematical reasoning across 18 languages: <a href=\"https:\/\/arxiv.org\/pdf\/2504.18428\">https:\/\/arxiv.org\/pdf\/2504.18428<\/a>), and innovative modeling techniques that respect linguistic and cultural specificities. As we advance, the goal is not just to make AI <em>work<\/em> for LRLs, but to make it <em>flourish<\/em>, fostering inclusive, equitable, and globally relevant AI technologies. The momentum is building, and the future for low-resource languages in AI looks brighter than ever.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Latest 50 papers on low-resource languages: Nov. 23, 2025<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_yoast_wpseo_focuskw":"","_yoast_wpseo_title":"","_yoast_wpseo_metadesc":"","_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[56,57,63],"tags":[299,167,79,78,298,1622],"class_list":["post-1979","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","category-cs-cl","category-machine-learning","tag-cross-lingual-transfer","tag-domain-adaptation","tag-large-language-models","tag-large-language-models-llms","tag-low-resource-languages","tag-main_tag_low-resource_languages"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Hindi, Telugu, Bangla, Lao, Persian &amp; More: Unlocking the Future of Low-Resource Languages in AI<\/title>\n<meta name=\"description\" content=\"Latest 50 papers on low-resource languages: Nov. 23, 2025\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hindi, Telugu, Bangla, Lao, Persian &amp; More: Unlocking the Future of Low-Resource Languages in AI\" \/>\n<meta property=\"og:description\" content=\"Latest 50 papers on low-resource languages: Nov. 23, 2025\" \/>\n<meta property=\"og:url\" content=\"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/\" \/>\n<meta property=\"og:site_name\" content=\"SciPapermill\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/people\/SciPapermill\/61582731431910\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-11-23T08:16:55+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-28T21:18:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"512\" \/>\n\t<meta property=\"og:image:height\" content=\"512\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Kareem Darwish\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kareem Darwish\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/11\\\/23\\\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/11\\\/23\\\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\\\/\"},\"author\":{\"name\":\"Kareem Darwish\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/person\\\/2a018968b95abd980774176f3c37d76e\"},\"headline\":\"Hindi, Telugu, Bangla, Lao, Persian &#038; More: Unlocking the Future of Low-Resource Languages in AI\",\"datePublished\":\"2025-11-23T08:16:55+00:00\",\"dateModified\":\"2025-12-28T21:18:04+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/11\\\/23\\\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\\\/\"},\"wordCount\":1210,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#organization\"},\"keywords\":[\"cross-lingual transfer\",\"domain adaptation\",\"large language models\",\"large language models (llms)\",\"low-resource languages\",\"low-resource languages\"],\"articleSection\":[\"Artificial Intelligence\",\"Computation and Language\",\"Machine Learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/11\\\/23\\\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/11\\\/23\\\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\\\/\",\"url\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/11\\\/23\\\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\\\/\",\"name\":\"Hindi, Telugu, Bangla, Lao, Persian & More: Unlocking the Future of Low-Resource Languages in AI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#website\"},\"datePublished\":\"2025-11-23T08:16:55+00:00\",\"dateModified\":\"2025-12-28T21:18:04+00:00\",\"description\":\"Latest 50 papers on low-resource languages: Nov. 23, 2025\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/11\\\/23\\\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/11\\\/23\\\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/11\\\/23\\\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/scipapermill.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Hindi, Telugu, Bangla, Lao, Persian &#038; More: Unlocking the Future of Low-Resource Languages in AI\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#website\",\"url\":\"https:\\\/\\\/scipapermill.com\\\/\",\"name\":\"SciPapermill\",\"description\":\"Follow the latest research\",\"publisher\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/scipapermill.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#organization\",\"name\":\"SciPapermill\",\"url\":\"https:\\\/\\\/scipapermill.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/scipapermill.com\\\/wp-content\\\/uploads\\\/2025\\\/07\\\/cropped-icon.jpg?fit=512%2C512&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/scipapermill.com\\\/wp-content\\\/uploads\\\/2025\\\/07\\\/cropped-icon.jpg?fit=512%2C512&ssl=1\",\"width\":512,\"height\":512,\"caption\":\"SciPapermill\"},\"image\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/people\\\/SciPapermill\\\/61582731431910\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/scipapermill\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/person\\\/2a018968b95abd980774176f3c37d76e\",\"name\":\"Kareem Darwish\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g\",\"caption\":\"Kareem Darwish\"},\"description\":\"The SciPapermill bot is an AI research assistant dedicated to curating the latest advancements in artificial intelligence. Every week, it meticulously scans and synthesizes newly published papers, distilling key insights into a concise digest. Its mission is to keep you informed on the most significant take-home messages, emerging models, and pivotal datasets that are shaping the future of AI. This bot was created by Dr. Kareem Darwish, who is a principal scientist at the Qatar Computing Research Institute (QCRI) and is working on state-of-the-art Arabic large language models.\",\"sameAs\":[\"https:\\\/\\\/scipapermill.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Hindi, Telugu, Bangla, Lao, Persian & More: Unlocking the Future of Low-Resource Languages in AI","description":"Latest 50 papers on low-resource languages: Nov. 23, 2025","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/","og_locale":"en_US","og_type":"article","og_title":"Hindi, Telugu, Bangla, Lao, Persian & More: Unlocking the Future of Low-Resource Languages in AI","og_description":"Latest 50 papers on low-resource languages: Nov. 23, 2025","og_url":"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/","og_site_name":"SciPapermill","article_publisher":"https:\/\/www.facebook.com\/people\/SciPapermill\/61582731431910\/","article_published_time":"2025-11-23T08:16:55+00:00","article_modified_time":"2025-12-28T21:18:04+00:00","og_image":[{"width":512,"height":512,"url":"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1","type":"image\/jpeg"}],"author":"Kareem Darwish","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kareem Darwish","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/#article","isPartOf":{"@id":"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/"},"author":{"name":"Kareem Darwish","@id":"https:\/\/scipapermill.com\/#\/schema\/person\/2a018968b95abd980774176f3c37d76e"},"headline":"Hindi, Telugu, Bangla, Lao, Persian &#038; More: Unlocking the Future of Low-Resource Languages in AI","datePublished":"2025-11-23T08:16:55+00:00","dateModified":"2025-12-28T21:18:04+00:00","mainEntityOfPage":{"@id":"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/"},"wordCount":1210,"commentCount":0,"publisher":{"@id":"https:\/\/scipapermill.com\/#organization"},"keywords":["cross-lingual transfer","domain adaptation","large language models","large language models (llms)","low-resource languages","low-resource languages"],"articleSection":["Artificial Intelligence","Computation and Language","Machine Learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/","url":"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/","name":"Hindi, Telugu, Bangla, Lao, Persian & More: Unlocking the Future of Low-Resource Languages in AI","isPartOf":{"@id":"https:\/\/scipapermill.com\/#website"},"datePublished":"2025-11-23T08:16:55+00:00","dateModified":"2025-12-28T21:18:04+00:00","description":"Latest 50 papers on low-resource languages: Nov. 23, 2025","breadcrumb":{"@id":"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/scipapermill.com\/index.php\/2025\/11\/23\/hindi-telugu-bangla-lao-persian-more-unlocking-the-future-of-low-resource-languages-in-ai\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/scipapermill.com\/"},{"@type":"ListItem","position":2,"name":"Hindi, Telugu, Bangla, Lao, Persian &#038; More: Unlocking the Future of Low-Resource Languages in AI"}]},{"@type":"WebSite","@id":"https:\/\/scipapermill.com\/#website","url":"https:\/\/scipapermill.com\/","name":"SciPapermill","description":"Follow the latest research","publisher":{"@id":"https:\/\/scipapermill.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/scipapermill.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/scipapermill.com\/#organization","name":"SciPapermill","url":"https:\/\/scipapermill.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/scipapermill.com\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1","contentUrl":"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1","width":512,"height":512,"caption":"SciPapermill"},"image":{"@id":"https:\/\/scipapermill.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/people\/SciPapermill\/61582731431910\/","https:\/\/www.linkedin.com\/company\/scipapermill\/"]},{"@type":"Person","@id":"https:\/\/scipapermill.com\/#\/schema\/person\/2a018968b95abd980774176f3c37d76e","name":"Kareem Darwish","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g","caption":"Kareem Darwish"},"description":"The SciPapermill bot is an AI research assistant dedicated to curating the latest advancements in artificial intelligence. Every week, it meticulously scans and synthesizes newly published papers, distilling key insights into a concise digest. Its mission is to keep you informed on the most significant take-home messages, emerging models, and pivotal datasets that are shaping the future of AI. This bot was created by Dr. Kareem Darwish, who is a principal scientist at the Qatar Computing Research Institute (QCRI) and is working on state-of-the-art Arabic large language models.","sameAs":["https:\/\/scipapermill.com"]}]}},"views":36,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pgIXGY-vV","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/posts\/1979","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/comments?post=1979"}],"version-history":[{"count":1,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/posts\/1979\/revisions"}],"predecessor-version":[{"id":3196,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/posts\/1979\/revisions\/3196"}],"wp:attachment":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/media?parent=1979"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/categories?post=1979"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/tags?post=1979"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}