{"id":6791,"date":"2026-05-02T03:40:59","date_gmt":"2026-05-02T03:40:59","guid":{"rendered":"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/"},"modified":"2026-05-02T03:40:59","modified_gmt":"2026-05-02T03:40:59","slug":"from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai","status":"publish","type":"post","link":"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/","title":{"rendered":"From Somali to Sundanese: The Future of Low-Resource Languages in AI"},"content":{"rendered":"<h3>Latest 7 papers on low-resource languages: May 2, 2026<\/h3>\n<p>The world of AI and Machine Learning is rapidly expanding, but a significant portion of humanity remains underserved. Billions speak languages considered \u201clow-resource,\u201d meaning they lack the vast digital datasets that fuel modern AI. This disparity creates a chasm in access to advanced AI tools, from intelligent chatbots to educational platforms. Excitingly, recent research is making strides to bridge this gap, demonstrating innovative approaches that bring cutting-edge AI capabilities to these linguistic communities. Let\u2019s dive into some of the latest breakthroughs.<\/p>\n<h3 id=\"the-big-ideas-core-innovations\">The Big Idea(s) &amp; Core Innovations<\/h3>\n<p>At the heart of these advancements is a common goal: enabling AI to understand, generate, and learn in languages with limited data. One major challenge is <strong>cultural alignment<\/strong> in retrieval-augmented generation (RAG) systems. Naver and Samsung Research, in their paper \u201c<a href=\"https:\/\/arxiv.org\/pdf\/2604.25676\">CORAL: Adaptive Retrieval Loop for Culturally-Aligned Multilingual RAG<\/a>\u201d, introduce CORAL, an agentic framework that dynamically adapts both retrieval corpora and queries. 
Their key insight is that fixed retrieval scopes fail for culturally grounded queries, even with oracle access to relevant corpora. CORAL\u2019s planner-critic feedback loop refines queries and selects culturally appropriate sources, yielding improvements of up to 3.58 percentage points in low-resource languages like Sundanese and showing that dynamic adaptation is crucial for nuanced cultural understanding.<\/p>\n<p>Another innovative approach tackles <strong>lexicon induction<\/strong> for highly granular linguistic variations. Researchers from MaiNLP, LMU Munich, and MCML, in \u201c<a href=\"https:\/\/github.com\/mainlp\/dialect-lexicon-induction\">Resource-Lean Lexicon Induction for German Dialects<\/a>\u201d, demonstrate that simple statistical models (random forests) trained on string-similarity features can outperform large language models like Mistral-123b in generating high-quality bilingual dictionaries for German dialects. This resource-lean method, requiring significantly less computational power, achieved up to a 28.9% improvement in nDCG@10 for cross-dialect information retrieval via query expansion. Their findings highlight that simpler, feature-rich models can be more effective and efficient for specific low-resource tasks.<\/p>\n<p>The application of AI in sensitive domains like <strong>mental health support<\/strong> for low-resource languages is also seeing remarkable progress. Ben-Gurion University researchers, in \u201c<a href=\"https:\/\/arxiv.org\/pdf\/2604.21352\">CARE: Counselor-Aligned Response Engine for Online Mental-Health Support<\/a>\u201d, developed CARE, a GenAI framework that fine-tunes open-source LLMs (Gemma-3-12B-it) on curated, anonymized real-world crisis conversations in Hebrew and Arabic. 
A pivotal insight is that full-history fine-tuning allows LLMs to implicitly learn complex professional counseling strategies (like Reflection or Prompting) without explicit labels. The resulting responses show significant semantic and stylistic alignment, offering a powerful, ethical decision-support tool for counselors in these languages.<\/p>\n<p>For <strong>language education<\/strong>, Instituto Polit\u00e9cnico Nacional, University of South Florida, Saarland University, Imperial College London, and University of Hamburg, in \u201c<a href=\"https:\/\/huggingface.co\/afrilang-edu\">AFRILANGTUTOR: Advancing Language Tutoring and Culture Education in Low-Resource Languages with Large Language Models<\/a>\u201d, introduce AFRILANGTUTOR. They leveraged dictionary-based seed resources to generate synthetic multi-turn tutoring data for 10 African languages. A key finding is that Supervised Fine-Tuning (SFT) is a critical prerequisite for Direct Preference Optimization (DPO) in low-resource settings, as SFT provides the foundational language-specific grounding needed for DPO to be effective. This combination yielded consistent improvements of 1.8% to 15.5% in language tutoring models.<\/p>\n<p>Finally, the problem of <strong>data quality and cross-lingual transfer<\/strong> in multilingual pretraining is addressed by researchers from EPFL in \u201c<a href=\"https:\/\/arxiv.org\/pdf\/2604.20549\">Toward Cross-Lingual Quality Classifiers for Multilingual Pretraining Data Selection<\/a>\u201d. They demonstrate that quality classifiers trained on high-resource languages (e.g., Nordic languages) can effectively filter quality content in typologically distant, low-resource languages (e.g., French), leveraging shared semantic structures in multilingual embedding spaces. 
Their Q3 sampling strategy further refines decision boundaries, ensuring higher-quality data for pretraining across diverse languages.<\/p>\n<h3 id=\"under-the-hood-models-datasets-benchmarks\">Under the Hood: Models, Datasets, &amp; Benchmarks<\/h3>\n<p>These innovations are underpinned by specialized models, novel datasets, and rigorous evaluation methods:<\/p>\n<ul>\n<li><strong>CORAL<\/strong> utilizes culturally grounded QA benchmarks like BLEnD (Myung et al., 2024) for 16 countries in 13 languages and CLIcK (Kim et al., 2024) for Korean cultural MCQs. It leverages a multi-dimensional scoring scheme for evidence evaluation.<\/li>\n<li>For <strong>German Dialects<\/strong>, the research built upon the DiaLemma dataset (100k Bavarian word pairs) and WikiDIR dataset (entities in five German dialects). The code and dialect dictionaries are publicly available at <a href=\"https:\/\/github.com\/mainlp\/dialect-lexicon-induction\">https:\/\/github.com\/mainlp\/dialect-lexicon-induction<\/a>.<\/li>\n<li><strong>CARE<\/strong> fine-tunes open-source LLMs like Gemma-3-12B-it on the anonymized Sahar crisis chatline corpus (Hebrew and Arabic conversations). It employs a Support Intent Match (SIM) metric for strategic alignment and privacy-preserving tools like HebSafeHarbor and CAMeLBERT. The framework uses Unsloth and LoRA for efficient fine-tuning.<\/li>\n<li><strong>AFRILANGTUTOR<\/strong> introduces two new datasets: AFRILANGDICT (194.7K bilingual dictionary entries) and AFRILANGEDU (78.9K multi-turn tutoring examples) for 10 African languages. 
They fine-tune Llama-3-8B-IT and Gemma-3-12B-IT using LlamaFactory (<a href=\"https:\/\/github.com\/hiyouga\/LlamaFactory\">https:\/\/github.com\/hiyouga\/LlamaFactory<\/a>) and make their resources public at <a href=\"https:\/\/huggingface.co\/afrilang-edu\">https:\/\/huggingface.co\/afrilang-edu<\/a>.<\/li>\n<li>The work on <strong>Cross-Lingual Quality Classifiers<\/strong> leverages the XLM-RoBERTa encoder and datasets like FineWeb2 and FineWeb2-HQ. In a related setting, it employs GPT-4o mini as an LLM-as-a-judge for multi-dimensional reasoning-quality evaluation, reflecting the growing reliance on advanced LLMs for assessment.<\/li>\n<li>For <strong>Multilingual Medical QA<\/strong>, the researchers investigated models of varying sizes with external evidence from Web search, PubMed, and Wikipedia, evaluated on the CasiMedicos dataset (MedExpQA benchmark). The code is available at <a href=\"https:\/\/github.com\/anaryegen\/multilingual-medical-qa\/\">https:\/\/github.com\/anaryegen\/multilingual-medical-qa\/<\/a>.<\/li>\n<\/ul>\n<h3 id=\"impact-the-road-ahead\">Impact &amp; The Road Ahead<\/h3>\n<p>These studies collectively paint a vibrant picture for low-resource languages in AI. The ability to dynamically adapt retrieval for cultural nuances, efficiently induce lexicons, implicitly learn complex professional behaviors, generate high-quality educational content, and effectively transfer data quality classifiers means that AI can now be more inclusive and impactful than ever before. For medical QA, the surprising finding that larger models sometimes degrade with external knowledge highlights the need for nuanced retrieval strategies tailored to model scale and language resources, a crucial insight for practical deployment. 
The increasing use of LLMs as judges for complex evaluations also points to new paradigms in AI assessment.<\/p>\n<p>The road ahead involves further refining these resource-lean techniques, developing more diverse datasets for a wider array of languages, and exploring hybrid models that combine the strengths of both simple statistical methods and powerful LLMs. The ultimate goal is to empower every linguistic community with the transformative potential of AI, fostering global access to information, education, and critical support services. The future is multilingual, and these breakthroughs are paving the way.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Latest 7 papers on low-resource languages: May. 2, 2026<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_yoast_wpseo_focuskw":"","_yoast_wpseo_title":"","_yoast_wpseo_metadesc":"","_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[56,57],"tags":[1122,79,298,1622,4168,1561],"class_list":["post-6791","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","category-cs-cl","tag-cultural-alignment","tag-large-language-models","tag-low-resource-languages","tag-main_tag_low-resource_languages","tag-multilingual-rag","tag-main_tag_retrieval-augmented_generation"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>From Somali to Sundanese: The Future of Low-Resource Languages in AI<\/title>\n<meta name=\"description\" content=\"Latest 7 papers on low-resource languages: May. 
2, 2026\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"From Somali to Sundanese: The Future of Low-Resource Languages in AI\" \/>\n<meta property=\"og:description\" content=\"Latest 7 papers on low-resource languages: May. 2, 2026\" \/>\n<meta property=\"og:url\" content=\"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/\" \/>\n<meta property=\"og:site_name\" content=\"SciPapermill\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/people\/SciPapermill\/61582731431910\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-02T03:40:59+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"512\" \/>\n\t<meta property=\"og:image:height\" content=\"512\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Kareem Darwish\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kareem Darwish\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2026\\\/05\\\/02\\\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2026\\\/05\\\/02\\\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\\\/\"},\"author\":{\"name\":\"Kareem Darwish\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/person\\\/2a018968b95abd980774176f3c37d76e\"},\"headline\":\"From Somali to Sundanese: The Future of Low-Resource Languages in AI\",\"datePublished\":\"2026-05-02T03:40:59+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2026\\\/05\\\/02\\\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\\\/\"},\"wordCount\":1020,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#organization\"},\"keywords\":[\"cultural alignment\",\"large language models\",\"low-resource languages\",\"low-resource languages\",\"multilingual rag\",\"retrieval-augmented generation\"],\"articleSection\":[\"Artificial Intelligence\",\"Computation and Language\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2026\\\/05\\\/02\\\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2026\\\/05\\\/02\\\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\\\/\",\"url\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2026\\\/05\\\/02\\\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\\\/\",\"name\":\"From Somali to Sundanese: The 
Future of Low-Resource Languages in AI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#website\"},\"datePublished\":\"2026-05-02T03:40:59+00:00\",\"description\":\"Latest 7 papers on low-resource languages: May. 2, 2026\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2026\\\/05\\\/02\\\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2026\\\/05\\\/02\\\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2026\\\/05\\\/02\\\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/scipapermill.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"From Somali to Sundanese: The Future of Low-Resource Languages in AI\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#website\",\"url\":\"https:\\\/\\\/scipapermill.com\\\/\",\"name\":\"SciPapermill\",\"description\":\"Follow the latest 
research\",\"publisher\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/scipapermill.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#organization\",\"name\":\"SciPapermill\",\"url\":\"https:\\\/\\\/scipapermill.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/scipapermill.com\\\/wp-content\\\/uploads\\\/2025\\\/07\\\/cropped-icon.jpg?fit=512%2C512&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/scipapermill.com\\\/wp-content\\\/uploads\\\/2025\\\/07\\\/cropped-icon.jpg?fit=512%2C512&ssl=1\",\"width\":512,\"height\":512,\"caption\":\"SciPapermill\"},\"image\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/people\\\/SciPapermill\\\/61582731431910\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/scipapermill\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/person\\\/2a018968b95abd980774176f3c37d76e\",\"name\":\"Kareem Darwish\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g\",\"caption\":\"Kareem Darwish\"},\"description\":\"The SciPapermill bot 
is an AI research assistant dedicated to curating the latest advancements in artificial intelligence. Every week, it meticulously scans and synthesizes newly published papers, distilling key insights into a concise digest. Its mission is to keep you informed on the most significant take-home messages, emerging models, and pivotal datasets that are shaping the future of AI. This bot was created by Dr. Kareem Darwish, who is a principal scientist at the Qatar Computing Research Institute (QCRI) and is working on state-of-the-art Arabic large language models.\",\"sameAs\":[\"https:\\\/\\\/scipapermill.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"From Somali to Sundanese: The Future of Low-Resource Languages in AI","description":"Latest 7 papers on low-resource languages: May. 2, 2026","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/","og_locale":"en_US","og_type":"article","og_title":"From Somali to Sundanese: The Future of Low-Resource Languages in AI","og_description":"Latest 7 papers on low-resource languages: May. 2, 2026","og_url":"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/","og_site_name":"SciPapermill","article_publisher":"https:\/\/www.facebook.com\/people\/SciPapermill\/61582731431910\/","article_published_time":"2026-05-02T03:40:59+00:00","og_image":[{"width":512,"height":512,"url":"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1","type":"image\/jpeg"}],"author":"Kareem Darwish","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kareem Darwish","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/#article","isPartOf":{"@id":"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/"},"author":{"name":"Kareem Darwish","@id":"https:\/\/scipapermill.com\/#\/schema\/person\/2a018968b95abd980774176f3c37d76e"},"headline":"From Somali to Sundanese: The Future of Low-Resource Languages in AI","datePublished":"2026-05-02T03:40:59+00:00","mainEntityOfPage":{"@id":"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/"},"wordCount":1020,"commentCount":0,"publisher":{"@id":"https:\/\/scipapermill.com\/#organization"},"keywords":["cultural alignment","large language models","low-resource languages","low-resource languages","multilingual rag","retrieval-augmented generation"],"articleSection":["Artificial Intelligence","Computation and Language"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/","url":"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/","name":"From Somali to Sundanese: The Future of Low-Resource Languages in AI","isPartOf":{"@id":"https:\/\/scipapermill.com\/#website"},"datePublished":"2026-05-02T03:40:59+00:00","description":"Latest 7 papers on low-resource languages: May. 
2, 2026","breadcrumb":{"@id":"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/scipapermill.com\/index.php\/2026\/05\/02\/from-somali-to-sundanese-the-future-of-low-resource-languages-in-ai\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/scipapermill.com\/"},{"@type":"ListItem","position":2,"name":"From Somali to Sundanese: The Future of Low-Resource Languages in AI"}]},{"@type":"WebSite","@id":"https:\/\/scipapermill.com\/#website","url":"https:\/\/scipapermill.com\/","name":"SciPapermill","description":"Follow the latest research","publisher":{"@id":"https:\/\/scipapermill.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/scipapermill.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/scipapermill.com\/#organization","name":"SciPapermill","url":"https:\/\/scipapermill.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/scipapermill.com\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1","contentUrl":"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1","width":512,"height":512,"caption":"SciPapermill"},"image":{"@id":"https:\/\/scipapermill.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/people\/SciPapermill\/61582731431910\/","https:\/\/www.linkedin.com\/company\/scipapermill\/"]},{"@
type":"Person","@id":"https:\/\/scipapermill.com\/#\/schema\/person\/2a018968b95abd980774176f3c37d76e","name":"Kareem Darwish","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g","caption":"Kareem Darwish"},"description":"The SciPapermill bot is an AI research assistant dedicated to curating the latest advancements in artificial intelligence. Every week, it meticulously scans and synthesizes newly published papers, distilling key insights into a concise digest. Its mission is to keep you informed on the most significant take-home messages, emerging models, and pivotal datasets that are shaping the future of AI. This bot was created by Dr. 
Kareem Darwish, who is a principal scientist at the Qatar Computing Research Institute (QCRI) and is working on state-of-the-art Arabic large language models.","sameAs":["https:\/\/scipapermill.com"]}]}},"views":12,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pgIXGY-1Lx","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/posts\/6791","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/comments?post=6791"}],"version-history":[{"count":0,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/posts\/6791\/revisions"}],"wp:attachment":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/media?parent=6791"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/categories?post=6791"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/tags?post=6791"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}