{"id":1427,"date":"2025-10-06T20:47:41","date_gmt":"2025-10-06T20:47:41","guid":{"rendered":"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/"},"modified":"2025-12-28T21:57:05","modified_gmt":"2025-12-28T21:57:05","slug":"speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency","status":"publish","type":"post","link":"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/","title":{"rendered":"Speech Synthesis Supercharged: Latest Innovations in Expressiveness, Control, and Efficiency"},"content":{"rendered":"<h3>Latest 50 papers on text-to-speech: Oct. 6, 2025<\/h3>\n<p>The quest for truly human-like and controllable synthetic speech continues to be a vibrant frontier in AI\/ML. Recent breakthroughs are propelling Text-to-Speech (TTS) systems beyond mere readability, allowing for unprecedented expressiveness, personalized voices, and real-time responsiveness across diverse linguistic and emotional landscapes. This digest dives into cutting-edge research that is reshaping how we generate and interact with synthetic voices.<\/p>\n<h3 id=\"the-big-ideas-core-innovations\">The Big Idea(s) &amp; Core Innovations<\/h3>\n<p>The central theme across this collection of papers is the drive for <strong>finer-grained control and enhanced naturalness<\/strong> in speech synthesis, often leveraging large language models (LLMs) and innovative architectural designs. Researchers are tackling challenges ranging from emotional nuances to cross-lingual adaptability and real-time performance.<\/p>\n<p>One significant area of innovation lies in <strong>emotional and stylistic control<\/strong>. 
Researchers from the University of Science and Technology of China (USTC) in their paper, <a href=\"https:\/\/arxiv.org\/pdf\/2510.01722\">\u201cEmotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement\u201d<\/a>, propose a mutual-information-guided framework to disentangle emotion and timbre, enhancing phoneme-level prosody. Building on this, <a href=\"https:\/\/arxiv.org\/pdf\/2509.20378\">Sirui Wang et al.\u00a0from Harbin Institute of Technology<\/a> introduce Emo-FiLM in <a href=\"https:\/\/arxiv.org\/pdf\/2509.20378\">\u201cBeyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation\u201d<\/a>, enabling dynamic word-level emotion control through Feature-wise Linear Modulation (FiLM). Furthering emotional understanding, <a href=\"https:\/\/arxiv.org\/pdf\/2505.10599\">Jiaxuan Liu et al.\u00a0from the University of Science and Technology of China and Alibaba Group<\/a> present UDDETTS in <a href=\"https:\/\/arxiv.org\/pdf\/2505.10599\">\u201cUDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech\u201d<\/a>, a universal LLM framework that unifies discrete and dimensional emotions via the interpretable Arousal-Dominance-Valence (ADV) space. This provides more granular control than traditional label-based methods. For diffusion models, <a href=\"https:\/\/arxiv.org\/pdf\/2509.25416\">Jiacheng Shi et al.\u00a0from the College of William &amp; Mary<\/a> introduce EASPO in <a href=\"https:\/\/arxiv.org\/pdf\/2509.25416\">\u201cEmotion-Aligned Generation in Diffusion Text to Speech Models via Preference-Guided Optimization\u201d<\/a> to align emotional expression with prosody through preference-guided optimization, tackling challenges in diffusion TTS.<\/p>\n<p><strong>Controllability and personalization<\/strong> are also seeing major strides. 
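FiLM, the conditioning mechanism named above, is worth a quick aside: a conditioning vector (here, a word-level emotion embedding) predicts a per-channel scale and shift that modulate the acoustic features. The sketch below is a generic NumPy illustration of that idea; the shapes and random projection weights are toy assumptions of ours, not Emo-FiLM's actual architecture.

```python
import numpy as np

# Generic FiLM layer: the conditioning embedding predicts a per-channel
# scale (gamma) and shift (beta) applied to the feature frames.
def film(features, cond, w_gamma, w_beta):
    """features: (frames, channels) acoustic features for one word.
    cond: (emb_dim,) conditioning vector, e.g. a word-level emotion embedding."""
    gamma = cond @ w_gamma  # (channels,) per-channel scale
    beta = cond @ w_beta    # (channels,) per-channel shift
    return gamma * features + beta

rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 8))   # 5 frames, 8 feature channels (toy sizes)
emotion = rng.normal(size=4)       # toy 4-dim emotion embedding
w_gamma = rng.normal(size=(4, 8))  # illustrative projection weights
w_beta = rng.normal(size=(4, 8))
modulated = film(frames, emotion, w_gamma, w_beta)
print(modulated.shape)  # (5, 8)
```

Because gamma and beta change with each word's emotion embedding, the same acoustic backbone can render the same phonemes with different word-level expressive colors.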
<a href=\"https:\/\/arxiv.org\/pdf\/2509.26514\">Yue Wang et al.\u00a0from Tencent Multimodal Department and Soochow University<\/a> introduce BATONVOICE in <a href=\"https:\/\/arxiv.org\/pdf\/2509.26514\">\u201cBatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs\u201d<\/a>, decoupling instruction understanding from speech generation to allow LLMs to guide synthesis. This framework demonstrates remarkable zero-shot cross-lingual generalization. <a href=\"https:\/\/arxiv.org\/pdf\/2509.25842\">Ziyu Zhang et al.\u00a0from Northwestern Polytechnical University<\/a> present HiStyle in <a href=\"https:\/\/arxiv.org\/pdf\/2509.25842\">\u201cHiStyle: Hierarchical Style Embedding Predictor for Text-Prompt-Guided Controllable Speech Synthesis\u201d<\/a>, which uses a hierarchical two-stage style embedding predictor with contrastive learning for more flexible text-prompt-guided control. A new benchmark for voice cloning, <a href=\"https:\/\/arxiv.org\/pdf\/2504.20581\">ClonEval<\/a>, proposed by <a href=\"https:\/\/arxiv.org\/pdf\/2504.20581\">Iwona Christop et al.\u00a0from Adam Mickiewicz University<\/a>, aims to standardize the evaluation of voice cloning models, acknowledging the variability in emotional cloning. 
Furthermore, <a href=\"https:\/\/arxiv.org\/pdf\/2505.17093\">Yejin Lee et al.\u00a0from Sungkyunkwan University<\/a> introduce P2VA in <a href=\"https:\/\/arxiv.org\/pdf\/2505.17093\">\u201cP2VA: Converting Persona Descriptions into Voice Attributes for Fair and Controllable Text-to-Speech\u201d<\/a>, a framework that converts natural language persona descriptions into explicit voice attributes, bridging the usability gap for non-expert users and highlighting generative model bias.<\/p>\n<p>For <strong>low-resource languages and cross-lingual capabilities<\/strong>, <a href=\"https:\/\/arxiv.org\/pdf\/2509.21718\">Shehzeen Hussain et al.\u00a0from NVIDIA Corporation<\/a> offer <a href=\"https:\/\/arxiv.org\/pdf\/2509.21718\">\u201cAlign2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization\u201d<\/a>, which uses ASR-guided online preference optimization to adapt multilingual TTS models, outperforming traditional fine-tuning. <a href=\"https:\/\/arxiv.org\/pdf\/2509.14579\">Qingyu Liu et al.\u00a0from Shanghai Jiao Tong University and Geely<\/a> introduce <a href=\"https:\/\/arxiv.org\/pdf\/2509.14579\">\u201cCross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis\u201d<\/a>, enabling cross-lingual voice cloning without audio prompt transcripts via MMS forced alignment. In a focused effort, <a href=\"https:\/\/arxiv.org\/pdf\/2509.18060\">Yutong Liu et al.\u00a0from the University of Electronic Science and Technology of China<\/a> developed TMD-TTS, a unified Tibetan multi-dialect TTS framework for generating high-quality speech across different Tibetan dialects. 
Similarly, <a href=\"https:\/\/arxiv.org\/pdf\/2509.05863\">Lu\u00eds Felipe Chary et al.\u00a0from Universidade de S\u00e3o Paulo<\/a> in <a href=\"https:\/\/arxiv.org\/pdf\/2509.05863\">\u201cLatinX: Aligning a Multilingual TTS Model with Direct Preference Optimization\u201d<\/a> use Direct Preference Optimization (DPO) to preserve speaker identity across languages, demonstrating significant improvements.<\/p>\n<p><strong>Efficiency and robustness<\/strong> are continuously being refined. <a href=\"https:\/\/herimor.github.io\/voxtream\">Nikita Torgashov et al.\u00a0from KTH Royal Institute of Technology<\/a> introduce VoXtream in <a href=\"https:\/\/herimor.github.io\/voxtream\">\u201cVoXtream: Full-Stream Text-to-Speech with Extremely Low Latency\u201d<\/a>, a zero-shot, fully autoregressive streaming TTS system with ultra-low initial delay (102 ms). <a href=\"https:\/\/arxiv.org\/pdf\/2509.15085\">Simon Welker et al.\u00a0from the University of Hamburg<\/a> propose MelFlow in <a href=\"https:\/\/arxiv.org\/pdf\/2509.15085\">\u201cReal-Time Streaming Mel Vocoding with Generative Flow Matching\u201d<\/a>, a real-time streaming Mel vocoder leveraging diffusion-based flow matching. For accelerating existing models, <a href=\"https:\/\/arxiv.org\/pdf\/2509.09748\">Yanru Huo et al.\u00a0from Zhejiang University<\/a> introduce DiTReducio in <a href=\"https:\/\/arxiv.org\/pdf\/2509.09748\">\u201cDiTReducio: A Training-Free Acceleration for DiT-Based TTS via Progressive Calibration\u201d<\/a> to reduce computational overhead in DiT-based TTS models. 
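The DPO objective that LatinX applies above is compact enough to spell out: it contrasts the policy's and a frozen reference model's log-probabilities for a preferred and a dispreferred synthesis, with no separate reward model. The sketch below is a generic rendering of the DPO loss with illustrative numbers, not LatinX's training code.

```python
import math

# DPO loss for one preference pair: 'w' is the preferred synthesis,
# 'l' the dispreferred one; beta is the usual DPO temperature.
def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # -log sigmoid(beta * (policy margin - reference margin))
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Here the policy favors the winner more strongly than the reference does,
# so the margin is positive and the loss falls below log(2) (~0.693).
loss = dpo_loss(logp_w=-10.0, logp_l=-12.0, ref_logp_w=-11.0, ref_logp_l=-11.5)
print(loss < math.log(2))  # True
```

Minimizing this loss pushes the policy to widen its preference gap relative to the reference, which is how speaker identity can be preserved: pairs are chosen so that identity-preserving outputs are the "winners."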
Further enhancing efficiency, <a href=\"https:\/\/diflow-tts.github.io\">Ngoc-Son Nguyen et al.\u00a0from FPT Software AI Center<\/a> present DiFlow-TTS in <a href=\"https:\/\/diflow-tts.github.io\">\u201cDiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-To-Speech\u201d<\/a>, a zero-shot system using discrete flow matching and factorized speech token modeling.<\/p>\n<p>Other notable advancements include <strong>improving fundamental components and data quality<\/strong>. <a href=\"https:\/\/arxiv.org\/abs\/2509.17006\">Junjie Cao et al.\u00a0from Tsinghua University and AMAP Speech<\/a> introduce CaT-TTS in <a href=\"https:\/\/arxiv.org\/abs\/2509.17006\">\u201cComprehend and Talk: Text to Speech Synthesis via Dual Language Modeling\u201d<\/a> for improved zero-shot voice cloning through semantic understanding and acoustic generation. <a href=\"https:\/\/arxiv.org\/pdf\/2509.17052\">Wataru Nakata et al.\u00a0from The University of Tokyo<\/a> present Sidon in <a href=\"https:\/\/arxiv.org\/pdf\/2509.17052\">\u201cSidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-scale Dataset Cleansing\u201d<\/a>, an open-source multilingual speech restoration model for cleaning noisy datasets. <a href=\"https:\/\/arxiv.org\/pdf\/2509.14270\">Karan Dua et al.\u00a0from Oracle AI<\/a> introduce SpeechWeave, a pipeline for generating high-quality, diverse multilingual synthetic text and audio data. <a href=\"https:\/\/zhengrachel.github.io\/VARSTok\">Rui-Chen Zheng et al.\u00a0from USTC<\/a> introduce VARSTok in <a href=\"https:\/\/zhengrachel.github.io\/VARSTok\">\u201cSay More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding\u201d<\/a>, a variable-frame-rate speech tokenizer that uses fewer tokens while improving naturalness. 
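One way to picture variable-frame-rate tokenization in the spirit of VARSTok: runs of acoustically similar frames collapse into a single token that carries an implicit duration. The greedy cosine-similarity merge below is our own toy stand-in, not the paper's adaptive clustering algorithm.

```python
import numpy as np

def _cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# Toy variable-frame-rate tokenizer: merge consecutive frames similar to the
# segment's first frame, emitting (centroid, duration) pairs.
def merge_frames(frames, threshold=0.95):
    tokens = []
    start = 0
    for t in range(1, len(frames) + 1):
        if t == len(frames) or _cos(frames[start], frames[t]) < threshold:
            seg = frames[start:t]
            tokens.append((seg.mean(axis=0), len(seg)))  # centroid + duration
            start = t
    return tokens

# Three near-identical frames followed by one distinct frame -> 2 tokens,
# with durations 3 and 1.
frames = np.array([[1.0, 0.0], [0.99, 0.01], [1.0, 0.02], [0.0, 1.0]])
tokens = merge_frames(frames)
print(len(tokens), [d for _, d in tokens])  # 2 [3, 1]
```

Steady vowels then cost few tokens while rapid transitions keep fine temporal resolution, which is why fewer tokens need not hurt naturalness.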
<a href=\"https:\/\/arxiv.org\/pdf\/2509.14678\">Hyunjae Soh et al.\u00a0from Seoul National University (SNU)<\/a> propose Stochastic Clock Attention (SCA) in <a href=\"https:\/\/arxiv.org\/pdf\/2509.14678\">\u201cStochastic Clock Attention for Aligning Continuous and Ordered Sequences\u201d<\/a> for improved text-speech alignment. <a href=\"https:\/\/arxiv.org\/pdf\/2509.11084\">Hyeongju Kim et al.\u00a0from Supertone, Inc.<\/a> introduce Length-Aware RoPE (LARoPE) for better text-speech alignment in transformer-based TTS systems.<\/p>\n<h3 id=\"under-the-hood-models-datasets-benchmarks\">Under the Hood: Models, Datasets, &amp; Benchmarks<\/h3>\n<p>These innovations are often powered by novel architectural designs, specialized datasets, and rigorous evaluation methods:<\/p>\n<ul>\n<li><strong>BATONTTS<\/strong>: A specialized TTS model, part of the BATONVOICE framework, trained to synthesize speech from explicit vocal features generated by an LLM. Code: <a href=\"https:\/\/github.com\/Tencent\/digitalhuman\/tree\/main\/BatonVoice\">https:\/\/github.com\/Tencent\/digitalhuman\/tree\/main\/BatonVoice<\/a><\/li>\n<li><strong>HiStyle<\/strong>: A two-stage style embedding predictor leveraging contrastive learning for text-prompt-guided controllable speech synthesis. Code: <a href=\"https:\/\/anonymous.4open.science\/w\/HiStyle-2517\/\">https:\/\/anonymous.4open.science\/w\/HiStyle-2517\/<\/a><\/li>\n<li><strong>EASPO<\/strong>: A stepwise alignment framework for diffusion TTS, using EASPM, a time-aware reward model, for emotion-aligned generation. Code: <a href=\"https:\/\/github.com\/yourusername\/EASPO\">https:\/\/github.com\/yourusername\/EASPO<\/a><\/li>\n<li><strong>CaT-TTS<\/strong>: A dual language modeling system with S3Codec (a split residual vector quantization codec) and an \u201cUnderstand-then-Generate\u201d architecture for zero-shot voice cloning. 
Resources: <a href=\"https:\/\/arxiv.org\/abs\/2509.17006\">https:\/\/arxiv.org\/abs\/2509.17006<\/a><\/li>\n<li><strong>Align2Speak (GRPO-based framework)<\/strong>: Adapts multilingual TTS models to low-resource languages using ASR, speaker verification, and PESQ as multi-objective rewards for online preference optimization. Code: <a href=\"https:\/\/github.com\/grpotts\">https:\/\/github.com\/grpotts<\/a><\/li>\n<li><strong>i-LAVA<\/strong>: A low-latency voice-to-voice architecture for real-time agent interactions. Resources: <a href=\"https:\/\/arxiv.org\/pdf\/2509.20971\">https:\/\/arxiv.org\/pdf\/2509.20971<\/a><\/li>\n<li><strong>Emo-FiLM<\/strong>: A framework for word-level controllable fine-grained emotional speech synthesis, supported by the new Fine-grained Emotion Dynamics Dataset (FEDD). Code: <a href=\"https:\/\/arxiv.org\/pdf\/2509.20378\">https:\/\/arxiv.org\/pdf\/2509.20378<\/a><\/li>\n<li><strong>UDDETTS<\/strong>: A universal LLM framework integrating discrete and dimensional emotions via the Arousal-Dominance-Valence (ADV) space. Code: <a href=\"https:\/\/anonymous.4open.science\/w\/UDDETTS\">https:\/\/anonymous.4open.science\/w\/UDDETTS<\/a><\/li>\n<li><strong>OLaPh (Optimal Language Phonemizer)<\/strong>: A phonemization framework combining large lexica, NLP techniques (NER, POS tagging), and probabilistic scoring, and a large language model trained on OLaPh data. Resources: <a href=\"https:\/\/arxiv.org\/pdf\/2509.20086\">https:\/\/arxiv.org\/pdf\/2509.20086<\/a><\/li>\n<li><strong>Optimal Alignment Score (OAS)<\/strong>: A novel metric for evaluating text-speech alignment quality in LLM-based TTS, integrated into CosyVoice2 training. Code: <a href=\"https:\/\/github.com\/FunAudioLLM\/CV3-Eval\">https:\/\/github.com\/FunAudioLLM\/CV3-Eval<\/a><\/li>\n<li><strong>Selective Classifier-free Guidance<\/strong>: A hybrid approach for zero-shot TTS to balance speaker similarity and text adherence. 
Code: <a href=\"https:\/\/github.com\/F5-TTS\/F5-TTS\">https:\/\/github.com\/F5-TTS\/F5-TTS<\/a><\/li>\n<li><strong>Reinforcement Learning for LLM-based ASR\/TTS<\/strong>: Utilizes lightweight RL frameworks like GRPO and DiffRO for performance enhancement. Code: <a href=\"https:\/\/github.com\/huggingface\/trl\">https:\/\/github.com\/huggingface\/trl<\/a><\/li>\n<li><strong>TMD-TTS<\/strong>: A Tibetan multi-dialect TTS framework with DSDR-Net, and the TMDD dataset for reproducible data generation. Resources: <a href=\"https:\/\/arxiv.org\/pdf\/2509.18060\">https:\/\/arxiv.org\/pdf\/2509.18060<\/a><\/li>\n<li><strong>Sidon<\/strong>: An open-source multilingual speech restoration model for large-scale dataset cleansing. Code: <a href=\"https:\/\/ast-astrec.nict.go.jp\/en\/release\/hi-fi-captain\/\">https:\/\/ast-astrec.nict.go.jp\/en\/release\/hi-fi-captain\/<\/a><\/li>\n<li><strong>Prompt-guided hybrid training scheme<\/strong>: Addresses exposure bias in LM-based TTS by blending teacher forcing with free running. Resources: <a href=\"https:\/\/arxiv.org\/pdf\/2509.17021\">https:\/\/arxiv.org\/pdf\/2509.17021<\/a><\/li>\n<li><strong>MBCodec<\/strong>: A multi-codebook audio codec with residual vector quantization and self-supervised semantic tokenization for high-fidelity audio compression. Resources: <a href=\"https:\/\/arxiv.org\/pdf\/2509.17006\">https:\/\/arxiv.org\/pdf\/2509.17006<\/a><\/li>\n<li><strong>Fed-PISA<\/strong>: A federated learning framework for voice cloning, using Low-Rank Adaptation (LoRA) and personalized aggregation. Code: <a href=\"https:\/\/huggingface.co\/spaces\/sDuoluoluos\/FedPISA-Demo\">https:\/\/huggingface.co\/spaces\/sDuoluoluos\/FedPISA-Demo<\/a><\/li>\n<li><strong>VoXtream<\/strong>: A full-stream zero-shot TTS model combining incremental phoneme, temporal, and depth transformers. 
Code: <a href=\"https:\/\/herimor.github.io\/voxtream\">https:\/\/herimor.github.io\/voxtream<\/a><\/li>\n<li><strong>LibriTTS-VI<\/strong>: The first public voice impression dataset, along with methods to mitigate impression leakage in TTS. Code: <a href=\"https:\/\/github.com\/sony\/LibriTTS-VI\">https:\/\/github.com\/sony\/LibriTTS-VI<\/a><\/li>\n<li><strong>Semantic Compression Approach<\/strong>: Uses Vevo\u2019s content-style tokens and timbre embeddings for ultra-low bandwidth voice communication. Code: <a href=\"https:\/\/github.com\/str-us\/Vevo\">https:\/\/github.com\/str-us\/Vevo<\/a><\/li>\n<li><strong>Frustratingly Easy Data Augmentation<\/strong>: TTS-based data augmentation for low-resource ASR. Code: <a href=\"https:\/\/arxiv.org\/pdf\/2509.15373\">https:\/\/arxiv.org\/pdf\/2509.15373<\/a><\/li>\n<li><strong>Emotion-Aware Speech Generation for Comics<\/strong>: An end-to-end system leveraging multimodal analysis and LLMs for character-specific voice and emotion inference. Code: <a href=\"https:\/\/github.com\/kha-white\/manga-ocr\">https:\/\/github.com\/kha-white\/manga-ocr<\/a><\/li>\n<li><strong>MelFlow<\/strong>: A real-time streaming generative Mel vocoder using diffusion-based flow matching. Code: <a href=\"https:\/\/github.com\/simonwelker\/MelFlow\">https:\/\/github.com\/simonwelker\/MelFlow<\/a><\/li>\n<li><strong>DAIEN-TTS<\/strong>: A zero-shot framework for environment-aware synthesis with disentangled audio infilling. Code: <a href=\"https:\/\/github.com\/yxlu-0102\/DAIEN-TTS\">https:\/\/github.com\/yxlu-0102\/DAIEN-TTS<\/a><\/li>\n<li><strong>Stochastic Clock Attention (SCA)<\/strong>: A novel attention mechanism for aligning continuous and ordered sequences, like mel-spectrograms. 
Code: <a href=\"https:\/\/github.com\/SNU-NLP\/stochastic-clock-attention\">https:\/\/github.com\/SNU-NLP\/stochastic-clock-attention<\/a><\/li>\n<li><strong>SpeechOp<\/strong>: A multi-task latent diffusion model transforming pre-trained TTS into a universal speech processor via Implicit Task Composition (ITC). Resources: <a href=\"https:\/\/justinlovelace.github.io\/projects\/speechop\">https:\/\/justinlovelace.github.io\/projects\/speechop<\/a><\/li>\n<li><strong>SpeechWeave<\/strong>: An automated pipeline for generating diverse multilingual synthetic text and audio data. Resources: <a href=\"https:\/\/arxiv.org\/pdf\/2509.14270\">https:\/\/arxiv.org\/pdf\/2509.14270<\/a><\/li>\n<li><strong>CS-FLEURS<\/strong>: The largest collection of code-switched speech data (113 unique pairs across 52 languages) for ASR\/ST benchmarking. Code: <a href=\"https:\/\/huggingface.co\/datasets\/byan\/cs-fleurs\">https:\/\/huggingface.co\/datasets\/byan\/cs-fleurs<\/a><\/li>\n<li><strong>ClonEval<\/strong>: An open voice cloning benchmark with an evaluation protocol, open-source library, and leaderboard. Code: <a href=\"https:\/\/github.com\/clonEval\/clonEval\">https:\/\/github.com\/clonEval\/clonEval<\/a><\/li>\n<li><strong>KALL-E<\/strong>: An autoregressive TTS approach with next-distribution prediction using Flow-VAE for continuous speech representations. Code: <a href=\"https:\/\/github.com\/xkx-hub\/KALL-E\">https:\/\/github.com\/xkx-hub\/KALL-E<\/a><\/li>\n<li><strong>SelectTTS<\/strong>: A low-complexity framework for zero-shot TTS with unseen speakers using discrete unit-based frame selection. Code: <a href=\"https:\/\/kodhandarama.github.io\/selectTTSdemo\/\">https:\/\/kodhandarama.github.io\/selectTTSdemo\/<\/a><\/li>\n<li><strong>C3T<\/strong>: A benchmark to evaluate the preservation of language understanding capabilities in speech-aware LLMs, focusing on fairness across speakers. 
Code: <a href=\"https:\/\/github.com\/fixie-ai\/ultravox\">https:\/\/github.com\/fixie-ai\/ultravox<\/a><\/li>\n<li><strong>Length-Aware RoPE (LARoPE)<\/strong>: An enhanced rotary position embedding for transformer-based TTS. Resources: <a href=\"https:\/\/arxiv.org\/pdf\/2509.11084\">https:\/\/arxiv.org\/pdf\/2509.11084<\/a><\/li>\n<li><strong>Korean Meteorological ASR Dataset<\/strong>: A domain-specific dataset for evaluating ASR systems for Korean weather queries. Resources: <a href=\"https:\/\/huggingface.co\/datasets\/ddehun\/korean-weather-asr\">https:\/\/huggingface.co\/datasets\/ddehun\/korean-weather-asr<\/a><\/li>\n<li><strong>WhisTLE<\/strong>: A deeply supervised, text-only domain adaptation method for pretrained ASR transformers. Resources: <a href=\"https:\/\/arxiv.org\/pdf\/2509.10452\">https:\/\/arxiv.org\/pdf\/2509.10452<\/a><\/li>\n<li><strong>DiTReducio<\/strong>: A training-free acceleration framework for DiT-based TTS. Resources: <a href=\"https:\/\/arxiv.org\/pdf\/2509.09748\">https:\/\/arxiv.org\/pdf\/2509.09748<\/a><\/li>\n<li><strong>HISPASpoof<\/strong>: A new public dataset for Spanish speech forensics to detect synthetic speech. Code: <a href=\"https:\/\/gitlab.com\/viper-purdue\/s3d-spanish-syn-speech-det.git\">https:\/\/gitlab.com\/viper-purdue\/s3d-spanish-syn-speech-det.git<\/a><\/li>\n<li><strong>Automated Speaking Assessment (ASA) Data Augmentation<\/strong>: Utilizes LLM-generated texts and speaker-aware TTS (Coqui-ai XTTSv2) with a dynamic importance loss for robust multimodal scoring using a Phi-4 multimodal model. Code: <a href=\"https:\/\/github.com\/coqui-ai\/TTS\">https:\/\/github.com\/coqui-ai\/TTS<\/a><\/li>\n<li><strong>DSM (Delayed Streams Modeling)<\/strong>: A framework for streaming sequence-to-sequence learning, supporting ASR and TTS. 
Code: <a href=\"https:\/\/github.com\/kyutai-labs\/delayed-streams-modeling\">github.com\/kyutai-labs\/delayed-streams-modeling<\/a><\/li>\n<li><strong>SmoothCache<\/strong>: A technique to accelerate F5-TTS by caching transformer layer outputs. Code: <a href=\"https:\/\/github.com\/SWivid\/F5-TTS\">https:\/\/github.com\/SWivid\/F5-TTS<\/a><\/li>\n<li><strong>Progressive Facial Granularity Aggregation<\/strong>: An end-to-end face-to-voice (FTV) synthesis framework for improved speaker fidelity. Resources: <a href=\"https:\/\/arxiv.org\/pdf\/2509.07376\">https:\/\/arxiv.org\/pdf\/2509.07376<\/a><\/li>\n<li><strong>VARSTok<\/strong>: A variable-frame-rate speech tokenizer with adaptive clustering and implicit duration coding. Resources: <a href=\"https:\/\/zhengrachel.github.io\/VARSTok\">https:\/\/zhengrachel.github.io\/VARSTok<\/a><\/li>\n<li><strong>LibriQuote<\/strong>: A speech dataset of fictional character utterances for expressive zero-shot TTS. Code: <a href=\"https:\/\/github.com\/deezer\/libriquote\">https:\/\/github.com\/deezer\/libriquote<\/a><\/li>\n<li><strong>LatPhon<\/strong>: A lightweight multilingual G2P system for Romance languages and English. Resources: <a href=\"https:\/\/arxiv.org\/pdf\/2509.03300\">https:\/\/arxiv.org\/pdf\/2509.03300<\/a><\/li>\n<\/ul>\n<h3 id=\"impact-the-road-ahead\">Impact &amp; The Road Ahead<\/h3>\n<p>These advancements have profound implications for a wide range of applications, from making virtual assistants more natural and personable to enabling seamless cross-lingual communication and creating immersive multimodal experiences. The ability to precisely control emotional nuances, disentangle voice characteristics, and synthesize speech in real time will revolutionize human-computer interaction, making AI voices indistinguishable from, and even more adaptable than, human ones. 
For content creation, tools like the multi-agent generative AI for dynamic multimodal narratives presented in <a href=\"https:\/\/arxiv.org\/pdf\/2409.11261\">\u201cThe Art of Storytelling\u201d<\/a> by <a href=\"https:\/\/arxiv.org\/pdf\/2409.11261\">Samee Arif et al.\u00a0from Lahore University of Management Sciences<\/a> promise entirely new forms of interactive media.<\/p>\n<p>The focus on low-resource languages, efficient data augmentation, and robustness against speech hallucinations (as addressed in <a href=\"https:\/\/arxiv.org\/pdf\/2509.19852\">\u201cEliminating stability hallucinations in llm-based tts models via attention guidance\u201d<\/a> by <a href=\"https:\/\/arxiv.org\/pdf\/2509.19852\">ShiMing Wang et al.\u00a0from the University of Science and Technology of China and Alibaba Group<\/a>) underscores a commitment to making advanced TTS accessible and reliable globally. The development of robust benchmarks like ClonEval and C3T is critical for guiding future research and ensuring fair, unbiased models. Looking ahead, the integration of generative AI with multimodal inputs, coupled with ever-improving control and efficiency, suggests a future where synthetic speech isn\u2019t just an output, but an intelligent, adaptable, and deeply integrated component of our digital lives.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Latest 50 papers on text-to-speech: Oct. 
6, 2025<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_yoast_wpseo_focuskw":"","_yoast_wpseo_title":"","_yoast_wpseo_metadesc":"","_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[68,57,248],"tags":[411,78,471,1577,249,470,610],"class_list":["post-1427","post","type-post","status-publish","format-standard","hentry","category-audio-and-speech-processing","category-cs-cl","category-sound","tag-automatic-speech-recognition-asr","tag-large-language-models-llms","tag-text-to-speech","tag-main_tag_text-to-speech","tag-text-to-speech-tts","tag-text-to-speech-synthesis","tag-zero-shot-tts"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Speech Synthesis Supercharged: Latest Innovations in Expressiveness, Control, and Efficiency<\/title>\n<meta name=\"description\" content=\"Latest 50 papers on text-to-speech: Oct. 6, 2025\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Speech Synthesis Supercharged: Latest Innovations in Expressiveness, Control, and Efficiency\" \/>\n<meta property=\"og:description\" content=\"Latest 50 papers on text-to-speech: Oct. 
6, 2025\" \/>\n<meta property=\"og:url\" content=\"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/\" \/>\n<meta property=\"og:site_name\" content=\"SciPapermill\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/people\/SciPapermill\/61582731431910\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-10-06T20:47:41+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-28T21:57:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"512\" \/>\n\t<meta property=\"og:image:height\" content=\"512\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Kareem Darwish\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kareem Darwish\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/10\\\/06\\\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/10\\\/06\\\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\\\/\"},\"author\":{\"name\":\"Kareem Darwish\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/person\\\/2a018968b95abd980774176f3c37d76e\"},\"headline\":\"Speech Synthesis Supercharged: Latest Innovations in Expressiveness, Control, and Efficiency\",\"datePublished\":\"2025-10-06T20:47:41+00:00\",\"dateModified\":\"2025-12-28T21:57:05+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/10\\\/06\\\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\\\/\"},\"wordCount\":2132,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#organization\"},\"keywords\":[\"automatic speech recognition (asr)\",\"large language models (llms)\",\"text-to-speech\",\"text-to-speech\",\"text-to-speech (tts)\",\"text-to-speech synthesis\",\"zero-shot tts\"],\"articleSection\":[\"Audio and Speech Processing\",\"Computation and 
Language\",\"Sound\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/10\\\/06\\\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/10\\\/06\\\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\\\/\",\"url\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/10\\\/06\\\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\\\/\",\"name\":\"Speech Synthesis Supercharged: Latest Innovations in Expressiveness, Control, and Efficiency\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#website\"},\"datePublished\":\"2025-10-06T20:47:41+00:00\",\"dateModified\":\"2025-12-28T21:57:05+00:00\",\"description\":\"Latest 50 papers on text-to-speech: Oct. 
6, 2025\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/10\\\/06\\\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/10\\\/06\\\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/index.php\\\/2025\\\/10\\\/06\\\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/scipapermill.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Speech Synthesis Supercharged: Latest Innovations in Expressiveness, Control, and Efficiency\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#website\",\"url\":\"https:\\\/\\\/scipapermill.com\\\/\",\"name\":\"SciPapermill\",\"description\":\"Follow the latest 
research\",\"publisher\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/scipapermill.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#organization\",\"name\":\"SciPapermill\",\"url\":\"https:\\\/\\\/scipapermill.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/scipapermill.com\\\/wp-content\\\/uploads\\\/2025\\\/07\\\/cropped-icon.jpg?fit=512%2C512&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/scipapermill.com\\\/wp-content\\\/uploads\\\/2025\\\/07\\\/cropped-icon.jpg?fit=512%2C512&ssl=1\",\"width\":512,\"height\":512,\"caption\":\"SciPapermill\"},\"image\":{\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/people\\\/SciPapermill\\\/61582731431910\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/scipapermill\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/scipapermill.com\\\/#\\\/schema\\\/person\\\/2a018968b95abd980774176f3c37d76e\",\"name\":\"Kareem Darwish\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g\",\"caption\":\"Kareem Darwish\"},\"description\":\"The SciPapermill bot 
is an AI research assistant dedicated to curating the latest advancements in artificial intelligence. Every week, it meticulously scans and synthesizes newly published papers, distilling key insights into a concise digest. Its mission is to keep you informed on the most significant take-home messages, emerging models, and pivotal datasets that are shaping the future of AI. This bot was created by Dr. Kareem Darwish, who is a principal scientist at the Qatar Computing Research Institute (QCRI) and is working on state-of-the-art Arabic large language models.\",\"sameAs\":[\"https:\\\/\\\/scipapermill.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Speech Synthesis Supercharged: Latest Innovations in Expressiveness, Control, and Efficiency","description":"Latest 50 papers on text-to-speech: Oct. 6, 2025","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/","og_locale":"en_US","og_type":"article","og_title":"Speech Synthesis Supercharged: Latest Innovations in Expressiveness, Control, and Efficiency","og_description":"Latest 50 papers on text-to-speech: Oct. 
6, 2025","og_url":"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/","og_site_name":"SciPapermill","article_publisher":"https:\/\/www.facebook.com\/people\/SciPapermill\/61582731431910\/","article_published_time":"2025-10-06T20:47:41+00:00","article_modified_time":"2025-12-28T21:57:05+00:00","og_image":[{"width":512,"height":512,"url":"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1","type":"image\/jpeg"}],"author":"Kareem Darwish","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kareem Darwish","Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/#article","isPartOf":{"@id":"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/"},"author":{"name":"Kareem Darwish","@id":"https:\/\/scipapermill.com\/#\/schema\/person\/2a018968b95abd980774176f3c37d76e"},"headline":"Speech Synthesis Supercharged: Latest Innovations in Expressiveness, Control, and Efficiency","datePublished":"2025-10-06T20:47:41+00:00","dateModified":"2025-12-28T21:57:05+00:00","mainEntityOfPage":{"@id":"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/"},"wordCount":2132,"commentCount":0,"publisher":{"@id":"https:\/\/scipapermill.com\/#organization"},"keywords":["automatic speech recognition (asr)","large language models (llms)","text-to-speech","text-to-speech","text-to-speech (tts)","text-to-speech synthesis","zero-shot tts"],"articleSection":["Audio and Speech Processing","Computation and 
Language","Sound"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/","url":"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/","name":"Speech Synthesis Supercharged: Latest Innovations in Expressiveness, Control, and Efficiency","isPartOf":{"@id":"https:\/\/scipapermill.com\/#website"},"datePublished":"2025-10-06T20:47:41+00:00","dateModified":"2025-12-28T21:57:05+00:00","description":"Latest 50 papers on text-to-speech: Oct. 6, 2025","breadcrumb":{"@id":"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/scipapermill.com\/index.php\/2025\/10\/06\/speech-synthesis-supercharged-latest-innovations-in-expressiveness-control-and-efficiency\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/scipapermill.com\/"},{"@type":"ListItem","position":2,"name":"Speech Synthesis Supercharged: Latest Innovations in Expressiveness, Control, and Efficiency"}]},{"@type":"WebSite","@id":"https:\/\/scipapermill.com\/#website","url":"https:\/\/scipapermill.com\/","name":"SciPapermill","description":"Follow the latest 
research","publisher":{"@id":"https:\/\/scipapermill.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/scipapermill.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/scipapermill.com\/#organization","name":"SciPapermill","url":"https:\/\/scipapermill.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/scipapermill.com\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1","contentUrl":"https:\/\/i0.wp.com\/scipapermill.com\/wp-content\/uploads\/2025\/07\/cropped-icon.jpg?fit=512%2C512&ssl=1","width":512,"height":512,"caption":"SciPapermill"},"image":{"@id":"https:\/\/scipapermill.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/people\/SciPapermill\/61582731431910\/","https:\/\/www.linkedin.com\/company\/scipapermill\/"]},{"@type":"Person","@id":"https:\/\/scipapermill.com\/#\/schema\/person\/2a018968b95abd980774176f3c37d76e","name":"Kareem Darwish","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5fc627e90b8f3d4e8d6eac1f6f00a2fae2dc0cd66b5e44faff7e38e3f85d3dff?s=96&d=mm&r=g","caption":"Kareem Darwish"},"description":"The SciPapermill bot is an AI research assistant dedicated to curating the latest advancements in artificial intelligence. Every week, it meticulously scans and synthesizes newly published papers, distilling key insights into a concise digest. 
Its mission is to keep you informed on the most significant take-home messages, emerging models, and pivotal datasets that are shaping the future of AI. This bot was created by Dr. Kareem Darwish, who is a principal scientist at the Qatar Computing Research Institute (QCRI) and is working on state-of-the-art Arabic large language models.","sameAs":["https:\/\/scipapermill.com"]}]}},"views":74,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pgIXGY-n1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/posts\/1427","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/comments?post=1427"}],"version-history":[{"count":1,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/posts\/1427\/revisions"}],"predecessor-version":[{"id":3628,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/posts\/1427\/revisions\/3628"}],"wp:attachment":[{"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/media?parent=1427"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/categories?post=1427"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/scipapermill.com\/index.php\/wp-json\/wp\/v2\/tags?post=1427"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}