Deep neural networks (DNNs) have recently been introduced in speech synthesis. Analysis of unsupervised and noise-robust speaker-adaptive HMM-based speech synthesis systems toward a uni... Multimodal speech synthesis architecture for unsupervised speaker adaptation, Hieu-Thi Luong and Junichi Yamagishi. Use of statistical n-gram models in natural language generation for machine translation. It is now possible to synthesise speech using HMMs with a comparable quality to unit-selection techniques. Speaker adaptation in speech synthesis transforms a source utterance to a target ut... (Gales, 1998 [111]) and maximum a posteriori (MAP) adaptation (Gauvain, 1994 [112]). Techniques in rapid unsupervised speaker adaptation based on... Yamagishi, Junichi (ISCA, 2008-09).
Unsupervised adaptation for HMM-based speech synthesis (CORE). Flexible speech synthesis based on hidden Markov models, Keiichi Tokuda, Nagoya Institute of Technology, APSIPA ASC 20, Kaohsiung. Speech synthesis based on hidden Markov models and deep learning, Marvin Coto-Jiménez. Hidden Markov model (HMM) based speech synthesis for Urdu. An unsupervised, discriminative, sentence-level HMM adaptation based on speech-silence classification is presented. Thus, a core goal of EMIME is the development of unsupervised cross-lingual speaker adaptation for HMM-based TTS. Hidden Markov models for artificial voice production and...
This paper first presents an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for such supplementary acoustic models. Consequently, this paper investigates cross-lingual speaker adaptation based on uni... In the current thesis booklet I summarize the novel outcomes of my research, grouped under the three research objectives. US8438029B1: Confidence tying for unsupervised synthetic... Voice conversion for unit-selection concatenation speech synthesis. [3] Yamagishi, Junichi, Takao Kobayashi, Yuji Nakano, Katsumi Ogata, and Juri Isogai. Since speech has temporal structure and can be encoded as a sequence of spectral vectors spanning the audio frequency range, the hidden Markov model (HMM) provides a natural framework for... Speaker adaptation that transforms a given set of HMMs to a target speaker or condition is a successful technique for both automatic speech recognition (ASR) and HMM-based text-to-speech (TTS) synthesis. The HMM-based speech synthesis system (HTS) version 2... Hidden Markov model (HMM) based speech synthesis systems possess several advantages over concatenative synthesis systems. The application of hidden Markov models in speech recognition. The core of all speech recognition systems consists of a set of statistical models representing the various sounds of the language to be recognised. In this paper, an investigation of the importance of input features and training data in speaker-dependent (SD) DNN-based speech synthesis is presented. A comparison of supervised and unsupervised cross-lingual speaker adaptation approaches for HMM-based speech synthesis, Hui Liang¹,², John Dines¹, Lakshmi Saheer¹,²; ¹Idiap Research Institute, Martigny, Switzerland; ²Ecole Polytechnique Fe...
Analysis of unsupervised cross-lingual speaker adaptation. This paper demonstrates how unsupervised cross-lingual adaptation of HMM-based speech synthesis models may be performed without explicit knowledge of the adaptation data language. The task of speech synthesis is to convert normal language text into speech. We proposed a decision tree marginalization technique in [4] for uni... Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping, article in Speech Communication 54(6). Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm.
As a statistical parametric approach, the HMM-based framework provides a great deal of... The training part of HTS has been implemented as a modified version of HTK and released as a form of patch code to HTK. Finally, listener evaluations reveal that the proposed unsupervised adaptation methods deliver performance approaching that of supervised adaptation. In recent years, the hidden Markov model (HMM) has been successfully applied to acoustic modeling for speech synthesis, and HMM-based parametric speech synthesis has become a mainstream speech synthesis method. Junichi Yamagishi, October 2006. Speaker adaptation for HMM-based speech synthesis system using MLLR, Masatsune Tamura†, Takashi Masuko, Keiichi Tokuda, and Takao Kobayashi†; †Tokyo Institute of Technology, Yokohama, 226-8502 Japan. It is created by the HTS working group as a patch to the HTK [18]. Adaptation of pitch and spectrum for HMM-based speech...
Speech recognition is an interdisciplinary subfield of computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. Speech synthesis based on hidden Markov models (CORE). In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. Speaker adaptation is one of the most exciting ones. Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis, by John Dines, Hui Liang, Lakshmi Saheer, Matthew Gibson, William Byrne, Keiichiro Oura, Keiichi Tokuda, Junichi Yamagishi, Simon King, Mirjam Wester, Teemu Hirsimäki, Reima Karhila and Mikko Kurimo. For speech synthesis, a model trained on multiple speakers' data is called an average voice model [6]. Frequency warping for speaker adaptation in HMM-based speech...
Unsupervised speaker adaptation of DNN-HMM by selecting similar speakers for lecture transcription, Masato Mimura and Tatsuya Kawahara, Kyoto University, Academic Center for Computing and Media Studies, Sakyo-ku, Kyoto 606-8501, Japan. Abstract: Unsupervised speaker adaptation of a deep neural network (DNN) is investigated for lecture transcription. Silence and speech regions are determined either using a speech endpointer or the segmentation obtained from the recognizer in a first pass. HMM-based pseudo-clean speech synthesis for SPLICE algorithm. This paper presents an automatic speech recognition based unsupervised adaptation method for hidden Markov model (HMM) speech synthesis and its quality evaluation. HMM-based emotional speech synthesis using average emotion... Speech synthesis is the artificial production of human speech.
The adaptation technique automatically controls the number of phone mismatches. Furthermore, it was a challenge to pioneer HMM TTS research in Hungary. Byrne¹; ¹Cambridge University Engineering Department, ²Helsinki University of Technology. Introduction; two-pass decision tree construction; evaluation. Unsupervised adaptation for HMM-based speech synthesis, 2008.
Similarly to other data-driven speech synthesis approaches, HTS has a compact language... Index terms: HMM-based speech synthesis, unsupervised... Context adaptive training with factorized decision trees for... The application of our research is the personalisation of speech-to-speech translation, in which we employ an HMM statistical framework for both speech recognition and synthesis.
Utilizing the at least one of the speech synthesis parameters for the selected subnode for adaptation can include... Cabral, Trinity College Dublin, Ireland. The ADAPT Centre is funded under the SFI Research Centres Programme (grant RC2106) and is co-funded under the European Regional Development Fund. The HMM/DNN-based speech synthesis system (HTS) has been developed by the HTS working group and others (see "Who we are" and "Acknowledgments"). Speech synthesis based on hidden Markov models (HMM). A study of speaker adaptation for DNN-based speech synthesis. Unsupervised cross-lingual speaker adaptation for HMM... Speech recognition is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). In this paper, we introduce a method capable of unsupervised adaptation, using only speech from the target speaker without any labelling. Thus, an unsupervised cross-lingual speaker adaptation system can be developed. Most research into speaker adaptation for HMM-based speech synthesis (or text-to-speech, TTS) has focussed upon the supervised scenario, where transcribed adaptation data is available.
This paper presents a technique for synthesizing emotional speech based on an emotion-independent model, which is called the average emotion model. Unsupervised adaptation for HMM-based speech synthesis (CiteSeerX). Unsupervised intralingual and cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction, M. Gibson, W. Byrne, IEEE Transactions on Audio, Speech, and Language Processing 19(4), 895-904, 2010. In the HMM-based TTS system, speech synthesis units are modeled by multi-space probability distribution (MSD) HMMs, which can model spectrum and pitch simultaneously in a unified framework. In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. Speaker adaptation for HMM-based speech synthesis system using MLLR, Masatsune Tamura†, Takashi Masuko, Keiichi Tokuda, and Takao Kobayashi†; †Tokyo Institute of Technology, Yokohama, 226-8502 Japan; ††Nagoya Institute of Technology, Nagoya, 466-8555 Japan. Listening tests show very promising results, demonstrating that adapted... The technique is based on an HMM-based text-to-speech (TTS) system and the maximum likelihood linear regression (MLLR) adaptation algorithm. HMM-based speech synthesis mini-tutorial: HMMs are used to generate sequences of speech in a parameterised form; from the parameterised form, we can generate a waveform; the parameterised form contains suf...
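The MLLR adaptation named above can be illustrated with a minimal sketch. In MLLR, the Gaussian mean vectors of a model are adapted with a shared affine transform, so that an adapted mean is A * mean + b. The function name `mllr_adapt_mean` and the values of `A` and `b` are hypothetical and chosen only for illustration; in a real system the transform is estimated from the adaptation data by maximum likelihood, which is not shown here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mllr_adapt_mean(mean: Vector, A: Matrix, b: Vector) -> Vector:
    """Apply an affine MLLR-style transform to one Gaussian mean: A * mean + b."""
    if len(A) != len(b) or any(len(row) != len(mean) for row in A):
        raise ValueError("shape mismatch between A, b and mean")
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


# Made-up transform, assumed to be shared by all Gaussians of one regression class.
A = [[1.0, 0.5], [0.0, 2.0]]
b = [0.5, -1.0]

# Made-up means standing in for an average voice model.
average_voice_means = [[1.0, 2.0], [0.0, 0.0]]
adapted_means = [mllr_adapt_mean(m, A, b) for m in average_voice_means]
print(adapted_means)
```

Because one transform is shared by many Gaussians, a small amount of target-speaker data can move every mean in the class, which is what makes this family of methods attractive for creating new voices.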
The most popular speaker adaptation approaches in speech synthesis are based on maximum likelihood linear transforms (MLLT) (M... Automatic speech recognition has been investigated for several decades, and speech recognition models have moved from HMM-GMM to deep neural networks today. Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping, by Keiichiro Oura, Junichi Yamagishi, Mirjam Wester, Simon King and Keiichi Tokuda. Unsupervised speaker adaptation for DNN-based TTS synthesis.
In HMM-based speech synthesis, speaker adaptation techniques can be used to adapt the source model using speech data from the target... Currently various organizations use it to conduct their own research projects, and we believe that it has contributed signi... By defining a mapping between HMM-based synthesis models and ASR-style models, this paper introduces an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for supplementary acoustic models. On the other hand, our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis, which uses an average voice model plus model adaptation, is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly... This paper describes the integration of these developments into a single architecture which achieves unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis. Analysis of speaker clustering strategies for HMM-based... However, it still requires high-quality audio data with a high signal-to-noise ratio and precise labeling. The patch code is released under a free software license. Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction, M...
Improving rapid unsupervised speaker adaptation based on HMM sufficient statistics in noisy environments using multi-template models. Data selection and adaptation for naturalness in HMM-based... HMM-based speech synthesis, Erica Cooper, CS4706, Spring 2011: concatenative synthesis; HMM synthesis; a parametric model; can train on mixed data from many speakers; model takes up a very small amount of space; speaker adaptation; HMMs: some hidden process has generated some visible observation. I have chosen hidden Markov model based text-to-speech synthesis for my research topic because of its novelty and countless possibilities. The discriminative training procedure using a GPD or any other discriminative training algorithm, employed in conjunction with the HMM... It will include a brief introduction to speech synthesis, including just enough coverage of the text-processing part of the problem to set the scene. No other constraints need to be placed on the ASR-HMM. Unsupervised clustering for expressive speech synthesis. In this paper, we present a novel approach to relax the constraint of stereo data, which is needed in a series of algorithms for noise-robust speech recognition.
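The lecture-note description of an HMM quoted above, in which a hidden process generates a visible observation, can be made concrete with a small sketch of the forward algorithm for a discrete HMM. The two states, the two observation symbols and all probabilities below are made up for illustration and do not come from any of the cited systems, which use continuous acoustic features rather than discrete symbols.

```python
from typing import Dict, List


def forward_likelihood(
    observations: List[str],
    states: List[str],
    initial: Dict[str, float],
    transition: Dict[str, Dict[str, float]],
    emission: Dict[str, Dict[str, float]],
) -> float:
    """Return P(observations) under a discrete HMM using the forward algorithm."""
    if not observations:
        return 1.0
    # alpha[s] = P(observations so far, current hidden state is s)
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emission[s][obs]
            * sum(alpha[prev] * transition[prev][s] for prev in states)
            for s in states
        }
    # Summing over the final hidden state marginalises the hidden process out.
    return sum(alpha.values())


# Hypothetical toy model: the hidden state is speech or silence, the
# observation is a coarse energy level.
states = ["speech", "silence"]
initial = {"speech": 0.5, "silence": 0.5}
transition = {
    "speech": {"speech": 0.8, "silence": 0.2},
    "silence": {"speech": 0.4, "silence": 0.6},
}
emission = {
    "speech": {"loud": 0.9, "quiet": 0.1},
    "silence": {"loud": 0.2, "quiet": 0.8},
}

print(forward_likelihood(["loud", "loud", "quiet"], states, initial, transition, emission))
```

The same model can be read in the generative direction, sampling a state path and then emitting observations from it, which is the view that synthesis takes.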
We have employed an HMM statistical framework for both speech recognition and synthesis, which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). Such supervised methods require labelled adaptation data for the target speaker. Unsupervised adaptation for HMM-based speech synthesis, 2003. Adapting full-context models: for each full-context-dependent model, we can obtain the corresponding triphone model by ignoring the prosodic contextual factors and dropping some phonetic contextual factors. This is achieved by defining a mapping between HMM-based synthesis models and ASR-style models, via a two-pass decision tree construction process. Analysis of speaker clustering strategies for HMM-based speech synthesis, Rasmus Dall, Christophe Veaux, Junichi Yamagishi, Simon King, The Centre for Speech Technology Research, The University of Edinburgh, U.K. This paper describes an HMM-based speech synthesis system (HTS), in which the speech waveform is generated from the HMMs themselves, and applies it to English speech synthesis using the general speech synthesis architecture of Festival. Hybrid systems basically use HMM alignments to bootstrap themselves into producing recognition, and still use much of the surrounding machinery that HMM-based recognizers used to use. We demonstrate an end-to-end speech-to-speech translation system built for four languages (American English, Mandarin, Japanese, and Finnish).
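The mapping from full-context models to triphone models described above can be sketched as a label rewrite. The sketch assumes a label layout with a quinphone prefix `LL^L-C+R=RR` followed by `@` and the prosodic fields, similar to the labels used in HTS-style systems; the exact layout, the example labels and the function name `full_context_to_triphone` are assumptions made for illustration, not something stated in the quoted text.

```python
import re

# Assumed layout: "LL^L-C+R=RR@<prosodic context fields>".
_QUINPHONE = re.compile(
    r"^(?P<ll>[^\^]+)\^(?P<l>[^-]+)-(?P<c>[^+]+)\+(?P<r>[^=]+)=(?P<rr>[^@]+)@"
)


def full_context_to_triphone(label: str) -> str:
    """Drop prosodic context and the outer phones, keeping an L-C+R triphone name."""
    match = _QUINPHONE.match(label)
    if match is None:
        raise ValueError(f"label does not follow the assumed layout: {label!r}")
    return f"{match.group('l')}-{match.group('c')}+{match.group('r')}"


# Two hypothetical full-context labels that differ in the outer phones and in
# the prosodic context but share the same triphone.
labels = [
    "sil^h-e+l=ou@1_4/A:0_0_0/B:1-1-2",
    "a^h-e+l=p@2_3/A:1_0_2/B:0-0-3",
]
triphones = [full_context_to_triphone(label) for label in labels]
print(triphones)
```

Many full-context models collapse onto one triphone in this way, which is why the mapping lets ASR-style models stand in for synthesis models when transcribing adaptation data.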
A text-to-speech (TTS) system converts normal language text into speech. Generating speech from a model has many potential advantages over concatenating waveforms. When the ASR-HMM uses Gaussian mixtures, we can use an approximated KLD (Goldberger et al... Supervised adaptation: the use of adaptation to create new voices for speech synthesis makes HMM-based speech synthesis very attractive. US6076057A: Unsupervised HMM adaptation based on speech... Also, HMMs are generative models, so they are much more useful in the case of speech synthesis; the jury is still out on using deep networks for the synthesis. Flexible speech synthesis based on hidden Markov models, Keiichi Tokuda, Nagoya Institute of Technology, APSIPA ASC 20, Kaohsiung, November 1, 20. Two-pass decision tree construction for unsupervised...
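The Kullback-Leibler divergence (KLD) mentioned above has a closed form for two Gaussians but not for two Gaussian mixtures, which is why an approximation is needed in the mixture case. The sketch below shows the closed form for diagonal-covariance Gaussians and one matching-based approximation for mixtures, in which each component of the first mixture is paired with its best match in the second. Whether this is the exact variant intended in the truncated citation is an assumption, and all names and numbers are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Gaussian = Tuple[Sequence[float], Sequence[float]]  # (means, variances), diagonal
Mixture = List[Tuple[float, Gaussian]]  # (weight, component)


def kld_diag_gaussian(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def kld_gmm_matched(f: Mixture, g: Mixture) -> float:
    """Matching-based approximation of KL(f || g) between two mixtures."""
    total = 0.0
    for w_f, comp_f in f:
        # Pair this component with the component of g that is cheapest to reach,
        # taking the mixture weights into account.
        total += w_f * min(
            kld_diag_gaussian(comp_f, comp_g) + math.log(w_f / w_g)
            for w_g, comp_g in g
        )
    return total


# Made-up one-dimensional state distributions.
state_a: Mixture = [(0.5, ([0.0], [1.0])), (0.5, ([10.0], [1.0]))]
state_b: Mixture = [(0.5, ([1.0], [1.0])), (0.5, ([10.0], [1.0]))]
print(kld_gmm_matched(state_a, state_b))
```

In a transform-mapping setting, a divergence of this kind can be used to pick, for each state in one model set, the closest state in another.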
As a demonstration with the SPLICE algorithm, we generate the pseudo-clean features to replace the ideal clean features from one of the stereo channels, by using HMM-based speech synthesis. Frequency warping for speaker adaptation in HMM-based speech synthesis, Weixun Gao¹ and Qiying Cao¹,²; ¹School of Information Science and Technology, ²College of Computer Science and Technology, Donghua University, Shanghai, 200051 P... IEICE special issue on statistical modeling for speech processing, E89-D(3). Some aspects of ASR transcription based unsupervised...