On television, you may have seen Dr. Stephen Hawking talking to his students. The physicist, famous for his black-hole theory, could no longer produce sound with his own voice, yet he could still talk to others thanks to speech synthesizer technology. Hawking's speech synthesizer was quite complex: it did not only produce sound, it also captured his input through an infrared sensor mounted on his glasses. Speech synthesis is also commonly found in the voice-command applications built into advanced smartphones, which combine a speech recognizer with a speech synthesizer. The simplest speech synthesis application can be found on any PC running Windows. If you press WinKey + U, Windows opens Utility Manager, which includes the Microsoft Narrator application. Narrator reads aloud the contents of whichever window is active, including the buttons inside it. You may also have installed Microsoft Reader on your PC. This application, designed for .LIT e-book files, can also convert text into sound (text-to-speech), which is another example of speech synthesizer technology.
Workflow
As with speech recognition, designing a speech synthesis application is not the work of information technology experts alone; it also involves linguists. To understand how a speech synthesizer works, let us start from its other name: text-to-speech, which means converting text into sound. This gives us the two elements of a speech synthesizer: text as the input and sound as the output. What happens between the input and the output is called the manipulation process, and in a speech synthesizer it is divided into two big parts: the front-end and the back-end. The front-end has two main functions. The first is to convert raw text containing symbols such as numbers and abbreviations into readable words; for example, it converts "1" into "one" and "btw" into "by the way". This step is often called text normalization, pre-processing, or tokenization. The second is to assign a phonetic transcription to each word and to divide and mark the text into prosodic units (rhythm, intonation) such as phrases (groups of words that function as a single syntactic unit), clauses, and sentences. Assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. The phonetic transcriptions and the prosody information are then combined into a symbolic linguistic representation, which is the output of the front-end. The back-end, often called the synthesizer itself, then converts this symbolic linguistic representation into sound. This is the overall workflow of a speech synthesizer, or text-to-speech, application.
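The front-end steps described above can be sketched as a toy Python program. This is a minimal sketch, not how any real product works, and it rests on several assumptions: a hypothetical two-entry abbreviation table, a hypothetical seven-word pronunciation lexicon written in ARPAbet-style symbols, a digit-by-digit rule for numbers, and the simplification that punctuation alone marks prosodic boundaries (comma = minor phrase break, period = end of sentence, question mark = question intonation). Real front-ends use large dictionaries, letter-to-sound rules, and syntactic analysis instead.

```python
import re
from dataclasses import dataclass

# Hypothetical abbreviation table for text normalization.
ABBREVIATIONS = {"btw": "by the way", "tts": "text to speech"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# Hypothetical toy pronunciation lexicon (ARPAbet-style phoneme symbols).
LEXICON = {
    "i": ["AY"], "have": ["HH", "AE", "V"], "cat": ["K", "AE", "T"],
    "one": ["W", "AH", "N"], "by": ["B", "AY"], "the": ["DH", "AH"],
    "way": ["W", "EY"],
}

# Assumption: punctuation decides the kind of prosodic boundary closing a unit.
BOUNDARY = {",": "minor", ".": "final", "?": "question", "": "final"}


@dataclass
class ProsodicUnit:
    words: list      # normalized, readable words
    phonemes: list   # one phoneme list per word
    boundary: str    # "minor", "final" or "question"


def normalize(token):
    """Text normalization: expand abbreviations and spell out digits."""
    low = token.lower()
    if low in ABBREVIATIONS:
        return ABBREVIATIONS[low].split()
    if low.isdecimal():
        # Toy rule: read digit by digit ("42" -> "four two").
        return [DIGITS[int(d)] for d in low]
    return [low]


def grapheme_to_phoneme(word):
    """Dictionary lookup; unknown words fall back to their letters as placeholders."""
    return LEXICON.get(word, [ch.upper() for ch in word])


def front_end(text):
    """Build the symbolic linguistic representation: a list of prosodic units."""
    units = []
    # Each chunk runs up to the next ',', '.', '?' (or the end of the text).
    for chunk, punct in re.findall(r"([^,.?]+)([,.?]?)", text):
        words = [w for tok in chunk.split() for w in normalize(tok)]
        if not words:
            continue  # skip whitespace-only chunks
        phonemes = [grapheme_to_phoneme(w) for w in words]
        units.append(ProsodicUnit(words, phonemes, BOUNDARY[punct]))
    return units


if __name__ == "__main__":
    for unit in front_end("I have 1 cat, btw."):
        print(unit)
```

For "I have 1 cat, btw." the sketch yields two units: the phrase "i have one cat" closed by a minor boundary, and the phrase "by the way" closed by a final boundary, each word paired with its phonemes. A back-end would take this list as its input.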
Synthesizer Technology
The most important qualities of a speech synthesizer are how natural and how intelligible its output is. Naturalness describes how close the synthesized sound is to a human voice, while intelligibility describes how easily listeners can understand it. Every speech synthesis application tries to make its output both natural and intelligible. Many technologies now exist for generating synthetic speech waveforms; the two most widely used are concatenative synthesis and formant synthesis, and each has its own advantages and disadvantages. The first, concatenative synthesis, is based on joining together segments of recorded speech. It generally produces the most natural-sounding output. However, differences between the natural variation in speech and the automated techniques used to segment the waveforms often cause audible glitches. You hear this kind of sound in a bank's queue-number announcements, or in a mobile operator's automated "call center" message stating your remaining credit and how long your number stays active. The second, formant synthesis, does not use samples of human speech; instead, it generates sound from an acoustic model. Parameters such as fundamental frequency, voicing, and noise level are varied over time to create an artificial speech waveform. Most applications based on this technology produce artificial, robotic-sounding speech rather than natural speech. Given the limitations of both technologies, we will have to be patient and wait for further developments over the next several years or decades.
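The two back-end technologies above can be contrasted in a small pure-Python sketch. It is illustrative only and assumes a 16 kHz sample rate, a linear crossfade as the way to soften concatenation joins, a Klatt-style second-order resonator per formant, and rough formant values for an "ah" vowel; the names `concatenate`, `resonator`, and `formant_synth` are hypothetical. A real formant synthesizer updates its parameters every few milliseconds, whereas this sketch keeps fundamental frequency, voicing, and noise level fixed for one sound.

```python
import math
import random

SAMPLE_RATE = 16000  # samples per second (assumed)

# Rough formant frequencies and bandwidths in Hz for an "ah" vowel (assumed values).
VOWEL_AH = [(730, 90), (1090, 110), (2440, 170)]


def concatenate(segments, fade=80):
    """Concatenative synthesis: join recorded segments end to end, with a
    linear crossfade of `fade` samples to soften each join."""
    out = list(segments[0])
    for seg in segments[1:]:
        n = min(fade, len(out), len(seg))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the new segment rises toward 1
            out[start + i] = out[start + i] * (1 - w) + seg[i] * w
        out.extend(seg[n:])
    return out


def resonator(signal, freq, bandwidth, rate=SAMPLE_RATE):
    """Second-order digital resonator (Klatt-style) that models one formant."""
    t = 1.0 / rate
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c  # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def formant_synth(f0, formants, duration=0.3, noise_level=0.0,
                  rate=SAMPLE_RATE, seed=0):
    """Formant synthesis: no recordings, only an acoustic model. A source
    (pulses at the fundamental frequency f0, mixed with noise) is passed
    through a cascade of formant resonators. f0 = 0 gives an unvoiced source."""
    rng = random.Random(seed)
    n = round(duration * rate)
    period = int(rate / f0) if f0 > 0 else 0
    source = []
    for i in range(n):
        pulse = 1.0 if period and i % period == 0 else 0.0
        noise = rng.uniform(-1.0, 1.0)
        source.append((1 - noise_level) * pulse + noise_level * noise)
    for freq, bw in formants:
        source = resonator(source, freq, bw, rate)
    return source


if __name__ == "__main__":
    # Two synthetic vowels stand in for recorded units in this demo.
    low = formant_synth(100, VOWEL_AH, duration=0.2)
    high = formant_synth(140, VOWEL_AH, duration=0.2)
    speech = concatenate([low, high])
    print(len(speech), "samples")
```

The crossfade only hides the join between segments; it cannot fix a mismatch in pitch or timbre, which is one reason concatenated speech can still sound uneven. To listen to the output, the samples could be scaled to 16-bit integers and written with Python's standard `wave` module.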