Thursday, August 16, 2018

Making Raspberry Pi Speak: Japanese TTS with Open JTalk and Node-RED

After much trial and error getting Alexa to control the Roomba, I can now control various home appliances by voice.

Convenient as that is, the Alexa Home Skill always replies with a flat "OK", which is not very expressive.
So I made the Raspberry Pi speak a more natural phrase whenever it controls an appliance.

Setting Up Audio Output on Raspberry Pi

Raspberry Pi has no built-in speaker, so audio output needs to be configured.

Since the home appliances are nearby, I used HDMI output.
For how to enable HDMI audio, see the previous article: "Turning Raspberry Pi into an AirPlay Receiver".

If HDMI isn't a requirement, a USB speaker is generally the better choice: the audio is cleaner and playback is more reliable.
This site has detailed audio setup instructions.
Note: sending synthesized speech through HDMI introduces two annoyances:
  • The digital conversion doesn't start fast enough, so the first two or three syllables of each utterance get cut off.
  • If the target appliance and the Pi share the same HDMI output device, things get complicated.
For the first issue, the workaround is to add a meaningless filler word at the start of each synthesized phrase (like "OK" or "Um") so the real content starts cleanly.
For the second, the flow becomes: switch the target device's HDMI input to Raspberry Pi, play the speech, then send the appliance command.

Most devices don't auto-switch HDMI input on audio output alone,
so you'd need to copy the AV amp's IR remote codes and send the switch command manually.
The method from the Roomba series applies here.
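The filler-word workaround for the first HDMI issue can be sketched as a tiny helper. This is only an illustration: the function name with_filler and the FILLER variable are made up for this example, and the default filler is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: prepend a throwaway filler word so that the part of the
# utterance lost while HDMI audio wakes up is the filler, not the real content.
# FILLER is an assumed, overridable default; any short word should do.
FILLER="${FILLER:-えーと、}"

with_filler() {
    # $1: the phrase that must be heard in full
    printf '%s%s\n' "$FILLER" "$1"
}

# Usage with the voice.sh wrapper described later in this article:
#   ./voice.sh "$(with_filler "プロジェクターをつけます")"
```

How long the filler needs to be probably depends on the HDMI device, so it is worth trying a few lengths.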

Installing Open JTalk

Install Open JTalk, a Japanese text-to-speech (TTS) engine:
$ sudo apt-get install open-jtalk
$ sudo apt-get install open-jtalk-mecab-naist-jdic hts-voice-nitech-jp-atr503-m001
The package hts-voice-nitech-jp-atr503-m001 provides a male voice dictionary.
For a female voice (more common for TTS), download the mei voice separately:
$ wget http://downloads.sourceforge.net/project/mmdagent/MMDAgent_Example/MMDAgent_Example-1.7/MMDAgent_Example-1.7.zip
$ unzip MMDAgent_Example-1.7.zip
$ sudo cp -r ./MMDAgent_Example-1.7/Voice/mei /usr/share/hts-voice/
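To check that the voice files ended up where expected, something like the following can help. It is a minimal sketch: list_voices is a name invented here, and the default directory is simply the copy destination used above.

```shell
#!/bin/sh
# List every .htsvoice file below a directory, sorted by path.
# The default directory is the copy destination used in this article.
list_voices() {
    dir="${1:-/usr/share/hts-voice}"
    find "$dir" -name '*.htsvoice' | sort
}

# Usage:
#   list_voices
#   list_voices /usr/share/hts-voice/mei
```

If the mei directory was copied correctly, mei_normal.htsvoice should appear in the list.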

Making Open JTalk Speak

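You can call open_jtalk directly, giving the voice file and dictionary as options and feeding it the text on standard input,
but it's handy to wrap it in a script like this:
#!/bin/sh
# Usage: voice.sh "text to speak"

msg="$1"

# HTS voice file (mei, normal style)
voice_type=/usr/share/hts-voice/mei/mei_normal.htsvoice

# Synthesize a WAV to stdout and pipe it straight into aplay
echo "$msg" |
open_jtalk -m "$voice_type" -x /var/lib/mecab/dic/open-jtalk/naist-jdic -ow /dev/stdout |
aplay -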
Pass Japanese text as an argument to this script and it plays it back:
$ ./voice.sh "抵抗は無意味だ"
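If you want to switch between several voice files, the path lookup can be pulled into a function. The sketch below assumes the files follow the mei_<style>.htsvoice naming of mei_normal.htsvoice. Style names other than "normal" are assumptions, and voice_path and VOICE_DIR are names made up for this example.

```shell
#!/bin/sh
# Sketch: resolve a short style name to a .htsvoice path, falling back to the
# normal voice when the requested file does not exist.
VOICE_DIR="${VOICE_DIR:-/usr/share/hts-voice/mei}"

voice_path() {
    # $1: style name such as "normal" (any other name is an assumption)
    candidate="$VOICE_DIR/mei_${1:-normal}.htsvoice"
    if [ -f "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        printf '%s\n' "$VOICE_DIR/mei_normal.htsvoice"
    fi
}

# Usage inside voice.sh:
#   voice_type=$(voice_path "$2")
```

Falling back to the normal voice means a mistyped style name still produces speech instead of an error.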

Usage Notes

Open JTalk has many synthesis options to tweak,
but in practice, adjusting them tends to degrade audio quality.
Unless you know what you're doing, sticking with the defaults is recommended.

Add the following to crontab to announce the time every hour from 7:00 to 22:00:
(If the system timezone is UTC, offset accordingly.)
$ crontab -e
0 7-22 * * * /home/pi/bin/tone.sh
The called script can be something simple like this:
#!/bin/sh

DIR=`/usr/bin/dirname $0`

y=`/bin/date +%Y`
m=`/bin/date +%-m`
d=`/bin/date +%-d`
# %-H is the hour (0-23) without zero padding; %h would print the month abbreviation
h=`/bin/date +%-H`

msg="${y}年 ${m}月 ${d}日 ${h}時です。"

$DIR/voice.sh "$msg"
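A variant of the time script that is easier to check without a speaker might look like this. It is a sketch, not a replacement: time_message and current_time_message are names invented here, and the %-m, %-d and %-H formats rely on GNU date, which is what Raspbian ships.

```shell
#!/bin/sh
# Build the announcement from explicit fields so the wording can be checked
# without producing any audio.
time_message() {
    # $1 year, $2 month, $3 day, $4 hour (24-hour clock, no zero padding)
    printf '%s年 %s月 %s日 %s時です。\n' "$1" "$2" "$3" "$4"
}

# Read the clock once so the four fields cannot straddle a date change.
# The "-" flag drops the leading zero (GNU date extension).
current_time_message() {
    set -- $(date '+%Y %-m %-d %-H')
    time_message "$@"
}

# Usage:
#   ./voice.sh "$(current_time_message)"
```

Calling date once instead of four times also avoids the rare case where the hour changes between calls.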

Integrating with Alexa

For how to connect Alexa Home Skill with Raspberry Pi, see the earlier explanation.
In Node-RED, insert a Shell Exec node calling the voice synthesis script before the appliance control command.

Here's an example flow. This lets Raspberry Pi say "Please wait a moment" while the projector is warming up.
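The "speak first, then send the command" order can also be wrapped in a single script for the exec node to call. The following is only a sketch under assumptions: speak_then_run and VOICE_CMD are names made up here, the voice.sh location is inferred from the crontab entry above, and send_ir.sh stands for whatever script sends your appliance command.

```shell
#!/bin/sh
# Sketch: speak a phrase, then run the appliance command.
# VOICE_CMD defaults to the voice.sh wrapper from this article (path assumed).
VOICE_CMD="${VOICE_CMD:-/home/pi/bin/voice.sh}"

speak_then_run() {
    # $1: phrase to speak; remaining arguments: the command and its arguments
    phrase="$1"
    shift
    # A speech failure should not stop the appliance from being controlled.
    "$VOICE_CMD" "$phrase" || echo "speech failed, continuing" >&2
    "$@"
}

# Example (send_ir.sh is a hypothetical IR-sending script):
#   speak_then_run "少々お待ちください" /home/pi/bin/send_ir.sh projector_on
```

Because the speech finishes before the command runs, this ordering also fits the HDMI case where the input has to be switched to the Pi first.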

