[Python3 Speech Recognition] Trying out Julius - Software engineer tech blog

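### What is Julius
- An open-source speech recognition engine developed at Kyoto University and Nagoya Institute of Technology
- Written in C
- You can define your own dictionary model
- It does not work on its own; it runs by loading models that do the actual language recognition
- Julius GitHub: https://github.com/julius-speech/julius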

### julius install
$ sudo apt-get install build-essential zlib1g-dev libsdl2-dev libasound2-dev
$ git clone https://github.com/julius-speech/julius.git
$ cd julius
$ ./configure --enable-words-int
$ make -j4
$ ls -l julius/julius
-rwxrwxr-x 1 vagrant vagrant 700208 Jul 31 16:39 julius/julius

### julius model
https://sourceforge.net/projects/juliusmodels/
Download "ENVR-v5.4.Dnn.Bin.zip".
$ unzip ENVR-v5.4.Dnn.Bin.zip
$ cd ENVR-v5.4.Dnn.Bin
$ ls
ENVR-v5.3.am ENVR-v5.3.layer5_weight.npy ENVR-v5.3.prior
ENVR-v5.3.dct ENVR-v5.3.layer6_bias.npy README.md
ENVR-v5.3.layer2_bias.npy ENVR-v5.3.layer6_weight.npy dnn.jconf
ENVR-v5.3.layer2_weight.npy ENVR-v5.3.layerout_bias.npy julius-dnn-output.txt
ENVR-v5.3.layer3_bias.npy ENVR-v5.3.layerout_weight.npy julius-dnn.exe
ENVR-v5.3.layer3_weight.npy ENVR-v5.3.lm julius.jconf
ENVR-v5.3.layer4_bias.npy ENVR-v5.3.mfc mozilla.wav
ENVR-v5.3.layer4_weight.npy ENVR-v5.3.norm test.dbl
ENVR-v5.3.layer5_bias.npy ENVR-v5.3.phn wav_config

$ sudo vi dnn.jconf

feature_options -htkconf wav_config -cvn -cmnload ENVR-v5.3.norm -cvnstatic
// (omitted)
state_prior_log10nize false // added
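
The same edit can be scripted instead of done by hand in vi. This is only a minimal sketch: `ensure_option` is a made-up helper (not part of Julius), it assumes dnn.jconf uses simple `name value` lines as shown above, and it only checks the first token of each line.

```python
from pathlib import Path


def ensure_option(jconf_path, name, value):
    """Append `name value` to a config file unless `name` is already set.

    Hypothetical helper for illustration. Returns True if the file changed.
    """
    path = Path(jconf_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] == name:
            return False  # option already present, leave the file untouched
    lines.append(f"{name} {value}")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    # Only touch the file if it exists in the current directory.
    if Path("dnn.jconf").exists():
        changed = ensure_option("dnn.jconf", "state_prior_log10nize", "false")
        print("updated" if changed else "already set")
```

Running it twice is harmless, since the second call finds the option and does nothing.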

### Recognize Audio File
$ ../julius/julius/julius -C julius.jconf -dnnconf dnn.jconf
-------------
pass1_best: the shower
pass1_best_wordseq: the shower
pass1_best_phonemeseq: sil | dh iy | sh aw ax r
pass1_best_score: 130.578949
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 21497 generated, 4114 pushed, 332 nodes popped in 109
ALIGN: === word alignment begin ===
sentence1: the shower
wseq1: the shower
phseq1: sil | dh iy | sh aw ax r | sil
cmscore1: 0.552 0.698 0.453 1.000
score1: 88.964218
=== begin forced alignment ===
-- word alignment --
id: from to n_score unit
----------------------------------------
[ 0 11] 0.558529 []
[ 12 30] 2.023606 the [the]
[ 31 106] 2.249496 shower [shower]
[ 107 108] -2.753532 []
re-computed AM score: 207.983871
=== end forced alignment ===

------
### read waveform input
1 files processed
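
Since this post is nominally about Python3, here is a rough sketch of pulling the useful fields out of that text output. It assumes the line layout is exactly what was pasted above (`sentence1:`, `cmscore1:`, `score1:`, and `[ from to] score unit` alignment rows). `parse_julius_output` and `SAMPLE` are names invented for this example, not anything Julius provides.

```python
import re

# A few lines copied from the output above, used as sample input.
SAMPLE = """\
pass1_best: the shower
sentence1: the shower
wseq1: the shower
phseq1: sil | dh iy | sh aw ax r | sil
cmscore1: 0.552 0.698 0.453 1.000
score1: 88.964218
[ 0 11] 0.558529 []
[ 12 30] 2.023606 the [the]
[ 31 106] 2.249496 shower [shower]
[ 107 108] -2.753532 []
"""

# Matches alignment rows such as "[ 12 30] 2.023606 the [the]".
ALIGN_ROW = re.compile(r"^\[\s*(\d+)\s+(\d+)\]\s+(-?\d+\.\d+)\s*(.*)$")


def parse_julius_output(text):
    """Collect the best sentence, confidence scores, score and alignment rows."""
    result = {"sentence": None, "cmscore": [], "score": None, "alignment": []}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("sentence1:"):
            result["sentence"] = line.split(":", 1)[1].strip()
        elif line.startswith("cmscore1:"):
            result["cmscore"] = [float(x) for x in line.split(":", 1)[1].split()]
        elif line.startswith("score1:"):
            result["score"] = float(line.split(":", 1)[1])
        else:
            m = ALIGN_ROW.match(line)
            if m:
                start, end, score, unit = m.groups()
                result["alignment"].append(
                    (int(start), int(end), float(score), unit.strip())
                )
    return result


if __name__ == "__main__":
    parsed = parse_julius_output(SAMPLE)
    print(parsed["sentence"], parsed["score"])
```

The alignment tuples hold the start frame, end frame, normalized score and the unit text as printed; converting frames to seconds would need the frame shift from the model configuration, which is not shown here.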

I can tell it is doing speech recognition, but it is quite different from Google-cloud-speech.

$ sudo vi mic.jconf

-input mic -htkconf wav_config -h ENVR-v5.3.am -hlist ENVR-v5.3.phn -d ENVR-v5.3.lm -v ENVR-v5.3.dct -b 4000 -lmp 12 -6 -lmp2 12 -6 -fallback1pass -multipath -iwsp -iwcd1 max -spmodel sp -no_ccd -sepnum 150 -b2 360 -n 40 -s 2000 -m 8000 -lookuprange 5 -sb 80 -forcedict

$ ../julius/julius/julius -C mic.jconf -dnnconf dnn.jconf
Notice for feature extraction (01),
*************************************************************
* Cepstral mean and variance norm. for real-time decoding: *
* initial mean loaded from file, updating per utterance. *
* static variance loaded from file, apply it constantly. *
* NOTICE: The first input may not be recognized, since *
* cepstral mean is unstable on startup. *
*************************************************************
Notice for energy computation (01),
*************************************************************
* NOTICE: Energy normalization is activated on live input: *
* maximum energy of LAST INPUT will be used for it. *
* So, the first input will not be recognized. *
*************************************************************

------
### read waveform input
Stat: adin_oss: device name = /dev/dsp (application default)
Error: adin_oss: failed to open /dev/dsp
failed to begin input stream
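
The log says Julius tried to open the OSS device `/dev/dsp` and failed. A quick way to see what the machine actually has is to check whether the device nodes exist. This is just a sketch: `audio_device_report` is a made-up helper, and `/dev/snd` is included on the assumption that this is a typical Linux box where ALSA device nodes live there.

```python
import os


def audio_device_report(paths=("/dev/dsp", "/dev/snd")):
    """Return a dict mapping each path to whether it exists on this machine."""
    return {p: os.path.exists(p) for p in paths}


if __name__ == "__main__":
    for path, present in audio_device_report().items():
        print(f"{path}: {'found' if present else 'missing'}")
```

If `/dev/dsp` is reported missing, the `failed to open /dev/dsp` error above is what you would expect. A Vagrant VM like the one used here may simply have no sound device attached.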

I still don't really get how to use it.
Does all the output really just come out on the command line?
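
One way to live with command-line output is to start Julius from Python and read its stdout line by line. The sketch below reuses the exact command from the file recognition step. The function names are invented for this example, and it assumes the recognized text always appears on a `sentence1:` line as in the output pasted earlier.

```python
import subprocess


def iter_sentences(lines):
    """Yield the text of every `sentence1:` line from an iterable of output lines."""
    for line in lines:
        line = line.strip()
        if line.startswith("sentence1:"):
            yield line.split(":", 1)[1].strip()


def run_julius(julius_bin="../julius/julius/julius",
               jconf="julius.jconf", dnnconf="dnn.jconf"):
    """Run Julius with the arguments used above and yield recognized sentences."""
    cmd = [julius_bin, "-C", jconf, "-dnnconf", dnnconf]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True) as proc:
        # proc.stdout is iterated lazily, so results arrive as Julius prints them.
        yield from iter_sentences(proc.stdout)


# Example (needs the Julius binary and the model files in place):
# for sentence in run_julius():
#     print(sentence)
```

Keeping the parsing in `iter_sentences` separate from the process handling means it can be tried on saved output without running Julius at all.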