[Speech Recognition] Trying Out DeepSpeech – Software engineer tech blog

What is DeepSpeech?
– DeepSpeech is an open-source speech-to-text engine. It uses a model trained with machine learning techniques based on Baidu’s Deep Speech research paper, and it is implemented with TensorFlow.

DeepSpeech documentation: deepspeech.readthedocs.io

# create and activate a virtualenv
$ sudo apt install python3-virtualenv
$ virtualenv -p python3 deepspeech-venv
$ source deepspeech-venv/bin/activate

# install DeepSpeech
$ pip3 install deepspeech

# download pre-trained English model
$ curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
$ curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer

# download example audio files
$ curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/audio-0.9.3.tar.gz
$ tar xvf audio-0.9.3.tar.gz

$ deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
Loading model from file deepspeech-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2021-08-24 22:27:18.338821: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.0447s.
Loading scorer from files deepspeech-0.9.3-models.scorer
Loaded scorer in 0.00898s.
Running inference.
experience proves this
Inference took 2.371s for 1.975s audio file.
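
The same inference can also be run from Python instead of the command-line client. The following is a minimal sketch, assuming the deepspeech 0.9.3 package (and the numpy it depends on) is installed in the virtualenv. The helper names read_wav_int16 and transcribe are made up for illustration; Model, enableExternalScorer, sampleRate and stt are the calls from the DeepSpeech Python API. The sketch rejects anything that is not 16-bit mono PCM at the model's sample rate rather than resampling it.

```python
import wave


def read_wav_int16(path):
    """Return (sample_rate, raw_pcm_bytes) for a 16-bit mono PCM WAV file.

    Raises ValueError for any other layout, because the DeepSpeech Python
    API expects a 16-bit, mono signal.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM: " + path)
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(model_path, scorer_path, wav_path):
    """Hypothetical helper: transcribe one WAV file with DeepSpeech."""
    # Imported here so read_wav_int16 stays usable without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)

    rate, frames = read_wav_int16(wav_path)
    if rate != model.sampleRate():
        # This sketch does not resample; convert the file beforehand.
        raise ValueError("sample rate %d does not match the model" % rate)

    # stt() takes the samples as a numpy int16 array.
    audio = np.frombuffer(frames, dtype=np.int16)
    return model.stt(audio)
```

Calling transcribe("deepspeech-0.9.3-models.pbmm", "deepspeech-0.9.3-models.scorer", "audio/2830-3980-0043.wav") should return a transcript comparable to the one printed by the command-line client above.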

I see, it resembles Julius in some respects.
Next, I would like to build real-time speech recognition that takes input from a microphone, without creating a .wav file first.
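
For recognition without an intermediate .wav file, the DeepSpeech Python API has a streaming interface: createStream on the model, then feedAudioContent, intermediateDecode and finishStream on the stream. Below is a rough sketch of how audio chunks could be fed to it, not a finished microphone client. The helper names pcm_chunks and stream_transcribe are hypothetical, the 20 ms chunk size is an arbitrary choice, and the chunks are assumed to be raw 16-bit mono PCM at the model's sample rate. Capturing them from a microphone would need an audio library such as PyAudio, which is not covered here.

```python
def pcm_chunks(raw, chunk_ms=20, rate=16000):
    """Split raw 16-bit mono PCM bytes into chunks of chunk_ms milliseconds."""
    step = rate * chunk_ms // 1000 * 2  # 2 bytes per 16-bit sample
    for start in range(0, len(raw), step):
        yield raw[start:start + step]


def stream_transcribe(model, chunks):
    """Feed PCM chunks to a DeepSpeech stream and yield transcripts.

    `model` is a deepspeech.Model; `chunks` is any iterable of bytes
    objects (for example, buffers read from a microphone).
    """
    # Imported here so pcm_chunks stays usable without numpy.
    import numpy as np

    stream = model.createStream()
    for chunk in chunks:
        stream.feedAudioContent(np.frombuffer(chunk, dtype=np.int16))
        # Partial result so far; decoding after every chunk is costly,
        # so a real client would probably do this less often.
        yield stream.intermediateDecode()
    # Final transcript; the stream cannot be used after this call.
    yield stream.finishStream()
```

Because stream_transcribe only needs an iterable of byte buffers, it can be tried first with pcm_chunks over the frames of one of the sample files, and the iterable can later be swapped for a generator that reads from the microphone.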