[Speech Recognition] Implementing a Transcriber with DeepSpeech

PyAudio has two modes: blocking, where the program itself has to read data from the stream; and non-blocking, where a callback function is passed to PyAudio, which calls it with each buffer of audio data.
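For contrast with the callback approach used in this post, here is a minimal sketch of blocking mode. The helper names `read_blocking` and `record_one_second` are made up for illustration, and the second function assumes pyaudio is installed and an input device exists.

```python
def read_blocking(stream, num_chunks, frames_per_buffer=1024):
    """Pull num_chunks buffers from an already-open input stream (blocking mode)."""
    chunks = []
    for _ in range(num_chunks):
        # read() blocks until frames_per_buffer frames are available.
        chunks.append(stream.read(frames_per_buffer, exception_on_overflow=False))
    return b"".join(chunks)


def record_one_second():
    """Hypothetical usage; needs pyaudio and a working input device."""
    import pyaudio  # imported here so read_blocking() can be used without it

    audio = pyaudio.PyAudio()
    # No stream_callback argument, so the stream is opened in blocking mode.
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=16000,
                        input=True, frames_per_buffer=1024)
    try:
        # 16 chunks * 1024 frames is roughly one second at 16 kHz.
        return read_blocking(stream, num_chunks=16)
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
```

In blocking mode the main loop drives the reads; in non-blocking mode PyAudio calls the callback from its own thread, which is why the main script only has to sleep.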
This example uses the DeepSpeech streaming API.
To use the audio features, pyaudio has to be installed.

$ sudo apt-get install portaudio19-dev
$ pip3 install pyaudio

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import time

import deepspeech
import numpy as np
import pyaudio

model_file_path = 'deepspeech-0.9.3-models.pbmm'
model = deepspeech.Model(model_file_path)

context = model.createStream()

text_so_far = ''

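# Called by PyAudio for every captured buffer (non-blocking mode).
def process_audio(in_data, frame_count, time_info, status):
	global text_so_far
	# Interpret the raw bytes as 16-bit PCM samples.
	data16 = np.frombuffer(in_data, dtype=np.int16)
	# From DeepSpeech 0.7 on, the streaming methods live on the stream object.
	context.feedAudioContent(data16)
	text = context.intermediateDecode()
	# Print the interim transcript only when it has changed.
	if text != text_so_far:
		print('Interim text = {}'.format(text))
		text_so_far = text
	return (in_data, pyaudio.paContinue)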

audio = pyaudio.PyAudio()
stream = audio.open(
	format=pyaudio.paInt16,
	channels=1,
	rate=16000,
	input=True,
	frames_per_buffer=1024,
	stream_callback=process_audio
)
print('Please start speaking, when done press Ctrl-C ...')
stream.start_stream()

try:
	while stream.is_active():
		time.sleep(0.1)
except KeyboardInterrupt:
	stream.stop_stream()
	stream.close()
	audio.terminate()
	print('Finished recording.')

	# Finalize the stream and get the full transcript.
	text = context.finishStream()
	print('Final text = {}'.format(text))

$ python3 transcribe.py
Traceback (most recent call last):
  File "transcribe.py", line 28, in <module>
    stream = audio.open(
  File "/home/vagrant/deepspeech-venv/lib/python3.8/site-packages/pyaudio.py", line 750, in open
    stream = Stream(self, *args, **kwargs)
  File "/home/vagrant/deepspeech-venv/lib/python3.8/site-packages/pyaudio.py", line 441, in __init__
    self._stream = pa.open(**arguments)
OSError: [Errno -9996] Invalid input device (no default output device)

Looks like this can't be tested on Vagrant: the VM has no audio input device...
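The microphone path needs real hardware, but the streaming calls themselves could probably still be exercised on the VM by feeding chunks from a WAV file instead. A rough sketch, untested here: `iter_wav_chunks` and `transcribe_wav` are hypothetical helper names, the WAV file is assumed to be 16 kHz, mono, 16-bit, and the stream methods assume the DeepSpeech 0.7+ Python API.

```python
import wave


def iter_wav_chunks(wav_path, frames_per_chunk=1024):
    """Yield raw PCM byte chunks from a WAV file, like microphone buffers."""
    with wave.open(wav_path, "rb") as wav:
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data


def transcribe_wav(model_path, wav_path):
    """Feed a 16 kHz, mono, 16-bit WAV file through the streaming API."""
    # Imported here so iter_wav_chunks() works without these packages.
    import deepspeech
    import numpy as np

    model = deepspeech.Model(model_path)
    stream = model.createStream()
    for chunk in iter_wav_chunks(wav_path):
        # Same conversion as in the microphone callback: bytes -> int16 samples.
        stream.feedAudioContent(np.frombuffer(chunk, dtype=np.int16))
    return stream.finishStream()
```

Calling `transcribe_wav('deepspeech-0.9.3-models.pbmm', wav_path)` with a suitable recording would then return the final text. This only checks the model side; the PyAudio callback still needs a machine with a microphone.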

>>> import pyaudio
>>> pa = pyaudio.PyAudio()
>>> pa.get_default_input_device_info()
OSError: No Default Input Device Available
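To see whether a machine has any capture device at all before opening a stream, the device list can be enumerated. A small sketch: `find_input_devices` is a hypothetical helper, while `get_device_count()`, `get_device_info_by_index()` and the `maxInputChannels` key come from PyAudio.

```python
def find_input_devices(pa):
    """Return (index, name) for every device that can capture audio."""
    devices = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        # Devices with no input channels are output-only.
        if info.get("maxInputChannels", 0) > 0:
            devices.append((i, info.get("name")))
    return devices

# Usage (requires pyaudio):
#   import pyaudio
#   pa = pyaudio.PyAudio()
#   print(find_input_devices(pa) or "no input devices found")
#   pa.terminate()
```

On a VM like the one above this would presumably return an empty list, which matches the "No Default Input Device Available" error.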

So I'll have to set up a Raspberry Pi environment after all...
At least I found out that DeepSpeech is quite usable.