PyAudio has two modes: blocking, where data has to be read from the stream; and non-blocking, where a callback function is passed to PyAudio, which calls it with each buffer of audio data.
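For contrast with the callback approach, here is a minimal sketch of the blocking mode. The helper names `read_blocking` and `record_one_second` are made up for this note; only `PyAudio.open()` and `Stream.read()` are PyAudio calls, and the 16 kHz mono 16-bit settings are assumed to match what DeepSpeech expects.

```python
def read_blocking(stream, num_chunks, frames_per_buffer=1024):
    """Pull num_chunks buffers from an open input stream, one blocking read at a time."""
    chunks = []
    for _ in range(num_chunks):
        # Stream.read() does not return until frames_per_buffer frames are available.
        chunks.append(stream.read(frames_per_buffer))
    return b"".join(chunks)


def record_one_second():
    """Hypothetical demo: needs PyAudio and a working input device."""
    import pyaudio  # imported here so read_blocking() is usable without audio hardware

    audio = pyaudio.PyAudio()
    # No stream_callback argument, so the stream is opened in blocking mode.
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=16000,
                        input=True, frames_per_buffer=1024)
    try:
        # 16 buffers of 1024 frames is roughly one second at 16 kHz.
        return read_blocking(stream, num_chunks=16)
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
```

In blocking mode the main loop owns the timing; in non-blocking mode PyAudio calls the callback from its own thread, which is why the script below only sleeps in its main loop.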
Using the DeepSpeech streaming API
To use the audio features, pyaudio needs to be installed.
$ sudo apt-get install portaudio19-dev
$ pip3 install pyaudio
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import time

import deepspeech
import numpy as np
import pyaudio

model_file_path = 'deepspeech-0.9.3-models.pbmm'
model = deepspeech.Model(model_file_path)
# Since DeepSpeech 0.7, createStream() returns a Stream object and the
# streaming calls are methods on that object, not on the model.
context = model.createStream()
text_so_far = ''

def process_audio(in_data, frame_count, time_info, status):
    # PyAudio calls this for every captured buffer.
    global text_so_far
    data16 = np.frombuffer(in_data, dtype=np.int16)
    context.feedAudioContent(data16)
    text = context.intermediateDecode()
    if text != text_so_far:
        print('Interim text = {}'.format(text))
        text_so_far = text
    return (in_data, pyaudio.paContinue)

audio = pyaudio.PyAudio()
stream = audio.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    frames_per_buffer=1024,
    stream_callback=process_audio
)
print('Please start speaking, when done press Ctrl-C ...')
stream.start_stream()
try:
    # The callback does the work; the main thread just waits.
    while stream.is_active():
        time.sleep(0.1)
except KeyboardInterrupt:
    # PyAudio cleanup
    stream.stop_stream()
    stream.close()
    audio.terminate()
    print('Finished recording.')
    # DeepSpeech: finish the stream and get the final transcript
    text = context.finishStream()
    print('Final text = {}'.format(text))
$ python3 transcribe.py
Traceback (most recent call last):
  File "transcribe.py", line 28, in <module>
    stream = audio.open(
  File "/home/vagrant/deepspeech-venv/lib/python3.8/site-packages/pyaudio.py", line 750, in open
    stream = Stream(self, *args, **kwargs)
  File "/home/vagrant/deepspeech-venv/lib/python3.8/site-packages/pyaudio.py", line 441, in __init__
    self._stream = pa.open(**arguments)
OSError: [Errno -9996] Invalid input device (no default output device)
Looks like this can't be tested on Vagrant...
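One possible workaround, which I have not run here, is to feed a WAV file into the same streaming API in microphone-sized chunks. This is only a sketch: `wav_chunks` and `transcribe_file` are names made up for this note, and the WAV path is whatever 16 kHz, 16-bit, mono recording happens to be at hand.

```python
import wave


def wav_chunks(path, frames_per_buffer=1024):
    """Yield raw PCM chunks from a WAV file, like the buffers a microphone callback gets."""
    with wave.open(path, 'rb') as wav:
        # Assumption: the model wants 16 kHz, 16-bit, mono audio, as in the script above.
        fmt = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
        if fmt != (16000, 2, 1):
            raise ValueError('expected 16 kHz, 16-bit, mono WAV, got {}'.format(fmt))
        while True:
            data = wav.readframes(frames_per_buffer)
            if not data:
                break
            yield data


def transcribe_file(model_path, wav_path):
    """Hypothetical helper: stream a WAV file through DeepSpeech chunk by chunk."""
    # Imported lazily so wav_chunks() works without DeepSpeech or NumPy installed.
    import deepspeech
    import numpy as np

    model = deepspeech.Model(model_path)
    context = model.createStream()
    for data in wav_chunks(wav_path):
        context.feedAudioContent(np.frombuffer(data, dtype=np.int16))
    return context.finishStream()
```

Something like `transcribe_file('deepspeech-0.9.3-models.pbmm', 'audio.wav')` would then exercise the streaming calls without any input device, though it says nothing about how live capture behaves.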
>>> import pyaudio
>>> pa = pyaudio.PyAudio()
>>> pa.get_default_input_device_info()
OSError: No Default Input Device Available
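To check whether any capture device is visible at all, not just a default one, the device list can be walked with `get_device_count()` and `get_device_info_by_index()`. A small sketch; `input_devices` and `show_input_devices` are names made up for this note.

```python
def input_devices(pa):
    """Return (index, name) for every device that reports at least one input channel."""
    found = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if info.get('maxInputChannels', 0) > 0:
            found.append((info['index'], info['name']))
    return found


def show_input_devices():
    """Hypothetical demo: needs PyAudio installed."""
    import pyaudio  # imported here so input_devices() can be tried with a stand-in object

    pa = pyaudio.PyAudio()
    try:
        devices = input_devices(pa)
        if not devices:
            print('No input devices found.')
        for index, name in devices:
            print('{}: {}'.format(index, name))
    finally:
        pa.terminate()
```

An empty list would be consistent with the error above: the VM simply exposes no capture hardware to PortAudio.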
So I'll have to set up a Raspberry Pi environment after all...
At least I've learned that DeepSpeech is quite usable.