PyAudio has two modes: blocking, where data has to be read from the stream; and non-blocking, where a callback function is passed to PyAudio to feed the audio data stream.
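To make the difference concrete, here is a minimal sketch of the blocking mode, where the program itself pulls each buffer by calling `read()`. The helper name `read_blocking` and the `FakeStream` stand-in are hypothetical and exist only so the snippet runs without audio hardware; with a real device the stream would come from `pyaudio.PyAudio().open(..., input=True)` opened without a `stream_callback`.

```python
def read_blocking(stream, frames_per_buffer, num_buffers):
    """Pull num_buffers chunks from a stream in blocking mode.

    `stream` only needs a read(num_frames) method, which is what a
    PyAudio input stream opened without stream_callback provides.
    """
    chunks = []
    for _ in range(num_buffers):
        # read() returns only once frames_per_buffer frames are available.
        chunks.append(stream.read(frames_per_buffer))
    return chunks


class FakeStream:
    """Hypothetical stand-in that returns silence (16-bit mono)."""

    def read(self, num_frames):
        return b"\x00\x00" * num_frames


chunks = read_blocking(FakeStream(), frames_per_buffer=1024, num_buffers=3)
print(len(chunks), len(chunks[0]))
```

In non-blocking mode the loop disappears: PyAudio calls the callback for each buffer on its own thread, which is the approach used in the script further down.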
Using the DeepSpeech streaming API
To use the audio features, pyaudio needs to be installed.
$ sudo apt-get install portaudio19-dev
$ pip3 install pyaudio
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import time

import deepspeech
import numpy as np
import pyaudio

model_file_path = 'deepspeech-0.9.3-models.pbmm'
model = deepspeech.Model(model_file_path)

# createStream() returns a Stream object that holds the decoding state.
context = model.createStream()
text_so_far = ''


def process_audio(in_data, frame_count, time_info, status):
    """PyAudio callback: feed each captured buffer to DeepSpeech."""
    global text_so_far
    data16 = np.frombuffer(in_data, dtype=np.int16)
    # In DeepSpeech 0.9.x these are methods of the stream, not of the model.
    context.feedAudioContent(data16)
    text = context.intermediateDecode()
    if text != text_so_far:
        print('Interim text = {}'.format(text))
        text_so_far = text
    return (in_data, pyaudio.paContinue)


audio = pyaudio.PyAudio()
stream = audio.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    frames_per_buffer=1024,
    stream_callback=process_audio
)

print('Please start speaking, when done press Ctrl-C ...')
stream.start_stream()

try:
    # The callback runs on PyAudio's own thread; keep the main thread alive.
    while stream.is_active():
        time.sleep(0.1)
except KeyboardInterrupt:
    stream.stop_stream()
    stream.close()
    audio.terminate()
    print('Finished recording.')
    text = context.finishStream()
    print('Final text = {}'.format(text))
$ python3 transcribe.py
Traceback (most recent call last):
  File "transcribe.py", line 28, in <module>
    stream = audio.open(
  File "/home/vagrant/deepspeech-venv/lib/python3.8/site-packages/pyaudio.py", line 750, in open
    stream = Stream(self, *args, **kwargs)
  File "/home/vagrant/deepspeech-venv/lib/python3.8/site-packages/pyaudio.py", line 441, in __init__
    self._stream = pa.open(**arguments)
OSError: [Errno -9996] Invalid input device (no default output device)
Looks like this can't be tested on Vagrant...
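Live microphone input needs a real capture device, but the streaming calls themselves could probably still be exercised by reading a WAV file in buffer-sized chunks and feeding those instead. The following is only a sketch of the chunking part, using the standard `wave` module; `wav_chunks` and `write_silence` are hypothetical helper names, and the silent demo file stands in for a real 16 kHz, 16-bit mono recording.

```python
import os
import tempfile
import wave


def wav_chunks(path, frames_per_buffer=1024):
    """Yield raw PCM byte chunks from a WAV file, one buffer at a time."""
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_buffer)
            if not data:
                break
            yield data


def write_silence(path, num_frames, rate=16000):
    """Write a 16-bit mono WAV of silence (demo input only)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * num_frames)


demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
write_silence(demo_path, num_frames=2500)
sizes = [len(chunk) for chunk in wav_chunks(demo_path)]
print(sizes)
```

Each chunk has the same layout as the `in_data` buffer that PyAudio hands to the callback, so it could be converted with `np.frombuffer(chunk, dtype=np.int16)` and passed to `feedAudioContent` in the same way.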
>>> import pyaudio
>>> pa = pyaudio.PyAudio()
>>> pa.get_default_input_device_info()
OSError: No Default Input Device Available
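To see what PortAudio actually detects, the devices can be enumerated and filtered by input channel count. This is a minimal sketch: `list_input_devices` is a hypothetical helper, and `FakePyAudio` is a stand-in that mimics a machine with no capture device so the snippet runs anywhere. It relies only on `get_device_count()` and `get_device_info_by_index()`, which a real `pyaudio.PyAudio()` instance provides.

```python
def list_input_devices(pa):
    """Return (index, name) for every device that can capture audio.

    `pa` is expected to behave like pyaudio.PyAudio(): it must offer
    get_device_count() and get_device_info_by_index(i).
    """
    devices = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        # Devices with no input channels cannot be used for recording.
        if info.get("maxInputChannels", 0) > 0:
            devices.append((i, info.get("name")))
    return devices


class FakePyAudio:
    """Hypothetical stand-in describing a machine with output only."""

    _devices = [{"name": "dummy output", "maxInputChannels": 0}]

    def get_device_count(self):
        return len(self._devices)

    def get_device_info_by_index(self, index):
        return self._devices[index]


inputs = list_input_devices(FakePyAudio())
print(inputs or "no input device found")
```

On a machine that does have a microphone, an index from this list can be passed to `open()` as `input_device_index` instead of relying on the default device.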
So I'll have to set up a Raspberry Pi environment after all...
At least I've learned that DeepSpeech is quite usable.