PyAudio has two modes: blocking, where data has to be read from the stream explicitly; and non-blocking, where a callback function is passed to PyAudio and is called whenever a buffer of audio data is ready.
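For contrast with the callback style used in the script further down, here is a minimal sketch of blocking mode. The helper names `read_chunks` and `record_blocking` and the default values are my own, chosen for illustration; the only assumption about the stream is that it exposes `read(num_frames)`, as a PyAudio input stream does.

```python
def read_chunks(stream, num_chunks, frames_per_buffer=1024):
    """Blocking mode: pull audio from the stream one buffer at a time.

    `stream` is anything with a read(num_frames) method that returns bytes,
    e.g. a PyAudio stream opened with input=True and no stream_callback.
    """
    chunks = []
    for _ in range(num_chunks):
        # read() blocks until frames_per_buffer frames have been captured
        chunks.append(stream.read(frames_per_buffer))
    return b''.join(chunks)


def record_blocking(seconds=3, rate=16000, frames_per_buffer=1024):
    """Record from the default input device (needs a working microphone)."""
    import pyaudio  # imported here so read_chunks stays usable without it

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=rate,
                        input=True, frames_per_buffer=frames_per_buffer)
    try:
        num_chunks = int(rate / frames_per_buffer * seconds)
        return read_chunks(stream, num_chunks, frames_per_buffer)
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
```

In blocking mode the program decides when to read. In callback mode PyAudio calls the function from its own thread, which is why the script below only sleeps in its main loop.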
Using the DeepSpeech streaming API
To use the audio features, pyaudio needs to be installed.
$ sudo apt-get install portaudio19-dev
$ pip3 install pyaudio
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import time
import deepspeech
import numpy as np
import pyaudio

model_file_path = 'deepspeech-0.9.3-models.pbmm'
model = deepspeech.Model(model_file_path)

# In DeepSpeech 0.7+, createStream() returns a Stream object with the feed/decode methods
context = model.createStream()

text_so_far = ''

def process_audio(in_data, frame_count, time_info, status):
    """PyAudio callback: feed each captured buffer to the streaming decoder."""
    global text_so_far
    data16 = np.frombuffer(in_data, dtype=np.int16)
    context.feedAudioContent(data16)
    text = context.intermediateDecode()
    if text != text_so_far:
        print('Interim text = {}'.format(text))
        text_so_far = text
    return (in_data, pyaudio.paContinue)

audio = pyaudio.PyAudio()
stream = audio.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    frames_per_buffer=1024,
    stream_callback=process_audio
)
print('Please start speaking, when done press Ctrl-C ...')
stream.start_stream()

try:
    # The callback runs on PyAudio's own thread, so the main thread just waits
    while stream.is_active():
        time.sleep(0.1)
except KeyboardInterrupt:
    # Shut down PyAudio first, then flush the decoder for the final result
    stream.stop_stream()
    stream.close()
    audio.terminate()
    print('Finished recording.')
    text = context.finishStream()
    print('Final text = {}'.format(text))
$ python3 transcribe.py
Traceback (most recent call last):
  File "transcribe.py", line 28, in <module>
    stream = audio.open(
  File "/home/vagrant/deepspeech-venv/lib/python3.8/site-packages/pyaudio.py", line 750, in open
    stream = Stream(self, *args, **kwargs)
  File "/home/vagrant/deepspeech-venv/lib/python3.8/site-packages/pyaudio.py", line 441, in __init__
    self._stream = pa.open(**arguments)
OSError: [Errno -9996] Invalid input device (no default output device)
Looks like this can't be tested on Vagrant...
>>> import pyaudio
>>> pa = pyaudio.PyAudio()
>>> pa.get_default_input_device_info()
OSError: No Default Input Device Available
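To see whether the VM exposes any capture device at all, and not just whether a default one is set, the devices can be enumerated. This is a small sketch: `list_input_devices` is a name I made up, and it relies only on PyAudio's `get_device_count()` and `get_device_info_by_index()`.

```python
def list_input_devices(pa):
    """Return (index, name) for every device that can capture audio.

    `pa` is a pyaudio.PyAudio instance (or any object with the same two methods).
    """
    devices = []
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        # maxInputChannels is 0 for output-only devices
        if info.get('maxInputChannels', 0) > 0:
            devices.append((index, info.get('name')))
    return devices
```

Calling `list_input_devices(pyaudio.PyAudio())` on the Vagrant box would presumably return an empty list, which matches the error above.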
So I'll have to set up a Raspberry Pi environment after all...
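In the meantime, one way to exercise the streaming API without a microphone might be to feed it a WAV file in buffer-sized chunks, imitating what the PyAudio callback receives. This is only a sketch under assumptions: `iter_wav_chunks` and `transcribe_wav` are hypothetical helpers of mine, the WAV file has to match the format used in the script (16 kHz, mono, 16-bit), and the `Stream` methods are the ones from deepspeech 0.7 and later.

```python
import wave


def iter_wav_chunks(path, frames_per_chunk=1024):
    """Yield raw 16-bit PCM chunks from a WAV file, like the callback's in_data."""
    with wave.open(path, 'rb') as wav:
        # Same format the microphone script requests: 16 kHz, mono, 16-bit
        fmt = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if fmt != (16000, 1, 2):
            raise ValueError('expected 16 kHz mono 16-bit PCM, got {}'.format(fmt))
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data


def transcribe_wav(model_path, wav_path):
    """Feed a WAV file through the streaming API (assumes deepspeech >= 0.7)."""
    import deepspeech  # imported lazily so iter_wav_chunks works without it
    import numpy as np

    model = deepspeech.Model(model_path)
    ds_stream = model.createStream()
    for chunk in iter_wav_chunks(wav_path):
        ds_stream.feedAudioContent(np.frombuffer(chunk, dtype=np.int16))
        print('Interim text = {}'.format(ds_stream.intermediateDecode()))
    return ds_stream.finishStream()
```

Usage would be something like `transcribe_wav('deepspeech-0.9.3-models.pbmm', 'sample.wav')`, where `sample.wav` is a placeholder for any recording in the right format. It does not replace a real microphone test, but it checks the decoding half of the script.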
At least I've learned that DeepSpeech is quite usable.