[Speech Recognition] Transcribing audio to text with DeepSpeech in Python (batch/stream)

$ python3 --version
Python 3.8.10

### batch API
– Reads the entire WAV file and transcribes it in one pass

#! /usr/bin/python3
# -*- coding: utf-8 -*-

import deepspeech
import wave
import numpy as np

model_file_path = 'deepspeech-0.9.3-models.pbmm'
model = deepspeech.Model(model_file_path)

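filename = 'audio/8455-210777-0068.wav'
# DeepSpeech expects 16-bit mono PCM at the model's sample rate (16 kHz for 0.9.3)
with wave.open(filename, 'rb') as w:
    rate = w.getframerate()
    if rate != model.sampleRate():
        print(f'warning: {filename} is {rate} Hz, but the model expects {model.sampleRate()} Hz')
    frames = w.getnframes()
    buffer = w.readframes(frames)

# Reinterpret the raw bytes as signed 16-bit samples
data16 = np.frombuffer(buffer, dtype=np.int16)
# Transcribe the whole clip in a single call
text = model.stt(data16)
print(text)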

$ python3 app.py
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2021-08-28 08:55:38.538633: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
your paret is sufficient i said
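`model.stt()` expects signed 16-bit mono samples at the model's sample rate (16 kHz for the 0.9.3 release models), while the script above passes along whatever the WAV file contains. As a minimal sketch, the standard-library-only helper below checks channel count, sample width, and rate before decoding and converts the bytes to 16-bit samples without NumPy. The name `load_pcm16_mono` and the fixed 16000 Hz default are illustrative assumptions, not part of DeepSpeech.

```python
import array
import io
import sys
import wave

EXPECTED_RATE = 16000  # assumption: sample rate expected by the 0.9.3 release models


def load_pcm16_mono(source, expected_rate=EXPECTED_RATE):
    """Return the samples of a WAV file (path or binary file object) as array('h').

    Raises ValueError unless the audio is 16-bit mono PCM at expected_rate.
    """
    with wave.open(source, 'rb') as w:
        if w.getnchannels() != 1:
            raise ValueError(f'expected mono audio, got {w.getnchannels()} channels')
        if w.getsampwidth() != 2:
            raise ValueError(f'expected 16-bit samples, got {8 * w.getsampwidth()}-bit')
        if w.getframerate() != expected_rate:
            raise ValueError(f'expected {expected_rate} Hz, got {w.getframerate()} Hz')
        raw = w.readframes(w.getnframes())

    samples = array.array('h')  # 'h' = signed short (16-bit on mainstream platforms)
    samples.frombytes(raw)
    if sys.byteorder == 'big':
        samples.byteswap()  # WAV sample data is little-endian
    return samples


if __name__ == '__main__':
    # Build a tiny 16 kHz mono WAV in memory instead of reading a real file
    pcm = array.array('h', [0, 1000, -1000, 32767])
    if sys.byteorder == 'big':
        pcm.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(EXPECTED_RATE)
        w.writeframes(pcm.tobytes())
    buf.seek(0)
    print(list(load_pcm16_mono(buf)))  # [0, 1000, -1000, 32767]
```

With NumPy available, `np.array(samples, dtype=np.int16)` yields the same input that `np.frombuffer(buffer, dtype=np.int16)` produces in the script above.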

### stream API
– Feeds the audio to the model one fixed-size buffer chunk at a time

# (top portion omitted: same as the batch API code up to reading buffer)
context = model.createStream()

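buffer_len = len(buffer)
offset = 0
batch_size = 16384  # bytes per chunk: 8192 int16 samples, 0.512 s at 16 kHz
text = ''

while offset < buffer_len:
    end_offset = offset + batch_size
    chunk = buffer[offset:end_offset]
    data16 = np.frombuffer(chunk, dtype=np.int16)
    # Feed this chunk, then print the decoder's current best guess
    context.feedAudioContent(data16)
    text = context.intermediateDecode()
    print(text)
    offset = end_offset

# End the stream and get the final transcript; the stream cannot be reused afterwards
text = context.finishStream()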

$ python3 app.py
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2021-08-28 09:15:50.970216: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

your paret
your paret is suff
your paret is sufficient i said
your paret is sufficient i said
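In the stream loop, `batch_size` counts bytes, not samples: with 2-byte samples at 16 kHz, 16384 bytes is 8192 samples, or 0.512 s of audio per `feedAudioContent()` call, which is why the intermediate transcript grows in steps. A minimal standard-library sketch of that arithmetic and of the chunking itself follows; `iter_chunks`, `chunk_duration_seconds`, and the 16 kHz default are hypothetical names and assumptions for illustration.

```python
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)
SAMPLE_RATE = 16000   # assumption: model sample rate for the 0.9.3 release


def chunk_duration_seconds(chunk_bytes, sample_rate=SAMPLE_RATE, sample_width=SAMPLE_WIDTH):
    """Seconds of mono PCM audio held in chunk_bytes bytes."""
    return chunk_bytes / sample_width / sample_rate


def iter_chunks(buffer, chunk_bytes=16384, sample_width=SAMPLE_WIDTH):
    """Yield successive slices of buffer, each at most chunk_bytes long.

    chunk_bytes is rounded down to a multiple of sample_width so that no
    16-bit sample is split across two chunks.
    """
    step = chunk_bytes - chunk_bytes % sample_width
    if step <= 0:
        raise ValueError('chunk_bytes must hold at least one sample')
    for offset in range(0, len(buffer), step):
        yield buffer[offset:offset + step]


if __name__ == '__main__':
    print(chunk_duration_seconds(16384))  # 0.512
    fake_pcm = bytes(40000)  # 20000 silent samples = 1.25 s at 16 kHz
    print([len(c) for c in iter_chunks(fake_pcm)])  # [16384, 16384, 7232]
```

Rounding the step down to whole samples is the main design choice: `np.frombuffer(chunk, dtype=np.int16)` raises `ValueError` on a chunk whose byte length is odd, which an odd `batch_size` could otherwise produce. With the helper, the manual offset bookkeeping could become `for chunk in iter_chunks(buffer): context.feedAudioContent(np.frombuffer(chunk, dtype=np.int16))`.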

Huh, this is pretty impressive.
Next up is the Transcriber, then.