[音声認識] DeepSpeechでvideoのAutoSub(srtファイル)作成

– AutoSub is a CLI application to generate subtile file for any video using DeepSpeech.

### install
$ git clone https://github.com/abhirooptalasila/AutoSub
$ cd AutoSub

### virtual env
$ python3 -m venv sub
$ source sub/bin/activate
$ pip3 install -r requirements.txt
requirementsの中身は以下の通りです。

cycler==0.10.0
numpy
deepspeech==0.9.3
joblib==0.16.0
kiwisolver==1.2.0
pydub==0.23.1
pyparsing==2.4.7
python-dateutil==2.8.1
scikit-learn
scipy==1.4.1
six==1.15.0
tqdm==4.44.1

$ deactivate

### download model & scorer
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
$ mkdir audio output

$ sudo apt-get install ffmpeg
$ ffmpeg -version
ffmpeg version 4.2.4-1ubuntu0.1

今回はyoutubeの動画を使います

これを mp4に変換します。

$ python3 autosub/main.py –file hello.mp4
※main.pyで、modelとscorerのファイルを取得しているため、–model /home/AutoSub/deepspeech-0.9.3-models.pbmm –scorer /home/AutoSub/deepspeech-0.9.3-models.scorerは不要です。

for x in os.listdir():
        if x.endswith(".pbmm"):
            print("Model: ", os.path.join(os.getcwd(), x))
            ds_model = os.path.join(os.getcwd(), x)
        if x.endswith(".scorer"):
            print("Scorer: ", os.path.join(os.getcwd(), x))
            ds_scorer = os.path.join(os.getcwd(), x)

output/hello.srt

1
00:00:06,70 --> 00:00:15,60
a low and low and level how are you have low low and low how are you

2
00:00:16,10 --> 00:00:30,20
i do i am great i wonder for a good i grant it wonder for

3
00:00:32,45 --> 00:00:41,30
now at low halloway hallo hallo hallo how are you

4
00:00:41,90 --> 00:00:43,40
tired

5
00:00:43,55 --> 00:00:50,35
i am angry i'm not so good i'm tired

6
00:00:50,55 --> 00:00:55,95
i'm hungry and not so good

7
00:00:58,10 --> 00:01:07,15
love hollow hollow how are you have to have loved halloo are you

8
00:01:07,30 --> 00:01:16,65
how how low how do how are you allow a love as now how are you

これ、日本語でやりたい & リアルタイム出力したい