[TTS] WAV – Software engineer tech blog

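WAV is one of the most common audio file formats produced by TTS engines.
Characteristics:
No quality loss (audio is typically stored as uncompressed PCM)
Files are larger than MP3
Well suited to audio processing (AI, machine learning, editing)
Supported as an output format by most TTS engines, and the default for some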
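To make the size point concrete, here is a minimal sketch of the arithmetic for uncompressed PCM: data bytes = sample rate × channels × bytes per sample × duration, plus a small header. The function names `pcm_wav_size` and `silent_wav_bytes` are made up for this illustration, and the defaults (24 kHz, mono, 16-bit) are assumptions. Real TTS output may use different settings.

```python
import io
import wave


def pcm_wav_size(duration_s: float, sample_rate: int = 24000,
                 channels: int = 1, sample_width: int = 2) -> int:
    """Estimate the size in bytes of an uncompressed PCM WAV file.

    Assumes the minimal 44-byte header that Python's wave module writes.
    """
    frames = int(duration_s * sample_rate)
    return 44 + frames * channels * sample_width


def silent_wav_bytes(duration_s: float, sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Build an in-memory WAV file containing silence, to check the estimate."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        frames = int(duration_s * sample_rate)
        w.writeframes(b"\x00" * (frames * channels * sample_width))
    return buf.getvalue()


if __name__ == "__main__":
    print(pcm_wav_size(10.0))                 # estimated bytes for 10 seconds
    print(len(silent_wav_bytes(10.0)))        # actual bytes written by wave
```

Under these assumptions, 10 seconds of audio comes to roughly 480 KB, and the size grows linearly with duration. MP3 compresses the same audio, which is why it is smaller.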

OpenAI

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

# Generate speech (WAV) with TTS
response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input="こんにちは、これはWAV形式のサンプルです。",
    response_format="wav",  # request WAV output (the default is MP3)
)

# Save the binary audio data
with open("sample.wav", "wb") as f:
    f.write(response.read())

print("Created sample.wav")

Google Cloud

from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

input_text = texttospeech.SynthesisInput(text="こんにちは、WAVのサンプルです。")

voice = texttospeech.VoiceSelectionParams(
    language_code="ja-JP",
    name="ja-JP-Wavenet-B"
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16  # 16-bit linear PCM, returned with a WAV header
)

response = client.synthesize_speech(
    input=input_text,
    voice=voice,
    audio_config=audio_config
)

with open("sample.wav", "wb") as out:
    out.write(response.audio_content)
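Once a file such as sample.wav has been written, it can be useful to check what the engine actually produced. The following is a minimal sketch using Python's standard `wave` module. `describe_wav` is a hypothetical helper name, not part of either SDK, and it assumes the file is plain PCM WAV.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties from a PCM WAV file's header."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is derived from the frame count stored in the header
            "duration_s": frames / rate,
        }
```

Calling `describe_wav("sample.wav")` returns the channel count, sample width, sample rate, and duration. Some streaming encoders write a placeholder length into the header, in which case the reported duration may not be reliable.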

WAV = a high-quality audio file format
Commonly used as a standard output format for TTS
Generating WAV files with TTS in Python is straightforward