杰瑞科技汇

How do you play audio with DirectSound in Python?

Working with DirectSound in Python is a niche topic because DirectSound is a legacy Windows API. It is still possible, and there are two ways to go about it, depending on your goal.

Here's a complete guide covering the two main approaches:

  1. The Modern Approach: pyaudiowpatch (Recommended)
  2. The Direct Approach: pydirectsound (For true, low-level control)

The Modern & Recommended Approach: pyaudiowpatch

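This is the best method for most use cases. It doesn't use DirectSound at all: pyaudiowpatch is a fork of PyAudio (the PortAudio bindings) that talks to the Windows Audio Session API (WASAPI) and adds loopback capture, so you can record whatever is playing on an output device. WASAPI replaced DirectSound as the native Windows audio path from Windows Vista onward, and DirectSound now runs as an emulation layer on top of it. Streams opened as shown below run in shared mode, so other applications keep playing while yours does.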

Why is this better?

  • Simple API: Much easier to use than raw DirectSound.
  • No COM: You don't have to deal with the complexities of COM objects.
  • Cross-Device: Easily capture or play to any available audio output or input device.
  • High Performance: WASAPI is the native audio path on current Windows, so its latency is at least as good as that of DirectSound, which is layered on top of it.

Installation

pip install pyaudiowpatch

Example: Capturing System Audio (WASAPI Loopback)

This example will capture the audio currently playing to your default output device and save it to a WAV file.

import wave
import pyaudiowpatch as pyaudio
# --- Configuration ---
FORMAT = pyaudio.paInt16  # 16-bit PCM
CHUNK = 1024              # Number of frames per buffer
RECORD_SECONDS = 10       # How long to record
OUTPUT_FILENAME = "output.wav"
# --- Main Logic ---
def find_default_loopback(audio):
    """
    Returns the loopback device that mirrors the default WASAPI output device.
    """
    wasapi_info = audio.get_host_api_info_by_type(pyaudio.paWASAPI)
    speakers = audio.get_device_info_by_index(wasapi_info["defaultOutputDevice"])
    if speakers.get("isLoopbackDevice"):
        return speakers
    # pyaudiowpatch lists each output a second time as a loopback input device
    for loopback in audio.get_loopback_device_info_generator():
        if speakers["name"] in loopback["name"]:
            return loopback
    raise RuntimeError("No loopback device found for the default output device")
def record_system_audio():
    """
    Captures audio from the default output device and saves it to a WAV file.
    """
    audio = pyaudio.PyAudio()
    stream = None
    try:
        device = find_default_loopback(audio)
        print(f"Recording from: {device['name']}")
        # Loopback capture runs in shared mode, so the stream uses the
        # device's own channel count and sample rate
        channels = device["maxInputChannels"]
        rate = int(device["defaultSampleRate"])
        # 'input=True' means we are capturing
        stream = audio.open(
            format=FORMAT,
            channels=channels,
            rate=rate,
            input=True,
            input_device_index=device["index"],
            frames_per_buffer=CHUNK,
        )
        print(f"Recording for {RECORD_SECONDS} seconds...")
        frames = []
        # Blocking reads: each call returns CHUNK frames of raw PCM bytes
        for _ in range(int(rate / CHUNK * RECORD_SECONDS)):
            frames.append(stream.read(CHUNK))
        print("Recording finished.")
        # Save the recorded data to a WAV file
        with wave.open(OUTPUT_FILENAME, 'wb') as wf:
            wf.setnchannels(channels)
            wf.setsampwidth(audio.get_sample_size(FORMAT))
            wf.setframerate(rate)
            wf.writeframes(b''.join(frames))
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Close the stream even if it was already stopped
        if stream is not None:
            if stream.is_active():
                stream.stop_stream()
            stream.close()
        audio.terminate()
        print("Audio device terminated.")
if __name__ == "__main__":
    record_system_audio()
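
The read loop above sizes itself from the sample rate, the buffer size, and the recording length. The following is a minimal sketch of that arithmetic, assuming a 44100 Hz stream and 1024-frame buffers; the helper names are illustrative and are not part of pyaudiowpatch.

```python
import math


def buffer_duration_ms(chunk, rate):
    """Duration of one buffer of `chunk` frames at `rate` Hz, in milliseconds."""
    return 1000.0 * chunk / rate


def reads_for(rate, chunk, seconds, round_up=False):
    """Number of blocking reads needed to cover `seconds` of audio."""
    exact = rate / chunk * seconds
    return math.ceil(exact) if round_up else int(exact)


rate, chunk, seconds = 44100, 1024, 10
n_reads = reads_for(rate, chunk, seconds)
captured_seconds = n_reads * chunk / rate

print(f"one buffer : {buffer_duration_ms(chunk, rate):.1f} ms")
print(f"reads      : {n_reads}")
print(f"captured   : {captured_seconds:.3f} s")
```

Truncating with int() makes the recording slightly shorter than requested (430 reads, about 9.98 s here); passing round_up=True captures at least the requested duration instead. The per-buffer duration is also the smallest delay a blocking read or write of that size can add.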

Example: Playing a Sound with Low Latency

This example plays a WAV file with minimal delay.

import wave
import pyaudiowpatch as pyaudio
CHUNK = 1024  # Frames written per call
def play_sound_low_latency(wav_file):
    audio = pyaudio.PyAudio()
    wf = None
    stream = None
    try:
        # Look up the default WASAPI output device
        wasapi_info = audio.get_host_api_info_by_type(pyaudio.paWASAPI)
        device = audio.get_device_info_by_index(wasapi_info["defaultOutputDevice"])
        print(f"Playing on device: {device['name']}")
        wf = wave.open(wav_file, 'rb')
        # Open a playback stream that matches the WAV file's format.
        # In shared mode the open may fail if the file's sample rate
        # differs from the device's; resample the file in that case.
        stream = audio.open(
            format=audio.get_format_from_width(wf.getsampwidth()),
            channels=wf.getnchannels(),
            rate=wf.getframerate(),
            output=True,
            output_device_index=device['index'],
            frames_per_buffer=CHUNK,
        )
        print("Playing...")
        # Blocking writes: feed the file to the device one chunk at a time
        data = wf.readframes(CHUNK)
        while data:
            stream.write(data)
            data = wf.readframes(CHUNK)
        print("Finished playing.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Each resource is released only if it was actually created
        if stream is not None:
            if stream.is_active():
                stream.stop_stream()
            stream.close()
        if wf is not None:
            wf.close()
        audio.terminate()
        print("Audio device terminated.")
if __name__ == "__main__":
    # Make sure you have a 'test.wav' file
    play_sound_low_latency("test.wav")
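
The playback example expects a test.wav file to exist. If you don't have one at hand, the sketch below writes a short sine tone using only the standard library. The function name and its defaults (440 Hz, one second, mono, 16-bit) are illustrative choices, not something the original example prescribes.

```python
import math
import struct
import wave


def write_test_tone(path, freq_hz=440.0, seconds=1.0, rate=44100, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone to `path` and return the frame count."""
    n_frames = int(rate * seconds)
    peak = int(amplitude * 32767)  # scale to the signed 16-bit range
    payload = b"".join(
        struct.pack("<h", int(peak * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(payload)
    return n_frames
```

Calling write_test_tone("test.wav") once before running the playback code is enough. If the output device runs at a different rate, pass that rate via the rate argument so the file matches it.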

The Direct (Low-Level) Approach: pydirectsound

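This library is presented here as a thin Python wrapper around the native DirectSound COM API. It is considerably more complex and is only worth using if you need DirectSound-specific features such as 3D positioning, echo effects, or explicit buffer control that pyaudiowpatch doesn't expose. Before relying on it, confirm that the installed package exposes the names used below; pywin32 also ships a DirectSound wrapper as win32com.directsound.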

Warning: This is an advanced topic. You will be dealing with COM objects, GUIDs, and complex buffer management.

Installation

pip install pydirectsound

Example: Playing a Simple WAV File

This example demonstrates the boilerplate needed to initialize DirectSound and play a sound.

import directsound
import time
import wave
def play_with_pydirectsound(wav_file):
    # 1. Initialize DirectSound
    # You need a window handle. A simple one can be created with ctypes.
    import ctypes
    user32 = ctypes.windll.user32
    hwnd = user32.CreateWindowExA(
        0, b"STATIC", b"DSound Window", 0, 0, 0, 0, 0, 0, 0, 0, None
    )
    # Create the DirectSound object
    # Playback goes through a secondary buffer (created in step 3);
    # the primary buffer is the mixer output
    try:
        ds = directsound.DirectSound(hwnd)
    except Exception as e:
        print(f"Failed to create DirectSound object: {e}")
        return
    # 2. Load the WAV file
    wf = wave.open(wav_file, 'rb')
    n_channels = wf.getnchannels()
    sampwidth = wf.getsampwidth()
    framerate = wf.getframerate()
    n_frames = wf.getnframes()
    audio_data = wf.readframes(n_frames)
    wf.close()
    # 3. Create a secondary buffer to hold the sound
    # The buffer description needs to match the WAV file format
    desc = directsound.DSBUFFERDESC()
    desc.dwSize = ctypes.sizeof(desc)
    desc.dwFlags = directsound.DSBCAPS_CTRLVOLUME # Allow volume control
    desc.dwBufferBytes = len(audio_data)
    desc.lpwfxFormat = directsound.WAVEFORMATEX()
    desc.lpwfxFormat.wFormatTag = directsound.WAVE_FORMAT_PCM
    desc.lpwfxFormat.nChannels = n_channels
    desc.lpwfxFormat.nSamplesPerSec = framerate
    desc.lpwfxFormat.wBitsPerSample = sampwidth * 8
    desc.lpwfxFormat.nBlockAlign = (desc.lpwfxFormat.wBitsPerSample // 8) * n_channels
    desc.lpwfxFormat.nAvgBytesPerSec = desc.lpwfxFormat.nSamplesPerSec * desc.lpwfxFormat.nBlockAlign
    try:
        # Create the secondary buffer
        buffer = ds.CreateSoundBuffer(desc)
    except Exception as e:
        print(f"Failed to create sound buffer: {e}")
        ds.Release()
        return
    # 4. Lock the buffer, write the audio data, and unlock it
    # This is a necessary step for DirectSound
    try:
        # Lock the entire buffer
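        ptr1, ptr2, size1, size2 = buffer.Lock(0, desc.dwBufferBytes, 0)
        ctypes.memmove(ptr1, audio_data, size1)
        # A lock can wrap around the end of the buffer; copy the remainder if so
        if ptr2 and size2:
            ctypes.memmove(ptr2, audio_data[size1:], size2)
        # Release both regions; the argument order mirrors the native
        # IDirectSoundBuffer::Unlock call, so check it against the wrapper
        buffer.Unlock(ptr1, size1, ptr2, size2)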
    except Exception as e:
        print(f"Failed to lock/unlock buffer: {e}")
        buffer.Release()
        ds.Release()
        return
    # 5. Play the buffer
    print("Playing sound...")
    buffer.Play(0) # 0 = play once, no looping
    # Wait for the sound to finish: sleep for the clip's duration plus a small margin
    # A more robust approach is to poll the buffer's status
    time.sleep(n_frames / framerate + 0.1)
    # 6. Clean up resources (VERY IMPORTANT!)
    print("Stopping and releasing resources.")
    buffer.Stop()
    buffer.Release()
    ds.Release()
if __name__ == "__main__":
    # Make sure you have a 'test.wav' file
    play_with_pydirectsound("test.wav")
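
The format arithmetic in step 3 and the wait in step 5 can be checked without DirectSound installed. This is a minimal sketch, assuming uncompressed PCM input. The function name is hypothetical, and the dictionary keys simply reuse the WAVEFORMATEX field names for readability.

```python
import wave


def pcm_format_fields(wav_path):
    """Derive the WAVEFORMATEX-style values for a PCM WAV file."""
    with wave.open(wav_path, "rb") as wf:
        n_channels = wf.getnchannels()
        bits_per_sample = wf.getsampwidth() * 8
        sample_rate = wf.getframerate()
        n_frames = wf.getnframes()
    # One block is one sample for every channel
    block_align = n_channels * bits_per_sample // 8
    return {
        "nChannels": n_channels,
        "nSamplesPerSec": sample_rate,
        "wBitsPerSample": bits_per_sample,
        "nBlockAlign": block_align,
        "nAvgBytesPerSec": sample_rate * block_align,
        "buffer_bytes": n_frames * block_align,  # size of the secondary buffer
        "duration_s": n_frames / sample_rate,    # how long playback takes
    }
```

For a stereo, 16-bit, 44100 Hz file this gives a block alignment of 4 bytes and 176400 bytes per second. The duration_s value is the figure to wait for after Play() instead of a fixed sleep.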

Summary and Recommendation

Feature | pyaudiowpatch (WASAPI) | pydirectsound (DirectSound)
--- | --- | ---
Ease of Use | Excellent | Poor (complex, requires COM knowledge)
Performance | Excellent (native WASAPI path, shared mode by default) | Good (emulated on top of WASAPI since Windows Vista)
Features | Core playback/capture, loopback capture, device selection | 3D audio, effects, complex buffer control
Dependencies | Compiled extension (PortAudio bindings) | Native Windows DLLs, ctypes
Use Case | Most audio applications: recording system audio, playing sounds with low latency, audio analysis. | Specialized applications: games, advanced audio effects, legacy system integration.

Final Recommendation:

Start with pyaudiowpatch. It is the modern, simpler, and more Pythonic way to get low-latency audio on Windows, and it covers almost any playback or capture task without the headache of COM programming.

Only turn to pydirectsound if you have a very specific need for a feature that is exclusive to the DirectSound API and not available through WASAPI.
