Of course! Working with DirectSound in Python is a bit of a niche topic because it's a legacy Windows API. However, it's absolutely possible, and there are a couple of excellent ways to do it, depending on your goal.
Here's a complete guide covering the two main approaches:
- The Modern Approach: pyaudiowpatch (Recommended)
- The Direct Approach: pydirectsound (For true, low-level control)
The Modern & Recommended Approach: pyaudiowpatch
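This is the best method for most use cases. pyaudiowpatch is a fork of PyAudio (the Python bindings for PortAudio) that adds WASAPI loopback capture, so you can record whatever is playing on an output device. It doesn't use DirectSound; it talks to the Windows Audio Session API (WASAPI), which has been the native Windows audio path since Vista. Streams opened this way run in shared mode, so the audio still passes through the system mixer. This is not exclusive-mode access, but it covers most of what DirectSound used to be used for.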
Why is this better?
- Simple API: Much easier to use than raw DirectSound.
- No COM: You don't have to deal with the complexities of COM objects.
- Cross-Device: Easily capture or play to any available audio output or input device.
- Good Performance: WASAPI is the native audio path on current Windows, whereas DirectSound has been emulated on top of it since Windows Vista.
Installation
pip install pyaudiowpatch
Example: Capturing System Audio (WASAPI Loopback)
This example will capture the audio currently playing to your default output device and save it to a WAV file.
import pyaudiowpatch as pyaudio
import wave

# --- Configuration ---
FORMAT = pyaudio.paInt16  # 16-bit PCM
CHUNK = 1024  # Number of frames per buffer
RECORD_SECONDS = 10  # How long to record
OUTPUT_FILENAME = "output.wav"

# --- Main Logic ---
def record_system_audio():
    """
    Captures audio from the default output device (through its WASAPI
    loopback device) and saves it to a WAV file.
    """
    audio = pyaudio.PyAudio()
    stream = None
    try:
        # Look up the default WASAPI output device
        wasapi_info = audio.get_host_api_info_by_type(pyaudio.paWASAPI)
        device = audio.get_device_info_by_index(wasapi_info["defaultOutputDevice"])

        # An output device cannot be opened for input directly, so find
        # the loopback device that pyaudiowpatch exposes for it
        if not device.get("isLoopbackDevice"):
            for loopback in audio.get_loopback_device_info_generator():
                if device["name"] in loopback["name"]:
                    device = loopback
                    break
            else:
                raise RuntimeError("No loopback device found for the default output.")
        print(f"Recording from: {device['name']}")

        # Loopback capture has to use the device's own channel count and rate
        channels = device["maxInputChannels"]
        rate = int(device["defaultSampleRate"])

        # 'input=True' means we are capturing; we use blocking read() calls
        # instead of a callback for simplicity
        stream = audio.open(
            format=FORMAT,
            channels=channels,
            rate=rate,
            input=True,
            input_device_index=device["index"],
            frames_per_buffer=CHUNK,
        )

        print(f"Recording for {RECORD_SECONDS} seconds...")
        frames = []
        # Note: loopback delivers data only while something is playing,
        # so read() may block during silence
        for _ in range(int(rate / CHUNK * RECORD_SECONDS)):
            frames.append(stream.read(CHUNK))
        print("Recording finished.")

        # Save the recorded data to a WAV file
        with wave.open(OUTPUT_FILENAME, 'wb') as wf:
            wf.setnchannels(channels)
            wf.setsampwidth(audio.get_sample_size(FORMAT))
            wf.setframerate(rate)
            wf.writeframes(b''.join(frames))
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        if stream is not None:
            if stream.is_active():
                stream.stop_stream()
            stream.close()
        audio.terminate()
        print("Audio device terminated.")

if __name__ == "__main__":
    record_system_audio()
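One detail of the read loop above is worth spelling out: the iteration count is truncated to a whole number of chunks, so the recording comes out slightly shorter than the requested duration. The sketch below uses only plain arithmetic; `plan_reads` is a hypothetical helper name, not part of pyaudiowpatch.

```python
def plan_reads(rate, chunk, seconds):
    """Return (number of full reads, frames captured, seconds captured)."""
    n_reads = int(rate / chunk * seconds)  # partial final chunk is dropped
    frames = n_reads * chunk
    return n_reads, frames, frames / rate


n_reads, frames, captured = plan_reads(44100, 1024, 10)
print(n_reads, frames, round(captured, 3))  # 430 440320 9.985
```

If you need the full duration, one option is to round the read count up and trim the surplus frames before writing the file.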
Example: Playing a Sound with Low Latency
This example plays a WAV file with minimal delay.
import pyaudiowpatch as pyaudio
import wave

def play_sound_low_latency(wav_file):
    audio = pyaudio.PyAudio()
    wf = None
    stream = None
    try:
        # Get the default output device
        default_output_device = audio.get_default_output_device_info()
        print(f"Playing on device: {default_output_device['name']}")

        wf = wave.open(wav_file, 'rb')

        # Open a playback stream that matches the WAV file's format
        stream = audio.open(
            format=audio.get_format_from_width(wf.getsampwidth()),
            channels=wf.getnchannels(),
            rate=wf.getframerate(),
            output=True,
            output_device_index=default_output_device['index'],
            frames_per_buffer=1024,
        )

        print("Playing...")
        # Feed the stream one small chunk at a time until the file is exhausted
        data = wf.readframes(1024)
        while data:
            stream.write(data)
            data = wf.readframes(1024)
        print("Finished playing.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Close only what was actually opened
        if stream is not None:
            if stream.is_active():
                stream.stop_stream()
            stream.close()
        if wf is not None:
            wf.close()
        audio.terminate()
        print("Audio device terminated.")

if __name__ == "__main__":
    # Make sure you have a 'test.wav' file
    play_sound_low_latency("test.wav")
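If you don't have a `test.wav` handy, you can generate one with the standard library alone. This is a minimal sketch; the helper name `write_test_tone` and its defaults (a 440 Hz mono tone, 16-bit PCM, 44.1 kHz) are my own choices for illustration, not something either audio library requires.

```python
import math
import struct
import wave


def write_test_tone(path="test.wav", freq_hz=440.0, seconds=1.0,
                    rate=44100, amplitude=0.3):
    """Write a mono 16-bit PCM sine tone to `path`; return the frame count."""
    n_frames = int(rate * seconds)
    # Scale the sine wave into the signed 16-bit range, well below full scale
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_frames)
    )
    # "<h" packs each sample as a little-endian signed 16-bit integer
    pcm = b"".join(struct.pack("<h", s) for s in samples)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # bytes per sample
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return n_frames
```

Calling `write_test_tone()` once creates a one-second `test.wav` in the current directory, which the playback examples can then open.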
The Direct (Low-Level) Approach: pydirectsound
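This library provides a more-or-less direct Python wrapper around the native DirectSound COM API. It's much more complex and is only useful if you need fine-grained control over DirectSound features like 3D positioning, echo effects, or explicit buffer management that pyaudiowpatch doesn't expose. Keep in mind that since Windows Vista, DirectSound itself is emulated on top of WASAPI, so it no longer gives you hardware-accelerated buffers.
Warning: This is an advanced topic. You will be dealing with COM objects, GUIDs, and complex buffer management. The calls in the example below mirror the native DirectSound interfaces; check the exact names and signatures against the documentation of the version you install, since wrappers differ.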
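To give a sense of the buffer management involved: DirectSound buffers are circular, so a native `Lock` call returns two regions, and the second one is non-empty whenever the requested span wraps past the end of the buffer. The sketch below models that behaviour on a plain `bytearray`. It does not call DirectSound, and `split_lock` and `write_circular` are hypothetical helper names.

```python
def split_lock(buffer_size, offset, length):
    """Return ((start1, size1), (start2, size2)) for a circular write."""
    if not 0 <= offset < buffer_size:
        raise ValueError("offset must lie inside the buffer")
    if not 0 <= length <= buffer_size:
        raise ValueError("length must not exceed the buffer size")
    size1 = min(length, buffer_size - offset)  # bytes up to the end
    size2 = length - size1                     # bytes that wrap to the start
    return (offset, size1), (0, size2)


def write_circular(buf, offset, data):
    """Copy `data` into `buf` at `offset`, wrapping; return the next offset."""
    (start1, size1), (start2, size2) = split_lock(len(buf), offset, len(data))
    buf[start1:start1 + size1] = data[:size1]
    buf[start2:start2 + size2] = data[size1:]
    return (offset + len(data)) % len(buf)


ring = bytearray(8)
next_offset = write_circular(ring, 6, b"abcd")
print(bytes(ring), next_offset)  # b'cd\x00\x00\x00\x00ab' 2
```

A streaming player repeats this kind of write while playback is running, always staying ahead of the play cursor; both regions must be filled and then handed back in the matching unlock call.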
Installation
pip install pydirectsound
Example: Playing a Simple WAV File
This example demonstrates the boilerplate needed to initialize DirectSound and play a sound.
import ctypes
import time
import wave

import directsound  # verify module and attribute names against your installed wrapper

def play_with_pydirectsound(wav_file):
    # 1. Initialize DirectSound
    # You need a window handle. A simple one can be created with ctypes.
    user32 = ctypes.windll.user32
    hwnd = user32.CreateWindowExA(
        0, b"STATIC", b"DSound Window", 0, 0, 0, 0, 0, 0, 0, 0, None
    )

    # Create the DirectSound object
    try:
        ds = directsound.DirectSound(hwnd)
    except Exception as e:
        print(f"Failed to create DirectSound object: {e}")
        return

    # 2. Load the WAV file
    with wave.open(wav_file, 'rb') as wf:
        n_channels = wf.getnchannels()
        sampwidth = wf.getsampwidth()
        framerate = wf.getframerate()
        n_frames = wf.getnframes()
        audio_data = wf.readframes(n_frames)

    # 3. Create a secondary buffer to hold the sound
    # The buffer description needs to match the WAV file format
    desc = directsound.DSBUFFERDESC()
    desc.dwSize = ctypes.sizeof(desc)
    desc.dwFlags = directsound.DSBCAPS_CTRLVOLUME  # Allow volume control
    desc.dwBufferBytes = len(audio_data)
    desc.lpwfxFormat = directsound.WAVEFORMATEX()
    desc.lpwfxFormat.wFormatTag = directsound.WAVE_FORMAT_PCM
    desc.lpwfxFormat.nChannels = n_channels
    desc.lpwfxFormat.nSamplesPerSec = framerate
    desc.lpwfxFormat.wBitsPerSample = sampwidth * 8
    desc.lpwfxFormat.nBlockAlign = (desc.lpwfxFormat.wBitsPerSample // 8) * n_channels
    desc.lpwfxFormat.nAvgBytesPerSec = desc.lpwfxFormat.nSamplesPerSec * desc.lpwfxFormat.nBlockAlign

    try:
        # Create the secondary buffer
        buffer = ds.CreateSoundBuffer(desc)
    except Exception as e:
        print(f"Failed to create sound buffer: {e}")
        ds.Release()
        return

    # 4. Lock the buffer, write the audio data, and unlock it
    # This is a necessary step for DirectSound
    try:
        # Lock the entire buffer; a lock returns two regions because
        # DirectSound buffers are circular
        ptr1, ptr2, size1, size2 = buffer.Lock(0, desc.dwBufferBytes, 0)
        ctypes.memmove(ptr1, audio_data, size1)
        if ptr2 and size2:
            # Only used when the locked span wraps around the buffer end
            ctypes.memmove(ptr2, audio_data[size1:], size2)
        # Unlock both regions with the sizes actually written
        buffer.Unlock(ptr1, size1, ptr2, size2)
    except Exception as e:
        print(f"Failed to lock/unlock buffer: {e}")
        buffer.Release()
        ds.Release()
        return

    # 5. Play the buffer
    print("Playing sound...")
    buffer.Play(0)  # Flags = 0: play once from the current position, no looping

    # Wait for the sound to finish
    # Sleeping for the clip's duration is simple but not robust;
    # a better way is to poll the buffer's status
    time.sleep(n_frames / framerate)

    # 6. Clean up resources (VERY IMPORTANT!)
    print("Stopping and releasing resources.")
    buffer.Stop()
    buffer.Release()
    ds.Release()

if __name__ == "__main__":
    # Make sure you have a 'test.wav' file
    play_with_pydirectsound("test.wav")
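The format fields filled in during step 3 of the example above are not independent: for plain PCM, `nBlockAlign` and `nAvgBytesPerSec` follow from the channel count, sample rate, and sample width. Here is a small standard-library sketch of that arithmetic, which also gives you the clip duration to wait for after `Play`. The helper names `pcm_format_fields` and `duration_seconds` are hypothetical.

```python
def pcm_format_fields(channels, sample_rate, sample_width_bytes):
    """Derive the dependent WAVEFORMATEX fields for uncompressed PCM."""
    block_align = channels * sample_width_bytes  # bytes per frame
    return {
        "wBitsPerSample": sample_width_bytes * 8,
        "nBlockAlign": block_align,
        "nAvgBytesPerSec": sample_rate * block_align,
    }


def duration_seconds(n_bytes, fields):
    """Playback time of `n_bytes` of PCM data in the given format."""
    return n_bytes / fields["nAvgBytesPerSec"]


fields = pcm_format_fields(channels=2, sample_rate=44100, sample_width_bytes=2)
print(fields["nBlockAlign"], fields["nAvgBytesPerSec"])  # 4 176400
print(duration_seconds(529200, fields))  # 3.0
```

If the buffer description and the WAV header disagree on any of these values, the sound typically plays at the wrong speed or as noise, so it is worth deriving them in one place.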
Summary and Recommendation
| Feature | pyaudiowpatch (WASAPI) | pydirectsound (DirectSound) |
|---|---|---|
| Ease of Use | Excellent | Poor (complex, requires COM knowledge) |
| Performance | Excellent (native WASAPI path, shared mode) | Good, but emulated on top of WASAPI since Windows Vista (no hardware acceleration) |
| Features | Core playback/capture, loopback capture, device selection | 3D audio, effects, complex buffer control |
| Dependencies | Compiled extension bundling PortAudio (installed from a pip wheel) | Native Windows DLLs, ctypes |
| Use Case | Most audio applications: recording system audio, playing sounds, audio analysis. | Specialized applications: games, advanced audio effects, legacy system integration. |
Final Recommendation:
Start with pyaudiowpatch. It is the modern, simpler, and more Pythonic way to play and capture audio on Windows, and it covers most tasks without the headache of COM programming.
Only turn to pydirectsound if you have a very specific need for a feature that is exclusive to the DirectSound API and not available through WASAPI.
