
How to Play Audio with Python sounddevice

sounddevice is a popular Python library for audio input and output. It provides a clean and powerful interface to PortAudio, the cross-platform audio I/O library.


Here’s a comprehensive guide covering what it is, how to install it, and how to use it for common tasks.


What is sounddevice?

sounddevice is a Python library for playing and recording audio streams. Its main features are:

  • Cross-Platform: Works on Windows, macOS, and Linux.
  • Simple API: The syntax is intuitive and easy to learn.
  • Real-time Capabilities: Designed for low-latency audio input and output, making it great for live audio applications.
  • NumPy Integration: It works seamlessly with NumPy arrays, which is the standard for numerical computing in Python. This makes it easy to process audio with libraries like SciPy and NumPy itself.
  • Full-Duplex Support: You can record and play audio at the same time.

Under the Hood: It's a Python wrapper around the PortAudio library.


Installation

Install the library with pip:

pip install sounddevice

Important Prerequisite: PortAudio

sounddevice is a wrapper, so it needs the PortAudio library at runtime. The pip wheels for Windows and macOS bundle PortAudio; on Linux it usually has to be installed through the system package manager.

  • On macOS (using Homebrew, only needed if the bundled library cannot be used):
    brew install portaudio
  • On Debian/Ubuntu:
    sudo apt-get update
    sudo apt-get install libportaudio2
  • On Windows: The pip wheel ships with a PortAudio DLL, so no separate installation is normally needed. If importing sounddevice still fails, reinstalling the package with pip is the first thing to try.
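
A quick way to confirm that both the Python package and PortAudio load correctly is to import the module and ask for the PortAudio version. The following is a minimal sketch; check_sounddevice is a hypothetical helper name, not part of the library.

```python
def check_sounddevice() -> str:
    """Return a one-line report on whether sounddevice and PortAudio load."""
    try:
        import sounddevice as sd
    except ImportError:
        return "sounddevice is not installed (try: pip install sounddevice)"
    except Exception as exc:
        # Typically OSError when the PortAudio library cannot be found,
        # or a PortAudio error when it fails to initialise.
        return f"sounddevice is installed, but PortAudio failed to load: {exc}"
    # get_portaudio_version() returns (version_number, version_text)
    _, version_text = sd.get_portaudio_version()
    return f"sounddevice {sd.__version__} using {version_text}"


print(check_sounddevice())
```

If the report mentions a PortAudio failure on Linux, installing libportaudio2 as described above is the usual fix.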

Core Concepts: NumPy and Audio

sounddevice represents audio as NumPy arrays.

  • A mono audio signal is a 1D NumPy array (e.g., np.array of shape (N,)).
  • A stereo audio signal is a 2D NumPy array (e.g., np.array of shape (N, 2)), where each row contains the left and right channel samples for that time step.
  • The data type of the array is important. Common types are float32 (range -1.0 to 1.0) and int16 (range -32768 to 32767). sounddevice uses float32 unless you specify another dtype.
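
The shapes and data types above can be illustrated with NumPy alone. This is a sketch under simple assumptions: float_to_int16 and int16_to_float are hypothetical helper names, and the scaling factors (32767 and 32768) are one common convention rather than the only one.

```python
import numpy as np

sample_rate = 44100
n = sample_rate  # one second of audio

# Mono: shape (N,). Stereo: shape (N, 2), one column per channel.
t = np.arange(n) / sample_rate
mono = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)
stereo = np.column_stack([mono, mono])


def float_to_int16(x):
    """Scale float samples in [-1.0, 1.0] to int16, clipping out-of-range values."""
    return (np.clip(x, -1.0, 1.0) * 32767).astype(np.int16)


def int16_to_float(x):
    """Scale int16 samples to float32 in the range [-1.0, 1.0)."""
    return x.astype(np.float32) / 32768.0


pcm = float_to_int16(stereo)
print(mono.shape, stereo.shape, pcm.dtype)
```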

Common Use Cases with Code Examples

Let's dive into the most common tasks.

A. Playing a Sound

sd.play() plays NumPy arrays. To play a WAV file, load it into an array first.

Playing a NumPy Array (a simple sine wave)

This is the "Hello, World!" of audio programming.

import numpy as np
import sounddevice as sd
# 1. Define parameters
sample_rate = 44100  # Hertz
frequency = 440      # Hertz (A4 note)
duration = 3         # seconds
# 2. Generate the audio data (a sine wave)
# t is a time vector from 0 to duration
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
# The sine wave formula: A * sin(2 * pi * f * t)
amplitude = 0.5  # Keep amplitude below 1.0 to avoid clipping
audio_data = amplitude * np.sin(2 * np.pi * frequency * t)
# 3. Play the audio
print(f"Playing a {frequency}Hz tone for {duration} seconds...")
sd.play(audio_data, samplerate=sample_rate)
# 4. Wait for the playback to finish before the script ends
sd.wait() 
print("Playback finished.")
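
For a tone of indefinite length, the same sine formula can be generated block by block inside an sd.OutputStream callback instead of precomputing the whole array. The sketch below keeps a running sample index so the waveform stays continuous across blocks; make_tone_callback is a hypothetical helper name.

```python
import numpy as np


def make_tone_callback(frequency, samplerate, amplitude=0.5):
    """Build an OutputStream callback that writes a continuous sine wave."""
    start = 0  # index of the next sample to generate

    def callback(outdata, frames, time_info, status):
        nonlocal start
        if status:
            print(status)  # report underruns and similar conditions
        t = (start + np.arange(frames)) / samplerate
        tone = amplitude * np.sin(2 * np.pi * frequency * t)
        # outdata has shape (frames, channels); write the tone to every channel
        outdata[:] = tone[:, np.newaxis]
        start += frames

    return callback


# Usage with real audio hardware (not executed here):
#   import sounddevice as sd
#   with sd.OutputStream(samplerate=44100, channels=1,
#                        callback=make_tone_callback(440, 44100)):
#       sd.sleep(3000)  # play for three seconds
```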

Playing a WAV File

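sounddevice does not read audio files itself, so the file has to be loaded into a NumPy array first. The soundfile library (pip install soundfile) is a common companion for this.

import sounddevice as sd
import soundfile as sf
# The filename of your WAV file
filename = 'my_audio.wav'
# sf.read() returns the samples as a NumPy array plus the file's sample rate
data, sample_rate = sf.read(filename, dtype='float32')
print(f"Playing file: {filename}")
sd.play(data, samplerate=sample_rate)
sd.wait()
print("Playback finished.")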
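
A WAV file can also be decoded into a NumPy array using only the standard-library wave module. This is a minimal sketch that assumes a 16-bit PCM file; read_wav_int16 is a hypothetical helper name, and other sample widths would need extra handling.

```python
import wave

import numpy as np


def read_wav_int16(path):
    """Read a 16-bit PCM WAV file and return (data, samplerate).

    data has shape (frames,) for mono or (frames, channels) otherwise.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM files")
        samplerate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    data = np.frombuffer(raw, dtype="<i2")  # little-endian int16, interleaved
    if channels > 1:
        data = data.reshape(-1, channels)
    return data, samplerate
```

The returned array and sample rate can then be passed to sd.play(data, samplerate), which accepts int16 data as well as float32.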

B. Recording Audio

Recording is just as straightforward. The sd.rec() function starts a recording and returns immediately. You must use sd.wait() to block until the recording is complete.

import numpy as np
import sounddevice as sd
# 1. Define parameters
duration = 5  # seconds
sample_rate = 44100
channels = 2  # for stereo recording
# 2. Start recording
# sd.rec() returns immediately, so we store the returned object
print("Recording started. Speak into your microphone...")
recording = sd.rec(int(duration * sample_rate), 
                   samplerate=sample_rate, 
                   channels=channels,
                   dtype='float32') # Use float32 for better processing
# 3. Wait for the recording to finish
sd.wait()
# 4. The recording is now a NumPy array
print("Recording finished.")
print(f"Recording shape: {recording.shape}") # Should be (duration * sample_rate, channels)
# You can now save or process the recording
# For example, save it to a WAV file
from scipy.io import wavfile
wavfile.write('my_recording.wav', sample_rate, recording)
print("Recording saved as 'my_recording.wav'")
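
Microphone recordings are often quiet, so a typical processing step before saving is peak normalization. The sketch below works on any float array shaped like the recording above; normalize_peak and peak_dbfs are hypothetical helper names, and the 0.9 target is an arbitrary choice that leaves some headroom.

```python
import numpy as np


def normalize_peak(recording, peak=0.9):
    """Scale a float recording so its largest absolute sample equals `peak`."""
    max_abs = np.max(np.abs(recording))
    if max_abs == 0:
        return recording.copy()  # pure silence: nothing to scale
    return recording * (peak / max_abs)


def peak_dbfs(recording):
    """Return the peak level in dB relative to full scale (1.0 is 0 dBFS)."""
    max_abs = float(np.max(np.abs(recording)))
    return 20 * np.log10(max_abs) if max_abs > 0 else float("-inf")


# Synthetic stand-in for a quiet stereo recording
quiet = 0.05 * np.ones((4, 2), dtype=np.float32)
quiet[1] = -0.1
louder = normalize_peak(quiet)
print(f"before: {peak_dbfs(quiet):.1f} dBFS, after: {peak_dbfs(louder):.1f} dBFS")
```

The normalized array can be written with wavfile.write() in place of the raw recording.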

C. Full-Duplex Audio (Simultaneous Recording and Playback)

This is a powerful feature. A classic example is an audio delay (echo) effect. Because the callback needs both input and output, it has to run in an sd.Stream; the callback of an sd.InputStream receives no outdata argument.

import numpy as np
import sounddevice as sd
import time
# Parameters
sample_rate = 44100
duration = 10  # seconds to run; also the length of the circular buffer
delay_seconds = 1.0
channels = 2
# Create a circular buffer to store recent input
buffer_size = int(duration * sample_rate)
delay_samples = int(delay_seconds * sample_rate)
audio_buffer = np.zeros((buffer_size, channels), dtype='float32')
write_idx = 0
# The callback function is called by PortAudio for each audio block
def callback(indata, outdata, frames, time_info, status):
    """
    This function is called in real time.
    - indata:  The input audio (from microphone).
    - outdata: The output audio (to speakers). We must fill this.
    """
    global write_idx
    if status:
        print(status)  # report input overflows and output underflows
    # 1. Positions of this block in the buffer, wrapping around at the end
    write_pos = (write_idx + np.arange(frames)) % buffer_size
    audio_buffer[write_pos] = indata
    # 2. The delayed audio sits delay_samples behind the positions just written
    delayed_audio = audio_buffer[(write_pos - delay_samples) % buffer_size]
    # 3. Mix the original input with the delayed audio to create the echo
    outdata[:] = indata + 0.5 * delayed_audio  # 0.5 is the volume of the echo
    # 4. Advance the write position for the next block
    write_idx = (write_idx + frames) % buffer_size
# Create the full-duplex stream
# device=(input_device_id, output_device_id) selects devices; omit it for the defaults
# Use headphones: with speakers, the microphone picks up the output and feeds back
print("Starting full-duplex stream with a 1-second delay...")
with sd.Stream(callback=callback,
               samplerate=sample_rate,
               channels=channels,
               blocksize=1024,
               dtype='float32'):
    # The 'with' statement keeps the stream open.
    # We just sleep for the desired duration.
    time.sleep(duration)
print("Stream closed.")
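
The wrap-around arithmetic of a circular buffer is easy to get wrong when the block size does not divide the buffer length evenly, and it is hard to debug inside a real-time callback. One option is to check the logic offline with plain NumPy first. The sketch below is an illustration under that assumption; DelayLine is a hypothetical class, not part of sounddevice.

```python
import numpy as np


class DelayLine:
    """Circular buffer that returns its input delayed by a fixed number of frames."""

    def __init__(self, size, delay, channels=2):
        if not 0 < delay < size:
            raise ValueError("delay must be between 1 and size - 1 frames")
        self.buffer = np.zeros((size, channels), dtype=np.float32)
        self.size = size
        self.delay = delay
        self.write_idx = 0

    def process(self, block):
        """Store `block` and return the samples that arrived `delay` frames earlier."""
        frames = len(block)
        if frames + self.delay > self.size:
            raise ValueError("block too large for this buffer size and delay")
        # Index arrays with modulo handle blocks that straddle the buffer end
        write_pos = (self.write_idx + np.arange(frames)) % self.size
        self.buffer[write_pos] = block
        delayed = self.buffer[(write_pos - self.delay) % self.size]
        self.write_idx = (self.write_idx + frames) % self.size
        return delayed


# Feed a ramp through a tiny buffer in blocks of 4 so that one block wraps around
line = DelayLine(size=10, delay=3, channels=1)
ramp = np.arange(1, 13, dtype=np.float32).reshape(-1, 1)
output = np.concatenate([line.process(ramp[i:i + 4]) for i in range(0, 12, 4)])
print(output[:, 0])
```

The output is the input shifted by three frames, with silence at the start, including across the block that wraps past the end of the buffer.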

D. Querying Audio Devices

You can list all available input and output devices to find their IDs, which you can then use to select a specific device (e.g., a USB microphone or headphones).

import sounddevice as sd
# Print a formatted overview of all devices
print("Available audio devices:")
print(sd.query_devices())
# You can also iterate over the devices and print selected fields
print("\n--- Device List ---")
for i, device in enumerate(sd.query_devices()):
    print(f"ID: {i}, Name: '{device['name']}', Max Input Channels: {device['max_input_channels']}, Max Output Channels: {device['max_output_channels']}")

Pass the device ID when creating a stream, for example sd.InputStream(device=3, ...) to use the device with ID 3.
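
Device IDs can change when hardware is plugged in or removed, so it can be more robust to look a device up by name. The sketch below filters a list of device dictionaries using the same keys as in the loop above; find_devices is a hypothetical helper, and the device entries are made up so the example runs without audio hardware.

```python
def find_devices(devices, name_part="", min_inputs=0, min_outputs=0):
    """Return (index, name) pairs for devices matching the given criteria."""
    matches = []
    for index, dev in enumerate(devices):
        if (name_part.lower() in dev["name"].lower()
                and dev["max_input_channels"] >= min_inputs
                and dev["max_output_channels"] >= min_outputs):
            matches.append((index, dev["name"]))
    return matches


# Stand-in for sd.query_devices(); these entries are invented for illustration
fake_devices = [
    {"name": "Built-in Output", "max_input_channels": 0, "max_output_channels": 2},
    {"name": "USB Microphone", "max_input_channels": 1, "max_output_channels": 0},
    {"name": "USB Headset", "max_input_channels": 1, "max_output_channels": 2},
]

print(find_devices(fake_devices, name_part="usb", min_inputs=1, min_outputs=2))
```

With real hardware, sd.query_devices() can be passed in place of fake_devices, and the resulting index used as the device argument.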


Summary of Key Functions

Function                          Description
sd.play(data, samplerate, ...)    Plays audio from a NumPy array. Returns immediately.
sd.rec(frames, samplerate, ...)   Starts recording audio into a NumPy array. Returns immediately.
sd.wait()                         Blocks until playback or recording started with sd.play() or sd.rec() has finished.
sd.stop()                         Stops playback or recording started with sd.play() or sd.rec().
sd.query_devices()                Returns a list of all available audio devices with their details.
sd.InputStream(...)               Creates an input-only stream for real-time recording, typically with a callback.
sd.OutputStream(...)              Creates an output-only stream for real-time playback, typically with a callback.
sd.Stream(...)                    Creates a full-duplex stream (both input and output).

Alternatives

While sounddevice is excellent, it's good to know other options:

  • pyaudio: An older Python wrapper for PortAudio. It works with raw byte buffers rather than NumPy arrays, so it is more low-level and can be more complex to use, but it offers very fine-grained control.
  • simpleaudio / pydub: These are higher-level libraries focused on simplicity. simpleaudio is great for simple playback of WAV files, while pydub is excellent for manipulating audio files (e.g., converting formats, slicing, applying effects) but not for real-time streaming.
  • librosa: A specialized library for music and audio analysis. It is built on NumPy and SciPy and loads audio files through soundfile or audioread; it does not play or record audio through devices. It provides tools for feature extraction (like MFCCs), rhythm analysis, and more. If your goal is analysis rather than audio I/O, librosa is a better choice.

For most general-purpose audio I/O tasks in Python, sounddevice is the recommended choice due to its balance of power, simplicity, and performance.
