杰瑞科技汇

What Useful Features Do Python Audio Tools Offer?

Python has a rich ecosystem for audio processing, ranging from simple playback and recording to complex analysis and machine learning. This guide covers the most popular and useful Python audio tools, broken down by category.

The Big Three: The Foundation of Audio in Python

These are the libraries you'll encounter most often. Each has its strengths.

| Library | Primary Use Case | Key Strengths | Learning Curve |
|---|---|---|---|
| Librosa | Music and audio analysis | Built for signal processing, feature extraction (MFCCs, spectral contrast), beat tracking, and visualization. The go-to for music information retrieval (MIR). | Moderate |
| PyDub | Audio manipulation | Extremely simple and intuitive for common tasks like cutting, concatenating, converting formats, applying effects, and adjusting volume. Great for scripting. | Very easy |
| PyAudio | Audio I/O (input/output) | Low-level streaming for real-time audio capture (microphone) and playback. The foundation for many real-time applications. | Moderate to hard |

For Audio Analysis and Feature Extraction (Librosa)

If you're working with music, speech, or any audio where you need to understand its content (e.g., genre classification, chord recognition, speaker identification), Librosa is your best friend.

Installation:

pip install librosa

Common Use Cases:

  • Loading audio files.
  • Visualizing audio waveforms and spectrograms.
  • Extracting features like Mel-Frequency Cepstral Coefficients (MFCCs), spectral centroid, zero-crossing rate.
  • Finding the tempo (BPM) and beat locations of a song.

Example Code:

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
# 1. Load an audio file
# librosa loads audio at 22050Hz by default and converts to mono
y, sr = librosa.load('path/to/your/song.wav')
# y: the audio time series (as a NumPy array)
# sr: the sampling rate
print(f"Audio length: {len(y)/sr:.2f} seconds")
print(f"Sample rate: {sr} Hz")
# 2. Visualize the waveform
plt.figure(figsize=(14, 5))
librosa.display.waveshow(y, sr=sr)
plt.title("Waveform")
plt.show()
# 3. Compute a Short-Time Fourier Transform (STFT) to get a spectrogram
D = librosa.stft(y)  # STFT
magnitude = np.abs(D) # Get magnitude
# 4. Visualize the spectrogram
plt.figure(figsize=(14, 5))
librosa.display.specshow(librosa.amplitude_to_db(magnitude, ref=np.max), 
                         sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title("Log-frequency power spectrogram")
plt.show()
# 5. Extract a common feature: MFCCs
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(f"MFCCs shape: {mfccs.shape}") # (number of coefficients, number of time frames)
# 6. Estimate the tempo
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
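# Newer librosa releases may return tempo as a NumPy array, so coerce it to a float
print(f"Estimated tempo: {float(np.atleast_1d(tempo)[0]):.2f} BPM")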
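
To make a feature such as the zero-crossing rate less of a black box, here is a minimal pure-Python sketch of what it measures. It does not use Librosa and is not Librosa's implementation; the helper names (`sine_wave`, `zero_crossing_rate`, `rms`) and the synthetic 440 Hz test tone are illustrative assumptions.

```python
import math


def sine_wave(freq_hz, duration_s, sample_rate):
    """Return float samples in [-1, 1] for a pure tone (hypothetical helper)."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def rms(samples):
    """Root-mean-square level, a simple loudness proxy."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


sample_rate = 22050  # same default rate the Librosa example loads at
tone = sine_wave(440.0, 1.0, sample_rate)

zcr = zero_crossing_rate(tone)
# A pure tone crosses zero twice per cycle, so the rate hints at its frequency.
estimated_freq = zcr * sample_rate / 2

print(f"Zero-crossing rate: {zcr:.4f}")
print(f"Frequency implied by ZCR: {estimated_freq:.1f} Hz")
print(f"RMS level: {rms(tone):.3f}")
```

On real recordings the zero-crossing rate is usually computed per short frame rather than over the whole signal, which is how it becomes a time-varying feature.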

For Simple Audio Manipulation (PyDub)

PyDub is perfect for tasks that don't require deep analysis. Think of it as the "Pillow" (for images) or "Pandas" (for data) of audio. It's a joy to use for scripting.

Installation:

pip install pydub

Note: PyDub relies on FFmpeg (or libav) for any format other than WAV. FFmpeg must be installed and on your system's PATH for those formats to work; it can be downloaded from the official FFmpeg website (ffmpeg.org).

Common Use Cases:

  • Convert between audio formats (MP3, WAV, OGG, etc.).
  • Cut, trim, or concatenate audio segments.
  • Change volume (gain).
  • Apply simple effects like fade in/out.
  • Mix audio tracks.

Example Code:

from pydub import AudioSegment
from pydub.utils import make_chunks
# 1. Load an audio file (supports many formats)
# Note: This requires ffmpeg
sound = AudioSegment.from_mp3("my_song.mp3")
# 2. Get basic info
print(f"Channels: {sound.channels}")
print(f"Frame rate: {sound.frame_rate} Hz")
print(f"Sample width: {sound.sample_width} bytes")
print(f"Max dBFS: {sound.max_dBFS}")
# 3. Manipulate audio
# Export to a different format
sound.export("my_song.wav", format="wav")
# Increase volume by 6 dB
louder_sound = sound + 6
# Decrease volume by 3 dB
quieter_sound = sound - 3
# Fade in for 2 seconds, fade out for 3 seconds
faded_sound = sound.fade_in(2000).fade_out(3000)
# Overlay two sounds (e.g., add background music)
background_music = AudioSegment.from_mp3("background.mp3")
overlayed = sound.overlay(background_music, position=15000) # Start background at 15s
# 4. Cut audio
# Get the first 10 seconds
first_10_seconds = sound[:10000]
# Get the last 30 seconds
last_30_seconds = sound[-30000:]
# Remove silence (note: strip_silence drops silent stretches throughout the clip, not only at the ends)
trimmed_sound = sound.strip_silence()
# 5. Split audio into chunks
chunk_length_ms = 5000 # 5-second chunks
chunks = make_chunks(sound, chunk_length_ms)
# Save the chunks
for i, chunk in enumerate(chunks):
    chunk_name = f"chunk_{i}.mp3"
    print(f"exporting {chunk_name}")
    chunk.export(chunk_name, format="mp3")
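
The `+ 6` and `[:10000]` operations above rest on two small pieces of arithmetic: a gain in decibels corresponds to an amplitude ratio of 10^(dB/20), and PyDub slices are measured in milliseconds. The sketch below shows that arithmetic in plain Python. The helper names are hypothetical and are not part of PyDub's API, and the 44100 Hz frame rate is an assumed value.

```python
def db_to_amplitude_ratio(db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def ms_to_frame_count(ms, frame_rate):
    """Number of audio frames covered by a duration given in milliseconds."""
    return int(ms * frame_rate / 1000)


louder = db_to_amplitude_ratio(6)     # gain applied by `sound + 6`
quieter = db_to_amplitude_ratio(-3)   # gain applied by `sound - 3`
frames_in_10s = ms_to_frame_count(10_000, 44_100)  # frames in `sound[:10000]`

print(f"+6 dB multiplies amplitude by about {louder:.3f}")
print(f"-3 dB multiplies amplitude by about {quieter:.3f}")
print(f"10 000 ms at 44 100 Hz spans {frames_in_10s} frames")
```

This is why +6 dB is often described as roughly doubling the amplitude.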

For Real-time Audio I/O (PyAudio)

PyAudio provides Python bindings for the PortAudio library, allowing you to play and record audio streams. This is essential for building real-time applications like voice changers, audio effects processors, or speech recognition systems.

Installation:

pip install pyaudio

Note: On some systems (especially Windows), you might need to install it from a wheel file. If you have issues, search for pyaudio wheels for your specific Python version and OS.

Common Use Cases:

  • Recording audio from a microphone.
  • Playing audio files or generated sounds.
  • Building real-time audio effects (e.g., echo, reverb).
  • Streaming audio over a network.

Example Code: This simple script records audio from your microphone for 5 seconds and saves it.

import pyaudio
import wave
# --- Configuration ---
FORMAT = pyaudio.paInt16       # Format of sample
CHANNELS = 1                   # Number of channels
RATE = 44100                   # Sampling rate
CHUNK = 1024                   # Frames per buffer
RECORD_SECONDS = 5             # Duration of recording
OUTPUT_FILENAME = "output.wav" # Output file name
# --- Setup PyAudio ---
audio = pyaudio.PyAudio()
# --- Start Recording ---
print("Recording...")
stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
frames = []
for _ in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("Finished recording.")
# --- Stop and Cleanup ---
stream.stop_stream()
stream.close()
audio.terminate()
# --- Save the recording as a WAV file ---
wf = wave.open(OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(audio.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
print(f"Audio saved as {OUTPUT_FILENAME}")
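
The loop bound `int(RATE / CHUNK * RECORD_SECONDS)` rounds down to whole buffers, so the recording comes out slightly shorter than the requested 5 seconds. The sketch below reproduces that arithmetic and the WAV-writing step using only the standard library. Silent placeholder buffers stand in for microphone data (an assumption, so that it runs without audio hardware), and the file name `sketch_output.wav` is hypothetical.

```python
import os
import tempfile
import wave

RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
CHANNELS = 1
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit audio, matching paInt16

# Same loop bound as the recording script: only whole buffers are read.
num_chunks = int(RATE / CHUNK * RECORD_SECONDS)
total_frames = num_chunks * CHUNK
actual_seconds = total_frames / RATE

# Placeholder for stream.read(CHUNK): one buffer of 16-bit silence.
silent_chunk = b"\x00" * (CHUNK * CHANNELS * SAMPLE_WIDTH)
frames = [silent_chunk for _ in range(num_chunks)]

path = os.path.join(tempfile.gettempdir(), "sketch_output.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

# Read the header back to confirm what was written.
with wave.open(path, "rb") as wf:
    frames_on_disk = wf.getnframes()

print(f"Buffers read: {num_chunks}")
print(f"Frames written: {frames_on_disk}")
print(f"Actual duration: {actual_seconds:.3f} s")
```

If an exact duration matters, one option is to round the buffer count up and trim the surplus frames before saving.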

For Advanced Audio Synthesis and Effects (Pyo)

Pyo is a powerful module for creating digital signal processing (DSP) applications. It's more complex than PyDub but offers immense control for creating sounds, synthesizers, and custom audio effects from scratch.

Installation:

pip install pyo

Common Use Cases:

  • Synthesizing sounds (oscillators, noise).
  • Creating custom audio effects (filters, delays, granular synthesis).
  • Interactive audio installations.
  • Algorithmic composition.
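
The sketch below deliberately does not use Pyo's API. It is a plain-Python illustration, offered as a substitute, of two ideas from the list above: a sine oscillator and a simple feedback delay (echo). All function names and parameter values are illustrative assumptions.

```python
import math


def sine_oscillator(freq_hz, duration_s, sample_rate, amplitude=0.5):
    """Generate a sine tone as a list of float samples."""
    n = int(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


def feedback_delay(samples, delay_samples, feedback=0.5):
    """Echo effect: y[n] = x[n] + feedback * y[n - delay_samples]."""
    out = []
    for n, x in enumerate(samples):
        echoed = out[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + feedback * echoed)
    return out


sample_rate = 8000
# A short 440 Hz burst followed by one second of silence, so echoes stand out.
burst = sine_oscillator(440.0, 0.05, sample_rate)
dry = burst + [0.0] * sample_rate
wet = feedback_delay(dry, delay_samples=int(0.25 * sample_rate))

# An impulse makes the decaying echoes easy to read off.
impulse_response = feedback_delay([1.0, 0.0, 0.0, 0.0, 0.0], delay_samples=2)
print(impulse_response)
```

Each echo is the previous one scaled by `feedback`, so values below 1.0 decay and values at or above 1.0 never die out. A DSP library such as Pyo runs this kind of processing in real time rather than on Python lists.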

For Speech Recognition

This is a specialized but very common task. The SpeechRecognition library is a great wrapper around several powerful speech recognition APIs.

Installation:

pip install SpeechRecognition

Note: PyAudio is only required if you capture speech from a microphone (sr.Microphone). Transcribing an existing audio file, as in the example below, does not need it.

Example Code:

import speech_recognition as sr
# Create a recognizer instance
r = sr.Recognizer()
# Load an audio file
# Make sure you have a microphone.wav file
with sr.AudioFile("microphone.wav") as source:
    # Listen for the data (load the audio to memory)
    audio_data = r.record(source)
    try:
        # Recognize speech using Google Web Speech API
        text = r.recognize_google(audio_data)
        print("You said: " + text)
    except sr.UnknownValueError:
        print("Google Web Speech could not understand audio")
    except sr.RequestError as e:
        print(f"Could not request results from Google Web Speech service; {e}")

Summary and How to Choose

| Your Goal | Recommended Tool(s) |
|---|---|
| "I just want to cut/convert/merge this audio file." | PyDub (easiest and fastest for this) |
| "I need to analyze this music for its tempo, key, or genre." | Librosa (the standard for audio analysis) |
| "I'm building a real-time app that listens to the mic." | PyAudio (for low-level streaming) |
| "I want to build a synthesizer or a custom audio effect." | Pyo (for advanced DSP and synthesis) |
| "I need to transcribe speech from an audio file." | SpeechRecognition (a great wrapper for APIs) |
| "I need to play a sound in a GUI app (like Tkinter or Pygame)." | Pygame ships its own mixer (pygame.mixer); Tkinter has no audio playback of its own, so pair it with a playback library. |

For most users, starting with Librosa for analysis and PyDub for manipulation will cover the majority of everyday needs. Add PyAudio when you need real-time input or output.
