
How do I install and use SphinxTrain with Python?

Let's dive into using SphinxTrain with Python.


This is a crucial topic because SphinxTrain itself is a legacy C/C++ toolkit, but its modern successors and the entire ecosystem around speech recognition are dominated by Python. Understanding how they connect is key.

Here’s a comprehensive guide, broken down into:

  1. The Core Relationship: How Python and SphinxTrain interact.
  2. Direct Python API: The legacy, limited way to use it.
  3. The Modern Python Way: Using PocketSphinx and CMU Sphinx tools.
  4. A Practical Workflow: Training a model from Python.
  5. Key Takeaways.

The Core Relationship: Python as the "Glue"

It's a common misconception that you can train a Sphinx model directly from a Python script. You generally cannot. Here's why and how Python fits in:

  • SphinxTrain is C/C++: The core training algorithms in SphinxTrain are written in C and C++ for maximum performance. They are computationally intensive and not designed to be called as a Python library.
  • Python is the Orchestrator: Your role in Python is to prepare the data, run the C++ executables, and process the results. Python acts as the high-level "glue" that automates the entire pipeline.

Think of it like this: Python Script -> (Generates config files & data) -> SphinxTrain Executable -> (Trains the model) -> Python Script -> (Loads & uses the trained model)
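In practice, that "glue" is almost always Python's standard subprocess module wrapping the SphinxTrain command-line programs. Here is a minimal sketch of the pattern (sphinx_fe and bw are real tool names, but this sketch only checks that they are installed; it is not a working pipeline):

import shutil
import subprocess

def run_tool(name, *args):
    """Locate a SphinxTrain/SphinxBase command-line program and run it."""
    exe = shutil.which(name)
    if exe is None:
        print(f"{name} not found on PATH -- is SphinxTrain installed?")
        return
    # A real pipeline would pass the actual flags and use check=True
    subprocess.run([exe, *args])

# 1. Python prepares fileids/transcription/config files (see the workflow below)
# 2. Python launches the C programs that do the heavy lifting
run_tool("sphinx_fe")   # feature extraction
run_tool("bw")          # Baum-Welch re-estimation
# 3. Python loads the finished model with PocketSphinx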


The Direct (But Limited) Python API in SphinxTrain

SphinxTrain does have a Python module, sphinxtrain, but it's primarily for post-processing and analyzing the results of a training run, not for initiating the training itself.

You would use it like this:

import sphinxtrain as st
# Example: load a result file and inspect it (illustrative only -- the exact
# classes available depend on your SphinxTrain version)
# This is typically done AFTER training is complete from the command line.
try:
    # Load a result file generated during training
    results = st.ResultsReader('results/result.mlf')
    # Iterate through the results
    for utt_id, trans in results:
        print(f"Utterance ID: {utt_id}")
        print(f"Transcription: {trans}")
        print("-" * 20)
except FileNotFoundError:
    print("Error: Training results file not found. Run SphinxTrain first.")
except Exception as e:
    print(f"An error occurred: {e}")

Key takeaway: Don't expect to find a train_model() function in this module. Its purpose is different.


The Modern Python Way: PocketSphinx and CMU Sphinx Tools

For most new projects, you won't use SphinxTrain directly. Instead, you'll use its modern, Python-friendly descendants.


A) PocketSphinx (for Recognition)

This is the de facto Python library for running Sphinx recognition. It's fast, easy to install, and perfect for applications.

Installation:

pip install pocketsphinx
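A quick sanity check after installing is to import the package and print where its bundled US-English model lives (get_model_path is part of the pocketsphinx package, though the exact directory layout varies between versions):

from pocketsphinx import get_model_path
# Directory containing the default acoustic model, language model, and dictionary
print(get_model_path())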

Simple Usage:

# The speech_recognition package also wraps PocketSphinx, but for more
# control, use PocketSphinx directly:
import os
from pocketsphinx import LiveSpeech, get_model_path
model_path = get_model_path()
# Create a live speech recognition object using the bundled US-English model
# (these paths match typical pocketsphinx installs; adjust if your layout differs)
speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dict=os.path.join(model_path, 'cmudict-en-us.dict')
)
print("Listening...")
for phrase in speech:
    print(phrase)
    if "exit" in str(phrase):
        break

B) CMU Sphinx Training Tools (Modern Alternative)

The CMU Sphinx project has developed more user-friendly tooling around training. Recent SphinxTrain releases ship a sphinxtrain command (itself implemented in Python) that sets up a project directory and drives the core C binaries for you; a minimal sketch of that flow follows the list below.

A popular modern workflow involves using:

  • Python Scripts for data preparation (creating fileids, transcription files).
  • sphinxtrain binaries for the actual feature extraction (sphinx_fe) and acoustic model training (bw and related programs).
  • Python again to package the final model for use with PocketSphinx.
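If you have a recent SphinxTrain installed, the simplest orchestration is to call its own sphinxtrain command from Python. The setup/run flow below follows the CMU Sphinx acoustic-model training tutorial; the project name my_project is just an example:

import os
import subprocess

PROJECT = "my_project"  # example name; the official tutorial uses "an4"
os.makedirs(PROJECT, exist_ok=True)

# Create the template layout (etc/ config files, wav/ directory) inside the project
subprocess.run(["sphinxtrain", "-t", PROJECT, "setup"], cwd=PROJECT, check=True)

# After filling in etc/<project>.fileids, etc/<project>.transcription and the
# generated .cfg file, run the whole pipeline (feature extraction, bw, ...) in one go
subprocess.run(["sphinxtrain", "run"], cwd=PROJECT, check=True)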

A Practical Workflow: Training a Model with Python as the Orchestrator

Let's walk through a simplified, conceptual workflow. Imagine you have a folder of audio files (my_wavs/) and their corresponding transcriptions (my_transcripts/).

Step 1: Data Preparation (Python Script)

You write a Python script (prepare_data.py) to create the files sphinxtrain needs.

# prepare_data.py
import os
import glob
# --- Configuration ---
AUDIO_DIR = "my_wavs/"
TRANSCRIPT_DIR = "my_transcripts/"
OUTPUT_DIR = "sphinx_data/"
# SphinxTrain expects a control file (one utterance ID per line, no extension)
# and a transcription file ("<s> TEXT </s> (utterance_id)" per line).
FILEIDS_FILE = os.path.join(OUTPUT_DIR, "fileids.scp")
TRANSCRIPT_FILE = os.path.join(OUTPUT_DIR, "transcripts.scp")
# --- Create necessary directories ---
os.makedirs(OUTPUT_DIR, exist_ok=True)
wav_files = sorted(glob.glob(os.path.join(AUDIO_DIR, "*.wav")))
# --- Write fileids.scp (control file: one utterance ID per line) ---
with open(FILEIDS_FILE, 'w') as f:
    for wav_file in wav_files:
        # Utterance ID = filename without extension
        file_id = os.path.splitext(os.path.basename(wav_file))[0]
        f.write(f"{file_id}\n")
# --- Write transcripts.scp (transcription file) ---
with open(TRANSCRIPT_FILE, 'w') as f:
    for wav_file in wav_files:
        file_id = os.path.splitext(os.path.basename(wav_file))[0]
        # Assuming a corresponding .txt file exists for each .wav
        transcript_file = os.path.join(TRANSCRIPT_DIR, f"{file_id}.txt")
        with open(transcript_file, 'r') as t:
            transcript = t.read().strip().upper()
        f.write(f"<s> {transcript} </s> ({file_id})\n")
print(f"Data preparation complete. Files saved in {OUTPUT_DIR}")

Step 2: Run SphinxTrain from Python (using subprocess)

Now, you write another script (run_training.py) that calls the SphinxTrain command-line tools. This is where you orchestrate the process.

# run_training.py
import subprocess
import os
# --- Configuration ---
SPHINXTRAIN_PATH = "/path/to/your/sphinxtrain/installation"  # adjust the subpaths below to your install layout
WORK_DIR = "sphinx_data/"
AUDIO_DIR = "my_wavs/"
MFC_DIR = os.path.join(WORK_DIR, "mfc")
os.makedirs(MFC_DIR, exist_ok=True)
print("Step 1: Feature Extraction (sphinx_fe)...")
subprocess.run([
    os.path.join(SPHINXTRAIN_PATH, "bin/sphinx_fe"),
    "-c", os.path.join(WORK_DIR, "fileids.scp"),  # control file listing utterance IDs
    "-di", AUDIO_DIR,                             # input directory containing the .wav files
    "-do", MFC_DIR,                               # output directory for the .mfc feature files
    "-ei", "wav",
    "-eo", "mfc"
], check=True)
print("Step 2: Building the Language Model...")
# LM building is normally done with cmuclmtk or the online lmtool; the script
# name below is a placeholder for whichever LM tool you use.
subprocess.run([
    os.path.join(SPHINXTRAIN_PATH, "scripts/mk_sphinx_lm.pl"),
    "-train", os.path.join(WORK_DIR, "transcripts.scp"),
    "-dir", WORK_DIR,
    "-name", "my_lm"
], check=True)
print("Step 3: Training the Acoustic Model (bw, etc.)...")
# This is a simplified command. Real training involves many steps (CI training,
# triphone training, etc.) using programs such as:
# mk_mdef_gen
# bw
# norm
# ... and many more
subprocess.run([
    os.path.join(SPHINXTRAIN_PATH, "programs/bw"),
    "-hmmdir", "my_model",
    "-moddeffn", "my_model/defs",
    "-ts2cbfn", ".ptm.",                    # tying type: phonetically tied mixtures
    "-feat", "1s_c_d_dd",                   # feature type (13 cepstra + deltas + delta-deltas)
    "-svspec", "0-12/13-25/26-38",          # subvector specification matching the 39-dim features
    "-cmn", "current",                      # cepstral mean normalization
    "-agc", "no",                           # automatic gain control
    "-dictfn", "my_model/cmu07a.dic",       # pronunciation dictionary
    "-ctlfn", "sphinx_data/fileids.scp",    # control file
    "-lsnfn", "sphinx_data/transcripts.scp",  # transcription file
    "-cepdir", "sphinx_data/mfc",           # feature directory
    "-accumdir", "my_model/accumdir"        # accumulator directory
], check=True)
print("Training complete! The model is in 'my_model'.")

Step 3: Use the Model (PocketSphinx)

After the run_training.py script finishes, you will have a new model directory (my_model/). You can point your PocketSphinx application to it.

# recognize_with_my_model.py
import os
from pocketsphinx import Decoder
# Point to your newly trained model
MODEL_PATH = "./my_model"  # directory containing mdef, means, variances, etc.
DICTIONARY_PATH = os.path.join(MODEL_PATH, "cmu07a.dic")
LANGUAGE_MODEL_PATH = os.path.join(MODEL_PATH, "my.lm")  # you'd need to build this too
config = Decoder.default_config()
config.set_string('-hmm', MODEL_PATH)
config.set_string('-dict', DICTIONARY_PATH)
config.set_string('-lm', LANGUAGE_MODEL_PATH)
config.set_string('-logfn', 'logfile.txt')  # log to a file instead of the console
decoder = Decoder(config)
# Assume you have a new audio file to recognize (16 kHz, 16-bit, mono)
with open("new_audio.wav", "rb") as f:
    decoder.start_utt()
    decoder.process_raw(f.read(), False, True)
    decoder.end_utt()
hyp = decoder.hyp()
print("Hypothesis:", hyp.hypstr if hyp else "(no result)")

Key Takeaways

  1. No Direct Python Training API: SphinxTrain is a C++ toolkit. You don't call its training functions from Python directly.
  2. Python is the Orchestrator: Use Python to automate the workflow: prepare data, run command-line sphinxtrain tools (like sphinx_fe, bw), and process results.
  3. Use subprocess: The standard Python library for running external executables from a script is subprocess. This is your primary tool for controlling SphinxTrain.
  4. For Most Applications, Use PocketSphinx: If you just need to do speech recognition, install and use pocketsphinx. It's the modern, Pythonic way to consume a model.
  5. Consider Modern Alternatives: Training your own ASR model is complex. For simpler projects, consider cloud services (Google Speech-to-Text, AWS Transcribe). For more control, explore end-to-end deep learning toolkits like Vosk (which has a Python API; a minimal sketch follows below) or NVIDIA NeMo.
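That Vosk sketch, assuming the standard Vosk Python API (Model, KaldiRecognizer) and a model directory you download and unzip separately from the Vosk site:

import json
import wave
from vosk import Model, KaldiRecognizer  # pip install vosk

model = Model("path/to/vosk-model-small-en-us")  # placeholder model directory
with wave.open("new_audio.wav", "rb") as wf:
    rec = KaldiRecognizer(model, wf.getframerate())
    while True:
        data = wf.readframes(4000)
        if not data:
            break
        rec.AcceptWaveform(data)  # feed audio in chunks
    print(json.loads(rec.FinalResult())["text"])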