Python如何实现Levenshtein距离算法？-杰瑞科技汇

Of course! The Levenshtein distance is a measure of the difference between two sequences. It's the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.

（图片来源网络，侵删）

For example:

The Levenshtein distance between "kitten" and "sitting" is 3.
- k -> s (substitution)
- e -> i (substitution)
- Insert g at the end

Here's a complete guide on how to use and implement the Levenshtein distance in Python.

The Easy Way: Using the `python-Levenshtein` Library

For most applications, the best approach is to use a highly optimized, third-party library. The most common one is python-Levenshtein.

Installation

First, you need to install it. It's a C-optimized library, so it's very fast.

（图片来源网络，侵删）

pip install python-Levenshtein

Usage

The library provides a simple function distance().

import Levenshtein
# --- Basic Usage ---
str1 = "kitten"
str2 = "sitting"
distance = Levenshtein.distance(str1, str2)
print(f"The Levenshtein distance between '{str1}' and '{str2}' is: {distance}")
# Output: The Levenshtein distance between 'kitten' and 'sitting' is: 3
# --- Other Examples ---
print(Levenshtein.distance("book", "back"))  # 2 (substitute 'o' for 'a', 'o' for 'c')
print(Levenshtein.distance("Saturday", "Sunday")) # 3 (delete 'a', 't', 'u')
print(Levenshtein.distance("", "abc")) # 3 (insert 'a', 'b', 'c')
print(Levenshtein.distance("abc", "")) # 3 (delete 'a', 'b', 'c')
print(Levenshtein.distance("abc", "abc")) # 0 (no changes needed)

This library is also very fast, making it suitable for processing large lists of strings.

The Manual Way: Implementing the Algorithm (Dynamic Programming)

If you can't use external libraries or want to understand the underlying algorithm, you can implement it yourself using dynamic programming. The core idea is to build a matrix that stores the distances between all prefixes of the two strings.

Let's use "kitten" and "sitting" as an example.

（图片来源网络，侵删）

The Algorithm

Create a matrix with (len(s1) + 1) rows and (len(s2) + 1) columns.
Initialize the first row and column with their index values (0, 1, 2, ...). This represents the cost of converting an empty string to the target prefix.
Iterate through the matrix. For each cell (i, j):
- If the characters s1[i-1] and s2[j-1] are the same, the cost is 0. Otherwise, it's 1.
- The value of the cell is the minimum of three possible operations:
  - Deletion: matrix[i-1][j] + 1
  - Insertion: matrix[i][j-1] + 1
  - Substitution: matrix[i-1][j-1] + cost
The final Levenshtein distance is the value in the bottom-right cell of the matrix.

Python Implementation

Here is a clean and well-commented implementation.

def levenshtein_distance(s1, s2):
    """
    Calculates the Levenshtein distance between two strings.
    """
    # Get the lengths of the strings
    len_s1 = len(s1)
    len_s2 = len(s2)
    # Create a matrix to store distances
    # The matrix has len_s1 + 1 rows and len_s2 + 1 columns
    dp = [[0] * (len_s2 + 1) for _ in range(len_s1 + 1)]
    # Initialize the first row and column
    # The cost of converting an empty string to s2[0..j] is j insertions
    for i in range(len_s1 + 1):
        dp[i][0] = i
    # The cost of converting s1[0..i] to an empty string is i deletions
    for j in range(len_s2 + 1):
        dp[0][j] = j
    # Fill the rest of the matrix
    for i in range(1, len_s1 + 1):
        for j in range(1, len_s2 + 1):
            # Cost is 0 if characters are the same, 1 otherwise
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            # Find the minimum of deletion, insertion, and substitution
            dp[i][j] = min(
                dp[i - 1][j] + 1,      # Deletion
                dp[i][j - 1] + 1,      # Insertion
                dp[i - 1][j - 1] + cost # Substitution
            )
    # The final distance is in the bottom-right cell
    return dp[len_s1][len_s2]
# --- Usage ---
str1 = "kitten"
str2 = "sitting"
distance = levenshtein_distance(str1, str2)
print(f"The Levenshtein distance between '{str1}' and '{str2}' is: {distance}")
# Output: The Levenshtein distance between 'kitten' and 'sitting' is: 3

Practical Application: Fuzzy String Matching

The most common use case for Levenshtein distance is fuzzy string matching, where you want to find strings in a list that are "close" to a target string.

Here's how you can use the python-Levenshtein library to find the best match.

import Levenshtein
# A list of possible correct spellings or options
correct_options = [
    "apple", "banana", "cherry", "date", "elderberry",
    "fig", "grape", "honeydew", "kiwi", "lemon"
]
# A user input with a typo
user_input = "aple" # Missing 'p'
# Another example
user_input_2 = "bannanna" # Extra 'n'
# Another example
user_input_3 = "stawbery" # Wrong word entirely
def find_best_match(input_str, options):
    """
    Finds the best match for an input string from a list of options.
    """
    if not input_str or not options:
        return None
    # Calculate the distance for each option
    # The key=lambda item: item[1] sorts by the distance value
    # min() will find the tuple with the smallest second element (the distance)
    best_match = min(
        [(option, Levenshtein.distance(input_str, option)) for option in options],
        key=lambda item: item[1]
    )
    return best_match
# --- Find matches ---
match1 = find_best_match(user_input, correct_options)
match2 = find_best_match(user_input_2, correct_options)
match3 = find_best_match(user_input_3, correct_options)
print(f"Input: '{user_input}', Best Match: '{match1[0]}' (Distance: {match1[1]})")
print(f"Input: '{user_input_2}', Best Match: '{match2[0]}' (Distance: {match2[1]})")
print(f"Input: '{user_input_3}', Best Match: '{match3[0]}' (Distance: {match3[1]})")
# --- More advanced: Find all matches within a certain threshold ---
def find_all_matches_within_threshold(input_str, options, threshold):
    """
    Finds all options that are within a certain Levenshtein distance.
    """
    matches = []
    for option in options:
        distance = Levenshtein.distance(input_str, option)
        if distance <= threshold:
            matches.append((option, distance))
    # Sort by distance for better results
    matches.sort(key=lambda item: item[1])
    return matches
# Find all matches for "aple" with a distance of 1 or less
close_matches = find_all_matches_within_threshold(user_input, correct_options, 1)
print(f"\nClose matches for '{user_input}' (threshold <= 1):")
for match, dist in close_matches:
    print(f"- {match} (Distance: {dist})")

Summary: Which one should you use?

Method	Pros	Cons	Best for...
`python-Levenshtein`	Extremely fast, easy to use, standard industry choice.	Requires an external installation (`pip install`).	Almost all production use cases. Performance-critical applications.
Manual Implementation	No external dependencies, great for learning the algorithm.	Slower for large strings, more code to write and maintain.	Academic purposes, coding interviews, or environments where you cannot install packages.

Python如何实现Levenshtein距离算法？

The Easy Way: Using the `python-Levenshtein` Library

Installation

Usage

The Manual Way: Implementing the Algorithm (Dynamic Programming)

The Algorithm

Python Implementation

Practical Application: Fuzzy String Matching

Summary: Which one should you use?

99ANYc3cd6

宝马X3如何连接CarPlay？

java cas client

如何判断Linux中Python进程是否存在？

Java HashMap底层原理如何实现？

Java CKEditor如何配置？

Java Socket编程实例具体怎么实现？

Java序列化用Hessian有哪些优势？

Java如何高效解析SOAP XML？

Java中如何实现HMACSHA256？

Python json indent如何控制缩进？

Python字典如何转为列表？

三星J5108换外屏步骤有哪些？

创维50E780U刷机教程安全吗？

红米note2外屏自己换难不难？

java socket 线程池

Java内部类为何要用static？

Python如何实现Levenshtein距离算法？

The Easy Way: Using the python-Levenshtein Library

Installation

Usage

The Manual Way: Implementing the Algorithm (Dynamic Programming)

The Algorithm

Python Implementation

Practical Application: Fuzzy String Matching

Summary: Which one should you use?

相关推荐

Java Socket编程实例具体怎么实现？

The Easy Way: Using the `python-Levenshtein` Library