huffman code python-杰瑞科技汇

Of course! Here is a complete guide to implementing Huffman coding in Python, including detailed explanations of the concepts, the code, and a full example.

（图片来源网络，侵删）

What is Huffman Coding?

Huffman coding is a popular lossless data compression algorithm. Its core idea is to assign variable-length codes to input characters. The key principle is:

More frequent characters get shorter codes, and less frequent characters get longer codes.

This is based on the idea that the total number of bits used to represent a message will be minimized if the most common symbols use the fewest bits.

To achieve this, Huffman coding uses a binary tree, often called a Huffman Tree.

（图片来源网络，侵删）

How It Works: The Algorithm

The process has two main phases: Encoding and Decoding.

Phase 1: Building the Huffman Tree (The Encoding Process)

Count Frequencies: First, count the frequency of each character in the input data.
Create Leaf Nodes: Create a node (or a leaf) for each character, associating it with its frequency.
Build the Tree: a. Create a priority queue (a min-heap is perfect for this) and add all the leaf nodes to it. The priority will be the node's frequency. b. While there is more than one node in the queue: i. Remove the two nodes with the lowest frequency from the queue. ii. Create a new internal node with these two nodes as its children. The frequency of this new node is the sum of its children's frequencies. iii. Add the new internal node back into the priority queue. c. The last remaining node in the queue is the root of the Huffman Tree.
Generate Codes: Traverse the Huffman Tree from the root to every leaf node.
- When you go left, append a 0 to the code.
- When you go right, append a 1 to the code.
- The code for a character is the bit string you've built when you reach its leaf node.

Phase 2: Decoding

Decoding is straightforward if you have the Huffman Tree and the encoded bit string.

Start at the root of the Huffman Tree.
Iterate through the encoded bit string, one bit at a time.
- If the bit is 0, move to the left child.
- If the bit is 1, move to the right child.
When you reach a leaf node, you've found a character. Append this character to your output.
Return to the root and repeat the process with the next bit in the string until you've processed the entire bit string.

Python Implementation

We'll use Python's built-in heapq module, which provides an efficient implementation of a min-heap.

Data Structures

We need a way to store nodes in our priority queue. We'll create a simple Node class to represent characters and their frequencies.

import heapq
class Node:
    """A node in the Huffman Tree."""
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None
    # For comparison purposes in the heap
    def __lt__(self, other):
        return self.freq < other.freq

The __lt__ (less than) method is essential. heapq needs to know how to compare two Node objects to maintain the heap order. We'll compare them based on their frequency.

The Huffman Coding Class

Let's encapsulate all the logic in a clean, reusable class.

import heapq
class HuffmanCoding:
    def __init__(self):
        # A dictionary to store the Huffman codes for each character
        self.heap = []
        self.codes = {}
        self.reverse_mapping = {}
    # Phase 1: Building the Huffman Tree
    def _make_frequency_dict(self, text):
        """Counts the frequency of each character in the text."""
        frequency = {}
        for char in text:
            if char not in frequency:
                frequency[char] = 0
            frequency[char] += 1
        return frequency
    def _make_heap(self, frequency):
        """Creates a min-heap from the frequency dictionary."""
        for key in frequency:
            node = Node(key, frequency[key])
            heapq.heappush(self.heap, node)
    def _merge_nodes(self):
        """Builds the Huffman Tree by merging nodes from the heap."""
        while len(self.heap) > 1:
            node1 = heapq.heappop(self.heap)
            node2 = heapq.heappop(self.heap)
            merged = Node(None, node1.freq + node2.freq)
            merged.left = node1
            merged.right = node2
            heapq.heappush(self.heap, merged)
    def _make_codes_helper(self, root, current_code):
        """A recursive helper to generate codes from the Huffman Tree."""
        if root is None:
            return
        # If it's a leaf node, we've found a character
        if root.char is not None:
            self.codes[root.char] = current_code
            self.reverse_mapping[current_code] = root.char
            return
        # Traverse left and right, appending 0 or 1 to the code
        self._make_codes_helper(root.left, current_code + "0")
        self._make_codes_helper(root.right, current_code + "1")
    def _make_codes(self):
        """Generates the Huffman codes from the root of the tree."""
        root = heapq.heappop(self.heap)
        current_code = ""
        self._make_codes_helper(root, current_code)
    # Phase 2: Encoding and Decoding
    def get_encoded_text(self, text):
        """Encodes the input text using the generated Huffman codes."""
        encoded_text = ""
        for char in text:
            encoded_text += self.codes[char]
        return encoded_text
    def get_padded_encoded_text(self, encoded_text):
        """Pads the encoded text to make its length a multiple of 8."""
        padding_amount = 8 - (len(encoded_text) % 8)
        if padding_amount == 8:
            padding_amount = 0
        # The last byte stores the amount of padding
        padded_info = "{0:08b}".format(padding_amount)
        padded_encoded_text = padded_info + encoded_text + ("0" * padding_amount)
        return padded_encoded_text
    def get_byte_array(self, padded_encoded_text):
        """Converts the padded binary string into a byte array."""
        b = bytearray()
        for i in range(0, len(padded_encoded_text), 8):
            byte = padded_encoded_text[i:i+8]
            b.append(int(byte, 2))
        return b
    def compress(self, text):
        """Main function to compress the text."""
        frequency = self._make_frequency_dict(text)
        self._make_heap(frequency)
        self._merge_nodes()
        self._make_codes()
        encoded_text = self.get_encoded_text(text)
        padded_encoded_text = self.get_padded_encoded_text(encoded_text)
        byte_array = self.get_byte_array(padded_encoded_text)
        return byte_array
    def remove_padding(self, padded_encoded_text):
        """Removes the padding added during compression."""
        padding_info = padded_encoded_text[:8]
        padding_amount = int(padding_info, 2)
        # Remove the padding info and the padding itself
        encoded_text = padded_encoded_text[8:-padding_amount]
        return encoded_text
    def decode_text(self, encoded_text):
        """Decodes the encoded text using the reverse mapping."""
        current_code = ""
        decoded_text = ""
        for bit in encoded_text:
            current_code += bit
            if current_code in self.reverse_mapping:
                char = self.reverse_mapping[current_code]
                decoded_text += char
                current_code = ""
        return decoded_text
    def decompress(self, byte_array):
        """Main function to decompress the byte array."""
        # Convert byte array back to a binary string
        bit_string = ""
        for byte in byte_array:
            bits = bin(byte)[2:].rjust(8, '0')
            bit_string += bits
        # Remove padding and decode
        encoded_text = self.remove_padding(bit_string)
        decompressed_text = self.decode_text(encoded_text)
        return decompressed_text

Putting It All Together: A Full Example

Here's how you would use the HuffmanCoding class.

# --- Main execution ---
if __name__ == '__main__':
    # The text to be compressed
    text = "Huffman coding is a data compression algorithm."
    print("Original Text:")
    print(text)
    print("-" * 30)
    # --- Compression ---
    huffman = HuffmanCoding()
    compressed_data = huffman.compress(text)
    print("Compressed Data (as a byte array):")
    print(compressed_data)
    print("-" * 30)
    print("Huffman Codes:")
    for char, code in huffman.codes.items():
        print(f"'{char}': {code}")
    print("-" * 30)
    # --- Decompression ---
    # To decompress, we need the same HuffmanCoding instance
    # because it holds the reverse_mapping dictionary.
    # In a real application, you would need to save this mapping
    # along with the compressed data.
    decompressed_text = huffman.decompress(compressed_data)
    print("Decompressed Text:")
    print(decompressed_text)
    print("-" * 30)
    # Verification
    if text == decompressed_text:
        print("Success! The original and decompressed texts match.")
    else:
        print("Error! The texts do not match.")

Important Considerations for a Real-World Application

The example above works perfectly, but to use Huffman coding for file compression, you need to handle one more thing: saving the Huffman tree (or codes) to the compressed file. Without it, you cannot decompress the data.

The common approach is to serialize the reverse_mapping dictionary (or the tree structure itself) and prepend it to the compressed byte array.

Here's a conceptual idea of how you might save and load the codes:

import pickle # For serializing Python objects
def save_to_file(filename, byte_array, reverse_mapping):
    """Saves the compressed data and the Huffman codes to a file."""
    # Serialize the codes dictionary
    serialized_codes = pickle.dumps(reverse_mapping)
    # Write the length of the codes data first, then the codes, then the data
    with open(filename, 'wb') as f:
        # Write the length of the serialized codes
        f.write(len(serialized_codes).to_bytes(4, 'big'))
        # Write the serialized codes
        f.write(serialized_codes)
        # Write the compressed data
        f.write(byte_array)
def load_from_file(filename):
    """Loads the compressed data and Huffman codes from a file."""
    with open(filename, 'rb') as f:
        # Read the length of the serialized codes
        codes_length = int.from_bytes(f.read(4), 'big')
        # Read the serialized codes
        serialized_codes = f.read(codes_length)
        # Deserialize the codes
        reverse_mapping = pickle.loads(serialized_codes)
        # Read the rest of the file as the compressed data
        byte_array = f.read()
    return byte_array, reverse_mapping
# You would then integrate this into your main program:
# 1. Compress text -> get byte_array and reverse_mapping
# 2. save_to_file('file.huf', byte_array, reverse_mapping)
#
# 3. To decompress:
#    byte_array, reverse_mapping = load_from_file('file.huf')
#    huffman = HuffmanCoding()
#    huffman.reverse_mapping = reverse_mapping # Load the codes
#    decompressed_text = huffman.decompress(byte_array)

huffman code python

What is Huffman Coding?

How It Works: The Algorithm

Phase 1: Building the Huffman Tree (The Encoding Process)

Phase 2: Decoding

Python Implementation

Data Structures

The Huffman Coding Class

Putting It All Together: A Full Example

Important Considerations for a Real-World Application

99ANYc3cd6

python rabbitmq 广播

java access 2025

如何快速用PS做出专业证件照？

Python pymssql连接为何总报GBK错误？

android java xml

ubuntu安装教程16.04

Win10系统U盘安装，步骤详细吗？

Adobe Director教程如何入门？

Java DOM如何高效读取XML文件？

CorelDRAW X6教程哪里下载靠谱？

Eclipse for Java 1.7 还能用吗？

Python HTMLParser如何实现网页下载？

如何重装Win7系统？

Python编码声明 -- coding: utf-8 --有何用？

Python staticmethod 与普通方法有何区别？

PowerPoint 2007教程从哪学？新手入门指南？

huffman code python

What is Huffman Coding?

How It Works: The Algorithm

Phase 1: Building the Huffman Tree (The Encoding Process)

Phase 2: Decoding

Python Implementation

Data Structures

The Huffman Coding Class

Putting It All Together: A Full Example

Important Considerations for a Real-World Application

相关推荐

ubuntu安装教程16.04