杰瑞科技汇

python mysql encode

Encoding is crucial when Python and MySQL exchange text, especially non-English characters such as Chinese or Arabic, and emojis. A mismatch in encoding between your Python application, the MySQL connection, and the database itself is the most common cause of garbled text (often called "mojibake").


Here’s a comprehensive guide covering the key concepts, best practices, and code examples.


The Core Problem: Character Sets and Collations

In MySQL, text is stored using a character set (e.g., utf8, utf8mb4, latin1) and a collation (e.g., utf8mb4_unicode_ci, latin1_swedish_ci).

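  • Character Set: Defines which characters can be stored and how each one is encoded as bytes (for example, é takes 1 byte in latin1 but 2 bytes in utf8mb4).
  • Collation: Defines how characters within a character set are compared and sorted (for example, whether comparisons are case-sensitive).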
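A quick way to see the difference is with Python's own codecs, which stand in here for MySQL character sets. This is an illustrative analogy, not MySQL's implementation. The same string produces different bytes under different encodings, while a case-insensitive comparison roughly mimics what a `_ci` collation does. MySQL's `utf8mb4_unicode_ci` is also accent-insensitive, and `str.casefold()` does not model that.

```python
# Illustrative analogy: Python codecs stand in for MySQL character sets.
text = "café"

# Character set: decides how each character is turned into bytes.
as_utf8 = text.encode("utf-8")      # 'é' -> 2 bytes (0xC3 0xA9)
as_latin1 = text.encode("latin-1")  # 'é' -> 1 byte (0xE9)
print(as_utf8, len(as_utf8))
print(as_latin1, len(as_latin1))


def ci_equal(a: str, b: str) -> bool:
    """Rough stand-in for a case-insensitive ('_ci') collation:
    compare case-folded strings instead of raw code points."""
    return a.casefold() == b.casefold()


# Collation: decides whether two strings count as equal.
print("CAFÉ" == "café")          # binary-style comparison -> False
print(ci_equal("CAFÉ", "café"))  # case-insensitive comparison -> True
```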

For modern applications, you almost always want to use utf8mb4.

Why utf8mb4 and not utf8?

This is a critical point. MySQL's original utf8 character set (an alias for utf8mb3) supports at most 3 bytes per character. That covers the Basic Multilingual Plane, which includes most common scripts. It cannot store characters outside that plane, notably emojis (like 😊) and some rare CJK (Chinese, Japanese, Korean) characters.

  • utf8 (utf8mb3): Max 3 bytes per character. Cannot store emojis.
  • utf8mb4: Max 4 bytes per character. This is MySQL's complete implementation of UTF-8 and a superset of utf8mb3. It can store every character in the Unicode standard, including emojis. Use utf8mb4 for new projects; it is the default character set in MySQL 8.0.
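The byte counts are easy to check from Python. The sketch below encodes a few sample characters as UTF-8 and reports whether each one would fit in a 3-byte `utf8` (utf8mb3) column. `needs_utf8mb4` is a hypothetical helper name. It relies on the fact that only code points above U+FFFF need 4 bytes in UTF-8.

```python
samples = ["A", "é", "你", "😊"]

for ch in samples:
    n = len(ch.encode("utf-8"))  # bytes this character needs in UTF-8
    target = "utf8 (utf8mb3) or utf8mb4" if n <= 3 else "utf8mb4 only"
    print(f"U+{ord(ch):04X} {ch}: {n} byte(s) -> {target}")


def needs_utf8mb4(text: str) -> bool:
    """Hypothetical helper: True if any character is outside the Basic
    Multilingual Plane, i.e. needs 4 UTF-8 bytes and cannot go into a
    utf8/utf8mb3 column."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_utf8mb4("你好"))   # False
print(needs_utf8mb4("Hi 😊"))  # True
```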

The Golden Rule: Consistency is Key

To avoid encoding issues, you must ensure the character set is the same in three places:

  1. The Database/Table/Column: The character set defined in your MySQL schema.
  2. The MySQL Connection: The character set used by your Python script to communicate with the MySQL server.
  3. Your Python File: The encoding of your Python source code file (.py).

If any of these three is out of sync, you will get garbled text or errors when inserting data.
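To see what "out of sync" means in practice, the sketch below encodes Chinese text as UTF-8 and then decodes the bytes as Latin-1, as a mismatched connection or client would. Python's `latin-1` codec is only a stand-in here. MySQL's `latin1` is closer to Windows-1252, so real mojibake can differ in a few characters.

```python
original = "你好"
utf8_bytes = original.encode("utf-8")  # the bytes a UTF-8 layer produces

# One layer wrongly assumes Latin-1: every byte becomes its own character.
garbled = utf8_bytes.decode("latin-1")
print(repr(garbled))  # six odd characters instead of two Chinese ones

# Both layers agree on UTF-8: the text survives.
correct = utf8_bytes.decode("utf-8")
print(correct)

# If exactly one wrong decode happened, it can be undone.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

The repair trick only works when you know that exactly one wrong decoding happened. Fixing the configuration so that every layer agrees is the real solution.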


Step-by-Step Guide to Proper Encoding

Let's walk through setting up a robust connection.

Step 1: Set the MySQL Character Set

First, ensure your database, tables, and text columns (VARCHAR, TEXT, etc.) are created with the utf8mb4 character set and a suitable collation like utf8mb4_unicode_ci.


You can do this in your schema file or when creating the database:

-- Create the database with utf8mb4
CREATE DATABASE my_app_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-- Use the database
USE my_app_db;
-- Create a table with utf8mb4 columns
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
    bio TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
);
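For an existing database, it helps to find text columns that are not yet utf8mb4 before converting anything. The sketch below assumes rows shaped like the result of the `information_schema.COLUMNS` query in `AUDIT_QUERY`, fetched with `cursor.execute(AUDIT_QUERY, ("my_app_db",))` followed by `cursor.fetchall()`. The helper names are hypothetical. Because a conversion rewrites the table, review the generated `ALTER TABLE ... CONVERT TO CHARACTER SET` statements and test them on a backup before running them.

```python
# Query whose rows the helpers below expect: (table, column, character set).
AUDIT_QUERY = (
    "SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME "
    "FROM information_schema.COLUMNS "
    "WHERE TABLE_SCHEMA = %s AND CHARACTER_SET_NAME IS NOT NULL"
)


def find_non_utf8mb4(rows):
    """Hypothetical helper: text columns whose character set is not utf8mb4."""
    return [(table, column, cs) for table, column, cs in rows if cs != "utf8mb4"]


def conversion_statements(rows, collation="utf8mb4_unicode_ci"):
    """Hypothetical helper: one ALTER TABLE ... CONVERT TO per affected table."""
    tables = sorted({table for table, _, _ in find_non_utf8mb4(rows)})
    return [
        f"ALTER TABLE `{t}` CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        for t in tables
    ]


# Sample rows standing in for cursor.fetchall() output.
sample_rows = [
    ("users", "username", "utf8mb4"),
    ("users", "bio", "utf8mb3"),
    ("posts", "title", "latin1"),
]
for stmt in conversion_statements(sample_rows):
    print(stmt)
```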

Step 2: Use a Modern MySQL Connector

This guide uses mysql-connector-python, Oracle's official MySQL driver for Python. It is actively maintained and lets you set connection parameters such as the character set directly. Other drivers, such as PyMySQL and mysqlclient, also accept a charset argument.

First, install it:

pip install mysql-connector-python

Step 3: Configure the Connection with charset

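When you establish a connection, tell the connector explicitly to use utf8mb4. Your connector version may already default to it, but setting it explicitly protects you against older versions and differing server configurations. This is the most important step for preventing issues at the connection level.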

import mysql.connector
from mysql.connector import Error

connection = None
cursor = None
try:
    # The 'charset' parameter is the key here!
    connection = mysql.connector.connect(
        host='localhost',
        database='my_app_db',
        user='your_username',
        password='your_password',
        charset='utf8mb4'  # <-- IMPORTANT: Use utf8mb4 for full Unicode support
    )
    if connection.is_connected():
        db_info = connection.get_server_info()
        print(f"Successfully connected to MySQL Server version {db_info}")
        cursor = connection.cursor()
        cursor.execute("SELECT database();")
        record = cursor.fetchone()
        print(f"You're connected to database: {record[0]}")
except Error as e:
    print(f"Error while connecting to MySQL: {e}")
finally:
    # Close the cursor and connection only if they were actually opened
    if cursor is not None:
        cursor.close()
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection is closed.")
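To confirm that the `charset` setting actually took effect, you can inspect the session variables. The sketch assumes the `(Variable_name, Value)` rows returned by `cursor.execute("SHOW VARIABLES LIKE 'character_set%'")` followed by `cursor.fetchall()`. The helper name is hypothetical. It checks only the three per-session variables that the client character set controls. It deliberately ignores `character_set_system`, which is MySQL's fixed metadata character set, and `character_set_filesystem`.

```python
# The per-session variables that a client character set (SET NAMES / charset=)
# controls. character_set_system and character_set_filesystem are left out.
SESSION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def charset_mismatches(rows, expected="utf8mb4"):
    """Hypothetical helper: map each session variable that is not `expected`
    to its actual value. An empty dict means the session is consistent."""
    values = dict(rows)
    return {
        name: values.get(name)
        for name in SESSION_VARS
        if values.get(name) != expected
    }


# Sample rows standing in for cursor.fetchall() output.
sample_rows = [
    ("character_set_client", "utf8mb4"),
    ("character_set_connection", "utf8mb4"),
    ("character_set_results", "latin1"),
    ("character_set_system", "utf8mb3"),
]
print(charset_mismatches(sample_rows))  # {'character_set_results': 'latin1'}
```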

Step 4: Ensure Your Python File is Saved as UTF-8

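Python 3 reads source files as UTF-8 by default (PEP 3120). Most modern code editors (like VS Code, PyCharm, Sublime Text) also save files as UTF-8 by default. Together, these ensure that string literals containing special characters are decoded correctly into Python str objects. The connector then encodes those strings using the connection's character set when it sends them to MySQL. A file saved in any other encoding needs a coding declaration on its first or second line.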

# This string will be correctly handled if the file is saved as UTF-8
greeting = "你好,世界!"
emoji_text = "Hello with an emoji: 😊"
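Source files are only one way text enters a program. Text read from files is decoded by `open()`. Without an `encoding=` argument, `open()` uses the platform's locale encoding (for example cp1252 on many Windows systems) unless Python's UTF-8 mode is enabled. The sketch below writes and reads a temporary file with an explicit `encoding="utf-8"`, so the strings handed to the connector are correct on any platform. The file name is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "greeting.txt")  # arbitrary file name

    # Write UTF-8 explicitly instead of relying on the locale default.
    with open(path, "w", encoding="utf-8") as f:
        f.write("你好,世界!😊")

    # Read it back the same way; the result is a normal Python str.
    with open(path, encoding="utf-8") as f:
        text = f.read()

print(text, len(text))            # 7 characters
print(len(text.encode("utf-8")))  # 18 bytes when encoded as UTF-8
```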

Complete Working Example

Here is a full script that demonstrates creating a table, inserting data with special characters and emojis, and retrieving it.

import mysql.connector
from mysql.connector import Error

def create_connection(host_name, user_name, user_password, db_name=None):
    """Create a database connection to a MySQL server using utf8mb4."""
    config = {
        "host": host_name,
        "user": user_name,
        "password": user_password,
        "charset": "utf8mb4",  # Ensure the connection uses utf8mb4
    }
    if db_name:
        config["database"] = db_name  # Only select a database if one is given
    connection = None
    try:
        connection = mysql.connector.connect(**config)
        print("Connection to MySQL successful")
    except Error as e:
        print(f"The error '{e}' occurred")
    return connection

def execute_query(connection, query, data=None):
    """Execute a single write query (DDL or DML) and commit it."""
    cursor = connection.cursor()
    try:
        if data is not None:
            # Parameterized query: the connector escapes the values and
            # encodes them with the connection's character set
            cursor.execute(query, data)
        else:
            cursor.execute(query)
        connection.commit()
        print("Query executed successfully")
    except Error as e:
        print(f"The error '{e}' occurred")
    finally:
        cursor.close()

def execute_read_query(connection, query):
    """Execute a read query and return all rows (an empty list on error)."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        return cursor.fetchall()
    except Error as e:
        print(f"The error '{e}' occurred")
        return []
    finally:
        cursor.close()
# --- Main Execution ---
if __name__ == "__main__":
    # --- Connection Details ---
    HOST = "localhost"
    USER = "your_username"
    PASSWORD = "your_password"
    DB_NAME = "my_app_db"
    # 1. Connect to the MySQL server (no database yet, so it can be created)
    conn = create_connection(HOST, USER, PASSWORD)
    if conn is None:
        raise SystemExit("Could not connect to MySQL; check the connection details.")
    # 2. Create the database if it doesn't exist (using utf8mb4).
    #    Identifiers can't be passed as %s parameters, so DB_NAME is inlined.
    create_db_query = (
        f"CREATE DATABASE IF NOT EXISTS {DB_NAME} "
        "CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"
    )
    execute_query(conn, create_db_query)
    # 3. Close and reconnect to the specific database
    conn.close()
    conn = create_connection(HOST, USER, PASSWORD, DB_NAME)
    if conn is None:
        raise SystemExit(f"Could not connect to database {DB_NAME}.")
    # 4. Create a table with utf8mb4 columns.
    #    Note: a `users` table created in Step 1 has no `name` column;
    #    drop it first or choose another table name.
    create_table_query = """
    CREATE TABLE IF NOT EXISTS users (
        id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
        bio TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
        favorite_emoji VARCHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
    )
    """
    execute_query(conn, create_table_query)
    # 5. Insert data with special characters and emojis
    insert_user_query = "INSERT INTO users (name, bio, favorite_emoji) VALUES (%s, %s, %s)"
    user_data = (
        "张三",  # Chinese characters
        "这是一个关于Python和MySQL编码的测试。",  # More Chinese: "This is a test of Python and MySQL encoding."
        "😊"  # Emoji
    )
    execute_query(conn, insert_user_query, user_data)
    # 6. Retrieve and print the data to verify
    select_users_query = "SELECT id, name, bio, favorite_emoji FROM users"
    users = execute_read_query(conn, select_users_query) or []
    print("\n--- Retrieved Users ---")
    for user in users:
        # If every layer uses utf8mb4, the original text is printed unchanged
        print(f"ID: {user[0]}, Name: {user[1]}, Bio: {user[2]}, Emoji: {user[3]}")
    # 7. Close the connection
    conn.close()
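After running the example, comparing what you inserted with what you read back is a quick sanity check. The sketch below uses a hypothetical `diagnose` helper. It distinguishes three cases: an exact match; classic mojibake, meaning UTF-8 bytes read back as Latin-1 (Python's `latin-1` codec approximates MySQL's `latin1`); and characters replaced with `?`, a common symptom when text passes through a character set that cannot represent it.

```python
def diagnose(sent: str, got: str) -> str:
    """Hypothetical helper: classify the difference between an inserted
    value and the value read back from MySQL."""
    if got == sent:
        return "ok"
    try:
        # Mojibake: UTF-8 bytes that were decoded as Latin-1 somewhere.
        if got.encode("latin-1").decode("utf-8") == sent:
            return "mojibake"
    except (UnicodeEncodeError, UnicodeDecodeError):
        pass
    if "?" in got and "?" not in sent:
        return "replaced with '?'"
    return "mismatch"


sent_row = ("张三", "这是一个关于Python和MySQL编码的测试。", "😊")
# Stand-in for users[-1][1:] from the example above (name, bio, emoji).
got_row = ("张三", "这是一个关于Python和MySQL编码的测试。", "?")

for sent, got in zip(sent_row, got_row):
    print(diagnose(sent, got))
```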

Troubleshooting Common Issues

  • Error 1366 (HY000): Incorrect string value: MySQL is rejecting a character that the column's character set cannot store. Typically, this is a 4-byte character such as an emoji being inserted into a utf8 (utf8mb3) column. Solution: Ensure your MySQL table columns are defined as CHARACTER SET utf8mb4 and your connection uses charset='utf8mb4'.

  • UnicodeEncodeError or UnicodeDecodeError in Python: This usually happens when your strings contain characters that the connection's character set cannot represent (e.g., Chinese text over a latin1 connection), or when bytes are decoded with the wrong character set. Solution: Double-check that your connect() call includes charset='utf8mb4'.

  • Data is correct in the database but shows as question marks or garbled characters in your application: This is usually a display issue. The data is stored correctly, but the tool you're using to view it (such as a GUI or command-line client) is not connecting with utf8mb4. Check the client's connection settings and make sure it also connects with utf8mb4.
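Each of these symptoms can be checked from Python without guessing. The sketch below uses three hypothetical helpers:

- `offending_chars` lists the characters a utf8/utf8mb3 column would reject, formatted like the byte escapes in MySQL's "Incorrect string value" message.
- `latin1_error` uses Python's own codec to reproduce the kind of `UnicodeEncodeError` a latin1 connection produces.
- `expected_hex` gives the value `SELECT HEX(column)` should return if the text is stored as UTF-8.

```python
def offending_chars(text: str):
    """Characters outside the BMP (4 UTF-8 bytes), which utf8/utf8mb3 columns
    reject, with their bytes written as \\xHH escapes."""
    found = []
    for index, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            escaped = "".join(f"\\x{b:02X}" for b in ch.encode("utf-8"))
            found.append((index, ch, escaped))
    return found


def latin1_error(text: str):
    """Return the UnicodeEncodeError that encoding `text` as Latin-1 raises,
    or None if the text fits in Latin-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError as exc:
        return exc
    return None


def expected_hex(text: str) -> str:
    """Uppercase hex of the UTF-8 bytes, comparable with SELECT HEX(col)."""
    return text.encode("utf-8").hex().upper()


print(offending_chars("Hi 😊"))  # index, character, escaped bytes
print(latin1_error("张三"))      # an 'ordinal not in range' style error
print(latin1_error("café"))      # None: fits in Latin-1
print(expected_hex("张三"))      # E5BCA0E4B889
```

If `SELECT HEX(name) FROM users` returns the same value as `expected_hex`, the stored bytes are correct. In that case, the problem lies in how the client displays them.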
