Python insertbatch: How Do You Batch-Insert Data Efficiently?

In Python, "insertbatch" most commonly refers to the executemany() method provided by database cursor objects. This is the standard and most efficient way to insert multiple rows into a database in a single operation.


Let's break down what it is, why it's important, and how to use it with code examples for different database libraries.


The Core Concept: executemany()

When you need to insert hundreds or thousands of rows into a database, doing it one by one with a loop and a single execute() call for each row is very slow. This is because each execute() involves a network round-trip to the database server.

executemany() solves this problem. It takes a single SQL query with placeholders and a list of data tuples. It then sends the entire batch of data to the database in one go, which the database can process much more efficiently.

Key Advantages:

  1. Performance: Drastically reduces the number of network round-trips, making your code orders of magnitude faster for bulk inserts.
  2. Simplicity: It's a clean and straightforward way to handle bulk operations.
  3. Atomicity (with some databases): In some database systems, executemany() can be part of a single transaction, meaning all rows are inserted together, or none are if an error occurs.
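To see the performance difference in practice, here is a minimal, self-contained timing sketch using the built-in sqlite3 module (the table name `t` and the row count are illustrative). Note that with an in-memory SQLite database there are no network round-trips, so the gap is much smaller than against a networked server, but executemany() still wins:

```python
import sqlite3
import time

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE t (a INTEGER, b TEXT)")

rows = [(i, f"row{i}") for i in range(10_000)]

# Approach 1: one execute() call per row
start = time.perf_counter()
for row in rows:
    cursor.execute("INSERT INTO t (a, b) VALUES (?, ?)", row)
loop_time = time.perf_counter() - start

cursor.execute("DELETE FROM t")

# Approach 2: one executemany() call for the whole batch
start = time.perf_counter()
cursor.executemany("INSERT INTO t (a, b) VALUES (?, ?)", rows)
batch_time = time.perf_counter() - start

print(f"one-by-one: {loop_time:.4f}s  executemany: {batch_time:.4f}s")
```

Against a real database server, where each execute() costs a network round-trip, the difference grows from a modest factor to orders of magnitude.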

The General Syntax

The pattern is always the same:

# 1. Your SQL query with placeholders (%s for most databases)
sql = "INSERT INTO my_table (column1, column2) VALUES (%s, %s)"
# 2. A list of tuples, where each tuple is a row of data
data_to_insert = [
    ('value1_row1', 'value2_row1'),
    ('value1_row2', 'value2_row2'),
    ('value1_row3', 'value2_row3'),
    # ... potentially thousands more
]
# 3. Get a cursor from your database connection
# cursor = my_connection.cursor()
# 4. Execute the many command
# cursor.executemany(sql, data_to_insert)
# 5. Commit the transaction to save the changes
# my_connection.commit()
# 6. Close the cursor
# cursor.close()

Practical Examples with Popular Libraries

Here are complete, runnable examples for different database types.

SQLite (Built-in)

SQLite is great for examples because it's included with Python and doesn't require a separate server.

import sqlite3
# --- Setup: Create an in-memory database and a table ---
conn = sqlite3.connect(':memory:') # Use ':memory:' for a temporary DB or a file name like 'mydatabase.db'
cursor = conn.cursor()
cursor.execute('''
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE
    )
''')
# --- Batch Insert using executemany() ---
# Define the SQL query with placeholders
sql = "INSERT INTO users (name, email) VALUES (?, ?)" # SQLite uses ? for placeholders
# Define the data to insert as a list of tuples
users_data = [
    ('Alice', 'alice@example.com'),
    ('Bob', 'bob@example.com'),
    ('Charlie', 'charlie@example.com'),
    ('Diana', 'diana@example.com'),
]
# Execute the batch insert
try:
    cursor.executemany(sql, users_data)
    conn.commit() # Commit the transaction
    print(f"{cursor.rowcount} records inserted successfully into users table.")
except sqlite3.Error as e:
    print(f"An error occurred: {e}")
    conn.rollback() # Rollback in case of error
# --- Verify the data ---
cursor.execute("SELECT * FROM users")
rows = cursor.fetchall()
print("\nData in the table:")
for row in rows:
    print(row)
# --- Cleanup ---
cursor.close()
conn.close()

PostgreSQL (using psycopg2)

psycopg2 is the most popular PostgreSQL adapter for Python. You'll need to install it first: pip install psycopg2-binary.

import psycopg2
# --- Connection Details (replace with your own) ---
DB_NAME = "testdb"
DB_USER = "postgres"
DB_PASS = "yourpassword"
DB_HOST = "localhost"
DB_PORT = "5432"
try:
    # --- Setup: Connect and create a table ---
    conn = psycopg2.connect(database=DB_NAME, user=DB_USER, password=DB_PASS, host=DB_HOST, port=DB_PORT)
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS products (
            id SERIAL PRIMARY KEY,
            product_name VARCHAR(255) NOT NULL,
            price NUMERIC(10, 2) NOT NULL
        )
    ''')
    # --- Batch Insert using executemany() ---
    # Define the SQL query with placeholders (%s is the standard for psycopg2)
    sql = "INSERT INTO products (product_name, price) VALUES (%s, %s)"
    # Define the data to insert
    products_data = [
        ('Laptop', 1200.50),
        ('Mouse', 25.00),
        ('Keyboard', 75.75),
        ('Monitor', 300.00),
    ]
    # Execute the batch insert
    cursor.executemany(sql, products_data)
    conn.commit() # Commit the transaction
    print(f"{cursor.rowcount} records inserted successfully into products table.")
except psycopg2.Error as e:
    print(f"Database error: {e}")
    if 'conn' in locals() and conn:
        conn.rollback() # Rollback in case of error
finally:
    # --- Cleanup ---
    if 'cursor' in locals() and cursor:
        cursor.close()
    if 'conn' in locals() and conn:
        conn.close()

MySQL (using mysql-connector-python)

mysql-connector-python is the official Oracle-maintained driver for MySQL. Install it with: pip install mysql-connector-python.

import mysql.connector
from mysql.connector import Error
# --- Connection Details (replace with your own) ---
DB_CONFIG = {
    'host': 'localhost',
    'user': 'your_mysql_user',
    'password': 'your_mysql_password',
    'database': 'testdb'
}
try:
    # --- Setup: Connect and create a table ---
    conn = mysql.connector.connect(**DB_CONFIG)
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS employees (
            id INT AUTO_INCREMENT PRIMARY KEY,
            first_name VARCHAR(50) NOT NULL,
            last_name VARCHAR(50) NOT NULL,
            department VARCHAR(50)
        )
    ''')
    # --- Batch Insert using executemany() ---
    # Define the SQL query with placeholders (%s is the standard)
    sql = "INSERT INTO employees (first_name, last_name, department) VALUES (%s, %s, %s)"
    # Define the data to insert
    employees_data = [
        ('John', 'Doe', 'Engineering'),
        ('Jane', 'Smith', 'Marketing'),
        ('Peter', 'Jones', 'Sales'),
        ('Mary', 'Williams', 'HR'),
    ]
    # Execute the batch insert
    cursor.executemany(sql, employees_data)
    conn.commit() # Commit the transaction
    print(f"{cursor.rowcount} records inserted successfully into employees table.")
except Error as e:
    print(f"Error while connecting to MySQL or executing query: {e}")
    if 'conn' in locals() and conn.is_connected():
        conn.rollback()
finally:
    # --- Cleanup ---
    if 'cursor' in locals() and cursor:
        cursor.close()
    if 'conn' in locals() and conn.is_connected():
        conn.close()

Advanced: Using executemany() with Dictionaries

Sometimes your data is a list of dictionaries instead of a list of tuples. You can still use executemany() by switching to named placeholders: the driver matches each placeholder to the corresponding dictionary key, so no conversion to tuples is needed.

import sqlite3
# Assume 'conn' and 'cursor' are already established
# from the SQLite example above.
# Data as a list of dictionaries
data_as_dicts = [
    {'name': 'Eve', 'email': 'eve@example.com'},
    {'name': 'Frank', 'email': 'frank@example.com'},
]
# The SQL query
sql = "INSERT INTO users (name, email) VALUES (:name, :email)"
# The executemany method can accept a second argument that is an iterable
# of dictionaries. The keys of the dictionary are used to match the placeholders.
cursor.executemany(sql, data_as_dicts)
conn.commit()
print(f"{cursor.rowcount} records inserted from dictionary list.")
# --- Verify ---
cursor.execute("SELECT * FROM users")
for row in cursor.fetchall():
    print(row)
cursor.close()
conn.close()

Best Practices and Considerations

  1. Transaction Management: Always wrap your executemany() call in a try...except...finally block and use conn.commit() on success and conn.rollback() on failure. This ensures data integrity.
  2. Batch Size: For extremely large datasets (millions of rows), you might want to break your data into smaller batches (e.g., 1000 or 5000 rows at a time) to avoid overwhelming the database server's memory.
  3. Placeholders: Be careful with your placeholder syntax; it varies between database drivers:
    • SQLite, SQL Server (pyodbc): ?
    • MySQL, PostgreSQL (psycopg2): %s
    • Oracle (oracledb / cx_Oracle): :1, :2, etc., or named placeholders like :name
  4. Alternatives: For very high-performance needs, some databases offer specialized tools:
    • PostgreSQL: The COPY command is significantly faster than executemany.
    • MySQL: The LOAD DATA INFILE command is the fastest way to load data from a file.
    • SQLAlchemy: Session.bulk_save_objects() performs efficient bulk inserts (in 2.0-style code, passing a list of dictionaries to session.execute(insert(...)) is the preferred equivalent).
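The batch-size advice above (point 2) can be sketched as a small chunking helper. This is a minimal illustration using sqlite3; the default chunk size of 1000 and the table layout are arbitrary choices, and committing per chunk is one possible policy (you could also commit once at the end for all-or-nothing semantics):

```python
import sqlite3

def insert_in_batches(conn, sql, rows, batch_size=1000):
    """Insert rows in fixed-size chunks, committing after each chunk."""
    cursor = conn.cursor()
    total = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        cursor.executemany(sql, chunk)
        conn.commit()  # Commit each chunk so a failure only loses one batch
        total += len(chunk)
    cursor.close()
    return total

# Usage: 2500 rows go in as chunks of 1000, 1000, and 500
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
rows = [(f"item{i}", i) for i in range(2500)]
inserted = insert_in_batches(conn, "INSERT INTO items (name, qty) VALUES (?, ?)", rows)
print(inserted)  # 2500
```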