杰瑞科技汇

Python如何连接Oracle并导出CSV数据?

Of course! Here’s a comprehensive guide on how to work with Oracle databases and CSV files using Python. We'll cover the most common scenarios:

  1. Reading from a CSV file and writing the data to an Oracle database.
  2. Querying an Oracle database and writing the results to a CSV file.

We'll use the following tools:

  • Python: The programming language.
  • Oracle Python Driver (cx_Oracle): The standard library to connect to an Oracle database.
  • Python's csv module: A built-in library for reading and writing CSV files.
  • Pandas (Optional but Recommended): A powerful data manipulation library that can simplify the process significantly.

Prerequisites

  1. Python Installed: Make sure you have Python on your system.
  2. Oracle Instant Client: You need Oracle's Instant Client to connect to the database. You don't need a full Oracle client installation.
    • Download: Go to the Oracle Instant Client Downloads page.
    • Choose your platform (e.g., Windows x64, Linux x64, macOS x64) and download the appropriate package (e.g., instantclient-basic-windows.x64-21c-21.9.0.0.0dbru.zip).
    • Install/Unzip: Unzip the downloaded file to a permanent location (e.g., C:\oracle\instantclient_21_9 on Windows or /opt/oracle/instantclient_21_9 on Linux).
    • Set PATH: Add the path to the Instant Client directory to your system's PATH environment variable. This allows Python to find the necessary libraries.
  3. cx_Oracle Library: Install the Python driver using pip.
    pip install cx_Oracle
  4. Pandas Library (Recommended): If you want to use the Pandas method, install it.
    pip install pandas

Method 1: Using Python's csv and cx_Oracle Modules

This method gives you fine-grained control and doesn't require any external libraries beyond cx_Oracle.

Scenario A: Load Data from CSV to Oracle

This script reads a CSV file row by row and inserts it into an Oracle table.

Sample CSV File (employees.csv) Let's assume you have a CSV file with the following structure. Note the header row.

employee_id,first_name,last_name,email, hire_date, salary
101,John,Doe,john.doe@example.com,2025-01-15,60000
102,Jane,Smith,jane.smith@example.com,2025-05-20,75000
103,Peter,Jones,peter.jones@example.com,2025-03-10,55000

Oracle Table You need a corresponding table in your Oracle database. The column names and data types must match.

CREATE TABLE employees (
    employee_id   NUMBER(5),
    first_name    VARCHAR2(50),
    last_name     VARCHAR2(50),
    email         VARCHAR2(100),
    hire_date     DATE,
    salary        NUMBER(10,2)
);

Python Script (load_csv_to_oracle.py)

import csv
import cx_Oracle
import datetime
# --- Configuration ---
# Replace with your Oracle connection details
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_DSN = "hostname:port/service_name" # e.g., "mydb.example.com:1521/ORCLPDB1"
# Path to your CSV file
CSV_FILE = "employees.csv"
# --- Main Logic ---
def load_data():
    connection = None
    cursor = None
    try:
        # Establish a connection to the Oracle database
        connection = cx_Oracle.connect(user=DB_USER, password=DB_PASSWORD, dsn=DB_DSN)
        cursor = connection.cursor()
        print("Successfully connected to Oracle Database.")
        # Open the CSV file
        with open(CSV_FILE, 'r', newline='') as file:
            # csv.DictReader uses the first row of the file as keys for the dictionary
            csv_reader = csv.DictReader(file)
            # Prepare the SQL INSERT statement
            # Using a bind variable (:1, :2, etc.) is crucial for performance and security
            sql_insert = """
                INSERT INTO employees (employee_id, first_name, last_name, email, hire_date, salary)
                VALUES (:1, :2, :3, :4, TO_DATE(:5, 'YYYY-MM-DD'), :6)
            """
            # Loop through each row in the CSV file
            for row in csv_reader:
                # Convert data types if necessary
                # For example, ensure salary is a float and hire_date is a string in the correct format
                employee_id = int(row['employee_id'])
                first_name = row['first_name']
                last_name = row['last_name']
                email = row['email']
                hire_date_str = row['hire_date'] # CSV will read this as a string
                salary = float(row['salary'])
                # Create a tuple of values to be bound to the SQL statement
                # The order must match the placeholders in the SQL string
                data_to_insert = (employee_id, first_name, last_name, email, hire_date_str, salary)
                # Execute the SQL statement with the data
                cursor.execute(sql_insert, data_to_insert)
                print(f"Inserted: {data_to_insert}")
        # Commit the transaction to make the changes permanent
        connection.commit()
        print(f"\nSuccessfully loaded {cursor.rowcount} rows from {CSV_FILE} to the database.")
    except cx_Oracle.DatabaseError as e:
        error, = e.args
        print(f"Oracle Error {error.code}: {error.message}")
        if connection:
            connection.rollback() # Roll back in case of error
        print("Transaction rolled back.")
    except FileNotFoundError:
        print(f"Error: The file {CSV_FILE} was not found.")
    finally:
        # Ensure the cursor and connection are closed
        if cursor:
            cursor.close()
        if connection:
            connection.close()
        print("Database connection closed.")
if __name__ == "__main__":
    load_data()

Scenario B: Export Data from Oracle to CSV

This script queries an Oracle table and writes the results to a CSV file.

Python Script (export_oracle_to_csv.py)

import csv
import cx_Oracle
# --- Configuration ---
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_DSN = "hostname:port/service_name"
# Path to the output CSV file
CSV_FILE_OUTPUT = "employees_export.csv"
# --- Main Logic ---
def export_data():
    connection = None
    cursor = None
    try:
        # Establish a connection
        connection = cx_Oracle.connect(user=DB_USER, password=DB_PASSWORD, dsn=DB_DSN)
        cursor = connection.cursor()
        print("Successfully connected to Oracle Database.")
        # Define the SQL query
        sql_query = "SELECT employee_id, first_name, last_name, email, hire_date, salary FROM employees"
        # Execute the query
        cursor.execute(sql_query)
        # Get the column names from the cursor description
        column_names = [desc[0] for desc in cursor.description]
        # Open the CSV file for writing
        with open(CSV_FILE_OUTPUT, 'w', newline='') as file:
            csv_writer = csv.writer(file)
            # Write the header row
            csv_writer.writerow(column_names)
            # Write the data rows
            csv_writer.writerows(cursor)
        print(f"\nSuccessfully exported data to {CSV_FILE_OUTPUT}")
    except cx_Oracle.DatabaseError as e:
        error, = e.args
        print(f"Oracle Error {error.code}: {error.message}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
    finally:
        if cursor:
            cursor.close()
        if connection:
            connection.close()
        print("Database connection closed.")
if __name__ == "__main__":
    export_data()

Method 2: Using the pandas Library (Recommended)

Pandas is fantastic for data tasks like this because it handles data types, connections, and conversions much more elegantly.

Scenario A: Load Data from CSV to Oracle using Pandas

import pandas as pd
import cx_Oracle
# --- Configuration ---
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_DSN = "hostname:port/service_name"
CSV_FILE = "employees.csv"
# --- Main Logic ---
def load_data_with_pandas():
    try:
        # 1. Read the CSV file into a pandas DataFrame
        # Pandas is smart about data types, but you might need to specify some
        df = pd.read_csv(CSV_FILE, parse_dates=['hire_date'])
        print("CSV file read successfully.")
        print(df.info()) # Check the data types
        # 2. Establish a connection to Oracle
        connection = cx_Oracle.connect(user=DB_USER, password=DB_PASSWORD, dsn=DB_DSN)
        # 3. Use pandas' to_sql method to write the DataFrame to the database
        # if_exists='append': Adds data to the table if it exists.
        # if_exists='replace': Drops the table and recreates it.
        # if_exists='fail': Raises an error if the table exists.
        # index=False: Prevents pandas from writing the DataFrame index as a column.
        df.to_sql(
            'employees', 
            con=connection, 
            if_exists='append', 
            index=False,
            dtype={
                'employee_id': cx_Oracle.NUMBER,
                'first_name': cx_Oracle.VARCHAR2(50),
                'last_name': cx_Oracle.VARCHAR2(50),
                'email': cx_Oracle.VARCHAR2(100),
                'hire_date': cx_Oracle.DATE,
                'salary': cx_Oracle.NUMBER(10, 2)
            }
        )
        print(f"\nSuccessfully loaded {len(df)} rows into the 'employees' table.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        if 'connection' in locals() and connection:
            connection.close()
            print("Database connection closed.")
if __name__ == "__main__":
    load_data_with_pandas()

Scenario B: Export Data from Oracle to CSV using Pandas

import pandas as pd
import cx_Oracle
# --- Configuration ---
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_DSN = "hostname:port/service_name"
CSV_FILE_OUTPUT = "employees_export_pandas.csv"
# --- Main Logic ---
def export_data_with_pandas():
    try:
        # 1. Establish a connection
        connection = cx_Oracle.connect(user=DB_USER, password=DB_PASSWORD, dsn=DB_DSN)
        # 2. Use pandas' read_sql_query to execute a query and load results into a DataFrame
        sql_query = "SELECT employee_id, first_name, last_name, email, hire_date, salary FROM employees"
        df = pd.read_sql_query(sql_query, connection)
        print("Data queried successfully.")
        print(df.head())
        # 3. Write the DataFrame to a CSV file
        # index=False: Prevents writing the DataFrame index to the CSV.
        df.to_csv(CSV_FILE_OUTPUT, index=False)
        print(f"\nSuccessfully exported data to {CSV_FILE_OUTPUT}")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        if 'connection' in locals() and connection:
            connection.close()
            print("Database connection closed.")
if __name__ == "__main__":
    export_data_with_pandas()

Summary and Recommendation

Feature csv + cx_Oracle pandas + cx_Oracle
Ease of Use More manual, requires explicit data type conversion. Very easy. Pandas infers and handles data types well.
Performance Good for row-by-row operations. Can be slow for millions of rows if not using bulk operations. Generally faster for large datasets due to optimized C backend and efficient bulk operations.
Flexibility High. You have full control over the SQL execution loop. High. Easy to filter, transform data within the DataFrame before writing.
Dependencies Minimal (only cx_Oracle). Requires pandas and cx_Oracle.

Recommendation:

  • For simple scripts, small files, or when you need maximum control, the pure Python csv and cx_Oracle method is perfectly fine.
  • For any serious data manipulation, larger files, or for more readable and concise code, the Pandas method is highly recommended. It is the industry standard for data-related tasks in Python.
分享:
扫描分享到社交APP
上一篇
下一篇