Of course! Here’s a comprehensive guide on how to work with Oracle databases and CSV files using Python. We'll cover the most common scenarios:
- Reading from a CSV file and writing the data to an Oracle database.
- Querying an Oracle database and writing the results to a CSV file.
We'll use the following tools:
- Python: The programming language.
- Oracle Python Driver (
cx_Oracle): The standard library to connect to an Oracle database. - Python's
csvmodule: A built-in library for reading and writing CSV files. - Pandas (Optional but Recommended): A powerful data manipulation library that can simplify the process significantly.
Prerequisites
- Python Installed: Make sure you have Python on your system.
- Oracle Instant Client: You need Oracle's Instant Client to connect to the database. You don't need a full Oracle client installation.
- Download: Go to the Oracle Instant Client Downloads page.
- Choose your platform (e.g., Windows x64, Linux x64, macOS x64) and download the appropriate package (e.g.,
instantclient-basic-windows.x64-21c-21.9.0.0.0dbru.zip). - Install/Unzip: Unzip the downloaded file to a permanent location (e.g.,
C:\oracle\instantclient_21_9on Windows or/opt/oracle/instantclient_21_9on Linux). - Set
PATH: Add the path to the Instant Client directory to your system'sPATHenvironment variable. This allows Python to find the necessary libraries.
cx_OracleLibrary: Install the Python driver using pip.pip install cx_Oracle
- Pandas Library (Recommended): If you want to use the Pandas method, install it.
pip install pandas
Method 1: Using Python's csv and cx_Oracle Modules
This method gives you fine-grained control and doesn't require any external libraries beyond cx_Oracle.
Scenario A: Load Data from CSV to Oracle
This script reads a CSV file row by row and inserts it into an Oracle table.
Sample CSV File (employees.csv)
Let's assume you have a CSV file with the following structure. Note the header row.
employee_id,first_name,last_name,email, hire_date, salary 101,John,Doe,john.doe@example.com,2025-01-15,60000 102,Jane,Smith,jane.smith@example.com,2025-05-20,75000 103,Peter,Jones,peter.jones@example.com,2025-03-10,55000
Oracle Table You need a corresponding table in your Oracle database. The column names and data types must match.
CREATE TABLE employees (
employee_id NUMBER(5),
first_name VARCHAR2(50),
last_name VARCHAR2(50),
email VARCHAR2(100),
hire_date DATE,
salary NUMBER(10,2)
);
Python Script (load_csv_to_oracle.py)
import csv
import cx_Oracle
import datetime
# --- Configuration ---
# Replace with your Oracle connection details
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_DSN = "hostname:port/service_name" # e.g., "mydb.example.com:1521/ORCLPDB1"
# Path to your CSV file
CSV_FILE = "employees.csv"
# --- Main Logic ---
def load_data():
connection = None
cursor = None
try:
# Establish a connection to the Oracle database
connection = cx_Oracle.connect(user=DB_USER, password=DB_PASSWORD, dsn=DB_DSN)
cursor = connection.cursor()
print("Successfully connected to Oracle Database.")
# Open the CSV file
with open(CSV_FILE, 'r', newline='') as file:
# csv.DictReader uses the first row of the file as keys for the dictionary
csv_reader = csv.DictReader(file)
# Prepare the SQL INSERT statement
# Using a bind variable (:1, :2, etc.) is crucial for performance and security
sql_insert = """
INSERT INTO employees (employee_id, first_name, last_name, email, hire_date, salary)
VALUES (:1, :2, :3, :4, TO_DATE(:5, 'YYYY-MM-DD'), :6)
"""
# Loop through each row in the CSV file
for row in csv_reader:
# Convert data types if necessary
# For example, ensure salary is a float and hire_date is a string in the correct format
employee_id = int(row['employee_id'])
first_name = row['first_name']
last_name = row['last_name']
email = row['email']
hire_date_str = row['hire_date'] # CSV will read this as a string
salary = float(row['salary'])
# Create a tuple of values to be bound to the SQL statement
# The order must match the placeholders in the SQL string
data_to_insert = (employee_id, first_name, last_name, email, hire_date_str, salary)
# Execute the SQL statement with the data
cursor.execute(sql_insert, data_to_insert)
print(f"Inserted: {data_to_insert}")
# Commit the transaction to make the changes permanent
connection.commit()
print(f"\nSuccessfully loaded {cursor.rowcount} rows from {CSV_FILE} to the database.")
except cx_Oracle.DatabaseError as e:
error, = e.args
print(f"Oracle Error {error.code}: {error.message}")
if connection:
connection.rollback() # Roll back in case of error
print("Transaction rolled back.")
except FileNotFoundError:
print(f"Error: The file {CSV_FILE} was not found.")
finally:
# Ensure the cursor and connection are closed
if cursor:
cursor.close()
if connection:
connection.close()
print("Database connection closed.")
if __name__ == "__main__":
load_data()
Scenario B: Export Data from Oracle to CSV
This script queries an Oracle table and writes the results to a CSV file.
Python Script (export_oracle_to_csv.py)
import csv
import cx_Oracle
# --- Configuration ---
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_DSN = "hostname:port/service_name"
# Path to the output CSV file
CSV_FILE_OUTPUT = "employees_export.csv"
# --- Main Logic ---
def export_data():
connection = None
cursor = None
try:
# Establish a connection
connection = cx_Oracle.connect(user=DB_USER, password=DB_PASSWORD, dsn=DB_DSN)
cursor = connection.cursor()
print("Successfully connected to Oracle Database.")
# Define the SQL query
sql_query = "SELECT employee_id, first_name, last_name, email, hire_date, salary FROM employees"
# Execute the query
cursor.execute(sql_query)
# Get the column names from the cursor description
column_names = [desc[0] for desc in cursor.description]
# Open the CSV file for writing
with open(CSV_FILE_OUTPUT, 'w', newline='') as file:
csv_writer = csv.writer(file)
# Write the header row
csv_writer.writerow(column_names)
# Write the data rows
csv_writer.writerows(cursor)
print(f"\nSuccessfully exported data to {CSV_FILE_OUTPUT}")
except cx_Oracle.DatabaseError as e:
error, = e.args
print(f"Oracle Error {error.code}: {error.message}")
except Exception as e:
print(f"An unexpected error occurred: {e}")
finally:
if cursor:
cursor.close()
if connection:
connection.close()
print("Database connection closed.")
if __name__ == "__main__":
export_data()
Method 2: Using the pandas Library (Recommended)
Pandas is fantastic for data tasks like this because it handles data types, connections, and conversions much more elegantly.
Scenario A: Load Data from CSV to Oracle using Pandas
import pandas as pd
import cx_Oracle
# --- Configuration ---
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_DSN = "hostname:port/service_name"
CSV_FILE = "employees.csv"
# --- Main Logic ---
def load_data_with_pandas():
try:
# 1. Read the CSV file into a pandas DataFrame
# Pandas is smart about data types, but you might need to specify some
df = pd.read_csv(CSV_FILE, parse_dates=['hire_date'])
print("CSV file read successfully.")
print(df.info()) # Check the data types
# 2. Establish a connection to Oracle
connection = cx_Oracle.connect(user=DB_USER, password=DB_PASSWORD, dsn=DB_DSN)
# 3. Use pandas' to_sql method to write the DataFrame to the database
# if_exists='append': Adds data to the table if it exists.
# if_exists='replace': Drops the table and recreates it.
# if_exists='fail': Raises an error if the table exists.
# index=False: Prevents pandas from writing the DataFrame index as a column.
df.to_sql(
'employees',
con=connection,
if_exists='append',
index=False,
dtype={
'employee_id': cx_Oracle.NUMBER,
'first_name': cx_Oracle.VARCHAR2(50),
'last_name': cx_Oracle.VARCHAR2(50),
'email': cx_Oracle.VARCHAR2(100),
'hire_date': cx_Oracle.DATE,
'salary': cx_Oracle.NUMBER(10, 2)
}
)
print(f"\nSuccessfully loaded {len(df)} rows into the 'employees' table.")
except Exception as e:
print(f"An error occurred: {e}")
finally:
if 'connection' in locals() and connection:
connection.close()
print("Database connection closed.")
if __name__ == "__main__":
load_data_with_pandas()
Scenario B: Export Data from Oracle to CSV using Pandas
import pandas as pd
import cx_Oracle
# --- Configuration ---
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_DSN = "hostname:port/service_name"
CSV_FILE_OUTPUT = "employees_export_pandas.csv"
# --- Main Logic ---
def export_data_with_pandas():
try:
# 1. Establish a connection
connection = cx_Oracle.connect(user=DB_USER, password=DB_PASSWORD, dsn=DB_DSN)
# 2. Use pandas' read_sql_query to execute a query and load results into a DataFrame
sql_query = "SELECT employee_id, first_name, last_name, email, hire_date, salary FROM employees"
df = pd.read_sql_query(sql_query, connection)
print("Data queried successfully.")
print(df.head())
# 3. Write the DataFrame to a CSV file
# index=False: Prevents writing the DataFrame index to the CSV.
df.to_csv(CSV_FILE_OUTPUT, index=False)
print(f"\nSuccessfully exported data to {CSV_FILE_OUTPUT}")
except Exception as e:
print(f"An error occurred: {e}")
finally:
if 'connection' in locals() and connection:
connection.close()
print("Database connection closed.")
if __name__ == "__main__":
export_data_with_pandas()
Summary and Recommendation
| Feature | csv + cx_Oracle |
pandas + cx_Oracle |
|---|---|---|
| Ease of Use | More manual, requires explicit data type conversion. | Very easy. Pandas infers and handles data types well. |
| Performance | Good for row-by-row operations. Can be slow for millions of rows if not using bulk operations. | Generally faster for large datasets due to optimized C backend and efficient bulk operations. |
| Flexibility | High. You have full control over the SQL execution loop. | High. Easy to filter, transform data within the DataFrame before writing. |
| Dependencies | Minimal (only cx_Oracle). |
Requires pandas and cx_Oracle. |
Recommendation:
- For simple scripts, small files, or when you need maximum control, the pure Python
csvandcx_Oraclemethod is perfectly fine. - For any serious data manipulation, larger files, or for more readable and concise code, the Pandas method is highly recommended. It is the industry standard for data-related tasks in Python.
