How to Import Data Efficiently with Python and mongoimport

The term "python mongoimport" can mean two different things, and this article covers both:

  1. Using the mongoimport command-line tool from within a Python script. This is the most common meaning and is very useful for automating database imports.
  2. Using the pymongo library to import data directly from a Python script. This is a more "Pythonic" approach that gives you more control but requires different logic.

Using the mongoimport Command-Line Tool from Python

This approach involves executing the mongoimport executable as a subprocess from your Python code. It's ideal when you already have data in a format that mongoimport can handle (like JSON or CSV) and you want to leverage its power and speed.

Prerequisites

  1. MongoDB Database Tools installed: The mongoimport utility is part of this package, which since MongoDB 4.4 is a separate download from the server. The mongoimport executable must be on your system's PATH.
  2. Python's subprocess module: This is a standard library module, so no installation is needed.
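
Before building any command, it can help to confirm that the tool is actually reachable. The following is a minimal sketch using the standard library's shutil.which(); the helper name find_tool is hypothetical and not part of any MongoDB package.

```python
import shutil


def find_tool(name="mongoimport"):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("mongoimport was not found on PATH; install the MongoDB Database Tools first.")
    else:
        print(f"mongoimport found at: {path}")
```

Checking up front gives a clearer message than waiting for subprocess to raise FileNotFoundError later.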

How to Do It

The key is to use Python's subprocess.run() function, which is the modern, recommended way to run external commands.

Here’s a complete, runnable example.

Step 1: Create a sample data file.

Let's create a JSON file named users.json that we want to import. The file must contain only the array below, because standard JSON does not allow comments.

[
  {
    "_id": 1,
    "name": "Alice",
    "email": "alice@example.com",
    "age": 30,
    "city": "New York"
  },
  {
    "_id": 2,
    "name": "Bob",
    "email": "bob@example.com",
    "age": 25,
    "city": "London"
  },
  {
    "_id": 3,
    "name": "Charlie",
    "email": "charlie@example.com",
    "age": 35,
    "city": "Tokyo"
  }
]
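
Note that this file holds a single top-level JSON array. By default mongoimport expects one JSON document per line, and an array like this one has to be imported with the --jsonArray flag. As an alternative, the array can be rewritten in the one-document-per-line form. The following is a minimal sketch of that conversion; the function name and the output file name users.ndjson are assumptions made for illustration.

```python
import json


def array_file_to_ndjson(src_path, dst_path):
    """Rewrite a file holding one JSON array as one JSON document per line.

    Returns the number of documents written.
    """
    with open(src_path, "r", encoding="utf-8") as src:
        documents = json.load(src)
    if not isinstance(documents, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst_path, "w", encoding="utf-8") as dst:
        for doc in documents:
            # json.dumps() without indent keeps each document on a single line
            dst.write(json.dumps(doc, ensure_ascii=False) + "\n")
    return len(documents)


if __name__ == "__main__":
    count = array_file_to_ndjson("users.json", "users.ndjson")
    print(f"Wrote {count} documents to users.ndjson")
```

The converted file can then be passed to mongoimport without --jsonArray.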

Step 2: Write the Python script to run mongoimport.

This script will construct the command and execute it.

import subprocess
# --- Configuration ---
# IMPORTANT: Replace with your actual MongoDB connection details
mongo_host = "localhost"
mongo_port = 27017
db_name = "my_database"
collection_name = "users"
file_path = "users.json" # Path to your data file
# --- Construct the mongoimport command ---
# We use a list of arguments, which is safer than a single string
# as it handles spaces and special characters automatically.
command = [
    "mongoimport",
    "--uri", f"mongodb://{mongo_host}:{mongo_port}",
    "--db", db_name,
    "--collection", collection_name,
    "--file", file_path,
    "--jsonArray",  # Required because users.json holds a single top-level JSON array
    "--mode", "upsert"  # Other modes: insert (default), merge, delete
]
print(f"Executing command: {' '.join(command)}")
# --- Run the command ---
try:
    # Execute the command and wait for it to complete
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    # Print the output from mongoimport
    print("\n--- mongoimport Output ---")
    print("STDOUT:", result.stdout)
    print("STDERR:", result.stderr)
    print("\n✅ Import successful!")
except FileNotFoundError:
    print("❌ Error: 'mongoimport' command not found.")
    print("Please ensure MongoDB Database Tools are installed and in your system's PATH.")
except subprocess.CalledProcessError as e:
    print("❌ Error during mongoimport execution.")
    print(f"Return code: {e.returncode}")
    print("STDOUT:", e.stdout)
    print("STDERR:", e.stderr)

Explanation of Key mongoimport Options

  • --uri: The connection string for your MongoDB instance.
  • --db: The target database.
  • --collection: The target collection.
  • --file: The path to the input file (JSON, CSV, or TSV).
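  • --mode: How to handle imported documents that match an existing document (matched on _id by default).
    • insert (default): Inserts the documents. A document whose _id already exists is reported as a duplicate-key error and is not imported.
    • upsert: Replaces the matching document if it exists, or inserts a new one if it doesn't.
    • merge: Merges fields from the imported document into the matching existing document, or inserts it if there is no match.
    • delete: Deletes the existing documents that match the imported ones.
  • --jsonArray: Treats the input as a single JSON array of documents, as in users.json above. Without it, mongoimport expects one JSON document per line.
  • --type: Specifies the file format (json, csv, tsv). Defaults to json, so it must be set explicitly for CSV or TSV files.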
  • --headerline: (For CSV/TSV) Treats the first row of the file as the field names.
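
To show how these options combine for a CSV file, here is a minimal sketch of a helper that only assembles the argument list and does not execute anything. The function name build_mongoimport_command and the file name users.csv are hypothetical; the flags are the ones described above.

```python
def build_mongoimport_command(uri, db, collection, file_path,
                              file_type="json", headerline=False, mode="insert"):
    """Assemble a mongoimport argument list suitable for subprocess.run()."""
    if file_type not in ("json", "csv", "tsv"):
        raise ValueError(f"unsupported file type: {file_type!r}")
    if headerline and file_type == "json":
        # --headerline only makes sense for CSV/TSV input
        raise ValueError("--headerline applies to csv and tsv files only")

    command = [
        "mongoimport",
        "--uri", uri,
        "--db", db,
        "--collection", collection,
        "--file", file_path,
        "--type", file_type,
        "--mode", mode,
    ]
    if headerline:
        command.append("--headerline")
    return command


if __name__ == "__main__":
    cmd = build_mongoimport_command(
        "mongodb://localhost:27017", "my_database", "users", "users.csv",
        file_type="csv", headerline=True,
    )
    print(cmd)
```

Keeping command construction in its own function makes it easy to inspect or log the exact arguments before handing them to subprocess.run().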

Using pymongo for Direct Data Import

This approach uses the official Python driver for MongoDB. Instead of calling an external tool, your Python script reads the data and uses the insert_many() or bulk_write() methods to load it into the database. This is more integrated but requires you to parse the data file yourself.

Prerequisites

  1. MongoDB Server: Running and accessible.
  2. Python pymongo library: Install it using pip.
    pip install pymongo

How to Do It

Here's how to import the same users.json file using pymongo.

import pymongo
import json
from pymongo import MongoClient
# --- Configuration ---
mongo_host = "localhost"
mongo_port = 27017
db_name = "my_database"
collection_name = "users_pymongo" # Using a different collection name to avoid conflict
file_path = "users.json"
# --- Connect to MongoDB ---
try:
    client = MongoClient(host=mongo_host, port=mongo_port)
    db = client[db_name]
    collection = db[collection_name]
    # Read the JSON data from the file
    with open(file_path, 'r') as file:
        data_to_insert = json.load(file)
    # Use insert_many() for a list of documents
    # This is the most efficient way to insert multiple documents
    if data_to_insert: # Ensure the list is not empty
        result = collection.insert_many(data_to_insert)
        print(f"✅ Successfully inserted {len(result.inserted_ids)} documents.")
        print("Inserted IDs:", result.inserted_ids)
    else:
        print("❌ No data to insert.")
    # Close the MongoDB connection
    client.close()
except FileNotFoundError:
    print(f"❌ Error: The file '{file_path}' was not found.")
except pymongo.errors.PyMongoError as e:
    print(f"❌ A MongoDB error occurred: {e}")
except Exception as e:
    print(f"❌ An unexpected error occurred: {e}")
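
The script above loads the whole file with json.load() and passes every document to a single insert_many() call, which may not suit very large inputs. The following is a minimal sketch of feeding insert_many() fixed-size batches from any iterable. The helper names and the default batch size of 1000 are assumptions for illustration; the collection argument is expected to be a pymongo collection such as the one created above.

```python
from itertools import islice


def iter_batches(documents, batch_size):
    """Yield successive lists of at most batch_size documents from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def insert_in_batches(collection, documents, batch_size=1000):
    """Call collection.insert_many() once per batch and return the total inserted."""
    total = 0
    for batch in iter_batches(documents, batch_size):
        result = collection.insert_many(batch)
        total += len(result.inserted_ids)
    return total
```

Because iter_batches() accepts any iterable, the documents can come from a generator that reads a file line by line, so only one batch needs to be held in memory at a time.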

Comparison: subprocess.run(mongoimport) vs. pymongo

Feature | subprocess.run(mongoimport) | pymongo
How it works | Calls an external command-line tool. | Uses a native Python library.
Performance | Very fast. The tool is highly optimized for bulk imports. | Slower for large files. Data is parsed in Python and sent over the network one batch at a time.
Data format | Limited to formats mongoimport supports (JSON, CSV). | Any format. You can read from Excel, XML, or any other source if you can parse it in Python.
Error handling | Relies on the exit code and stderr of the mongoimport tool. | Uses Python's try...except blocks with specific pymongo error types.
Dependencies | Requires MongoDB Database Tools to be installed on the system. | Requires only the pymongo Python library.
Use case | Automating imports of standard files (JSON, CSV); loading large datasets where speed is critical. | Integrating import logic into a larger Python application; importing data from non-standard sources; needing fine-grained control over the import process (e.g., transforming data before inserting).

Conclusion

  • For speed and simplicity when dealing with standard files, use subprocess.run() with mongoimport.
  • For integration, flexibility, and control within a Python application, use pymongo.