杰瑞科技汇

Python pythondocx如何操作Word文档?

Of course! python-docx is a fantastic and popular library in Python for creating and modifying Microsoft Word (.docx) files. It's not designed for reading complex, formatted documents, but it excels at generating documents from scratch or populating templates with data.

Python pythondocx如何操作Word文档?-图1
(图片来源网络,侵删)

Here's a comprehensive guide covering everything from installation to advanced features.


Installation

First, you need to install the library. Open your terminal or command prompt and run:

pip install python-docx

Core Concepts: The Document Object Model (DOM)

Before you start coding, it's helpful to understand how python-docx structures a Word document. A .docx file is essentially a ZIP archive containing XML files. python-docx gives you a Pythonic way to interact with this structure.

The main components are:

Python pythondocx如何操作Word文档?-图2
(图片来源网络,侵删)
  • Document: This is the top-level object. You can think of it as the entire Word file.
  • Block Elements: These are the large, structural parts of your document.
    • Paragraph: A block of text. This is the most common element you'll work with.
    • Table: A structured table with rows and cells.
  • Inline Elements: These are elements that exist within a paragraph.
    • Run: A contiguous run of text with the same formatting (e.g., bold, italic, font size). You apply styles to a Run, not directly to a Paragraph. This is a crucial concept.
  • Document Properties: Metadata about the file, like the author, title, and subject.

Creating a New Document from Scratch

Let's start by creating a simple, styled document.

from docx import Document
from docx.shared import Pt, RGBColor, Inches
from docx.enum.text import WD_ALIGN_PARAGRAPH
# 1. Create a new Document object
doc = Document()
# 2. Add a heading to the document
# The level can be 0-9 (0 is the standard "Title" style)
doc.add_heading('Document Title', level=0)
# 3. Add a paragraph
p = doc.add_paragraph('This is a paragraph with some initial text. ')
# 4. Add a second paragraph with bold and italic text
p.add_run('This is a bold run. ').bold = True
p.add_run('And this is an italic run. ').italic = True
# 5. Add a left-aligned paragraph
p_left = doc.add_paragraph('This is a left-aligned paragraph.')
p_left.alignment = WD_ALIGN_PARAGRAPH.LEFT
# 6. Add a right-aligned paragraph
p_right = doc.add_paragraph('This is a right-aligned paragraph.')
p_right.alignment = WD_ALIGN_PARAGRAPH.RIGHT
# 7. Add a centered paragraph
p_center = doc.add_paragraph('This is a centered paragraph.')
p_center.alignment = WD_ALIGN_PARAGRAPH.CENTER
# 8. Add a paragraph with custom font size and color
p_custom = doc.add_paragraph('This text has a custom style.')
run = p_custom.add_run('Custom font size and color. ')
run.font.size = Pt(14)  # 14 points
run.font.color.rgb = RGBColor(0x42, 0x24, 0xE9) # A nice blue color
# 9. Add a page break
doc.add_page_break()
# 10. Add a table
table = doc.add_table(rows=3, cols=3)
table.style = 'Table Grid' # Apply a built-in style
# Populate the table
for i in range(3):
    for j in range(3):
        cell = table.cell(i, j)
        cell.text = f'Row {i+1}, Col {j+1}'
# 11. Save the document
doc.save('my_new_document.docx')
print("Document 'my_new_document.docx' created successfully!")

Working with an Existing Document

You can also open an existing .docx file to read its contents or modify it.

from docx import Document
# Open an existing document
doc = Document('my_new_document.docx') # Use the file from the previous example
# --- Reading Content ---
print("\n--- Reading Document Content ---")
for paragraph in doc.paragraphs:
    # Check if the paragraph is empty (contains only whitespace)
    if paragraph.text.strip():
        print(f"Paragraph Text: '{paragraph.text}'")
# --- Modifying Content ---
print("\n--- Modifying Document Content ---")
# Find the first paragraph and change its text
if doc.paragraphs:
    first_paragraph = doc.paragraphs[0]
    first_paragraph.text = "This is the new, modified first paragraph."
# Add a new paragraph at the end
doc.add_paragraph("This paragraph was added after opening the document.")
# Find a table and modify its content
if doc.tables:
    table = doc.tables[0]
    # Change the text in the cell at row 1, column 1
    table.cell(1, 1).text = "Modified Cell!"
# Save the modified document
# It's good practice to save with a new name to avoid overwriting the original
doc.save('modified_document.docx')
print("Document 'modified_document.docx' saved successfully!")

Adding Images

You can easily add images to your document.

from docx import Document
from docx.shared import Inches
doc = Document()
doc.add_heading('Adding Images', level=1)
# Add an image, specifying its width
# The height is automatically adjusted to maintain aspect ratio
doc.add_picture('python_logo.png', width=Inches(2.0))
# You can also specify height
# doc.add_picture('python_logo.png', height=Inches(1.0))
doc.save('document_with_image.docx')

(Make sure you have an image file named python_logo.png in the same directory, or provide the correct path.)

Python pythondocx如何操作Word文档?-图3
(图片来源网络,侵删)

Advanced Features: Using Templates

A very powerful use case is using a Word template (.docx file) with "merge fields" and populating it with data from Python.

Step 1: Create the Template in Word

  1. Create a new Word document.
  2. Type your static text: "Dear [Customer Name],"
  3. Go to the Insert tab -> Quick Parts -> Field....
  4. In the Field dialog, choose Merge Field from the "Categories" list.
  5. In the "Field name" box, type customer_name.
  6. Add another line: "Your total order amount is $[order_total]."
  7. Repeat the process, but this time use the field name order_total.
  8. Save this file as template.docx.

Step 2: Populate the Template with Python

python-docx can't directly replace these fields. A common and effective workaround is to use string replacement on the raw XML of the document.

import docx
from docx import Document
import re
def populate_template(template_path, output_path, data):
    """
    Populates a Word template with merge fields using string replacement.
    Note: This is a simple approach and may not work for all complex templates.
    """
    # Open the template document
    doc = Document(template_path)
    # Get the document's XML content
    # doc.element.body is the XML element for the body
    # We convert it to a string to perform replacements
    xml_string = doc.element.body.xml
    # Use regular expressions to find and replace fields like [field_name]
    # The regex r'\[(.*?)\]' finds text inside square brackets
    for key, value in data.items():
        # Replace the field name with the actual value
        # We use re.escape to handle special characters in the key if necessary
        xml_string = re.sub(r'\[' + re.escape(key) + r'\]', str(value), xml_string)
    # Replace the body's XML with the new, modified XML
    # We need to clear existing content first
    for element in doc.element.body:
        doc.element.body.remove(element)
    # Add the new XML back
    # This part is a bit tricky; a more robust way is to use a library like 'lxml'
    # but for simple cases, this works.
    from docx.oxml import parse_xml
    new_body = parse_xml(xml_string)
    doc.element.body.append(new_body)
    # Save the new document
    doc.save(output_path)
# --- Example Usage ---
# Data to be inserted into the template
customer_data = {
    'customer_name': 'John Doe',
    'order_total': '150.75'
}
# Populate the template
populate_template('template.docx', 'populated_invoice.docx', customer_data)
print("Template 'template.docx' has been populated and saved as 'populated_invoice.docx'.")

Summary and Best Practices

Task Key Methods/Properties
Create New Document doc = Document()
Save Document doc.save('filename.docx')
Add Paragraph doc.add_paragraph('text')
Add Heading doc.add_heading('text', level=1)
Modify Paragraph Text paragraph.text = 'new text'
Add Formatting (Bold, Italic) run = paragraph.add_run('text')
run.bold = True
Add Font Style/Color run.font.size = Pt(12)
run.font.color.rgb = ...
Add Table doc.add_table(rows=3, cols=2)
Modify Table Cell table.cell(0, 0).text = 'new text'
Add Image doc.add_picture('image.png', width=Inches(1))
Add Page Break doc.add_page_break()
Read Paragraphs for p in doc.paragraphs: print(p.text)

Best Practices:

  • Use Run for Styling: Always apply formatting (bold, italic, color) to a Run object, not directly to a Paragraph. This allows for multiple styles within a single paragraph.
  • Check for Existence: Before trying to access a paragraph or table, check if the list is empty to avoid IndexError (e.g., if doc.paragraphs:).
  • Save with a New Name: When modifying an existing document, save it with a different name to preserve the original file.
  • For Complex Templates: For production-level template generation, consider more robust solutions like docx-mailmerge (a library built on top of python-docx specifically for this purpose) or generating HTML and converting it to .docx.
分享:
扫描分享到社交APP
上一篇
下一篇