杰瑞科技汇

python word execl

Of course! It sounds like you're looking for a way to use Python to work with Word (.docx) and Excel (.xlsx) files. This is a very common and powerful task for automating document and data processing.

Here's a comprehensive guide covering the most popular and effective Python libraries for this purpose.


Working with Excel Files (.xlsx, .xls)

For Excel, the go-to library is pandas. It's the industry standard for data manipulation in Python. For more advanced tasks like writing formulas or manipulating cell styles, you can use openpyxl.

Method 1: The pandas Library (Recommended for most tasks)

pandas is perfect for reading data into a DataFrame (a powerful table-like data structure), analyzing it, and then writing it back to an Excel file.

Installation:

pip install pandas openpyxl

(Note: openpyxl is required as an engine by pandas to write .xlsx files.)

Common Tasks:

a) Reading an Excel File

import pandas as pd
# Read the first sheet of an Excel file into a DataFrame
df = pd.read_excel('data.xlsx')
# Read a specific sheet by name
df_sales = pd.read_excel('data.xlsx', sheet_name='Sales')
# Read a specific sheet by index (0 is the first sheet)
df_customers = pd.read_excel('data.xlsx', sheet_name=0)
print(df.head()) # Display the first 5 rows

b) Writing to an Excel File

import pandas as pd
# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df_to_write = pd.DataFrame(data)
# Write the DataFrame to an Excel file
# This will create a new file or overwrite an existing one
df_to_write.to_excel('output.xlsx', index=False) # index=False prevents writing the DataFrame index
print("Excel file 'output.xlsx' created successfully.")

c) Appending Data to an Existing Sheet This is a bit more complex as pandas doesn't have a direct "append" mode. The best way is to read the existing data, combine it with your new data, and write it all back.

import pandas as pd
# New data to append
new_data = {'Name': ['David'], 'Age': [40], 'City': ['Houston']}
new_df = pd.DataFrame(new_data)
try:
    # Try to read the existing file
    existing_df = pd.read_excel('output.xlsx')
    # Combine the old and new DataFrames
    final_df = pd.concat([existing_df, new_df], ignore_index=True)
except FileNotFoundError:
    # If the file doesn't exist, just use the new data
    final_df = new_df
# Write the combined data back to the file
final_df.to_excel('output.xlsx', index=False)
print("Data appended to 'output.xlsx'.")

Method 2: The openpyxl Library

Use openpyxl when you need more control over the Excel file itself, such as:

  • Writing Excel formulas (=SUM(A1:A10)).
  • Merging cells, changing cell colors, or adjusting fonts.
  • Manipulating charts and images.
  • Reading and modifying existing files without losing formatting.

Installation:

pip install openpyxl

Common Tasks:

a) Reading Data from Specific Cells

from openpyxl import load_workbook
# Load an existing workbook
wb = load_workbook('data.xlsx')
ws = wb.active  # Get the active sheet
# Get the value from a specific cell
cell_value = ws['A1'].value
print(f"Value in cell A1: {cell_value}")
# Get all values from a column
column_values = []
for row in ws.iter_rows(min_row=2, max_row=10, min_col=1, max_col=1):
    for cell in row:
        column_values.append(cell.value)
print(f"Values in column A (rows 2-10): {column_values}")
wb.close() # Close the workbook

b) Writing Data and Formatting

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill
# Create a new workbook
wb = Workbook()
ws = wb.active= "Report"
# Write headers with bold font
headers = ['Product', 'Sales', 'Region']
ws.append(headers)
for cell in ws[1]:
    cell.font = Font(bold=True)
# Write some data
ws.append(['Laptop', 1500, 'North'])
ws.append(['Mouse', 25, 'South'])
# Format a specific cell (e.g., make 'Sales' for Laptop green)
ws['B2'].fill = PatternFill(start_color="90EE90", end_color="90EE90", fill_type="solid")
# Save the workbook
wb.save('formatted_report.xlsx')
print("Formatted Excel file 'formatted_report.xlsx' created.")

Working with Word Documents (.docx)

For Word documents, the standard library is python-docx. It allows you to create, read, and modify .docx files.

Installation:

pip install python-docx

Important Note: python-docx manipulates the text and structure of a document. It does not handle the visual layout (like exact positioning of images or text boxes) in the same way you would in the Word GUI.

Common Tasks:

a) Creating a New Word Document

from docx import Document
from docx.shared import Pt, Inches
from docx.enum.text import WD_ALIGN_PARAGRAPH
# Create a new Document object
doc = Document()
# Add a heading
doc.add_heading('Monthly Sales Report', level=1)
# Add a paragraph
doc.add_paragraph('This report summarizes the sales performance for the month of October.')
# Add a table
table = doc.add_table(rows=3, cols=3)
table.cell(0, 0).text = 'Product'
table.cell(0, 1).text = 'Units Sold'
table.cell(0, 2).text = 'Revenue'
# Populate the table
table.cell(1, 0).text = 'Laptop'
table.cell(1, 1).text = '50'
table.cell(1, 2).text = '$75,000'
table.cell(2, 0).text = 'Mouse'
table.cell(2, 1).text = '200'
table.cell(2, 2).text = '$5,000'
# Add a paragraph and align it to the right
p = doc.add_paragraph('End of Report.')
p.alignment = WD_ALIGN_PARAGRAPH.RIGHT
# Save the document
doc.save('sales_report.docx')
print("Word document 'sales_report.docx' created.")

b) Reading and Modifying an Existing Document

from docx import Document
# Open an existing document
doc = Document('sales_report.docx')
# --- Reading Content ---
print("--- Reading Document ---")
# Iterate through all paragraphs and print their text
for para in doc.paragraphs:
    print(para.text)
# Iterate through all tables
print("\n--- Reading Tables ---")
for i, table in enumerate(doc.tables):
    print(f"Table {i+1}:")
    for row in table.rows:
        for cell in row.cells:
            print(cell.text, end='\t')
        print() # New line after each row
# --- Modifying Content ---
# Find a paragraph and change its text
for para in doc.paragraphs:
    if 'October' in para.text:
        para.text = 'This report summarizes the sales performance for the month of November.'
        # Add a bold run to the end of the paragraph
        run = para.add_run(' (Revised)')
        run.bold = True
        break
# Add a new paragraph at the end
doc.add_paragraph('This document was automatically generated by Python.')
# Save the modified document (it's good practice to save with a new name)
doc.save('sales_report_modified.docx')
print("\nModified document saved as 'sales_report_modified.docx'")

Summary: Which Library to Use?

Task Excel (.xlsx) Word (.docx) Best For...
Data Analysis pandas N/A Reading large datasets, data cleaning, statistical analysis.
Simple Read/Write pandas python-docx Quick tasks of getting data in and out of files.
Complex Formatting openpyxl python-docx Changing cell styles, adding formulas, modifying document structure (headers, footers).
Appending Data pandas (read, concat, write) N/A Adding new rows to an existing sheet without losing data.
Automating Reports pandas + openpyxl python-docx Creating dynamic reports by pulling data from a database (using pandas) and formatting it into a Word/Excel document.

Putting It All Together: A Full Example

Let's create a script that reads data from an Excel file, performs a simple calculation, and generates a Word report.

Create the input Excel file (sales_data.xlsx)

Product Q1_Sales Q2_Sales
Product A 10000 12000
Product B 15000 18000
Product C 8000 9500

The Python Script (generate_report.py)

import pandas as pd
from docx import Document
from docx.shared import Pt, Inches
from docx.enum.text import WD_ALIGN_PARAGRAPH
# --- Step 1: Read data from Excel ---
try:
    df = pd.read_excel('sales_data.xlsx')
    print("Successfully read data from Excel.")
except FileNotFoundError:
    print("Error: 'sales_data.xlsx' not found. Please create it first.")
    exit()
# --- Step 2: Process the data ---
# Calculate total sales for each product
df['Total_Sales'] = df['Q1_Sales'] + df['Q2_Sales']
# Calculate overall total
grand_total = df['Total_Sales'].sum()
# --- Step 3: Generate the Word report ---
doc = Document()
doc.add_heading('Sales Performance Report', 0)
# Add an introduction
doc.add_paragraph('This report details the sales performance for Q1 and Q2 of the fiscal year.')
# Add the data table to the Word document
# We need to convert the DataFrame to a list of lists for the table
table_data = df.values.tolist()
# Add headers
table_headers = list(df.columns)
# Create the table in Word
table = doc.add_table(rows=1, cols=len(table_headers))
hdr_cells = table.rows[0].cells
for i, header in enumerate(table_headers):
    hdr_cells[i].text = str(header)
# Add the data rows
for row_data in table_data:
    row_cells = table.add_row().cells
    for i, cell_data in enumerate(row_data):
        row_cells[i].text = str(cell_data)
# Add a summary paragraph
summary_para = doc.add_paragraph()
summary_para.add_run('Summary: ').bold = True
summary_para.add_run(f'The grand total sales for all products is ${grand_total:,}.')
# Save the final report
doc.save('final_sales_report.docx')
print("\nReport generated successfully: 'final_sales_report.docx'")

When you run this script, it will read sales_data.xlsx, calculate the totals, and create a professional-looking final_sales_report.docx.

分享:
扫描分享到社交APP
上一篇
下一篇