杰瑞科技汇

How Do You Set UTF-8 Encoding for Python on Windows?

Encoding problems are a common and important topic for Python developers on Windows. This guide covers the problem, the solutions, and best practices.

The Core Problem: Windows's Default Encoding

Historically, Windows has used different default encodings for the console (terminal) and for reading/writing files.

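  1. Console (Terminal): The default code page for the Windows Command Prompt (cmd.exe) is often cp437 (or another legacy code page), not UTF-8. Since Python 3.6, print() writes to an interactive console through the Unicode API, but older versions, and any Python whose output is redirected to a file or a pipe, still encode text with a legacy code page. A non-ASCII character (such as an accented letter or a CJK character) can then show up as a placeholder like ?, appear as a different, incorrect character, or raise UnicodeEncodeError.

  2. File I/O: Without an explicit encoding, open() on Windows uses the system's ANSI code page (for example cp1252 on Western European systems), not UTF-8. It defaults to UTF-8 only when Python's UTF-8 mode is enabled (available since Python 3.7). Files produced by other tools are frequently UTF-8, and this mismatch is the root of most "mojibake" (garbled text) issues.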

The goal is to make everything consistently use UTF-8.
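
Before changing anything, it can help to see which encodings the current interpreter has actually picked. The following is a minimal sketch using only the standard library; report_encodings is a hypothetical helper name, not something defined by Python or by this guide.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python is currently using (hypothetical helper)."""
    return {
        # Encoding used by print(); may be None if stdout was replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Locale-based default that open() relies on when no encoding is given.
        "open_default": locale.getpreferredencoding(False),
        # Encoding used to convert file names to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
        # 1 when UTF-8 mode is active (PYTHONUTF8=1 or -X utf8), else 0.
        "utf8_mode": sys.flags.utf8_mode,
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(f"{name}: {value}")
```

Running it once from an interactive console and once with the output redirected to a file is a quick way to see whether the two cases differ on your machine.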

Solution 1: For Console Output (Making print() work)

You need to tell Python to use UTF-8 for its standard streams regardless of the console's code page. The most convenient way to do this is to enable Python's UTF-8 mode (Python 3.7+) by setting the PYTHONUTF8 environment variable.

Method A: The Recommended Way (Set Environment Variable)

This is the most robust solution, as it applies to every Python script started after the variable is set.

  1. Open the "Edit the system environment variables" window.

    • Press Win + R, type sysdm.cpl, and press Enter.
    • Go to the "Advanced" tab and click "Environment Variables...".
  2. Add a new system variable.

    • In the "System variables" section, click "New...".
    • Variable name: PYTHONUTF8
    • Variable value: 1
    • Click OK on all windows to save.
  3. Restart your terminal. This is crucial. The new environment variable will only be available in new terminal sessions.

Now, when you run your Python script, it will automatically use UTF-8 for its standard streams and as the default encoding for open().

Test it:

# test_utf8.py
print("Hello, world!")          # ASCII works fine
print("Café")                   # Common accented character
print("♥ Python ♥")             # Symbols
print("中文")                    # Chinese characters
print("こんにちは")               # Japanese characters

Running this in a restarted terminal should display all characters correctly.
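
To confirm that the variable really switches the interpreter into UTF-8 mode, one option is to start a child Python with PYTHONUTF8=1 and read back sys.flags.utf8_mode. This is only a sketch: utf8_mode_with_env is a hypothetical helper, and it sets the variable for the child process alone rather than system-wide.

```python
import os
import subprocess
import sys


def utf8_mode_with_env():
    """Start a child Python with PYTHONUTF8=1 and return its utf8_mode flag."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, PYTHONUTF8="1")
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("UTF-8 mode in child process:", utf8_mode_with_env())
```

Checking sys.flags.utf8_mode directly in your own script is the equivalent test once the system variable is in place.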

Method B: The Code-Based Way (If you can't set env vars)

If you cannot set environment variables, you can force UTF-8 at the beginning of your script using sys.stdout.reconfigure() (available since Python 3.7).

import sys
# Force stdout to use UTF-8 encoding
sys.stdout.reconfigure(encoding='utf-8')
print("Hello, world!")
print("Café")
print("♥ Python ♥")
print("中文")
print("こんにちは")

This works, but you have to remember to add it to every script. It's less convenient than setting the environment variable.
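
If the same call has to go into several scripts, it can be wrapped in a small function that also covers sys.stderr and tolerates streams without a reconfigure() method (for example when stdout has been replaced by a custom object). The helper name force_utf8 and the choice of the "backslashreplace" error handler are assumptions made for this sketch.

```python
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure()."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Not a standard text stream (e.g. replaced by a test harness or IDE).
        return False
    # backslashreplace keeps unencodable data visible instead of raising.
    reconfigure(encoding="utf-8", errors="backslashreplace")
    return True


if __name__ == "__main__":
    for std_stream in (sys.stdout, sys.stderr):
        force_utf8(std_stream)
    print("Café ♥ 中文")
```

The function returns a boolean so that a caller can log or ignore the case where the stream could not be switched.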


Solution 2: For File I/O (Reading and Writing Files)

This is much simpler: pass the encoding explicitly and the platform default no longer matters.

For Writing Files (.write())

Always explicitly specify encoding='utf-8' when opening a file for writing. This is a best practice on all operating systems.

# Good: Explicit and portable
with open("my_file.txt", "w", encoding="utf-8") as f:
    f.write("This will be saved as UTF-8.\n")
    f.write("Café and ♥ are no problem.\n")

For Reading Files (.read())

Similarly, always specify encoding='utf-8' when opening a file for reading.

# Good: Explicit and portable
with open("my_file.txt", "r", encoding="utf-8") as f:
    content = f.read()
    print(content)

What if you don't specify it?

  • Without UTF-8 mode: open() falls back to the system's ANSI code page, such as cp1252. If you try to read a UTF-8 file without specifying the encoding, you will get UnicodeDecodeError or mojibake.
  • With UTF-8 mode enabled (PYTHONUTF8=1 or -X utf8, Python 3.7+): The default is utf-8, so you might get away with not specifying it. However, it is still a critical best practice to always be explicit, so that your code is portable and doesn't break on machines where the mode is off.
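
The effect of a wrong default can be reproduced on any operating system by reading a UTF-8 file with a legacy code page on purpose. In this sketch, demo_mismatch and the file name sample.txt are hypothetical, and cp1252 stands in for whatever code page a given Windows system would fall back to.

```python
import os
import tempfile


def demo_mismatch(text="Café"):
    """Write text as UTF-8, then read it back as UTF-8 and as cp1252."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            right = f.read()
        # Simulates an open() call that silently used a legacy code page.
        with open(path, "r", encoding="cp1252") as f:
            wrong = f.read()
    return right, wrong


if __name__ == "__main__":
    right, wrong = demo_mismatch()
    # ascii() keeps the output printable on any console.
    print(ascii(right), ascii(wrong))
```

The second value is the typical mojibake pattern: each multi-byte UTF-8 character is split into two or more unrelated characters.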

Solution 3: For the Windows Terminal (PowerShell & WSL)

The built-in Command Prompt (cmd.exe) window is old and limited. For a much better experience, use Windows Terminal and switch the shell's code page to UTF-8.

  1. Get Windows Terminal: Install it from the Microsoft Store.
  2. Open Your Profile: Click the dropdown arrow (▼) on the tab bar and start the profile you use (e.g., "PowerShell" or "Command Prompt").
  3. Switch the Code Page: Run chcp 65001 in the shell. Code page 65001 is UTF-8, and the command works in both Command Prompt and PowerShell.
  4. Repeat per Session: The code page belongs to the console session rather than to the terminal's appearance settings, so the command has to be run again in each new tab (or added to your shell's startup script).

This code page, combined with the PYTHONUTF8=1 environment variable, gives you a seamless UTF-8 experience.
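
A script can also check which output code page the console is actually using, which is a useful sanity check after changing terminal settings. The sketch below calls the Win32 function GetConsoleOutputCP through ctypes; console_output_code_page is a hypothetical helper, and it returns None on other platforms or when no console is attached.

```python
import ctypes
import sys


def console_output_code_page():
    """Return the console's output code page, or None when unavailable."""
    if sys.platform != "win32":
        return None  # The Win32 console API only exists on Windows.
    code_page = ctypes.windll.kernel32.GetConsoleOutputCP()
    # The API returns 0 on failure, e.g. when no console is attached.
    return code_page or None


if __name__ == "__main__":
    cp = console_output_code_page()
    if cp is None:
        print("No Windows console detected.")
    elif cp == 65001:
        print("Console code page is UTF-8 (65001).")
    else:
        print(f"Console code page is {cp}, not UTF-8.")
```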


Solution 4: The Command-Line Flag (Python 3.7+)

Python 3.7 introduced "UTF-8 mode" (PEP 540) to address this problem. It is the same mode that the PYTHONUTF8=1 environment variable turns on.

If you are using Python 3.7 or later, you can enable UTF-8 mode for a single run by starting Python with the -X utf8 command-line flag.

# This tells Python to use UTF-8 for stdin/stdout/stderr and as the default for open()
python -X utf8 your_script.py

In this mode, Python will:

  • Use UTF-8 for sys.stdin, sys.stdout, and sys.stderr.
  • Use the "surrogateescape" error handler on stdin/stdout (stderr keeps "backslashreplace").
  • Use UTF-8 as the default encoding for open() (though explicit is still better!).

This is the most direct way to handle it for a single run, without changing any system settings.
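
One way to see what the flag changes is to run a child interpreter with and without it and compare the reported encodings. This is a sketch under stated assumptions: probe and CHILD_CODE are hypothetical names, and the values for the default run depend on the machine's locale and environment.

```python
import subprocess
import sys

# Code executed in the child: prints the UTF-8 mode flag, the locale-based
# default that open() relies on, and the encoding of the child's stdout.
CHILD_CODE = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, "
    "locale.getpreferredencoding(False).lower(), "
    "sys.stdout.encoding.lower())"
)


def probe(extra_args=()):
    """Run a child interpreter and return (utf8_mode, open default, stdout encoding)."""
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    mode, open_default, stdout_encoding = result.stdout.split()
    return int(mode), open_default, stdout_encoding


if __name__ == "__main__":
    print("default :", probe())
    print("-X utf8 :", probe(["-X", "utf8"]))
```

With the flag, the first value should be 1 and the open() default should be utf-8; the stdout entry can still differ if PYTHONIOENCODING is set, since that variable takes precedence for the standard streams.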


Summary and Best Practices

Scenario | Problem | Best Solution
Console Output (print) | cmd.exe uses a legacy code page such as cp437, not UTF-8. | Set the PYTHONUTF8=1 environment variable, then restart the terminal.
File Reading/Writing | Inconsistent default encodings. | Always be explicit: open("file.txt", "r", encoding="utf-8") and open("file.txt", "w", encoding="utf-8").
Using a Modern Terminal | The old cmd.exe window is limited. | Use Windows Terminal and switch the code page to UTF-8 with chcp 65001.
Single Run (Python 3.7+) | Want a command-line fix. | Use python -X utf8 your_script.py.

Quick Checklist for a UTF-8 Friendly Windows Setup

  1. Install Windows Terminal from the Microsoft Store.
  2. Set the system environment variable PYTHONUTF8 to 1.
  3. Restart Windows Terminal so the new variable is picked up.
  4. In each new shell session, switch the code page to UTF-8 with chcp 65001.
  5. In your Python code, always specify encoding='utf-8' when opening files.