Of course! Handling Unicode in Python with JSON is a fundamental task, especially since JSON is a text-based format. Here's a comprehensive guide covering the essentials, from basic usage to common pitfalls and solutions.

The Short Answer: What You Need to Know
- Python 3: By default, json.loads() (string to Python) and json.dumps() (Python to string) handle Unicode correctly. You'll get Python str objects, not bytes.
- Python 2: This is where the main Unicode challenges arise. You must keep track of whether you are working with str (byte string) or unicode objects and encode/decode explicitly, or your string values end up as escaped \uXXXX sequences or trigger encoding errors.
- Non-ASCII Characters: The json.dumps() function will automatically escape non-ASCII characters (like é or ü) by default. To preserve them as actual characters in the output string, use ensure_ascii=False (see the quick round trip below).
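Here is a quick Python 3 round trip illustrating all three points:
import json
# Parse a JSON string containing non-ASCII text
data = json.loads('{"name": "José"}')
print(type(data['name'])) # Output: <class 'str'>
# Serialize it back: escaped by default, readable with ensure_ascii=False
print(json.dumps(data))
# Output: {"name": "Jos\u00e9"}
print(json.dumps(data, ensure_ascii=False))
# Output: {"name": "José"}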
Detailed Breakdown
Let's dive into the specifics for Python 3 and Python 2.
Python 3 (The Modern, Easy Way)
Python 3's json module is Unicode-aware by default. It's designed to work with text, not bytes.
json.loads() (Decoding JSON)
This function takes a JSON string and converts it into a Python object.
import json
# A JSON string with Unicode characters
json_string = '{"name": "José", "city": "München", "id": 123}'
# Load the JSON string into a Python dictionary
python_dict = json.loads(json_string)
print(python_dict)
# Output: {'name': 'José', 'city': 'München', 'id': 123}
# Check the type of the values
print(type(python_dict['name'])) # Output: <class 'str'>
print(type(python_dict['city'])) # Output: <class 'str'>
As you can see, the values are standard Python str objects, correctly representing the Unicode characters.
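Note that since Python 3.6, json.loads() also accepts bytes or bytearray objects (encoded in UTF-8, UTF-16, or UTF-32), so raw data such as an HTTP response body can be parsed without decoding it first:
import json
# Raw UTF-8 bytes, e.g. received over the network
json_bytes = '{"name": "José"}'.encode('utf-8')
print(json.loads(json_bytes))
# Output: {'name': 'José'}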

json.dumps() (Encoding JSON)
This function takes a Python object and converts it into a JSON string.
Default Behavior (ensure_ascii=True)
By default, non-ASCII characters are escaped to their \uXXXX representation. This ensures the resulting string is pure ASCII, which is valid JSON.
import json
python_dict = {'name': 'José', 'city': 'München'}
# Default behavior: escapes non-ASCII characters
json_string_default = json.dumps(python_dict)
print(json_string_default)
# Output: {"name": "Jos\u00e9", "city": "M\u00fcnchen"}
Preserving Characters (ensure_ascii=False)

If you want the JSON string to contain the actual Unicode characters (e.g., for writing to a UTF-8 encoded file), set ensure_ascii=False.
import json
python_dict = {'name': 'José', 'city': 'München'}
# Preserve non-ASCII characters
json_string_unicode = json.dumps(python_dict, ensure_ascii=False)
print(json_string_unicode)
# Output: {"name": "José", "city": "München"}
Important Note on ensure_ascii=False and Files:
In Python 3, json.dumps() always returns a str, but with ensure_ascii=False that string contains the actual non-ASCII characters. When writing it to a file, open the file with an encoding that can represent them (UTF-8), or encode the string to bytes yourself.
import json
python_dict = {'name': 'José', 'city': 'München'}
# Get the Unicode string
json_string_unicode = json.dumps(python_dict, ensure_ascii=False)
print(f"Type of json_string_unicode: {type(json_string_unicode)}")
# Output: Type of json_string_unicode: <class 'str'>
# Write to a file with UTF-8 encoding
with open('data.json', 'w', encoding='utf-8') as f:
    # The file.write() method expects a string, which is what we have.
    # The encoding='utf-8' argument tells Python how to encode that string on disk.
    f.write(json_string_unicode)
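Alternatively, you can do the UTF-8 encoding yourself and write the resulting bytes to a file opened in binary mode; the bytes on disk are identical:
import json
python_dict = {'name': 'José', 'city': 'München'}
json_string_unicode = json.dumps(python_dict, ensure_ascii=False)
# Encode the string to UTF-8 bytes and write them in binary mode
with open('data.json', 'wb') as f:
    f.write(json_string_unicode.encode('utf-8'))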
Python 2 (The Tricky, Legacy Way)
In Python 2, str and unicode are different types, and you have to manage the encoding yourself.
json.loads() (Decoding JSON)
In Python 2, json.loads() can accept either a str (byte string) or a unicode object.
- If you pass a str, it's assumed to be encoded in UTF-8 (the standard for JSON) and will be decoded into unicode objects.
- If you pass a unicode object, it's used directly.
# Python 2
import json
# Case 1: Input is a byte string (str)
json_str = '{"name": "Jos\\u00e9", "city": "M\\u00fcnchen"}'
python_dict_from_str = json.loads(json_str)
print(python_dict_from_str)
# Output: {u'name': u'Jos\xe9', u'city': u'M\xfcnchen'}
print(type(python_dict_from_str['name'])) # Output: <type 'unicode'>
# Case 2: Input is a unicode string
json_unicode = u'{"name": "Jos\u00e9", "city": "M\u00fcnchen"}'
python_dict_from_unicode = json.loads(json_unicode)
print(python_dict_from_unicode)
# Output: {u'name': u'Jos\xe9', u'city': u'M\xfcnchen'}
print(type(python_dict_from_unicode['name'])) # Output: <type 'unicode'>
The key takeaway for Python 2 is that json.loads() consistently produces unicode objects for string values.
json.dumps() (Encoding JSON)
This is where it gets tricky. The default behavior often leads to unwanted escape sequences.
Default Behavior (ensure_ascii=True)
This is the default. It produces a str (byte string) where all non-ASCII characters are escaped.
# Python 2
import json
python_unicode_dict = {u'name': u'José', u'city': u'München'}
# Default behavior: produces a str with escaped characters
json_str_default = json.dumps(python_unicode_dict)
print(json_str_default)
# Output: {"name": "Jos\u00e9", "city": "M\u00fcnchen"}
print(type(json_str_default)) # Output: <type 'str'>
Preserving Characters (ensure_ascii=False)
This produces a unicode string with the actual characters (note: if the data happens to be ASCII-only, Python 2 may return a plain str instead).
# Python 2
import json
python_unicode_dict = {u'name': u'José', u'city': u'München'}
# Produce a unicode string with actual characters
json_unicode_str = json.dumps(python_unicode_dict, ensure_ascii=False)
print(json_unicode_str)
# Output: {u"name": u"Jos\xe9", u"city": u"M\xfcnchen"}
print(type(json_unicode_str)) # Output: <type 'unicode'>
Writing to a File in Python 2
If you have a unicode string from json.dumps(..., ensure_ascii=False) and want to write it to a file, you must encode it to a byte string first.
# Python 2
import json
python_unicode_dict = {u'name': u'José', u'city': u'München'}
json_unicode_str = json.dumps(python_unicode_dict, ensure_ascii=False)
# Encode the unicode string to a byte string before writing
with open('data_py2.json', 'w') as f:
    f.write(json_unicode_str.encode('utf-8'))
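An alternative in Python 2 is io.open() (available since Python 2.6), which accepts an encoding argument like Python 3's built-in open() and lets you write the unicode string directly:
# Python 2
import io
import json
python_unicode_dict = {u'name': u'José', u'city': u'München'}
# ensure_ascii=False returns a unicode object here because the data contains non-ASCII characters
json_unicode_str = json.dumps(python_unicode_dict, ensure_ascii=False)
with io.open('data_py2.json', 'w', encoding='utf-8') as f:
    f.write(json_unicode_str)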
Common Pitfalls and Solutions
| Pitfall | Cause | Solution |
|---|---|---|
| Getting \uXXXX escapes instead of characters | Using json.dumps() with the default ensure_ascii=True. | Set ensure_ascii=False in json.dumps(). |
| UnicodeEncodeError when writing to a file | In Python 3, writing a str containing non-ASCII characters to a file opened with an encoding that can't represent them (e.g., the platform default); in Python 2, writing a unicode string to a text-mode file without encoding it first. | Python 3: open the file in text mode with an explicit encoding: open(..., 'w', encoding='utf-8'). Python 2: encode the unicode string to bytes before writing: my_unicode_str.encode('utf-8'). |
| TypeError or ValueError with json.loads() | Passing a Python dict to json.loads() instead of a string, or passing a malformed JSON string. | Make sure the input to json.loads() is a valid JSON-formatted string (see the example below). |
| Data corruption on disk | Reading a JSON file with the wrong encoding (e.g., reading UTF-8 as Latin-1). | Always specify the correct encoding when opening files. For JSON, this is almost always utf-8. |
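For the parsing errors in the table, here is a minimal Python 3 sketch of defensive parsing. json.JSONDecodeError is a subclass of ValueError (in Python 2 you would catch ValueError directly); the malformed string is just an illustration.
import json
raw = '{"name": "José", "city": München}' # malformed: München is not quoted
try:
    data = json.loads(raw)
except json.JSONDecodeError as exc:
    print("Invalid JSON: {}".format(exc))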
Best Practices Summary
- Use Python 3. It handles Unicode much more intuitively.
- Be Explicit with File Encodings: Always use encoding='utf-8' when opening files for JSON I/O. This prevents subtle bugs.
  # Best practice for reading
  with open('data.json', 'r', encoding='utf-8') as f:
      data = json.load(f)  # Use load() for file objects
  # Best practice for writing
  with open('data.json', 'w', encoding='utf-8') as f:
      json.dump(data, f, ensure_ascii=False)  # Use dump() for file objects
- Use ensure_ascii=False for Readability: If your JSON will be read by humans (e.g., in a config file or API response), ensure_ascii=False makes it much more readable.
- Validate Your JSON: If you're dealing with untrusted data, consider using a library like jsonschema to validate the structure and types of your JSON after parsing (see the sketch after this list).
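Here is a minimal sketch of such post-parse validation. It assumes the third-party jsonschema package is installed (pip install jsonschema), and the schema itself is only an illustration:
import json
from jsonschema import validate, ValidationError
# A hypothetical schema describing the expected shape of the data
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "city": {"type": "string"},
    },
    "required": ["name", "city"],
}
data = json.loads('{"name": "José", "city": "München"}')
try:
    validate(instance=data, schema=schema)
except ValidationError as exc:
    print("JSON did not match the schema: {}".format(exc.message))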
