杰瑞科技汇

How Does Python's defaultdict Simplify Handling Default Values in Dictionaries?

This article takes a close look at collections.defaultdict in Python, a highly useful tool that builds on the standard dictionary.

What is a defaultdict?

At its core, a defaultdict is a subclass of Python's built-in dict. It behaves almost exactly like a normal dictionary, but with one key difference:

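If you look up a missing key with square brackets (d[key]), including in an update such as d[key] += 1, a defaultdict will automatically create the key for you. Other operations, such as key in d and d.get(key), do not create it.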

You provide a "factory function" when you create the defaultdict. This function is used to generate the default value for any new key.
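
A minimal sketch of this behavior, using a hypothetical scores dictionary: indexing a missing key with square brackets inserts it, while a membership test and .get() leave the dictionary untouched.

```python
from collections import defaultdict

scores = defaultdict(int)  # int() returns 0, so missing keys default to 0

# Membership tests and .get() do not trigger the factory.
has_alice = 'alice' in scores
got = scores.get('alice')
keys_before = list(scores)

# Indexing with [] calls the factory and stores the result.
value = scores['alice']
keys_after = list(scores)

print(has_alice, got, keys_before)  # False None []
print(value, keys_after)            # 0 ['alice']
```

One consequence worth keeping in mind: merely reading d[key] can grow the dictionary.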


The Problem: Why Do We Need defaultdict?

Imagine you're trying to count the frequency of words in a list.

The "Manual" Way with a Regular dict

words = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
# Create an empty dictionary
word_counts = {}
for word in words:
    if word in word_counts:
        # If the word is already a key, increment its value
        word_counts[word] += 1
    else:
        # If it's a new word, add it with a value of 1
        word_counts[word] = 1
print(word_counts)
# Output: {'apple': 3, 'banana': 2, 'orange': 1}

This works, but it's a bit verbose. We have to constantly check if a key exists using an if statement. This is often called the "lookup and initialize" pattern.

A Slightly Better (but still clunky) Way with .get()

We can make it slightly cleaner using the .get() method, which lets us provide a default value if the key is missing.

words = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
word_counts = {}
for word in words:
    # .get(word, 0) returns the current count if 'word' is a key,
    # otherwise it returns the default value 0.
    word_counts[word] = word_counts.get(word, 0) + 1
print(word_counts)
# Output: {'apple': 3, 'banana': 2, 'orange': 1}

This is better, but it's still not the most elegant or readable solution. This is where defaultdict shines.


The Solution: Using defaultdict

defaultdict eliminates the need for manual checks or the .get() method. It handles the "default value" logic automatically.

The Syntax

You import it from the collections module and provide a "factory function" as the first argument to the constructor. This function is called with no arguments to produce a default value.

from collections import defaultdict
# The most common factory functions are `list`, `int`, and `set`.
# `list()` creates an empty list `[]`
# `int()` creates the integer `0`
# `set()` creates an empty set `set()`
# Example: A defaultdict that defaults to an empty list
my_list_dict = defaultdict(list)
# Example: A defaultdict that defaults to the integer 0
my_int_dict = defaultdict(int)
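
Because the factory is called once per missing key, each key gets its own fresh default object. The short sketch below (the groups name and its keys are made up for illustration) checks that two keys never share one list.

```python
from collections import defaultdict

groups = defaultdict(list)

# Each missing key triggers a separate call to list(),
# so the two keys do not share one list object.
groups['a'].append(1)
groups['b'].append(2)

shares_list = groups['a'] is groups['b']
print(dict(groups))  # {'a': [1], 'b': [2]}
print(shares_list)   # False
```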

Example 1: Counting Words (The int factory)

Let's revisit our word-counting problem.

from collections import defaultdict
words = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
# Create a defaultdict that defaults to int(), which is 0
word_counts = defaultdict(int)
for word in words:
    # No need to check if 'word' exists!
    # If 'word' is not in the dictionary, defaultdict automatically
    # adds it with a value of int() -> 0.
    # Then, it adds 1 to that value.
    word_counts[word] += 1
print(word_counts)
# Output: defaultdict(<class 'int'>, {'apple': 3, 'banana': 2, 'orange': 1})

Notice the output: defaultdict(<class 'int'>, ...). This tells you that the default factory function for this dictionary is int.

Example 2: Grouping Items (The list factory)

This is a classic use case. Imagine you have a list of tuples, where each tuple contains a category and an item. You want to group all items by their category.

from collections import defaultdict
# Data: (category, item)
data = [('fruit', 'apple'), ('color', 'red'), ('fruit', 'banana'), ('color', 'blue'), ('fruit', 'orange')]
# Create a defaultdict that defaults to an empty list
grouped_items = defaultdict(list)
for category, item in data:
    # No need to check if the category key exists!
    # If 'fruit' is not a key, defaultdict creates it with a value of list() -> [].
    # Then, it appends 'apple' to that list.
    grouped_items[category].append(item)
print(grouped_items)
# Output: defaultdict(<class 'list'>, {'fruit': ['apple', 'banana', 'orange'], 'color': ['red', 'blue']})

This is incredibly clean and readable. The code directly expresses the intent: "for each item, append it to the list associated with its category."


Common Factory Functions

The power of defaultdict comes from choosing the right factory function.

| Factory Function | Default Value | Use Case |
| int | 0 | Counting things. (word counts, scores, etc.) |
| list | [] | Grouping items. (categorizing, collecting results) |
| set | set() | Storing unique items. (finding all unique tags, neighbors in a graph) |
| dict | {} | Creating nested dictionaries. (e.g., a 2D grid or a multi-level index) |
| lambda: value | value (custom) | For more complex defaults. The lambda must take no arguments. For example, defaultdict(lambda: 'N/A') would default missing keys to the string 'N/A'. |
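
The dict and lambda rows have no example elsewhere in this article, so here is a small sketch of both. All variable names and sample values (grid, visits, labels) are hypothetical.

```python
from collections import defaultdict

# dict factory: each missing row key gets its own empty dict.
grid = defaultdict(dict)
grid[0][1] = 'x'
grid[2][3] = 'y'

# Nested counters: the outer factory builds an inner defaultdict(int).
visits = defaultdict(lambda: defaultdict(int))
visits['2024-01-01']['home'] += 1
visits['2024-01-01']['home'] += 1
visits['2024-01-02']['about'] += 1

# Constant default: a zero-argument lambda returning a fixed string.
labels = defaultdict(lambda: 'N/A')
labels['x'] = 'known'

nested = {day: dict(pages) for day, pages in visits.items()}
print(dict(grid))                # {0: {1: 'x'}, 2: {3: 'y'}}
print(nested)                    # {'2024-01-01': {'home': 2}, '2024-01-02': {'about': 1}}
print(labels['x'], labels['y'])  # known N/A
```

Note that with a plain dict factory only the first level is automatic: grid[0][1] works on assignment, but reading grid[0][9] would still raise a KeyError from the inner dict.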

Example 3: Using set for Uniqueness

Let's say you want to find all the unique cities for each country.

from collections import defaultdict
data = [('USA', 'New York'), ('Canada', 'Toronto'), ('USA', 'Los Angeles'), ('Canada', 'Vancouver'), ('USA', 'New York')]
# Use a set factory to automatically store only unique cities
cities_by_country = defaultdict(set)
for country, city in data:
    cities_by_country[country].add(city) # Use .add() for sets
print(cities_by_country)
# Output (the order of elements within each set may vary):
# defaultdict(<class 'set'>, {'USA': {'New York', 'Los Angeles'}, 'Canada': {'Toronto', 'Vancouver'}})

How defaultdict Works Internally

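The magic of defaultdict is the __missing__ special method. When you look up a key with d[key] and the key doesn't exist, dict.__getitem__ calls this method on the dictionary object, provided the class defines it. Other methods, such as .get(), never call it.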

defaultdict implements __missing__ like this (simplified):

class defaultdict(dict):
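    def __init__(self, default_factory=None, *args, **kwargs):
        # Any remaining arguments are handled exactly as dict(...) would handle them.
        super().__init__(*args, **kwargs)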
        self.default_factory = default_factory
    def __missing__(self, key):
        # If a default factory is set...
        if self.default_factory is not None:
            # ...create the default value by calling the factory.
            value = self.default_factory()
            # Store this new value in the dictionary for the given key.
            self[key] = value
            # Return the newly created value.
            return value
        # If no default factory is set, it behaves like a normal dict and raises a KeyError.
        raise KeyError(key)
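
Since __missing__ is an ordinary hook, any dict subclass can define it. The sketch below uses a hypothetical KeyAwareDict class (not part of the standard library) whose default depends on the key itself, something a zero-argument factory cannot do.

```python
class KeyAwareDict(dict):
    """Hypothetical dict subclass: the default value is derived from the key."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when `key` is absent.
        value = str(key).upper()
        self[key] = value
        return value


d = KeyAwareDict()
first = d['abc']        # key missing: __missing__ runs and stores 'ABC'
via_get = d.get('xyz')  # .get() bypasses __missing__ and returns None

print(first, via_get, dict(d))  # ABC None {'abc': 'ABC'}
```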

defaultdict vs. dict.setdefault()

A regular dictionary has a method called setdefault(key, default_value). It does something similar, but it can be less efficient inside loops.

  • d.setdefault(k, []):

    1. Checks if key k exists.
    2. If it exists, returns its value.
    3. If it doesn't exist, inserts the key k with the value [] and then returns [].
  • d[k].append(x) with a defaultdict(list):

    1. Checks if key k exists.
    2. If it exists, returns its value (a list).
    3. If it doesn't exist, creates the default value (a new empty list []), inserts the key k with that value, and returns the new list.
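
The two step lists above can be compared by counting how often a default list is actually constructed. This is only a sketch: make_list and the calls counter are hypothetical helpers added for the demonstration.

```python
from collections import defaultdict

calls = {'setdefault': 0, 'defaultdict': 0}

def make_list(label):
    # Hypothetical helper: records each time a default list is built.
    calls[label] += 1
    return []

plain = {}
auto = defaultdict(lambda: make_list('defaultdict'))

for i in range(5):
    # The second argument is evaluated before setdefault runs,
    # so a list is built on every iteration.
    plain.setdefault('k', make_list('setdefault')).append(i)
    # The factory runs only on the first iteration, when 'k' is missing.
    auto['k'].append(i)

print(calls)                    # {'setdefault': 5, 'defaultdict': 1}
print(plain['k'] == auto['k'])  # True
```

Both dictionaries end up with the same contents; the only difference is how many throwaway lists were created along the way.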

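The crucial difference is when the default value gets built. setdefault leaves an existing key untouched, but its default argument is evaluated on every call: d.setdefault(k, []) constructs a new empty list each time, even when k already exists and the list is immediately discarded. A defaultdict calls its factory only when the key is missing. This makes setdefault less efficient in loops where keys repeat.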

Performance Comparison:

import timeit
# Both snippets group 1000 values under 10 repeating keys.
# The loop itself is the timed statement; only the import goes in setup.
setdefault_stmt = """
d = {}
for i in range(1000):
    d.setdefault(f'key_{i % 10}', []).append(i)
"""
defaultdict_stmt = """
d = defaultdict(list)
for i in range(1000):
    d[f'key_{i % 10}'].append(i)  # This is the defaultdict way
"""
# Time the setdefault version
time_taken = timeit.timeit(stmt=setdefault_stmt, number=10000)
print(f"Time with setdefault: {time_taken:.4f} seconds")
# Time the defaultdict version
time_taken_dd = timeit.timeit(stmt=defaultdict_stmt, setup="from collections import defaultdict", number=10000)
print(f"Time with defaultdict: {time_taken_dd:.4f} seconds")

When keys repeat, as they do here, the defaultdict version is usually faster because its factory runs only when a key is actually missing, while setdefault builds a new list on every call. The exact numbers depend on your machine and Python version.


Key Takeaways and Summary

| Feature | Description |
| What it is | A subclass of dict that provides a default value for missing keys. |
| How it works | You pass a "factory function" (like int, list, set) to its constructor. |
| Key Advantage | Eliminates verbose if key in dict: checks and makes code cleaner, more readable, and often more efficient. |
| Common Use Cases | Counting, grouping, categorizing, and building nested data structures. |
| Important Note | With a factory set, d[key] never raises a KeyError for a missing key; it creates the key instead, so even a read can grow the dictionary. key in d and d.get(key) do not create keys, and a defaultdict without a factory (default_factory is None) raises KeyError like a normal dict. |
| Conversion | If you need a regular dict from a defaultdict (e.g., to remove the defaultdict "wrapper"), you can simply call dict(my_defaultdict). |
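
As a rough illustration of the last two rows, the sketch below converts a hypothetical counts dictionary to a plain dict and then switches off the default behavior by setting the default_factory attribute to None.

```python
from collections import defaultdict

counts = defaultdict(int)
for ch in 'abca':
    counts[ch] += 1

# dict() copies the items into a plain dictionary.
plain = dict(counts)

# Setting default_factory to None makes missing keys raise KeyError again.
counts.default_factory = None
try:
    counts['z']
    raised = False
except KeyError:
    raised = True

print(plain)          # {'a': 2, 'b': 1, 'c': 1}
print(raised)         # True
print('z' in counts)  # False
```

Either approach is a reasonable way to "freeze" the data once it has been built, so that later typos in key names fail loudly instead of silently adding entries.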

In short, defaultdict is your go-to tool for any dictionary-based task where you need to handle missing keys gracefully and efficiently.
