Python itertools: How Do You Use Its Efficient Iteration Tools?

Let's dive deep into Python's itertools module.

What is itertools?

The itertools module is part of Python's standard library. It provides a collection of fast, memory-efficient tools for working with iterators.

An iterable is any object in Python that can be used in a for loop, such as a list, a string, or a file. An iterator is the object that does the actual stepping: it implements the __iter__() and __next__() methods and produces one value per call to next(). The key advantage of iterators is that they allow you to loop over potentially huge sequences of data without loading the entire sequence into memory all at once.
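
To make the distinction between an iterable and an iterator concrete, here is a minimal sketch using only the built-in iter() and next(). The variable names are illustrative.

```python
# A list is an iterable: iter() hands back a fresh iterator over it.
letters = ['a', 'b', 'c']
it = iter(letters)

print(next(it))  # a
print(next(it))  # b
print(next(it))  # c

# Once exhausted, the iterator raises StopIteration. A for loop catches
# this internally to know when to stop.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True

# An iterator returns itself from iter(); a list does not.
print(iter(it) is it)            # True
print(iter(letters) is letters)  # False
```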

itertools helps you create complex iterators by combining simple ones. It's a cornerstone of functional programming in Python and is incredibly useful for data processing, especially in data science and scripting.


Why Use itertools?

  1. Memory Efficiency: Values are produced one at a time, on demand, rather than all at once. This suits large or even infinite data streams.
  2. Performance: In CPython the functions are implemented in C, which usually makes them faster than equivalent pure-Python loops.
  3. Readability and Conciseness: It allows you to express complex iteration logic in a clear, declarative way, often replacing verbose loops.
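
The memory point can be illustrated with a rough sketch. It assumes CPython and uses sys.getsizeof, which measures only the container object itself, so treat the numbers as an indication rather than a precise benchmark.

```python
import itertools
import sys

n = 100_000

# A list stores a reference to every element up front.
squares_list = [x * x for x in range(n)]

# A lazy pipeline stores only its current state.
squares_lazy = map(lambda x: x * x, itertools.islice(itertools.count(), n))

list_size = sys.getsizeof(squares_list)
lazy_size = sys.getsizeof(squares_lazy)
print(list_size > lazy_size)  # True

# Both yield the same values; the lazy version computes them on demand.
same_total = sum(squares_lazy) == sum(squares_list)
print(same_total)  # True
```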

The Three Main Categories of itertools

The functions in itertools can be broadly grouped into three categories:

  1. Infinite Iterators: Create an infinite sequence of values.
  2. Finite Iterators: Consume one or more input iterables and produce a single output iterable.
  3. Combinatoric Iterators: Generate permutations, combinations, and Cartesian products.

Let's explore each with examples.


Infinite Iterators

These iterators keep going forever. You typically use them with functions like next() or wrap them in a takewhile() or islice() to stop them.
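
As a small sketch of the two stopping strategies mentioned above, islice() cuts an infinite iterator off after a fixed number of items, while takewhile() cuts it off when a condition first fails.

```python
import itertools

# islice stops after a fixed number of items.
first_five = list(itertools.islice(itertools.count(1), 5))
print(first_five)  # [1, 2, 3, 4, 5]

# takewhile stops at the first item that fails the predicate.
below_20 = list(itertools.takewhile(lambda x: x < 20, itertools.count(0, 5)))
print(below_20)  # [0, 5, 10, 15]
```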

count(start, step)

Creates an infinite iterator that starts at start and increments by step.

import itertools
# Start at 10, count by 2
counter = itertools.count(start=10, step=2)
print(next(counter))  # 10
print(next(counter))  # 12
print(next(counter))  # 14
# Often paired with zip(), which stops when the shorter input is exhausted
data = ['a', 'b', 'c', 'd', 'e']
zipped = zip(itertools.count(), data) # (0, 'a'), (1, 'b'), ...
for index, value in zipped:
    print(f"Index: {index}, Value: {value}")
# Index: 0, Value: a
# Index: 1, Value: b
# Index: 2, Value: c
# Index: 3, Value: d
# Index: 4, Value: e

cycle(iterable)

Creates an infinite iterator by repeating the elements from the input iterable.

import itertools
repeater = itertools.cycle('ABCD')
print(next(repeater)) # 'A'
print(next(repeater)) # 'B'
print(next(repeater)) # 'C'
print(next(repeater)) # 'D'
print(next(repeater)) # 'A' (starts over)
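
One common use of cycle() is round-robin assignment. The sketch below uses made-up worker and task names. Note that cycle() keeps its own copy of the input elements so it can replay them, which can use significant memory for long inputs.

```python
import itertools

workers = ['alice', 'bob', 'carol']       # hypothetical names
tasks = ['t1', 't2', 't3', 't4', 't5']    # hypothetical tasks

# zip stops when tasks runs out, so the infinite cycle is safe here.
assignments = list(zip(tasks, itertools.cycle(workers)))
print(assignments)
# [('t1', 'alice'), ('t2', 'bob'), ('t3', 'carol'), ('t4', 'alice'), ('t5', 'bob')]
```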

repeat(elem, n)

Creates an iterator that returns elem over and over. If n is specified, it repeats n times.

import itertools
# Repeat 'hi' forever
hi_repeater = itertools.repeat('hi')
print(next(hi_repeater)) # 'hi'
print(next(hi_repeater)) # 'hi'
# Repeat 'test' 3 times
limited_repeater = itertools.repeat('test', 3)
print(list(limited_repeater)) # ['test', 'test', 'test']
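
repeat() is often used to supply a constant argument to map() or zip(). A short sketch: map() stops at the shortest input, so the endless repeat(2) is cut off by range(5).

```python
import itertools

# pow(0, 2), pow(1, 2), ..., pow(4, 2)
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```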

Finite Iterators

These functions consume one or more input iterables and stop once the input is exhausted.

accumulate(iterable, func)

Returns an iterator that yields the accumulated results of a binary function. With no function given, it yields running sums (addition).

import itertools
# Default behavior (sum)
numbers = [1, 2, 3, 4, 5]
sum_accumulator = itertools.accumulate(numbers)
print(list(sum_accumulator)) # [1, 3, 6, 10, 15]
# Using a custom function (multiplication)
def multiply(a, b):
    return a * b
product_accumulator = itertools.accumulate(numbers, multiply)
print(list(product_accumulator)) # [1, 2, 6, 24, 120]
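
A few more variations, as a sketch: any two-argument callable works, including the built-in max and operator.mul, and the keyword-only initial argument (available from Python 3.8) seeds the accumulation. The sample data is invented.

```python
import itertools
import operator

# Running maximum with the built-in max.
readings = [3, 1, 4, 1, 5, 9, 2, 6]
running_max = list(itertools.accumulate(readings, max))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9, 9]

# operator.mul avoids writing a helper function for multiplication.
factorials = list(itertools.accumulate(range(1, 6), operator.mul))
print(factorials)  # [1, 2, 6, 24, 120]

# initial adds a starting value, so the output has one extra element.
balances = list(itertools.accumulate([50, -20, 30], initial=100))
print(balances)  # [100, 150, 130, 160]
```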

chain(*iterables)

Takes multiple iterables and chains them together into a single, continuous sequence.

import itertools
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c']
tuple1 = (10, 20)
chained = itertools.chain(list1, list2, tuple1)
print(list(chained)) # [1, 2, 3, 'a', 'b', 'c', 10, 20]
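
When the iterables are already collected in one container, chain.from_iterable() does the same job without unpacking. A minimal sketch; note that it flattens only one level of nesting.

```python
import itertools

nested = [[1, 2], [3], [], [4, 5, 6]]
flat = list(itertools.chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5, 6]
```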

compress(data, selectors)

Filters elements from data, keeping only those elements whose corresponding element in selectors is "truthy".

import itertools
data = ['A', 'B', 'C', 'D', 'E']
selectors = [True, False, True, False, True]
filtered = itertools.compress(data, selectors)
print(list(filtered)) # ['A', 'C', 'E']
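
The selectors do not have to be a literal list of booleans. In this sketch they are computed lazily from a parallel list; the names and scores are invented for illustration.

```python
import itertools

names = ['ann', 'ben', 'cat', 'dan']   # hypothetical data
scores = [82, 47, 91, 60]

# Keep each name whose matching score is at least 60.
passed = list(itertools.compress(names, (s >= 60 for s in scores)))
print(passed)  # ['ann', 'cat', 'dan']
```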

dropwhile(predicate, iterable)

Drops elements from the iterable as long as the predicate is true. As soon as the predicate returns false for an element, it stops dropping and yields that element and everything after it, without testing the predicate again.

import itertools
# Drop numbers as long as they are less than 5
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]
dropped = itertools.dropwhile(lambda x: x < 5, data)
print(list(dropped)) # [5, 6, 7, 8, 9, 1, 2]
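
A typical use is skipping a header block. The sketch below uses made-up lines and shows that only the leading run is dropped: a later line that matches the predicate is kept.

```python
import itertools

lines = ['# comment', '# another', 'x = 1', '# not a header', 'y = 2']

# Drop the leading comment lines only.
body = list(itertools.dropwhile(lambda s: s.startswith('#'), lines))
print(body)  # ['x = 1', '# not a header', 'y = 2']
```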

filterfalse(predicate, iterable)

The opposite of filter(). It keeps only the elements for which the predicate is false.

import itertools
# Keep only the even numbers (predicate is "is odd")
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
filtered = itertools.filterfalse(lambda x: x % 2 != 0, data)
print(list(filtered)) # [2, 4, 6, 8]
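
filter() and filterfalse() together split a sequence into two halves with one predicate. This is a sketch; is_even is a helper defined here for illustration.

```python
import itertools

def is_even(n):
    return n % 2 == 0

numbers = list(range(10))
evens = list(filter(is_even, numbers))
odds = list(itertools.filterfalse(is_even, numbers))
print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]

# With None as the predicate, filterfalse keeps the falsy items.
falsy = list(itertools.filterfalse(None, [0, 1, '', 'a', None, []]))
print(falsy)  # [0, '', None, []]
```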

islice(iterable, stop) or islice(iterable, start, stop, step)

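Creates an iterator that returns a selected slice from the input iterable. It works like list slicing but without building an intermediate list, and it accepts any iterable, including generators. Unlike list slicing, it does not support negative values for start, stop, or step.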

import itertools
data = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
# Get the first 3 elements
slice1 = itertools.islice(data, 3)
print(list(slice1)) # ['a', 'b', 'c']
# Get elements from index 2 up to (but not including) 6
slice2 = itertools.islice(data, 2, 6)
print(list(slice2)) # ['c', 'd', 'e', 'f']
# Get elements from index 1, stepping by 2
slice3 = itertools.islice(data, 1, None, 2) # None for stop means go to the end
print(list(slice3)) # ['b', 'd', 'f', 'h']
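
islice() also works on generators, which cannot be sliced with square brackets. The sketch below uses an illustrative squares() generator and shows that islice() consumes the underlying iterator, so a second slice continues where the first one stopped.

```python
import itertools

def squares():
    """Yield 0, 1, 4, 9, ... forever."""
    n = 0
    while True:
        yield n * n
        n += 1

gen = squares()

first = list(itertools.islice(gen, 4))
print(first)   # [0, 1, 4, 9]

# gen has advanced, so this slice picks up at the fifth square.
second = list(itertools.islice(gen, 2))
print(second)  # [16, 25]
```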

takewhile(predicate, iterable)

The opposite of dropwhile(). It takes elements from the iterable as long as the predicate is true and stops as soon as the predicate returns false for an element.

import itertools
# Take numbers as long as they are less than 5
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]
taken = itertools.takewhile(lambda x: x < 5, data)
print(list(taken)) # [1, 2, 3, 4]
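
One detail worth knowing when the input is an iterator rather than a list: takewhile() has to read the first failing element in order to test it, and that element is then lost. A minimal sketch:

```python
import itertools

it = iter([1, 2, 3, 10, 4, 5])

small = list(itertools.takewhile(lambda x: x < 5, it))
print(small)  # [1, 2, 3]

# 10 was read and discarded by takewhile, so only 4 and 5 remain.
rest = list(it)
print(rest)   # [4, 5]
```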

Combinatoric Iterators

These are used for creating permutations, combinations, and Cartesian products.

permutations(iterable, r)

Generates all possible orderings of r length from the input iterable. The order matters.

import itertools
data = 'ABC'
# All permutations of length 2
perms = itertools.permutations(data, 2)
print(list(perms)) # [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

combinations(iterable, r)

Generates all possible subsets of r length from the input iterable. The order does not matter.

import itertools
data = 'ABC'
# All combinations of length 2
combs = itertools.combinations(data, 2)
print(list(combs)) # [('A', 'B'), ('A', 'C'), ('B', 'C')]
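
The number of results can be checked against the standard counting formulas. This sketch assumes Python 3.8 or later for math.perm and math.comb. It also shows that elements are treated as distinct by position, not by value.

```python
import itertools
import math

items = 'ABCD'
r = 2

n_perms = sum(1 for _ in itertools.permutations(items, r))
n_combs = sum(1 for _ in itertools.combinations(items, r))
print(n_perms, math.perm(len(items), r))  # 12 12
print(n_combs, math.comb(len(items), r))  # 6 6

# Duplicate values in the input lead to duplicate tuples in the output.
dupes = list(itertools.combinations('AAB', 2))
print(dupes)  # [('A', 'A'), ('A', 'B'), ('A', 'B')]
```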

combinations_with_replacement(iterable, r)

Like combinations, but elements can be repeated in the output.

import itertools
data = 'ABC'
# All combinations of length 2 with replacement
combs_wr = itertools.combinations_with_replacement(data, 2)
print(list(combs_wr)) # [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]

product(*iterables, repeat=1)

Computes the Cartesian product of input iterables. It's like nested for-loops.

import itertools
# Cartesian product of two lists
list1 = [1, 2]
list2 = ['a', 'b']
prod = itertools.product(list1, list2)
print(list(prod)) # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
# Cartesian product of a list with itself
prod_repeat = itertools.product(list1, repeat=2)
print(list(prod_repeat)) # [(1, 1), (1, 2), (2, 1), (2, 2)]
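
To make the nested-loop comparison explicit, here is a sketch showing that product() yields the same tuples, in the same order, as two hand-written loops. The sizes and colors are sample data.

```python
import itertools

sizes = ['S', 'M']
colors = ['red', 'blue']

# Hand-written nested loops: the rightmost input varies fastest.
nested = []
for s in sizes:
    for c in colors:
        nested.append((s, c))

from_product = list(itertools.product(sizes, colors))
print(from_product == nested)  # True

# repeat=2 is shorthand for passing the same iterable twice.
same = list(itertools.product(sizes, repeat=2)) == list(itertools.product(sizes, sizes))
print(same)  # True
```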

itertools Recipes

The official Python documentation for itertools includes a section of "recipes": helper functions built from the itertools primitives. They are worth knowing, and two of them are shown here.

grouper(): Collecting items into fixed-length chunks

A common task is splitting an iterable into fixed-length chunks, padding the last chunk if it comes up short. (Grouping runs of consecutive identical items is a different job, handled by itertools.groupby().)

import itertools
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    # The same iterator is repeated n times, so zip_longest pulls
    # n consecutive items from it for each output tuple.
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)
data = 'AAAAABBBCCDDEEEFF'
groups = grouper(data, 3, '_')
for group in groups:
    print(group)
# ('A', 'A', 'A')
# ('A', 'A', 'B')
# ('B', 'B', 'C')
# ('C', 'D', 'D')
# ('E', 'E', 'E')
# ('F', 'F', '_') # Note the fillvalue
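
For grouping runs of consecutive identical items, the standard library's itertools.groupby() is the usual tool. It is not covered elsewhere in this article, so the following is only a brief sketch on the same sample string.

```python
import itertools

data = 'AAAAABBBCCDDEEEFF'

# groupby yields (key, group_iterator) for each run of equal items.
runs = [(key, len(list(group))) for key, group in itertools.groupby(data)]
print(runs)
# [('A', 5), ('B', 3), ('C', 2), ('D', 2), ('E', 3), ('F', 2)]
```

groupby() only merges adjacent equal items, so unsorted input with the same key in several places produces several groups for that key.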

powerset(): Generating all subsets

Use this recipe when you need all possible subsets (the "power set") of an iterable.

import itertools
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(len(s)+1))
data = [1, 2, 3]
for subset in powerset(data):
    print(subset)
# ()
# (1,)
# (2,)
# (3,)
# (1, 2)
# (1, 3)
# (2, 3)
# (1, 2, 3)
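
A set of n elements has 2**n subsets, so the power set grows quickly. The sketch below repeats the recipe so that it runs on its own and checks the count for small n.

```python
import itertools

def powerset(iterable):
    """Yield every subset of iterable as a tuple, smallest first."""
    s = list(iterable)
    return itertools.chain.from_iterable(
        itertools.combinations(s, r) for r in range(len(s) + 1)
    )

# Number of subsets for inputs of size 0 through 5.
counts = [sum(1 for _ in powerset(range(n))) for n in range(6)]
print(counts)  # [1, 2, 4, 8, 16, 32]
```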

Summary

Category      Function                         Description
Infinite      count()                          Infinite counter (start, step).
Infinite      cycle()                          Infinite cycle through an iterable.
Infinite      repeat()                         Repeat an element forever, or n times.
Finite        accumulate()                     Running totals (or other function).
Finite        chain()                          Chain multiple iterables together.
Finite        compress()                       Filter elements based on a "selectors" iterable.
Finite        dropwhile()                      Drop items while a predicate is true.
Finite        filterfalse()                    Keep items where a predicate is false.
Finite        islice()                         Memory-efficient slicing of an iterable.
Finite        takewhile()                      Take items while a predicate is true.
Combinatoric  permutations()                   All possible orderings (r-length).
Combinatoric  combinations()                   All possible subsets (r-length).
Combinatoric  combinations_with_replacement()  All possible subsets with replacement (r-length).
Combinatoric  product()                        Cartesian product of iterables.

Mastering itertools will make your Python code more efficient, more readable, and more powerful, especially when dealing with data.
