How do Python regex groups capture content? - 杰瑞科技汇

Python regex groups let you extract and work with specific parts of a matched string. This article covers capturing groups, named groups, non-capturing groups, and lookaround assertions.


What is a Regex Group?

A group is a way to treat multiple characters as a single unit. You create a group by enclosing part of your regular expression in parentheses: (...).

Groups serve two primary purposes:

Grouping for Quantifiers: To apply a quantifier (like *, +, {3,5}) to a sequence of characters instead of just a single character.
- Without grouping: ab+ matches a followed by one or more bs (ab, abb, abbb).
- With grouping: (ab)+ matches one or more repetitions of the sequence ab (ab, abab, ababab).
Capturing: To "capture" the part of the string that matches the group and extract it later. This is the most common use case.
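The difference between the two quantifier cases above can be checked directly. This is a minimal sketch; the sample strings and variable names are chosen here for illustration and are not from the original article.

```python
import re

samples = ["ab", "abb", "abbb", "abab", "ababab"]

# Without grouping, + applies only to the preceding "b".
plain_hits = [s for s in samples if re.fullmatch(r"ab+", s)]
print(plain_hits)    # ['ab', 'abb', 'abbb']

# With grouping, + applies to the whole sequence "ab".
grouped_hits = [s for s in samples if re.fullmatch(r"(ab)+", s)]
print(grouped_hits)  # ['ab', 'abab', 'ababab']

# A repeated capturing group keeps only its last repetition.
m = re.fullmatch(r"(ab)+", "ababab")
print(m.group(1))    # ab
```

Note that a quantified capturing group stores only the final repetition, not every one of them.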

Capturing Groups: The Basics

This is what most people mean when they talk about "groups" in regex. When you use parentheses in a pattern, the text that matches the expression inside them is captured and stored.

You can access these captured groups using the group() method of the re match object.

How to Access Captured Groups

match.group(0): Returns the entire match.
match.group(1): Returns the first captured group.
match.group(2): Returns the second captured group.
...and so on.
match.groups(): Returns a tuple containing all the captured groups (from 1 onwards).

Example: Parsing a Date

Let's say we want to parse dates in the format YYYY-MM-DD.

import re
text = "The event is scheduled for 2025-10-27."
pattern = r"(\d{4})-(\d{2})-(\d{2})" # 4 digits, then 2 digits, then 2 digits
match = re.search(pattern, text)
if match:
    # The entire matched string
    print(f"Full match: {match.group(0)}")
    # Output: Full match: 2025-10-27
    # The first captured group (the year)
    print(f"Year: {match.group(1)}")
    # Output: Year: 2025
    # The second captured group (the month)
    print(f"Month: {match.group(2)}")
    # Output: Month: 10
    # The third captured group (the day)
    print(f"Day: {match.group(3)}")
    # Output: Day: 27
    # All captured groups as a tuple
    print(f"All groups: {match.groups()}")
    # Output: All groups: ('2025', '10', '27')
else:
    print("No match found.")
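A few more details of the match object are worth knowing. The sketch below assumes a hypothetical variant of the date pattern in which the day part is optional; it shows group() with several indices, what happens when a group does not take part in the match, and span().

```python
import re

# Hypothetical variant: the day part (including its hyphen) is optional.
pattern = re.compile(r"(\d{4})-(\d{2})(-\d{2})?")

full = pattern.search("Released on 2025-10-27.")
partial = pattern.search("Released in 2025-10.")

# group() accepts several indices and returns them as a tuple.
year, month = full.group(1, 2)
print(year, month)                     # 2025 10

# A group that did not take part in the match is reported as None...
print(partial.groups())                # ('2025', '10', None)

# ...unless a default value is supplied.
print(partial.groups(default="-01"))   # ('2025', '10', '-01')

# span() gives the start and end offsets of a group in the searched string.
start, end = full.span(1)
print(full.string[start:end])          # 2025
```

Checking for None before using an optional group avoids a TypeError when the value is later concatenated or converted.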

Named Groups

When you have many groups, remembering which is group(1) and which is group(5) can be confusing. Named groups solve this by letting you assign a name to a group. This makes your code much more readable and maintainable.

Syntax

You create a named group using the syntax (?P<name>pattern). The P stands for "Python extension".

How to Access Named Groups

match.group('name'): Returns the group with the specified name.
match.groupdict(): Returns a dictionary where keys are the group names and values are the matched strings.

Example: Parsing a Date (Again, but with Names)

import re
text = "The event is scheduled for 2025-10-27."
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.search(pattern, text)
if match:
    # Access by name
    print(f"Year: {match.group('year')}")
    # Output: Year: 2025
    print(f"Month: {match.group('month')}")
    # Output: Month: 10
    # All named groups as a dictionary
    print(f"All named groups: {match.groupdict()}")
    # Output: All named groups: {'year': '2025', 'month': '10', 'day': '27'}
else:
    print("No match found.")
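Named groups can also be referenced from a replacement string and from inside the pattern itself. The following is a small sketch reusing the date pattern above; the sentence used for the repeated-word check is an invented example.

```python
import re

text = "The event is scheduled for 2025-10-27."
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# In a replacement string, \g<name> refers to a named group.
reordered = re.sub(pattern, r"\g<day>/\g<month>/\g<year>", text)
print(reordered)
# The event is scheduled for 27/10/2025.

# Named groups are still numbered, so both access styles return the same text.
match = re.search(pattern, text)
print(match.group("year") == match.group(1))  # True

# Inside a pattern, (?P=name) matches the same text the named group matched.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it is is fine")
print(doubled.group("word"))  # is
```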

Non-Capturing Groups

Sometimes you need parentheses for grouping (e.g., (ab)+) but you don't actually need to capture the matched text. Using a capturing group in this case creates an unnecessary entry in the result tuple, which can be inefficient and confusing.

For this, you use a non-capturing group: (?:pattern).

The ?: at the start tells the regex engine to group the expression but not to capture it.

Example: Finding Words Ending in "ing"

Let's find all words that end in "ing" and see how the group type changes what re.findall returns.

import re
text = "I am running, jumping, and singing."
# Two capturing groups: findall returns one (stem, suffix) tuple per match.
# \w+ is greedy, so it takes as much of the word as it can before "ing".
pattern_capturing = r"(\w+)(ing)"
matches_capturing = re.findall(pattern_capturing, text)
print(f"Capturing group result: {matches_capturing}")
# Output: Capturing group result: [('runn', 'ing'), ('jump', 'ing'), ('sing', 'ing')]
# Non-capturing suffix: only one capturing group remains, so findall returns
# a flat list of stems. The "ing" is still matched, but it is not stored.
pattern_non_capturing = r"(\w+)(?:ing)"
matches_non_capturing = re.findall(pattern_non_capturing, text)
print(f"Non-capturing group result: {matches_non_capturing}")
# Output: Non-capturing group result: ['runn', 'jump', 'sing']
# The whole word is still available as group(0) of each match object
for match in re.finditer(pattern_non_capturing, text):
    print(f"Found word: {match.group(0)}")
# Output:
# Found word: running
# Found word: jumping
# Found word: singing

Key takeaway: Use (...) when you need to extract a part of the string. Use (?:...) when you only need the parentheses for logical grouping.
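Non-capturing groups are especially handy with alternation, where parentheses are needed only to limit the scope of the | operator. A minimal sketch, using an invented sentence with titles and surnames:

```python
import re

text = "Dr. Smith met Ms. Jones and Mr. Brown."

# Capturing the title as well: findall returns a tuple per match.
with_title = re.findall(r"(Mr|Ms|Dr)\. (\w+)", text)
print(with_title)
# [('Dr', 'Smith'), ('Ms', 'Jones'), ('Mr', 'Brown')]

# Non-capturing title: only the surname group is left, so findall
# returns a flat list of surnames.
names_only = re.findall(r"(?:Mr|Ms|Dr)\. (\w+)", text)
print(names_only)
# ['Smith', 'Jones', 'Brown']

# A compiled pattern's .groups attribute counts capturing groups only.
capturing_count = re.compile(r"(Mr|Ms|Dr)\. (\w+)").groups
non_capturing_count = re.compile(r"(?:Mr|Ms|Dr)\. (\w+)").groups
print(capturing_count, non_capturing_count)  # 2 1
```

Because a non-capturing group does not take a number, adding or removing one never shifts the indices of the capturing groups around it.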

Other Types of Groups

There are several other specialized group types, but these are the most common.

Group Syntax	Name	Description
`(a|b)`	Alternation Group	Matches either `a` or `b`. The `|` is the standard "OR" operator; the parentheses limit its scope and capture the result.
`(?:...)`	Non-Capturing Group	Groups the regex but does not capture the match.
`(?P<name>...)`	Named Group	Captures the match and assigns it a name.
`(?=...)`	Positive Lookahead	Asserts that the following characters match the pattern, but does not consume them. Matching resumes at the position where the lookahead started.
`(?!...)`	Negative Lookahead	Asserts that the following characters do not match the pattern, but does not consume them.
`(?<=...)`	Positive Lookbehind	Asserts that the preceding characters match the pattern, but does not consume them. The match itself starts after the lookbehind text.
`(?<!...)`	Negative Lookbehind	Asserts that the preceding characters do not match the pattern, but does not consume them.

Example: Lookahead for File Extensions

Let's find all words that are followed by a .py extension, but we only want the word itself, not the .py.

import re
text = "Use script.py or main.py, but not config.txt."
# Positive lookahead: (?=\.py)
# This means "match a run of word characters starting at a word boundary,
# but only if it is immediately followed by a literal '.py'"
pattern = r"\b(\w+)(?=\.py)"
matches = re.findall(pattern, text)
print(f"Scripts found: {matches}")
# Output: Scripts found: ['script', 'main']
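The other lookaround types from the table work the same way. The sketch below uses an invented price sentence for the lookbehind cases and reuses the file-name sentence for negative lookahead. One caveat: in Python's re module, a lookbehind pattern must match text of a fixed length.

```python
import re

prices = "Items cost $15, $230 and 40 euros."

# Positive lookbehind: digits preceded by "$". The "$" is not in the result.
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)
print(dollar_amounts)    # ['15', '230']

# Negative lookbehind: digits NOT preceded by "$" or by another digit.
# Excluding a preceding digit stops the match from starting mid-number.
other_amounts = re.findall(r"(?<![\$\d])\d+", prices)
print(other_amounts)     # ['40']

files = "Use script.py or main.py, but not config.txt."

# Negative lookahead: file names whose extension is not exactly "py".
non_python = re.findall(r"\b\w+\.(?!py\b)\w+", files)
print(non_python)        # ['config.txt']
```

None of these patterns contains a capturing group, so re.findall returns the full matches, and the lookaround text never appears in them.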

Summary and Best Practices

Group Type	Syntax	Use Case	How to Access
Capturing	`(...)`	You need to extract a specific part of the match.	`match.group(1)`, `match.groups()`
Named Capturing	`(?P<name>...)`	You need to extract parts and want readable code.	`match.group('name')`, `match.groupdict()`
Non-Capturing	`(?:...)`	You need parentheses for logic (quantifiers, alternation) but don't want to capture the text.	Not accessible via `group()` methods.
Lookahead	`(?=...)`	You want to match something only if it's followed by a specific pattern, without including that pattern in the result.	Not accessible via `group()` methods.

Golden Rule: If you don't need to use the matched text later, use a non-capturing group, (?:...). It avoids storing a group you never read, keeps the numbering of the remaining groups stable, and makes your intent clearer.
