杰瑞科技汇

python matchobject

This article takes a close look at re.MatchObject in Python.

What is a re.MatchObject?

A re.MatchObject is the object returned by the re module's matching functions when a match is found. (In Python 3 the class is named re.Match; "MatchObject" is the name used in older documentation.) It's not a simple string; it's an object that carries detailed information about the match that just occurred.

You get a re.MatchObject from:

  • re.search()
  • re.match()
  • re.fullmatch()
  • re.finditer(), which yields one match object for each match it finds.

If no match is found, re.search(), re.match(), and re.fullmatch() return None, and re.finditer() yields nothing.

Think of it as a detailed report on the successful match. It tells you what was matched, where it was matched, and how the different parts of your regular expression pattern correspond to the string.
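
As a rough illustration of how these four functions differ, the sketch below runs each of them once. The sample string and variable names are made up for this example.

```python
import re

text = "SKU-9876 costs $123.45"

# re.search() scans the whole string and returns the first match.
found = re.search(r"\d+", text)

# re.match() only succeeds if the pattern matches at the start of the string.
at_start = re.match(r"\d+", text)

# re.fullmatch() only succeeds if the pattern matches the entire string.
whole = re.fullmatch(r"SKU-\d+", "SKU-9876")

# re.finditer() yields one match object per non-overlapping match.
numbers = [m.group() for m in re.finditer(r"\d+", text)]

print(found.group())   # 9876
print(at_start)        # None
print(whole.group())   # SKU-9876
print(numbers)         # ['9876', '123', '45']
```

Because a failed match is None, calling .group() on the result without checking it first raises AttributeError.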

How to Get a re.MatchObject

Let's start with a simple example.

import re
text = "The price is $123.45, and the item number is SKU-9876."
# re.search() finds the first location in the string where the pattern matches
match_object = re.search(r"\$\d+\.\d{2}", text)
# Check if a match was found before trying to use the object
if match_object:
    print("Match found!")
    print(f"The object type is: {type(match_object)}")
else:
    print("No match found.")

Output:

Match found!
The object type is: <class 're.Match'>

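(Note: In Python 3 the class is named re.Match. The re module has no attribute called MatchObject; that name comes from older documentation, and this article uses it to mean an instance of re.Match.)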


Key Attributes and Methods of re.MatchObject

The real power of this object lies in its attributes and methods. Let's explore them using the same match_object from the example above, which matched the string "$123.45".

match.group(): The Matched String

This is the most common method. It returns the actual string that was matched by the entire regular expression.

  • match.group() or match.group(0): Returns the entire match.
# The entire matched string
full_match = match_object.group()
print(f"Full match: '{full_match}'")
# Output: Full match: '$123.45'

match.groups(): Captured Groups (Tuples)

If your regular expression has capturing groups (written with parentheses, ( and )), match.groups() returns a tuple containing the substring captured by each group, in order.

Let's modify our pattern to capture the dollars and cents separately.

import re
text = "The price is $123.45, and the item number is SKU-9876."
# New pattern with capturing groups for dollars and cents
# (\d+)  -> Group 1: One or more digits (the dollars)
# (\.\d{2}) -> Group 2: A literal dot followed by two digits (the cents)
match_object_with_groups = re.search(r"\$(\d+)(\.\d{2})", text)
if match_object_with_groups:
    # .groups() returns a tuple of all captured groups
    captured_groups = match_object_with_groups.groups()
    print(f"All captured groups: {captured_groups}")
    # Output: All captured groups: ('123', '.45')
    # You can access them by index
    dollars = match_object_with_groups.group(1)
    cents = match_object_with_groups.group(2)
    print(f"Dollars: '{dollars}', Cents: '{cents}'")
    # Output: Dollars: '123', Cents: '.45'
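
One case worth knowing about is a group that is optional and takes no part in the match. The sketch below assumes a variant of the price pattern in which the cents are optional; price_pattern and the sample strings are illustrative names, not part of the example above.

```python
import re

# The cents group is optional, so it may not take part in the match.
price_pattern = re.compile(r"\$(\d+)(\.\d{2})?")

with_cents = price_pattern.search("Total: $123.45")
without_cents = price_pattern.search("Total: $99")

print(with_cents.groups())          # ('123', '.45')

# A group that did not participate is reported as None...
print(without_cents.groups())       # ('99', None)

# ...unless a default value is passed to groups().
print(without_cents.groups(".00"))  # ('99', '.00')
```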

match.group(N): Accessing a Specific Group

You can access any specific captured group by its number. Group 0 is the entire match, group 1 is the first capturing group, group 2 is the second, and so on.

if match_object_with_groups:
    # Group 0 is the whole match
    print(f"Group 0: {match_object_with_groups.group(0)}") # Output: $123.45
    # Group 1 is the first set of parentheses
    print(f"Group 1: {match_object_with_groups.group(1)}") # Output: 123
    # Group 2 is the second set of parentheses
    print(f"Group 2: {match_object_with_groups.group(2)}") # Output: .45
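
Two related forms may also be handy: group() accepts several group numbers at once, and a match object can be subscripted. The short sketch below uses a shortened, made-up input string and the variable name m for illustration.

```python
import re

m = re.search(r"\$(\d+)(\.\d{2})", "The price is $123.45")

# Passing several group numbers returns a tuple of those groups.
print(m.group(1, 2))  # ('123', '.45')

# Subscripting is equivalent to calling group() with one argument.
print(m[0])  # $123.45
print(m[1])  # 123

# Asking for a group the pattern does not define raises IndexError.
try:
    m.group(3)
except IndexError as exc:
    print(f"IndexError: {exc}")
```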

match.groupdict(): Captured Groups (Dictionary)

If your pattern uses named capturing groups (written as (?P<name>...)), match.groupdict() is incredibly useful. It returns a dictionary where keys are the group names and values are the captured strings.

import re
text = "Order ID: ORD-555-XYZ, Status: Shipped"
# (?P<order_id>\w+-\d+-\w+) creates a named group called 'order_id'
match_with_named_groups = re.search(r"Order ID: (?P<order_id>\w+-\d+-\w+)", text)
if match_with_named_groups:
    # .groupdict() returns a dictionary of named groups
    named_groups = match_with_named_groups.groupdict()
    print(f"Named groups: {named_groups}")
    # Output: Named groups: {'order_id': 'ORD-555-XYZ'}
    # Access the value by its name
    order_id = match_with_named_groups.group('order_id')
    print(f"The order ID is: {order_id}")
    # Output: The order ID is: ORD-555-XYZ
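
With more than one named group, and with groups that may be absent, groupdict() behaves as sketched below. The pattern here is an assumed extension of the order example in which the status part is optional; order_pattern and the second input string are hypothetical.

```python
import re

# The ", Status: ..." part sits in an optional non-capturing group.
order_pattern = re.compile(
    r"Order ID: (?P<order_id>\w+-\d+-\w+)(?:, Status: (?P<status>\w+))?"
)

full = order_pattern.search("Order ID: ORD-555-XYZ, Status: Shipped")
partial = order_pattern.search("Order ID: ORD-777-ABC")

print(full.groupdict())
# {'order_id': 'ORD-555-XYZ', 'status': 'Shipped'}

# A named group that did not participate maps to None by default.
print(partial.groupdict())
# {'order_id': 'ORD-777-ABC', 'status': None}

# A default value can be supplied instead.
print(partial.groupdict("unknown"))
# {'order_id': 'ORD-777-ABC', 'status': 'unknown'}
```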

match.start() and match.end(): Match Position

These methods return the starting and ending indices of the match in the original string.

  • match.start(): The index of the first character of the match.
  • match.end(): The index of the character after the last character of the match.
if match_object:
    start_index = match_object.start()
    end_index = match_object.end()
    print(f"The match starts at index: {start_index}") # Output: 13
    print(f"The match ends at index: {end_index}")     # Output: 20
    # You can use these to slice the original string
    print(f"The matched string is: '{text[start_index:end_index]}'")
    # Output: The matched string is: '$123.45'
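
Both methods also accept a group number (or name), which gives the position of that group rather than of the whole match. A small sketch, reusing the price string with the two-group pattern; the variable name m is illustrative.

```python
import re

text = "The price is $123.45, and the item number is SKU-9876."
m = re.search(r"\$(\d+)(\.\d{2})", text)

# Position of the entire match (group 0).
print(m.start(), m.end())    # 13 20

# Position of group 1 (the dollars) and group 2 (the cents).
print(m.start(1), m.end(1))  # 14 17
print(m.start(2), m.end(2))  # 17 20

# Slicing with a group's indices recovers that group's text.
print(text[m.start(1):m.end(1)])  # 123
```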

match.span(): Match Position as a Tuple

This is a convenient shortcut that returns the start and end indices as a single tuple (start, end).

if match_object:
    span_tuple = match_object.span()
    print(f"The span is: {span_tuple}") # Output: (13, 20)
    # It's equivalent to (match.start(), match.end())
    assert span_tuple == (match_object.start(), match_object.end())
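
Spans are useful when you need to rebuild a string around its matches. The sketch below is one possible way to mask every match while keeping the string length unchanged; mask_matches is a hypothetical helper written for this example, and the phone-number text is made up.

```python
import re

def mask_matches(pattern, s, fill="#"):
    """Replace each match of pattern in s with fill characters of equal length."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, s):
        start, end = m.span()
        pieces.append(s[last:start])         # text before this match
        pieces.append(fill * (end - start))  # masked match
        last = end
    pieces.append(s[last:])                  # text after the final match
    return "".join(pieces)

text = "Call 555-1234 or 555-9876 today."
masked = mask_matches(r"\d{3}-\d{4}", text)
print(masked)  # Call ######## or ######## today.
```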

Complete Example: Parsing a Log Line

Let's put it all together to parse a more complex string.

import re
log_line = "2025-10-27 14:30:00 [INFO] User 'alice' logged in from 192.168.1.100"
# Pattern to capture timestamp, log level, username, and IP address
# (\d{4}-\d{2}-\d{2})      -> Group 1: Date (YYYY-MM-DD)
# (\d{2}:\d{2}:\d{2})      -> Group 2: Time (HH:MM:SS)
# \[(\w+)\]                -> Group 3: Log Level (e.g., INFO)
# User '(\w+)'             -> Group 4: Username
# logged in from (\d+\.\d+\.\d+\.\d+) -> Group 5: IP Address
pattern = r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) \[(\w+)\] User '(\w+)' logged in from (\d+\.\d+\.\d+\.\d+)"
match = re.search(pattern, log_line)
if match:
    print("--- Log Parsing Successful ---")
    print(f"Raw Log Line: {log_line}\n")
    # Get all groups as a tuple
    all_groups = match.groups()
    print(f"All groups (tuple): {all_groups}\n")
    # Get specific groups by number
    timestamp = match.group(1) + " " + match.group(2)
    log_level = match.group(3)
    username = match.group(4)
    ip_address = match.group(5)
    print("Parsed Information:")
    print(f"  Timestamp: {timestamp}")
    print(f"  Log Level: {log_level}")
    print(f"  Username:  {username}")
    print(f"  IP Address: {ip_address}")
    # Get the span of the entire match
    print(f"\nThe entire match occurred at span: {match.span()}")
else:
    print("Log line did not match the expected format.")

Output:

--- Log Parsing Successful ---
Raw Log Line: 2025-10-27 14:30:00 [INFO] User 'alice' logged in from 192.168.1.100

All groups (tuple): ('2025-10-27', '14:30:00', 'INFO', 'alice', '192.168.1.100')

Parsed Information:
  Timestamp: 2025-10-27 14:30:00
  Log Level: INFO
  Username:  alice
  IP Address: 192.168.1.100

The entire match occurred at span: (0, 68)
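
The same log line can also be parsed with named groups, which avoids having to remember group numbers. This is only a sketch: the group names and the parse_login helper are hypothetical, and the pattern assumes the "logged in from" wording of this particular log line.

```python
import re

log_line = "2025-10-27 14:30:00 [INFO] User 'alice' logged in from 192.168.1.100"

# Adjacent raw string literals are concatenated into a single pattern.
log_pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] "
    r"User '(?P<user>\w+)' logged in from "
    r"(?P<ip>\d+\.\d+\.\d+\.\d+)"
)

def parse_login(line):
    """Return a dict of fields for a login line, or None if it does not match."""
    m = log_pattern.fullmatch(line)
    return m.groupdict() if m else None

record = parse_login(log_line)
print(record["user"], record["ip"])   # alice 192.168.1.100
print(parse_login("not a log line"))  # None
```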

Summary: re.MatchObject vs. re.Pattern

It's easy to confuse re.MatchObject with re.Pattern. Here's a quick distinction:

| Feature        | re.Pattern                                    | re.MatchObject                                             |
|----------------|-----------------------------------------------|------------------------------------------------------------|
| What it is     | A compiled regular expression pattern.        | The result of a successful match.                          |
| How you get it | re.compile(r"pattern")                        | re.search(), re.match(), re.fullmatch(), re.finditer()     |
| Purpose        | Represents the rule or template for matching. | Represents the result or instance of a match.              |
| Example Use    | pattern = re.compile(r"\d+")                  | match = pattern.search("hello 123 world")                  |
| Key Attributes | pattern.pattern, pattern.flags                | match.group(), match.start(), match.span(), etc.           |
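
The two objects are linked: a match object keeps a reference to the pattern that produced it and to the string that was searched. A brief sketch, using the same example strings as the table; the isinstance checks assume a Python 3 version that exposes re.Pattern and re.Match.

```python
import re

pattern = re.compile(r"\d+")
match = pattern.search("hello 123 world")

# The compiled pattern and the match are instances of different classes.
print(isinstance(pattern, re.Pattern))  # True
print(isinstance(match, re.Match))      # True

# match.re is the pattern object that produced the match.
print(match.re is pattern)              # True

# match.string is the string that was searched.
print(match.string)                     # hello 123 world

print(match.group(), match.span())      # 123 (6, 9)
```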