How to Match and Process Regular Expressions Efficiently with java.util.regex

The java.util.regex package is Java's standard library for pattern matching in strings. It provides two main classes, Pattern and Matcher, which work together to find and manipulate text based on regular expressions.

Here’s a comprehensive guide covering the key concepts, classes, and practical examples.


The Core Components

The java.util.regex package revolves around three key classes:

  1. Pattern: A compiled representation of a regular expression. You don't instantiate this class directly. Instead, you use the static Pattern.compile() method to compile your regex string into a Pattern object. This is efficient because you can compile a pattern once and use it multiple times.
  2. Matcher: An engine that performs match operations on a character sequence (any CharSequence, such as a String) by interpreting a Pattern. You get a Matcher object by calling the .matcher() method on a Pattern object.
  3. PatternSyntaxException: An unchecked exception that indicates a syntax error in a regular expression pattern.
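
As a minimal sketch of how PatternSyntaxException surfaces in practice, the class below tries to compile a regex and reports the failure instead of letting it propagate. The class name RegexSyntaxCheck and the helper isValidRegex are hypothetical names chosen for this illustration; Pattern.compile(), getDescription(), and getIndex() are part of the standard API.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexSyntaxCheck {
    // Hypothetical helper: returns true when the regex string compiles.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription() says what is wrong; getIndex() says roughly where
            System.out.println("Invalid regex '" + regex + "': "
                    + e.getDescription() + " (index " + e.getIndex() + ")");
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("\\d{4}-\\d{2}")); // well-formed pattern
        System.out.println(isValidRegex("(\\d{4}"));       // unclosed group
    }
}
```

Because the exception is unchecked, the compiler will not force you to catch it. Catching it is mostly worthwhile when the regex comes from user input or configuration rather than from a literal in the source code.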

Basic Workflow

The typical workflow for using regex in Java is:

  1. Define your regular expression as a String.
  2. Compile the regex string into a Pattern object using Pattern.compile().
  3. Create a Matcher object by calling .matcher() on the Pattern, passing in the input string you want to search.
  4. Perform matching operations (e.g., find, match, replace) using the Matcher object. Splitting is done with split() on the Pattern itself.

Key Matcher Methods

The Matcher class is where the action happens. Here are the most commonly used methods:

| Method | Description | Example |
| --- | --- | --- |
| boolean find() | Attempts to find the next subsequence of the input sequence that matches the pattern. Returns true if found. | while (m.find()) { ... } |
| boolean matches() | Attempts to match the entire region against the pattern. Returns true only if the whole string matches. | if (m.matches()) { ... } |
| boolean lookingAt() | Attempts to match the beginning of the region against the pattern. Returns true if the start of the string matches. | if (m.lookingAt()) { ... } |
| String group() | Returns the subsequence of the input sequence matched by the previous successful match operation (find(), matches(), or lookingAt()). | System.out.println(m.group()); |
| String group(int group) | Returns the input subsequence captured by the given group during the previous match. | System.out.println(m.group(1)); |
| int start() / int end() | Returns the start index (inclusive) and end index (exclusive) of the previous match. | System.out.println("Found at " + m.start() + "-" + m.end()); |
| String replaceAll(String replacement) | Replaces every subsequence of the input sequence that matches the pattern with the given replacement string. | String result = m.replaceAll("-"); |
| String replaceFirst(String replacement) | Replaces the first subsequence of the input sequence that matches the pattern. | String result = m.replaceFirst("-"); |
| String[] split(CharSequence input) | Splits the given input sequence around matches of the pattern. This is a method of Pattern, not Matcher. | String[] parts = pattern.split("a b c"); |
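
To make the differences between these methods concrete, here is a small sketch that runs matches(), lookingAt(), and find() against the same input, and calls split() on the Pattern. The class name MatcherMethodsDemo and the helper findAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsDemo {
    // Hypothetical helper: collects every match of the pattern in the input.
    static List<String> findAll(Pattern pattern, CharSequence input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        Matcher m = digits.matcher("123abc456");

        System.out.println(m.matches());   // false: the whole input is not digits
        System.out.println(m.lookingAt()); // true: the input starts with digits

        m.reset(); // start scanning from the beginning again
        while (m.find()) {
            // end() is exclusive, so the match covers [start, end)
            System.out.println(m.group() + " at [" + m.start() + ", " + m.end() + ")");
        }

        // split() is called on the Pattern, with the input as its argument
        String[] parts = digits.split("a1b22c333d");
        System.out.println(Arrays.toString(parts)); // [a, b, c, d]
    }
}
```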

Practical Examples

Example 1: Finding All Occurrences

This example finds all words that start with "J" followed by one or more letters of either case.

import java.util.regex.*;
public class RegexFindExample {
    public static void main(String[] args) {
        String text = "Java is great. JavaScript is also good. Just kidding!";
        // Regex: \bJ[a-zA-Z]+\b
        // \b - Word boundary
        // J  - The letter 'J'
        // [a-zA-Z]+ - One or more letters (both cases, so "JavaScript" matches too)
        // \b - Word boundary
        String regex = "\\bJ[a-zA-Z]+\\b";
        // 1. Compile the regex
        Pattern pattern = Pattern.compile(regex);
        // 2. Create a matcher
        Matcher matcher = pattern.matcher(text);
        // 3. Find all matches
        System.out.println("Words starting with 'J':");
        while (matcher.find()) {
            // matcher.group() returns the matched string
            System.out.println("Found: " + matcher.group());
            // matcher.start() and matcher.end() give the position
            System.out.println("  at index: " + matcher.start() + " to " + matcher.end());
        }
    }
}

Output:

Words starting with 'J':
Found: Java
  at index: 0 to 4
Found: JavaScript
  at index: 15 to 25
Found: Just
  at index: 40 to 44

Example 2: Extracting Groups (Capturing Parentheses)

This example parses a date string in YYYY-MM-DD format and extracts the year, month, and day.

import java.util.regex.*;
public class RegexGroupExample {
    public static void main(String[] args) {
        String dateText = "The event is on 2025-10-27 and another on 1999-01-01.";
        // Regex: (\d{4})-(\d{2})-(\d{2})
        // (\d{4}) - Group 1: Four digits (Year)
        // -       - A literal hyphen
        // (\d{2}) - Group 2: Two digits (Month)
        // -       - A literal hyphen
        // (\d{2}) - Group 3: Two digits (Day)
        String regex = "(\\d{4})-(\\d{2})-(\\d{2})";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(dateText);
        while (matcher.find()) {
            // The entire match is group 0
            System.out.println("Full date found: " + matcher.group(0));
            // Group 1 is the first captured group (the parentheses)
            String year = matcher.group(1);
            String month = matcher.group(2);
            String day = matcher.group(3);
            System.out.println("  Year: " + year);
            System.out.println("  Month: " + month);
            System.out.println("  Day: " + day);
            System.out.println("-------------------");
        }
    }
}

Output:

Full date found: 2025-10-27
  Year: 2025
  Month: 10
  Day: 27
-------------------
Full date found: 1999-01-01
  Year: 1999
  Month: 01
  Day: 01
-------------------

Example 3: Validation (matches() vs find())

It's crucial to understand the difference between matches() and find().

  • matches(): The entire string must match the pattern.
  • find(): The pattern must be found somewhere in the string.

import java.util.regex.*;
public class RegexValidationExample {
    public static void main(String[] args) {
        String email = "test.user@example.com";
        String invalidEmail = "this is not an email";
        // Regex for a simple email validation
        String regex = "[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher1 = pattern.matcher(email);
        Matcher matcher2 = pattern.matcher(invalidEmail);
        // Using find() - will be true if the email pattern is anywhere in the string
        System.out.println("Does '" + email + "' contain a valid email pattern? " + matcher1.find());
        System.out.println("Does '" + invalidEmail + "' contain a valid email pattern? " + matcher2.find());
        System.out.println("-----------------------------------");
        // Using matches() - will only be true if the WHOLE string is a valid email
        System.out.println("Is '" + email + "' a valid email? " + matcher1.matches());
        System.out.println("Is '" + invalidEmail + "' a valid email? " + matcher2.matches());
    }
}

Output:

Does 'test.user@example.com' contain a valid email pattern? true
Does 'this is not an email' contain a valid email pattern? false
-----------------------------------
Is 'test.user@example.com' a valid email? true
Is 'this is not an email' a valid email? false
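
The two inputs above give the same answer for find() and matches(), so the difference is easy to miss. The sketch below reuses the same email regex on a sentence that merely contains an address, which is where the two methods diverge. The class name FindVsMatchesDemo and the helpers containsEmail and isEmail are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

public class FindVsMatchesDemo {
    // Same simple email regex as in the example above
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");

    // True if an email-like substring appears anywhere in the input
    static boolean containsEmail(String input) {
        return EMAIL.matcher(input).find();
    }

    // True only if the whole input is an email-like string
    static boolean isEmail(String input) {
        return EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        String sentence = "Contact test.user@example.com for details";
        System.out.println(containsEmail(sentence)); // true
        System.out.println(isEmail(sentence));       // false
        System.out.println(isEmail("test.user@example.com")); // true
    }
}
```

For input validation, matches() is usually the method you want, since find() would accept any string that merely contains something shaped like the pattern.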

Example 4: Replacing Text

This example collapses every run of whitespace in a string into a single space.

import java.util.regex.*;
public class RegexReplaceExample {
    public static void main(String[] args) {
        String messyText = "This   has    multiple    spaces.";
        // Regex: \s+ - one or more whitespace characters
        String regex = "\\s+";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(messyText);
        // Replace every run of one or more whitespace characters with a single space
        String cleanText = matcher.replaceAll(" ");
        System.out.println("Original: '" + messyText + "'");
        System.out.println("Cleaned:  '" + cleanText + "'");
    }
}

Output:

Original: 'This   has    multiple    spaces.'
Cleaned:  'This has multiple spaces.'
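
The replacement string can also refer back to capturing groups with $1, $2, and so on. As a sketch that combines this with the date pattern from Example 2, the code below rewrites YYYY-MM-DD dates as DD/MM/YYYY. The class name DateReformatDemo and the helper toDayMonthYear are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

public class DateReformatDemo {
    // Group 1 = year, group 2 = month, group 3 = day
    private static final Pattern ISO_DATE =
            Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Rewrites every YYYY-MM-DD date in the text as DD/MM/YYYY
    static String toDayMonthYear(String text) {
        // $3, $2, $1 refer to the captured day, month, and year
        return ISO_DATE.matcher(text).replaceAll("$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(toDayMonthYear("The event is on 2025-10-27."));
        // prints: The event is on 27/10/2025.
    }
}
```

Because $ and \ have special meaning in replacement strings, a replacement that should be taken literally (for example, one containing a currency amount) can be wrapped with Matcher.quoteReplacement().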

Common Regex Patterns (Quick Reference)

| Pattern | Description | Example |
| --- | --- | --- |
| . | Any character (except line terminator) | c.t matches "cat", "cut", "c8t" |
| ^ | Start of the string | ^Hello matches "Hello world" |
| $ | End of the string | world$ matches "Hello world" |
| \d | Any digit (0-9) | \d{3} matches "123" |
| \D | Any non-digit | \D+ matches "abc!" |
| \w | Word character (a-z, A-Z, 0-9, _) | \w+ matches "user_name_123" |
| \W | Non-word character | \W matches "@" or " " |
| \s | Whitespace character (space, tab, newline) | \s+ matches " " or "\t" |
| \S | Non-whitespace character | \S+ matches "Java" |
| [abc] | Any character in the set (a, b, or c) | [aeiou] matches "a", "e", "i" |
| [^abc] | Any character NOT in the set | [^0-9] matches "a", "!" |
| a\|b | Either 'a' or 'b' | cat\|dog matches "cat" or "dog" |
| * | Zero or more of the preceding element | a* matches "", "a", "aa" |
| + | One or more of the preceding element | a+ matches "a", "aa" |
| ? | Zero or one of the preceding element | colou?r matches "color" or "colour" |
| {n} | Exactly n times | \d{3} matches exactly 3 digits |
| {n,} | At least n times | \d{2,} matches 2 or more digits |
| {n,m} | Between n and m times | \d{1,3} matches 1, 2, or 3 digits |
| (abc) | Capturing group | (test)+ matches "testtest" |
| \b | Word boundary | \bword\b matches "word" but not "sword" |
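
A few of the entries above can be checked directly in code. The sketch below is only a spot check, not a full test of the table. The class name QuickReferenceCheck and the helpers fullMatch and contains are hypothetical names for this illustration. Note that every backslash in a regex must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

public class QuickReferenceCheck {
    // True if the whole input matches the regex
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regex matches somewhere inside the input
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("c.t", "c8t"));            // true
        System.out.println(fullMatch("colou?r", "colour"));     // true
        System.out.println(fullMatch("cat|dog", "dog"));        // true
        System.out.println(fullMatch("\\d{1,3}", "1234"));      // false: four digits
        System.out.println(fullMatch("(test)+", "testtest"));   // true
        System.out.println(contains("\\bword\\b", "a word"));   // true
        System.out.println(contains("\\bword\\b", "a sword"));  // false
    }
}
```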

Performance Considerations

  • Compile Once, Use Many: If you are using the same regular expression in a loop or multiple times, always compile it into a Pattern object first. This is much faster than re-compiling the regex string on every iteration.
  • Use String.split() for Simple Cases: For simple, one-off splitting, String.split(String regex) is a convenient shortcut that handles the regex for you on each call. For repeated operations, calling split() on a precompiled Pattern avoids recompiling the regex every time and makes the reuse explicit.
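
One common way to apply both points is to keep each Pattern in a static final constant and create a fresh Matcher per input. The sketch below assumes that approach. The class name PrecompiledPatternDemo and the helpers countNumeric and splitFields are hypothetical names for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PrecompiledPatternDemo {
    // Compiled once when the class is loaded, then reused for every call
    private static final Pattern ALL_DIGITS = Pattern.compile("\\d+");
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Counts the values that consist entirely of digits
    static int countNumeric(List<String> values) {
        int count = 0;
        for (String value : values) {
            // Creating a Matcher is cheap; value.matches("\\d+") would
            // compile the regex again on every iteration
            if (ALL_DIGITS.matcher(value).matches()) {
                count++;
            }
        }
        return count;
    }

    // Splits a line on commas, ignoring whitespace around each comma
    static String[] splitFields(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        System.out.println(countNumeric(Arrays.asList("42", "abc", "007", "4x"))); // 2
        System.out.println(Arrays.toString(splitFields("a , b,c")));               // [a, b, c]
    }
}
```

Pattern objects are immutable and safe to share between threads, which is what makes the constant approach work. Matcher objects are not thread-safe, so each thread should create its own.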

Learning Regular Expressions

Regular expressions are a skill in themselves. Here are some great resources to learn them:

  • Regex101: An excellent online tool for building, testing, and learning regex. It provides explanations for each part of your pattern.
  • RegExr: Another fantastic interactive tool with a helpful reference library.
  • Oracle Java Documentation: The official documentation for java.util.regex is always a good place to check for class details.