杰瑞科技汇

How to Split Strings Correctly with Java's split()

The String.split() method in Java breaks a string into an array of substrings around matches of a delimiter pattern. This guide covers its usage, nuances, and best practices.

Basic Syntax

The split() method is overloaded, but the most common version you'll use is:

public String[] split(String regex)

Parameters:

  • regex: A regular expression that defines the delimiter or pattern on which the string should be split.

Returns:

  • A String array containing the substrings.

Simple Examples

Let's start with the most straightforward cases.

Example 1: Splitting by a Simple Character

This is the most intuitive use case. Let's split a comma-separated string.

String csvData = "apple,banana,cherry,date";
String[] fruits = csvData.split(",");
// The 'fruits' array will be: ["apple", "banana", "cherry", "date"]
for (String fruit : fruits) {
    System.out.println(fruit);
}
// Output:
// apple
// banana
// cherry
// date

Example 2: Splitting by a Space

String sentence = "Java is a powerful language";
String[] words = sentence.split(" ");
// The 'words' array will be: ["Java", "is", "a", "powerful", "language"]
for (String word : words) {
    System.out.println(word);
}
// Output:
// Java
// is
// a
// powerful
// language
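
Splitting on a single space assumes exactly one space between words. When the input may contain repeated spaces or tabs, splitting on the regex \s+ (one or more whitespace characters) is a common alternative. The following is a minimal sketch; the class name WhitespaceSplitDemo and the helper splitWords are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

public class WhitespaceSplitDemo {
    // Hypothetical helper: trims the input, then splits on runs of whitespace.
    static String[] splitWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            // "".split(...) would return an array holding one empty string.
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "Java  is\ta powerful   language";
        // A single-space delimiter leaves empty strings where spaces repeat.
        System.out.println(Arrays.toString(sentence.split(" ")));
        // "\\s+" treats each run of whitespace as one delimiter.
        System.out.println(Arrays.toString(splitWords(sentence)));
        // Second line printed: [Java, is, a, powerful, language]
    }
}
```

The trim() call matters because leading whitespace would otherwise produce an empty first element.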

Important Nuances and Gotchas

This is where most developers run into issues. Understanding these points is crucial.

Gotcha #1: The Parameter is a Regular Expression

The split() method takes a regular expression, not just a plain string. This means certain characters have special meanings and must be "escaped" with a backslash (\).

Common Special Regex Characters:

  • . (dot): Matches any character.
  • |: Acts as an "OR" operator.
  • *: Matches the preceding element zero or more times.
  • +: Matches the preceding element one or more times.
  • ?: Matches the preceding element zero or one time.
  • ^: Matches the beginning of the string.
  • $: Matches the end of the string.
  • (): Defines a capturing group.
  • []: Defines a character class.
  • \: Escapes special characters.

Example: Splitting by a Period (.)

If you want to split a file name on the dot (.), you must escape it with a double backslash (\\.). In a Java string literal, a single backslash (\) is itself an escape character, so you need two to pass one literal backslash to the regex engine.

// INCORRECT - compiles and runs, but returns an empty array
// String[] wrong = "image.png".split(".");
// The dot '.' in regex means "any character", so every character is treated as a delimiter.
// Only empty strings remain, and split() removes trailing empty strings.
// CORRECT - Escape the dot with \\
String path = "image.png";
String[] parts = path.split("\\.");
// The 'parts' array will be: ["image", "png"]
System.out.println(parts[0]); // Output: image
System.out.println(parts[1]); // Output: png

Example: Splitting by a Pipe (|)

The pipe is a regex "OR" operator.

String data = "a|b|c";
// INCORRECT: An unescaped "|" is an alternation of two empty patterns. It matches the
// empty string, so the input is split between every character: ["a", "|", "b", "|", "c"].
// String[] parts = data.split("|");
// CORRECT: Escape the pipe
String[] parts = data.split("\\|");
// The 'parts' array will be: ["a", "b", "c"]
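
When the delimiter comes from configuration or user input, escaping each special character by hand is error-prone. The standard java.util.regex.Pattern.quote() method wraps a string so the regex engine treats it literally. The sketch below shows that approach; the class name LiteralSplitDemo and the helper splitLiteral are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralSplitDemo {
    // Hypothetical helper: splits on the delimiter taken literally,
    // whatever regex metacharacters it contains.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|")));     // [a, b, c]
        System.out.println(Arrays.toString(splitLiteral("image.png", "."))); // [image, png]
        System.out.println(Arrays.toString(splitLiteral("1+2+3", "+")));     // [1, 2, 3]
    }
}
```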

Gotcha #2: Handling Consecutive Delimiters

By default, split() keeps empty strings that fall between consecutive delimiters but discards trailing empty strings.

String data = "apple,,banana,cherry,,";
String[] fruits = data.split(",");
// The 'fruits' array will be: ["apple", "", "banana", "cherry"]
// The empty string between the two consecutive commas is kept; the trailing ones are dropped.
// To keep the trailing empty strings as well, pass a negative limit as the second parameter.
String[] fruitsWithTrailingEmpties = data.split(",", -1); // See next section
// -> ["apple", "", "banana", "cherry", "", ""]
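
To make the handling of empty strings concrete, here is a small sketch comparing three options on the same input: the default call, a negative limit, and a stream filter that removes every empty token. The class name EmptyTokenDemo and the helper splitNonEmpty are hypothetical.

```java
import java.util.Arrays;

public class EmptyTokenDemo {
    // Hypothetical helper: splits on commas and drops every empty token.
    static String[] splitNonEmpty(String text) {
        return Arrays.stream(text.split(","))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String data = "apple,,banana,cherry,,";
        // Default: interior empty strings stay, trailing ones are removed.
        System.out.println(Arrays.toString(data.split(",")));     // [apple, , banana, cherry]
        // Negative limit: trailing empty strings are kept too.
        System.out.println(Arrays.toString(data.split(",", -1))); // [apple, , banana, cherry, , ]
        // Filtering: no empty strings at all.
        System.out.println(Arrays.toString(splitNonEmpty(data))); // [apple, banana, cherry]
    }
}
```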

The Second Parameter: limit

The split() method has a second, very useful form:

public String[] split(String regex, int limit)

The limit parameter controls the number of times the pattern is applied and the length of the resulting array.

limit Value | Behavior
> 0 | The pattern is applied at most limit - 1 times. The resulting array has at most limit elements, and the last element contains the rest of the string.
= 0 | Same as not providing the limit. The pattern is applied as many times as possible, and trailing empty strings are discarded.
< 0 | The pattern is applied as many times as possible, and trailing empty strings are kept.

Example: Using the limit Parameter

String data = "a,b,c,d,e";
String[] result;
// limit = 3: Apply split at most 2 times.
// Result will have 3 elements.
result = data.split(",", 3);
// result -> ["a", "b", "c,d,e"]
System.out.println(java.util.Arrays.toString(result)); 
// limit = 0: Default behavior, trailing empty strings are discarded.
String dataWithTrailingEmpty = "a,b,,";
result = dataWithTrailingEmpty.split(",", 0);
// result -> ["a", "b"] (trailing empty strings are removed)
System.out.println(java.util.Arrays.toString(result));
// limit = -1 (or any negative value): Trailing empty strings are kept.
result = dataWithTrailingEmpty.split(",", -1);
// result -> ["a", "b", "", ""] (trailing empty strings are kept)
System.out.println(java.util.Arrays.toString(result));
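
A positive limit is handy when only the first delimiter should count, for example in a key=value line whose value may itself contain "=". The following sketch assumes that line format; the class name KeyValueSplitDemo, the helper splitKeyValue, and the sample input are made up for illustration.

```java
public class KeyValueSplitDemo {
    // Hypothetical helper: splits at the first '=' only, so the value
    // may contain further '=' characters.
    static String[] splitKeyValue(String line) {
        return line.split("=", 2);
    }

    public static void main(String[] args) {
        String[] pair = splitKeyValue("expr=x=y+1");
        System.out.println(pair[0]); // expr
        System.out.println(pair[1]); // x=y+1

        // With a positive limit, a trailing empty value is kept.
        System.out.println(splitKeyValue("key=").length); // 2
    }
}
```

A line without any "=" comes back as a one-element array, so callers should check the array length before reading index 1.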

When to Use split() vs. StringTokenizer

For simple splitting, split() is generally preferred in modern Java because it is more concise and returns a plain String[] array. StringTokenizer is an older class with different behavior; the JDK documentation describes it as a legacy class retained for compatibility.

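Feature | String.split() | StringTokenizer
Return Type | String[] (array) | None; tokens are read one at a time via hasMoreTokens()/nextToken() (the class implements Enumeration<Object>)
Regex Support | Yes, full regex support. | No, each character of the delimiter string is treated as a literal delimiter.
Empty Strings | Interior ones are kept; trailing ones are kept or discarded depending on limit. | Never returned; consecutive delimiters are treated as one.
Usage | Modern, idiomatic, concise. | Legacy, more verbose.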

Example of StringTokenizer:

// Requires: import java.util.StringTokenizer;
String data = "apple,,banana,cherry,,";
StringTokenizer tokenizer = new StringTokenizer(data, ",");
// Consecutive delimiters are treated as one, so no empty tokens are returned
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
// Output:
// apple
// banana
// cherry

Recommendation: Use String.split() unless you specifically need the behavior of StringTokenizer (which is rare).


Best Practices and Alternatives

Best Practice 1: Be Aware of Memory

split() allocates a new String array plus a new String object for each substring. For very large inputs or calls inside performance-critical loops, these allocations can add garbage-collection pressure. For typical use, the cost is not a concern.
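
When the delimiter is a genuine regular expression and the same split runs many times, one commonly used option is to compile the pattern once with java.util.regex.Pattern and reuse its split() method. The sketch below assumes a comma delimiter with optional surrounding whitespace; the class name PrecompiledSplitDemo, the constant, and the helper splitFields are hypothetical, and any real gain should be measured rather than assumed.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PrecompiledSplitDemo {
    // Compiled once and reused; the delimiter pattern is an illustrative assumption.
    private static final Pattern COMMA_WITH_SPACES = Pattern.compile("\\s*,\\s*");

    // Hypothetical helper: splits a line using the precompiled pattern.
    static String[] splitFields(String line) {
        return COMMA_WITH_SPACES.split(line);
    }

    public static void main(String[] args) {
        String[] lines = {"a , b,c", "1,2 ,  3"};
        for (String line : lines) {
            System.out.println(Arrays.toString(splitFields(line)));
        }
        // Prints:
        // [a, b, c]
        // [1, 2, 3]
    }
}
```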

Best Practice 2: Use try-with-resources for Files

When reading a file line-by-line and splitting each line, it's crucial to manage resources properly.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class FileSplitExample {
    public static void main(String[] args) {
        String filePath = "data.csv";
        // Use try-with-resources to automatically close the reader
        try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Naive split: does not handle quoted fields that contain commas
                String[] columns = line.split(",");
                // Process columns...
                System.out.println(java.util.Arrays.toString(columns));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Alternative: String.join()

The opposite of split() is String.join(). It's useful for reassembling an array of strings with a delimiter.

String[] fruits = {"apple", "banana", "cherry"};
String csvString = String.join(",", fruits);
// csvString will be "apple,banana,cherry"
System.out.println(csvString);
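
Joining and then splitting does not always return the original array, because the default split() drops trailing empty strings. The sketch below illustrates the difference between the default call and a negative limit; the class name RoundTripDemo and both helpers are hypothetical, and it assumes no field contains the delimiter itself.

```java
import java.util.Arrays;

public class RoundTripDemo {
    // Hypothetical helper: join, then split with the default behavior.
    static String[] roundTripDefault(String[] fields) {
        return String.join(",", fields).split(",");
    }

    // Hypothetical helper: join, then split keeping trailing empty fields.
    static String[] roundTripExact(String[] fields) {
        return String.join(",", fields).split(",", -1);
    }

    public static void main(String[] args) {
        String[] fields = {"apple", "", "cherry", ""};
        System.out.println(String.join(",", fields));        // apple,,cherry,
        System.out.println(roundTripDefault(fields).length); // 3
        System.out.println(roundTripExact(fields).length);   // 4
        System.out.println(Arrays.equals(fields, roundTripExact(fields))); // true
    }
}
```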

Summary

Goal | Method | Example
Split by a simple char | split() | s.split(",")
Split by a special regex char | split() with \\ | s.split("\\.")
Split and limit parts | split() with limit | s.split(",", 3)
Split and keep trailing empties | split() with limit < 0 | s.split(",", -1)
Split and discard all empties | split() + filter (Java 8+) | Arrays.stream(s.split(",")).filter(t -> !t.isEmpty()).toArray(String[]::new)
Read from file and split | BufferedReader + split() | See example above
Join an array into a string | String.join() | String.join(":", parts)