杰瑞科技汇

How to Implement and Use MurmurHash in Java?

This guide covers what MurmurHash is, why you would use it, and how to use it in Java with popular libraries.

What is MurmurHash?

MurmurHash is a non-cryptographic, general-purpose hash function. It was designed by Austin Appleby to be extremely fast while maintaining a good distribution of hash values (low collision rate).

Key Characteristics:

  • Fast: It's one of the fastest general-purpose hash functions available, making it ideal for performance-critical applications.
  • Non-Cryptographic: It is not secure and should never be used for password hashing, digital signatures, or any security-related purpose.
  • Good Distribution: It produces a uniform distribution of hash values, which minimizes collisions.
  • Seedable: You can provide a "seed" value to get a different hash output for the same input. This is useful for deriving several independent hash functions from a single algorithm.
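
To make these characteristics concrete, here is a minimal sketch of the 32-bit x86 variant of MurmurHash3, following the publicly documented algorithm. The class name Murmur3Sketch and the method hash32 are hypothetical names chosen for this illustration. It shows how the seed starts the state and how the input is mixed in 4-byte blocks. It is a teaching aid under those assumptions, not a replacement for a tested library.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of MurmurHash3 (x86, 32-bit). Names are hypothetical. */
class Murmur3Sketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    static int hash32(byte[] data, int seed) {
        int h1 = seed;                 // the seed is the initial hash state
        int len = data.length;
        int tailStart = len & ~3;      // first index that does not fill a 4-byte block

        // Body: mix one little-endian 4-byte block at a time.
        for (int i = 0; i < tailStart; i += 4) {
            int k1 = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            h1 ^= scramble(k1);
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: fold in the remaining 1 to 3 bytes, if any.
        int k1 = 0;
        switch (len & 3) {
            case 3: k1 ^= (data[tailStart + 2] & 0xff) << 16; // fall through
            case 2: k1 ^= (data[tailStart + 1] & 0xff) << 8;  // fall through
            case 1: k1 ^= (data[tailStart] & 0xff);
                    h1 ^= scramble(k1);
                    break;
            default: break;
        }

        // Finalization: mix in the length, then avalanche the bits.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    private static int scramble(int k1) {
        k1 *= C1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= C2;
        return k1;
    }

    public static void main(String[] args) {
        byte[] foo = "foo".getBytes(StandardCharsets.UTF_8);
        System.out.println(hash32(foo, 0));  // prints -156908512
        System.out.println(hash32(foo, 42)); // a different seed gives a different value
    }
}
```

Note that the result is a signed int, so roughly half of all hash values print as negative numbers.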

Why Use MurmurHash in Java?

MurmurHash is the go-to choice for high-performance applications where speed and low collision rates are more important than cryptographic security. Common use cases include:

  • Hash Tables: Especially in in-memory data stores (Redis, for instance, bundles its own hash function implementations) or in custom hash maps where hashCode() is too slow or distributes poorly.
  • Bloom Filters: A probabilistic data structure that requires a fast, high-quality hash function.
  • Caching: Generating keys for distributed caches (e.g., as part of a sharding strategy).
  • Checksums: For data integrity checks where security isn't a concern.
  • Distributed Systems: Hashing data across multiple nodes to ensure an even distribution.
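
For the sharding and distribution use cases, one detail is easy to get wrong: a 32-bit hash is a signed int in Java, so a plain % can produce a negative index. The sketch below shows one way to map a hash to a shard. The class ShardPicker and the method shardFor are hypothetical names used only for this illustration, and the hash is passed in as a plain int so the example does not depend on any particular library.

```java
/** Illustrative helper for mapping a 32-bit hash to a shard. Names are hypothetical. */
class ShardPicker {
    /** Maps a signed 32-bit hash to a shard index in [0, shardCount). */
    static int shardFor(int hash, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod keeps the result non-negative even when the hash is negative;
        // the % operator alone would return a negative remainder here.
        return Math.floorMod(hash, shardCount);
    }

    public static void main(String[] args) {
        int hash = -156908512; // 32-bit MurmurHash3 of "foo" with seed 0
        System.out.println(hash % 5);          // prints -2 (not a valid index)
        System.out.println(shardFor(hash, 5)); // prints 3
    }
}
```

Simple modulo sharding remaps most keys when the shard count changes. If that matters, a consistent hashing scheme is the usual alternative.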

How to Use MurmurHash in Java

You should almost always use a well-tested, third-party library rather than implementing it yourself. Here are two widely used options.

Option 1: Google's Guava Library (Recommended)

Google's Guava library is a standard in the Java ecosystem and provides a highly optimized and easy-to-use implementation of MurmurHash 3.

Step 1: Add the Guava Dependency

If you're using Maven, add this to your pom.xml:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>32.1.2-jre</version> <!-- Use the latest version -->
</dependency>

If you're using Gradle, add this to your build.gradle:

implementation 'com.google.guava:guava:32.1.2-jre' // Use the latest version

Step 2: Write the Code

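Guava exposes the 32-bit and 128-bit variants through the static factory methods Hashing.murmur3_32_fixed() and Hashing.murmur3_128(). The older Hashing.murmur3_32() is deprecated because its hashString method mishandles characters outside the Basic Multilingual Plane, so prefer murmur3_32_fixed() in new code.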

import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
public class GuavaMurmurHashExample {
    public static void main(String[] args) {
        // The input string to hash
        String input = "Hello, MurmurHash!";
        // Get a 32-bit MurmurHash3 function with seed 12345. HashFunction instances are thread-safe.
        // murmur3_32_fixed (Guava 31.0+) replaces the deprecated murmur3_32, whose
        // hashString mishandled characters outside the Basic Multilingual Plane.
        HashFunction murmur3_32 = Hashing.murmur3_32_fixed(12345);
        // hashString returns a HashCode; asInt() extracts the 32-bit value as an int
        int hash = murmur3_32.hashString(input, StandardCharsets.UTF_8).asInt();
        System.out.println("Input: " + input);
        System.out.println("Seed: 12345");
        System.out.println("Murmur3 32-bit Hash: " + hash);
        // Example without an explicit seed: the no-argument factory uses seed 0
        HashFunction murmur3_32DefaultSeed = Hashing.murmur3_32_fixed();
        int hashNoSeed = murmur3_32DefaultSeed.hashString(input, StandardCharsets.UTF_8).asInt();
        System.out.println("\nMurmur3 32-bit Hash (no seed): " + hashNoSeed);
        // Example with 128-bit hash
        HashFunction murmur3_128 = Hashing.murmur3_128(0); // Seed can be 0
        // The hash is returned as a HashCode, which you can convert to a 16-byte array
        byte[] hashBytes = murmur3_128.hashString(input, StandardCharsets.UTF_8).asBytes();
        System.out.println("\nMurmur3 128-bit Hash (as hex): ");
        for (byte b : hashBytes) {
            System.out.printf("%02x", b);
        }
        System.out.println();
    }
}

Output (the hash values are shown as placeholders; run the program to see the actual numbers):

Input: Hello, MurmurHash!
Seed: 12345
Murmur3 32-bit Hash: <signed 32-bit int>

Murmur3 32-bit Hash (no seed): <signed 32-bit int>

Murmur3 128-bit Hash (as hex):
<32 hex digits>

Option 2: Apache Commons Codec

Apache Commons Codec is a small, widely used library of encoders and digests. Since version 1.14 it includes the class org.apache.commons.codec.digest.MurmurHash3. It's a good choice if you already use Commons Codec or want a lightweight dependency.

Step 1: Add the Dependency

Maven pom.xml:

<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.16.0</version> <!-- Use the latest version -->
</dependency>

Gradle build.gradle:

implementation 'commons-codec:commons-codec:1.16.0' // Use the latest version

Note: Use the hash32x86 and hash128x64 methods. The older hash32 and hash128 methods in the same class are deprecated in favor of them.

Step 2: Write the Code

The API is straightforward and static.

import org.apache.commons.codec.digest.MurmurHash3;
import java.nio.charset.StandardCharsets;
public class CommonsCodecMurmurHashExample {
    public static void main(String[] args) {
        String input = "Hello, MurmurHash!";
        byte[] data = input.getBytes(StandardCharsets.UTF_8);
        // Calculate 32-bit hash with a seed (arguments: data, offset, length, seed)
        int hash32 = MurmurHash3.hash32x86(data, 0, data.length, 12345);
        System.out.println("Input: " + input);
        System.out.println("Seed: 12345");
        System.out.println("Murmur3 32-bit Hash: " + hash32);
        // Calculate 128-bit hash with the default seed of 0
        // The result is a long array of two elements
        long[] hash128 = MurmurHash3.hash128x64(data);
        System.out.println("\nMurmur3 128-bit Hash:");
        System.out.println("Part 1: " + hash128[0]);
        System.out.println("Part 2: " + hash128[1]);
    }
}

Output (the hash values are shown as placeholders; run the program to see the actual numbers):

Input: Hello, MurmurHash!
Seed: 12345
Murmur3 32-bit Hash: <signed 32-bit int>

Murmur3 128-bit Hash:
Part 1: <signed 64-bit long>
Part 2: <signed 64-bit long>

(Note: For the same input bytes and seed, the 32-bit result should match the Guava example. The 128-bit result looks different only because of its representation: Guava returns 16 bytes, while this API returns the same 128-bit value as two longs.)


Comparison: Guava vs. Commons Codec

Feature | Google Guava | Apache Commons Codec
Ease of Use | Excellent. The HashFunction interface is very intuitive. | Good. Simple static method calls.
Flexibility | High. Part of a large utility library with many other features. | Lower. Limited to encoders and digests.
Hashing API | Rich. Can hash strings, byte arrays, and primitives, or build a hash incrementally with a Hasher. | Focused. Primarily for byte arrays.
Dependency Size | Larger, as it's a full utility library. | Smaller, if you only need this one feature.
Recommendation | The best choice for most projects. It's well documented and versatile. | A good lightweight alternative, especially if Commons Codec is already on your classpath.
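
One practical difference between the two APIs shown above is the shape of the 128-bit result: Guava gives you 16 bytes, while the second option gives you two longs. The sketch below converts two longs into a hex string. It assumes the halves arrive in the order (h1, h2) and that the byte form is the little-endian encoding of h1 followed by h2, which is my understanding of how Guava lays out its murmur3_128 bytes. The class Hash128Format and the method toHexLittleEndian are hypothetical names for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative formatter for a 128-bit hash held as two longs. Names are hypothetical. */
class Hash128Format {
    /** Encodes h1 then h2 as 16 little-endian bytes and renders them as lowercase hex. */
    static String toHexLittleEndian(long h1, long h2) {
        byte[] bytes = ByteBuffer.allocate(16)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(h1)
                .putLong(h2)
                .array();
        StringBuilder sb = new StringBuilder(32);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Least significant byte of each long comes first.
        System.out.println(toHexLittleEndian(1L, 2L)); // prints 01000000000000000200000000000000
    }
}
```

Verify the assumed byte order against your own library versions before relying on it to compare hashes across systems.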

Important Considerations

  • Bit Length: Choose between 32-bit and 128-bit based on your needs.
    • 32-bit: A smaller output that fits in an int. Good for in-memory hash tables where you only need an array index.
    • 128-bit: A much larger output space, making accidental collisions extremely unlikely for most use cases. A common choice for Bloom filters, which can derive several bit positions from a single 128-bit hash.
  • Seeding: Every party that compares or routes by hash values must use the same seed, so fix the seed as a constant wherever hashes are shared (for example, across all nodes of a sharding scheme). Use different seeds only when you want independent hash functions over the same input, such as separate hash tables or several hash functions derived from one algorithm.
  • Not for Security: MurmurHash is not a cryptographic hash. For password hashing, use strong, slow, salted algorithms like Argon2, bcrypt, or PBKDF2. Java's java.security.MessageDigest provides cryptographic digests such as SHA-256, which suit integrity checks but are not password hashing algorithms either.
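
The Bloom filter point under Bit Length can be sketched as follows. A common technique, often called double hashing, treats the two 64-bit halves of one 128-bit hash as h1 and h2 and derives the i-th bit position from h1 + i * h2. The class BloomIndexes and the method indexes are hypothetical names for this illustration, and the halves are passed in as plain longs so the example does not depend on any particular library.

```java
import java.util.Arrays;

/** Illustrative double-hashing helper for Bloom filters. Names are hypothetical. */
class BloomIndexes {
    /** Derives k bit positions in [0, numBits) from the two halves of a 128-bit hash. */
    static long[] indexes(long h1, long h2, int k, long numBits) {
        if (k <= 0 || numBits <= 0) {
            throw new IllegalArgumentException("k and numBits must be positive");
        }
        long[] result = new long[k];
        long combined = h1;
        for (int i = 0; i < k; i++) {
            // Clear the sign bit so the remainder is never negative.
            result[i] = (combined & Long.MAX_VALUE) % numBits;
            combined += h2; // next value is h1 + (i + 1) * h2, wrapping on overflow
        }
        return result;
    }

    public static void main(String[] args) {
        // Small made-up halves, chosen so the arithmetic is easy to follow.
        System.out.println(Arrays.toString(indexes(10L, 3L, 4, 1000L))); // prints [10, 13, 16, 19]
    }
}
```

This way a single 128-bit hash computation stands in for k separate hash functions, which is one reason the wider variant is popular for this data structure.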