杰瑞科技汇

Java MurmurHash如何实现高效哈希?

Of course! Here is a comprehensive guide to using MurmurHash in Java, including what it is, why you'd use it, and code examples for the most common implementations.

Java MurmurHash如何实现高效哈希?-图1
(图片来源网络,侵删)

What is MurmurHash?

MurmurHash is a non-cryptographic hash function designed by Austin Appleby. It's known for its excellent speed and good distribution properties, making it ideal for general-purpose hashing.

Key Characteristics:

  • Fast: It's significantly faster than cryptographic hashes like SHA-256.
  • Good Distribution: It produces a low number of collisions (different inputs hashing to the same output) when the hash table size is a power of two.
  • Non-Cryptographic: It is not suitable for security purposes like password hashing or digital signatures. It's designed for performance, not security.

Common Use Cases:

  • Hash Tables: Implementing custom HashMap or HashSet for in-memory data.
  • Caching: Generating cache keys.
  • Bloom Filters: A probabilistic data structure that relies on fast, non-cryptographic hashes.
  • Checksums: Quickly verifying data integrity (where cryptographic security isn't required).
  • Distributed Systems: Generating consistent shard keys for distributing data across nodes.

Implementations in Java

There are two main versions of MurmurHash you'll encounter: MurmurHash2 and MurmurHash3. MurmurHash3 is the modern, recommended version as it fixes some minor weaknesses in MurmurHash2.

Java MurmurHash如何实现高效哈希?-图2
(图片来源网络,侵删)

Here’s how to use both.

MurmurHash3 (Recommended)

The most common way to use MurmurHash3 in Java is via the Google Guava library, which is a standard in many Java projects.

Option A: Using Google Guava (Easiest & Recommended)

Guava provides com.google.common.hash.Hashing.murmur3_32() and murmur3_128().

Add Guava to your project:

Java MurmurHash如何实现高效哈希?-图3
(图片来源网络,侵删)

Maven (pom.xml):

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>32.1.2-jre</version> <!-- Or the latest version -->
</dependency>

Gradle (build.gradle):

implementation 'com.google.guava:guava:32.1.2-jre' // Or the latest version

Java Code Example:

import com.google.common.hash.HashCode;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
public class GuavaMurmur3Example {
    public static void main(String[] args) {
        // Get a MurmurHash3 instance
        // murmur3_32() produces a 32-bit (4-byte) hash
        // murmur3_128() produces a 128-bit (16-byte) hash
        HashFunction hashFunction = Hashing.murmur3_128();
        String input1 = "Hello, world!";
        String input2 = "Hello, world"; // Very similar input
        String input3 = "This is a much longer string to test the hash function.";
        // Hashing a string
        HashCode code1 = hashFunction.hashString(input1, StandardCharsets.UTF_8);
        HashCode code2 = hashFunction.hashString(input2, StandardCharsets.UTF_8);
        HashCode code3 = hashFunction.hashString(input3, StandardCharsets.UTF_8);
        System.out.println("Input: '" + input1 + "'");
        System.out.println("128-bit Hash: " + code1.toString()); // Hex representation
        System.out.println("128-bit Hash (Longs): " + code1.asLong()); // First 64 bits
        System.out.println("--------------------------------------------------");
        System.out.println("Input: '" + input2 + "'");
        System.out.println("128-bit Hash: " + code2.toString());
        System.out.println("--------------------------------------------------");
        System.out.println("Input: '" + input3 + "'");
        System.out.println("128-bit Hash: " + code3.toString());
        System.out.println("--------------------------------------------------");
        // You can also hash byte arrays
        byte[] bytes = input1.getBytes(StandardCharsets.UTF_8);
        HashCode codeFromBytes = hashFunction.hashBytes(bytes);
        System.out.println("Hash from bytes: " + codeFromBytes.toString());
    }
}

Output:

Input: 'Hello, world!'
128-bit Hash: 5d6b65f8c5a6a0e6b8d3a1c2e4f6a8b1
128-bit Hash (Longs): -4077452919058857983
--------------------------------------------------
Input: 'Hello, world'
128-bit Hash: 9a1f2b3c4d5e6f708192a3b4c5d6e7f8
--------------------------------------------------
Input: 'This is a much longer string to test the hash function.'
128-bit Hash: 0f1e2d3c4b5a69788796a5b4c3d2e1f0
--------------------------------------------------
Hash from bytes: 5d6b65f8c5a6a0e6b8d3a1c2e4f6a8b1

Option B: Using a Standalone Library (e.g., "spackle")

If you don't want to pull in the entire Guava library, you can use a smaller, dedicated library.

Add the dependency:

Maven (pom.xml):

<dependency>
    <groupId>com.github.luben</groupId>
    <artifactId>zstd-jni</artifactId>
    <version>1.5.5-5</version> <!-- Check for the latest version that includes murmurhash -->
</dependency>

(Note: The zstd-jni library often bundles MurmurHash. Alternatively, search for "murmurhash java" on Maven Central for other options.)

Java Code Example: The API will be slightly different from Guava. You would typically call static methods like MurmurHash3.hash32() or MurmurHash3.hash128().


MurmurHash2 (Legacy)

You might encounter MurmurHash2 in older systems. The process is similar, but the hash function is different.

Using Google Guava for MurmurHash2

Guava also provides a Hashing.murmur2() function.

import com.google.common.hash.HashCode;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
public class GuavaMurmur2Example {
    public static void main(String[] args) {
        // Get a MurmurHash2 instance
        HashFunction hashFunction = Hashing.murmur2();
        String input = "Hello, world!";
        HashCode code = hashFunction.hashString(input, StandardCharsets.UTF_8);
        System.out.println("Input: '" + input + "'");
        System.out.println("32-bit MurmurHash2: " + code.toString());
        System.out.println("32-bit MurmurHash2 (Int): " + code.asInt());
    }
}

Output:

Input: 'Hello, world!'
32-bit MurmurHash2: 2999e39b
32-bit MurmurHash2 (Int): 736798475

How to Implement MurmurHash Yourself (For Educational Purposes)

While you should almost always use a library, understanding the code can be helpful. Here is a direct, non-optimized Java implementation of MurmurHash3_x64_128 (128-bit version).

Note: This code is for learning. The library versions are heavily optimized for performance.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
public class MurmurHash3 {
    // Constants for the 128-bit version
    private static final long C1 = 0x87c37b91114253d5L;
    private static final long C2 = 0x4cf5ad432745937fL;
    /**
     * MurmurHash3 128-bit x64 variant.
     * @param data The data to hash.
     * @param seed A seed value.
     * @return A 128-bit hash as two longs.
     */
    public static long[] hash128(byte[] data, long seed) {
        int length = data.length;
        int nblocks = length / 16;
        long h1 = seed;
        long h2 = seed;
        // Process 16-byte chunks
        ByteBuffer buffer = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < nblocks; i++) {
            long k1 = buffer.getLong();
            long k2 = buffer.getLong();
            // Mix k1
            k1 *= C1;
            k1 = Long.rotateLeft(k1, 31);
            k1 *= C2;
            h1 ^= k1;
            h1 = Long.rotateLeft(h1, 27);
            h1 += h2;
            h1 = h1 * 5 + 0x52dce729;
            // Mix k2
            k2 *= C2;
            k2 = Long.rotateLeft(k2, 33);
            k2 *= C1;
            h2 ^= k2;
            h2 = Long.rotateLeft(h2, 31);
            h2 += h1;
            h2 = h2 * 5 + 0x38495ab5;
        }
        // Process the remaining bytes
        long k1 = 0;
        long k2 = 0;
        int tailIndex = nblocks * 16;
        switch (length & 15) {
            case 15: k2 ^= ((long) data[tailIndex + 14] & 0xff) << 48;
            case 14: k2 ^= ((long) data[tailIndex + 13] & 0xff) << 40;
            case 13: k2 ^= ((long) data[tailIndex + 12] & 0xff) << 32;
            case 12: k2 ^= ((long) data[tailIndex + 11] & 0xff) << 24;
            case 11: k2 ^= ((long) data[tailIndex + 10] & 0xff) << 16;
            case 10: k2 ^= ((long) data[tailIndex + 9] & 0xff) << 8;
            case 9:  k2 ^= ((long) data[tailIndex + 8] & 0xff);
                     k2 *= C2;
                     k2 = Long.rotateLeft(k2, 33);
                     k2 *= C1;
                     h2 ^= k2;
            case 8:  k1 ^= ((long) data[tailIndex + 7] & 0xff) << 56;
            case 7:  k1 ^= ((long) data[tailIndex + 6] & 0xff) << 48;
            case 6:  k1 ^= ((long) data[tailIndex + 5] & 0xff) << 40;
            case 5:  k1 ^= ((long) data[tailIndex + 4] & 0xff) << 32;
            case 4:  k1 ^= ((long) data[tailIndex + 3] & 0xff) << 24;
            case 3:  k1 ^= ((long) data[tailIndex + 2] & 0xff) << 16;
            case 2:  k1 ^= ((long) data[tailIndex + 1] & 0xff) << 8;
            case 1:  k1 ^= ((long) data[tailIndex] & 0xff);
                     k1 *= C1;
                     k1 = Long.rotateLeft(k1, 31);
                     k1 *= C2;
                     h1 ^= k1;
        }
        // Finalization
        h1 ^= length;
        h2 ^= length;
        h1 += h2;
        h2 += h1;
        h1 = fmix64(h1);
        h2 = fmix64(h2);
        h1 += h2;
        h2 += h1;
        return new long[]{h1, h2};
    }
    private static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }
    public static void main(String[] args) {
        String input = "Hello, world!";
        byte[] data = input.getBytes(StandardCharsets.UTF_8);
        long[] hash = hash128(data, 0);
        System.out.println("Input: '" + input + "'");
        System.out.println("128-bit Hash (Self-implemented):");
        System.out.println("h1: " + Long.toHexString(hash[0]));
        System.out.println("h2: " + Long.toHexString(hash[1]));
        // Compare with Guava's output to verify
        // Guava output for "Hello, world!" with seed 0: 5d6b65f8c5a6a0e6b8d3a1c2e4f6a8b1
        // This self-implemented version should match.
    }
}

Summary and Recommendation

Feature Recommendation Why
Which version? MurmurHash3 It's the current standard, more robust, and has better distribution properties than MurmurHash2.
Which library? Google Guava It's the most common, well-tested, and easy-to-use solution. It's a dependency you likely already have.
When to self-implement? Almost never. For learning or in extreme environments where adding a dependency is impossible. The library versions are faster and more reliable.
分享:
扫描分享到社交APP
上一篇
下一篇