As we’re dealing with a heavily concurrent system, whenever we bring up instances, they need to create around 2 to 3 million unique IDs, each with a length of 8 characters. We initially considered using ULIDs, but since their output is 26 characters long, we can’t fully rely on them. However, we still want to leverage the uniqueness provided by ULIDs.
To achieve this, we decided to pass the generated ULID into a hash function that returns a hash of the ULID with the desired length. But I couldn’t find a function which does that.
I’ve attached what i tried
AS a first step, I tried to truncate the output given by sha256 to first 8 character.
To test this, I ran 100 threads, and i created 1000 hash output for each thread. AS i thought, i faced some collisions.
If possible, could u guys suggest one hash functions or methods to resolve this issue.
CODE: (Tried with truncating the first 8 character, but not good approach)
package main
import (
"crypto/sha256"
"encoding/hex"
"fmt"
"math/rand"
"sync"
"time"
"github.com/oklog/ulid/v2"
)
var collisionCount int
var collidedIDs []string
var hashSet = make(map[string]bool)
var lock sync.Mutex
func generateHashedULID() (string, string) {
entropy := ulid.Monotonic(rand.New(rand.NewSource(time.Now().UnixNano())), 0)
generatedULID := ulid.MustNew(ulid.Timestamp(time.Now()), entropy).String()
hash := sha256.New()
hash.Write([]byte(generatedULID))
hashedULID := hex.EncodeToString(hash.Sum(nil))[:8]
return generatedULID, hashedULID
}
func worker(results []string, index int, wg *sync.WaitGroup) {
defer wg.Done()
lock.Lock()
for i := 0; i < 1000; i++ {
generatedULID, encodedHash := generateHashedULID()
if _, exists := hashSet[encodedHash]; exists {
collisionCount++
collidedIDs = append(collidedIDs, generatedULID)
} else {
hashSet[encodedHash] = true
}
results[index] = encodedHash
}
lock.Unlock()
}
func main() {
numThreads := 100
results := make([]string, numThreads)
var wg sync.WaitGroup
collisionCount = 0
collidedIDs = []string{}
hashSet = make(map[string]bool)
for i := 0; i < numThreads; i++ {
wg.Add(1)
go worker(results, i, &wg)
}
wg.Wait()
fmt.Printf("Generated Hashes: %vn", results)
fmt.Printf("Number of Unique Hashes: %d length: %dn", len(hashSet), len(results))
fmt.Printf("Number of Collisions: %dn", collisionCount)
if collisionCount > 0 {
fmt.Printf("Collided ULIDs: %vn", collidedIDs)
}
}
OUPUT:
Number of Unique Hashes: 99996 length: 100
Number of Collisions: 4
Collided ULIDs: [01J8P9G2S7QZSZE0YRWBCBFY71 01J8P9G2SAW5HFM1KT5MY2CWEX 01J8P9G2SSJVQ3P6FXVP8SAJ3A 01J8P9G2V36X8Q15H0BM2FM35T]
Chandra Mouliesh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
5