I was working on the Leetcode problem “Reverse Vowels of a String.”
Here is my first solution:
```java
class Solution {
    public String reverseVowels(String s) {
        String vowels = "aeiouAEIOU";
        char[] arr = s.toCharArray();
        int length = s.length();
        int i = 0;
        int j = length - 1;
        while (i < j & i < length & j > 0) {
            while (i < j && vowels.indexOf(arr[i]) == -1) {
                i++;
            }
            while (j > i && vowels.indexOf(arr[j]) == -1) {
                j--;
            }
            char temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
            i++;
            j--;
        }
        return String.valueOf(arr);
    }
}
```
Since `vowels.indexOf()` runs in O(n) time, where n is the length of the `vowels` string, I decided to use a faster lookup data structure. I therefore switched to a `HashSet`, as follows:
```java
class Solution {
    public String reverseVowels(String s) {
        Set<Character> set = new HashSet<>(
                Arrays.asList('A','a','E','e','I','i','O','o','U','u'));
        char[] arr = s.toCharArray();
        int length = s.length();
        int i = 0;
        int j = length - 1;
        while (i < j & i < length & j > 0) {
            while (i < j && !set.contains(arr[i])) {
                i++;
            }
            while (i < j && !set.contains(arr[j])) {
                j--;
            }
            char temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
            i++;
            j--;
        }
        return String.valueOf(arr);
    }
}
```
My understanding is that constructing the `HashSet` with n characters has a time complexity of O(n), and each lookup is O(1). I expected this version to be faster. However, on Leetcode, the first solution finished in 3 ms, while the second one finished in 5 ms.
Why is the second one slower? What is the time complexity of constructing a `HashSet` with n elements, and how does it affect runtime in this case?
Constructing a `HashSet` does have O(n) performance, but you can't use Big O notation to predict relative runtimes across different algorithms. You can't even use it to predict the relative runtimes of `add()` and `contains()` within `HashSet` itself. From a benchmark on Baeldung:
| Operation | 1,000 elements | 10,000 elements | 100,000 elements |
|---|---|---|---|
| `.add()` | 0.026 | 0.023 | 0.024 |
| `.remove()` | 0.009 | 0.009 | 0.009 |
| `.contains()` | 0.009 | 0.009 | 0.010 |
Notice how both `add()` and `contains()` are O(1) (constant runtime regardless of collection size), yet their runtimes differ from each other.
For very small inputs, the constant overhead of a more complex algorithm (a cost Big O notation deliberately ignores) likely exceeds the time it saves per operation. You'll only notice the difference once you crank the input size up to millions.
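If you want to observe those constant factors directly, here is a minimal micro-benchmark sketch of my own (a hand-rolled `System.nanoTime()` harness rather than JMH, so treat any numbers as rough; the class name `VowelLookupBench` and the repetition counts are made up for illustration):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class VowelLookupBench {
    public static void main(String[] args) {
        String vowels = "aeiouAEIOU";
        Set<Character> set = new HashSet<>(
                Arrays.asList('A','a','E','e','I','i','O','o','U','u'));
        char[] input = "the quick brown fox jumps over the lazy dog".toCharArray();
        long hits = 0;

        // Time the indexOf-based lookup over many repetitions.
        long start = System.nanoTime();
        for (int rep = 0; rep < 1_000_000; rep++) {
            for (char c : input) {
                if (vowels.indexOf(c) != -1) hits++;
            }
        }
        long indexOfNanos = System.nanoTime() - start;

        // Time the HashSet-based lookup; note each call auto-boxes the char.
        start = System.nanoTime();
        for (int rep = 0; rep < 1_000_000; rep++) {
            for (char c : input) {
                if (set.contains(c)) hits++;
            }
        }
        long setNanos = System.nanoTime() - start;

        System.out.printf("indexOf : %.1f ms%n", indexOfNanos / 1e6);
        System.out.printf("HashSet : %.1f ms%n", setNanos / 1e6);
        // Print hits so the JIT cannot eliminate the loops as dead code.
        System.out.println("hits = " + hits);
    }
}
```

On a typical JVM you would expect both loops to land in the same ballpark here, which is exactly the point: the asymptotic difference between scanning a 10-character string and a hash lookup is invisible until the candidate set itself grows large.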
The second solution is slower despite using a `HashSet` for these reasons:

- **HashSet overhead:** Constructing the `HashSet` and computing hash codes introduces extra work, even though each lookup is O(1). In addition, every `contains(arr[i])` call auto-boxes the `char` into a `Character` object.
- **Small set size:** The `indexOf()` call in the first solution scans a small, constant string of 10 vowels; a linear search through such a short string has minimal overhead.
- **Cache efficiency:** `indexOf()` operates on a small contiguous string, making it more cache-friendly than a `HashSet`, which has more complex memory access patterns.
- **Constant factors:** The overhead of `HashSet` operations (hashing, boxing, collision handling) can outweigh the benefits for such a small data set.
In summary, while a `HashSet` has better theoretical complexity, the small set size and the constant-factor overhead make `indexOf()` faster in practice for this problem.
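Neither answer shows it, but if you want O(1) lookups with no hashing or boxing at all, a plain boolean lookup table is an alternative. The sketch below is my own variant, not from either answer; it assumes ASCII input (which this LeetCode problem guarantees), and the `c < 128` guard keeps it safe for anything else:

```java
class Solution {
    // Lookup table indexed by ASCII code: no hashing, no Character boxing.
    private static final boolean[] IS_VOWEL = new boolean[128];
    static {
        for (char c : "aeiouAEIOU".toCharArray()) {
            IS_VOWEL[c] = true;
        }
    }

    public String reverseVowels(String s) {
        char[] arr = s.toCharArray();
        int i = 0, j = arr.length - 1;
        while (i < j) {
            // Advance each pointer until it rests on a vowel.
            while (i < j && !isVowel(arr[i])) i++;
            while (i < j && !isVowel(arr[j])) j--;
            char temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
            i++;
            j--;
        }
        return String.valueOf(arr);
    }

    // Guard against non-ASCII characters before indexing the table.
    private static boolean isVowel(char c) {
        return c < 128 && IS_VOWEL[c];
    }
}
```

The table costs 128 bytes and one array index per lookup, so its constant factor is about as small as a membership test can get.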