I have two classes
class A{
int id;
String name;
public boolean equals(Object o)
{
if(o instanceof A) {
A a=(A)o;
if(a.getId().equals(this.getId()))
return true;
}
return false;
}
public int hashCode() { return id;}
//setter& getter
}
class B{
int id;
String address;
public boolean equals(Object o){
if(o instanceof B)
{
B b=(B)o;
if(b.getId().equals(this.getId()))
return true;
}
return false;
}
public int hashCode()
{ return id;}
//setter& getter
}
I have 100,000 A type objects and 100,000 B type objects.
So that, I have eliminated duplicates in both classes using HashSet.
Now i am comparing HashSet<A>
and HashSet<B>
with id field and place matched objects in another list with following code in main class..
HashSet<A> A_Set=new HashSet<>();
HashSet<B> B_Set=new HashSet<>();
for (A c1 : A_Set) {
for (B c2 : B_Set) {
if (c1.getId().equals(c2.getIid())) {
matchedData.add(c1);
}
}
}
the above code taking 15 minutes to compare 100,000 records…Is there any solution to increase performance of code.. (with in less time)
4
You have two sets as
and bs
. You want to calculate the set cs
such that it contains all the elements from set A
which have an ID which is the same as that of any object in set cs
. You are currently using this nested loop:
Set<A> as = ...;
Set<B> bs = ...;
Set<A> cs = new HashSet<>();
for (A a : as) {
for (B b : bs) {
if (a.getId() == b.getId())
cs.add(a);
}
}
This takes quite long because you loop through all elements of the set bs
. It has algorithmic complexity O(|as| · |bs|)
, where |x|
is the size of set x
.
We could apply one simple optimization: Once we’ve found a matching element in set bs
, we add the current a
to cs
and continue with the next element from as
. We do not search for further matches in bs
, as adding a matching element again doesn’t change the result set:
for (A a : as) {
for (B b : bs) {
if (a.getId() == b.getId()) {
cs.add(a);
break;
}
}
}
While this should be quite a bit faster, this still has O(|as| · |xs|)
complexity.
We can do better. For example, we could sort all elements by their ID (a one-time O(n log n)
cost) in ascending order, and iterate over them using an O(n)
algorithm that skips elements as long as they are larger than the current element from the other sequence. This is better, but still not optimal.
The optimal solution is to create a hash set of IDs of the bs
set. This does not require sorting of both sets, and allows linear-time membership test. There is a one-time O(n)
cost to assemble the set of IDs.
HashSet<Integer> bIds = new HashSet<>(bs.size());
for (B b : bs)
bIDs.add(b.getId());
for (A a : as)
if (bIds.contains(a.getId()))
cs.add(a);
The total complexity of this solution is O(|as| + |bs|)
. In other words, this should run roughly 100,000 times faster.
1