I have two different datasets, say A of dimension (N1, 3) and B of dimension (N2, 3).
I want to identify the rows in A which are approximately equal to some rows in B with tolerances tol = [tol1, tol2, tol3].
In MATLAB, the function ismembertol does this very efficiently: for each row of A, it returns in idx (a cell array) the indices of the rows of B that approximately equal that full row of A within tol:
[ismatch, idx] = ismembertol(A, B, 'DataScale', 1, 'ByRow', true, 'OutputAllIndices', true)
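To make the expected output concrete, here is a small Julia illustration of the semantics I am after. The data and tolerance values are made up for the example, and the variable names ismatch and idx simply mirror the MATLAB call; this is only meant to show the desired result, not an efficient method.

```julia
# Toy data, made up for illustration: 3 rows in A, 4 rows in B
A = [1.0 2.0 3.0;
     4.0 5.0 6.0;
     7.0 8.0 9.0]
B = [1.125 2.0   3.0;     # matches row 1 of A
     7.0   8.0   9.5;     # matches nothing (third column is off by 0.5)
     0.875 2.125 2.875;   # matches row 1 of A
     4.0   5.0   6.125]   # matches row 2 of A
tol = [0.25, 0.25, 0.25]  # one absolute tolerance per column

# For each row of A, collect the indices of the rows of B that agree with it
# in every column to within the corresponding tolerance.
idx = [[j for j in axes(B, 1) if all(abs.(A[i, :] .- B[j, :]) .<= tol)]
       for i in axes(A, 1)]

# A row of A "matches" if at least one row of B was found for it.
ismatch = .!isempty.(idx)
```

Here idx is [[1, 3], [4], []] and ismatch is [true, true, false].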
Again, it is very fast in MATLAB, taking a few seconds for large datasets (typically, N1 and N2 are of the order of 1 million).
I could not find any comparable implementation in Julia, and I expect that even multithreading a brute-force comparison would not be enough for datasets that large.
Naively, I first used this basic implementation, which returns the expected results but performs very poorly:
# Function to check if two rows are within given tolerances
function row_within_tolerances(row1, row2, tolerances)
    return all(abs.(row1 .- row2) .<= tolerances)
end

# Define tolerances for each dimension
tolerances = [atol1, atol2, atol3]

# Initialize logical array and cell array to store results
tf = falses(size(A, 1))
loc = Vector{Vector{Int}}(undef, size(A, 1))

# Check each row of A against all rows of B
for i in 1:size(A, 1)
    matching_indices = Int[]
    for j in 1:size(B, 1)
        if row_within_tolerances(A[i, :], B[j, :], tolerances)
            push!(matching_indices, j)
        end
    end
    if !isempty(matching_indices)
        tf[i] = true
        loc[i] = matching_indices
    else
        loc[i] = Int[]
    end
end

# Display the results
println("Logical array indicating matches:")
println(tf)
println("Cell array of indices in B that match each row in A:")
println(loc)
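One direction that might cut down the number of comparisons is to sort B by its first column and only compare each row of A against the rows of B whose first column falls within the first tolerance. The following is an untested-at-scale sketch of that idea, not a benchmarked solution: the function name match_rows_sorted is hypothetical, it assumes absolute tolerances and exactly three columns, and rows lying exactly on the tolerance boundary in the first column may be treated differently from the brute-force loop because of floating-point rounding.

```julia
# Hypothetical helper: sort B by its first column so that each row of A is
# only compared against a narrow window of candidate rows of B.
function match_rows_sorted(A::AbstractMatrix, B::AbstractMatrix, tol::AbstractVector)
    perm = sortperm(view(B, :, 1))   # row order of B sorted by first column
    b1 = B[perm, 1]                  # sorted copy of B's first column
    tf = falses(size(A, 1))
    loc = [Int[] for _ in 1:size(A, 1)]
    for i in axes(A, 1)
        # Sorted positions whose first column lies in [a - tol[1], a + tol[1]]
        lo = searchsortedfirst(b1, A[i, 1] - tol[1])
        hi = searchsortedlast(b1, A[i, 1] + tol[1])
        for k in lo:hi               # empty range when there is no candidate
            j = perm[k]              # original row index in B
            # Scalar checks on all three columns avoid allocating row copies
            if abs(A[i, 1] - B[j, 1]) <= tol[1] &&
               abs(A[i, 2] - B[j, 2]) <= tol[2] &&
               abs(A[i, 3] - B[j, 3]) <= tol[3]
                push!(loc[i], j)
            end
        end
        sort!(loc[i])                # report indices of B in ascending order
        tf[i] = !isempty(loc[i])
    end
    return tf, loc
end
```

It would be called as tf, loc = match_rows_sorted(A, B, tolerances). How much this helps depends on how well the first column separates the data: if many rows of B share nearly the same first-column value, the candidate window stays large and the cost approaches that of the double loop.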