I want to compute pairwise quantities, e.g. distances between two points.
A simple example would be
import numpy as np
N = 9
x = np.linspace(0,1,N)
y = np.abs(x - x[:,None]) # pairwise 1d eucdlidian distance
This will result in an (N,N)
array, that holds the distances from every element in x
to every other element in x
.
However, one can discard the lower (or upper) triangle of y
, since the distance from x[0]
to x[1]
is the same as the distance from x[1]
to x[0]
.
Can anyone think of a way of never computing the lower (or upper) triangle in the first place? Ideally this would generalize to Nd arrays, e.g.
x = np.linspace(0,1,N)
y = x + (1j * x)[:,None] # complex plane
z = np.abs(y[:,:,None,None] - y[None,None,:,:]) # pairwise 2d euclidian distance
and any pairwise quantity, e.g.
y = np.random.randint(0,2,(N,N))
z = y[:,:,None,None] == y[None,None,:,:] # pairwise equality
Thanks in advance.
PS: For anyone interested, this came up when I tried implementing an energy function for 2d simulated annealing.
1
Manually
You could generate the pair of indices with itertools.combinations
and manually compute the distances:
from itertools import combinations
a, b = map(list, zip(*combinations(range(len(x)), 2)))
tri = abs(x[a]-x[b])
Output:
array([0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875, 1. , 0.125,
0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875, 0.125, 0.25 , 0.375,
0.5 , 0.625, 0.75 , 0.125, 0.25 , 0.375, 0.5 , 0.625, 0.125,
0.25 , 0.375, 0.5 , 0.125, 0.25 , 0.375, 0.125, 0.25 , 0.125])
Which are the distance between the indices:
[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (1, 2),
(1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (2, 3), (2, 4), (2, 5),
(2, 6), (2, 7), (2, 8), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (4, 5),
(4, 6), (4, 7), (4, 8), (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
Using scipy.spatial.distance.pdist
This is exactly what pdist
is doing:
from scipy.spatial.distance import pdist, squareform
tri = pdist(x[:, None]) # one needs a 2D input
Output:
array([0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875, 1. , 0.125,
0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875, 0.125, 0.25 , 0.375,
0.5 , 0.625, 0.75 , 0.125, 0.25 , 0.375, 0.5 , 0.625, 0.125,
0.25 , 0.375, 0.5 , 0.125, 0.25 , 0.375, 0.125, 0.25 , 0.125])
NB. the order of the pairs is the same as in the itertools.combinations
approach.
If you want to get the squareform
:
squareform(tri)
array([[0. , 0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875, 1. ],
[0.125, 0. , 0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875],
[0.25 , 0.125, 0. , 0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 ],
[0.375, 0.25 , 0.125, 0. , 0.125, 0.25 , 0.375, 0.5 , 0.625],
[0.5 , 0.375, 0.25 , 0.125, 0. , 0.125, 0.25 , 0.375, 0.5 ],
[0.625, 0.5 , 0.375, 0.25 , 0.125, 0. , 0.125, 0.25 , 0.375],
[0.75 , 0.625, 0.5 , 0.375, 0.25 , 0.125, 0. , 0.125, 0.25 ],
[0.875, 0.75 , 0.625, 0.5 , 0.375, 0.25 , 0.125, 0. , 0.125],
[1. , 0.875, 0.75 , 0.625, 0.5 , 0.375, 0.25 , 0.125, 0. ]])
ND-arrays
NB. This does not generalize directly to ND arrays, but might be able to temporarily reshape to 2D depending on the exact use case.
In your second example, you could use:
# pairwise distances
tri = pdist(y.reshape(-1, 1))
# square form
out = squareform(tri).reshape(9, 9, 9, 9)
Unfortunately, pdist
doesn’t seem to handle complex data.
1