My problem seems very simple but I have been stuck on it for a while now.
I am simply trying to fit a 2D histogram with a sum of 2D gaussians.
As nothing seems to work I went back to basics and am trying to fit a 2D gaussian distribution with a 2D gaussian.
What I notice is if I fit artificially generated noisy data I do not get the same result as when sampling a gaussian distribution and bining it in a 2D histogram. The latter results in a very poor quality fit even with very good initial guess parameters (much better than what I would have in my real application problem).
Here is the example code :
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def twoD_Gaussian(xy, amplitude, xo, yo, sigma_x, sigma_y, rho):
x, y = xy
xo = float(xo)
yo = float(yo)
a = 1 / sigma_x**2
b = rho / sigma_x / sigma_y
c = 1 / sigma_y**2
g = amplitude * np.exp( - (1/(2*(1-rho**2)))*(a*((x-xo)**2) - 2*b*(x-xo)*(y-yo) + c*((y-yo)**2)))
return g.ravel()
# Produce a number of points in x-y from 1 distribution.
mean = [0,0]
cov = [[3,1.5],[1.5,1]]
N = int(1e6)
x,y = np.random.multivariate_normal(mean,cov,N).T
nbr_bins = 100 #int(2 * N ** (1/3))
print("numbins", nbr_bins)
# Produce 2D histogram
H,xedges,yedges = np.histogram2d(x,y,bins=nbr_bins)
bin_centers_x = (xedges[:-1]+xedges[1:])/2.0
bin_centers_y = (yedges[:-1]+yedges[1:])/2.0
X,Y = np.meshgrid(bin_centers_x, bin_centers_y)
data = twoD_Gaussian((X, Y), H.max(), mean[0], mean[1], np.sqrt(cov[0][0]), np.sqrt(cov[1][1]), cov[0][1]/(np.sqrt(cov[0][0])*np.sqrt(cov[1][1])))
data_noisy = data + 0.2*np.random.normal(size=data.shape)
# Initial Guess
p0 = (H.max(),mean[0],mean[1],1.7,1,0.9)
# Curve Fit parameters with histo
coeff, var_matrix = curve_fit(twoD_Gaussian,(X,Y),H.ravel(),p0=p0)
# Curve fit on noisy data
popt, pcov = curve_fit(twoD_Gaussian, (X, Y), data_noisy, p0=p0)
print('hist fit', coeff)
print('noisy data fit', popt)
data_fitted_hist = twoD_Gaussian((X, Y), *coeff)
data_fitted_noise = twoD_Gaussian((X, Y), *popt)
# Calculate the extent of the plots
extent = [X.min(), X.max(), Y.min(), Y.max()]
# Plot the hist fit
plt.hist2d(x, y, bins = nbr_bins, cmap = 'terrain_r')
plt.contour(X, Y, data_fitted_hist.reshape(nbr_bins, nbr_bins), 5)
plt.scatter(0, 0, marker='x', s=100, color='r')
plt.xlim(extent[0], extent[1]) # Set the x-axis limits
plt.ylim(extent[2], extent[3]) # Set the y-axis limits
plt.gca().set_aspect('equal', adjustable='box') # Set aspect ratio to be equal
plt.show()
# Plot the noisy data fit
fig, ax = plt.subplots(1, 1)
ax.scatter(0, 0, marker='x', s=100, color='w')
ax.imshow(data_noisy.reshape(nbr_bins, nbr_bins), cmap=plt.cm.jet, origin='lower', extent=extent)
ax.contour(X, Y, data_fitted_noise.reshape(nbr_bins, nbr_bins), 8, colors='w')
ax.set_xlim(extent[0], extent[1]) # Set the x-axis limits
ax.set_ylim(extent[2], extent[3]) # Set the y-axis limits
ax.set_aspect('equal', adjustable='box') # Set aspect ratio to be equal
plt.show()
And here are the result plots :
One can clearly see that the top fit (the one using the histogram) is incorrect as it is not centered (it’s sigma parameters are also different from one fit to the other).
I am wondering if curvefit might be the wrong tool for such a fit ? Or am I doing anything wrong ?
P.S. : I don’t know if this can be a clue or not but on my real problem application one of the problems with the fit is that it tries to make the gaussians amplitude as small as it can (tends towards 0 or lower bound if I impose bounds).
Owen Syrett is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.