I am trying to figure out a way to separate the systematic errors in a set of datasets representing the positions and position errors of a mechanical stage. The data is in a pandas DataFrame.
Background: A stage moves in a 2D plane. The measurements report the X and Y coordinates of the stage position as well as the positional errors in the X and Y directions. There is a grid of marks spaced 1nm apart on the 2D plane; the stage moves smoothly to these marks, and measurements are taken only at the marks. The errors are computed by an algorithm as the difference between the true position of each mark and the measured stage position.
Data Format: Here “Skew” means error. All values are in nm.

| StageCoordsNM.X | StageCoordsNM.Y | SkewNM.X | SkewNM.Y |
|---|---|---|---|
| 118760606 | 112836409 | -29 | -45 |
| 118760622 | 112836426 | -18 | 5 |
**Data Preprocess:** I am working at nm scale, and each time I sweep through the same ‘mark’ location, the measured stage position varies slightly around that mark. That's why I bin the data in both the X and Y directions, so that each rectangular bin (X bin width × Y bin width) covers the data points intended for one specific mark. I can then compute the average and peak-to-peak error for each bin and plot them.
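```python
import numpy as np
import pandas as pd

# Define bin edges based on the bin width (coordinates in nm)
bin_width = 1_000_000
x_bins = np.arange(df['StageCoordsNM.X'].min(), df['StageCoordsNM.X'].max() + bin_width, bin_width)
y_bins = np.arange(df['StageCoordsNM.Y'].min(), df['StageCoordsNM.Y'].max() + bin_width, bin_width)

# include_lowest=True keeps the minimum coordinate in the first bin; by default
# pd.cut excludes the first edge and would assign NaN to that point
df['X_bin'] = pd.cut(df['StageCoordsNM.X'], bins=x_bins, labels=False, include_lowest=True)
df['Y_bin'] = pd.cut(df['StageCoordsNM.Y'], bins=y_bins, labels=False, include_lowest=True)
```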
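The binning above starts its edges at the data minimum. If the mark pitch equals `bin_width`, every edge then sits only a few nm below a mark (about as far as the first mark's lowest scatter), so points that scatter further below their own mark can land in the previous bin. A minimal sketch of one alternative, assuming the marks form a regular grid with pitch equal to `bin_width`: assign each point to its nearest mark by rounding, then compute the per-bin mean and peak-to-peak error. `bin_to_nearest_mark` and `per_bin_summary` are hypothetical helper names, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd

def bin_to_nearest_mark(df, pitch, x_col='StageCoordsNM.X', y_col='StageCoordsNM.Y'):
    # Hypothetical helper: index of the nearest mark along each axis. Assumes a
    # regular grid with spacing `pitch` (nm) whose first mark lies within a small
    # fraction of a pitch of the minimum measured coordinate.
    out = df.copy()
    out['X_bin'] = np.rint((out[x_col] - out[x_col].min()) / pitch).astype(int)
    out['Y_bin'] = np.rint((out[y_col] - out[y_col].min()) / pitch).astype(int)
    return out

def per_bin_summary(df, err_col):
    # Mean error (candidate systematic part), peak-to-peak spread and count per bin
    return (df.groupby(['X_bin', 'Y_bin'])[err_col]
              .agg(mean='mean', pk2pk=lambda s: s.max() - s.min(), n='count')
              .reset_index())

# Synthetic demo: a 3 x 2 grid of marks visited in 25 passes
rng = np.random.default_rng(0)
pitch = 1_000_000  # nm, assumed equal to the mark spacing
records = []
for _ in range(25):
    for i in range(3):
        for j in range(2):
            records.append({
                'StageCoordsNM.X': 118_760_600 + i * pitch + rng.normal(0, 15),
                'StageCoordsNM.Y': 112_836_400 + j * pitch + rng.normal(0, 15),
                'SkewNM.X': 10.0 * i - 5.0 + rng.normal(0, 8),  # mark-dependent offset + noise
            })
demo = pd.DataFrame(records)

summary = per_bin_summary(bin_to_nearest_mark(demo, pitch), 'SkewNM.X')
# 2D grid of bin means (rows = Y_bin, columns = X_bin), ready for a heatmap
mean_map = summary.pivot(index='Y_bin', columns='X_bin', values='mean')
```

`mean_map` (or the same pivot on `pk2pk`) can be passed directly to a heatmap function such as matplotlib's `imshow`.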
What I want to do: Now let's say I have swept the stage through the same area 25 times. After binning in both X and Y, I will have 25 data points per bin, i.e. 25 measured stage-position errors per bin. For each bin, I want to extract the systematic (repeatable) component of the error. Twenty-five data points per bin might not be enough to draw solid conclusions, so machine learning seems hard to apply; I would rather find a more statistical method.
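A common statistical baseline for this, sketched here as one option rather than a definitive method: treat each bin's mean error as the estimate of its systematic component and the spread of the 25 repeats around that mean as random error. The standard error of the mean, s/√n, then gives a confidence interval for the systematic part, and a one-sample t statistic tests whether it differs from zero. The helper name `systematic_estimate` and the fixed critical value (2.064, the two-sided 95% Student-t value for n − 1 = 24 degrees of freedom) are assumptions of this sketch.

```python
import numpy as np
import pandas as pd

T_CRIT_95 = 2.064  # two-sided 95% Student-t critical value, 24 degrees of freedom (n = 25)

def systematic_estimate(df, err_col, t_crit=T_CRIT_95):
    """Per-bin mean error with its standard error, 95% CI and t statistic vs. zero."""
    out = (df.groupby(['X_bin', 'Y_bin'])[err_col]
             .agg(n='count', mean='mean', std='std')   # 'std' is the sample std (ddof=1)
             .reset_index())
    out['sem'] = out['std'] / np.sqrt(out['n'])        # standard error of the mean
    out['ci_low'] = out['mean'] - t_crit * out['sem']
    out['ci_high'] = out['mean'] + t_crit * out['sem']
    out['t'] = out['mean'] / out['sem']                # one-sample t statistic, H0: mean = 0
    out['systematic'] = out['t'].abs() > t_crit        # mean error distinguishable from 0
    return out

# Deterministic demo: two bins with identical zero-mean scatter over 25 repeats;
# bin (0, 0) carries a +20 nm systematic offset, bin (1, 0) has none
scatter = np.linspace(-10, 10, 25)
demo = pd.DataFrame({
    'X_bin': [0] * 25 + [1] * 25,
    'Y_bin': [0] * 50,
    'SkewNM.X': np.concatenate([20 + scatter, scatter]),
})
result = systematic_estimate(demo, 'SkewNM.X')
```

If the number of repeats differs between bins, the per-bin critical value can be computed with `scipy.stats.t.ppf(0.975, n - 1)` instead of the constant. With many bins, roughly 1 in 20 pure-noise bins will still pass a 5% test by chance.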
What I did so far: I computed a “Normalized Weighted Adjusted Repeatability Index” (NWARI), which is supposed to tell me how confident I can be that a particular bin's error is repeatable. Ignore the ‘_before’ suffix.
```python
# Example weights and regularizer for the index (placeholders; tune as needed)
w_a = 1.0       # weight on the mean (systematic) error
w_p = 1.0       # weight on the peak-to-peak (spread) error
epsilon = 1e-9  # avoids division by zero when a bin has zero spread

# Pk-to-pk and mean of the X and Y skew per (X_bin, Y_bin), broadcast back to every row
grouped = df.groupby(['X_bin', 'Y_bin'])
df['Error_pk2pk_X'] = grouped['SkewNM.X'].transform(lambda x: x.max() - x.min())
df['Error_pk2pk_Y'] = grouped['SkewNM.Y'].transform(lambda x: x.max() - x.min())
df['Mean_SkewNM_X_before'] = grouped['SkewNM.X'].transform('mean')
df['Mean_SkewNM_Y_before'] = grouped['SkewNM.Y'].transform('mean')

# The columns above are constant within a bin, so the per-bin index can be computed
# row-wise. This replaces groupby().apply() with reset_index(drop=True), which
# overwrote the original row index with 0..n-1 inside every group.
for axis in ('X', 'Y'):
    mean_abs = df[f'Mean_SkewNM_{axis}_before'].abs()
    pk2pk = df[f'Error_pk2pk_{axis}'].abs()
    wari = (w_a * mean_abs) / (w_p * (pk2pk + epsilon))
    df[f'WARI_{axis.lower()}_before'] = wari
    df[f'NWARI_{axis}_before'] = wari / (1 + wari)
    # 1 - pk2pk / |mean|, floored at -1 (NaN only if a bin's mean and spread are both 0)
    df[f'Confidence_{axis}_before'] = ((mean_abs - pk2pk) / mean_abs).clip(lower=-1)
```
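Because `Mean_SkewNM_*_before` and `Error_pk2pk_*` are constant within each bin, the per-bin quantities computed above reduce to simple closed forms. Writing, for one axis, $\bar e_b$ for the mean error of bin $b$, $R_b$ for its peak-to-peak range, and $w_a$, $w_p$, $\epsilon > 0$ for the weights and regularizer used in the code:

$$
\mathrm{WARI}_b = \frac{w_a\,\lvert \bar e_b \rvert}{w_p\,(R_b + \epsilon)}, \qquad
\mathrm{NWARI}_b = \frac{\mathrm{WARI}_b}{1 + \mathrm{WARI}_b} = \frac{w_a\,\lvert \bar e_b \rvert}{w_a\,\lvert \bar e_b \rvert + w_p\,(R_b + \epsilon)},
$$

$$
\mathrm{Confidence}_b = \max\!\left(1 - \frac{R_b}{\lvert \bar e_b \rvert},\; -1\right).
$$

So NWARI lies in $[0, 1)$ and Confidence in $[-1, 1]$, and both only compare the mean to the range. For Gaussian noise the expected range of $n$ samples grows with $n$ (roughly $3.9\sigma$ for $n = 25$), so for the same underlying systematic error both indices tend to drop as more passes are added, whereas the standard error of the mean, $\sigma/\sqrt{n}$, shrinks.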
But I am not sure whether this is a correct way to do it, or whether there is another way to verify it.
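One way to check whether the bin-to-bin pattern is real, sketched under the assumption that every bin holds repeated visits to a single mark: a one-way random-effects ANOVA with the bin as the grouping factor. It splits the total error variance into a between-bin part (the systematic, mark-dependent error) and a within-bin part (random, pass-to-pass error). The intraclass correlation ICC(1) is the fraction of the total variance that is systematic, and F = MS_between / MS_within well above 1 suggests the bin means differ by more than the noise. `variance_decomposition` is a hypothetical helper; `n0` is the usual effective group size for unequal bin counts.

```python
import numpy as np
import pandas as pd

def variance_decomposition(df, err_col, keys=('X_bin', 'Y_bin')):
    """One-way random-effects ANOVA with each bin as a group."""
    g = df.groupby(list(keys))[err_col]
    n_b = g.count()                       # repeats per bin
    mean_b = g.mean()                     # per-bin mean error
    N, k = n_b.sum(), len(n_b)
    grand = (n_b * mean_b).sum() / N      # grand mean over all binned points
    ms_between = (n_b * (mean_b - grand) ** 2).sum() / (k - 1)
    ms_within = ((df[err_col] - g.transform('mean')) ** 2).sum() / (N - k)
    n0 = (N - (n_b ** 2).sum() / N) / (k - 1)           # effective bin size (= n if all equal)
    var_sys = max((ms_between - ms_within) / n0, 0.0)   # between-bin (systematic) variance
    return {
        'F': ms_between / ms_within,
        'df_between': k - 1,
        'df_within': N - k,
        'var_systematic': var_sys,
        'var_random': ms_within,
        'icc': var_sys / (var_sys + ms_within),
    }

# Deterministic demo: 4 bins x 25 repeats with identical zero-mean scatter per bin
scatter = np.linspace(-10, 10, 25)
offsets = [-15.0, -5.0, 5.0, 15.0]   # systematic error per bin (nm)
demo = pd.DataFrame({
    'X_bin': np.repeat(np.arange(4), 25),
    'Y_bin': 0,
    'SkewNM.X': np.concatenate([o + scatter for o in offsets]),
})
with_sys = variance_decomposition(demo, 'SkewNM.X')
# Same scatter without any per-bin offset: no systematic component
no_sys = variance_decomposition(demo.assign(**{'SkewNM.X': np.tile(scatter, 4)}), 'SkewNM.X')
```

Unlike the peak-to-peak range, these variance estimates do not grow with the number of passes per bin. If SciPy is available, `scipy.stats.f.sf(F, df_between, df_within)` turns the F statistic into a p-value.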
Any suggestions?