I want to implement inference of an ONNX model in my own C code, but in some layers the results differ by 1 between my C code and ONNX Runtime (for example, my C code gives 40 but onnxruntime gives 41).
I want to know why numpy's result is -87 but onnxruntime's is -88.
In quantized model inference, an error of 1 is fatal! The error accumulated through many layers can reach 4-5 (in 8-bit integers).
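To restate the math I expect in one place: dequantize each input, add in float, then requantize to int8. This is only my reference computation (the same one my C code follows), not necessarily what onnxruntime does internally:

```python
import numpy as np

def qlinear_add_ref(a_q, b_q, a_s, a_zp, b_s, b_zp, c_s, c_zp):
    # Dequantize both int8 inputs to float, add, then requantize.
    real = a_s * (int(a_q) - a_zp) + b_s * (int(b_q) - b_zp)
    q = np.round(real / c_s) + c_zp       # np.round rounds halves to even
    return int(np.clip(q, -128, 127))     # saturate to the int8 range

# The values from the question:
print(qlinear_add_ref(-8, -64,
                      0.008010663092136383, 7,
                      0.00622713053599, -128,
                      0.006873490754514933, -128))  # -> -87
```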
Thank u :>
The test code is below (onnxruntime==1.19.2, python==3.8):
import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np
import onnxruntime as ort
A = 'A'
B = 'B'
C = 'C'
A_scale = 0.008010663092136383
A_zero_point = 7
B_scale = 0.00622713053599
B_zero_point = -128
C_scale = 0.006873490754514933
C_zero_point = -128
input_A = helper.make_tensor_value_info(A, TensorProto.INT8, [1, 1, 1, 1])
input_B = helper.make_tensor_value_info(B, TensorProto.INT8, [1, 1, 1, 1])
output = helper.make_tensor_value_info(C, TensorProto.INT8, [1, 1, 1, 1])
initializer_A_scale = numpy_helper.from_array(np.array(A_scale, dtype=np.float32), name='A_scale')
initializer_A_zero_point = numpy_helper.from_array(np.array(A_zero_point, dtype=np.int8), name='A_zero_point')
initializer_B_scale = numpy_helper.from_array(np.array(B_scale, dtype=np.float32), name='B_scale')
initializer_B_zero_point = numpy_helper.from_array(np.array(B_zero_point, dtype=np.int8), name='B_zero_point')
initializer_C_scale = numpy_helper.from_array(np.array(C_scale, dtype=np.float32), name='C_scale')
initializer_C_zero_point = numpy_helper.from_array(np.array(C_zero_point, dtype=np.int8), name='C_zero_point')
qlinear_add_node = helper.make_node(
    'QLinearAdd',
    inputs=[A, 'A_scale', 'A_zero_point', B, 'B_scale', 'B_zero_point', 'C_scale', 'C_zero_point'],
    outputs=[C],
    name='QLinearAdd',
    domain='com.microsoft'
)
opset_version_ai_onnx = 13
opset_version_com_microsoft = 1
graph = helper.make_graph(
    nodes=[qlinear_add_node],
    name='QLinearAdd_Graph',
    inputs=[input_A, input_B],
    outputs=[output],
    initializer=[
        initializer_A_scale,
        initializer_A_zero_point,
        initializer_B_scale,
        initializer_B_zero_point,
        initializer_C_scale,
        initializer_C_zero_point
    ]
)
model = helper.make_model(
    graph,
    producer_name='onnx-qlinearadd-fixed-params',
    opset_imports=[
        helper.make_opsetid(domain='ai.onnx', version=opset_version_ai_onnx),
        helper.make_opsetid(domain='com.microsoft', version=opset_version_com_microsoft)
    ]
)
onnx.save(model, 'qlinearadd_fixed_params_model.onnx')
print("ONNX MODEL save 'qlinearadd_fixed_params_model.onnx'")
A_int8 = np.array([-8], dtype=np.int8)
B_int8 = np.array([-64], dtype=np.int8)
A_real = A_scale * (A_int8.astype(np.int32) - A_zero_point)
B_real = B_scale * (B_int8.astype(np.int32) - B_zero_point)
C_real = A_real + B_real
A1 = A_scale * (A_int8 - A_zero_point)
B1 = B_scale * (B_int8 - B_zero_point)
print((A1 + B1) / C_scale + C_zero_point)
C_int32 = np.round(C_real / C_scale) + C_zero_point
C_int8 = C_int32.astype(np.int8)
print(C_int8)
session = ort.InferenceSession('qlinearadd_fixed_params_model.onnx')
output_name = session.get_outputs()[0].name
A_data = np.array([-8], dtype=np.int8).reshape([1, 1, 1, 1])
B_data = np.array([-64], dtype=np.int8).reshape([1, 1, 1, 1])
input_dict = {
    'A': A_data,
    'B': B_data
}
outputs = session.run([output_name], input_dict)
C_output = outputs[0]
print("output C:", C_output)
The test code's output is:
ONNX MODEL save 'qlinearadd_fixed_params_model.onnx'
[-87.49999529]
[-87]
output C: [[[[-88]]]]
But I think -87 is the result I should get.
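One thing I noticed: the requantized value sits extremely close to a .5 tie (40.50000471 before adding the zero point of -128), so the result is sensitive to both float precision and the rounding convention. Half-to-even and half-up disagree exactly at the tie. This only illustrates the two conventions; I have not confirmed which one onnxruntime's kernel uses:

```python
import numpy as np

x = 40.5  # exactly at the rounding tie
print(np.round(x))        # half-to-even (banker's rounding) -> 40.0
print(np.floor(x + 0.5))  # half-up for positive values      -> 41.0
```

So if onnxruntime computes something at or just below 40.5 in lower precision, rounding down to 40 and adding the zero point gives exactly the -88 I am seeing.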