How can I optimize the performance of this NumPy function?

Is there any way to optimize the speed of this function?