Wrapping a device-memory pointer returned from a pybind11 module in a PyTorch Tensor
I have a C++ library that runs kernels on GPUs (it uses Kokkos). I would like to expose it to Python and couple it with PyTorch, so I am using pybind11.
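To make the goal concrete, here is a minimal sketch of one route I am considering. It rests on several assumptions: Kokkos is built with the CUDA backend, the binding hands back the raw device address as a Python integer, and the data is 1-D or laid out row-major (Kokkos defaults to LayoutLeft on CUDA, so a multi-dimensional view would need explicit strides). `DeviceBuffer` and `as_torch_tensor` are hypothetical names, not part of pybind11, Kokkos, or PyTorch. The idea is to expose the pointer through the CUDA Array Interface (`__cuda_array_interface__`), which `torch.as_tensor` can consume without copying.

```python
class DeviceBuffer:
    """Hypothetical stand-in for the object a pybind11 module might return.

    It only records a raw device address, a shape, and a dtype string;
    it does not own or free the memory.
    """

    def __init__(self, ptr, shape, typestr="<f8"):
        self.ptr = int(ptr)        # device address as a plain integer
        self.shape = tuple(shape)  # extents of the array
        self.typestr = typestr     # NumPy-style type string, "<f8" = float64

    @property
    def __cuda_array_interface__(self):
        # Dictionary defined by the CUDA Array Interface.
        return {
            "shape": self.shape,
            "typestr": self.typestr,
            "data": (self.ptr, False),  # (address, read_only flag)
            "strides": None,            # None means C-contiguous (row-major)
            "version": 2,
        }


def as_torch_tensor(buf):
    """Wrap the device memory in a tensor without copying (needs a CUDA build of PyTorch)."""
    import torch  # imported lazily so the class above works without PyTorch

    return torch.as_tensor(buf, device="cuda")
```

With a real device pointer, `as_torch_tensor(DeviceBuffer(ptr, (n,)))` should give a tensor that aliases the library's memory, so the underlying allocation has to stay alive for as long as the tensor is in use.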