Does CUDA unified memory solve data-movement issues on newer GPUs? How does it handle data movement?