Thiết kế website giá rẻ

Question

We have a thrust/equal algorithm test code likes(Just curious):

thrust::host_vector<T>   h_data1 = unittest::random_samples<T>(n);
thrust::host_vector<T>   h_data2 = unittest::random_samples<T>(n);
thrust::device_vector<T> d_data1 = h_data1;
thrust::device_vector<T> d_data2 = h_data2;

//empty ranges
ASSERT_EQUAL(thrust::equal(h_data1.begin(), h_data1.begin(), h_data1.begin()), true);
ASSERT_EQUAL(thrust::equal(d_data1.begin(), d_data1.begin(), d_data1.begin()), true);
    
//symmetric cases
ASSERT_EQUAL(thrust::equal(h_data1.begin(), h_data1.end(), h_data1.begin()), true);
ASSERT_EQUAL(thrust::equal(d_data1.begin(), d_data1.end(), d_data1.begin()), true);

And I wonder that how to implement the thrust/equal algorithm, especially the implementation path from host ot NVIDIA GPU.

When I using Debug tool to single debug the code, and got the complete call backtrace as follows:

#0  thrust::cuda_cub::equal<thrust::cuda_cub::tag, thrust::detail::normal_iterator<thrust::device_ptr<signed char> >, thrust::detail::normal_iterator<thrust::device_ptr<signed char> >, thrust::equal_to<signed char> > (
    policy=..., first1=..., last1=..., first2=...) at /home/DengWW/cuda-11.8/include/thrust/system/cuda/detail/equal.h:47
#1  thrust::cuda_cub::equal<thrust::cuda_cub::tag, thrust::detail::normal_iterator<thrust::device_ptr<signed char> >, thrust::detail::normal_iterator<thrust::device_ptr<signed char> > > (policy=..., first1=..., 
    last1=..., first2=...) at /home/DengWW/cuda-11.8/include/thrust/system/cuda/detail/equal.h:58
#2  thrust::equal<thrust::cuda_cub::tag, thrust::detail::normal_iterator<thrust::device_ptr<signed char> >, thrust::detail::normal_iterator<thrust::device_ptr<signed char> > > (system=..., first1=..., last1=..., 
    first2=...) at /home/DengWW/cuda-11.8/include/thrust/detail/equal.inl:37
#3  0x0000555555574bed in thrust::equal<thrust::detail::normal_iterator<thrust::device_ptr<signed char> >, thrust::detail::normal_iterator<thrust::device_ptr<signed char> > > (first1=..., last1=..., first2=...)
    at /home/DengWW/cuda-11.8/include/thrust/detail/equal.inl:63
#4  0x000055555556d4d7 in TestEqual<signed char> (n=0) at /home/UMD/HYCUDATestSuits/thrust/tests/equal.cu:42
#5  0x000055555556d1d3 in TestEqualUnitTest::run (this=0x555555deb4c0 <TestEqualInstance>) at /home/UMD/HYCUDATestSuits/thrust/tests/equal.cu:64
#6  0x000055555562c169 in UnitTestDriver::run_tests (this=this@entry=0x555555deb638 <driver_instance(thrust::cuda_cub::tag)::s_instance>, tests_to_run=..., kwargs=...)
    at /home/UMD/HYCUDATestSuits/thrust/unittest/testframework.cu:303
#7  0x000055555562d104 in UnitTestDriver::run_tests (this=0x555555deb638 <driver_instance(thrust::cuda_cub::tag)::s_instance>, args=..., kwargs=...) at /home/UMD/HYCUDATestSuits/thrust/unittest/testframework.cu:409
#8  0x0000555555631fde in CUDATestDriver::run_tests (this=0x555555deb638 <driver_instance(thrust::cuda_cub::tag)::s_instance>, args=..., kwargs=...) at /home/UMD/HYCUDATestSuits/thrust/unittest/cuda/testframework.cu:191
#9  0x000055555562d6a0 in main (argc=<optimized out>, argv=0x7fffffffe178) at /home/UMD/HYCUDATestSuits/thrust/unittest/testframework.cu:514

From the backtrace, we can see the complete call stack. But my question is how the #2 call stack point stack into #1? The complete implementation of #2 function is:

#include <thrust/detail/config.h>
#include <thrust/equal.h>
#include <thrust/iterator/iterator_traits.h>
#include <thrust/system/detail/generic/select_system.h>
#include <thrust/system/detail/generic/equal.h>
#include <thrust/system/detail/adl/equal.h>

THRUST_NAMESPACE_BEGIN

__thrust_exec_check_disable__
template<typename System, typename InputIterator1, typename InputIterator2>
__host__ __device__
bool equal(const thrust::detail::execution_policy_base<System> &system, InputIterator1 first1, InputIterator1 last1, InputIterator2 first2)
{
  using thrust::system::detail::generic::equal;
  return equal(thrust::detail::derived_cast(thrust::detail::strip_const(system)), first1, last1, first2);
} // end equal()

THRUST_NAMESPACE_END

The #1 function located in cuda_cub namespace, and #2 function does not using any using namespace thrust::cuda_cub instruction to tell compiler. From using thrust::system::detail::generic::equal; instruction, we can see the #1 functions is implemented in thrust/system/detail/generic/equal.h header, not the thrust/system/cuda/detail/equal.h.

So, I have the following questions:

whether the backtrace is correct?
if the backtrace is correct, how #2 call stack into #1 by not using namespace thrust::cuda_cub instruction?
what the complete implementation path from Host API to NVIDIA Device GPU, such as the reduce, scan, sort algorithms?

Thiết kế website giá rẻ

Danh mục

What’s the complete call stack of equal algorithm in NVIDIA/thrust?