We have a thrust/equal
algorithm test code likes(Just curious):
thrust::host_vector<T> h_data1 = unittest::random_samples<T>(n);
thrust::host_vector<T> h_data2 = unittest::random_samples<T>(n);
thrust::device_vector<T> d_data1 = h_data1;
thrust::device_vector<T> d_data2 = h_data2;
//empty ranges
ASSERT_EQUAL(thrust::equal(h_data1.begin(), h_data1.begin(), h_data1.begin()), true);
ASSERT_EQUAL(thrust::equal(d_data1.begin(), d_data1.begin(), d_data1.begin()), true);
//symmetric cases
ASSERT_EQUAL(thrust::equal(h_data1.begin(), h_data1.end(), h_data1.begin()), true);
ASSERT_EQUAL(thrust::equal(d_data1.begin(), d_data1.end(), d_data1.begin()), true);
And I wonder that how to implement the thrust/equal
algorithm, especially the implementation path from host ot NVIDIA GPU.
When I using Debug tool to single debug the code, and got the complete call backtrace as follows:
#0 thrust::cuda_cub::equal<thrust::cuda_cub::tag, thrust::detail::normal_iterator<thrust::device_ptr<signed char> >, thrust::detail::normal_iterator<thrust::device_ptr<signed char> >, thrust::equal_to<signed char> > (
policy=..., first1=..., last1=..., first2=...) at /home/DengWW/cuda-11.8/include/thrust/system/cuda/detail/equal.h:47
#1 thrust::cuda_cub::equal<thrust::cuda_cub::tag, thrust::detail::normal_iterator<thrust::device_ptr<signed char> >, thrust::detail::normal_iterator<thrust::device_ptr<signed char> > > (policy=..., first1=...,
last1=..., first2=...) at /home/DengWW/cuda-11.8/include/thrust/system/cuda/detail/equal.h:58
#2 thrust::equal<thrust::cuda_cub::tag, thrust::detail::normal_iterator<thrust::device_ptr<signed char> >, thrust::detail::normal_iterator<thrust::device_ptr<signed char> > > (system=..., first1=..., last1=...,
first2=...) at /home/DengWW/cuda-11.8/include/thrust/detail/equal.inl:37
#3 0x0000555555574bed in thrust::equal<thrust::detail::normal_iterator<thrust::device_ptr<signed char> >, thrust::detail::normal_iterator<thrust::device_ptr<signed char> > > (first1=..., last1=..., first2=...)
at /home/DengWW/cuda-11.8/include/thrust/detail/equal.inl:63
#4 0x000055555556d4d7 in TestEqual<signed char> (n=0) at /home/UMD/HYCUDATestSuits/thrust/tests/equal.cu:42
#5 0x000055555556d1d3 in TestEqualUnitTest::run (this=0x555555deb4c0 <TestEqualInstance>) at /home/UMD/HYCUDATestSuits/thrust/tests/equal.cu:64
#6 0x000055555562c169 in UnitTestDriver::run_tests (this=this@entry=0x555555deb638 <driver_instance(thrust::cuda_cub::tag)::s_instance>, tests_to_run=..., kwargs=...)
at /home/UMD/HYCUDATestSuits/thrust/unittest/testframework.cu:303
#7 0x000055555562d104 in UnitTestDriver::run_tests (this=0x555555deb638 <driver_instance(thrust::cuda_cub::tag)::s_instance>, args=..., kwargs=...) at /home/UMD/HYCUDATestSuits/thrust/unittest/testframework.cu:409
#8 0x0000555555631fde in CUDATestDriver::run_tests (this=0x555555deb638 <driver_instance(thrust::cuda_cub::tag)::s_instance>, args=..., kwargs=...) at /home/UMD/HYCUDATestSuits/thrust/unittest/cuda/testframework.cu:191
#9 0x000055555562d6a0 in main (argc=<optimized out>, argv=0x7fffffffe178) at /home/UMD/HYCUDATestSuits/thrust/unittest/testframework.cu:514
From the backtrace, we can see the complete call stack. But my question is how the #2
call stack point stack into #1
? The complete implementation of #2
function is:
#include <thrust/detail/config.h>
#include <thrust/equal.h>
#include <thrust/iterator/iterator_traits.h>
#include <thrust/system/detail/generic/select_system.h>
#include <thrust/system/detail/generic/equal.h>
#include <thrust/system/detail/adl/equal.h>
THRUST_NAMESPACE_BEGIN
__thrust_exec_check_disable__
template<typename System, typename InputIterator1, typename InputIterator2>
__host__ __device__
bool equal(const thrust::detail::execution_policy_base<System> &system, InputIterator1 first1, InputIterator1 last1, InputIterator2 first2)
{
using thrust::system::detail::generic::equal;
return equal(thrust::detail::derived_cast(thrust::detail::strip_const(system)), first1, last1, first2);
} // end equal()
THRUST_NAMESPACE_END
The #1
function located in cuda_cub
namespace, and #2
function does not using any using namespace thrust::cuda_cub
instruction to tell compiler. From using thrust::system::detail::generic::equal;
instruction, we can see the #1
functions is implemented in thrust/system/detail/generic/equal.h
header, not the thrust/system/cuda/detail/equal.h
.
So, I have the following questions:
- whether the backtrace is correct?
- if the backtrace is correct, how
#2
call stack into#1
by notusing namespace thrust::cuda_cub
instruction? - what the complete implementation path from Host API to NVIDIA Device GPU, such as the reduce, scan, sort algorithms?