I have a nested data structure that is stored on both the host and the device, and I would like to copy one inner field from host to device. Assume all the allocations have been done correctly. To do the copy, I need the device-side address of the innermost member. I obtain it in a kernel launch, which stores it in a __device__ variable; I then retrieve that variable through cudaGetSymbolAddress into a dummy host pointer and perform the copy with cudaMemcpy. However, it doesn’t seem to work. The following is a snippet of the code:
__device__ int* d2;

__global__ void getLIO(EdgeSentinel** d_vd, unsigned int c)
{
    d2 = &(d_vd[c]->last_insert_block->last_insert_offset);
    printf("d2 contains = %p\n", d2);
}

int main()
{
    // Assume the rest of the allocations of the structure d_vd (on device) and vd (on host) are done right.
    // Roughly, structure vd contains a field called last_insert_block, which is a pointer to another
    // structure that contains a field called last_insert_offset, which is an int.
    unsigned int c, k;
    // Assume c and k are initialized to some values.

    getLIO<<<1,1>>>(d_vd, c);
    cudaDeviceSynchronize();

    int* dummy;
    cudaGetSymbolAddress((void**)&dummy, d2);
    printf("Dummy contains = %p\n", dummy);

    cudaMemcpy(dummy, &(vd[k]->last_insert_block->last_insert_offset), sizeof(int), cudaMemcpyHostToDevice);
}
In the above snippet, when I print the addresses for debugging, I notice that the address stored in d2 (printed from the kernel) is not the same as the value in dummy (printed on the host). So I am not sure what cudaGetSymbolAddress actually does. How does it work?
I am using this as a workaround because I cannot use cudaMemcpy directly on the nested field: that would require dereferencing the device-side structure on the host, which isn’t allowed. That is, the call
cudaMemcpy(&(d_vd[c]->last_insert_block->last_insert_offset), &(vd[k]->last_insert_block->last_insert_offset), sizeof(int), cudaMemcpyHostToDevice);
won’t work, since it dereferences d_vd, which is a device pointer, on the host. So if cudaGetSymbolAddress is not the right tool, what other way is there to achieve this?
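
For reference, here is a minimal sketch of one possible way around this. cudaGetSymbolAddress returns the device address of the symbol d2 itself (the place where the pointer is stored), not the pointer value the kernel wrote into it, which would explain why the two printed values differ. Reading the contents of d2 back with cudaMemcpyFromSymbol should give the address of the inner field, which can then be used as the cudaMemcpy destination. The Block and EdgeSentinel definitions, the readLIO kernel, and the roundTrip helper below are hypothetical stand-ins, since the real definitions are not shown above, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the real structures, which are not shown in the post.
struct Block { int last_insert_offset; };
struct EdgeSentinel { Block* last_insert_block; };

// Device-side variable that will hold the device address of the inner field.
__device__ int* d2;

// Stores the address of the nested field into d2 (same idea as in the post).
__global__ void getLIO(EdgeSentinel** d_vd, unsigned int c)
{
    d2 = &(d_vd[c]->last_insert_block->last_insert_offset);
}

// Hypothetical helper kernel: reads the nested field so the host can verify the copy.
__global__ void readLIO(EdgeSentinel** d_vd, unsigned int c, int* out)
{
    *out = d_vd[c]->last_insert_block->last_insert_offset;
}

// Writes `value` into the nested device field from the host and returns
// what the device sees in that field afterwards.
int roundTrip(int value)
{
    // Build a one-element nested structure on the device.
    Block* d_block = nullptr;
    cudaMalloc(&d_block, sizeof(Block));
    cudaMemset(d_block, 0, sizeof(Block));

    EdgeSentinel h_sentinel = { d_block };   // host copy holding a device pointer
    EdgeSentinel* d_sentinel = nullptr;
    cudaMalloc(&d_sentinel, sizeof(EdgeSentinel));
    cudaMemcpy(d_sentinel, &h_sentinel, sizeof(EdgeSentinel), cudaMemcpyHostToDevice);

    EdgeSentinel** d_vd = nullptr;
    cudaMalloc(&d_vd, sizeof(EdgeSentinel*));
    cudaMemcpy(d_vd, &d_sentinel, sizeof(EdgeSentinel*), cudaMemcpyHostToDevice);

    // Let the device compute the address of the inner field and store it in d2.
    getLIO<<<1, 1>>>(d_vd, 0);
    cudaDeviceSynchronize();

    // Read the pointer VALUE stored in d2. cudaGetSymbolAddress would instead
    // return the address of d2 itself.
    int* target = nullptr;
    cudaMemcpyFromSymbol(&target, d2, sizeof(int*));

    // target is now a device address usable as a cudaMemcpy destination.
    cudaMemcpy(target, &value, sizeof(int), cudaMemcpyHostToDevice);

    // Read the field back through a kernel to confirm the write landed.
    int* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));
    readLIO<<<1, 1>>>(d_vd, 0, d_out);
    int result = -1;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    cudaFree(d_vd);
    cudaFree(d_sentinel);
    cudaFree(d_block);
    return result;
}

int main()
{
    printf("last_insert_offset on device = %d\n", roundTrip(42));
    return 0;
}
```

An alternative along the same lines, if the __device__ variable is not needed, would be to have the kernel write the address into an ordinary device buffer allocated with cudaMalloc and copy that buffer back with a plain cudaMemcpy.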