Hello World
Our first exercise is to print "Hello World" from the GPU. To do that, we need to do the following:
- Run part or all of the application on the GPU.
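- Write a CUDA function (a kernel) that runs on the device.
  - It must be declared with the function qualifier __global__.
- Call the device function from the main program:
  - C/C++ example: an ordinary function is called directly by its name.
  - CUDA example: cuda_function<<<1, 1>>>(); (just using 1 thread, that is, 1 block of 1 thread).
  - Within the <<< >>> brackets, specify the launch configuration: the number of thread blocks, followed by the number of threads per block.
- Make sure to synchronize the threads.
  - __syncthreads() synchronizes all the threads within a thread block; it is called inside the kernel.
  - cudaDeviceSynchronize() is called on the host and blocks it until the kernel (and any other previously launched device work) has finished.
- Many CUDA API calls are synchronous by default, but kernel launches are asynchronous: control returns to the host immediately after the launch. It is therefore good practice to call an explicit synchronization function to avoid errors in the computation. In this example, without cudaDeviceSynchronize() the program may exit before the GPU's output is printed.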
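As a side-by-side sketch of the steps above, the program below calls an ordinary C/C++ function and then launches a kernel with one block of one thread, followed by an explicit synchronization. The host function name c_function is a hypothetical placeholder used only for comparison; cuda_function matches the kernel name used in this exercise.

```cuda
#include <stdio.h>

// Hypothetical host function, named here only for comparison.
// An ordinary C/C++ function runs on the host (CPU) when called.
void c_function()
{
    printf("Hello World from CPU!\n");
}

// The __global__ qualifier marks this as a kernel: device code launched from the host.
__global__ void cuda_function()
{
    printf("Hello World from GPU!\n");
}

int main()
{
    // C/C++: a plain function call executes immediately on the CPU.
    c_function();

    // CUDA: launch configuration <<<blocks, threads per block>>>; here 1 block of 1 thread.
    cuda_function<<<1, 1>>>();

    // The launch is asynchronous, so wait for the GPU before the program exits;
    // otherwise the device printf output may never be flushed to the terminal.
    cudaDeviceSynchronize();
    return 0;
}
```

The only syntactic difference at the call site is the <<< >>> launch configuration; the synchronization call is what makes the GPU's greeting reliably appear.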
Questions and Solutions
Examples: Hello World
// Hello-world.cu
#include <stdio.h>
// device function will be executed on device (GPU)
__global__ void cuda_function()
{
    printf("Hello World from GPU!\n");
    // synchronize all the threads
    __syncthreads();
}

int main()
{
    // call the kernel function: 1 block of 1 thread
    cuda_function<<<1, 1>>>();
    // synchronize the device kernel call
    cudaDeviceSynchronize();
    return 0;
}
Compilation and Output
Right now, the program prints "Hello World from GPU!" only once. What if you would like to print it several times? How can you do that?
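One possible answer, sketched under the assumption that you want one greeting per GPU thread: launch the kernel with more threads. Every thread in the grid executes the kernel body once, so a launch with B blocks of T threads each prints B * T lines. The values 2 and 4 and the helper total_threads are illustrative choices, not part of the original exercise.

```cuda
#include <stdio.h>

// Hypothetical helper: number of greetings a launch prints (one per thread in the grid).
int total_threads(int blocks, int threads_per_block)
{
    return blocks * threads_per_block;
}

// Every thread in the grid executes this kernel body once.
__global__ void cuda_function()
{
    // blockIdx.x and threadIdx.x identify which thread is printing.
    printf("Hello World from GPU! (block %u, thread %u)\n",
           blockIdx.x, threadIdx.x);
}

int main()
{
    const int blocks = 2;             // thread blocks in the grid (illustrative value)
    const int threads_per_block = 4;  // threads in each block (illustrative value)

    // Launch 2 blocks of 4 threads: the greeting is printed 8 times.
    cuda_function<<<blocks, threads_per_block>>>();

    // Wait for the kernel to finish so all device printf output is flushed.
    cudaDeviceSynchronize();

    printf("Expected %d lines from the GPU\n",
           total_threads(blocks, threads_per_block));
    return 0;
}
```

The GPU lines may appear in any order, because CUDA does not guarantee the order in which threads execute; printing blockIdx.x and threadIdx.x makes it easy to see which thread produced each line.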