Hello World
Our first exercise is to print "Hello World" from the GPU. To do that, we need to do the following:
- Run a part or the entire application on the GPU.
- Define the CUDA function (the kernel) that will run on the device.
  - It must be declared with the function qualifier __global__.
- Call the device function in the main program:
  - C/C++ example, c_function().
  - CUDA example, cuda_function<<<1,1>>>() (just using 1 thread).
  - The <<< >>> brackets hold the launch configuration: the number of thread blocks, followed by the number of threads per block.
- Make sure to synchronize the threads.
  - __syncthreads() synchronizes all the threads within a thread block.
  - cudaDeviceSynchronize() makes the host wait until the previously launched kernel has finished.
  - Many CUDA API calls (for example cudaMemcpy) block the host by default, but a kernel launch returns immediately. It is therefore good to synchronize explicitly before using the results, so that the program does not continue or exit while the kernel is still running.
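The steps above can be sketched as follows. This is a minimal illustration rather than the course's reference solution: c_function and cuda_function are the placeholder names used in the list, and the error checks with cudaGetLastError() and cudaGetErrorString() are an addition that the original text does not require.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Ordinary host function: runs on the CPU.
void c_function()
{
    printf("Hello from the host\n");
}

// Kernel: __global__ makes it callable from the host and executed on the device.
__global__ void cuda_function()
{
    printf("Hello from the device\n");
}

int main()
{
    c_function();               // plain C/C++ call
    cuda_function<<<1, 1>>>();  // kernel launch: 1 thread block, 1 thread per block

    // The launch returns immediately, so first check that it was accepted ...
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... then make the host wait until the kernel has finished.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Without the call to cudaDeviceSynchronize(), the host could reach the end of main() before the device output has been flushed, and the message from the kernel might never appear.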
Questions and Solutions
Examples: Hello World
//-*-C++-*-
// Hello-world.cu
#include <stdio.h>
#include <cuda.h>

// device function will be executed on device (GPU)
__global__ void cuda_function()
{
    printf("Hello World from GPU!\n");

    // synchronize all the threads within the thread block
    __syncthreads();
}

int main()
{
    // call the kernel function with 1 thread block of 1 thread
    cuda_function<<<1,1>>>();

    // wait on the host until the kernel has finished
    cudaDeviceSynchronize();

    return 0;
}
Compilation and Output
Questions
Right now, you are printing just one Hello World from GPU, but what if you would like to print more Hello World from GPU? How can you do that?
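
One possible answer, sketched on the assumption that one line of output per thread is what is wanted: launch more threads through the launch configuration. The file name, the values 2 and 4, and the block and thread indices added to the message are illustrative choices, not part of the original exercise.

```cuda
// Hello-world-many.cu (hypothetical file name)
#include <stdio.h>

// Every thread that is launched executes the kernel body once.
__global__ void cuda_function()
{
    printf("Hello World from GPU! (block %u, thread %u)\n",
           blockIdx.x, threadIdx.x);
}

int main()
{
    // 2 thread blocks x 4 threads per block = 8 threads, so 8 lines are printed
    cuda_function<<<2, 4>>>();

    // wait on the host until the kernel has finished
    cudaDeviceSynchronize();

    return 0;
}
```

Changing either number inside <<< >>> changes how many lines are printed. The order in which the blocks print is not guaranteed, so the lines may appear in a different order from run to run.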