Hello World
Our first exercise is to print Hello World from the GPU. To do that, we need to do the following:
- Run part of the application, or all of it, on the GPU
- Call a CUDA function (a kernel) on the device
  - It must be declared with the function qualifier __global__
- Call the device function from the main program:
  - C/C++ example: c_function()
  - CUDA example: cuda_function<<<1,1>>>() (using just 1 thread)
  - <<< >>> specifies the number of thread blocks and the number of threads per block
- Make sure to synchronize the threads
  - __syncthreads() synchronizes all the threads within a thread block
  - cudaDeviceSynchronize() makes the host wait until the kernel call has finished
- Many CUDA API calls are synchronous (blocking) by default, but a kernel launch returns to the host immediately, so it is often good to synchronize explicitly to avoid errors in the computation
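The two kinds of synchronization listed above can be sketched in a small program where the block-level barrier actually matters. This is only an illustration, not part of the exercise: the file name, the kernel reverse_in_block, the helper reverse_on_gpu and the block size N are all hypothetical names chosen here. Each thread writes one element into shared memory and then reads an element written by a different thread, so __syncthreads() is needed between the two steps; on the host, cudaDeviceSynchronize() waits for the kernel and reports any execution error.

```cuda
// sync-sketch.cu (hypothetical file name, illustration only)
#include <cstdio>
#include <cuda_runtime.h>

#define N 8  // threads in the single block (assumed value for this sketch)

// Each thread stores one element in shared memory, then reads the
// element stored by another thread of the same block.
__global__ void reverse_in_block(int *data)
{
    __shared__ int tile[N];
    int t = threadIdx.x;
    tile[t] = data[t];
    // Block-level barrier: every write to tile is finished past this point.
    __syncthreads();
    data[t] = tile[N - 1 - t];
}

// Hypothetical helper: reverses host_data (length N) in place on the GPU.
cudaError_t reverse_on_gpu(int *host_data)
{
    int *dev = nullptr;
    cudaError_t err = cudaMalloc((void **)&dev, N * sizeof(int));
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host_data, N * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        reverse_in_block<<<1, N>>>(dev);  // returns to the host immediately
        err = cudaGetLastError();         // catches launch errors
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();    // host waits for the kernel to finish
    if (err == cudaSuccess)
        err = cudaMemcpy(host_data, dev, N * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err;
}

int main()
{
    int values[N];
    for (int i = 0; i < N; ++i) values[i] = i;

    cudaError_t err = reverse_on_gpu(values);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < N; ++i) printf("%d ", values[i]);
    printf("\n");
    return 0;
}
```

With a single thread, as in the Hello World launch, the barrier has nothing to wait for; it becomes relevant as soon as threads of a block exchange data through shared memory.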
Questions and Solutions
Examples: Hello World
//-*-C++-*-
// Hello-world.cu
#include <stdio.h>
#include <cuda.h>

// device function will be executed on device (GPU)
__global__ void cuda_function()
{
    printf("Hello World from GPU!\n");

    // synchronize all the threads
    __syncthreads();
}

int main()
{
    // call the kernel function
    cuda_function<<<1,1>>>();

    // synchronize the device kernel call
    cudaDeviceSynchronize();
    return 0;
}
Compilation and Output
Questions
Right now, you are printing just one Hello World from GPU, but what if you would like to print more Hello World from GPU messages? How can you do that?
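One possible answer, given as a minimal sketch rather than the official solution: launch the kernel with more than one thread, since every thread executes the kernel body once. The file name, the kernel hello_many and the helper launch_hello are hypothetical names introduced here, and the launch configuration of 2 blocks with 4 threads each is an arbitrary choice.

```cuda
// hello-many.cu (hypothetical file name, illustration only)
#include <cstdio>
#include <cuda_runtime.h>

// Every thread of every block prints one line with its own indices.
__global__ void hello_many()
{
    printf("Hello World from GPU! (block %u, thread %u)\n",
           blockIdx.x, threadIdx.x);
}

// Hypothetical helper: launches blocks * threads_per_block threads
// and waits on the host until all of them have finished.
cudaError_t launch_hello(int blocks, int threads_per_block)
{
    hello_many<<<blocks, threads_per_block>>>();
    cudaError_t err = cudaGetLastError();  // catches an invalid launch configuration
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // also flushes the device printf buffer
}

int main()
{
    // 2 blocks x 4 threads = 8 lines; the order of the lines is not guaranteed.
    cudaError_t err = launch_hello(2, 4);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Changing either number inside <<< >>> changes how many lines are printed, for example <<<1,8>>> also gives eight lines, all from block 0.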