Hello World
Our first exercise is to print "Hello World" from the GPU. To do that, we need to do the following:
- Run a part or the entire application on the GPU.
- Define the CUDA function (kernel) that runs on the device.
  - It must be declared with the function qualifier __global__.
- Call the device function from the main program:
  - C/C++ example: c_function().
  - CUDA example: cuda_function<<<1,1>>>() (just using 1 thread).
  - The <<< >>> brackets hold the execution configuration: the number of thread blocks, then the number of threads per block.
- Ensure that the threads are synchronized.
  - __syncthreads() synchronizes all the threads within a thread block.
  - cudaDeviceSynchronize() makes the host wait until the kernel call has finished.
  - Many CUDA API calls block the host by default, but kernel launches are asynchronous. Call cudaDeviceSynchronize() explicitly; otherwise the program may exit before the kernel's printf output appears.
Questions and Solutions
Examples: Hello World
//-*-C++-*-
// Hello-world.cu
#include <stdio.h>
#include <cuda.h>

// device function will be executed on device (GPU)
__global__ void cuda_function()
{
    printf("Hello World from GPU!\n");
    // synchronize all the threads in the block
    __syncthreads();
}

int main()
{
    // call the kernel function with 1 block of 1 thread
    cuda_function<<<1,1>>>();
    // wait on the host until the kernel has finished
    cudaDeviceSynchronize();
    return 0;
}
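The example above ignores the return values of the CUDA runtime calls, so a failed launch would pass silently. The following is a minimal sketch of the same program with error checks. The macro name CUDA_CHECK and the file name are hypothetical and not part of the original exercise. cudaGetLastError(), cudaDeviceSynchronize(), and cudaGetErrorString() are CUDA runtime functions.

```cuda
// hello-world-checked.cu (hypothetical file name)
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Kernel: executed on the device (GPU)
__global__ void cuda_function()
{
    printf("Hello World from GPU!\n");
}

int main()
{
    // Launch 1 block containing 1 thread
    cuda_function<<<1, 1>>>();

    // Report launch problems such as an invalid execution configuration
    CUDA_CHECK(cudaGetLastError());

    // Wait for the kernel to finish; this also reports errors raised while it ran
    CUDA_CHECK(cudaDeviceSynchronize());

    return 0;
}
```

Checking both calls separates launch errors from errors that occur while the kernel is running.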
Compilation and Output
Questions
Right now, you are printing just one "Hello World" from the GPU. How could you print it several times from the GPU?
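One possible answer, as a sketch rather than the official solution: launch more threads by changing the execution configuration, since every thread runs the kernel body once. The kernel name hello_many, the file name, and the choice of 2 blocks of 4 threads are assumptions made for illustration.

```cuda
// hello-many.cu (hypothetical file name)
#include <cstdio>
#include <cuda_runtime.h>

// Every launched thread executes this body once and prints one line.
__global__ void hello_many()
{
    printf("Hello World from GPU! (block %u, thread %u)\n",
           blockIdx.x, threadIdx.x);
}

int main()
{
    // Assumed configuration: 2 blocks x 4 threads per block = 8 threads
    const int blocks = 2;
    const int threads_per_block = 4;

    hello_many<<<blocks, threads_per_block>>>();

    // Wait for all threads to finish so their output is flushed
    cudaDeviceSynchronize();

    return 0;
}
```

With this configuration, a working CUDA device should print eight lines. The order of the lines across blocks is not guaranteed, because blocks are scheduled independently.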
Last update: June 28, 2025 20:17:16
Created: March 11, 2023 20:16:27