Hello World

Our first exercise would be to print out "Hello World" from the GPU. To do that, we need to do the following things:

Run a part or the entire application on the GPU.
Call the CUDA function on a device.
It should be called using the function qualifier __global__.
Call the device function in the main program:
C/C++ example, c_function().
CUDA example, cuda_function<<<1,1>>>() (just using 1 thread).
<<< >>>, specify the thread blocks within the brackets.
Ensure that the threads are synchronized.
__syncthreads() synchronizes all the threads within a thread block.
CudaDeviceSynchronize() synchronizes a kernel call on the host.
Most of the CUDA APIs are synchronized calls by default, but sometimes it is good to call explicit synchronized calls to avoid errors in the computation.

Questions and Solutions¶

Examples: Hello World

Serial-versionCUDA-version

//-*-C++-*-
// Hello-world.c

#include<stdio.h>
#include<cuda.h>

void c_function()
{
  printf("Hello World!\n");
}

int main()
{
  c_function();
  return 0;
}

//-*-C++-*-
// Hello-world.cu

#include<stdio.h>
#include<cuda.h>

// device function will be executed on device (GPU) 
__global__ void cuda_function()
{
  printf("Hello World from GPU!\n");

  // synchronize all the threads
  __syncthreads();
}

int main()
{
  // call the kernel function 
  cuda_function<<<1,1>>>();

  // synchronize the device kernel call
  cudaDeviceSynchronize();
  return 0;
}

Compilation and Output

Serial-versionCUDA-version

// compilation
$ gcc Hello-world.c -o Hello-World-CPU

// execution 
$ ./Hello-World-CPU

// output
$ Hello World from CPU!

// compilation
$ nvcc -arch=compute_70 Hello-world.cu -o Hello-World-GPU

// execution
$ ./Hello-World-GPU

// output
$ Hello World from GPU!

Questions

Right now, you are printing just one Hello World from GPU, but what if you would like to print more Hello World from GPU? How can you do that?

QuestionAnswerSolution Output

//-*-C++-*-
#include<stdio.h>
#include<cuda.h>

__global__ void cuda_function()
{
  printf("Hello World from GPU!\n");
  __syncthreads();
}

int main()
{
  // define your thread block here
  cuda_function<<<>>>();
  cudaDeviceSynchronize();
  return 0;
}

//-*-C++-*-
#include<stdio.h>
#include<cuda.h>

__global__ void cuda_function()
{
  printf("Hello World from GPU!\n");
  __syncthreads();
}

int main()
{
  // define your thread block here
  cuda_function<<<10,1>>>();
  cudaDeviceSynchronize();
  return 0;
}

Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!

Last update: June 28, 2025 20:17:16
Created: March 11, 2023 20:16:27