Hello World

Our first exercise is to print "Hello World" from the GPU. To do that, we need to:

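  • Run part or all of the application on the GPU.
  • Write the GPU function (a kernel) with the function qualifier __global__, so that it can be called from the host and executed on the device.
  • Call the kernel from the main program:
    • C/C++ example: c_function()
    • CUDA example: cuda_function<<<1,1>>>() (launching 1 block of 1 thread)
    • The execution configuration inside <<< >>> gives the number of thread blocks and the number of threads per block: <<<blocks, threads_per_block>>>.
  • Make sure to synchronize:
    • __syncthreads() is a barrier for all the threads within one thread block (it is called on the device).
    • cudaDeviceSynchronize() makes the host wait until all previously launched device work, including the kernel, has finished.
    • Kernel launches are asynchronous with respect to the host, so without cudaDeviceSynchronize() the program may exit before the kernel's printf output appears. Many other runtime calls (for example cudaMemcpy()) block by default, but an explicit synchronization call is still a good way to make sure results are ready and to catch errors raised while the kernel ran.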

Questions and Solutions

Examples: Hello World
//-*-C++-*-
// Hello-world.c

#include <stdio.h>

void c_function()
{
  printf("Hello World from CPU!\n");
}

int main()
{
  c_function();
  return 0;
}
//-*-C++-*-
// Hello-world.cu

#include<stdio.h>
#include<cuda.h>

// kernel: declared __global__, launched from the host and executed on the device (GPU)
__global__ void cuda_function()
{
  printf("Hello World from GPU!\n");

  // barrier for all threads in this block (not strictly needed for a single printf)
  __syncthreads();
}

int main()
{
  // launch the kernel with 1 block of 1 thread
  cuda_function<<<1,1>>>();

  // wait for the kernel to finish so that its printf output is flushed
  cudaDeviceSynchronize();
  return 0;
}
Compilation and Output
// compilation
$ gcc Hello-world.c -o Hello-World-CPU

// execution 
$ ./Hello-World-CPU

// output
Hello World from CPU!
// compilation
$ nvcc -arch=compute_70 Hello-world.cu -o Hello-World-GPU

// execution
$ ./Hello-World-GPU

// output
Hello World from GPU!
Questions

Right now, the kernel prints "Hello World from GPU!" only once. What if you would like to print it several times? How can you do that?

//-*-C++-*-
#include<stdio.h>
#include<cuda.h>

__global__ void cuda_function()
{
  printf("Hello World from GPU!\n");
  __syncthreads();
}

int main()
{
  // define your execution configuration <<<blocks, threads>>> here
  cuda_function<<<>>>();
  cudaDeviceSynchronize();
  return 0;
}
Solution
//-*-C++-*-
#include<stdio.h>
#include<cuda.h>

__global__ void cuda_function()
{
  printf("Hello World from GPU!\n");
  __syncthreads();
}

int main()
{
  // 10 blocks of 1 thread each: the kernel body runs 10 times
  cuda_function<<<10,1>>>();
  cudaDeviceSynchronize();
  return 0;
}
Output
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!