The CUDA programming model
CUDA (Compute Unified Device Architecture) is a parallel programming model and API (Application Programming Interface) introduced by NVIDIA and exposed as a set of extensions to C/C++. It lets software developers write general-purpose applications that run on the massively parallel hardware of GPUs.
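GPUs are best suited to data-parallel applications, i.e. the SIMD (Single Instruction, Multiple Data) style, in which many threads apply the same operation to different data elements. CUDA also allows MIMD (Multiple Instruction, Multiple Data) style code, but at reduced efficiency, because threads that are scheduled together and take different branches are executed one branch at a time.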
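As an illustration of the data-parallel style, here is a minimal sketch of an element-wise vector addition. The names vecAdd and countMismatches, the block size of 256 threads, and the test data are assumptions made for this example, not part of CUDA itself; only the cuda* runtime calls and the built-in index variables come from the API.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread computes one output element: the same instructions
// applied to different data.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n) {                                    // guard the last, partial block
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: adds two test vectors on the GPU and returns
// the number of wrong elements, or -1 if a CUDA call fails.
// Assumes 0 < n and n small enough that 3*i is exact in float.
int countMismatches(int n) {
    std::vector<float> a(n), b(n), c(n);
    for (int i = 0; i < n; ++i) {
        a[i] = 1.0f * i;
        b[i] = 2.0f * i;
    }

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    if (cudaMalloc(&dA, bytes) != cudaSuccess ||
        cudaMalloc(&dB, bytes) != cudaSuccess ||
        cudaMalloc(&dC, bytes) != cudaSuccess) {
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return -1;
    }
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // assumed block size
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    cudaError_t err = cudaGetLastError();      // catches launch failures

    if (err == cudaSuccess) {
        // This copy waits for the kernel to finish before reading dC.
        err = cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (c[i] != 3.0f * i) ++bad;
    }
    return bad;
}
```

The helper is meant to be called from a program's main and compiled with nvcc. An if/else on threadIdx.x inside the kernel would still be legal, but it is the kind of MIMD-style code that runs at reduced efficiency.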
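Threads in the same block communicate through extremely fast on-chip shared memory. There is no equivalent of MPI_Send(); the closest analogue of MPI_Barrier() is __syncthreads(), which synchronizes only the threads of a single block rather than all threads on the GPU.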
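A minimal sketch of communication through shared memory follows: each block sums its slice of an array, using __syncthreads() as the barrier between steps. The names blockSum, sumOnGpu and kBlockSize are hypothetical, and the block size of 256 (a power of two) is an assumption of this example. Because the barrier covers only one block, the per-block partial sums are combined on the host.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 256;  // assumed threads per block (power of two)

// Each block reduces its slice of `in` to one partial sum in `out`.
// Must be launched with blockDim.x == kBlockSize.
__global__ void blockSum(const int* in, int* out, int n) {
    __shared__ int tile[kBlockSize];  // visible to all threads of this block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    tile[tid] = (i < n) ? in[i] : 0;
    __syncthreads();  // every thread has written its element

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            tile[tid] += tile[tid + stride];
        }
        __syncthreads();  // outside the if, so all threads reach the barrier
    }
    if (tid == 0) out[blockIdx.x] = tile[0];
}

// Hypothetical host helper: sums n ones on the GPU and returns the total,
// or -1 if a CUDA call fails. Assumes n > 0.
long long sumOnGpu(int n) {
    int blocks = (n + kBlockSize - 1) / kBlockSize;
    std::vector<int> in(n, 1), partial(blocks, 0);

    size_t inBytes = static_cast<size_t>(n) * sizeof(int);
    size_t outBytes = static_cast<size_t>(blocks) * sizeof(int);
    int *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, inBytes) != cudaSuccess ||
        cudaMalloc(&dOut, outBytes) != cudaSuccess) {
        cudaFree(dIn); cudaFree(dOut);
        return -1;
    }
    cudaMemcpy(dIn, in.data(), inBytes, cudaMemcpyHostToDevice);

    blockSum<<<blocks, kBlockSize>>>(dIn, dOut, n);
    cudaError_t err = cudaGetLastError();  // catches launch failures
    if (err == cudaSuccess) {
        err = cudaMemcpy(partial.data(), dOut, outBytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dIn); cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    // Blocks cannot synchronize with __syncthreads(), so the per-block
    // results are combined here on the host.
    long long total = 0;
    for (int p : partial) total += p;
    return total;
}
```

Note that each __syncthreads() call sits outside the conditional: every thread of the block has to reach the barrier, otherwise the behaviour is undefined.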