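int idx = threadIdx.x;
__shared__ int array[128];
array[idx] = threadIdx.x;
// Data race: no barrier separates the write above from the neighbor read and overwrite below.
if (idx < 127) array[idx] = array[idx + 1];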
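A race-free version of the snippet above might look like the following sketch. It assumes a single block of 128 threads; the kernel name `shift_left`, the output buffer `out`, and the host helper `run_shift_left` are hypothetical names added for illustration and are not part of the notes.

```cuda
#include <cuda_runtime.h>

#define SHIFT_N 128  // block size, matching the 128-element array in the notes

// Each thread stores its index, then every element takes its right neighbor's value.
__global__ void shift_left(int *out)
{
    int idx = threadIdx.x;
    __shared__ int array[SHIFT_N];

    array[idx] = threadIdx.x;
    __syncthreads();            // all initial writes finish before any neighbor is read

    int temp = 0;
    if (idx < SHIFT_N - 1)
        temp = array[idx + 1];  // read the neighbor into a per-thread register
    __syncthreads();            // all reads finish before any element is overwritten

    if (idx < SHIFT_N - 1)
        array[idx] = temp;
    __syncthreads();            // all updates are visible before the array is read again

    out[idx] = array[idx];
}

// Hypothetical host helper: launches one block and copies the result to host_out.
cudaError_t run_shift_left(int *host_out)
{
    int *dev_out = nullptr;
    cudaError_t err = cudaMalloc(&dev_out, SHIFT_N * sizeof(int));
    if (err != cudaSuccess) return err;

    shift_left<<<1, SHIFT_N>>>(dev_out);
    err = cudaMemcpy(host_out, dev_out, SHIFT_N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_out);
    return err;
}
```

The barriers sit outside the `if` so that every thread in the block reaches each `__syncthreads()` call. Splitting the update into a read into `temp` and a later write is what lets a barrier separate the two.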
thread, thread block
CUDA
a hierarchy of
-computation
-memory spaces
-synchronization
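The three parts of the hierarchy can be seen together in one small kernel. This is a sketch only: the kernel `reverse_tiles`, the helper `run_reverse_tiles`, and the tile size `TILE` are hypothetical, and the helper assumes `n` is a multiple of `TILE`.

```cuda
#include <cuda_runtime.h>

#define TILE 64  // threads per block (assumed value for illustration)

// Each block reverses its own TILE-element slice of the input.
__global__ void reverse_tiles(const int *in, int *out)
{
    __shared__ int tile[TILE];                    // shared memory: one copy per thread block
    int local = threadIdx.x;                      // local variable: private to this thread
    int global = blockIdx.x * blockDim.x + local; // position in the whole grid

    tile[local] = in[global];                     // global memory: visible to all threads
    __syncthreads();                              // barrier: synchronizes threads within a block

    out[global] = tile[TILE - 1 - local];
}

// Hypothetical host helper; assumes n is a positive multiple of TILE.
cudaError_t run_reverse_tiles(const int *host_in, int *host_out, int n)
{
    int *dev_in = nullptr, *dev_out = nullptr;
    size_t bytes = n * sizeof(int);

    cudaError_t err = cudaMalloc(&dev_in, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&dev_out, bytes);
    if (err != cudaSuccess) { cudaFree(dev_in); return err; }

    err = cudaMemcpy(dev_in, host_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        reverse_tiles<<<n / TILE, TILE>>>(dev_in, dev_out);
        err = cudaMemcpy(host_out, dev_out, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_in);
    cudaFree(dev_out);
    return err;
}
```

Threads are grouped into blocks and blocks into a grid; registers, shared memory, and global memory mirror that grouping; and `__syncthreads()` only synchronizes the threads of one block.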
Writing Efficient Programs
High-level strategy
1. Maximize arithmetic intensity: math / memory (do more math per memory access)
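As a rough illustration of the math/memory ratio, the sketch below reads each input from global memory once into a register and reuses it for several arithmetic operations, so one load and one store pay for six floating-point operations. The polynomial, the kernel name `poly_eval`, and the helper `run_poly_eval` are assumptions chosen for the example.

```cuda
#include <cuda_runtime.h>

// Evaluates x^3 + 2x^2 + 3x + 4 per element using Horner's rule.
__global__ void poly_eval(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];                             // one global read, kept in a register
        float y = ((x + 2.0f) * x + 3.0f) * x + 4.0f; // six math ops, no further memory traffic
        out[i] = y;                                  // one global write
    }
}

// Hypothetical host helper: copies data to the device, runs the kernel, copies back.
cudaError_t run_poly_eval(const float *host_in, float *host_out, int n)
{
    float *dev_in = nullptr, *dev_out = nullptr;
    size_t bytes = n * sizeof(float);

    cudaError_t err = cudaMalloc(&dev_in, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&dev_out, bytes);
    if (err != cudaSuccess) { cudaFree(dev_in); return err; }

    err = cudaMemcpy(dev_in, host_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;    // round up to cover all n elements
        poly_eval<<<blocks, threads>>>(dev_in, dev_out, n);
        err = cudaMemcpy(host_out, dev_out, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_in);
    cudaFree(dev_out);
    return err;
}
```

Writing the same polynomial with `in[i]` repeated in every term would express more global reads in the source; a compiler may well remove them, but keeping the value in a local variable makes the single access explicit.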