GPU measurements

mar2015

This section has timing measurements made with NVIDIA GPUs. The host machine and GPUs we have (as of mar2015) are:

Wombat CPU (2 x 8-core Intel 2670 machine, 128 GB memory)

Tesla K20c: 4.8 GB memory, 706 MHz clock, 2496 CUDA cores (13 multiprocessors x 192 cores/MP)
GeForce GTX 780: 3.071 GB memory, 902 MHz clock, 2304 CUDA cores (12 multiprocessors x 192 cores/MP)

FFT times on GPU:

13mar15: fftfiltergpu program

The fftfiltergpu program will:

Read complex float data from standard input
Move the data to the GPU
Compute the FFT
Move the data back to the host
Write the transformed data to standard output

I ran this on 12mar15 with the following command:

fftfiltergpu --fftlen=fftlen --numfft=numfft < /dev/zero > /dev/null

I recorded the times for copying from host to GPU, computing the FFT on the GPU, and copying from GPU back to host. The times to read from standard input and write to standard output are not reported.
The results are shown in the table below:

FFT length	host-to-GPU time (ms)	FFT time (ms)	GPU-to-host time (ms)
1 Mpoint (2^20)	3.065	0.450	2.520
16 Mpoint (2^24)	44.984	6.102	40.011
128 Mpoint (2^27)	178.319	63.997	159.998
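
As a rough cross-check, the table can be converted into an effective copy bandwidth and a nominal FFT rate. The sketch below is not part of fftfiltergpu and its function names are illustrative. It assumes one transform per measurement (numfft=1, which the table does not state), 8 bytes per complex float sample, and the conventional 5*N*log2(N) flop count for a complex FFT.

```python
import math

# Assumptions (not stated in the notes): numfft = 1 per measurement,
# 8 bytes per complex float sample (two 32-bit floats).
BYTES_PER_SAMPLE = 8

# (log2 of FFT length, host-to-GPU ms, FFT ms, GPU-to-host ms), copied from the table
MEASUREMENTS = [
    (20, 3.065, 0.450, 2.520),
    (24, 44.984, 6.102, 40.011),
    (27, 178.319, 63.997, 159.998),
]


def transfer_gbps(log2_len, time_ms):
    """Effective copy bandwidth in GB/s (10^9 bytes per second)."""
    nbytes = (1 << log2_len) * BYTES_PER_SAMPLE
    return nbytes / (time_ms * 1e-3) / 1e9


def fft_gflops(log2_len, time_ms):
    """Nominal FFT rate in GFLOP/s using the 5*N*log2(N) convention."""
    n = 1 << log2_len
    return 5.0 * n * math.log2(n) / (time_ms * 1e-3) / 1e9


def fft_fraction(h2d_ms, fft_ms, d2h_ms):
    """Fraction of the three measured times spent in the FFT itself."""
    return fft_ms / (h2d_ms + fft_ms + d2h_ms)


if __name__ == "__main__":
    for log2_len, h2d, fft, d2h in MEASUREMENTS:
        print(
            f"2^{log2_len}: "
            f"h2d {transfer_gbps(log2_len, h2d):.2f} GB/s, "
            f"d2h {transfer_gbps(log2_len, d2h):.2f} GB/s, "
            f"fft {fft_gflops(log2_len, fft):.0f} GFLOP/s, "
            f"fft share {100 * fft_fraction(h2d, fft, d2h):.1f}%"
        )
```

The bandwidth and GFLOP/s figures depend on the numfft=1 assumption. The FFT share does not: in every row the FFT accounts for roughly 7% to 16% of the three measured times, so the copies dominate.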

I did not overlap any of the operations. The FFT was done in place.
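
Because nothing was overlapped, the cost of one batch is simply the sum of the three measured times. As a back-of-the-envelope estimate of what overlapping might buy when many batches are streamed, the sketch below compares that sum with an idealized steady-state bound. This is a model, not a measurement: it assumes perfect overlap, copy and FFT times unchanged from the non-overlapped case, and enough separate device buffers to keep several batches in flight. The number of copy engines is a parameter because it differs between GPUs, and the function names are illustrative.

```python
def serial_ms(h2d_ms, fft_ms, d2h_ms):
    """Time per batch with no overlap: the three steps run back to back."""
    return h2d_ms + fft_ms + d2h_ms


def overlapped_bound_ms(h2d_ms, fft_ms, d2h_ms, copy_engines=2):
    """Idealized steady-state time per batch with perfect overlap.

    With two copy engines, uploads, FFTs and downloads of different
    batches can all run concurrently, so the slowest stage sets the pace.
    With one copy engine, the two copy directions share the engine and
    only the FFT can hide behind them.
    """
    if copy_engines >= 2:
        return max(h2d_ms, fft_ms, d2h_ms)
    return max(h2d_ms + d2h_ms, fft_ms)


if __name__ == "__main__":
    # 2^20-point row of the table: host-to-GPU, FFT, GPU-to-host (ms)
    row = (3.065, 0.450, 2.520)
    print("serial:", serial_ms(*row))
    print("bound, 2 copy engines:", overlapped_bound_ms(*row, copy_engines=2))
    print("bound, 1 copy engine:", overlapped_bound_ms(*row, copy_engines=1))
```

In this model the bound is set by the copies, since the FFT is the shortest of the three stages in every measured row.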

processing: svn/aosoft/src/clp/gpu/cuda/fftfiltergpu.cu
