GPU measurements
mar2015
This section has timing measurements made
with NVIDIA GPUs. The hardware we have is (mar2015):
- Wombat cpu (host): 2 x 8-core Intel 2670 machine, 128 GB memory
- Tesla K20c: 4.8 GB memory, 706 MHz clock, 2496 CUDA cores
(13 multiprocessors x 192 cores/MP)
- GeForce GTX 780: 3.071 GB memory, 902 MHz clock, 2304
cores (12 multiprocessors x 192 cores/MP)
FFT times on GPU:
13mar15: fftfiltergpu program
The fftfiltergpu program will:
- Read complex float data from standard input
- Move the data to the GPU
- Compute the FFT
- Move the data back to the host
- Write the transformed data to standard output
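The input format for the steps above can be sketched as follows. This is a minimal sketch, not code from fftfiltergpu: it assumes that "complex float" means interleaved 32-bit real and imaginary parts (8 bytes per point), which these notes do not state explicitly. The names input_bytes and pack_complex are hypothetical helpers for illustration.

```python
import struct

# Assumption: one complex float point = float32 real + float32 imag = 8 bytes.
BYTES_PER_POINT = 8


def input_bytes(fftlen, numfft):
    """Bytes the program would need to read for numfft transforms of fftlen points."""
    return fftlen * numfft * BYTES_PER_POINT


def pack_complex(samples):
    """Pack complex numbers as interleaved little-endian float32 (re, im) pairs."""
    out = bytearray()
    for z in samples:
        out += struct.pack("<ff", z.real, z.imag)
    return bytes(out)


if __name__ == "__main__":
    # A 1 MegaPnt transform needs 8 MiB of input under this assumption.
    print(input_bytes(2**20, 1))
    # A complex zero packs to eight zero bytes.
    print(pack_complex([0j]))
```

Under this assumption a stream of zero bytes (such as /dev/zero) is a valid input of complex zeros, so it can be used to time the program without preparing a data file.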
I ran this on 12mar15 with the following command:
- fftfiltergpu --fftlen=fftlen --numfft=numfft < /dev/zero > /dev/null
I recorded the times for copying from host to GPU, the FFT
on the GPU, and copying from GPU back to host. I am not reporting the times
to read from standard input and write to standard output.
The results are shown in the table below:
fftLen             | hostToGpu time (ms) | fft time (ms) | gpuToHost time (ms)
1 MegaPnt (2^20)   |   3.065             |  0.450        |   2.520
16 MegaPnt (2^24)  |  44.984             |  6.102        |  40.011
128 MegaPnt (2^27) | 178.319             | 63.997        | 159.998
I did not overlap any of the operations. The fft
was done in place.
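The table times can be turned into effective transfer rates with a little arithmetic. The sketch below assumes 8 bytes per complex float point and assumes that each time in the table covers a single transform of fftLen points (the numfft value used is not recorded here). The helper names are hypothetical.

```python
# Assumption: one complex float point = two 32-bit floats = 8 bytes.
BYTES_PER_POINT = 8

# (fftLen, hostToGpu ms, fft ms, gpuToHost ms), copied from the table.
ROWS = [
    (2**20, 3.065, 0.450, 2.520),
    (2**24, 44.984, 6.102, 40.011),
    (2**27, 178.319, 63.997, 159.998),
]


def gbytes_per_sec(npoints, ms):
    """Effective rate in GB/s (1e9 bytes/s) for moving npoints in ms milliseconds."""
    return npoints * BYTES_PER_POINT / (ms * 1e-3) / 1e9


def total_ms(h2d, fft, d2h):
    """With no overlap between stages, the three times simply add."""
    return h2d + fft + d2h


for n, h2d, fft, d2h in ROWS:
    print(f"2^{n.bit_length() - 1}: "
          f"hostToGpu {gbytes_per_sec(n, h2d):.2f} GB/s, "
          f"gpuToHost {gbytes_per_sec(n, d2h):.2f} GB/s, "
          f"total {total_ms(h2d, fft, d2h):.3f} ms")
```

Under these assumptions the 2^20 and 2^24 transfers work out to roughly 3 GB/s, and the 2^27 transfers to roughly 6 to 7 GB/s. In every row the two copies take longer than the FFT itself.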
processing:
svn/aosoft/src/clp/gpu/cuda/fftfiltergpu.cu