Software measurements

oct2005

     This section includes general software measurements (mainly timing):

Different fft benchmarks:

    Below are listed various FFT benchmarks that I've used to time things:

ACML performance:

    Used acmlbnch benchmark:

Acml notes:


FFTW performance

    Performance measurements were taken using the fftw bench routine. FFTs with lengths from 1K (2^10) to 2^20 were run with various setups and on various machines:

cpu: summer1:

FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
1024 1 0 18.67 ms 14.87 us 3442.0
1024 4 0 29.46 ms 43.46 us 1178.1
1024 8 0 40.54 ms 68.62 us 746.1
1024 1 1 36.43 ms 8.37 us 6115.2
1024 4 1 61.23 ms 54.53 us 938.9
1024 8 1 107.30 ms 69.15 us 740.4
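
The MFLOPS column appears to follow the usual FFTW benchmark convention for complex transforms: a nominal count of 5 N log2(N) floating point operations per transform, divided by the run time in microseconds. This is a scaled speed figure, not a measured operation count. A minimal Python sketch (the function name fft_mflops is mine, not part of the bench program) that checks this against the first row of the table above:

```python
import math

def fft_mflops(n, run_time_us):
    """Nominal MFLOPS for a complex FFT of length n that takes run_time_us microseconds."""
    return 5.0 * n * math.log2(n) / run_time_us

# First summer1 row: FFTLEN 1024, 1 thread, no SSE2, RunTm 14.87 us, MFLOPS 3442.0
print(round(fft_mflops(1024, 14.87), 1))
```

The result comes out within a fraction of a percent of the tabulated 3442.0; the small difference is expected because the RunTm column is rounded to two decimals.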






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
2048 1 0 37.96 ms 35.25 us 3195.5
2048 4 0 65.88 ms 75.48 us 1492.2
2048 8 0 84.82 ms 69.73 us 1615.3
2048 1 1 71.98 ms 19.30 us 5834.9
2048 4 1 97.61 ms 52.47 us 2146.6
2048 8 1 144.96 ms 82.36 us 1367.7






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
4096 1 0 74.14 ms 78.51 us 3130.4
4096 4 0 100.95 ms 81.80 us 3004.2
4096 8 0 117.69 ms 100.98 us 2433.8
4096 1 1 142.61 ms 49.21 us 4993.6
4096 4 1 217.56 ms 72.66 us 3382.5
4096 8 1 310.06 ms 99.26 us 2476.0






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
8192 1 0 147.49 ms 170.78 us 3117.9
8192 4 0 170.44 ms 134.49 us 3959.2
8192 8 0 216.80 ms 152.18 us 3499.0
8192 1 1 284.35 ms 105.45 us 5049.8
8192 4 1 363.84 ms 132.57 us 4016.6
8192 8 1 530.91 ms 142.82 us 3728.3






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
16384 1 0 292.04 ms 366.41 us 3130.1
16384 4 0 318.86 ms 227.42 us 5043.0
16384 8 0 376.15 ms 175.45 us 6536.7
16384 1 1 555.23 ms 231.91 us 4945.4
16384 4 1 670.96 ms 203.48 us 5636.2
16384 8 1 799.97 ms 224.48 us 5109.0






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
32768 1 0 601.26 ms 795.00 us 3091.3
32768 4 0 629.55 ms 435.97 us 5637.1
32768 8 0 641.25 ms 296.72 us 8282.6
32768 1 1 1.05 s 509.09 us 4827.4
32768 4 1 1.18 s 431.06 us 5701.3
32768 8 1 1.30 s 276.88 us 8876.2






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
65536 1 0 1.66 s 1.73 ms 3032.8
65536 4 0 1.53 s 884.81 us 5925.4
65536 8 0 1.48 s 589.28 us 8897.1
65536 1 1 2.39 s 1.11 ms 4740.7
65536 4 1 2.47 s 727.94 us 7202.4
65536 8 1 2.61 s 521.56 us 10052.0






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
131072 1 0 4.46 s 4.92 ms 2263.1
131072 4 0 3.73 s 1.83 ms 6096.8
131072 8 0 3.35 s 1.14 ms 9732.9
131072 1 1 6.01 s 3.73 ms 2984.9
131072 4 1 5.47 s 1.56 ms 7138.3
131072 8 1 5.33 s 1.59 ms 7009.2






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
262144 1 0 13.77 s 12.99 ms 1816.2
262144 4 0 10.27 s 4.68 ms 5041.8
262144 8 0 9.10 s 3.12 ms 7553.4
262144 1 1 18.21 s 10.51 ms 2244.6
262144 4 1 13.95 s 4.02 ms 5872.5
262144 8 1 13.42 s 2.93 ms 8060.5






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
524288 1 0 6.38 s 31.44 ms 1584.1
524288 4 0 3.75 s 11.61 ms 4291.5
524288 8 0 2.99 s 7.40 ms 6731.2
524288 1 1 5.71 s 24.60 ms 2024.9
524288 4 1 3.68 s 10.32 ms 4825.4
524288 8 1 3.12 s 7.03 ms 7083.5






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
1048576 1 0 18.22 s 65.67 ms 1596.7
1048576 4 0 10.73 s 21.61 ms 4853.4
1048576 8 0 8.96 s 14.69 ms 7139.0
1048576 1 1 16.44 s 53.33 ms 1966.2
1048576 4 1 11.50 s 18.18 ms 5766.8
1048576 8 1 8.57 s 13.14 ms 7978.8
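
To compare thread counts it helps to turn the RunTm column into a speedup relative to the single-thread run. The sketch below is only an illustration: parse_row and speedups are hypothetical helper names, and the rows are copied from the 1048576 table above.

```python
UNIT_TO_SEC = {"s": 1.0, "ms": 1e-3, "us": 1e-6}

def parse_row(line):
    """Split one table row into a dict; times are converted to seconds."""
    fftlen, nthreads, simd, setup, setup_unit, run, run_unit, mflops = line.split()
    return {
        "fftlen": int(fftlen),
        "nthreads": int(nthreads),
        "simd": simd,
        "setup_s": float(setup) * UNIT_TO_SEC[setup_unit],
        "run_s": float(run) * UNIT_TO_SEC[run_unit],
        "mflops": float(mflops),
    }

def speedups(rows):
    """Run-time speedup of each row relative to the 1-thread row with the same SIMD setting."""
    base = {r["simd"]: r["run_s"] for r in rows if r["nthreads"] == 1}
    return {(r["simd"], r["nthreads"]): base[r["simd"]] / r["run_s"] for r in rows}

# summer1, FFTLEN 1048576 (copied from the table above)
table = """\
1048576 1 0 18.22 s 65.67 ms 1596.7
1048576 4 0 10.73 s 21.61 ms 4853.4
1048576 8 0 8.96 s 14.69 ms 7139.0
1048576 1 1 16.44 s 53.33 ms 1966.2
1048576 4 1 11.50 s 18.18 ms 5766.8
1048576 8 1 8.57 s 13.14 ms 7978.8
"""
rows = [parse_row(line) for line in table.splitlines()]
for key, s in sorted(speedups(rows).items()):
    print(key, round(s, 2))
```

For this length, 8 threads give a speedup of roughly 4 to 4.5 over 1 thread, with or without SSE2.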







processing: /share/megs/phil/x101/fftw/fftw-3.2.2/archsrc/xeon5400/tests/ benchphil, benchhtml.pl

fftw on cpu aserv11:

Observations:


FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
1024 1 1 36.08 ms 3.20 us 15994.0
1024 4 1 100.03 ms 38.96 us 1314.1
1024 8 1 143.75 ms 36.01 us 1421.8
1024 12 1 166.72 ms 32.17 us 1591.6
1024 16 1 173.61 ms 61.41 us 833.7






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
2048 1 1 58.58 ms 7.32 us 15398.0
2048 4 1 163.33 ms 44.80 us 2514.0
2048 8 1 192.12 ms 45.64 us 2467.8
2048 12 1 240.42 ms 59.04 us 1908.0
2048 16 1 304.54 ms 83.88 us 1342.8






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
4096 1 1 101.56 ms 17.22 us 14268.0
4096 4 1 226.86 ms 56.85 us 4322.8
4096 8 1 311.78 ms 65.09 us 3775.9
4096 12 1 301.65 ms 84.76 us 2899.6
4096 16 1 375.20 ms 67.17 us 3658.7






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
8192 1 1 190.83 ms 48.95 us 10877.0
8192 4 1 346.32 ms 60.65 us 8779.8
8192 8 1 449.22 ms 70.93 us 7506.7
8192 12 1 511.94 ms 114.89 us 4634.7
8192 16 1 564.90 ms 80.33 us 6628.8






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
16384 1 1 349.97 ms 115.52 us 9927.7
16384 4 1 569.72 ms 82.45 us 13911.0
16384 8 1 635.73 ms 122.51 us 9361.7
16384 12 1 660.19 ms 112.47 us 10197.0
16384 16 1 800.49 ms 93.67 us 12244.0






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
32768 1 1 655.07 ms 257.95 us 9527.3
32768 4 1 1.03 s 133.98 us 18344.0
32768 8 1 1.11 s 162.08 us 15163.0
32768 12 1 1.15 s 135.22 us 18175.0
32768 16 1 1.12 s 139.38 us 17632.0





FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
65536 1 1 1.49 s 577.28 us 9082.0
65536 4 1 2.01 s 314.28 us 16682.0
65536 8 1 1.81 s 188.17 us 27862.0
65536 12 1 1.60 s 185.53 us 28259.0
65536 16 1 1.97 s 293.89 us 17840.0






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
131072 1 1 3.23 s 1.20 ms 9295.4
131072 4 1 4.11 s 493.94 us 22556.0
131072 8 1 3.55 s 299.66 us 37180.0
131072 12 1 3.06 s 270.83 us 41137.0
131072 16 1 3.26 s 319.06 us 34918.0






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
262144 1 1 7.03 s 2.68 ms 8792.7
262144 4 1 8.19 s 858.25 us 27490.0
262144 8 1 7.56 s 795.50 us 29658.0
262144 12 1 5.40 s 783.50 us 30112.0
262144 16 1 6.20 s 1.09 ms 21605.0






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
524288 1 1 2.13 s 11.29 ms 4413.6
524288 4 1 1.27 s 3.17 ms 15708.0
524288 8 1 1.36 s 3.07 ms 16215.0
524288 12 1 12.45 s 2.19 ms 22737.0
524288 16 1 1.06 s 2.56 ms 19441.0






FFTLEN Nthreads UseSSE2 SetupTm RunTm MFLOPS
1048576 1 1 6.26 s 24.73 ms 4239.2
1048576 4 1 3.61 s 7.54 ms 13911.0
1048576 8 1 3.21 s 6.00 ms 17484.0
1048576 12 1 41.21 s 6.19 ms 16933.0
1048576 16 1 2.89 s 5.31 ms 19740.0







processing: /share/megs/phil/x101/fftw/fftw-3.2.2/archsrc/nehalem/tests/ benchphil, benchhtml.pl

21feb13: fftw 3.3.2 benchmark on megs3 (sandybridge) cpu

Observations:
FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
1024 1 avx 12.34 ms 1.36 us 37644.0
1024 4 avx 41.38 ms 10.02 us 5107.8
1024 8 avx 46.98 ms 13.88 us 3687.5






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
2048 1 avx 21.92 ms 3.70 us 30434.0
2048 4 avx 50.52 ms 9.39 us 12000.0
2048 8 avx 77.85 ms 16.82 us 6696.3






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
4096 1 avx 38.60 ms 8.80 us 27916.0
4096 4 avx 84.91 ms 11.71 us 20984.0
4096 8 avx 97.09 ms 17.57 us 13984.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
8192 1 avx 73.73 ms 22.43 us 23740.0
8192 4 avx 119.75 ms 16.35 us 32563.0
8192 8 avx 142.31 ms 25.87 us 20580.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
16384 1 avx 137.73 ms 52.25 us 21950.0
16384 4 avx 206.76 ms 25.70 us 44627.0
16384 8 avx 213.62 ms 29.53 us 38841.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
32768 1 avx 271.86 ms 124.62 us 19721.0
32768 4 avx 373.26 ms 45.91 us 53526.0
32768 8 avx 401.74 ms 88.19 us 27867.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
65536 1 avx 678.04 ms 275.86 us 19006.0
65536 4 avx 813.69 ms 90.05 us 58219.0
65536 8 avx 752.61 ms 84.72 us 61886.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
131072 1 avx 1.53 s 602.25 us 18499.0
131072 4 avx 1.77 s 176.06 us 63279.0
131072 8 avx 1.49 s 163.75 us 68037.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
262144 1 avx 3.42 s 1.26 ms 18673.0
262144 4 avx 3.93 s 381.19 us 61893.0
262144 8 avx 3.26 s 396.69 us 59475.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
524288 1 avx 1.22 s 7.05 ms 7060.4
524288 4 avx 753.81 ms 2.10 ms 23697.0
524288 8 avx 617.56 ms 2.02 ms 24694.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
1048576 1 avx 3.92 s 16.20 ms 6472.7
1048576 4 avx 2.44 s 5.50 ms 19070.0
1048576 8 avx 2.30 s 5.18 ms 20223.0








21feb13: fftw version 3.3.3 on megs3

    Same setup as 01jul12 except that version 3.3.3 was used rather than 3.3.2.


FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
1024 1 avx 13.32 ms 1.31 us 39188.0
1024 4 avx 32.82 ms 12.11 us 4228.5
1024 8 avx 48.87 ms 12.99 us 3942.3






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
2048 1 avx 24.41 ms 3.42 us 32948.0
2048 4 avx 49.20 ms 10.85 us 10384.0
2048 8 avx 72.02 ms 15.54 us 7247.5






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
4096 1 avx 39.52 ms 8.39 us 29292.0
4096 4 avx 74.76 ms 14.93 us 16463.0
4096 8 avx 100.88 ms 16.94 us 14511.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
8192 1 avx 74.70 ms 22.03 us 24174.0
8192 4 avx 121.43 ms 17.62 us 30215.0
8192 8 avx 139.91 ms 22.96 us 23197.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
16384 1 avx 145.28 ms 52.37 us 21901.0
16384 4 avx 204.60 ms 25.44 us 45083.0
16384 8 avx 217.31 ms 39.05 us 29366.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
32768 1 avx 278.92 ms 124.88 us 19679.0
32768 4 avx 387.19 ms 45.80 us 53658.0
32768 8 avx 408.18 ms 62.48 us 39336.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
65536 1 avx 677.89 ms 280.27 us 18707.0
65536 4 avx 815.73 ms 91.06 us 57575.0
65536 8 avx 729.79 ms 132.50 us 39569.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
131072 1 avx 1.51 s 591.62 us 18831.0
131072 4 avx 1.84 s 189.84 us 58686.0
131072 8 avx 1.61 s 251.11 us 44368.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
262144 1 avx 3.45 s 1.30 ms 18176.0
262144 4 avx 3.75 s 379.97 us 62092.0
262144 8 avx 3.38 s 320.37 us 73642.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
524288 1 avx 1.20 s 7.02 ms 7095.6
524288 4 avx 721.68 ms 2.27 ms 21948.0
524288 8 avx 611.24 ms 2.06 ms 24143.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
1048576 1 avx 3.93 s 15.97 ms 6566.3
1048576 4 avx 2.71 s 5.50 ms 19062.0
1048576 8 avx 2.24 s 5.27 ms 19910.0









27mar13: fftw on rserv2 (64 core amd bulldozer)

Observations:
The table below compares different compile options on rserv2 as well as some times on megs3. The comparison was done using V3.3.3.


The table below has the avx times:

rserv2(bulldozer) times --enable-avx Kernel 3.4.33
FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
1024 1 avx 23.25 ms 4.60 us 11130.0
1024 4 avx 101.22 ms 36.16 us 1416.1
1024 8 avx 146.86 ms 61.27 us 835.6
1024 16 avx 213.68 ms 61.02 us 839.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
2048 1 avx 45.97 ms 11.16 us 10097.0
2048 4 avx 135.69 ms 41.14 us 2738.2
2048 8 avx 197.71 ms 63.92 us 1762.3
2048 16 avx 317.12 ms 105.09 us 1071.9






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
4096 1 avx 90.97 ms 27.00 us 9101.6
4096 4 avx 193.45 ms 47.23 us 5203.9
4096 8 avx 262.78 ms 60.77 us 4044.1
4096 16 avx 404.87 ms 108.50 us 2265.1






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
8192 1 avx 179.27 ms 70.33 us 7570.9
8192 4 avx 321.36 ms 64.70 us 8229.6
8192 8 avx 362.27 ms 73.23 us 7271.7
8192 16 avx 549.13 ms 124.84 us 4265.4






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
16384 1 avx 363.11 ms 167.67 us 6840.0
16384 4 avx 543.77 ms 97.73 us 11735.0
16384 8 avx 564.38 ms 93.19 us 12307.0
16384 16 avx 746.70 ms 123.43 us 9291.8






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
32768 1 avx 769.47 ms 414.16 us 5934.0
32768 4 avx 1.03 s 180.09 us 13646.0
32768 8 avx 1.03 s 152.05 us 16163.0
32768 16 avx 1.20 s 174.27 us 14103.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
65536 1 avx 1.90 s 923.69 us 5676.0
65536 4 avx 2.23 s 340.91 us 15379.0
65536 8 avx 2.00 s 244.50 us 21443.0
65536 16 avx 2.10 s 271.23 us 19330.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
131072 1 avx 4.64 s 2.48 ms 4490.1
131072 4 avx 5.11 s 710.38 us 15683.0
131072 8 avx 4.29 s 444.87 us 25043.0
131072 16 avx 4.19 s 404.50 us 27543.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
262144 1 avx 11.48 s 5.96 ms 3960.5
262144 4 avx 12.86 s 1.74 ms 13524.0
262144 8 avx 10.45 s 949.56 us 24846.0
262144 16 avx 9.18 s 761.88 us 30967.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
524288 1 avx 3.44 s 16.94 ms 2939.7
524288 4 avx 2.05 s 4.68 ms 10646.0
524288 8 avx 1.58 s 2.72 ms 18280.0
524288 16 avx 1.43 s 1.92 ms 25924.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
1048576 1 avx 10.65 s 36.52 ms 2871.5
1048576 4 avx 6.64 s 11.28 ms 9294.2
1048576 8 avx 4.89 s 5.70 ms 18385.0
1048576 16 avx 4.34 s 4.09 ms 25650.0









This table shows the sse2 times on rserv2:


rserv2 (bulldozer) times with SSE2   Kernel:2.6.32-279
FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
1024 1 sse2 31.68 ms 5.45 us 9396.7
1024 4 sse2 151.33 ms 66.93 us 764.9
1024 8 sse2 203.87 ms 103.53 us 494.5
1024 16 sse2 342.76 ms 141.02 us 363.1






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
2048 1 sse2 62.26 ms 13.32 us 8459.4
2048 4 sse2 237.30 ms 68.60 us 1641.9
2048 8 sse2 348.20 ms 100.77 us 1117.8
2048 16 sse2 523.57 ms 172.59 us 652.6






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
4096 1 sse2 119.39 ms 32.51 us 7558.7
4096 4 sse2 352.33 ms 88.05 us 2791.2
4096 8 sse2 521.54 ms 109.77 us 2238.8
4096 16 sse2 757.10 ms 181.97 us 1350.6






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
8192 1 sse2 248.09 ms 71.70 us 7426.6
8192 4 sse2 583.80 ms 119.61 us 4451.8
8192 8 sse2 791.89 ms 131.12 us 4061.1
8192 16 sse2 1.04 s 186.73 us 2851.5






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
16384 1 sse2 512.36 ms 167.33 us 6854.1
16384 4 sse2 1.00 s 179.17 us 6401.0
16384 8 sse2 1.22 s 178.83 us 6413.3
16384 16 sse2 1.50 s 220.69 us 5196.9






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
32768 1 sse2 1.05 s 378.75 us 6488.7
32768 4 sse2 1.83 s 319.69 us 7687.5
32768 8 sse2 1.72 s 234.25 us 10491.0
32768 16 sse2 2.13 s 305.22 us 8051.9






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
65536 1 sse2 2.45 s 929.75 us 5639.0
65536 4 sse2 3.40 s 596.34 us 8791.7
65536 8 sse2 3.27 s 389.81 us 13450.0
65536 16 sse2 3.45 s 313.75 us 16710.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
131072 1 sse2 5.52 s 2.16 ms 5161.2
131072 4 sse2 7.03 s 829.25 us 13435.0
131072 8 sse2 7.15 s 749.94 us 14856.0
131072 16 sse2 6.24 s 572.22 us 19470.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
262144 1 sse2 13.57 s 5.21 ms 4531.9
262144 4 sse2 15.92 s 1.68 ms 14061.0
262144 8 sse2 14.89 s 1.02 ms 23112.0
262144 16 sse2 13.70 s 1.27 ms 18583.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
524288 1 sse2 3.47 s 17.06 ms 2919.0
524288 4 sse2 2.11 s 4.34 ms 11482.0
524288 8 sse2 1.84 s 3.68 ms 13552.0
524288 16 sse2 1.59 s 2.45 ms 20357.0






FFTLEN Nthreads SSE/AVX SetupTm RunTm MFLOPS
1048576 1 sse2 11.71 s 37.31 ms 2810.4
1048576 4 sse2 7.39 s 10.53 ms 9954.2
1048576 8 sse2 5.68 s 5.39 ms 19440.0
1048576 16 sse2 5.13 s 6.43 ms 16306.0
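
The avx and sse2 builds on rserv2 can be compared by taking the ratio of their run times at the same length and thread count. Note that the two tables were taken under different kernels (3.4.33 for avx, 2.6.32-279 for sse2), so the ratio mixes the SIMD and kernel effects. A small sketch using the 16-thread RunTm values from the two tables above (the dictionary and function names are illustrative only):

```python
# 16-thread RunTm in microseconds on rserv2, copied from the two tables above
avx_us = {65536: 271.23, 131072: 404.50, 262144: 761.88, 524288: 1920.0, 1048576: 4090.0}
sse2_us = {65536: 313.75, 131072: 572.22, 262144: 1270.0, 524288: 2450.0, 1048576: 6430.0}

def sse2_over_avx(fftlen):
    """Ratio > 1 means the avx build ran the transform faster."""
    return sse2_us[fftlen] / avx_us[fftlen]

for n in sorted(avx_us):
    print(n, round(sse2_over_avx(n), 2))
```

With these numbers the avx build is faster at every listed length, by a factor between about 1.2 and 1.7.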








FFTW version differences

    The table below shows some timing differences between fftw versions:

cpu    length  threads  v3.3.2(usecs)  v3.3.3(usecs)
megs3  64k     8        85             132
megs3  128k    8        163            251
megs3  256k    8        397            320
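
The size of the version difference is easier to see as a ratio of run times. A short sketch using the megs3 8-thread values from the table above (version_ratio is a hypothetical helper name):

```python
# (length, v3.3.2 usecs, v3.3.3 usecs) for megs3 with 8 threads, from the table above
times = [("64k", 85, 132), ("128k", 163, 251), ("256k", 397, 320)]

def version_ratio(old_us, new_us):
    """Ratio > 1 means v3.3.3 was slower than v3.3.2 for that length."""
    return new_us / old_us

for length, v332, v333 in times:
    print(f"{length:>5}: v3.3.3/v3.3.2 = {version_ratio(v332, v333):.2f}")
```

For these runs v3.3.3 took about 1.5 times as long as v3.3.2 at 64k and 128k, but was about 20% faster at 256k.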

notes:


intel performance primitives (IPP) lib:


Intel ps_ipps benchmark:

15aug11 IPP ffttimes on aserv11 (.ps) (.pdf) using intel benchmark program ps_ipps.

       processing:x101/intel/tests/ sysperf.sc, sysperf.pro

05feb13 IPP ffttimes on megs3 (.ps) (.pdf) using intel benchmark program ps_ipps.

       processing:x101/intel/tests/ sysperf.sc, sysperf.pro

My benchmark

18aug11: IPP table of ffttimes on aserv11/adslinux using my test program

    processing:x101/intel/tests/fftbnch.c

10oct05: Some linux kernels see no speed up when running 2 processes on a dual processor cpu.



 (23feb06: We finally looked inside the aolc boxes and they do not have multiple cpus (even though the purchase order claimed they did). So their timing is for a single processor with hyperthreading enabled. So the conclusions about 2.4.21 kernels may not be correct...)

    The idl routine (atmclp) processes the coded long process atm data. It was used to benchmark  some of the dual processor cpus at the observatory. The data set used was:

A single copy of the processing was run, and then two copies (two separate idl sessions) were run at the same time. The times for the processing are shown in the table below:
 
cpu       cputype  freq(ghz)  hyperthread  kernel                1copy(secs)  2copies(secs)
fusion00  xeon     2.4        no           2.4.18-27.8.0smp      59           62
fusion02  xeon     2.2        yes          2.4.21-4.ELsmp        61           99
aolc1*    xeon     2.4        no*          2.4.21-4.ELsmp        58           107
aolc2*    xeon     2.4        no*          2.4.21-4.ELsmp        61           134
pserverK  xeon     3.0        yes          2.6.8-1.521smp        57           53
pserverK  xeon     3.0        yes          2.6.8-1.521smp        57           53 (repeat)
pserverM  xeon     3.0        yes          2.6.8-1.521smp        52           60
pserverN  pent4    3.2        no           2.6.12-1.1447_FC4smp  61           104 (but cpu was busy)

* See the 23feb06 note above: the aolc boxes turned out to be single processor machines with hyperthreading enabled.

You can see that the machines with 2.4.21-4.ELsmp kernels take roughly twice as long (1.6 to 2.2 times) to run two copies as one copy. This means that there is little or no advantage to using the dual processor (aolc2 actually took longer than twice the single copy time). For most of the measurements top showed no other processes using the cpu. The exception was pserverN, where root was running a cp that took about 30% of the cpu.
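
The comparison above comes down to the ratio of the two-copy time to the one-copy time: a ratio near 1 means the second copy ran on the other processor at no cost, while a ratio near 2 means the two copies were effectively run one after the other. A small sketch using a few rows from the table above (two_copy_ratio is a hypothetical helper name):

```python
# (machine, kernel, secs for 1 copy, secs for 2 copies) from the table above
runs = [
    ("fusion00", "2.4.18-27.8.0smp", 59, 62),
    ("fusion02", "2.4.21-4.ELsmp", 61, 99),
    ("aolc1", "2.4.21-4.ELsmp", 58, 107),
    ("aolc2", "2.4.21-4.ELsmp", 61, 134),
    ("pserverM", "2.6.8-1.521smp", 52, 60),
]

def two_copy_ratio(t1_secs, t2_secs):
    """About 1.0: full benefit from the second processor. About 2.0: no benefit."""
    return t2_secs / t1_secs

for machine, kernel, t1, t2 in runs:
    print(f"{machine:<9} {kernel:<18} {two_copy_ratio(t1, t2):.2f}")
```

With these numbers fusion00 and pserverM come out near 1.05 and 1.15, while the three 2.4.21-4.ELsmp machines come out between about 1.6 and 2.2.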
 

For the aolc computers you should spread the jobs out over multiple machines rather than trying to run two copies of the same job on one machine (until Arun gets a chance to update the kernels).
 

processing: x101/atm/testclp.pro