pdev data taking measurements
LAST MOD: 17may18

Contents:
- 230329: pset error timeouts caused by disc i/o slowdown.
- 180517: disc i/o fails with 40 Mbytes/sec voltage sampling.
- 23feb09: scp'ing data while running at 65 Mbytes/second.
- 02dec08: pdevs1 disk doesn't keep up.
- monitoring cpu/disc statistics with iostat
- pdev raid disc i/o rates.

180517: disc i/o fails with 40 Mbytes/sec voltage sampling.
s3279 was using lbw and the mock spectrometer to take some rfi data. The mock setup was:
- clock: 160 MHz, output BW=5 MHz, 16-bit voltage sampling, polA and polB.
- 6 mocks were used; pdevs2 had a degraded raid array (only 2 discs) so it was not used.
- data was taken from 12:00 to 15:00.
During the run, the mock program aborted with "FPGA1 PI fifo error 0x0470" or "FPGA1: DMA buffer overflow". The buffer pool monitor data was examined to see what was going on.
The buffer pool:
The mock data taking uses 50 20-Mbyte shared memory buffers for i/o buffering to disc (1 GByte total). The process is:
- The pnet program:
  - requests a free shm buffer
  - reads data from the pdev spectrometer into the buffer
  - puts the buffer on the bpInp program input queue
- The bpInp program processes the data buffers:
  - grabs a buffer from its input queue (placed there by pnet)
  - processes the buffer
    - for spectral data it does the bit reversal
  - places the processed buffer on the bpOut input queue
- The bpOut program loads headers and writes to disc:
  - grabs a buffer from its input queue
  - loads header info (at least for spectral data)
  - outputs the buffer to disc
  - puts the buffer back on the shm free list
The bpMon program continually monitors the status of the shared memory free list as well as the bpInp and bpOut queues:
- Every 5 seconds it reads the shared memory info holding the queue status.
- It then logs this to disc: /data/pdevFiles/bpMon_bnsm.log
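The pipeline above can be sketched as a buffer pool with a free list and two inter-program queues. This is a minimal single-threaded Python sketch; the real programs are separate processes sharing memory, and the names pnet_read, bpinp_process, and bpout_write are stand-ins, not the actual routines:

```python
import queue

NBUFS = 50  # the mock system uses 50 shared-memory buffers (20 Mbytes each)

# stand-ins for the shm free list and the two program input queues
free_list = queue.Queue()
bpinp_q = queue.Queue()   # pnet -> bpInp
bpout_q = queue.Queue()   # bpInp -> bpOut

for i in range(NBUFS):
    free_list.put(i)

def pnet_read(data):
    """pnet: grab a free buffer, fill it, hand it to bpInp."""
    buf = free_list.get()          # blocks (i.e. data taking stalls) if none free
    bpinp_q.put((buf, data))

def bpinp_process():
    """bpInp: process one buffer (e.g. bit reversal for spectral data)."""
    buf, data = bpinp_q.get()
    bpout_q.put((buf, data))

def bpout_write(outfile):
    """bpOut: load header info, write to disc, return buffer to the free list."""
    buf, data = bpout_q.get()
    outfile.append(data)           # stand-in for the disc write
    free_list.put(buf)

out = []
for n in range(100):               # more records than buffers: buffers recycle
    pnet_read(n)
    bpinp_process()
    bpout_write(out)
```

As long as bpOut keeps pace, buffers cycle back to the free list; when the disc stalls, the free list drains and pnet eventually blocks, which is the failure mode seen in the s3279 run.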
The s3279 disc writing errors:
When voltage sampling is used, the i/o programs only use the 1st 1 Mbyte of each 20-Mbyte buffer for recording data. So instead of having 20 Mbytes*50 bufs = 1 Gbyte of buffer memory, there is only 50*1 = 50 Mbytes of buffer space available. This was initially chosen so there wasn't too long a delay between buffer outputs when slow voltage data sampling was used.
s3279 was taking data at 40 million bytes/sec (around 38.15 2^20 bytes/sec). 50 1-Mbyte buffers gave a time slop of only 1.25 seconds.
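The slack time follows directly from the buffer count, the bytes used per buffer, and the data rate. A quick sketch of the arithmetic (slack_seconds is an illustrative helper, not a real routine):

```python
def slack_seconds(nbufs, buf_mb, rate_mb_s):
    """How long disc i/o can stall before the free buffer list empties."""
    return nbufs * buf_mb / rate_mb_s

# voltage sampling: only 1 Mbyte of each 20-Mbyte buffer is used
print(slack_seconds(50, 1, 40))    # -> 1.25 (the s3279 run)

# full 20-Mbyte buffers at the galfacts rate of 65 Mbytes/sec
print(slack_seconds(50, 20, 65))   # ~15.4, roughly the 16 secs quoted below
```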
The plots show the number of free buffers available during the run (.ps) (.pdf):
- Page 1: number of free buffers vs time for each pdev spectrometer.
  - Top frame to bottom frame are the different pdev spectrometers (pdevs2 is not included).
  - When the buffer count goes to 0, the i/o will fail.
  - The green lines are blown up on page 2.
- Page 2: blowup where the i/o failed.
  - Each + is a different 5-second sample.
  - The red lines show where the buffer count went to zero for a given spectrometer.
  - The blue lines are when cima reported that the mocks couldn't keep up with the data rate.
    - Cima will force an abort if any of the spectrometers can't keep up.
SUMMARY:
- The i/o could not keep up 3 times during the run (I've excluded the times before pdevs2 was removed).
- In these cases, the free buffer count went to 0.
- Looking at the data, .25 to .5 second backups (10 to 20 buffers) occur regularly.
- I need to look into increasing the packet size used for voltage sampling. I think there was a reason why I didn't want to make it much larger than 1 Mbyte.
processing: x101/180517/bpmonchk.pro
23feb09: scp data while taking data
The scp program was run from galfas1 to copy a 2-GByte file from pdevs1 to galfas1 while pdevs1 was taking data in galfacts mode (65 Mbytes/sec). scp was run at the default nice level. The transfer completed without any aborts from data taking. The buffer memory went from 52 down to 41 free buffers (20 Mbytes each), but this continued even after the scp finished. Top was run and the cpu utilization was recorded during and after the scp:
-----------------------------------------------------------------------------
While scp pdevs1 to galfas1 was running with data taking:

top - 12:05:33 up 13 days, 17:29, 4 users, load average: 1.78, 0.88, 1.12
Tasks: 169 total, 3 running, 165 sleeping, 0 stopped, 1 zombie
Cpu(s): 16.1%us, 16.3%sy, 0.0%ni, 43.8%id, 6.1%wa, 1.3%hi, 16.3%si, 0.0%st
Mem:   4033376k total,  3831064k used,   202312k free,     1116k buffers
Swap: 10233384k total,      196k used, 10233188k free,  3507700k cached

  PID USER PR NI  VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
20726 phil 15  0 73888  1696   972 R   66  0.0 0:16.92 sshd
20264 root 16  0 1073m  995m  957m S   32 25.3 1:56.83 bpOut
20721 root 15  0 1031m  469m  469m S   20 11.9 0:07.33 psrv
20260 root 18  0 1041m  957m  957m S   20 24.3 1:08.09 bpInp
20720 root 15  0 1031m  732m  732m S   20 18.6 0:07.41 psrv
20727 phil 15  0 33180  1352  1024 R   10  0.0 0:02.43 scp
-----------------------------------------------------------------------------
scp not running, data taking still running:

top - 12:08:07 up 13 days, 17:31, 4 users, load average: 2.77, 1.84, 1.46
Tasks: 166 total, 2 running, 163 sleeping, 0 stopped, 1 zombie
Cpu(s): 7.7%us, 10.4%sy, 0.0%ni, 71.6%id, 1.6%wa, 0.5%hi, 8.2%si, 0.0%st
Mem:   4033376k total,  3994320k used,    39056k free,      496k buffers
Swap: 10233384k total,      196k used, 10233188k free,  3665700k cached

  PID USER PR NI  VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
20264 root 16  0 1073m  995m  957m S   40 25.3 2:53.05 bpOut
20260 root 18  0 1041m  957m  957m S   17 24.3 1:37.26 bpInp
20721 root 15  0 1031m  863m  863m R   13 21.9 0:30.77 psrv
20720 root 15  0 1031m  938m  938m S   11 23.8 0:31.54 psrv
sshd is using 66% of 1 cpu, probably to do the encryption.
processing: x101/090223/test.doc
02dec08: pdevs1 disk doesn't keep up.
Galfacts was running on 01dec08-02dec08 with 65 Mbytes/sec per box. At 00:27:10 an abort message was written to the message.log. The circumstances were:
- It had been running for 12384 seconds at 65 Mbytes/sec.
- 190*2 2-GByte files had been generated for this scan.
- pdevs1 ran out of memory buffers (1 GB, or 16 secs of data at 65 Mbytes/sec).
- pdevs2 thru pdevs7 did not slow down. Only pdevs1 failed.
The table below shows when the last few disc files were generated
and their last modification dates. The open times come from the
message log. The message is written after the file has been
successfully opened and the fits header info has been written.
Column 2 is pdevs1, column 3 is pdevs2. For each pdev there are two
files generated: b0s0g0, b0s1g0 for the two 170 MHz Bands.
Times when fits files were opened (each cell lists the s0, then s1 file):

fileNumber | pdevs1 open (secMidnite) | pdevs2 open (secMidnite) | openDif p1-p2 | lastMod p1_p2      | lastModDif p1-p2
-----------+--------------------------+--------------------------+---------------+--------------------+-----------------
00286      | 00:23:36 (1416)          | 00:23:36 (1416)          | 0             |                    |
           | 00:23:36                 | 00:23:36                 | 0             |                    |
00287      | 00:24:41 (1481)          | 00:24:41 (1481)          | 0             | 00:25:47_00:25:46  | 1
           | 00:24:42                 | 00:24:41                 | 1             | 00:25:46_00:25:47  | -1
00288      | 00:25:47 (1547) s0       | 00:25:46 (1546)          | 0             | 00:27:04_00:26:52  | 12
           | 00:25:46 (1546) s1       | 00:25:47 (1547)          | 0             | 00:26:58_00:26:52  | 6
00289      | 00:27:08 (1628) s0       | 00:26:52 (1612)          | 16            |                    |
           | 00:27:04 (1624) s1       | 00:26:52 (1612)          | 12            |                    |
abortMsg   | 00:27:10 (1630)          | 00:27:10 (1630)          |               |                    |
           | 00:27:10 (1630)          | 00:27:10                 |               |                    |
- Timeline:
  - 00:25:46 - times the same. Last write to file 00287.
  - 00:25:47 - open file 00288, still in sync.
  - 00:26:52 - pdevs2 finishes writing file 00288.
  - 00:26:52 - pdevs2 finishes open, header write for file 00289.
  - 00:26:54 - no scramnet buffers logged to disc for 5 seconds.
  - 00:26:58/00:27:04 - pdevs1 finishes writing file 00288. It is 6/12 seconds behind.
  - 00:27:08/00:27:04 - pdevs1 finishes open/header write for file 00289.
    - Took 4 seconds from close of the previous file.
    - It is now 16/12 seconds behind pdevs2.
The scramnet pnt, agc info is logged to disc on pdevs1 once a second. Three times are recorded each second:
- the time stamp in the block
- the time when the pdevs1 shmScram program read the block from the scram multicast
- the time when the pdevs1 scrmLog program wrote that block to disc
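Since the read times are logged once a second, a dropout like the 5-second one at 00:26:54 shows up as a gap between successive samples. A minimal sketch of scanning for such gaps (find_gaps and the sample times are illustrative; this is not the actual scrmLog file format):

```python
def find_gaps(times, max_gap=2.0):
    """Return (prev_time, gap) pairs where successive once-a-second
    samples are further apart than max_gap seconds."""
    return [(t0, t1 - t0) for t0, t1 in zip(times, times[1:]) if t1 - t0 > max_gap]

# secMidnite read times with a 5-second dropout, like the one seen at 26:54
times = [1610, 1611, 1612, 1613, 1614, 1619, 1620]
print(find_gaps(times))   # -> [(1614, 5)]
```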
The plot shows the scramnet times close to the pnetctl abort (.ps) (.pdf):
- The different times are color coded:
  - black is the time stamp in the pntblk
  - red is when the block was read from the ethernet broadcast
  - green is when the block was written to disc
- At 26:54 no scramnet packet was seen for 5 seconds.
  - This is 2 seconds after pdevs2 closed file 00288 and 4 seconds before the last write to 00288.s1 on pdevs1.
- The other pdev devices receive the scramnet broadcasts. They did not have any delays in their disc writes.
- The scramnet info is also recorded directly from the shared memory on observer2. During this time period the packets were showing up once a second.
- A delay in receiving scramnet packets could occur if:
  - the packets didn't arrive
  - the scramShm program could not grab the semaphore for scramShm (bpOut could be holding it)
- An error in writing the scram block out (via scramLog) could cause a delay, but it would also generate a message via syslog and none were seen.
Looking at the raid disc controller info: pdevs1 disc 5 had 43 timeouts. None of the other discs in the pdevs1 raid, nor any of the other pdevsN raids, had any disc timeouts.
summary:
- pdevs1 ran out of buffers at 27:10. There are enough buffers for about 16 seconds, so the problem could have started more than 16 seconds prior to this.
- pdevs2-pdevs7 had no problems.
- At 27:04 one of the pdevs1 files was 12 seconds behind the pdevs2 file.
- Scramnet buffers stopped being recorded for 5 seconds at 26:54.
- Something slowed down pdevs1 around 26:52.
  - It took about 4 seconds to open and write the headers for pdevs1 file 00288. pdevs2 did this within 1 second. These operations don't need any scramnet info, so that isn't the reason for the slowdown.
- Looking at the pdevs1 raid statistics, disc 5 does not look like it is working 100%.
- On 05dec08 we changed disc #5 in the raid array. We'll see if this fixes the problem.
Monitoring pdev server cpu & disc with iostat.
The iostat program is run on the pdev servers for about 12 hours each evening. Galfacts (65 Mbytes/sec) and a2133 (AUDS) run during this period. We would like to see what the cpu and disc are doing during data taking (especially during galfacts aborts).
The iostat program is set to average over 10-second intervals and then record the info to disc. This info is then plotted. It contains:
- cpu usage (top plot):
  - %user (black): fraction of time the cpu ran in user mode
  - %iowait (red): fraction of time the cpu was not busy and there was an outstanding i/o request
  - %sys (green): fraction of time the cpu ran in system mode (in the kernel)
  - %idle (blue): fraction of time the cpu was idle
- disc i/o rates (middle plot):
  - this monitors /dev/sdc1 (/dump/pdev) where the data files are written
  - write i/o rate (black) in Mbytes/sec
  - read i/o rate (red) in Mbytes/sec
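The cpu fields can be pulled out of an iostat report directly. A minimal sketch, assuming the classic sysstat layout of an "avg-cpu:" header line followed by a line of percentages (column names vary slightly between sysstat versions; the sample values below are taken from the first top snapshot in the scp test above, for illustration only):

```python
def parse_cpu(report):
    """Return a dict of the avg-cpu percentages from an iostat report,
    or None if no avg-cpu block is found."""
    lines = report.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith('avg-cpu:'):
            names = line.split()[1:]                    # %user %nice %system ...
            vals = [float(v) for v in lines[i + 1].split()]
            return dict(zip(names, vals))
    return None

sample = """avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           16.10    0.00   16.30    6.10    0.00   43.80
"""
print(parse_cpu(sample)['%iowait'])   # -> 6.1
```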
Daily monitoring of iostat on pdevsN servers:

Date (start date)     | Notes
29nov11 (.ps) (.pdf)  | programs run:
                      | - (14.3-16.5): p2627 voltage sampling 40 Mbytes/sec
                      |   - io rate 40 Mbytes/sec
                      |   - no bit flipping of spectra, so the cpu was not busy
                      |   - 15.2: burst of 100 Mbytes/sec. They stopped to switch
                      |     feeds; when they start up it gets behind for a while.
                      | - (16.5-19.25): p2030 pulsar search
                      | - (19.5-01.25): a2130 galfacts
                      |   - the red read rates are probably the p2030 compression
                      |     running after the p2030 run
19nov09 (.ps) (.pdf)  | ok. alfalfa[-6:0], galfacts[0:6]
14dec08 (.ps) (.pdf)  | ok
13dec08 (.ps) (.pdf)  | no galfacts this night
09dec08 (.ps) (.pdf)  | ok
07dec08 (.ps) (.pdf)  | ok
06dec08 (.ps) (.pdf)  | 06dec08 and before only monitored pdevs1
05dec08 (.ps) (.pdf)  | disc 5 of the pdevs1 raid array was changed before the run.
                      | An azimuth power supply fault stopped data taking. There
                      | were no i/o aborts.
04dec08 (.ps) (.pdf)  |
03dec08 (.ps) (.pdf)  | galfacts aborted around 1:29:21 am. 2nd page is a blowup
                      | near the failure. Note the large iowait.
processing:
- svn/pdev/pdev/monitor/runiostat to collect the data.
- x101/pdev/iostat/pltday.pro to make the plot.
Pdev raid disc i/o rates:
The program discwr2 (/share/megs/phil/svn/pdev/pdev/scripts/discwr2) tests the raid disc writing speed on each pdevsN server.
- It writes 1000 records of 20 Mbytes each to the file /dump/pdev/discwr2.out.
- It times each record and the global time (globalStart, write all data, globalStop).
- For the 1000 record times measured it also reports the min, max, mean, and rms of the record times.
- The measurements could vary if someone was using pdevsN or accessing an NFS-mounted file.
The raid
measurements can be found in the discwr2 output file.
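A discwr2-style test can be sketched in a few lines: write fixed-size records, fsync each one so the timing reflects the disc rather than the page cache (discwr2 itself may buffer differently), and report min/max/mean/rms of the per-record times. This is an illustrative stand-in, not the real program; write_test is hypothetical and the sizes are scaled down from discwr2's 1000 x 20-Mbyte records so it runs quickly:

```python
import os
import statistics
import tempfile
import time

def write_test(path, nrecs=100, recbytes=1 << 20):
    """Write nrecs records of recbytes each, timing every record and the
    global transfer; return the rate plus per-record time statistics."""
    block = b'\0' * recbytes
    times = []
    with open(path, 'wb') as f:
        t_global0 = time.perf_counter()
        for _ in range(nrecs):
            t0 = time.perf_counter()
            f.write(block)
            f.flush()
            os.fsync(f.fileno())       # force the record through to disc
            times.append(time.perf_counter() - t0)
        t_global = time.perf_counter() - t_global0
    mbytes = nrecs * recbytes / 2**20
    return {'rate_mb_s': mbytes / t_global,
            'min': min(times), 'max': max(times),
            'mean': statistics.mean(times),
            'rms': statistics.pstdev(times)}   # rms scatter of record times

with tempfile.NamedTemporaryFile(delete=False) as tf:
    path = tf.name
stats = write_test(path, nrecs=20)
os.remove(path)
print('%.1f Mbytes/sec' % stats['rate_mb_s'])
```

The min/max spread is the interesting part: a healthy raid gives tightly clustered record times, while a failing disc (like pdevs1 disc 5 above) shows up as occasional very large max values.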
The program discwr2tmp tests the single-disc i/o rates (no raid speedup). The single disc is normally used for the galfacts compressed output.
The single-disc measurements can be found in the discwr2tmp output file.
History:
- 16dec09: group 0 now has 5 * 1.5-TB drives. It was 6 * .75-TB drives.
- 16may09: arun replaced the 750-GB drives on pdevs13 with 1.5-TB drives.
  - The first slot in the raid is no longer working; 4 drives are active.
  - The i/o rate went from 30 Mbytes/sec to 137 Mbytes/sec.