pdev data taking measurements
LAST MOD: 17may18

Contents:
- 230329: pset error timeouts caused by disc i/o slowdown.
- 180517: disc i/o fails with 40 Mbytes/sec voltage sampling.
- 23feb09: scp'ing data while running at 65 Mbytes/second.
- 02dec08: pdevs1 disk doesn't keep up.
- monitoring cpu/disc statistics with iostat
- pdev raid disc i/o rates.

180517: disc i/o fails with 40 Mbytes/sec voltage sampling.
s3279 was using lbw and the mock spectrometer to take some rfi data. The mock setup was:
- clock: 160 MHz, output BW=5 MHz, 16-bit voltage sampling, polA and polB.
- 6 mocks were used; pdevs2 had a degraded raid array (only 2 discs) so it was not used.
- data was taken from 12:00 to 15:00.
During the run, the mock program aborted with "FPGA1 PI fifo error 0x0470" or "FPGA1: DMA buffer overflow". The buffer pool monitor data was examined to see what was going on.
The buffer pool:
The mock data taking uses 50 20-Mbyte shared memory buffers for i/o buffering to disc (1 GByte total). The process is:
- The pnet program:
  - requests a free shm buffer
  - reads data from the pdev spectrometer into the buffer
  - puts the buffer on the bpInp program input queue
- The bpInp program processes the data buffers:
  - grabs a buffer from its input queue (placed there by pnet)
  - processes the buffer
    - for spectral data it does the bit reversal
  - places the processed buffer on the bpOut input queue
- The bpOut program loads headers and writes to disc:
  - grabs a buffer from its input queue
  - loads header info (at least for spectral data)
  - outputs the buffer to disc
  - puts the buffer back on the shm free list
The bpMon program continually monitors the status of the shared memory free list as well as the bpInp and bpOut queues:
- Every 5 seconds it reads the shared memory info holding the queue status.
- It then logs this to disc: /data/pdevFiles/bpMon_bnsm.log
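The pipeline above can be sketched as a buffer pool with a free list and two inter-program queues. This is a minimal single-threaded Python sketch; the real programs are separate processes sharing memory, and the names pnet_read, bpinp_process, and bpout_write are stand-ins, not the actual routines:

```python
import queue

NBUFS = 50  # the mock system uses 50 shared-memory buffers (20 Mbytes each)

# stand-ins for the shm free list and the two program input queues
free_list = queue.Queue()
bpinp_q = queue.Queue()   # pnet -> bpInp
bpout_q = queue.Queue()   # bpInp -> bpOut

for i in range(NBUFS):
    free_list.put(i)

def pnet_read(data):
    """pnet: grab a free buffer, fill it, hand it to bpInp."""
    buf = free_list.get()          # blocks (i.e. data taking stalls) if none free
    bpinp_q.put((buf, data))

def bpinp_process():
    """bpInp: process one buffer (e.g. bit reversal for spectral data)."""
    buf, data = bpinp_q.get()
    bpout_q.put((buf, data))

def bpout_write(outfile):
    """bpOut: load header info, write to disc, return buffer to the free list."""
    buf, data = bpout_q.get()
    outfile.append(data)           # stand-in for the disc write
    free_list.put(buf)

out = []
for n in range(100):               # more records than buffers: buffers recycle
    pnet_read(n)
    bpinp_process()
    bpout_write(out)
```

As long as bpOut keeps pace, buffers cycle back to the free list; when the disc stalls, the free list drains and pnet eventually blocks, which is the failure mode seen in the s3279 run.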
The s3279 disc writing errors:
When voltage sampling is used, the i/o programs only use the 1st 1 Mbyte of each 20-Mbyte buffer for recording data. So instead of having 20 Mbytes*50 bufs = 1 Gbyte of buffer memory, there is only 50*1 = 50 Mbytes of buffer space available. This was initially chosen so there wasn't too long a delay between buffer outputs when slow voltage data sampling was used.
s3279 was taking data at 40 million bytes/sec (around 38.15 2^20 bytes/sec). 50 1-Mbyte buffers gave a time slop of only 1.25 seconds.
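The slack time follows directly from the buffer count, the bytes used per buffer, and the data rate. A quick sketch of the arithmetic (slack_seconds is an illustrative helper, not a real routine):

```python
def slack_seconds(nbufs, buf_mb, rate_mb_s):
    """How long disc i/o can stall before the free buffer list empties."""
    return nbufs * buf_mb / rate_mb_s

# voltage sampling: only 1 Mbyte of each 20-Mbyte buffer is used
print(slack_seconds(50, 1, 40))    # -> 1.25 (the s3279 run)

# full 20-Mbyte buffers at the galfacts rate of 65 Mbytes/sec
print(slack_seconds(50, 20, 65))   # ~15.4, roughly the 16 secs quoted below
```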
The plots show the number of free buffers available during the run (.ps) (.pdf):
- Page 1: number of free buffers vs time for each pdev spectrometer.
  - Top frame to bottom frame are the different pdev spectrometers (pdevs2 is not included).
  - When the buffer count goes to 0, the i/o will fail.
  - The green lines are blown up on page 2.
- Page 2: blowup where the i/o failed.
  - Each + is a different 5-second sample.
  - The red lines show where the buffer count went to zero for a given spectrometer.
  - The blue lines are when cima reported that the mocks couldn't keep up with the data rate.
    - Cima will force an abort if any of the spectrometers can't keep up.
SUMMARY:
- The i/o could not keep up 3 times during the run (I've excluded the times before pdevs2 was removed).
- In these cases, the free buffer count went to 0.
- Looking at the data, .25 to .5 second backups (10 to 20 buffers) occur regularly.
- I need to look into increasing the packet size used for voltage sampling. I think there was a reason why I didn't want to make it much larger than 1 Mbyte.
processing: x101/180517/bpmonchk.pro
23feb09: scp data while taking data
The scp program was run from galfas1 to copy a 2-GByte file from pdevs1 to galfas1 while pdevs1 was taking data in galfacts mode (65 Mbytes/sec). scp was run at the default nice level. The transfer completed without any aborts from data taking. The buffer memory went from 52 down to 41 free buffers (20 Mbytes each), but this continued even after the scp finished. Top was run and the cpu utilization was recorded during and after the scp:
-----------------------------------------------------------------------------
While scp pdevs1 to galfas1 was running with data taking:

top - 12:05:33 up 13 days, 17:29, 4 users, load average: 1.78, 0.88, 1.12
Tasks: 169 total, 3 running, 165 sleeping, 0 stopped, 1 zombie
Cpu(s): 16.1%us, 16.3%sy, 0.0%ni, 43.8%id, 6.1%wa, 1.3%hi, 16.3%si, 0.0%st
Mem:   4033376k total,  3831064k used,   202312k free,     1116k buffers
Swap: 10233384k total,      196k used, 10233188k free,  3507700k cached

  PID USER PR NI  VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
20726 phil 15  0 73888  1696   972 R   66  0.0 0:16.92 sshd
20264 root 16  0 1073m  995m  957m S   32 25.3 1:56.83 bpOut
20721 root 15  0 1031m  469m  469m S   20 11.9 0:07.33 psrv
20260 root 18  0 1041m  957m  957m S   20 24.3 1:08.09 bpInp
20720 root 15  0 1031m  732m  732m S   20 18.6 0:07.41 psrv
20727 phil 15  0 33180  1352  1024 R   10  0.0 0:02.43 scp
-----------------------------------------------------------------------------
scp not running, data taking still running:

top - 12:08:07 up 13 days, 17:31, 4 users, load average: 2.77, 1.84, 1.46
Tasks: 166 total, 2 running, 163 sleeping, 0 stopped, 1 zombie
Cpu(s): 7.7%us, 10.4%sy, 0.0%ni, 71.6%id, 1.6%wa, 0.5%hi, 8.2%si, 0.0%st
Mem:   4033376k total,  3994320k used,    39056k free,      496k buffers
Swap: 10233384k total,      196k used, 10233188k free,  3665700k cached

  PID USER PR NI  VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
20264 root 16  0 1073m  995m  957m S   40 25.3 2:53.05 bpOut
20260 root 18  0 1041m  957m  957m S   17 24.3 1:37.26 bpInp
20721 root 15  0 1031m  863m  863m R   13 21.9 0:30.77 psrv
20720 root 15  0 1031m  938m  938m S   11 23.8 0:31.54 psrv
sshd is using 66% of 1 cpu, probably to do the encryption.
processing: x101/090223/test.doc
02dec08: pdevs1 disk doesn't keep up.
Galfacts was running on 01dec08-02dec08 with 65 Mbytes/sec per box. At 00:27:10 an abort message was written to the message.log. The circumstances were:
- It had been running for 12384 seconds at 65 Mbytes/sec.
- 190*2 2-GByte files had been generated for this scan.
- pdevs1 ran out of memory buffers (1 GB, or 16 secs of data at 65 Mbytes/sec).
- pdevs2 thru pdevs7 did not slow down. Only pdevs1 failed.
The table below shows when the last few disc files were generated
and their last modification dates. The open times come from the
message log. The message is written after the file has been
successfully opened and the fits header info has been written.
Column 2 is pdevs1, column 3 is pdevs2. For each pdev there are two
files generated: b0s0g0, b0s1g0 for the two 170 MHz Bands.
Times when fits files were opened (each cell lists the s0, then s1 file):

fileNumber | pdevs1 open (secMidnite) | pdevs2 open (secMidnite) | openDif p1-p2 | lastMod p1_p2      | lastModDif p1-p2
-----------+--------------------------+--------------------------+---------------+--------------------+-----------------
00286      | 00:23:36 (1416)          | 00:23:36 (1416)          | 0             |                    |
           | 00:23:36                 | 00:23:36                 | 0             |                    |
00287      | 00:24:41 (1481)          | 00:24:41 (1481)          | 0             | 00:25:47_00:25:46  | 1
           | 00:24:42                 | 00:24:41                 | 1             | 00:25:46_00:25:47  | -1
00288      | 00:25:47 (1547) s0       | 00:25:46 (1546)          | 0             | 00:27:04_00:26:52  | 12
           | 00:25:46 (1546) s1       | 00:25:47 (1547)          | 0             | 00:26:58_00:26:52  | 6
00289      | 00:27:08 (1628) s0       | 00:26:52 (1612)          | 16            |                    |
           | 00:27:04 (1624) s1       | 00:26:52 (1612)          | 12            |                    |
abortMsg   | 00:27:10 (1630)          | 00:27:10 (1630)          |               |                    |
           | 00:27:10 (1630)          | 00:27:10                 |               |                    |
- Timeline:
  - 00:25:46 - times the same. Last write to file 00287.
  - 00:25:47 - open file 00288, still in sync.
  - 00:26:52 - pdevs2 finishes writing file 00288.
  - 00:26:52 - pdevs2 finishes open, header write for file 00289.
  - 00:26:54 - no scramnet buffers logged to disc for 5 seconds.
  - 00:26:58/00:27:04 - pdevs1 finishes writing file 00288. It is 6/12 seconds behind.
  - 00:27:08/00:27:04 - pdevs1 finishes open/header write for file 00289.
    - Took 4 seconds from close of the previous file.
    - It is now 16/12 seconds behind pdevs2.
The scramnet pnt, agc info is logged to disc on pdevs1 once a second. Three times are recorded each second:
- the time stamp in the block
- the time when the pdevs1 shmScram program read the block from the scram multicast
- the time when the pdevs1 scrmLog program wrote that block to disc
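Since the read times are logged once a second, a dropout like the 5-second one at 00:26:54 shows up as a gap between successive samples. A minimal sketch of scanning for such gaps (find_gaps and the sample times are illustrative; this is not the actual scrmLog file format):

```python
def find_gaps(times, max_gap=2.0):
    """Return (prev_time, gap) pairs where successive once-a-second
    samples are further apart than max_gap seconds."""
    return [(t0, t1 - t0) for t0, t1 in zip(times, times[1:]) if t1 - t0 > max_gap]

# secMidnite read times with a 5-second dropout, like the one seen at 26:54
times = [1610, 1611, 1612, 1613, 1614, 1619, 1620]
print(find_gaps(times))   # -> [(1614, 5)]
```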
The plot shows the scramnet times close to the pnetctl abort (.ps) (.pdf):
- The different times are color coded:
  - black is the time stamp in the pntblk
  - red is when the block was read from the ethernet broadcast
  - green is when the block was written to disc
- At 26:54 no scramnet packet was seen for 5 seconds.
  - This is 2 seconds after pdevs2 closed file 00288 and 4 seconds before the last write to 00288.s1 on pdevs1.
- The other pdev devices receive the scramnet broadcasts. They did not have any delays in their disc writes.
- The scramnet info is also recorded directly from the shared memory on observer2. During this time period the packets were showing up once a second.
- A delay in receiving scramnet packets could occur if:
  - the packets didn't arrive
  - the scramShm program could not grab the semaphore for scramShm (bpOut could be holding it)
- An error in writing the scram block out (via scramLog) could cause a delay, but it would also generate a message via syslog and none were seen.
Looking at the raid disc controller info: pdevs1 disc 5 had 43 timeouts. None of the other discs in the pdevs1 raid, nor any of the other pdevsN raids, had any disc timeouts.
summary:
- pdevs1 ran out of buffers at 27:10. There are enough buffers for about 16 seconds, so the problem could have started more than 16 seconds prior to this.
- pdevs2-pdevs7 had no problems.
- At 27:04 one of the pdevs1 files was 12 seconds behind the pdevs2 file.
- Scramnet buffers stopped being recorded for 5 seconds at 26:54.
- Something slowed down pdevs1 around 26:52.
  - It took about 4 seconds to open and write the headers for pdevs1 file 00288. pdevs2 did this within 1 second. These operations don't need any scramnet info, so that isn't the reason for the slowdown.
- Looking at the pdevs1 raid statistics, disc 5 does not look like it is working 100%.
- On 05dec08 we changed disc #5 in the raid array. We'll see if this fixes the problem.
Monitoring pdev server cpu & disc with iostat.
The iostat program is run on the pdev servers for about 12 hours each evening. Galfacts (65 Mbytes/sec) and a2133 (AUDS) run during this period. We would like to see what the cpu and disc are doing during data taking (especially during galfacts aborts).
The iostat program is set to average over 10-second intervals and then record the info to disc. This info is then plotted. It contains:
- cpu usage (top plot):
  - %user (black): fraction of time the cpu ran in user mode
  - %iowait (red): fraction of time the cpu was not busy and there was an outstanding i/o request
  - %sys (green): fraction of time the cpu ran in system mode (in the kernel)
  - %idle (blue): fraction of time the cpu was idle
- disc i/o rates (middle plot):
  - this monitors /dev/sdc1 (/dump/pdev) where the data files are written
  - write i/o rate (black) in Mbytes/sec
  - read i/o rate (red) in Mbytes/sec
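The cpu fields can be pulled out of an iostat report directly. A minimal sketch, assuming the classic sysstat layout of an "avg-cpu:" header line followed by a line of percentages (column names vary slightly between sysstat versions; the sample values below are taken from the first top snapshot in the scp test above, for illustration only):

```python
def parse_cpu(report):
    """Return a dict of the avg-cpu percentages from an iostat report,
    or None if no avg-cpu block is found."""
    lines = report.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith('avg-cpu:'):
            names = line.split()[1:]                    # %user %nice %system ...
            vals = [float(v) for v in lines[i + 1].split()]
            return dict(zip(names, vals))
    return None

sample = """avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           16.10    0.00   16.30    6.10    0.00   43.80
"""
print(parse_cpu(sample)['%iowait'])   # -> 6.1
```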
Daily monitoring of iostat on pdevsN servers:

Date (start date)     | Notes
29nov11 (.ps) (.pdf)  | programs run:
                      | - (14.3-16.5): p2627 voltage sampling 40 Mbytes/sec
                      |   - io rate 40 Mbytes/sec
                      |   - no bit flipping of spectra, so the cpu was not busy
                      |   - 15.2: burst of 100 Mbytes/sec. They stopped to switch
                      |     feeds; when they start up it gets behind for a while.
                      | - (16.5-19.25): p2030 pulsar search
                      | - (19.5-01.25): a2130 galfacts
                      |   - the red read rates are probably the p2030 compression
                      |     running after the p2030 run
19nov09 (.ps) (.pdf)  | ok. alfalfa[-6:0], galfacts[0:6]
14dec08 (.ps) (.pdf)  | ok
13dec08 (.ps) (.pdf)  | no galfacts this night
09dec08 (.ps) (.pdf)  | ok
07dec08 (.ps) (.pdf)  | ok
06dec08 (.ps) (.pdf)  | 06dec08 and before only monitored pdevs1
05dec08 (.ps) (.pdf)  | disc 5 of the pdevs1 raid array was changed before the run.
                      | An azimuth power supply fault stopped data taking. There
                      | were no i/o aborts.
04dec08 (.ps) (.pdf)  |
03dec08 (.ps) (.pdf)  | galfacts aborted around 1:29:21 am. 2nd page is a blowup
                      | near the failure. Note the large iowait.
processing:
- svn/pdev/pdev/monitor/runiostat to collect the data.
- x101/pdev/iostat/pltday.pro to make the plot.
Pdev raid disc i/o rates:
The program discwr2 (/share/megs/phil/svn/pdev/pdev/scripts/discwr2) tests the raid disc writing speed on each pdevsN server.
- It writes 1000 records of 20 Mbytes each to the file /dump/pdev/discwr2.out.
- It times each record and the global time (globalStart, write all data, globalStop).
- For the 1000 record times measured it also reports the min, max, mean, and rms of the record times.
- The measurements could vary if someone was using pdevsN or accessing an NFS-mounted file.
The raid
measurements can be found in the discwr2 output file.
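A discwr2-style test can be sketched in a few lines: write fixed-size records, fsync each one so the timing reflects the disc rather than the page cache (discwr2 itself may buffer differently), and report min/max/mean/rms of the per-record times. This is an illustrative stand-in, not the real program; write_test is hypothetical and the sizes are scaled down from discwr2's 1000 x 20-Mbyte records so it runs quickly:

```python
import os
import statistics
import tempfile
import time

def write_test(path, nrecs=100, recbytes=1 << 20):
    """Write nrecs records of recbytes each, timing every record and the
    global transfer; return the rate plus per-record time statistics."""
    block = b'\0' * recbytes
    times = []
    with open(path, 'wb') as f:
        t_global0 = time.perf_counter()
        for _ in range(nrecs):
            t0 = time.perf_counter()
            f.write(block)
            f.flush()
            os.fsync(f.fileno())       # force the record through to disc
            times.append(time.perf_counter() - t0)
        t_global = time.perf_counter() - t_global0
    mbytes = nrecs * recbytes / 2**20
    return {'rate_mb_s': mbytes / t_global,
            'min': min(times), 'max': max(times),
            'mean': statistics.mean(times),
            'rms': statistics.pstdev(times)}   # rms scatter of record times

with tempfile.NamedTemporaryFile(delete=False) as tf:
    path = tf.name
stats = write_test(path, nrecs=20)
os.remove(path)
print('%.1f Mbytes/sec' % stats['rate_mb_s'])
```

The min/max spread is the interesting part: a healthy raid gives tightly clustered record times, while a failing disc (like pdevs1 disc 5 above) shows up as occasional very large max values.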
The program discwr2tmp tests the single-disc i/o rates (no raid speedup). The single disc is normally used for the galfacts compressed output.
The single-disc measurements can be found in the discwr2tmp output file.
History:
- 16dec09: group 0 now has 5 * 1.5-TB drives. It was 6 * .75-TB drives.
- 16may09: arun replaced the 750-GB drives on pdevs13 with 1.5-TB drives.
  - The first slot in the raid is no longer working; 4 drives are active.
  - The i/o rate went from 30 Mbytes/sec to 137 Mbytes/sec.