Timeouts from disc i/o slowdown
The problem
12meter data taking with the mock
spectrometers was occasionally having an error and causing
the datataking to stop.
On 29mar23 at 13:23:56 one of these error occurred.
- The error printed on the user terminal was:
- 2023-03-29_13:23:56 mockpset:error mockpset grp g1 (0).
stat:er retVal:pseterr pdevs2.pdev.naic.edu psetReply:..
retrying
- This occurred when the tcl code tried to send the pset hdr
info to the mock output program before the start of a scan.
The mock datataking programs consist of:
- tcl program that user interacts with
- pnetctl - running on pdevs1.
- main program that talks to all of the programs on each
mock spectrometer
- the tcl program talks to to pnetctl.
- For each spectrometer (up to 7) there are a set of programs
used to take the data.
- bufferpool:the local programs communicate via input
and output queues that sit in shared memory.
- the memory is referred to as the buffer pool
- there is about 4 Gbytes of memory in the pool
- it contains large buffers (for data)
- small buffers to pass commands back and forth.
- this allows queuing of data and commands.
- prun - control program running on the mock fpga.
- collects the data and send it via local ethernet to
the psrv program running on the mock cpu
- psrv - runs on the mock cpu
- started by xinetd when the prun program connects to psrv's
port
- buffers the data from prun and then passes it on to
the buffer pool in shared memory.
- this program was originally written by jeff, it also has a
number of diagnostic routines.
- bpInp :buffer pool input program.
- Takes data that psrv put on bpInp's input queue
- Bit flips all spectra
- Adds some info to the header
- passes the data to the input queue of the bpOut program
- bpOut: program to continue filling the header
and then output the data to disc.
- receives data buffers on it's input queue.
- fills in the header
- outputs the data to disc
- It also receives commands/info from pnetctl via it's
input queue (from bpCmd)
- bpCmd: runs on each spectrometer. interfaces pnetctl
to bpInp,bpOut
- bpInp,bpOut read from their input queues in shared memory.
- pnetctl needs to send info,commands to these programs for
configuration. but pnetctl is on a remote computer
- so bpCmd listens on an input socket port.
- when it receives a request from pnetctl, it can
pass them on to bpInp,bpOut on the local
shared memory input queues.
Logging info around the error
The tcl log file had:
2023-03-29_13:22:48 13 308800075 offsets:
-400.691 0.000 start:132252
2023-03-29_13:22:49 Mock scan starting filenumber:07300
2023-03-29_13:22:52 Observation begins Wed Mar 29 13:22:52 2023.
scan:308800075
2023-03-29_13:23:53 Scan 308800075 completed.
2023-03-29_13:23:56 mockpset:error
mockpset grp g1 (0). stat:er retVal:pseterr
pdevs2.pdev.naic.edu psetReply:.. retrying
2023-03-29_13:23:56 mockpset:error mockpset grp g1 (0). stat:to
retVal:.. failed retry..quitting
- The previous scan finished at 2023-03-29_13:23:53 ( when prun
exited after completing the integration time)
- The tcl program then sends the pset info for the next scan
(this was an ra/dec map with 60 sec scans).
- The pnetctl program got the request
- it sent the request to the 7 mock spectrometers
- it went to bpCmd and then to bpOut
- pnetctl then waits up to 1 second for the reply
- I think this is where we are timing out.
message log of pnetctl
- msgOut:76:msg:infook pnetg0:Observation
begins at 1680110572.01 Wed Mar 29 13:22:52 2023:
- scan started 13:22:52 .. should last for 60 seconds.
- msgInp:pseterr
tm sendBpcmd pset: tot:1.0022439956665
send:pdevs1.pdev.naic.edu=0.000104
pdevs2.pdev.naic.edu=0.000060
pdevs3.pdev.naic.edu=0.000056
pdevs4.pdev.naic.edu=0.000048
pdevs5.pdev.naic.edu=0.000050
pdevs6.pdev.naic.edu=0.000051
pdevs7.pdev.naic.edu=0.000050
- The command was sent correctly..
- rcvCum:pdevs2.pdev.naic.edu=1.00163
- The reply took longer than 1 second to get back to
pnetctl. this is the timeout failure.
The message log on the mock spectrometer cpu
- I chose to look at pdevs2.
- the datataking programs send messages to /var/log/messages.
- the file had:
- Mar 29 13:22:48 pdevs2 xinetd[3292]: START: pdev pid=11930
from=192.168.7.123
- prun connects to psrv port. xinetd starts psrv in
preparation for the scan.
- Mar 29 13:22:51 pdevs2 bpInp[3759]: Got hdr band 1
- hdr info bpInp arrives (via bpCmd)
- note: When sending info to bpOut, it always goes through
bpInp's input queue to make sure that things are synchronized.
- we don't want to setup a new scan if there are still buffers
in bpOut's input queue from the previous scan).
- Mar 29 13:22:51 pdevs2 bpOut[3763]: opened
file:/dump/pdev/sun.20230329.b1s1g0.07300.fits
- file number passed to bpOut via bpCmd telling next file
number to use.
- Mar 29 13:22:51 pdevs2 bpOut[3763]: Got hdr band 1
- header info for new scan gets to bpOut.
- Mar 29 13:23:53 pdevs2 xinetd[3292]: EXIT: pdev status=1
pid=11930 duration=65(sec)
- psrv exits when prun finishes sending it data.
- all of the buffers from psrv have been put on bpInp's input
queue.
- psrv ran for 65 seconds. the first 5 seconds setup info was
passed around.
- Mar 29 13:24:26 pdevs2 bpOut[3763]: subBand:1 done:blks:60
bytes:34360300
lastFile:/dump/pdev/sun.20230329.b1s1g0.07300.fits
- bpOut reports that it has finished writing all of the
buffers from the previous block at 13:24:26.
- the datataking finished at 13:22:52 1 min=13:23:52 ..
so the data took 34 extra seconds to write to disc.
- the pset command request was put on bpOut's input
queue around 13:23:55
- It was not operated on until all of the previous
data buffers were output (at 13:24:26).
- this is 31 seconds after it was issued. Much longer than
the 1 second timeout.
- The data rate for the experiment is
8192channels*2pols*4bytes per second
- This is 65KBytes/second.
- I recently ran the disc i/o test on all the mock discs. they
all wrote greater the 120MBytes/second.
- So don't think there is a bad disc.
Plotting the buffer pools status around the error.
I had started the bufpool monitoring routine
around 13:00 on 29mar23 to see what was happening with the
buffers.
- the bpMon program reads the bookkeeping info from the bufpool
and outputs the into every 5 seconds to disc files.
- this is done for each mock cpu
- there is info for
- the large (data) and small (command) buffer lists
- the bpInp input queue
- the bpOut input queue.
The plots show the number of buffers in the various
queues around the time of the failure (.ps) (.pdf)
- Each page has info on a different queue or buffer list
- The 7 frames on a page are the 7 mock spectrometers
- The data is sampled once every 5 seconds
- the green lines are the 1 minute scan prior to the failure
- the red line shows the failure
- Page 1: Large (data) buffer freelist queue
- the queue holds 50 buffers.
- When bpInp gets data, it extracts a buffer and places it on
bpOut's input queue.
- when bpOut finished writing the data, it gets put back
on the freelist queue.
- The during the 60sec scan, the freelist maintains the 50
buffers. the input, output happens fast enough to not see a
change
- around 10 seconds before the end of the scan, the buffer
count begins to decrement. It ends up having up to 10 buffers
in use by bpout
- since spectra were integrated for 1 sec each, no data was
written to disc for up to 15 seconds.
- It took about 30secs after the end of the scan for the i/o
to complete.
- The required data rate was 65KBytes/second. The discs were
later timed to write at 120MBytes/second.
- All 7 mocks showed identical response, so this is not a
local disc problem on a mock.,
- Page 2: Small buffer (command) free list queue.
- No commands were sent during the failure.
- There should be 1024 buffers in the freelist. looks
like we've probably got sum buffers that have not been freed
(aborts?)
- Page 3: bpInp input queue
- This holds the buffers on bpInp's queue waiting to be
processed.
- There are no buffers backed up waiting for bpInp to process.
- Page 4: bpOut input queue.
- These are the buffers waiting on bpOut's input queue.
- The 10 buffers removed from the freelist are waiting for
bpOut to process.
- bpOut can not process a buffer on it's input queue
until the previous write has completed.
- This shows the problem is with bpOut writing to disc
- The same thing happened to all 7 mocks, so it is not a
single mock disc problem
Summary
- mock datataking has been aborting with pset errors
- The problem occurs because:
- prun reports that the integration has completed
- the user tcl program sends a new pset command to
bpOut to setup the parameters for the next scan.
- To be sure the command is synchronized with any buffers left
in bpInp,bpOut queues, the command is placed on bpInp's input
queue
- bpInp passes it on to bpOut's input queue
- For some reason there are 10 buffers (of 1 sec of data each)
still sitting on bpOut's queue.
- The pset command is placed behind these
- 30 seconds after the scan ends, the last data buffer
is finished being written to disc
- the pset command is then accepted by bpOUt
- Meanwhile the program sending the pset command times out
after waiting 1 second for the reply from bpOut.
- The problem occurs identically for all 7 mocks.
- this shows it is not an individual mock or mock
disc problem
processing:x101/230329/pseterr.info,
x101/pdev/bpmon/bpmonplot_230329.pro
<-
page up
home_~phil