Writing usrp data across the network
13aug18
The standard usrp software (from Juha) reads
from the usrp into a ram filesystem. These files then get moved to
disc on the usrp166 computer.
On 9-13aug18, during a 4 day 430tx run, we tried writing across the
network on a 1 gigabit link rather than to a local disc. The data
rate is about 100 Mbytes/second (roughly 800 Mbits/sec, close to
the capacity of the 1 gigabit link).
What we tried:
- The drf_ram_move.py script is the original script that writes to
  disc. It will:
  - scan directories in /ram looking for files to move
  - copy the metadata files
  - move any .h5 data files found (one at a time).
- The tests tried moving the data to a disc array on gpuserv1.
- setup 1
  - a usb3-to-1-gigabit ethernet adapter was used to reach the 10
    gigabit switch and then gpuserv1
  - an nfs mount of the remote file system
  - this would fail after a few minutes
- setup 2: use rsync (a sketch of the final loop appears after
  this list)
  - sending 1 file at a time ran for a few hours
  - switched to sending all available files (except the last 2).
    this seemed to improve the reliability.
  - rsync would typically find 11 files and send 9 at a time
    (each file is 1 second of data, 100 Mbytes).
- switched from the usb ethernet adapter to a pci ethernet device
  - this ran for many hours.
- Arun then increased the mtu from 1500 to 9000.
  - This improved things: rsync would now find 9 files and send
    7 at a time.
  - We were still having consistent failures at the same
    time each day (00:17 ast).
- Looking at the rsync log file on gpuserv1 we could see:
  - the connection from usrp166 for the metadata transfer was
    made,
  - there would then be a 30 to 40 sec gap before the metadata
    file list (usually empty) would show up.
  - During this time the ram filesystem (4 Gbytes) would fill
    up and the data-taking program would abort.
- The 00:17 time matched the cron.hourly runtime..
  but cron.hourly had no tasks to run
  - Arun commented out running cron.daily
    - that got rid of the 00:17 ast hangups for the last two
      days.
    - the only confusing thing was that cron.daily was being run
      at 2:46 ast each day, not 00:17 ast
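
For reference, here is a minimal sketch of the final move loop in
python (a reconstruction, not the actual drf_ram_mov_rsyn.py: the
rsync daemon module name, directory layout, and rsync flags are
guesses):

    #!/usr/bin/env python
    # Sketch of the rsync-based ram->disc move loop. Not the real
    # drf_ram_mov_rsyn.py: the destination module, directory
    # layout, and rsync flags are guesses.
    import glob
    import os
    import subprocess
    import time

    RAM_DIR = "/ram"            # ram filesystem the usrp writes into
    REMOTE = "gpuserv1::usrp/"  # hypothetical rsync daemon module
                                # (set via the 2 remote_* variables)

    while True:
        for subdir in sorted(glob.glob(os.path.join(RAM_DIR, "*/"))):
            # copy the metadata files (they stay in /ram)
            meta = [f for f in glob.glob(os.path.join(subdir, "*"))
                    if os.path.isfile(f) and not f.endswith(".h5")]
            if meta:
                subprocess.run(["rsync", "-t"] + meta + [REMOTE],
                               check=False)
            # send all available .h5 data files except the last 2
            # (they may still be open for writing); sending many
            # files per rsync call proved more reliable than 1 at
            # a time
            data = sorted(glob.glob(os.path.join(subdir, "*.h5")))[:-2]
            if data:
                subprocess.run(["rsync", "-t", "--remove-source-files"]
                               + data + [REMOTE], check=False)
        time.sleep(1)

rsync's --remove-source-files gives the move semantics: each .h5
file is deleted from /ram once it has transferred, which is what
keeps the 4 Gbyte ram filesystem from filling.
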
Monitoring the disc i/o rate on gpuserv1
iostat was run with a 30 second interval on the gpuserv1
disc (/dev/sda1) to monitor the i/o rates.
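
Something like the following logging loop was used to capture the
rates (a sketch of what iostat.sc does; the iostat flags and log
path here are guesses):

    #!/usr/bin/env python
    # Log iostat device reports for sda1 every 30 seconds
    # (a sketch of what iostat.sc does; flags/paths are guesses).
    import subprocess

    with open("sda1.log", "a") as log:
        # -d: device utilization only; iostat keeps printing a new
        # report every 30 seconds until it is killed
        subprocess.run(["iostat", "-d", "/dev/sda1", "30"],
                       stdout=log, check=False)
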
The plots show the
gpuserv1 i/o rates during the run (.ps) (.pdf):
- Page 1
- top: i/o rate versus day of month (all times are
ast).
- black is the write rate, red is the read rate
- the gpus on gpuserv0 were processing the data over nfs
  while the data was being written
- Dropouts:
  - 11.0 : the write dropout at 11.0 was one of the failures
    at ast 00:17
    - I didn't record the dropout on 10aug 00:17
  - 11.3 : I think this was a power dip, where the program was
    stopped and then restarted.
  - 11.8 : this was when Arun switched the crontab.
- changing i/o rates
- You can see spikes in the i/o rate (up to 160 Mbytes/sec)
  near the start of each day (see page 2)
- bottom: disc transactions/sec
- It averaged around 650 transactions/sec
- When the gpus were not reading, it fell to around 400 (just
  the disc writes).
- Page 2: when the i/o rates changed
  - this has a blowup around .0625*24 = 01:30 ast (each day)
  - The i/o rate to disc would
    - increase to 160 Mbytes/sec for a 30 second average
    - drop to 40 Mbytes/sec for the next 30 second average.
    - the 60 second average remained 100 Mbytes/sec
      ((160 + 40)/2 = 100, the expected average rate)
- I looked at the rsync daemon log file (~guest/Rsyncd) and the
  rate that rsyncd was writing did not jump during this time.
- I wonder if this has something to do with the disc cache
  buffers..
  - Was rsyncd reporting when the data was written to the disc
    cache rather than to the physical disc?
  - Is iostat reporting the actual rate to disc, or to
    the cache?
  - (One possibility: write()s land in the kernel's page cache
    and get flushed to the device later, while iostat counts the
    block-device traffic itself, so the device-level rate can
    burst while rsyncd's reported rate stays steady.)
  - It's still a little strange that it jumps to 160
    Mbytes/sec and then drops to 40 Mbytes/sec..
    I would expect it to happen in the other order.
Summary:
- At the end we wrote at 100 Mbytes/sec for 1.5 days without
  losing data.
- To do this we:
  - used rsync rather than nfs
  - sent multiple files at a time via rsync (rather than 1 at a
    time)
  - set the mtu to 9000 bytes
  - disabled running cron.daily in the crontab.
- drf_ram_mov_rsyn.py
  - On usrp166: ~usrp/src/juha/trunk/python/thor2/arecibo/ (use
    function gousrp)
  - backed up offline:
    /share/megs/phil/svn/aosoft/usrp/phil/scripts/arecibo/
    (checked out of the svn repository).
  - the destination host and filesystem are set in the 2 remote_*
    variables.
- logging:
  - usrp166: ~usrp/bin/monram.sc
    - does a df /ram every 5 seconds and logs to
      /tmp/ram.log (a rough python rendering appears after
      this list)
  - gpuserv1: ~phil/iostat.sc /dev/sda1
    - logs the output of iostat every 30 seconds to
      ~phil/sda1.log.
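
monram.sc itself is a small shell script; a rough python rendering
of what it is described as doing (the timestamp line is my
addition):

    #!/usr/bin/env python
    # Log the /ram filesystem usage every 5 seconds, as monram.sc
    # is described as doing.
    import subprocess
    import time

    with open("/tmp/ram.log", "a") as log:
        while True:
            log.write(time.ctime() + "\n")
            log.flush()  # flush before df appends via the raw fd
            # df shows how full the 4 Gbyte ram filesystem is; when
            # it fills, the data-taking program aborts
            subprocess.run(["df", "/ram"], stdout=log, check=False)
            time.sleep(5)
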
- There is something going on with gpuserv1 at 01:30 each
  morning.
  - looking at the crontab on gpuserv1, rear (the Relax-and-Recover
    rescue/backup tool) is run once a day at 01:30:
    Aug 13 01:30:01 gpuserv1 CROND[22998]: (root) CMD
    (/usr/sbin/rear checklayout || /usr/sbin/rear mkrescue)
processing: x101/180813/rsynciorate.pro