Dewar monitoring
oct 2006
NOTE: 26feb15
The rcvMNProg.c has been moved to linux
(svn/aosoft/src/vwconvert/rcvmon/rcvMNProg.c). It is currently
running on galfas2.
I need to update the documentation below and move the doc from
vxWorks to linux.
Contents:
Block diagram:
Debugging the dewar monitoring
Software
rcvMNProg - program to control and read dewar monitor.
Dewar monitoring daily plots (for the web)
Monitoring the dewar temperatures in real time
The platform ethernet monitor program
History/problems:
Block diagram:
The receiver dewars are outfitted with a monitoring
system. The monitoring system consists of:
- Each dewar has a monitor system. The control/voltage lines
from all dewars converge to an AO built multiplexor (on the
rotary floor).
- The multiplexor has the lines from all of the dewars coming
into it. It also receives ttl mux addresses from an hp34970 (in
the rfi box in the right blue cabinet on the rotary floor). The
mux decodes the addresses sent from the hp34970, selects one
dewar, and one function to read. It then passes the voltage of
this reading to an analog input device (a/d converter) on the
hp34970.
- The hp34970 is in the rfi box in the right cabinet on the
rotary floor. It has a digital i/o module and an a/d module
inserted in its slots. The hp34970 receives monitor function
requests from a computer (rfip1, downstairs) via gpib. It then
selects the address in the mux and reads the analog voltage from
the line sent back from the mux. The data is then passed to the
rfip1 computer downstairs via gpib/ethernet.
- The program rcvMNProg (source code
~phil/vw/datatk/rcvMon/rcvMNProg.c) controls, reads, and stores
the dewar monitoring data. It runs on the rfip1 computer in the
rfi crate. The computer is running vxWorks. The program configures
the hp34970 and then cycles through all receivers reading all of
their outputs. One pass through all receivers takes about 23
seconds. The data is written to disc
(/share/obs4/rcvm/rcvmN).
- Communications connection: The hp34970 is connected via gpib
to a National Instruments gpibenet device that is in the rfi box
with the hp34970. The gpibenet takes gpib as input (from the
hp34970) and sends it out via ethernet to the rfip1 computer
downstairs. The pieces of equipment used are:
- gpibenet device. This is a 10/100 mbit device. We use it in
the 10 mbit mode. It sits in the rfi box on the rotary floor.
- 10baseT to 10baseFL transceiver. The gpibenet outputs twisted
pair. This cable goes to the transceiver, which converts it to
fiber. This transceiver is also in the rfi box.
- The fiber from the transceiver connects to the rfip1 single
board computer via the platform ethernet.
- rfip1 computer. This is a Motorola single board computer
(sbc). It is the 2nd computer in the rfi crate. The rfi crate
is the bottom rfi crate in the 19 inch rack that also holds
the pnt vme crate. This rack is in the clock room to the left
of the door as you enter. The platform ethernet starts here on
the ei interface. rfip1 can access ao net via the path:
rfibackplaneNetwork -> rficpu -> aoNet.
- From the transceiver in the rfi box to the rfip1 cpu via the
platform ethernet. The network is 10 mbits (set by the
repeaters and the rfip1 interface). The ethernet path is:
- rfip1 connects to rep1 using its ei interface via a fiber
patch cable (see rep1 port usage).
- rep1 port 6 connects to rfip1. rep1 port 5 goes upstairs
to rep2 (port 11) via the main fiber cable C1 (see cable1
fiber usage).
- rep2 port 12 sends the signal down to the xcvr in the rfi
box in the turret room (see rep2 port usage).
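The read cycle described above (select a dewar and a function through the mux, then read the voltage on the a/d) can be sketched as a simple polling loop. This is only an illustration, not the code in rcvMNProg.c: the function names scan_once, set_mux_address, and read_voltage are hypothetical, the monitor function names are made up, and the real mux address encoding is not documented here, so the address is treated as an opaque (dewar, function) pair.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple


def scan_once(
    dewars: Iterable[int],
    functions: Iterable[Hashable],
    set_mux_address: Callable[[int, Hashable], None],
    read_voltage: Callable[[], float],
) -> Dict[Tuple[int, Hashable], float]:
    """Make one pass over all dewars, reading every monitor function.

    set_mux_address and read_voltage are hypothetical stand-ins for the
    hp34970 digital i/o write and a/d read that the real program does
    over gpib.
    """
    functions = list(functions)  # reused for every dewar
    readings = {}
    for dewar in dewars:
        for func in functions:
            set_mux_address(dewar, func)   # mux selects dewar + function
            readings[(dewar, func)] = read_voltage()  # a/d reads the line
    return readings
```

With fake instrument callables this can be exercised without any hardware, which is handy when checking the ordering of address writes and reads.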
Debugging the dewar monitoring:
The dewar monitoring has had troubles in the
past, mainly caused by the communications between the
rfip1 computer and the hp34970 device.
Some symptoms of dewar monitoring problems:
- The displays are not updating. Either the rcvMNProg is not
running, there are communications problems, the hp34970
has trouble (it may have powered off), or the ao mux is not
working.
- The displays work but they are updating very slowly (longer
than one minute between updates). This is usually caused by
communications problems between rfip1 and the hp34970.
Debugging details:
Try the following when trying to debug the dewar
monitoring. Note that all communications with the rfip1 computer
need to be done from a computer that knows how to get there (eg
observer2).
- See if the rcvMNProg is running on the rfip1 crate: rsh rfip1 i
This prints a list of the currently running programs on rfip1. You
should see:
NAME       ENTRY      tid     pri  status  pc     sp      errno  delay
rcvMNProg  rcvMNProg  ee9b58  140  PEND    2e748  ee9460  d0003  0
If you don't see rcvMNProg in the list, you should try starting it
by:
- rlogin rfip1
- rcvMNProgStart
- Print out the debug info from the rcvMNProg (the output still
needs to be documented).
rsh rfip1 rcvMNProgDbg will print:
rsh rfip1 rcvMNProgDbg
progRunning:1 gdDev:0 lastSec:49537 lastErrno:0 lastErrSec:-1 adrDelay:0
StopRequest:0 CurPosProg:Call GetRcvr
out : 49537.0 outV: 1.0 curRcvr:11 curDewar:3 curMuxAdr:4
outfile:/share/obs4/rcvm/rcvmN prcvrI:0xf01790 needReset:0
rcvListLog:rcvrsToLog.dat numRcvsToLog:8
rcvNumsToLog: 2 5 7 8 9 10 11 12
dewAdrToLog : 1 4 6 7 8 5 3 9
tmSndDev:39136 5.529 39136 5.529 39136 103.328 ms (last,min,max)
tmRdDev :49537 87.961 47209 82.496 43841 174.944 ms (last,min,max)
tmIo : 0 0.000 0 999000.000 0 0.000 ms (last,min,max)
tmAdr :49537 7.445 0 0.000 47838 9.550 ms (last,min,max)
tmTot :49537 88.054 47209 82.589 43841 175.038 ms (last,min,max)
tm1Rcvr :49534 1874.628 40120 1862.127 39705 1968.039 ms (last,min,max)
voltsA voltsB curA curB temp
1.304 1.298 0.000 0.000 16K: 9.804 dwrP15: 0.000 ledHemtA: 0 rcv:11
1.023 0.992 0.000 0.000 70K: 0.000 dwrN15: 0.000 ledHemtB: 0 tm:49538
0.000 0.000 0.000 0.000 OMT:15.637 postP15: 0.000 lkShorDisp: 0
The tmxxx lines show when (AST seconds from midnight) and how
long each operation took. Each line gives the last, minimum,
and maximum times.
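If you want to track these timings over time, the tmxxx lines are easy to pick apart. The sketch below assumes each line holds three (seconds from midnight, milliseconds) pairs in the order last, min, max, which matches the sample output above but is not otherwise documented. The names Timing, TimingLine, and parse_tm_line are made up for this example.

```python
import re
from typing import NamedTuple


class Timing(NamedTuple):
    sec: int     # AST seconds from midnight when the value was recorded
    ms: float    # how long the operation took, in milliseconds


class TimingLine(NamedTuple):
    name: str
    last: Timing
    minimum: Timing
    maximum: Timing


# e.g. "tmRdDev :49537 87.961 47209 82.496 43841 174.944 ms (last,min,max)"
_TM_RE = re.compile(r"^\s*(tm\w+)\s*:\s*(.*?)\s*ms\s*\(last,min,max\)\s*$")


def parse_tm_line(line: str) -> TimingLine:
    """Parse one tmxxx line of the rcvMNProgDbg output."""
    m = _TM_RE.match(line)
    if m is None:
        raise ValueError("not a tmxxx line: %r" % line)
    name, body = m.groups()
    fields = body.split()
    if len(fields) != 6:
        raise ValueError("expected 3 (sec, ms) pairs: %r" % line)
    pairs = [Timing(int(fields[i]), float(fields[i + 1])) for i in (0, 2, 4)]
    return TimingLine(name, *pairs)
```

For the sample above, parse_tm_line on the tmRdDev line gives a last read time of 87.961 ms taken at second 49537.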
See if the communications are working:
- rlogin rfip1 (from observer2)
- ping "gpib0" .. you need the quotes
- This will start pinging the gpibenet device in the rfi box
upstairs. It will continue running until you enter control-c.
The output should look like:
ping "gpib0"
PING gpib0 (192.160.175.10): 56 data bytes
64 bytes from gpib0 (192.160.175.10): icmp_seq=0. time=0. ms
64 bytes from gpib0 (192.160.175.10): icmp_seq=1. time=0. ms
64 bytes from gpib0 (192.160.175.10): icmp_seq=2. time=0. ms
64 bytes from gpib0 (192.160.175.10): icmp_seq=3. time=0. ms
64 bytes from gpib0 (192.160.175.10): icmp_seq=4. time=0. ms
ctrl-c
.. walkback from ctrl-c printed...
----gpib0 PING Statistics----
5 packets transmitted, 5 packets received, 0% packet loss
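If the ping output has been captured to a file or a string (for example from a logged session), the statistics line can be checked automatically. This is a minimal sketch that assumes the summary line has exactly the wording shown in the sample above. The function names ping_summary and link_ok are hypothetical.

```python
import re
from typing import Tuple

_STATS_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) packets received, (\d+)% packet loss"
)


def ping_summary(output: str) -> Tuple[int, int, int]:
    """Return (transmitted, received, percent_loss) from captured ping output."""
    m = _STATS_RE.search(output)
    if m is None:
        raise ValueError("no ping statistics line found")
    sent, received, loss = (int(g) for g in m.groups())
    return sent, received, loss


def link_ok(output: str) -> bool:
    """True when at least one packet was sent and none were lost."""
    sent, received, _ = ping_summary(output)
    return sent > 0 and received == sent
```

Any loss at all is treated as a failure here, since the path is a short private network where 0% loss is the normal case.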
If the ping test fails you can try a couple of things:
- Get the dell laptop used for the tiedowns and turret. This
should have a name of platform1 with the correct ip address
(see correct ip info).
- Plug the fiber of the laptop into various parts of the
fiber chain and then:
- rlogin rfip1
- ping "platform1"
You should get the same listing as the ping "gpib0" above. A
good place to start is the output of rep1 (port 5). You can
then move up or down the path until you find where it starts
or stops working.
- Take the gpibenet and the xcvr down from the rfi box in the
dome and use that as your ping probe. You should use ping
"gpib0" for the ping command.
We have had some trouble with the hp34970 being powered off
(or sitting in standby mode). It is plugged into a ups and
should not lose power (unless someone turned off the ups
accidentally...). Look in the rfi box in the turret room and
make sure that the hp34970 is on and its display shows that it
is in remote. You could just cycle the power and see what
happens. You also want to make sure that no one has
inadvertently changed the gpib address of the hp34970. The gpib
address should be: 11 decimal (0xb hex).
Run the platform ethernet statistics logger. This is a script
that does an rsh rfip1 ifShow every N seconds and logs it to a
disc file. It will tell you if there have been any ethernet
input/output errors or collisions vs time (see how to run the
platform enet monitor).
Software:
- rcvMNProg - program to control and read dewar monitor.
- The rcvMNProg runs on the rfip1 computer on vxWorks. It
controls the hp34970, configures the aomux, reads the
data, and writes the data to the disc file:
/share/obs4/rcvm/rcvmN.
- The output datafile (rcvmN) grows to about 80 Mbytes per
month. At the end of each month, rcvmN is moved to
rcvmN.yymm and rcvmN is reset to 0 size (this is done by the
end of month processing: /home/phil/admin/monthproc.sc). The
program rcvMNProg should be stopped while these files are
switched (or it will continue to write to the archived
file).
- The source code is in
/home/phil/vw/datatk/rcvMon/rcvMNProg.c .
- It can be compiled with make rcvMNProg in that directory.
- The object code to be loaded in vxWorks is compiled into
the directory /home/online/vw/load . This file
is loaded into the rfip1 computer at boot time.
- /share/obs4/rcvm/rcvrsToLog.dat: This file is
read by rcvMNProg when it is started. It determines which
dewars are monitored. The file contains all dewars. Putting
a # in column 1 will cause a dewar to not be monitored (in
case it has been removed).
- The rcvMNProg is started automatically when the rfip1
computer is booted.
- You can stop and then restart the rcvMNProg from a
computer that can access the rfip1 computer. Be careful that
you don't get more than one copy running:
- rlogin rfip1
- rcvMNProgStop .. this will stop the program
- i .. this will list the programs running
- rcvMNProgStart .. this will start the program (make sure
the old version has exited).
- logout .. to exit the rfip1 computer.
- If you try to stop the rcvMNProg and it won't exit (maybe
because the gpibenet is hung up), you can try the following
from the rfip1 prompt:
- rcvMNProgShutDown .. this will try and
close the file descriptor used by the gpibenet.
- If the above doesn't work, you can try to manually close
the gpibenet file descriptor:
gpibEDbg .. this prints out the status of the
gpibEnet driver on vxWorks.
vw-> gpibEDbg
gpibEVerbose:0
num Use fd Role ibsta iberr
0 1 26 D 100 0
1 0 0 B 0 0
Look in the column Use and find the row that has a 1 in it.
The adjacent col (fd) is the file descriptor for the
hp34970. If you close this fd (close,26) this should
shut down the rcvMNProg.
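Picking the fd out of the gpibEDbg listing can also be done from captured output. The sketch below assumes the table has the header "num Use fd Role ibsta iberr" as in the sample above and that in-use rows have Use equal to 1. The function name in_use_fds is hypothetical, and the code only reads text; it does not talk to vxWorks.

```python
from typing import List


def in_use_fds(dbg_output: str) -> List[int]:
    """Return the fd of every row whose Use column is 1."""
    header = None
    fds = []
    for line in dbg_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # Skip everything (prompt, gpibEVerbose line) up to the header.
            if fields[:3] == ["num", "Use", "fd"]:
                header = fields
            continue
        if len(fields) != len(header):
            continue  # not a table row
        row = dict(zip(header, fields))
        if row["Use"] == "1":
            fds.append(int(row["fd"]))
    return fds
```

For the listing above this returns [26], the fd you would then close by hand from the rfip1 prompt.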
Dewar monitoring daily plots (for the web):
- A cron script is run daily on megs (4:25 am) to
create the dewar monitoring daily plots. The script is located at
/share/megs/phil/x101/dwtemp/dwtempdaily.sc. The plots can
be found at http://www.naic.edu/~phil . Scroll down to
monitoring and click on dewar temperatures.
- dwtempdaily.sc starts idl and then runs dwtempdaily.pro to
create the plots for the previous day. The start/stop times
for the script are logged in dwtempdaily.log. The idl
session output is stored in dwtempdailyidl.out. If the
plots are not updating, you should take a look at the .out
file.
Monitoring the dewar temperatures in real time:
The dewar temperatures can be monitored in real time with (more
info):
- monrcvtemp (/usr/local/bin/monrcvtemp). It will bring up
a window with the dewar temperatures for each receiver. It
will update every 30 seconds when a new round of data has
been input. It is reading the file
/share/obs4/rcvm/rcvmN. The routine needs to be run from a
sun computer.
- monrcv (/usr/local/bin/monrcv). Contains the
monitored voltages/currents of the amps as well as the
monrcvtemp temperatures. Needs to be run from a sun
computer.
- monrcvpl (/usr/local/bin/monrcvpl). Plots the 16k dewar
temps for the last 60 minutes and the last 24 hours. It
runs as a strip chart updating the values when new data
becomes available. An idl program is started when you
enter monrcvpl. The command monrcvpl is currently only
available on the sun computers (/usr/local/bin). There is
no reason why it can't run on the linux machines.
The platform ethernet monitor program:
- The script /home/phil/vw/datatk/rcvMon/chkmon.sc monitors
the platform ethernet (used by the dewar monitoring program).
Every N seconds it sends rsh rfip1 ifShow to the rfip1
computer. This returns the current state of the network
interfaces. The ei interface is used for the platform
ethernet. The script then writes this data to a disc file.
You should edit the script and set delaysec to the number of
seconds to delay between queries (default was 3600 = 1 hour).
The data is written to the file rcvMNProg.log in the directory
where chkmon.sc is run. You might want to rename the old
logfile to something else prior to running it.
- The idl script rcvmnprog.pro will plot out the i/o
statistics for the data output to rcvMNProg.log. To run it:
- go to the rcvMon/ directory
- idl : starts idl
- @phil & @geninit .. for initialization
- hard=0 --> no hardcopy, send to screen. hard=1 will
output to a .ps file.
- .run rcvmnprog .. this will read the file and plot
the various parameters. The i/o rates are plotted with
their median removed.
- You should see a constant ramp in the input/output. Pay
special attention to the i/o errors and conditions. They
will all be plotted vs date.
- To kill the chkmon.sc just do a ctrl-c in the window where
you started it.
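The polling loop that chkmon.sc implements can be sketched in a few lines. This is not the contents of chkmon.sc: the function names, the count argument, and the log layout (a "# timestamp" line followed by the command output) are assumptions made for the example, so rcvmnprog.pro should not be expected to read a file written this way. Only the rsh rfip1 ifShow command and the 3600 second default come from the description above.

```python
import subprocess
import time
from typing import Callable, Optional


def if_show() -> str:
    """Run 'rsh rfip1 ifShow' and return its output (needs access to rfip1)."""
    result = subprocess.run(
        ["rsh", "rfip1", "ifShow"], capture_output=True, text=True
    )
    return result.stdout


def log_periodically(
    run_cmd: Callable[[], str],
    logfile: str,
    delay_sec: float = 3600,
    count: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], str] = time.ctime,
) -> int:
    """Append timestamped output of run_cmd to logfile every delay_sec seconds.

    Runs forever when count is None (stop it with ctrl-c); otherwise
    stops after count samples. Returns the number of samples written.
    """
    n = 0
    while count is None or n < count:
        with open(logfile, "a") as f:
            f.write("# %s\n%s\n" % (now(), run_cmd()))
        n += 1
        if count is None or n < count:
            sleep(delay_sec)
    return n
```

Passing run_cmd, sleep, and now as arguments lets the loop be tried out with stand-ins before pointing it at rfip1, for example log_periodically(if_show, "enet.log", delay_sec=600).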
History/Problems:
- 10sep13: working again.. see below
- aug13:
- ethernet repeater (in control room below fiber cables)
failed. bypassed. made it straight thru.
- 34970 multimeter fails.
- main unit does not pass self test. bad voltage
readings. replaced with spare multimeter.
- multiplexor card was not working. always gave 0
readings. No errors. replaced with spare mux card.
- found loose cable in our multiplexor switch (that
multiplexes the various dewars back to the 34970). loose wire
made bit 2 of dewar address always be high. fixed the wire.
- 13jun11:
- gpib to ethernet and multimeter had been losing power. We
replaced the UPS that gave power to the rfi box in the rack.
Was an APC, now a Tripp Lite.
- 11oct06: dewar monitoring slowed
down. Updating once every 5 minutes. It finally died on 12oct06.
- ping "gpib0" from rfip1 failed.
- We brought the xcvr and gpibenet down to the control room and
hooked them into the fiber coming out of rep1 port5.
- The xcvr was a 10/100 transceiver. We noticed that when the
xcvr lost the fiber input, the gpibenet would no longer sync
up with the xcvr. The 10/100 link light on the gpibenet stayed
off (it should be yellow when 10 mb).
- We replaced the xcvr with a 10mb xcvr and no longer had a
linkup problem between the gpibenet and the xcvr (even when
the fiber was removed).
- We took this working system up to the rotary floor and
installed it. It didn't work. When pinging the gpib0 we would
occasionally see the tx flash on the gpibenet but packets
would never get back to rfip1.
- We took the xcvr and gpibenet and connected them to the main
fiber (c1.12,c1.13) in the sband klystron room. ping worked
fine.
- We moved to the input of rep2 (port11). Ping worked fine.
- We tried different output ports and ping failed on all of
them (we didn't switch the port11 input).
- Conclusion is that the rep2 is bad.
- We took a fiber barrel and jumpered the fibers rep2.port11
(inp) to rep2.port12 (out) to bypass the repeater.