RS1D frequent data overruns and gaps in data

My RS 1D was placed into service about 5 days ago. I’m seeing frequent gaps in the data, as illustrated by the helicorder gif, and the system log shows frequent data overruns. Can anyone tell me what I need to do to correct this? The software version is the latest, AFAIK, since it was downloaded from the RS web site.

Thanks,

James, KB5FIO

RSH.R7314.2020-10-27T03_42_42.logs.tar (4.5 MB)

Hello James, welcome to the community!

Thank you for posting the logs from your Shake. They show a couple of issues connected to the gaps in your data. The first one is the following:

Oct 26 23:27:02 raspberryshake kernel: [446898.167351] ttyS ttyS0: 1 input overrun(s)

2020 301 03:42:38>> No Data has been received from the MCU in 12 read attempts.It appears the MCU is not transmitting data. This is a fatal condition and should be investigated if this condition persists!
2020 301 03:42:38>> Data has been successfully received, fatal condition resolved.

You had already noticed the input overrun errors. Together with the MCU errors (even though those appear to resolve themselves), they can mean that some of the Shake’s connections are not solid.
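If you want to see when the overruns cluster (for example, whether they line up with particular times of day), a small script can total them per hour. This is a minimal sketch that assumes the kernel messages keep the exact format of the `ttyS0: 1 input overrun(s)` line quoted above; which file inside the logs tarball holds them is an assumption you will need to check.

```python
import re
from collections import Counter

# Matches kernel lines in the format quoted above, e.g.
# "Oct 26 23:27:02 raspberryshake kernel: [446898.167351] ttyS ttyS0: 1 input overrun(s)"
OVERRUN_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2}) (?P<hour>\d{2}):\d{2}:\d{2} .*"
    r"ttyS0: (?P<count>\d+) input overrun\(s\)"
)


def overruns_per_hour(lines):
    """Sum the reported overrun counts, bucketed by (month, day, hour).

    Log lines are already chronological, so the Counter keeps them in
    first-seen order; no sorting by month name is needed.
    """
    totals = Counter()
    for line in lines:
        m = OVERRUN_RE.match(line)
        if m:
            key = (m["mon"], int(m["day"]), int(m["hour"]))
            totals[key] += int(m["count"])
    return totals
```

Usage would be something like `with open("syslog", errors="replace") as f: print(overruns_per_hour(f))`, where `"syslog"` is a placeholder for whichever file in the tarball contains the kernel messages.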

Did you assemble the Shake yourself? Even if you received a pre-assembled unit, could you please check that every cable is properly seated in its socket, and that the blue Shake board is firmly seated on the Raspberry Pi’s pins?

The second issue is this:

2020 301 03:37:06>> Time adjustment M0: HARD RESET. This will result in a one-time time-tear.
2020 301 03:37:06>> 5.0: NTP Time (Init): NTP: 1603769825.760442495
2020 301 03:37:06>> 5.2: NTP Sync (HARD): VEL Before: 1603769694.499000072 After: 1603769825.500000000 Diff: 131.000999928
2020 301 03:38:16>> Connection succeeded to DDS server.

Again, the problem seems to correct itself, but these messages mean that the Shake is having trouble communicating with the NTP server (the one that takes care of time synchronization).
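The size of each time tear can be read straight off the `NTP Sync (HARD)` line: it is `After` minus `Before`, which in the entry above is 1603769825.500000000 − 1603769694.499000072 = 131.000999928 seconds, matching the logged `Diff`. The sketch below extracts that value so you could list every hard reset and how large its jump was. It assumes the line format shown above; `Decimal` is used so the nanosecond digits are not rounded away.

```python
import re
from decimal import Decimal

# Matches the hard-sync line format quoted above, e.g.
# "... 5.2: NTP Sync (HARD): VEL Before: 1603769694.499000072 After: 1603769825.500000000 Diff: ..."
HARD_SYNC_RE = re.compile(
    r"NTP Sync \(HARD\): VEL Before: (?P<before>[\d.]+) After: (?P<after>[\d.]+)"
)


def time_tear_seconds(line):
    """Return the clock jump (After - Before, in seconds) for a HARD sync line, else None."""
    m = HARD_SYNC_RE.search(line)
    if m is None:
        return None
    # Decimal keeps all nine fractional digits exactly.
    return Decimal(m["after"]) - Decimal(m["before"])
```

Applied over a whole log, `[t for t in map(time_tear_seconds, f) if t is not None]` would give the list of jumps, one per hard reset.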

Could you please check whether port 123, which the NTP process uses, is open on your modem/router for TCP and UDP traffic in both directions?
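To test from the Pi itself whether NTP replies make it back through the router, a single SNTP request over UDP port 123 is enough. This is a minimal sketch of the standard 48-byte client request; `pool.ntp.org` is an assumed public server and not necessarily the one the Shake is configured to use.

```python
import socket
import struct

NTP_PORT = 123
NTP_TO_UNIX = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01 (Unix epoch)


def build_request():
    """48-byte SNTP client request: LI=0, VN=3, Mode=3 (client) -> first byte 0x1B."""
    return b"\x1b" + 47 * b"\0"


def transmit_time_unix(packet):
    """Extract the server's Transmit Timestamp (bytes 40-47) as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_TO_UNIX + fraction / 2**32


def query(host="pool.ntp.org", timeout=3.0):
    """Send one SNTP request over UDP/123; raises a timeout error if no reply arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_request(), (host, NTP_PORT))
        data, _ = s.recvfrom(1024)
        return transmit_time_unix(data)
```

If `query()` keeps timing out while ordinary web browsing works, the UDP replies are probably being dropped somewhere between the Shake and the server.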

These ‘time tears’ also break the upload to our servers, which receive data correctly (as shown here https://raspberryshake.net/stationview/#?net=AM&sta=R7314) but only between one tear and the next.

We will see what we have to do after these initial checks.


I did build it myself. I have been a radio amateur for more than 30 years, so I am very familiar with building and repair of electronic devices. I was careful when I assembled the shake. Nevertheless, I retrieved it from its outdoor enclosure today and checked everything out. The connector securing the wires from the geophone to the shake board is good and tight, the shake board connector is fully seated on the RPi pins, and the power and ethernet cables are fully seated. While I had the unit open, I re-flowed the solder on the shake board connector pins, since pin headers are a frequent cause of bad connections. None of this seems to have done any good.

The router is not blocking any ports or any type of traffic, but it does have a settable option to allow the router itself to use NTP. I enabled that about 5 days ago. When the shake was re-started at about 2244 Z on 2020-10-27, it set itself to the correct time, so it must have access to NTP, or some source of time data.

This problem with gaps in the data has gotten much worse over the last two days, following a temperature drop of about 40 F / 22 C. There is a definite thermal component to this problem which, I agree, points to a bad connection or bad solder joint somewhere in the system. Are there any components on either the shake or RPi boards that are known to be troublesome in this regard?

Please advise as to the next thing I should check.

Thank you,

James, KB5FIO

Perfect, thank you very much for such extensive checks on both the Shake and your router, and for your explanation. I did not mean to doubt your abilities; I only needed to know whether it was a pre-assembled unit or not.

Could you please send me the logs from after your 2244Z restart? I want to see if there are any differences from the previous ones, and how the NTP connection is faring since, as you said, it managed to synchronize again.

I will ask our dev team about the temperature drop. I’ve been following the weather in your area a bit, and it really was a cold blast for this time of year. The Raspberry Shake board is certified for an operating temperature between 0°C (32°F) and 60°C (140°F), while the Raspberry Pi board can go down to -20°C (-4°F). Outside this range, errors can occur.
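Since temperatures in this thread are reported in °F, it helps to keep absolute readings and differences apart when comparing them with the °C ranges above: a reading of 60 °F is about 15.6 °C, but a *drop* of 40 °F is about 22.2 °C, with no 32° offset. The small helpers below are a sketch of that arithmetic; the default range is simply the 0–60 °C figure quoted above.

```python
def f_to_c(temp_f):
    """Convert an absolute temperature reading from °F to °C."""
    return (temp_f - 32) * 5 / 9


def delta_f_to_c(diff_f):
    """Convert a temperature *difference* from °F to °C (no 32-degree offset)."""
    return diff_f * 5 / 9


def in_operating_range(temp_c, low_c=0.0, high_c=60.0):
    """True if temp_c lies within [low_c, high_c]; defaults are the 0-60 °C range quoted above."""
    return low_c <= temp_c <= high_c
```

For example, `in_operating_range(f_to_c(20))` is `False`, since 20 °F is about -6.7 °C.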

As of now, I can see a more constant stream of data to our servers: https://raspberryshake.net/stationview/#?net=AM&sta=R7314 with little to no interruption.

RSH.R7314.2020-10-28T17_51_40.logs.tar (5.1 MB)

The logs tarball downloaded today is attached. It still shows data overruns and "MCU not transmitting" errors, but the postboot log shows that NTP is available.

Update: The shake performed perfectly today from about 11 AM until almost 5 PM local time (1800 Z to 0000 Z). During that time the external temperature was near or above 60 F / 16 C. But when the sun began to set and the temperature began to decline, the data gaps re-appeared. The problem is definitely temperature-related.

Regards,

James, KB5FIO

Thank you for the new logs James.

Yes, the correlation is quite clear. I spent some time monitoring the live stream from your station on StationView and reviewing the records in Swarm, and everything seems to support it.
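To put a number on that correlation, one option is to pair hourly overrun (or gap) counts with hourly outdoor temperatures from an independent thermometer and compute Pearson’s correlation coefficient. This is a minimal sketch; both input series are whatever you collect yourself. A strongly negative value would support the idea that problems increase as the temperature falls.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)
```

Usage: `pearson(hourly_temps_c, hourly_overruns)`, where both lists (hypothetical names) cover the same hours in the same order. Correlation alone does not prove cause, but it is a quick sanity check.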

Have you noticed any condensation forming on or near the Shake after a cool or cold night? Or could you place a remote temperature/humidity sensor near it to further confirm this hypothesis?

It might be worth considering building a vault to better insulate the unit. Here are some of our users’ experiences: https://manual.raspberryshake.org/posthole.html#examples-2-amateur-seismic-vaults

Condensation is a good guess, but it is not the problem here. I live in high desert country in Arizona. We have not seen a drop of rain in more than 4 months, my outdoor digital hygrometer is currently reading off-scale low (i.e. less than 11% relative humidity) and the glass containing an iced drink that is sitting on the table beside me is completely dry on the outside – no condensation due to the low humidity. That is all normal for this area.

I have previously read the posts about vaults that others have built. My shake is housed in a weatherproof electrical junction box that is buried in the ground up to the lid. Today I added a layer of 3/4 inch (about 19 mm) thick styrofoam insulation to the top and sides. That will help isolate the unit from diurnal temperature variations, but the foam only has an R value of 3 or 4, so it is a less than perfect insulator. It will also retain some of the heat generated by the shake inside the box. I will need to watch it for a day or so to make sure it does not overheat.
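For the overheating question, a rough steady-state estimate treats the insulation as the only path for the electronics’ heat: ΔT = P · R_SI / A, where P is the dissipated power, A the insulated area and R_SI the R-value in SI units (1 ft²·°F·h/BTU ≈ 0.1761 m²·K/W). This is a crude sketch: the power draw and box area below are assumptions, not measured values, and because it ignores heat conducted into the surrounding soil, it likely overestimates the rise for a buried box.

```python
R_IP_TO_SI = 0.1761  # 1 ft²·°F·h/BTU ≈ 0.1761 m²·K/W


def steady_state_rise_c(power_w, r_value_ip, area_m2):
    """Rough internal temperature rise (°C) of a box that loses heat only by
    conduction through insulation of imperial R-value r_value_ip over area_m2:
    dT = P * R_SI / A. Soil contact and air leaks are ignored."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return power_w * r_value_ip * R_IP_TO_SI / area_m2
```

With an assumed 3 W draw, R 3.5 foam and 0.25 m² of insulated surface, `steady_state_rise_c(3, 3.5, 0.25)` gives roughly 7.4 °C above the outside temperature, which suggests a modest rise rather than an overheating risk, under those assumptions.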

We will also see if it has any effect on the data gaps. The shake (or, at least my shake) seems to have a very narrow operating temperature range. The technical specs do not say anything about this, so it is still possible that there is a defect in my unit.

James, KB5FIO

Status update: system.log shows no input data overruns since 1821Z on 29 Oct. That was just before I finished installing the insulation and closed the box. Local helicorder data shows no gaps in the last 24 hours. It seems the key to making the shake work properly is to keep it nice and warm. I can’t blame it, I don’t like to be cold either. :)

There are a few gaps visible in Swarm, which may be due to networking issues. The odf_SL_plugin.err entries for day 304 nearly all involve "connection refused" and errors sending data. Am I correct in assuming these errors are due to networking issues? The latest logs tarball is attached below.

Thanks,

James KB5FIO

RSH.R7314.2020-10-30T19_02_35.logs.tar (2.1 MB)

Thank you for all the updates, James; they are so well written that it is like reading a novel! Thanks also for explaining the climate: it is very different from where I live (Scotland).

From the logs and the available data I can see the same thing and confirm what you said. There are still many hard resets in the logs, which means that every now and then the Shake loses its connection to the NTP servers (but at this point it has to be an ISP issue, since the ports are open and NTP reconnects after a while).

Every time this happens, the Hard Reset means it takes a while before the data upload starts again (it resumes as soon as the time is synchronized again), so the gaps are related to this issue too.

Regarding the connection issues you see in the odf_SL_plugin.err file, they are momentary. If you compare the last one in the logs with the odf_SL_plugin.info file, for example, you can see that the connection problem resolved itself shortly afterwards:

2020 304 17:09:40>> Connection request (raspberryshakedata.com:55555) failed with error code: Connection refused

2020 304 17:09:50>> sendDClientDP(): Error sending data ...

2020 304 17:09:50>> Connection succeeded to DDS server.
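The Shake’s log stamps are in `YYYY DDD HH:MM:SS` form, where `DDD` is the day of the year (day 304 of 2020 is 30 October, day 301 is 27 October). Python’s `strptime` understands that via `%j`, so measuring an outage is just subtracting two parsed stamps; for the pair above it comes to 10 seconds. A minimal sketch, assuming the stamp always precedes the `>>` marker as in the lines quoted in this thread:

```python
from datetime import datetime


def parse_shake_timestamp(line):
    """Parse the leading 'YYYY DDD HH:MM:SS' stamp (DDD = day of year) before '>>'."""
    stamp = line.split(">>", 1)[0].strip().strip("|")
    return datetime.strptime(stamp, "%Y %j %H:%M:%S")


def outage_seconds(failure_line, recovery_line):
    """Seconds elapsed between a failure entry and the matching recovery entry."""
    delta = parse_shake_timestamp(recovery_line) - parse_shake_timestamp(failure_line)
    return delta.total_seconds()
```

Feeding it the `Connection refused` line from odf_SL_plugin.err and the next `Connection succeeded` line from odf_SL_plugin.info gives the length of that particular interruption.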

For example, your stream is now quite good, both on our StationView and in Swarm, but it is daytime. It seems, then, that the insulation was the ‘cure’, and the Shake agrees with both of us!

Thanks for confirming my suspicion that the remaining gaps are due to networking problems. I get my internet access via a microwave link to an antenna on the mountain about 2 mi / 3 km to the east. This provider does have frequent, usually brief outages–I notice them when surfing the web sometimes. But, until someone decides to run cable or fiber out this way, that is as good as it is going to get. I can always add a GPS receiver as a time source for the shake if this becomes more troublesome.

Another 24 hours has elapsed with no gaps in the local data record, so I think I will declare this issue to be resolved. I will continue to monitor the shake’s behavior during the next freezing weather episode and on into the winter. If more or better insulation is called for, then that can be arranged.

Thanks much for your insight and help, Stormchaser. Stay safe and stay healthy!

James KB5FIO


Yes, those Hard Resets are due to NTP server connection issues. Sometimes it is the modem/router, and sometimes it is the ISP, as in your case.

I hope you get a cable connection soon, or at least a more stable link, but for now I can see that the data stream is much healthier than it was, and the insulation was definitely the right choice.

The GPS option is a realistic one. If the service interruptions become too bothersome, it will solve any time-synchronization problem, leaving the Shake only the task of uploading the data whenever the connection is available.

It was a pleasure, James. Thank you, and stay safe and healthy too!

73 from IZ6GSM

Enjoy Shaking!