Station disappearing from RS StationView every few days

Hi Ian, hope all is well. Received the “haven’t seen you in a while” email today at 9:03. As usual, the station, AM.RB9B8, showed as connected.

Here are the log files. I rebooted the station.

Cheers

Mark

RSH.RB9B8.2019-05-05T02_44_10.logs.tar (2.9 MB)

Hi @MarkC, Richard and I both took a look at the logs, and sadly both odf_SL_plugin.err and postboot.log.old tell a story of DNS-related connection woes (detailed below).

If I remember correctly, you’re on a satellite connection? I wonder if we could come up with a script for satellite users that checks the logs for a lapse in connectivity and then gracefully restarts the services, so that reconnection is handled automatically. This probably wouldn’t be included in any RS software release (yet), but it could be offered as custom code for users with satellite or otherwise spotty connections.

Here is an example of connection loss from odf_SL_plugin.err where the plugin loses connection to the DNS server outright and cannot re-establish:

2019 119 15:34:12>>    DDSsendDP(): send error EPIPE (Broken pipe), closing socket
2019 119 15:34:12>>    DDSsend(): Send error: 0
2019 119 15:34:12>>    sendDClientDP(): Error sending data ... 
2019 119 15:34:32>>    create_socket(): Error in getaddrinfo: Name or service not known
2019 119 15:34:32>>    Likely cause is no DNS server found.
2019 119 15:34:32>>    sendDClientDP(): Error sending data ... 
2019 119 15:34:32>>    sendDClientDP(): Error sending data ... 
2019 119 15:34:32>>    sendDClientDP(): Error sending data ... 
2019 119 15:34:32>>    sendDClientDP(): Error sending data ... 
...and so on

And here is an example of general connection spottiness from postboot.log.old where NTP tries to start but can’t, tries several more times, and finally gets through:

2019 071 05:59:39: We have an internet connection
2019 071 05:59:41: NTP failed to start, and we have an interent connection, trying to restart it...
2019 071 05:59:54: We have NTP services
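
As a rough illustration of the log-checking half of that idea, here is a minimal Python sketch that picks DNS failures out of lines formatted like the odf_SL_plugin.err excerpt. The function names and the choice of marker strings are assumptions made for this example, not part of any RS software, and the timestamp layout (year, day of year, time) is inferred from the excerpt only.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2019 119 15:34:32>>    create_socket(): Error in getaddrinfo: Name or service not known
LINE_RE = re.compile(r"^(\d{4}) (\d{3}) (\d{2}:\d{2}:\d{2})>>\s+(.*)$")

# Assumption: these substrings, copied from the excerpt, are enough to flag a DNS failure.
DNS_MARKERS = ("Error in getaddrinfo", "no DNS server found")


def parse_line(line):
    """Return (timestamp, message) for a plugin log line, or None if it does not match."""
    m = LINE_RE.match(line.rstrip())
    if m is None:
        return None
    year, day_of_year, clock, message = m.groups()
    stamp = datetime.strptime(f"{year} {day_of_year} {clock}", "%Y %j %H:%M:%S")
    return stamp, message


def dns_failure_times(lines):
    """Return the timestamps of every line that looks like a DNS failure."""
    times = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and any(marker in parsed[1] for marker in DNS_MARKERS):
            times.append(parsed[0])
    return times
```

A wrapper script could call `dns_failure_times` on the tail of the log and only restart services when the most recent failure is newer than the last restart.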

Hi Ian,
Thanks for the update. Yes, the ISP uses a satellite connection.
A communication flatline check and service restart sounds like a good idea. You could even have a “switch” on the configuration page for users to set this feature on/off.
I’d be happy to offer this station to test code.
Any chance you could “expose” to the user (perhaps on the config page):

  • the number of times services have been restarted
  • the time and date of the last restart?

Thanks also for your and Richard’s time on this.
Mark
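
To make the request above concrete, here is a minimal Python sketch of how a restart script might keep the on/off switch, the restart count, and the last restart time in a small JSON file that a config page could read. The file layout and the names `load_state` and `record_restart` are hypothetical, chosen only for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_state(path):
    """Read the restart state, falling back to an empty record if no file exists."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())
    # Hypothetical defaults: feature switched on, no restarts recorded yet.
    return {"enabled": True, "restart_count": 0, "last_restart": None}


def record_restart(path, now=None):
    """Increment the restart counter and store the time of this restart (UTC by default)."""
    state = load_state(path)
    now = now or datetime.now(timezone.utc)
    state["restart_count"] += 1
    state["last_restart"] = now.isoformat(timespec="seconds")
    Path(path).write_text(json.dumps(state, indent=2))
    return state
```

The restart script would call `record_restart` each time it restarts the services, and would check the `enabled` flag first so the feature can be switched off.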

Hi, the “missing you” message arrived at 8:54am today. Hopefully these are the right log files.

RSH.RB9B8.2019-05-30T04_15_05.logs.tar (3.0 MB)

Hi Mark, I’m wondering if this might help with automating a response to connectivity issues:

There are some things in this blog post you probably don’t need but overall it seems like it could be useful in rebooting when there’s especially poor connection. On the other hand, it may end up rebooting way more often than you’d like. You may want to change the connectivity check to every 20 minutes instead of 20 seconds, for example. Unfortunately I’m afraid you’ll have to play around with it in order to figure out what works best for your particular flavor of connectivity issues.
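
Separately from the blog post, here is a minimal Python sketch of the tuning trade-off described above: a check that runs at a configurable interval and only signals trouble after several failures in a row, so that one dropped lookup does not trigger a reboot. The 20-minute default comes from the suggestion above; the threshold of 3, the function names, and the use of a DNS lookup as the connectivity test are assumptions made for the example.

```python
import socket
import time


def dns_resolves(hostname):
    """Return True if the hostname resolves, False on a lookup failure."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def watch(hostname, interval_s=1200, threshold=3, check=dns_resolves,
          sleep=time.sleep, max_checks=None):
    """Poll every interval_s seconds; return True once `threshold` checks fail in a row.

    Returns False if max_checks is reached first. With max_checks=None it keeps polling.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        # A single success resets the streak, so brief dropouts are tolerated.
        failures = 0 if check(hostname) else failures + 1
        if failures >= threshold:
            return True  # the caller decides whether to restart services or reboot
        sleep(interval_s)
    return False
```

With these defaults the connection would have to be down for roughly 40 minutes or more before `watch` returns True; lowering `interval_s` or `threshold` makes it react faster at the cost of more restarts.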

Let me know how it works if you end up trying it.
Ian