Device having erratic behaviour

yn1v · July 26, 2019, 10:22pm

I have had a Raspberry Shake from the very first batch for some time, running on a Raspberry Pi 3. I first tried it with a Raspberry Pi 1 and it didn’t work. That issue was fixed later, but by then I had already installed the unit. It worked well for a long time, more than a year, maybe two. Then the power backup unit failed, so I took the case and changed everything: I put in a Raspberry Pi 1 B with a 4D board and a new UPS.

At the beginning it was slow to respond on the setup web interface. I left it alone, and a day later it was more responsive. Then I checked again and it was not responding at all: I could not get the web interface and could not connect by ssh. It only responded to ping. I did not have time to bring the unit out from under the stairs at that moment. The next day it was working. There was a blank space in the graphs, but it was working. I downloaded the logs, but I could not work out what to look for. I left it for two or three more days and got more blank spaces in the graphs. Not all the graphs show the same interruptions.

I am not sure whether this has something to do with using a Raspberry Pi 1, which may be under too much load with a 4D board. I am not sure where to start debugging this.

iannesbitt · July 27, 2019, 12:18pm

Hi @yn1v, if you could send us the logs when the frontend is working, we could try to see what’s going on.

yn1v · July 27, 2019, 4:09pm

I am uploading the complete set of log files, as there was no request for a particular file.

RSH.RF724.2019-07-27T16_00_56.logs.tar (1.6 MB)

I was wondering if there were some comments about what each file is… I just found
https://manual.raspberryshake.org/logFiles.html
I will try to make sense of the log files and provide more info if I can. I will be happy to make some changes and run some tests.
Best regards.

yn1v · July 27, 2019, 4:32pm

According to slarchive_127.0.0.1_18000.log there is a network timeout for localhost.
rsh-fe.server.log says: ERROR 6: GEOS support not enabled.
rsh-fe.output.log has a Python error:

IOError: [Errno 2] No such file or directory: '/opt/settings/user/UDP-data-streams.conf'
rsh-data-producer also has errors:
enabling diskspace monitoring
find: `/opt/seedlink/acquisition/seedlink/RF724': No such file or directory
find: `/opt/seedlink/acquisition/seedlink/RF724': No such file or directory
find: `/opt/seedlink/acquisition/seedlink/RF724': No such file or directory
find: `/opt/seedlink/acquisition/seedlink/RF724': No such file or directory
find: `/opt/seedlink/acquisition/seedlink/RF724': No such file or directory
Signal 11 (SEGV) caught by ps (procps-ng version 3.3.9).
ps:display.c:66: please report this bug
/opt/bin/rsh-data-producer.start.sh: fork: Cannot allocate memory

rsh-data-consumer.log has the following:

2019-07-26 13:29:27 S.T. [ERROR] libslinkmm: [172.17.0.2:18000] timeout waiting for response to 'HELLO'
heli_ewII: RequestWave: server: 127.0.0.1 16032 Trace RF724 EHZ AM 00: No connection to wave server.
heli_ewII: RequestWave: server: 127.0.0.1 16032 Trace RF724 EHZ AM 00: No connection to wave server.
heli_ewII: RequestWave: server: 127.0.0.1 16032 Trace RF724 EHZ AM 00: No connection to wave server.
ows.log:
2019-07-26 13:29:27 S.T. [ERROR] libslinkmm: [172.17.0.2:18000] timeout waiting for response to 'HELLO'
odf_SL_plugin.warn:
2019 205 14:48:59>> DP send to DDS server failed, cannot requeue, queue is full (16257 messages)
odf_SL_plugin.err:
2019 207 05:34:28>> DDSsend(): Send error: 0
2019 207 05:34:57>> sendDClientDP(): Error sending data …
2019 207 07:25:33>> DDSsendDP(): send error EPIPE (Broken pipe), closing socket
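
The odf_SL_plugin timestamps above appear to use a year plus day-of-year format (for example `2019 205 14:48:59`). As a minimal sketch, assuming that reading of the format is right, Python's `%j` directive converts them to calendar dates so they can be lined up with the other logs. The helper name `parse_doy` is made up for this example, and the time zone of the log is not known from the excerpt.

```python
from datetime import datetime

def parse_doy(stamp: str) -> datetime:
    """Parse 'YYYY DDD HH:MM:SS', where DDD is the day of the year (assumed format)."""
    return datetime.strptime(stamp, "%Y %j %H:%M:%S")

# Timestamps copied from the odf_SL_plugin excerpts above.
for stamp in ("2019 205 14:48:59", "2019 207 05:34:28", "2019 207 07:25:33"):
    print(stamp, "->", parse_doy(stamp).isoformat(sep=" "))
```

Under that assumption, day 205 of 2019 is July 24 and day 207 is July 26, the same date as the rsh-data-consumer.log errors.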

myshake.out:
System Info

heli_ewII : NOT Running
OWS : NOT Running
SeedLink : NOT Running
ODF : NOT Running
slarchive : NOT Running
SL info: NONE Available
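
A small sketch of how the status block above could be checked automatically. It assumes each service line has the form `name : state`, as in the excerpt; the function name `not_running` and the inline sample are illustrative only and are not part of the Raspberry Shake software.

```python
# Status lines copied from the myshake.out excerpt above.
SAMPLE = """\
heli_ewII : NOT Running
OWS : NOT Running
SeedLink : NOT Running
ODF : NOT Running
slarchive : NOT Running
SL info: NONE Available
"""

def not_running(text: str) -> list:
    """Return the names of services whose state is exactly 'NOT Running'."""
    stopped = []
    for line in text.splitlines():
        # Split on the first colon; lines without one are ignored.
        name, sep, state = line.partition(":")
        if sep and state.strip() == "NOT Running":
            stopped.append(name.strip())
    return stopped

print(not_running(SAMPLE))
```

For the excerpt above this lists all five services, which matches the observation that no data was being produced at that point.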

ivor · July 27, 2019, 4:42pm

hi,

looking at the log files, there is indeed something strange going on, though it’s not entirely clear what:

when the unit was last booted, on day 192, NTP failed to start, likely due to no access to internet and / or NTP servers. this took 9 days to resolve, at which point the boot-up process continued. this does seem to be resolved now as an issue.
the unit is also configured to have both ethernet and WiFi interfaces ON. is this really what you want? this shouldn’t be a problem, but when trying to diagnose network-related errors, it’s always good to reduce the configuration to the bare minimum to see if there is any effect. (you can turn off WiFi from the FE configuration interface.)
the two docker containers responsible for producing and consuming the seismic data exited 1 day ago, which is why you currently are not seeing any data. sometimes this points to a corrupted SD card, but is not absolutely conclusive.

so, with those observations in mind, can you try the following:

turn off WiFi
reboot the unit to start from a fresh state
resend the log files ~10 minutes after start-up

thanks in advance, i anticipate that the new log files will give us more information to work from in identifying next steps.

cheers,
richard

yn1v · July 27, 2019, 4:58pm

Before getting your feedback, I disabled the WiFi. The unit has no wireless device, so it seems better to have that OFF.

I also restarted the equipment; I was guessing that a reboot would not hurt and might let things come up in the proper order. That fits with your comments.

So, these are the new logs:

RSH.RF724.2019-07-27T16_00_56.logs.tar (1.6 MB)

I need to run some errands, and I will be back online in about three hours. I will be happy to try other things then.

Best regards

yn1v · July 27, 2019, 5:07pm

I have a problem with the forum. It matched the new file to the previous one and did not upload the new version.
https://www.dropbox.com/s/vi05a2hobiohot2/logs_after_reboot.tar?dl=0
Here are the logs from after I rebooted.
Sorry if this causes any confusion.

Neville

ivor · July 27, 2019, 5:31pm

hi, thanks for the quick response.

these are the same log files as before, which is a problem with the browser’s cache that we thought was fixed in the last release of the FE configuration program.

can you tell me what browser you are on?

and, to get around this for the moment, open a new browser tab in private mode, connect to the FE and download the files again.

thanks, sorry for the inconvenience,

richard

ivor · July 27, 2019, 5:34pm

and, i can also confirm that your data is now flowing to the server just fine, so it looks like your problem may have solved itself (at least for the moment):

https://raspberryshake.net/stationview/#?net=AM&sta=RF724

yn1v · July 27, 2019, 11:24pm

I am no longer at the office. The problem with the logs was not on the browser side, at least not at first glance.
I downloaded the file and it was given a (1) suffix, as it had the same name as the previous log. The problem occurred when I uploaded the file to the forum: it dropped the (1) and didn’t create a new file on the forum, so the link pointed to the old file. I renamed the file and still had issues. I think it was a different file, as I opened one log and saw a more recent timestamp.
I will try again tomorrow, downloading two logs and making a more thorough comparison. I will post the results and provide details about the Firefox version.
Best regards.

yn1v · July 28, 2019, 3:55pm

I have made two separate log downloads. The time between the two was about 40 minutes.

$ ls -lah RSH*
-rw-rw-r--. 1 neville neville 1.8M Jul 28 09:41 'RSH.RF724.2019-07-28T15_02_09.logs(1).tar'
-rw-rw-r--. 1 neville neville 1.8M Jul 28 09:02 RSH.RF724.2019-07-28T15_02_09.logs.tar

The content of both files is the same. I computed a checksum for each and the results were identical.

I am using Fedora 29 with kernel 5.1.15-200 (64-bit) and Firefox 67.0.4.

Is there anything else I can do to help you with this bug in downloading the logs? Is there a workaround?

Best regards

Neville

ivor · July 29, 2019, 2:12pm

please try to download from a private browsing tab

yn1v · July 29, 2019, 5:03pm

The good news is that the raspberry shake continues to work without any more problems.

I tried downloading the logs, then waiting for about 15 minutes, then opening a private browser window and downloading the logs again.

$ ls -lah RSH.RF724.2019-07-29*
-rw-rw-r--. 1 neville neville 1.9M Jul 29 10:43 RSH.RF724.2019-07-29T16_43_30.logs.tar
-rw-rw-r--. 1 neville neville 1.9M Jul 29 10:54 RSH.RF724.2019-07-29T16_54_21.logs.tar

I ran sha256sum on the files and now the sums are different.
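
For anyone who wants to repeat this comparison without the command-line tool, here is a minimal sketch using Python's standard `hashlib`. The function names `sha256_of` and `same_content` are made up for this example; the digest it computes should match what `sha256sum` reports for the same file.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a: str, path_b: str) -> bool:
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Calling `same_content()` with the two file names listed above would be expected to return `False` here, since the sums differ; equal sums for two downloads taken at different times would point to a cached copy being served.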

If there is something else that I can do, I will gladly run more tests.

Best regards

Neville

ivor · July 29, 2019, 5:39pm

hi neville,

can you upload one of those tar files? i’d still like to see the before and after results to try to understand what was going wrong, and why / how it seemed to fix itself.

thanks,

richard

yn1v · July 29, 2019, 6:04pm

Sure,

This file has the logs from today:
RSH.RF724.2019-07-29T16_54_21.logs.tar (1.9 MB)
This file has the logs from yesterday:
RSH.RF724.2019-07-28T15_02_09.logs.tar (1.7 MB)

I am happy that I can help. Raspberry Shake is a wonderful project.

Best regards

yn1v · July 31, 2019, 5:15pm

I have new blank spaces on the graphs, but then it started feeding the graphs again. I was out of the office yesterday. I will look into the logs to see what I can learn.
RSH.RF724.2019-07-31T17_12_53.logs.tar (2.0 MB)

Best regards

Neville

yn1v · July 31, 2019, 5:44pm

I see two things in the logs that catch my attention.
First, the CPU load went from 0.68 to 3.70 … it is being overloaded.
Second, the local connection to the Docker interface is not being resolved, as if the container were offline.
In the meantime, I will reboot the device.
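
As a rough sketch of how to read those load numbers: a load average above the number of CPU cores means tasks are queuing. The check below assumes a single core for the Raspberry Pi 1 B, and the helper name `is_overloaded` is made up for this example. On Linux the load average also counts tasks waiting on disk I/O, so a high value does not prove the CPU alone is the bottleneck.

```python
import os

def is_overloaded(load_avg: float, cores: int) -> bool:
    """A load average above the core count means tasks are waiting to run."""
    return load_avg > cores

# Values reported in the post, checked against one core (assumed for the Pi 1 B).
for load in (0.68, 3.70):
    print(load, "overloaded" if is_overloaded(load, 1) else "ok")

# On the device itself (Unix only): 1-, 5- and 15-minute load averages.
if hasattr(os, "getloadavg"):
    try:
        one, five, fifteen = os.getloadavg()
        print("load:", one, five, fifteen, "cores:", os.cpu_count())
    except OSError:
        print("load average not available on this system")
```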
