Shake still running, but data went offline

jbeale · June 14, 2019, 3:52pm

I got an email this morning (14-June-2019) saying my Shake AM.R79D5.00.EHZ hadn’t been seen in 3 days. I found its local webpage was still working, but “Data Consumer” and “Data Producer” were both OFF. Looking at the seedlink.log file, I see something bad happened 3 days ago:
Tue Jun 11 16:12:13 2019 - seedlink: seedlink.cc:362: Cannot allocate memory

Prior to this, it had been running OK for months. I just rebooted it and it is running again, for now. I do not know what went wrong. Note this Shake is unusual in that it runs on a Pi Zero W. It is not on wifi, though; it has an external USB-Ethernet device. Is it possible there is a slow memory leak, so that memory usage creeps up over time? If so, the Pi Zero would die sooner, since it has less available memory than a Pi 2 or Pi 3. I will try to attach the log files here.
RSH.R79D5.2019-06-14T15_04_05.logs.zip (209.7 KB)

iannesbitt · June 14, 2019, 10:10pm

Hi @jbeale, you are correct that the Pi is running out of memory. In this case it’s OWS that’s the culprit, although it’s still unclear whether it’s a slow memory leak or simply running out of memory in normal operation.

Do you run SWARM all the time? Are there multiple clients that pull data from the unit, or just a single SWARM instance?

In any case, unless you have a specific low power consumption requirement, you’d probably avoid this issue (or at least avoid it for much, much longer) by upgrading to a Pi that has at least 1 GB of memory instead of 512 MB.

You could also try increasing the swap size if you’re not afraid of that sort of thing.

Ian

jbeale · June 15, 2019, 6:36am

Thanks for the reply, interesting. I did run SWARM continually for a while some time ago, but have not opened SWARM on any machine in several weeks. Using a swap file on the microSD card seems like moving in the wrong direction for long-term reliability. I assume a Pi with more memory would at least postpone the problem, and somewhat higher power draw is not an issue. But since I was not using SWARM or any other function on the machine when it crashed, I’m not confident I understand what caused the problem in this instance. I wonder if there is any additional logging that could shed light on the issue?
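
One way to get that kind of extra logging, as a minimal sketch rather than anything Raspberry Shake provides: periodically record available memory and the largest processes by resident size, so that a slow leak would show up as one process growing over days. It assumes Python 3.6+ and the standard Linux `/proc` layout; the function names, the `memwatch.log` path, and the 10-minute interval are all made up for illustration.

```python
import os
import time


def parse_meminfo(text):
    """Return {field: value} from the contents of /proc/meminfo (mostly kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def top_rss(proc_root="/proc", count=5):
    """Return the `count` largest processes as (rss_kB, pid, name) tuples."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # process exited (or is unreadable) while we were looking
        rss = fields.get("VmRSS")
        if rss:  # kernel threads report no VmRSS
            name = fields.get("Name", "?").strip()
            rows.append((int(rss.split()[0]), int(entry), name))
    return sorted(rows, reverse=True)[:count]


def format_sample(meminfo, procs, now=None):
    """Build one log line: timestamp, available memory, swap used, top processes."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    avail = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    swap_used = meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)
    top = " ".join(f"{name}[{pid}]={rss}kB" for rss, pid, name in procs)
    return f"{stamp} avail={avail}kB swap_used={swap_used}kB {top}"


def log_forever(log_path="memwatch.log", interval_s=600):
    """Append one sample to `log_path` every `interval_s` seconds (hypothetical defaults)."""
    while True:
        with open("/proc/meminfo") as f:
            meminfo = parse_meminfo(f.read())
        with open(log_path, "a") as log:
            log.write(format_sample(meminfo, top_rss()) + "\n")
        time.sleep(interval_s)
```

Calling `log_forever()` from a small background script would append one line every ten minutes. Writing a short line at that rate is light on the microSD card, and comparing the first and last lines after a few weeks should show whether one process keeps growing or memory is simply tight from the start.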

iannesbitt · June 16, 2019, 11:14pm

I’m not sure if it would be captured in system logs, although I’m curious what /var/log/syslog, /var/log/dmesg, and /var/log/messages would have to say.
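
A quick way to check those three files is to search them for the usual memory-exhaustion wording. The sketch below is only an illustration: it looks for the "Cannot allocate memory" text from the seedlink.log line above, plus "Out of memory" and "oom-killer", which is how the Linux kernel normally reports killing a process for lack of memory. The function names are made up, and it assumes the logs are readable by the user running it.

```python
import re

# Phrases that typically accompany memory exhaustion on Linux.
OOM_PATTERN = re.compile(
    r"out of memory|oom-killer|cannot allocate memory", re.IGNORECASE
)


def find_memory_errors(lines):
    """Yield (line_number, text) for lines that look like memory exhaustion."""
    for number, line in enumerate(lines, start=1):
        if OOM_PATTERN.search(line):
            yield number, line.rstrip("\n")


def scan_files(paths):
    """Print every matching line, prefixed with its file and line number."""
    for path in paths:
        try:
            with open(path, errors="replace") as f:
                for number, text in find_memory_errors(f):
                    print(f"{path}:{number}: {text}")
        except OSError as exc:
            # Missing or unreadable logs are reported, not fatal.
            print(f"{path}: skipped ({exc})")


if __name__ == "__main__":
    scan_files(["/var/log/syslog", "/var/log/dmesg", "/var/log/messages"])
```

If the kernel's OOM killer was involved, the matching lines usually name the process it killed, which would help confirm whether OWS was the one that grew. No matches would suggest the allocation simply failed inside seedlink without the kernel stepping in.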