Both are alive, with no feedback from the institutes… Sorry, but they are “Free Spirits”…
If I have logs or more info I will upload them.
So sorry for the inconvenience.
Good news! Some info from R4D07, which is running firmware v0.18. Please find the log files attached!
The original data is copied each night to our ownCloud via davfs2, and the data on the microSD sometimes has gaps during the night. Does the station stop? Is the microSD corrupted?
RSH.R4D07.2020-02-19T11_35_28.logs.tar (2.9 MB)
And now the logs from R888C; I finally managed to get them! It is also running firmware v0.18.
hi mario,
it’s there now. there was a sync problem between the data server and the station-view server that has been resolved.
sorry about that,
richard
Thank you!
Let’s see if the logs give some information about what could have happened to these two stations…
Regards
Mario
Hi,
Did you find anything in the R888C and R4D07 log files that could explain what happened last week?
We are intrigued; now everything seems to work correctly…
Best regards
Mario
hi mario,
at some point in january, both of these units were having problems with their network / internet connections. one way this manifested itself was that each would make a connection request to the shake data server, which was granted, but would then immediately be followed by a “socket read error” on the server-side, prompting the connection to be closed. the unit then would reconnect and the cycle would continue.
since this behavior is detrimental to the overall health of the server (to which the entire AM network is connected), the IP addresses for these two machines were banned from being able to communicate with the server at all, thus preventing the connection request from even getting in.
your query prompted these IP addresses to be “un-banned” to test if they were the same units you had reported, and to see if the network problem had resolved; it was true in both cases.
a better solution to this problem (the client network has issues) is to put more intelligence into the client so that it recognizes when this ping-pong / up-down connecting to the server is occurring and changes the re-connection request interval to something more reasonable, say, once per hour.
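The smarter client described above can be sketched as a small back-off policy. This is a minimal, hypothetical Python illustration, not the Raspberry Shake client's actual code: the class name ReconnectPolicy and every threshold (a session shorter than 30 s counts as a "flap", three flaps within 10 minutes trigger back-off, the delay doubles up to the one-hour ceiling mentioned above) are assumptions chosen for the example.

```python
import time
from collections import deque


class ReconnectPolicy:
    """Hypothetical client-side policy: if connections keep dropping right
    after being established (the ping-pong pattern), stretch the retry
    interval toward a ceiling of one hour."""

    def __init__(self, base_delay=10.0, max_delay=3600.0,
                 short_session=30.0, flap_threshold=3, window=600.0):
        self.base_delay = base_delay          # normal retry delay (s), assumed
        self.max_delay = max_delay            # ceiling: once per hour
        self.short_session = short_session    # sessions shorter than this count as flaps
        self.flap_threshold = flap_threshold  # flaps inside the window before backing off
        self.window = window                  # look-back window (s)
        self.delay = base_delay
        self._flaps = deque()

    def on_disconnect(self, session_length, now=None):
        """Record a dropped connection; return seconds to wait before reconnecting."""
        now = time.monotonic() if now is None else now
        if session_length < self.short_session:
            self._flaps.append(now)
        else:
            # a healthy, long-lived session resets the back-off completely
            self._flaps.clear()
            self.delay = self.base_delay
        # forget flaps that fell out of the look-back window
        while self._flaps and now - self._flaps[0] > self.window:
            self._flaps.popleft()
        if len(self._flaps) >= self.flap_threshold:
            self.delay = min(self.delay * 2, self.max_delay)
        return self.delay


# usage: five connections that each drop after 1 s, one second apart
policy = ReconnectPolicy()
for i in range(5):
    print(policy.on_disconnect(session_length=1.0, now=float(i)))  # 10.0, 10.0, 20.0, 40.0, 80.0
```

Doubling with a cap keeps a briefly glitchy network reconnecting quickly, while a persistently broken one settles at one attempt per hour instead of hammering the server.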
cheers,
richard
Thank you very much for this info Richard.
But what about the data gaps in the mseed files recorded on the microSD? Is it all related? Can a client/server problem produce these gaps in the acquired data files?
Regards
Mario
hi,
the gaps you see here locally are not related to the story with the server connection. these two data “destinations” are fully uncoupled from each other, in that, if delivery of data to one fails, this will not affect the delivery to the other in any way.
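To illustrate what "fully uncoupled" destinations means in practice, here is a minimal sketch, assuming each destination (local disk, server) is fed from its own bounded queue by a producer that never blocks. The FanOut class, the destination names and the queue size are hypothetical; this shows one common way to obtain the property described above, not necessarily how the Shake software is built.

```python
import queue


class FanOut:
    """Hypothetical sketch: each destination gets its own bounded queue, so a
    stalled consumer (e.g. a blocked server link) only loses its own packets."""

    def __init__(self, names, maxsize=100):
        self.queues = {name: queue.Queue(maxsize=maxsize) for name in names}
        self.dropped = {name: 0 for name in names}

    def publish(self, packet):
        for name, q in self.queues.items():
            try:
                q.put_nowait(packet)  # never block the producer
            except queue.Full:
                self.dropped[name] += 1  # only this destination misses the packet


# usage: the disk writer keeps up, the "server" consumer is stuck
fan = FanOut(["local_disk", "server"], maxsize=5)
local = []
for n in range(10):
    fan.publish(n)
    local.append(fan.queues["local_disk"].get_nowait())
print(local)        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(fan.dropped)  # {'local_disk': 0, 'server': 5}
```

With this layout a stalled server link cannot create gaps on the local disk; a gap that shows up in both places at once would point upstream of the fan-out, i.e. at the data producer.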
what does seem to be happening is that the data transfer between the shake board and the Pi is sometimes compromised (see log file system.log), where the data reader program is unable to read the data coming off the port quickly enough, some data is dropped, and the packet is rejected. this is the first short gap you see on day 049 at 01:30.
as for the gap that starts a short time later and continues until the morning, it is unexplained; i cannot find anything in the logs pointing to a definitive cause. you said that you download the data in the middle of the night; at what time? is it possible that the download is somehow having an adverse effect on data collection? do you see this on any other units you fetch data from in the middle of the night?
in any case, i would confirm the cable connections between the Shake board and the Pi unit, this might be a cause of the intermittent dropped data packets.
sorry i can’t say more about the longer gap. your suspicion that the SD card may be corrupt is also a possibility. when it’s easy, replace the SD card and see if this results in a (positive) change.
cheers,
richard
Dear Richard,
Thank you very much for your explanations, they are very instructive.
Only some remarks:
Thank you very much for your time and help.
Best Regards,
Mario
Maybe this image helps clarify the point that intrigues me: there are gaps in the data at the moment the server issue is solved. Why? Maybe the two data channels are not as uncoupled/independent as expected.
What I mean is, could the large gap of day 049, without explanation in the system registry (system.log), be related to the problem of not being able to send data to the server?
hi mario,
your analysis is, of course, compelling! (and i have to say, you very much back me into a corner as well when you use my own program to visually state your case… )
what i now think is going on is that, instead of the problem having anything to do with data being sent to the server, the problem lies with the data not being properly read off the serial port. it cannot be a coincidence that the moment the problem sending data to the server was solved happens to be the same moment the data-producer program was restarted. so the problem with the data flow stopping must be further upstream.
as well, i would agree that this is not related to your daily download since you are doing this on several units and see problems with only one. (is that correct?)
at this point, i can only suggest reseating the Shake board; perhaps this could help. if this continues to be a problem after that, and you have a spare Pi you could swap it out with, that would answer whether or not the problem sits with the computer. if the problem persists across two Pi’s, then that could point to the problem being with the board itself.
sorry i can’t be more definitive, the log files have told me only so much and don’t really point me to where this problem could be originating from. in these cases, the only real course of action is to make a guess where it could be, make a modification, and check for any change in behaviour. a process of elimination, as it were…
let me know any more of what you find. speaking of which, it would be nice if you could put numbers on this:
thanks,
richard
Dear Richard,
Sorry, this query is getting longer and the threads are diverging; however, there are still some points that I don’t understand. These 2 stations are working fine now. They behaved strangely for 3 days, but then recovered their expected behaviour, so I’m not sure about the electronic explanation…
Quoting your last answer:
I don’t understand this new explanation. In a previous answer (#14, Feb 25) I was told: the IP addresses for these two machines were banned from being able to communicate with the server at all, thus preventing the connection request from even getting in. Your query prompted these IP addresses to be “un-banned” to test if they were the same units you had reported, and to see if the network problem had resolved; it was true in both cases.
So, if I understood correctly, you had to act on the server, and it is clear that at the point where communication was re-established, we had a gap at both stations. If the problem was the serial port, why do we have data on the microSD both before and after the server un-banning?
Yes and no. Yes, we are running this script on 9 RS units and have never found anything strange… except on these two stations (R4D07 and R888C), around the same day… So, no: we had problems at two stations.
They are in 8 high schools (plus one at home), nine in total. It will be difficult to work on the 8 school units until we get the instruments back. They are working fine now, so… The one at home is an RS4D and had “similar communication/hardware problems”, but I changed the Pi, re-burned the microSD 3 times and plugged it into the Ethernet port of a TP-Link Wi-Fi repeater, and it works better now (RS 4D Continously rebooting). I’m really concerned about the problems at home when working over Wi-Fi: I had a lot of gaps and strange resets, but now it is difficult to say, as I changed the whole configuration. So we can leave this one aside…
The only way to find out and quantify this is to run msi over the miniSEED files copied from the microSD cards; please find the resulting file attached. I only ran it over the 2020 data, and it is long… The problem is that we won’t have the logs (I can get them for 4, maximum 5, RS units), so it will be difficult to know where the gaps come from, but what is certain is that they are not gaps in transmission to the CAPS server…
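For readers unfamiliar with how a gap listing like this is produced: gap reports of this kind are generally built by comparing where each miniSEED record should end with where the next record actually begins. The sketch below shows that check in Python, assuming you already have each record's start time, sample rate and sample count for one channel (how you parse them is left open). The function name find_gaps, the half-sample tolerance and the timestamps in the example are made up for illustration; this is not msi's own code and not data from these stations.

```python
from datetime import datetime, timedelta


def find_gaps(records, tolerance=0.5):
    """Hypothetical gap check for ONE channel.
    records: (start, sample_rate_hz, n_samples) tuples.
    Returns (gap_start, gap_end) pairs where consecutive records do not meet.
    tolerance is in sample periods (half a sample by default); overlaps are ignored."""
    gaps = []
    ordered = sorted(records, key=lambda r: r[0])
    for (start, rate, n), (next_start, _, _) in zip(ordered, ordered[1:]):
        # time at which the sample after this record's last sample should fall
        expected = start + timedelta(seconds=n / rate)
        if next_start - expected > timedelta(seconds=tolerance / rate):
            gaps.append((expected, next_start))
    return gaps


# usage with made-up 5 s records at 100 Hz; the third record starts 7 s late
t0 = datetime(2020, 1, 1, 0, 0, 0)
records = [
    (t0, 100.0, 500),
    (t0 + timedelta(seconds=5), 100.0, 500),
    (t0 + timedelta(seconds=17), 100.0, 500),
]
print(find_gaps(records))  # one gap: 00:00:10 -> 00:00:17
```

A gap found this way only says that samples are missing from the file; it cannot by itself tell a serial-port dropout from a stalled writer, which is why the logs matter.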
Best regards
Mario
rasp_2020_gaps.txt (147.6 KB)
hi mario,
sorry, but it’s entirely possible i’ve lost the plot. be assured, it is not my intent to try to deflect or confuse, but the restrictions on my time are real, and sometimes the best i can do is provide an explanation based on my best deductive reasoning; and there will be no guarantee any of my conclusions are actually correct.
if i understand correctly now, there are currently no problems with the data capture, either to the local disk or to the server?
if so, then it may have to remain a mystery as to the complete dynamics of what caused the problem in the first place. i think we can conclude that the banning of the IP indeed had an overall detrimental effect. if un-banning the IP’s caused the problems to go away, then there is no longer a problem to solve. again, without setting up a thorough test of what happens in all combinations of settings, both client- and server-side, i’m unable to do more than guess based on the information that’s available (log files plus your observations).
if this is something that must be understood, in absolute terms, please have a look at purchasing technical support so that resources can be assigned to looking into this at a deeper level. as well, i would also encourage you to investigate why the network connection was faulty in the first place, generating the socket read errors on the server, which then caused the IP’s to be banned.
cheers,
richard
hi again,
looking at the server logs just now, the station R4D07 is again exhibiting the ‘socket read’ problem; its IP is banned from communicating with the server until the problem resolves.
not sure how long this problem will persist, but while it does, it would be a good time to get a better understanding of what is happening with the network client-side, to see if anything can be done to prevent it.
cheers,
richard
Thank you very much Richard for your time and help. I wrote to the person in charge of the network at this school to try to find out what happened, but, like you, I think it will remain a mystery. Could you please unban it again on your server?
If I get the logs I will send them again.
Have a nice weekend!
Regards
Mario
hi mario,
the IP has been un-banned, unit has connected and seems to be functioning normally again. and since this seems to be a recurring problem, we really need to figure out why this network issue is occurring and solve it since this really is detrimental to the server’s operations.
thanks, have a good weekend…
richard
Thank you very much, I will try to obtain as much information as possible!
Have a good weekend!
Mario