Currently discussing the problem in the jethome github:
I haven’t tried what ijdqq suggested yet, but it seems that we got a solution on GitHub:
Do this:
Stop z2m
Delete z2m backups
Clear the NV memory with the zStackEraseAllNvMem.js script from the Koenkk/zigbee2mqtt repository on GitHub (master branch)
Cold restart the CC2538 stick
Start z2m
Set up a new network, add devices.
Stop z2m and check that the NIB table is 116 bytes. If so, your network will work stably. Use the CC2538 firmware 20201010.
After this, the NIB table was 116 bytes for me, and cold restarts, power losses etc. were no longer a problem. I will try to put this into some kind of script, but here are the commands I ran on my machine.
Warning: you will lose all pairings, of course.
Assuming:
device: /dev/ttyAMA0
z2m installation: /opt/zigbee2mqtt/
systemctl stop zigbee2mqtt
mv /opt/zigbee2mqtt/data ~/backup/zigbee2mqtt
mkdir /opt/zigbee2mqtt/data
cp ~/backup/zigbee2mqtt/configuration.yaml /opt/zigbee2mqtt/data/configuration.yaml
[Wipe all network related data from /opt/zigbee2mqtt/data/configuration.yaml]
node /opt/zigbee2mqtt/scripts/zStackEraseAllNvMem.js /dev/ttyAMA0
shutdown -h now
[unplug raspberry, replug it]
[boot, start zigbee2mqtt, re-pair all devices]
Try with one or two devices first. Once they are paired, do a cold restart (unplug power) and test whether the devices are still available. If yes, connect the rest of your network.
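The commands above could be wrapped into a small script along the following lines. This is only a sketch, not the script announced in the post: the names `run`, `reset_coordinator`, `DRY_RUN` and `BACKUP_DIR` are made up for illustration, and the `mkdir -p` and the check for an existing backup directory are additions that are not part of the original commands. The paths and the device are the ones assumed in the post. With `DRY_RUN=1` (the default) nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Sketch only: wraps the manual commands from this post.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DEVICE="${DEVICE:-/dev/ttyAMA0}"                      # serial device from the post
Z2M_DIR="${Z2M_DIR:-/opt/zigbee2mqtt}"                # z2m installation from the post
BACKUP_DIR="${BACKUP_DIR:-$HOME/backup/zigbee2mqtt}"  # backup target from the post
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the steps; stops at the first failing command.
reset_coordinator() {
    # Added safety check (not in the post): mv would nest data/ inside an existing directory.
    if [ -e "$BACKUP_DIR" ]; then
        echo "Backup target $BACKUP_DIR already exists, aborting." >&2
        return 1
    fi
    run systemctl stop zigbee2mqtt &&
    run mkdir -p "$(dirname "$BACKUP_DIR")" &&
    run mv "$Z2M_DIR/data" "$BACKUP_DIR" &&
    run mkdir "$Z2M_DIR/data" &&
    run cp "$BACKUP_DIR/configuration.yaml" "$Z2M_DIR/data/configuration.yaml" &&
    echo "Manual step: wipe all network related data from $Z2M_DIR/data/configuration.yaml" &&
    run node "$Z2M_DIR/scripts/zStackEraseAllNvMem.js" "$DEVICE" &&
    echo "Manual step: shut down, unplug and replug the Raspberry Pi, then start zigbee2mqtt and re-pair."
}
```

Calling `reset_coordinator` prints the planned commands. Setting `DRY_RUN=0` before the call would actually run them, which loses all pairings, so reading the dry-run output first seems sensible.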
Just to mention.
To me the problem seems to be that the initial firmware either doesn’t make sure that the NVram is in a defined state after flashing, or it has some weird pre-flashed values in there.
Thank you so much @tsmt. If this indeed solves the problem, I will of course do the NVRAM reset after flashing the firmware and before sending them out.
Cheers!
Confirmed fixed, see
Hey guys,
thanks for the time you invested in fixing this issue.
Maybe somebody can write a low-level, step-by-step description of the fix. German would be great; English is also OK.
Thanks again everybody.
Felix
Hey,
I assume by low-level you mean an “easier” step-by-step guide? Sure, I can do that. Did you install zigbee2mqtt with or without Docker?
But I also plan to write a script for it that automates the whole process better. Is this very urgent, or can you wait a few days for it?
Hey
Correct, my knowledge is low-level.
I use Zigbee directly with homegear, without zigbee2mqtt.
I can wait; I currently have only 4 Zigbee modules, and I can quickly re-pair them in an emergency.
Regards,
Felix
@Adrian, could you say something about the outcome of the error? Maybe it is something you can implement in homegear to fix this?
I’ll try to look over it next week to see if I can do something when the stick is initialized (when the network is ‘reset’).
Either clearing NVRAM or maybe dealing with that buggy table only. I might have to look over the firmware sources for the details.
It seems that I already had an attempt in the code to clear NV on network initialization.
The only thing I did differently from what was pointed out here
was using a soft reset for the stick instead of a hard reset. I’ll change the code to do a hard reset to see if it makes a difference.
I think I had some issues with hard resets in the past, which is why I avoided them (it disconnected the serial connection or something like that).
Later edit: Yes, I did have such issues, so for now I’ll try to find a solution other than a hard reset.
I’m currently trying to understand why the issue is happening.
By looking over the firmware sources and the TI forums, I suspect that changing the buffer sizes when compiling the sources is a cause of the bug.
The ‘table’ that gets truncated is actually a struct that has a byte ‘all fresh’, which I think does nothing, two bytes that contain the network manager address (this should be zero, I think), two bytes that contain a counter for transmissions and a byte that represents an ‘update id’. The update id seems dangerous to corrupt (along with the network manager address), as it is used by devices that ‘think’ they lost the network in order to rejoin it. It’s used for network updates.
From what I’ve seen in the sources, the devices might receive a ‘network update’ request, which they would ignore if the update id they have in the NIB is greater than or equal to the one in the request. It appears that such requests might be sent when the channel and/or the channel mask changes.
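To make the comparison rule concrete, here is a minimal sketch of it as described in this post, not taken from the firmware sources. The function name `should_apply_update` is hypothetical, and the sketch ignores any wrap-around handling that a real one-byte counter might need. It shows why a corrupted, too-large update id in the NIB would be a problem: every later request would be ignored.

```shell
#!/bin/sh
# Hypothetical illustration of the rule described above:
# a device ignores a 'network update' request when the update id stored
# in its NIB is greater than or equal to the id carried by the request.
# Arguments: $1 = update id stored in the NIB, $2 = update id in the request.
# Exit status 0 means "apply the update", non-zero means "ignore it".
should_apply_update() {
    stored_id="$1"
    request_id="$2"
    [ "$request_id" -gt "$stored_id" ]
}

# Example: a stored id of 3 accepts a request with id 4 ...
if should_apply_update 3 4; then echo "update 4 applied"; fi
# ... while a corrupted stored id of 255 rejects it.
if ! should_apply_update 255 4; then echo "update 4 ignored"; fi
```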
Maybe the issue would be solved by setting the channel mask to allow only a single channel? It would be worth testing with such a setting, to see if it solves the problem…
Anyway, my attempt to solve it now is to have the NIB explicitly cleared/filled with 116 zeros when the network is initialized. I’ll commit the new sources soon.
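As a rough illustration of the 116-byte figure, the sketch below builds a zero-filled buffer of that size in a file and checks a file's size against it. It is not how homegear writes the NIB to the stick, and it assumes the NIB contents have somehow been dumped to a file, which this thread does not describe. The names `NIB_SIZE`, `make_zero_nib` and `nib_size_ok` are hypothetical.

```shell
#!/bin/sh
# Expected NIB size in bytes, as reported in this thread.
NIB_SIZE=116

# Hypothetical helper: write NIB_SIZE zero bytes to the file given as $1.
make_zero_nib() {
    dd if=/dev/zero of="$1" bs=1 count="$NIB_SIZE" 2>/dev/null
}

# Hypothetical helper: succeed only if the file given as $1 is exactly NIB_SIZE bytes long.
nib_size_ok() {
    size=$(wc -c < "$1")
    [ "$((size))" -eq "$NIB_SIZE" ]
}
```

For example, `make_zero_nib /tmp/nib.bin && nib_size_ok /tmp/nib.bin` succeeds, while a shorter (truncated) file would make `nib_size_ok` fail.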
This is great @Adrian. So in theory, with your fix, the CC2538 will just start working. Or should the devices all be re-paired?
The network should be re-initialized, so the devices need to be re-paired.
I would remove the devices from the network, then issue a reset from the command line, then re-pair them.
I didn’t put a full-blown script on GitHub, just instructions on how to reset your controller. This is because I had problems handling npm from a bash script: npm doesn’t handle signals very well, and I didn’t have the time to find reliable workarounds for it.
Anyway, here is a very detailed tutorial together with some scripts to repair your devices.
@dr_snuggles
In German. Out of laziness I translated it with DeepL and only skimmed over it. If anything is still hard to understand, feel free to write to me again and I’ll fix it.
German
@Adrian
Sorry, I don’t really know how homegear handles Zigbee; that’s why there is a TODO in the tutorial. If you could tell me briefly whether homegear buffers the NIB_TABLE and how I could instruct users to reset it until you find an in-software way around it, that would be great.
Also, if you open a PR to homegear that fixes this, could you notify me? I would be interested in how you solve it.
It doesn’t buffer the NIB table. It just explicitly erases it now at network initialization / commissioning, as a workaround for the firmware bug.
Great, so no need to delete a backup file as with z2m. Thanks!
Hey,
the link leads nowhere. Both of them, actually.
Can you check whether some digits got transposed in there?
Just so I understand: does this currently only work for zigbee2mqtt, or also when Zigbee is used in homegear?
Thanks and regards,
Der Doc
The link should work now. Sorry, the GitHub repo was set to private.
Yes, this runs via zigbee2mqtt. Unfortunately, for work reasons I currently don’t have the time to build a separate tool for it. You can delete zigbee2mqtt again afterwards. Please shut homegear down while you run the script.
Best regards
@here
Please hold off on using the mentioned tutorial and script for now. Another test showed that the process might still be unstable; I just noticed this in my production environment.