Hi,
May I suggest also asking questions about cc2538 on TI forums?
Zigbee & Thread forum - Zigbee & Thread - TI E2E support forums
Some people giving answers there are very knowledgeable about both the hw and zstack.
Thank you @trapperjohn and @Adrian. I was able to talk @tsmt into looking at the Z-Stack SDK and checking what (probably) goes wrong with the NVRAM when using UART.
Sent him hardware for testing, so hopefully he can work on it this week.
Hey guys,
First of all, thank you all for your testing and the time you invest. It’s great to see so many smart people helping us figure this out.
I tried to understand what you did, and maybe I understood part of it. What I’m not sure about: do you have an idea why this happens, or do we still have no clue?
Thanks again for the support. No pressure, I’m just interested.
Felix
@tsmt will have a look into this in the next few days. I’ve sent him some test hardware, which should arrive today or tomorrow.
They arrived today, thank you! Will continue setting up my testing environment the upcoming days and keep you up2date here!
Hi everyone. While we wait for a proper fix at the firmware level, and assuming that the non-volatile memory is somehow corrupted on or during boot-up after a power outage: could someone with a test setup try dumping the NVRAM of the CC2538 to JSON with zigpy-znp (GitHub - zigpy/zigpy-znp: TI CC2531, CC13x2, CC26x2 radio support for Zigpy and ZHA) once pairing has finished, and then restoring it (also with zigpy-znp) from that backup after a simulated power outage, i.e. pulling the PSU from the Raspberry Pi? The question is whether this lets us skip the re-pairing process. I would be happy to try it myself, but my Zigbee network is used “in production”, including some battery-powered devices in places that are not accessible without tools, and my girlfriend is probably going to kill me if I put the whole thing into a dysfunctional state again.
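For anyone with a test setup, the dump-and-restore idea could look roughly like the sketch below. It only builds the command lines and prints them unless told otherwise. The module names `zigpy_znp.tools.nvram_read` and `zigpy_znp.tools.nvram_write` and their `-o` / `-i` options are what I recall from the zigpy-znp README, so please double-check them there; whether they work against the CC2538 firmware is exactly the open question. The port and file name are placeholders.

```python
import subprocess
import sys


def nvram_read_cmd(port, backup_path):
    """Command line that dumps the coordinator NVRAM to a JSON file."""
    return [sys.executable, "-m", "zigpy_znp.tools.nvram_read",
            port, "-o", backup_path]


def nvram_write_cmd(port, backup_path):
    """Command line that writes a JSON dump back to the coordinator."""
    return [sys.executable, "-m", "zigpy_znp.tools.nvram_write",
            port, "-i", backup_path]


def run(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=True).returncode
```

Suggested order for the test: stop whatever is using the serial port, call `run(nvram_read_cmd("/dev/ttyAMA0", "nvram.json"), dry_run=False)` after pairing, pull the power, then run the `nvram_write_cmd` variant and see whether the devices are still there.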
Hey @ijdqq,
thank you for your support. @tsmt is currently working on the problem and has a test setup. I can also test if you wish.
Sadly, I currently don’t have any spare modules; otherwise I could have sent you one.
Best,
Patrik
hey guys
Sorry for leaving out updates here. I was pretty occupied over the last weeks, so I wasn’t able to spend meaningful time on the topic and just took some baby steps to get myself going:
So far, I have been able to reproduce the known issues and build an environment where I can make changes to the Z-Stack code, build it, and deploy it to the CC2538 module @pmayer provided.
Thanks, @ijdqq for the detailed update. I will get my hands on the topic again in the next days and test dumping and restoring the nvram with zigpy-znp after a power outage.
Currently discussing the problem in the jethome github:
Didn’t try what ijdqq suggested yet, but it seems that we got a solution in github:
Do this:
Stop z2m
Delete z2m backups
Clean the NV_MEM using the script: zigbee2mqtt/zStackEraseAllNvMem.js at master · Koenkk/zigbee2mqtt · GitHub
Cold restart the CC2538 stick
Start z2m
Set up a new network, add devices.
Stop z2m and check that the NIB table is 116 bytes. If so, then your network will work stably.
Use the firmware for 2538 20201010.
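The 116-byte check in the last step can be sketched like this. The function only looks at the length of a hex string; where that string comes from (an NVRAM dump, a backup file) is left open here, because I am not sure of the exact file layout. The name `NIB_EXPECTED_SIZE` is made up for this example.

```python
NIB_EXPECTED_SIZE = 116  # bytes, the "good" size reported in this thread


def nib_size(nib_hex):
    """Length in bytes of a hex-encoded NIB blob (spaces are allowed)."""
    return len(bytes.fromhex(nib_hex))


def nib_looks_complete(nib_hex):
    """True if the blob has the expected 116 bytes, False if truncated."""
    return nib_size(nib_hex) == NIB_EXPECTED_SIZE
```

A shorter blob would mean you are still hitting the truncated NIB and should repeat the erase and cold restart.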
After this, the NIB table was 116 bytes for me, and cold restarts, power losses etc. were not a problem anymore. I will try to put this into some kind of script, but here are the commands I ran on my machine.
Warning: you will lose all pairings, of course.
Assuming:
device: /dev/ttyAMA0
z2m installation: /opt/zigbee2mqtt/
systemctl stop zigbee2mqtt
mkdir -p ~/backup
mv /opt/zigbee2mqtt/data ~/backup/zigbee2mqtt
mkdir /opt/zigbee2mqtt/data
cp ~/backup/zigbee2mqtt/configuration.yaml /opt/zigbee2mqtt/data/configuration.yaml
[Wipe all network related data from /opt/zigbee2mqtt/data/configuration.yaml]
node /opt/zigbee2mqtt/scripts/zStackEraseAllNvMem.js /dev/ttyAMA0
shutdown -h now
[unplug raspberry, replug it]
[boot, start zigbee2mqtt, re-pair all devices]
Try with one or two devices first. Once you have done this, cold restart (unplug the power) and test whether the devices are still available. If yes, connect the rest of your network.
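Until there is a real script, here is a rough sketch of the procedure as a printable plan. It does not execute anything; it just lists the commands from this post for a given device and install path, and marks the steps that have to be done by hand. The function names and the `backup_dir` default are made up for this example, and it assumes paths without spaces.

```python
from pathlib import PurePosixPath


def reset_plan(device="/dev/ttyAMA0",
               z2m_dir="/opt/zigbee2mqtt",
               backup_dir="~/backup/zigbee2mqtt"):
    """Return the reset procedure as ordered (kind, text) steps.

    kind is "run" for a shell command and "manual" for a step a person
    has to do. Nothing is executed here.
    """
    data = PurePosixPath(z2m_dir) / "data"
    backup = PurePosixPath(backup_dir)
    config = "configuration.yaml"
    erase = PurePosixPath(z2m_dir) / "scripts" / "zStackEraseAllNvMem.js"
    return [
        ("run", "systemctl stop zigbee2mqtt"),
        # Make sure the parent of the backup target exists before moving.
        ("run", f"mkdir -p {backup.parent}"),
        ("run", f"mv {data} {backup}"),
        ("run", f"mkdir {data}"),
        ("run", f"cp {backup / config} {data / config}"),
        ("manual", f"wipe all network related data from {data / config}"),
        ("run", f"node {erase} {device}"),
        ("run", "shutdown -h now"),
        ("manual", "unplug the Raspberry Pi, then plug it back in"),
        ("manual", "boot, start zigbee2mqtt, re-pair one or two devices first"),
    ]


def print_plan(plan):
    """Print the plan as a numbered checklist."""
    for number, (kind, text) in enumerate(plan, start=1):
        prefix = "$ " if kind == "run" else "[manual] "
        print(f"{number:2d}. {prefix}{text}")
```

Calling `print_plan(reset_plan(device="/dev/ttyUSB0"))` prints the same checklist for a different serial port.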
Just to mention.
To me the problem seems to be that the initial firmware either doesn’t make sure that the NVram is in a defined state after flashing, or it has some weird pre-flashed values in there.
Thank you so much @tsmt. If this does indeed solve the problem, I will of course do the NVRAM reset after flashing the firmware and before sending the modules out.
Cheers!
Confirmed fixed, see
hey guys,
thx for your invested time to fix this issue.
Maybe somebody could write a low-level, step-by-step description of the fix. German would be great; English is also OK.
Thanks again everybody.
Felix
Hey,
I assume by low-level you mean an “easier” step-by-step guide? Sure, I can do that. Did you install zigbee2mqtt with or without Docker?
I also plan to write a script that automates the whole process better. Is this very urgent, or can you wait a few days for it?
Hey
Correct, my knowledge is low-level (basic).
I use Zigbee directly with Homegear, without zigbee2mqtt.
I can wait. I currently have only 4 Zigbee modules, which I can quickly re-pair if necessary.
Regards,
Felix
@Adrian, could you say something about the outcome of the error? Maybe something you can implement into homegear to fix this?
I’ll try to look over it next week to see if I can do something when the stick is initialized (when the network is ‘reset’).
Either clearing NVRAM or maybe dealing with that buggy table only. I might have to look over the firmware sources for the details.
It seems that I already had an attempt in the code to clear the NV memory on network initialization.
The only thing I did differently from what I found pointed out here was using a soft reset for the stick instead of a hard reset. I’ll change the code to do a hard reset to see if it makes a difference.
I think I had some issues with hard resets in the past; that’s why I avoided them (it disconnected the serial connection or something like that).
Later edit: Yes, I did have those issues, so for now I’ll try to find a solution other than a hard reset.
I’m currently trying to understand why the issue is happening.
Looking over the firmware sources and the TI forums, I suspect that changing the buffer sizes when compiling the sources is one cause of the bug.
The ‘table’ that gets truncated is actually a struct. It has an ‘all fresh’ byte, which I think does nothing; two bytes that contain the network manager address (this should be zero, I think); two bytes that contain a counter for transmissions; and a byte that represents an ‘update id’. Corrupting the update id (along with the network manager address) seems dangerous, as it is used by devices that ‘think’ they lost the network in order to rejoin it. It’s also used for network updates.
From what I’ve seen in the sources, devices might receive a ‘network update’ request, which they ignore if the update id in their NIB is greater than or equal to the one in the request. It appears that such requests might be sent when the channel and/or the channel mask changes.
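The rule as I read it boils down to one comparison. This is only a sketch of that reading, with an invented function name; it treats the ids as plain integers and does not model any wraparound handling the firmware may do for the one-byte counter.

```python
def ignores_network_update(device_update_id, request_update_id):
    """True if a device would ignore the request under the rule above.

    The device ignores the request when the update id stored in its own
    NIB is greater than or equal to the id carried by the request.
    """
    return device_update_id >= request_update_id
```

So a request carrying update id 3 is ignored by a device that already stores 3 or 5, and accepted by one that stores 2. That is why a wrong update id on either side could leave devices and coordinator out of step.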
Maybe the issue would be solved if setting the channel mask to allow only a single channel? It would be worth testing with such a setting, to see if it solves the problem…
Anyway, my attempt to solve it for now is to have the NIB explicitly cleared, i.e. filled with 116 zeros, when the network is initialized. I’ll commit the new sources soon.