4-5 seconds sounds too quick for an OLSR restart. By my estimates it should take on the order of 30-90+ seconds on average to re-establish links and routing after a reboot. If your OLSR is restarting, you should see an incrementing OLSR restart count on your status screen. If that is the case, then yes, the watchdog is triggering and keeping the node online. A watchdog, however, is just a "patch"; it's not a fix for the underlying issue.

As for "closed without a release," this is normal for our dev cycle. When an issue has been "fixed," it is closed out to get it out of the developer queue, so that only "unfixed" issues show up. (Since I'm the only person working tickets, this is very important for keeping track of what is going on.) Fixed simply means it has been committed to the DEVELOPMENT branch, is believed to be working, and is ready to go through further testing. If a bug exists in any implementation (or combined implementation), it shows up in beta (and yes, sometimes 2-3 patches interact in ways not seen independently).

It should also be noted that I committed the patch VERY close to leaving the country for work. It's very hard to build beta builds while I'm nowhere near my gear, and I've only just returned to it. Eventually I hope to have the lab servers build untested "development" releases and publish them automatically, but if I work on that, then other issues like OLSRD crashing (which is a much bigger issue) would have to go unworked. If anyone wants to step up and devote the resources to debugging (a 20-node lab environment looks to be the sweet spot right now), help is always welcome.
To give an idea of what it currently takes to build an official release:

1) An internal build, for me only, gets run. (This takes 15 minutes by itself each time, by the way.)
2) The build is tested in my lab. (This can EASILY take a couple of hours, and the test list gets longer each time.)
3) Any issues found in step 2 are resolved, and steps 1-3 are repeated.
4) Once I've found all the issues I can, it gets released to the BETA test team to double-check me (step 1 gets run again, as a beta build this time). Any issues they find are resolved (this takes a lot of time; each person has to dedicate hours to testing), and if they find a bug we start again at step 1 to be sure we are good.
5) Only after it's been vetted does a public release get made (again, another 15 minutes).

I should mention that the above is just the Ubiquiti procedure; Linksys requires me to run the steps again to build its images. Even so, this obviously doesn't catch every issue. Sometimes we can't see issues until networks get bigger. (This issue, for example, seems to show up because 3-5 nodes at a site is now becoming the norm, when before 1 node was the norm.) We have increased traffic to a point where deep bugs are more likely to surface, and we are finding those flaws now so the network can continue to grow.

Keep all this in mind: I work anywhere from 40-80 hours a week depending on how the week goes; I'm the Repeater Technical Chair and an active board member for my local club; I'm regularly responsible for planning and running net control for health-and-welfare traffic of endurance runners using Amateur Radio (I have a 50k and a 100k that just began planning in the last week); and I'm active in promoting Amateur Radio to the public (street fairs, public service demonstrations, Scout merit badges, etc.). Since February 2013 I probably have over 20 days (480 hours) of time into this (a guesstimate; I haven't tracked the hours, and it may very well be more). Fixing one major security bug in the last release took over 40 hours by itself in planning, implementing, and testing, yet it shows up as just a few hundred lines of simple changes: the end result of deep thought and testing to be sure it's done right.