Should Habitat validate that a service is back up before moving on to the next node when the update strategy is “rolling”?
We just ran into an issue where an update to a package was detected and downloaded, and Habitat dutifully restarted all of our nodes one at a time… however, due to the change, the service was no longer able to start and was left in a “flapping” state.
hab svc status also reports the service as “up”.
Perhaps Habitat only checks whether a service is up before moving on to the next node if there’s a service check hook? Maybe if our core/consul plan had a service check hook, it would have said “well, node 1 didn’t come back, let’s not update the other nodes”?
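To make the behavior I have in mind concrete, here is a rough sketch in Python. This is not how Habitat is implemented, just an illustration of the idea: `rolling_update`, `update_node`, and `is_healthy` are hypothetical names I made up, and the check count and interval are arbitrary. The point is that a node has to pass several consecutive checks (so a flapping service that is briefly “up” doesn’t count) before the next node is touched.

```python
import time
from typing import Callable, List, Sequence


def rolling_update(
    nodes: Sequence[str],
    update_node: Callable[[str], None],   # hypothetical: applies the update to one node
    is_healthy: Callable[[str], bool],    # hypothetical: runs the service check for one node
    checks: int = 3,
    interval: float = 1.0,
) -> List[str]:
    """Update nodes one at a time and stop at the first node that does not stay healthy.

    Returns the nodes that were updated and passed every check.
    """
    verified: List[str] = []
    for node in nodes:
        update_node(node)
        # Require several consecutive passing checks so that a service which is
        # only momentarily up between restarts does not count as recovered.
        for _ in range(checks):
            time.sleep(interval)
            if not is_healthy(node):
                # Halt here: the remaining nodes keep running the old version.
                return verified
        verified.append(node)
    return verified
```

With something like this, a bad package would take down only the first node it reaches, and the rest of the cluster would stay on the version that still starts.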
Fortunately, we’re in a lab and not in production, so this isn’t service impacting… but I could see how this could have been a major issue for someone running core/consul in production.
Anyway, just thinking out loud.