`core/postgresql` stuck waiting for election


I think departure is not working. I wrote some additional shell functions to streamline querying the ring, and after departing the old Postgres primary’s member ID and then rebuilding all 3 hosts that should run core/postgresql, I see this:

service_group_leaders 'ssh ubuntu@staging-permanent-peer-0.domain.tld' 'postgresql.staging'
Hostname:  ip-172-31-12-169
Alive:     false
Leader:    true
Departed:  true
MemberID:  9df128caafd34ad3873c3e4c08596b7a

service_group_members 'ssh ubuntu@staging-permanent-peer-0.domain.tld' 'postgresql.staging'
Hostname:  ip-172-31-6-32
Leader:    false
Departed:  false
MemberID:  572f4d4d34164be9a91d2b09f247ffb1
Hostname:  ip-172-31-13-204
Leader:    false
Departed:  false
MemberID:  68813a17f4044220b39685cf7a6c63f4
Hostname:  ip-172-31-15-72
Leader:    false
Departed:  false
MemberID:  f1b82aa4459e4cc4a16ea9bbef24af4e


Hello again! Thanks for your patience. I’ve done enough reading of the elections code now to have a decent general understanding of how it’s supposed to work, and I see some places where it may be going wrong. However, that code also has woefully little logging, which makes it hard to tell exactly what’s going wrong.

The thing that jumps out at me now is that (as you said) it’s likely a problem with membership, and that makes sense with the log message you posted in the first message:

postgresql.staging(SR): Waiting to execute hooks; election in progress, and we have no quorum.

In order for an election to occur, among all the nodes in the service group (that is, those that have added Service rumors) which are not Departed (i.e., are Alive, Suspect, or Confirmed), a majority must be Alive. I think the way we handle departing nodes needs to change, and I also think we need a mechanism for leaving a service group. These issues would explain why your election isn’t proceeding: it’s a quorum problem.
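As a minimal sketch of that rule (the counts here are illustrative, not read from the ring output above):

```shell
#!/bin/sh
# Illustrative sketch of the quorum rule described above: among the
# non-Departed members of a service group (Alive, Suspect, or Confirmed),
# a strict majority must be Alive before an election can proceed.
has_quorum() {
  alive=$1
  non_departed=$2
  # strict majority: alive > non_departed / 2
  [ $((alive * 2)) -gt "$non_departed" ]
}

has_quorum 3 4 && echo "quorum: election can proceed"   # 3 of 4 alive
has_quorum 2 4 || echo "no quorum: election stalls"     # 2 of 4 alive
```

So if the dead primary was never actually Departed from the group, it still counts in the denominator, which is exactly the kind of situation that can keep an election from starting.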

I also looked into whether we have an issue with the suitability hook. Originally, the fact that it wasn’t returning distinct values for different nodes seemed suspicious. And while that still seems wrong, it shouldn’t cause an unbreakable tie, since we fall back to using the member ID. You pointed out this log message:

postgresql.staging hook[init]:(HK): Waiting for leader to become available before initializing

If the suitability hook depends on the service itself, but the service can’t run because it’s waiting on elections, that could certainly deadlock things. However, it looks here like the init hook hasn’t succeeded, so based on this code:

    pub fn suitability(&self) -> Option<u64> {
        if !self.initialized {
            return None;
        }
        self.hooks.suitability.as_ref().and_then(|hook| {
            // ...

The service shouldn’t be initialized, and I don’t think the suitability hook should be getting called at all (again, more logging would help confirm). Based on the postgres init hook code, I’d expect it to exit with a status of 1.

In that case, there should be a log from this line containing “Initialization failed”. Do you see that? If so, suitability is not the issue and we probably just need to address the membership/quorum problems that are preventing the election from getting started.


I do indeed see multiple occurrences of Initialization failed in the logs we had saved from the broken Postgres ring:

null_resource.postgresql_services[1] (remote-exec): Oct 17 16:17:46 ip-172-31-5-152 hab[11107]: postgresql.staging(HK): Initialization failed! 'init' exited with status code 1


@bixu: have you tried running `hab sup depart` on the known-dead member IDs? We’ll definitely work on fixing up our membership issues, but this might suffice as a workaround in the meantime.

I’ll add more logging to the elections code to make these kinds of issues easier to diagnose in the future.


We did indeed write some code to handle departures. However, it didn’t seem to have the effect we wanted. Not that it had a bad effect, but the issues we’re debugging here were present even after the departure code was added.


A couple of thoughts about the Postgres plan:

Regarding the suitability hook, there’s definitely a clear bug in local_xlog_position: it should return an integer value even if the psql command is unsuccessful - probably 0.
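A hedged sketch of that fix (the `psql_query` helper is a hypothetical stand-in for the real psql invocation in the hook, e.g. `psql -t -qA -c "SELECT pg_last_xlog_replay_location();"`):

```shell
#!/bin/sh
# Sketch: local_xlog_position should always print a value, defaulting
# to 0 when the query fails, so the suitability hook never emits empty
# output. psql_query is a stub standing in for the real psql call.
psql_query() {
  false  # stub: simulate psql being unable to connect
}

local_xlog_position() {
  pos=$(psql_query 2>/dev/null)
  # fall back to 0 so the hook still produces a parseable value
  echo "${pos:-0}"
}

local_xlog_position  # prints "0" because the stubbed query fails
```

Converting the xlog position string into the integer the hook ultimately reports is left out here; the point is only the fallback behavior.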

I think it absolutely should still be based on the latest xlog position; this shouldn’t be changed. The idea is that you don’t accidentally elect a new leader that has older data - that can be disastrous. If two members have the same xlog position, they are arguably equally qualified to become the leader.

One thing we could do is add some number (1?) to the suitability value if the member is the current leader, making it less likely that you’d arbitrarily switch leaders in case of a topology change. Thoughts?
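A hypothetical sketch of that leader-stickiness idea (the function name and arguments are illustrative, not from the plan):

```shell
#!/bin/sh
# Sketch: add 1 to the suitability value when this member is already the
# leader, so equal xlog positions don't cause an arbitrary leader switch
# during a topology change.
suitability_with_leader_bonus() {
  base=$1       # suitability derived from the xlog position
  is_leader=$2  # 1 if this member is the current leader, else 0
  echo $((base + is_leader))
}

suitability_with_leader_bonus 42 1  # prints "43"
suitability_with_leader_bonus 42 0  # prints "42"
```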

Regarding the init hook bombing out if a leader isn’t ready here, we may need to modify this behavior for an already established cluster - I’m not sure. While it makes sense during initial cluster setup (keep retrying the follower setup until the leader is ready), it’s clearly impeding re-election. I’m open to ideas on what we can do here.


pg_controldata could be used instead of a psql connection - it can report WAL location even if the server is down.

There actually should be some checks for the systemid AND timeline in there somewhere as well.
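A hedged sketch of what parsing pg_controldata might look like (the PGDATA path is an assumption; field names match pg_controldata’s standard output):

```shell
#!/bin/sh
# Sketch: read the WAL location, system identifier, and timeline from
# pg_controldata output, which is available even when the server is down.
PGDATA="${PGDATA:-/hab/svc/postgresql/data}"

controldata_field() {
  # print the value of the named pg_controldata field
  pg_controldata "$PGDATA" | awk -F': *' -v field="$1" '$1 == field {print $2}'
}

wal_location=$(controldata_field "Latest checkpoint location")
system_id=$(controldata_field "Database system identifier")
timeline=$(controldata_field "Latest checkpoint's TimeLineID")
```

Comparing the system identifier guards against mixing members from different clusters, and the timeline check guards against electing a member that diverged after a previous failover.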


Great idea @jamessewell , pg_controldata is way better than depending on PG to be up!

Would you be interested in pairing up on implementing these checks? It seems like you have quite a bit of expertise on this topic!


I don’t think you’d ever change leaders as long as the existing leader continues to be Alive, but I’ll confirm.

That one I need to give some more thought to. I believe leader election should only require that the service is loaded (not that the init hook has completed). This may mean the suitability hook gives an error, but that can be dealt with.


I really don’t agree with this approach - leader election should only use instances which are passing monitoring checks (although I know this is hard at the moment, as monitoring isn’t first class).


Sure, happy to help.

This diagram from Patroni is obviously well beyond what we can currently do in Habitat - but it’s a good reference for PostgreSQL cluster best practice.

Patroni core loop diagram

bodymindarts (he’s not active in Habitat anymore) originally wrote the PostgreSQL core plan (very optimistically) as a drop-in replacement for Patroni.

Lack of application monitoring integration and lack of a strongly consistent ring mean that we won’t reach parity with Patroni any time soon - but we can get a lot closer!


There is definitely a bug here that https://github.com/habitat-sh/habitat/pull/5859 should address at least part of. I’ll post again when the next release (planned for the week of Nov 26) comes out.


@baumanj, nice work on the fix in 0.69.0. I’m still testing, but it seems that the bad behavior we were seeing is now corrected: a dead leader now causes an election restart instead of a hang. Thanks!


Thanks for bringing the issue to our attention and documenting your experience so well, @bixu!