How to run Postgres with Habitat

Hello,

I was following along with this post on how to run a PGSql cluster with Habitat. Having run psql clusters in the past, it really was a pleasure to be able to install and run a cluster in a matter of minutes (even with the issues I ran into!). However, things don’t appear to work as advertised and I ran into a few issues…

The first thing I ran into was a permission issue. I’m sure this is likely due to our umask 027 setting, but changing to a umask 022 is simply a non-starter and doesn’t really address the problem.

On my thee nodes I ran:

hab start core/postgresql --topology leader --group production
hab start core/postgresql --topology leader --group production --peer 10.10.10.10

Once I ran the start on the 3rd node, I was flooded with:

postgresql.production(SR): Initializing
postgresql.production(HK): Hook failed to run, init, hab-sup(ER)[components/sup/src/error.rs:500:9]: Permission denied (os error 13)
postgresql.production(SR): Initializing
postgresql.production(HK): Hook failed to run, init, hab-sup(ER)[components/sup/src/error.rs:500:9]: Permission denied (os error 13)
postgresql.production(SR): Initializing
postgresql.production(HK): Hook failed to run, init, hab-sup(ER)[components/sup/src/error.rs:500:9]: Permission denied (os error 13)

So my first question is, what the heck is being denied? As far as I can tell, there’s no way to coax the actual error message out of Habitat, and running the init script as root, seems to work ok-ish:

[root@psql-2 ~]# bash /hab/svc/postgresql/hooks/init
Ensuring proper ownership: chown -RL hab:hab /hab/svc/postgresql/var /hab/svc/postgresql/data
 Database does not exist, creating with 'initdb'
initdb: cannot be run as root
Please log in (using, e.g., "su") as the (unprivileged) user that will
own the server process.

As there is no postgres user, I’m unable to run the init hook as the postgres user.

Changing ownership of the /hab directory allowed postgres to start:

[root@psql-2 ~]# chown -R hab:hab /hab
[root@psql-2 ~]# hab start core/postgresql --topology leader --group production --peer 10.10.10.10
hab-sup(MR): Supervisor Member-ID 617b7bdd134048d99ddac41d543a821c
hab-sup(MR): Starting core/postgresql
postgresql.production(UCW): Watching user.toml
hab-sup(MR): Starting gossip-listener on 0.0.0.0:9638
hab-sup(MR): Starting http-gateway on 0.0.0.0:9631
postgresql.production(HK): Hooks compiled
postgresql.production(SR): Executing hooks; 617b7bdd134048d99ddac41d543a821c is the leader
postgresql.production(SR): Initializing
postgresql.production hook[init]:(HK): Ensuring proper ownership: chgrp -RL 1001 /hab/svc/postgresql/var /hab/svc/postgresql/data
postgresql.production hook[init]:(HK):  Database does not exist, creating with 'initdb'
postgresql.production hook[init]:(HK): could not change directory to "/root": Permission denied
postgresql.production hook[init]:(HK): The files belonging to this database system will be owned by user "hab".
postgresql.production hook[init]:(HK): This user must also own the server process.
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK): The database cluster will be initialized with locale "C".
postgresql.production hook[init]:(HK): The default text search configuration will be set to "english".
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK): Data page checksums are enabled.
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK): fixing permissions on existing directory /hab/svc/postgresql/data/pgdata ... ok
postgresql.production hook[init]:(HK): creating subdirectories ... ok
postgresql.production hook[init]:(HK): selecting default max_connections ... 100
postgresql.production hook[init]:(HK): selecting default shared_buffers ... 128MB
postgresql.production hook[init]:(HK): selecting dynamic shared memory implementation ... posix
postgresql.production hook[init]:(HK): creating configuration files ... ok
postgresql.production hook[init]:(HK): running bootstrap script ... ok
postgresql.production hook[init]:(HK): performing post-bootstrap initialization ... ok
postgresql.production hook[init]:(HK): syncing data to disk ... ok
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK): WARNING: enabling "trust" authentication for local connections
postgresql.production hook[init]:(HK): You can change this by editing pg_hba.conf or using the option -A, or
postgresql.production hook[init]:(HK): --auth-local and --auth-host, the next time you run initdb.
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK): Success. You can now start the database server using:
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK):     pg_ctl -D /hab/svc/postgresql/data/pgdata -l logfile start
postgresql.production hook[init]:(HK):
postgresql.production(SV): Starting service as user=hab, group=hab
postgresql.production(O): Executing run hook
postgresql.production(O): Writing postgresql.local.conf file based on memory settings
postgresql.production(O): Ensuring proper ownership: chgrp -RL 1001 /hab/svc/postgresql/var /hab/svc/postgresql/data
postgresql.production hook[health_check]:(HK): could not change directory to "/root": Permission denied
postgresql.production(O): Starting PostgreSQL
postgresql.production(O): LOG:  00000: could not change directory to "/root": Permission denied
postgresql.production(O): LOCATION:  resolve_symlinks, exec.c:293
postgresql.production(O): 2018-04-25 01:38:26 GMT [3293]: [1-1] user=,db=,client=  (0:00000)LOG:  redirecting log output to logging collector process
postgresql.production(O): 2018-04-25 01:38:26 GMT [3293]: [2-1] user=,db=,client=  (0:00000)HINT:  Future log output will appear in directory "/hab/svc/postgresql/var/pg_log".
^Chab-launch(SV): Supervisor process killed by signal 2; shutting everything down now
postgresql.production(SV): Stopping...
postgresql.production(SV): Shutdown OK: Graceful Termination

But this just feels wrong… I’ve not had to chown hab:hab /hab for other services and in fact it has caused trouble with other services…

The second issue I ran into is there appears to be no resiliency in running a psql cluster with habitat. I don’t really have any good logs to capture this, but once I was able to get the cluster running, the first thing I wanted to test is “what happens if we lose a node”, and it seems that when one of the nodes goes away, habitat stops the other two running psql nodes. How do I tell habitat “do everything you can to keep this running, don’t just throw your hands up in the air and drop your processes on the floor”?

Perhaps my understanding of how the leader/follower is supposed to work?

Finally, based on the above linked blog post, we setup our rails app to connect to the database as admin/admin to the admin database. But this DB doesn’t seem to be created:

[root@psql-2 ~]# psql -h localhost -U admin
psql: FATAL:  database "admin" does not exist
[root@psql-2 ~]# psql -h localhost -u admin
psql: invalid option -- 'u'
Try "psql --help" for more information.
[root@psql-2 ~]# psql -h localhost -U psql
psql: FATAL:  role "psql" does not exist
[root@psql-2 ~]# psql -h localhost --username=postgres
psql: FATAL:  role "postgres" does not exist
[root@psql-2 ~]# sudo -u postgres
sudo: unknown user: postgres
sudo: unable to initialize policy plugin
[root@psql-2 ~]# psql
psql: FATAL:  role "root" does not exist
[root@psql-2 ~]# psql -h localhost --username=admin
psql: FATAL:  database "admin" does not exist
[root@psql-2 ~]# psql -d postgres
psql: FATAL:  role "root" does not exist
[root@psql-2 ~]# sudo -u hab psql
could not change directory to "/root": Permission denied
psql: FATAL:  role "hab" does not exist
[root@psql-2 ~]# psql -h localhost -U postgres
psql: FATAL:  role "postgres" does not exist
[root@psql-2 ~]# psql -h localhost -U admin
psql: FATAL:  database "admin" does not exist
[root@psql-2 ~]# createdb
createdb: could not connect to database template1: FATAL:  role "root" does not exist
[root@psql-2 ~]# createdb -U admin admin
[root@psql-2 ~]# psql -h localhost -U admin
psql (9.6.8)
Type "help" for help.
admin=# \l
                         List of databases
   Name    | Owner | Encoding | Collate | Ctype | Access privileges
-----------+-------+----------+---------+-------+-------------------
 admin     | admin | UTF8     | C       | C     |
 postgres  | admin | UTF8     | C       | C     |
 root      | admin | UTF8     | C       | C     |
 template0 | admin | UTF8     | C       | C     | =c/admin         +
           |       |          |         |       | admin=CTc/admin
 template1 | admin | UTF8     | C       | C     | =c/admin         +
           |       |          |         |       | admin=CTc/admin
(5 rows)

But the init hook seems to try to create the postgres db?

if [[ ! -f "/hab/svc/postgresql/data/pgdata/PG_VERSION" ]]; then
  echo " Database does not exist, creating with 'initdb'"
    initdb -U admin \
    -E utf8 \
    -D /hab/svc/postgresql/data/pgdata \
    --pwfile /hab/svc/postgresql/config/pwfile \
    --locale POSIX \
    --data-checksums
fi

Ohhhh!!! Maybe I need a psql -h localhost -U admin postgres, doh! relevant doc

But I still the think the blog post is mistaken as that admin database isn’t created by default.

:part_alternation_mark: @qubitrenegade

Hi!
There is certainly a lot of stuff to talk through here. I'll do my best to answer as many of your questions as I can. For starters though, can you tell me what version of habitat you're using?

Changing ownership of the /hab directory allowed postgres to start

Yeah this is definitely not something you should have to do. I'm also curious about what version of postgresql you're using. As an aside if you ever want any extra information about a service at runtime you can always pass export RUST_LOG=debug which should give you a bit more useful output. I know that recently the postgresql package got updated to allow for non-root running https://github.com/habitat-sh/core-plans/pull/980 theoretically this change should have affected all versions of postgres.

Finally, based on the above linked blog post, we setup our rails app to connect to the database as admin/admin to the admin database. But this DB doesn’t seem to be created

I'd suggest for this it might be worth taking a look at the postgresql readme it might help to clarify a bit. But the gist of it is: superuser access is exposed to consumers when binding. Meaning whatever service needs to consume this database. With the current pattern we advise that any required databases, schemas or roles be created in the init hook of that consuming service. The created roles (with restricted access) should then be exposed to the application code that will get run. I haven't actually totally read through that blog post but there have been some major changes to that package since that blog post was written so its possible that these things are changes in the intended behavior of the postgres package itself.

Whenever I see the permissions mentioned, I have to bring up this ticket I opened because I had a restrictive umask that persisted into the hab packages.
https://github.com/habitat-sh/habitat/issues/4625

Yeah @mattray brings up a good point, I don’t think it would necessarily apply to what youre seeing here. But worth remembering that habitat doesn’t manipulate your file metadata or settings. So if you’re laying down a file with a chmod of 0600 we don’t do anything to change that, it’s left up to the users to set their plans up in a way that those file permissions are corrected or for them to update those file permissions before they build.

Well, I’m not really sure it’s the right solution. but removing the /hab directory then setting umask for root to 022 allowed me to run it. I see now that the README on the core/postgresql plan has been updated a bit, so I’ll have another go with the newest packages.