Hello,
I was following along with this post on how to run a PGSql cluster with Habitat. Having run psql clusters in the past, it really was a pleasure to be able to install and run a cluster in a matter of minutes (even with the issues I ran into!). However, things don’t appear to work as advertised and I ran into a few issues…
The first thing I ran into was a permission issue. I’m sure this is likely due to our umask 027
setting, but changing to a umask 022
is simply a non-starter and doesn’t really address the problem.
On my thee nodes I ran:
hab start core/postgresql --topology leader --group production
hab start core/postgresql --topology leader --group production --peer 10.10.10.10
Once I ran the start on the 3rd node, I was flooded with:
postgresql.production(SR): Initializing
postgresql.production(HK): Hook failed to run, init, hab-sup(ER)[components/sup/src/error.rs:500:9]: Permission denied (os error 13)
postgresql.production(SR): Initializing
postgresql.production(HK): Hook failed to run, init, hab-sup(ER)[components/sup/src/error.rs:500:9]: Permission denied (os error 13)
postgresql.production(SR): Initializing
postgresql.production(HK): Hook failed to run, init, hab-sup(ER)[components/sup/src/error.rs:500:9]: Permission denied (os error 13)
So my first question is, what the heck is being denied? As far as I can tell, there’s no way to coax the actual error message out of Habitat, and running the init script as root, seems to work ok-ish:
[root@psql-2 ~]# bash /hab/svc/postgresql/hooks/init
Ensuring proper ownership: chown -RL hab:hab /hab/svc/postgresql/var /hab/svc/postgresql/data
Database does not exist, creating with 'initdb'
initdb: cannot be run as root
Please log in (using, e.g., "su") as the (unprivileged) user that will
own the server process.
As there is no postgres user, I’m unable to run the init hook as the postgres user.
Changing ownership of the /hab
directory allowed postgres to start:
[root@psql-2 ~]# chown -R hab:hab /hab
[root@psql-2 ~]# hab start core/postgresql --topology leader --group production --peer 10.10.10.10
hab-sup(MR): Supervisor Member-ID 617b7bdd134048d99ddac41d543a821c
hab-sup(MR): Starting core/postgresql
postgresql.production(UCW): Watching user.toml
hab-sup(MR): Starting gossip-listener on 0.0.0.0:9638
hab-sup(MR): Starting http-gateway on 0.0.0.0:9631
postgresql.production(HK): Hooks compiled
postgresql.production(SR): Executing hooks; 617b7bdd134048d99ddac41d543a821c is the leader
postgresql.production(SR): Initializing
postgresql.production hook[init]:(HK): Ensuring proper ownership: chgrp -RL 1001 /hab/svc/postgresql/var /hab/svc/postgresql/data
postgresql.production hook[init]:(HK): Database does not exist, creating with 'initdb'
postgresql.production hook[init]:(HK): could not change directory to "/root": Permission denied
postgresql.production hook[init]:(HK): The files belonging to this database system will be owned by user "hab".
postgresql.production hook[init]:(HK): This user must also own the server process.
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK): The database cluster will be initialized with locale "C".
postgresql.production hook[init]:(HK): The default text search configuration will be set to "english".
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK): Data page checksums are enabled.
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK): fixing permissions on existing directory /hab/svc/postgresql/data/pgdata ... ok
postgresql.production hook[init]:(HK): creating subdirectories ... ok
postgresql.production hook[init]:(HK): selecting default max_connections ... 100
postgresql.production hook[init]:(HK): selecting default shared_buffers ... 128MB
postgresql.production hook[init]:(HK): selecting dynamic shared memory implementation ... posix
postgresql.production hook[init]:(HK): creating configuration files ... ok
postgresql.production hook[init]:(HK): running bootstrap script ... ok
postgresql.production hook[init]:(HK): performing post-bootstrap initialization ... ok
postgresql.production hook[init]:(HK): syncing data to disk ... ok
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK): WARNING: enabling "trust" authentication for local connections
postgresql.production hook[init]:(HK): You can change this by editing pg_hba.conf or using the option -A, or
postgresql.production hook[init]:(HK): --auth-local and --auth-host, the next time you run initdb.
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK): Success. You can now start the database server using:
postgresql.production hook[init]:(HK):
postgresql.production hook[init]:(HK): pg_ctl -D /hab/svc/postgresql/data/pgdata -l logfile start
postgresql.production hook[init]:(HK):
postgresql.production(SV): Starting service as user=hab, group=hab
postgresql.production(O): Executing run hook
postgresql.production(O): Writing postgresql.local.conf file based on memory settings
postgresql.production(O): Ensuring proper ownership: chgrp -RL 1001 /hab/svc/postgresql/var /hab/svc/postgresql/data
postgresql.production hook[health_check]:(HK): could not change directory to "/root": Permission denied
postgresql.production(O): Starting PostgreSQL
postgresql.production(O): LOG: 00000: could not change directory to "/root": Permission denied
postgresql.production(O): LOCATION: resolve_symlinks, exec.c:293
postgresql.production(O): 2018-04-25 01:38:26 GMT [3293]: [1-1] user=,db=,client= (0:00000)LOG: redirecting log output to logging collector process
postgresql.production(O): 2018-04-25 01:38:26 GMT [3293]: [2-1] user=,db=,client= (0:00000)HINT: Future log output will appear in directory "/hab/svc/postgresql/var/pg_log".
^Chab-launch(SV): Supervisor process killed by signal 2; shutting everything down now
postgresql.production(SV): Stopping...
postgresql.production(SV): Shutdown OK: Graceful Termination
But this just feels wrong… I’ve not had to chown hab:hab /hab
for other services and in fact it has caused trouble with other services…
The second issue I ran into is there appears to be no resiliency in running a psql cluster with habitat. I don’t really have any good logs to capture this, but once I was able to get the cluster running, the first thing I wanted to test is “what happens if we lose a node”, and it seems that when one of the nodes goes away, habitat stops the other two running psql nodes. How do I tell habitat “do everything you can to keep this running, don’t just throw your hands up in the air and drop your processes on the floor”?
Perhaps my understanding of how the leader/follower is supposed to work?
Finally, based on the above linked blog post, we setup our rails app to connect to the database as admin/admin to the admin database. But this DB doesn’t seem to be created:
[root@psql-2 ~]# psql -h localhost -U admin
psql: FATAL: database "admin" does not exist
[root@psql-2 ~]# psql -h localhost -u admin
psql: invalid option -- 'u'
Try "psql --help" for more information.
[root@psql-2 ~]# psql -h localhost -U psql
psql: FATAL: role "psql" does not exist
[root@psql-2 ~]# psql -h localhost --username=postgres
psql: FATAL: role "postgres" does not exist
[root@psql-2 ~]# sudo -u postgres
sudo: unknown user: postgres
sudo: unable to initialize policy plugin
[root@psql-2 ~]# psql
psql: FATAL: role "root" does not exist
[root@psql-2 ~]# psql -h localhost --username=admin
psql: FATAL: database "admin" does not exist
[root@psql-2 ~]# psql -d postgres
psql: FATAL: role "root" does not exist
[root@psql-2 ~]# sudo -u hab psql
could not change directory to "/root": Permission denied
psql: FATAL: role "hab" does not exist
[root@psql-2 ~]# psql -h localhost -U postgres
psql: FATAL: role "postgres" does not exist
[root@psql-2 ~]# psql -h localhost -U admin
psql: FATAL: database "admin" does not exist
[root@psql-2 ~]# createdb
createdb: could not connect to database template1: FATAL: role "root" does not exist
[root@psql-2 ~]# createdb -U admin admin
[root@psql-2 ~]# psql -h localhost -U admin
psql (9.6.8)
Type "help" for help.
admin=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+-------+----------+---------+-------+-------------------
admin | admin | UTF8 | C | C |
postgres | admin | UTF8 | C | C |
root | admin | UTF8 | C | C |
template0 | admin | UTF8 | C | C | =c/admin +
| | | | | admin=CTc/admin
template1 | admin | UTF8 | C | C | =c/admin +
| | | | | admin=CTc/admin
(5 rows)
But the init hook seems to try to create the postgres db?
if [[ ! -f "/hab/svc/postgresql/data/pgdata/PG_VERSION" ]]; then
echo " Database does not exist, creating with 'initdb'"
initdb -U admin \
-E utf8 \
-D /hab/svc/postgresql/data/pgdata \
--pwfile /hab/svc/postgresql/config/pwfile \
--locale POSIX \
--data-checksums
fi
Ohhhh!!! Maybe I need a psql -h localhost -U admin postgres
, doh! relevant doc
But I still the think the blog post is mistaken as that admin database isn’t created by default.