When `hab svc load` doesn't actually start a process

I’m struggling right now with a case where, even though I load my service like

hab svc load pkg/ident <args>

and I get a “success” message, I don’t actually see the service running. I do see a .spec file for the service, but no processes for it and nothing in my journalctl logs (the Supervisor is bootstrapped by systemd in this case).

Setting RUST_LOG=debug isn’t showing anything especially interesting…

What kind of args? Application args or hab args?

What does hab sup status show?

Sometimes I feel like I see this when there is a template error and you get one log line in a sea of log output.

Habitat args in this case.

$ hab sup status | column -t
package                                          type        state  elapsed  (s)     pid                          group
smartb/api_celerybeat/0.0.1/20180911151413       standalone  up     6        19058   api_celerybeat.staging
smartb/redis/3.2.4/20180911151411                standalone  up     2297     14265   redis.staging
smartb/pgbouncer/1.7.2/20180911151412            standalone  up     2179     14727   pgbouncer.staging
smartb/journalbeat/5.6.0/20180502145432          standalone  up     2178     14734   journalbeat.staging
smartb/api_celeryd_default/0.0.1/20180911145435  standalone  up     6        19061   api_celeryd_default.staging
smartb/circonus-agent/0.13.0/20180810181314      standalone  up     2161     14901   circonus-agent.staging
smartb/api_celeryd_slow/0.0.1/20180911151413     standalone  up     6        19064   api_celeryd_slow.staging
smartb/nginx/1.13.10/20180911151414              standalone  down   2151     <none>  nginx.staging

The smartb/nginx service that is ‘down’ here is down because it needs to bind to the missing smartb-private/api service.

is smartb-private/api the one that has the spec file and doesn’t show up in the status output?

Yes, that’s the mystery service.

I may have found the issue. The content of the

BINDS_OPTIONAL

metadata file for the service is wrong:

$ cat /hab/pkgs/smartb-private/api/0.1.0/20180911145435/BINDS_OPTIONAL
irondb=port

There’s a lingering old binding there from a previous iteration of the package. And the binding for redis isn’t present. I’m going to fix that and push to see what happens.

Is there a place where this would get logged? I didn’t see such a log entry anywhere…

I don’t think we log anything for optional bindings since they aren’t required to start the service.

I meant something more like, if the service isn’t actually starting (though a .spec file is created, maybe we should log that?

Okay, the lingering bad binding doesn’t seem to be the explanation for the problem. Still digging…

We also have someone on the team seeing the same issue in 0.62.0 so we’re trying to chase it down on our end as well

Some new info. If I shut down the systemd-initialized Supervisor and do a hab sup run I see this:

# hab sup run smartb-private/api --group=staging --channel=unstable --topology=standalone --strategy=at-once --bind=redis:redis.staging
hab-sup(MR): Supervisor Member-ID 2ce8a23b5f7547b4be47787af9eb3cd6
hab-sup(AG): The smartb-private/api service was successfully loaded
hab-sup(MR): Starting smartb/nginx
hab-sup(CF): The user configuration location at /hab/svc/nginx/user.toml is deprecated, consider putting it in /hab/user/nginx/config/user.toml
nginx.staging(UCW): Not watching user.toml, because it is located in deprecated path (/hab/svc/nginx).
hab-sup(MR): Starting smartb/api_celeryd_default
hab-sup(CF): The user configuration location at /hab/svc/api_celeryd_default/user.toml is deprecated, consider putting it in /hab/user/api_celeryd_default/config/user.toml
api_celeryd_default.staging(UCW): Not watching user.toml, because it is located in deprecated path (/hab/svc/api_celeryd_default).
hab-sup(MR): Starting smartb/redis
hab-sup(CF): The user configuration location at /hab/svc/redis/user.toml is deprecated, consider putting it in /hab/user/redis/config/user.toml
redis.staging(UCW): Not watching user.toml, because it is located in deprecated path (/hab/svc/redis).
hab-sup(MR): Starting smartb-private/api
hab-sup(MR): Unable to start smartb-private/api, hab-sup(SS)[components/sup/src/manager/service/spec.rs:379:24]: Invalid bind(s), redis

This is the last time I see anything about smartb-private/api in the logs…

Do you think I can work around this by going back to 0.59.0? I’d be fine with that, just to get our Staging environment back.

Are you specifying a bind flag to redis?

You might get it working by falling back to 0.59.0 but it’s hard to say without knowing what the underlying issue is

Yes, in the code snippet above:

$ hab sup run smartb-private/api \
  --group=staging \
  --channel=unstable \
  --topology=standalone \
  --strategy=at-once \
  --bind=redis:redis.staging

Does that look wrong (edit: added line breaks for easy reading)?

I agree about digging into the underlying issue, but so far I’ve not been able to figure out how to get any useful info. Maybe shutdown down the systemd service and doing a hab sup run with Rust debug vars set? Any other ideas?

yeah that looks right - can you share a snippet from the plan where the binds are defined?

I think you mean this?

pkg_binds_optional=(
  [redis]="port"
)

That’s the plan.sh bit for smartb-private/api

yeah thanks. At this point you might try shutting down the sup and pushing back to 0.59.0 to see if it works. At least then you’re unblocked if it works or we can shift our focus if it doesn’t

I’m already back to 0.59.0 and still seeing the behavior.

Here’s the debug output (the package is already installed so I’m not setting HAB_AUTH_TOKEN here):

RUST_LOG=debug RUST_BACKTRACE=1 hab sup run smartb-private/api   --group=staging   --channel=unstable   --topology=standalone   --strategy=at-once   --bind=redis:redis.staging
DEBUG 2018-09-12T15:33:22Z: habitat_common::ui: UI { shell: Shell { input: InputStream { isatty: true }, out: OutputStream { coloring: Auto, isatty: true, is_colored(): true, supports_color(): true }, err: OutputStream { coloring: Auto, isatty: true, is_colored(): true, supports_color(): true } } }
DEBUG 2018-09-12T15:33:22Z: habitat_core::os::process::imp: Calling execvp(): ("/hab/pkgs/core/hab-launcher/8282/20180829165730/bin/hab-launch") ["run", "smartb-private/api", "--group=staging", "--channel=unstable", "--topology=standalone", "--strategy=at-once", "--bind=redis:redis.staging"]
DEBUG 2018-09-12T15:33:22Z: habitat_launcher::server: Checking Supervisor "/hab/pkgs/core/hab-sup/0.59.0/20180712161546/bin/hab-sup" version
DEBUG 2018-09-12T15:33:22Z: habitat_launcher::server: Checking Supervisor version '0.59.0' against requirement '>= 0.56'
DEBUG 2018-09-12T15:33:22Z: habitat_launcher::server: Starting Supervisor "/hab/pkgs/core/hab-sup/0.59.0/20180712161546/bin/hab-sup" with args ["run", "smartb-private/api", "--group=staging", "--channel=unstable", "--topology=standalone", "--strategy=at-once", "--bind=redis:redis.staging"], HAB_LAUNCHER_PIPE=/tmp/rust-ipc-socket.qsUE1i...
DEBUG 2018-09-12T15:33:22Z: habitat_launcher::server: Waiting on connect thread for 5 secs
DEBUG 2018-09-12T15:33:22Z: habitat_launcher::server: connect thread started
DEBUG 2018-09-12T15:33:22Z: habitat_launcher::server: Connect thread finished; notifying waiting thread
DEBUG 2018-09-12T15:33:22Z: habitat_sup::manager: Creating data directory: /hab/sup/default/data
DEBUG 2018-09-12T15:33:22Z: habitat_sup::manager: Creating specs directory: /hab/sup/default/specs
DEBUG 2018-09-12T15:33:22Z: habitat_sup::manager: Creating composites directory: /hab/sup/default/composites
DEBUG 2018-09-12T15:33:22Z: habitat_sup::manager: Cleaning cached health checks
hab-sup(MR): Supervisor Member-ID 55716f3fecfd4daa93c324e0cf7b3826
DEBUG 2018-09-12T15:33:22Z: habitat_sup::manager::spec_watcher: SpecWatcher(/hab/sup/default/specs) thread starting
DEBUG 2018-09-12T15:33:22Z: habitat_sup::manager::service::spec: Writing service spec to '/hab/sup/default/specs/api.spec': ServiceSpec { ident: PackageIdent { origin: "smartb-private", name: "api", version: None, release: None }, group: "staging", application_environment: None, bldr_url: "https://bldr.habitat.sh", channel: "unstable", topology: Standalone, update_strategy: AtOnce, binds: [ServiceBind { name: "redis", service_group: ServiceGroup("redis.staging"), service_name: None }], binding_mode: Strict, config_from: None, desired_state: Up, svc_encrypted_password: None, composite: None }
 WARN 2018-09-12T15:33:22Z: habitat_sup::ctl_gateway: Attempted to reply to a non-transactional message with ConsoleLine { line: "The smartb-private/api service was successfully loaded\n" }
hab-sup(AG): The smartb-private/api service was successfully loaded
 WARN 2018-09-12T15:33:22Z: habitat_sup::ctl_gateway: Attempted to reply to a non-transactional message with NetOk
DEBUG 2018-09-12T15:33:22Z: habitat_sup::manager::spec_watcher: Service spec for api is new, enqueuing AddService(ServiceSpec { ident: PackageIdent { origin: "smartb-private", name: "api", version: None, release: None }, group: "staging", application_environment: None, bldr_url: "https://bldr.habitat.sh", channel: "unstable", topology: Standalone, update_strategy: AtOnce, binds: [ServiceBind { name: "redis", service_group: ServiceGroup("redis.staging"), service_name: None }], binding_mode: Strict, config_from: None, desired_state: Up, svc_encrypted_password: None, composite: None }) event
hab-sup(MR): Starting smartb-private/api
hab-sup(MR): Unable to start smartb-private/api, hab-sup(SS)[components/sup/src/manager/service/spec.rs:370:24]: Invalid bind(s), redis

@christophermaier do you have any thoughts here?