Builder build failing with unmounting error

builder

#1

I’m running into a really odd issue in Builder where my builds fail just before pkg upload with this error:

umount: can't unmount /hab/studios/src/dev/pts: Resource busy

I’ve tried setting RUST_LOG=debug via the secrets feature of Builder, but that’s not shown anything useful so far. This is blocking the build of one of our most important service packages, so any advice on troubleshooting would be greatly appreciated. We haven’t been able to reproduce the build failure in the macOS Docker Studio.

Here’s the deets about the package itself plus some sanitized log output from the build:

  • package: smartb-private/api
  • most recent failure timestamp: Monday, September 17, 2018 at 11:24:13 AM
  • tail of build log (edited to remove sensitive material):
hab-sup(AG): Supervisor stopping smartb/postgresql_testdb. See the Supervisor output for more details.
Supervisor stopping smartb/postgresql_testdb. See the Supervisor output for more details.
   api: Installing
   api: Writing configuration
find: ` ! -name *.sw? ! -name *~ ! -name *.bak': No such file or directory
   api: Writing service management scripts
   api: Using run hook /src/habitat/plans/api/hooks/run
   api: Building package metadata
   api: Generating blake2b hashes of all files in the package
∅ Enabling feature: IgnoreLocal
hab-launch(SV): Child for service 'postgresql_testdb.default' with PID 3902 exited with code exit code: 1
   api: Generating signed metadata FILES
∅ Enabling feature: IgnoreLocal
» Signing api_blake2bsums
☛ Signing api_blake2bsums with smartb-private-20171228134427 to create /hab/pkgs/smartb-private/api/0.1.0/20180917111750/FILES
★ Signed artifact /hab/pkgs/smartb-private/api/0.1.0/20180917111750/FILES.
   api: Creating manifest
   api: Generating package artifact
/hab/pkgs/core/tar/1.29/20170513213607/bin/tar: Removing leading `/' from member names
∅ Enabling feature: IgnoreLocal
» Signing /hab/cache/artifacts/.smartb-private-api-0.1.0-20180917111750-x86_64-linux.tar.xz
☛ Signing /hab/cache/artifacts/.smartb-private-api-0.1.0-20180917111750-x86_64-linux.tar.xz with smartb-private-20171228134427 to create /hab/cache/artifacts/smartb-private-api-0.1.0-20180917111750-x86_64-linux.hart
★ Signed artifact /hab/cache/artifacts/smartb-private-api-0.1.0-20180917111750-x86_64-linux.hart.
∅ Enabling feature: IgnoreLocal
'/hab/cache/artifacts/smartb-private-api-0.1.0-20180917111750-x86_64-linux.hart' -> '/src/results/smartb-private-api-0.1.0-20180917111750-x86_64-linux.hart'
   api: hab-plan-build cleanup
Using TensorFlow backend.
/hab/pkgs/smartb-private/api/0.1.0/20180917111750/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
   api: Packaged Keras version: 2.1.4
   api: Packaged uWSGI version: 2.0.15
   api: Packaged Celery version: 4.1.1
/hab/pkgs/smartb-private/api/0.1.0/20180917111750/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
   api: Packaged Tensorflow version: 1.4.1
   api: Processes using /dev/pts:
∅ Enabling feature: IgnoreLocal
/bin/sh/hab/pkgs/core/hab-studio/0.63.0/20180914030343/bin/hab-studio-ksmartb-privatebuildhabitat/plans/api/hab/pkgs/core/bash/4.3.42/20170513213519/bin/bash--login/hab/pkgs/core/bash/4.3.42/20170513213519/bin/bash--loginscript-c env -i 'TERM=xterm-256color' 'BUILDER_SLACK_URL=https://hooks.slack.com/services/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' 'OLDPWD=/' 'LC_ALL=POSIX' 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:' 'HAB_BLDR_URL=https://bldr.habitat.sh' 'PATH=/hab/bin:/hab/pkgs/core/hab-plan-build/0.63.0/20180914030231/bin:/hab/pkgs/core/diffutils/3.3/20170513214227/bin:/hab/pkgs/core/less/481/20170513213936/bin:/hab/pkgs/core/make/4.2.1/20170513214620/bin:/hab/pkgs/core/mg/20160118/20170513220846/bin:/hab/pkgs/core/util-linux/2.27.1/20170513214732/bin:/hab/pkgs/core/vim/8.0.0004/20170513220742/bin:/hab/pkgs/core/ncurses/6.0/20170513213009/bin:/hab/pkgs/core/acl/2.2.52/20170513213108/bin:/hab/pkgs/core/attr/2.4.47/20170513213059/bin:/hab/pkgs/core/bash/4.3.42/20170513213519/bin:/hab/pkgs/core/binutils/2.25.1/20170513201927/bin:/hab/pkgs/core/bzip2/1.0.6/20170513212938/bin:/hab/pkgs/core/coreutils/8.25/20170513213226/bin:/hab/pkgs/core/file/5.24/20170513201915/bin:/hab/pkgs/core/findutils/4.4.2/20170513214305/bin:/hab/pkgs/core/gawk/4.1.3/20170513213646/bin:/hab/pkgs/core/glibc/2.22/20170513201042/bin:/hab/pkgs/core/grep/2.22/20170513213444/bin:/hab/pkgs/core/gzip/1.6/20170513214605/bin:/hab/pkgs/core/hab/0.63.0/20180914025124/bin:/hab/pkgs/core/libcap/2.24/20170513213120/bin:/hab/pkgs/core/libidn/1.32/20170513215043/bin:/hab/pkgs/core/openssl/1.0.2l/20171014213633/bin:/hab/pkgs/core/pcre/8.38/20170513213423/bin:/hab/pkgs/core/rq/0.9.2/20170612005822/bin:/hab/pkgs/core/sed/4.2.2/20170513213123/bin:/hab/pkgs/core/tar/1.29/20170513213607/bin:/hab/pkgs/core/unzip/6.0/20180310001951/bin:/hab/pkgs/core/wget/1.19.1/20171024102323/bin:/hab/pkgs/core/xz/5.2.2/20170513214327/bin' 'BUILDER_SLACK_ALERTS_CHANNEL=alerts' 'PWD=/src' 'HAB_FEAT_IGNORE_LOCAL=true' 'HAB_BINLINK_DIR=/hab/bin' 'HAB_NONINTERACTIVE=true' 'HAB_BLDR_CHANNEL=bldr-1073044346252861440' 'DO_CHECK=true' 'BUILDER_SLACK_CHANNEL=info' 'SHLVL=1' 'CIRCONUS_AUTH_TOKEN=XXXX' 'HOME=/root' 'TERMINFO=/hab/pkgs/core/ncurses/6.0/20170513213009/share/terminfo' 'STUDIO_TYPE=default' 'HAB_AUTH_TOKEN=XXXX' '_=/hab/pkgs/core/coreutils/8.25/20170513213226/bin/env'  /hab/bin/build habitat/plans/api-e/src/results/logs/api.2018-09-17-111750.log/hab/pkgs/core/bash/4.3.42/20170513213519/bin/bash/hab/pkgs/core/hab-plan-build/0.63.0/20180914030231/bin/hab-plan-buildhabitat/plans/api/hab/pkgs/core/hab-launcher/8425/20180913224044/bin/hab-launchrun/hab/pkgs/core/hab-sup/0.63.0/20180914030447/bin/hab-suprun/src/habitat/plans/api/plan.sh: line 191: /dev/pts: Is a directory
   api: Attempting to forcibly detach processes from :
∅ Enabling feature: IgnoreLocal
Killed
kill: can't kill pid 3772: No such process
Warning: '/hab/pkgs/core/hab-studio/0.63.0/20180914030343/libexec/busybox kill 3772' failed with status 1
umount: can't unmount /hab/studios/src/dev/pts: Resource busy
An error occurred when unmounting filesystem '/hab/studios/src/dev/pts'
ERROR: Unmount of /hab/studios/src/dev/pts failed
Failed studio build for smartb-private/api, err=BuildEnvFile("/hab/svc/builder-worker/data/1073044347646443520/out/pre_build.env", Os { code: 2, kind: NotFound, message: "No such file or directory" })
builder_log_section::end::build_package
builder_log::end::1073044347646443520

#2

I’ll start taking a look at this. In the meantime, can your provide any further log output. The package being built and approximate time would also be helpful for tracking down more information in the backend.


#3

I’ll edit the original post to clarify that stuff.


#4

I wonder if this could be a consequence of [studio] Fast-fail when unmounting on Studio cleanup. Previously we were more tolerant of umount errors. Perhaps it makes sense to have that be a settable behavior since it doesn’t help us to fail in the context of an ephemeral studio like builder if the build is otherwise OK.

It would still be nice to know what’s ultimately leading to the error, still.


#5

Unfortunately, I don’t think we currently expose a way to set DEBUG for jobs running in Builder (the HAB_STUDIO_SECRET_* trick doesn’t work, because the stuff we’re interested in here is outside the studio). That would really help clarify things :frowning:


#6

That sounds really likely to me. In our Plan we try to make use of fuser to debug this - you can see some of the relevant output in the log tail.

Here’s the debug setup we tried in the Plan:

do_end() {
  # test key modules
  build_line "Packaged Keras version: $(python -c 'import keras; print(keras.__version__)')"
  build_line "Packaged uWSGI version: $(uwsgi --version)"
  build_line "Packaged Celery version: $(python -c 'import celery; print(celery.__version__)')"
  build_line "Packaged Tensorflow version: $(python -c 'import tensorflow; print(tensorflow.__version__)')"

  # trace and clean up processes using pseudoterminal
  build_line "Processes using /dev/pts:"
  for proc in $(hab pkg exec core/busybox-static fuser -m /dev/pts)
  do
    cat /proc/$proc/cmdline
  done

  build_line "Attempting to forcibly detach processes from `/dev/pts`:"
  hab pkg exec core/busybox-static fuser -k -m /dev/pts
  return $?
}

do_after_success() {
  report_build_status "success"
  return $?
}

do_after_failure() {
  report_build_status "failure"
  return $?
}

#7

After a bit of in-person debugging, it seems as though there was some unanticipated interaction between running a Supervisor in a do_check() hook for testing that was causing this conflict.

For now, the recommendation is to avoid running a Supervisor in a do_check() hook. It was originally intended for less in-depth tests than integration tests on full packages.


#8

Hi, I just wanted to bump up this thread because I’ve been experiencing this issue on multiple Jenkins build servers. We do not implement the do_check callback in any of our packages.

This started happening after upgrading from hab 0.59 to 0.62.1. I’ve attached the stacktrace but I did not have debug enabled so it probably wont help much…

[vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ] $ hab studio build .
   hab-studio: Destroying Studio at /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ ()
After unmounting all known filesystems, there are still remaining mounted filesystems under /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ:
    * devtmpfs on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ--chef-base/dev type devtmpfs (rw,seclabel,nosuid,size=3986380k,nr_inodes=996595,mode=755)
    * devpts on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ--chef-base/dev/pts type devpts (rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000)
    * devtmpfs on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ@2--compliance/dev type devtmpfs (rw,seclabel,nosuid,size=3986380k,nr_inodes=996595,mode=755)
    * devpts on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ--chef-base/dev/pts type devpts (rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000)
    * devpts on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ@2--compliance/dev/pts type devpts (rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000)
    * devtmpfs on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ--cwlogs/dev type devtmpfs (rw,seclabel,nosuid,size=3986380k,nr_inodes=996595,mode=755)
    * devpts on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ@2--compliance/dev/pts type devpts (rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000)
    * devpts on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ--chef-base/dev/pts type devpts (rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000)
    * devtmpfs on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ@2--jenkins/dev type devtmpfs (rw,seclabel,nosuid,size=3986380k,nr_inodes=996595,mode=755)
    * devpts on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ--cwlogs/dev/pts type devpts (rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000)
    * devtmpfs on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ--nexus/dev type devtmpfs (rw,seclabel,nosuid,size=3986380k,nr_inodes=996595,mode=755)
    * devpts on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ--nexus/dev/pts type devpts (rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000)
    * devpts on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ@2--jenkins/dev/pts type devpts (rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000)
    * devpts on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ--cwlogs/dev/pts type devpts (rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000)
    * devpts on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ@2--compliance/dev/pts type devpts (rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000)
    * devpts on /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ--chef-base/dev/pts type devpts (rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000)
Unmount these remaining filesystem using `umount(8)'and retry the last command.
ERROR: Remaining mounted filesystems found under /hab/studios/hab--svc--jenkins-slave--data--workspace--workspace--vault-helper_master-EC74TH545SIUEWVXDQYDK2DYARLKKBRFSWMEJNXZ7QUFFFACAQHQ

#9

@fnichol might have some additional context here with the work he’s been doing with our internal build services


#10

@christophermaier, I’m removing the ‘solved’ checkmark from your workaround because I was able to dig down the root problem, which seems to be that either the Launcher or Supervisor (which we were using to run smartb/postgresql_testdb for our unit tests) were holding on to /dev/pts. I added these cleanup lines to our plan.sh and we are able to build again with the Builder secrete HAB_STUDIO_SECRET_DO_CHECK set to true:

  hab svc stop smartb/postgresql_testdb
  until ! hab pkg exec core/busybox-static pgrep postgres > /dev/null
  do
    sleep 1
  done
  rm -rf /hab/svc/postgresql_testdb/data/*
  hab pkg exec core/busybox-static pkill hab-launch
  until ! hab pkg exec core/busybox-static pgrep hab-launch > /dev/null
  do
    sleep 1
  done
  return $?

Backing away from the problem details a bit, I continue to believe that making tests a first-class citizen of the build process is extremely valuable. Doing this lets us encapsulate the entire application lifecycle in a single Plan. (I realized as I was doing another Habitat-related talk at All Systems Go this year that it’s the application lifecycle that Habitat manages, not just one revision of the running app.)

The fact that checks can’t be run in Builder without setting the secret I mention above is I think enough of a hurdle to prevent bullet-in-foot injuries by users new to Habitat. I’d love to continue to be able to run tests inside Builder builds when I want to.


#11

@bixu Thanks for the extra info… we’ll look into it.