This should rarely be needed. However, there are times when the VMs end up in an unrecoverable or otherwise undesired state, and the best solution is to rebuild them.
This document targets the Live environment; the steps for Acceptance are similar.
Set up the Terraform environment following the instructions here: https://github.com/habitat-sh/cloud-environments
This doc assumes the setup instructions above are up-to-date, and you have been able to successfully do a
- Set up a maintenance window in status.io since active builds may be impacted
- Change directory to the
- In the default.tf file, change the jobsrv_worker_count value to 0
jobsrv_worker_count = 0
- Run a terraform plan. This should show that the workers (and related networks) will be deleted. Double and triple check that no other services are being deleted or changed (one exception is the aws_s3_bucket, which seems to always want to update itself).
- If all looks good, run a
- After all the instances are deleted, go back and change the jobsrv_worker_count back to 50 (or whatever the original value was).
- Repeat the
- Once all the worker instances are re-created, ssh into the builder-datastore node and re-run the apply_config.sh script (this ensures that all the key files are properly sent over to the workers).
- Update the maintenance window to complete.
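The "double and triple check" of the plan in the steps above can be partly automated. The following is a minimal sketch, not part of the original procedure. It assumes a Terraform version that can emit the plan as JSON (0.12 or later, via terraform plan -out=worker.tfplan followed by terraform show -json worker.tfplan > plan.json; both file names are hypothetical). It also assumes that every resource expected to change has "worker" or "aws_s3_bucket" somewhere in its address, which you should verify against your own plan output.

```python
import json

# Assumption: addresses of resources that are expected to change contain one
# of these substrings. "worker" matches e.g. null_resource.worker_studio_network;
# "aws_s3_bucket" is the exception mentioned in the steps above.
EXPECTED_SUBSTRINGS = ("worker", "aws_s3_bucket")


def unexpected_changes(plan, expected=EXPECTED_SUBSTRINGS):
    """Return (address, actions) pairs for changes outside the expected set."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # "no-op" and "read" do not modify infrastructure, so skip them.
        if all(a in ("no-op", "read") for a in actions):
            continue
        address = rc.get("address", "")
        if not any(s in address for s in expected):
            flagged.append((address, actions))
    return flagged


def load_plan(path):
    """Load the JSON document written by `terraform show -json`."""
    with open(path) as fh:
        return json.load(fh)
```

Calling unexpected_changes(load_plan("plan.json")) returns an empty list when only the expected resources change. Anything it returns deserves a manual look before proceeding. The check supplements reading the plan yourself; it does not replace it.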
If you see the following error during worker creation, you can ignore it; it is output from an attempt to clean up networks that don’t actually exist.
module.builder_environment.module.builder.null_resource.worker_studio_network (remote-exec): error: Invalid value for '--ns-dir <NS_DIR>': directory '/hab/svc/builder-worker/data/network/airlock-ns' cannot be found, must exist
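Because this error is safe to ignore, it can hide other errors in a long run of output. The sketch below is one hypothetical way to separate the known-benign message from anything else, assuming the command output was captured to a text file. Matching on the word "error" is deliberately crude and may flag harmless lines.

```python
# Substring of the known-benign message quoted above.
BENIGN_MARKER = (
    "directory '/hab/svc/builder-worker/data/network/airlock-ns' cannot be found"
)


def split_errors(log_lines):
    """Split lines mentioning an error into (benign, needs_attention)."""
    benign, needs_attention = [], []
    for line in log_lines:
        if "error" not in line.lower():
            continue  # not an error line at all
        if BENIGN_MARKER in line:
            benign.append(line)
        else:
            needs_attention.append(line)
    return benign, needs_attention
```

For example, benign, other = split_errors(open("apply.log")) (the file name is an assumption) leaves in other only the error lines that still need a human to read them.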