Update!
After continuing to play around with this problem, I’ve discovered it might not be a dind problem but more of a TeamCity-hosted-on-docker problem. However, it is obviously possible to host TeamCity on docker, and many people do it, so there must be a solution.
What I know now:
- Regardless of whether I run the docker build on my dind container (discussed below), on the agent itself, on an external VM running a docker server, or by binding the docker.sock of the agent container to its host, I encounter the same problem as described below (see the endpoint sketch just after this list). The tl;dr: it fails when it tries to run `/bin/sh -c . /opt/buildagent/temp/agentTmp/docker-wrapper-5897200216492230985.sh && docker exec -w /opt/buildagent/work/33e31afd4a2c64f0 6ec5e71a142a711bf3bc97b2258dd89e153eaaf9b8a906439b31d728c887588e /bin/sh -c /opt/buildagent/temp/agentTmp/docker-shell-script-10444342258323450815.sh`, which results in this error being logged: `/bin/sh: /opt/buildagent/temp/agentTmp/docker-shell-script-10444342258323450815.sh: not found`. Upon further investigation it becomes apparent that `docker-shell-script-{id}.sh` is available on the build agent but is unable to find `custom_script{id2}`, which causes the `not found` error.
- The error is not caused by dind, since the same error happens even when not using dind.
- However, the problem does go away when I run a build that does not use docker at all.
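Concretely, these are the daemon endpoints I have pointed the agent at, and each one reproduces the identical failure (the VM hostname here is just a placeholder):

# Each DOCKER_HOST value below reproduces the same "not found" failure.
export DOCKER_HOST=tcp://teamcity-dind:2375       # the dind container from the compose file below
export DOCKER_HOST=tcp://build-vm:2375            # external VM running a docker server (placeholder name)
export DOCKER_HOST=unix:///var/run/docker.sock    # agent container with the host's docker.sock bound in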
######################## Original Post ########################
Goals
- Host TeamCity on docker. This includes the server, database, agents, and any other containers needed to support that effort.
- Isolate the builds from the docker host to avoid negative impacts from builds on the other applications running on the docker server. (Performance tuning will be done after it is working.)
- Use docker compose to deploy the system.
- Avoid manual customization after running `docker-compose up -d`. Everything should be handled in the docker compose file or in a bash script that can be run before running docker.
The Setup
- Docker Host
- Debian 12
- Docker version 20.10.24
TeamCity on Docker
- Summary – This is a standard TeamCity-on-docker setup except there is one additional container, `teamcity-dind`. This container runs the `nestybox/ubuntu-noble-systemd-docker` image with the sysbox runtime. This combination gives me a container that is much more like a VM than a normal container, as it provides systemd and dockerd inside the container. I expose port 2375 for the agents to connect to when they need to run docker workloads (the mounted override.conf that makes dockerd listen there is sketched below). The rest of the configuration is pretty standard. After setting up, everything was imported from the Windows installation of TeamCity that we want to replace with this solution.
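For reference, the mounted `dind-override.conf` is a standard systemd drop-in along these lines (a sketch; the dockerd path can differ by image). It clears the stock `ExecStart` and relaunches dockerd listening on unencrypted tcp 2375, which is also why `DOCKER_TLS_CERTDIR` is blanked in the compose file:

# Pre-deploy script step: write the drop-in that compose mounts into the dind container.
cat > dind-override.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
EOF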
The docker-compose.yaml:
services:
  teamcity-db:
    image: postgres:latest
    container_name: ${POSTGRES_HOST}
    restart: unless-stopped
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_DB=${POSTGRES_DB}
      - PGDATA=/var/lib/postgresql/data
    volumes:
      - ./buildserver_pgdata:/var/lib/postgresql/data
    networks:
      - teamcity_net
  teamcity:
    image: jetbrains/teamcity-server:${TEAMCITY_VERSION}
    container_name: teamcity
    volumes:
      - ./data_dir:/data/teamcity_server/datadir
      - ./teamcity-server-logs:/opt/teamcity/logs
      - ./ssh:/opt/teamcity/.ssh
    labels:
      # HTTPS routing for the dashboard
      - "traefik.enable=true"
      - "traefik.http.routers.teamcity.rule=Host(`teamcity.srom.local`)"
      - "traefik.http.routers.teamcity.entrypoints=web"
      - "traefik.http.routers.teamcity.service=teamcity"
      - "traefik.http.services.teamcity.loadbalancer.server.port=8111"
    depends_on:
      - teamcity-db
    networks:
      - teamcity_net
    restart: unless-stopped
  teamcity-dind: # This container runs the build containers in an isolated area using sysbox.
    image: nestybox/ubuntu-noble-systemd-docker:latest
    container_name: teamcity-dind
    runtime: sysbox-runc
    privileged: false
    environment:
      - DOCKER_TLS_CERTDIR=""
    volumes:
      - teamcity-dind-data:/data
      - teamcity-dind-cache:/var/lib/docker
      - ./dind-override.conf:/etc/systemd/system/docker.service.d/override.conf:ro
    networks:
      - teamcity_net
    restart: unless-stopped
  teamcity-agent-1:
    build:
      context: .
      dockerfile: Dockerfile.agent # sketched after this file
      args:
        - TEAMCITY_VERSION=${TEAMCITY_VERSION}
        - NODE_VERSION=${NODE_VERSION}
    container_name: teamcity-agent-1
    depends_on:
      - teamcity-dind
    volumes:
      - ./agents/agent-1/conf:/data/teamcity_agent/conf
    environment:
      - SERVER_URL=http://teamcity:8111
      - DOCKER_HOST=tcp://teamcity-dind:2375 # Point to the Docker daemon running in the DinD container
    networks:
      - teamcity_net
    restart: unless-stopped
  teamcity-agent-2:
    build:
      context: .
      dockerfile: Dockerfile.agent
      args:
        - TEAMCITY_VERSION=${TEAMCITY_VERSION}
        - NODE_VERSION=${NODE_VERSION}
    container_name: teamcity-agent-2
    depends_on:
      - teamcity-dind
    volumes:
      - ./agents/agent-2/conf:/data/teamcity_agent/conf
    environment:
      - SERVER_URL=http://teamcity:8111
      - DOCKER_HOST=tcp://teamcity-dind:2375 # Point to the Docker daemon running in the DinD container
    networks:
      - teamcity_net
    restart: unless-stopped
networks:
  teamcity_net:
    external: true
volumes:
  teamcity-dind-data:
  teamcity-dind-cache: # image/layer cache for the dind daemon (/var/lib/docker)
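`Dockerfile.agent` is not reproduced here; it is essentially the official agent image plus the Node version our builds need. A hypothetical sketch of it (the install steps in the real file differ, and this assumes curl and xz are available in the base image):

cat > Dockerfile.agent <<'EOF'
ARG TEAMCITY_VERSION
FROM jetbrains/teamcity-agent:${TEAMCITY_VERSION}
ARG NODE_VERSION
USER root
# Fetch and unpack the requested Node.js version into /usr/local.
RUN curl -fsSL "https://nodejs.org/dist/v${NODE_VERSION}/node-v${NODE_VERSION}-linux-x64.tar.xz" \
    | tar -xJ -C /usr/local --strip-components=1
USER buildagent
EOF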
Build Configuration
This build step is very simple: it just echoes ‘hello world’, but it is instructed to run inside the `alpine:latest` container.
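Stripped of TeamCity’s wrapper plumbing (visible in the log below), the step amounts to something like:

# Hand-rolled equivalent of the build step: run the echo inside alpine:latest.
docker run --rm alpine:latest /bin/sh -c "echo 'hello world'"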
Expected Results
The build step completes and echoes ‘hello world’. This works just fine if I execute it on the agent instead of passing it off to a container.
Actual Results
The build process fails:
Step 1: test (Command Line)
22:34:05 Running step within container alpine:latest
22:34:05 Starting: . /opt/buildagent/temp/agentTmp/docker-wrapper-1006525611025290252.sh && docker run --rm -w /opt/buildagent/work/33e31afd4a2c64f0 --label jetbrains.teamcity.buildId=2153 -id -v "/opt/buildagent/lib:/opt/buildagent/lib:ro" -v "/opt/buildagent/tools:/opt/buildagent/tools:ro" -v "/opt/buildagent/plugins:/opt/buildagent/plugins:ro" -v "/opt/buildagent/work/33e31afd4a2c64f0:/opt/buildagent/work/33e31afd4a2c64f0" -v "/opt/buildagent/temp/agentTmp:/opt/buildagent/temp/agentTmp" -v "/opt/buildagent/temp/buildTmp:/opt/buildagent/temp/buildTmp" -v "/opt/buildagent/system:/opt/buildagent/system" --env-file /opt/buildagent/temp/agentTmp/docker-wrapper-12799287116685116967.envList --entrypoint /bin/sh "alpine:latest"
22:34:06 Process exited with code 0
22:34:06 Successfully created a reusable container, container id = 704738e10303a5912b0b06d29f9d9b00e141b12a71c83949df69c5bc163483bf
22:34:06 Starting: /bin/sh -c . /opt/buildagent/temp/agentTmp/docker-wrapper-5770134226595135069.sh && docker exec -w /opt/buildagent/work/33e31afd4a2c64f0 704738e10303a5912b0b06d29f9d9b00e141b12a71c83949df69c5bc163483bf /bin/sh -c /opt/buildagent/temp/agentTmp/docker-shell-script-11090623290713555801.sh
22:34:06 in directory: /opt/buildagent/work/33e31afd4a2c64f0
22:34:06 /bin/sh: /opt/buildagent/temp/agentTmp/docker-shell-script-11090623290713555801.sh: not found
22:34:06 Process exited with code 127
22:34:06 Process exited with code 127 (Step: test (Command Line))
22:34:07 Step test (Command Line) failed
As I investigated I discovered that the agent did in fact have the file that was ‘not found’, `/opt/buildagent/temp/agentTmp/docker-shell-script-11090623290713555801.sh`, but that script tries to run another file, `/opt/buildagent/temp/agentTmp/custom_script17351432578670884465`, which is absent from the file system.
buildagent@c6a26dfd854d:/$ ls -la /opt/buildagent/temp/agentTmp/custom_script17351432578670884465
ls: cannot access '/opt/buildagent/temp/agentTmp/custom_script17351432578670884465': No such file or directory
This seems to be the cause of the failure. It appears this file is never passed to the agent by the server, unless it is somehow deleted before I can see it. When I investigate the dind container, I can see that the agent has been communicating with it, since the container image has been downloaded there.
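One check that should narrow this down (a sketch to run from inside the agent container, where `DOCKER_HOST` already points at the dind daemon): create a file in agentTmp locally, then see whether a container started with the same `-v` mapping TeamCity uses can see it. If it cannot, the bind mount is resolving against the daemon’s filesystem rather than the agent’s.

# On the agent: the file clearly exists locally.
touch /opt/buildagent/temp/agentTmp/probe.txt
ls -la /opt/buildagent/temp/agentTmp/probe.txt

# Through the remote daemon, using the same -v mapping TeamCity's wrapper uses.
docker run --rm \
  -v /opt/buildagent/temp/agentTmp:/opt/buildagent/temp/agentTmp \
  alpine:latest ls -la /opt/buildagent/temp/agentTmp/probe.txt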
Things I Plan to Try Next
- Set up an agent on a VM instead of in a container and see if that works. I’m guessing it will, but I don’t want that setup.
  - Tried this: the result is unexpectedly the same as when running in the dind container, so the problem is not the dind container itself.
- Figure out what this missing custom_script is supposed to be and why it isn’t being passed.
- Hope someone here has some insights.
Why even bother?
I think there is a lot of value in being able to run docker builds inside a more isolated sysbox container instead of having to spin up additional VMs or give the agents free rein over the host’s docker socket. I like being able to control the resource allocation so a bad build doesn’t bring down the other applications hosted on my docker server. It seems like a much better setup, but it has been resisting my attempts to make it work. This TeamCity instance will not see high traffic, or I would consider placing my build agents in their own VMs.