I have a Python script that runs on the command line, and the repository includes Python unit tests for the script's functions. The repository lives on GitLab, and my GitLab runner is hosted on an OpenStack cloud Linux instance with 16 GB RAM / 100 GB HDD / 8 CPUs.
Now, when the test job runs on the runner, a small Bash script downloads two files, 4 GB and 1.2 GB in size. Every time a merge request (MR) is created, the GitLab CI pipeline downloads these two files again, which makes the job take much longer to finish. My idea was to cache the two files, so that whenever I (or anyone else) push code and open an MR, the pipeline uses the cached copies instead of downloading them again.
Here is my basic project structure and .gitlab-ci.yml configuration:
.
└── myproject/
    ├── .githooks/
    ├── fixture_data/
    │   ├── fixture_data_1.txt    # 4 GB
    │   └── fixture_data_2.txt    # 1.2 GB
    ├── src/
    │   └── main_script.py
    ├── tests/
    │   └── test_main_script.py
    ├── download_fixture_data.sh
    ├── .gitlab-ci.yml
    ├── .gitignore
    └── Dockerfile
And the .gitlab-ci.yml:
stages:
  - build
  - test

cache:
  paths:
    - fixture_data/fixture_data_1.txt
    - fixture_data/fixture_data_2.txt

build:
  stage: build
  image: docker:23.0.6
  tags:
    - openstack-autoscale
    - autoscale-docker-in-docker
  services:
    - name: docker:23.0.6-dind
  script:
    - make build-docker
  rules:
    - if: $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == $CI_DEFAULT_BRANCH
      exists:
        - Dockerfile
    - changes:
        - requirements.txt
        - Dockerfile

test:
  image: $DOCKER_IMAGE_NAME
  stage: test
  tags:
    - openstack-autoscale
    - autoscale-docker-in-docker
  before_script:
    - ./download_fixture_data.sh
  cache:
    key: ${{ path }}   # Inherit cache key from the root of the pipeline
    paths:
      - fixture_data/fixture_data_1.txt
      - fixture_data/fixture_data_2.txt
  script:
    - docker run my_image pip install -r requirements.txt
    - docker run my_image python -m unittest
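I am not sure the ${{ path }} key syntax above is even valid; from my reading of the GitLab CI docs, a cache key is usually either a predefined variable or derived from file contents, along these lines (a sketch based on the docs, not something I have verified in my setup):

cache:
  # one cache per branch, via a predefined CI variable
  key: $CI_COMMIT_REF_SLUG
  # or: derive the key from file contents, so the cache only
  # changes when the download script itself changes
  # key:
  #   files:
  #     - download_fixture_data.sh
  paths:
    - fixture_data/

But I don't know whether the key syntax is the actual problem here.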
And download_fixture_data.sh:
#!/bin/bash
# -p so the script doesn't fail if the directory already exists
mkdir -p fixture_data
curl -o fixture_data/fixture_data_1.txt https://some-url-to-download-fixture_data_1.txt
curl -o fixture_data/fixture_data_2.txt https://some-url-to-download-fixture_data_2.txt
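One thing that occurred to me: since before_script runs this script unconditionally, the files would be re-downloaded even when the cache is restored. A guarded version I'm considering (an untested sketch) would skip the downloads when the files already exist:

#!/bin/bash
# Untested sketch: skip the download when the files are already
# present (e.g. restored from the CI cache).
mkdir -p fixture_data
if [ ! -f fixture_data/fixture_data_1.txt ]; then
    curl -o fixture_data/fixture_data_1.txt https://some-url-to-download-fixture_data_1.txt
fi
if [ ! -f fixture_data/fixture_data_2.txt ]; then
    curl -o fixture_data/fixture_data_2.txt https://some-url-to-download-fixture_data_2.txt
fi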
With the above configuration the GitLab pipeline and jobs run fine, but the caching is not doing what it is supposed to do: the two files are not reused in the next MR pipeline run. I am not sure why it is not working, or which approach to follow for caching files in a GitLab CI pipeline. Three people, including me, work on this repo. If you could guide me, or point out what I am missing or doing wrong here, that would be great!