I am building an algorithmic trading app that runs Python scripts on Google Cloud VMs. I have two scripts:
- startup.sh: installs dependencies and runs my application on the first VM boot.
- resume.sh: resumes execution of the application when the VM is stopped and restarted.
I use a Google Cloud Function to create and configure the VM, passing metadata to determine if the startup script or the resume script should be executed.
The problem:
Even when I stop and restart my VM, it runs the startup script (startup.sh) instead of resume.sh. I want to use metadata or some other mechanism to differentiate between a fresh startup and a resumed VM so that:
- On fresh startup: the VM runs startup.sh to set up dependencies and start my app.
- On VM resume (after stop/start): the VM runs resume.sh, which resumes execution of my app without reinstalling dependencies.
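Compute Engine runs the startup-script metadata value on every boot, not only the first one, so the choice between the two scripts has to be made inside the VM. A minimal sketch of the behaviour described above, assuming a marker file records that first-boot setup has finished. The paths and function names are hypothetical and not part of the setup shown in this post:

```python
import os

# Hypothetical locations; MARKER_FILE must be the same file that startup.sh
# touches once dependencies are installed.
MARKER_FILE = "/opt/algotapp/program/.first_start_done"
STARTUP_SCRIPT = "/opt/algotapp/program/cmd/startup.sh"
RESUME_SCRIPT = "/opt/algotapp/program/cmd/resume.sh"


def choose_script(marker_file=MARKER_FILE):
    """Return resume.sh if the first-boot marker exists, otherwise startup.sh."""
    if os.path.exists(marker_file):
        return RESUME_SCRIPT
    return STARTUP_SCRIPT


def dispatch(marker_file=MARKER_FILE):
    """Replace the current process with bash running the selected script."""
    script = choose_script(marker_file)
    os.execv("/bin/bash", ["/bin/bash", script])
```

With something like this, the startup-script metadata value would only need to call dispatch() on each boot. The sketch does not write the marker itself, because startup.sh already creates one after installing dependencies.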
Current setup:
Google Cloud Function: Creates the VM and passes metadata for session configurations. The metadata is used to decide if the strategy server should start.
import googleapiclient.discovery

# PROJECT, ZONE, and base64_encoded_config are assumed to be defined elsewhere.
def create_vm():
    compute = googleapiclient.discovery.build('compute', 'v1')
    vm_metadata = {
        'items': [
            {'key': 'session_configs', 'value': base64_encoded_config},
            # Flag intended to tell the VM whether to run resume.sh
            {'key': 'run_resume_script', 'value': 'false'},
        ]
    }
    instance_body = {
        'name': 'algotapp-vm',
        'machineType': 'zones/us-central1-a/machineTypes/n1-standard-1',
        'metadata': vm_metadata,
        'disks': [{
            # Disk config here
        }],
        'networkInterfaces': [{
            # Network config here
        }],
        'tags': {'items': ['http-server']},
        'serviceAccounts': [{
            # Permissions config here
        }],
    }
    return compute.instances().insert(
        project=PROJECT,
        zone=ZONE,
        body=instance_body
    ).execute()
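Instance metadata persists across stop/start, so a run_resume_script flag set to 'false' at creation stays 'false' unless something changes it. One possible approach, sketched here under assumptions, is to have the Cloud Function update the flag through the Compute API's instances().setMetadata call before the VM is started again. The helper names are hypothetical, and `compute` is the client built with googleapiclient.discovery.build as in create_vm:

```python
def with_metadata_item(metadata, key, value):
    """Return a new metadata body with `key` set to `value`, keeping other items."""
    items = [item for item in metadata.get("items", []) if item["key"] != key]
    items.append({"key": key, "value": value})
    # setMetadata expects the fingerprint that was read from the instance.
    return {"fingerprint": metadata.get("fingerprint"), "items": items}


def mark_for_resume(compute, project, zone, instance_name):
    """Set run_resume_script to 'true' on an existing VM."""
    instance = compute.instances().get(
        project=project, zone=zone, instance=instance_name
    ).execute()
    body = with_metadata_item(instance["metadata"], "run_resume_script", "true")
    return compute.instances().setMetadata(
        project=project, zone=zone, instance=instance_name, body=body
    ).execute()
```

This only helps when the restart is triggered by code that can call mark_for_resume first. A VM restarted from the console or by the platform would still see the old flag value.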
Scripts:
startup.sh (this should only run on the first startup):
#!/bin/bash
USER_NAME=$(whoami)
USER_HOME=/home/${USER_NAME}
PROGRAM_DIR=${USER_HOME}/algotapp/program
VENV_DIR=${PROGRAM_DIR}/venv
FIRST_START_FILE="${PROGRAM_DIR}/.first_start_done"
SESSION_CONFIGS_PATH="/tmp/session_configs.json"

# Install dependencies and setup virtual environment
if [[ ! -f ${FIRST_START_FILE} ]]; then
    python3 -m venv ${VENV_DIR}
    source ${VENV_DIR}/bin/activate
    pip install -r requirements.txt
    touch ${FIRST_START_FILE}
fi

# Check for strategy_server_enabled in session configs
strategy_server_enabled=$(python3 - <<EOF
import json
with open('${SESSION_CONFIGS_PATH}') as f:
    configs = json.load(f)
print(configs.get('app_config', {}).get('strategy_server_enabled', False))
EOF
)

if [[ "$strategy_server_enabled" == "True" ]]; then
    echo "Starting server.py..."
    nohup python3 ${PROGRAM_DIR}/server/server.py > ${PROGRAM_DIR}/server.log 2>&1 &
fi

# Start the main application
python3 ${PROGRAM_DIR}/AppEntry.py --session_configs /tmp/session_configs.json
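Both scripts read /tmp/session_configs.json, but the post does not show how that file gets written, and files under /tmp may not survive a reboot on some images. A minimal sketch of how a script inside the VM could read the custom metadata keys from the metadata server and rebuild the file, assuming the session_configs value is base64-encoded JSON. The function names are hypothetical:

```python
import base64
import json
import urllib.request

# Custom metadata keys are served under this path by the Compute Engine metadata server.
METADATA_URL = "http://metadata.google.internal/computeMetadata/v1/instance/attributes/"


def build_metadata_request(key):
    """Build the request for one metadata key; the Metadata-Flavor header is required."""
    return urllib.request.Request(
        METADATA_URL + key, headers={"Metadata-Flavor": "Google"}
    )


def read_metadata(key, timeout=5):
    """Fetch one metadata value as text (only works from inside the VM)."""
    with urllib.request.urlopen(build_metadata_request(key), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def decode_session_configs(encoded):
    """Decode the base64 payload into a dict (assumes it wraps a JSON document)."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def write_session_configs(path="/tmp/session_configs.json"):
    """Recreate the session config file from metadata on every boot."""
    configs = decode_session_configs(read_metadata("session_configs"))
    with open(path, "w") as f:
        json.dump(configs, f)
    return configs
```

The same read_metadata helper could fetch run_resume_script, which would give the VM a way to act on the flag set by the Cloud Function.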
resume.sh (this should run when the VM is restarted):
#!/bin/bash
USER_NAME=$(whoami)
USER_HOME=/home/${USER_NAME}
PROGRAM_DIR=${USER_HOME}/algotapp/program
VENV_DIR=${PROGRAM_DIR}/venv
SESSION_CONFIGS_PATH="/tmp/session_configs.json"

# Activate virtual environment
source ${VENV_DIR}/bin/activate

# Resume strategy server
strategy_server_enabled=$(python3 - <<EOF
import json
with open('${SESSION_CONFIGS_PATH}') as f:
    configs = json.load(f)
print(configs.get('app_config', {}).get('strategy_server_enabled', False))
EOF
)

if [[ "$strategy_server_enabled" == "True" ]]; then
    echo "Resuming server.py..."
    nohup python3 ${PROGRAM_DIR}/server/server.py --resume_execution > ${PROGRAM_DIR}/server_resume.log 2>&1 &
fi

# Resume main application
python3 ${PROGRAM_DIR}/AppEntry.py --resume_execution True --session_configs $SESSION_CONFIGS_PATH
My attempt to switch between scripts:
In startup.sh, I tried to add logic to enable a systemd service for resume.sh, but I can’t control whether the VM should run the startup or resume script when restarted. Here’s the relevant part of the script that attempts to set up the service:
if [[ ! -f ${FIRST_START_FILE} ]]; then
    cat << EOF | sudo tee /etc/systemd/system/resume-algotapp.service
[Unit]
Description=Resume AlgoTapp Strategy Execution
After=network.target

[Service]
Type=simple
User=${USER_NAME}
ExecStart=${PROGRAM_DIR}/cmd/resume.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
    sudo systemctl enable resume-algotapp.service
    touch ${FIRST_START_FILE}
fi
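One variation on the service above, offered as a sketch and not a tested fix, is to add systemd's ConditionPathExists= directive so the unit is skipped until the first-start marker exists. The Python helper below only renders the unit text. The function name and the example arguments are hypothetical:

```python
UNIT_TEMPLATE = """\
[Unit]
Description=Resume AlgoTapp Strategy Execution
After=network.target
# Skip the unit (without marking it failed) until first-boot setup has left its marker.
ConditionPathExists={marker_file}

[Service]
Type=simple
User={user}
ExecStart={program_dir}/cmd/resume.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""


def render_resume_unit(user, program_dir, marker_file):
    """Return the text of resume-algotapp.service for the given user and paths."""
    return UNIT_TEMPLATE.format(
        user=user, program_dir=program_dir, marker_file=marker_file
    )
```

Because the startup script still runs on every boot, it would also need to exit early once the marker exists. Otherwise the service and the startup script could both launch the app after a restart.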
Questions:
- How can I reliably differentiate between a fresh startup and a VM resume after stop/start so that the appropriate script (startup.sh or resume.sh) is executed?
- Is there a way to pass metadata indicating that the VM is being resumed, or is there another mechanism (e.g., startup-script, run-resume-script flag) I can use to control this?
- Can I modify the Google Cloud Function to handle this scenario, or should I rely entirely on systemd or another method within the VM?
Any guidance or examples of handling startup vs resume for VMs would be appreciated!