I need to run a cluster that processes jobs which require a desktop GUI. One issue I’m running into is that after some period of time (<12 hours), the instance becomes entirely unreachable via SSH, HTTP, and RDP, which I can also see in the AWS console. I’ve tried to take instance screenshots from the AWS console, but those fail as well.
- If I restart the instance, everything works just fine
- I’ve tried this with both MATE and Ubuntu desktop; the same thing happens with each
- On MATE desktop, I took a screenshot under “Monitor and troubleshoot” in AWS Console
- This showed a login screen, which led me to disable automatic logout using the settings UI, but the instance was still unreachable
- I assumed I hadn’t properly disabled automatic logout on MATE, so I switched to Ubuntu desktop, where doing so was straightforward.
- When using Ubuntu desktop, the instance becomes unreachable and then won’t restart: I need to force stop the instance
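Since a restart brings the instance back, the logs from the boot that hung can be read afterwards. The following is a minimal sketch, assuming the systemd journal is kept across reboots and that the cause leaves a trace such as a suspend, out-of-memory, or NVIDIA Xid message (that is a guess, not something confirmed). `summarize_hang_clues` is a hypothetical helper name and its pattern list is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter log text on stdin for lines hinting at
# suspend/sleep, out-of-memory kills, or NVIDIA Xid errors.
# The pattern list is an assumption, not an exhaustive set of causes.
summarize_hang_clues() {
  grep -iE 'suspend|sleep|hibernat|out of memory|oom-kill|NVRM: Xid' || true
}

# Run only where the journal is available (i.e. on the instance itself).
if command -v journalctl >/dev/null 2>&1; then
  # -b -1 selects the previous boot, i.e. the one that became unreachable.
  # If the current user cannot read the system journal, prefix with sudo.
  journalctl -b -1 --no-pager 2>/dev/null | summarize_hang_clues | tail -n 50
fi
```

The last matching lines before the hang are usually the most relevant, which is why the output is trimmed with `tail`.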
Details about the VM:
- AMI: ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20240701.1
- Instance Type: g4dn.xlarge
- Appropriate ports are all open for inbound and outbound connections (I can connect until the machine has been idle for some time)
- Automatic logout is disabled
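On a GNOME-based desktop, automatic logout and automatic suspend are separate settings, so both may be worth checking from a shell. The following is a sketch, assuming the stock GNOME session from `ubuntu-desktop` and systemd; whether suspend is involved here at all is an assumption. `show_setting` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a GNOME setting, or a note when gsettings
# is missing or the key cannot be read (e.g. no desktop installed).
# Run it as the desktop user, since these settings are stored per user.
show_setting() {
  local schema="$1" key="$2"
  if command -v gsettings >/dev/null 2>&1; then
    echo "$schema $key = $(gsettings get "$schema" "$key" 2>/dev/null || echo unreadable)"
  else
    echo "$schema $key = gsettings not installed"
  fi
}

show_setting org.gnome.desktop.session idle-delay
show_setting org.gnome.settings-daemon.plugins.power sleep-inactive-ac-type
show_setting org.gnome.settings-daemon.plugins.power sleep-inactive-ac-timeout

# is-enabled prints "masked" for a target that can never be activated.
if command -v systemctl >/dev/null 2>&1; then
  systemctl is-enabled sleep.target suspend.target hibernate.target 2>/dev/null || true
fi
```

A `sleep-inactive-ac-type` of `'suspend'` would mean the desktop suspends the machine once the timeout elapses, independently of the logout setting.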
Here is the script I use to set up the instance:
echo "=== Updating package lists"
export NEEDRESTART_SUSPEND=1
sudo apt update
echo "=== Installing AWS CLI"
sudo snap install aws-cli --classic
echo "=== Installing GPU drivers"
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install nvidia:535
echo 'blacklist vga16fb' | sudo tee -a /etc/modprobe.d/blacklist.conf
echo 'blacklist nouveau' | sudo tee -a /etc/modprobe.d/blacklist.conf
echo 'blacklist rivafb' | sudo tee -a /etc/modprobe.d/blacklist.conf
echo 'blacklist nvidiafb' | sudo tee -a /etc/modprobe.d/blacklist.conf
echo 'blacklist rivatv' | sudo tee -a /etc/modprobe.d/blacklist.conf
echo "=== Installing desktop and RDP"
sudo apt install -y ubuntu-desktop xrdp
echo "gnome-session" > ~/.xsession
sudo adduser xrdp ssl-cert
sudo ufw allow 3389/tcp
echo "=== Upgrading the system"
sudo apt upgrade -y
echo "=== Rebooting the system"
sudo reboot