I'm getting started with Ray on an AWS cluster and trying to understand the declarative YAML config from the Ray GitHub repository. I can see that it is possible to specify Ray's Docker images directly for the AWS EC2 instances that Ray itself launches, but I have a few questions:
- I don’t see any option for the number of replicas of the containers run from the Docker image, although there is a parameter for the number of nodes (`max_workers`). Does this mean every worker node runs exactly one worker container? Or can it work like a ReplicaSet in a Kubernetes cluster, where the replicas specified in a deployment are independent of the number of nodes?
- If my application requires a specific library that is not included by default in `rayproject/ray-ml`, what is the best way to make sure it is available on every worker? I can base my Docker image on it like this:
FROM rayproject/ray-ml
... # Install my own libraries
CMD [...] # What to do here?
and then push it to my registry on Docker Hub. But is there a recommended way to start the necessary entry points in the Docker image so that the Ray head (master) container can see the worker containers?
Any ideas or pointers to documentation on these points would be sincerely appreciated.
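
For reference, here is a minimal sketch of the part of the cluster config these questions are about. It is only a fragment, not a complete or working config, and the cluster name, region, image tag, and container name are illustrative assumptions rather than my actual settings:

```yaml
# Illustrative fragment only; values below are assumptions, not a real setup.
cluster_name: example-cluster        # hypothetical name

# Upper bound on the number of worker nodes (EC2 instances), not containers.
max_workers: 2

provider:
  type: aws
  region: us-west-2                  # assumed region

docker:
  # Image pulled onto each node; this is where a custom image
  # built FROM rayproject/ray-ml would go.
  image: "rayproject/ray-ml:latest"
  container_name: "ray_container"    # assumed container name
```

As far as I can tell, the `docker` section only names an image and a container, and I see no field for a replica count, which is what prompted the first question.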