I’m setting up a Hadoop environment using Docker and am running into an error during installation that appears to be related to HADOOP_HOME.
My setup:
- Using docker-compose.yml to configure multiple services (namenode, datanode, resourcemanager, nodemanager, Spark master/worker).
- Using apache/hadoop:3 for the Hadoop services and bitnami/spark:latest for the Spark services.

docker-compose.yml excerpt:
version: '3'
services:
  namenode:
    image: apache/hadoop:3
    hostname: namenode
    command: [ "hdfs", "namenode" ]
    ports:
      - 9870:9870
    env_file:
      - ./config2
    environment:
      ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"
      HADOOP_HOME: /opt/hadoop
      PATH: /opt/hadoop/bin:/opt/hadoop/sbin:$PATH
    volumes:
      - ./hadoop-entrypoint.sh:/hadoop-entrypoint.sh
    entrypoint: ["/hadoop-entrypoint.sh"]
  datanode:
    image: apache/hadoop:3
    command: [ "hdfs", "datanode" ]
    env_file:
      - ./config2
    environment:
      HADOOP_HOME: /opt/hadoop
      PATH: /opt/hadoop/bin:/opt/hadoop/sbin:$PATH
    volumes:
      - ./hadoop-entrypoint.sh:/hadoop-entrypoint.sh
    entrypoint: ["/hadoop-entrypoint.sh"]
  resourcemanager:
    image: apache/hadoop:3
    hostname: resourcemanager
    command: [ "yarn", "resourcemanager" ]
    ports:
      - 8088:8088
    env_file:
      - ./config2
    environment:
      HADOOP_HOME: /opt/hadoop
      PATH: /opt/hadoop/bin:/opt/hadoop/sbin:$PATH
    volumes:
      - ./test.sh:/opt/test.sh
      - ./hadoop-entrypoint.sh:/hadoop-entrypoint.sh
    entrypoint: ["/hadoop-entrypoint.sh"]
  nodemanager:
    image: apache/hadoop:3
    command: [ "yarn", "nodemanager" ]
    env_file:
      - ./config2
    environment:
      HADOOP_HOME: /opt/hadoop
      PATH: /opt/hadoop/bin:/opt/hadoop/sbin:$PATH
    volumes:
      - ./hadoop-entrypoint.sh:/hadoop-entrypoint.sh
    entrypoint: ["/hadoop-entrypoint.sh"]
  spark-master:
    container_name: spark-master
    hostname: spark-master
    build:
      context: .
      dockerfile: Dockerfile.spark
    command: bin/spark-class org.apache.spark.deploy.master.Master
    volumes:
      - ./config:/opt/bitnami/spark/config
      - ./jobs:/opt/bitnami/spark/jobs
      - ./datasets:/opt/bitnami/spark/datasets
      - ./requirements.txt:/requirements.txt
    ports:
      - "9090:8080"
      - "7077:7077"
    networks:
      - code-with-yu
  spark-worker: &worker
    container_name: spark-worker
    hostname: spark-worker
    build:
      context: .
      dockerfile: Dockerfile.spark
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
    volumes:
      - ./config:/opt/bitnami/spark/config
      - ./jobs:/opt/bitnami/spark/jobs
      - ./datasets:/opt/bitnami/spark/datasets
      - ./requirements.txt:/requirements.txt
    depends_on:
      - spark-master
    environment:
      SPARK_MODE: worker
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 1g
      SPARK_MASTER_URL: spark://spark-master:7077
    networks:
      - code-with-yu
  # spark-worker-2:
  #   <<: *worker
  #
  # spark-worker-3:
  #   <<: *worker
  #
  # spark-worker-4:
  #   <<: *worker
networks:
  code-with-yu:
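The Hadoop services all read a ./config2 env file. It uses the SITEFILE_property=value naming convention that the apache/hadoop image expands into the Hadoop XML config files; trimmed down, it looks roughly like this (the exact property list here is illustrative, not my full file):

CORE-SITE.XML_fs.defaultFS=hdfs://namenode
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:8020
HDFS-SITE.XML_dfs.replication=1
MAPRED-SITE.XML_mapreduce.framework.name=yarn
YARN-SITE.XML_yarn.resourcemanager.hostname=resourcemanager
YARN-SITE.XML_yarn.nodemanager.aux-services=mapreduce_shuffle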
Dockerfile.spark:
FROM apache/hadoop:latest
FROM bitnami/spark:latest

COPY requirements.txt .

USER root
RUN apt-get clean && \
    apt-get update && \
    apt-get install -y python3-pip && \
    pip3 install -r ./requirements.txt
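hadoop-entrypoint.sh is mounted into each Hadoop container and used as the entrypoint; it is essentially a thin wrapper of this shape (a simplified sketch, not the verbatim file):

#!/usr/bin/env bash
# Export the Hadoop variables for the launched process, then run the
# command passed in from docker-compose (e.g. "hdfs namenode").
set -e
export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
exec "$@"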
Despite setting HADOOP_HOME in docker-compose.yml, the error persists. How can I properly set HADOOP_HOME in this Docker environment, and what might be causing the error?
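For reference, the environment a container actually ends up with can be inspected from the host like this (shown for the namenode service):

docker compose up -d namenode
docker compose exec namenode bash -c 'echo "HADOOP_HOME=$HADOOP_HOME"; which hdfs; hdfs version'
docker compose logs namenode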
Additional info:
- OS: Windows 11
- Docker version: 26.1.1
- Docker Compose version: v2.27.0-desktop.2

Any help or insights would be greatly appreciated.