Error building Dockerfile with Apache Arrow and re2 on Alpine Linux
I am trying to build a Docker image that compiles Apache Arrow from source, but the build fails with an error related to the re2 library. Here is my Dockerfile:
FROM python:3.11-alpine
# Install system dependencies
RUN apk add --no-cache \
    gcc \
    libc-dev \
    python3-dev \
    libffi-dev \
    openssl-dev \
    g++ \
    libgomp \
    libstdc++ \
    cmake \
    autoconf \
    automake \
    libtool \
    gdal-dev \
    proj \
    proj-dev \
    proj-util \
    linux-headers \
    geos-dev \
    make \
    zlib-dev \
    bash
# Set PROJ environment variables
ENV PROJ_DIR=/usr
ENV PROJ_LIBDIR=/usr/lib
ENV PROJ_INCDIR=/usr/include
# Copy and install Python dependencies
COPY requirements.txt /app/requirements.txt
WORKDIR /app
RUN pip install --no-cache-dir -r requirements.txt
# Install specific version of Cython
RUN pip install --no-cache-dir "cython<3.0"
# Install pyarrow from source with fix for re2
RUN apk add --no-cache --virtual .build-deps \
    build-base \
    cmake \
    git \
    boost-dev \
    zlib-dev \
    bzip2-dev \
    snappy-dev \
    lz4-dev \
    zstd-dev \
    brotli-dev \
    py3-numpy-dev \
    libc-dev \
    libffi-dev \
    openssl-dev \
    cython \
    thrift && \
    git clone --branch apache-arrow-16.0.0 https://github.com/apache/arrow.git && \
    cd arrow/cpp && \
    mkdir build && \
    cd build && \
    cmake -DCMAKE_BUILD_TYPE=release \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DCMAKE_INSTALL_PREFIX=/usr/local \
        -DARROW_WITH_BZ2=ON \
        -DARROW_WITH_ZLIB=ON \
        -DARROW_WITH_ZSTD=ON \
        -DARROW_WITH_LZ4=ON \
        -DARROW_WITH_SNAPPY=ON \
        -DARROW_PARQUET=ON \
        -DARROW_PYTHON=ON \
        -DARROW_BUILD_TESTS=OFF \
        .. && \
    # Apply patch to make Findre2.cmake idempotent
    sed -i '/add_library(re2::re2 INTERFACE IMPORTED)/i if(NOT TARGET re2::re2)' /opt/local/lib/cmake/grpc/modules/Findre2.cmake && \
    sed -i '/add_library(re2::re2 INTERFACE IMPORTED)/a endif()' /opt/local/lib/cmake/grpc/modules/Findre2.cmake && \
    make -j$(nproc) && \
    make install && \
    cd ../../python && \
    python setup.py build_ext --build-type=release --with-parquet --inplace && \
    apk del .build-deps && \
    rm -rf /arrow
# Install a pinned version of pyproj with pip
RUN pip install --no-cache-dir pyproj==3.6.1
# Cleanup
RUN rm -rf /var/cache/apk/* \
    && rm -rf /root/.cache
# Set the command to run your application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]
When I run docker build -t python-api-v2:latest ., I get the following error:
docker build -t python-api-v2:latest .
[+] Building 39.8s (11/13) docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.68kB 0.0s
=> [internal] load metadata for docker.io/library/python:3.11-alpine 1.6s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 3.32kB 0.0s
=> [1/9] FROM docker.io/library/python:3.11-alpine@sha256:0b5ed25d3cc27cd35c7b0352bac8ef2ebc8dd3da72a0c03caaf4eb15d9ec827a 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 38B 0.0s
=> CACHED [2/9] RUN apk add --no-cache gcc libc-dev python3-dev libffi-dev openssl-dev g++ libgomp libstdc++ cmake autoconf automa 0.0s
=> CACHED [3/9] COPY requirements.txt /app/requirements.txt 0.0s
=> CACHED [4/9] WORKDIR /app 0.0s
=> CACHED [5/9] RUN pip install --no-cache-dir -r requirements.txt 0.0s
=> CACHED [6/9] RUN pip install --no-cache-dir "cython<3.0" 0.0s
=> ERROR [7/9] RUN apk add --no-cache --virtual .build-deps build-base cmake git boost-dev zlib-dev bzip2-dev snappy-dev lz4-dev zstd-dev 38.1s
------
Dockerfile:82
--------------------
81 | # Install pyarrow from source with fix for re2
82 | >>> RUN apk add --no-cache --virtual .build-deps \
83 | >>>     build-base \
84 | >>>     cmake \
85 | >>>     git \
86 | >>>     boost-dev \
87 | >>>     zlib-dev \
88 | >>>     bzip2-dev \
89 | >>>     snappy-dev \
90 | >>>     lz4-dev \
91 | >>>     zstd-dev \
92 | >>>     brotli-dev \
93 | >>>     py3-numpy-dev \
94 | >>>     libc-dev \
95 | >>>     libffi-dev \
96 | >>>     openssl-dev \
97 | >>>     cython \
98 | >>>     thrift && \
99 | >>>     git clone --branch apache-arrow-16.0.0 https://github.com/apache/arrow.git && \
100 | >>>     cd arrow/cpp && \
101 | >>>     mkdir build && \
102 | >>>     cd build && \
103 | >>>     cmake -DCMAKE_BUILD_TYPE=release \
104 | >>>         -DCMAKE_INSTALL_LIBDIR=lib \
105 | >>>         -DCMAKE_INSTALL_PREFIX=/usr/local \
106 | >>>         -DARROW_WITH_BZ2=ON \
107 | >>>         -DARROW_WITH_ZLIB=ON \
108 | >>>         -DARROW_WITH_ZSTD=ON \
109 | >>>         -DARROW_WITH_LZ4=ON \
110 | >>>         -DARROW_WITH_SNAPPY=ON \
111 | >>>         -DARROW_PARQUET=ON \
112 | >>>         -DARROW_PYTHON=ON \
113 | >>>         -DARROW_BUILD_TESTS=OFF \
114 | >>>         .. && \
115 | >>>     # Apply patch to make Findre2.cmake idempotent
116 | >>>     sed -i '/add_library(re2::re2 INTERFACE IMPORTED)/i if(NOT TARGET re2::re2)' /opt/local/lib/cmake/grpc/modules/Findre2.cmake && \
117 | >>>     sed -i '/add_library(re2::re2 INTERFACE IMPORTED)/a endif()' /opt/local/lib/cmake/grpc/modules/Findre2.cmake && \
118 | >>>     make -j$(nproc) && \
119 | >>>     make install && \
120 | >>>     cd ../../python && \
121 | >>>     python setup.py build_ext --build-type=release --with-parquet --inplace && \
122 | >>>     apk del .build-deps && \
123 | >>>     rm -rf /arrow
124 |
--------------------
ERROR: failed to solve: process "/bin/sh -c apk add --no-cache --virtual .build-deps build-base cmake git boost-dev zlib-dev bzip2-dev snappy-dev lz4-dev zstd-dev brotli-dev py3-numpy-dev libc-dev libffi-dev openssl-dev cython thrift && git clone --branch apache-arrow-16.0.0 https://github.com/apache/arrow.git && cd arrow/cpp && mkdir build && cd build && cmake -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_INSTALL_PREFIX=/usr/local -DARROW_WITH_BZ2=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON -DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON -DARROW_PARQUET=ON -DARROW_PYTHON=ON -DARROW_BUILD_TESTS=OFF .. && sed -i '/add_library(re2::re2 INTERFACE IMPORTED)/i if(NOT TARGET re2::re2)' /opt/local/lib/cmake/grpc/modules/Findre2.cmake && sed -i '/add_library(re2::re2 INTERFACE IMPORTED)/a endif()' /opt/local/lib/cmake/grpc/modules/Findre2.cmake && make -j$(nproc) && make install && cd ../../python && python setup.py build_ext --build-type=release --with-parquet --inplace && apk del .build-deps && rm -rf /arrow" did not complete successfully: exit code: 1
`sed: /app/arrow/cpp/build/re2_ep-prefix/src/re2_ep/util/pcre.h: No such file or directory`
I have tried several approaches, including installing additional dependencies, modifying the build script, and trying different versions of the libraries, but the problem persists.
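For reference, the patch step could be guarded so that a missing file is reported clearly instead of failing inside sed. This is only a minimal sketch: guard_re2_target is a hypothetical helper name (not part of Arrow, re2, or gRPC), and it assumes a sed that accepts the one-line i and a forms already used in the Dockerfile. It does not explain why the target file is absent; it only makes the failure easier to see.

```sh
#!/bin/sh
# Hypothetical helper: wrap the add_library(re2::re2 ...) line in an
# if(NOT TARGET re2::re2) ... endif() guard, but only when the file exists.
guard_re2_target() {
    file="$1"

    # Report a missing file explicitly instead of letting sed fail.
    if [ ! -f "$file" ]; then
        echo "skip: $file not found" >&2
        return 1
    fi

    # Do nothing if the guard was already added on a previous run.
    if grep -qF 'if(NOT TARGET re2::re2)' "$file"; then
        return 0
    fi

    # Same two sed commands as in the Dockerfile above.
    sed -i '/add_library(re2::re2 INTERFACE IMPORTED)/i if(NOT TARGET re2::re2)' "$file" &&
    sed -i '/add_library(re2::re2 INTERFACE IMPORTED)/a endif()' "$file"
}
```

Inside the RUN step this would be called with the path that the sed commands currently target, for example guard_re2_target /opt/local/lib/cmake/grpc/modules/Findre2.cmake, which makes it visible whether that path exists in the Alpine image at all.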
Question:
How can I resolve the No such file or directory error related to pcre.h and pcre.cc when building Apache Arrow with re2 on Alpine Linux? Is there a specific step or configuration that I am missing to ensure the re2 library is correctly patched and built?
Any suggestions on how to fix this problem?
Thanks!