I built an image from the Spark 2.4.4 distribution: https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
Once it is downloaded and unpacked, I run:
cd spark-2.4.4-bin-hadoop2.7 && bin/docker-image-tool.sh build
This builds my image, spark-py:latest.
I want to install pyarrow in it using this Dockerfile:
FROM spark-py:latest
COPY *.jar /opt/spark/jars/
RUN rm /opt/spark/jars/kubernetes-*-4.1.2.jar
RUN apk add --no-cache \
build-base \
cmake \
bash \
boost-dev \
autoconf \
zlib-dev \
flex \
bison \
g++
RUN wget -q https://bootstrap.pypa.io/get-pip.py && python3 get-pip.py && rm -f get-pip.py
RUN apk update
RUN apk add --update --no-cache py3-arrow
but I get this error:
> [8/8] RUN apk add --update --no-cache py3-arrow:
#12 0.552 fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/main/x86_64/APKINDEX.tar.gz
#12 1.269 fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/community/x86_64/APKINDEX.tar.gz
#12 1.672 ERROR: unsatisfiable constraints:
#12 1.688 py3-arrow (missing):
#12 1.688 required by: world[py3-arrow]
The package is listed here: https://pkgs.alpinelinux.org/package/edge/testing/x86/py3-arrow
I can see that it is in the testing repository, but I don't know how to install it from there.
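This package is located in the edge/testing repository, which is not listed in /etc/apk/repositories by default. That is why apk reports py3-arrow as missing.
You can pass the repository to a single apk add command with the -X (--repository) option: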
RUN apk add --update --no-cache \
    -X http://dl-cdn.alpinelinux.org/alpine/edge/testing \
    py3-arrow
or append it to the end of /etc/apk/repositories so that later apk commands see it too:
RUN echo 'http://dl-cdn.alpinelinux.org/alpine/edge/testing' >> /etc/apk/repositories
RUN apk add --update --no-cache py3-arrow
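If the append step might run more than once (for example in a setup script reused across images), a guarded version avoids duplicate entries. The following is only a minimal sketch: the helper name add_repo is hypothetical and not part of apk, and it assumes a POSIX shell with grep available, as in the Alpine base image.

```shell
#!/bin/sh
# add_repo FILE URL  (hypothetical helper, not an apk command)
# Appends URL to FILE unless an identical line is already present.
add_repo() {
    repo_file=$1
    repo_url=$2
    # -x: match the whole line, -F: treat the URL as a literal string, -q: no output
    if ! grep -qxF "$repo_url" "$repo_file" 2>/dev/null; then
        echo "$repo_url" >> "$repo_file"
    fi
}
```

Inside the image this would be called as add_repo /etc/apk/repositories http://dl-cdn.alpinelinux.org/alpine/edge/testing before running apk add.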