[SPARK-40516] Add Apache Spark 3.3.0 Dockerfile by Yikun · Pull Request #2 · apache/spark-docker · GitHub

[SPARK-40516] Add Apache Spark 3.3.0 Dockerfile #2


Closed
wants to merge 1 commit into from

Member
@Yikun Yikun commented Oct 10, 2022

What changes were proposed in this pull request?

This patch adds Apache Spark 3.3.0 Dockerfile:

  • 3.3.0-scala2.12-java11-python3-ubuntu: pyspark + scala
  • 3.3.0-scala2.12-java11-ubuntu: scala
  • 3.3.0-scala2.12-java11-r-ubuntu: sparkr + scala
  • 3.3.0-scala2.12-java11-python3-r-ubuntu: All in one image
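
The four names above follow a regular pattern. As a minimal sketch (the helper `image_tag` and its parameters are hypothetical and not part of this PR), the names could be composed from their components like this:

```python
def image_tag(spark="3.3.0", scala="2.12", java="11",
              python=False, r=False, base_os="ubuntu"):
    """Compose a name such as 3.3.0-scala2.12-java11-python3-r-ubuntu.

    Hypothetical helper for illustration; the PR itself only lists the names.
    """
    parts = [spark, f"scala{scala}", f"java{java}"]
    if python:
        parts.append("python3")  # image includes PySpark
    if r:
        parts.append("r")        # image includes SparkR
    parts.append(base_os)
    return "-".join(parts)


if __name__ == "__main__":
    # Same order as the list in the PR description.
    for py, with_r in [(True, False), (False, False), (False, True), (True, True)]:
        print(image_tag(python=py, r=with_r))
```

Keeping the optional components in a fixed order (python3 before r) is what makes the all-in-one name come out as `python3-r` rather than `r-python3`.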

Why are the changes needed?

This is needed for the Docker Official Image.

See also: https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o

Does this PR introduce any user-facing change?

No

How was this patch tested?

The action won't be triggered until the workflow is merged to the default branch, so I can only test it in my local repo:

@Yikun Yikun changed the title [WIP][SPARK-40516] Add Apache Spark 3.3.0 Dockerfile and test [WIP][SPARK-40516] Add Apache Spark 3.3.0 Dockerfile Oct 10, 2022
@Yikun Yikun changed the title [WIP][SPARK-40516] Add Apache Spark 3.3.0 Dockerfile [SPARK-40516] Add Apache Spark 3.3.0 Dockerfile Oct 10, 2022
@Yikun Yikun force-pushed the SPARK-40516 branch 2 times, most recently from 89ef9ec to 5652970 Compare October 10, 2022 15:13
@Yikun Yikun marked this pull request as ready for review October 10, 2022 15:13
@Yikun Yikun marked this pull request as draft October 10, 2022 15:14
@Yikun Yikun force-pushed the SPARK-40516 branch 3 times, most recently from d42ee5c to 546f1d1 Compare October 11, 2022 00:46
@Yikun Yikun marked this pull request as ready for review October 11, 2022 01:06
@Yikun
Member Author
Yikun commented Oct 11, 2022

cc @HyukjinKwon @zhengruifeng

After this patch is merged, we can update the URL to apache/spark-docker in docker-library/official-images#13089.

Meanwhile, we will add more CI and a generation script in follow-up PRs.

@Yikun
Member Author
Yikun commented Oct 11, 2022

Review notes:

You can review 3.3.0/scala2.12-java11-python3-r-ubuntu/* first. The diffs below compare the all-in-one Dockerfile against each of the other Dockerfiles.

SparkR-specific changes

➜  3.3.0 git:(SPARK-40516) diff scala2.12-java11-python3-r-ubuntu/Dockerfile scala2.12-java11-python3-ubuntu/Dockerfile
27d26
<     apt install -y r-base r-base-dev && \
67d65
<     mv R /opt/spark/; \
74d71
< ENV R_HOME /usr/lib/R

PySpark-specific changes

➜  3.3.0 git:(SPARK-40516) diff scala2.12-java11-python3-r-ubuntu/Dockerfile scala2.12-java11-r-ubuntu/Dockerfile
25,26d24
<     apt install -y python3 python3-pip && \
<     pip3 install --upgrade pip setuptools && \
29d26
<     mkdir /opt/spark/python && \
65,66d61
<     mv python/pyspark /opt/spark/python/pyspark/; \
<     mv python/lib /opt/spark/python/lib/; \

PySpark- and SparkR-specific changes

➜  3.3.0 git:(SPARK-40516) diff scala2.12-java11-python3-r-ubuntu/Dockerfile scala2.12-java11-ubuntu/Dockerfile
25,27d24
<     apt install -y python3 python3-pip && \
<     pip3 install --upgrade pip setuptools && \
<     apt install -y r-base r-base-dev && \
29d25
<     mkdir /opt/spark/python && \
65,67d60
<     mv python/pyspark /opt/spark/python/pyspark/; \
<     mv python/lib /opt/spark/python/lib/; \
<     mv R /opt/spark/; \
74d66
< ENV R_HOME /usr/lib/R
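
The three diffs above can be read as two independent sets of extra lines, one for PySpark and one for SparkR. A rough sketch of that reading follows; the table `EXTRA_LINES` and the helper `extra_lines` are hypothetical, and only the line contents (with shell continuation characters dropped) are taken from the diffs:

```python
# Lines present in the all-in-one Dockerfile but missing from the smaller
# variants, grouped by feature. The grouping is an illustrative assumption;
# the line contents come from the diff output above.
EXTRA_LINES = {
    "python3": [
        "apt install -y python3 python3-pip",
        "pip3 install --upgrade pip setuptools",
        "mkdir /opt/spark/python",
        "mv python/pyspark /opt/spark/python/pyspark/",
        "mv python/lib /opt/spark/python/lib/",
    ],
    "r": [
        "apt install -y r-base r-base-dev",
        "mv R /opt/spark/",
        "ENV R_HOME /usr/lib/R",
    ],
}


def extra_lines(features):
    """Return the extra lines an image needs for the given features."""
    lines = []
    for name in ("python3", "r"):  # fixed order for stable output
        if name in features:
            lines.extend(EXTRA_LINES[name])
    return lines


if __name__ == "__main__":
    # The Scala-only image needs none of these lines; the all-in-one needs all.
    print(len(extra_lines([])), len(extra_lines(["python3", "r"])))
```

The count for both features together matches the eight removed lines in the last diff, which compares the all-in-one Dockerfile with the Scala-only one.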

with:
spark: 3.3.0
scala: 2.12
java: 11


Will we have images for Java 8?

Member Author
@Yikun Yikun Oct 11, 2022


In the initial PR (this PR), there won't be. The main considerations are:

  1. https://hub.docker.com/r/apache/spark currently only has Java 11.

  2. DOI's review of PRs for new images will be relatively slow. Our top priority is to complete the review of the first image Dockerfile. After that, reviews of updates will be much faster, only 2-3 days.

  3. As planned, we will add scripts in a follow-up to automatically generate Dockerfiles for different versions (such as Java/Scala/Spark versions).

In the future, we will consider adding all Java versions that Spark supports (of course, this depends on community demand).
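
To illustrate the kind of version matrix such a generation script would need to cover, here is a minimal sketch. The function `dockerfile_dirs` is hypothetical, the directory layout is inferred from the paths in the review notes, and the Java 8 entry in the usage example is only an assumption about a possible future addition:

```python
from itertools import product


def dockerfile_dirs(spark_versions, scala_versions, java_versions,
                    variants=("python3", "", "r", "python3-r"),
                    base_os="ubuntu"):
    """List directories such as 3.3.0/scala2.12-java11-python3-r-ubuntu.

    Hypothetical helper; the real generation script is left to follow-up PRs.
    """
    dirs = []
    for spark, scala, java, variant in product(
            spark_versions, scala_versions, java_versions, variants):
        parts = [f"scala{scala}", f"java{java}"]
        if variant:  # the empty variant is the Scala-only image
            parts.append(variant)
        parts.append(base_os)
        dirs.append(f"{spark}/" + "-".join(parts))
    return dirs


if __name__ == "__main__":
    # Java 8 is included here only to show how the matrix grows.
    for path in dockerfile_dirs(["3.3.0"], ["2.12"], ["8", "11"]):
        print(path)
```

Each additional Java, Scala, or Spark version multiplies the number of directories, which is one reason to generate the Dockerfiles rather than maintain them by hand.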


Got it. I ask only because Java 8 seems faster than 11/17 for now.

see apache/spark#37799 (comment)

@Yikun
Member Author
Yikun commented Oct 11, 2022

@HyukjinKwon @zhengruifeng Thanks, I will merge this PR soon!

@Yikun Yikun closed this in e61aba1 Oct 11, 2022
@tgravescs

A somewhat random question: what are the Apache requirements for the LICENSE file and copyright notices for Dockerfiles, especially if we are going to actually release the images? Sorry if I missed it in the mailing list discussion.

@Yikun
Member Author
Yikun commented Oct 11, 2022

@tgravescs Thanks Tom, it's a very important reminder!

Just like apache/spark, all Dockerfiles are under the Apache 2.0 License; I haven't uploaded the LICENSE file yet. : )

What we need to do, according to https://www.apache.org/licenses/LICENSE-2.0#apply:

  1. Add LICENSE and NOTICE to the repo. I just submitted a JIRA: https://issues.apache.org/jira/browse/SPARK-40754
  2. Add a LICENSE header to each file.
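
Step 2 could later be checked automatically. A minimal sketch, assuming the standard ASF source header is used and that it appears within the first 20 lines of a file (both the marker constant and the window size are illustrative choices, not something this PR defines):

```python
# Opening words of the standard ASF source header. Using them as a marker
# is an assumption made for this sketch.
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def has_license_header(text, max_lines=20):
    """Return True if the marker appears in the first max_lines lines."""
    head = text.splitlines()[:max_lines]
    return any(ASF_MARKER in line for line in head)


if __name__ == "__main__":
    sample = (
        "#\n"
        "# Licensed to the Apache Software Foundation (ASF) under one or more\n"
        "# contributor license agreements.\n"
        "#\n"
        "FROM ubuntu\n"
    )
    print(has_license_header(sample))          # header present
    print(has_license_header("FROM ubuntu\n"))  # header missing
```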

@tgravescs

Ok, I'm mostly curious about the actual docker image publishing, do we need special NOTICE-binary files or anything to be able to properly publish?

@Yikun
Member Author
Yikun commented Oct 11, 2022

@tgravescs Judging from existing Apache repos, they are not included:
https://github.com/apache/solr-docker
https://github.com/apache/flink-docker

BTW, "the image" might have two meanings:

Yikun added a commit that referenced this pull request Oct 13, 2022
### What changes were proposed in this pull request?
This patch adds LICENSE and NOTICE:
- LICENSE: https://www.apache.org/licenses/LICENSE-2.0.txt
- NOTICE: https://github.com/apache/spark/blob/master/NOTICE

### Why are the changes needed?
https://www.apache.org/licenses/LICENSE-2.0#apply

See also: #2 (comment)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No need

Closes #6 from Yikun/SPARK-40754.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>