8000 HADOOP-18708. Adds in CSE. by ahmarsuhail · Pull Request #5767 · apache/hadoop · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

HADOOP-18708. Adds in CSE. #5767

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ahmarsuhail
Copy link
Contributor

Description of PR

  • Adds in client side encryption.

Also included in this PR:

  • HADOOP-18749 - Comment out assertion in ITestS3AHugeFilesNoMultipart.
  • HADOOP-18673 - Remove InconsistentS3ClientFactory, move getS3Region to it's own operation.

How was this patch tested?

Tested in eu-west-1 with mvn -Dparallel-tests -DtestsThreadCount=16 clean verify. CSE specific failures are fixed in #5763. No other failures.

ahmarsuhail and others added 20 commits May 17, 2023 10:17
See aws_sdk_v2_changelog.md for details.

Co-authored-by: Ahmar Suhail <ahmarsu@amazon.co.uk>
Co-authored-by: Alessandro Passaro <alexpax@amazon.co.uk>
addresses review comments + yetus errors

Co-authored-by: Ahmar Suhail <ahmarsu@amazon.co.uk>
…pache#5421)

Changes include
* use bundled transfer manager
* adds transfer listener to upload
* adds support for custom signers
* don't set default endpoint
* removes v1 sdk bundle, only use core package
* implements region caching
+ many more

Note: spotbugs is warning about inconsistent
synchronization in accessing a new s3a FS field.
This will be fixed in a follow-up patch.

Contributed by Ahmar Suhail
@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 38s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 7 new or modified test files.
_ feature-HADOOP-18073-s3a-sdk-upgrade Compile Tests _
+0 🆗 mvndep 18m 53s Maven dependency ordering for branch
+1 💚 mvninstall 22m 37s feature-HADOOP-18073-s3a-sdk-upgrade passed
+1 💚 compile 18m 15s feature-HADOOP-18073-s3a-sdk-upgrade passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 16m 47s feature-HADOOP-18073-s3a-sdk-upgrade passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 4m 39s feature-HADOOP-18073-s3a-sdk-upgrade passed
+1 💚 mvnsite 1m 35s feature-HADOOP-18073-s3a-sdk-upgrade passed
+1 💚 javadoc 1m 21s feature-HADOOP-18073-s3a-sdk-upgrade passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 29s feature-HADOOP-18073-s3a-sdk-upgrade passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+0 🆗 spotbugs 0m 54s branch/hadoop-project no spotbugs output file (spotbugsXml.xml)
-1 ❌ spotbugs 1m 23s /branch-spotbugs-hadoop-tools_hadoop-aws-warnings.html hadoop-tools/hadoop-aws in feature-HADOOP-18073-s3a-sdk-upgrade has 1 extant spotbugs warnings.
+1 💚 shadedclient 23m 34s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 1m 33s Maven dependency ordering for patch
+1 💚 mvninstall 0m 47s the patch passed
+1 💚 compile 17m 10s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 17m 10s the patch passed
+1 💚 compile 16m 31s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 javac 16m 31s the patch passed
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 3 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 4m 18s /results-checkstyle-root.txt root: The patch generated 10 new + 24 unchanged - 1 fixed = 34 total (was 25)
+1 💚 mvnsite 1m 49s the patch passed
+1 💚 javadoc 1m 31s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 34s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+0 🆗 spotbugs 0m 45s hadoop-project has no data from spotbugs
+1 💚 spotbugs 1m 35s hadoop-tools/hadoop-aws generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1)
+1 💚 shadedclient 25m 45s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 41s hadoop-project in the patch passed.
+1 💚 unit 3m 4s hadoop-aws in the patch passed.
+1 💚 asflicense 1m 11s The patch does not generate ASF License warnings.
196m 40s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5767/1/artifact/out/Dockerfile
GITHUB PR #5767
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname Linux 6fcae384e82b 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision feature-HADOOP-18073-s3a-sdk-upgrade / c9a72d6
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5767/1/testReport/
Max. process+thread count 654 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5767/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Copy link
Contributor

sorry, I'd missed this one.

@ahmarsuhail
Copy link
Contributor Author

this can wait a few days. I want to get https://issues.apache.org/jira/browse/HADOOP-18741 into this as well, and the SDK V2 now supports cross region requests so we can potentially get rid of the getS3Region() stuff.

Copy link
Contributor
@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed this; sorry I'd missed it.

  1. I'd like to be set up to add customer managed keys, even if not done in this patch -params to S3 client MUST be ready for it.
  2. And region operation to take a callback interface (currently implemented in s3afs) so we can do some unit tests as well, e.g: bucket not found, auth failure and verify handling

@@ -1145,6 +1146,17 @@
</exclusion>
</exclusions>
</dependency>
<dependency> 8000
<groupId>software.amazon.encryption.s3</groupId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this come in bundle.jar?

.wrappedAsyncClient(s3AsyncClient)
.wrappedClient(s3Client)
.enableLegacyUnauthenticatedModes(true)
.keyring(keyring)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we've had some interest in supporting customer supplied keys too...this would be the time to add it -or at least leave the points for it open. in particular, that third argument should be a structure which contains the keyring and some enum of option to use

import org.apache.hadoop.fs.impl.prefetch.ExecutorServiceFuturePool;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import software.amazon.encryption.s3.materials.KmsKeyring;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be with reset of imports

* @param kmsKeyring kms wrapping key to be used
* @return S3EncryptionClient
*/
S3Client createS3EncryptionClient(S3AsyncClient s3AsyncClient, S3Client s3Client,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned, encryption options should be our own struct with an option to say kms keyring; placeholder for client key management

}

try (DurationInfo ignore =
new DurationInfo(LOG, false, "Get S3 region")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include bucket in log message

try (DurationInfo ignore =
new DurationInfo(LOG, false, "Get S3 region")) {

SET_REGION_WARNING.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move above the try() just to tune log ordering

// to the actual region the bucket is in. As the request is signed with us-east-1 and
// not the bucket's region, it fails.
S3Client getRegionS3Client =
S3Client.builder().region(Region.EU_WEST_1).credentialsProvider(credentials)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's going to happen with govcloud or china regions?

I'd propose having a callback interface we've done with the other operations, so this class can be given unit tests with a different implementation.


if (exception.statusCode() == SC_404_NOT_FOUND) {
throw new UnknownStoreException("s3a://" + bucket + "/",
" Bucket does " + "not exist");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

join the strings

@@ -600,4 +601,17 @@ public static boolean isSTSSignerCalled() {
return stsSignerCalled;
}
}

private S3Client getS3Client(String reason) throws IllegalAccessException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a javadoc to explain this

@@ -358,7 +359,7 @@ public void shouldBeAbleToSwitchOnS3PathStyleAccessViaConfigProperty()
try {
fs = S3ATestUtils.c A3E2 reateTestFileSystem(conf);
assertNotNull(fs);
S3Client s3 = fs.getAmazonS3ClientForTesting("configuration");
S3Client s3 = getS3Client("configuration");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about just force disabling CSE by removing base/bucket overrides and setting to false?

@steveloughran
Copy link
Contributor

I'm thinking this move is time to revisit my refactoring plans and actually split s3afs into two layers, filesystem and store, as abfs does. we'd have all those callback interfaces done in the store class, which would also contain the threadpool, transfer manager etc -and all those inner*() ops in s3a fs would move down with the audit context passed in as an explicit param.

lets make sure this pr is ready for it by using an interface for callbacks to create s3 client, ideally even invoke the region call. that way we can hide these classes from the details of the SDK

@steveloughran
Copy link
Contributor

@ahmarsuhail : side issue. does the cse algorithm implementation complain about it being a bad algorithm, as the v1 sdk does.

@asfgit asfgit force-pushed the feature-HADOOP-18073-s3a-sdk-upgrade branch from 75220b7 to acb31b6 Compare July 20, 2023 18:49
Copy link
Contributor
@mehakmeet mehakmeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did an initial review. Looks good, sorry about not checking this and the CSE test fix patch earlier, I think Mukund has already approved the test fix one. I'll focus on this bit more now.


private final StoreContext context;

private final static Map<String, Region> BUCKET_REGIONS = new HashMap<>();
Copy link
Contributor
@mehakmeet mehakmeet Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so we aren't caching in the SDK anymore?


return bucketRegion;
} catch (S3Exception exception) {
if (exception.statusCode() == SC_301_MOVED_PERMANENTLY) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment here explaining what we are doing here in case of 301

exception.awsErrorDetails().sdkHttpResponse().headers().get(BUCKET_REGION_HEADER)
.get(0));
BUCKET_REGIONS.put(bucket, bucketRegion);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log an error/warn about the exception caught before returning the region.


import org.apache.hadoop.util.Lists;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import software.amazon.encryption.s3.S3EncryptionClientException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to the top with the rest of the same imports

< 10084 details-collapsible>
intercept(UnsupportedRequestException.class, () ->
fs2.rename(hugefile, hugefileRenamed));
}
// try (FileSystem fs2 = FileSystem.get(fs.getUri(), conf)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be changed I assume.

@steveloughran
Copy link
Contributor

thought: we should add a test to verify backwards compat with CSE v1.

proposed: src/test/resources to include a small binary file of CSE data, which is uploaded to s3 and then downloaded on a client with CSE and the same secrets.

would that work?

@steveloughran
Copy link
Contributor

@ahmarsuhail can you rebase this -we need to get it in ASAP

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0