-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-18708. Adds in CSE. #5767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18708. Adds in CSE. #5767
Conversation
See aws_sdk_v2_changelog.md for details. Co-authored-by: Ahmar Suhail <ahmarsu@amazon.co.uk> Co-authored-by: Alessandro Passaro <alexpax@amazon.co.uk>
addresses review comments + yetus errors Co-authored-by: Ahmar Suhail <ahmarsu@amazon.co.uk>
…pache#5421) Changes include * use bundled transfer manager * adds transfer listener to upload * adds support for custom signers * don't set default endpoint * removes v1 sdk bundle, only use core package * implements region caching + many more Note: spotbugs is warning about inconsistent synchronization in accessing a new s3a FS field. This will be fixed in a follow-up patch. Contributed by Ahmar Suhail
💔 -1 overall
This message was automatically generated. |
sorry, I'd missed this one. |
this can wait a few days. I want to get https://issues.apache.org/jira/browse/HADOOP-18741 into this as well, and the SDK V2 now supports cross region requests so we can potentially get rid of the getS3Region() stuff. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed this; sorry I'd missed it.
- I'd like to be set up to add customer managed keys, even if not done in this patch -params to S3 client MUST be ready for it.
- And region operation to take a callback interface (currently implemented in s3afs) so we can do some unit tests as well, e.g: bucket not found, auth failure and verify handling
@@ -1145,6 +1146,17 @@ | |||
</exclusion> | |||
</exclusions> | |||
</dependency> | |||
<dependency> 8000 | |||
<groupId>software.amazon.encryption.s3</groupId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this come in bundle.jar?
.wrappedAsyncClient(s3AsyncClient) | ||
.wrappedClient(s3Client) | ||
.enableLegacyUnauthenticatedModes(true) | ||
.keyring(keyring) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we've had some interest in supporting customer supplied keys too...this would be the time to add it -or at least leave the points for it open. in particular, that third argument should be a structure which contains the keyring and some enum of option to use
import org.apache.hadoop.fs.impl.prefetch.ExecutorServiceFuturePool; | ||
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
import software.amazon.encryption.s3.materials.KmsKeyring; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be with reset of imports
* @param kmsKeyring kms wrapping key to be used | ||
* @return S3EncryptionClient | ||
*/ | ||
S3Client createS3EncryptionClient(S3AsyncClient s3AsyncClient, S3Client s3Client, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned, encryption options should be our own struct with an option to say kms keyring; placeholder for client key management
} | ||
|
||
try (DurationInfo ignore = | ||
new DurationInfo(LOG, false, "Get S3 region")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include bucket in log message
try (DurationInfo ignore = | ||
new DurationInfo(LOG, false, "Get S3 region")) { | ||
|
||
SET_REGION_WARNING.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move above the try() just to tune log ordering
// to the actual region the bucket is in. As the request is signed with us-east-1 and | ||
// not the bucket's region, it fails. | ||
S3Client getRegionS3Client = | ||
S3Client.builder().region(Region.EU_WEST_1).credentialsProvider(credentials) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's going to happen with govcloud or china regions?
I'd propose having a callback interface we've done with the other operations, so this class can be given unit tests with a different implementation.
|
||
if (exception.statusCode() == SC_404_NOT_FOUND) { | ||
throw new UnknownStoreException("s3a://" + bucket + "/", | ||
" Bucket does " + "not exist"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
join the strings
@@ -600,4 +601,17 @@ public static boolean isSTSSignerCalled() { | |||
return stsSignerCalled; | |||
} | |||
} | |||
|
|||
private S3Client getS3Client(String reason) throws IllegalAccessException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a javadoc to explain this
@@ -358,7 +359,7 @@ public void shouldBeAbleToSwitchOnS3PathStyleAccessViaConfigProperty() | |||
try { | |||
fs = S3ATestUtils.c A3E2 reateTestFileSystem(conf); | |||
assertNotNull(fs); | |||
S3Client s3 = fs.getAmazonS3ClientForTesting("configuration"); | |||
S3Client s3 = getS3Client("configuration"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about just force disabling CSE by removing base/bucket overrides and setting to false?
I'm thinking this move is time to revisit my refactoring plans and actually split s3afs into two layers, filesystem and store, as abfs does. we'd have all those callback interfaces done in the store class, which would also contain the threadpool, transfer manager etc -and all those lets make sure this pr is ready for it by using an interface for callbacks to create s3 client, ideally even invoke the region call. that way we can hide these classes from the details of the SDK |
@ahmarsuhail : side issue. does the cse algorithm implementation complain about it being a bad algorithm, as the v1 sdk does. |
75220b7
to
acb31b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did an initial review. Looks good, sorry about not checking this and the CSE test fix patch earlier, I think Mukund has already approved the test fix one. I'll focus on this bit more now.
|
||
private final StoreContext context; | ||
|
||
private final static Map<String, Region> BUCKET_REGIONS = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, so we aren't caching in the SDK anymore?
|
||
return bucketRegion; | ||
} catch (S3Exception exception) { | ||
if (exception.statusCode() == SC_301_MOVED_PERMANENTLY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here explaining what we are doing here in case of 301
exception.awsErrorDetails().sdkHttpResponse().headers().get(BUCKET_REGION_HEADER) | ||
.get(0)); | ||
BUCKET_REGIONS.put(bucket, bucketRegion); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log an error/warn about the exception caught before returning the region.
|
||
import org.apache.hadoop.util.Lists; | ||
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
import software.amazon.encryption.s3.S3EncryptionClientException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to the top with the rest of the same imports
intercept(UnsupportedRequestException.class, () -> | ||
fs2.rename(hugefile, hugefileRenamed)); | ||
} | ||
// try (FileSystem fs2 = FileSystem.get(fs.getUri(), conf)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be changed I assume.
thought: we should add a test to verify backwards compat with CSE v1. proposed: src/test/resources to include a small binary file of CSE data, which is uploaded to s3 and then downloaded on a client with CSE and the same secrets. would that work? |
@ahmarsuhail can you rebase this -we need to get it in ASAP |
Description of PR
Also included in this PR:
How was this patch tested?
Tested in
eu-west-1
withmvn -Dparallel-tests -DtestsThreadCount=16 clean verify
. CSE specific failures are fixed in #5763. No other failures.