-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-13327 Output Stream Specification #575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-13327 Output Stream Specification #575
Conversation
HDFS test have run out of memory
|
I consider that to be an HDFS problem: not mine. This test is ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first pass of the review. I haven't run any integration tests yet.
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Syncable.java
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Syncable.java
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Outdated
Show resolved
Hide resolved
...ct/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractCreateTest.java
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CanSetDropBehind.java
Show resolved
Hide resolved
...-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/StoreImplementationUtils.java
Outdated
Show resolved
Hide resolved
...-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/StoreImplementationUtils.java
Outdated
Show resolved
Hide resolved
...p-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/StreamStateModel.java
Show resolved
Hide resolved
...p-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/StreamStateModel.java
Outdated
Show resolved
Hide resolved
This algorithm works for filesystems which are consistent with metadata and | ||
data, as well as HDFS. What is important to know is that, in HDFS | ||
`getFileStatus(FS, path).getLen()==0` does not imply that `data(FS, path)` is | ||
empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only if the file is still open for writing, correct? Is it possible to tell if a given path is open for writing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no way, I'm afraid.
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ABlockOutputStream.java
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md
Outdated
Show resolved
Hide resolved
and that of filesystems, as it allows >1 client to create a file with `overwrite==false`, | ||
and potentially confuse file/directory logic. In particular, using `create()` to acquire | ||
an exclusive lock on a file (whoever creates the file without an error is considered | ||
the holder of the lock) is not a valid algorithm when working with object stores. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...is not a valid algorithm when working with object stores.
So shouldn't this be disallowed as the feature is not supported?
...ct/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractFSContractTestBase.java
Show resolved
Hide resolved
...p-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/StreamStateModel.java
Show resolved
Hide resolved
if (len == 0) { | ||
return; | ||
} | ||
acquireLock(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can lock and then throw. Should it be inside the try so the finally block can unlock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moving this all to a separate patch, eventually
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
Show resolved
Hide resolved
1. If a file was created with `overwrite = True`, the existing data my still | ||
be visible: `data(FS', path) = data(FS, path)`. | ||
|
||
1. The check for existing data in a `create()` call with `overwrite=False`, may |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with the overwrite==
above in the document...
1. The check for existing data in a `create()` call with `overwrite=False`, may | |
1. The check for existing data in a `create()` call with `overwrite==false`, may |
...p-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/StreamStateModel.java
Show resolved
Hide resolved
13dc093
to
8947901
Compare
8947901
to
455203f
Compare
just rebased to trunk; not tested. I see there's comments I've got to address -will do |
} | ||
|
||
/** | ||
* Create for a path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
/** | ||
* Create for a path. | ||
* @param path optional path for exception messages. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
/** | ||
* Models a stream's state and can be used for checking this state before | ||
* any operation. | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
private IOException exception; | ||
|
||
/** | ||
* Create for a path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
} | ||
|
||
/** | ||
* Create for a path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
/** | ||
* Create for a path. | ||
* @param path optional path for exception messages. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
/** | ||
* Models a stream's state and can be used for checking this state before | ||
* any operation. | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
private IOException exception; | ||
|
||
/** | ||
* Create for a path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
…vro schema types Author: Aditya Toomula <atoomula@linkedin.com> Reviewers: Srinivasulu Punuru <spunuru@linkedin.com> Closes apache#575 from atoomula/bytes1 and squashes the following commits: 855a03d7 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types df4886d8 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types 80268fc1 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types
Change-Id: I1b6bc258a40a8bd57879d9edc3e5bb1303f0fff2
* Address code-side comments from reviewers; markdown is still WiP. * move all inner stream propagation of hasCapability() tests to the StoreImplementationUtils operation. Change-Id: Ic5be4ac09ddd36b8804eacfee5ad0587a819eaf0
455203f
to
8a2154d
Compare
Had some time to look at the comments here after too long break; more work is needed Output stream specNeeds more review of the comments, especiallys Sean's Stream Model + impl.I now suspect that I need to do a bit more resilience in the failure handling of S3ABlockOutputStream. Specifically: if 1+ POST has failed in a multipart write, we should make a best effort attempt to abort the upload. Admittedly, the same network or authentication errors which cause the post to fail are likely to cause the abort operation to fail too -that should not stop us up from at least trying. Proposed: close() will attempt to abort the upload if a failure occurred earlier. This is going to be fun to test. Indeed, it's a good argument for WriteOperationHelper to be converted to an interface with a mocking implementation alongside the production one. I might play with some subclassing instead; PlanI think these two bits of work can be separated; the S3ABlockOutput stream model/locking/aborting is complex enough to merit isolation. Will follow up with that in a separate PR while retaining this purely for the output stream spec and tests. This one doesn't apply to trunk no more; once I have gone through the comments I may produce a new PR for the output stream changes too. |
Change-Id: I92788063536a4bf15ee28f17187c7e09fa09940f
private IOException exception; | ||
|
||
/** | ||
* Create for a path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
} | ||
|
||
/** | ||
* Create for a path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
/** | ||
* Create for a path. | ||
* @param path optional path for exception messages. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
I'm closing this and opening up a peer without any of the S3A changes, #1694. It currently addresses most of the comments of people's reviews, excluding Sean's request about restructuring the model
|
HADOOP-13327 Output Stream Specification
Change-Id: I1b6bc258a40a8bd57879d9edc3e5bb1303f0fff2