8000 HADOOP-13327 Output Stream Specification by steveloughran · Pull Request #575 · apache/hadoop · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

HADOOP-13327 Output Stream Specification #575

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

steveloughran
Copy link
Contributor

HADOOP-13327 Output Stream Specification

Change-Id: I1b6bc258a40a8bd57879d9edc3e5bb1303f0fff2

@steveloughran
Copy link
Contributor Author

HDFS test have run out of memory


[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 64.999 s <<< FAILURE! - in org.apache.hadoop.TestRefreshCallQueue
[ERROR] testRefreshCallQueueWithFairCallQueue(org.apache.hadoop.TestRefreshCallQueue)  Time elapsed: 4.751 s  <<< ERROR!
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:717)
	at org.apache.hadoop.fs.FileSystem$Statistics.<clinit>(FileSystem.java:3707)
	at org.apache.hadoop.fs.FileSystem.getStatistics(FileSystem.java:4175)
	at org.apache.hadoop.fs.FileSystem.initialize(FileSystem.java:295)
	at org.apache.hadoop.fs.RawLocalFileSystem.initialize(RawLocalFileSystem.java:99)
	at org.apache.hadoop.fs.LocalFileSystem.initialize(LocalFileSystem.java:47)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3324)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:136)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3373)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3341)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:491)
	at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:447)
	at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:159)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2785)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2698)
	at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1697)
	at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:913)

@steveloughran
Copy link
Contributor Author

I consider that to be an HDFS problem: not mine. This test is ready for review

Copy link
@bgaborg bgaborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first pass of the review. I haven't run any integration tests yet.

This algorithm works for filesystems which are consistent with metadata and
data, as well as HDFS. What is important to know is that, in HDFS
`getFileStatus(FS, path).getLen()==0` does not imply that `data(FS, path)` is
empty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only if the file is still open for writing, correct? Is it possible to tell if a given path is open for writing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no way, I'm afraid.

and that of filesystems, as it allows >1 client to create a file with `overwrite==false`,
and potentially confuse file/directory logic. In particular, using `create()` to acquire
an exclusive lock on a file (whoever creates the file without an error is considered
the holder of the lock) is not a valid algorithm when working with object stores.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...is not a valid algorithm when working with object stores.

So shouldn't this be disallowed as the feature is not supported?

if (len == 0) {
return;
}
acquireLock(true);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can lock and then throw. Should it be inside the try so the finally block can unlock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving this all to a separate patch, eventually

1. If a file was created with `overwrite = True`, the existing data my still
be visible: `data(FS', path) = data(FS, path)`.

1. The check for existing data in a `create()` call with `overwrite=False`, may
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with the overwrite== above in the document...

Suggested change
1. The check for existing data in a `create()` call with `overwrite=False`, may
1. The check for existing data in a `create()` call with `overwrite==false`, may

@steveloughran steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec branch from 13dc093 to 8947901 Compare March 22, 2019 22:49
@steveloughran steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec branch from 8947901 to 455203f Compare June 7, 2019 12:02
@steveloughran
Copy link
Contributor Author

just rebased to trunk; not tested. I see there's comments I've got to address -will do

}

/**
* Create for a path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

/**
* Create for a path.
* @param path optional path for exception messages.
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

/**
* Models a stream's state and can be used for checking this state before
* any operation.
*

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

private IOException exception;

/**
* Create for a path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

}

/**
* Create for a path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

/**
* Create for a path.
* @param path optional path for exception messages.
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

/**
* Models a stream's state and can be used for checking this state before
* any operation.
*

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

private IOException exception;

/**
* Create for a path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

@apache apache deleted a comment from hadoop-yetus Sep 3, 2019
shanthoosh pushed a commit to shanthoosh/hadoop that referenced this pull request Oct 15, 2019
…vro schema types

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>

Closes apache#575 from atoomula/bytes1 and squashes the following commits:

855a03d7 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types
df4886d8 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types
80268fc1 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types
steveloughran and others added 2 commits October 22, 2019 14:22
Change-Id: I1b6bc258a40a8bd57879d9edc3e5bb1303f0fff2
* Address code-side comments from reviewers; markdown is still WiP.
* move all inner stream propagation of hasCapability() tests to the StoreImplementationUtils operation.

Change-Id: Ic5be4ac09ddd36b8804eacfee5ad0587a819eaf0
@steveloughran steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec branch from 455203f to 8a2154d Compare October 22, 2019 13:22
@steveloughran
Copy link
Contributor Author

Had some time to look at the comments here after too long break; more work is needed

Output stream spec

Needs more review of the comments, especiallys Sean's

Stream Model + impl.

I now suspect that I need to do a bit more resilience in the failure handling of S3ABlockOutputStream.

Specifically: if 1+ POST has failed in a multipart write, we should make a best effort attempt to abort the upload. Admittedly, the same network or authentication errors which cause the post to fail are likely to cause the abort operation to fail too -that should not stop us up from at least trying.

Proposed: close() will attempt to abort the upload if a failure occurred earlier.

This is going to be fun to test. Indeed, it's a good argument for WriteOperationHelper to be converted to an interface with a mocking implementation alongside the production one. I might play with some subclassing instead;

Plan

I think these two bits of work can be separated; the S3ABlockOutput stream model/locking/aborting is complex enough to merit isolation.

Will follow up with that in a separate PR while retaining this purely for the output stream spec and tests.

This one doesn't apply to trunk no more; once I have gone through the comments I may produce a new PR for the output stream changes too.

Change-Id: I92788063536a4bf15ee28f17187c7e09fa09940f
private IOException exception;

/**
* Create for a path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

}

/**
* Create for a path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

/**
* Create for a path.
* @param path optional path for exception messages.
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@steveloughran
Copy link
Contributor Author

I'm closing this and opening up a peer without any of the S3A changes, #1694. It currently addresses most of the comments of people's reviews, excluding Sean's request about restructuring the model

Could this be in a section about visibility and not in the model definition? Maybe later. "here's the model, here's how that model works with creation, here's how it works when reading/writing" flows much better and visibility should be in that third part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0