8000 MAPREDUCE-7282. Move away from V2 commit algorithm by steveloughran · Pull Request #2349 · apache/hadoop · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

MAPREDUCE-7282. Move away from V2 commit algorithm #2349

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

steveloughran
Copy link
Contributor
@steveloughran steveloughran commented Sep 29, 2020
  • Default to v1 commit algorithm
  • log at WARN on job set up when v2 is used
  • use a special log for that so people who know what they are doing can
    tell it to be quiet

Supercedes #2320

* Default to v1 commit algorithm
* log at WARN on job set up when v2 is used
* use a special log for that so people who know what they are doing can
  tell it to be quiet

Change-Id: I9922cb6b86997e027870c6a445b715eb3ff5a39e
Copy link
Contributor
@jbrennan333 jbrennan333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updated PR @steveloughran!

Comment on lines 357 to 358
log.warn("The v2 commit algorithm is deprecated;"
+ " please switch to the v1 algorithm");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use the word deprecated. That implies that this algorithm will be removed in a future release

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you suggest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switching to your text

Comment on lines 1570 to 1571
The version 2 algorithm is deprecated and no longer the default
as task commits were not atomic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, remove "deprecated".

Comment on lines 1572 to 1593
If a first task attempt fails part-way
through its task commit, the output directory could end up
with data from that failed commit, alongside the data
from any subsequent attempts.

See https://issues.apache.org/jira/browse/MAPREDUCE-7282

Although no-longer the default, this algorithm is safe to use if
all task attempts for a single task meet the following requirements
-they generate exactly the same set of files
-the contents of each file are exactly the same in each task attempt

That is:
1. If a second attempt commits work, there will be no leftover files from
a first attempt which failed during its task commit.
2. If a network partition causes the first task attempt to overwrite
some/all of the output of a second attempt, the result will be
exactly the same as if it had not done so.

To avoid the warning message on job setup, set the log level of the log
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.Algorithm
to ERROR.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section should be moved to the end of the Algorithm 2 section below. You can add (see below for details) to the end of the line that says why algorithm v2 in no longer the default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@steveloughran
Copy link
Contributor Author

@jbrennan333 what do you think we should say instead of deprecated? "not recommended".

I was thinking of adding a link to the JIRA and changing the issue text there to clarify

cleanup of job will delete the whole job attempt dir so that's the maximum time that a partitioned TA may commit work. There's no risk of some VM pausing for 3 hours, restarting and an in progress TA completing its work and overwriting the final output. This is good.

@steveloughran
Copy link
Contributor Author

(Yetus failure is from no new tests)

@jbrennan333
Copy link
Contributor

@steveloughran It's hard to think of a terse warning for this. I think your comment above gets close. Maybe something like "The v2 commit algorithm assumes that the content of generated output files is consistent across all task attempts - if this is not true for this job, the v1 commit algorithm is strongly recommended."

Change-Id: I2a069cc633e24c559311d69b5c0064aaa88a4f3b
Copy link
Contributor
@jbrennan333 jbrennan333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @steveloughran. The code changes look good to me, but we need to resolve whether the default should be actually be changed - see @daryn-sharp's -1 in the Jira.

Copy link
Member
@jiwq jiwq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks @steveloughran for the work. But I'm not sure whether we should discuss in the mail list.

@steveloughran
Copy link
Contributor Author

Once the manifest committer #2971 is in, the abfs and gcs stores will get something faster than v1 but with its task failure semantics

  • all treewalking for task attempt listing will be in task commit
  • no dir renames in task commit, just saving of the manifest
  • job commit: parallel load of manifests, merge of list of directories to create, parallel set of mkdirs and then the parallelized renames.

In this world, we can just leave people using v1/v2 alone, and for (spark) jobs in azure and google cloud say "use the manifest committer".
Which means I can just close this as a wontfix.

@steveloughran steveloughran deleted the BUG/MAPREDUCE-7282-MRv2-warn branch October 15, 2021 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0