libjars by coyotemarin · Pull Request #1342 · Yelp/mrjob · GitHub

libjars #1342


Merged: 42 commits merged into Yelp:master from the libjars branch on Jun 30, 2016

coyotemarin (Collaborator) commented Jun 30, 2016

This makes the libjars option work for both Hadoop and EMR (sorry, Dataproc!). Fixes #198.

To make this work on EMR, I had to introduce the concept of "master node setup," which is basically an EMR step that runs a simple script on the master node using script-runner.jar. In theory, we should be able to run anything here (like bootstrap and setup; see #1336), but for now it just copies jars from S3 into a local directory.
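
The master node setup idea above can be sketched roughly as follows. This is a minimal illustration, not mrjob's actual implementation: the function names, the local directory, the `hadoop fs -copyToLocal` copy command, and the script-runner.jar location are all assumptions chosen for the example. The step dictionary follows the shape boto3 accepts for EMR steps, but nothing here calls AWS.

```python
from shlex import quote

# Assumed location of script-runner.jar; verify the path for your region.
SCRIPT_RUNNER_JAR = (
    's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar')


def master_node_setup_script(jar_uris, local_dir):
    """Return the lines of a shell script that copies each jar into local_dir.

    Hypothetical helper: the real script generated by mrjob may differ.
    """
    lines = ['#!/bin/sh', 'set -e', 'mkdir -p %s' % quote(local_dir)]
    for uri in jar_uris:
        # Copy one jar from S3 onto the master node's local filesystem.
        lines.append('hadoop fs -copyToLocal %s %s' % (
            quote(uri), quote(local_dir + '/')))
    return lines


def master_node_setup_step(script_uri, script_runner_jar=SCRIPT_RUNNER_JAR):
    """Describe an EMR step that runs the uploaded script via script-runner.jar."""
    return {
        'Name': 'Master node setup',
        'ActionOnFailure': 'CANCEL_AND_WAIT',
        'HadoopJarStep': {
            'Jar': script_runner_jar,
            'Args': [script_uri],  # script-runner.jar runs this script
        },
    }


if __name__ == '__main__':
    script = master_node_setup_script(
        ['s3://bucket/libs/a.jar'], '/home/hadoop/libjars')
    print('\n'.join(script))
    print(master_node_setup_step('s3://bucket/tmp/master-node-setup.sh'))
```

Because the step simply runs a script, the same mechanism could in principle run other commands on the master node, which is the generalization the description hints at.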

This also makes sure that generic args created by mrjob (e.g. -D mapreduce.job.reduces=0) always come before hadoop_extra_args, which could contain a combination of generic and JAR-specific args (fixes #1331, #1332).
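
The ordering rule can be illustrated with a short sketch. Hadoop's generic options (such as `-D` and `-libjars`) need to appear before JAR-specific options, so the generic args built by the tool go first and the user's extra args follow. The function name and parameters below are hypothetical and are not mrjob's internal API.

```python
def hadoop_streaming_args(jobconf, libjars, hadoop_extra_args, step_args):
    """Assemble args so tool-generated generic options always come first.

    Hypothetical helper for illustration only.
    """
    args = []

    # Generic args generated by the tool: -D key=value pairs, then -libjars.
    for key, value in sorted(jobconf.items()):
        args.extend(['-D', '%s=%s' % (key, value)])
    if libjars:
        args.extend(['-libjars', ','.join(libjars)])

    # User-supplied args may mix generic and JAR-specific options,
    # so they can only safely follow the generated generic args.
    args.extend(hadoop_extra_args)

    # JAR-specific args for the step itself come last.
    args.extend(step_args)
    return args


if __name__ == '__main__':
    print(hadoop_streaming_args(
        {'mapreduce.job.reduces': '0'},
        ['/home/hadoop/libjars/a.jar'],
        ['-outputformat', 'org.example.MyOutputFormat'],
        ['-input', 'in', '-output', 'out'],
    ))
```

Placing hadoop_extra_args in the middle means a user can still pass either kind of option there without pushing a generated `-D` behind a JAR-specific flag such as `-outputformat`.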

David Marin added 30 commits May 27, 2016 16:24
also fixed a typo in the combiner for hadoop_tmp_dir;
should note this in CHANGES.txt
(looks like the master node setup script currently fails)

We probably want to monitor the master node setup script like we do with other
steps, and just be careful about how this affects self._log_interpretations
still to do:
- StepFailedException should say "Master node setup step" not "Step 0 of ..."
- num_steps is incorrect for error reporting
- step log parsing should fall back to parsing stderr if no job ID in syslog
also suppressed warnings about missing job ID when parsing errors from script-runner.jar
coyotemarin merged commit 2cd2bfa into Yelp:master Jun 30, 2016
coyotemarin deleted the libjars branch June 30, 2016 18:28
Successfully merging this pull request may close these issues.

hadoop: -D before -outputformat when no reducer
--libjar option