8000 Bug/exec get steps os error 8 by timtadh · Pull Request #683 · Yelp/mrjob · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Bug/exec get steps os error 8 #683

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 5, 2013

Conversation

timtadh
Copy link
Contributor
@timtadh timtadh commented Aug 3, 2013

In rare cases, files that are installed via a setup.py seem to get their
executable bit set (even when they shouldn't). This causes the
emr._executable method to assume they are executable and causes it to
skip adding the interepretter to the command to be exectuted. If the
script in questions lacks (or has an incorrect) #! line then the job
will fail.

I decided the best way to fix this is safety first and always include
the interpreter.

In rare cases, files that are installed via a setup.py seem to get their
executable bit set (even when they shouldn't). This causes the
emr._executable method to assume they are executable and causes it to
skip adding the interepretter to the command to be exectuted. If the
script in questions lacks (or has an incorrect) #! line then the job
will fail.

I decided the best way to fix this is safety first and always include
the interpreter.

Signed-off-by: Tim Henderson <tim.tadh@gmail.com>
@irskep
Copy link
Contributor
irskep commented Aug 3, 2013

This will break in the future for people who legitimately want to not use an interpreter, i.e. go binaries. But we don't really support those at this time. There should be a --no-interpreter option for those people.

Did those test start failing on this change? I would be surprised.

@timtadh
Copy link
Contributor Author
timtadh commented Aug 3, 2013

No the tests were already failing. I have separate branch with that fix. I should submit that as well. I merged it into this one.

@timtadh
Copy link
Contributor Author
timtadh commented Aug 3, 2013

This will break in the future for people who legitimately want to not use an interpreter, i.e. go binaries. But we don't really support those at this time. There should be a --no-interpreter option for those people.

I don't really understand this. Won't the job files always be written in python but some times the steps will be a binary file that gets run?

@irskep
Copy link
Contributor
irskep commented Aug 3, 2013

No, job files will not always be written in Python.

@irskep
Copy link
Contributor
irskep commented Aug 3, 2013

Please try to avoid combining unrelated pull requests or using merge commits in topic branches. Rebase is preferable.

@irskep
Copy link
Contributor
irskep commented Aug 3, 2013

Also, your test change to the URI parsing actually breaks the test in Travis.

@timtadh
Copy link
Contributor Author
timtadh commented Aug 3, 2013

That is super weird as the test is broken without that change and the
change seems to make sense. Are you sure the travis thing is right?

On Sat, Aug 3, 2013 at 2:09 PM, Steve Johnson notifications@github.comwrote:

Also, your test change to the URI parsing actually breaks the test in
Travis.


Reply to this email directly or view it on GitHubhttps://github.com//pull/683#issuecomment-22058920
.

@timtadh
Copy link
Contributor Author
timtadh commented Aug 3, 2013

No, job files will not always be written in Python.

  1. I can't find in the docs how that is supposed to work.
  2. If I had to imagine how that would work it would be an option to mrjob. In which case the _exectuble thing in the base runner should be handling it. The base runner would look at the options and correctly construct the --steps command.
  3. If the job is python we should always specify the interpreter.

@irskep
Copy link
Contributor
irskep commented Aug 3, 2013

Re: random executables, I didn't mean to imply that you need to change anything, except please leave the _executable() method there so we can implement that stuff later.

As for the tests, what's your Python version? Yes, there is a bug in urlparse, which is my mrjob has its own version.

@irskep
Copy link
Contributor
irskep commented Aug 3, 2013

Oh JK about leaving _executable(), I see you just removed the subclass's implementation.

@timtadh
Copy link
Contributor Author
timtadh commented Aug 3, 2013

As for the tests, what's your Python version? Yes, there is a bug in urlparse, which is my mrjob has its own version.

2.7.4, but it seems that your fixed version of urlparse has a bug. So maybe just backport the 2.7.4 version and be done with it?

@coyotemarin
Copy link
Collaborator

For what it's worth, I'd prefer to choose the interpreter based on file extension alone. If a file foo.py is executable, it's almost always safe to do python foo.py anyway. My thinking is that no file extension should imply no interpreter, and an unknown file extension should lead to an exception prompting the user to specify an interpreter.

If we're supposed to run a script with no interpreter, and the executable bit isn't set, probably the friendly thing to do would be to issue a warning (or maybe an exception).

@timtadh
Copy link
Contributor Author
timtadh commented Aug 4, 2013

For what it's worth, I'd prefer to choose the interpreter based on file
extension alone. If a file foo.py is executable, it's almost always safe
to do python foo.py anyway. My thinking is that no file extension should
imply no interpreter, and an unknown file extension should lead to an
exception prompting the user to specify an interpreter.

If we're supposed to run a script with no interpreter, and the executable
bit isn't set, probably the friendly thing to do would be to issue a
warning (or maybe an exception).

My preference is it is a configuration flag. Eg. always use the python
interpreter unless told otherwise and specify what to use instead (or
nothing). Anything else other than explicit configuration is liable to lead
to bugs which only appear in certain environments. Explicit over implicit
is better in this case.

coyotemarin pushed a commit that referenced this pull request Aug 5, 2013
@coyotemarin coyotemarin merged commit bd102c3 into Yelp:master Aug 5, 2013
@coyotemarin
Copy link
Collaborator

Yup, this looks right to me.

I think some inferring of interpreter would be friendly, but the logic for that could be in the mrjob run command (that is, mrjob.launch.MRJobLauncher.run_job()), rather than the runner classes.

@timtadh
Copy link
Contributor Author
timtadh commented Aug 5, 2013
'' /usr/local/lib/python2.7/dist-packages/dancemaster-0.1-py2.7.egg/dancemaster/jobs/device_upload.py --steps [... bunch of args ...]

mrjob somehow ended up with empty string as the python interpreter when run in a cron job (which runs with a different PATH than normal). This cause the following understandable error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dancemaster-0.1-py2.7.egg/dancemaster/jobs/device_upload.py", line 68, in <module>
Job.run()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/job.py", line 500, in run
mr_job.execute()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/job.py", line 518, in execute
super(MRJob, self).execute()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/launch.py", line 146, in execute
self.run_job()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/launch.py", line 207, in run_job
runner.run()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/runner.py", line 458, in run
self._run()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/emr.py", line 827, in _run
self._prepare_for_launch()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/emr.py", line 837, in _prepare_for_launch
self._add_job_files_for_upload()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/emr.py", line 904, in _add_job_files_for_upload
for step in self._get_steps():
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/runner.py", line 710, in _get_steps
steps_proc = Popen(args, stdout=PIPE, stderr=PIPE, env=env)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
raise child_exception
OSError: [Errno 13] Permission denied

I believe the fix is to use sys.executable as the interpreter. This will ensure we get the interpreter the script was run with. Thoughts?

@timtadh
Copy link
Contributor Author
timtadh commented Aug 5, 2013

Upon further investigation it appears that sys.executable is also empty string. No idea how that is possible.

@coyotemarin
Copy link
Collaborator

This is after your patch was applied?

@timtadh
Copy link
Contributor Author
timtadh commented Aug 5, 2013

This is after your patch was applied?

Yes. I don't know if we can solve for it because it seems like in certain situations Python can't figure out what interpreter called it. In the end I ended up just specifying the steps-interpreter. I don't think there are any actual changes to make.

Sorry for the confusion.

scottknight added a commit to timtadh/mrjob that referenced this pull request Oct 10, 2013
secondary sort and self-terminating job flows
 * jobs:
   * SORT_VALUES: Secondary sort by value (Yelp#240)
     * see mrjob/examples/
   * can now override jobconf() again (Yelp#656)
   * renamed mrjob.compat.get_jobconf_value() to jobconf_from_env()
   * examples:
     * bash_wrap/ (mapper/reducer_cmd() example)
     * mr_most_used_word.py (two step job)
     * mr_next_word_stats.py (SORT_VALUES example)
 * runners:
   * All runners:
     * single --setup option works but is not yet documented (Yelp#206)
     * setup now uses sh rather than python internally
   * EMR runner:
     * max_hours_idle: self-terminating idle job flows (Yelp#628)
       * mins_to_end_of_hour option gives finer control over self-termination.
     * Can reuse pooled job flows where previous job failed (Yelp#633)
     * Throws IOError if output path already exists (Yelp#634)
     * Gracefully handles SSL cert issues (Yelp#621, Yelp#706)
     * Automatically infers EMR/S3 endpoints from region (Yelp#658)
     * ls() supports s3n:// schema (Yelp#672)
     * Fixed log parsing crash on JarSteps (Yelp#645)
     * visible_to_all_users works with boto <2.8.0 (Yelp#701)
     * must use --interpreter with non-Python scripts (Yelp#683)
     * cat() can decompress gzipped data (Yelp#601)
   * Hadoop runner:
     * check_input_paths: can disable input path checking (Yelp#583)
     * cat() can decompress gzipped data (Yelp#601)
   * Inline/Local runners:
     * Fixed counter parsing for multi-step jobs in inline mode
     * Supports per-step jobconf (Yelp#616)
 * Documentation revamp
 * mrjob.parse.urlparse() works consistently across Python versions (Yelp#686)
 * deprecated:
   * many constants in mrjob.emr replaced with functions in mrjob.aws
 * removed deprecated features:
   * old conf locations (~/.mrjob and in PYTHONPATH) (Yelp#747)
   * built-in protocols must be instances (Yelp#488)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0