Bug/exec get steps os error 8 #683

timtadh · 2013-08-03T17:56:24Z

In rare cases, files that are installed via a setup.py seem to get their
executable bit set (even when they shouldn't). This causes the
emr._executable method to assume they are executable and causes it to
skip adding the interepretter to the command to be exectuted. If the
script in questions lacks (or has an incorrect) #! line then the job
will fail.

I decided the best way to fix this is safety first and always include
the interpreter.

In rare cases, files that are installed via a setup.py seem to get their executable bit set (even when they shouldn't). This causes the emr._executable method to assume they are executable and causes it to skip adding the interepretter to the command to be exectuted. If the script in questions lacks (or has an incorrect) #! line then the job will fail. I decided the best way to fix this is safety first and always include the interpreter. Signed-off-by: Tim Henderson <tim.tadh@gmail.com>

irskep · 2013-08-03T18:00:29Z

This will break in the future for people who legitimately want to not use an interpreter, i.e. go binaries. But we don't really support those at this time. There should be a --no-interpreter option for those people.

Did those test start failing on this change? I would be surprised.

timtadh · 2013-08-03T18:02:16Z

No the tests were already failing. I have separate branch with that fix. I should submit that as well. I merged it into this one.

timtadh · 2013-08-03T18:04:12Z

This will break in the future for people who legitimately want to not use an interpreter, i.e. go binaries. But we don't really support those at this time. There should be a --no-interpreter option for those people.

I don't really understand this. Won't the job files always be written in python but some times the steps will be a binary file that gets run?

irskep · 2013-08-03T18:07:29Z

No, job files will not always be written in Python.

irskep · 2013-08-03T18:08:10Z

Please try to avoid combining unrelated pull requests or using merge commits in topic branches. Rebase is preferable.

irskep · 2013-08-03T18:09:17Z

Also, your test change to the URI parsing actually breaks the test in Travis.

timtadh · 2013-08-03T18:20:58Z

That is super weird as the test is broken without that change and the
change seems to make sense. Are you sure the travis thing is right?

On Sat, Aug 3, 2013 at 2:09 PM, Steve Johnson notifications@github.comwrote:

Also, your test change to the URI parsing actually breaks the test in
Travis.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/683#issuecomment-22058920
.

timtadh · 2013-08-03T18:24:40Z

No, job files will not always be written in Python.

I can't find in the docs how that is supposed to work.
If I had to imagine how that would work it would be an option to mrjob. In which case the _exectuble thing in the base runner should be handling it. The base runner would look at the options and correctly construct the --steps command.
If the job is python we should always specify the interpreter.

irskep · 2013-08-03T19:23:07Z

Re: random executables, I didn't mean to imply that you need to change anything, except please leave the _executable() method there so we can implement that stuff later.

As for the tests, what's your Python version? Yes, there is a bug in urlparse, which is my mrjob has its own version.

irskep · 2013-08-03T19:23:30Z

Oh JK about leaving _executable(), I see you just removed the subclass's implementation.

timtadh · 2013-08-03T19:30:47Z

As for the tests, what's your Python version? Yes, there is a bug in urlparse, which is my mrjob has its own version.

2.7.4, but it seems that your fixed version of urlparse has a bug. So maybe just backport the 2.7.4 version and be done with it?

coyotemarin · 2013-08-04T04:39:33Z

For what it's worth, I'd prefer to choose the interpreter based on file extension alone. If a file foo.py is executable, it's almost always safe to do python foo.py anyway. My thinking is that no file extension should imply no interpreter, and an unknown file extension should lead to an exception prompting the user to specify an interpreter.

If we're supposed to run a script with no interpreter, and the executable bit isn't set, probably the friendly thing to do would be to issue a warning (or maybe an exception).

timtadh · 2013-08-04T16:16:22Z

For what it's worth, I'd prefer to choose the interpreter based on file
extension alone. If a file foo.py is executable, it's almost always safe
to do python foo.py anyway. My thinking is that no file extension should
imply no interpreter, and an unknown file extension should lead to an
exception prompting the user to specify an interpreter.

If we're supposed to run a script with no interpreter, and the executable
bit isn't set, probably the friendly thing to do would be to issue a
warning (or maybe an exception).

My preference is it is a configuration flag. Eg. always use the python
interpreter unless told otherwise and specify what to use instead (or
nothing). Anything else other than explicit configuration is liable to lead
to bugs which only appear in certain environments. Explicit over implicit
is better in this case.

Bug/exec get steps os error 8

coyotemarin · 2013-08-05T05:53:19Z

Yup, this looks right to me.

I think some inferring of interpreter would be friendly, but the logic for that could be in the mrjob run command (that is, mrjob.launch.MRJobLauncher.run_job()), rather than the runner classes.

timtadh · 2013-08-05T15:51:19Z

'' /usr/local/lib/python2.7/dist-packages/dancemaster-0.1-py2.7.egg/dancemaster/jobs/device_upload.py --steps [... bunch of args ...]

mrjob somehow ended up with empty string as the python interpreter when run in a cron job (which runs with a different PATH than normal). This cause the following understandable error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dancemaster-0.1-py2.7.egg/dancemaster/jobs/device_upload.py", line 68, in <module>
Job.run()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/job.py", line 500, in run
mr_job.execute()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/job.py", line 518, in execute
super(MRJob, self).execute()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/launch.py", line 146, in execute
self.run_job()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/launch.py", line 207, in run_job
runner.run()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/runner.py", line 458, in run
self._run()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/emr.py", line 827, in _run
self._prepare_for_launch()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/emr.py", line 837, in _prepare_for_launch
self._add_job_files_for_upload()
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/emr.py", line 904, in _add_job_files_for_upload
for step in self._get_steps():
  File "/home/ubuntu/dancemaster/src/mrjob/mrjob/runner.py", line 710, in _get_steps
steps_proc = Popen(args, stdout=PIPE, stderr=PIPE, env=env)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
raise child_exception
OSError: [Errno 13] Permission denied

I believe the fix is to use sys.executable as the interpreter. This will ensure we get the interpreter the script was run with. Thoughts?

timtadh · 2013-08-05T16:05:20Z

Upon further investigation it appears that sys.executable is also empty string. No idea how that is possible.

coyotemarin · 2013-08-05T17:03:10Z

This is after your patch was applied?

timtadh · 2013-08-05T19:31:20Z

This is after your patch was applied?

Yes. I don't know if we can solve for it because it seems like in certain situations Python can't figure out what interpreter called it. In the end I ended up just specifying the steps-interpreter. I don't think there are any actual changes to make.

Sorry for the confusion.

secondary sort and self-terminating job flows * jobs: * SORT_VALUES: Secondary sort by value (Yelp#240) * see mrjob/examples/ * can now override jobconf() again (Yelp#656) * renamed mrjob.compat.get_jobconf_value() to jobconf_from_env() * examples: * bash_wrap/ (mapper/reducer_cmd() example) * mr_most_used_word.py (two step job) * mr_next_word_stats.py (SORT_VALUES example) * runners: * All runners: * single --setup option works but is not yet documented (Yelp#206) * setup now uses sh rather than python internally * EMR runner: * max_hours_idle: self-terminating idle job flows (Yelp#628) * mins_to_end_of_hour option gives finer control over self-termination. * Can reuse pooled job flows where previous job failed (Yelp#633) * Throws IOError if output path already exists (Yelp#634) * Gracefully handles SSL cert issues (Yelp#621, Yelp#706) * Automatically infers EMR/S3 endpoints from region (Yelp#658) * ls() supports s3n:// schema (Yelp#672) * Fixed log parsing crash on JarSteps (Yelp#645) * visible_to_all_users works with boto <2.8.0 (Yelp#701) * must use --interpreter with non-Python scripts (Yelp#683) * cat() can decompress gzipped data (Yelp#601) * Hadoop runner: * check_input_paths: can disable input path checking (Yelp#583) * cat() can decompress gzipped data (Yelp#601) * Inline/Local runners: * Fixed counter parsing for multi-step jobs in inline mode * Supports per-step jobconf (Yelp#616) * Documentation revamp * mrjob.parse.urlparse() works consistently across Python versions (Yelp#686) * deprecated: * many constants in mrjob.emr replaced with functions in mrjob.aws * removed deprecated features: * old conf locations (~/.mrjob and in PYTHONPATH) (Yelp#747) * built-in protocols must be instances (Yelp#488)

coyotemarin pushed a commit that referenced this pull request Aug 5, 2013

Merge pull request #683 from timtadh/bug/exec-get-steps-os-error-8

bd102c3

Bug/exec get steps os error 8

coyotemarin merged commit bd102c3 into Yelp:master Aug 5, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bug/exec get steps os error 8 #683

Bug/exec get steps os error 8 #683

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Bug/exec get steps os error 8 #683

Bug/exec get steps os error 8 #683

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!