-
Notifications
You must be signed in to change notification settings - Fork 589
Bug/exec get steps os error 8 #683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In rare cases, files that are installed via a setup.py seem to get their executable bit set (even when they shouldn't). This causes the emr._executable method to assume they are executable and causes it to skip adding the interepretter to the command to be exectuted. If the script in questions lacks (or has an incorrect) #! line then the job will fail. I decided the best way to fix this is safety first and always include the interpreter. Signed-off-by: Tim Henderson <tim.tadh@gmail.com>
This will break in the future for people who legitimately want to not use an interpreter, i.e. go binaries. But we don't really support those at this time. There should be a Did those test start failing on this change? I would be surprised. |
No the tests were already failing. I have separate branch with that fix. I should submit that as well. I merged it into this one. |
I don't really understand this. Won't the job files always be written in python but some times the steps will be a binary file that gets run? |
No, job files will not always be written in Python. |
Please try to avoid combining unrelated pull requests or using merge commits in topic branches. Rebase is preferable. |
Also, your test change to the URI parsing actually breaks the test in Travis. |
That is super weird as the test is broken without that change and the On Sat, Aug 3, 2013 at 2:09 PM, Steve Johnson notifications@github.comwrote:
|
|
Re: random executables, I didn't mean to imply that you need to change anything, except please leave the As for the tests, what's your Python version? Yes, there is a bug in urlparse, which is my mrjob has its own version. |
Oh JK about leaving |
2.7.4, but it seems that your fixed version of urlparse has a bug. So maybe just backport the 2.7.4 version and be done with it? |
For what it's worth, I'd prefer to choose the interpreter based on file extension alone. If a file If we're supposed to run a script with no interpreter, and the executable bit isn't set, probably the friendly thing to do would be to issue a warning (or maybe an exception). |
|
Bug/exec get steps os error 8
Yup, this looks right to me. I think some inferring of interpreter would be friendly, but the logic for that could be in the |
mrjob somehow ended up with empty string as the python interpreter when run in a cron job (which runs with a different PATH than normal). This cause the following understandable error:
I believe the fix is to use |
Upon further investigation it appears that sys.executable is also empty string. No idea how that is possible. |
This is after your patch was applied? |
Yes. I don't know if we can solve for it because it seems like in certain situations Python can't figure out what interpreter called it. In the end I ended up just specifying the steps-interpreter. I don't think there are any actual changes to make. Sorry for the confusion. |
secondary sort and self-terminating job flows * jobs: * SORT_VALUES: Secondary sort by value (Yelp#240) * see mrjob/examples/ * can now override jobconf() again (Yelp#656) * renamed mrjob.compat.get_jobconf_value() to jobconf_from_env() * examples: * bash_wrap/ (mapper/reducer_cmd() example) * mr_most_used_word.py (two step job) * mr_next_word_stats.py (SORT_VALUES example) * runners: * All runners: * single --setup option works but is not yet documented (Yelp#206) * setup now uses sh rather than python internally * EMR runner: * max_hours_idle: self-terminating idle job flows (Yelp#628) * mins_to_end_of_hour option gives finer control over self-termination. * Can reuse pooled job flows where previous job failed (Yelp#633) * Throws IOError if output path already exists (Yelp#634) * Gracefully handles SSL cert issues (Yelp#621, Yelp#706) * Automatically infers EMR/S3 endpoints from region (Yelp#658) * ls() supports s3n:// schema (Yelp#672) * Fixed log parsing crash on JarSteps (Yelp#645) * visible_to_all_users works with boto <2.8.0 (Yelp#701) * must use --interpreter with non-Python scripts (Yelp#683) * cat() can decompress gzipped data (Yelp#601) * Hadoop runner: * check_input_paths: can disable input path checking (Yelp#583) * cat() can decompress gzipped data (Yelp#601) * Inline/Local runners: * Fixed counter parsing for multi-step jobs in inline mode * Supports per-step jobconf (Yelp#616) * Documentation revamp * mrjob.parse.urlparse() works consistently across Python versions (Yelp#686) * deprecated: * many constants in mrjob.emr replaced with functions in mrjob.aws * removed deprecated features: * old conf locations (~/.mrjob and in PYTHONPATH) (Yelp#747) * built-in protocols must be instances (Yelp#488)
In rare cases, files that are installed via a setup.py seem to get their
executable bit set (even when they shouldn't). This causes the
emr._executable method to assume they are executable and causes it to
skip adding the interepretter to the command to be exectuted. If the
script in questions lacks (or has an incorrect) #! line then the job
will fail.
I decided the best way to fix this is safety first and always include
the interpreter.