Tags: liurusi101/mrjob

v0.4.2

that's one small step for a JAR

 * jobs:
   * can interpolate input and output path(s) into arguments of JarSteps,
     so they can be part of multi-step jobs (Yelp#773)
     * see mrjob/examples/mr_jar_step_example.py
   * JarStep now takes keyword arguments only (Yelp#769)
     * removed useless "name" field; "step_args" is now just "args"
   * MRJobStep (usually accessed via MRJob.mr()) is now MRStep
 * runners:
   * All runners:
     * --setup is now fully functional (Yelp#206)
       * --python-archive, --setup-cmd, and --setup-script are deprecated
     * --bootstrap option works and uses sh (Yelp#206)
       * --bootstrap-cmd, --bootstrap-file, --bootstrap-python-package,
         --bootstrap-script are deprecated
     * setup commands can no longer corrupt a task's input and output (Yelp#803)
     * sh_bin is now "sh -e" by default so setup fails fast (Yelp#810)
       * default is "/bin/sh -e" on EMR
   * EMR:
     * JarSteps work again (Yelp#763)
     * auto-uploads jars for JarSteps (Yelp#772)
       * JARs on the EMR instances can be accessed with file:/// URIs
     * ssh_cat() no longer raises an error when catting a file
       containing an error (Yelp#807)
     * Fixed SignatureDoesNotMatchError that happens with boto 2.10.0+
       with Python prior to 2.7.5 (Yelp#778)
   * Hadoop:
     * now handles JarSteps too (Yelp#770)
 * Fix to mrjob.parse.urlparse() that was breaking Python 2.5
 * mrjob.util.buffer_iterator_to_line_iterator() is now more efficient
   and uses a bounded amount of memory
 * bz2 decompression no longer discards data (Yelp#817)
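
As a rough illustration of the path interpolation described under "jobs" above, the sketch below replaces placeholder tokens in a step's argument list with concrete paths. The INPUT/OUTPUT sentinels and the interpolate_args() helper are hypothetical stand-ins written for this example, not mrjob's API; mrjob/examples/mr_jar_step_example.py shows the real usage.

```python
# Hypothetical sentinels standing in for "this step's input/output path(s)".
INPUT = '<input>'
OUTPUT = '<output>'


def interpolate_args(args, input_paths, output_path):
    """Return a copy of args with the sentinels replaced by real paths.

    Multiple input paths are joined with commas, on the assumption that
    the JAR accepts a comma-separated list of inputs.
    """
    result = []
    for arg in args:
        if arg == INPUT:
            result.append(','.join(input_paths))
        elif arg == OUTPUT:
            result.append(output_path)
        else:
            result.append(arg)
    return result


# Example paths are made up for illustration.
step_args = ['wordcount', INPUT, OUTPUT]
resolved = interpolate_args(
    step_args,
    input_paths=['hdfs:///tmp/step-0/part-00000',
                 'hdfs:///tmp/step-0/part-00001'],
    output_path='hdfs:///tmp/step-1',
)
print(resolved)
```

Because the placeholders are only resolved when the step is about to run, the same argument list can sit anywhere in a multi-step job: the previous step's output becomes this step's input.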

v0.4.1

secondary sort and self-terminating job flows

 * jobs:
   * SORT_VALUES: Secondary sort by value (Yelp#240)
     * see mrjob/examples/
   * can now override jobconf() again (Yelp#656)
   * renamed mrjob.compat.get_jobconf_value() to jobconf_from_env()
   * examples:
     * bash_wrap/ (mapper/reducer_cmd() example)
     * mr_most_used_word.py (two step job)
     * mr_next_word_stats.py (SORT_VALUES example)
 * runners:
   * All runners:
     * single --setup option works but is not yet documented (Yelp#206)
     * setup now uses sh rather than python internally
   * EMR runner:
     * max_hours_idle: self-terminating idle job flows (Yelp#628)
       * mins_to_end_of_hour option gives finer control over self-termination.
     * Can reuse pooled job flows where previous job failed (Yelp#633)
     * Throws IOError if output path already exists (Yelp#634)
     * Gracefully handles SSL cert issues (Yelp#621, Yelp#706)
     * Automatically infers EMR/S3 endpoints from region (Yelp#658)
     * ls() supports the s3n:// URI scheme (Yelp#672)
     * Fixed log parsing crash on JarSteps (Yelp#645)
     * visible_to_all_users works with boto <2.8.0 (Yelp#701)
     * must use --interpreter with non-Python scripts (Yelp#683)
     * cat() can decompress gzipped data (Yelp#601)
   * Hadoop runner:
     * check_input_paths: can disable input path checking (Yelp#583)
     * cat() can decompress gzipped data (Yelp#601)
   * Inline/Local runners:
     * Fixed counter parsing for multi-step jobs in inline mode
     * Supports per-step jobconf (Yelp#616)
 * Documentation revamp
 * mrjob.parse.urlparse() works consistently across Python versions (Yelp#686)
 * deprecated:
   * many constants in mrjob.emr replaced with functions in mrjob.aws
 * removed deprecated features:
   * old conf locations (~/.mrjob and in PYTHONPATH) (Yelp#747)
   * built-in protocols must be instances (Yelp#488)
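
To make the idea behind SORT_VALUES concrete, here is a minimal pure-Python sketch of secondary sort: values reach each reducer call already ordered within their key. It imitates the shuffle-and-sort phase in memory and does not use mrjob; reduce_with_sorted_values() is a name invented for this example.

```python
from itertools import groupby
from operator import itemgetter


def reduce_with_sorted_values(pairs):
    """Yield (key, values) with the values in sorted order for each key.

    Sorting on the whole (key, value) tuple stands in for the
    shuffle-and-sort phase; a real cluster does this across machines.
    """
    ordered = sorted(pairs)
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


pairs = [('b', 3), ('a', 2), ('b', 1), ('a', 1)]
grouped = list(reduce_with_sorted_values(pairs))
print(grouped)
```

Without secondary sort, only the keys are guaranteed to arrive grouped; the order of values within a key is unspecified, so a reducer that needs ordered values would have to buffer and sort them itself.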

v0.4.0

v0.4, 2013-04-30 -- Slouching toward nirvana

v0.4.0pre3

0.4 RC3

v0.4.0pre2

0.4 RC2

v0.4.0pre1

First release candidate for the 0.4 release

v0.3.5

v0.3.5, 2012-08-21 -- The Last Ride of v0.3.x[?]

 * EMR:
   * --pool-wait-minutes option lets you wait up to X minutes before creating a
     job flow (Yelp#455)
   * Job flow ID included in error messages on failure (Yelp#452)
   * JOB and JOB_FLOW cleanup options (Yelp#485, Yelp#455)
 * EMR and Hadoop:
   * Compatibility fixes related to deprecated options and Hadoop's bizarre
     non-sequential version numbers (Yelp#489, Yelp#534)
 * Other:
   * Warn when *_PROTOCOL is not a class (Yelp#490)
 * Bug fixes:
   * Unicode strings can be used when specifying interpreters (Yelp#431)
   * --enable-emr-logging no longer causes the wrong counters/logs to be parsed
     (Yelp#446)
   * TMP_DIR inserted into 'sort' environment variables (Yelp#477)
   * Setting hadoop_home in mrjob.conf works again
   * Gzipped input files work when specified with relative paths (Yelp#494)
   * Passthrough options are not re-ordered when sent to Hadoop Streaming
     (Yelp#509)
   * json module is supported again if simplejson doesn't exist (Yelp#544)
   * HadoopJobRunner.path_exists() is no longer backwards (Yelp#549)
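
The behavior behind --pool-wait-minutes can be pictured as a polling loop with a deadline. The sketch below is an assumption about the general shape of such logic, not mrjob's implementation; find_idle, create, and every other name here is hypothetical, and the clock and sleep functions are injectable so the loop can be exercised without actually waiting.

```python
import time


def find_or_create_job_flow(find_idle, create, wait_minutes,
                            poll_seconds=30,
                            clock=time.time, sleep=time.sleep):
    """Poll find_idle() until it returns a job flow ID or the wait
    expires, then fall back to create()."""
    deadline = clock() + wait_minutes * 60
    while True:
        job_flow_id = find_idle()
        if job_flow_id is not None:
            return job_flow_id
        if clock() >= deadline:
            # Nothing became available in time: start a new job flow.
            return create()
        sleep(poll_seconds)


# With no wait allowed, the pool is checked once and then a new
# (made-up) job flow ID is returned.
chosen = find_or_create_job_flow(
    find_idle=lambda: None,
    create=lambda: 'j-NEW',
    wait_minutes=0,
)
print(chosen)
```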

v0.3.4.1

v0.3.4.1, 2012-06-12 -- The test suite doesn't catch everything...

 * Local mode doesn't try to send multiple mappers to the same output file
   when using multiple compressed files as input

v0.3.4

v0.3.4, 2012-06-11 -- We are friendly people.

 * Experimental support for IronPython in the local and inline runners
 * set_status() and increment_counter() will encode messages/names of type
   'unicode' as UTF-8 when writing to Hadoop Streaming
 * EMR and Hadoop counter parsing is more correct
 * mrjob.tools.emr.fetch_logs fetches logs from S3 when asked instead of
   incorrectly refusing to do so
 * jobconf values can be booleans in mrjob.conf as well as 'true' and 'false'
   strings
 * hadoop_version can be a float in mrjob.conf, but a warning is printed to the
   console
 * Command line help is split across several --help-* commands
 * Local runner sorts output consistently
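
As an illustration of the jobconf change above, the following sketch shows one way boolean values from a config file could be turned into the 'true'/'false' strings Hadoop expects. normalize_jobconf() is a hypothetical helper written for this example, not a function from mrjob.

```python
def normalize_jobconf(jobconf):
    """Return a copy of jobconf with every value converted to a string,
    rendering Python booleans as 'true' or 'false'."""
    normalized = {}
    for key, value in jobconf.items():
        if isinstance(value, bool):
            normalized[key] = 'true' if value else 'false'
        else:
            normalized[key] = str(value)
    return normalized


conf = normalize_jobconf({
    'mapred.output.compress': True,        # YAML boolean
    'mapred.reduce.tasks': 4,              # YAML integer
    'mapred.map.tasks.speculative.execution': 'false',  # already a string
})
print(conf)
```

The boolean check has to come before the generic str() call, since str(True) would produce 'True', which is not the lowercase form used in Hadoop configuration.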

v0.3.3.2

v0.3.3.2, 2012-04-10 -- It's a race [condition]!

 * Option parsing no longer dies when -- is used as an argument (Yelp#435)
 * Fixed race condition where two jobs can join same job flow thinking it is
   idle, delaying one of the jobs (Yelp#438)
 * Better error message when a config file contains no data for the current
   runner (Yelp#433)