Spark harness should be able to turn off internal protocols

To duplicate the semantics of MRJobs, our Spark harness encodes and decodes data between mappers and reducers and between job steps. This is usually unnecessary, since PySpark already knows how to serialize Python data structures.

It should be possible to turn this off (e.g. with --no-internal-protocols), so that the job only decodes the initial input and encodes the final output.
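To make the proposal concrete, here is a rough, Spark-free sketch of the difference. It is not the harness's actual code: `encode`, `decode`, `run`, and the `internal_protocols` argument are hypothetical names, and a JSON round trip stands in for whatever internal protocol a job uses.

```python
import json


def encode(key, value):
    # Stand-in for an internal protocol's write step: one tab-separated line.
    return json.dumps(key) + "\t" + json.dumps(value)


def decode(line):
    # Stand-in for an internal protocol's read step.
    key, value = line.split("\t", 1)
    return json.loads(key), json.loads(value)


def mapper(_, line):
    for word in line.split():
        yield word, 1


def reducer(key, values):
    yield key, sum(values)


def run(lines, internal_protocols=True):
    # Map phase.
    pairs = [kv for line in lines for kv in mapper(None, line)]

    # Current behaviour: round-trip every pair through the internal protocol.
    # With the proposed switch, pairs are passed along as Python objects.
    if internal_protocols:
        pairs = [decode(encode(k, v)) for k, v in pairs]

    # Shuffle and reduce phase.
    groups = {}
    for k, v in pairs:
        groups.setdefault(k, []).append(v)

    out = []
    for k in sorted(groups):
        out.extend(reducer(k, groups[k]))
    return out
```

For data that survives the round trip unchanged, both modes produce the same output and the switch only saves work. They can differ when the protocol alters values; a JSON round trip, for example, turns tuples into lists.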

It might make sense to turn this option on automatically for jobs that use PickleProtocol internally.
