More elaborate mechanism to detect code changes? · Issue #99 · insitro/redun · GitHub
More elaborate mechanism to detect code changes? #99
Open
@codethief

Consider this slightly modified version of 01_hello_world:

from redun import task

redun_namespace = "redun.examples.hello_world"

planet = "World"  # assume this to be a constant

@task()
def get_planet() -> str:
    return planet

@task()
def greeter(greet: str, thing: str) -> str:
    return "{}, {}!".format(greet, thing)

@task()
def main(greet: str="Hello") -> str:
    return greeter(greet, get_planet())

If I change the constant planet to another value like "Mars", redun will not bust the cache and instead still yield the output "Hello, World!" for the main task. This is of course not entirely surprising given how redun detects code changes, but I'd still say it represents a rather big footgun for people who are used to writing "normal" Python code.[0]
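
To make the failure mode concrete, here is a minimal sketch of why a hash that only looks at the task function itself cannot see the change. It is a simplification and not redun's actual implementation: `body_hash`, `load_get_planet` and `MODULE_TEMPLATE` are hypothetical names made up for this illustration, and the "task" is a plain function without the `@task` decorator.

```python
import hashlib

# Hypothetical stand-in for the module above: a constant plus one function.
MODULE_TEMPLATE = '''
planet = {value!r}

def get_planet() -> str:
    return planet
'''


def body_hash(fn) -> str:
    """Hash only what belongs to the function itself (assumed scheme)."""
    code = fn.__code__
    payload = repr((code.co_code, code.co_names, code.co_consts))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def load_get_planet(value: str):
    """Build the module with the given constant and return its get_planet."""
    namespace = {}
    exec(MODULE_TEMPLATE.format(value=value), namespace)
    return namespace["get_planet"]


world = load_get_planet("World")
mars = load_get_planet("Mars")

# The two functions return different values ...
results_differ = world() != mars()
# ... but their own code is identical, so a function-only hash matches.
hashes_match = body_hash(world) == body_hash(mars)
```

Both `results_differ` and `hashes_match` end up true: the behaviour changed, but nothing that the function-level hash covers did.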

Similar situations arise when people use regular (non-@task) Python functions in their code.

In light of this, would it be feasible to offer an alternative caching mechanism which is less aggressive, i.e. more careful about what code changes could cause a task to change? Two ideas that come to mind:

  1. File changes to a given Python module could always bust the cache of all tasks in the module. Of course this would invalidate the cache much more frequently but, personally, I think I would still prefer this over accidentally running into incorrect cache hits time and again.

  2. Use Python's trace module to track calls to other functions and, more generally, which lines of code get executed. Use the content of those lines to compute a hash.
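
A rough sketch of the first idea, to show what I have in mind. This is not an existing redun feature or API: `module_file_hash` and `task_cache_key` are hypothetical helpers, and the module is written to a temporary file only so the example is self-contained.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical module contents, parameterised on the constant.
MODULE_TEMPLATE = (
    "planet = {value!r}\n"
    "\n"
    "def get_planet() -> str:\n"
    "    return planet\n"
)


def module_file_hash(module_path: Path) -> str:
    """Hash the full contents of the module file."""
    return hashlib.sha1(module_path.read_bytes()).hexdigest()


def task_cache_key(task_name: str, module_path: Path) -> str:
    """Mix the module hash into the key so any edit to the file changes it."""
    payload = f"{task_name}:{module_file_hash(module_path)}"
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "hello_world.py"

    module_path.write_text(MODULE_TEMPLATE.format(value="World"))
    key_world = task_cache_key("get_planet", module_path)

    # Changing only the constant still produces a different key.
    module_path.write_text(MODULE_TEMPLATE.format(value="Mars"))
    key_mars = task_cache_key("get_planet", module_path)

keys_differ = key_world != key_mars
```

The cost is the one mentioned above: every edit to the file, including a comment or whitespace change, invalidates every task in the module.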

[0]: I am looking into using redun to define build & CI/CD pipelines for a (largely) non-Python project. Since the CI/CD code will not be something people will work on all the time, I would like to reduce footguns like erroneous cache hits as far as possible to make it easier for my coworkers to contribute to the pipelines.
