cmd/reload: Calling this when long-running job is running causes commands to time out thereafter · Issue #283 · dshearer/jobber

Closed
ghost opened this issue May 10, 2020 · 9 comments · Fixed by #289

ghost commented May 10, 2020

Most jobs that I submit through Jobber on my Ubuntu 18.04 machine run with no problem, but certain jobs cause problems ...

When one of those jobs starts up, I can no longer run any jobber commands. Every jobber command that I try hangs, and then after a while, it prints "Call to Jobber timed out." This includes invocations of jobber list, jobber log, jobber pause, and all other jobber commands.

Most jobs don't cause this to occur, but one job that reliably causes this problem runs a program called "luckyBackup", a front end to an rsync-based backup. I run that job once a day to back up my machine to a filesystem on a mounted USB drive. That job caused no ill effects when it ran under cron, but for some reason it consistently causes Jobber commands to hang and time out while it is running.

Also, this is not simply an effect of a job running for a long time. I can run /bin/sleep 300 as a Jobber job, and this does not occur.

Also, I can specify my job as follows, and it still causes the problem:

cmd: ( /path/to/my/job args ... & ) </dev/null 1>/dev/null 2>&1

In this case, the command returns immediately, and Jobber thinks it ran. Nonetheless, the fact that it is running in the background still leads to jobber commands timing out.

As soon as I kill the process, jobber starts working again.

This seems to be due to some characteristic of the actions that the job itself is performing. Since the problematic job involves running a program that does an rsync-based backup of my entire machine, I'm guessing that something that rsync does at a low level might be interfering with Jobber's job management code.

@dshearer (Owner)

> In this case, the command returns immediately, and Jobber thinks it ran. Nonetheless, the fact that it is running in the background still leads to jobber commands timing out.

How strange!

ghost commented May 11, 2020

OOPS! This turns out not to be true, after all. I tested that case improperly before ...

> In this case, the command returns immediately, and Jobber thinks it ran. Nonetheless, the fact that it is running in the background still leads to jobber commands timing out.

I apologize for the false alarm.

Jobber only gets blocked if I do not force the job into the background.

ghost commented May 13, 2020

I have an idea (more like an educated guess) as to what might be causing certain jobs to block the job runner. I am not a very experienced Go programmer, so I might be on the wrong track, but just in case I hit upon something ...

Look at lines 77 through 87 in the file common/exec.go. And then check out the discussion here:
golang/go#16787
(see the comment there by "quentinmit" from August 18, 2016).

It seems that Output() or CombinedOutput() should be used, or goroutines should read the two pipes concurrently, instead of the two sequential ReadAll() calls; otherwise the stdout or stderr pipe could block ... and this seems to be what is happening in the case that I am reporting here.

Is this a possibility?
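For illustration only (a generic Go sketch, not Jobber's actual common/exec.go; ls -l is just a stand-in command), the alternatives mentioned above look roughly like this. Letting os/exec collect the output itself means neither stream can block the other:

    package main

    import (
        "bytes"
        "fmt"
        "os/exec"
    )

    func main() {
        // Option 1: CombinedOutput() runs the command and merges stdout and
        // stderr into a single byte slice; os/exec does the pipe handling.
        out, err := exec.Command("ls", "-l").CombinedOutput()
        fmt.Printf("combined output (err=%v): %d bytes\n", err, len(out))

        // Option 2: attach in-memory buffers; cmd.Run() copies both streams
        // concurrently inside os/exec, so neither pipe can fill up and block.
        var stdoutBuf, stderrBuf bytes.Buffer
        cmd := exec.Command("ls", "-l")
        cmd.Stdout = &stdoutBuf
        cmd.Stderr = &stderrBuf
        if err := cmd.Run(); err != nil {
            fmt.Println("command failed:", err)
        }
        fmt.Printf("stdout: %d bytes, stderr: %d bytes\n", stdoutBuf.Len(), stderrBuf.Len())
    }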

ghost commented May 13, 2020

I think the following solution might work. Just replace lines 76-84 of common/exec.go with this code ...

    var stdoutBytes []byte
    var stderrBytes []byte

    go func() {
        stdoutBytes, _ = ioutil.ReadAll(stdout)
    }()
    go func() {
        stderrBytes, _ = ioutil.ReadAll(stderr)
    }()

I tested this locally with the latest code base, and it seems to work. However, I don't have enough experience with Go to feel confident about submitting this as a PR.
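A fuller, self-contained sketch of that goroutine idea (an illustration only, not the patch that was eventually merged; ls -l stands in for the job's command). A sync.WaitGroup makes sure both pipes have been fully drained before the buffers are used and before cmd.Wait() is called, which os/exec requires:

    package main

    import (
        "fmt"
        "io/ioutil"
        "os/exec"
        "sync"
    )

    func main() {
        cmd := exec.Command("ls", "-l") // stand-in for the job's command
        stdout, _ := cmd.StdoutPipe()
        stderr, _ := cmd.StderrPipe()
        if err := cmd.Start(); err != nil {
            panic(err)
        }

        var (
            stdoutBytes []byte
            stderrBytes []byte
            wg          sync.WaitGroup
        )
        wg.Add(2)
        go func() {
            defer wg.Done()
            stdoutBytes, _ = ioutil.ReadAll(stdout)
        }()
        go func() {
            defer wg.Done()
            stderrBytes, _ = ioutil.ReadAll(stderr)
        }()
        wg.Wait() // both pipes drained; now it is safe to call Wait()

        if err := cmd.Wait(); err != nil {
            fmt.Println("command failed:", err)
        }
        fmt.Printf("stdout: %d bytes, stderr: %d bytes\n", len(stdoutBytes), len(stderrBytes))
    }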

dshearer commented May 14, 2020 via email

ghost commented May 15, 2020

... but now I'm not sure whether this really fixed it. :(

Now I have more info about when the problem seems to occur.

If a long-running jobber-initiated job is in progress, I can do a jobber list with no problem. However, if I run jobber reload while that job is still running, the program hangs, and it then hangs for all subsequent jobber commands, including jobber list.

I'm not sure if this happens all the time under those circumstances, because I don't have time for in-depth testing at the moment. But this might provide more useful info for diagnosing the problem.

ghost commented May 15, 2020

After a quick perusal of jobber/cmd_reload.go and jobberrunner/ipc_server.go, it seems to me that the reload command iterates over the list of jobs and accesses their sockets. If any job's socket is in use because of a long-running job, this could cause the reload command to hang while it waits for access to that socket.

Again, I'm not sure about this, and I don't have time right now for a more in-depth investigation, but perhaps the job sockets need to be handled differently (somehow?) so that jobs can run and reloads can take place at the same time.
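For illustration only (a generic sketch under assumptions, not Jobber's actual IPC code; /tmp/example.sock is a made-up path): a Unix-domain-socket server that handles each connection in its own goroutine keeps accepting new commands even while one request takes a long time, which is the kind of behavior being asked for here.

    package main

    import (
        "bufio"
        "fmt"
        "net"
        "os"
        "time"
    )

    func main() {
        const sockPath = "/tmp/example.sock" // hypothetical path, illustration only
        os.Remove(sockPath)                  // clear any stale socket file

        ln, err := net.Listen("unix", sockPath)
        if err != nil {
            panic(err)
        }
        defer ln.Close()

        for {
            conn, err := ln.Accept()
            if err != nil {
                return
            }
            // One goroutine per connection: a slow request cannot block
            // the Accept loop or later requests.
            go handle(conn)
        }
    }

    func handle(conn net.Conn) {
        defer conn.Close()
        line, _ := bufio.NewReader(conn).ReadString('\n')
        time.Sleep(2 * time.Second) // pretend this request takes a while
        fmt.Fprintf(conn, "handled: %s", line)
    }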

@dshearer dshearer self-assigned this May 17, 2020
@dshearer dshearer added the bug label May 17, 2020
@dshearer dshearer added this to the 1.4.3 milestone May 17, 2020
dshearer added a commit that referenced this issue May 18, 2020
If, while a job is running, the user does 'jobber reload', this
command will fail with timeout, and then subsequently all commands
will also fail with timeout.

This commit fixes this problem by having Jobber cancel all running
jobs when the user does 'jobber reload'.
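As a rough illustration of the approach this commit message describes (a hedged sketch, not the actual Jobber implementation), a runner can give every job a context.Context and have reload cancel them all, so a reload never has to wait for a running job:

    package main

    import (
        "context"
        "fmt"
        "sync"
        "time"
    )

    // runner keeps the cancel function of each running job so that a reload
    // can stop them all before rebuilding the job list.
    type runner struct {
        mu      sync.Mutex
        cancels []context.CancelFunc
    }

    func (r *runner) startJob(name string, d time.Duration) {
        ctx, cancel := context.WithCancel(context.Background())
        r.mu.Lock()
        r.cancels = append(r.cancels, cancel)
        r.mu.Unlock()

        go func() {
            select {
            case <-time.After(d): // the "job" finishes on its own
                fmt.Println(name, "finished")
            case <-ctx.Done(): // ... or a reload cancels it
                fmt.Println(name, "cancelled")
            }
        }()
    }

    // reload cancels every running job instead of waiting for it.
    func (r *runner) reload() {
        r.mu.Lock()
        defer r.mu.Unlock()
        for _, cancel := range r.cancels {
            cancel()
        }
        r.cancels = nil
    }

    func main() {
        r := &runner{}
        r.startJob("backup", 10*time.Second)
        time.Sleep(100 * time.Millisecond)
        r.reload() // the long-running "backup" job is cancelled immediately
        time.Sleep(100 * time.Millisecond)
    }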
@dshearer dshearer mentioned this issue May 18, 2020
@dshearer dshearer linked a pull request May 18, 2020 that will close this issue
@dshearer (Owner)

I think I have a fix for this. Could you please install this package -- https://github.com/dshearer/jobber/suites/690922681/artifacts/6521232 -- and let me know if it works?

ghost commented May 18, 2020

To @dshearer ...

This indeed seems to have fixed the problem. Thank you!

I scheduled my long-running job in ~/.jobber and did a `jobber reload`.

Then I waited for the job to fire off, which it did at the correct time.
As it continued to run for around 20 minutes or so, I invoked jobber reload and other jobber commands numerous times, and the software never hung when I ran any of these commands.

So, as far as I'm concerned, it looks like the problem has indeed been fixed.
Thank you again. :)

dshearer added a commit that referenced this issue May 24, 2020
@dshearer dshearer changed the title Error: "Call to Jobber timed out." jobberrunner: Calling "reload" when long-running job is running causes commands to time out May 25, 2020
@dshearer dshearer changed the title jobberrunner: Calling "reload" when long-running job is running causes commands to time out jobberrunner: Calling reload when long-running job is running causes commands to time out May 25, 2020
@dshearer dshearer changed the title jobberrunner: Calling reload when long-running job is running causes commands to time out jobberrunner: Calling reload when long-running job is running causes commands to time out thereafter May 25, 2020
@dshearer dshearer changed the title jobberrunner: Calling reload when long-running job is running causes commands to time out thereafter cmd/reload: Calling this when long-running job is running causes commands to time out thereafter May 25, 2020
dshearer added a commit that referenced this issue May 26, 2020
dshearer added a commit that referenced this issue May 26, 2020