cmd/reload: Calling this when long-running job is running causes commands to time out thereafter · Issue #283 · dshearer/jobber

Closed
ghost opened this issue May 10, 2020 · 9 comments · Fixed by #289

ghost commented May 10, 2020

Most jobs that I submit through Jobber on my Ubuntu 18.04 machine run with no problem, but certain jobs cause problems ...

When one of those jobs starts up, I can no longer run any jobber commands. Every jobber command that I try hangs, and then after a while, it prints "Call to Jobber timed out." This includes invocations of jobber list, jobber log, jobber pause, and all other jobber commands.

Most jobs don't cause this to occur, but one job that reliably causes this problem runs a program called "luckyBackup", a front end to an rsync-based backup. I run that job once a day to back up my machine to a filesystem on a mounted USB drive. That job caused no ill effects when it ran under cron, but for some reason it consistently causes Jobber commands to hang and time out while it is running.

Also, this is not simply an effect of a job running for a long time. I can run /bin/sleep 300 as a Jobber job, and this does not occur.

Also, I can specify my job as follows, and it still causes the problem:

cmd: ( /path/to/my/job args ... & ) </dev/null 1>/dev/null 2>&1

In this case, the command returns immediately, and Jobber thinks it ran. Nonetheless, the fact that it is running in the background still leads to jobber commands timing out.

As soon as I kill the process, jobber starts working again.

This seems to be due to some characteristic of the actions that the job itself is performing. Since the problematic job involves running a program that does an rsync-based backup of my entire machine, I'm guessing that something that rsync does at a low level might be interfering with Jobber's job management code.

@dshearer (Owner)

> In this case, the command returns immediately, and Jobber thinks it ran. Nonetheless, the fact that it is running in the background still leads to jobber commands timing out.

How strange!

ghost commented May 11, 2020

OOPS! This turns out not to be true, after all. I tested that case improperly before ...

> In this case, the command returns immediately, and Jobber thinks it ran. Nonetheless, the fact that it is running in the background still leads to jobber commands timing out.

I apologize for the false alarm.

Jobber only gets blocked if I do not force the job into the background.

ghost commented May 13, 2020

I have an idea (more like an educated guess) as to what might be causing certain jobs to block the job runner. I am not a very experienced Go programmer, so I might be on the wrong track, but just in case I hit upon something ...

Look at lines 77 through 87 in the file common/exec.go. And then check out the discussion here:
golang/go#16787
(see the comment there by "quentinmit" from August 18, 2016).

It seems that Output() or CombinedOutput() should be used, or goroutines should read the two pipes concurrently, instead of the two sequential ReadAll() calls; otherwise the stdout or stderr pipe could block ... and this seems to be what is happening in the case that I am reporting here.

Is this a possibility?
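For illustration only (a generic Go sketch, not Jobber's actual common/exec.go; ls -l is just a stand-in command), the alternatives mentioned above look roughly like this. Letting os/exec collect the output itself means neither stream can block the other:

    package main

    import (
        "bytes"
        "fmt"
        "os/exec"
    )

    func main() {
        // Option 1: CombinedOutput() runs the command and merges stdout and
        // stderr into a single byte slice; os/exec does the pipe handling.
        out, err := exec.Command("ls", "-l").CombinedOutput()
        fmt.Printf("combined output (err=%v): %d bytes\n", err, len(out))

        // Option 2: attach in-memory buffers; cmd.Run() copies both streams
        // concurrently inside os/exec, so neither pipe can fill up and block.
        var stdoutBuf, stderrBuf bytes.Buffer
        cmd := exec.Command("ls", "-l")
        cmd.Stdout = &stdoutBuf
        cmd.Stderr = &stderrBuf
        if err := cmd.Run(); err != nil {
            fmt.Println("command failed:", err)
        }
        fmt.Printf("stdout: %d bytes, stderr: %d bytes\n", stdoutBuf.Len(), stderrBuf.Len())
    }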

ghost commented May 13, 2020

I think the following solution might work. Just replace lines 76-84 of common/exec.go with this code ...

    var stdoutBytes []byte
    var stderrBytes []byte

    go func() {
        stdoutBytes, _ = ioutil.ReadAll(stdout)
    }()
    go func() {
        stderrBytes, _ = ioutil.ReadAll(stderr)
    }()

I tested this locally with the latest code base, and it seems to work. However, I don't have enough experience with Go to feel confident about submitting this as a PR.
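A fuller, self-contained sketch of that goroutine idea (an illustration only, not the patch that was eventually merged; ls -l stands in for the job's command). A sync.WaitGroup makes sure both pipes have been fully drained before the buffers are used and before cmd.Wait() is called, which os/exec requires:

    package main

    import (
        "fmt"
        "io/ioutil"
        "os/exec"
        "sync"
    )

    func main() {
        cmd := exec.Command("ls", "-l") // stand-in for the job's command
        stdout, _ := cmd.StdoutPipe()
        stderr, _ := cmd.StderrPipe()
        if err := cmd.Start(); err != nil {
            panic(err)
        }

        var (
            stdoutBytes []byte
            stderrBytes []byte
            wg          sync.WaitGroup
        )
        wg.Add(2)
        go func() {
            defer wg.Done()
            stdoutBytes, _ = ioutil.ReadAll(stdout)
        }()
        go func() {
            defer wg.Done()
            stderrBytes, _ = ioutil.ReadAll(stderr)
        }()
        wg.Wait() // both pipes drained; now it is safe to call Wait()

        if err := cmd.Wait(); err != nil {
            fmt.Println("command failed:", err)
        }
        fmt.Printf("stdout: %d bytes, stderr: %d bytes\n", len(stdoutBytes), len(stderrBytes))
    }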

dshearer commented May 14, 2020 via email

ghost commented May 15, 2020

... but now I'm not sure whether this really fixed it. :(

Now I have more info about when the problem seems to occur.

If a long-running jobber-initiated job is in progress, I can do a jobber list with no problem. However, if I run jobber reload while that job is still running, the program hangs, and it then hangs for all subsequent jobber commands, including jobber list.

I'm not sure if this happens all the time under those circumstances, because I don't have time for in-depth testing at the moment. But this might provide more useful info for diagnosing the problem.

ghost commented May 15, 2020

After a quick perusal of jobber/cmd_reload.go and jobberrunner/ipc_server.go, it seems to me that the reload command iterates over the list of jobs and accesses their sockets. If any job's socket is in use because of a long-running job, this could cause the reload command to hang while it waits for access to that socket.

Again, I'm not sure about this, and I don't have time right now for a more in-depth investigation, but perhaps the job sockets need to be handled differently (somehow?) so that jobs can run and reloads can take place at the same time.
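For illustration only (a generic sketch under assumptions, not Jobber's actual IPC code; /tmp/example.sock is a made-up path): a Unix-domain-socket server that handles each connection in its own goroutine keeps accepting new commands even while one request takes a long time, which is the kind of behavior being asked for here.

    package main

    import (
        "bufio"
        "fmt"
        "net"
        "os"
        "time"
    )

    func main() {
        const sockPath = "/tmp/example.sock" // hypothetical path, illustration only
        os.Remove(sockPath)                  // clear any stale socket file

        ln, err := net.Listen("unix", sockPath)
        if err != nil {
            panic(err)
        }
        defer ln.Close()

        for {
            conn, err := ln.Accept()
            if err != nil {
                return
            }
            // One goroutine per connection: a slow request cannot block
            // the Accept loop or later requests.
            go handle(conn)
        }
    }

    func handle(conn net.Conn) {
        defer conn.Close()
        line, _ := bufio.NewReader(conn).ReadString('\n')
        time.Sleep(2 * time.Second) // pretend this request takes a while
        fmt.Fprintf(conn, "handled: %s", line)
    }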

@dshearer dshearer self-assigned this May 17, 2020
@dshearer dshearer added the bug label May 17, 2020
@dshearer dshearer added this to the 1.4.3 milestone May 17, 2020
dshearer added a commit that referenced this issue May 18, 2020
If, while a job is running, the user does 'jobber reload', this
command will fail with timeout, and then subsequently all commands
will also fail with timeout.

This commit fixes this problem by having Jobber cancel all running
jobs when the user does 'jobber reload'.
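As a rough illustration of the approach this commit message describes (a hedged sketch, not the actual Jobber implementation), a runner can give every job a context.Context and have reload cancel them all, so a reload never has to wait for a running job:

    package main

    import (
        "context"
        "fmt"
        "sync"
        "time"
    )

    // runner keeps the cancel function of each running job so that a reload
    // can stop them all before rebuilding the job list.
    type runner struct {
        mu      sync.Mutex
        cancels []context.CancelFunc
    }

    func (r *runner) startJob(name string, d time.Duration) {
        ctx, cancel := context.WithCancel(context.Background())
        r.mu.Lock()
        r.cancels = append(r.cancels, cancel)
        r.mu.Unlock()

        go func() {
            select {
            case <-time.After(d): // the "job" finishes on its own
                fmt.Println(name, "finished")
            case <-ctx.Done(): // ... or a reload cancels it
                fmt.Println(name, "cancelled")
            }
        }()
    }

    // reload cancels every running job instead of waiting for it.
    func (r *runner) reload() {
        r.mu.Lock()
        defer r.mu.Unlock()
        for _, cancel := range r.cancels {
            cancel()
        }
        r.cancels = nil
    }

    func main() {
        r := &runner{}
        r.startJob("backup", 10*time.Second)
        time.Sleep(100 * time.Millisecond)
        r.reload() // the long-running "backup" job is cancelled immediately
        time.Sleep(100 * time.Millisecond)
    }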
@dshearer dshearer mentioned this issue May 18, 2020
@dshearer dshearer linked a pull request May 18, 2020 that will close this issue
@dshearer (Owner)

I think I have a fix for this. Could you please install this package -- https://github.com/dshearer/jobber/suites/690922681/artifacts/6521232 -- and let me know if it works?

ghost commented May 18, 2020

To @dshearer ...

This indeed seems to have fixed the problem. Thank you!

I scheduled my long-running job in ~/.jobber and did a `jobber reload`.

Then I waited for the job to fire off, which it did at the correct time.
As it continued to run for around 20 minutes or so, I invoked jobber reload and other jobber commands numerous times, and the software never hung when I ran any of these commands.

So, as far as I'm concerned, it looks like the problem has indeed been fixed.
Thank you again. :)

dshearer added a commit that referenced this issue May 24, 2020
@dshearer dshearer changed the title Error: "Call to Jobber timed out." jobberrunner: Calling "reload" when long-running job is running causes commands to time out May 25, 2020
@dshearer dshearer changed the title jobberrunner: Calling "reload" when long-running job is running causes commands to time out jobberrunner: Calling reload when long-running job is running causes commands to time out May 25, 2020
@dshearer dshearer changed the title jobberrunner: Calling reload when long-running job is running causes commands to time out jobberrunner: Calling reload when long-running job is running causes commands to time out thereafter May 25, 2020
@dshearer dshearer changed the title jobberrunner: Calling reload when long-running job is running causes commands to time out thereafter cmd/reload: Calling this when long-running job is running causes commands to time out thereafter May 25, 2020
dshearer added a commit that referenced this issue May 26, 2020
dshearer added a commit that referenced this issue May 26, 2020