Jobs executed twice after 24 hours #192

ckotte · 2018-04-26T08:18:27Z

The first backup is executed at 2AM as scheduled, but all subsequent backups also start 2 seconds before 2AM!?

bash-4.4# jobber list
NAME                STATUS  SEC/MIN/HR/MDAY/MTH/WDAY  NEXT RUN TIME               NOTIFY ON SUCCESS  NOTIFY ON ERR  NOTIFY ON FAIL  ERR HANDLER
backup_confluence   Good    0 0 2 * * *               Apr 27 02:00:00 +0200 CEST  false              false          false           Continue
backup_jira         Good    0 0 2 * * *               Apr 27 02:00:00 +0200 CEST  false              false          false           Continue
backup_bitbucket    Good    0 0 2 * * *               Apr 27 02:00:00 +0200 CEST  false              false          false           Continue
backup_letsencrypt  Good    0 0 2 * * *               Apr 27 02:00:00 +0200 CEST  false              false          false           Continue
backup_syncthing    Good    0 0 2 * * *               Apr 27 02:00:00 +0200 CEST  false              false          false           Continue

bash-4.4# jobber log
TIME                  JOB                 SUCCEEDED  RESULT
Apr 26 02:00:00 2018  backup_bitbucket    true       Good
Apr 26 02:00:00 2018  backup_letsencrypt  true       Good
Apr 26 02:00:00 2018  backup_jira         true       Good
Apr 26 02:00:00 2018  backup_syncthing    true       Good
Apr 26 02:00:00 2018  backup_confluence   true       Good
Apr 26 01:59:58 2018  backup_bitbucket    false      Good
Apr 26 01:59:58 2018  backup_letsencrypt  true       Good
Apr 26 01:59:58 2018  backup_jira         true       Good
Apr 26 01:59:58 2018  backup_syncthing    true       Good
Apr 26 01:59:58 2018  backup_confluence   true       Good
Apr 25 02:00:00 2018  backup_bitbucket    true       Good
Apr 25 02:00:00 2018  backup_letsencrypt  true       Good
Apr 25 02:00:00 2018  backup_jira         true       Good
Apr 25 02:00:00 2018  backup_syncthing    true       Good
Apr 25 02:00:00 2018  backup_confluence   true       Good

This will create two backup files for each job and sometimes the backup fails because it can't access files while the same backup job runs more or less in parallel.

This issue happens with Jobber 1.2 and also with Jobber 1.3.2. I use a custom Docker container based on Alpine 3.7 with Jobber built from sources.

The text was updated successfully, but these errors were encountered:

dshearer · 2018-04-27T21:42:57Z

Um... That is quite strange. I think I need to ask you for steps to reproduce this. If you can publish a Docker image that reproduces this, that would be ideal.

ckotte · 2018-05-03T13:45:52Z

Sorry, I didn't get a notification from Github.

I use this image: blacklabelops/jobber-cron. I also created a fork and modified it to support Jobber 1.3.2 with the same result.

I will try your docker image and give you feedback.

ckotte · 2018-05-04T08:47:49Z

I have a similar behavior with your image. The jobs are executed twice right away:

/ $ jobber log
TIME                  JOB                 SUCCEEDED  RESULT
May  4 02:00:00 2018  backup_bitbucket    false      Good
May  4 02:00:00 2018  backup_letsencrypt  false      Good
May  4 02:00:00 2018  backup_jira         false      Good
May  4 02:00:00 2018  backup_syncthing    false      Good
May  4 02:00:00 2018  backup_confluence   false      Good
May  4 01:59:58 2018  backup_bitbucket    false      Good
May  4 01:59:58 2018  backup_syncthing    false      Good
May  4 01:59:58 2018  backup_letsencrypt  false      Good
May  4 01:59:58 2018  backup_confluence   false      Good
May  4 01:59:58 2018  backup_jira         false      Good

dshearer · 2018-05-14T01:05:48Z

Here's the scheduling code:

		// sleep till it's time to run it
		if now.Before(*job.NextRunTime) {
			afterChan := time.After(job.NextRunTime.Sub(now))
			select {
			case now = <-afterChan:
			case <-ctx.Done():
				// abort!
				heap.Push(&jq.q, job)
				return nil
			}
		}

		// schedule this job's next run
		job.NextRunTime = nextRunTime(job, now.Add(time.Second))
		if job.NextRunTime != nil {
			heap.Push(&jq.q, job)

I suspect that, for some reason, the time.After timer is firing two seconds early.

ckotte · 2018-05-15T19:07:56Z

It's executed 2-3 seconds before and again at the scheduled time. I will try your container on another Docker host. Maybe it's related to the kernel and/or Docker version used on my primary Docker host...

Any hints on how can I debug this? Do I have to add logging of the variables to the code and compile it again or can I debug the compiled code?

dshearer · 2018-05-15T23:11:46Z

I may have a fix. I’m going to compile it and send it to you to try out. Please give me a few days. Dylan

…

On May 15, 2018, at 12:07 PM, Christian Kotte ***@***.***> wrote: It's executed 2-3 seconds before and again at the scheduled time. I will try your container on another Docker host. Maybe it's related to the kernel and/or Docker version used on my primary Docker host... Any hints on how can I debug this? Do I have to add logging of the variables to the code and compile it again or can I debug the compiled code? — You are receiving this because you were assigned. Reply to this email directly, view it on GitHub, or mute the thread.

MohnishSingla · 2018-05-28T11:45:10Z

I am facing the same issue with 1.2 version. In my case, it is happening alternate day. For example, refer below log:

2018-05-18 10:00:00.859403633 -0400 EDT DailyEmails root true Good
2018-05-18 09:59:58.856692681 -0400 EDT DailyEmails root true Good
2018-05-17 10:00:00.079149698 -0400 EDT DailyEmails root true Good
2018-05-16 10:00:00.079163695 -0400 EDT DailyEmails root true Good
2018-05-16 09:59:59.076448174 -0400 EDT DailyEmails root true Good

If we look at the logs, job executed twice on 16th and 18th whereas once on 17th.

- Queue was causing jobs to run twice within a few seconds of the time they were supposed to run once.

dshearer · 2018-05-28T18:43:58Z

I've made a Docker image containing what I hope is a fix. Could one of you please try it out? It's dshearer/jobber:1.3.3-pr.1-alpine3.7

ckotte · 2018-05-30T08:02:56Z

I tried it. Unfortunately, I have the same behavior:

/ $ jobber log
TIME                  JOB                 SUCCEEDED  RESULT
May 30 02:00:00 2018  backup_bitbucket    false      Good
May 30 02:00:00 2018  backup_letsencrypt  false      Good
May 30 02:00:00 2018  backup_jira         false      Good
May 30 02:00:00 2018  backup_syncthing    false      Good
May 30 02:00:00 2018  backup_confluence   false      Good
May 30 01:59:58 2018  backup_bitbucket    false      Good
May 30 01:59:58 2018  backup_letsencrypt  false      Good
May 30 01:59:58 2018  backup_jira         false      Good
May 30 01:59:58 2018  backup_confluence   false      Good
May 30 01:59:58 2018  backup_syncthing    false      Good
May 29 02:00:00 2018  backup_letsencrypt  false      Good
May 29 02:00:00 2018  backup_jira         false      Good
May 29 02:00:00 2018  backup_syncthing    false      Good
May 29 02:00:00 2018  backup_confluence   false      Good
May 29 02:00:00 2018  backup_bitbucket    false      Good

dshearer · 2018-06-02T19:56:32Z

I really can't imagine what is causing this. Unfortunately, I cannot reproduce it. If you can provide instructions for reproducing it, that would help. Otherwise, I need to think about this.

ckotte · 2018-06-06T15:05:22Z

Sorry, I'm a bit busy at the moment.

I don't know how to reproduce. Maybe it's related to the kernel or Docker version I use..!? I need to do more tests to narrow down the issue..

Could you create a special version that logs the values of the variables etc. to a logfile during execution or something similar?

# docker --version
Docker version 17.07.0-ce, build fd7dc00

# uname -r
4.2.8

This is a part of the Dockerfile I used with your Docker image:

version: '2'

services:
  jobber-docker:
  [...]
  jobber-tools:
  [...]
  # TEST
  jobber-test:
    restart: always
    image: dshearer/jobber:1.3.3-pr.1-alpine3.7
    container_name: jobber-test
    hostname: jobber-test
    networks:
      - atlassian_backend
    volumes:
      - /xxx/Container/jobber_test:/home/jobberuser
      - /xxx/Backups/Test:/backup
      - /xxx/Container/postgres:/mnt/postgresql:ro
      - /xxx/Container/bitbucket:/mnt/bitbucket:ro
      - /xxx/Container/jira:/mnt/jira:ro
      - /xxx/Container/confluence:/mnt/confluence:ro
      - /xxx/Container/letsencrypt:/mnt/letsencrypt:ro
      - /xxx/Container/syncthing:/mnt/syncthing:ro
    environment:
      TZ: "Europe/Berlin"

networks:
  atlassian_backend:
    external: true

This is the test jobber file:

[/share/Container/jobber_test] # cat .jobber
[jobs]
- name: backup_confluence
  cmd: /backup/container-backup/container-backup.sh -p confluence 2>> /var/log/backup-confluence-err.log >> /var/log/backup-confluence.log
  time: '0 0 2 * * *'
  onError: Continue
  notifyOnError: false
  notifyOnFailure: false
- name: backup_jira
  cmd: /backup/container-backup/container-backup.sh -p jira 2>> /var/log/backup-jira-err.log >> /var/log/backup-jira.log
  time: '0 0 2 * * *'
  onError: Continue
  notifyOnError: false
  notifyOnFailure: false
- name: backup_bitbucket
  cmd: /backup/container-backup/container-backup.sh -p bitbucket 2>> /var/log/backup-bitbucket-err.log >> /var/log/backup-bitbucket.log
  time: '0 0 2 * * *'
  onError: Continue
  notifyOnError: false
  notifyOnFailure: false
- name: backup_letsencrypt
  cmd: /backup/container-backup/container-backup.sh -p letsencrypt 2>> /var/log/backup-letsencrypt-err.log >> /var/log/backup-letsencrypt.log
  time: '0 0 2 * * *'
  onError: Continue
  notifyOnError: false
  notifyOnFailure: false
- name: backup_syncthing
  cmd: /backup/container-backup/container-backup.sh -p syncthing 2>> /var/log/backup-syncthing-err.log >> /var/log/backup-syncthing.log
  time: '0 0 2 * * *'
  onError: Continue
  notifyOnError: false
  notifyOnFailure: false

Current status on jobber test container:

/ $ jobber log
TIME                  JOB                 SUCCEEDED  RESULT
Jun  6 02:00:00 2018  backup_bitbucket    false      Good
Jun  6 02:00:00 2018  backup_letsencrypt  false      Good
8000

Jun  6 02:00:00 2018  backup_jira         false      Good
Jun  6 02:00:00 2018  backup_syncthing    false      Good
Jun  6 02:00:00 2018  backup_confluence   false      Good
Jun  6 01:59:57 2018  backup_bitbucket    false      Good
Jun  6 01:59:57 2018  backup_letsencrypt  false      Good
Jun  6 01:59:57 2018  backup_jira         false      Good
Jun  6 01:59:57 2018  backup_syncthing    false      Good
Jun  6 01:59:57 2018  backup_confluence   false      Good
Jun  5 02:00:00 2018  backup_bitbucket    false      Good
Jun  5 02:00:00 2018  backup_letsencrypt  false      Good
Jun  5 02:00:00 2018  backup_jira         false      Good
Jun  5 02:00:00 2018  backup_syncthing    false      Good
Jun  5 02:00:00 2018  backup_confluence   false      Good
Jun  5 01:59:58 2018  backup_bitbucket    false      Good
Jun  5 01:59:58 2018  backup_letsencrypt  false      Good
Jun  5 01:59:58 2018  backup_jira         false      Good
Jun  5 01:59:58 2018  backup_syncthing    false      Good
Jun  5 01:59:58 2018  backup_confluence   false      Good
[...]

MohnishSingla · 2018-06-08T08:39:51Z

Hello,
For me, unfortunately this is happening only on one or two servers and m not able to repro it in my boxes.

dshearer · 2018-06-09T21:45:06Z

Thanks for the info. I'll keep working on this.

dshearer · 2018-06-10T18:35:59Z

Ok. @ckotte , I've done as you asked: I've made a build that logs some useful info when scheduling jobs. dshearer/jobber:1.3.3-pr.2-alpine-3.7

Here's an example of the output:

Next job to run is ExampleJob, at Jun 10 18:32:49.
It is now Jun 10 18:32:49, which is NOT before Jun 10 18:32:49
Running ExampleJob
jobberuser: echo "Jobber is running!"
Calling pop
Next job to run is ExampleJob, at Jun 10 18:32:50.
It is now Jun 10 18:32:49.
Sleeping for 999.251675ms.
It is now Jun 10 18:32:50, which is NOT before Jun 10 18:32:50
Running ExampleJob
jobberuser: echo "Jobber is running!"

It goes to stdout. You can also write this to disk with the logPath setting in your jobfile:

[prefs]
logPath: /path/to/logfile

[jobs]
...

ckotte · 2018-06-11T08:09:00Z

thx. I updated my jobber test container with your new version. I will let you know.

ckotte · 2018-06-12T08:56:50Z

Ok. Here is the output:

/ $ jobber log
TIME                  JOB                 SUCCEEDED  RESULT
Jun 12 02:00:00 2018  backup_bitbucket    false      Good
Jun 12 02:00:00 2018  backup_letsencrypt  false      Good
Jun 12 02:00:00 2018  backup_jira         false      Good
Jun 12 02:00:00 2018  backup_syncthing    false      Good
Jun 12 02:00:00 2018  backup_confluence   false      Good
Jun 12 01:59:59 2018  backup_bitbucket    false      Good
Jun 12 01:59:59 2018  backup_letsencrypt  false      Good
Jun 12 01:59:59 2018  backup_jira         false      Good
Jun 12 01:59:59 2018  backup_syncthing    false      Good
Jun 12 01:59:59 2018  backup_confluence   false      Good
Jun 12 01:59:58 2018  backup_bitbucket    false      Good
Jun 12 01:59:58 2018  backup_letsencrypt  false      Good
Jun 12 01:59:58 2018  backup_jira         false      Good
Jun 12 01:59:58 2018  backup_syncthing    false      Good
Jun 12 01:59:58 2018  backup_confluence   false      Good
Jun 11 02:00:00 2018  backup_bitbucket    false      Good
Jun 11 02:00:00 2018  backup_letsencrypt  false      Good
Jun 11 02:00:00 2018  backup_jira         false      Good
Jun 11 02:00:00 2018  backup_syncthing    false      Good
Jun 11 02:00:00 2018  backup_confluence   false      Good

Starting job-runner thread...
Listening for commands on /var/jobber/100/cmd.sock
JobManager Waiting...
Launching job runner thread
done
Calling pop
Next job to run is backup_confluence, at Jun 11 02:00:00.
It is now Jun 10 22:47:18.
Sleeping for 3h12m41.974725264s.
Jobberrunner Quitting
JobManager cancelling...
done
JobManager Waiting...
Main thread cancelled
JobRunner: cancelling
Consuming remaining run recs...
Run thread got 'stop'
JobRunner: cleaning up...
JobRunner done
Done onsuming remaining run recs
done
Jobberrunner Quitting
JobManager cancelling...
done
JobManager Waiting...
done
Starting job-runner thread...
Launching job runner thread
done
Calling pop
Next job to run is backup_confluence, at Jun 11 02:00:00.
It is now Jun 10 22:53:51.
Sleeping for 3h6m8.980066293s.
It is now Jun 11 02:00:00, which is NOT before Jun 11 02:00:00
Running backup_confluence
jobberuser: /backup/container-backup/container-backup.sh -p confluence 2>> /var/log/backup-confluence-err.log >> /var/log/backup-confluence.log
Calling pop
Next job to run is backup_syncthing, at Jun 11 02:00:00.
It is now Jun 11 02:00:00, which is NOT before Jun 11 02:00:00
Running backup_syncthing
jobberuser: /backup/container-backup/container-backup.sh -p syncthing 2>> /var/log/backup-syncthing-err.log >> /var/log/backup-syncthing.log
Calling pop
Next job to run is backup_jira, at Jun 11 02:00:00.
It is now Jun 11 02:00:00, which is NOT before Jun 11 02:00:00
Running backup_jira
jobberuser: /backup/container-backup/container-backup.sh -p jira 2>> /var/log/backup-jira-err.log >> /var/log/backup-jira.log
Calling pop
Next job to run is backup_letsencrypt, at Jun 11 02:00:00.
It is now Jun 11 02:00:00, which is NOT before Jun 11 02:00:00
ExecAndWait: /bin/sh: exit status 1
ExecAndWait: /bin/sh: exit status 1
ExecAndWait: /bin/sh: exit status 1
Running backup_letsencrypt
jobberuser: /backup/container-backup/container-backup.sh -p letsencrypt 2>> /var/log/backup-letsencrypt-err.log >> /var/log/backup-letsencrypt.log
Calling pop
Next job to run is backup_bitbucket, at Jun 11 02:00:00.
It is now Jun 11 02:00:00, which is NOT before Jun 11 02:00:00
ExecAndWait: /bin/sh: exit status 1
Running backup_bitbucket
jobberuser: /backup/container-backup/container-backup.sh -p bitbucket 2>> /var/log/backup-bitbucket-err.log >> /var/log/backup-bitbucket.log
Calling pop
Next job to run is backup_confluence, at Jun 12 02:00:00.
It is now Jun 11 02:00:00.
Sleeping for 23h59m59.754671253s.
ExecAndWait: /bin/sh: exit status 1
Got command
It is now Jun 12 01:59:58, which is NOT before Jun 12 02:00:00
Running backup_confluence
jobberuser: /backup/container-backup/container-backup.sh -p confluence 2>> /var/log/backup-confluence-err.log >> /var/log/backup-confluence.log
Calling pop
Next job to run is backup_syncthing, at Jun 12 02:00:00.
It is now Jun 12 01:59:58.
Sleeping for 122.217005ms.
ExecAndWait: /bin/sh: exit status 1
It is now Jun 12 01:59:58, which is NOT before Jun 12 02:00:00
Running backup_syncthing
jobberuser: /backup/container-backup/container-backup.sh -p syncthing 2>> /var/log/backup-syncthing-err.log >> /var/log/backup-syncthing.log
Calling pop
Next job to run is backup_jira, at Jun 12 02:00:00.
It is now Jun 12 01:59:58.
Sleeping for 30.217956ms.
ExecAndWait: /bin/sh: exit status 1
It is now Jun 12 01:59:58, which is NOT before Jun 12 02:00:00
Running backup_jira
jobberuser: /backup/container-backup/container-backup.sh -p jira 2>> /var/log/backup-jira-err.log >> /var/log/backup-jira.log
Calling pop
Next job to run is backup_letsencrypt, at Jun 12 02:00:00.
It is now Jun 12 01:59:58.
Sleeping for 29.751361ms.
ExecAndWait: /bin/sh: exit status 1
It is now Jun 12 01:59:58, which is NOT before Jun 12 02:00:00
Running backup_letsencrypt
jobberuser: /backup/container-backup/container-backup.sh -p letsencrypt 2>> /var/log/backup-letsencrypt-err.log >> /var/log/backup-letsencrypt.log
Calling pop
Next job to run is backup_bitbucket, at Jun 12 02:00:00.
It is now Jun 12 01:59:58.
Sleeping for 31.366824ms.
ExecAndWait: /bin/sh: exit status 1
It is now Jun 12 01:59:58, which is NOT before Jun 12 02:00:00
Running backup_bitbucket
jobberuser: /backup/container-backup/container-backup.sh -p bitbucket 2>> /var/log/backup-bitbucket-err.log >> /var/log/backup-bitbucket.log
Calling pop
Next job to run is backup_confluence, at Jun 12 02:00:00.
It is now Jun 12 01:59:58.
Sleeping for 1.784548946s.
ExecAndWait: /bin/sh: exit status 1
It is now Jun 12 01:59:59, which is NOT before Jun 12 02:00:00
Running backup_confluence
jobberuser: /backup/container-backup/container-backup.sh -p confluence 2>> /var/log/backup-confluence-err.log >> /var/log/backup-confluence.log
Calling pop
Next job to run is backup_syncthing, at Jun 12 02:00:00.
It is now Jun 12 01:59:59.
Sleeping for 122.076316ms.
ExecAndWait: /bin/sh: exit status 1
It is now Jun 12 01:59:59, which is NOT before Jun 12 02:00:00
Running backup_syncthing
jobberuser: /backup/container-backup/container-backup.sh -p syncthing 2>> /var/log/backup-syncthing-err.log >> /var/log/backup-syncthing.log
Calling pop
Next job to run is backup_jira, at Jun 12 02:00:00.
It is now Jun 12 01:59:59.
Sleeping for 30.214068ms.
ExecAndWait: /bin/sh: exit status 1
It is now Jun 12 01:59:59, which is NOT before Jun 12 02:00:00
Running backup_jira
jobberuser: /backup/container-backup/container-backup.sh -p jira 2>> /var/log/backup-jira-err.log >> /var/log/backup-jira.log
Calling pop
Next job to run is backup_letsencrypt, at Jun 12 02:00:00.
It is now Jun 12 01:59:59.
Sleeping for 29.762412ms.
ExecAndWait: /bin/sh: exit status 1
It is now Jun 12 01:59:59, which is NOT before Jun 12 02:00:00
Running backup_letsencrypt
jobberuser: /backup/container-backup/container-backup.sh -p letsencrypt 2>> /var/log/backup-letsencrypt-err.log >> /var/log/backup-letsencrypt.log
Calling pop
Next job to run is backup_bitbucket, at Jun 12 02:00:00.
It is now Jun 12 01:59:59.
Sleeping for 31.5993ms.
ExecAndWait: /bin/sh: exit status 1
It is now Jun 12 01:59:59, which is NOT before Jun 12 02:00:00
Running backup_bitbucket
jobberuser: /backup/container-backup/container-backup.sh -p bitbucket 2>> /var/log/backup-bitbucket-err.log >> /var/log/backup-bitbucket.log
Calling pop
Next job to run is backup_confluence, at Jun 12 02:00:00.
It is now Jun 12 01:59:59.
Sleeping for 784.902407ms.
ExecAndWait: /bin/sh: exit status 1
It is now Jun 12 02:00:00, which is NOT before Jun 12 02:00:00
Running backup_confluence
jobberuser: /backup/container-backup/container-backup.sh -p confluence 2>> /var/log/backup-confluence
8000
-err.log >> /var/log/backup-confluence.log
Calling pop
Next job to run is backup_syncthing, at Jun 12 02:00:00.
It is now Jun 12 02:00:00.
Sleeping for 91.668149ms.
ExecAndWait: /bin/sh: exit status 1
It is now Jun 12 02:00:00, which is NOT before Jun 12 02:00:00
Running backup_syncthing
jobberuser: /backup/container-backup/container-backup.sh -p syncthing 2>> /var/log/backup-syncthing-err.log >> /var/log/backup-syncthing.log
Calling pop
Next job to run is backup_jira, at Jun 12 02:00:00.
It is now Jun 12 02:00:00, which is NOT before Jun 12 02:00:00
ExecAndWait: /bin/sh: exit status 1
Running backup_jira
jobberuser: /backup/container-backup/container-backup.sh -p jira 2>> /var/log/backup-jira-err.log >> /var/log/backup-jira.log
Calling pop
Next job to run is backup_letsencrypt, at Jun 12 02:00:00.
It is now Jun 12 02:00:00, which is NOT before Jun 12 02:00:00
ExecAndWait: /bin/sh: exit status 1
Running backup_letsencrypt
jobberuser: /backup/container-backup/container-backup.sh -p letsencrypt 2>> /var/log/backup-letsencrypt-err.log >> /var/log/backup-letsencrypt.log
Calling pop
Next job to run is backup_bitbucket, at Jun 12 02:00:00.
It is now Jun 12 02:00:00, which is NOT before Jun 12 02:00:00
ExecAndWait: /bin/sh: exit status 1
Running backup_bitbucket
jobberuser: /backup/container-backup/container-backup.sh -p bitbucket 2>> /var/log/backup-bitbucket-err.log >> /var/log/backup-bitbucket.log
Calling pop
Next job to run is backup_confluence, at Jun 13 02:00:00.
It is now Jun 12 02:00:00.
Sleeping for 23h59m59.745693673s.
ExecAndWait: /bin/sh: exit status 1
Got command
Got list jobs cmd
Got command

dshearer · 2018-06-17T21:02:16Z

It is now Jun 12 01:59:59, which is NOT before Jun 12 02:00:00

I am flabbergasted.

dshearer · 2018-06-30T22:21:56Z

Sorry, I've been quite busy with some personal stuff. Would one of you be able to do some experiments?

Please change the time of day these jobs run.
Please change the frequency at which they run -- say, have them run every 2 hours.

Does the problem remain after either of these changes?

ckotte · 2018-07-03T14:09:51Z

I will try this after I'm back from vacation in 2 weeks.

ckotte · 2018-07-12T07:22:44Z

Same issue with different frequency. Looks like the issue always starts the next day...

/ $ jobber list
NAME                STATUS  SEC/MIN/HR/MDAY/MTH/WDAY  NEXT RUN TIME               NOTIFY ON SUCCESS  NOTIFY ON ERR  NOTIFY ON FAIL  ERR HANDLER
backup_confluence   Good    0 0 */1 * * *             Jul 12 10:00:00 +0200 CEST  false              false          false           Continue
backup_jira         Good    0 0 */2 * * *             Jul 12 10:00:00 +0200 CEST  false              false          false           Continue
backup_bitbucket    Good    0 0 8 * * *               Jul 13 08:00:00 +0200 CEST  false              false          false           Continue
backup_letsencrypt  Good    0 0 2 * * *               Jul 13 02:00:00 +0200 CEST  false              false          false           Continue
backup_syncthing    Good    0 0 10 * * *              Jul 12 10:00:00 +0200 CEST  false              false          false           Continue

/ $ jobber log
TIME                  JOB                 SUCCEEDED  RESULT
Jul 12 09:00:00 2018  backup_confluence   false      Good
Jul 12 08:59:58 2018  backup_confluence   false      Good
Jul 12 08:00:00 2018  backup_jira         false      Good
Jul 12 08:00:00 2018  backup_confluence   false      Good
Jul 12 08:00:00 2018  backup_bitbucket    false      Good
Jul 12 07:59:59 2018  backup_bitbucket    false      Good
Jul 12 07:00:00 2018  backup_confluence   false      Good
Jul 12 06:00:00 2018  backup_jira         false      Good
Jul 12 06:00:00 2018  backup_confluence   false      Good
Jul 12 05:59:59 2018  backup_jira         false      Good
Jul 12 05:00:00 2018  backup_confluence   false      Good
Jul 12 04:59:59 2018  backup_confluence   false      Good
Jul 12 04:00:00 2018  backup_jira         false      Good
Jul 12 04:00:00 2018  backup_confluence   false      Good
Jul 12 03:00:00 2018  backup_confluence   false      Good
Jul 12 02:00:00 2018  backup_jira         false      Good
Jul 12 02:00:00 2018  backup_confluence   false      Good
Jul 12 02:00:00 2018  backup_letsencrypt  false      Good
Jul 12 01:00:00 2018  backup_confluence   false      Good
Jul 12 00:00:00 2018  backup_jira         false      Good
Jul 12 00:00:00 2018  backup_confluence   false      Good
Jul 11 23:00:00 2018  backup_confluence   false      Good
Jul 11 22:00:00 2018  backup_jira         false      Good
Jul 11 22:00:00 2018  backup_confluence   false      Good
Jul 11 21:00:00 2018  backup_confluence   false      Good
Jul 11 20:00:00 2018  backup_jira         false      Good
Jul 11 20:00:00 2018  backup_confluence   false      Good
Jul 11 19:00:00 2018  backup_confluence   false      Good
Jul 11 18:00:00 2018  backup_jira         false      Good
Jul 11 18:00:00 2018  backup_confluence   false      Good
Jul 11 17:00:00 2018  backup_confluence   false      Good
Jul 11 16:00:00 2018  backup_jira         false      Good
Jul 11 16:00:00 2018  backup_confluence   false      Good
Jul 11 15:00:00 2018  backup_confluence   false      Good
Jul 11 14:00:00 2018  backup_jira         false      Good
Jul 11 14:00:00 2018  backup_confluence   false      Good
Jul 11 13:00:00 2018  backup_confluence   false      Good
Jul 11 12:00:00 2018  backup_jira         false      Good
Jul 11 12:00:00 2018  backup_confluence   false      Good
Jul 11 11:00:00 2018  backup_confluence   false      Good
Jul 11 10:00:00 2018  backup_jira         false      Good
Jul 11 10:00:00 2018  backup_syncthing    false      Good
Jul 11 10:00:00 2018  backup_confluence   false      Good

dshearer · 2018-08-02T02:17:09Z

I have finally reproduced this! I'll have a fix this weekend.

dshearer · 2018-08-06T02:20:36Z

Actually, this is taking longer than expected to make the fix. I'll keep this issue updated.

ckotte · 2018-08-06T07:20:23Z

ok. thx. How could you reproduce this? Is it a general issue in the code or related to a specific setting?

dshearer · 2018-08-08T00:12:02Z

This seems to be limited to jobber running in Docker. I reproduced it by running jobber in Docker with a small job that runs every hour.

…

On Aug 6, 2018, at 12:20 AM, Christian Kotte ***@***.***> wrote: ok. thx. How could you reproduce this? Is it a general issue in the code or related to a specific setting? — You are receiving this because you were assigned. Reply to this email directly, view it on GitHub, or mute the thread.

dshearer · 2018-08-11T17:56:19Z

Ugh. I haven't been able to reproduce it again.

Sorry this is taking so long to fix. I can only work on this on the weekends.

dshearer · 2018-08-18T23:10:13Z

I don't know guys. Jobber's code is correct. This has something to do with Docker. I really don't know what to do at this point.

ckotte · 2018-08-18T23:40:21Z

Thank you very much for your time and effort. Could you re-produce it in Docker again or only the first time?

dshearer · 2018-08-19T00:39:54Z

Only the first time. It is very frustrating.

dshearer · 2018-08-19T19:50:47Z

Okay, I reproduced it again, with a job called "Test" that runs every minute:

Next job to run is Test, at Aug 19 19:28:00.
It is now Aug 19 19:27:00.
Sleeping for 59.9998044s.
It is now Aug 19 19:27:59, which is NOT before Aug 19 19:28:00
Running Test
jobberuser: echo Hi
Calling pop
Next job to run is Test, at Aug 19 19:28:00.
It is now Aug 19 19:27:59.
Sleeping for 999.8574ms.
It is now Aug 19 19:28:00, which is NOT before Aug 19 19:28:00
Running Test

Here's the implementation: At startup, Jobber computes the next run time for each job, and then puts each job in a priority queue that is sorted by runtime. When Pop is called on this queue, it finds the next job that should run, sleeps until it's time to run it, and then returns it. Here's the part of the code corresponding to the output above (variable now is of type time.Time, and initially contains the current time):

// get next-scheduled job
job := heap.Pop(&jq.q).(*jobfile.Job)

var timeFmt = "Jan _2 15:04:05"

common.Logger.Printf("Next job to run is %v, at %v.", job.Name,
	job.NextRunTime.Format(timeFmt))

// sleep till it's time to run it
for now.Before(*job.NextRunTime) {
	sleepDur := job.NextRunTime.Sub(now)

	common.Logger.Printf("It is now %v.", now.Format(timeFmt))
	common.Logger.Printf("Sleeping for %v.", sleepDur)

	afterChan := time.After(sleepDur)
	select {
	case now = <-afterChan:
	case <-ctx.Done():
		// abort!
		heap.Push(&jq.q, job)
		return nil
	}
}

common.Logger.Printf("It is now %v, which is NOT before %v",
	now.Format(timeFmt), job.NextRunTime.Format(timeFmt))

The log indicates that, at some point, now == Aug 19 19:27:59, *job.NextRunTime == Aug 19 19:28:00, and yet now.Before(*job.NextRunTime) returned false.

dshearer · 2018-08-19T20:50:15Z

I've managed to reproduce this in a small program. I'll probably open a bug with the Go team.

- A bug in Go's time package (golang/go#27090) was causing jobs to be executed multiple times. In this commit, we provide a workaround.

(#192) Bugfix in queue.go: Job execed multiple times

dshearer · 2018-09-17T00:53:34Z

Implemented workaround.

dshearer · 2018-09-17T02:45:44Z

Fix is in v1.3.4. Please let me know if it works.

dshearer self-assigned this Apr 27, 2018

dshearer added the bug label Apr 27, 2018

dshearer added this to the 1.3.3 milestone May 28, 2018

dshearer added a commit that referenced this issue May 28, 2018

queue: (#192) Bugfix

d486b43

- Queue was causing jobs to run twice within a few seconds of the time they were supposed to run once.

dshearer added a commit that referenced this issue Jun 10, 2018

Add debug logging to JobQueue::Pop for issue #192

b09bfd5

dshearer mentioned this issue Aug 20, 2018

time: wall and monotonic clocks get out of sync golang/go#27090

DC44 Open

dshearer modified the milestones: 1.3.3, 1.3.4 Aug 25, 2018

dshearer added a commit that referenced this issue Sep 10, 2018

(#192) Bugfix in queue.go: Job execed multiple times

c9fbe94

- A bug in Go's time package (golang/go#27090) was causing jobs to be executed multiple times. In this commit, we provide a workaround.

dshearer added a commit that referenced this issue Sep 10, 2018

Merge pull request #210 from dshearer/maint-1.3-issue-192

a6270d9

(#192) Bugfix in queue.go: Job execed multiple times

dshearer added a commit that referenced this issue Sep 10, 2018

Revert "(#192) Bugfix in queue.go: Job execed multiple times"

882bd0d

dshearer closed this as completed Sep 17, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Jobs executed twice after 24 hours #192

Jobs executed twice after 24 hours #192

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Jobs executed twice after 24 hours #192

Jobs executed twice after 24 hours #192

Comments

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!