Ticket #599 (closed maintenance: fixed)
Server time drift
Reported by: | chris | Owned by: | chris |
---|---|---|---|
Priority: | critical | Milestone: | Maintenance |
Component: | Live server | Keywords: | |
Cc: | jim, ed, aland | Estimated Number of Hours: | 0.0 |
Add Hours to Ticket: | 0 | Billable?: | yes |
Total Hours: | 1.54 | | |
Description
The servers are not keeping good time at the moment.
Change History
comment:1 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 0.0 to 0.25
comment:2 follow-up: ↓ 3 Changed 3 years ago by jim
- Add Hours to Ticket changed from 0.0 to 0.2
- Total Hours changed from 0.25 to 0.45
Puffin: This doesn't seem to be working... This has consequences for Drupal, so it needs fixing.
tn@puffin:~/static/transition-network-d6-p005$ date
Tue Oct 8 00:38:45 BST 2013
It's actually 18:58 on Oct 7!
rdate -s hangs, so this must have been hit by the BOA update. I've edited /etc/csf/csf.conf to allow port 37 and set _CUSTOM_CSF to 'YES' in /root/.barracuda.cnf to stop future BOA updates clobbering it.
I've added the crontab entry again too. Ideally we'd use a non-root user with their own cron, or another place that isn't controlled by BOA.
puffin:/data/conf# date ; rdate -s ntp.demon.co.uk ; date
Tue Oct 8 00:44:26 BST 2013
Mon Oct 7 19:02:36 BST 2013
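The two date outputs above bracket the rdate correction, so the size of the drift can be worked out directly. A minimal sketch, assuming GNU coreutils date (for -d); clock_offset is a hypothetical helper, not something installed on the servers. Both timestamps are BST, so parsing them as UTC leaves the difference between them unchanged.

```shell
#!/bin/sh
# Hypothetical helper: print the difference in seconds between two
# timestamps (first minus second). Assumes GNU date, which accepts -d.
# Both arguments must be in the same time zone; parsing them with -u
# (as UTC) does not change the difference between them.
clock_offset() {
    before=$(date -u -d "$1" +%s) || return 1
    after=$(date -u -d "$2" +%s) || return 1
    echo $((before - after))
}

# Server clock before rdate, and the time rdate set it to (from the output above)
offset=$(clock_offset "2013-10-08 00:44:26" "2013-10-07 19:02:36")
echo "clock was ${offset}s fast"

# Break the offset down into hours, minutes and seconds
printf '%dh %dm %ds\n' $((offset / 3600)) $((offset % 3600 / 60)) $((offset % 60))
```

For these two readings it reports 20510 seconds, i.e. the clock was about 5h 41m 50s fast, give or take however long rdate itself took to run.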
comment:3 in reply to: ↑ 2 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.1
- Total Hours changed from 0.45 to 0.55
Replying to jim:
> Ideally we'd use a non-root user with their own cron
Thanks for spotting this, and good point. I have commented out the root cron job and set it to run as me instead.
comment:4 Changed 3 years ago by jim
Per my comment over on #555, I suspect the regular date alteration by the cron task is causing the load spikes...
I'd like to reduce the cron frequency to every 4 hours to see what effect that has.
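Rescheduling the existing entry, rather than adding a second one alongside it, could be sketched as a filter over the crontab. This is a minimal sketch: cron_reschedule is a hypothetical helper, and the rdate path /usr/sbin/rdate is an assumption (the server name is the one used in comment 2). Any line mentioning the command, including a commented-out one, is replaced.

```shell
#!/bin/sh
# Hypothetical helper: read a crontab on stdin, drop every line that
# mentions CMD (commented-out ones included), then append a single entry
# that runs CMD on SCHEDULE. Writes the result to stdout.
cron_reschedule() {
    schedule=$1
    cmd=$2
    # -v: keep non-matching lines, -F: treat CMD as a fixed string.
    # grep exits 1 when it outputs nothing, which is fine here.
    grep -vF -- "$cmd" || true
    printf '%s %s\n' "$schedule" "$cmd"
}

# Demo on an illustrative crontab (not the real one from puffin):
# an hourly rdate entry becomes a 4-hourly one.
printf '%s\n' 'MAILTO=root' '0 * * * * /usr/sbin/rdate -s ntp.demon.co.uk' |
    cron_reschedule '0 */4 * * *' '/usr/sbin/rdate -s ntp.demon.co.uk'

# Applying it to the current user's crontab would look like:
#   crontab -l 2>/dev/null |
#       cron_reschedule '0 */4 * * *' '/usr/sbin/rdate -s ntp.demon.co.uk' |
#       crontab -
```

Setting the system clock needs root privileges, so under a non-root user's crontab the command would presumably have to go through something like sudo; the ticket doesn't say how that was handled.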
I await Chris's thoughts on my brain-fart over on #555.
comment:5 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.1
- Total Hours changed from 0.55 to 0.65
I don't think the time drift has anything to do with the load spikes. The time drift only started when the motherboard was swapped, whereas the load spikes predate this by some months.
Alan is going to try to solve the time drift issue during tonight's downtime, see https://lists.webarch.co.uk/pipermail/webarch-xen1/2013-October/000005.html
comment:6 follow-up: ↓ 7 Changed 3 years ago by jim
- Add Hours to Ticket changed from 0.0 to 0.1
- Total Hours changed from 0.65 to 0.75
There are two types of load spikes:
- Those from 'before', which were high and irregular, ultimately caused by hardware issues -- fixed by the motherboard change.
- The recent ones, which are much lower in intensity and regularly spaced around the hour (or thereabouts) -- these coincide with the crontab date sync being enabled as far as I can tell (see #555).
I note that the server has been rebooted (which I think was the clock fix) and the load spikes have now largely stopped.
Two questions for Chris:
- Has the clock issue been resolved?
- Has the crontab entry for the date sync been removed?
comment:7 in reply to: ↑ 6 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 0.75 to 1.0
Replying to jim:
> There are two types of load spikes:
> - Those from 'before', which were high and irregular, ultimately caused by hardware issues -- fixed by the motherboard change.
Changing the motherboard stopped the crashes and created the time drift problem, but I'm not sure that anything else can be put down to it.
> - The recent ones, which are much lower in intensity and regularly spaced around the hour (or thereabouts) -- these coincide with the crontab date sync being enabled as far as I can tell (see #555).
> I note that the server has been rebooted (which I think was the clock fix) and the load spikes have now largely stopped.
The reboot was to try to fix the clock, but it didn't fix it.
> Two questions for Chris:
> - Has the clock issue been resolved?
No.
> - Has the crontab entry for the date sync been removed?
No, it's running as me now.
comment:9 Changed 3 years ago by aland
- Add Hours to Ticket changed from 0.0 to 0.17
- Total Hours changed from 1.0 to 1.17
Installed chrony (a replacement for ntp)
commented out the crontab entry that ran rdate
aptitude install chrony
watched the clock with date for a few minutes
clocks now in sync
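One way to check the sync non-interactively, instead of watching date, is to read chrony's own view of the offset. A minimal sketch: chrony_offset is a hypothetical helper, and it assumes the "System time : N seconds fast|slow of NTP time" line that chronyc tracking prints; the sample input below is illustrative, not output from these servers.

```shell
#!/bin/sh
# Hypothetical helper: read `chronyc tracking` output on stdin and print the
# system clock's offset from NTP time in seconds (negative means slow).
# Exits non-zero if no "System time" line is found.
chrony_offset() {
    awk -F': *' '
        /^System time/ {
            split($2, f, " ")   # f[1] = seconds, f[3] = "fast" or "slow"
            printf "%.9f\n", (f[3] == "slow" ? -f[1] : f[1])
            found = 1
        }
        END { exit !found }'
}

# Illustrative input in the assumed format
printf '%s\n' 'Stratum         : 3' \
    'System time     : 0.000001501 seconds slow of NTP time' | chrony_offset

# On a server with chrony running:
#   chronyc tracking | chrony_offset
```

chronyc sources lists the time sources chrony is actually using, which is another quick sanity check on each server.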
comment:10 Changed 3 years ago by aland
Logged into parrot
installed chrony
clocks in sync now
comment:11 Changed 3 years ago by aland
- Add Hours to Ticket changed from 0.0 to 0.17
- Total Hours changed from 1.17 to 1.34
logged into penguin
installed chrony
removed cronjob
clock is running accurately now
comment:12 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.1
- Total Hours changed from 1.34 to 1.44
These two munin graphs changed quite dramatically when chrony was installed:
I don't know exactly what it means, though.
comment:13 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.1
- Status changed from new to closed
- Resolution set to fixed
- Total Hours changed from 1.44 to 1.54
As far as I'm aware the servers haven't had an issue with timekeeping since chrony was installed, so I'm closing this ticket.
comment:14 Changed 3 years ago by chris
There is a crontab entry to reset the clock after a reboot; see wiki:PuffinServer#Cron.
rdate doesn't run on wiki:PuffinServer because of the firewall, so /etc/csf/csf.conf was edited, port 37 was added to TCP_IN and TCP_OUT, and the firewall was restarted.
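To confirm that the change has survived (for example after a later BOA update), the port lists can be checked from the shell. A minimal sketch: csf_port_open is a hypothetical helper, and it assumes the KEY = "port,port,..." layout csf.conf uses for TCP_IN and TCP_OUT; it only matches single ports, not ranges.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PORT is listed in KEY (e.g. TCP_IN or
# TCP_OUT) in a csf.conf-style file with lines such as:
#   TCP_OUT = "20,21,22,25,53,80,443"
# Port ranges (e.g. 30000:35000) are not expanded, so they never match.
csf_port_open() {
    conf=$1
    key=$2
    port=$3
    # Extract the quoted list for KEY, split it on commas, look for PORT
    sed -n "s/^$key *= *\"\([^\"]*\)\".*/\1/p" "$conf" |
        tr ',' '\n' |
        grep -qx -- "$port"
}

# Usage on puffin (37 is the RFC 868 Time Protocol port rdate uses by default):
#   csf_port_open /etc/csf/csf.conf TCP_OUT 37 && echo "port 37 allowed out"
```

If the port turns out to be missing, the firewall still has to be restarted after editing the file, as was done here.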
Checking and then setting the date and checking it again:
The following was added to the root crontab:
On wiki:PenguinServer:
A crontab was also added.
On wiki:ParrotServer:
A crontab was also added.
This is just a temporary workaround: we need to solve the clock drift, but I'm not sure what the answer is yet.