Ticket #618 (closed maintenance: fixed)
Migrate Penguin and Parrot to the ZFS fileserver
Reported by: | chris | Owned by: | chris |
---|---|---|---|
Priority: | trivial | Milestone: | Maintenance |
Component: | Dev server | Keywords: | |
Cc: | ed, aland | Estimated Number of Hours: | 2.0 |
Add Hours to Ticket: | 0 | Billable?: | yes |
Total Hours: | 1.03 |
Description
Since wiki:PuffinServer has been running from the ZFS fileserver, see ticket:593, it has been performing better -- we should also migrate wiki:PenguinServer and wiki:ParrotServer to the ZFS server prior to upgrading them to Debian Wheezy on ticket:535.
Attachments
Change History
comment:1 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 0.0 to 0.25
Alan migrated wiki:PenguinServer and wiki:ParrotServer to the ZFS server last night on ticket:535#comment:22.
I have spent some time today looking at the Munin stats, and it looks to me as though, for these two servers, the ZFS network file server is somewhat slower than the directly attached disks that the servers were using.
wiki:ParrotServer disk latency:
wiki:PenguinServer disk latency:
It will be worth keeping an eye on these stats over the next week or so.
comment:2 Changed 3 years ago by ed
so - the move has slowed things down?
If the move has slowed things down, we measure, then move back?
comment:3 Changed 3 years ago by sam
Hi, I'm quite keen to close any tickets that we can, just to get a better idea of what is a live issue. Can I close this one? Any concerns?
Thanks
Sam
comment:4 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.3
- Total Hours changed from 0.25 to 0.55
We (Webarchitects) have been somewhat concerned about the latency of the ZFS file system. Last night we moved the swap partitions for all the virtual servers to local disks, rather than the networked ZFS file system, in the hope that this would improve performance.
On wiki:ParrotServer we have:
- /dev/xvda1 swap
- /dev/xvda2 /
- /dev/xvda3 /home
On wiki:PenguinServer we have:
- /dev/xvda1 swap
- /dev/xvda2 /
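A quick way to check from inside a guest which device swap is active on is to read `/proc/swaps`. This is a minimal sketch, assuming a Linux guest; the helper name `swap_devices` is hypothetical. It only reports the device as the guest sees it (for example `/dev/xvda1`), not whether the backing storage on the host is local or networked.

```shell
# Hypothetical helper: print the device column of /proc/swaps-style input,
# skipping the header line.
swap_devices() {
    awk 'NR > 1 { print $1 }'
}

# Usage on a guest (commented out so the block has no side effects):
# swap_devices < /proc/swaps
```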
Tomorrow we (Webarchitects) intend to replace a drive in the ZFS server that has been generating some SMART errors, in the hope that this disk is the cause of the latency we have been seeing.
I'm afraid I think we probably need to keep this ticket open for now.
comment:5 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.48
- Priority changed from major to trivial
- Total Hours changed from 0.55 to 1.03
We have made a massive breakthrough on this issue. Look at the following graphs from today for wiki:ParrotServer:
and wiki:PuffinServer:
This very dramatic change is a result of disabling the ZIL by running this on the NFS/ZFS zvol:
zfs set sync=disabled zroot
See SOLVED: Performance Issues With FreeBSD ZFS Backed ESXi Storage Over NFS for more details.
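For reference, the ZFS `sync` property accepts the values `standard`, `always` and `disabled`. The following is a minimal sketch of a wrapper that validates the value and shows the current setting before changing it. The dataset name `zroot` comes from this ticket; the function names and the `DATASET` and `DRY_RUN` variables are hypothetical and only for illustration.

```shell
# Dataset to operate on; "zroot" is the name used in this ticket.
DATASET="${DATASET:-zroot}"

# Succeeds only for values the ZFS "sync" property accepts.
is_valid_sync() {
    case "$1" in
        standard|always|disabled) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: validate the value, then print the current
# setting and apply the new one. With DRY_RUN set, only print the command.
set_sync() {
    mode="$1"
    if ! is_valid_sync "$mode"; then
        echo "invalid sync value: $mode" >&2
        return 1
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "zfs set sync=$mode $DATASET"
        return 0
    fi
    zfs get -H -o value sync "$DATASET" && zfs set "sync=$mode" "$DATASET"
}
```

Running `DRY_RUN=1 set_sync disabled` prints the command without touching the pool, and `set_sync standard` would put the property back to its default value.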
So there is finally an answer to Ed's question from some months back:
Replying to ed:
so - the move has slowed things down?
If the move has slowed things down, we measure, then move back?
The NFS/ZFS server is now probably faster than the directly attached disks. It might be worth revisiting this ticket in a few weeks to look at the annual Munin graphs and see how things are looking before closing it.
The time I have recorded with this comment is simply the time taken to upload the images and post the comment. It doesn't include any of the time spent on this issue, which included getting (and reading and deleting) an email every 5 minutes for each server whenever the disk IO was over 1 second (countless emails over the last six months or so).
comment:6 Changed 3 years ago by chris
- Status changed from new to closed
- Resolution set to fixed
This is now sorted. I have also added a note on ticket:593#comment:12 for future reference.