Ticket #618 (closed maintenance: fixed)

Opened 3 years ago

Last modified 3 years ago

Migrate Penguin and Parrot to the ZFS fileserver

Reported by: chris Owned by: chris
Priority: trivial Milestone: Maintenance
Component: Dev server Keywords:
Cc: ed, aland Estimated Number of Hours: 2.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 1.03

Description

Since wiki:PuffinServer has been running from the ZFS fileserver (see ticket:593) it has been performing better, so we should also migrate wiki:PenguinServer and wiki:ParrotServer to the ZFS server before upgrading them to Debian Wheezy on ticket:535.

Attachments

parrot-2013-11-18_diskstats_latency-week.png (33.1 KB) - added by chris 3 years ago.
penguin-2013-11-18_diskstats_latency-week.png (28.4 KB) - added by chris 3 years ago.
parrot_2013-01-22_xvda1-year.png (68.1 KB) - added by chris 3 years ago.
Parrot Swap
parrot_2013-01-22_xvda2-year.png (41.4 KB) - added by chris 3 years ago.
Parrot Root
parrot_2013-01-22_xvda3-year.png (37.6 KB) - added by chris 3 years ago.
Parrot Home
penguin_2013-01-22_xvda1-year.png (62.8 KB) - added by chris 3 years ago.
Penguin Swap
penguin_2013-01-22_xvda2-year.png (37.2 KB) - added by chris 3 years ago.
Penguin Root
parrot_2014-03-26_diskstats_latency-day.png (40.4 KB) - added by chris 3 years ago.
parrot_2014-03-26_diskstats_utilization-day.png (47.6 KB) - added by chris 3 years ago.
parrot_2014-03-26_iostat_ios-day.png (74.3 KB) - added by chris 3 years ago.
parrot_2014-03-26_load-day.png (22.4 KB) - added by chris 3 years ago.
penguin_2014-03-26_diskstats_latency-day.png (42.0 KB) - added by chris 3 years ago.
penguin_2014-03-26_diskstats_utilization-day.png (44.0 KB) - added by chris 3 years ago.
penguin_2014-03-26_load-day.png (31.0 KB) - added by chris 3 years ago.
penguin_2014-03-26_load-day.2.png (31.0 KB) - added by chris 3 years ago.
puffin_2014-03-26_diskstats_latency-day.png (34.3 KB) - added by chris 3 years ago.
puffin_2014-03-26_diskstats_utilization-day.png (37.9 KB) - added by chris 3 years ago.
puffin_2014-03-26_iostat_ios-day.png (59.8 KB) - added by chris 3 years ago.
puffin_2014-03-26_load-day.png (30.7 KB) - added by chris 3 years ago.

Change History

Changed 3 years ago by chris

Changed 3 years ago by chris

comment:1 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 0.0 to 0.25

Alan migrated wiki:PenguinServer and wiki:ParrotServer to the ZFS server last night on ticket:535#comment:22.

I have spent some time today looking at the Munin stats, and it looks to me as though, for these two servers, the ZFS network file server is somewhat slower than the directly attached disks the servers were using.

wiki:ParrotServer disk latency:


wiki:PenguinServer disk latency:


It will be worth keeping an eye on these stats over the next week or so.
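To keep an eye on latency from inside a guest between Munin runs, the average wait per completed I/O can be computed from two samples of /proc/diskstats, the same counters iostat uses for its await figure. This is a minimal sketch assuming Linux guests and the field layout documented in the kernel's iostats documentation; the function names are hypothetical, and Munin's diskstats latency graphs may be calculated slightly differently.

```shell
# Hypothetical helpers for sampling average I/O latency on a Linux guest.
# /proc/diskstats fields used (1-based): $3 device name, $4 reads completed,
# $7 ms spent reading, $8 writes completed, $11 ms spent writing.

# avg_await_ms LINE1 LINE2: average ms per completed I/O between two
# /proc/diskstats lines for the same device (prints 0.0 if no I/O completed).
avg_await_ms() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { r = $4; rt = $7; w = $8; wt = $11 }
        NR == 2 {
            ios = ($4 - r) + ($8 - w)
            ms  = ($7 - rt) + ($11 - wt)
            if (ios > 0) printf "%.1f\n", ms / ios
            else print "0.0"
        }'
}

# sample_await DEVICE [SECONDS]: take two samples SECONDS apart (default 5)
# and print the average latency over that interval.
sample_await() {
    dev=$1
    secs=${2:-5}
    first=$(awk -v d="$dev" '$3 == d' /proc/diskstats)
    sleep "$secs"
    second=$(awk -v d="$dev" '$3 == d' /proc/diskstats)
    avg_await_ms "$first" "$second"
}
```

For example, `sample_await xvda2 10` would report the average latency of the root partition on these guests over ten seconds.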

comment:2 Changed 3 years ago by ed

so - the move has slowed things down?
If the move has slowed things down, we measure, then move back?

comment:3 Changed 3 years ago by sam

Hi, I'm quite keen to close any tickets that we can, just to get a better idea of what is still a live issue. Can I close this one? Any concerns?

Thanks

Sam

Changed 3 years ago by chris

Parrot Swap

Changed 3 years ago by chris

Parrot Root

Changed 3 years ago by chris

Parrot Home

Changed 3 years ago by chris

Penguin Swap

Changed 3 years ago by chris

Penguin Root

comment:4 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.3
  • Total Hours changed from 0.25 to 0.55

We (Webarchitects) have been somewhat concerned about the latency of the ZFS file system, so last night we moved the swap partitions for all the virtual servers from the networked ZFS file system to local disks, in the hope that this would improve performance.

On wiki:ParrotServer we have:

  • /dev/xvda1 swap
  • /dev/xvda2 /
  • /dev/xvda3 /home

Parrot Swap

Parrot Root

Parrot Home

On wiki:PenguinServer we have:

  • /dev/xvda1 swap
  • /dev/xvda2 /

Penguin Swap

Penguin Root
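Whether moving swap off the networked storage helps depends on how much a guest actually swaps. A minimal sketch, assuming Linux guests: the pswpin and pswpout counters in /proc/vmstat count pages swapped in and out since boot, so two snapshots taken a known interval apart give a rate. The function names are hypothetical.

```shell
# vmstat_value SNAPSHOT KEY: print the value of KEY from a /proc/vmstat snapshot.
vmstat_value() {
    printf '%s\n' "$1" | awk -v k="$2" '$1 == k { print $2 }'
}

# swap_rate BEFORE AFTER SECONDS: print "pages-in/s pages-out/s"
# (integer division) between two /proc/vmstat snapshots.
swap_rate() {
    in0=$(vmstat_value "$1" pswpin);   in1=$(vmstat_value "$2" pswpin)
    out0=$(vmstat_value "$1" pswpout); out1=$(vmstat_value "$2" pswpout)
    echo "$(( (in1 - in0) / $3 )) $(( (out1 - out0) / $3 ))"
}

# Usage on a guest, sampling over 10 seconds:
#   swap_rate "$(cat /proc/vmstat)" "$(sleep 10; cat /proc/vmstat)" 10
```

Rates that stay near zero would suggest swap traffic is not a large part of the latency seen on a given guest.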

Tomorrow we (Webarchitects) intend to replace a drive in the ZFS server that has been generating SMART errors, in the hope that this disk is the cause of the latency we have been seeing.
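Before swapping a drive, its SMART attributes can be read with `smartctl -A` from smartmontools; non-zero Reallocated_Sector_Ct or Current_Pending_Sector counts are common signs of a failing disk. A minimal sketch: the helper below pulls one attribute's raw value out of that table (attribute name in column 2, raw value in column 10 of smartctl's usual layout). The helper name, the FreeBSD-style device names ada1 and ada4, and the assumption that the disk belongs to the zroot pool are all hypothetical.

```shell
# smart_raw ATTRIBUTE: read `smartctl -A` output on stdin and print the
# raw value (column 10) of the named attribute.
smart_raw() {
    awk -v attr="$1" '$2 == attr { print $10 }'
}

# Hypothetical usage on the ZFS server (device and pool names assumed):
#   smartctl -A /dev/ada1 | smart_raw Reallocated_Sector_Ct
#   smartctl -A /dev/ada1 | smart_raw Current_Pending_Sector
# Once the new disk (ada4 here) is installed, resilver onto it and watch progress:
#   zpool replace zroot ada1 ada4
#   zpool status zroot
```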

I'm afraid we probably need to keep this ticket open for now.

Changed 3 years ago by chris


comment:5 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.48
  • Priority changed from major to trivial
  • Total Hours changed from 0.55 to 1.03

We have made a massive breakthrough on this issue. Look at the following graphs from today for wiki:ParrotServer:





wiki:PenguinServer:




and wiki:PuffinServer:





This very dramatic change is the result of effectively disabling the ZIL (by turning off synchronous writes) on the NFS/ZFS server's zroot pool with:

zfs set sync=disabled zroot

See "SOLVED: Performance Issues With FreeBSD ZFS Backed ESXi Storage Over NFS" for more details.
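The ZFS sync property accepts only standard, always or disabled (note the final "d"). The trade-off is worth recording: with sync=disabled the server acknowledges writes that NFS clients asked to be synchronous before they reach stable storage, so a crash or power cut on the ZFS server can lose the most recent writes. A minimal sketch of a hypothetical wrapper that checks the value before calling zfs, with checking and reverting shown as comments:

```shell
# valid_sync_value VALUE: succeed only for values the ZFS sync property accepts.
valid_sync_value() {
    case "$1" in
        standard|always|disabled) return 0 ;;
        *) return 1 ;;
    esac
}

# zfs_set_sync DATASET VALUE: hypothetical wrapper around `zfs set sync=...`
# that rejects invalid values before zfs is called.
zfs_set_sync() {
    if ! valid_sync_value "$2"; then
        echo "invalid sync value '$2' (use standard, always or disabled)" >&2
        return 1
    fi
    zfs set "sync=$2" "$1"
}

# Usage on the ZFS server:
#   zfs get sync zroot              # show the current value and its source
#   zfs_set_sync zroot disabled     # the change described in this comment
#   zfs_set_sync zroot standard     # revert to the default behaviour
```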

So there is finally an answer to Ed's question from some months back:

Replying to ed:

so - the move has slowed things down?
If the move has slowed things down, we measure, then move back?

The NFS/ZFS server is now probably faster than the directly attached disks were. It might be worth revisiting this ticket in a few weeks to look at the annual Munin graphs and see how things look before closing it.

The time I have recorded with this comment is simply the time taken to upload the images and post the comment. It doesn't include any of the time spent on this issue, which included receiving (and reading and deleting) an email every 5 minutes for each server whenever disk IO latency was over 1 second (countless emails over the last six months or so).

Last edited 3 years ago by chris

comment:6 Changed 3 years ago by chris

  • Status changed from new to closed
  • Resolution set to fixed

This is now sorted; I have also added a note on ticket:593#comment:12 for future reference.
