Ticket #531 (closed maintenance: fixed)

Opened 4 years ago

Last modified 2 years ago

Disk usage on puffin

Reported by: chris Owned by: chris
Priority: critical Milestone: Maintenance
Component: Live server Keywords:
Cc: ed Estimated Number of Hours: 1.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 1.5

Description

The disk usage on puffin is currently at 85% and has been rising by around 5% a week; see:

https://penguin.transitionnetwork.org/munin/transitionnetwork.org/puffin.transitionnetwork.org/df.html

This will become a critical issue in a couple of weeks, so it would be good to find and address the cause before then.

Change History

comment:1 Changed 4 years ago by ed

  • Priority changed from major to critical

raising to critical - as it's critical

comment:2 Changed 4 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.0
  • Status changed from new to closed
  • Resolution set to fixed
  • Total Hours changed from 0.0 to 1.0

The cause is 65GB of metche backups in /var/lib/metche. This was raised as a bug in 2006:

And it's still unresolved:

There are 66K backups:

cd /var/lib/metche
ls | wc -l
66405

60K of them were last accessed more than 10 days ago:

find /var/lib/metche -atime +10 -type f | wc -l
60157

So I deleted those:

find /var/lib/metche -atime +10 -type f -delete
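A safer variant of the step above might list the candidates before deleting them. This is only a sketch: the function names are hypothetical, and it selects on -mtime (last modification) rather than the -atime (last access) test used above, because access times may not be updated reliably on filesystems mounted with noatime or relatime. It assumes GNU find.

```shell
#!/bin/sh
# Hypothetical helpers, not part of metche.

# Print regular files under DIR that were last modified more than DAYS days ago.
list_old_files() {
    dir=$1
    days=$2
    find "$dir" -type f -mtime +"$days" -print
}

# Delete the same set of files; review the output of list_old_files first.
delete_old_files() {
    dir=$1
    days=$2
    find "$dir" -type f -mtime +"$days" -delete
}
```

For example, `list_old_files /var/lib/metche 10 | wc -l` gives the count to be removed, and `delete_old_files /var/lib/metche 10` then removes them.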

That left 6GB, which is still a lot, so I created a cron job that deletes files older than a day and set it to run once a day. I added it to my own crontab, because the root crontab is overwritten by BOA upgrades.

After doing all this I realised that some symlinks pointed to files I had just deleted, so I wrote a script, wiki:MetcheCleanScript, that deletes only the things we don't want. It is now running via cron on wiki:PuffinServer.
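The symlink problem can be illustrated with a rough sketch. This is not the wiki:MetcheCleanScript itself, whose contents are not shown in this ticket; the function names are hypothetical, and it assumes GNU find and readlink and file names without newlines. It lists dangling symlinks and deletes old files only when no symlink in the same tree points at them.

```shell
#!/bin/sh
# Hypothetical helpers; NOT the actual metche-clean script.

# Print symlinks under DIR whose targets no longer exist (GNU find idiom).
list_dangling_symlinks() {
    find "$1" -xtype l -print
}

# Delete regular files older than DAYS days, except those that a symlink
# somewhere under DIR resolves to.
clean_keeping_link_targets() {
    dir=$1
    days=$2
    keep=$(mktemp)
    # Resolve every symlink to its canonical target; these paths are kept.
    find "$dir" -type l -exec readlink -f {} \; | sort -u > "$keep"
    find "$dir" -type f -mtime +"$days" -print | while IFS= read -r f; do
        target=$(readlink -f "$f")
        # Remove the file only if its canonical path is not in the keep list.
        if ! grep -Fxq -- "$target" "$keep"; then
            rm -f -- "$f"
        fi
    done
    rm -f "$keep"
}
```

Running `list_dangling_symlinks /var/lib/metche` after a bulk delete would show any links left pointing at removed files.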

comment:3 Changed 4 years ago by jim

  • Status changed from closed to reopened
  • Resolution fixed deleted

It might be the Aegir auto site backups in /data/disk/arch.
On 10 Apr 2013 14:12, "Transiton Technology Trac" <
trac@tech.transitionnetwork.org> wrote:

> #531: Disk usage on puffin

comment:4 Changed 4 years ago by chris

  • Status changed from reopened to closed
  • Resolution set to fixed

comment:5 Changed 4 years ago by chris

  • Milestone set to Maintenance

comment:6 Changed 2 years ago by chris

  • Cc jim removed
  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 1.0 to 1.5

This same issue arose on wiki:PenguinServer two days ago, when I started getting these emails from Munin:

From: munin@penguin.webarch.net
Date: Tue, 01 Jul 2014 19:50:13 +0100
Subject: penguin.transitionnetwork.org Munin Alert

transitionnetwork.org :: penguin.transitionnetwork.org :: Disk usage in percent
        WARNINGs: / is 92.07 (outside range [:92]), / is 92.07 (outside range [:92]).
        OKs: /run/shm is 0.00, /run is 0.04, /dev is 0.00, /run/lock is 0.00.
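The same threshold can be checked by hand from a shell. The sketch below uses hypothetical function names and is not how Munin does it; it reads the whole-number percentage from `df -P`, so it will not show the decimal places that appear in the alert.

```shell
#!/bin/sh
# Hypothetical helpers for a manual check; not part of Munin.

# Print the used percentage (integer, no % sign) of the filesystem holding PATH.
usage_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage of PATH's filesystem is above LIMIT percent.
over_limit() {
    [ "$(usage_percent "$1")" -gt "$2" ]
}
```

For example, `over_limit / 92 && echo "root filesystem above 92%"` mirrors the warning range in the alert above.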

After some searching for the cause of the disk usage I found it was the backups in /var/lib/metche, so these were deleted. There is now lots of free space:

df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs           40G   19G   19G  51% /
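For reference, the search for the cause can be done along these lines. This is a sketch with a hypothetical function name, assuming GNU du; it is not necessarily the exact commands that were used here.

```shell
#!/bin/sh
# Hypothetical helper for tracking down what is filling a disk.

# Print the N largest entries directly under DIR (sizes in KiB, largest first).
# The first line is the total for DIR itself; -x keeps du on one filesystem.
largest_dirs() {
    dir=$1
    n=${2:-10}
    du -xk --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$n"
}
```

Starting with `largest_dirs / 10` and repeating on the biggest directory each time (for example `largest_dirs /var/lib`) narrows the usage down to a single location.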

I have now added the wiki:MetcheCleanScript to wiki:PenguinServer and set it to run via cron:

21      11      *       *       *       /usr/local/bin/metche-clean -d 

This is documented on the wiki at wiki:PenguinServer#Metche

The time recorded reflects the 15 mins spent on 1st July, which wasn't documented at the time, and the 15 mins I have just spent installing the script, crontab and documenting what has been done.
