Ticket #531 (closed maintenance: fixed)

Opened 4 years ago

Last modified 2 years ago

Disk usage on puffin

Reported by:	chris	Owned by:	chris
Priority:	critical	Milestone:	Maintenance
Component:	Live server	Keywords:
Cc:	ed	Estimated Number of Hours:	1.0
Add Hours to Ticket:	0	Billable?:	yes
Total Hours:	1.5

Description

The disk usage on puffin is currently at 85% and has been rising by around 5% a week, see:

https://penguin.transitionnetwork.org/munin/transitionnetwork.org/puffin.transitionnetwork.org/df.html

This will become a critical issue in a couple of weeks, so it would be good to find and address the cause before then.
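
As a rough check on that timescale, the figures above (85% used, growing by about 5% a week) can be projected forward. This is a minimal sketch that assumes growth stays linear; `weeks_until` is a hypothetical helper name and the 95% threshold is only an illustrative choice:

```shell
#!/bin/sh
# weeks_until CURRENT_PCT GROWTH_PCT_PER_WEEK LIMIT_PCT
# Prints the number of whole weeks (rounded up) until usage reaches
# LIMIT_PCT, assuming linear growth.
weeks_until() {
    current=$1 growth=$2 limit=$3
    echo $(( (limit - current + growth - 1) / growth ))
}

weeks_until 85 5 95    # weeks until 95% used
weeks_until 85 5 100   # weeks until the disk is full
```

With these numbers the disk would reach 95% in about 2 weeks and fill completely in about 3.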

Change History

comment:1 Changed 4 years ago by ed

Priority changed from major to critical

Raising to critical, as it is critical.

comment:2 Changed 4 years ago by chris

Add Hours to Ticket changed from 0.0 to 1.0
Status changed from new to closed
Resolution set to fixed
Total Hours changed from 0.0 to 1.0

The cause is 65GB of metche backups in /var/lib/metche. This was raised as a Debian bug in 2006:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=360371

And it's still unresolved:

https://labs.riseup.net/code/issues/2977

There are 66K backups:

cd /var/lib/metche
ls | wc -l
66405

60K of them were last accessed more than 10 days ago:

find /var/lib/metche -atime +10 -type f | wc -l
60157

So I deleted those:

find /var/lib/metche -atime +10 -type f -delete
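
Since `-atime` tests the last access time, a variant using `-mtime` (last modification time) may track the age of a backup more closely. It is also safer to list what would match before deleting anything. The following is a sketch only, assuming GNU find; `old_files` and `delete_old_files` are hypothetical helper names and are not part of metche:

```shell
#!/bin/sh
# List regular files under DIR last modified more than DAYS days ago.
old_files() {
    find "$1" -type f -mtime +"$2" -print
}

# Delete the same set of files. -delete is a GNU/BSD find extension.
delete_old_files() {
    find "$1" -type f -mtime +"$2" -delete
}

# Example (dry run first, then delete):
#   old_files /var/lib/metche 10 | wc -l
#   delete_old_files /var/lib/metche 10
```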

That left 6GB, which is still a lot, so I created a cron job that deletes files older than a day and set it to run once a day. I added it to my own crontab, as the root crontab is overwritten by BOA upgrades.

After doing all this I realised that there were some symlinks pointing to files which I had just deleted, so I wrote a script, wiki:MetcheCleanScript, that only deletes the things we don't want. It is now running via cron on wiki:PuffinServer.
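
Dangling symlinks of the kind left behind by a bulk delete can be found with GNU find's `-xtype l` test, which matches symlinks whose target no longer exists. This is only a sketch of that idea and is not the contents of wiki:MetcheCleanScript; `dangling_links` is a hypothetical helper name:

```shell
#!/bin/sh
# Print symlinks under DIR whose targets are missing.
# -xtype l is specific to GNU find.
dangling_links() {
    find "$1" -xtype l -print
}

# Example: review the list, then remove the broken links.
#   dangling_links /var/lib/metche
#   find /var/lib/metche -xtype l -delete
```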

comment:3 Changed 4 years ago by jim

Status changed from closed to reopened
Resolution fixed deleted

It might be the Aegir auto site backups in /data/disk/arch.
On 10 Apr 2013 14:12, "Transiton Technology Trac" <
trac@tech.transitionnetwork.org> wrote:


comment:4 Changed 4 years ago by chris

Status changed from reopened to closed
Resolution set to fixed

comment:5 Changed 4 years ago by chris

Milestone set to Maintenance

comment:6 Changed 2 years ago by chris

Cc jim removed
Add Hours to Ticket changed from 0.0 to 0.5
Total Hours changed from 1.0 to 1.5

This same issue arose on wiki:PenguinServer two days ago. I started getting these emails from Munin:

From: munin@penguin.webarch.net
Date: Tue, 01 Jul 2014 19:50:13 +0100
Subject: penguin.transitionnetwork.org Munin Alert

transitionnetwork.org :: penguin.transitionnetwork.org :: Disk usage in percent
        WARNINGs: / is 92.07 (outside range [:92]), / is 92.07 (outside range [:92]).
        OKs: /run/shm is 0.00, /run is 0.04, /dev is 0.00, /run/lock is 0.00.

After some searching for the cause of the disk usage I found it was the backups in /var/lib/metche, so these were deleted. There is now lots of free space:

df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs           40G   19G   19G  51% /
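
For reference, one way to track down what is using the space is to summarise usage per directory and then descend into the largest one. This is a sketch assuming GNU du and sort; `biggest_dirs` is a hypothetical helper name, and it is not necessarily how the search was done here:

```shell
#!/bin/sh
# Print DIR and its N-1 largest immediate subdirectories (sizes in KB),
# largest first, staying on one filesystem (-x).
# The first line of output is normally the total for DIR itself.
biggest_dirs() {
    du -xk --max-depth=1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Example:
#   biggest_dirs / 5
#   biggest_dirs /var/lib 5
```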

I have now added the wiki:MetcheCleanScript to wiki:PenguinServer and set it to run via cron:

21      11      *       *       *       /usr/local/bin/metche-clean -d

I have also documented this on the wiki at wiki:PenguinServer#Metche.

The time recorded reflects the 15 mins spent on 1st July, which wasn't documented at the time, and the 15 mins I have just spent installing the script, crontab and documenting what has been done.
