<?xml version="1.0"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Transition Technology: Ticket #610: Aegir database intensive (migrate, clone, restore) tasks hang for larger sites</title>
    <link>http://localhost:8080/trac/ticket/610</link>
    <description>&lt;p&gt;
Large sites (TN.org and variants) will simply not complete their migrate, clone or restore tasks in Aegir.
&lt;/p&gt;
&lt;p&gt;
However, smaller sites are fine, and all tasks work for them.
&lt;/p&gt;
&lt;p&gt;
The process largely completes -- the codebase installs, the database is cloned, and the symlinks for site aliases and files are created -- BUT the task never completes in Aegir, so the final step of switching a site's served location never occurs.
&lt;/p&gt;
&lt;p&gt;
Useful links/comments in this issue:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/ticket/610#comment:30"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;Tests of Aegir commands&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a class="ext-link" href="https://drupal.org/node/984256"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://drupal.org/node/984256&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a class="ext-link" href="https://omega8.cc/aegir-task-fails-or-spins-forever-126"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;How to fix: Aegir task fails or spins forever&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;</description>
    <language>en-us</language>
    <image>
      <title>Transition Technology</title>
      <url>/trac/chrome/site/TransitionNetwork-Logo-Web-Small.jpg</url>
      <link>http://localhost:8080/trac/ticket/610</link>
    </image>
    <generator>Trac 0.12.5</generator>
    <item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Tue, 15 Oct 2013 10:25:58 GMT</pubDate>
      <title>hours, priority, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:1</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:1</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.1&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;priority&lt;/strong&gt;
                changed from &lt;em&gt;critical&lt;/em&gt; to &lt;em&gt;blocker&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.1&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Yeah the OA site is doing it too -- can't migrate to the new OpenAtrium platform either.
&lt;/p&gt;
&lt;p&gt;
Looking into it... This now blocks &lt;a class="assigned ticket" href="http://localhost:8080/trac/ticket/582" title="maintenance: TN.org platform and sites (assigned)"&gt;#582&lt;/a&gt; and &lt;a class="closed ticket" href="http://localhost:8080/trac/ticket/560" title="enhancement: Install drupal-based project management system onto our servers (closed: fixed)"&gt;#560&lt;/a&gt;.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Tue, 15 Oct 2013 10:26:38 GMT</pubDate>
      <title>summary changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:2</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:2</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;summary&lt;/strong&gt;
                changed from &lt;em&gt;STG won't migrate - hangs&lt;/em&gt; to &lt;em&gt;Aegir migrate tasks hang on Drushrc load&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>chris</dc:creator>

      <pubDate>Tue, 15 Oct 2013 13:31:14 GMT</pubDate>
      <title></title>
      <link>http://localhost:8080/trac/ticket/610#comment:3</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:3</guid>
      <description>
        &lt;p&gt;
Replying to &lt;a class="closed ticket" href="http://localhost:8080/trac/ticket/610" title="defect: Aegir database intensive (migrate, clone, restore) tasks hang for larger ... (closed: fixed)"&gt;jim&lt;/a&gt;:
&lt;/p&gt;
&lt;blockquote class="citation"&gt;
&lt;p&gt;
Per &lt;a class="ext-link" href="http://drupalcode.org/project/octopus.git/blob/HEAD:/docs/UPGRADE.txt"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;the docs&lt;/a&gt;, the command is:
&lt;/p&gt;
&lt;pre class="wiki"&gt;octopus up-stable o1
&lt;/pre&gt;&lt;p&gt;
I'll do this tonight at 9pm.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
I probably won't be around then but there is also the outstanding issue of removing New Relic, &lt;a class="closed ticket" href="http://localhost:8080/trac/ticket/568" title="defect: TC blog: can't *not* send notifications (closed: fixed)"&gt;ticket:568&lt;/a&gt; -- it might be worth removing that at the same time?
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Tue, 15 Oct 2013 14:56:50 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:4</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:4</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.05&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;0.1&lt;/em&gt; to &lt;em&gt;0.15&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Good idea. I'll remove NR too... This time I won't re-run the Barracuda update as that didn't seem to affect it. I'll simply visit the files that were changed and comment out the New Relic lines, then killall the newrelic services.
&lt;/p&gt;
&lt;p&gt;
The files changed are based on:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/ticket/586#comment:27"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;The PHP ini files&lt;/a&gt; we found before;
&lt;/li&gt;&lt;li&gt;&lt;a class="ext-link" href="https://github.com/omega8cc/nginx-for-drupal/search?q=newrelic&amp;amp;ref=cmdform"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;A GitHub search of the BOA codebase&lt;/a&gt; to mop up.
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
Once that is completed and nginx and php53-fpm are restarted, we're all hopefully good. I'll go on to update Octopus after that.
&lt;/p&gt;
&lt;p&gt;
Shout if you have any other thoughts or issues.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Tue, 15 Oct 2013 14:57:21 GMT</pubDate>
      <title>description changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:5</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:5</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;description&lt;/strong&gt;
              modified (&lt;a href="/trac/ticket/610?action=diff&amp;amp;version=5"&gt;diff&lt;/a&gt;)
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
And (just noticed) it'll be  &lt;tt&gt;octopus up-stable tn&lt;/tt&gt; (not o1).
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Tue, 15 Oct 2013 19:51:20 GMT</pubDate>
      <title></title>
      <link>http://localhost:8080/trac/ticket/610#comment:6</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:6</guid>
      <description>
        &lt;p&gt;
Ok,  &lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/ticket/586#comment:35"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;NR has been removed&lt;/a&gt; so I'll run the octopus update...
&lt;/p&gt;
&lt;p&gt;
As a precursor I've also told Octopus to always install the new Open Atrium D7 (2.x) branch -- this is the same one that we want for &lt;a class="closed ticket" href="http://localhost:8080/trac/ticket/560" title="enhancement: Install drupal-based project management system onto our servers (closed: fixed)"&gt;#560&lt;/a&gt; and it's recently been added. Plus, the update is set to 'autopilot'  now... changes to .tn.octopus.cnf are:
&lt;/p&gt;
&lt;pre class="wiki"&gt;_PLATFORMS_LIST="D7P OA7"
_AUTOPILOT=YES
&lt;/pre&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Tue, 15 Oct 2013 20:10:18 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:7</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:7</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.05&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;0.15&lt;/em&gt; to &lt;em&gt;0.2&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Octopus updated... will test later on.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Tue, 15 Oct 2013 20:10:55 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:8</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:8</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.1&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;0.2&lt;/em&gt; to &lt;em&gt;0.3&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
(Missed a little time)
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Tue, 15 Oct 2013 20:14:39 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:9</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:9</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.05&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;0.3&lt;/em&gt; to &lt;em&gt;0.35&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
No change... the command hangs on drushrc.php load as before...
&lt;/p&gt;
&lt;pre class="wiki"&gt;tn@puffin:~$ drush @stg.transitionnetwork.org provision-migrate '@platform_TransitionNetworkD6S006' --backend --debug -v
Bootstrap to phase 0. [0.03 sec, 2.87 MB]                            [bootstrap]
Drush bootstrap phase : _drush_bootstrap_drush() [0.04 sec, 3.09 MB] [bootstrap]
Load alias @stg.transitionnetwork.org [0.04 sec, 3.1 MB]                                                                       [notice]
Loading drushrc "/data/disk/tn/static/transition-network-d6-s002/sites/stg.transitionnetwork.org/drushrc.php" into "site"   [bootstrap]
scope. [0.04 sec, 3.11 MB]
--- HANGS ---
&lt;/pre&gt;&lt;p&gt;
So now we're into debug land.
&lt;/p&gt;
&lt;p&gt;
This issue is strange since it was all working just a couple of months back. I'll see if there's a way to rebuild/clean the &lt;tt&gt;drushrc.php&lt;/tt&gt; file.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>chris</dc:creator>

      <pubDate>Tue, 15 Oct 2013 22:35:46 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:10</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:10</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.2&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;0.35&lt;/em&gt; to &lt;em&gt;0.55&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
We could try tweaking some settings in &lt;tt&gt;/etc/php5/cli/php.ini&lt;/tt&gt;, e.g. allowing scripts more time to run and giving them more memory. I have looked at the CLI error log, &lt;tt&gt;/var/log/php/error_log_cli_53&lt;/tt&gt;, but it is clobbered by BOA so there is nothing in it. It might be worth removing the clobbering to see if anything is recorded there.
&lt;/p&gt;
&lt;p&gt;
Also there was a load spike around the time you were working on this. If the load goes above 3.88 while you are running drush, the task will be killed by second.sh. We should probably increase this setting, but as far as I'm aware it's not one we can set in &lt;tt&gt;/root/.barracuda.cnf&lt;/tt&gt; (see &lt;a class="wiki" href="http://localhost:8080/trac/wiki/PuffinServer#LoadSpikes"&gt;wiki:PuffinServer#LoadSpikes&lt;/a&gt;); it needs to be set in &lt;tt&gt;/var/xdrago/second.sh&lt;/tt&gt;.
&lt;/p&gt;
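&lt;p&gt;
A minimal sketch of checking the load before starting a drush task, using the 3.88 figure mentioned above. The helper names are made up for illustration and this is not how second.sh itself does it; it also assumes a Linux host where /proc/loadavg is available.
&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helpers, not taken from BOA's second.sh.

# Convert a decimal load figure such as "3.88" to hundredths (388)
# so that plain integer comparison can be used.
to_hundredths() {
  awk -v v="$1" 'BEGIN { printf "%d", v * 100 + 0.5 }'
}

# Succeeds (exit 0) when load $1 is strictly above threshold $2.
load_exceeds() {
  [ "$(to_hundredths "$1")" -gt "$(to_hundredths "$2")" ]
}

# Read the 1-minute load average and compare it with the threshold.
current=$(cut -d ' ' -f 1 /proc/loadavg)
if load_exceeds "$current" 3.88; then
  echo "load $current is above 3.88; hold off running drush"
else
  echo "load $current is at or below 3.88"
fi
```

&lt;p&gt;
The threshold is passed as an argument, so the same check can be reused if the setting is raised.
&lt;/p&gt;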
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 17 Oct 2013 13:18:12 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:11</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:11</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.25&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;0.55&lt;/em&gt; to &lt;em&gt;0.8&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Having followed (&lt;tt&gt;tail -f&lt;/tt&gt;) those files whilst I ran the command, nothing gets put in there. And there's no memory issue...
&lt;/p&gt;
&lt;p&gt;
So I re-ran the command with the maximum debugging output (&lt;tt&gt;drush @stg.transitionnetwork.org provision-migrate '@platform_TransitionNetworkD6S006' --backend --debug -vvvv&lt;/tt&gt;), and that generated a HUGE amount of JSON data. Having scanned this, the following entry looks to be the cause:
&lt;/p&gt;
&lt;pre class="wiki"&gt;"PROVISION_BACKUP_EXTRACTION_FAILED":["Failed to extract the contents of \/data\/disk\/tn\/backups\/stg.transitionnetwork.org-20131017.123311.tar.gz to \/data\/disk\/tn\/static\/transition-network-d6-s006\/sites\/stg.transitionnetwork.org.restore (The target directory could not be written to)"]
&lt;/pre&gt;&lt;p&gt;
So this looks to be a simple folder perms/owner issue. I'll take a gander now.
&lt;/p&gt;
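&lt;p&gt;
One way to pull such entries out of the backend output without reading all of the JSON by hand is to filter for the PROVISION_* message keys. A minimal sketch: the helper name provision_error_keys is made up for illustration, and it assumes the keys are upper-case and double-quoted as in the entry above.
&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: list the distinct PROVISION_* keys found on stdin.
# It does a plain text match and makes no attempt to parse the JSON.
provision_error_keys() {
  grep -o '"PROVISION_[A-Z_]*"' | tr -d '"' | sort -u
}

# Usage (command taken from this ticket; output piped through the helper):
#   drush @stg.transitionnetwork.org provision-migrate \
#     '@platform_TransitionNetworkD6S006' --backend --debug -vvvv \
#     | provision_error_keys
```

&lt;p&gt;
This only lists the keys; the full message for any key of interest still has to be looked up in the raw output.
&lt;/p&gt;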
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>chris</dc:creator>

      <pubDate>Sun, 20 Oct 2013 15:48:42 GMT</pubDate>
      <title></title>
      <link>http://localhost:8080/trac/ticket/610#comment:12</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:12</guid>
      <description>
        &lt;p&gt;
The problems with these tasks hanging &lt;a class="closed ticket" href="http://localhost:8080/trac/ticket/555#comment:111" title="maintenance: Load spikes causing the TN site to be stopped for 15 min at a time (closed: fixed)"&gt;coincide with loads&lt;/a&gt;, just after 10pm, which were above the &lt;a class="wiki" href="http://localhost:8080/trac/wiki/PuffinServer#LoadSpikes"&gt;18.88 threshold&lt;/a&gt; at which php and drush tasks are killed.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Sun, 20 Oct 2013 16:18:57 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:13</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:13</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.2&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;0.8&lt;/em&gt; to &lt;em&gt;1.0&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Odd.  I wasn't working at 10-11pm, I was working at 3-5pm. So the work I did wasn't directly the cause at the time... And the 'Hang' isn't doing any processing...
&lt;/p&gt;
&lt;p&gt;
What &lt;em&gt;is&lt;/em&gt; happening (as far as I can tell so far) is:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;I start a migrate or clone task;
&lt;/li&gt;&lt;li&gt;The drush command never completes as it fails after making the DB backup in its prep tasks.
&lt;/li&gt;&lt;li&gt;Actually behind the scenes it's stopped (as far as I can tell) due to the permission issue I mentioned a few comments back.
&lt;/li&gt;&lt;li&gt;However, Aegir then never gets told that the task failed for some reason, which leaves a task in a 'Processing' state, even though it's actually dead.
&lt;/li&gt;&lt;li&gt;What MIGHT coincide with the times Chris mentions, is when I go into the Aegir UI, go to the task node that's just sitting there, confused, and manually delete it... This is recommended on &lt;a class="ext-link" href="http://community.aegirproject.org/node/321"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;How to fix a stuck task (Aegir docs)&lt;/a&gt;. I wonder if when this task node is deleted it then runs off and finishes up some stuff, hence the load spike.... Hmm.
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
So this needs investigation, and, as Chris says, I'll keep an eye on &lt;tt&gt;uptime&lt;/tt&gt; and &lt;tt&gt;top&lt;/tt&gt; when I work.
&lt;/p&gt;
&lt;p&gt;
I think fundamentally there's a file/directory perms/owner issue causing the actual hang -- and that's what I'll work to fix next week when I have some time, and again I'll also watch the load too.
&lt;/p&gt;
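&lt;p&gt;
A quick way to narrow down a perms/owner problem is to test, as the tn user, whether the directories involved are writable. A minimal sketch: check_writable is a hypothetical helper, and the path in the usage comment is taken from the error quoted earlier in this ticket.
&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory exists and is writable
# by the user running the script.
# Returns 0 if writable, 1 if not writable, 2 if it is not a directory.
check_writable() {
  dir=$1
  if [ -d "$dir" ]; then
    if [ -w "$dir" ]; then
      echo "writable: $dir"
      return 0
    fi
    echo "not writable: $dir"
    return 1
  fi
  echo "missing: $dir"
  return 2
}

# Usage, run as the tn user:
#   check_writable /data/disk/tn/static/transition-network-d6-s006/sites
#   ls -ld /data/disk/tn/static/transition-network-d6-s006/sites
```

&lt;p&gt;
The &lt;tt&gt;ls -ld&lt;/tt&gt; line shows the owner and mode of the directory itself, which helps when the check reports "not writable".
&lt;/p&gt;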
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>chris</dc:creator>

      <pubDate>Sun, 20 Oct 2013 17:08:49 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:14</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:14</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.1&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;1.0&lt;/em&gt; to &lt;em&gt;1.1&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Replying to &lt;a href="http://localhost:8080/trac/ticket/610#comment:13" title="Comment 13 for Ticket #610"&gt;jim&lt;/a&gt;:
&lt;/p&gt;
&lt;blockquote class="citation"&gt;
&lt;p&gt;
Odd.  I wasn't working at 10-11pm, I was working at 3-5pm.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
Ah, I was also getting the 24 hour clock mixed up -- the comments here were posted between:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/timeline?from=2013-10-15T11%3A25%3A58%2B01%3A00&amp;amp;precision=second"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://tech.transitionnetwork.org/trac/timeline?from=2013-10-15T11%3A25%3A58%2B01%3A00&amp;amp;precision=second&lt;/a&gt; 11:25
&lt;/li&gt;&lt;li&gt;&lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/timeline?from=2013-10-15T21%3A14%3A39%2B01%3A00&amp;amp;precision=second"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://tech.transitionnetwork.org/trac/timeline?from=2013-10-15T21%3A14%3A39%2B01%3A00&amp;amp;precision=second&lt;/a&gt; 21:14
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
I read "11:25" as "23:25", d'oh.
&lt;/p&gt;
&lt;blockquote class="citation"&gt;
&lt;p&gt;
What &lt;em&gt;is&lt;/em&gt; happening (as far as I can tell so far) is:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;I start a migrate or clone task;
&lt;/li&gt;&lt;li&gt;The drush command never completes as it fails after making the DB backup in its prep tasks.
&lt;/li&gt;&lt;li&gt;Actually behind the scenes it's stopped (as far as I can tell) due to the permission issue I mentioned a few comments back.
&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;
&lt;p&gt;
OK, we need to track that down.
&lt;/p&gt;
&lt;blockquote class="citation"&gt;
&lt;ul&gt;&lt;li&gt;However, Aegir then never gets told that the task failed for some reason, which leaves a task in a 'Processing' state, even though it's actually dead.
&lt;/li&gt;&lt;li&gt;What MIGHT coincide with the times Chris mentions, is when I go into the Aegir UI, go to the task node that's just sitting there, confused, and manually delete it... This is recommended on &lt;a class="ext-link" href="http://community.aegirproject.org/node/321"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;How to fix a stuck task (Aegir docs)&lt;/a&gt;. I wonder if when this task node is deleted it then runs off and finishes up some stuff, hence the load spike.... Hmm.
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
So this needs investigation, and, as Chris says, I'll keep an eye on &lt;tt&gt;uptime&lt;/tt&gt; and &lt;tt&gt;top&lt;/tt&gt; when I work.
&lt;/p&gt;
&lt;p&gt;
I think fundamentally there's a file/directory perms/owner issue causing the actual hang -- and that's what I'll work to fix next week when I have some time, and again I'll also watch the load too.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
Cool, thanks.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 31 Oct 2013 12:47:06 GMT</pubDate>
      <title></title>
      <link>http://localhost:8080/trac/ticket/610#comment:15</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:15</guid>
      <description>
        &lt;p&gt;
I'm working on the basis of the tips/issue here: &lt;a class="ext-link" href="https://drupal.org/node/1279860"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://drupal.org/node/1279860&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
And I can avoid the high load by not testing with the migration of such a huge site...
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 31 Oct 2013 13:31:14 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:16</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:16</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.15&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;1.1&lt;/em&gt; to &lt;em&gt;1.25&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Test 1: PASS - Aegir created a site no problem on its stock D7 platform.
&lt;/p&gt;
&lt;p&gt;
Will now try same on non-stock platform.
&lt;/p&gt;
&lt;p&gt;
Have also had a look around but not seen any general permission/owner issues as yet - will get specific next.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 31 Oct 2013 13:34:56 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:17</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:17</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.05&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;1.25&lt;/em&gt; to &lt;em&gt;1.3&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Test 2: PASS - site made on the new STG platform. Will now attempt to migrate it.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 31 Oct 2013 13:40:13 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:18</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:18</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.5&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;1.3&lt;/em&gt; to &lt;em&gt;1.8&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Test 3: PASS - new site migrated from s006 to s002 platform Ok... now back the other way.
&lt;/p&gt;
&lt;p&gt;
This is good: it means Aegir is fine, and the perms issue is specific to the site's files rather than the platform or system.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 31 Oct 2013 13:44:55 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:19</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:19</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.05&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;1.8&lt;/em&gt; to &lt;em&gt;1.85&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Confirmed, it's an issue with the sites made so far - gotta run now but will continue at 5pm and fix this shit.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 31 Oct 2013 18:27:58 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:20</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:20</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.1&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;1.85&lt;/em&gt; to &lt;em&gt;1.95&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Aha, but migrating the test site to the new (made by me) Open Atrium site hangs. I'll try sending the site back to S002 to rule out S006 being the cause.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 31 Oct 2013 19:57:04 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:21</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:21</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.1&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;1.95&lt;/em&gt; to &lt;em&gt;2.05&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Getting somewhere now... Deleted the old spaces/OA sites/platforms. Am building a new one based on processes in &lt;a class="ext-link" href="https://github.com/omega8cc/nginx-for-drupal/blob/ecae6a98e70b17fc7421ac1b228f15f3b1217364/aegir/scripts/AegirSetupC.sh.txt"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://github.com/omega8cc/nginx-for-drupal/blob/ecae6a98e70b17fc7421ac1b228f15f3b1217364/aegir/scripts/AegirSetupC.sh.txt&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
Why?
&lt;/p&gt;
&lt;p&gt;
Because in a moment of perfect timing, Drupal.org is down for Halloween whilst they upgrade to D7!
&lt;/p&gt;
&lt;p&gt;
So it's a manual job, though the steps should only take 10 mins or so...
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 31 Oct 2013 20:55:58 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:22</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:22</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.75&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;2.05&lt;/em&gt; to &lt;em&gt;2.8&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Steps taken (damn you Drupal.org!) based on the &lt;a class="ext-link" href="https://github.com/omega8cc/nginx-for-drupal/blob/ecae6a98e70b17fc7421ac1b228f15f3b1217364/aegir/scripts/AegirSetupC.sh.txt"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;BOA script&lt;/a&gt;:
&lt;/p&gt;
&lt;pre class="wiki"&gt;# Get, unzip OA
wget http://ftp.drupal.org/files/projects/openatrium-7.x-2.0-core.tar.gz
tar -xzf openatrium-7.x-2.0-core.tar.gz
# Manually do prepare_drupal7_core()
wget "http://files.aegir.cc/dev/drupal-7.23.3.tar.gz"
tar -xzf drupal-7.23.3.tar.gz
cd drupal-7.23.3/
# .. fix_dirs_files()
rm -f ./*.txt
rm -f ./modules/*.txt
rm -f ./themes/*.txt
rm -f -r ./modules/cookie_cache_bypass
mkdir -p ./sites/default/files
mkdir -p ./cache/{normal,perm}
chmod -R 777 ./cache
cp -af ./sites/default/default.settings.php ./sites/default/settings.php
chmod a+rw ./sites/default/settings.php
chmod a+rwx ./sites/default/files
mkdir -p ./profiles
mkdir -p ./sites/all/{modules,libraries,themes}
rm -f ./core/modules/*.txt
rm -f ./core/themes/*.txt
rm -f ./modules/*.txt
rm -f ./themes/*.txt
rm -f ./sites/all/*.txt
echo empty &amp;gt; ./profiles/EMPTY.txt
echo empty &amp;gt; ./sites/all/EMPTY.txt
echo empty &amp;gt; ./sites/all/modules/EMPTY.txt
echo empty &amp;gt; ./sites/all/libraries/EMPTY.txt
echo empty &amp;gt; ./sites/all/themes/EMPTY.txt
chmod 02775 ./profiles &amp;amp;&amp;gt; /dev/null
chmod 0751 ./sites
chmod 0751 ./sites/all
chmod 02775 ./sites/all/{modules,libraries,themes}
cp /data/disk/tn/distro/002/drupal-7.22.1-prod/.htaccess .
cp /data/disk/tn/distro/002/drupal-7.22.1-prod/crossdomain.xml .
# Copy D7 stuff into OA folder
cd ..
cp -af drupal-7.23.3/* openatrium-7.x-2.0/
# Manually do some of nocore_d7_dist_clean()
cd openatrium-7.x-2.0/
rm -f *.txt
rm -f web.config
# Do remove_default_core_seven_profiles()
rm -f -r profiles/minimal
rm -f -r profiles/standard
rm -f -r profiles/testing
# Do upgrade_contrib_less()
cd profiles/openatrium/modules/contrib
rm -R context
wget -q -U iCab http://files.aegir.cc/dev/contrib/context-7.x-3.1.tar.gz
tar -xzf context-7.x-3.1.tar.gz
rm -R entity
wget -q -U iCab http://files.aegir.cc/dev/contrib/entity-7.x-1.2.tar.gz
tar -xzf entity-7.x-1.2.tar.gz
rm *.tar.gz
# http://drupal.org/node/1766338#comment-6445882
curl -s -O -A iCab http://files.aegir.cc/dev/patches/views-revert-broken-filter-or-groups-1766338-7.patch
patch -p1 &amp;lt; views-revert-broken-filter-or-groups-1766338-7.patch
# BOA Apps patch
curl -s -O -A iCab  https://raw.github.com/omega8cc/nginx-for-drupal/ecae6a98e70b17fc7421ac1b228f15f3b1217364/aegir/patches/apps_msg.patch
patch -p1 &amp;lt; apps_msg.patch
# Finally make owners tn:users, www-data for key files/dirs
cd ../../../../..
chown -R tn:users *
chown -R tn:www-data cache/
chown -R tn:www-data sites/default/{local.settings.php,settings.php,civicrm.settings.php}
&lt;/pre&gt;&lt;p&gt;
PHEW! It's clearly much better when Octopus does this, but two issues: a) Drupal.org is down and b) this is all HEAD stuff, and we want the best OA install with all the trimmings NOW, not when BOA gets to v2.0.10.
&lt;/p&gt;
&lt;p&gt;
Now to add the platform and install OA 2.0.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 31 Oct 2013 21:26:05 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:23</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:23</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.3&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;2.8&lt;/em&gt; to &lt;em&gt;3.1&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
OK so it installed ok... and works.
&lt;/p&gt;
&lt;p&gt;
But the Aegir 'install' task hasn't completed... Not sure where it got lost, but it has somewhere. Could be the UI never got told Drush completed it.
&lt;/p&gt;
&lt;p&gt;
Am playing now, will then create Ed's account and set up SSL.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 31 Oct 2013 21:50:19 GMT</pubDate>
      <title>hours, priority, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:24</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:24</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.3&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;priority&lt;/strong&gt;
                changed from &lt;em&gt;blocker&lt;/em&gt; to &lt;em&gt;critical&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;3.1&lt;/em&gt; to &lt;em&gt;3.4&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Hmm... had to delete the install task and re-verify the space site, then 'enable' it in Aegir... now seems fine.
&lt;/p&gt;
&lt;p&gt;
So something is still up, but I'm not sure if it's my stuff or Aegir/BOA. This is because everything so far has worked when using stock stuff, and not when it's my/our custom platforms.
&lt;/p&gt;
&lt;p&gt;
I note too that the stg2 site I made a few weeks back works fine. This again means it looks like the commands are all working (when there's no perms issue) but Drush is timing out and so the Aegir UI (TN hostmaster site) never gets told that the job is a goodun...
&lt;/p&gt;
&lt;p&gt;
So from here we're in semi good shape... but for this likely timeout issue. I've been looking at issues and found this interesting link (Google Cached since D.org is being updated still): &lt;a class="ext-link" href="http://webcache.googleusercontent.com/search?q=cache:HFfqD-t4cdwJ:https://drupal.org/node/984256+&amp;amp;cd=3&amp;amp;hl=en&amp;amp;ct=clnk&amp;amp;gl=uk"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;Successful site install appears to hang in front-end&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
I'll downgrade this ticket and proceed on the basis that this is a timeout issue.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>ed</dc:creator>

      <pubDate>Mon, 04 Nov 2013 11:06:01 GMT</pubDate>
      <title>milestone set</title>
      <link>http://localhost:8080/trac/ticket/610#comment:25</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:25</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;milestone&lt;/strong&gt;
                set to &lt;em&gt;Maintenance&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Fri, 15 Nov 2013 12:04:55 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:26</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:26</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.1&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;3.4&lt;/em&gt; to &lt;em&gt;3.5&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
I'll wait to continue with this ticket until after &lt;a class="closed ticket" href="http://localhost:8080/trac/ticket/535" title="maintenance: Upgrade Puffin, Penguin and Parrot from Debian Squeeze to Wheezy (closed: fixed)"&gt;#535&lt;/a&gt; (Wheezy update, which covers &lt;a class="closed ticket" href="http://localhost:8080/trac/ticket/618" title="maintenance: Migrate Penguin and Parrot to the ZFS fileserver (closed: fixed)"&gt;#618&lt;/a&gt; BOA 2.1.1 update) is done. This will give us a 'clean'-ish system and some bug fixes, plus permissions tweaks built in.
&lt;/p&gt;
&lt;p&gt;
The key parts of pending &lt;a class="assigned ticket" href="http://localhost:8080/trac/ticket/582" title="maintenance: TN.org platform and sites (assigned)"&gt;#582&lt;/a&gt; (theme tweaks, gmap module &lt;a class="closed ticket" href="http://localhost:8080/trac/ticket/615" title="task: Move to GMap 6.x-2.x-dev as and get clusterer to work (closed: fixed)"&gt;#615&lt;/a&gt; and comment notifications) can be done manually if required. In fact, I'll do so for the theme tweaks and comment notifications later today.
&lt;/p&gt;
&lt;p&gt;
My hope is that the update and some newly built platforms will resolve this ticket for us with little or no extra work.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 09 Jan 2014 13:44:10 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:27</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:27</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.25&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;3.5&lt;/em&gt; to &lt;em&gt;3.75&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
OK so I'm now certain this issue is not a permission problem, because a) there are no errors, and b) BOA is now fixing the perms of our platforms automatically every day.
&lt;/p&gt;
&lt;p&gt;
In a nutshell: drush does its thing ok, files and databases are moved, but Aegir's tasks hang -- like they never get told about the task finishing.
&lt;/p&gt;
&lt;p&gt;
So this is strange because make platform, clone and migrate tasks are affected, but verify and others are not. Now clearly, making, cloning and migrating platforms can be rather long tasks that need to move databases etc, so I'm thinking it's a timeout.
&lt;/p&gt;
&lt;p&gt;
Based on the work on &lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/ticket/586#comment:27"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;the New Relic tidy-up&lt;/a&gt;, I think the issue could be with maximum execution settings for the CLI php instance... So I've now set &lt;tt&gt;max_execution_time&lt;/tt&gt; in &lt;tt&gt;/opt/local/lib/php.ini&lt;/tt&gt; (line 444) to 7200s (2 hours), up from 3600s.
&lt;/p&gt;
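&lt;p&gt;
As a quick sanity check on the units, here is a minimal shell sketch that converts a timeout in seconds to whole minutes. The function name &lt;tt&gt;secs_to_mins&lt;/tt&gt; is made up for illustration and is not part of BOA or Aegir.
&lt;/p&gt;

```shell
# Hypothetical helper: convert a timeout in seconds to whole minutes.
secs_to_mins() {
  echo $(( $1 / 60 ))
}

secs_to_mins 3600   # the previous value
secs_to_mins 7200   # the new value
```

&lt;p&gt;
That gives 60 and 120 minutes, so the limit has gone from one hour to two.
&lt;/p&gt;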
&lt;p&gt;
Will try another migration of STG now.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 09 Jan 2014 13:52:57 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:28</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:28</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.05&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;3.75&lt;/em&gt; to &lt;em&gt;3.8&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
I also note that when Aegir gets the platform lists - or does other tasks in the UI - several of the following syslog errors appear:
&lt;/p&gt;
&lt;pre class="wiki"&gt;Jan  7 22:35:33 puffin mysqld: 140107 22:35:33  InnoDB: O_DIRECT is known to result in 'Invalid argument' on Linux on tmpfs, see MySQL Bug#26662
Jan  7 22:35:33 puffin mysqld: 140107 22:35:33  InnoDB: Failed to set O_DIRECT on file /run/shm/mysql/#sql1fb6_316b55_4.ibd: OPEN: Invalid argument, continuing anyway
&lt;/pre&gt;&lt;p&gt;
MySQL bug 26662 is &lt;a class="ext-link" href="http://bugs.mysql.com/bug.php?id=26662"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;mysqld assertion when creating temporary (InnoDB) table on a tmpfs filesystem&lt;/a&gt;. It seems the tmpfs work done to enhance performance may have side effects -- though this appears to be a warning rather than an error.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>chris</dc:creator>

      <pubDate>Thu, 09 Jan 2014 14:06:41 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:29</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:29</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.25&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;3.8&lt;/em&gt; to &lt;em&gt;4.05&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Replying to &lt;a href="http://localhost:8080/trac/ticket/610#comment:28" title="Comment 28 for Ticket #610"&gt;jim&lt;/a&gt;:
&lt;/p&gt;
&lt;blockquote class="citation"&gt;
&lt;p&gt;
It seems the tmpfs work done to enhance performance may have side effects -- though this appears to be a warning rather than an error.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
I have read the thread and it seems safe to ignore this warning.
&lt;/p&gt;
&lt;p&gt;
Replying to &lt;a href="http://localhost:8080/trac/ticket/610#comment:27" title="Comment 27 for Ticket #610"&gt;jim&lt;/a&gt;:
&lt;/p&gt;
&lt;blockquote class="citation"&gt;
&lt;p&gt;
I've now set max_execution_time in /opt/local/lib/php.ini (line 444) to 7200s (2 hours), up from 3600s.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
Will that file get clobbered when BOA is upgraded?
&lt;/p&gt;
&lt;p&gt;
It might be worth increasing the debugging -- there is nothing in the errorlog to indicate that there are PHP errors, see &lt;tt&gt;/var/log/php/error_log_cli_53&lt;/tt&gt;.
&lt;/p&gt;
&lt;p&gt;
Some other settings in this file that might need tweaking:
&lt;/p&gt;
&lt;pre class="wiki"&gt;max_input_time = 3600
memory_limit = 512M
post_max_size = 100M
upload_max_filesize = 100M
&lt;/pre&gt;
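&lt;p&gt;
To see what these are currently set to, something like the following could be used. This is a rough sketch only: the function name &lt;tt&gt;show_php_limits&lt;/tt&gt; is invented, and the path passed in should be whichever php.ini the CLI actually loads.
&lt;/p&gt;

```shell
# Hypothetical helper: print selected limits from a php.ini file.
# $1 is the path to the ini file (an assumption; pass whichever file applies).
# Commented-out lines (starting with ';') are skipped because the pattern
# is anchored to the start of the line.
show_php_limits() {
  grep -E '^(max_execution_time|max_input_time|memory_limit|post_max_size|upload_max_filesize)[[:space:]]*=' "$1"
}
```

&lt;p&gt;
For example, &lt;tt&gt;show_php_limits /opt/local/lib/php.ini&lt;/tt&gt; would list only the active (uncommented) values of those five settings.
&lt;/p&gt;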
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 09 Jan 2014 14:41:57 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:30</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:30</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.8&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;4.05&lt;/em&gt; to &lt;em&gt;4.85&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Thanks Chris re MySQL issue... Also I don't think we have a memory issue, nor a post size issue -- they would throw errors we can see... I don't think it's a PHP env issue now having done further tests. And if they get overridden by BOA updates, no probs, we don't need them.
&lt;/p&gt;
&lt;p&gt;
---
&lt;/p&gt;
&lt;p&gt;
I've just confirmed &lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/ticket/610#comment:13"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;everything I said before&lt;/a&gt;... Database gets made, files get moved, sites folder set up, but the final switchover in Aegir from old to new version of site never happens.
&lt;/p&gt;
&lt;p&gt;
Am now trying each key Aegir task in turn on stg2 to see what happens... Results:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Backup = ok
&lt;/li&gt;&lt;li&gt;Disable = ok
&lt;/li&gt;&lt;li&gt;Enable = ok
&lt;/li&gt;&lt;li&gt;Restore (backup from earlier) =  FAIL! (task deleted, stuck at &lt;tt&gt;/data/disk/tn/tools/drush/drush.php @stg2.transitionnetwork.org provision-restore '/data/disk/tn/backups/stg2.transitionnetwork.org-20140109.140706.tar.gz' --backend 2&amp;gt;&amp;amp;1&lt;/tt&gt;)
&lt;/li&gt;&lt;li&gt;Site briefly dead, simply ran 'verify' again to reinstate
&lt;/li&gt;&lt;li&gt;Verify = ok
&lt;/li&gt;&lt;li&gt;Site health check (new and quite handy!) = no issues found.
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
Made new sites on each platform in question (S006 and S008):
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;stg-s006.transitionnetwork.org:
&lt;ul&gt;&lt;li&gt;site created = ok
&lt;/li&gt;&lt;li&gt;Site cloned (to own platform) = ok
&lt;/li&gt;&lt;li&gt;site migrated to S008 platform = ok
&lt;/li&gt;&lt;li&gt;Site and clone deleted = ok
&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;stg-s008.transitionnetwork.org:
&lt;ul&gt;&lt;li&gt;Site created = ok
&lt;/li&gt;&lt;li&gt;Site cloned (to own platform) = ok
&lt;/li&gt;&lt;li&gt;Site and clone deleted = ok
&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
SO THIS MEANS... all main tasks on new, clean sites on these two platforms work fine, but anything to do with clone, migrate or restore (all database heavy) on the STG2 site fails. I'm thinking this is to do with either some modules we have enabled on STG2 exploding, OR something to do with the symlinked files, OR that its database is big and there's some issue there.
&lt;/p&gt;
&lt;p&gt;
And a related issue I've found:  &lt;a class="ext-link" href="https://drupal.org/node/1392102"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://drupal.org/node/1392102&lt;/a&gt; -- tried running mysql_repair.sh last night, no errors, no changes.
&lt;/p&gt;
&lt;p&gt;
I'm now thinking it's database, modules or files related... Going to walk dog now... will trial a few other options when I get back, plus clean up databases left orphaned by my testing on STG2 site.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 09 Jan 2014 14:44:30 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:31</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:31</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.05&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;4.85&lt;/em&gt; to &lt;em&gt;4.9&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Spare databases for STG2 cleaned up.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 09 Jan 2014 16:20:24 GMT</pubDate>
      <title></title>
      <link>http://localhost:8080/trac/ticket/610#comment:32</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:32</guid>
      <description>
        &lt;p&gt;
Having gone for a walk, it's clear that Aegir is actually working properly. BUT something about our STG2 site is wonky.
&lt;/p&gt;
&lt;p&gt;
So I'll start next by cloning a new STG from PROD to see what happens. PROD will be unaffected.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 09 Jan 2014 16:51:44 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:33</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:33</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.1&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;4.9&lt;/em&gt; to &lt;em&gt;5.0&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Clone of PROD did not work - hung, as others did, on &lt;tt&gt;/data/disk/tn/tools/drush/drush.php @www.transitionnetwork.org provision-clone '@stg.transitionnetwork.org' '@platform_TransitionNetworkD6P005' --backend 2&amp;gt;&amp;amp;1&lt;/tt&gt;
&lt;/p&gt;
&lt;p&gt;
I'll now clear up the commands, databases and files, then run this manually to see what happens.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 09 Jan 2014 17:20:22 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:34</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:34</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.55&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;5.0&lt;/em&gt; to &lt;em&gt;5.55&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
AHA! I just realised that if you 'verify' a platform that has sites in it that are not present in the Aegir list, they will be auto-discovered and imported.
&lt;/p&gt;
&lt;p&gt;
So I've done that on PROD platform, and imported the new STG. I'm now going to alter STG to be like STG2 and not have any chance of sending emails etc.
&lt;/p&gt;
&lt;p&gt;
1) Re-copy PROD files to STG:
&lt;/p&gt;
&lt;pre class="wiki"&gt;[as ROOT, to avoid perms issues]
cd /data/disk/tn/static/sites/
rm -R transitionnetwork.org-STG/
cp -af transitionnetwork.org-PROD/ transitionnetwork.org-STG/
&lt;/pre&gt;&lt;p&gt;
2) Wait 5 mins for it to complete, then symlink new STG files to our new site's directory:
&lt;/p&gt;
&lt;pre class="wiki"&gt;[as an aegir user]
cd ~/static/transition-network-d6-p005/sites/stg.transitionnetwork.org
rm -R files/
ln -s /data/disk/tn/static/sites/transitionnetwork.org-STG/files/ .
&lt;/pre&gt;&lt;p&gt;
3) Ensure the STG stuff happens using the local.settings.php file:
&lt;/p&gt;
&lt;pre class="wiki"&gt;[as ROOT]
cd /data/disk/tn/static/transition-network-d6-p005/sites/stg.transitionnetwork.org
cp ../../../transition-network-d6-s006/sites/stg2.transitionnetwork.org/local.settings.php .
&lt;/pre&gt;&lt;p&gt;
And the new STG is done. I'll remove the old STG2 site.
&lt;/p&gt;
&lt;p&gt;
Now to make a new STG2 from STG by running the &lt;tt&gt;drush provision-clone&lt;/tt&gt; command myself, and add -vv for extra verbosity:
&lt;/p&gt;
&lt;pre class="wiki"&gt;drush @stg.transitionnetwork.org provision-clone '@stg2.transitionnetwork.org' '@platform_TransitionNetworkD6P005' --backend -vv
&lt;/pre&gt;
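&lt;p&gt;
Since verbose backend output can be large, it may help to keep a copy of it in a file rather than relying on the terminal scrollback. A minimal sketch, assuming a made-up wrapper name &lt;tt&gt;run_logged&lt;/tt&gt; and a log path of your choosing:
&lt;/p&gt;

```shell
# Hypothetical wrapper: run a command and keep a copy of its stdout in a file.
# $1 is the log file path; the remaining arguments are the command to run.
# Note: only stdout is captured, and the exit status returned is tee's,
# not that of the wrapped command.
run_logged() {
  log_file=$1
  shift
  "$@" | tee "$log_file"
}
```

&lt;p&gt;
Usage would be &lt;tt&gt;run_logged /tmp/provision-clone.log&lt;/tt&gt; followed by the drush command above; the &lt;tt&gt;/tmp&lt;/tt&gt; path is only an example.
&lt;/p&gt;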
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 09 Jan 2014 17:54:42 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:35</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:35</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.25&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;5.55&lt;/em&gt; to &lt;em&gt;5.8&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Hmmm... with &lt;tt&gt;-vvv&lt;/tt&gt;, after ~5 mins a load of JSON got spat out, and I lost my connection to Puffin... The JSON looks fine, no errors, though it's big at 700KB.
&lt;/p&gt;
&lt;p&gt;
So I'll now try revalidating the platform since the DB, files and other bits all seem present and correct.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 09 Jan 2014 18:18:12 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:36</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:36</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.1&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;5.8&lt;/em&gt; to &lt;em&gt;5.9&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
That worked.
&lt;/p&gt;
&lt;p&gt;
So it's clear that Aegir is working as expected, but for the 'last gasp' of Drush telling it it's done.
&lt;/p&gt;
&lt;p&gt;
I'll try the same again with the migrate command originally run in &lt;a href="http://localhost:8080/trac/ticket/610#comment:9" title="Comment 9 for Ticket #610"&gt;comment:9&lt;/a&gt; and &lt;a href="http://localhost:8080/trac/ticket/610#comment:11" title="Comment 11 for Ticket #610"&gt;comment:11&lt;/a&gt;, but with the new platform name - like so:
&lt;/p&gt;
&lt;pre class="wiki"&gt;drush @stg2.transitionnetwork.org provision-migrate '@platform_TransitionNetworkD6S008' --backend --debug
&lt;/pre&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>chris</dc:creator>

      <pubDate>Thu, 09 Jan 2014 18:55:03 GMT</pubDate>
      <title></title>
      <link>http://localhost:8080/trac/ticket/610#comment:37</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:37</guid>
      <description>
        &lt;pre class="wiki"&gt;On Thu 09-Jan-2014 at 05:54:42PM -0000, Transiton Technology Trac wrote:
&amp;gt;  I lost my connection to Puffin...
ssh in and then start a screen session:
  screen
Then if your ssh session fails you can ssh in again and resume it:
  screen -r
We could also install mosh, the mobile shell, if you are able to install
it locally, it's very good for keeping sessions alive with the most dire
connections. It's installed on penguin already.
&lt;/pre&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 09 Jan 2014 18:55:38 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:38</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:38</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.5&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;5.9&lt;/em&gt; to &lt;em&gt;6.4&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
OK so this time we have an error! About 3/4 through the JSON output we have:
&lt;/p&gt;
&lt;pre class="wiki"&gt;    "error_log": {
        "PROVISION_DROP_DB_FAILED": [
            "Unable to drop database."
        ]
    },
    "error_status": 1,
&lt;/pre&gt;&lt;p&gt;
and then a little later the actual error (see items 2 and 3 below, rest is for context):
&lt;/p&gt;
&lt;pre class="wiki"&gt;
        {
            "error": null,
            "memory": 18616968,
            "message": "Dropping database stg2transitionne",
            "timestamp": 1389292889.0647,
            "type": "notice"
        },
        {
            "error": null,
            "memory": 18619472,
            "message": "Failed to drop database stg2transitionne",
            "timestamp": 1389292889.065,
            "type": "warning"
        },
        {
            "error": "PROVISION_DROP_DB_FAILED",
            "memory": 18627632,
            "message": "Unable to drop database.",
            "timestamp": 1389292889.0661,
            "type": "error"
        },
        {
            "error": null,
            "memory": 18623680,
            "message": "Running: /usr/bin/php-cli /var/aegir/drush/drush.php --php='/usr/bin/php-cli'  --platform='@platform_TransitionNetworkD6P005' provision-save '@stg2.transitionnetwork.org' --backend  2&amp;gt;&amp;amp;1",
            "timestamp": 1389292889.0664,
            "type": "command"
        },
&lt;/pre&gt;&lt;p&gt;
And finally near the end, some "rollback" messages:
&lt;/p&gt;
&lt;pre class="wiki"&gt;
        {
            "error": null,
            "memory": 21928080,
            "message": "Peak memory usage was 14.41 MB",
            "timestamp": 1389292889.4101,
            "type": "memory"
        },
        {
            "error": null,
            "memory": 18656400,
            "message": "Changes made in drush_provision_drupal_provision_migrate have been rolled back.",
            "timestamp": 1389292889.413,
            "type": "rollback"
        },
        {
            "error": null,
            "memory": 18657952,
            "message": "Bringing site out of maintenance",
            "timestamp": 1389292889.4131,
            "type": "notice"
        },
        {
            "error": null,
            "memory": 18665776,
            "message": "Changed group ownership of &amp;lt;code&amp;gt;/data/disk/tn/static/transition-network-d6-p005/sites/stg2.transitionnetwork.org/local.settings.php&amp;lt;/code&amp;gt; to www-data",
            "timestamp": 1389292889.4148,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18667424,
            "message": "Changed permissions of &amp;lt;code&amp;gt;/data/disk/tn/static/transition-network-d6-p005/sites/stg2.transitionnetwork.org/local.settings.php&amp;lt;/code&amp;gt; to 440",
            "timestamp": 1389292889.4149,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18668576,
            "message": "Template loaded: /data/disk/tn/.drush/provision/Provision/Config/Drupal/provision_drupal_settings.tpl.php",
            "timestamp": 1389292889.415,
            "type": "notice"
        },
        {
            "error": null,
            "memory": 18675920,
            "message": "Changed permissions of /data/disk/tn/static/transition-network-d6-p005/sites/stg2.transitionnetwork.org/settings.php to 640",
            "timestamp": 1389292889.4152,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18677160,
            "message": "Generated config Drupal settings.php file",
            "timestamp": 1389292889.416,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18678800,
            "message": "Changed permissions of /data/disk/tn/static/transition-network-d6-p005/sites/stg2.transitionnetwork.org/settings.php to 440",
            "timestamp": 1389292889.4161,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18679824,
            "message": "Change group ownership of /data/disk/tn/static/transition-network-d6-p005/sites/stg2.transitionnetwork.org/settings.php to www-data",
            "timestamp": 1389292889.4165,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18674360,
            "message": "Platforms path /data/disk/tn/platforms exists.",
            "timestamp": 1389292889.4166,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18675696,
            "message": "Platforms ownership of /data/disk/tn/platforms has been changed to tn.",
            "timestamp": 1389292889.4169,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18677088,
            "message": "Platforms permissions of /data/disk/tn/platforms have been changed to 711.",
            "timestamp": 1389292889.417,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18677824,
            "message": "Platforms path /data/disk/tn/platforms is writable.",
            "timestamp": 1389292889.417,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18672560,
            "message": "Removed unused migration site package",
            "timestamp": 1389292889.4359,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18675512,
            "message": "Template loaded: /data/disk/tn/.drush/provision/http/Provision/Config/Nginx/vhost.tpl.php",
            "timestamp": 1389292889.4365,
            "type": "notice"
        },
        {
            "error": null,
            "memory": 18680320,
            "message": "Generated config virtual host configuration file",
            "timestamp": 1389292889.4374,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18678080,
            "message": "nginx on puffin.webarch.net has been restarted",
            "timestamp": 1389292889.596,
            "type": "notice"
        },
        {
            "error": null,
            "memory": 18678808,
            "message": "Changes made in drush_provision_drupal_pre_provision_migrate have been rolled back.",
            "timestamp": 1389292889.5962,
            "type": "rollback"
        },
        {
            "error": null,
            "memory": 18681224,
            "message": "Template loaded: /data/disk/tn/.drush/provision/http/Provision/Config/Nginx/vhost.tpl.php",
            "timestamp": 1389292889.5975,
            "type": "notice"
        },
        {
            "error": null,
            "memory": 18685640,
            "message": "Generated config virtual host configuration file",
            "timestamp": 1389292889.5989,
            "type": "message"
        },
        {
            "error": null,
            "memory": 18682296,
            "message": "Changes made in drush_http_pre_provision_migrate have been rolled back.",
            "timestamp": 1389292889.5992,
            "type": "rollback"
        },
&lt;/pre&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Thu, 09 Jan 2014 19:55:53 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:39</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:39</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.15&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;6.4&lt;/em&gt; to &lt;em&gt;6.55&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
OK so the verify on the S008 platform didn't add the site this time -- I expect because a site with the same name (url) is already known to Aegir and it won't do dupes. If I were to delete the STG2 site in P005 I'm sure the one in S008 would show on next verify.
&lt;/p&gt;
&lt;p&gt;
So I wonder if this is just down to database permissions... Looking into that now.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Fri, 10 Jan 2014 17:21:59 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:40</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:40</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.75&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;6.55&lt;/em&gt; to &lt;em&gt;7.3&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
So a quick review and the database permissions seem fine to me...
&lt;/p&gt;
&lt;p&gt;
I've now done a bunch of Googling and others have similar reports, but no fixes/causes as yet. There's an &lt;a class="ext-link" href="https://drupal.org/project/provision"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;Aegir Provision&lt;/a&gt; issue that seems relevant here: &lt;a class="ext-link" href="https://drupal.org/node/1517616"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;Unable to migrate remote site after 1.7 upgrade&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
I've now tried:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;deleted the site &lt;strong&gt;node&lt;/strong&gt; -- NOT SITE! -- for STG2 in the P005 platform (by using the 'edit' tab on the STG2 site, then changing the URL to end in 'delete' rather than 'edit', then confirming the delete)
&lt;/li&gt;&lt;li&gt;then re-verified the S008 platform to see if it can see the STG2 site that was migrated there.
&lt;/li&gt;&lt;li&gt;It can! It found it and tried to import it but hung on &lt;tt&gt;Running: /data/disk/tn/tools/drush/drush.php @stg2.transitionnetwork.org provision-import --backend 2&amp;gt;&amp;amp;1&lt;/tt&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
It's now almost certainly a database issue -- either slowness, permissions or a timeout.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Fri, 10 Jan 2014 17:27:25 GMT</pubDate>
      <title></title>
      <link>http://localhost:8080/trac/ticket/610#comment:41</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:41</guid>
      <description>
        &lt;p&gt;
I'm now downloading an Aegir backup of STG to see what's in the files, and see what might be taking the time.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Fri, 10 Jan 2014 17:33:58 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:42</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:42</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.35&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;7.3&lt;/em&gt; to &lt;em&gt;7.65&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
OK, so 51MB for the zipped backup is pretty big... And it's pretty much all database -- that's 224MB unzipped.
&lt;/p&gt;
&lt;p&gt;
Also I cleaned up S008 STG2 site that failed to import with:
&lt;/p&gt;
&lt;pre class="wiki"&gt;[as an Aegir user]
cd ~/static/transition-network-d6-s008/sites
rm -R stg2.transitionnetwork.org/ www.stg2.transitionnetwork.org
&lt;/pre&gt;&lt;p&gt;
Now re-running verify to see if we can get STG2 back. I'll kill it by hand if needs be (we don't really need it apart from my testing here).
&lt;/p&gt;
&lt;p&gt;
Now I'm going to try some mysql dumps and gzips by hand, to see what sort of time these things take.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Fri, 10 Jan 2014 18:26:58 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:43</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:43</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.5&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;7.65&lt;/em&gt; to &lt;em&gt;8.15&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Looking at the PROD database in [Chive], the numbers for the basic inbuilt Drupal search are scary:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;search_dataset = 118.79 MiB
&lt;/li&gt;&lt;li&gt;search_index = 55.66 MiB
&lt;/li&gt;&lt;li&gt;search_total = 5.78 MiB
&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Total = 180.23 MiB -- OR 180/224 = 80% of total database size &lt;/strong&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
This is almost certainly part of the issue...
&lt;/p&gt;
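&lt;p&gt;
As a quick check of that arithmetic, here is a minimal shell sketch using only the Chive figures listed above; the function name &lt;tt&gt;search_share&lt;/tt&gt; is made up for illustration:
&lt;/p&gt;
&lt;pre class="wiki"&gt;# Hypothetical helper: sums the three search_* table sizes (MiB, from Chive)
# and reports their share of the 224 unzipped total used in the list above.
search_share() {
    awk 'BEGIN {
        total = 118.79 + 55.66 + 5.78
        printf "search tables: %.2f MiB, %.0f%% of 224 MiB\n", total, 100 * total / 224
    }'
}
search_share
&lt;/pre&gt;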
&lt;p&gt;
I'm about to do a manual DB dump of PROD to test server speed now.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Fri, 10 Jan 2014 19:38:28 GMT</pubDate>
      <title>hours, totalhours, description, summary changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:44</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:44</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.6&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;8.15&lt;/em&gt; to &lt;em&gt;8.75&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;description&lt;/strong&gt;
              modified (&lt;a href="/trac/ticket/610?action=diff&amp;amp;version=44"&gt;diff&lt;/a&gt;)
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;summary&lt;/strong&gt;
                changed from &lt;em&gt;Aegir migrate tasks hang on Drushrc load&lt;/em&gt; to &lt;em&gt;Aegir database intensive (migrate, clone, restore) tasks hang for larger sites&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Just tried migrating STG after truncating the search_* tables, but that hung... this time the DB will have been only 20-25% of its previous size, weighing in at around 50MB unzipped.
&lt;/p&gt;
&lt;p&gt;
Updating the issue description with what I've found...
&lt;/p&gt;
&lt;p&gt;
My last work tonight will be to review the changes to the server in early October that may have caused this issue -- the ZFS and TempFS changes are the best candidates, I'd guess.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>chris</dc:creator>

      <pubDate>Fri, 10 Jan 2014 19:55:12 GMT</pubDate>
      <title></title>
      <link>http://localhost:8080/trac/ticket/610#comment:45</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:45</guid>
      <description>
        &lt;pre class="wiki"&gt;On Fri 10-Jan-2014 at 07:38:28PM -0000, Transiton Technology Trac wrote:
&amp;gt;
&amp;gt;  My last work tonight will be to review the changes to the server in
&amp;gt;  early October that has may have caused this issue -- ZFS and TempFS
&amp;gt;  changes are the best candidates for perhaps causing this I'd guess.
Edit /etc/mysql/my.cnf to change MySQL back to using the filesystem
rather than RAM for temp files.
&lt;/pre&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Fri, 10 Jan 2014 20:29:42 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:46</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:46</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.1&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;8.75&lt;/em&gt; to &lt;em&gt;8.85&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Per Chris' suggestion, I've set the &lt;tt&gt;tmpdir&lt;/tt&gt; in &lt;tt&gt;/etc/mysql/my.cnf&lt;/tt&gt; back to &lt;tt&gt;/tmp&lt;/tt&gt; from &lt;tt&gt;/run/shm/mysql&lt;/tt&gt; -- this is just a matter of commenting and uncommenting lines...
&lt;/p&gt;
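&lt;p&gt;
For illustration, the swap looks roughly like the sketch below. The exact &lt;tt&gt;tmpdir&lt;/tt&gt; lines in &lt;tt&gt;my.cnf&lt;/tt&gt; aren't reproduced in this ticket, so the two sample lines and the &lt;tt&gt;toggle_tmpdir&lt;/tt&gt; helper are assumptions:
&lt;/p&gt;
&lt;pre class="wiki"&gt;# Hypothetical helper: comment out the tmpfs tmpdir line and uncomment the /tmp one.
# Reads my.cnf-style lines on stdin and writes the edited lines to stdout.
toggle_tmpdir() {
    sed -e 's|^tmpdir\( *= */run/shm/mysql\)$|#tmpdir\1|' \
        -e 's|^#tmpdir\( *= */tmp\)$|tmpdir\1|'
}

# Sample input (assumed layout, not copied from the server):
printf '%s\n' '#tmpdir = /tmp' 'tmpdir = /run/shm/mysql' | toggle_tmpdir
&lt;/pre&gt;
&lt;p&gt;
&lt;tt&gt;tmpdir&lt;/tt&gt; is only read when the server starts, hence the restart.
&lt;/p&gt;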
&lt;p&gt;
I waited until the load dropped below 2, then restarted mysql -- it took 2 attempts for some reason -- and that happened at 20:26:
&lt;/p&gt;
&lt;pre class="wiki"&gt;puffin:~# service mysql restart
[ ok ] Stopping MariaDB database server: mysqld.
[ ok ] Starting MariaDB database server: mysqld . . . . ..
[info] Checking for corrupt, not cleanly closed and upgrade needing tables..
puffin:~# service mysql status
[info] MariaDB is stopped..
puffin:~# service mysql start
[ ok ] Starting MariaDB database server: mysqld . . . ..
[info] Checking for corrupt, not cleanly closed and upgrade needing tables..
puffin:~# service mysql status
[info] /usr/bin/mysqladmin  Ver 9.0 Distrib 5.5.34-MariaDB, for debian-linux-gnu on x86_64
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
&amp;lt;snip&amp;gt;
&lt;/pre&gt;&lt;p&gt;
I'll let the caches warm up and the traffic die down a bit and try another clone of STG in a couple of hours.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Fri, 10 Jan 2014 23:20:52 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:47</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:47</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.5&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;8.85&lt;/em&gt; to &lt;em&gt;9.35&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Tried to clone the slimmer STG site with MySQL using disk caching, still no dice... Temp tables issue ruled out -- I've undone the previous config changes and have reverted back to TmpFS from disk, then restarted mysql.
&lt;/p&gt;
&lt;p&gt;
In a parallel test I've built the TN D6 platform on my server (Babylon), and have imported the fat version of the database, which took ~4.5 mins for the 213MB DB file. I'll try cloning/migrating there too to rule out our code/DB.
&lt;/p&gt;
&lt;p&gt;
Also ran &lt;tt&gt;drush stg.transitionnetwork.org dis -y notifications messaging piwik google_analytics mailchimp captcha mollom&lt;/tt&gt; to turn off stuff not needed/wanted in STG - did this on all copies on Puffin and Babylon, too.
&lt;/p&gt;
&lt;p&gt;
Clone on Babylon still running after 10 mins...
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Fri, 10 Jan 2014 23:32:45 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:48</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:48</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.1&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;9.35&lt;/em&gt; to &lt;em&gt;9.45&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Babylon took 15 mins to clone the site, but succeeded and the site works no probs. This rules out our database as the issue... Also the site upgrades went without hitch, will do some simple testing.
&lt;/p&gt;
&lt;p&gt;
So we're left with DB perms, ZFS, timeouts or something else as the cause of this... But it's definitely database server-related.
&lt;/p&gt;
&lt;p&gt;
I'll continue over the weekend.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Fri, 10 Jan 2014 23:34:20 GMT</pubDate>
      <title></title>
      <link>http://localhost:8080/trac/ticket/610#comment:49</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:49</guid>
      <description>
        &lt;p&gt;
FYI &lt;a class="ext-link" href="http://tn-test2.i-jk.co.uk/"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;http://tn-test2.i-jk.co.uk/&lt;/a&gt; is up to date with modules etc if anyone wants to test...
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Sat, 11 Jan 2014 19:04:16 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:50</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:50</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.5&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;9.45&lt;/em&gt; to &lt;em&gt;9.95&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
OK so let's try some timeout, size and other settings for php and mysql...
&lt;/p&gt;
&lt;p&gt;
In &lt;tt&gt;/opt/local/lib/php.ini&lt;/tt&gt; the following settings, marked with ;; in front of them, have been copied with new values around double or more what was there... The cleaned-up output lists the affected settings:
&lt;/p&gt;
&lt;pre class="wiki"&gt;cat /data/disk/tn/php.ini | grep ";;"
;;max_execution_time = 3600
;;max_input_time = 3600
;;post_max_size = 100M
;;upload_max_filesize = 100M
;;default_socket_timeout = 30
;;mysql.connect_timeout = 60
&lt;/pre&gt;&lt;p&gt;
As I said, the above have been doubled or in some cases increased more. The originals are in the file, and a copy at &lt;tt&gt;/opt/local/lib/php.ini.jk#610&lt;/tt&gt; is available.
&lt;/p&gt;
&lt;p&gt;
A quick check these values are being used by CLI PHP when logged in as the &lt;tt&gt;tn&lt;/tt&gt; Aegir user is to execute this:
&lt;/p&gt;
&lt;pre class="wiki"&gt;tn@puffin:~$ drush php-eval "echo phpinfo();"  | grep "max"
log_errors_max_len =&amp;gt; 1024 =&amp;gt; 1024
max_execution_time =&amp;gt; 0 =&amp;gt; 0
max_file_uploads =&amp;gt; 50 =&amp;gt; 50
max_input_nesting_level =&amp;gt; 64 =&amp;gt; 64
max_input_time =&amp;gt; -1 =&amp;gt; -1
max_input_vars =&amp;gt; 9999 =&amp;gt; 9999
post_max_size =&amp;gt; 300M =&amp;gt; 300M
upload_max_filesize =&amp;gt; 500M =&amp;gt; 500M
mysql.max_links =&amp;gt; Unlimited =&amp;gt; Unlimited
mysql.max_persistent =&amp;gt; Unlimited =&amp;gt; Unlimited
mysqli.max_links =&amp;gt; Unlimited =&amp;gt; Unlimited
mysqli.max_persistent =&amp;gt; Unlimited =&amp;gt; Unlimited
session.gc_maxlifetime =&amp;gt; 1440 =&amp;gt; 1440
&lt;/pre&gt;&lt;p&gt;
I note some values (max_execution_time, max_input_time etc) are set to 0 or -1, which means another file is being included that overrides these values. 0 or -1 generally means 'unlimited'.
&lt;/p&gt;
&lt;p&gt;
So the new values are in place for Drush... Now I'll take a look at what mysql timeout changes I can make with limited impact.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Sat, 11 Jan 2014 19:43:52 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:51</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:51</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.25&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;9.95&lt;/em&gt; to &lt;em&gt;10.2&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Re mysql: nothing useful in &lt;tt&gt;/etc/php5&lt;/tt&gt; or &lt;tt&gt;/data/conf&lt;/tt&gt; (overrides etc), so will assume for now it's only &lt;tt&gt;/etc/mysql&lt;/tt&gt; that matters, then also look in &lt;tt&gt;/var/aegir&lt;/tt&gt; where BOA keeps a few core things.
&lt;/p&gt;
&lt;pre class="wiki"&gt;puffin:/etc/mysql# grep time /etc/mysql/my.cnf
connect_timeout         = 60
#wait_timeout            = 3600
wait_timeout            = 120
long_query_time         = 5
innodb_lock_wait_timeout = 120
interactive-timeout
&lt;/pre&gt;&lt;p&gt;
Looks like &lt;tt&gt;wait_timeout&lt;/tt&gt; has been altered before... But according to &lt;a class="ext-link" href="https://github.com/omega8cc/nginx-for-drupal/search?q=wait_timeout&amp;amp;ref=cmdform"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;a search of the BOA source code&lt;/a&gt; the correct value is &lt;strong&gt;3600&lt;/strong&gt;, not 120 as presently set... So I've set this value back to the default -- could be important!
&lt;/p&gt;
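&lt;p&gt;
For reference, &lt;tt&gt;wait_timeout&lt;/tt&gt; is the number of seconds the server waits for activity on a non-interactive connection before closing it, so a connection left idle during a long dump or import could plausibly be dropped at 120. A minimal sketch for reading which value takes effect from my.cnf-style lines; the helper name &lt;tt&gt;effective_wait_timeout&lt;/tt&gt; is made up, and the sample input is the grep output above as it stood before the change:
&lt;/p&gt;
&lt;pre class="wiki"&gt;# Hypothetical helper: print the last uncommented wait_timeout value on stdin.
effective_wait_timeout() {
    awk -F= '/^[ \t]*wait_timeout[ \t]*=/ {
        gsub(/[ \t]/, "", $2)
        value = $2
    }
    END { print value }'
}

# Sample input taken from the grep output above (before the change).
printf '%s\n' 'connect_timeout         = 60' \
              '#wait_timeout            = 3600' \
              'wait_timeout            = 120' \
              'innodb_lock_wait_timeout = 120' | effective_wait_timeout

# On a live server the running value can be read with (needs MySQL credentials):
#   mysql -e "SHOW GLOBAL VARIABLES LIKE 'wait_timeout'"
&lt;/pre&gt;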
&lt;p&gt;
Load is low, so restarting MySQL. The PHP settings changes in previous comment are instantly effective on CLI mode, so no FPM restarts needed.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Sat, 11 Jan 2014 19:51:24 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:52</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:52</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.15&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;10.2&lt;/em&gt; to &lt;em&gt;10.35&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Old STG2 and STG3 testing sites directories and databases removed, now will try a clone of STG...
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Sat, 11 Jan 2014 20:11:19 GMT</pubDate>
      <title>priority changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:53</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:53</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;priority&lt;/strong&gt;
                changed from &lt;em&gt;critical&lt;/em&gt; to &lt;em&gt;minor&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;h2 id="ITWORKED"&gt;IT WORKED!!!!&lt;/h2&gt;
&lt;p&gt;
WOOP!
&lt;/p&gt;
&lt;p&gt;
So it's almost certainly that the mysql &lt;tt&gt;wait_timeout&lt;/tt&gt; value was far too low for big DB operations... A search of Trac shows &lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/ticket/555#comment:68"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;Chris' work to optimise MySQL&lt;/a&gt; was the cause.
&lt;/p&gt;
&lt;p&gt;
I'd guess we scraped through before under the 2min limit because the &lt;em&gt;throughput&lt;/em&gt; on the old disks was faster than ZFS, and the site DB was a good deal smaller than now. But wrangling a 1/4GB database needs a good timeout set, or very fast disks.
&lt;/p&gt;
&lt;p&gt;
So this is now a 'minor' ticket that needs a little cleanup -- revert my php changes from a couple of comments back, and raise two new tickets with recommendations I've come to during this ticket...
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>chris</dc:creator>

      <pubDate>Sat, 11 Jan 2014 21:10:03 GMT</pubDate>
      <title></title>
      <link>http://localhost:8080/trac/ticket/610#comment:54</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:54</guid>
      <description>
        &lt;pre class="wiki"&gt;On Sat 11-Jan-2014 at 08:11:20PM -0000, Transiton Technology Trac wrote:
&amp;gt;
&amp;gt;  So it's almost certainly the mysql {{{wait_timeout}}} value was far too
&amp;gt;  low for big DB operations... A search of Trac shows
&amp;gt;  [https://tech.transitionnetwork.org/trac/ticket/555#comment:68 Chris' work
&amp;gt;  to optimise MySQL] was the cause.
Ug, sorry :-(
&lt;/pre&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>jim</dc:creator>

      <pubDate>Sat, 11 Jan 2014 21:12:37 GMT</pubDate>
      <title>hours, status, totalhours changed; resolution set</title>
      <link>http://localhost:8080/trac/ticket/610#comment:55</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:55</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.15&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;status&lt;/strong&gt;
                changed from &lt;em&gt;new&lt;/em&gt; to &lt;em&gt;closed&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;resolution&lt;/strong&gt;
                set to &lt;em&gt;fixed&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;10.35&lt;/em&gt; to &lt;em&gt;10.5&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Chris said:
&lt;/p&gt;
&lt;blockquote class="citation"&gt;
&lt;p&gt;
Ug, sorry :-(
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
No worries mate, easily done... It was a needle in a haystack kinda issue though, glad it's fixed!
&lt;/p&gt;
&lt;p&gt;
PHP tweaks from 3 comments up reverted.
&lt;/p&gt;
&lt;p&gt;
Migration of STG2 to S008 platform worked. Now will continue over on &lt;a class="assigned ticket" href="http://localhost:8080/trac/ticket/582" title="maintenance: TN.org platform and sites (assigned)"&gt;#582&lt;/a&gt; to do updates and wrap this puppy up...
&lt;/p&gt;
&lt;p&gt;
New issues arising from this one:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/ticket/670"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;#670: Roll back customisations and use stock BOA settings where possible&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/ticket/671"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;#671: Replace core Search module with Apache Solr&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
Happy to close this!
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>ed</dc:creator>

      <pubDate>Mon, 13 Jan 2014 08:59:01 GMT</pubDate>
      <title></title>
      <link>http://localhost:8080/trac/ticket/610#comment:56</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:56</guid>
      <description>
        &lt;p&gt;
Excellent work Holmes.
&lt;/p&gt;
&lt;p&gt;
I'm noting also that Chris' changes were made in conversation with Jim so don't feel bad Chris - you did what you needed to do, then, and that became a thing now.
&lt;/p&gt;
&lt;p&gt;
So this was documented in a flurry at the time, and what I'm noting now is that that type of documentation wasn't enough to help us track issues later. Humanly, this is also due to the speed at which you were both working, the pressure you were under, and the different nature of the things you were sorting out, and the whole ticket not having one 'documentation process and owner'.
&lt;/p&gt;
&lt;p&gt;
THEREFORE - as per my suggestion on &lt;a class="closed ticket" href="http://localhost:8080/trac/ticket/670" title="maintenance: Roll back performance customisations and use stock BOA settings where ... (closed: fixed)"&gt;#670&lt;/a&gt; I'm happy for you two to do what you do to simplify the rig now - with the clear requirement:
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;make it clearly documented on a wiki page&lt;/strong&gt;
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>chris</dc:creator>

      <pubDate>Mon, 13 Jan 2014 10:56:05 GMT</pubDate>
      <title>hours, totalhours changed</title>
      <link>http://localhost:8080/trac/ticket/610#comment:57</link>
      <guid isPermaLink="false">http://localhost:8080/trac/ticket/610#comment:57</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;hours&lt;/strong&gt;
                changed from &lt;em&gt;0.0&lt;/em&gt; to &lt;em&gt;0.25&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;totalhours&lt;/strong&gt;
                changed from &lt;em&gt;10.5&lt;/em&gt; to &lt;em&gt;10.75&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Replying to &lt;a href="http://localhost:8080/trac/ticket/610#comment:56" title="Comment 56 for Ticket #610"&gt;ed&lt;/a&gt;:
&lt;/p&gt;
&lt;blockquote class="citation"&gt;
&lt;p&gt;
So this was documented in a flurry at the time, and what I'm noting now is that that type of documentation wasn't enough to help us track issues later.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
The key change to the MySQL config, which was done at the suggestion of mysqltuner, was documented well enough for it to be found via a search:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/ticket/555#comment:68"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://tech.transitionnetwork.org/trac/ticket/555#comment:68&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
That change is also referenced from the Puffin MySQL tuning ticket, see the list in the description at the top:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/ticket/587"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://tech.transitionnetwork.org/trac/ticket/587&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
However, it is true that the changes documented in the tickets hadn't been copied to the wiki page; this is where it should have been documented:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a class="ext-link" href="https://tech.transitionnetwork.org/trac/wiki/PuffinServer#mysqlconfigchanges"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://tech.transitionnetwork.org/trac/wiki/PuffinServer#mysqlconfigchanges&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
Very sorry about that.
&lt;/p&gt;
&lt;p&gt;
Reflecting on this, I feel this might be related to the pressures of trying to do things as quickly as possible, to keep costs down (see for example &lt;a class="closed ticket" href="http://localhost:8080/trac/ticket/629#comment:11" title="maintenance: Upgrade to BOA-2.1.3 Stable Edition (closed: wontfix)"&gt;ticket:629#comment:11&lt;/a&gt;) -- the documentation is often the thing that doesn't get done as comprehensively as it could be, because it is time consuming. It doesn't seem right that a job that, for example, takes perhaps 20 seconds to do might take 5 mins to document. I'll try to do more, better documentation (i.e. distil the documentation in the ticket comments into the wiki pages) in the future and resist the pressure to skip it to keep the time down.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item>
 </channel>
</rss>