Ticket #764 (new maintenance)
Policy decisions re-assessment on BOA and Drupal security updates
Reported by: | annesley | Owned by: | annesley |
---|---|---|---|
Priority: | major | Milestone: | Maintenance |
Component: | Unassigned | Keywords: | |
Cc: | chris, paul, ben, ed, sam | Estimated Number of Hours: | 0.0 |
Add Hours to Ticket: | 0 | Billable?: | yes |
Total Hours: | 5.39 |
Description
on-line meeting 5 / August @ 14:00 GMT:
we are phasing out the current D6 / BOA system. the new system may not use either. The TN.org website is not attractive to high level hackers or DOS attacks.
what are the risks of cancelling completely all further Unix, BOA and Drupal updates for vulnerabilities that do not allow direct un-mitigated access to the backend via bad PHP code / SQL?
Change History
comment:1 in reply to: ↑ description Changed 2 years ago by chris
comment:2 follow-up: ↓ 3 Changed 2 years ago by annesley
of course. but we are facing a cost / efficacy issue. no server is 100% secure, and we are already using Drupal which is extremely insecure in comparison to most frameworks.
what is the risk / efficacy / politics ratio in either case?
for example: £10,000 / year to decrease our risk of attack and embarrassment from 0.2% -> 0.1%
and, how deep is our embarrassment to our community of making this choice? no one will burn their houses down. but they might receive a bit of spam. we will not be arrested or fined for data privacy failure. we are a social change movement, not a bank.
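One way to put numbers on the example above is a rough break-even sketch. It uses only the illustrative figures from this comment (£10,000 / year, 0.2% -> 0.1%). Treating those percentages as a yearly probability of a successful attack is an assumption, and the function name is hypothetical:

```python
from fractions import Fraction


def break_even_breach_cost(annual_spend, risk_before, risk_after):
    """Breach cost at which the spend exactly pays for itself.

    Expected yearly saving = (risk_before - risk_after) * breach_cost,
    so the spend breaks even when breach_cost = annual_spend / reduction.
    """
    reduction = risk_before - risk_after
    if reduction <= 0:
        raise ValueError("risk_after must be lower than risk_before")
    return annual_spend / reduction


# Illustrative figures from the comment above, not measured values.
spend = Fraction(10_000)      # GBP per year
before = Fraction(2, 1000)    # 0.2%, assumed to be a yearly probability
after = Fraction(1, 1000)     # 0.1%

cost = break_even_breach_cost(spend, before, after)
print(f"Break-even breach cost: £{int(cost):,}")  # prints £10,000,000
```

On these example numbers the spend only pays for itself in pure expected-loss terms if a single breach would cost around £10 million. The calculation ignores reputation and the other non-financial factors raised in this ticket.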
comment:3 in reply to: ↑ 2 Changed 2 years ago by chris
Replying to annesley:
what is the risk / efficacy / politics ratio in either case?
No idea, this is one for Ed isn't it?
For the record: I think it's a daft idea.
comment:4 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 0.0 to 0.125
The problem seems to be not with updating Drupal core or Linux. Updating these should be quick and painless. It seems to me the problem is with Aegir?
I agree with Chris we need to apply security updates for the reasons given.
Drupal is secure. It's only insecure if you don't apply security updates.
comment:5 follow-up: ↓ 6 Changed 2 years ago by annesley
no IT system is "secure". some are more secure than others. all can be hacked with enough time and resources. certainly ours is more secure as a result of the work being done.
the question is not "are we secure?" but "does the website's financial / fun value to a hacker / script kiddie / robot exceed the amount of resources and cost it would take to hack our site?"
comment:6 in reply to: ↑ 5 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 0.125 to 0.25
Replying to annesley:
no IT system is "secure". some are more secure than others. all can be hacked with enough time and resources. certainly ours is more secure as a result of the work being done.
Understood :)
the question is not "are we secure?" but "does the website's financial / fun value to a hacker / script kiddie / robot exceed the amount of resources and cost it would take to hack our site?"
I don't think that is the question we need to ask. The question we should ask is why is updating Drupal / Linux so costly?
Bottom line: if / when our site is hacked *we* will want to be able to say to all of our users we did all we could to keep *your* data safe.
comment:7 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 0.25 to 0.375
Sorry, adding time.
comment:8 follow-ups: ↓ 9 ↓ 14 Changed 2 years ago by annesley
we are not doing everything that we can to keep their data safe. we could happily spend another £10,000, £20,000, £200,000 working on improving our security.
updating Drupal has always caused many inter-dependency issues with other Modules for me. maybe i have been unlucky. certainly other projects have used 60+ contrib Modules. generally many more than this one...
comment:9 in reply to: ↑ 8 ; follow-up: ↓ 10 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 0.375 to 0.5
Replying to annesley:
we are not doing everything that we can to keep their data safe. we could happily spend another £10,000, £20,000, £200,000 working on improving our security.
I know we could do that. I just mean that we should be seen to be doing everything that would reasonably be expected of us given our resources.
updating Drupal has always caused many inter-dependency issues with other Modules for me. maybe i have been unlucky. certainly other projects have used 60+ contrib Modules. generally many more than this one...
I very rarely come across inter-dependency issues. Never in Drupal 7, so far!
comment:10 in reply to: ↑ 9 Changed 2 years ago by chris
Replying to paul:
we should be seen to be doing everything that would reasonably be expected of us given our resources.
Totally agree, not only "seen to be doing" but actually doing, and this is what we are doing. For all sites apart from the WordPress sites (ticket:540) we have HTTPS only authentication, and we keep all applications and operating systems up to date. There is more that could be done: wiki:ParrotServer and wiki:PenguinServer don't have firewalls and I could spend an hour sorting that out, and I expect there are some other reasonable things that could be done. Generally I think we do what we think is reasonably necessary, and it would be unreasonable to expect less?
It's also worth noting that we have had security issues with the WordPress sites, see ticket:718 and ticket:749.
I suspect the biggest security issues are client side.
comment:11 Changed 2 years ago by chris
If we don't keep systems up to date we risk getting hit by things like this:
Malware dubbed Mayhem is spreading through Linux and FreeBSD web servers, researchers say. The software nasty uses a grab bag of plugins to cause mischief, and infects systems that are not up to date with security patches.
This, quote, from the article, sums things up well:
"In the *nix world, autoupdate technologies aren't widely used, especially in comparison with desktops and smartphones. The vast majority of web masters and system administrators have to update their software manually and test that their infrastructure works correctly," the trio wrote in a technical report for Virus Bulletin.
"For ordinary websites, serious maintenance is quite expensive and often webmasters don't have an opportunity to do it. This means it is easy for hackers to find vulnerable web servers and to use such servers in their botnets."
comment:12 Changed 2 years ago by chris
On Mon, 21 Jul 2014, in response to:
Maintenance: £1,000 per month: which we are trying to reduce as much as possible
https://www.transitionnetwork.org/blogs/ed-mitchell/2014-07/web-service-strategic-update-july-2014
I sent the following to the ttech list:
We could cron Debian updates and BOA updates and then just
intervene when things break, which wouldn't be very often,
apart from updates like the last php-fpm update on penguin:
That would reduce the amount of time I spend doing
updates, as long as the fixing things that get broken by
the automatic updates doesn't take longer than the manual
updates :-)
Further to the above, a cronjob for automatically updating Debian packages could also be set up to post a comment to ticket:692 when updates are applied, and the same kind of thing could be set up for BOA updates.
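As a rough illustration of the reporting half of such a cronjob, the sketch below parses the `Inst` lines printed by a simulated `apt-get -s upgrade` run and builds a plain-text comment body. The function names and the comment layout are hypothetical, and the step that actually posts the text to ticket:692 is left out because it depends on how Trac is set up to accept comments:

```python
import re
import subprocess

# Matches simulated-upgrade lines such as:
#   Inst openssl [1.0.1e-2+deb7u11] (1.0.1e-2+deb7u12 Debian-Security:7.0/stable [amd64])
# The bracketed old version is absent when a package is newly installed.
INST_LINE = re.compile(r"^Inst (\S+) (?:\[(\S+)\] )?\((\S+)")


def parse_simulated_upgrade(output):
    """Return (package, old_version, new_version) tuples from `apt-get -s upgrade` output."""
    upgrades = []
    for line in output.splitlines():
        match = INST_LINE.match(line)
        if match:
            upgrades.append(match.groups())
    return upgrades


def format_ticket_comment(hostname, upgrades):
    """Build the plain-text body of a ticket comment (layout is an assumption)."""
    if not upgrades:
        return f"{hostname}: no package updates pending."
    lines = [f"{hostname}: {len(upgrades)} package update(s):"]
    for name, old, new in upgrades:
        lines.append(f" * {name}: {old or 'not installed'} -> {new}")
    return "\n".join(lines)


def pending_upgrades():
    """Run a dry-run upgrade; only works on a Debian-style system with apt-get."""
    result = subprocess.run(
        ["apt-get", "-s", "upgrade"],
        capture_output=True, text=True, check=True,
    )
    return parse_simulated_upgrade(result.stdout)
```

A cron entry could call `format_ticket_comment("penguin", pending_upgrades())` and hand the result to whatever mechanism posts to the ticket. The parsing is kept separate from the `apt-get` call so it can be checked against sample output on any machine.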
Is the meeting proposed on this ticket, "5 / August @ 14:00 GMT", going to be a Skype, Etherpad or IRC meeting?
comment:13 Changed 2 years ago by paul
Etherpad / IRC +1
comment:14 in reply to: ↑ 8 Changed 2 years ago by chris
Replying to annesley:
we are not doing everything that we can to keep their data safe. we could happily spend another £10,000, £20,000, £200,000 working on improving our security.
Even quite modest proposals, like spending £30+VAT to get SSL certs for the WordPress sites, haven't been taken up in order to save money, see ticket:540#comment:3 -- nobody is proposing spending anything like another £10,000 are they?
comment:15 Changed 2 years ago by chris
Concerns about the time taken for BOA updates have been raised in the past, see ticket:629#comment:11 and following that I have been recording the times here wiki:PuffinServer#Upgradetickets.
Since this issue appears to be one that is going to come up again and again, perhaps it would help if the Transition Network commissioned someone or some organisation to undertake an independent audit of all the maintenance work and the time it takes? Everything has been done openly and all the work has been recorded on this site, so all the data on which to base an audit is publicly available.
Regarding the meeting to discuss this matter, it is during the day during the holidays -- usually a doodle is set up to arrange meeting times and dates which work for all, would it be appropriate to do that for this meeting?
comment:16 follow-up: ↓ 18 Changed 2 years ago by annesley
what do people think about each separate issue raised in ticket:758?
1) DOS vulnerability
2) Access bypass in FileField Module. required the attacker to be able to use FileField content somehow.
3) XSS in Form API mitigated by requirement of "administer taxonomy" permission
4) Drupal general AJAX XSS vulnerability. complex circumstances required
5) XSS in general
comment:17 Changed 2 years ago by annesley
from my general knowledge on hacking i would like to add that we are not only vulnerable to someone stealing email addresses. please correct if i am wrong.
hackers often also use a weak server to piggy back their attack in to another server thus disguising their location. often they will hack a server in Brazil which has different laws.
hackers also use weak servers to do many other things. once a hacker has gained access they will install a "stealth pack" which prevents any detection of their presence from then onwards.
comment:18 in reply to: ↑ 16 ; follow-up: ↓ 20 Changed 2 years ago by chris
Replying to annesley:
what do people think about each separate issue raised in ticket:758?
I think that we could easily spend more time discussing each issue than it would take to upgrade Drupal to the latest version and that this would be a waste of time -- better spend our limited time keeping the site up to date so that when there is an issue that does require an urgent upgrade we don't have a massive backlog of updates to also deal with.
comment:19 Changed 2 years ago by ed
Hi all, sorry this went off when I had to be elsewhere - and now I'm on holidays - it's all good discussion and I asked Annesley to review everything which is what we're up to.
My understanding of most of the 'maintenance' work is that it has been related to BOA; the misfit with the servers, our learning curve with it, and the ongoing Aegir faffing. I don't think it needs a separate review. We know we're not going to self-host BOA again.
And meeting time - Currently 14:00 BST, 5/8/14: I'm happy to be there whenever, if it needs to go back to doodle to find a time that's fine by me.
comment:20 in reply to: ↑ 18 ; follow-ups: ↓ 22 ↓ 40 ↓ 41 Changed 2 years ago by annesley
Replying to chris:
Replying to annesley:
what do people think about each separate issue raised in ticket:758?
I think that we could easily spend more time discussing each issue than it would take to upgrade Drupal to the latest version and that this would be a waste of time -- better spend our limited time keeping the site up to date so that when there is an issue that does require an urgent upgrade we don't have a massive backlog of updates to also deal with.
Drupal updates are not necessarily like other updates. all of the updates in ticket 758 would have been simple direct code replacements. thus you will not need to apply them in sequence if you miss some out. you can simply download the latest and ignore the missed ones. this is only not true if there is a database schema change but that only usually happens when we upgrade to a new Major Drupal version. all the updates in 758 are like this.
if we have a policy on these types of updates then it will be extremely quick to make decisions on what is and isn't applied.
i am used to applying a full testing cycle before site upgrades in order to ensure that there is no accidental loss of service or errors from the inter-connectedness of updates. this is the same principle as applying the Unix server updates manually to make sure that nothing has been broken. i would like to see as few updates as possible and much more testing on each update.
i expect and hope that we will be moving towards a much larger, complex multi-faceted and popular web offering over the next 5 years and so i would like to start a more controlled attitude to updates to the live system. any *Drupal* updates that are not needed immediately by us can go in to dev and move up to live after a full testing run. and, since we are de-commissioning live, we have no need to apply unnecessary Drupal updates to it currently i would suggest.
thanks for the insight to the manual application of Unix updates. i understand the reasoning for this and understand the points about the importance of Unix updates. let's explore the possibility of automatic updates more though. how about a cron once a week on Tuesday morning on the staging server, and on Wednesday morning on live. this would give us a day to check for failures on staging. i could do a 10 minute run around the staging site each Tuesday morning until we have a rig setup.
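a minimal sketch of that weekly rhythm, assuming a single script run daily from cron that decides which server (if any) to update. the `staging_ok` flag is a hypothetical stand-in for the manual Tuesday check, and the "staging" / "live" labels are placeholders rather than real hostnames:

```python
from datetime import date

TUESDAY, WEDNESDAY = 1, 2  # date.weekday() counts Monday as 0


def update_target(today, staging_ok):
    """Return which server should receive updates on `today`, or None.

    Staging is updated every Tuesday. Live is updated on Wednesday only
    if the staging check was signed off (staging_ok is an assumed flag).
    """
    if today.weekday() == TUESDAY:
        return "staging"
    if today.weekday() == WEDNESDAY and staging_ok:
        return "live"
    return None


print(update_target(date(2014, 8, 5), staging_ok=False))  # a Tuesday: staging
print(update_target(date(2014, 8, 6), staging_ok=True))   # a Wednesday: live
```

if the Tuesday check fails, Wednesday's run does nothing and live simply waits for the following week's cycle.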
WDYT about these points?
comment:22 in reply to: ↑ 20 ; follow-ups: ↓ 23 ↓ 24 Changed 2 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 0.5 to 0.75
Replying to annesley:
how about a cron once a week on Tuesday morning on the staging server, and on Wednesday morning on live.
We don't have any staging servers, we have 3 production servers. Having staging servers would probably cost more than we might save by automating updates. We did have a development server which was used for testing updates prior to the switch to BOA.
In terms of how to do the updates, we could modify the current script, wiki:AptitudeUpdateScript, so it sends emails to Trac and run that via cron, or use the Unattended Upgrades package.
comment:23 in reply to: ↑ 22 Changed 2 years ago by chris
Replying to chris:
We don't have any staging servers, we have 3 production servers. Having staging servers would probably cost more than we might save by automating updates. We did have a development server which was used for testing updates prior to the switch to BOA.
The live servers are:
Prior to having these servers we had a live and dev server:
These are all linked from the front page of this wiki.
comment:24 in reply to: ↑ 22 ; follow-up: ↓ 25 Changed 2 years ago by sam
In terms of how to do the updates, we could modify the current script, wiki:AptitudeUpdateScript, so it sends emails to Trac and run that via cron, or use the Unattended Upgrades package.
I wonder if https://packages.debian.org/wheezy/apt-listbugs has a role to play if moving to a more automated upgrade process?
comment:25 in reply to: ↑ 24 ; follow-up: ↓ 28 Changed 2 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 0.75 to 1.0
Replying to sam:
I wonder if https://packages.debian.org/wheezy/apt-listbugs has a role to play if moving to a more automated upgrade process?
It sounds like it is aimed at people running Debian unstable:
apt-listbugs is a tool which retrieves bug reports from the Debian Bug Tracking System and lists them. Especially, it is intended to be invoked before each upgrade/installation by apt in order to check whether the upgrade/installation is safe.
Many developers and users prefer the unstable version of Debian for its new features and packages. apt, the usual upgrade tool, can break your system by installing a buggy package.
apt-listbugs lists critical bug reports from the Debian Bug Tracking System. Run it before apt to see if an upgrade or installation is known to be unsafe.
I think there is a danger of losing sight of the aim of this discussion -- how to save money?
If we start to consider each update in detail to decide if it is relevant to us or not, and if we then test updates on development servers before applying them on live servers, I expect that we would find that it takes more time and resources than simply applying all updates to everything when they become available without much detailed consideration.
Replying to annesley:
i am used to applying a full testing cycle before site upgrades in order to ensure that there is no accidental loss of service or errors from the inter-connectedness of updates. this is the same principle as applying the Unix server updates manually to make sure that nothing has been broken. i would like to see as few updates as possible and much more testing on each update.
i expect and hope that we will be moving towards a much larger, complex multi-faceted and popular web offering over the next 5 years and so i would like to start a more controlled attitude to updates to the live system.
How big were these websites you were working on and what was the budget of the organisations -- it sounds to me like you were working on more expensive projects in the past?
This is the context of this discussion:
- Maintenance: £1,000 per month: which we are trying to reduce as much as possible
https://www.transitionnetwork.org/blogs/ed-mitchell/2014-07/web-service-strategic-update-july-2014
TN.org
My recommendation is something along the lines of ..
Development server (running web server,git, ..) for creating staging sites (pulling/pushing to various git development branches), and a production server (pulling from production branches)
Drop Aegir
Don't switch to cloud hosted solution like Pantheon.
Apply patches from the drupal security team - without looking into the details - on a stage site. If nothing is obviously broken then push these changes to production.
comment:27 Changed 2 years ago by annesley
hi chris. sorry that you will have answered these questions already: what are the specifications of your setup?:
guaranteed bandwidth
guaranteed dedicated CPU
uptime % over the last year
24/7 monitoring
politics (2nd hand equipment, support of other alternative social systems, energy usage, energy source, data protection, etc.)
and what are your thoughts on this offering of managed servers at £126 / month: http://www.hetzner.de/en/hosting/produktmatrix/managed-server-produktmatrix
also, i would be interested to hear your thoughts on the politics here. these are new servers in an air-conditioned data centre in Germany powered by hydro. it's clearly a mixed bag in my view: http://www.hetzner.de/en/hosting/unternehmen/umweltschutz
comment:28 in reply to: ↑ 25 ; follow-up: ↓ 29 Changed 2 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 1.0 to 1.25
Replying to paul:
TN.org
My recommendation is something along the lines of ..
Development server (running web server,git, ..) for creating staging sites (pulling/pushing to various git development branches), and a production server (pulling from production branches)
Drop Aegir
Don't switch to cloud hosted solution like Pantheon.
Apply patches from the drupal security team - without looking into the details - on a stage site. If nothing is obviously broken then push these changes to production.
The comment above was written by Paul, not me, but rather than it being posted as a separate comment it was added to a comment I posted, the original from me is at ticket:764?cversion=0&cnum_hist=25#comment:25
I think if people start editing other people's comments and we have to look at diffs to work out who said what it'll get really confusing -- can we try not to edit other people's comments?
Replying to annesley:
hi chris. sorry that you will have answered these questions already: what are the specifications of your setup?:
guaranteed bandwidth
guaranteed dedicated CPU
uptime % over the last year
24/7 monitoring
politics (2nd hand equipment, support of other alternative social systems, energy usage, energy source, data protection, etc.)
and what are your thoughts on this offering of managed servers at £126 / month: http://www.hetzner.de/en/hosting/produktmatrix/managed-server-produktmatrix
also, i would be interested to hear your thoughts on the politics here. these are new servers in an air-conditioned data centre in Germany powered by hydro. it's clearly a mixed bag in my view: http://www.hetzner.de/en/hosting/unternehmen/umweltschutz
Lots of questions, that need more than a quick answer, I'll try to answer them over the next few days, it would also be great if you could answer my questions, eg:
Replying to chris:
Replying to annesley:
i am used to applying a full testing cycle before site upgrades in order to ensure that there is no accidental loss of service or errors from the inter-connectedness of updates. this is the same principle as applying the Unix server updates manually to make sure that nothing has been broken. i would like to see as few updates as possible and much more testing on each update.
i expect and hope that we will be moving towards a much larger, complex multi-faceted and popular web offering over the next 5 years and so i would like to start a more controlled attitude to updates to the live system.
How big were these websites you were working on and what was the budget of the organisations -- it sounds to me like you were working on more expensive projects in the past?
comment:29 in reply to: ↑ 28 Changed 2 years ago by chris
Replying to paul:
TN.org
My recommendation is something along the lines of ..
Development server (running web server,git, ..) for creating staging sites (pulling/pushing to various git development branches), and a production server (pulling from production branches)
Drop Aegir
Don't switch to cloud hosted solution like Pantheon.
Apply patches from the drupal security team - without looking into the details - on a stage site. If nothing is obviously broken then push these changes to production.
The above makes perfect sense to me, it is basically how we did things before we switched to BOA.
comment:30 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 1.25 to 1.375
It sounds like we should turn the clock back.
@Team
Probably the quickest way forward would be to ask Chris to put forward a plan of action and then have a show of hands? My feeling is that on most things (probably all things) server related Chris is consistently on the money - so should lead on all server related matters.
comment:31 follow-up: ↓ 34 Changed 2 years ago by annesley
@chris: sorry! didn't mean to miss your questions. i have worked with a huge variety of topologies actually, from companies with sysadmin departments all the way to one outsourced managed dedicated server and even some friends' sites and my own sites on shared hosting packages.
here is my LinkedIn profile for all the information you need: https://www.linkedin.com/pub/annesley-newholm/6/19/320
comment:32 follow-ups: ↓ 33 ↓ 35 Changed 2 years ago by annesley
BOA: i am out-of-my-depth here with sysadmin so cannot make useful comment on the BOA / non-BOA options.
Aegir: as far as i understand Aegir is excellent for spinning up and managing new Drupal sites. i certainly enjoyed its management interface and found things quite easy to use. spun up several sites, updated things, monitored things. however, i am pretty sure we won't be spinning up multiple *copies* of the same codebase. thus Aegir is no longer relevant to us. if i have understood it correctly then i am currently happy to drop Aegir.
Pantheon: i have never used this or similar. the only thing i have used is the topology suggested by chris above: a server, managed by experienced sysadmins, and the programmers do all the Drupal and database stuff. sometimes the sysadmins promote the code when asked, sometimes the programmers do it directly or through GIT. it's always just been that simple. but maybe things have moved on? by what you are saying they have but it has not worked well... https://www.drupal.org/node/1585604
@paul: chris's answer needs to include the rationale for using / not using hetzner.de.
comment:33 in reply to: ↑ 32 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.15
- Total Hours changed from 1.375 to 1.525
Replying to annesley:
BOA: i am out-of-my-depth here with sysadmin so cannot make useful comment on the BOA / non-BOA options.
That is pretty much my position.
Aegir: as far as i understand Aegir is excellent for spinning up and managing new Drupal sites. i certainly enjoyed its management interface and found things quite easy to use. spun up several sites, updated things, monitored things. however, i am pretty sure we won't be spinning up multiple *copies* of the same codebase. thus Aegir is no longer relevant to us. if i have understood it correctly then i am currently happy to drop Aegir.
The interface is great.
The main problems are that it is slow for developers to use and difficult for system administrators to administer, i.e. costly.
Pantheon: i have never used this or similar. the only thing i have used is the topology suggested by chris above: a server, managed by experienced sysadmins, and the programmers do all the Drupal and database stuff. sometimes the sysadmins promote the code when asked, sometimes the programmers do it directly or through GIT. it's always just been that simple.
Sounds good.
but maybe things have moved on? by what you are saying they have but it has not worked well... https://www.drupal.org/node/1585604
I have had a few annoying memory problems on Pantheon servers during development recently: out of memory notices, white screens, features not reverting, ... It too feels like a slow process costing my clients money.
@paul: chris's answer needs to include the rationale for using / not using hetzner.de.
Very interesting. We should definitely use server resources that are as environmentally friendly as possible. Do we have a mandate to use green electricity for our servers? Actually I thought we were already running the transition servers on green servers?
comment:34 in reply to: ↑ 31 Changed 2 years ago by chris
- Add Hours to Ticket changed from 0.0 to 1.4
- Total Hours changed from 1.525 to 2.925
Replying to annesley:
what are the specifications of your setup?:
guaranteed bandwidth
The three virtual servers come with the following bandwidth allowances:
- wiki:ParrotServer - 60GB
- wiki:PenguinServer - 120GB
- wiki:PuffinServer - 240GB
The actual bandwidth used is documented on wiki:ServerBandwidth and this has just been updated with the latest stats. Webarchitects has a monthly bandwidth allowance from the data centre but this has a soft limit, usage can burst over the limit up to the physical bandwidth of the datacentre.
guaranteed dedicated CPU
The servers have access to the following number of CPUs, this is documented on each page for each server on this wiki:
- wiki:ParrotServer - 4 CPUs
- wiki:PenguinServer - 8 CPUs
- wiki:PuffinServer - 14 CPUs
The physical server has a pair of 16 core AMD Opteron(tm) Processor 6128, 1885 MHz, 512 KB cache.
The CPU usage is monitored using Munin:
- wiki:ParrotServer - https://penguin.transitionnetwork.org/munin/transitionnetwork.org/parrot.transitionnetwork.org/cpu.html
- wiki:PenguinServer - https://penguin.transitionnetwork.org/munin/transitionnetwork.org/penguin.transitionnetwork.org/cpu.html
- wiki:PuffinServer - https://penguin.transitionnetwork.org/munin/transitionnetwork.org/puffin.transitionnetwork.org/cpu.html
uptime % over the last year
We haven't ever had a power failure at the data centre and we have had servers hosted there for over a decade. The last time there was a problem with uptime was last summer when the server motherboard was causing restarts, this was resolved by replacing the motherboard. I'm not aware of any downtime since then, the server the virtual machines are on has been up 190 days.
24/7 monitoring
We have a Nagios server monitoring the servers and it sends alerts direct to my phone when there is an issue. We have 24 hour access to the data centre and it's in easy cycling distance.
politics (2nd hand equipment, support of other alternative social systems, energy usage, energy source, data protection, etc.)
We are a multi-stakeholder co-operative (clients can join, this includes the Transition Network if it wishes to) with the aim:
"To enable the provision of internet based services for socially responsible groups and individuals, using free open source software wherever possible, in a manner that aims to minimise fossil fuel usage and ecological impacts and which also provides sustainable employment"
You can access our full rules from our rules page. We adhere to the Seven International Co-operative Principles and we have applied to be associate members of Radical Routes; at our last AGM we agreed the Radical Routes Aims and Principles.
We only use Free-Libre and Open Source Software (FLOSS) and at our AGM in 2012 agreed:
"All software artefacts that the co-operative produces to be licensed under a FSF approved license."
We endeavour to reach decisions via consensus and are committed to all aspects of equality. We provide some free and reduced rate services to some campaigns (this includes the Transition Network, for whom we work for less than our usual rates).
Our Sheffield datacentre is powered by Good Energy and one of our clients and members, Sheffield Renewables, is looking at installing PV on the roof of our data centre.
The server running the virtual machines was provided by Very PC who are based near us and who have "set ourselves the task of creating the most sustainable PC on the planet". The filesystem for the servers is running from a DNUK server we inherited from our merger with Ecohost Co-operative, this runs using BSD and ZFS:
ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs.
The use of ZFS allows the most frequently accessed data to be served direct from RAM and the next most frequently accessed data to be served from a pair of 960GB SSD disks, other data comes from the 14 1TB SATA disks in the server. This server is backed up to another former Ecohost server which also runs BSD and ZFS and has 4x4TB disks.
We do our best to balance the reuse and re-purposing of hardware with the need for fast servers.
All the servers have encrypted disks.
and what are your thoughts on this offering of managed servers at £126 / month: http://www.hetzner.de/en/hosting/produktmatrix/managed-server-produktmatrix
also, i would be interested to hear your thoughts on the politics here. these are new servers in an air-conditioned data centre in Germany powered by hydro. it's clearly a mixed bag in my view: http://www.hetzner.de/en/hosting/unternehmen/umweltschutz
I do sysadmin on some Hetzner servers; the console access isn't very Linux friendly as it uses a Java applet rather than SSH. The Transition Network doesn't need managed servers as it employs people to manage its servers.
Replying to annesley:
here is my LinkedIn profile for all the information you need: https://www.linkedin.com/pub/annesley-newholm/6/19/320
The question was "how big were these websites you were working on and what was the budget of the organisations" when you were "applying a full testing cycle before site upgrades in order to ensure that there is no accidental loss of service or errors from the inter-connectedness of updates". I don't have LinkedIn account details to hand so I can't access your CV to see if this information is there.
comment:35 in reply to: ↑ 32 Changed 2 years ago by chris
Replying to annesley:
@paul: chris's answer needs to include the rationale for using / not using hetzner.de.
I'm not sure I fully understand what you need me to do, where the Transition Network servers are hosted isn't a decision that is down to me, this is best raised with Ed?
comment:36 follow-up: ↓ 39 Changed 2 years ago by annesley
@chris: it's a public LinkedIn profile so you shouldn't need to log in. did i not answer your question satisfactorily? i'll try again: i have worked with all sorts of sizes of websites, all sorts of budgets.
comment:37 Changed 2 years ago by sam
As an outsider reading this thread it feels a bit confrontational.
Just thought I'd mention it in case no one else had noticed.
comment:38 Changed 2 years ago by annesley
good point Sam. sorry, my last post could read as sarcastic. it wasn't intended, i was genuinely not sure what other information to give.
comment:39 in reply to: ↑ 36 Changed 2 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.15
- Total Hours changed from 2.925 to 3.075
Replying to annesley:
i have worked with all sorts of sizes of websites, all sorts of budgets.
When you have worked on sites which were "applying a full testing cycle before site upgrades in order to ensure that there is no accidental loss of service or errors from the inter-connectedness of updates", I assume there were live and staging servers for testing, and the question was: did they have a bigger budget than the Transition Network does? I have a feeling they might have, but if there is a way to use processes like this and spend less money than we currently do then that sounds great.
comment:40 in reply to: ↑ 20 Changed 2 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.35
- Total Hours changed from 3.075 to 3.425
Replying to annesley:
Drupal updates are not necessarily like other updates. all of the updates in ticket 758 would have been simple direct code replacements. thus you will not need to apply them in sequence if you miss some out. you can simply download the latest and ignore the missed ones. this is only not true if there is a database schema change but that only usually happens when we upgrade to a new Major Drupal version. all the updates in 758 are like this.
if we have a policy on these types of updates then it will be extremely quick to make decisions on what is and isn't applied
I think it is, essentially, the same with Debian updates, most of them don't have a direct impact on the services we are running, however it would quickly become more complicated and probably more costly to only selectively apply updates. This is because straight away all the tools we use to alert us to there being outstanding updates would no longer be usable, we would no longer be able to do things like apt-get update ; apt-get upgrade. The more we "fork" our setup from the ones that other people are maintaining the more complicated our setup becomes.
In order to save money, we moved away from having a development server on which system and application updates were tested before the same updates were done on the live server. The reasoning, if I remember correctly, being that having small amounts of downtime caused by things going wrong on the live server was not important enough to merit spending money to reduce the chance of these kinds of problems happening. This has mostly worked out OK, but there have been times where things have gone wrong, like the last major MediaWiki upgrade.
I think that further cost saving could be made by automating Debian and BOA updates, but this would have to be done on the understanding that while it would mostly work without a problem, where there is a problem it would probably result in more downtime than there would have been when problems arose during manual updates. If we can live without staging/development servers for testing updates then I think we could live with the occasional downtime caused by automatic updates going wrong?
I don't believe that the Transition Network is so poor that it needs to consider stopping maintaining servers in order to save money (but perhaps I'm wrong?). I think we need systems that are quick and easy to update and maintain and that we have, in the past, made things so complicated that updates are not quick and easy and that we need to learn from this.
comment:41 in reply to: ↑ 20 Changed 2 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.1
- Total Hours changed from 3.425 to 3.525
Replying to annesley:
since we are de-commissioning live we have no need to apply un-necessary Drupal updates to it currently i would suggest.
If the new site was ready to be made live in a month or so then I think it would be OK to consider not updating the current live Drupal site, however we are not in that situation, we have no date for the new site to go live (has a single line of code for the new site been written yet?) so I think it is clearly premature to consider stopping updating the current live Drupal site.
comment:42 Changed 2 years ago by annesley
@chris: very impeccable credentials for WebArchitects! i have been to some Radical Roots meetings myself and am very very much in support of the non-hierarchical co-operative organisational form. of course. great attitude also to the environmental side of things.
comment:43 Changed 2 years ago by chris
For what it's worth this is what I'd suggest we consider doing:
- Continue to update and maintain the existing Drupal 6 site at https://www.transitionnetwork.org/ until it is replaced, including Drupal core, Drupal modules and BOA.
- For the replacement site(s) we consider an alternative hosting solution to BOA, but continue to use BOA for the existing site at https://www.transitionnetwork.org/ until the Drupal 6 site is decommissioned.
- Continue to update and maintain other applications and servers, including MediaWiki, TransitionTrac, PiwikServer and WordPress sites, as we have been doing.
comment:44 Changed 2 years ago by chris
For reference, before we switched to BoaCodeManagement, this was the process for updating the https://www.transitionnetwork.org/ Drupal site and testing it on the DevelopmentServer before deploying it on the NewLiveServer:
- CodeManagementReleaseProcess (git)
- CodeManagementReleaseProcessOld (subversion)
comment:45 Changed 2 years ago by annesley
what are the benefits to Transition of using source code control in comparison to regular backups / and pre-change backups when we have only one developer?
comment:46 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 3.525 to 3.65
I think we have a few developers on the team now (you, Ben, me ..). Keeping all the code under version control, so that anyone on the team can see who did what and why - is good for TN. This time next year, the team could be completely different.
comment:47 follow-ups: ↓ 48 ↓ 49 ↓ 51 Changed 2 years ago by annesley
i have no problem with git by the way. i understand how it works and can use it reasonably well.
it has been suggested that one person be responsible for updates to live.
i will suggest at the forthcoming meeting also that i take over all Drupal patching and updates. for 3 reasons: i am cheaper, i can assess the cost-benefit of each change directly, and it's good to have all changes going through one person to overview things.
although i am questioning things, i generally agree that we should keep it.
comment:48 in reply to: ↑ 47 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 3.65 to 3.775
Replying to annesley:
i have no problem with git by the way. i understand how it works and can use it reasonably well.
it has been suggested that one person be responsible for updates to live.
i will suggest at the forthcoming meeting also that i take over all Drupal patching and updates. for 3 reasons: i am cheaper, i can assess the cost-benefit of each change directly, and it's good to have all changes going through one person to overview things.
I guess that's Ed's call. I probably won't vote in favour of that :D
although i am questioning things, i generally agree that we should keep it.
comment:49 in reply to: ↑ 47 Changed 2 years ago by chris
comment:50 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 3.775 to 3.9
I can probably also do cheaper. It's all out of my hands. If the work is to be taken off me; I will just accept it, no problem.
comment:51 in reply to: ↑ 47 Changed 2 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 3.9 to 4.15
Replying to annesley:
i can assess the cost-benefit of each change directly
I'm not sure if I understand or agree with what you are proposing.
What we have been doing till now, and what I think we should continue to do, is basically the same as we do for all updates to applications and operating systems:
- When there are updates available the first thing to do is quickly assess the security implications for the Transition Network.
- If there is a clear and immediate security risk then update ASAP.
- If there is not a clear and immediate security risk then update at a suitable time.
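As a rough illustration of the three steps above, here is a minimal Python sketch. The class and field names are hypothetical and are not part of any Transition Network tooling; the only point it makes is that the quick assessment decides when an update is applied, not whether it is applied.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    UPDATE_ASAP = "update as soon as possible"
    UPDATE_WHEN_SUITABLE = "update at a suitable time"


@dataclass
class PendingUpdate:
    # Hypothetical fields, chosen for this illustration only.
    name: str
    clear_and_immediate_risk: bool  # outcome of the quick security assessment


def triage(update: PendingUpdate) -> Action:
    """Every update is applied; the assessment only decides how soon."""
    if update.clear_and_immediate_risk:
        return Action.UPDATE_ASAP
    return Action.UPDATE_WHEN_SUITABLE


if __name__ == "__main__":
    examples = (
        PendingUpdate("example XSS fix", True),
        PendingUpdate("example minor bugfix", False),
    )
    for update in examples:
        print(f"{update.name}: {triage(update).value}")
```

There is deliberately no "skip" outcome in this sketch, which is the difference from a policy of selectively applying updates.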
I have the impression that you think some updates shouldn't be applied. It's hard discussing this in general, without specific examples, but I think this is the wrong approach as it would result, in time, in our version of Drupal 6 diverging from the version maintained at Drupal.org, and I think this would have the potential to result in more work, not less.
i will suggest at the forthcoming meeting also that i take over all Drupal patching and updates
Why do you want to take work off other people in the team? Don't you have enough to do developing the next version of the site?
Is the meeting still happening at 2pm tomorrow?
comment:52 Changed 2 years ago by ed
sorry i'm behind on this ticket - will catch up later this morning hopefully - i saw an email saying chris you couldn't make it - and paul suggesting evenings this week... (?)
comment:53 Changed 2 years ago by paul
Hi Ed,
I think that's right. Maybe we could have another doodle.
comment:54 Changed 2 years ago by ed
OK - I've spoken with Chris on the phone - we've got a small window to meet this week on Thursday 7th August: 19:00 BST (20:00 CEST) - after that Chris is away on a remote campsite with family so only available for utterly urgent meltdown situations...
Can we do Thursday 7th August 19:00 BST (20:00 CEST)?
Or do we wait until September?
comment:55 Changed 2 years ago by annesley
eek. i'm busy that night already. sorry. i could do 19:00 -> 19:45 BST but would need to be quite strict about it.
comment:56 follow-up: ↓ 60 Changed 2 years ago by ed
Can we meet on Thursday 7th August 19:00 BST, 20:00 CEST for 45 minutes maximum - so that we make the most of a small window of opportunity.
There is a lot in this ticket, and it has become very intertwined. I salute you all for your diplomacy as you have circled around a lot of big topics and there is a lot of change going on. Also, TRAC is not the place to have multiply threaded conversations - it's geared best towards maintenance communications - so it's hard to handle some of these topics on this medium (imo).
Suggested agenda: Thursday meeting: identify and agree the big topics that have come out of this ticket and how and when we will proceed with those topics - i.e. we don’t try to ram in a million detailed interconnected things based on different approaches and goals and outlooks - we strip it back to the big topics and agree next steps.
I have seen these topics in this ticket and propose these as a starter for ten:
- CRON vs Manual - for BOA and Debian
- Drupal updates: all or some only?
- Publishing process: GIT/Aegir/file copy/individual roles
- Lead developer handover
- Future: strategy: developer network, access to TN.org, code vs code less drupal
- Web host
- BOA / hosting software environment
We agreed at our f2f that we would break out the topics into wiki pages - some of which are already up here:
https://wiki.transitionnetwork.org/Tnv3
comment:57 follow-ups: ↓ 58 ↓ 61 Changed 2 years ago by annesley
hi, sorry i created confusion with my last post:
i absolutely agree that all Debian / BOA updates should be applied. manually or automatically. decisions on usage of BOA are not for me to decide, i don't have the experience.
i was suggesting that one person promote all Drupal changes to live. updates, upgrades and code promotions. and that, for stability and limit of access to live, we try to do these once a month maximum after a good monthly testing run of 1 hour prior to code promotion from staging. multiple people may, of course, add content and panels / views pages as per requirements through the Drupal web interface.
i am happy to use GIT. i believe that Drupal should be codeless. if many or even one programmer is making lots of custom code then i think Drupal has been mis-understood and it will lead to a very dangerous maintenance situation. so i don't ever foresee many developers creating lots of custom code. they might create lots of panels / views and module installations through the web interface.
the recent 2 XSS security risks Drupal Module updates i would have applied urgently because XSS is a serious security risk, specifically because TN is a charity which could feasibly accept donations.
@chris: i do not see why becoming out-of-line with Drupal.org is a problem. could you elaborate scenarios where this would cause issues for us? bear in mind that Drupal updates are not cumulative. that is, if your module is 3 updates behind, it does not create issues simply to install the latest over the top. same major version.
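To make the "not cumulative" point concrete, here is a minimal Python sketch of jumping straight to the newest release within the same major version. It assumes contributed module versions look like "6.x-2.3" (core compatibility, then module major and patch); the function names and the release list are hypothetical, and dev / beta / rc suffixes are deliberately not handled. It says nothing about whether a given release includes a database schema change.

```python
import re
from typing import List, Optional, Tuple

# Assumed version format: "<core>.x-<major>.<patch>", e.g. "6.x-2.3".
_VERSION = re.compile(r"^(\d+)\.x-(\d+)\.(\d+)$")


def parse(version: str) -> Tuple[int, int, int]:
    """Split a version string into (core, major, patch) integers."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    core, major, patch = (int(part) for part in match.groups())
    return core, major, patch


def latest_same_major(installed: str, available: List[str]) -> Optional[str]:
    """Return the newest release sharing core and major with `installed`.

    Returns None when nothing newer exists in that series.
    """
    core, major, patch = parse(installed)
    candidates = [
        version
        for version in available
        if parse(version)[:2] == (core, major) and parse(version)[2] > patch
    ]
    if not candidates:
        return None
    return max(candidates, key=parse)


if __name__ == "__main__":
    # Hypothetical release list for illustration only.
    releases = ["6.x-2.2", "6.x-2.3", "6.x-2.4", "6.x-3.0", "7.x-2.9"]
    print(latest_same_major("6.x-2.1", releases))
```

In this sketch a module that is three releases behind is moved directly to the newest one in its series, and releases from a different major series or a different Drupal core version are never selected.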
@chris and @ed: as far as i understand i have been asked to take over the developer role including all Drupal changes. apologies if i have misunderstood this.
comment:58 in reply to: ↑ 57 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 4.15 to 4.4
Replying to annesley:
hi, sorry i created confusion with my last post:
i absolutely agree that all Debian / BOA updates should be applied. manually or automatically. decisions on usage of BOA are not for me to decide, i don't have the experience.
Agreed.
i was suggesting that one person promote all Drupal changes to live. updates, upgrades and code promotions. and that, for stability and limit of access to live, we try to do these once a month maximum after a good monthly testing run of 1 hour prior to code promotion from staging. multiple people may, of course, add content and panels / views pages as per requirements through the Drupal web interface.
My suggestion would also be to have one person take responsibility for pushing code to the production site (and also have another member of the team who can take on this role when needed) but maybe having that person also be the person who is writing the code may not be the best option. It's always better not to concentrate too many roles in one person especially if they are connected.
I agree with Chris 1-3 on applying updates.
i am happy to use GIT. i believe that Drupal should be codeless. if many or even one programmer is making lots of custom code then i think Drupal has been mis-understood and it will lead to a very dangerous maintenance situation. so i don't ever foresee many developers creating lots of custom code. they might create lots of panels / views and module installations through the web interface.
Agreed. We should only write custom code when needed.
However, in the next iteration of the site we should consider exporting our content types, views, configuration, .. into feature modules and getting these under version control
the recent 2 XSS security risks Drupal Module updates i would have applied urgently because XSS is a serious security risk, specifically because TN is a charity which could feasibly accept donations.
Best to take 10 minutes and just apply them. You can't do any better than this.
@chris: i do not see why becoming out-of-line with Drupal.org is a problem. could you elaborate scenarios where this would cause issues for us? bear in mind that Drupal updates are not cumulative. that is, if your module is 3 updates behind, it does not create issues simply to install the latest over the top. same major version.
It's hard to suggest problematic scenarios that may emerge, but you may well be creating them by forking your distribution. The best you can do is to follow the wisdom of those that know best, and that is to just take 15 minutes and apply the update from the security team.
@chris and @ed: as far as i understand i have been asked to take over the developer role including all Drupal changes. apologies if i have misunderstood this.
comment:59 Changed 2 years ago by paul
Thursday is good for me.
comment:60 in reply to: ↑ 56 Changed 2 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.49
- Total Hours changed from 4.4 to 4.89
Replying to ed:
I have seen these topics in this ticket and propose these as a starter for ten:
- CRON vs Manual - for BOA and Debian
- Drupal updates: all or some only?
- Publishing process: GIT/Aegir/file copy/individual roles
- Lead developer handover
- Future: strategy: developer network, access to TN.org, code vs code less drupal
- Web host
- BOA / hosting software environment
That looks like an agenda for something like a half day face to face meeting, I'd suggest putting off most of these issues till the autumn and for now just agreeing what we are doing between now and then.
I'd also suggest that for now we continue what we were doing and Annesley works on the new code (eg the IIRS) rather than working on maintaining and tweaking the existing site.
Replying to annesley:
i was suggesting that one person promote all Drupal changes to live. updates, upgrades and code promotions. and that, for stability and limit of access to live
I'd suggest that, for now, Paul should continue doing this.
we try to do these once a month maximum after a good monthly testing run of 1 hour prior to code promotion from staging.
I'd suggest we continue doing them as we do now, which is basically when needed and when we get them working.
i believe that Drupal should be codeless.
I recall that at our last meeting we agreed that we would do what Paul said (if we used Drupal 7):
In the next iteration of the site we should consider exporting our content types, views, configuration, .. into feature modules and getting these under version control
Replying to annesley:
i do not see why becoming out-of-line with Drupal.org is a problem. could you elaborate scenarios where this would cause issues for us?
To start with it would stop the automatic alerts about updates to Drupal core from working and more fundamentally we would, basically, be forking Drupal and I think that would create additional, unnecessary, work.
comment:61 in reply to: ↑ 57 Changed 2 years ago by chris
Replying to annesley:
if many or even one programmer is making lots of custom code then i think Drupal has been mis-understood and it will lead to a very dangerous maintenance situation.
That sounds like an argument against forking Drupal 6.
comment:62 Changed 2 years ago by chris
The reason I'm suggesting we continue doing what we have been is that I have the impression that we have finally got into the swing of things following Jim leaving and if something is working then best not break it.
comment:63 follow-ups: ↓ 64 ↓ 66 Changed 2 years ago by ed
Chris apologies for my lack of clarity; I was not suggesting we cover those topics on Thurs evening in 30 mins of course.
I sent an email which you will see in the ttech list yesterday with a fuller outline of my suggestion which is:
- This ticket is *not* the place to continue this discussion; TRAC does not afford this sort of discussion well.
- We do not want to lose the many points raised on the many different topics
- Wiki pages with related editing and discussion *is* the place to continue this discussion (as agreed in our f2f; I am bringing this back to our earlier agreement of how we will work)
- Therefore: to use the time on Thursday evening to:
(a) agree what are the topics so that we don't lose them
(b) if there is time, to agree how to proceed in the next month
... That's it
By doing this, we won't lose the intellectual input from this ticket, everyone involved has a say in the topics we take forward, knowing how and what will proceed over August, and then in September.
Is that clearer?
Can I have confirmation of attendance/not attendance for Thursday evening?
comment:64 in reply to: ↑ 63 Changed 2 years ago by chris
Replying to ed:
to use the time on Thursday evening to:
(a) agree what are the topics so that we don't lose them
(b) if there is time, to agree how to proceed in the next month
... That's it
I'd suggest these two topics have to be the other way around -- we need to know if Annesley is taking all the Drupal update, maintenance and deployment work off Paul or not -- I don't see how we can afford to let that question fall off the agenda?
comment:65 follow-up: ↓ 67 Changed 2 years ago by ed
So from my perspective, the operational handover between the two is an ongoing process which is happening between them. I am encouraging them to have a meet next week to get into the details of that specific topic as it seems that is between them, rather than something the whole group needs to discuss.
The longer term questions of roles and process can happen on the wiki.
Therefore I don't see it as falling off the agenda of this meet, more that it's something that perhaps needs a bit more time and focus between the protagonists, will happen over time, and is mostly between the two of them.
If I am wrong in this assumption, then Annesley and Paul can definitely say so, and I'd be happy for us to check it in the meet. But if it's a go-er, I don't see it as important to the group as the topics.
comment:66 in reply to: ↑ 63 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 4.89 to 5.015
- Therefore: to use the time on Thursday evening to:
I'm not sure I can make the call this evening as we're off out shortly. I'll reply further below ..
(a) agree what are the topics so that we don't lose them
This can be done without me as I don't have any topics that need discussing
(b) if there is time, to agree how to proceed in the next month
I'm happy to continue with deployment work but if you want to take this work off me, that's fine. Just let me know what changes you want to make.
comment:67 in reply to: ↑ 65 ; follow-up: ↓ 68 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 5.015 to 5.14
Replying to ed:
So from my perspective, the operational handover between the two is an ongoing process which is happening between them. I am encouraging them to have a meet next week to get into the details of that specific topic as it seems that is between them, rather than something the whole group needs to discuss.
The longer term questions of roles and process can happen on the wiki.
Therefore I don't see it as falling off the agenda of this meet, more that it's something that perhaps needs a bit more time and focus between the protagonists, will happen over time, and is mostly between the two of them.
I think it's best that you make the call Ed or ask Annesley to make the call for you. I'm happy to continue with the work if you need me or help out as and when needed.
If I am wrong in this assumption, then Annesley and Paul can definitely say so, and I'd be happy for us to check it in the meet. But if it's a go-er, I don't see it as important to the group as the topics.
comment:68 in reply to: ↑ 67 Changed 2 years ago by ed
Replying to paul:
Replying to ed:
So from my perspective, the operational handover between the two is an ongoing process which is happening between them. I am encouraging them to have a meet next week to get into the details of that specific topic as it seems that is between them, rather than something the whole group needs to discuss.
The longer term questions of roles and process can happen on the wiki.
Therefore I don't see it as falling off the agenda of this meet, more that it's something that perhaps needs a bit more time and focus between the protagonists, will happen over time, and is mostly between the two of them.
I think it's best that you make the call Ed or ask Annesley to make the call for you. I'm happy to continue with the work if you need me or help out as and when needed.
OK Paul - well further to this morning's activities, I've been on the blower with Annesley, and we agreed that it's best to ask him to focus on the development stuff and not ask him to also take on all the publishing/update stuff too; ie you - Paul continue to do the publishing now you have worked it out - I know that this was Chris and your proposal earlier on in this mega-ticket.
I see that you're not coming tonight after all.
Where and how are we meeting tonight?
comment:69 Changed 2 years ago by chris
Do we still need to meet tonight?
comment:70 follow-up: ↓ 71 Changed 2 years ago by ed
LOL
Following today's change of plan about publishing and updating, there is no change in our short term plans, so nothing urgent to talk about on that front.
I think it would be worthwhile to meet to extract some of the topics from this ultra long ticket so they don't get lost, and we can de-stress the situation, but if that doesn't feel useful or more urgent than packing for holiday to you, (which is fine - and Paul is not here, so it would be Ed Chris Annesley), I'm fine with that too.
comment:71 in reply to: ↑ 70 Changed 2 years ago by chris
Replying to ed:
I think it would be worthwhile to meet to extract some of the topics from this ultra long ticket so they don't get lost
I don't think there is a danger of topics getting lost.
I do have the BOA update to do tonight and would rather spend my limited time on that if it's OK, ticket:775.
comment:72 Changed 2 years ago by ed
OK Right then meeting cancelled - Sam can't get online either ... I'll get in touch with Annesley
comment:73 follow-up: ↓ 74 Changed 2 years ago by ed
Ed received an email from Nicholas Roberts (http://www.niccolox.org/) pointing us to guardr: https://www.drupal.org/project/guardr
comment:74 in reply to: ↑ 73 Changed 2 years ago by chris
Replying to ed:
Ed received an email from Nicholas Roberts (http://www.niccolox.org/) pointing us to guardr: https://www.drupal.org/project/guardr
Nice, looks like an ideal Drupal base to build off :-)
comment:75 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 5.14 to 5.265
@Ed
Very interesting. Even if we don't use this as our core distribution we can learn from it and build our own. Would you like me to explore the module selection and try installing?
comment:76 Changed 2 years ago by ed
@Paul
if there are immediate benefits to the existing D6 site, then it's valid maintenance activity - otherwise hold until we're clear on the framework decision (which will be somewhat delayed)...
comment:77 Changed 2 years ago by paul
- Add Hours to Ticket changed from 0.0 to 0.125
- Total Hours changed from 5.265 to 5.39
@Ed
There could be some benefit in going over the selection and comparing / contrasting this with what security modules we have currently. Would not take more than 4 hours work.
comment:78 Changed 2 years ago by ed
@Paul - that would be interesting now, but not strictly relevant until the framework decision is underway so please hold that thought until then (a few months away I suspect)
Replying to annesley:
The risk is that the server is rooted, the user data stolen and abused and the reputation of the TN takes a hit for doing something really stupid?