Ticket #370 (closed maintenance: wontfix)

Opened 5 years ago

Last modified 4 years ago

Problem with nightly MySQL backup load

Reported by: chris Owned by: chris
Priority: critical Milestone:
Component: Live server Keywords:
Cc: jim, laura, ed Estimated Number of Hours: 0.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 3.05

Description

Since the start of December 2011 the nightly backupninja mysql backup is causing load spikes which result in the server being unresponsive.

Attachments

mysql_threads-month.png (19.8 KB) - added by chris 5 years ago.
Peak MySQL loads
memcached_multi_bytes-day.png (24.3 KB) - added by chris 5 years ago.
When the machine is unresponsive network traffic drops off
multimemory-day.png (25.9 KB) - added by chris 5 years ago.
There is a spike in apache memory usage
cpu-day.png (24.2 KB) - added by chris 5 years ago.
Also a iowait (disk usage) spike
load-day.png (16.8 KB) - added by chris 5 years ago.
Corresponding load spike

Change History

Changed 5 years ago by chris

Peak MySQL loads

Changed 5 years ago by chris

When the machine is unresponsive network traffic drops off

Changed 5 years ago by chris

There is a spike in apache memory usage

Changed 5 years ago by chris

Also a iowait (disk usage) spike

Changed 5 years ago by chris

Corresponding load spike

comment:1 Changed 5 years ago by chris

  • Cc jim, laura, ed added

Adding CC's

comment:2 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.75
  • Total Hours changed from 0.0 to 0.75

Per my emails...

  • Watchdog table no-longer needed since we're using Syslog. I've now truncated it and shaved ~500Mb from the database! I've taken a back up to my home folder first
  • Batch table truncated too, ~10Mb saved
  • I've identified other tables (search, mainly) that can be shrunk a little.
  • Sessions table now ~180Mb, probably was much bigger but we've lowered the session timeout to 3 days from default of 23... see #369

I reckon this will solve the backup issue since the database is now MUCH smaller (~600Mb+). See #369 for related work.

comment:3 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 0.75 to 1.25

Arg, I spent 30mins on this and documented the mysql database sizes, the sharing engine is by far the biggest, and looked at the queries that take over 100 secs (81 of these in the last few days) and those that took over 1,000 seconds (14 of these, one took 72 mins!) and wrote down some other thoughts but I appear to have only previewed the ticket rather than submitted it, and it's all been lost, I'll revisit this tomorrow...

comment:4 follow-up: ↓ 8 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 1.25 to 1.3

Bummer! That's happened to me several times, but Chromium remembers form contents when you go back, so it's usually ok.


Re SE size: Must be the now-removed watchdog table, because the SE still auto-backs up via the Backup and Migrate module, and those when zipped are ~2.6Mb. The SE, from relevant data size PoV is tiny.

Again, if the backup process is blindly backing up everything including sessions, access logs, watchdogs, caches and search indexes, it's going to be HUGE. If it backs up only relevant tables and ignores those that are just logs, generated data or caches, it'll be small.

Hopfully the work I've done here, plus that in #369, will have dramatically improved things. Revisiing the database and filesystem backups to make them less generic will also help a lot IMHO.

comment:5 Changed 5 years ago by jim

Re slow queries... Piwik processing, SE caches and LIVE site searches seem to be the biggies... I'm sure we can add indexes to the SE's data tables since they're pretty small compared to LIVE's content tables. The SE uses custom data types and tables, so there's a high chance we can improve its indexes immensely.

comment:6 Changed 5 years ago by laura

been watching this thread from afar today but top work you two!

comment:7 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 1.3 to 1.35

Chris: Just wanted to point to https://tech.transitionnetwork.org/trac/ticket/369#comment:4 and the mysqltuner.txt file attached to that issue -- some great config tips in there!

I reckon really this ticket is a dupe of #369 once the backup process has been streamlined to be better suited to Drupal, rather than general DB dumps.

comment:8 in reply to: ↑ 4 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 1.35 to 1.45

I have just realised that one key reason for the problems over the last few days will have been caused by a degraded raid array which was repairing itself, this is now done so tonight things should be better.

Replying to jim:

Again, if the backup process is blindly backing up everything including sessions, access logs, watchdogs, caches and search indexes, it's going to be HUGE. If it backs up only relevant tables and ignores those that are just logs, generated data or caches, it'll be small.

The current backupninja config does backup all tables in all databases.

Is this list of tables that we don't need the data from good to use, it's the list of tables to skip from the script to sync the dev server from the live server:

@skip_tables                 = qw[cache cache_block cache_content cache_emfield_xml cache_filter cache_form cache_hierarchical_select cache_location cache_media_youtube_status cache_menu cache_mollom cache_page cache_path cache_rules cache_update cache_views cache_views_data sessions search_dataset search_index search_node_links search_total watchdog];

comment:9 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 1.45 to 1.55

I have added this to /etc/backup.d/20.mysql:

nodata = live.cache, live.cache_block, live.cache_content, live.cache_emfield_xml, live.cache_filter, live.cache_form, live.cache_hierarchical_select, live.cache_location, live.cache_media_youtube_status, live.cache_menu, live.cache_mollom, live.cache_page, live.cache_path, live.cache_rules, live.cache_update, live.cache_views, live.cache_views_data, live.sessions, live.search_dataset, live.search_index, live.search_node_links, live.search_total, live.watchdog, live_sharingengine.cache, live_sharingengine.cache_block, live_sharingengine.cache_content, live_sharingengine.cache_filter, live_sharingengine.cache_form, live_sharingengine.cache_location, live_sharingengine.cache_menu, live_sharingengine.cache_page, live_sharingengine.cache_path, live_sharingengine.cache_update, live_sharingengine.cache_views, live_sharingengine.cache_views_data, live_sharingengine.sessions

Jim can you check this list is OK, thanks.

comment:10 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 1.55 to 1.65

For reference these are the sizes of the sql dumps from last nights backup:

1.5M Dec  5 03:02 information_schema.sql
2.4G Dec  5 04:48 live_sharingengine.sql
584M Dec  5 03:24 live.sql
22M  Dec  5 04:48 live_workspaces.sql
396K Dec  5 04:48 mysql.sql
941M Dec  5 04:56 stats.sql
1.3M Dec  5 04:57 transwiki.sql

comment:11 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 1.65 to 1.7

The ignore tables looks good except for two points:

  • *.watchdog should no longer exist since I've uninstalled that module across all 3 sites
  • There's no mention of live_workspaces which should follow the same patterns of ignored tables.

I'm going to analyse the SE database now, but I have a hunch it'll be much smaller already due to the loss of the watchdog table.

comment:12 follow-up: ↓ 13 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 1.7 to 1.8

Thanks, I have updated the list of tables:

nodata = live.cache, live.cache_block, live.cache_content, live.cache_emfield_xml, live.cache_filter, live.cache_form, live.cache_hierarchical_select, live.cache_location, live.cache_media_youtube_status, live.cache_menu, live.cache_mollom, live.cache_page, live.cache_path, live.cache_rules, live.cache_update, live.cache_views, live.cache_views_data, live.sessions, live.search_dataset, live.search_index, live.search_node_links, live.search_total, live_sharingengine.cache, live_sharingengine.cache_block, live_sharingengine.cache_content, live_sharingengine.cache_filter, live_sharingengine.cache_form, live_sharingengine.cache_location, live_sharingengine.cache_menu, live_sharingengine.cache_page, live_sharingengine.cache_path, live_sharingengine.cache_update, live_sharingengine.cache_views, live_sharingengine.cache_views_data, live_sharingengine.search_dataset, live_sharingengine.search_index, live_sharingengine.search_node_links, live_sharingengine.search_total, live_sharingengine.sessions, live_workspaces.cache, live_workspaces.cache_block, live_workspaces.cache_content, live_workspaces.cache_filter, live_workspaces.cache_form, live_workspaces.cache_hierarchical_select, live_workspaces.cache_location, live_workspaces.cache_menu, live_workspaces.cache_page, live_workspaces.cache_views, live_workspaces.cache_views_data, live_workspaces.captcha_sessions, live_workspaces.sessions, live_workspaces.search_dataset, live_workspaces.search_index, live_workspaces.search_node_links, live_workspaces.search_total

comment:13 in reply to: ↑ 12 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 1.8 to 2.05

Oops, the syntax for the list of tabled to exclude was wrong, so the optimised backup didn’t run last night, it's fixed now.

I have also updated the documentation, wiki:NewLiveServer#backupninja

!MySQL isn't using any more memory than before, see https://kiwi.transitionnetwork.org/munin/webarch.net/quince.webarch.net/index.html

I'll check everything is OK tomorrow and if it is I'll close this ticket.

comment:14 Changed 5 years ago by jim

According to Munin the outage last night during the backups was very small, so we've made progress even without optimising the backups...

comment:15 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.0
  • Total Hours changed from 2.05 to 3.05

The backups failed last night, there is a problem with the databases with tables to omit, error message below, I have done a lot of testing on the dev server and I think there is a problem with backupninja, I have emailed their list about it https://lists.riseup.net/www/info/backupninja

So I have commented out the nodata line -- better to have all the tables backed up rather than none!

== warnings from /etc/backup.d/20.mysql ==

Info: Initializing SQL dump method
Info: Successfully finished dump of mysql database information_schema
Warning: bash: -c: line 0: syntax error near unexpected token `-r' bash: -c: line 0: `set -o pipefail ; ( /usr/bin/mysqldump
+--defaults-extra-file=/etc/mysql/debian.cnf --lock-tables --complete-insert --add-drop-table --quick --quote-names
+--ignore-table=live.cache --ignore-table=live.cache_block --ignore-table=live.cache_content
+--ignore-table=live.cache_emfield_xml --ignore-table=live.cache_filter --ignore-table=live.cache_form
+--ignore-table=live.cache_hierarchical_select --ignore-table=live.cache_location
+--ignore-table=live.cache_media_youtube_status --ignore-table=live.cache_menu --ignore-table=live.cache_mollom
+--ignore-table=live.cache_page --ignore-table=live.cache_path --ignore-table=live.cache_rules --ignore-table=live.cache_update
+--ignore-table=live.cache_views --ignore-table=live.cache_views_data --ignore-table=live.sessions
+--ignore-table=live.search_dataset --ignore-table=live.search_index --ignore-table=live.search_node_links
+--ignore-table=live.search_total
 --ignore-table=live_sharingengine.cache --ignore-table=live_sharingengine.cache_block
+--ignore-table=live_sharingengine.cache_content --ignore-table=live_sharingengine.cache_filter
+--ignore-table=live_sharingengine.cache_form --ignore-table=live_sharingengine.cache_location
+--ignore-table=live_sharingengine.cache_menu --ignore-table=live_sharingengine.cache_page
+--ignore-table=live_sharingengine.cache_path --ignore-table=live_sharingengine.cache_update
+--ignore-table=live_sharingengine.cache_views --ignore-table=live_sharingengine.cache_views_data
+--ignore-table=live_sharingengine.search_dataset --ignore-table=live_sharingengine.search_index
+--ignore-table=live_sharingengine.search_node_links --ignore-table=live_sharingengine.search_total
+--ignore-table=live_sharingengine.sessions --ignore-table=live_workspaces.cache --ignore-table=live_workspaces.cache_block
+--ignore-table=live_workspaces.cache_content --ignore-table=live_workspaces.cache_filter --ignore-table=live_workspac
 es.cache_form --ignore-table=live_workspaces.cache_hierarchical_select --ignore-table=live_workspaces.cache_location
+--ignore-table=live_workspaces.cache_menu --ignore-table=live_workspaces.cache_page --ignore-table=live_workspaces.cache_views
+--ignore-table=live_workspaces.cache_views_data --ignore-table=live_workspaces.captcha_sessions
+--ignore-table=live_workspaces.sessions --ignore-table=live_workspaces.search_dataset
+--ignore-table=live_workspaces.search_index --ignore-table=live_workspaces.search_node_links
+--ignore-table=live_workspaces.search_total live; /usr/bin/mysqldump --defaults-extra-file=/etc/mysql/debian.cnf --lock-tables
+--complete-insert --add-drop-table --quick --quote-names --no-data live cache cache_block cache_content cache_emfield_xml
+cache_filter cache_form cache_hierarchical_select cache_location cache_media_youtube_status cache_menu cache_mollom cache_page
+cache_path cache_rules cache_update cache_vie cache_vie sessions search_dataset search_index s
 earch_node_links search_total ) -r '/var/backups/mysql/sqldump/live.sql''
Warning: Failed to dump mysql databases live
Warning: bash: -c: line 0: syntax error near unexpected token `-r' bash: -c: line 0: `set -o pipefail ; ( /usr/bin/mysqldump
+--defaults-extra-file=/etc/mysql/debian.cnf --lock-tables --complete-insert --add-drop-table --quick --quote-names
+--ignore-table=live.cache --ignore-table=live.cache_block --ignore-table=live.cache_content
+--ignore-table=live.cache_emfield_xml --ignore-table=live.cache_filter --ignore-table=live.cache_form
+--ignore-table=live.cache_hierarchical_select --ignore-table=live.cache_location
+--ignore-table=live.cache_media_youtube_status --ignore-table=live.cache_menu --ignore-table=live.cache_mollom
+--ignore-table=live.cache_page --ignore-table=live.cache_path --ignore-table=live.cache_rules --ignore-table=live.cache_update
+--ignore-table=live.cache_views --ignore-table=live.cache_views_data --ignore-table=live.sessions
+--ignore-table=live.search_dataset --ignore-table=live.search_index --ignore-table=live.search_node_links
+--ignore-table=live.search_total
 --ignore-table=live_sharingengine.cache --ignore-table=live_sharingengine.cache_block
+--ignore-table=live_sharingengine.cache_content --ignore-table=live_sharingengine.cache_filter
+--ignore-table=live_sharingengine.cache_form --ignore-table=live_sharingengine.cache_location
+--ignore-table=live_sharingengine.cache_menu --ignore-table=live_sharingengine.cache_page
+--ignore-table=live_sharingengine.cache_path --ignore-table=live_sharingengine.cache_update
+--ignore-table=live_sharingengine.cache_views --ignore-table=live_sharingengine.cache_views_data
+--ignore-table=live_sharingengine.search_dataset --ignore-table=live_sharingengine.search_index
+--ignore-table=live_sharingengine.search_node_links --ignore-table=live_sharingengine.search_total
+--ignore-table=live_sharingengine.sessions --ignore-table=live_workspaces.cache --ignore-table=live_workspaces.cache_block
+--ignore-table=live_workspaces.cache_content --ignore-table=live_workspaces.cache_filter --ignore-table=live_workspac
 es.cache_form --ignore-table=live_workspaces.cache_hierarchical_select --ignore-table=live_workspaces.cache_location
+--ignore-table=live_workspaces.cache_menu --ignore-table=live_workspaces.cache_page --ignore-table=live_workspaces.cache_views
+--ignore-table=live_workspaces.cache_views_data --ignore-table=live_workspaces.captcha_sessions
+--ignore-table=live_workspaces.sessions --ignore-table=live_workspaces.search_dataset
+--ignore-table=live_workspaces.search_index --ignore-table=live_workspaces.search_node_links
+--ignore-table=live_workspaces.search_total live_sharingengine; /usr/bin/mysqldump --defaults-extra-file=/etc/mysql/debian.cnf
+--lock-tables --complete-insert --add-drop-table --quick --quote-names --no-data live_sharingengine cache cache_block
+cache_content cache_filter cache_form cache_location cache_menu cache_page cache_path cache_update cache_vie cache_vie
+search_dataset search_index search_node_links search_total sessions ) -r '/var/backups/mysql/sqldump/live_
 sharingengine.sql''
Warning: Failed to dump mysql databases live_sharingengine
Warning: bash: -c: line 0: syntax error near unexpected token `-r' bash: -c: line 0: `set -o pipefail ; ( /usr/bin/mysqldump
+--defaults-extra-file=/etc/mysql/debian.cnf --lock-tables --complete-insert --add-drop-table --quick --quote-names
+--ignore-table=live.cache --ignore-table=live.cache_block --ignore-table=live.cache_content
+--ignore-table=live.cache_emfield_xml --ignore-table=live.cache_filter --ignore-table=live.cache_form
+--ignore-table=live.cache_hierarchical_select --ignore-table=live.cache_location
+--ignore-table=live.cache_media_youtube_status --ignore-table=live.cache_menu --ignore-table=live.cache_mollom
+--ignore-table=live.cache_page --ignore-table=live.cache_path --ignore-table=live.cache_rules --ignore-table=live.cache_update
+--ignore-table=live.cache_views --ignore-table=live.cache_views_data --ignore-table=live.sessions
+--ignore-table=live.search_dataset --ignore-table=live.search_index --ignore-table=live.search_node_links
+--ignore-table=live.search_total
 --ignore-table=live_sharingengine.cache --ignore-table=live_sharingengine.cache_block
+--ignore-table=live_sharingengine.cache_content --ignore-table=live_sharingengine.cache_filter
+--ignore-table=live_sharingengine.cache_form --ignore-table=live_sharingengine.cache_location
+--ignore-table=live_sharingengine.cache_menu --ignore-table=live_sharingengine.cache_page
+--ignore-table=live_sharingengine.cache_path --ignore-table=live_sharingengine.cache_update
+--ignore-table=live_sharingengine.cache_views --ignore-table=live_sharingengine.cache_views_data
+--ignore-table=live_sharingengine.search_dataset --ignore-table=live_sharingengine.search_index
+--ignore-table=live_sharingengine.search_node_links --ignore-table=live_sharingengine.search_total
+--ignore-table=live_sharingengine.sessions --ignore-table=live_workspaces.cache --ignore-table=live_workspaces.cache_block
+--ignore-table=live_workspaces.cache_content --ignore-table=live_workspaces.cache_filter --ignore-table=live_workspac
 es.cache_form --ignore-table=live_workspaces.cache_hierarchical_select --ignore-table=live_workspaces.cache_location
+--ignore-table=live_workspaces.cache_menu --ignore-table=live_workspaces.cache_page --ignore-table=live_workspaces.cache_views
+--ignore-table=live_workspaces.cache_views_data --ignore-table=live_workspaces.captcha_sessions
+--ignore-table=live_workspaces.sessions --ignore-table=live_workspaces.search_dataset
+--ignore-table=live_workspaces.search_index --ignore-table=live_workspaces.search_node_links
+--ignore-table=live_workspaces.search_total live_workspaces; /usr/bin/mysqldump --defaults-extra-file=/etc/mysql/debian.cnf
+--lock-tables --complete-insert --add-drop-table --quick --quote-names --no-data live_workspaces cache cache_block
+cache_content cache_filter cache_form cache_hierarchical_select cache_location cache_menu cache_page cache_vie cache_vie
+captcha_sessions sessions search_dataset search_index search_node_links search_total ) -r '/var/backups/mysql/
 sqldump/live_workspaces.sql''
Warning: Failed to dump mysql databases live_workspaces
Info: Successfully finished dump of mysql database mysql
Info: Successfully finished dump of mysql database stats

comment:16 Changed 4 years ago by chris

  • Status changed from new to closed
  • Resolution set to wontfix

This has been overtaken by the migration to wiki:PuffinServer.

Note: See TracTickets for help on using tickets.