Ticket #516 (assigned defect)
Search: not showing events or initiatives
Reported by: | ed | Owned by: | ed |
---|---|---|---|
Priority: | minor | Milestone: | Production |
Component: | Drupal modules & settings | Keywords: | |
Cc: | Estimated Number of Hours: | 0.0 | |
Add Hours to Ticket: | 0 | Billable?: | yes |
Total Hours: | 0 |
Description
Search for Woking from homepage. I know that there are two events and two initaitives with Woking in them. Neither are shown:
https://www.transitionnetwork.org/search/node/woking
Advanced search does not bring them up either (ticking the initiatives box).
If I am in the initiatives section it returns the TIs though.
General search used to show TIs and events. Now it's only showing results of word searches.
General search needs to show TIs and events, and other nodes.
Please look into this.
Change History
comment:2 Changed 4 years ago by jim
OK so the search was stuck, probably because the index was not imported (because it'd have made the DB dump HUGE) but some meta data was. This must have left the search system in a confused state.
So I logged in to Chive DB manager and cleared any timestamps on the index:
UPDATE search_dataset SET reindex=0 WHERE reindex<>0
Then I cleared the index from the Drupal admin screen (belt and braces), then I ran this Drush command -- still running at time of writing -- that is reindexing all the content right now...
drush www.transitionnetwork.org search-index --immediate
Seems to be slow work so I'll leave this ticking over and come back later.
comment:3 Changed 4 years ago by jim
OK it's still stuck.
I have a suspicion that Drupal's out-of-the-box search module is not going to manage with a site as big as Transition's now. Not that this is necessarily the cause of this issue, but TN.org is big these days and we should consider changing the search system.
That aside, I've spotted something else...
It appears there are references for user profile nodes that are not editable. These might be the spam ones we deleted, or they got out of sync another way... Looking into how to kill these now.
comment:4 Changed 4 years ago by jim
OK seems to be working again... This query returned all the user profile nodes without a corresponding user:
SELECT node.nid AS nid, node.title AS node_title, node.uid AS node_uid, users.uid AS user_uid FROM node node LEFT JOIN users users ON node.uid = users.uid WHERE node.type = 'profile' AND users.uid IS NULL ORDER BY node.nid ASC
There were 668 of these and Drupal could not do anything with the pages, or delete them -- looks like all spam users, meaning just deleting the user account DOES NOT get rid of their profile. Annoying.
Anyway, I killed them with this:
DELETE node FROM node LEFT JOIN users ON node.uid = users.uid WHERE node.type = 'profile' AND users.uid IS NULL LIMIT 0, 1
And now the re-indexing process seems to be going again... Woop!
Note this might have left some bits lying around the database (field entries without the associated node they belonged to), so I'll try to tidy up now too -- will definitely back up before going much further.
I'll close this ticket when it's finished indexing, and the cleanup is done, or I'll carry on debugging as needed.
comment:5 Changed 4 years ago by jim
OFF topic: database integrity
Following on from comment 4 above, I wanted to check the integrity of the field tables, so I crafted and ran this query:
SELECT count(cck.nid) FROM `content_field_region` cck LEFT JOIN node node on cck.nid = node.nid WHERE node.nid IS NULL
Which returned 1,173 rows. I then replaced the 'content_field_region' with other fields found on the user profile nodes (which have their own database table) so the full breakdown is:
- content_field_region = 1173
- content_field_initiative = 1165
- content_field_themes = 6256
- content_field_training_attended = 1165
- content_field_roles_offered = 1285
- content_field_other_websites = 1181
- content_field_user_types = 1165
So what this means is there are around 1200 entries in field database storage for the above tables that do not have an associated node. This means the data is not properly referentially integral, and ideally these should be cleaned up one day.
The risk is low but it's worth doing a the rest of the audit (with the other fields that have tables of their own) and then cleaning these up at some point. These will have been caused by manual deletes and imports over the years. Drupal would normally keep these sorted but we've had to bypass Drupal a few times.
Or we can leave it as is and let the migration to Drupal 8 resolve this. The cost is a bit more database space and a slightly slower site... Though this *may* cause a bug or two.
Anyway, this ticket is about search so Ed if you are concerned by the above please let me know.
ON topic: Search index
The index is still going well, I'm going to leave it running until it's done.
comment:6 Changed 4 years ago by jim
The index is now done barring one item. Hmm. That's for another day...
The search for Woking now behaves as expected.
Left to resolve on this ticket then:
- Ed: should we look into a more powerful search system? Apache Solr, for example, would do some pretty cool stuff.
- Ed: should I try to review, back up, then clean the database of these old missing node field data?
- Jim: WTF is going on with that last node that won't index?
The search index is only 46% complete, and no amount of running Cron will make it go higher... That's why much content (54%) isn't found.
I can't do any more from work but will look into this tonight.