Ticket #516 (assigned defect)

Opened 4 years ago

Last modified 4 years ago

Search: not showing events or initiatives

Reported by: ed Owned by: ed
Priority: minor Milestone: Production
Component: Drupal modules & settings Keywords:
Cc: Estimated Number of Hours: 0.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 0

Description

Search for Woking from homepage. I know that there are two events and two initaitives with Woking in them. Neither are shown:
https://www.transitionnetwork.org/search/node/woking

Advanced search does not bring them up either (ticking the initiatives box).

If I am in the initiatives section it returns the TIs though.

General search used to show TIs and events. Now it's only showing results of word searches.

General search needs to show TIs and events, and other nodes.

Please look into this.

Change History

comment:1 Changed 4 years ago by jim

The search index is only 46% complete, and no amount of running Cron will make it go higher... That's why much content (54%) isn't found.

I can't do any more from work but will look into this tonight.

comment:2 Changed 4 years ago by jim

OK so the search was stuck, probably because the index was not imported (because it'd have made the DB dump HUGE) but some meta data was. This must have left the search system in a confused state.

So I logged in to Chive DB manager and cleared any timestamps on the index:

UPDATE search_dataset SET reindex=0 WHERE reindex<>0

Then I cleared the index from the Drupal admin screen (belt and braces), then I ran this Drush command -- still running at time of writing -- that is reindexing all the content right now...

drush www.transitionnetwork.org search-index --immediate

Seems to be slow work so I'll leave this ticking over and come back later.

comment:3 Changed 4 years ago by jim

OK it's still stuck.

I have a suspicion that Drupal's out-of-the-box search module is not going to manage with a site as big as Transition's now. Not that this is necessarily the cause of this issue, but TN.org is big these days and we should consider changing the search system.

That aside, I've spotted something else...

It appears there are references for user profile nodes that are not editable. These might be the spam ones we deleted, or they got out of sync another way... Looking into how to kill these now.

comment:4 Changed 4 years ago by jim

OK seems to be working again... This query returned all the user profile nodes without a corresponding user:

SELECT node.nid AS nid, node.title AS node_title, node.uid AS node_uid, users.uid AS user_uid
FROM node node
LEFT JOIN users users ON node.uid = users.uid
WHERE node.type = 'profile' AND users.uid IS NULL
ORDER BY node.nid ASC

There were 668 of these and Drupal could not do anything with the pages, or delete them -- looks like all spam users, meaning just deleting the user account DOES NOT get rid of their profile. Annoying.

Anyway, I killed them with this:

DELETE node FROM node
LEFT JOIN users ON node.uid = users.uid
WHERE node.type = 'profile' AND users.uid IS NULL
LIMIT 0, 1

And now the re-indexing process seems to be going again... Woop!

Note this might have left some bits lying around the database (field entries without the associated node they belonged to), so I'll try to tidy up now too -- will definitely back up before going much further.

I'll close this ticket when it's finished indexing, and the cleanup is done, or I'll carry on debugging as needed.

comment:5 Changed 4 years ago by jim


OFF topic: database integrity

Following on from comment 4 above, I wanted to check the integrity of the field tables, so I crafted and ran this query:

SELECT count(cck.nid) FROM `content_field_region` cck
LEFT JOIN node node on cck.nid = node.nid
WHERE node.nid IS NULL

Which returned 1,173 rows. I then replaced the 'content_field_region' with other fields found on the user profile nodes (which have their own database table) so the full breakdown is:

  • content_field_region = 1173
  • content_field_initiative = 1165
  • content_field_themes = 6256
  • content_field_training_attended = 1165
  • content_field_roles_offered = 1285
  • content_field_other_websites = 1181
  • content_field_user_types = 1165

So what this means is there are around 1200 entries in field database storage for the above tables that do not have an associated node. This means the data is not properly referentially integral, and ideally these should be cleaned up one day.

The risk is low but it's worth doing a the rest of the audit (with the other fields that have tables of their own) and then cleaning these up at some point. These will have been caused by manual deletes and imports over the years. Drupal would normally keep these sorted but we've had to bypass Drupal a few times.

Or we can leave it as is and let the migration to Drupal 8 resolve this. The cost is a bit more database space and a slightly slower site... Though this *may* cause a bug or two.

Anyway, this ticket is about search so Ed if you are concerned by the above please let me know.


ON topic: Search index
The index is still going well, I'm going to leave it running until it's done.

comment:6 Changed 4 years ago by jim

The index is now done barring one item. Hmm. That's for another day...

The search for Woking now behaves as expected.

Left to resolve on this ticket then:

  • Ed: should we look into a more powerful search system? Apache Solr, for example, would do some pretty cool stuff.
  • Ed: should I try to review, back up, then clean the database of these old missing node field data?
  • Jim: WTF is going on with that last node that won't index?

comment:7 Changed 4 years ago by ed

  • Priority changed from major to minor

moving this ticket to minor as it's about search, and there are some remaining issues. Opening up the database integrity ticket #523 separately with Jim as owner and set to minor

comment:8 Changed 4 years ago by ed

  • Owner changed from jim to ed
  • Status changed from new to assigned
Note: See TracTickets for help on using tickets.