Ticket #761 (new defect)

Opened 2 years ago

Last modified 2 years ago

Spam account cull

Reported by: ed
Owned by: sam
Priority: major
Milestone: Maintenance
Component: Drupal modules & settings
Keywords:
Cc: annesley, paul, chris
Estimated Number of Hours: 0.0
Add Hours to Ticket: 0
Billable?: yes
Total Hours: 1.75

Description

There are bucketloads of spam accounts swamping us, and spam commenting is swarming again. I just went through several pages deleting spam accounts. No doubt I nailed some humans too (sorry Sam if this comes back to you), but the overwhelming majority of new accounts are spam.

It's crap and we need to have another spam sweep - especially if we're staying in D6 for a while.

See work done in Feb 2013: #461
See wiki page done in Feb 2013: https://wiki.transitionnetwork.org/Spam_accounts

SAM I'm going to suggest you start looking at it, and get your head around it, and the various modules and processes we've got running, then ask you to act/escalate accordingly.

Change History

comment:1 Changed 2 years ago by annesley

@Sam: let me know if you need more eyes on this.

comment:2 Changed 2 years ago by sam

I ran this report: https://www.transitionnetwork.org/admin/reports/spam/same-names and deleted all users with the same first/last name.
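
For anyone wanting to repeat this kind of sweep, the check can be sketched roughly as below. This is a minimal Python illustration, not the code behind the Drupal report; the `User` record and its field names are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical user record; field names are assumptions, not the Drupal schema.
    uid: int
    first_name: str
    last_name: str


def same_name_users(users):
    """Return users whose first and last names match, ignoring case and surrounding spaces."""
    flagged = []
    for user in users:
        first = user.first_name.strip().lower()
        last = user.last_name.strip().lower()
        # Skip blank names so users who left both fields empty are not flagged.
        if first and first == last:
            flagged.append(user)
    return flagged
```

A real person can legitimately have matching names, so a list like this is probably best treated as candidates to review rather than a definitive verdict.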

I see that the registration form isn't currently protected by Mollom:

https://www.transitionnetwork.org/admin/settings/mollom/add

This might be an easy win, unless you have tried before and run into problems?

Shall I try it?

Thanks

Sam

comment:3 Changed 2 years ago by paul

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 0.0 to 0.25

comment:4 Changed 2 years ago by sam

Hi Paul

Just spotted that Mollom would set up a honeypot too if we enabled it on that registration form

https://mollom.com/features
"Hidden honeypots - In our Drupal Mollom module, we've added a basic honeypot to all forms protected by Mollom, through the use of a hidden field, a common way to trick spam bots into revealing themselves. This significantly reduces the number of spam bots attempting to game your web forms."

I think we should give Mollom a try in the first instance, then look for additional solutions if it doesn't fix it.

Any objections to enabling it for the registration form?

On WordPress sites I have found this helps a lot: http://blog.fili.nl/the-anti-captcha-challenge/ (the user does need to have JavaScript and cookies enabled).

Thanks

Sam

comment:5 Changed 2 years ago by paul

  • Add Hours to Ticket changed from 0.0 to 0.125
  • Total Hours changed from 0.25 to 0.375

Hi Sam,

Good. Let's give that a try.

Best, Paul

comment:6 Changed 2 years ago by annesley

Sounds good!

comment:7 Changed 2 years ago by sam

I just enabled Mollom on the registration form and attempted to register a test user to check it's all OK.

I did the initial ReCaptcha and all seemed to be OK.

Because of the spammy username I used, I was additionally presented with the Mollom Captcha, which I answered correctly.

I was then blocked from proceeding, I'm pretty sure by Botcha (based on the error message): https://www.drupal.org/project/botcha Looking at the new issues, the 27 open bugs and the non-existent developer response rate, it looks to me like this module is falling into disrepair. At the moment it doesn't seem to be working: it's blocking real users and allowing spammers in.

I see this issue has history here: /trac/ticket/514 And I do get a trickle of support enquiries from users having problems getting registered.

I propose that we disable the Botcha module for 24 hours, leave ReCaptcha, Spambot & Mollom enabled on the form and see what happens.

If in 24 hours there is a marked increase in Spam registrations we could turn it on again.

Any objections?

Thanks

Sam

comment:8 Changed 2 years ago by paul

  • Add Hours to Ticket changed from 0.0 to 0.125
  • Total Hours changed from 0.375 to 0.5

Sounds like a good plan. +1

I had to remove that module to get a local version of the site running.

comment:9 Changed 2 years ago by sam

OK so over the last 24 hours we have had 8 users register.

This one is definitely a spammer
https://www.transitionnetwork.org/user/20282/spambot

These two seem to be likely spammers
https://www.transitionnetwork.org/user/20286/spambot (Forex, sex returned on Google search for email address)
https://www.transitionnetwork.org/user/20282/spambot ('how to remove virus's' returned on Google search for the xxx.mail.ru email)

The remaining five appear to be legit users at first glance
https://www.transitionnetwork.org/user/20285/spambot
https://www.transitionnetwork.org/user/20283/spambot
https://www.transitionnetwork.org/user/20290/spambot
https://www.transitionnetwork.org/user/20287/spambot
https://www.transitionnetwork.org/user/20288/spambot

I have disabled the Botcha on the registration form. I'll check back in an hour to make sure I haven't opened the floodgates, then in 24 hours see if the proportion of legit/ spam accounts has changed significantly.

Thanks

Sam

comment:10 Changed 2 years ago by sam

Actually the webgui won't work to disable it on a per-form basis. Seems even that is broken.

So I have disabled the module. As before I'll check back in an hour.

comment:11 follow-up: ↓ 13 Changed 2 years ago by sam

Well that didn't work on two counts.

1, More spammy looking users registered over the last 24 hours

2, We went over the limits of spam lookups set for the Mollom free version.

I have therefore restarted the Botcha module and disabled Mollom on the registration form.

Looking at Botcha a bit more these are the Honeypot/checks it carries out:
https://www.transitionnetwork.org/admin/user/botcha/recipebook/default

NoResubmit (works without JavaScript): Treats as spam any submission made using an already-submitted form.

Timegate (works without JavaScript): When the form is generated, a field hidden by CSS and containing the timestamp is added to it. On submission this timestamp is used for the spam check: if the form is submitted too fast, the submission is considered spam. The minimum number of seconds that must elapse from the time of form generation is an adjustable parameter.

Honeypot: An implementation of a honeypot trap. A field is added to the form with a certain value, which is then modified by JS. Any form submission whose calculated value does not match the expected one is treated as spam.
Honeypot2: The same as above, but the calculation uses data from CSS as its source rather than the value of a particular field.

ObscureUrl: Similar to the previous recipe: the value constructed by JS is compared to the expected one. The difference is that the initial value is passed through a GET parameter.
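
To make the Timegate and Honeypot descriptions a bit more concrete, here is a rough Python sketch of the two checks. It is not Botcha's code; the form keys (`built_at`, `hp_value`, `hp_expected`) and the 5 second threshold are assumptions chosen for illustration.

```python
import time

MIN_SECONDS = 5  # assumed threshold; the module description says this is adjustable


def is_too_fast(form_built_at, submitted_at, min_seconds=MIN_SECONDS):
    """Timegate: flag the submission if it arrives too soon after the form was built."""
    return (submitted_at - form_built_at) < min_seconds


def honeypot_failed(submitted_value, expected_value):
    """Honeypot: the hidden field must hold the value that client-side JS was meant to compute."""
    return submitted_value != expected_value


def looks_like_spam(form, now=None):
    """Combine both checks; `form` is a hypothetical dict of submitted and expected values."""
    now = time.time() if now is None else now
    return (is_too_fast(form["built_at"], now)
            or honeypot_failed(form.get("hp_value"), form["hp_expected"]))
```

One thing the sketch suggests: because the honeypot value depends on JS running in the browser, a genuine user with JavaScript disabled or blocked would fail that check, which may be one source of the false positives.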

Not sure where we go from here. Personally I'm a bit concerned about the false positives we are getting from Botcha. For every user that emails me I guess there will be 5? 10? 20? who just give up. We could try selectively disabling some of the recipes it's using and see if we can eliminate the false positives?

That doesn't of course address the spam registrations. I had Mollom set to 'normal', so I could try it on 'strict' and see if that helps. However, even if it does, we'd then need to pay $30/month for the increased number of submissions we get from the registration form.

Anyone got any ideas for free solutions we could try?

Thanks

Sam


comment:12 follow-up: ↓ 15 Changed 2 years ago by ed

and there is a long history of the work done before on #461

comment:13 in reply to: ↑ 11 Changed 2 years ago by paul

Replying to sam:

Well that didn't work on two counts.

1, More spammy looking users registered over the last 24 hours

We will always get spammy-looking registration accounts on any of our public websites. If these accounts start leaving spammy content we will then need to disable the accounts and block their IP addresses.

Having a look at the website logs ...


comment:14 Changed 2 years ago by paul

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 0.5 to 0.75

Sorry, forgot to add time

comment:15 in reply to: ↑ 12 Changed 2 years ago by paul

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 0.75 to 1.25

Replying to ed:

and there is a long history of the work done before on #461

Thanks Ed. I had a quick read through this to get up to speed.

Some additional thoughts on way to reduce spam:

  1. Only allow users who have a *member* role to post content / comments on the site without moderation. This member role could be:
       i. requested from the editor
       ii. automatically given to a new user after the user has done a few things on the website (validated their email address, posted a comment accepted by a moderator, ...)
       iii. automatically given to the user after being signed up as a paid member.


The third option would also eliminate spam user accounts as well as spam content.

comment:16 Changed 2 years ago by ed

  1. That would upset the bloggers - and we don't really have the resources to handle pre-moderation generally
  2. We looked into 'whitelists' whereby you pre-moderate once then they get access - can't do it in D6
  3. no paid members in this system...

there is a set up which deletes all un-authenticated accts after a period of time
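
As a rough picture of what a purge like that does, here is a minimal Python sketch. It is not the existing set-up on the site: the account fields, the "never logged in" test and the 7 day cut-off are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def accounts_to_purge(accounts, now, max_age=timedelta(days=7)):
    """Return uids of accounts that never logged in and were created more than max_age ago.

    `accounts` is a hypothetical list of dicts with `uid`, `created` and `last_login` keys.
    """
    return [
        acct["uid"]
        for acct in accounts
        if acct["last_login"] is None and now - acct["created"] > max_age
    ]
```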

comment:17 Changed 2 years ago by paul

  • Add Hours to Ticket changed from 0.0 to 0.125
  • Total Hours changed from 1.25 to 1.375

Ed

Is there anything documented for (ii)? It should be possible to do this, though we may need to write some of the code. Let me know if you want to explore this further.

comment:18 Changed 2 years ago by ed

Paul - nothing documented for ii.

And no custom code on old site if at all possible... :)

comment:19 Changed 2 years ago by paul

I forgot! Cool. Thanks Ed

comment:20 Changed 2 years ago by ed

Sam any progress/news on this?

comment:21 Changed 2 years ago by sam

Hi Ed.

I have just manually purged the spammy comments again.

We could do this;

  • Require comments to be pre-moderated for 'Authenticated' users (in permissions Skip Comment Approval check box)
  • Create a new role as Paul has suggested, 'Trusted' or whatever.
  • Create a rule that adds this 'Trusted' role to users who have had a comment approved.
  • Allow 'Trusted' users to comment without pre-moderation

I realise this would increase admin overhead slightly, but honestly if we are checking comments anyway I think it would be a similar workload.
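
The logic of the steps above can be sketched like this. It is plain Python to show the flow, not Drupal configuration; the role name `trusted` and the function names are hypothetical.

```python
TRUSTED = "trusted"  # hypothetical role name


def needs_moderation(user_roles):
    """Comments are held for approval unless the author already has the trusted role."""
    return TRUSTED not in user_roles


def approve_comment(comment, user_roles):
    """Publish the comment and promote its author so later comments skip the queue."""
    comment["published"] = True
    user_roles.add(TRUSTED)
```

So a new user's first comment waits in the queue, and once a moderator approves it the user's later comments publish straight away.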

The other thing I thought of is to not allow links in comments or forum posts.

We could do this by simply adding <a> to the disallowed tags on 'basic html' input; only editors and admins could then create links.
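
To show the effect on a comment, here is a small Python sketch of stripping anchor tags while keeping the link text. It is only an illustration of the idea, not how Drupal's input filter is implemented, and a regex like this is not a robust HTML sanitiser.

```python
import re

# Matches opening <a ...> and closing </a> tags, case-insensitively.
# Tags that merely start with "a" (such as <abbr>) are left alone.
ANCHOR_TAG = re.compile(r"</?a(\s[^>]*)?>", re.IGNORECASE)


def strip_links(html):
    """Remove anchor tags but keep the text that was inside them."""
    return ANCHOR_TAG.sub("", html)
```

Note that a URL typed as plain text would still be visible in the comment; it just would not be wrapped in an anchor tag by this step.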

comment:22 Changed 2 years ago by paul

  • Add Hours to Ticket changed from 0.0 to 0.125
  • Total Hours changed from 1.375 to 1.5

Another idea is to have a team of moderators who can contribute time to check comments that need approval. I could be your first volunteer!

comment:23 Changed 2 years ago by ed

Thanks both. OK. It's a start on the comments, if not the user accounts. So:

  1. Let's proceed with the proposal from Paul/Sam? - please use this wiki page to document our proposal: https://wiki.transitionnetwork.org/Spam_accounts to get agreement *first* before we do anything on TN.org. I will need a very very clear page to refer to when we handle the grizzling from users.
  2. disabling tags in comments: would that put off bots and human spammers? It would certainly upset some honest users
  3. team of moderators: have tried variants of it in the past - notably the fora - without success: people show interest, then agree, then don't do it. am happy if paul you want to volunteer, and foresee Sam and Ed enjoying this
  4. We need to be careful about timing and communicating this change - August is a good time to do it, but ED to write up a blog ready for the newsletter, approve with Sarah, let Rob know when he's back etc...

comment:24 follow-up: ↓ 26 Changed 2 years ago by sam

Ah, slight flaw in this plan: we don't have the Rules module installed. (I was just checking whether Rules can trigger on comment publish; not 100% sure it can.)

I'll investigate whether we can do the role adding using Triggers & Actions instead (which is already installed).

Thinking about it some more, perhaps we (as a wider team) could just pre-moderate the first comment from a user that contains a URL? It would mean that most users would publish straight away, & we'd have to look at a smaller number of posts.

Not sure how we would do it technically, but seems like a good compromise between usability & spam?
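
Leaving aside how it would be wired into Drupal, the decision itself is simple. A minimal Python sketch, assuming we can tell whether the author has already had a comment approved (the flag and the URL pattern are illustrative assumptions):

```python
import re

# Rough URL detector: matches http://, https:// or www. followed by non-space characters.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def hold_for_moderation(comment_body, author_has_approved_comment):
    """Hold a comment only if it contains a URL and its author has no approved comment yet."""
    contains_url = URL_PATTERN.search(comment_body) is not None
    return contains_url and not author_has_approved_comment
```

With this rule a link-free comment from a brand new user publishes immediately, and only first-time comments containing a link land in the queue.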

comment:25 Changed 2 years ago by annesley

interesting ideas!!
i'm only vaguely following this but i have done a lot of work with triggers and actions, so let me know. i guess this is something that will change a lot when we move forward to TNv3. i wonder how WP4.0 handles spam? some research would be useful. i'll keep it on the TODO.

comment:26 in reply to: ↑ 24 Changed 2 years ago by paul

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 1.5 to 1.75

Replying to sam:

Ah, slight flaw in this plan: we don't have the Rules module installed. (I was just checking whether Rules can trigger on comment publish; not 100% sure it can.)

I'll investigate whether we can do the role adding using Triggers & Actions instead (which is already installed).

Thinking about it some more, perhaps we (as a wider team) could just pre-moderate the first comment from a user that contains a URL? It would mean that most users would publish straight away, & we'd have to look at a smaller number of posts.

Not sure how we would do it technically, but seems like a good compromise between usability & spam?

I think all the approaches mentioned above would require some custom code.

@Sam
Could you let us know how much comment spam you have seen over the last week, plus any other information that would help to build a picture of the current problem for later reference?

comment:27 Changed 2 years ago by ed

I just deleted about 15-20 comments from node types blog and Transition Network news - mostly from the first page of the listings https://www.transitionnetwork.org/admin/content/comment/recent?type=network_news
