Spam issue on community site: make profile pages not accessible for logged-out visitors/webcrawlers

As I see it, the visibility of profile pages for everyone including search engines is kind of attractive for backlink spammers. They don't even need to post any content (with the risk of making the spam more obvious resulting in fast deletion of the account). They only need to register an account and add some link to some profile field. There are so many profiles on the Elgg community site that I suspect were created for the sole purpose of backlink spamming.

Making the profile pages inaccessible to webcrawlers and logged-out visitors wouldn't change anything for the existing profile pages that contain spam links (and cleaning these annoying accounts out of the member list would be quite a lot of work). But it might at least make creating new spam accounts much less attractive, since spammers would no longer get any backlink benefit from them.

A plugin with functionality similar to my Private profile plugin (https://elgg.org/plugins/1860995) could be helpful (you wouldn't even need to let registered users restrict access to their own profile pages).
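
To make the idea concrete, here is a minimal sketch in Python (not Elgg/PHP code and not the actual plugin) of the kind of check I mean. The `/profile/` prefix, the `/login` target and the function name are assumptions for illustration only.

```python
# Hypothetical paths; adjust to the site's actual routing.
PROFILE_PREFIX = "/profile/"
LOGIN_PATH = "/login"


def gate_profile_request(path, logged_in):
    """Return (status, redirect_target) for a request.

    Logged-out visitors, which includes webcrawlers, are redirected to the
    login page when they ask for a profile page. Everything else is served
    as usual.
    """
    if path.startswith(PROFILE_PREFIX) and not logged_in:
        return 302, LOGIN_PATH
    return 200, None
```

With a check like this a crawler never gets to see the profile fields, so a link placed there would bring no backlink benefit.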

Regarding spammers posting content, changing the access to profile pages wouldn't change much. The question here is whether these spam accounts were created with Twitter credentials (not requiring email validation) or through the regular stand-alone account creation procedure. As it seems that the spam issue has stopped since the normal account creation has been disabled it would mean that registering with Twitter accounts wouldn't be the primary spam attack vector. But I'm not sure... it might be just a coincidence that it has stopped. Even so, I think that no longer allowing registration with Twitter credentials would be less of an annoyance than requiring people to join Twitter first just to be able to register an account at the community site.

The more important question is why the identification of spammers at the time of account creation seems to work less effectively than in the past (any issues with the StopForumSpam checks?).

It might also be worth investigating how to improve the "Mark as spam" functionality so that an account gets banned automatically sooner (even if only temporarily until verified by an admin). As I understand the code, there's a threshold of reports per object that needs to be reached first before the total number of reports per user is checked. This means that a certain number of trusted users would need to mark (possibly every) spam content item of a spammer before the spam account gets banned. Maybe it would be more effective to check the total number of spam reports for a user account independently of the reports per object (and, if needed, increase the number of reports per user account necessary for an automatic ban).

Avoiding false positives might then only be possible by restricting the group of trusted users to "really" trusted users (long-term members specifically selected) and possibly also by introducing a whitelist of accounts (trusted users whose content can never be marked as spam, or for whom spam reports are not counted by the automatic banning).
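
Here is a rough Python sketch of the difference between the two checks. It follows my reading of the current behaviour and is not the actual site code. The threshold values, the function names and the `(reporter, owner, object_id)` report format are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical thresholds; the values used on the community site are not known to me.
REPORTS_PER_OBJECT = 3
REPORTS_PER_USER = 5


def reports_by_object(reports, owner):
    """Collect the distinct reporters per object for one content owner."""
    grouped = defaultdict(set)
    for reporter, report_owner, object_id in reports:
        if report_owner == owner:
            grouped[object_id].add(reporter)
    return grouped


def should_ban_gated(reports, owner):
    """My reading of the current logic: the per-user total is only checked
    once at least one object has reached the per-object threshold."""
    grouped = reports_by_object(reports, owner)
    if not any(len(r) >= REPORTS_PER_OBJECT for r in grouped.values()):
        return False
    return sum(len(r) for r in grouped.values()) >= REPORTS_PER_USER


def should_ban_independent(reports, owner, whitelist=frozenset()):
    """Suggested logic: count all reports against the account, regardless of
    how they are spread over objects. Whitelisted accounts are never banned."""
    if owner in whitelist:
        return False
    grouped = reports_by_object(reports, owner)
    return sum(len(r) for r in grouped.values()) >= REPORTS_PER_USER
```

For example, if a spammer posts five items and each item gets a single report, the gated check never bans the account, while the independent check does as soon as the fifth report comes in.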

  • As it seems that the spam issue has stopped since the normal account creation has been disabled it would mean that registering with Twitter accounts wouldn't be the primary spam attack vector.

    People aren't currently able to register with Twitter either.

    make profile pages not accessible for logged-out visitors/webcrawlers

    I'm not sure how closely spammers check something like this, but it might be worth trying.

  • People aren't currently able to register with Twitter either.

    This would mean that my original suspicion that the spammers posting content are joining with Twitter credentials would still hold. Not sure about profile backlink spammers. These accounts might be created by different people, possibly even human spammers.

    I'm not sure how closely spammers check something like this, but it might be worth trying.

    I can't prove that it would help, but I have the impression (based on my own site) that walled-garden sites and/or sites that deny crawling via robots.txt are much less attractive for spammers. My guess is that spammers crawl the net on their own, too, to find interesting target sites. That may also be why the community site is so attractive for backlink spam: so many of these accounts have existed here for such a long time.

  • I'm inclined to agree. I don't think there's much value in the profiles as-is beyond highlighting some of the more active developers, for example by listing their plugins. We could go with a much pared-down profile page with a static layout (no widgets), just hard-coded sections for activity, recent plugins, or something.

  • I have disabled the registration page on most sites and load the form in a lightbox instead. We don't see many, if any, spam registrations.

    One way is to disable the profile page for all users and make it an opt-in account setting (I imagine the bots won't figure it out).

Feedback and Planning

Discussions about the past, present, and future of Elgg and this community site.