I can't help but notice that Giant Bomb has been littered with the everyday sort of scam spam in the comments and forums, such as...
my neighbor's sister-in-law makes $999999 every hour on the computer. She has been out of a job for 9999 months but last month her paycheck was $99999just working on the computer for a few hours. go to this site........ www.givemeavirus.com/
I don't know how many resources you guys want to put into filtering the spam to make Matt Rorie's job easier, but it's clear that these posts follow a very similar pattern...
Im making over $30h a month working part time. I kept hearing other people tell me how much money they can make online so I decided to look into it. Well, it was all true and has totally changed my life. This is what I do----> (website)
my friend's step-aunt makes $86/hr on the internet. She has been out of a job for nine months but last month her paycheck was $18678 just working on the internet for a few hours. browse around this web-site........ (website)
I was wondering if there was some way to program a spam filter that would automatically detect posts like these using a heuristic method.
For example, nearly every one of these tends to start with...
My (relationship to another person: aunt, neighbor, friend, etc.)
So you could build a flagging system, for example, that detects this particular format with a thesaurus or dictionary of nouns that describe a person's relationship. To avoid false positives (type 1 errors) for other posters, this is obviously going to require more conditions.
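As a rough sketch of that first check, something like the Python below could work. The word list, the "step-"/"-in-law" handling and the function name starts_with_relationship are all assumptions made up for illustration, not anything Giant Bomb actually runs, and a real dictionary would need to be much larger.

```python
import re

# Hypothetical starter list; a real filter would need a far larger dictionary.
RELATIONSHIP_NOUNS = [
    "aunt", "uncle", "neighbor", "neighbour", "friend", "sister", "brother",
    "mother", "mom", "father", "dad", "cousin", "roommate", "classmate",
]

# One relationship word, optionally "step-"/"half-" prefixed or "-in-law" suffixed.
_NOUN = (
    r"(?:step-?|half-?)?(?:"
    + "|".join(re.escape(n) for n in RELATIONSHIP_NOUNS)
    + r")(?:-in-law)?"
)

# "my <noun>" followed by any number of possessive hops ("'s <noun>").
OPENER_RE = re.compile(
    r"^\s*my\s+(?:best\s+)?" + _NOUN + r"(?:['’]s\s+" + _NOUN + r")*\b",
    re.IGNORECASE,
)


def starts_with_relationship(post: str) -> bool:
    """Return True if the post opens with 'my <relationship>...'."""
    return OPENER_RE.match(post) is not None
```

On its own this would flag plenty of honest posts ("My friend and I just finished the game"), which is exactly why it can only be one signal among several.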
So another condition is whether we detect a dollar amount $x tied to a specific unit of time.
Almost every one of them goes "makes/making (over) $x (time)"
and we can tie that to a conditional if the string contains one of the following words shortly after.
"internet, home, office, computer"
Finally, the nail in the coffin could be a website, which is ALWAYS put at the end of the string.
"www.workathome.com, etc."
In fact you don't have to make a blacklist as almost all of these spam comments put a website at the very end of the string, never in the middle.
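The "website at the very end" check needs no blacklist, so it is short. Here is a minimal sketch in Python, assuming the link starts with http://, https:// or www. (bare domains like "example.com" would slip through) and that only whitespace or closing punctuation may follow it. The name ends_with_url is made up for the example.

```python
import re

# A URL-looking token followed only by whitespace or closing punctuation.
TRAILING_URL_RE = re.compile(
    r"(?:https?://|www\.)[^\s<>]+[\s.)\]!]*$",
    re.IGNORECASE,
)


def ends_with_url(post: str) -> bool:
    """Return True if the last thing in the post is a link."""
    return TRAILING_URL_RE.search(post) is not None
```

A post that mentions a link mid-sentence and keeps talking afterwards is not matched, which fits the observation that these spam comments never put the website in the middle.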
So if all these conditions are met, it's highly likely we have spam, with little risk of type 1 errors, where legitimate users get blocked because they typed something similar.
I do realize that this is a losing battle in some cases, since they are going to get craftier about sending out their bullshit, but any attempt to automate this would certainly help the admins and make Giant Bomb a better user experience.
There is also a chance that they might give up and move on to websites that are more easily exploited in this fashion, rather than dedicate resources to subverting the spam filter.
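To show how requiring every condition keeps false positives down, here is a hedged Python sketch that puts the signals together. The patterns are simplified stand-ins written for this example: the 60-character window for "shortly after", the short relationship list, and the names SIGNALS, spam_signals and is_probable_spam are all assumptions.

```python
import re

FLAGS = re.IGNORECASE

# Simplified, hypothetical versions of the conditions described in the post.
SIGNALS = {
    # Post opens with "my <relationship word>".
    "relationship_opener": re.compile(
        r"^\s*my\s+(?:aunt|uncle|neighbor|friend|sister|brother|cousin|mom|dad)\b",
        FLAGS,
    ),
    # "makes/making (over) $x <time>" with a location word within 60 characters.
    "money_claim": re.compile(
        r"\bmak(?:es|ing)\s+(?:over\s+)?\$\s?\d[\d,]*(?:\.\d+)?"
        r"\s*(?:/|an?\s+|per\s+|every\s+)\s*(?:hr|hour|day|week|month|year)s?\b"
        r".{0,60}?\b(?:internet|home|office|computer)\b",
        FLAGS,
    ),
    # A link is the last thing in the post.
    "trailing_url": re.compile(r"(?:https?://|www\.)\S+\s*$", FLAGS),
}


def spam_signals(post: str) -> dict:
    """Report which individual signals fire for a post."""
    return {name: rx.search(post) is not None for name, rx in SIGNALS.items()}


def is_probable_spam(post: str) -> bool:
    """Flag only when every signal fires, to keep false positives low."""
    return all(spam_signals(post).values())
```

A legitimate post like "My friend makes great videos, all hosted on www.giantbomb.com" trips the opener and the trailing link but not the money claim, so it is not flagged. The per-signal report from spam_signals could also be handed to a moderator instead of blocking the post outright.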