Avoiding the splogs

by billso on Thursday, 13 March 2008

A post by WordPress founding developer Matt Mullenweg claims that 80 percent of the world’s blogs are actually spam blogs, or splogs. Kevin Burton claims this number is as high as 90 percent, and that most of the spam blogs are hosted on Google’s free Blogger service.

I use a publicly accessible blog to run my courses because it is easier for my students to reach. I’ve tried hiding blog articles behind a password-protected walled garden like Moodle or WebCT in the past, and that was more trouble than it was worth.

I devote a couple of hours each day to this site, because it’s a great place to post the examples, articles and links I discuss in my teaching and consulting engagements. Over the last year, I’ve learned a lot about how the splogosphere works.

The splog business model

Most of these splogs use a similar model: automated scripts search the Internet for keywords in legitimate blog posts and RSS feeds, including those on my web site. According to Matt’s estimates, only 8 to 14 million of the roughly 100 million blogs on the Internet are active blogs. The rest are splogs that use software to scrape the first few lines of another blog’s articles and then post the excerpt on the spam blog’s web site. Many spam blogs also try to leave trackbacks or spings (spam pings) on legitimate blogs, in an effort to draw visitors away from the real blogs.

Splog operators have thin profit margins, so they usually operate dozens or hundreds of sites. Sites earn revenue from keyword-based advertising links on their splog pages, as well as links to advertising-heavy web sites.

Not for human consumption

Most splog operators also try to get high rankings in search engines, so that Google users will see the splog articles before they find the original posts. Splogs are written for search engines, not real people, to read. Plagiarism Today ran one of the earliest articles about the splog business model, back in 2005. A Wired article came out a month earlier, and has some additional information.

Fighting the sploggers is easy

I’ve taken a few simple steps to keep sploggers from scraping my articles and leaving linkbacks to their sites, without wrecking my own web site in the process. It takes me about 10 minutes each week to manage these tools.

My comment forms require users to complete a reCAPTCHA verification form. This step eliminates almost all of the comment spam from the spam blogs that I’ve caught in my server logs. The only drawback with reCAPTCHA is that many mobile web users cannot leave comments.

I also run anti-splog software that uses an Internet-based list of known splogs to identify and quarantine spurious linkbacks.
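For readers curious about how this kind of filtering might work, here is a minimal Python sketch of the general idea. It is not the software I run. The blocklist domains, the function names and the sample URLs are all made up for illustration, and a real tool would download and refresh its list from a shared service instead of hard-coding it.

```python
from urllib.parse import urlparse

# Hypothetical blocklist for illustration only; a real tool would fetch
# and refresh this from a shared Internet-based service.
KNOWN_SPLOG_DOMAINS = {"cheap-pills.example", "scraped-feeds.example"}


def linkback_domain(url):
    """Return the lowercased host of a linkback URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_known_splog(url, blocklist=KNOWN_SPLOG_DOMAINS):
    """Return True if the host, or any parent domain of it, is on the blocklist."""
    parts = linkback_domain(url).split(".")
    # For "a.b.example" this checks "a.b.example", "b.example", then "example".
    return any(".".join(parts[i:]) in blocklist for i in range(len(parts)))


def triage(linkbacks, blocklist=KNOWN_SPLOG_DOMAINS):
    """Split linkback URLs into (approved, quarantined) lists, keeping their order."""
    approved, quarantined = [], []
    for url in linkbacks:
        (quarantined if is_known_splog(url, blocklist) else approved).append(url)
    return approved, quarantined
```

Quarantined linkbacks are set aside instead of deleted, so a legitimate blog that lands on the list by mistake can still be approved by hand.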

Another piece of software searches for splog entries based on my articles. Occasionally, I leave comments on splog posts that are based on my articles, just to let them know I caught them.
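I don’t know exactly how that software does its matching, but since splogs tend to copy the first few lines of a post, one plausible approach is to compare the opening of my article with the text of a suspect page. The Python sketch below does this with overlapping five-word phrases. The function names, the 60-word lead and the 0.5 threshold are my own assumptions, chosen only for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of lowercase word n-grams ("shingles") found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original, candidate, lead_words=60, size=5):
    """Fraction of shingles from the opening of original that reappear in candidate."""
    lead = " ".join(original.split()[:lead_words])
    lead_shingles = shingles(lead, size)
    if not lead_shingles:
        return 0.0  # the article is too short to compare
    return len(lead_shingles & shingles(candidate, size)) / len(lead_shingles)


def looks_scraped(original, candidate, threshold=0.5):
    """Flag candidate when at least `threshold` of the article's opening was copied."""
    return copied_fraction(original, candidate) >= threshold
```

A flagged page still deserves a quick look by a human, because a real blogger who quotes a long passage with credit would trip the same check.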

My posts are also copyrighted and released under a Creative Commons license, as I discussed on 21 February 2008. I love it when real bloggers link back to my articles, as long as they give me credit for my writing.
