billso.com

Bill Sodeman writes about management, mobile computing and information systems


Avoiding the splogs


Posted Thursday, 13 March 2008

A post by WordPress founding developer Matt Mullenweg claims that 80 percent of the world’s blogs are actually spam blogs. Kevin Burton claims this number is as high as 90 percent, and that most of the spam blogs are hosted on Google’s free Blogger service.

I use a publicly accessible blog to run my courses because it is easier for my students to reach. I’ve tried hiding blog articles behind a password-protected walled garden like Moodle or WebCT in the past, and that was more trouble than it was worth.

I devote a couple of hours each day to this site, because it’s a great place to post the examples, articles and links I discuss in my teaching and consulting engagements. Over the last year, I’ve learned a lot about how the splogosphere works.

The splog business model

Most of these splogs use a similar model: automated scripts search the Internet for keywords in legitimate blog posts and RSS feeds, such as the ones on my web site. There are between 8 and 14 million active blogs on the Internet, according to Matt’s estimates. The rest of the 100 million blogs are splogs. They use software to scrape the first few lines of another blog’s articles, and then post the excerpt on the spam blog’s web site. Many spam blogs also try to leave trackbacks or spings on legitimate blogs, in an effort to draw visitors away from the real blogs.

Splog operators have thin profit margins, so they usually operate dozens or hundreds of sites. Sites earn revenue from keyword-based advertising links on their splog pages, as well as links to advertising-heavy web sites.

Not for human consumption

Most splog operators also try to get high rankings in search engines, so that Google users will see the splog articles before they find the original posts. Splogs are written for search engines to read, not real people. Plagiarism Today ran one of the earliest articles about the splog business model, back in 2005. A Wired article came out a month earlier, and has some additional information.

Fighting the sploggers is easy

I’ve taken a few simple steps to keep sploggers from scraping my articles and leaving linkbacks to their sites, without wrecking my own web site in the process. It takes me about 10 minutes each week to manage these tools.

My comment forms require users to complete a reCAPTCHA verification form. This step eliminates almost all of the spam blogs that I’ve caught in my server logs. The only drawback with reCAPTCHA is that many mobile web users cannot leave comments.

I also run anti-splog software that uses an Internet-based list of known splogs to identify and quarantine spurious linkbacks.
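The list-based filtering described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the software I actually run: the domain names, the `KNOWN_SPLOG_DOMAINS` set and the function names are all made up for the example, and a real tool would fetch its list from a shared online service instead of hard-coding it.

```python
from urllib.parse import urlparse

# Hypothetical blocklist for illustration only. A real anti-splog tool
# would download and refresh this list from an online service.
KNOWN_SPLOG_DOMAINS = {"cheap-pills-blog.example", "scraped-news.example"}


def domain_of(url):
    """Return the lowercase host name of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_blocked(url, blocklist):
    """True if the URL's host is a listed domain or a subdomain of one."""
    host = domain_of(url)
    return any(host == d or host.endswith("." + d) for d in blocklist)


def triage_linkbacks(urls, blocklist=KNOWN_SPLOG_DOMAINS):
    """Split incoming linkback URLs into (approved, quarantined) lists."""
    approved, quarantined = [], []
    for url in urls:
        if is_blocked(url, blocklist):
            quarantined.append(url)
        else:
            approved.append(url)
    return approved, quarantined
```

Quarantining, instead of deleting, leaves room to rescue a legitimate linkback if a real blog ends up on the list by mistake. Note that this sketch approves any URL it cannot parse a host name from, so a stricter version might quarantine those as well.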

Another piece of software searches for splog entries that copy my articles. Occasionally, I leave comments on splog posts that are based on my articles, just to let the operators know I caught them.
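Since splogs tend to copy the first few lines of an article, one way to spot a copy is to compare the opening words of my post against the text of a suspect page. The Python sketch below does that by counting shared five-word runs. It is my own rough illustration, not the tool mentioned above; the function names, the 40-word window and the 0.6 threshold are arbitrary assumptions, and it expects the suspect page's text to have been fetched already.

```python
import re


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, size=5):
    """Return the set of every run of `size` consecutive words."""
    w = words(text)
    return {tuple(w[i:i + size]) for i in range(len(w) - size + 1)}


def excerpt_overlap(article, candidate, excerpt_words=40, size=5):
    """Fraction of the article's opening word runs found in the candidate."""
    opening = " ".join(words(article)[:excerpt_words])
    source = shingles(opening, size)
    if not source:
        return 0.0  # article too short to compare
    return len(source & shingles(candidate, size)) / len(source)


def looks_scraped(article, candidate, threshold=0.6):
    """Guess whether the candidate page copied the article's opening."""
    return excerpt_overlap(article, candidate) >= threshold
```

A page that quotes a sentence or two and links back with credit could also cross the threshold, so a match is a reason to look at the page, not proof that it is a splog.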

My posts are also copyrighted and released under a Creative Commons license, as I discussed on 21 February 2008. I love it when real bloggers link back to my articles, as long as they give me credit for my writing.

Tags: advertising, blogging, business_model, captcha, copyright, e-commerce, mobile, software, spam, teaching