Google's Tag To Remove Content Spamming

Content spamming, in its simplest form, means taking content from other sites that rank well in the search engines and then either using it as is or running it through a utility like Articlebot to scramble it until plagiarism software can no longer detect it. In either case, your good, search-engine-friendly content is stolen and used, often as part of a doorway page, to draw the attention of the search engines away from you.

Everyone has seen examples of this: the page that looks promising but contains lists of terms (like term – term paper – term papers – term limits) that link to other similar lists, each carrying Google advertising. Or the site that contains nothing but content licensed from Wikipedia. Or the site that plays well in a search but contains nothing more than SEO gibberish, often ripped off from the site of an expert and minced into word slaw.


These sites are created en masse to provide a fertile ground to draw eyeballs. It seems a waste of time when you receive a penny a view for even the best-paying ads – but when you put up five hundred sites at a time, and you've figured out how to get all of them to show up on the first page or two of a lucrative Google search term, it can be surprisingly profitable.

The losers are the people who click on these pages expecting content of worth, and you. These spammers steal your place in the top ten. Google is working hard to lock them out, but there is more that you can do to help Google.

Using The Antispam Tag

But there is another loser. One of the strengths of the Internet is that it allows for two-way public communication on a scale never seen before. You post a blog, or set up a wiki; your audience comments on your blog, or adds and changes your wiki.

The problem? While you normally have complete control over a website and its contents, sites that allow user communication take some of that control away from you and hand it to your readers. There is no way to prevent readers of an open blog from posting unwanted links, short of removing them by hand. Even then, a link can be attached to a single comma or period, making it nearly impossible to catch everything.
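To make the problem concrete, here is a rough Python sketch of how you might hunt for links whose only visible text is punctuation. The names HiddenLinkFinder and find_hidden_links are made up for this illustration and do not come from any blogging system. The sketch relies only on Python's standard html.parser module, and its output is best treated as a list for human review, not as proof of spam.

```python
import string
from html.parser import HTMLParser


class HiddenLinkFinder(HTMLParser):
    """Collect links whose visible text is empty or only punctuation.

    Hypothetical helper for illustration; not part of any blogging system.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.suspicious = []   # list of (href, visible_text) pairs
        self._href = None      # href of the link currently being read, if any
        self._text = []        # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []
        elif tag == "img" and self._href is not None:
            # An image is visible content, so do not treat the link as hidden.
            self._text.append("[img]")

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            # all() is True for an empty string, so empty links are flagged too.
            if all(ch in string.punctuation for ch in text):
                self.suspicious.append((self._href, text))
            self._href = None


def find_hidden_links(comment_html):
    """Return (href, text) pairs for links hidden in punctuation."""
    finder = HiddenLinkFinder()
    finder.feed(comment_html)
    finder.close()
    return finder.suspicious
```

Running find_hidden_links over the stored HTML of each comment gives you a short list of links to inspect, which is far quicker than reading every old comment by eye.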

This leaves you open to the accusation of link spam, for links you never put there to begin with. And while you may police your most recent several blog posts, no one polices the ones from several years ago. Yet Google still looks at them and indexes them. By 2002, bloggers everywhere were begging Google for an ignore tag of some sort to prevent its spiders from indexing comment areas.

Not only, they said, would bloggers be grateful; everyone with two-way uncontrolled communication – wikis, forums, guest books – needed this service from Google. Each of these types of sites has been inundated with spam at some point, forcing some to shut down completely. And Google itself needed it to help prevent the rampant spam in the industry.

In 2005, Google finally responded to these concerns. Though its solution is not everything the online community wanted (for instance, it discounts links in potentially good content as well as links in spam), it does at least let you flag the links in the public parts of your blog. It is the "nofollow" value of the rel attribute on a link.

"Nofollow" lets you mark individual links, whether they appear in blog comments or in paid advertising, as links that Google's spiders should not follow or credit toward the linked site's ranking. The great thing about it is that it not only keeps your rankings from suffering from spam; it also discourages spammers from filling your valuable comments section with their junk text, since the links no longer earn them anything.

The most basic use involves embedding it in a hyperlink as rel="nofollow", for example <a href="http://www.example.com/" rel="nofollow">link text</a>. This allows you to manually flag links, such as those embedded in paid advertising, as links Google's spiders should ignore. But what if the content is user-generated? It's still a problem, because you certainly don't have time to go through and mark up all those links.

Fortunately, blogging systems have been sensitive to this new development. Whether you use WordPress or another blogging system, most have either implemented automatic "nofollow" links in their comment sections or issued plugins you can install yourself to prevent this sort of spamming.
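The idea behind such a plugin can be sketched in a few lines of Python. This is only an illustration, assuming your comments are stored as HTML fragments. NofollowRewriter and add_nofollow are hypothetical names, not part of WordPress or any other blogging system, and a real plugin does considerably more, such as sanitizing the markup.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit an HTML fragment, adding rel="nofollow" to every <a href>.

    Hypothetical helper for illustration. HTML comments and doctypes are
    dropped, which is acceptable for short user comments but not for
    whole pages.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _emit_tag(self, tag, attrs, closing=""):
        attrs = list(attrs)
        if tag == "a" and any(name == "href" for name, _ in attrs):
            # Keep any existing rel tokens (e.g. "external") and add ours.
            rel = next((value or "" for name, value in attrs if name == "rel"), "")
            tokens = rel.split()
            if "nofollow" not in [t.lower() for t in tokens]:
                tokens.append("nofollow")
            attrs = [(n, v) for n, v in attrs if n != "rel"]
            attrs.append(("rel", " ".join(tokens)))
        parts = [tag]
        for name, value in attrs:
            # Attributes without a value (e.g. "disabled") are emitted bare.
            parts.append(name if value is None
                         else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closing + ">")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, closing=" /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The parser hands us unescaped text, so escape it again on output.
        self.out.append(escape(data, quote=False))


def add_nofollow(comment_html):
    """Return comment_html with rel="nofollow" on every link that has an href."""
    rewriter = NofollowRewriter()
    rewriter.feed(comment_html)
    rewriter.close()
    return "".join(rewriter.out)
```

You would call add_nofollow on each comment before displaying it, so that every link a reader posts is flagged without anyone marking it up by hand.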

This does not solve every problem. But it's a great start. Be certain you know how your user-generated content system provides this service to you. In most cases, a software update will implement this change for you.

Is This Spamming And Will Google Block Me?

There's another problem with the spamming crowd. When you're fighting search engine spam and start seeing the different forms it can take – and, disturbingly, realizing that some of your techniques for your legitimate site are similar – you have to wonder: Will Google block me for my search engine optimization techniques?

This happened recently to BMW's corporate site. Their webmaster, dissatisfied with the site's position when web users searched for several terms (such as "new car"), created and posted a gateway page: a page stuffed with optimized text for the search engines that then redirects visitors to a different, often graphics-heavy, page.

Google found it and, rightly or wrongly, promptly and manually dropped the site's PageRank to zero. For weeks, searches for their site turned up plenty of spam and dozens of news stories, but to find their actual site it was necessary to drop to the bottom of the search results, not easy to do in Googleworld.

This is why you really need to understand what Google counts as search engine spam, and adhere to their restrictions even if everyone else doesn't. Never create a gateway page, particularly one with spammish data. Instead, use legitimate techniques like image alternate text and actual text in your page. Look for ways to get other pages to point to your site – article submission, for instance, or directory submission. And keep your content fresh, always.

While duplicated text is often a sign of serious spammage, the Google engineers realize two things: first, the original text is probably still out there somewhere, and it's unfair to drop that person's rankings along with those who stole it from them; and second, certain types of duplicated text, like articles or blog entries, are to be expected.

Their answer to the first issue is to credit the site first catalogued with a particular text as its creator, and to rank sites that obviously copied from it lower. The second issue is addressed by looking at the rest of the site around the questionable text; if the entire site appears to be spam, it, too, is dropped. Provided you are not duplicating text on many websites to fraudulently increase your ranking, you're safe. Ask yourself: are you using the same content on several sites registered to you in order to maximize your chances of being read? If the answer is yes, this is a bad idea and will be classified as spamdexing. If your content would not be useful to the average Internet surfer, it is also likely to be classed as spamdexing.

There is a very thin line between search engine optimization and spamdexing. You should become very familiar with it. Start by understanding hidden or invisible text, keyword stuffing, meta tag stuffing, gateway pages, and scraper sites.


