Last week, I was lucky to join Joe Apfelbaum of Ajax Union and Ben Kirshner of Elite SEM on a panel (ably moderated by Jules Kibbe of TicketNetworkDirect) for the Ticket Summit on the recent changes in search marketing. The attendees are ticket brokers and partners that move most of the seats for entertainment and sporting events in the U.S., so you can imagine that they have a fierce interest in search marketing. It fell to me to explain the dreaded Google Panda update of its search ranking algorithm. I say “dreaded” because so many people have treated this latest reshuffling of the search results as something approaching apocalyptic disaster. If it has been a nightmare for you, my condolences, but there’s no going back, so we all need to understand the idea behind Panda and we might need to change the way we think to succeed in the brave new Panda world.
First off, Panda isn’t named after a bear; it is actually the surname of the Google engineer whose ideas lay behind it. And, although it is about to celebrate its first birthday, it isn’t a single event wrapped up in the past. Google Panda has ushered in a series of changes over the past year, with a couple of ranking algorithm updates interspersed with more regular changes in the data that it depends on.
Panda is revolutionary because it adds a new ranking factor to Google’s algorithm: a quality score based on human raters who decide, for example, whether a site would be worth visiting again. Dozens of human raters might visit the same site, and Google averages their answers. High-quality sites get boosted in the rankings; lower-quality sites, well, not so much.
Now, this wouldn’t be terribly interesting if that were all there was to it. For even Google, with its vast resources, can’t afford to pay human raters to visit all the sites that reside on the Web, not when they need many raters to judge each site and when those sites change regularly and need to be re-rated. No, they needed something a lot cheaper than that approach.
Enter machine learning, a technology that looks for patterns in data. Instead of using human beings to rate every site, Google had them rate a small number of sites and then applied those ratings to all the unrated sites that were similar to the rated ones. So, if your site wasn’t rated, but it has the same characteristics as sites that are low in quality, your site will be treated as low in quality.
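To make "similar to the rated sites" a bit more concrete, here is a minimal sketch in Python. It is not Google’s method, which has not been published. It assumes a simple nearest-neighbor approach, and the two features (share of the page taken up by ads, share of duplicated text) and all the numbers are invented for illustration.

```python
import math

# Hypothetical features per rated site:
# (share of page taken up by ads, share of duplicated text).
# Features, numbers, and labels are all made up for this example.
rated_sites = [
    ((0.10, 0.05), "high"),
    ((0.15, 0.10), "high"),
    ((0.60, 0.70), "low"),
    ((0.70, 0.80), "low"),
]

def predict_quality(features, rated=rated_sites):
    """Give an unrated site the label of its most similar rated site."""
    nearest = min(rated, key=lambda item: math.dist(features, item[0]))
    return nearest[1]

# An unrated site that looks a lot like the low-quality examples.
unrated_site = (0.65, 0.75)
verdict = predict_quality(unrated_site)
```

The unrated site never saw a human rater, but it sits closest to the sites that raters marked down, so it inherits their label.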
You probably want to know what patterns Panda is looking for, so that you can avoid them, but no one is saying. In fact, the very way that the algorithm works makes it a difficult question to answer. Machine learning algorithms are trained with some of the human data that Google collected, and then tested on the rest of the data. So the algorithm keeps trying to find more and more patterns until it can actually predict the answers that the human beings gave. At that point, the algorithm is unleashed on pages that have not been rated, assuming that the training it received against known answers will now allow it to predict the quality level of sites that have not been rated.
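The train-then-test idea can be sketched in a few lines of Python. Again, this is an illustration under my own assumptions rather than anything Google has disclosed: the single feature (share of duplicated text), the ratings, and the simple cutoff rule are all hypothetical.

```python
# Hypothetical human-rated data: (share of duplicated text, rater verdict).
rated = [
    (0.05, "high"), (0.10, "high"), (0.20, "high"), (0.30, "high"),
    (0.60, "low"), (0.70, "low"), (0.80, "low"), (0.90, "low"),
]

# Train on part of the rated data and hold the rest back for testing.
train = rated[::2]
test = rated[1::2]

def accuracy(threshold, examples):
    """Fraction of examples where the rule 'above the threshold means low'
    agrees with the human raters."""
    hits = sum(
        ("low" if share > threshold else "high") == label
        for share, label in examples
    )
    return hits / len(examples)

def fit_threshold(examples):
    """Try each observed value as a cutoff; keep the one that best matches the raters."""
    candidates = [share for share, _ in examples]
    return max(candidates, key=lambda t: accuracy(t, examples))

threshold = fit_threshold(train)            # learned from the training half only
test_accuracy = accuracy(threshold, test)   # checked against answers it never saw
```

On this made-up data the learned cutoff matches the training half perfectly but gets one of the four held-out sites wrong, which is a fair reminder that a rule that predicts the raters well is still not a rule that predicts them always.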
What this means is that, for the first time, what human beings think of Web pages is an explicit ranking factor. So, if you’ve been just following some rote rules about how to optimize for search, you might be in trouble if people don’t actually like your pages. This is, alas, the fate of most search optimizers who are only trying to feed the Google beast what it wants, instead of creating a quality experience for searchers. Those that give searchers what they want are now being rewarded more than ever.
Google is believed to be going after so-called “content farms” with Panda: low-quality sites produced at low cost by hack writers. But some marketers worry that other sites are affected, too. Google reassured marketers that merely having a repeated product description from the manufacturer is not considered content scraping, but searchers might find it to be a low-quality experience when they have to look through so many stores and keep reading the same information.
Does this mean that Panda never downgrades a site unfairly? Hardly. All of this technology is imperfect, although Google is constantly tinkering with the training data and algorithms. In fact, Google is collecting lots of data from people pressing +1 buttons, and might find someday that those are all the human raters that they need–and they won’t have to pay anyone.
So, many more changes are still ahead. And if Google’s Panda update is successful, you’ll see Bing go in that direction, too, affecting another 30% of U.S. searches. And who knows how Panda might evolve in the future? To check out all my slides from the event, take a peek at “Google Panda Update” on Slideshare.