Archive for May, 2009

The Archaeology of Analysis… Ya Digg?

Once you start really getting into quantitative analysis, you start to realize that there is virtually no limit to the types of data you can analyze. Social media sites, like Facebook, Twitter and Digg, offer the curious mind a plethora of information that could be studied for SEO purposes. And data aggregators and mashups just add to the fun.

One such data mashup is called Archaeologist, a third-party Web application that catalogs the most popular Digg articles over the course of a year.


We utilized the data from Archaeologist in conjunction with our quantitative data analysis tools to see if there were any patterns in the past year’s most “dugg” stories.

Most of what we found was obvious. Stories about Barack Obama, George Bush and Sarah Palin were prevalent in the top Diggs. But some of what we discovered is not so obvious.

Most of us who frequent social media sites know that the day’s top political news is always going to be popular. And the more sensational, surreal or salacious, the more popular the story will be on these types of sites.

But what isn’t obvious, and what the data show, is that the most prevalent significant word in the past year’s Diggs is “pic.” (By significant I mean words other than “digg,” “post,” “read” and the like.) It appears more often than “Obama,” “Bush,” or “Palin.” Assuming that “pic” is shorthand for “picture,” what does this tell us, if anything, about the nature of social media? And an even more important question for me personally: What does this tell us about social media marketing strategy, since that is my whole purpose, as a professional SEO, for gathering the data in the first place?

Keep in mind, I’m mainly focusing on the terms in the “center slice” of the Bell Curve; that is, in this case, terms with a density score of 0.21% or greater. Among these, “video,” “news,” and “best” were really the only other significant words with high density ratios.
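For readers who want to try this kind of count themselves, here is a minimal Python sketch of a keyword density report. It is not the tool we used; the ignore list, the choice to compute density after dropping ignored words, the function names, and the sample titles are all assumptions made up for illustration. Only the 0.21% cutoff comes from the analysis above.

```python
import re
from collections import Counter

# Hypothetical ignore list: site words such as "digg", "post", "read",
# plus a few common English filler words.
IGNORED = {"digg", "post", "read", "the", "a", "of", "to", "and", "in"}


def keyword_density(texts, ignored=IGNORED):
    """Return {word: share of all counted words}, with ignored words dropped."""
    words = []
    for text in texts:
        # Lowercase and split on anything that is not a letter, digit or apostrophe.
        words.extend(re.findall(r"[a-z0-9']+", text.lower()))
    kept = [w for w in words if w not in ignored]
    counts = Counter(kept)
    total = len(kept)
    return {w: c / total for w, c in counts.items()} if total else {}


def above_threshold(density, threshold=0.0021):
    """Keep terms whose density is at least the threshold (0.21% by default)."""
    return sorted(
        ((w, d) for w, d in density.items() if d >= threshold),
        key=lambda item: item[1],
        reverse=True,
    )


# Made-up sample titles, for illustration only (not real Digg data).
titles = [
    "Obama wins the election (pic)",
    "Best video of the year",
    "Funny cat pic",
]
density = keyword_density(titles)
top_terms = above_threshold(density)
```

With only three short titles every word clears the cutoff, so the threshold only becomes useful once you feed in a year’s worth of headlines.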

Well, none of this is too enlightening, is it? Using some basic quantitative analysis I’ve merely confirmed the obvious. It seems almost inescapable that stories that will make it into the “most popular” category on a social media site will be those consisting of video-based and picture-based news covering the most “important” news of the day, according to the public.

SEO Tip: When thinking about a social media marketing campaign, on-page content about popular news will have a much better chance of bringing visibility to your site. Consider utilizing alternative news media tools such as video blogs, video podcasts, and photo galleries on your website and submit these to social media sites. Also think about how to tie in your products and services with the popular news stories of the day.


Maiden Voyage


Well, this is the maiden voyage of a brand new brainchild of our SEO department. I can’t seem to settle on the name. Sometimes I think it should be called Qualitative SEO Analysis; other times I like it the way it is. Just one of the many details that need to be settled upon, I guess.

The difficulty in choosing the name lies in the fact that this blog is about both qualitative and quantitative data analysis, as applied to SEO data. We will be looking at SEO data from a mathematical perspective and then using that perspective to derive qualitative results.

For example, let’s say we want to study the characteristics of the first 10 Google search results for the keyword phrase “piano lessons nj.” Let’s pick one factor at random, say, number of pages in the domain. When we do so, we find that the average number of pages of the top 10 search results for this phrase is 105+. That is the quantitative part.

But then comes the qualitative part. This involves asking ourselves what, if anything, that number means. Is it significant that the average number of web pages among the top 10 search results for a given phrase is over 100? Maybe not. After all, this is just one search term, and we’re only looking at 10 websites. Any statistician will tell you that a sample this small isn’t statistically significant.

But if we were to crunch the numbers for a hundred different search terms, analyzing 1,000 or more websites, and the same averages held, then we might, in fact, be onto something significant, statistically speaking. As far as SEO policy goes, such data might lead us to advise our clients to enlarge their sites (if fewer than 100 pages) or perhaps reduce their sites (if above 100 pages).
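To make the sample-size point concrete, here is a rough Python sketch that reports an average together with a margin of error. The page counts are made up for illustration (they are not the real “piano lessons nj” results), the function name is hypothetical, and the 1.96 multiplier assumes a normal approximation, which is itself shaky for only 10 sites.

```python
import math
import statistics


def mean_with_margin(values, z=1.96):
    """Return (mean, margin), where margin is a rough 95% half-width."""
    mean = statistics.mean(values)
    if len(values) < 2:
        # A single observation tells us nothing about the spread.
        return mean, float("inf")
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * std_err


# Made-up page counts for the top 10 results of one search phrase.
one_query = [12, 40, 95, 310, 8, 150, 60, 220, 35, 120]
mean, margin = mean_with_margin(one_query)

# The same spread observed across 1,000 sites gives a far tighter margin.
many_mean, many_margin = mean_with_margin(one_query * 100)
```

The average is the same in both cases, but the margin shrinks as the sample grows, which is why a pattern that holds across a hundred search terms deserves more trust than one seen in a single top 10.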

Or, say we discovered that the average PageRank of the first 10 search results for a random phrase was 3.8. We can conduct the same mathematical research and ask the same qualitative questions. As an SEO, I look for patterns in data. I want to know whether striving to achieve a PageRank > 3 is worthwhile. I want to know what the probability is of ranking on Page 1 for a given term by virtue of achieving such a PageRank.
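One simple way to put a number on that last question is to count, among the sites with a PageRank above 3, the share that appear on Page 1. The Python sketch below does just that. The observations are invented for illustration and the function name is hypothetical; note also that a share like this shows an association, not that the PageRank caused the ranking.

```python
def share_on_page_one(observations, min_pagerank=3):
    """Estimate P(Page 1 | PageRank > min_pagerank) as a simple proportion."""
    qualifying = [
        on_page_one
        for pagerank, on_page_one in observations
        if pagerank > min_pagerank
    ]
    if not qualifying:
        return None  # no sites above the cutoff, so nothing to estimate
    return sum(qualifying) / len(qualifying)


# Made-up (PageRank, ranked on Page 1?) pairs, for illustration only.
observations = [
    (5, True),
    (4, True),
    (4, False),
    (2, False),
    (3, False),
    (6, True),
    (1, True),
]
estimate = share_on_page_one(observations)
```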

But let’s take this a step further. Let’s say I Google the phrase “how to increase my site’s pagerank.” As I browse the top 10 results, I get the idea to perform some quantitative analysis on the sites delivered. So, I run a keyword density report of the first 10 results. I don’t know exactly what I’m looking for; like a scientist looking to make a discovery, I’m just sifting through data, taking it apart and putting it back together in various ways.

When I run these reports, I discover that the most frequently occurring word in the top 10 sites is, not surprisingly, “pagerank.” In fact, the word “pagerank” has a density ratio of 20 percent. That is, in the top 10 sites returned for the phrase “how to increase my site’s pagerank,” the term “pagerank” accounts for 20 percent of the words counted. Right under that are the terms “google,” “seo,” and “rank.” Also not surprising.

Now, why is this important? Well, think about the history of SEO. (I know it’s brief, but indulge me for a moment.) Back in the early days, SEOs were all about keyword density. “You’ve got to have the right keyword density to rank well for a term,” some said. “Don’t forget to look at your keyword density,” others shouted. Then, the concept went out of vogue for a while. Extraordinarily erudite minds such as Dr. Edel Garcia (supposedly) debunked the theory that many SEOs held dear; namely, that keyword density has anything to do with ranking. (Read Dr. Garcia’s critique here.) I myself haven’t heard the term “keyword density” much at all in SEO circles lately. And so it goes.

However, our brief, unfinished quantitative analysis of the keyword density of the top 10 search results for a given term - taken by itself - would seem to debunk … the debunkers. Again, it is not statistically significant, nor is it as rigorously mathematical as Dr. Garcia’s excellent work. I mean, we’d have to take into account the growing influence of personalized search and semantic web algorithms. Nonetheless, it does make you want to double check the good professor’s math. It does beg for a second (or third) look.

It is our hope here at Gnosis Arts that this blog will spur on that “second look” that sharpens all our research. If nothing else, I hope the topics presented herein will motivate all of us as SEOs to attain the level of rigor of a Dr. Garcia, and to become a bit more data-driven than we currently are. Our clients deserve nothing less.

Additional Reading:

  1. Study and Analysis of User Queries on the Web, Penn State University
  2. Search the Web: A Survey of Excite Users, Spink, Bateman & Jansen
  3. Health-Related Searches on the Internet, Journal of the American Medical Association