Science of Swarms

How Degree-of-Separation Search Works

The degree-of-separation search works by building a network map displaying the linking structure of a list of Web sites returned in response to a Google query. For example, a search to get the betweenness of "Hillary Clinton" works as follows:

  1. Starts by entering the search string "Hillary Clinton" into Google.
  2. Take the top ten, or another small number of Web sites returned to query "Hillary Clinton".
  3. Get the top ten, or another small number of Web sites pointing to each of returned Web sites in step 2 by executing a "link:URL" query, where URL is one of the top ten Web sites returned in step 2. The Google "link" query returns the "significant" Web sites linking to a specific URL. For Google "significant" means that the linking Web sites themselves are linked by other Web sites with a page rank larger than 0.
  4. Get the top ten Web sites pointing to each of returned Web sites in step 3. Repeat step 4 up to desired degree of separation from the original top ten Web sites collected in step 2. Usually it is sufficient, however, to stop here at step 4.

Figure 1. Degree-of-separation Web search for query "Hillary Clinton"

Figure 1 illustrates the network map returned to the query "Hillary Clinton". The level-1 nodes are the ones connected directly to the query, i.e. the original search results. Level-2 nodes are the most highly ranked search results returned by the "link" query, to each of the top ten level-1 nodes. Level-3 nodes are the most highly Google-ranked nodes returned by the "link" queries of each of the level-2 nodes. Figure 1 already gives a visual overview of the betweenness of each of the level-1 and level-2 nodes. The more links a node has pointing to it, the more between it is. For example the node labeled http://clinton.senate.gov is linked by a group of level 2 nodes which themselves are linked by groups of level-3 nodes. This indicates that the node http://clinton.senate.gov will have fairly high betweenness itself.

Betweenness ranges from 0, for nodes which are totally peripheral, to 1, for nodes which are on all shortest paths. The most between node in figure 2 is the search query "Hillary Clinton" itself, with a value of 0.61. The second most between node is indeed, as figure 2 illustrates, http://clinton.senate.gov with a betweenness value of 0.36. Some other high-betweenness nodes are www.ovaloffice2008.com and www.hillaryclinton.com.

Figure 2. Betweenness of Web Sites indicated by size of square or circle

Top Google search results do not necessarily have highest betweenness centrality. Google sorts search results by its patented "Page Rank" algorithm, which looks at what Web pages link back to a particular page. It also weights the links to the page by the page rank of the originating page. In terms of social network analysis Google measures the in-degree of a page, that is the number of incoming links. Page rank is a local algorithm, because it only factors in all the nearest neighbors of the page it is measuring. It includes page-rank of the neighbors, weighting incoming links higher from sites that themselves have a high page rank. Betweeness, on the other hand, is basically a global concept, because it looks at all the shortest paths within the entire network that are going through a particular site. Frequently, a node which has a high page rank will also have high betweenness, but this is not necessarily the case.

Latest from the Blog

Predicting the World Cup 2010 Winner — Recently I was giving a presentation about Coolhunting through Swarm Creativity at the CRM Forum in Zurich. As there was the Soccer World Championship at that time in South Africa, at the...… Read More

Predicting Stock Market Indicators Through Twitter "I hope it is not as bad as I fear" — We have been working on trying to predict market indicators for quite some time by analyzing Web Buzz, predicting who will win an Oscar, or how well movies do at the box...… Read More

Read More from the Blog

Subscribe to the RSS Feed
  • The MIT association is very reassuring, their approach was very methodical, confident, patient, helpful and encouraging. Outstanding team, focused, conceptually easy, quick at negotiating time and delays, great class, simple and user friendly despite the complex systems.
    Maggie Palazzolo, Research Coordinator
  • Excellent experience overall. Correspondence was kind, fast and helpful mostly via email; we suggested new features as needed and they were added quickly.
    Daniel Oster, Assistant to the CEO
  • There are many others who say they can do similar things, but I don't think they are actually capable. GalaxyAdvisors is different because many others don't use the social network methodology to learn what people are talking about.
    Daniel Oster, Assistant to the CEO
  • Very Unique. There are no others like them! We trust galaxyadvisors and feel that relationship is critical, as we are not social network analysis experts and need their help and understanding.
    Mirjam Hauser, GDI Analyst