Google: Facts May Replace Links as a Ranking Signal

According to a research paper recently published by a team at Google, web content may soon be ranked based on the accuracy of the information it provides. Currently, Google relies on exogenous signals, such as the quantity and quality of the links pointing to a page, to measure authority and rank web pages. The proposed shift toward endogenous signals would instead fact-check the information a website provides in order to determine how trustworthy that site is.

If you’re wondering how Google is capable of discerning fact from fiction, the answer is quite simple: Google has spent years harvesting facts from the web. When Google introduced the Knowledge Graph back in 2012, they said it would be used to integrate factual information about people, places, and things into search results in order to improve relevance. At the time, Google claimed to have compiled over 3.5 billion facts about 500 million different objects.
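To make "facts" concrete: knowledge bases like this typically store each fact as a subject–predicate–object triple. Here's a minimal sketch of that idea in Python; the data and function names are purely illustrative, not Google's actual schema.

```python
# A "fact" is commonly modeled as a (subject, predicate, object) triple.
# These example triples are illustrative, not real Knowledge Graph data.
facts = {
    ("Paris", "capital_of", "France"),
    ("Eiffel Tower", "located_in", "Paris"),
    ("Python", "designed_by", "Guido van Rossum"),
}

def knows(subject, predicate, obj):
    """Return True if the triple exists in our toy fact store."""
    return (subject, predicate, obj) in facts

print(knows("Eiffel Tower", "located_in", "Paris"))   # True
print(knows("Eiffel Tower", "located_in", "London"))  # False
```

Billions of triples like these are what give Google something to check a web page's claims against.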

Now Google is leveraging its growing database to verify information on the web. Google refers to the concept as Knowledge-Based Trust (KBT). Basically, Google would use its Knowledge Graph to cross-reference the facts provided by a website. If a specific web page doesn’t contain enough facts, Google would dig deeper into the site to find other pages whose facts can lend credibility to the site as a whole.
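As a rough mental model, you can think of KBT as scoring a page by how many of its extracted triples a trusted fact store confirms. This is a simplified sketch of the concept, not Google's actual algorithm (the paper jointly models extraction errors and source accuracy); the `min_triples` fallback mirrors the "dig deeper into the site" behavior described above.

```python
def kbt_score(extracted_triples, fact_store, min_triples=5):
    """Toy trust score: the fraction of a page's extracted triples
    confirmed by a trusted fact store. Returns None when the page
    yields too few triples to judge on its own, in which case other
    pages on the same site would be consulted."""
    if len(extracted_triples) < min_triples:
        return None  # not enough evidence on this page alone
    confirmed = sum(1 for t in extracted_triples if t in fact_store)
    return confirmed / len(extracted_triples)

fact_store = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Canberra", "capital_of", "Australia"),
    ("Rome", "capital_of", "Italy"),
    ("Lisbon", "capital_of", "Portugal"),
}

page_triples = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Sydney", "capital_of", "Australia"),  # wrong: it's Canberra
    ("Rome", "capital_of", "Italy"),
    ("Lisbon", "capital_of", "Portugal"),
]

print(kbt_score(page_triples, fact_store))  # 0.8
```

A page asserting one false capital out of five checkable claims scores 0.8; a page with fewer than five extractable triples gets no score at all.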

One thing I found interesting about all of this is that PageRank does not always correlate with trustworthiness. From the paper:

We consider the 15 gossip websites listed. Among them, 14 have a PageRank among top 15% of the websites, since such websites are often popular. However, for all of them the KBT are in the bottom 50%; in other words, they are considered less trustworthy than half of the websites.

When determining trustworthiness, the research team was primarily focused on four core factors:

Triple correctness: whether at least 9 triples are correct.

Extraction correctness: whether at least 9 triples are correctly extracted (and hence we can evaluate the website according to what it really states).

Topic relevance: we decide the major topics for the website according to the website name and the introduction in the “About us” page; we then decide whether at least 9 triples are relevant to these topics (e.g., if the website is about business directories in South America but the extractions are about cities and countries in SA, we consider them as not topic relevant).

Non-trivialness: we decide whether the sampled triples state non-trivial facts (e.g., if most sampled triples from a Hindi movie website state that the language of the movie is Hindi, we consider it as trivial).

It will definitely be interesting to see where all of this goes. Links have always been a core ranking factor for Google, but ever since Penguin, SEOs have been focusing more on creating quality content. Google has also started to incorporate many more UX ranking signals, so it makes sense for Google to adjust its algorithm toward a more effective means of vetting and ranking content on the web, delivering a more relevant user experience.

Have any interesting theories about where Google is heading with all of this? I’d love to hear your thoughts. Feel free to leave a comment below.
