As reported in the Wall Street Journal, Google is planning to revamp search and feature more facts and information generated from semantic technologies. Recently it grew its database of named entities (people, places, and things) from 12 million to over 200 million. The forthcoming changes have the potential to affect billions of search queries. It’s Google’s effort to get way ahead of Bing and keep up with Siri on Apple’s iPhone 4S.
Google and Apple Compete to Dominate the Future of Search
Siri, the voice-activated personal assistant millions of iPhone users have grown to love, is actually the first commercial product from the Pentagon’s Personalized Assistant that Learns (PAL) project – the largest artificial intelligence project in U.S. history. The project’s mission is to create an intelligent “cognitive assistant” to help the U.S. military operate more effectively, as depicted in this Defense Department video. SRI International, a California-based research and development company, leads efforts for the project. It spun out Siri as a commercial application and eventually sold it to Apple.
Apple’s ambitions for Siri span far beyond military applications. It hopes to create virtual personal assistants for the masses. Imagine being able to strike up a conversation anywhere, anytime with an all-knowing computer like the one on Star Trek. Amit Singhal, head of Google’s search algorithm team, calls this his dream. Apple has harbored ambitions to build what amounts to Star Trek’s computer for years. In 1987, it released this video:
In my opinion, something like the Knowledge Navigator is the future of search. Google and Apple are in a race to dominate that future. Search is entering a new era, moving beyond keywords and blue links toward concepts and task completion. Concepts are at the heart of semantic technologies like Siri and Google’s knowledge graph.
Understanding Google’s Search Refresh through Siri
To understand how Google’s search refresh might work, let’s look under the hood of Siri. Siri is built upon a huge database of concepts all mapped together into various relationships. In computer science circles, this is known as an ontology. Siri’s founder, Adam Cheyer, built the program upon a foundation he calls “active ontologies”.
In ontologies, concepts have attributes that describe their characteristics. For example, a “movie event” is a concept that has attributes like movies, show times, theaters, actors, genres, ratings, and locations that describe what a “movie event” is. It’s these concepts and attributes that give meaning to otherwise meaningless text.
When you say, “show me the best movies near here at 8pm”, Siri (hopefully) pulls up a list of top-rated movies near your location starting at 8pm or later. To accomplish this feat, it homes in on certain keywords from your input like “best”, “movies”, “near here”, and “8pm”, then maps those onto corresponding concepts like ratings, movie events, locations, and show times from its extensive database.
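To make that mapping step concrete, here is a minimal sketch in Python. It is not how Siri is actually implemented; the lookup table and the `map_to_concepts` function are hypothetical, and only the phrases and concept names come from the example above.

```python
# Hypothetical phrase-to-concept table. The pairs mirror the example in the
# text; a real system would draw on a far larger ontology.
PHRASE_TO_CONCEPT = {
    "best": "rating",
    "movies": "movie event",
    "near here": "location",
    "8pm": "show time",
}


def map_to_concepts(utterance):
    """Return {phrase: concept} for every known phrase found in the utterance."""
    text = utterance.lower()
    return {
        phrase: concept
        for phrase, concept in PHRASE_TO_CONCEPT.items()
        if phrase in text
    }


if __name__ == "__main__":
    print(map_to_concepts("Show me the best movies near here at 8pm"))
```

Simple substring matching is used here only to keep the sketch short. Anything beyond a toy would need proper language parsing.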
You can see graphical representations of ontologies for a movie event and a meal event. From the image, we see that a “movie event” is a type of the generic “event” concept that has an “event date”, a “movie”, and a “theater”. This breaks down into further levels of granularity until a “movie event” is exhaustively modeled and defined. After Siri maps your input onto this conceptual model, it mashes up matching data from various web services to present a response (some have protested its accuracy).
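The “movie event is a type of event” relationship can be sketched as a small data structure. This is only an illustration under my own assumptions: the `Concept` class, the `ONTOLOGY` table, and the `all_attributes` helper are hypothetical names, and which attribute sits on which level is my guess from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """A node in a toy ontology: a name, an optional parent, and attributes."""
    name: str
    parent: Optional[str] = None
    attributes: List[str] = field(default_factory=list)


# Hypothetical two-level ontology based on the movie event example.
ONTOLOGY = {
    "event": Concept("event", attributes=["event date"]),
    "movie event": Concept(
        "movie event", parent="event", attributes=["movie", "theater"]
    ),
}


def all_attributes(name):
    """Collect a concept's own attributes plus those inherited from its parents."""
    attrs = []
    while name is not None:
        concept = ONTOLOGY[name]
        # Prepend so that inherited attributes come first in the result.
        attrs = concept.attributes + attrs
        name = concept.parent
    return attrs


if __name__ == "__main__":
    print(all_attributes("movie event"))
```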
Semantic Markup: The New Search Engine Optimization
How does Siri know which web services to call upon for specific questions? And what does this tell us about Google’s potential approach to the same problem? In a nutshell, Siri uses semantic markup from web service APIs to identify bits of information that match its conceptual models. Given the state of technology in this area, Google will likely follow a similar path.
In fact, Google has already begun to prepare us for its upcoming changes by emphasizing rich snippets and semantic markup starting in 2009. Last June, Google introduced schema.org in collaboration with Bing and Yahoo! to define a single semantic markup vocabulary across search engines. Search results have already begun to change.
For example, when you search for “sore throat”, Google taps into its database of named entities to show a list of conditions you might have, eliminating several steps (and website visits) in a typical health search session:
Preparing Your Site for Google’s Makeover
If you have a site that focuses on people, places, or things (e.g. products), then you should get familiar with the schemas at schema.org and incorporate relevant semantic markup into your content. Follow these steps to prepare for Google’s upcoming search refresh.
1. Identify named entities in your keywords
Named entities are recognized phrases that describe a particular person, place, or thing. They often answer the questions of: “who”, “what”, “where”, and “when”. Named entities are some of the most frequently searched patterns on the web. They make up the bulk of informational query types, which in themselves account for between 40% and 80% of all search queries. Think of named entities as searches that bring back results from Wikipedia. Chances are good your site covers named entities in some way.
To identify named entities relevant to your site, examine keywords from internal web analytics or from competitive intelligence services. Look for mentions of products, people, geographic locations, organizations, brands, creative works, events, and just about any other type of person, place, or thing and add them to a list.
Mining keywords can be a time-consuming and tedious task. We offer a keyword tool that makes it easier to sift through search data and pull out named entities.
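If you want to try a first pass by hand, a simple lookup against a list of known entities goes a long way. The sketch below assumes you maintain your own entity list; the `GAZETTEER` entries and the `find_entities` function are hypothetical and only show the idea.

```python
import re

# Hypothetical entity list mapping lowercase phrases to a broad entity type.
GAZETTEER = {
    "iphone 4s": "Product",
    "siri": "Product",
    "new york": "Place",
    "amit singhal": "Person",
}


def find_entities(keyword):
    """Return (phrase, entity type) pairs for known entities in a keyword."""
    text = keyword.lower()
    found = []
    for phrase, entity_type in GAZETTEER.items():
        # Word boundaries stop "siri" from matching inside "desiring".
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.append((phrase, entity_type))
    return found


if __name__ == "__main__":
    for keyword in ["buy iPhone 4S in New York", "cheap flights"]:
        print(keyword, "->", find_entities(keyword))
```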
2. Find relevant schema types at schema.org
Once you’ve identified named entities from your keywords, match your list with relevant schema types at schema.org. Schema.org has numerous schema types covering everything from local businesses to job postings and geographic shapes. If you sell products, you might choose the “Product” and “Offer” types to describe your inventory.
3. Mark up your content with microdata
After you’ve selected the proper schema types, it’s time to mark up your content with microdata. Follow the instructions in the getting started guide to transform your HTML into rich semantic microdata that gives search engines new insights into what your content is all about.
Continuing with the products example, you would mark up content on product detail pages with microdata properties such as images, brand, manufacturer, model, ratings, availability, condition, price, reviews, and sellers.
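As a rough illustration, here is a small Python sketch that renders a product as HTML with microdata. The `product_microdata` function and the sample values are hypothetical. The `itemscope`, `itemtype`, and `itemprop` attributes and the “Product” and “Offer” types come from the microdata syntax and schema.org, but check property names against schema.org before relying on them.

```python
from html import escape


def product_microdata(name, brand, price, currency):
    """Render a minimal Product with a nested Offer as microdata HTML."""
    return (
        '<div itemscope itemtype="https://schema.org/Product">\n'
        f'  <span itemprop="name">{escape(name)}</span>\n'
        f'  <span itemprop="brand">{escape(brand)}</span>\n'
        '  <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">\n'
        f'    <span itemprop="price">{escape(price)}</span>\n'
        f'    <meta itemprop="priceCurrency" content="{escape(currency)}">\n'
        '    <link itemprop="availability" href="https://schema.org/InStock">\n'
        '  </div>\n'
        '</div>'
    )


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(product_microdata("Acme Anvil", "Acme", "19.99", "USD"))
```

This sketch covers only a handful of the properties listed above and hard-codes availability as in stock. A real template would take the rest from your product data.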
4. Test with Google’s rich snippet tool
When you’ve finished marking up content with relevant schema types, test your work with Google’s rich snippet testing tool. It will show you all the semantic markup data it can read from your page and identify any gaps or errors.
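Before you reach for the testing tool, you can run a quick local sanity check of your own. The sketch below is not a substitute for Google’s tool and makes no attempt to validate against schema.org. It only lists which `itemprop` values appear on a page, using Python’s standard `html.parser` module. The `ItempropCollector` class and `missing_properties` function are hypothetical names.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect every itemprop and itemtype value seen in start tags."""

    def __init__(self):
        super().__init__()
        self.itemprops = []
        self.itemtypes = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemprop" in attributes:
            self.itemprops.append(attributes["itemprop"])
        if "itemtype" in attributes:
            self.itemtypes.append(attributes["itemtype"])


def missing_properties(html, required):
    """Return the required itemprop names that never appear in the HTML."""
    collector = ItempropCollector()
    collector.feed(html)
    return [prop for prop in required if prop not in collector.itemprops]


if __name__ == "__main__":
    sample = (
        '<div itemscope itemtype="https://schema.org/Product">'
        '<span itemprop="name">Acme Anvil</span></div>'
    )
    print(missing_properties(sample, ["name", "price"]))
```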
No matter what you believe about Google’s coming search changes, one thing is clear: users are doing billions of searches looking for answers to a broad base of informational queries. Many are coming away with less-than-satisfying results or ending up at Wikipedia.
The 300 million search clicks arriving at Wikipedia every month represent a lot of unmonetized traffic. I wouldn’t be surprised if Google wants in on the action. Plus, Google refuses to leave an opening for a new search leader (e.g. Apple) to emerge. It has to maintain its lead by adapting to new technologies and user needs. And this means changing search for the better.
With the tips presented here, you should be able to prepare your site for the kinds of changes Google is boasting about in its latest PR blitz.
The opinions expressed in this post are those of the author and don’t necessarily reflect those of Marketing Pilgrim.
Anthony Long is product architect of Concentrate, a search analytics and keyword tool that transforms search data into actionable intelligence. He is former director of SEO at AOL, where his accomplishments included adding over 190 million new SEO entries to AOL web properties over the span of five years.