Marketing Pilgrim's "Search Marketing" Channel

Understanding Natural Language Processing for SEO



Not long ago we got word that a new search engine will launch in May that will rely heavily on Natural Language Processing (NLP). And we have even heard Eric Schmidt, CEO of Google, hint that the search giant would be implementing a greater emphasis on NLP:

Wouldn’t it be nice if Google understood the meaning of your phrase rather than just the words that are in that phrase? We have a lot of discoveries in that area that are going to roll out in the next little while.

As search marketers it’s important that we stay on top of the world of search technologies and trends. Therefore I thought it would be a good idea to learn the basics of NLP. For this I have turned to a true expert in the field. Marie-Claire Jenkins (a.k.a. CJ) is completing a PhD in Natural Language Processing and Artificial Intelligence at the University of East Anglia. She is an extremely seasoned SEO who has worked with corporate clients on all things related to search. You can learn more about CJ by visiting her web site, Science for SEO.

What follows is my interview with CJ about Natural Language Processing:

Joe: Can you define Natural Language Processing for the layman?

CJ: It is the area of computing that deals with words. Words form texts, texts form collections of many texts, leading to an awful lot of words. This “bag of words” needs to be analyzed so that a machine can make sense of it. If it can’t, there is no way of retrieving information from the texts, and so they are in effect useless.

Linguistic analysis is used to analyze and represent the texts. Once the computer can represent the language in some way (patterns, recurring words, synonymy…) the information in the texts can be manipulated. This mimics the human understanding of language. NLP is a subfield of artificial intelligence and computational linguistics.
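As a rough illustration of the “bag of words” idea CJ describes, here is a minimal Python sketch that turns each text into a table of word counts, one simple way a machine can represent language. The function name bag_of_words and the sample sentences are made up for this example, and the tokenizer is deliberately naive; real systems do far more (stemming, synonym handling, grammar analysis).

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token.

    Hypothetical helper for illustration only; word order is discarded,
    which is exactly what makes this a "bag" of words.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Made-up sample texts standing in for a collection of documents.
docs = [
    "Words form texts, and texts form collections of many texts.",
    "A machine needs to make sense of the words.",
]

bags = [bag_of_words(d) for d in docs]

# Recurring words show up as counts greater than one.
for bag in bags:
    print(bag.most_common(3))
```

Once texts are reduced to counts like these, they can be compared, grouped, or searched, which is the kind of manipulation the answer above refers to.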

A little history…

NLP was first used in 1948, in a look-up dictionary at Birkbeck College in London. The code-breakers from the Second World War were highly interested, as this was a new thing to work on now that the war was over. Machine translation was actually the first research area involving NLP, in the 1950s, and it is a problem that has still not been solved today. NLP draws on a wealth of research done by linguists such as Noam Chomsky and Charles Fillmore, who revolutionized the way we search for patterns in language, which is what NLP is all about. Research has continued since, and improvements have been made.

It is a very old area of research, and an extremely difficult one. The double helix has been discovered, quantum mechanics developed, radar and kidney dialysis machines invented, and man has walked on the moon, but the problem of NLP has not yet been solved, although there have been some breakthroughs. This is how hard it is.

Joe: Are there any examples of NLP in our daily life, online or off, that can help us understand this technology a bit more?

CJ: NLP is used in a great many areas of our lives. A simple example is the spell checker and grammar checker in your word processing software. When you call the bank and get an auto responder, that also uses NLP. It is found in all search engines, mobile phones (predictive text and speech recognition, for example), digital encyclopedias, dictionaries, washing machines, software for sign language for the deaf, databases, and computer games, and there are many more applications of the technology.

Joe: We have heard rumors for some time now that Google and some other smaller search engines are developing NLP in their search platforms. Is this a credible theory? If so, will NLP become a trend in search development?

CJ: All search engines have been using NLP since the 1950s. A search engine is not necessarily Google; it can also be part of machine translation technology, for example, among many other things. If we take the example of a web search engine, it needs to process the words on web pages in order to make sense of them, so that it can find patterns which allow it to classify everything. NLP also enables the engine to analyze your query and feed the results into the search engine, which then uses them to provide you with an answer.
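To picture the two steps mentioned here, processing the words on pages and then analyzing a query against them, the Python sketch below builds a tiny inverted index (a map from each word to the pages containing it) and looks a query up in it. This is only a toy under stated assumptions: the page texts and the names tokenize, build_index and search are invented for the example, and it says nothing about how Google or any real engine works internally.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Start from the pages matching the first word, then narrow down.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Made-up pages for illustration.
pages = {
    "fruit": "The apple is a sweet fruit that grows on trees.",
    "company": "Apple is a company that makes phones and computers.",
    "pie": "This pie recipe uses fruit and sugar.",
}

index = build_index(pages)
print(search(index, "apple"))        # both the fruit page and the company page
print(search(index, "apple fruit"))  # only the fruit page
```

Note that the single word "apple" matches both pages: matching words alone cannot tell the fruit from the company, which is why further analysis of context is needed.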

It is not a credible theory, it is a certainty.  NLP is definitely set to develop in the future.  There are major issues with it at the moment.  Finding patterns between the words, and analyzing which grammatical group they belong to for example, does not tell us much about the topic of the text.  To achieve this we need to do some context analysis which also uses NLP.  All computer programs which deal with language use NLP.

A small list of examples:

  • Machine translation
  • Natural language understanding
  • Automatic summarization
  • Information retrieval
  • Natural language generation
  • Topic detection
  • Optical character recognition (OCR, as used for Google Books)

Joe: Is there a relationship between SEO and NLP? If so, what is it?

CJ: There is indeed a relationship.  The search engines use words to assess what a web page is about, using NLP amongst other techniques. The content on a web page will help determine what the topic of the page is. 

Understanding the techniques used in NLP allows us to provide the best format and patterns for the search engine.  In fact I think that the entire site is affected because analysing a whole site, each page, helps to determine exactly what a site is about.  Seeing as NLP seeks to mimic human language understanding, using common sense is a good idea.  This is why search engines always recommend writing good relevant content.

NLP is a complex area of research, requiring a solid understanding of grammars (not just grammar), and a good grounding in computational linguistics (in order to apply the techniques to machines, which is not always easy). For the purpose of SEO, learning all about this would be overkill and a waste of time, although very interesting! I think that understanding the basic way the technology functions, and having a good idea of how to write clearly, using the correct vocabulary in a focused context, is enough.

Joe: What area of NLP are you working in? And, can you share with us any interesting aspects of your research?

CJ: My early research was on machine translation systems and naturally search engines and natural language processing.  After a few years working in that area I took a detour.

My current research is in natural language understanding and generation.   My current test application is on conversational agents for customer service providers.  It cuts down on costs and allows the customers to always have access to information and data easily.  They ask natural language questions, make statements, ask for advice.  It uses natural language processing techniques all over. 

I also use a lot of artificial intelligence methods, information retrieval, web 3.0 things like the OWL language for example.  I use a huge amount of linguistic research, information extraction, different programming languages, cluster analysis and all sorts of other things.  The challenges are monumental but it is fascinating.

Thanks for the insight into natural language processing, CJ!

  • http://www.marketingpilgrim.com Andy Beal

    Good stuff!

    I’m having flashbacks to the launch of Fortune Interactive. My then excellent co-founder Mike Marshall knew this stuff inside out, and we built technology based on NLP/LSI. Mike’s still doing a lot in that field: http://www.semscout.com/index-1.html

  • http://www.searchengine-optimization-seo.com Paul R

    Is Google not moving in this direction? I can’t imagine that they would let a competitor like this come into the field.

  • http://www.businessbymouse.com Mike SEM

    Joe, your article was quite informative and on point. With Google’s acquisition of Applied Semantics in 2003, the search arena has been changed forever, just look at the last few Google dances for proof.

    One caveat, I believe that the abbreviation NLP cannot be used here since it is already established in search as meaning Neuro Linguistic Programming which is not quite the same as Natural Language Processing as you described here. Perhaps something like ALP would be more accurate and less confusing. A for artificial, since computers are doing the semantic evaluations? We are still light years away from anything close to the human mind’s ability to do this.

    Think of the fruit apple, now Google it and you’ll find that Wikipedia is the first to mention the fruit in the #5 position. Everything else is about Apple, the company. You are correct, it is a fascinating area of study.

  • http://www.jozsoft.com/blog/ Joe Hall

    @Mike thanks for the compliments, but CJ is the real guru here, she’s the one I interviewed. All I did was think of a few questions to ask. You can learn more about CJ at her blog http://www.scienceforseo.com/

  • http://www.scienceforseo.com CJ

    Hello, thank you all for your comments! Hey Mile SEM, it’s not my abbreviation, it’s been NLP for “natural language processing” since the mid-40′s, Neuro-linguistic programming came along in the mid-70′s so I reckon they should change their acronym not us :) Also I haven’t seen neuro-linguistic programming used in search (IR), seeing as it’s to do with interpersonal communication and self-awareness. I think this would rather be an HCI topic, and maybe applied to AI.

  • http://www.scienceforseo.com CJ

    Mike sorry for mistyping your name!

  • http://www.businessbymouse.com Mike SEM

    Hello CJ and Joe, I wasn’t trying to be anti-establishment or anything, I appreciate your comments. I know neuro-linguistic programming only came along in the ’70s, however the popular abbreviation NLP has been associated with it ever since. Either way it doesn’t really matter.
    Frankly, it has more to do with manipulation of the human mind and that’s perhaps why it scores such high marks with the internet marketing crowd, just ask Frank Kern what his Mind Control website is all about. He probably won’t tell you but his domain name says it all.

  • http://www.scienceforseo.com CJ

    No trouble Mike! I get asked this question about the 2 NLP things quite a lot and I’ve refined my answer bit by bit :) – I didn’t know about the Mind Control website, thanks for the tip.

  • http://www.jyanty5243.hpage.com Lakhwinder

    Joe, your article is beneficial for me. Thanks
