• SEO
  • By: Omar Radoncic
  • 8 min read
  • 15 Jun 2020

Latent Semantic Indexing (LSI) in SEO - Digestible Demystification


Search engines have made use of something called Latent Semantic Indexing. It is a complex mathematical process that systematically identifies patterns in the relationships between terms and concepts within an unstructured collection of text, working from a matrix of which terms occur in which documents, and so organizes the available content for better information retrieval. The definition might appear complex at first, but it won’t once we unpack its essence (and I will give a much simpler definition a few lines below).

So, without delving too far into its mathematical structure and use, I will concentrate on clarifying why LSI techniques can be vital for writers and their SEO strategy.

My definition: Latent Semantic Indexing is a natural language processing technique which, through the use of sophisticated mathematical formulas from distributional semantics, tries to sort out the words that cohere with the rest of a given text. It singles out particular words and tries to interpret their meaning in order to evaluate context and significance. Though the machine itself cannot differentiate meaning, it can recognize and follow certain wording patterns. That is what gives it its refined ability to assess cohesiveness in computer-based information retrieval.

In short: The more topic-centric your writing, the better its chances of ranking in search results.
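For the curious, here is a minimal Python sketch of the raw material LSI works with. It only builds the term-document count matrix; the actual LSI step (a singular value decomposition of that matrix) is left out on purpose. The three documents and the function name `term_document_matrix` are made up for illustration and do not come from any search engine.

```python
from collections import Counter

# Hypothetical toy documents, invented for illustration only.
docs = [
    "cows give milk on the farm",
    "milking cows on a dairy farm",
    "the engine has high horsepower",
]

def term_document_matrix(documents):
    """Return (vocab, matrix) where matrix[i][j] counts term vocab[i] in document j."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for missing terms, so absent words become zeros.
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

vocab, matrix = term_document_matrix(docs)
for term, row in zip(vocab, matrix):
    print(f"{term:<12}{row}")
```

In this toy data the rows for "cows" and "farm" are both [1, 1, 0]: the two words show up in the same documents, and that kind of shared pattern is what LSI is designed to pick up.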

LSI in SEO

Do you remember those bad guys who used keyword stuffing to trick good ole’ Google? They were cramming exact-match keywords into their pages in order to land on page #1. And even though latent semantic indexing was born before the WWW boom, search engine algorithms had to adopt LSI as a direct response to these frequent attempts to cheat at SEO.

The intention was to rebuild the search engine so that it would better understand what words mean. Even though machines still cannot do this “by heart”, Google has at least come close to understanding semantics. Think of it like this:

  • How many times did you enter a search query and get results that didn’t contain the exact term you typed (or had it in a different order), and yet satisfied your need with the relevant content you were looking for to begin with?
  • How many times was your wording clumsy or grammatically incorrect, and you still got your answers from Google?

That is all because of LSI techniques that some writers used either intentionally or by accident. It’s because Google understands your intention that it is able to provide you with results.

Why the reader cares about LSI without even knowing it

We said in the definition above that the purpose of LSI is to find texts whose words are distributed around similar meanings. So if we talk about cows in one paragraph and milking techniques in the second, then we certainly shouldn’t use the third paragraph to discuss the horsepower of a Mercedes-Benz.

Do you understand what I mean?

You lured the reader in with the content he was looking for. Don’t you think you will lose him if he stumbles upon something that lessens his interest in reading further?

That is why knowing about Latent Semantic Indexing is crucial for pleasing both parties: the human reader and the search engine.
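The cows-versus-Mercedes point can be made with a few lines of Python. Treat this as a rough sketch: it is a plain bag-of-words cosine similarity, not LSI itself and certainly not what any search engine runs, and the three paragraphs are invented for the example.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, strip basic punctuation, and count each word."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return Counter(w for w in words if w)

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 = nothing shared)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical paragraphs, made up for illustration.
paragraphs = [
    "Dairy cows need fresh grass and clean water.",
    "Milking dairy cows by machine is faster than milking by hand.",
    "The Mercedes-Benz engine delivers impressive horsepower.",
]
bags = [bag_of_words(p) for p in paragraphs]

print(cosine_similarity(bags[0], bags[1]))  # above zero: both mention "dairy" and "cows"
print(cosine_similarity(bags[0], bags[2]))  # zero: no words in common
```

The first two paragraphs overlap and the third shares nothing with them, which is the on-topic versus off-topic difference in its crudest form. Real LSI would go a step further and relate words that never literally co-occur in the same paragraph.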

LSI Keywords

“Wow Omar, you’re rotten to the core. You made me read all of this to tell me that there’s no such thing as LSI keywords!”

Yes. In practice, there’s no such thing as LSI keywords. But please remember the beginning of our conversation, where we explained almost immediately that LSI is more of an amalgam of mathematical formulas and concepts that parse meaning from text in order to provide searchers with what they’re looking for. And let’s not forget that this piece of content was intended to demystify what Latent Semantic Indexing is.

Dismissing it as a ranking factor brings us closer to the truth. But why should you continue reading now that you know about its disconnection from SEO practices?

LSI in On-Page SEO

The wording in your content is important. When I say this, I’m referring to topic relevance in general. Whoever rejects LSI as a technique that can add relevance to your pieces does not acknowledge that search engines aim to stay on topic for any given term.

Since all we can do is hypothesize about what exactly a search engine’s algorithm entails, we should at least make an effort to understand it. Even if you go outside for a stroll with the SEO community and hear about how obsolete LSI has become, bear in mind that it could be blended with other context-deciphering methods, or that it was at the very least a precursor to the algorithmic enhancements aimed at serving relevant content to the searcher.

That being said, let me give you some tips on how to map out same-topic words for more contextually relevant copywriting, intended to please both the grand jury of search engines on the WWW and the readers of your content. The more naturally these words are spread through your text, the better the chances that search engines will read your content as relevant, which can mean ranking higher and growing your reach. So which words support LSI best practices, and where can you find them?

Synonyms

There’s a huge misconception out there that if you use synonyms, you’re closer to meeting LSI objectives. This is only partially true, because LSI is about words that are often grouped together and share the same context, not just words that mean the same thing.

For example, someone could search for unique text checker and would probably get the same SERP as someone who searched for plagiarism checker. That is due to the similarity in co-occurring keywords for both terms, as well as the intent behind the query.

Anyone who believes that synonyms could contribute to a more coherent written piece can take a look at this list of useful websites:

Thesaurus – Known as the most efficient related-word provider out there. And for more versatility, there’s also Powerthesaurus.

Google Translate – Didn’t see that coming, right? But this tool, infamous to so many, is very potent when it comes to offering wording alternatives. Set English as your source language (it’s not important which target language you choose), type in the word, scroll down, click the gray arrow that opens the dropdown menu, and enjoy your many new options for writing.

Synonym.com – The name speaks for its purpose.

Wikipedia? – This one wasn’t expected either, I guess. Wikipedia is here because it provides topical content when you search for something, and it very often sits on the first page of Google for informational keywords. One should not neglect this fact! Peruse the content and take notes on words that are topic-related. Use them.

Hyponyms and hypernyms

In the field of linguistics, these terms describe a hierarchical order among words or phrases whose semantic fields are related. In other words, it’s like a hierarchy between words. For example, cow is a hyponym of animal, and animal is its hypernym. Another one: crocodile is a hyponym of reptile. It doesn’t have to be just two classes:


[Image: graph of hyponyms and hypernyms]

Why am I pointing this out? Because knowing hypernyms and hyponyms can strengthen the LSI significance of your content. So, for instance, if you had an assignment to write about the benefits of eating carrots, you would first point out that vegetables, in general, are good for our health. Then you would start creating content around carrots, and maybe you would add some other root vegetables as alternatives for someone’s diet.

This is how hypernyms and hyponyms can help you!
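If you like to see such a hierarchy spelled out, here is a small Python sketch. The `HYPERNYM_OF` dictionary is a hand-written, hypothetical hierarchy built from the examples above (with a couple of in-between levels that I added as an assumption); a real project would more likely draw on a proper lexical database.

```python
# Hypothetical, hand-written hierarchy: each word maps to its direct hypernym.
HYPERNYM_OF = {
    "carrot": "root vegetable",
    "root vegetable": "vegetable",
    "vegetable": "food",
    "cow": "mammal",
    "crocodile": "reptile",
    "mammal": "animal",
    "reptile": "animal",
}

def hypernym_chain(word):
    """Walk upward from a word to its most general known hypernym."""
    chain = [word]
    while chain[-1] in HYPERNYM_OF:
        parent = HYPERNYM_OF[chain[-1]]
        if parent in chain:  # guard against accidental cycles in the data
            break
        chain.append(parent)
    return chain

def hyponyms(word):
    """List the direct hyponyms (more specific words) of a word."""
    return sorted(child for child, parent in HYPERNYM_OF.items() if parent == word)

print(hypernym_chain("carrot"))  # ['carrot', 'root vegetable', 'vegetable', 'food']
print(hyponyms("animal"))        # ['mammal', 'reptile']
```

Walking up the chain gives you the broader terms to open with, and listing hyponyms gives you the neighbouring specifics to mention along the way.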

Related searches on Google


[Image: picture from Google search]

You can use Google’s related searches either for

  • keyword grouping and mapping - to broaden the list of keywords that you will target in whole,
  • or for research purposes - to find out what topical content can be found on SERPs that come from related search queries.

Searchers use variations of phrases to search for a single type of content, and it’s completely natural that one of the best sources to find those is Google’s “Searches related to” functionality. Another way of generating a list of co-occurring same-intent keywords is by using Google’s autocomplete option in its native search bar, as well as tools like Google Keyword Planner and LSI Graph.

Ignore these in your LSI calculations

When I say “ignore”, I don’t mean that you shouldn’t use them, just that the latent semantic indexing algorithm doesn’t identify them as content words. Here’s a list of them:

  • Articles,
  • Prepositions,
  • Conjunctions,
  • Common verbs like know, see, do, be,
  • Pronouns,
  • Common adjectives like big, late, high,
  • Frilly words like therefore, thus, however, and the like.
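To make the idea concrete, here is a minimal Python sketch of that filtering step. The `IGNORED` set is a deliberately tiny, hypothetical sample drawn from the categories above, not an official stop-word list used by any search engine.

```python
# Hypothetical, deliberately short sample of ignored words (assumption, not a standard list).
IGNORED = {
    "a", "an", "the",                          # articles
    "in", "on", "of", "to", "for",             # prepositions
    "and", "or", "but",                        # conjunctions
    "know", "see", "do", "be", "is", "are",    # common verbs
    "i", "you", "he", "she", "it", "we", "they",  # pronouns
    "big", "late", "high",                     # common adjectives
    "therefore", "thus", "however",            # frilly words
}

def content_words(text):
    """Return the words of a text that are not on the ignore list."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in IGNORED]

print(content_words("Therefore, the big cow is in the barn."))  # ['cow', 'barn']
```

Only "cow" and "barn" survive, and those are the words that actually carry the topic.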

Conclusion


Keep in mind that semantics is a linguistic field concerned with the meaning of words. Attach meaning to your phraseology, don’t deviate in your writing, stick to the topic, let the flow take over... Write naturally and optimization for latent semantic indexing will never be a problem for you.
