What Is Latent Semantic Indexing

Latent Semantic Indexing

LSI changed the way that search engines provide results to people searching for information on a given topic or theme. Put simply, LSI gives search engines the ability to provide their users with a more relevant list of options to choose from by running a series of smart algorithms over web pages. These algorithms are used in conjunction with semantic analysis to provide meaningful results to search queries.


Sep 19, 2006

To try and help you understand LSI better (and more importantly how it affects your business), we need to take a very quick, high-level look at search engine history.

Basic Search History:

Before the world wide web became what it is today, there were text-based bulletin board servers scattered across the globe. The ‘internet’ was a realm dominated by geeks, nerds and academics. In short, it was difficult to get the information you wanted unless you knew where to look. Internet connections were very slow and there were no pictures on webpages.

Around 1993 came the Mosaic Browser which allowed ‘inline’ images to be displayed on the same page as text. From that moment on, the growth of the ‘worldwideweb’ was astronomical. Combined with advances in technology, the average man on the street could have and use a computer in his home.

With more people having the ability to create web pages the web mushroomed, but trying to locate information was still a chore unless you were in the know.

Although there were ‘search programmes’, they were in their infancy and struggling to keep up with cataloguing the thousands of new pages that were being added to the new ‘web of sites’ each month. It wasn’t until 1998, when Google started making its mark on the world, that search was forever changed.

Love ’em or hate ’em, we owe a lot to Google because if it weren’t for company founders Larry Page and Sergey Brin we ‘might’ still be in info darkness. I doubt it but it never hurts to give credit where credit is due.

Why? Because the major players of the day took the attitude of ‘Our users don’t really care about search – what we give them is good enough and we’re making money out of that’. Google Inc’s attitude was different and they soon became the benchmark to beat when it came to search.

In the late 90’s, the business model of the search engines of the day (including Google) was ‘If a website is there, we want to know about it, spider it and offer its content to our users’. This was great because there weren’t ‘that’ many website portals focusing on a topic and being able to offer ANY result for a search was better than offering none.

Fast-forward to the mid noughties – 06-07 and things changed dramatically. We started suffering from information overload.

There are millions of websites, billions of webpages and let’s be honest, search engine spammers have ruined things for the average business owner. The average user wants to find useful and relevant information on the topic or product they are searching for. What are they starting to get? Page upon page of mindless, poorly written garbage surrounded by adverts. Or we could put it another way. Pages of useless inappropriate adverts with useless content.

It’s almost as if there has been a battle of epic proportions between search engine marketers and the search engine users. Yahoo, Google, MSN, AOL, everyone – users worldwide are affected.

The result of this information overload is that search engine users were beginning to distrust the search results and the search engine companies were at risk of decreased profits. The more familiar a user becomes with something, the more they expect from it. This isn’t good news for the search engine companies’ business (or profits) and things needed to change. One of those changes is the testing, introduction and ongoing improvement of LSI technology.

OK. History Lesson Over:

Now that your history overview is behind you and that knowledge in your mind, we can now look ahead and begin to plan or integrate your web site structure around LSI and Schema.org markup. To do this you need to understand a little about LSI.

I don’t ‘want’ to get technical but … you need to go through ‘some’ of it now so you can ‘begin’ to understand it, because if you can’t understand what the search engines are ‘looking for’, you will be fighting an uphill battle in the search engine listing wars. You will be letting money go to waste or worse, giving it on a plate to your competitors.

LSI Basics:

High Level: LSI involves (don’t get scared – it is easy to grasp) statistical probability and trying to work out the semantic distance (or similarity) between words and/or phrases in relation to a known topic.

In English: What LSI software is trying to understand are the relationships between certain words in a paragraph and between the paragraphs in the document. When you take this further, LSI will then look for the relationships between the pages and your web site theme(s). Ultimately latent semantic indexing will become a part of the process that defines your website in relation to an overall topic within its search base.

So How does it affect you?

Search engine companies employing LSI algorithms are not only studying a document for keywords, they are also studying your documents and learning to recognize and identify the words that are common between these documents. By doing this the search engine databases are indexing the ‘semantic relationship’ between your documents to discover which pages are ‘related’ or are ‘closely relevant’ to an overall context or theme.

The same technology can also then be used to tag your content as ‘too focused’, ‘not diverse enough’, ‘too repetitive’. Either way it might not be considered good enough to be served to a potential client.

Let’s consider an example here.

Let’s think about a website based on the word ‘Dogs’.

Semantically related to the word ‘dogs’ would be words such as ‘canine’ ‘puppies’ ‘puppy’ ‘dog’ ‘doggy’ – and others.

NOT semantically related to the word ‘dog’ but ‘similar’, would be phrases like ‘puppy fat’ or ‘canine teeth’ because some of these phrases ‘could’ also ‘relate’ to ‘weight or child health issues’ and dentists. Remember that in general, any computer programme is logical. It normally expects a ‘yes’ or ‘no’ response. Perfect or not?

In effect what LSI technology is trying to do for search technology is add ‘… but more similar to …’ and ‘most like …’ or ‘complements …’ into the results that get displayed. It’s almost like it is aiming to provide an automated ‘human touch’.

How does it work?

LSI algorithms scan the document they are working on for other ‘expected’ words or phrases. This allows them to make the assumption that ‘the page’ is probably about ‘dogs’ because it may also mention a ‘breed of dog’ or ‘dog training’.

LSI then takes this a step further by analysing the whole website that the page is a part of as well. Like a high level overview.

You may well have one page that just has the words ‘canine’ ‘puppies’ ‘dog’ within it. That page ‘could’ be about other things but … because the ‘theme’ of other pages in that site has references to ‘breeds of dog’ or ‘dog training tips’, the LSI algorithm is happy to classify your page under the wider theme of ‘dog’.

The LSI algorithm (unlike Schema.org or semantic markup) doesn’t understand anything about the meaning of a word in a document. It just reads through the patterns and usage of particular words and calculates ‘word relationships’ to an overall theme.
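To make the idea of ‘word relationships’ a little more concrete, here is a minimal sketch in Python. It is a toy and not how any search engine actually implements LSI: the five ‘pages’, the function names and the choice of two ‘themes’ (k=2) are all my own assumptions for illustration. It counts the words on each page, finds the two strongest patterns of words that appear together (a truncated singular value decomposition, worked out here with a simple power iteration) and then scores each page against a query in that reduced ‘theme’ space.

```python
import math

# Hypothetical toy 'site': three dog pages and two golf pages.
PAGES = [
    "dog puppy training",
    "canine puppy training",
    "dog canine breed",
    "golf tiger woods",
    "golf club swing",
]


def term_page_counts(pages):
    """Build the term-by-page matrix: counts[t][p] = uses of term t on page p."""
    vocab = sorted({word for page in pages for word in page.split()})
    counts = [[page.split().count(term) for page in pages] for term in vocab]
    return vocab, counts


def top_eigenpairs(matrix, k, iterations=500):
    """Top-k eigenvalues and eigenvectors of a small symmetric matrix.

    Plain power iteration with deflation: fine for a toy example, but not
    what a production system would use.
    """
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [1.0] * n
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        value = sum(vec[i] * work[i][j] * vec[j] for i in range(n) for j in range(n))
        pairs.append((value, vec))
        # Remove the pattern just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lsi_scores(pages, query, k=2):
    """Score every page against the query in a k-dimensional 'theme' space."""
    vocab, counts = term_page_counts(pages)
    n_pages = len(pages)
    # Page-by-page overlap matrix (A^T A). Its eigenvectors are the page
    # coordinates that a truncated SVD of the count matrix would give.
    gram = [[sum(row[i] * row[j] for row in counts) for j in range(n_pages)]
            for i in range(n_pages)]
    pairs = top_eigenpairs(gram, k)
    page_coords = [[vec[p] for _, vec in pairs] for p in range(n_pages)]
    # Fold the query into the same space: (q^T A v) / sigma^2 for each theme.
    q = [query.split().count(term) for term in vocab]
    q_pages = [sum(q[t] * counts[t][p] for t in range(len(vocab)))
               for p in range(n_pages)]
    q_coords = [sum(q_pages[p] * vec[p] for p in range(n_pages)) / value
                for value, vec in pairs]
    return [cosine(q_coords, coords) for coords in page_coords]


scores = lsi_scores(PAGES, "canine")
```

With this toy data the first page scores close to 1 for the query ‘canine’ even though it never uses that word, because it shares ‘puppy’ and ‘training’ with a page that does. The two golf pages score close to 0. Notice that nothing in the code knows what a dog is. It only counts which words keep company with which.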

Latent Semantic Indexing practicalities and how it could be applied by the search engines.

We need to think of LSI as a form of artificial intelligence. With the number of web pages increasing dramatically on a daily basis, the challenge is for the search engines to give their users an ideal search result.

LSI fits in to the search process by enhancing the search engine’s capabilities.

A conventional search engine that bases its results on ‘keyword only’ analysis may not give the best results. This is because the older search engine programs cannot tell the difference between:

  • Similar words with different meanings, e.g. Dice – Die (plural: dice) – Die (as in dead) – Die (as in mould), or Router (wood shaper) – Router (internet connectivity)
  • Words that are similar in meaning but spelled differently, e.g. sickness – vomiting
  • Singular and plural forms of words, e.g. dice/die, dog/doggies
  • Words with similar roots, such as ‘water’, ‘watered’, ‘watering’, ‘waterings’, ‘waterer’
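The last point in that list is easy to show in a few lines of Python. This is only a rough sketch: the suffix list, the four letter minimum and the function names are my own assumptions, and real stemmers use far more careful rules. It compares an old-style exact keyword test with one that first trims common endings so that related word forms share the same key.

```python
# Crude suffix list, longest first. An assumption for illustration only.
SUFFIXES = ("ings", "ing", "ers", "er", "ed", "s")
MIN_STEM = 4  # never cut a word down to fewer than four letters


def crude_stem(word):
    """Strip one common suffix so related word forms share a key."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def exact_match(query, text):
    """Old-style keyword test: the query word must appear exactly as written."""
    return query.lower() in text.lower().split()


def stemmed_match(query, text):
    """The same test after reducing both sides to their crude stems."""
    return crude_stem(query) in {crude_stem(w) for w in text.split()}


forms = ["water", "watered", "watering", "waterings", "waterer"]
stems = {crude_stem(w) for w in forms}
```

All five forms of ‘water’ end up with the same stem, so a search for ‘water’ can now find a page that only says ‘Watering the garden’, which the exact test misses. Trimming endings does nothing for irregular pairs like die/dice, or for words like ‘router’ that are spelled the same but mean different things. Those need context, which is where LSI comes in.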

The LSI enabled search platform is more effective because it does not focus on a bunch of keywords. The best example of this I have seen is a search for Tiger Woods: the search engine will not look for web pages that use the keywords ‘tiger’ and ‘woods’. It will present a collection of pages that are related to the theme of golf. This is what is called relevance feedback, i.e. during the past x months most people who searched for ‘Tiger Woods’ clicked on a link to a ‘golf’ related web site.

This is where the ‘general’ opinion of what LSI is goes astray slightly. Many assume it is an algorithm that is bolted on to a search engine. I think it is better to think of LSI as a ‘concept’ and that word is important to remember. If we mentally tie in the phrase ‘artificial intelligence’ to LSI technology you should begin to see the importance of it.

People want better search results – so give it to them

The users of search engines want better results and users are human beings. Using the 80/20 rule, it would be safe to assume that 80% of users want good information. They don’t want to waste their time. When you put these factors together the logical assumption should be that search engines need a human touch to make them better. Google have even suggested human intervention, so there can be no doubt that things are changing.

As the search engine spammers thought up more and more ways to fool the search engines and ‘catch’ the unsuspecting internet user, so the user has become more adept at ‘spotting spammy sites’. In fact the users have learnt to be more specific in the search terms that they are now using.

Here is a quick example: If you wanted to buy a new wind turbine to provide alternative home power, the chances are you would do, or have already done, some internet based research using a search engine.

Searching for ‘buy new wind turbine’ does not tell me what I want, so then I might try ‘new wind turbine for my house’ or ‘new wind turbine installation house’. Usually, you will find that the overall number of search results reduces as the number of keywords searched for increases. Then it’s just a case of improving the quality.
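The way results shrink as you add keywords can be sketched in a few lines of Python. The four ‘pages’ and the function name below are made up for illustration, and I am assuming the simplest possible rule, that a page only matches if it contains every word in the query. Real search engines are far more forgiving than that.

```python
# Hypothetical mini index: page id -> page text.
SITE_PAGES = {
    "page-a": "buy new wind turbine online",
    "page-b": "new wind turbine for my house",
    "page-c": "wind turbine installation guide for a new house",
    "page-d": "history of the wind turbine",
}


def matching_pages(query, pages=SITE_PAGES):
    """Return the ids of pages that contain every word of the query."""
    terms = set(query.lower().split())
    return sorted(
        page_id
        for page_id, text in pages.items()
        if terms <= set(text.lower().split())
    )


# Each extra keyword can only keep or shrink the list of matches.
for query in (
    "wind turbine",
    "new wind turbine",
    "new wind turbine house",
    "new wind turbine installation house",
):
    print(len(matching_pages(query)), query)
```

Running this, the four queries match 4, 3, 2 and then 1 page in turn. Every extra word is one more condition a page has to meet, so the list can only stay the same or get shorter.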

For years humans have been learning how to refine their own searches on a given topic. It doesn’t take a giant leap of faith or a degree to work out that the search engines have been able to record and dissect all of this free human input. LSI as a concept is ‘giving back improved results’.

LSI focuses on knowing and analysing a document before it gets indexed. Therefore, LSI optimised pages are more archive-friendly and can point towards content that may be relevant but not directly covered within the document. Think of it as a kind of automated grading system.

The key point is simple. Search engines want to be able to provide better, more accurate search results for their customers. LSI is one of the technologies being employed to meet this aim. LSI has the power to filter out ‘ineffective’ and unwanted information. If you don’t want to have your business filtered out or overlooked you need to work in harmony with latent semantic technology and not try to fool or beat it.

SEO is an ever evolving process and I’m certain that it will change again in the future. For now, learning about, following and implementing simple LSI orientated optimisation procedures will pay dividends in increased long term traffic and profits.

How To Build LSI into your ecommerce website design.

I personally am lucky enough to have found a dedicated group of technologists who really do make a lot of sense. More important still are the real world examples of successful sites that have been employing their knowledge and ethos. The results are in. It works.

There are two highly relevant ‘arms’ to the process that I follow within the community. The first is learning about the design, structure and planning of the ultimate LSI compliant web site framework and the other is the research into the relationships between keywords, keyphrases and theme density.

It is quite a lot to get your head round but the community is supportive and attacking this from all sides. LSI is not something that you should rush into. Part of the secret is in the planning and having to learn the basics again will allow you time to get it right.

Even now this community is well ahead of the game. The software I’m using blew my mind apart and forced me to take a different approach. I’m getting results I never dreamed of. Keyword research has taken on a whole new meaning for me. What used to take days or even weeks can now be done in hours. What’s more is that I’m having fun doing it.

What you are about to learn is not an ‘easy fix’ or a ‘quick solution’ and unless you are really prepared to put in the time it takes to do this properly, I suggest that you do not just click on the links below.

Likewise, if you are looking for a free solution to your problems, this is not for you. Wait until a cheap alternative comes along. However I am fairly sure there won’t be a free alternative until it’s too late.

For those of you who are prepared to learn about something new and worthwhile, something that will, without doubt or reservation, help you significantly from now on with your search engine placement: you need to be involved.

Here is the link to the community that will put you at least two or three steps ahead of the competition when you implement what you learn.

The ThemeZoom project. Getting involved here will save you hours or days. There is stuff going on with this software which is right at the forefront of getting the best from understanding how the web works and how to pull all of the pieces of the puzzle together.

When I first found these folk they were offering a ‘day rate’ ticket to try it out. ThemeZoom is such a powerful research tool that many ‘day trippers’ gave up on it. I think that was because it gave back more information than they knew what to do with. Rather than clog up the resources they stopped that facility.

You can still get a three day pass to use the Krakken software, but I realised that this concept is bigger than big. You could rush it. You can play with it, but why would you?

Can I also recommend that you listen to and watch all of the tutorials that are available at ThemeZoom. Learn as much as you can. You will be more than happy.

Good luck in designing and building your latent semantically optimised website. The time you spend learning about and implementing what you discover has the potential to bring real and strategic longer term rewards and profits. If you don’t want to do it yourself give us a call on 01787 311514 or drop us an email. We can do it for you.


Affiliate Software For ZenCart

Setting up and managing your affiliate program with Zen Cart and other shopping carts is child’s play when you use the JRox Affiliate Manager.

Get Your Free Affiliate Management Software

Order and download your Zen Cart Affiliate Software

Try it for Free. The software developers at JROX have been working with the core Zen Cart development team to ensure that JAM integrates with Zen Cart 1.3 and future versions. This makes it one of (if not the only) sensible long term solutions to your affiliate management needs. (JAM also works very well with just about any other software.)

One of the best benefits I have seen is that Zen Cart store developers and marketers have the ability to offer different commission rates for individual products or whole product categories. There are some limitations but in all this is the best software I have found to date that works with Zen Cart.

Free Version

  • Start your Affiliate Marketing Program at No Initial Cost
  • No Feature Restrictions on Free Version, Just Limited to 250 Total Affiliates
  • All Features in Licensed Version Available on Free Version
  • Free Technical Support via our Customer Support Forums
  • Upgrade to Licensed Version Easily when you need to expand

Get your free version and try the demo affiliate software

Licensed Version

  • Manage an Unlimited Number of Affiliates, Products, and Commissions
  • Easy Setup with Windows Installer, Web-Based Installation
  • No Setup Fees or Monthly Charges
  • Free Software Updates for Life Included
  • Free Technical Support via Help Desk for One Year

Affiliate Management by JROX

The Automatic Signup Module

JAM’s Automatic Signup Module can be integrated into a number of shopping carts or e-commerce applications to help streamline the affiliate signup process.

What it does is automatically create affiliate accounts for your customers during the payment checkout process, all behind the scenes. In this way, your customer can automatically become one of your affiliates, which in turn will help you make more sales.

The Automatic Refund Module

JAM’s Automatic Refund Module can help you refund and / or alert you of any existing affiliate commissions associated with a customer refund / charge back.

This helps you to track any existing commissions in the JAM database that you need to delete, refund, or change the status of, whenever you have to issue a refund.


osCommerce Online Merchant’s success is thanks to its dedicated team, which focuses on the core features and is in turn reinforced by an active community of store owners, developers, and service providers that focus on additional features. To date, osCommerce has been the inspiration behind other open source shopping cart solutions such as Zen Cart. The osCommerce community has provided over 7,000 add-ons that extend the core feature set of osCommerce Online Merchant to meet the individual requirements of store owners. And did we mention all of this is available for free?

The add-on categories include:

» Features
» Images
» InfoBoxes
» Languages
» Order Total Modules
» Payment Modules
» Reports
» Shipping Modules
» Templates and Themes
» Zones
» Other

osCommerce Online Merchant is built using the PHP scripting language and uses a MySQL database server for the online store data. The combination of PHP and MySQL allows osCommerce Online Merchant to run on any webserver environment that supports PHP and MySQL, which includes Linux, Solaris, BSD, Mac OS X, and Microsoft Windows environments.

osCommerce started in March 2000 and has since matured to a solution that is powering many hundreds and thousands of live shops around the world.

You can find out more about and download osCommerce at: http://www.oscommerce.com

Zen Cart

Zen Cart is popular, free, user-friendly, open source shopping cart software. The ecommerce web site design program has been developed as a fork from osCommerce by a group of like-minded shop owners, programmers, designers, and consultants that think ecommerce web design could be, and should be, done differently.

Regularly updated by the core development team and with contributions from hundreds of developers worldwide, it is a good choice for anyone wanting to run an online business. Other shopping cart software programs can be difficult to install and use without an IT degree, but Zen Cart® can be installed and set up by anyone with the most basic web site building and computer skills. Having said that, it can be difficult and time consuming to customise.

One core advantage ZenCart has over virtually all of the other open source shopping cart platforms is that it is developed to PA-DSS certification.

More information and downloads at: http://www.zen-cart.com/

See our recommended Affiliate software for Zen Cart


e-commerce Basics

What is ecommerce and how can you develop your ecommerce website?

‘ecommerce’ is the word used to describe the process of selling and getting paid online. You can sell digital or real products and services or a mixture of both.

No two businesses are the same and your existing constraints, specific needs or future requirements will determine the type of e-commerce application(s) you should use. Whatever stage you are in the planning or improving of your online web shop, it is always a good idea to involve an ecommerce consultant or specialist like us. Why?

Your time is better spent working on your business rather than in it.  You know what makes your business work and your experience(s) makes you invaluable in your chosen/specialist field.  As a business owner you may need to be versatile and have many skills but would you try to be a barrister or an accountant? Would you perform your own dental work or build a house?  We know what and how to make things sell online.

What are some of the things you need to consider for an online shop?

  • Are you selling products or a service?
  • Are your products digital or physical or both?
  • How big is your product range?
  • Do some products have different attributes (size, colour etc)?
  • How do you plan to take payment (there & then, invoice, cheque etc)?
  • How do you deliver your products (courier, self etc)?
  • Where are you prepared to sell and ship to?
  • Do you want to manage the e-commerce site yourself (adding and keeping products updated)?
  • How long is the learning curve to understand how to use your site (there will be one!)?
  • What fraud prevention measures do you want in place?
  • Will staff need to be trained?
  • Will your online shop need to be integrated into existing systems (accounts, stock control etc)?
  • Do you need to revisit your terms and conditions of trade for distance selling regulations?
  • and lots of other things ….

e-commerce solutions come in all shapes and sizes. Some will be ready to use ‘out of the box’ and may do the job you want with very little customisation. On the other hand you may want your site to look a specific way or do certain things which will require customising to one degree or another.

Some e-commerce software may have more functionality than you really need, in which case a simpler, cheaper alternative may be available that would be better for your business.

If ecommerce is new to you or technical things don’t appeal to you, ask for assistance or advice because it will save you time and money in the long run.

Building an ecommerce website, developing a strategy

There has never really been a time where you could ‘throw’ a website together and expect to make a fortune overnight. I can agree that some people have made significant amounts of cash during the early stages of a successful product launch – but you can bet safe money that lots of time was spent during the planning stages. I put the phrase ‘planning stages’ in bold for a reason. To avoid unexpected or expensive problems in the future, planning, developing and implementing your ecommerce systems, in stages, is the only logical way to proceed.

If you develop a solid base on top of rock solid foundations, you will not be worrying about your business crumbling from under you.

One strategy that we suggest you embrace right at the start is to plan in, or incorporate into an existing website, semantic markup and a silo structure. Plan your web site structure using LSI fortification concepts.