Proposed Search Engine(s) Enhancement

(2007 Jan 14 blog post)


INTRODUCTION:

A couple of years ago (2005 Mar), I tried to propose to Google a major enhancement to their search engine. I got an automated reply --- essentially a non-reply.

The image above indicates the suggestion --- a search-words distance-apart number that the user can specify.

Many web pages are huge and contain sections on many different topics.

If this 'words-distance-apart' suggestion were implemented, as outlined further below, it would drastically reduce the number of useless 'hits' from large pages [such as Google blogspot.com pages] in most of my web searches.

I found what I thought was an appropriate email address --- suggestions@google.com.

But their reply said to "register" at a Google "posting" web site and submit the suggestion there.

Interesting --- the email address suggestions@google.com does not accept suggestions.

As Spock would say, it is not logical.

    I did not have time or energy to go through their registry dance to post the suggestion.

    The dance:
    Get a userid and password ... and try to remember where I hid the information (so that I can follow up on responses to the posting), as I go through computer and mail system migrations ... along with potentially 50 other registrations, if I responded to every such command to "register".

    So I let the suggestion to Google go, for the time being.

I am still, years later, just as frustrated by the massive number of non-pertinent pages that I get --- when doing almost any multi-word search, with any search engine.

More background info:

So I am posting the suggestion openly now --- hoping that ANY search engine organization will take up this challenge. Are you listening AltaVista, A9, AOL, Ask.com, Clusty, Exalead, Gigablast, Google, Lycos, MSN, WiseNut, Yahoo, and others? Readers, please alert them.

I may periodically snail-mail this suggestion to Google and other search engine developers.

    [Actually, there have been a couple of attempts, around 2005, at implementing a search-engine enhancement like this.

    One was done by an essentially-one-person web-searcher development-operation, in the Netherlands --- walhello.com (Web+valhalla+hello).

    They/he did not have a very big database of web documents to search, nor the huge server farm of an organization like Google.

    The other attempt was limited to two options --- a fixed word span of 16 words, OR no limit on word span (the current, lamentable state of affairs).

    This (preliminary?) attempt was by a major search engine organization in France, exalead.com.

    With Exalead, you could use the word NEAR between words in a search query --- to do a "proximity search".

    "The NEAR operator finds documents where the query terms are within 16 words of each other."

    Note that the French and Dutch are not willing to resign themselves to using Google for all their searches. They know they can do better.

    Hopefully, these two, and other searcher development organizations, are still working on this feature.]

The image at the top of this page (for a hypothetical search engine called 'Hoogle') gives the gist of the suggestion in an easily grasped visual form.

    If it is technically more convenient to express the 'maximum span' between the search words in 'characters' rather than 'words', consider changing 'Word Span' to 'Characters Span'.

To give some details of the suggestion, here is the text of the original proposal that I e-mailed to suggestions@google.com on March 13, 2005.

A 2005 Communication to Google:

Subject:
Suggestion for search feature to blow competitors away
[2005 Mar]

Dear Google Developers,

In doing searches on multiple keywords, I am continually getting many pages that do not apply --- because they are long pages (like pages with hundreds of mail responses, or a lot of information on many different subjects).

Suggestion:
If there were a user-option to allow a Google user to say that they want the two (or more) keywords to be within, say, 30 words of each other, Google users could eliminate hundreds or thousands of 'false positives'.

Implementation:
It seems that when storing keywords, with each keyword, the Google data gathering engine(s) could store an integer that represents the location of the word in the page. Then the 'distance' between two keywords could be determined by subtracting the two integers stored with the two keywords.
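
An illustrative sketch (added here, not part of the 2005 e-mail) of the idea just described, in Python. It assumes a very simple notion of a "word" and an in-memory dictionary as the store; the names `build_positional_index` and `min_distance` are hypothetical, and nothing here is claimed to match how any real search engine stores its data.

```python
import re
from collections import defaultdict


def build_positional_index(text):
    """Map each lowercased word to the list of its word positions (0, 1, 2, ...)."""
    index = defaultdict(list)
    # Assumption: a "word" is a run of letters, digits or apostrophes.
    for position, word in enumerate(re.findall(r"[a-z0-9']+", text.lower())):
        index[word].append(position)
    return index


def min_distance(index, word_a, word_b):
    """Smallest difference between a stored position of word_a and one of word_b.

    Returns None if either word does not occur in the page.
    """
    positions_a = index.get(word_a.lower())
    positions_b = index.get(word_b.lower())
    if not positions_a or not positions_b:
        return None
    # Both lists are in ascending order, so walk them together,
    # always advancing the pointer that sits at the smaller position.
    i = j = 0
    best = None
    while i < len(positions_a) and j < len(positions_b):
        gap = abs(positions_a[i] - positions_b[j])
        if best is None or gap < best:
            best = gap
        if positions_a[i] < positions_b[j]:
            i += 1
        else:
            j += 1
    return best


page = ("Tips on growing tomatoes in clay soil. "
        + "filler " * 50
        + "Soil testing kits reviewed.")
index = build_positional_index(page)
print(min_distance(index, "tomatoes", "soil"))   # near each other
print(min_distance(index, "tomatoes", "kits"))   # far apart
```

With a user-chosen limit of, say, 30 words, the first pair would count as a match on this page and the second pair would not.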

Data gathering (word location) considerations:
I realize that the format of some pages may render the meaning of the integer-location rather meaningless as a measure of distance-between-words --- BUT for the vast majority of web pages, the integer would be useful to determine distance between words --- even if the integer were simply a count of the word-location in the sequence of text and HTML tags in a web page (i.e. treat the HTML source as plain text and simply count the keyword-location-integer using that approach).
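
An illustrative sketch (added here, not part of the 2005 e-mail) of that "treat the HTML source as plain text" counting, in Python. The token pattern is an assumption: each whole tag counts as one item and each run of letters, digits or apostrophes counts as one item. The name `word_locations` is hypothetical.

```python
import re

# Assumption: a token is either one whole HTML tag or one run of
# letters, digits or apostrophes. Everything else is skipped.
TOKEN = re.compile(r"<[^>]*>|[A-Za-z0-9']+")


def word_locations(html_source):
    """Yield (location, token) pairs, counting tags and words in one sequence."""
    for location, match in enumerate(TOKEN.finditer(html_source)):
        yield location, match.group(0)


source = "<h1>Tomato soil</h1><p>Clay <b>soil</b> drains slowly.</p>"
for location, token in word_locations(source):
    if not token.startswith("<"):          # only words would be stored as keywords
        print(location, token.lower())
```

Tags inflate the counts a little, so the differences between locations are only a rough measure of distance, which is all the suggestion needs.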

Storage overhead:
The storage of the integer-location of keywords could be very compact, say a 4-byte binary integer, which would allow for assigning word-location-integers in web pages about 4 billion words long. This should accommodate essentially any page that anyone would want to look at.

Although the 4-bytes for each keyword might increase the size of Google database(s) by about 20%, the pay-back would be well worth it.
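
An illustrative check (added here, not part of the 2005 e-mail) of the 4-byte arithmetic, in Python. It assumes the integer is unsigned and uses the standard `struct` module to show the packed size; the sample locations are made up. The 20% figure above is the e-mail's own estimate and is not verified here.

```python
import struct

# Largest location an unsigned 4-byte integer can hold.
MAX_LOCATION = 2**32 - 1
print(MAX_LOCATION + 1)        # number of distinct locations, about 4.29 billion

# Three hypothetical word locations for one keyword on one page.
locations = [3, 57, 1024]

# "<" selects standard sizes, where "I" is exactly 4 bytes.
packed = struct.pack("<%dI" % len(locations), *locations)
print(len(packed))             # total bytes used for the three locations

# Reading the bytes back recovers the same integers.
unpacked = list(struct.unpack("<%dI" % (len(packed) // 4), packed))
print(unpacked)
```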

Cheers,
a constant Google user
(still looking for a better search engine)


2013 UPDATE:

I recently (2013 April) bought a book called "Nine Algorithms That Changed the Future" by John MacCormick.

That book points out, in the first chapter, on web search algorithms, that the position of words within web pages IS SAVED and accessible to search engines like Google.

So there is no reason why they could not provide the facility suggested here --- if not on the main search page, then via the 'Advanced Search' link.
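
If the word positions are already stored, the user-facing facility could be little more than a filter on the candidate pages. Here is a minimal sketch in Python, under stated assumptions: each page's index is a plain dictionary from word to ascending word positions, "within N words of each other" is taken to mean that the smallest window holding one occurrence of every keyword is at most N words wide, and the names `smallest_span` and `pages_within_span` and the example.org pages are hypothetical.

```python
import heapq


def smallest_span(position_lists):
    """Width of the smallest window holding one position from every list.

    Each list must be in ascending order. Returns None if any list is empty.
    """
    if not position_lists or any(not positions for positions in position_lists):
        return None
    # Heap entries: (position, which list, index within that list).
    heap = [(positions[0], k, 0) for k, positions in enumerate(position_lists)]
    heapq.heapify(heap)
    current_max = max(positions[0] for positions in position_lists)
    best = current_max - heap[0][0]
    while True:
        # Move the left edge of the window forward by replacing the
        # smallest position with the next one from the same list.
        _, k, i = heapq.heappop(heap)
        if i + 1 == len(position_lists[k]):
            return best            # that list is exhausted, no better window exists
        nxt = position_lists[k][i + 1]
        current_max = max(current_max, nxt)
        heapq.heappush(heap, (nxt, k, i + 1))
        best = min(best, current_max - heap[0][0])


def pages_within_span(pages, keywords, max_span):
    """Keep pages where every keyword occurs and all fit inside max_span words."""
    kept = []
    for url, index in pages.items():
        span = smallest_span([index.get(word, []) for word in keywords])
        if span is not None and span <= max_span:
            kept.append(url)
    return kept


# Hypothetical per-page positional data.
pages = {
    "example.org/gardening": {"tomatoes": [3], "clay": [5], "soil": [6, 57]},
    "example.org/long-forum-thread": {"tomatoes": [12], "clay": [4100], "soil": [9000]},
}
print(pages_within_span(pages, ["tomatoes", "clay", "soil"], max_span=30))
# prints ['example.org/gardening']
```

The long page contains all three words but thousands of words apart, so it drops out, which is exactly the kind of 'false positive' the 'Word Span' option is meant to remove.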

That chapter even points out that search engines like Google use the 'near' capability very heavily for their own purposes.

Why they do not make that ability available to users is puzzling --- especially when it could cut searches that return millions of pages down to searches that return thousands. A situation devoutly to be wished --- especially as the databases of web pages explode in size.

Here is a page of web searcher sites
for examples of web searchers --- and
to search for more information on this topic.

Bottom of this blog page on the topic
A Proposed Search Engine Enhancement
(a max-distance-apart-number
for search keywords)

---
in other words, a 'words span' parameter,
in number-of-words or number-of-characters.



Page history:

Page was posted 2007 Jan 14.

Page was changed 2007 Jan 21.
(Some sentence additions and re-wording were done.)

Page was changed 2009 Aug 10.
(Added page breaks for better printout, and minor additions.)

Page was changed 2013 Apr 18.
(Minor format changes.)

Page was changed 2018 Oct 31.
(Added css and javascript to try to handle text-size for smartphones, esp. in portrait orientation.)

Page was changed 2019 Jun 06.
(Specified image widths in percents to size the images according to width of the browser window. Also did minor reformatting, for readability.)