Internet searches made simple

06/01/2001


There are hundreds of millions of web pages accessible via the internet. Some pages are easy to find; others are not. Unless you are given a specific uniform resource locator (URL), or are extremely lucky, you must search through the masses for the specific pages that contain the information you need. A search engine is the tool for finding that information.

Search engines are either general-purpose, and usually commercial, or site-specific. Site-specific engines use database-querying tools or employ general-purpose external search engines. Regardless of the type, knowing how search engines work enables you to make the most of these powerful tools.

How do search engines locate information? How can you create searches that provide the information you seek? This article provides the answers.

How search engines work

A search engine uses special software tools, commonly referred to as robots or spiders, to assemble lists of the words found on web sites. The process by which a spider builds its lists is called web crawling.

Spiders usually begin by looking at lists of heavily used servers and popular pages. Beginning with a popular site, a spider indexes the words on its pages and follows every link found within the site. The spider spreads out quickly across the most widely used areas of the web, much like a true spider weaves its web.
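To make the crawl concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular engine works: the PAGES dictionary is a made-up stand-in for the web, the example addresses are hypothetical, and the crawl function simply visits pages breadth-first, following each link once.

```python
from collections import deque

# Hypothetical in-memory "web": address -> (page text, outgoing links).
PAGES = {
    "popular.example/home": ("plant engineering news",
                             ["popular.example/a", "popular.example/b"]),
    "popular.example/a": ("maintenance software", ["popular.example/b"]),
    "popular.example/b": ("search engines",
                          ["popular.example/home", "other.example/x"]),
    "other.example/x": ("spiders crawl the web", []),
}


def crawl(seed, pages):
    """Visit pages breadth-first from the seed, following every link once."""
    visited = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url not in pages:  # dead link: nothing to read
            continue
        visited.append(url)
        _text, links = pages[url]
        for link in links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


order = crawl("popular.example/home", PAGES)
```

Starting from the popular page, the sketch reaches every linked page, including the one on a different site, without visiting any page twice.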

For example, when the spider for Google, which began as an academic search engine, looked at an HTML page, it made a list of the words within the page and noted where each word was found. Words in positions of relative importance, such as the title, subtitles, and meta-tags, were noted for special consideration during subsequent user searches. This spider was designed to index every significant word on a page, leaving out only the articles "a," "an," and "the."
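The idea of recording each word together with where it was found can be sketched in a few lines of Python. This is an assumption-laden illustration, not the actual spider: the function name index_page, the two fields ("title" and "body"), and the sample text are all invented for the example.

```python
import re

ARTICLES = {"a", "an", "the"}  # the words the text says were left out


def index_page(title, body):
    """Map each significant word to the places where it occurs.

    Each entry is (field, position): field is "title" or "body", and
    position counts words from the start of that field.
    """
    index = {}
    for field, text in (("title", title), ("body", body)):
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            if word in ARTICLES:
                continue  # articles are counted for position but not indexed
            index.setdefault(word, []).append((field, position))
    return index


page_index = index_page("The Plant Engineer", "An engineer maintains the plant.")
```

Because each entry records the field, a later search could give a word found in the title more weight than the same word found in the body.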

Other spiders work differently. Some approaches are devised to make the spider operate faster, allow more efficient searches, or both. The Lycos spider keeps track of words in the title, subheads, links, and the 100 words used most frequently on the page, along with every word in the first 20 lines of text. AltaVista indexes every word on a page, including the articles.

Other systems place importance on meta-tags. These identifiers allow page owners to specify keywords and concepts that aid in indexing the page. When a word has several possible meanings, meta-tags can guide the search engine to the correct one.

The task of finding information on web pages is never actually completed. Since the web is always changing, the spiders are always crawling. Regardless, the search engine must store the retrieved information in a useful way. The factors important to making gathered data accessible to users are the information stored with the data and the method by which the information is indexed.

A search engine would be of limited use if it merely stored the word and the URL where it was found. So, other factors must also be weighed.

Was the word used in an important or trivial way? How many times was the word used? Was it used only once on the page? Does the page contain links to other pages containing the word? The answers allow the search engine to rank the most useful pages at the top of the search results list.
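A simple way to picture this weighing is a score per page. The sketch below is hypothetical: the weight of 5 for a title occurrence, the function names, and the sample pages are assumptions chosen for illustration, and real engines use far more elaborate formulas.

```python
def score(page, word, title_weight=5):
    """Score a page for one word: body occurrences plus a bonus for title use."""
    body_hits = page["body"].lower().split().count(word)
    title_hits = page["title"].lower().split().count(word)
    return body_hits + title_weight * title_hits


def rank(pages, word):
    """Return the addresses of pages that mention the word, best score first."""
    scored = [(score(page, word), url) for url, page in pages.items()]
    scored = [(s, url) for s, url in scored if s > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _s, url in scored]


# Hypothetical sample pages.
pages = {
    "a.example": {"title": "Pump maintenance", "body": "pump seals and pump bearings"},
    "b.example": {"title": "Plant news", "body": "a new pump was installed"},
    "c.example": {"title": "Boilers", "body": "boiler efficiency tips"},
}
ranking = rank(pages, "pump")
```

The page that uses the word in its title and twice in its body ranks ahead of the page that mentions it once, and the page that never mentions it is left out.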

An index allows information to be found as quickly as possible. To build an index, a formula is applied that attaches a numerical value to each word. The formula is designed to distribute the entries evenly across a fixed number of divisions, unlike the uneven distribution of words across the alphabet. This process is called hashing, and the result is a hash table.

In an alphabetical index, more words begin with some letters than with others, so finding a word with a frequently used initial letter would take longer than finding a word with an infrequently used one. Hashing evens out this difference, which reduces the average amount of time necessary to locate an entry. It also separates the index from the actual entry.

The hash table contains the calculated and assigned hash number, along with a pointer to the actual data. The data can be sorted in whatever way maximizes storage efficiency. This indexing/storage combination enables quick results, even with complicated searches.
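The following Python sketch shows the general shape of such an index. It is an illustration under stated assumptions: the bucket count of 8 is arbitrary, the CRC-32 checksum stands in for whatever formula a real engine uses, and the HashIndex class and sample entries are invented for the example.

```python
import zlib

NUM_BUCKETS = 8  # arbitrary number of divisions chosen for the illustration


def bucket_for(word):
    """Turn a word into a bucket number using a stable checksum."""
    return zlib.crc32(word.encode("utf-8")) % NUM_BUCKETS


class HashIndex:
    def __init__(self):
        self.records = []  # the actual data, stored apart from the index
        self.buckets = [[] for _ in range(NUM_BUCKETS)]  # (word, record position)

    def add(self, word, urls):
        self.records.append(urls)
        position = len(self.records) - 1
        self.buckets[bucket_for(word)].append((word, position))

    def lookup(self, word):
        # Only one bucket is scanned, however many words are indexed.
        for stored_word, position in self.buckets[bucket_for(word)]:
            if stored_word == word:
                return self.records[position]
        return []


index = HashIndex()
index.add("spider", ["a.example", "b.example"])
index.add("search", ["b.example"])
index.add("server", ["c.example"])
```

All three sample words begin with the same letter, yet the bucket each one lands in depends on the checksum of the whole word rather than on its initial.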

Basic searches

To search through an index, you must first build a query. A query can be as simple as a single word, or it can be a complicated combination of words and operators.

To use a search engine, such as Yahoo! or Lycos, navigate to its opening page. The opening page for Yahoo! is either www.yahoo.com or just yahoo.com. Your browser should take you there whether or not you enter the www. Access Lycos by typing either www.lycos.com or lycos.com into the address bar of your browser.

When the search engine portal appears, type your search word or phrase into the search window. Typing the phrase "Plant Engineering magazine" into the search window of Lycos, then clicking on search, produced four web sites based on user selection traffic. However, 95,872 web pages were found in a search of the complete Lycos catalog, or index.

No plant engineer, however efficient, has the time to visit more than 90,000 web hits. It quickly becomes necessary to limit your search.

Most search engines allow you to narrow your search. In Lycos, click the checkbox labeled "search these results" to "drill down" within your search criteria.

For example, checking this box, then typing "Information Engineering" into the search box and clicking the search button returned 447 sites from the entire Lycos catalog. That is a far cry from 95,872, but probably still not a manageable number. The quotation marks around "Information Engineering" were included to limit the number of hits: they force Lycos to return only pages that contain the exact phrase, with the words in the order given.

Basically, a word or a phrase can initiate a simple search. And simple searches can be useful. However, to be effective, sometimes a complex or advanced search is necessary.

Boolean operators

Building a complex query requires the use of Boolean operators. Boolean operators allow you to refine and extend the terms of the search.

Several Boolean operators are commonly used.

  • AND: Any terms joined by AND must appear in the pages or documents. Some search engines substitute the operator "+" for the word AND.

  • OR: At least one of the terms joined by OR must appear in the pages or documents.

  • NOT: The term or terms following NOT must not appear in the pages or documents. Some search engines substitute the operator "-" for the word NOT.

  • FOLLOWED BY: One of the terms must be directly followed by the other.

  • NEAR: One of the terms must be within a specified number of words of the other.

  • Quotation marks: The words between the quotation marks are treated as a phrase, and that phrase must be found within the document or file in the exact order.
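These operators map naturally onto set operations and word positions. The Python sketch below is a simplified illustration, not any engine's implementation: the INDEX dictionary of page numbers, the helper names, and the sample sentence are all assumptions made for the example.

```python
# Hypothetical inverted index: word -> set of page numbers containing it.
INDEX = {
    "plant": {1, 2, 3, 4},
    "engineering": {1, 2, 4},
    "cmms": {2, 3},
    "eam": {4},
}


def pages_with(word):
    return INDEX.get(word, set())


both = pages_with("plant") & pages_with("engineering")  # plant AND engineering
either = pages_with("cmms") | pages_with("eam")         # cmms OR eam
without = pages_with("plant") - pages_with("cmms")      # plant NOT cmms


def followed_by(words, first, second):
    """FOLLOWED BY: second appears directly after first."""
    return any(a == first and b == second for a, b in zip(words, words[1:]))


def near(words, first, second, distance):
    """NEAR: the two terms occur within `distance` words of each other."""
    first_at = [i for i, w in enumerate(words) if w == first]
    second_at = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= distance for i in first_at for j in second_at)


words = "plant engineering magazine covers information engineering".split()
```

AND narrows the result to pages in both sets, OR widens it to pages in either, and NOT removes pages from the result. FOLLOWED BY and NEAR need word positions, which is one reason spiders record where each word was found.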

      The "+" symbol is especially helpful when you do a search and then find yourself overwhelmed with information.

      Using Yahoo!, the following was typed into the search window: plant engineering magazine+information engineering+cmms/eam

      Yahoo! returned 10 web site matches, most of which were relevant (Fig. 2). This number is much more usable than 95,872, or even the 447 sites returned without Boolean operators.

      Often, you may need a search engine to find pages that have one word on them but not another word. The "-" symbol allows this type of search.

      Using Yahoo!, the following was typed into the search window:

      plant engineering magazine+information engineering+eam-cmms.

      Yahoo! returned nine web site matches, which included the "EAM" term and excluded the "CMMS" term. To get the best results from the "-" operator, eliminate terms you know are not of interest.
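How an engine might read a query written with "+" and "-" can be sketched as follows. This is a guess at the general idea, not Yahoo!'s actual parser: the functions parse_query and matches are hypothetical, phrases are matched as plain substrings, and a hyphenated word would be split incorrectly by this simple rule.

```python
import re


def parse_query(query):
    """Split a query such as 'a b+c-d' into required and excluded phrases."""
    required, excluded = [], []
    for sign, phrase in re.findall(r"([+-]?)([^+-]+)", query):
        phrase = phrase.strip().lower()
        if not phrase:
            continue
        if sign == "-":
            excluded.append(phrase)
        else:  # a leading phrase with no sign is treated as required
            required.append(phrase)
    return required, excluded


def matches(text, required, excluded):
    """True if the text has every required phrase and no excluded phrase."""
    text = text.lower()
    return all(p in text for p in required) and not any(p in text for p in excluded)


required, excluded = parse_query(
    "plant engineering magazine+information engineering+eam-cmms"
)
```

A page is kept only if it contains every required phrase and none of the excluded ones, which is why each added "+" or "-" term shrinks the list of hits.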

      Advanced searches

      For most users, general searching techniques using Boolean or symbol operators are sufficient. However, if more searching power is required, the following commands are useful.

      MATCH ANY

      Occasionally you want web pages that contain any of the terms for which you are searching. Some search engines do this automatically. It is not necessary to enter a special operator. Those search engines include AltaVista, Excite, GoTo, Go, LookSmart, Netscape, Snap, WebCrawler, and Yahoo!. AOL Search, HotBot, Lycos, and MSN Search have "match any" as a menu item adjacent to the search window. You must use the Boolean operator OR with Northern Light. Google does not support the MATCH ANY command.

      Most search engines automatically list pages that contain all your search terms first, followed by pages that contain only some of them.
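The ordering described here, with pages matching all terms ahead of pages matching only some, can be sketched in Python. The function name match_any and the sample pages are assumptions for illustration; actual engines combine this with other ranking factors.

```python
def match_any(pages, terms):
    """Return pages containing at least one term, most terms matched first."""
    results = []
    for url, text in pages.items():
        words = set(text.lower().split())
        hits = sum(1 for term in terms if term in words)
        if hits:
            results.append((hits, url))
    # More matched terms first; ties are broken alphabetically by address.
    results.sort(key=lambda item: (-item[0], item[1]))
    return [url for _hits, url in results]


# Hypothetical sample pages.
pages = {
    "a.example": "pump maintenance schedule",
    "b.example": "pump repair",
    "c.example": "boiler repair guide",
    "d.example": "maintenance of a pump",
}
found = match_any(pages, ["pump", "maintenance"])
```

The two pages that contain both terms come first, the page with only one term follows, and the page with neither is dropped.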

      MATCH ALL

      MATCH ALL returns only web pages that contain all your search terms. The search engines for which MATCH ALL is automatic include AOL Search, Google, HotBot, Lycos, MSN Search, and Northern Light. You must use the "+" operator with all other engines. Almost all the major search engines support the "+" operator as a command.

      Title search

      Many search engines allow you to search within a web page's HTML title. For example, this page has an HTML title similar to this:

      <title>Internet searches made simple</title>.

      There are several ways to execute a title search, depending on the search engine used. AltaVista, GoTo, HotBot, Go, MSN Search, Northern Light, and Snap require title: in the search window. It is important to include the colon in the command. An identifying word, phrase, or entire title follows the colon when you type it into the search window. Yahoo! requires "t:" instead of "title:". The Lycos title search is available on a menu on its advanced search page. AOL Search, Excite, Google, LookSmart, Netscape, and WebCrawler do not support searching for HTML titles.
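To show what a title search looks at, the sketch below pulls the HTML title out of a page with Python's standard html.parser module and checks it for a phrase. The TitleParser class, the helper functions, and the sample page are assumptions for illustration and do not reflect how any of the engines named above work internally.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def title_of(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def title_matches(html, phrase):
    """True if the phrase appears in the page's HTML title, ignoring case."""
    return phrase.lower() in title_of(html).lower()


page = ("<html><head><title>Internet searches made simple</title></head>"
        "<body><p>Spiders crawl the web.</p></body></html>")
```

A word that appears only in the body of the page, such as "spiders" in the sample, does not count as a title match.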

      Site search

      Sometimes you may want to control which sites are included or excluded from a search. This ability is a powerful search engine feature.

      This feature allows you to:

      • See all the pages indexed from a specific domain

      • See all the pages indexed from a specific domain that contain a word or phrase

      • Use include and exclude commands along with specific domain searches

      • Include or exclude domains such as .edu for educational institutions, .gov for governmental, .org for organizational, or .us for domains located in the United States (or the appropriate code for any other country). Each country has a unique suffix. For example, the suffix for the United Kingdom is .uk.

      GoTo, HotBot, MSN Search, and Snap support the domain: syntax, which enables you to specify the domain to include or exclude. AltaVista requires the term host: while Go requires site:. To perform a site search with Lycos, you must access a menu on the advanced search page.
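Including or excluding domains amounts to filtering results by the host name in each address. The Python sketch below uses the standard urllib.parse module. The function names and sample addresses are hypothetical, and the sketch is not tied to the domain:, host:, or site: syntax of any particular engine.

```python
from urllib.parse import urlparse


def in_domain(url, domain):
    """True if the address's host is the domain itself or falls under it."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def site_filter(urls, include=None, exclude=()):
    """Keep addresses inside `include` (if given) and outside every `exclude`."""
    kept = []
    for url in urls:
        if include and not in_domain(url, include):
            continue
        if any(in_domain(url, d) for d in exclude):
            continue
        kept.append(url)
    return kept


# Hypothetical sample results.
urls = [
    "http://www.example.edu/a",
    "http://example.gov/b",
    "http://news.example.co.uk/c",
    "http://example.com/d",
]
only_edu = site_filter(urls, include="edu")
no_gov_or_uk = site_filter(urls, exclude=["gov", "uk"])
```

The same test works for a suffix such as edu or uk and for a full domain such as example.com.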

          Wildcards

          The asterisk (*) can be used as a wildcard for searches or certain other data operations. Wildcards are used to search for plurals or variations of words. They also come in handy if you are unsure of the exact spelling of a word.

          AOL Search, AltaVista, HotBot, MSN Search, Northern Light, Snap, and Yahoo! support wildcard searches and use the * operator. Excite, Google, GoTo, Go, LookSmart, Lycos, and WebCrawler do not support the use of wildcards.
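Wildcard matching can be tried out with Python's standard fnmatch module, which treats * as "any run of characters." This is only an illustration of the idea: the function wildcard_matches and the word lists are invented, and fnmatch also gives special meaning to ? and square brackets, which the engines above may not.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern, words):
    """Return the words that fit the pattern, ignoring case."""
    pattern = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), pattern)]


# Hypothetical list of indexed words.
vocabulary = ["engine", "engines", "engineer", "engineering", "energy"]
found = wildcard_matches("engine*", vocabulary)
spellings = wildcard_matches("col*r", ["color", "colour", "cooler"])
```

A trailing asterisk picks up plurals and longer forms of a word, while an asterisk in the middle covers alternative spellings.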

          Search engine showdown
          This table compares the popular search engines and lists accepted Boolean operators, defaults, case sensitivity, and other important parameters.

| Search engine | Boolean | Default | Proximity | Truncation | Case | Fields | Limits | Stop | Sorting |
|---|---|---|---|---|---|---|---|---|---|
| All the Web | +, - | AND | Phrase | No | No | Title, URL, link, more | Language, domains | No | Relevance |
| Google | -, OR | AND | Phrase | No more | No domain | Intitle, inurl, searches | Language, on citation | Yes, + | Relevance, |
| Lycos | +, - | AND | Phrase | No | No link, more | Title, URL, domain | Language, | No | Relevance |
| Northern Light | AND, OR, NOT, (), +, - | AND | Phrase | Yes * %, auto plurals | No | Title, URL, more | Doc type date, more | No | Custom folders, date |
| iWon | AND, OR, NOT, (), +, - | AND | Phrase | Yes * ? | Yes | Title, link, domain | Date | Yes | Relevance, site |
| AltaVista Simple | +, - | <5: AND; >4: OR | Phrase | Yes * | Yes | Title, URL, link, more | Language | Yes | AskJeeves, RealNames, Relevance |
| AltaVista Adv. | AND, OR, AND NOT, () | Phrase | Phrase, near | Yes * | Yes | Title, URL, link, more | Language, date | No | Relevance, if used |
| HotBot | AND, OR, NOT, (), +, - | AND | Phrase | Yes * | Yes | Title, more | Language, date, more | Yes | Relevance, site |
| NBCi | AND, OR, NOT, (), +, - | AND | Phrase | Yes * | Yes | Title, more | Language, date, more | Yes | Relevance |
| The smaller search engines | | | | | | | | | |
| Excite | AND, OR, NOT, (), +, - | OR | Phrase | No | No | No | Language, domain | Yes | Relevance, site |
| Magellan | AND, OR, NOT, (), +, - | OR | Phrase | No | No | No | No | Yes | Relevance |
| WebCrawler | AND, OR, OR NOT, (), +, - | OR | Phrase, near, adj | No | No | No | No | Yes | Relevance |

(Courtesy of Greg R. Notess. Used with permission.)

          Popular general-purpose search engines

          The following is a list of popular general-purpose search engines and web directories.

          AltaVista—altavista.com

          Dogpile—dogpile.com

          Excite—excite.com

          FAST—alltheweb.com

          Go—go.com

          Google—google.com

          HotBot—hotbot.com

          Intelliseek—profusion.com

          Looksmart—looksmart.com

          Lycos—lycos.com

          Magellan—magellan.excite.com

          Mamma—mamma.com

          Matilda—aaa.com.au

          Metacrawler—metacrawler.com

          Northern Light—northernlight.com

          Open Directory Project—dmoz.org

          Search.com—search.com

          Snap—snap.com

          Web Crawler—webcrawler.com

          Yahoo!—yahoo.com.


