Internet searches made simple

There are hundreds of millions of web pages accessible via the internet. Some pages are easy to find; others are not. Unless you are given a specific uniform resource locator (URL), or are extremely lucky, you must search through the masses for the specific pages that contain the information you need. A search engine is needed to find that information.

Search engines are either general-purpose commercial services or site-specific tools. Site-specific search uses database-querying tools or employs a general-purpose external search engine. Regardless of the type, knowing how search engines work enables you to make the most of these powerful tools.

How do search engines locate information? How can you create searches that provide the information you seek? This article provides the answers.

How search engines work

A search engine uses special software tools, commonly referred to as robots or spiders, to assemble lists of the words found on web sites. The process by which a spider builds its lists is called web crawling.

Spiders usually begin by looking at lists of heavily used servers and popular pages. Beginning with a popular site, a spider indexes the words on its pages and follows every link found within the site. The spider spreads out quickly across the most widely used areas of the web, much like a true spider weaves its web.
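The crawl described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page names and the LINKS table are hypothetical stand-ins for real pages, and a real spider would fetch pages over the network instead of reading a dictionary.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the links found on it.
LINKS = {
    "popular.example/home": ["popular.example/news", "other.example/a"],
    "popular.example/news": ["popular.example/home", "other.example/b"],
    "other.example/a": ["other.example/b"],
    "other.example/b": [],
}

def crawl(seed_pages, links):
    """Visit pages breadth-first, following every link exactly once."""
    visited = []
    seen = set(seed_pages)
    queue = deque(seed_pages)
    while queue:
        page = queue.popleft()
        visited.append(page)  # a real spider would index the page here
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

order = crawl(["popular.example/home"], LINKS)
print(order)
```

Starting from one popular page, the spider reaches every page linked from it, directly or indirectly, without visiting any page twice.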

For example, when the spider of one search engine that began as an academic project looked at an HTML page, it made a list of the words within the page and noted where the words were found. Words in positions of relative importance, such as those occurring in the title, subtitles, and meta-tags, were noted for special consideration during subsequent user searches. This spider was designed to index every significant word on a page except for the articles "a," "an," and "the."
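A minimal sketch of this kind of indexing follows. It assumes a page is reduced to a title string and a body string; the function name index_page and the sample text are hypothetical, and only body words are indexed, with a flag showing whether each word also appears in the title.

```python
import re

ARTICLES = {"a", "an", "the"}  # the only words this sketch skips

def index_page(title, body):
    """Return {word: {"positions": [...], "in_title": bool}} for one page."""
    index = {}
    title_words = set(re.findall(r"[a-z0-9]+", title.lower()))
    body_words = re.findall(r"[a-z0-9]+", body.lower())
    for position, word in enumerate(body_words):
        if word in ARTICLES:
            continue
        entry = index.setdefault(
            word, {"positions": [], "in_title": word in title_words}
        )
        entry["positions"].append(position)
    return index

page_index = index_page(
    "Pump maintenance",
    "The pump needs a seal. Replace the pump seal.",
)
print(page_index["pump"])
```

The stored positions record where each word was found, and the in_title flag marks words that deserve extra weight in later searches.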

Other spiders work differently. Some approaches are devised to make the spider operate faster, allow more efficient searches, or both. The Lycos spider keeps track of words in the title, subheads, links, and the 100 words used most frequently on the page, along with every word in the first 20 lines of text. AltaVista indexes every word on a page, including the articles.

Other systems place importance on meta-tags. These identifiers allow page owners to specify key words and concepts that aid in indexing the page. Meta-tags can direct the search engine in selecting among several possible meanings to find the correct word or words.

The task of finding information on web pages is never actually completed. Since the web is always changing, the spiders are always crawling. Regardless, the search engine must store the retrieved information in a useful way. The factors important to making gathered data accessible to users are the information stored with the data and the method by which the information is indexed.

A search engine would be of limited use if it merely stored the word and the URL where it was found. So, other factors must also be weighed.

Was the word used in an important or trivial way? How many times was the word used? Was it used only once on the page? Does the page contain links to other pages containing the word? The answers to these questions let the search engine rank the most useful pages at the top of the search results list.
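One way to picture ranking is a score that combines these signals. The sketch below is an assumption for illustration only: the weights, the page records, and the field links_to_term_pages are invented here, and real search engines use their own, far more elaborate formulas.

```python
def score(page, term):
    """Combine simple signals into one number; the weights are arbitrary."""
    title_words = page["title"].lower().split()
    body_words = page["body"].lower().split()
    count = body_words.count(term)      # how many times the word is used
    in_title = term in title_words      # used in an important position?
    if count == 0 and not in_title:
        return 0.0
    return count * 1.0 + (5.0 if in_title else 0.0) + 0.5 * page["links_to_term_pages"]

# Hypothetical pages; links_to_term_pages counts links to other pages with the word.
PAGES = [
    {"url": "a.example", "title": "Bearing lubrication",
     "body": "bearing grease bearing", "links_to_term_pages": 2},
    {"url": "b.example", "title": "Plant news",
     "body": "new bearing supplier", "links_to_term_pages": 0},
    {"url": "c.example", "title": "Cafeteria menu",
     "body": "soup and salad", "links_to_term_pages": 3},
]

def rank(pages, term):
    """Return URLs of matching pages, highest score first."""
    scored = [(score(p, term), p["url"]) for p in pages]
    return [url for s, url in sorted(scored, reverse=True) if s > 0]

results = rank(PAGES, "bearing")
print(results)
```

The page that uses the word in its title, repeats it, and links to related pages lands at the top; the page that never mentions the word is left out.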

An index allows information to be found as quickly as possible. To build an index, a formula is applied that attaches a numerical value to each word. The formula is designed to distribute the entries evenly across a predetermined number of divisions, which differs from the uneven distribution of words across the alphabet. This process is called hashing, and the result is a hash table.

Because more words begin with some letters than with others, finding a word in an alphabetical index would take longer when its initial letter is a frequently used one than when it is an infrequently used one. Hashing evens out this difference, reduces the average amount of time necessary to locate an entry, and separates the index from the actual entry.

The hash table contains the calculated hash number along with a pointer to the actual data. The data themselves can be sorted in whatever way makes storage most efficient. This combination of indexing and storage enables quick results, even with complicated searches.
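The idea can be sketched as follows. Everything here is a simplified assumption: the number of divisions, the toy hash formula, and the sample entries are hypothetical, and the "pointer" is simply a position in a Python list.

```python
NUM_BUCKETS = 8  # hypothetical number of divisions

def hash_word(word):
    """Toy hash formula: fold the character codes into a bucket number."""
    value = 0
    for ch in word:
        value = (value * 31 + ord(ch)) % NUM_BUCKETS
    return value

# The actual data live in one list. The hash table stores only
# (word, position-in-list) pairs, that is, a pointer to each record.
records = []
buckets = [[] for _ in range(NUM_BUCKETS)]

def add(word, urls):
    records.append({"word": word, "urls": urls})
    buckets[hash_word(word)].append((word, len(records) - 1))

def lookup(word):
    """Search only the one bucket the word hashes to."""
    for stored_word, pointer in buckets[hash_word(word)]:
        if stored_word == word:
            return records[pointer]["urls"]
    return []

add("pump", ["a.example", "b.example"])
add("seal", ["b.example"])
add("valve", ["c.example"])
print(lookup("pump"))
```

A lookup examines a single bucket instead of scanning every entry, and because the table holds only pointers, the records can be stored in any convenient order.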

Basic searches

To search through an index, you must first build a query. A query can be as simple as a single word, or it can be a complicated combination of words and operators.

To use a search engine, such as Yahoo! or Lycos, navigate to its opening page by typing the site's address into the address window of your browser. Your browser should take you there regardless of whether or not you enter the www.

When the search engine portal appears, type your search word or phrase into the search window. Typing the phrase "Plant Engineering magazine" into the search window of Lycos, then clicking the search button, produced four web sites based on user selection traffic. However, a search of the complete Lycos catalog, or index, found 95,872 web pages.

No plant engineer, however efficient, has the time to visit more than 90,000 web hits. It quickly becomes necessary to limit your search.

Most search engines allow you to narrow your search. In Lycos, click the checkbox labeled search these results to "drill down" within your search criteria.

For example, checking this box, then typing "Information Engineering" into the search box and clicking the search button resulted in 447 sites from the entire Lycos catalog. That is a far cry from 95,872, but probably still not a manageable number. To limit the number of hits, quotation marks were included around the search subject "Information Engineering" to force Lycos to return only pages containing that specific phrase, with the words in the exact order given within the quotes.

Basically, a word or a phrase can initiate a simple search. And simple searches can be useful. However, to be effective, sometimes a complex or advanced search is necessary.

Boolean operators

Building a complex query requires the use of Boolean operators. Boolean operators allow you to refine and extend the terms of the search.

Several Boolean operators are commonly used.

  • AND: Any terms joined by AND must appear in the pages or documents. Some search engines substitute the operator "+" for the word AND.

  • OR: At least one of the terms joined by OR must appear in the pages or documents.

  • NOT: The term or terms following NOT must not appear in the pages or documents. Some search engines substitute the operator "-" for the word NOT.

  • FOLLOWED BY: One of the terms must be directly followed by the other.

  • NEAR: One of the terms must be within a specified number of words of the other.

  • Quotation Marks: The words between the quotation marks are treated as a phrase, and that phrase must be found within the document or file in the exact order.
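The operators in this list map naturally onto set operations. The sketch below is illustrative only: the three documents are hypothetical, and the helper names docs_with, near, and phrase are invented for this example and do not reflect any particular search engine's implementation.

```python
# Hypothetical document collection, keyed by document number.
DOCS = {
    1: "plant engineering magazine covers maintenance",
    2: "information engineering for the plant floor",
    3: "maintenance software news",
}

def docs_with(term):
    """Set of documents that contain the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}

def near(first, second, distance):
    """Documents where the two terms occur within `distance` words of each other."""
    hits = set()
    for doc_id, text in DOCS.items():
        words = text.split()
        pos_first = [i for i, w in enumerate(words) if w == first]
        pos_second = [i for i, w in enumerate(words) if w == second]
        if any(abs(i - j) <= distance for i in pos_first for j in pos_second):
            hits.add(doc_id)
    return hits

def phrase(words_in_order):
    """Documents containing the exact phrase, as with quotation marks."""
    return {doc_id for doc_id, text in DOCS.items()
            if f" {words_in_order} " in f" {text} "}

and_result = docs_with("plant") & docs_with("engineering")     # AND
or_result = docs_with("magazine") | docs_with("software")      # OR
not_result = docs_with("maintenance") - docs_with("software")  # NOT
near_result = near("plant", "information", 4)                  # NEAR
phrase_result = phrase("plant engineering")                    # quotation marks
```

AND is an intersection, OR is a union, and NOT is a set difference; NEAR and phrase matching need word positions, which is one reason an index records where each word was found.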

The "+" symbol is especially helpful when you do a search and then find yourself overwhelmed with information.

Using Yahoo!, the following was typed into the search window:

plant engineering magazine+information engineering+cmms/eam

Yahoo! returned 10 web site matches, most of which were relevant (Fig. 2). This number is much more usable than 95,872, or even the 447 sites returned when the search was not limited by Boolean operators.

Often, you may need a search engine to find pages that have one word on them but not another word. The "-" symbol allows this type of search.

Using Yahoo!, the following was typed into the search window:

plant engineering magazine+information engineering+eam-cmms

Yahoo! returned nine web site matches, which included the "EAM" term and excluded the "CMMS" term. To get the best results from the "-" operator, eliminate terms you know are not of interest.
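To see how such a query might be interpreted, here is a small sketch that splits a query into required and excluded terms. It is an assumption about the general approach, not a description of how Yahoo! works: the functions parse_query and matches are hypothetical, hyphenated words are not handled, and matching is by simple substring.

```python
import re

def parse_query(query):
    """Split a query into required (+) and excluded (-) terms."""
    required, excluded = [], []
    # Each piece is an optional leading operator plus the text up to the next operator.
    for operator, term in re.findall(r"([+-]?)([^+-]+)", query):
        term = term.strip().lower()
        if not term:
            continue
        if operator == "-":
            excluded.append(term)
        else:
            required.append(term)  # the first term and every "+" term
    return required, excluded

def matches(text, required, excluded):
    """True when every required term appears and no excluded term does."""
    text = text.lower()
    return all(t in text for t in required) and not any(t in text for t in excluded)

required, excluded = parse_query(
    "plant engineering magazine+information engineering+eam-cmms"
)
print(required, excluded)
```

A page mentioning all three required phrases passes the filter unless it also mentions the excluded term.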

Advanced searches

For most users, general searching techniques using Boolean or symbol operators are sufficient. However, if more searching power is required, the following commands are useful.

Match any

Occasionally you want web pages that contain any of the terms for which you are searching. Some search engines do this automatically, so it is not necessary to enter a special operator. Those search engines include AltaVista, Excite, GoTo, Go, LookSmart, Netscape, Snap, WebCrawler, and Yahoo!. AOL Search, HotBot, Lycos, and MSN Search have "match any" as a menu item adjacent to the search window. You must use the Boolean operator OR with Northern Light. Google does not support the MATCH ANY command.

Most search engines automatically list pages with all your search terms first, then pages with only some of your terms.

Match all

A MATCH ALL search returns web pages that contain all your search terms. The search engines for which MATCH ALL is automatic include AOL Search, Google, HotBot, Lycos, MSN Search, and Northern Light. You must use the "+" operator with all other engines. Almost all the major search engines support the "+" operator as a command.
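The difference between matching any term and matching all terms can be shown in a few lines. This is a sketch with hypothetical function names and sample pages; it also orders results so that pages containing all the terms come first, then pages containing only some.

```python
def match_any(text, terms):
    """True when at least one term appears as a word in the text."""
    words = set(text.lower().split())
    return any(term in words for term in terms)

def match_all(text, terms):
    """True when every term appears as a word in the text."""
    words = set(text.lower().split())
    return all(term in words for term in terms)

def order_results(pages, terms):
    """Pages with all the terms first, then pages with only some of them."""
    def matched(page):
        words = set(page.lower().split())
        return sum(term in words for term in terms)
    hits = [p for p in pages if match_any(p, terms)]
    return sorted(hits, key=matched, reverse=True)

PAGES = ["motor repair guide", "pump and motor repair", "lunch menu"]
TERMS = ["pump", "motor"]
ordered = order_results(PAGES, TERMS)
print(ordered)
```

Here the page containing both terms is listed ahead of the page containing only one, and the page containing neither is dropped.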

Title search

Many search engines allow you to search within a web page's HTML title. For example, this page has an HTML title similar to this:

<title>Internet searches made simple</title>

There are several ways to execute a title search, depending on the search engine used. AltaVista, GoTo, HotBot, Go, MSN Search, Northern Light, and Snap require title: in the search window. It is important to include the colon in the command. An identifying word, phrase, or entire title follows the colon when you type it into the search window. Yahoo! requires "t:" instead of "title:". The Lycos title search is available on a menu on its advanced search page. AOL Search, Excite, Google, LookSmart, Netscape, and WebCrawler do not support searching for HTML titles.
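A title search can be pictured as extracting each page's HTML title and comparing it with the words after the colon. The sketch below uses Python's standard html.parser module; the title_search function, the sample pages, and the handling of the title: prefix are assumptions made for illustration.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text found between <title> and </title>."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def html_title(html):
    parser = TitleExtractor()
    parser.feed(html)
    return parser.title.strip()

def title_search(query, pages):
    """Answer a hypothetical title: query against a {url: html} mapping."""
    prefix = "title:"
    if not query.lower().startswith(prefix):
        raise ValueError("expected a query of the form title:words")
    wanted = query[len(prefix):].strip().lower()
    return [url for url, html in pages.items() if wanted in html_title(html).lower()]

PAGES = {
    "a.example": "<html><head><title>Internet searches made simple</title></head>"
                 "<body>pumps</body></html>",
    "b.example": "<html><head><title>Pump repair</title></head>"
                 "<body>internet searches</body></html>",
}
found = title_search("title:internet searches", PAGES)
print(found)
```

Only the page whose title contains the phrase is returned, even though the other page uses the same words in its body.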

Site search

Sometimes you may want to control which sites are included or excluded from a search. This ability is a powerful search engine feature.

This feature allows you to:

  • See all the pages indexed from a specific domain

  • See all the pages indexed from a specific domain that contain a word or phrase

  • Use include and exclude commands along with specific domain searches

  • Include or exclude domains such as .edu for educational institutions, .gov for governmental, .org for organizational, or .us for domains located in the United States (or the appropriate code for any other country). Each country has a unique suffix. For example, the suffix for the United Kingdom is .uk.

GoTo, HotBot, MSN Search, and Snap support the domain: syntax, which enables you to specify the domain to include or exclude. AltaVista requires the term host: while Go requires site:. To perform a site search with Lycos, you must access a menu on the advanced search page.
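Domain filtering of this kind can be sketched with Python's standard urllib.parse module. The URLs and the functions in_domain and site_search are hypothetical; the sketch only shows the idea of including or excluding results by the suffix of the host name.

```python
from urllib.parse import urlparse

def in_domain(url, domain):
    """True when the URL's host is the domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)

def site_search(urls, include=None, exclude=None):
    """Keep URLs inside the included domain and outside the excluded one."""
    results = []
    for url in urls:
        if include and not in_domain(url, include):
            continue
        if exclude and in_domain(url, exclude):
            continue
        results.append(url)
    return results

# Hypothetical result list.
URLS = [
    "http://www.university.example.edu/pumps",
    "http://agency.example.gov/safety",
    "http://shop.example.co.uk/valves",
]
only_edu = site_search(URLS, include=".edu")
no_gov = site_search(URLS, exclude=".gov")
print(only_edu, no_gov)
```

The same check works for a full domain name or for a suffix such as .edu or .uk.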


Wildcard search

The asterisk (*) can be used as a wildcard for searches or certain other data operations. Wildcards are used to search for plurals or variations of words. A wildcard also comes in handy if you are unsure of the exact spelling of a word.

AOL Search, AltaVista, HotBot, MSN Search, Northern Light, Snap, and Yahoo! support wildcard searches and use the * operator. Excite, Google, GoTo, Go, LookSmart, Lycos, and WebCrawler do not support the use of wildcards.
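Wildcard matching can be tried out with Python's standard fnmatch module, in which * stands for any run of characters. The vocabulary list and the wildcard_search function below are hypothetical and serve only to show which words a pattern would pick up.

```python
from fnmatch import fnmatchcase

def wildcard_search(pattern, words):
    """Return the words matching a pattern in which * matches any run of characters."""
    pattern = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), pattern)]

# Hypothetical list of indexed words.
VOCABULARY = ["engine", "engineer", "engineering", "engines", "enginuity", "bearing"]

plurals_and_variants = wildcard_search("engine*", VOCABULARY)
unsure_spelling = wildcard_search("b*ring", VOCABULARY)
print(plurals_and_variants, unsure_spelling)
```

A trailing asterisk catches plurals and longer variations of a word, while an asterisk in the middle helps when part of the spelling is uncertain.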

Search engine showdown

This table compares the popular search engines and lists accepted Boolean operators, defaults, case sensitivity, and other important parameters.

| Search engine | Boolean | Default | Proximity | Truncation | Case | Fields | Limits | Stop | Sorting |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All the Web | +, - | AND | Phrase | No | No | Title, URL, link, more | Language, domains | No | Relevance |
| Google | -, OR | AND | Phrase | No | No | Intitle, inurl, more | Language, domain | Yes, + searches | Relevance, on citation |
| Lycos | +, - | AND | Phrase | No | No | Title, URL, link, more | Language, domain | No | Relevance |
| Northern Light | AND, OR, NOT, (), +, - | AND | Phrase | Yes * %, auto plurals | No | Title, URL, more | Doc type, date, more | No | Custom folders, date |
| iWon | AND, OR, NOT, (), +, - | AND | Phrase | Yes * ? | Yes | Title, link, domain | Date | Yes | Relevance, site |
| AltaVista Simple | +, - | <5: AND; >4: OR | Phrase | Yes * | Yes | Title, URL, link, more | Language | Yes | AskJeeves, RealNames, Relevance |
| AltaVista Adv. | AND, OR, AND NOT, () | Phrase | Phrase, near | Yes * | Yes | Title, URL, link, more | Language, date | No | Relevance, if used |
| HotBot | AND, OR, NOT, (), +, - | AND | Phrase | Yes * | Yes | Title, more | Language, date, more | Yes | Relevance, site |
| NBCi | AND, OR, NOT, (), +, - | AND | Phrase | Yes * | Yes | Title, more | Language, date, more | Yes | Relevance |

The smaller search engines

| Search engine | Boolean | Default | Proximity | Truncation | Case | Fields | Limits | Stop | Sorting |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Excite | AND, OR, NOT, (), +, - | OR | Phrase | No | No | No | Language, domain | Yes | Relevance, site |
| Magellan | AND, OR, NOT, (), +, - | OR | Phrase | No | No | No | No | Yes | Relevance |
| WebCrawler | AND, OR, OR NOT, (), +, - | OR | Phrase, near, adj | No | No | No | No | Yes | Relevance |

(Courtesy of Greg R. Notess. Used with permission.)

Popular general-purpose search engines

The following is a list of popular general-purpose search engines and web directories.

Northern Light

Open Directory Project
