
WEEK FOUR
Doing Research Online:
Planning Your Search, Exploring and Using Search Tools


What This Lesson Includes:


Exactly What Is It That We Are Looking At When We Are Searching?

Everything from specialized full-text and statistical databases to online library catalogs to web sites and web pages found by using a search engine or directory is available online. It's important to begin by knowing exactly what we are searching.

As we learned in our earlier lesson, when we are looking for web documents (web pages or websites), what we are really doing is using a search tool that looks at its own database or collection of information about websites that live on computers (servers) located worldwide. The World Wide Web contains billions of documents and, unlike a library's catalog, which is indexed using standard terms, the Web is not indexed using any common vocabulary. This means that when we enter our search term(s), we are really guessing what terms someone used to organize a web page or website on a particular topic, and hoping that our guess will match those terms so we can find the link to that document!

So, we enter our search term and hope for the best!


Even though any search tool we choose is really only looking at a small subset of the entire World Wide Web, the number of links to websites that come to us in response to a search can be overwhelming. So, a good concept to keep in mind is that it is impossible for any search to look at the entire web, but your searching techniques can be developed so you will choose the best search tool, maximize your efforts and find what you need.



Types of Search Tools:

The basic categories of search tools include search engines, metasearchers, subject directories, and library gateways/specialized databases. For each of these, we'll look at what it is, how it works, the pros and cons of using it, and examples of searches for which you would want to use it.



Search Engines:

What Is A Search Engine?
Search engines are huge databases of web page files that have been assembled automatically by machine. There are two types of search engines: individual search engines that compile their own searchable databases (Google, alltheweb) and metasearchers that search the databases of multiple individual search engines simultaneously (ixquick, vivisimo, surfwax).

How Do Search Engines Work?
Search engines compile their databases by employing "spiders" or "robots" (sometimes called bots) to crawl the web from link to link, identifying and perusing pages. Websites that no other pages link to may be missed altogether. Once a spider gets to a web site, it typically indexes the words on the publicly available pages at the site. Web page owners who want their pages found by a search engine often submit their URLs to search engines for "crawling" and eventual inclusion in their databases.
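
To make the crawling idea concrete, here is a minimal sketch in Python. It is not how any real search engine is built: the tiny "web" below (the PAGES dictionary and its page names) is entirely hypothetical, and a real spider would fetch pages over the network. It only illustrates following links from page to page, indexing words along the way, and missing a page that nothing links to.

```python
from collections import deque

# Hypothetical miniature "web": each page has some text and outgoing links.
PAGES = {
    "home": {"text": "welcome to the lion page", "links": ["cats", "about"]},
    "cats": {"text": "lions and tigers are big cats", "links": ["home"]},
    "about": {"text": "about this site", "links": []},
    "orphan": {"text": "nobody links to this page", "links": []},
}

def crawl(start):
    """Follow links breadth-first from start, indexing each word found."""
    index = {}              # word -> set of page names containing that word
    seen = {start}          # pages already discovered
    queue = deque([start])  # pages waiting to be visited
    while queue:
        page = queue.popleft()
        for word in PAGES[page]["text"].split():
            index.setdefault(word, set()).add(page)
        for link in PAGES[page]["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl("home")
print(sorted(index["lions"]))  # pages that mention "lions"
print("nobody" in index)       # False: the orphan page was never reached
```

Because no page links to "orphan", the spider never finds it, so none of its words end up in the index.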

No two search engines are exactly the same in terms of size, speed and content; no two search engines use exactly the same ranking schemes, and not every search engine offers you exactly the same search options. Estimates put search engine overlap at approximately 60 percent and unique content at around 40 percent.

Remember when we said we are guessing what terms were used to define or organize a web site? Whenever you search the web using a search engine, you're asking the engine to scan its index of sites and match your keywords and phrases with those in the search engine's database -- hoping your guess matches the terms used.

Search engines use sets of rules or guidelines (varying from one engine to another) to rank pages. They want to return the most relevant pages at the top of their lists, so they look at the location and frequency of keywords and phrases in the web page document (and sometimes in the HTML coding that doesn't appear on the web page). They check the title field and scan the headers and text near the top of the document. Some of them assess popularity by the number of links that point to a site; the more links, the greater the popularity, i.e., the value of the page.
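
As a rough illustration of ranking by location, frequency and popularity, here is a toy scoring function in Python. The weights (5 for a title match, 2 for a match near the top), the five-word cutoff for "near the top" and the sample pages are all assumptions made up for this sketch; real engines use their own, far more elaborate rules.

```python
def score(page, keyword):
    """Toy relevance score; the weights are illustrative assumptions."""
    keyword = keyword.lower()
    title_hits = page["title"].lower().split().count(keyword)
    body_words = page["body"].lower().split()
    body_hits = body_words.count(keyword)       # frequency in the text
    early_hits = body_words[:5].count(keyword)  # location: near the top
    return 5 * title_hits + 2 * early_hits + body_hits + page["inbound_links"]

# Hypothetical pages; inbound_links stands in for link popularity.
pages = [
    {"url": "b.example", "title": "Zoo hours",
     "body": "the zoo opens at nine and the tigers are fed at noon",
     "inbound_links": 1},
    {"url": "a.example", "title": "Tigers of Asia",
     "body": "tigers live in forests and grasslands",
     "inbound_links": 3},
]

ranked = sorted(pages, key=lambda p: score(p, "tigers"), reverse=True)
print([p["url"] for p in ranked])  # a.example is listed first
```

The page with the keyword in its title, near the top of its text, and with more links pointing to it rises to the top of the list.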

How Current Is the Data retrieved by a Search Engine?
Spiders regularly return to the web pages they index to look for changes. When changes occur, the index is updated to reflect the new information. However, the process of updating can take a while, depending upon how often the spiders make their rounds and then, how promptly the information they gather is added to the index.
When you are using a search engine, you are NOT searching the entire web as it exists at this moment; you are really using a search tool to look at a portion of the web that was indexed previously.

Most search engine companies have partnered with specialized news databases, allowing the search engine to provide up-to-the-minute news, usually accessible by clicking a tab or link labeled "news." Good examples include All the Web News, Yahoo! News and Google News.

What Are the Pros and Cons of Search Engines?
Search engines provide access to a fairly large portion of the publicly available pages on the Web, and they remain the best means yet devised for searching the web. However, the sheer number of words indexed by search engines increases the likelihood that they will return hundreds of thousands of responses to simple search requests, especially when you consider that a search engine will return lengthy documents in which your keyword appears only once (and, as you have likely noticed, many of these "hits" will be irrelevant to your search).

When Should You Use Search Engines?
Search engines are best at finding unique keywords, phrases, quotes, and information buried in the full-text of web pages. Because they index word by word, search engines are also useful in retrieving tons of documents. If you want a wide range of responses to specific queries, use a search engine.

NOTE: Today, the line between search engines and subject directories is blurring. Search engines are partnering with subject directories, or creating their own directories, and returning results gathered from a variety of other guides and services as well.

What Are Some Examples of Search Engines?
Google, Teoma and All the Web are good search engines to look at.



Metasearchers:

What is A Metasearcher?
Unlike search engines, which crawl the web compiling their own searchable databases, metasearchers search the databases of multiple individual search engines simultaneously, from a single site and using the same interface. Metasearchers provide a quick way of finding out which engines are retrieving the best results for your search.

How Does A Metasearcher Work?
After compiling results from several search engines, metasearchers present the results of their searches in either a single merged list (without duplicate entries) or in separate lists as they were received from each search engine (duplicate entries may show up).
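
The "single merged list" approach can be sketched in a few lines of Python. This is only an illustration: the engine names and URLs are hypothetical, and interleaving results by rank is one assumed way to merge them, not a description of how any particular metasearcher works.

```python
def merge_results(results_by_engine):
    """Merge ranked URL lists into one list, dropping duplicate entries."""
    merged = []
    seen = set()
    lists = list(results_by_engine.values())
    longest = max((len(urls) for urls in lists), default=0)
    # Take every engine's first hit, then every second hit, and so on.
    for rank in range(longest):
        for urls in lists:
            if rank < len(urls) and urls[rank] not in seen:
                seen.add(urls[rank])
                merged.append(urls[rank])
    return merged

# Hypothetical results returned by two search engines for the same query.
results = {
    "engine_a": ["lions.example", "bigcats.example", "zoo.example"],
    "engine_b": ["bigcats.example", "safari.example"],
}
print(merge_results(results))
```

Here "bigcats.example" was returned by both engines but appears only once in the merged list. Showing separate lists instead would simply mean printing each engine's results as received, duplicates and all.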

What Are the Pros and Cons of Metasearchers?
They can give you a fair picture of what's available on the Web and where it can be found, and are usually very fast. You generally can't choose how your search is configured or conducted, so you are at the mercy of the metasearch engine to present your search.

When Should You Use a Metasearcher?
Students often like metasearchers because they are a good tool when you are in a hurry. They can give you a quick overview of a subject and/or a unique term. They are also good choices when you are "striking out" using other search tools.

What Are Some Examples of Metasearchers?
Examples of metasearch engines include Ixquick, Metor, Vivisimo, Profusion, surfwax and Copernic Agent.



Subject Directories:
NOTE: Today, the line between search engines and subject directories is blurring. This is particularly notable with Google's Directory option (click on the tab that says "Directory" when you go to the Google site to see this).

What Are Subject Directories?
Unlike search engines, directories are created and maintained by humans rather than by electronic spiders or robots. These editors review and select sites for inclusion in their directories on the basis of previously determined selection criteria. The resources they list are usually annotated. Because they generally index only the home page or top level of a website, directories tend to be smaller than search engine databases.

How Does A Subject Directory Work?
When you enter your search term, a directory tries to match your term or phrase with those in its written descriptions. Subject directories include general directories, academic directories, commercial directories, and portals. Portals are directories created or used by private interests or companies to use as gateways to the web. Another new trend is toward "Vortals" (vertical portals) that are subject-specific. Examples include the Internet Movie Database, SportSearch and WebMD.

What Are the Pros and Cons of Subject Directories?
Because of the human element, directories tend to deliver high quality content. The hierarchical organization is very popular. The downside to this type of organizational structure is seen when a searcher needs to click through several layers to get to an actual web document.  Directories may also provide a smaller number of "hits" than search engines.

Dead links (often created because a web page changed content after inclusion in a directory) tend to be a real problem for subject directories, and some people view them as being too heavily populated with e-commerce sites.

When Should You Use Directories?
These are best for general searching and browsing. Think of the telephone book or Yellow Pages: if your picnic table is broken and you want to find a repair person, first you find "furniture," then you go to "outdoor furniture" and then "repair."
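
The picnic table example can be pictured as a set of nested categories. The Python sketch below is only an illustration: the directory contents and the business names are hypothetical, and a real subject directory is far larger and is maintained by human editors.

```python
# Hypothetical directory: categories nest inside one another.
DIRECTORY = {
    "furniture": {
        "indoor furniture": {"repair": ["Hypothetical Chair Clinic"]},
        "outdoor furniture": {"repair": ["Hypothetical Picnic Table Repair"]},
    },
}

def browse(directory, path):
    """Click down through one category per step and return what is there."""
    node = directory
    for category in path:
        node = node[category]
    return node

path = ["furniture", "outdoor furniture", "repair"]
print(browse(DIRECTORY, path))
print(len(path), "clicks to reach the listing")
```

Each entry in the path is one layer the searcher has to click through, which is the downside of hierarchical organization mentioned earlier.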



Library Gateways, Specialized Databases and "Vortals":

What Are Library Gateways and Subject-Specific Databases?
Library gateways are collections of databases and organized lists of informational sites, created, recommended and reviewed by specialists (usually librarians). These support reference and research by identifying and pointing to academically-oriented pages on the Web. Subject-specific databases or vortals ("vertical portals") are databases devoted to a single subject. They tend to be created by governmental agencies, business interests, professors, researchers,  and other subject specialists in a particular field.

When Do You Use Gateways or Vortals?
These are best for locating high quality information sites on the Web. Searchers can feel fairly comfortable that these sites have been reviewed and evaluated by subject specialists for their accuracy and content. Increasingly, search engines and directories provide links to these subject-specific databases.

What About the "Invisible Web"?
There is a gigantic portion of the Web (60 to 80 percent) that traditional search tools are unable to index or are prohibited from indexing. It includes password-protected sites and documents behind firewalls. These pages are not usually visible to search engine spiders because they are embedded within individual Web sites, and much of the information is assembled dynamically in response to specific queries, so it is difficult to index and maintain. Two of the best sites for locating "hidden" sites are Complete Planet and the Invisible Web Directory.

How Can You Access "Invisible Web" Sites?
In order to get to much of the Invisible or "Deep" web, you have to point your browser directly at the sites. This is exactly what many library gateways and subject-specific databases do. They are good sources for direct links to database information stored on the "Invisible Web."

When Should You Use a Gateway or Vortal?
These are excellent choices when you want high quality information sites that have been validated and/or verified by subject specialists for their accuracy and content. They are also good for news links, multimedia files, mailing lists and finding people.

What Are Some Examples of Library Gateways?

What Are Some Examples of "Vortals" (Subject-Specific Databases)?




With All These Tools Available, How Do You Choose Which One to Use? And How Do You Formulate Your Search?

It's tempting to just dive into your search, but it's a good idea to THINK about your search before you begin. Create a search strategy in your head by asking yourself, "What is it I want to do? Browse? Locate a specific piece of information? Find everything I can on a subject?"

The answer to this will steer you toward the best search tools to use and help you formulate your search strategy.

Defaults, Stopwords and other mysterious-sounding things that affect your results

If you enter more than one keyword into your search without any accompanying sign, mark or symbol, the search engine will usually add either AND or OR automatically to link your search terms together. This could radically alter your search in unexpected ways. These defaults are the basic settings of the search engine you are using, and they can often explain why your search results are not what you expect.
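
A small Python sketch shows how much the default operator matters. The three "pages" and the search function are hypothetical and exist only for this illustration; the point is that the same two keywords return very different results depending on whether the engine silently joins them with AND or with OR.

```python
# Hypothetical page texts standing in for a search engine's database.
DOCS = {
    "page1": "lions hunt on the savanna",
    "page2": "tigers hunt in the forest",
    "page3": "lions and tigers are both big cats",
}

def search(query, default_operator):
    """Match pages containing all terms (AND) or at least one term (OR)."""
    terms = query.lower().split()
    combine = all if default_operator == "AND" else any
    return sorted(
        name for name, text in DOCS.items()
        if combine(term in text.split() for term in terms)
    )

print(search("lions tigers", "AND"))  # ['page3']
print(search("lions tigers", "OR"))   # ['page1', 'page2', 'page3']
```

With an AND default, only the page mentioning both words comes back. With an OR default, every page mentioning either word is returned.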

Strange things can happen for other reasons as well. Sometimes search engines use relevance ranking systems that can throw off your search by ignoring some of the words in your search statement. This might happen when the search engine recognizes your string of separate keywords as a phrase in its list of pre-determined phrases.

Another time this can happen is when the search engine is responding to its own internal list of "stop words" (these are words that some search tools ignore in order to cut down response time). Stop word lists tend to include small common words, such as a, about, an, and, are, as, at, be, by, from, how, i, in, is, it, of, on, or, that, the, this, to, we, what, when, where, which, with, etc. If you initiate a search at a site that maintains a list of stop words and you type any of those words into your search statement (even in phrases surrounded by quotes), they may well be ignored. An exception to this is Google, which has a stop word list but recognizes stop words within phrases surrounded by quotation marks, e.g., "to be or not to be" or "what you see is what you get".
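
Here is a minimal Python sketch of stop word handling. The stop word set is a shortened version of the list above, and the rule of leaving quoted phrases untouched is a simplification of the quoted-phrase behavior just described; the function name and details are assumptions for illustration, not any search engine's actual code.

```python
# Shortened stop word list, taken from the longer list above.
STOP_WORDS = {"a", "an", "and", "be", "in", "is", "of", "or", "the", "to"}

def strip_stop_words(query):
    """Drop stop words, except inside double-quoted phrases."""
    kept = []
    # Splitting on the quote mark puts quoted phrases at odd positions.
    for position, chunk in enumerate(query.split('"')):
        if position % 2 == 1:
            if chunk.strip():
                kept.append('"' + chunk.strip() + '"')  # keep phrase whole
        else:
            kept.extend(w for w in chunk.split() if w.lower() not in STOP_WORDS)
    return " ".join(kept)

print(strip_stop_words("the history of the lion"))
print(strip_stop_words('"to be or not to be" in the theatre'))
```

The first query is reduced to "history lion", while the quoted phrase in the second query survives intact even though most of its words are on the stop word list.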

You may never know the real reason why your search retrieves so many irrelevant responses, and it can be frustrating!



Formulating Your Search Strategy:

The Teaching Library at U.C. Berkeley has an excellent 5-step tutorial that will help you formulate your strategy for effective searching. The best way to avoid frustration and the feeling of being overwhelmed when you do your searches is to think about your search BEFORE you begin. The online article Things To Know Before You Begin Searching, also from U.C. Berkeley, is a good place to start.



Types of Searches

You can use "Boolean operators" such as AND, OR and NOT to include or exclude keywords from a search. For example, if you were trying to find information about lions and tigers, you could structure the following searches:

lions AND tigers

This would retrieve any sites that include references to both lions and tigers.

lions OR tigers

This would retrieve any sites that include a reference to the keyword lions or the keyword tigers, but not necessarily to both in the same site.

lions OR tigers NOT Detroit

This would retrieve any sites that include a reference to the keyword lions or the keyword tigers but not the keyword Detroit, so it would exclude search results that were about either Detroit's baseball team (Tigers) or its NFL football team (Lions). Here is a PDF guide to assist you in constructing Boolean searches.
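
The three searches above map neatly onto set operations, as this Python sketch shows. The sites and their keywords are hypothetical, and the last search is assumed to be grouped as (lions OR tigers) NOT Detroit, which matches the description above; real search tools differ in how they group operators.

```python
# Hypothetical sites and the keywords each one contains.
SITES = {
    "zoo": {"lions", "tigers", "bears"},
    "safari": {"lions", "africa"},
    "sports": {"lions", "tigers", "detroit"},
    "india": {"tigers", "india"},
}

def matching(keyword):
    """Return the set of site names that contain the keyword."""
    return {name for name, words in SITES.items() if keyword in words}

both = matching("lions") & matching("tigers")    # lions AND tigers
either = matching("lions") | matching("tigers")  # lions OR tigers
no_detroit = either - matching("detroit")        # (lions OR tigers) NOT Detroit

print(sorted(both))        # ['sports', 'zoo']
print(sorted(either))      # ['india', 'safari', 'sports', 'zoo']
print(sorted(no_detroit))  # ['india', 'safari', 'zoo']
```

AND narrows the results, OR widens them, and NOT removes the sports site that mentions Detroit.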



Research Steps Simplified:

To perform research effectively, both online and using print materials:

1.    Identify your topic (a good technique is to state your topic as a question).
2.    Find background information (look up keywords in subject encyclopedias).
3.    Use catalogs to find books (start with the MPC Library online catalog).
4.    Use indexes to find periodical articles (available from the MPC library web page).
5.    Find internet, audio and video resources.
6.    Evaluate your search results.
7.    Cite your sources in a standard format (MPC library has online information).

The following tips can help determine the terms to use when formulating your search query for online searching:



Online Tutorials and Additional Resources



© 2004 Stephanie Tetter