
WHAT IS IT USEFUL FOR

The Web is potentially a terrific place to get information on almost any topic. Doing research without leaving your desk sounds like a great idea, but all too often you end up wasting precious time chasing down useless URLs. Almost everyone agrees that there's gotta be a better way! But for now we're stuck with making the best use of the search tools that already exist on the Web.

It's important to give some thought to your search strategy. Are you just beginning to amass knowledge on a fairly broad subject? Or do you have a specific objective in mind--like finding out everything you can about carpal tunnel syndrome, or the e-mail address of your old college roommate?

If you're more interested in broad, general information, the first place to go is a Web directory. If you're after narrow, specific information, a Web search engine is probably a better choice.

Searching by Means of Subject Directories

Think back to the library card catalogue analogy. In the old card files, and even in today's computer terminal library catalogues, you find information by searching on either the author, the title, or the subject. You usually choose the subject option when you want to cover a broad range of information.

Example: You'd like to create your own home page on the Web, but you don't know how to write HTML, you've never created a graphic file, and you're not sure how you'd post a page on the Web even if you knew how to write one. In short, you need a lot of information on a rather broad topic--Web publishing.

Your best bet is not a search engine, but a Web directory like Yahoo. Yahoo is a subject-tree style catalogue that organizes the Web into 14 major topics, including Arts, Business and Economy, Computers and Internet, Education, Entertainment, Government, Health, News, Recreation, Reference, Regional, Science, Social Science, Society and Culture. Under each of these topics is a list of subtopics, and under each of those is another list, and another, and so on, moving from the more general to the more specific.

Example: To find out about Web page publishing from Yahoo, select the Computers and Internet topic, under which you find a subtopic on the World Wide Web. Click on that and you find another list of subtopics, several of which are pertinent to your search: Web Page Authoring, CGI Scripting, Java, HTML, Page Design, Tutorials. Selecting any of these subtopics eventually takes you to Web pages that have been posted precisely for the purpose of giving you the information you need.
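The drill-down described above can be pictured as walking a nested structure from the general to the specific. The following is only a rough sketch: the DIRECTORY fragment and the browse function are hypothetical, and the placeholder entries stand in for real listings, which are not reproduced here.

```python
# Hypothetical fragment of a subject directory. The topic names come from
# the example in the text; the structure and placeholders are assumptions.
DIRECTORY = {
    "Computers and Internet": {
        "World Wide Web": {
            "HTML": ["(pages about HTML)"],
            "Page Design": ["(pages about page design)"],
            "Tutorials": ["(pages with tutorials)"],
        },
    },
    "Health": {},
}

def browse(tree, path):
    """Follow a list of topic names from general to specific."""
    node = tree
    for topic in path:
        node = node[topic]  # raises KeyError if the topic is not listed
    return node

subtopics = browse(DIRECTORY, ["Computers and Internet", "World Wide Web"])
print(sorted(subtopics))
```

Each click in the directory corresponds to one more name in the path.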

If you are clear about the topic of your query, start with a Web directory rather than a search engine. Directories probably won't give you anywhere near as many references as a search engine will, but they are more likely to be on topic. Web directories usually come equipped with their own keyword search engines that allow you to search through their indices for the information you need.

Important note: More and more search engines are incorporating Web directories into their sites. These directories interact with the main search engine on the site in various ways. See Ex

These sites now characterize themselves as Web portals or hubs -- places on the Web where people come to get information about a multitude of subjects, and even to chat, send email and form online communities.

Searching by Means of Search Engines

This is where things start to get very complicated :)
Search engines are trickier than they look! You'll discover this the first time you enter a query on C++, the programming language. At least some of the Web search engines will essentially say, "Huh?"

C++ is not a word. It's a letter followed by two characters that might, depending on the index, be regarded merely as punctuation. Many text search engines have trouble handling input of this type. Many don't deal too well with numbers, either. So much for "007," "R2D2," or "Catch-22."
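To see why such queries go wrong, compare two ways of breaking text into index terms. This is a minimal sketch with two hypothetical tokenizers (naive_tokens and lenient_tokens); it does not reproduce how any particular search engine tokenizes its input.

```python
import re

def naive_tokens(text):
    """Keep runs of letters only; digits and punctuation are discarded."""
    return re.findall(r"[a-z]+", text.lower())

def lenient_tokens(text):
    """Also keep digits and the '+' and '-' characters inside a token."""
    return re.findall(r"[a-z0-9+\-]+", text.lower())

for query in ["C++", "007", "R2D2", "Catch-22"]:
    print(query, naive_tokens(query), lenient_tokens(query))
```

With the letters-only tokenizer, "C++" shrinks to the single letter "c" and "007" disappears altogether, which is roughly the "Huh?" response described above.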

Important Note: This problem is no longer as bad as it used to be. I'm now finding relevant hits for C++ on a majority of search engines. However, if you enclose the query in quotation marks, forcing the search engine to find the words "to be or not to be" in that precise order, most search engines can recognize the phrase as a famous quotation from Hamlet. If you enter the words as a phrase, however, you stand a better chance of getting some good hits.

However, as search technology advances, this is not as much of a problem as it was a couple of years ago. Many search engines will now automatically apply the "adjacency" operator when responding to a two-word query. This means that they will indeed look for documents in which your two words appear next to each other.

If you understand how search engines organize information and run queries, you can maximize your chances of getting relevant hits.

A Helpful Guide to Web Search Engines

How Search Engines Work

	Keyword Searching
	Concept-based Searching
	Refining Your Search
	Relevancy Ranking
	Meta Tags

Search engines use software robots to survey the Web and build their databases. Web documents are retrieved and indexed. When you enter a query at a search engine website, your input is checked against the search engine's keyword indices. The best matches are then returned to you as hits.
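The retrieve, index, and query cycle just described can be sketched in a few lines. This is only an illustration under simple assumptions: PAGES is a hypothetical stand-in for documents a robot has already fetched, and "best match" here just means "contains the most query words", which is far cruder than what a real engine does.

```python
from collections import defaultdict

# Hypothetical pages standing in for documents a robot has retrieved.
PAGES = {
    "page1": "heart disease and cholesterol",
    "page2": "valentine candy and a heart",
    "page3": "hard drive repair",
}

def build_index(pages):
    """Map each word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Return page ids ordered by how many query words each contains."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for page_id in index.get(word, ()):
            scores[page_id] += 1
    return sorted(scores, key=lambda p: (-scores[p], p))

index = build_index(PAGES)
print(search(index, "heart disease"))
```

The query never touches the pages themselves; it is checked only against the keyword index built beforehand.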

There are two primary methods of text searching--keyword and concept.

Keyword Searching

This is the most common form of text search on the Web. Most search engines do their text query and retrieval using keywords.

Unless the author of the Web document specifies the keywords for her document (this is possible by using meta tags in the latest version of HTML), it's up to the search engine to determine them. Essentially, this means that search engines pull out and index words that are believed to be significant. Words that are mentioned towards the top of a document and words that are repeated several times throughout the document are more likely to be deemed important.

Some sites index every word on every page. Others index only part of the document. For example, Lycos indexes the title, headings, subheadings and the hyperlinks to other sites, along with the first 20 lines of text.

Full-text indexing systems generally pick up every word in the text except commonly occurring stop words such as "a," "an," "the," "is," "and," "or," and "www." AltaVista claims to index all words, even the articles, "a," "an," and "the." Some of the search engines discriminate upper case from lower case; others store all words without reference to capitalization.
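The effect of stop words and capitalization on an index can be illustrated as follows. This is a sketch, not any engine's actual behavior: the stop-word list is taken from the examples above (real lists are longer), and index_terms and its two switches are hypothetical.

```python
# Stop-word list taken from the examples in the text; real lists are longer.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "www"}

def index_terms(text, drop_stop_words=True, fold_case=True):
    """Return the words of `text` that would be stored in the index."""
    words = text.split()
    if fold_case:
        words = [w.lower() for w in words]
    if drop_stop_words:
        words = [w for w in words if w.lower() not in STOP_WORDS]
    return words

# An index that drops stop words and ignores capitalization.
print(index_terms("The Oracle is a database"))
# An index that keeps every word and preserves case.
print(index_terms("The Oracle is a database", drop_stop_words=False, fold_case=False))
```

The first kind of index cannot tell "Oracle" from "oracle"; the second can, at the cost of storing many very common words.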

The Problem With Keyword Searching

Keyword searches have a tough time distinguishing between words that are spelled the same way, but mean something different (e.g., hard cider, a hard stone, a hard exam, and the hard drive on your computer). This often results in hits that are completely irrelevant to your query. Some search engines also have trouble with so-called stemming--i.e., if you enter the word "big," should they return a hit on the word "bigger"? What about singular and plural words? What about verb tenses that differ from the word you entered by only an "s," or an "ed"?
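A deliberately simplistic stemmer shows why this is harder than it sounds. The naive_stem function below is hypothetical and only strips a handful of suffixes; it is not the method used by any particular search engine.

```python
def naive_stem(word):
    """Strip a few common English suffixes. Deliberately simplistic."""
    word = word.lower()
    for suffix in ("ing", "ed", "er", "s"):
        # Only strip if at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["walked", "walks", "walking", "bigger", "big"]:
    print(w, "->", naive_stem(w))
```

The verb forms all reduce to "walk," but "bigger" becomes "bigg" rather than "big," so a query on "big" would still miss it. Handling doubled consonants and irregular forms takes considerably more rules than this.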

Search engines also cannot return hits on keywords that mean the same, but are not actually entered in your query. A query on heart disease would not return a document that used the word "cardiac" instead of "heart."

Concept-based searching

Unlike keyword search systems, concept-based search systems try to determine what you mean, not just what you say. In the best circumstances, a concept-based search returns hits on documents that are "about" the subject/theme you're exploring, even if the words in the document don't precisely match the words you enter into the query.

Excite is currently the best-known general-purpose search engine site on the Web that relies on concept-based searching.

This is also known as clustering -- which essentially means that words are examined in relation to other words found nearby.

How does it work? There are various methods of building clustering systems, some of which are highly complex, relying on sophisticated linguistic and artificial intelligence theory that we won't even attempt to go into here. Excite sticks to a numerical approach. Excite's software determines meaning by calculating the frequency with which certain important words appear. When several words or phrases that are tagged to signal a particular concept appear close to each other in a text, the search engine concludes, by statistical analysis, that the piece is "about" a certain subject.

For example, the word heart, when used in the medical/health context, would be likely to appear with such words as coronary, artery, lung, stroke, cholesterol, pump, blood, attack, and arteriosclerosis. If the word heart appears in a document with other words such as flowers, candy, love, passion, and valentine, a very different context is established, and the search engine returns hits on the subject of romance.
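The heart example can be turned into a toy illustration of the idea. To be clear, this is not Excite's actual algorithm, which is not described in detail here. The CONCEPTS vocabularies and the guess_concept function are assumptions built from the words listed above, and the scoring is a plain count of overlapping words.

```python
# Hypothetical concept vocabularies built from the words named in the text.
CONCEPTS = {
    "medicine": {"coronary", "artery", "lung", "stroke", "cholesterol",
                 "pump", "blood", "attack", "arteriosclerosis"},
    "romance": {"flowers", "candy", "love", "passion", "valentine"},
}

def guess_concept(text):
    """Pick the concept whose vocabulary overlaps most with the text."""
    words = set(text.lower().split())
    scores = {name: len(words & vocab) for name, vocab in CONCEPTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(guess_concept("a heart attack can follow high cholesterol and a blocked artery"))
print(guess_concept("a heart drawn on a valentine with candy and flowers"))
```

Both sentences contain "heart," yet the surrounding words push them toward different subjects.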

Warning: This often works better in theory than in practice. Concept-based indexing is a good idea, but it's far from perfect. The results are best when you enter a lot of words, all of which roughly refer to the concept you're seeking information about.

Refining Your Search

Most sites offer two different types of searches--"basic" and "refined." In a "basic" search, you just enter a keyword without sifting through any pulldown menus of additional options. Depending on the engine, though, "basic" searches can be quite complex.

Search refining options differ from one search engine to another, but some of the possibilities include the ability to search on more than one word, to give more weight to one search term than you give to another, and to exclude words that might be likely to muddy the results. You might also be able to search on proper names, on phrases, and on words that are found within a certain proximity to other search terms.

Some search engines also allow you to specify what form you'd like your results to appear in, and whether you wish to restrict your search to certain fields on the internet (e.g., Usenet or the Web) or to specific parts of Web documents (e.g., the title or URL).

Many, but not all, search engines allow you to use so-called Boolean operators to refine your search. These are the logical terms AND, OR, NOT, and the so-called proximal locators, NEAR and FOLLOWED BY.

Boolean AND means that all the terms you specify must appear in the documents, e.g., "heart" AND "attack." You might use this if you wanted to exclude common hits that would be irrelevant to your query.

Boolean OR means that at least one of the terms you specify must appear in the documents, e.g., bronchitis, acute OR chronic. You might use this if you didn't want to rule out too much.

Boolean NOT means that the term following the operator must not appear in the documents. You might use this if you anticipated results that would be totally off-base, e.g., nirvana AND Buddhism, NOT Cobain.
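The three Boolean operators map naturally onto set operations. Here is a minimal sketch, assuming a hypothetical INDEX in which each term points to the set of documents that contain it; the document names are made up for illustration.

```python
# Hypothetical index: each term maps to the set of documents containing it.
INDEX = {
    "nirvana": {"doc1", "doc2", "doc3"},
    "buddhism": {"doc1", "doc3"},
    "cobain": {"doc2", "doc3"},
}

def docs(term):
    """Look up a term without regard to capitalization."""
    return INDEX.get(term.lower(), set())

both = docs("nirvana") & docs("Buddhism")     # AND: intersection
either = docs("Buddhism") | docs("Cobain")    # OR: union
filtered = both - docs("Cobain")              # NOT: set difference

print(sorted(both), sorted(either), sorted(filtered))
```

AND narrows the result, OR widens it, and NOT removes whatever mentions the unwanted term.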

Not quite Boolean (+ and -): Some search engines use the characters + and - instead of Boolean operators to include and exclude terms.
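One plausible way to interpret such a query is to sort its terms into required and excluded groups. The sketch below is an assumption about how this might work, not a description of any engine's syntax rules; parse_plus_minus and matches are hypothetical names.

```python
def parse_plus_minus(query):
    """Split a query into required (+), excluded (-), and optional terms."""
    required, excluded, optional = [], [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional

def matches(text, query):
    """True if text has every + term and none of the - terms."""
    words = set(text.lower().split())
    required, excluded, _ = parse_plus_minus(query.lower())
    return (all(t in words for t in required)
            and not any(t in words for t in excluded))

print(matches("nirvana in buddhism", "+nirvana +buddhism -cobain"))
print(matches("nirvana and kurt cobain", "+nirvana +buddhism -cobain"))
```

In this reading, + behaves like AND and - behaves like NOT, while unmarked terms are merely optional.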

NEAR means that the terms you enter should be within a certain number of words of each other. FOLLOWED BY means that one term must directly follow the other. ADJ, for adjacent, serves the same function. A search engine that will allow you to search on phrases uses, essentially, the same method (i.e., determining adjacency of keywords).
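Both proximity operators come down to comparing word positions. The following sketch assumes text is split on whitespace and that NEAR uses a default window of five words; that window size and the function names are assumptions, since engines differ on the details.

```python
def positions(words, term):
    """Return every index at which `term` occurs in the word list."""
    return [i for i, w in enumerate(words) if w == term]

def near(text, a, b, distance=5):
    """True if a and b occur within `distance` words of each other."""
    words = text.lower().split()
    return any(abs(i - j) <= distance
               for i in positions(words, a)
               for j in positions(words, b))

def followed_by(text, a, b):
    """True if b appears immediately after a (adjacency)."""
    words = text.lower().split()
    b_positions = positions(words, b)
    return any(i + 1 in b_positions for i in positions(words, a))

text = "heart attack symptoms differ from a panic attack of the heart"
print(near(text, "panic", "heart", distance=5))
print(near(text, "panic", "heart", distance=2))
print(followed_by(text, "heart", "attack"))
print(followed_by(text, "attack", "heart"))
```

A phrase search can be built the same way, by requiring each word of the phrase to be followed by the next.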

Phrases: The ability to query on phrases is very important in a search engine. Those that allow it usually require that you enclose the phrase in quotation marks, e.g., "space the final frontier."

Capitalization: This is essential for searching on proper names of people, companies or products. Unfortunately, many words in English are used both as proper and common nouns--Bill, bill, Gates, gates, Oracle, oracle, Lotus, lotus, Digital, digital--the list is endless.