Topic 1 Introduction to Internet Resources
Search Engines
- The most commonly used facility for finding information on the internet.
- Use software to automatically build a database of websites and pages.
- Behind the scenes, automated programs called spiders or crawlers do the indexing work for the search engines: they go out onto the Web, look at pages and the words on those pages, and build huge lists of terms.
How a Search Engine Works
- Spiders/Crawlers
- Visit a web page, read it, then follow links to other pages within the site
- Return to the site on a regular basis, e.g. every month, to look for changes
- Differ in the depth of indexing:
- First 100 keywords only
- The entire full text
- Top level of a site
Some spiders look only at the first part of each page, or just the top level: the opening pages of the website, or the first few pages of a multilevel site. Other spiders scan the full text of every page. With about 4 billion pages on the Web and rising, no search engine can get to all of them. How deep a spider goes therefore determines how many pages the engine retrieves.
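The crawl loop described above can be sketched in a few lines. This is a minimal illustration, not a real spider: the `site` dictionary stands in for HTTP fetches, and `max_depth` plays the role of the "top level vs. full site" depth limit.

```python
from collections import deque

def crawl(site, start, max_depth=2):
    """Breadth-first crawl from `start`, following links up to `max_depth`.

    `site` maps each URL to a (text, links) pair; a real spider would
    download pages over HTTP instead of reading a dict.
    """
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        text, links = site[url]
        visited.append(url)            # here the spider would index `text`
        if depth < max_depth:
            for link in links:
                if link not in seen:   # never revisit a page in one crawl
                    seen.add(link)
                    queue.append((link, depth + 1))
    return visited

# Hypothetical four-page site used only for this illustration.
site = {
    "/":     ("home page", ["/a", "/b"]),
    "/a":    ("page a",    ["/b", "/deep"]),
    "/b":    ("page b",    []),
    "/deep": ("deep page", []),
}
print(crawl(site, "/", max_depth=1))  # top level only: ['/', '/a', '/b']
```

With `max_depth=1` the spider stops at the top level and never reaches `/deep`; raising the depth retrieves more pages, which is the trade-off the paragraph above describes.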
- Index
- Also known as the catalog.
- Contains a copy of every web page the spider finds.
- If a web page changes, the index is updated with the new information.
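An index like this is often stored as an inverted index: a mapping from each word to the pages that contain it. The sketch below, with hypothetical URLs and a naive whitespace tokenizer, shows both points above: the index keeps a copy of each page, and re-indexing a changed page replaces the stale entries.

```python
from collections import defaultdict

class Index:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of page URLs
        self.pages = {}                   # url -> stored copy of the text

    def add(self, url, text):
        # If the page was indexed before, discard its old words first,
        # so that a changed page updates the index instead of piling up.
        if url in self.pages:
            for word in self.pages[url].lower().split():
                self.postings[word].discard(url)
        self.pages[url] = text
        for word in text.lower().split():
            self.postings[word].add(url)

idx = Index()
idx.add("/a", "search engines index pages")
idx.add("/a", "crawlers index pages")    # the page changed; re-index it
print(sorted(idx.postings["crawlers"]))  # ['/a']
print(sorted(idx.postings["search"]))    # [] -- stale entry was removed
```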
- Search engine software
- Sifts through the millions of pages recorded in the index
- Finds matches to the search terms
- Ranks them in the order it believes is most relevant
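The match-and-rank step can be sketched with a crude relevance score: count how often the query terms occur in each stored page and sort best-first. Real engines combine far richer signals (term weighting, link analysis, and so on); this example, with made-up page text, only shows the basic sift, match, rank flow.

```python
def search(pages, query):
    """Return page URLs matching `query`, best score first.

    `pages` maps URL -> page text, standing in for the engine's index.
    """
    terms = query.lower().split()
    scores = {}
    for url, text in pages.items():      # sift through the indexed pages
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)  # crude relevance score
        if score > 0:                    # keep only matching pages
            scores[url] = score
    return sorted(scores, key=scores.get, reverse=True)  # rank them

pages = {
    "/spiders": "spiders crawl the web and spiders index pages",
    "/engines": "search engines rank pages",
    "/about":   "about this site",
}
print(search(pages, "spiders pages"))  # ['/spiders', '/engines']
```

`/spiders` scores 3 (two hits for "spiders", one for "pages") and `/engines` scores 1, so the ranking puts `/spiders` first; `/about` matches nothing and is dropped.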
© 2005 Temasek Polytechnic