Internet and Research: Searching and Information Evaluation [Archives:2000/18/Science & Technology]

May 1 2000

By Helmi Noman,
Director of the American Internet Information Center
The fact that almost anyone can publish anything on the Internet is both a strength and a weakness. The growing number of Web pages and their lack of organization make sophisticated search skills necessary, especially for complex search queries. In addition, fabricated, biased, outdated or inaccurate information exists online because there is no quality control. Thus, the Internet should be considered a tool that complements other traditional search tools and does not replace them. A researcher might find the information needed, but this does not necessarily mean that all information is available at all times. Other research tools should be considered when conducting comprehensive research.
Limitations of Internet Resources:
1. Even though many full-text books are available online for free, much valuable information is not; it is held in commercial databases which are sometimes too expensive for individuals and small research institutions.
2. Archival information is usually not available online for free. Some commercial databases such as Lexis-Nexis have archival information of newspapers, journals and wire services.
3. Selected chapters from some books and abstracts from some periodicals are available, but full-text articles and books are not freely available on the Internet, although they could be obtained via the Internet, for example through commercial services.
4. Free information does not mean good information; evaluating Internet information resources is essential for authentic research.
5. Sometimes the information is free but the cost of accessing the Internet is very high. Searchers should consider print materials for reference queries.
How Do Search Engines Work?
46% of Internet users find new web sites via search engines (IMT Strategies) and 57% of Internet users search the Web each day, making search the second most popular Internet activity (SRI). Understanding how search engines work helps searchers use them better. The following chart shows the number of Web pages indexed by each search engine. Sizes are reported by each search engine as of February 3, 2000.
Search engines track down Web pages by deploying spiders, or robots, which follow links available on the Internet and add the pages they find to the search engine's own database. Each search engine has its own way of tracking and adding Web pages. Web addresses can also be registered by publishers. An estimated 1 billion indexable pages exist on the Internet (joint study published by Inktomi and the NEC Research Institute, February 2000). Based on this estimate, the most comprehensive search engine indexes only 35% of the total number of Web pages.
For researchers, this means the following:
1. There is a lack of organization of information on the Internet.
2. In many cases, search engines produce huge numbers of results that are not related to the search query.
3. Search engines do not produce accurate bibliographic information about documents such as author, date of publication and subject.
4. There is a possibility of spamdexing, which is the use of misleading keywords to make a document appear to be about something it is not.
5. No single search engine indexes all of the documents available on the Internet. Searchers should use various ones for more comprehensive searches.
6. Web pages are created and added to the Internet on a daily basis and search engines cannot keep up with the speed of this increase. Thus, search engines should not be expected to generate the most recent results.
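The spider process described in this section can be illustrated with a small sketch. It is not how any particular search engine works; it assumes a made-up seven-page "Web" (TOY_WEB), a simple breadth-first spider (crawl) and a page budget (max_pages), all of which are hypothetical names chosen for illustration. It shows why no single engine indexes everything, and why combining engines improves coverage.

```python
from collections import deque

# Hypothetical toy "Web": each page name maps to the pages it links to.
TOY_WEB = {
    "home": ["news", "sports"],
    "news": ["politics", "weather"],
    "sports": ["football"],
    "politics": [],
    "weather": ["home"],
    "football": ["news"],
    "orphan": ["home"],  # no page links to it, so a spider cannot reach it
}

def crawl(web, seed, max_pages):
    """Follow links breadth-first from `seed`, indexing at most `max_pages` pages."""
    indexed = []
    seen = {seed}
    queue = deque([seed])
    while queue and len(indexed) < max_pages:
        page = queue.popleft()
        indexed.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return set(indexed)

def coverage(index, web):
    """Fraction of all pages in `web` that appear in `index`."""
    return len(index) / len(web)

# Two imaginary engines start from different pages with the same small budget.
engine_a = crawl(TOY_WEB, "home", max_pages=3)
engine_b = crawl(TOY_WEB, "football", max_pages=3)
combined = engine_a | engine_b

print(f"Engine A covers {coverage(engine_a, TOY_WEB):.0%}")
print(f"Engine B covers {coverage(engine_b, TOY_WEB):.0%}")
print(f"Both together cover {coverage(combined, TOY_WEB):.0%}")
```

In this toy setting each engine indexes three of the seven pages, the two together index five, and the page that nothing links to is never found unless its publisher registers it.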
Subject Directories
Subject directories are smaller than search engines. They are organized by subject and maintained by human beings. Links available in subject directories such as Yahoo are evaluated first by information professionals before they are added to the database. Unlike search engines, subject directories may not allow searchers to use common language. In an online survey that I conducted in 1999 with a research project at Columbia University, Yahoo was found to be the most used and best rated search tool on the Internet. (Full results are available at
How to Prepare to Search
Before you search, you have to be realistic in your expectations. The following are some useful search techniques that can save users time and help them get the best out of their search efforts.
1. Queries should be formulated carefully by identifying keywords that describe the topic, along with synonyms for them.
2. Make sure words are spelt correctly.
3. Explore more search engines and do not stick to the ones you know. Search engines are competing with each other to provide the best service.
4. Each search engine works differently. So learn about each one’s tips and tricks by reading the help files.
5. Use wildcard searches. For example, using “educat*” as a keyword will generate documents about education, educator, educators, and educational. If, however, you use the word “education”, the other alternatives might be excluded from the results.
6. Boolean searches can help broaden or narrow the search results. For example, if AND is used as an operator between two words, you are likely to get documents that contain both keywords rather than only one of them. AND NOT excludes undesirable words, and OR broadens a search.
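The wildcard and Boolean techniques in points 5 and 6 can be tried out with a short sketch. It assumes a made-up collection of four document titles (DOCS) and a hypothetical helper (match); real search engines each have their own syntax, so this only illustrates the logic, with set operations standing in for AND, OR and AND NOT.

```python
import re

# Hypothetical mini-collection: document id -> title text.
DOCS = {
    1: "Education reform in rural schools",
    2: "An educator looks at Internet research",
    3: "Educational software and Internet access",
    4: "Stock quotes and sales figures",
}

def words(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def match(term):
    """Return ids of documents containing `term`.

    A trailing * is treated as a wildcard that matches any word ending.
    """
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return {i for i, text in DOCS.items()
                if any(w.startswith(prefix) for w in words(text))}
    return {i for i, text in DOCS.items() if term in words(text)}

wildcard = match("educat*")                       # education, educator, educational
exact = match("education")                        # only the exact word
both = match("educat*") & match("internet")       # AND narrows the search
either = match("education") | match("stock")      # OR broadens the search
without = match("educat*") - match("internet")    # AND NOT excludes a word

print(sorted(wildcard), sorted(exact), sorted(both), sorted(either), sorted(without))
```

With these sample titles, the wildcard finds three documents where the exact word finds one, which is the effect point 5 describes.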
Evaluating Internet Information Resources: Free information does not necessarily mean good information
Different circumstances require different sets of information evaluation criteria. Information evaluators should consider, develop and apply other sets as necessary to measure uniqueness, scope, and depth. The following are some guidelines that can be used in the evaluation process.
1. Authority
Identify the source of information and check if the resource is governmental, commercial, academic or personal.
Start with the URL. For example, if you want to download press releases from a governmental site, make sure its address has the organizational extension (.gov), because fake sites can look identical to the original ones. Check these two sites:
One is the official site of the White House and the other is fake and humorous.
The official White House site
A fake and humorous site
If information resources are wrongly identified, searchers might cite information believed to be written by a political party while in fact the information is propaganda written by political activists from the other side.
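The URL check described above can be sketched in a few lines. The helper names (top_level_domain, looks_governmental) and the lookalike address under example.com are assumptions made for illustration; the address ending in .gov is the only real one. A matching extension is only a first check of authority, not proof of it.

```python
from urllib.parse import urlparse

def top_level_domain(url):
    """Return the last label of the host name, e.g. 'gov', or '' if none is found."""
    host = (urlparse(url).hostname or "").rstrip(".")
    if "." not in host:
        return ""
    return host.rsplit(".", 1)[-1].lower()

def looks_governmental(url):
    """True when the address ends in the .gov extension mentioned in the text."""
    return top_level_domain(url) == "gov"

# The second address is a hypothetical lookalike: ".gov" appears in the
# middle of the host name, but the extension that counts is the last one.
for url in ("http://www.whitehouse.gov/",
            "http://www.whitehouse.gov.example.com/"):
    print(url, "->", looks_governmental(url))
```

Note that the address must include its scheme (http://) for urlparse to recognize the host name; an address typed without it is reported here as not governmental.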
2. Credibility
Determine the credibility of the organization or individual publishing the information; find out if they are recognized experts in the subject field and if the information has been checked by an independent third party.
3. Motivation
Identify the motivation or purpose for the site and find out if the provider of the information could possibly be motivated to provide inaccurate information.
4. Currency
Special attention should be paid to dates, especially when looking for time-sensitive resources such as statistics, stock quotes, and sales figures. Do not assume that the date the information was uploaded is the date it was written.