Web Finding for Scholars

Searching the Web

The Web is a gigantic database of information stored on computers around the world. It can be a good source of information once you learn to search it effectively and efficiently and to evaluate what you find.

Exploring or searching the Web can be very time-consuming, especially if you are looking for something specific. This is partly due to the sheer size of the WWW, currently estimated to contain 30 billion Web pages. It is also because the WWW, unlike a library catalog, which assigns standardized subject descriptors to documents, is not indexed in any standard vocabulary. However, more than 300 special Web pages, called search tools, are available to help you locate what you are looking for on the Internet. Learning HOW to use these tools will increase the number of relevant results you retrieve and decrease your frustration.

Subject Directories

Subject directories involve human intervention in selecting and organizing resources, so they cover fewer resources but provide more focus and guidance for topics they cover. Use a subject directory:

Advantage: sites are reviewed by subject experts who compile the database, so results are likely to be more relevant. Disadvantage: the database is comparatively small and is updated relatively infrequently.

Subject directories come in a variety of types:

Search Engines

A search engine is a searchable database of Web sites. You type in one or more keywords, and the search engine searches a previously created database of Web pages, then displays a list of links to pages that meet your criteria, arranged by relevancy ranking. There are search engines specifically for finding people, e-mail addresses, software, sounds, images, etc. Use search engines to obtain specific information, to search for obscure subjects, or to search multifaceted topics.

Advantages of search engines: robots and spiders index sites, so the database is larger and updated more frequently; most are full-text databases. Disadvantage: search engines index pages imprecisely, which makes finding information difficult unless you use advanced search techniques. Note: no humans are involved in selecting, sorting, evaluating, or organizing the sites retrieved by a search engine.
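To make the idea of a "previously created database" concrete, here is a minimal Python sketch of an inverted index, the kind of structure a search engine builds from crawled pages and then consults when you type a query. The page addresses and text are invented for illustration, and ranking by a simple count of matching keywords is an assumption standing in for the far more elaborate relevancy ranking real engines use.

```python
from collections import defaultdict

# Hypothetical pages a spider has already crawled (addresses and text invented).
PAGES = {
    "example.org/bioethics": "bioethics database of case studies",
    "example.org/toxicology": "toxicology database of chemical hazards",
    "example.org/music": "rock music encyclopedia and band histories",
}


def build_index(pages):
    """Build an inverted index: each word maps to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Look up each keyword in the prebuilt index and rank pages by matches.

    Counting matched keywords is a simple stand-in for real relevancy ranking.
    """
    scores = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, set()):
            scores[url] += 1
    # Most matches first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(PAGES)
print(search(index, "toxicology database"))
# ['example.org/toxicology', 'example.org/bioethics']
```

The toxicology page ranks first because it matches both keywords, while the bioethics page matches only "database." Because the engine consults only its index, a page that was never crawled cannot appear in the results, however relevant it is.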

Google - http://www.google.com
Indexes 2.4 billion Web pages.

Use advanced search features to focus your search.

Google Advanced Search


Google Scholar - http://scholar.google.com
Although still in development, it is already a very useful tool for finding scholarly literature on academic topics, such as peer-reviewed papers, books and technical reports. Google Scholar also has advanced search limiters.

Kartoo - http://www.kartoo.com
Kartoo is one of the best examples of a visual search engine that organizes search results into topical clusters based on users' search terms. Concept clustering is a useful option for academic researchers who wish to identify issues, points of view and important sites relevant to a research topic. Teoma and Vivisimo are other search engines that cluster results.

Vivisimo - http://www.vivisimo.com
Enter a keyword and Vivisimo will return results from major search engines and automatically organize the pages into categories.

Grokker - http://www.grokker.com
Enter a search term and see the results presented as a visual map. One of the library's new electronic databases, Business Source Premier, provides a visual search feature using Grokker. Check it out at http://www.washburn.edu/mabee/researchTools/elec_dbs.html#b .

HotBot - http://www.hotbot.com
HotBot provides access to Yahoo, Google and Teoma. Unlike a meta-search engine, it does not combine their results.

Dogpile - http://www.dogpile.com
Dogpile is a popular meta-search engine that sends a search to a customizable list of search engines, directories and specialty search sites, then displays results from each search engine individually.

Specialized Search Engines

Blinkx - http://tv.blinkx.com/
Search more than 4,000,000 hours of TV and viral video content at what Blinkx bills as the world's largest video search engine. Requires Flash and Windows Media Player or RealPlayer.

Scirus - http://www.scirus.com
A science search engine with over 167 million science-specific Web pages.

Singingfish - http://www.singingfish.com
One of the world's largest streaming audio and video indexes.

The Invisible Web

A whole world of research exists beyond the reach of Google. A study by the Internet company BrightPlanet, titled The Deep Web: Surfacing Hidden Value, estimates that more than 100,000 searchable databases, containing 550 billion individual documents, are available on the Web. For every page a search engine could theoretically reach, there are 549 more pages with useful information that search engines cannot reach. This material is often referred to as the "Invisible Web." The information is not really hidden or invisible; it is there, freely available and waiting to be found. The problem is that general search engines are not built to enter every database and search the information it contains.

Examples of Invisible Web content:

Search engines can only find content that has been indexed by their software programs, known as "robots," "crawlers" or "spiders." Conventional search engines cannot drill down into or mine the Invisible Web: they can find the databases, but they cannot enter them and extract content. Online databases are a lot like the books in a library. They contain the relevant information, but you need to know where to look. Some tools can access part of the content of the Invisible Web, and such tools are part of the steady evolution of a Web that is of growing value to academic researchers.
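The minimal Python sketch below, built on a hypothetical miniature Web, illustrates why a link-following crawler misses database content. The spider visits every page reachable through hyperlinks, including the page that holds a search form, but the records behind the form are only produced in response to a typed query, so no link ever leads the crawler to them. All page and record names are made up for illustration.

```python
from collections import deque

# Hypothetical miniature Web: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "catalog-search-form"],
    "about": ["home"],
    # A query form: its result pages are generated only when someone submits
    # a search, so no static page links to them.
    "catalog-search-form": [],
}

# Hypothetical records stored in the database behind the form.
DATABASE_RECORDS = ["record-17", "record-42"]


def crawl(start, links):
    """Breadth-first crawl that follows hyperlinks only, as a simple spider does."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


indexed = crawl("home", LINKS)
print(sorted(indexed))  # ['about', 'catalog-search-form', 'home']
print([r for r in DATABASE_RECORDS if r not in indexed])  # ['record-17', 'record-42']
```

The crawler finds the database's front door (the search form) but none of the records inside, which is exactly the gap that database-indexing tools try to fill.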

To find databases, search Google for a phrase such as "bioethics database," "toxicology database," "civil war database" or "rock music encyclopedia," or try a string such as +aviation +database site:gov. Other terms to try are "archive" and "repository."

Alternatively, use a search tool that indexes Web databases, such as CompletePlanet.

CompletePlanet - http://www.completeplanet.com
A huge collection of more than 70,000 databases available on the Web.

Examples of Specialized Databases

AnimalSearch - http://www.animalsearch.net
A database of family-safe, animal-related sites. You can search by group, type and geographic region.

Directory of Open Access Journals (DOAJ) - http://www.doaj.org
Provides free access to the full text of more than 1,200 journals in the sciences, humanities and social sciences.

Directory of Published Proceedings (DoPP) - http://www.interdok.com/dopp/index.cfm
Helps you locate and obtain available literature from thousands of conferences, meetings and symposia in Science/Technology, Pollution Control/Ecology, Medical/Life Sciences, and Social Sciences/Humanities.

FindArticles - http://www.findarticles.com
"The Web's largest free articles database." Indexes 5.5 million articles from over 900 publications, 1998 to present.

History Matters - http://historymatters.gmu.edu/browse/manypasts/
Contains primary documents in text, image, and audio about the experiences of ordinary Americans throughout U.S. history.

National Science Digital Library - http://nsdl.org
NSDL provides resources for science, technology, engineering and mathematics education.

NatureServe Explorer - http://www.natureserve.org/explorer
An online encyclopedia that provides authoritative conservation information on 55,000+ plants, animals, and ecological communities in the US and Canada, with in-depth coverage for rare and endangered species.

U.S. Census Quick Stats - http://quickfacts.census.gov/qfd/index.html
Find basic demographic, business, and geographic statistics about the 50 states.

To be an effective Internet searcher, you need to be familiar with a variety of these search tools and to develop effective search techniques. Whatever search tool you use, an effective search strategy is essential if you hope to obtain satisfactory results.

September 2006

http://www.washburn.edu/mabee/crc/courses/cm298