
Searching the Web
The Web is a gigantic database of information stored on computers around the world. It can be a good source of information once you learn how to effectively and efficiently search for information and learn to evaluate what you find.
Exploring or searching the Web can be very time-consuming, especially if you are looking for something specific. This is in part due to the sheer size of the WWW, currently estimated to contain 30 billion Web pages. It is also because the WWW is not indexed in any standard vocabulary, unlike a library catalog which assigns standardized subject descriptors to documents. However, there are over 300 special Web pages, called search tools, available to help you locate what you are looking for on the Internet. Learning HOW to use these tools will increase the return of relevant results and decrease your frustration.
Subject Directories
Subject directories involve human intervention in selecting and organizing resources,
so they cover fewer resources but provide more focus and guidance for topics
they cover. Use a subject directory:
Subject directories come in a variety of types:
Librarians' Internet Index - http://lii.org
A searchable, annotated subject directory of more than 11,000 Internet resources
selected and evaluated by librarians. For invisible web resources, look
for sites labeled or categorized as "databases."
Open Directory - http://dmoz.org/
The largest human edited directory of the Web. Well-chosen and annotated
sources.
Yahoo - http://www.yahoo.com
Launched in 1994, Yahoo is the web's oldest "directory."
InfoMine - http://infomine.ucr.edu/
A scholarly resource collection that includes more than 120,000 sites, grouped
into 9 annotated and indexed categories. It contains useful Internet resources
such as databases, electronic journals, electronic books, bulletin boards,
mailing lists, online library card catalogs, articles, directories of researchers,
and many other types of information.
Search Engines
A search engine lets you type in one or more keywords and then searches a previously created database of Web pages, returning a list of links that meet your criteria, arranged by relevancy ranking. There are search engines specifically for finding people, e-mail addresses, software, sounds, images, etc. Use search engines to obtain specific information, to search for obscure subjects, or to search multifaceted topics.
Advantages of search engines: robots and spiders index sites automatically, so the
database is larger and updated more frequently; most are full-text databases.
Disadvantage: search engines are not exacting in the way they index sites, so
finding information is difficult unless you use advanced search techniques. Note:
No humans are involved in selecting, sorting, evaluating, or organizing the
sites retrieved by a search engine.
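The idea of searching a previously created database, rather than the live Web, can be illustrated with a minimal sketch. The pages, URLs, and scoring rule below are made up for illustration; real search engines use far more elaborate indexing and ranking.

```python
from collections import defaultdict

# Hypothetical mini "Web": page URL -> page text (made-up data).
PAGES = {
    "http://example.org/a": "civil war battles and civil war generals",
    "http://example.org/b": "rock music encyclopedia",
    "http://example.org/c": "civil rights history",
}

def build_index(pages):
    """Built ahead of time, as a spider would: word -> {url: occurrences}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Return URLs containing every keyword, ranked by total keyword count."""
    words = query.lower().split()
    if not words:
        return []
    # Start with pages matching the first keyword, then narrow the set.
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))

    def score(url):
        # Assumed ranking rule: more keyword occurrences means more relevant.
        return sum(index[w][url] for w in words)

    return sorted(matches, key=lambda url: (-score(url), url))

INDEX = build_index(PAGES)
print(search(INDEX, "civil war"))
```

Note that the search never reads the pages themselves; it only consults the index, which is why results can lag behind changes on the live Web.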
Google - http://www.google.com
Indexes 2.4 billion Web pages.
Use advanced search features to focus your search.
Google Scholar - http://scholar.google.com
It's still being developed, but it's already a very useful tool if you are looking
for scholarly literature, peer-reviewed papers, books, technical reports, and
the like on academic topics. Google Scholar also has advanced search limiters.
Kartoo - http://www.kartoo.com
Kartoo is one of the best examples of a visual search engine that organizes
search results into topical clusters based on users' search terms. Concept clustering
is a useful option for academic researchers who wish to identify issues, points
of view and important sites relevant to a research topic. Teoma and Vivisimo
are other search engines that cluster results.
Vivisimo - http://www.vivisimo.com
Enter a keyword and Vivisimo will return results from major search engines and
automatically organize the pages into categories.
Grokker - http://www.grokker.com
Enter a search term and see the results presented as a visual map. One of the
library's new electronic databases, Business Source Premier, provides a visual
search feature using Grokker. Check it out at http://www.washburn.edu/mabee/researchTools/elec_dbs.html#b.
HotBot - http://www.hotbot.com
HotBot provides access to Yahoo, Google and Teoma. Unlike a meta-search engine,
it does not combine the results together.
Dogpile - http://www.dogpile.com
Dogpile is a popular meta-search engine that sends a search to a customizable
list of search engines, directories and specialty search sites, then displays
results from each search engine individually.
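The difference between showing each engine's results individually and combining them into one list can be sketched as follows. The two "engines" here are hypothetical stand-in functions returning made-up URLs; no real search service is contacted, and this is not how any particular tool is implemented.

```python
# Hypothetical stand-ins for real search engines (made-up results).
def engine_a(query):
    return ["http://example.org/1", "http://example.org/2"]

def engine_b(query):
    return ["http://example.org/2", "http://example.org/3"]

ENGINES = {"Engine A": engine_a, "Engine B": engine_b}

def grouped_results(query, engines):
    """Keep each engine's results separate, one list per engine."""
    return {name: fetch(query) for name, fetch in engines.items()}

def merged_results(query, engines):
    """Combine all results into one list, dropping duplicate URLs."""
    seen = set()
    merged = []
    for fetch in engines.values():
        for url in fetch(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

print(grouped_results("toxicology", ENGINES))
print(merged_results("toxicology", ENGINES))
```

The grouped form shows which engine found what; the merged form is shorter because overlapping hits appear only once.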
Specialized Search Engines
Blinkx - http://tv.blinkx.com/
Search over 4,000,000 hours of TV and viral video content at the world's largest
video search engine. Requires Flash and Windows Media Player or Real Player.
Scirus - http://www.scirus.com
A science search engine with over 167 million science-specific Web pages.
Singingfish - http://www.singingfish.com
One of the world's largest streaming audio and video indexes.
The Invisible Web
A whole world of research exists beyond the reach of Google. A study by the Internet company BrightPlanet, titled The Deep Web: Surfacing Hidden Value, estimates that there are more than 100,000 searchable databases, containing 550 billion individual documents, available on the Web. For every page a search engine could theoretically reach, there are 549 more pages out there with useful information on them that cannot be reached by search engines. This material is often referred to as the "Invisible Web." The information is not really hidden or invisible. It is there, freely available and waiting to be found. The problem is that general search engines are built in such a way that they cannot go into every single database and search the information contained in each one.
Examples of Invisible Web content:
Search engines can only find content that has been indexed by their respective software programs, known as "robots," "crawlers" or "spiders." Conventional search engines cannot drill down or mine the Invisible Web. They can find the databases, but they cannot enter them and extract content. The databases online are a lot like the books in the library. They contain the relevant information, but you need to know where to look. There are tools that can access some of the content of this Invisible Web. All of these developments are part of the steady evolution of a Web that is of growing value to academic researchers.
To search for databases, use Google to search for a phrase, such as "bioethics database" or "toxicology database" or "civil war database" or "rock music encyclopedia", or try a string that limits results to one domain, such as aviation database site:gov. Other terms to try are archive and repository.
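If you run many of these phrase searches, the search addresses can be generated rather than typed. The sketch below uses Python's standard urllib.parse module; the function name database_search_url is hypothetical, and the http://www.google.com/search?q= address pattern is an assumption about how the search page accepts queries.

```python
from urllib.parse import urlencode

# Assumed base address for Google searches.
SEARCH_BASE = "http://www.google.com/search"

def database_search_url(topic, kind="database"):
    """Build a search URL for the exact phrase '<topic> <kind>'.

    The double quotes ask the search engine to match the words as a phrase.
    """
    phrase = '"{} {}"'.format(topic, kind)
    return SEARCH_BASE + "?" + urlencode({"q": phrase})

# Try the same topic with each of the suggested terms.
for kind in ("database", "archive", "repository"):
    print(database_search_url("bioethics", kind))
```

urlencode takes care of characters that are not allowed in a Web address, such as spaces and quotation marks.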
Or use a search tool that indexes Web databases, such as CompletePlanet.
CompletePlanet - http://www.completeplanet.com
A huge collection of more than 70,000 databases available on the Web.
Examples of Specialized Databases
AnimalSearch - http://www.animalsearch.net
A database for family-safe animal-related sites. You can search by group, type,
and geographic regions.
Directory of Open Access Journals (DOAJ) - http://www.doaj.org
Provides no-cost access to the full text of over 1,200 journals in the sciences
and the humanities/social sciences.
Directory of Published Proceedings (DoPP) - http://www.interdok.com/dopp/index.cfm
Helps you locate and procure available literature from thousands of conferences,
meetings and symposia in the areas of Science/Technology, Pollution Control/Ecology,
Medical/Life Sciences, and Social Sciences/Humanities.
FindArticles - http://www.findarticles.com
"The Web's largest free articles database." Indexes 5.5 million articles
from over 900 publications, 1998 to present.
History Matters - http://historymatters.gmu.edu/browse/manypasts/
Contains primary documents in text, image, and audio about the experiences of
ordinary Americans throughout U.S. history.
National Science Digital Library - http://nsdl.org
NSDL provides educational resources for science, technology, engineering and
mathematics education.
NatureServe Explorer - http://www.natureserve.org/explorer
An online encyclopedia that provides authoritative conservation information
on 55,000+ plants, animals, and ecological communities in the US and Canada,
with in-depth coverage for rare and endangered species.
U.S. Census Quick Stats - http://quickfacts.census.gov/qfd/index.html
Find basic demographic, business, and geographic statistics about the 50 states.
To be an effective searcher on the Internet, one needs to be familiar with a variety of these search tools and to develop effective search techniques. Regardless of the search tool being used, the development of an effective search strategy is essential if the searcher hopes to obtain satisfactory results.
September 2006
http://www.washburn.edu/mabee/crc/courses/cm298