LESSON 4: FINDING WEBSITES: USING SEARCH ENGINES
Tutorial: Using ANDs & ORs In Google Searches
* To recognize the 3 basic parts of a URL.
* To recognize domain names.
* To understand the difference between subject directories and search engines and know the appropriate ways to use them in research.
* To understand the importance of selective directories.
* To know the difference between general search engines and site-specific search engines.
* To be aware of the common features of web search engines.
LESSON 4 TABLE OF CONTENTS:
1. Preface
2. Websites, URLs, and Domain Names
3. General Web Surfing and Types of Web Pages & Web Documents
4. Subject Directories and Selective Directories
5. General Search Engines and Site-Specific Search Engines
6. Common Features of General Search Engines
7. Key Points to Remember
1. PREFACE
Thus far in the course, we have
concentrated primarily on finding books and periodical articles. We have
used online catalogs (to find books) and web databases (to find periodical
articles). In this lesson, we expand our search for information sources
to include information found in websites.
2. WEBSITES, URLs, AND DOMAIN NAMES
You may recall from Lesson 1 our definition of a
website:
DEFINITION: WEBSITE
A website is a coherent collection of Web pages that are
linked together and reside on that part of the Internet known as the World Wide
Web. Millions of websites exist,
offering vast amounts of information of varying credibility and worth.
Every website (and every Web page) has a unique address known as a URL
(Uniform Resource Locator), which identifies where it is located on the
Web. For example, here is the URL for Skyline Library’s home page:
http://skylinecollege.edu/library/index.html
URLs have
three basic parts: the protocol, the server name and the resource ID. These
parts provide "clues" to where a Web page originates and who might be
responsible for the information at that page or site. Let's look at each part:
· PROTOCOL: appears at the start of the URL, before the double slash, and identifies the method (set of rules) by which the resource is transmitted. Web pages are transmitted using the HyperText Transfer Protocol (HTTP) or its secure version, HTTPS. Thus, Web URLs begin with http:// or https://.
· SERVER NAME: appears between the double slash (//) and the first single slash (/).
The server name for the Skyline Library URL is: skylinecollege.edu
The server name identifies the computer on which the resource is found. (Computers that store and "serve up" Web pages are called servers.) This part of the URL commonly identifies which organization or company is either directly responsible for the information or simply providing the computer space where the information is stored.
The server name ends with a dot and an extension, usually three or two letters long, called the domain name (sometimes called the domain type). The domain is important because it usually identifies the type of organization that created or sponsored the resource. Sometimes it indicates the country where the server is located. The most common domain names are:
.com for company or commercial sites
.org for non-profit organization sites
.edu for educational sites (most commonly four-year universities)
.gov for government sites
.net for Internet service providers or other types of networks
.mil for U.S. military sites
If the domain name is two letters, it identifies a country, e.g., .us for the United States.
· RESOURCE ID: everything after the first single slash (/).
The resource ID for the Skyline Library URL is: library/index.html
The resource ID lists the directories and subdirectories, if any, that give the exact location of the document on the server. The file name for the specific page follows the last slash (/). (The file name for the Skyline Library home page is: index.html.) The file name usually ends with a three- or four-letter extension that specifies the file type (e.g., .htm or .html for a standard Web page, .jpg or .gif for common graphic files).
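To make the three parts concrete, here is a minimal Python sketch. It is an illustration only, not a tool you need for this course. It splits the Skyline Library URL into its protocol, server name, and resource ID using Python's standard urllib.parse module. It then guesses the domain type from the extension after the last dot. The function names url_parts and domain_type are hypothetical, and the lookup table covers only the common domain names listed above.

```python
from urllib.parse import urlsplit
import posixpath

# Common domain names from the list above (hypothetical lookup table).
DOMAIN_TYPES = {
    "com": "company or commercial site",
    "org": "non-profit organization site",
    "edu": "educational site",
    "gov": "government site",
    "net": "Internet service provider or other network",
    "mil": "military site",
}

def url_parts(url):
    """Split a URL into protocol, server name, and resource ID,
    plus the file name and file type found after the last slash."""
    parts = urlsplit(url)
    protocol = parts.scheme                     # text before "://"
    server_name = parts.hostname or ""          # between "//" and the first "/"
    resource_id = parts.path.lstrip("/")        # everything after the first "/"
    if parts.query:
        resource_id += "?" + parts.query        # keep any query string as well
    file_name = posixpath.basename(parts.path)  # text after the last "/"
    file_type = posixpath.splitext(file_name)[1]
    return protocol, server_name, resource_id, file_name, file_type

def domain_type(server_name):
    """Guess the organization type from the extension after the last dot."""
    ext = server_name.rsplit(".", 1)[-1].lower()
    if ext in DOMAIN_TYPES:
        return DOMAIN_TYPES[ext]
    if len(ext) == 2:
        return "country code ." + ext
    return "unknown"

parts = url_parts("http://skylinecollege.edu/library/index.html")
print(parts)
print(domain_type(parts[1]))
```

For the Skyline Library URL, this prints ('http', 'skylinecollege.edu', 'library/index.html', 'index.html', '.html') followed by "educational site". Like the domain name itself, the guess is only a clue to who is responsible for a page.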
3. GENERAL WEB SURFING AND TYPES OF WEB PAGES & WEB DOCUMENTS
At some
point in your research -- usually after searching the Deep Web using Web databases and
online catalogs -- you may want to look for information and opinion found on
free websites within the Visible Web. This is often
referred to as general Web surfing. Be cautious, however,
when searching the Visible Web because no quality control is in effect
here. You may find highly accurate and reliable information at one
website, and complete falsehoods at another.
When you do
look for information from web pages or other documents on the Internet, it is
very useful to be aware of the different types of pages and documents that you
may find. Below is a description of some
common types of web pages and web documents:
- Web articles
Articles that were originally published in print magazines, journals, or newspapers are accessible through online databases, but they are also often available on websites, where they can look much like regular web pages. Additionally, many articles are written only for websites that are organized as online magazines or journals. These online-only articles can be even harder to distinguish from web pages, and in some cases the distinction between web articles and web pages is extremely blurred.
– Examples:
Article from a free online database (originally published in a print journal);
Article from a subscription database, accessible from off campus only with a PLS library card (originally published in a print magazine);
Article from a website (only published online, never in print) (Example 1);
Article from a website (only published online, never in print) (Example 2);
Article from a website (only published online, never in print) (Example 3, press release)
- Blog posts
A blog (short for "web log") is a type of web page that serves as a publicly accessible personal journal (or log) for an individual. Blogs are often updated frequently, sometimes daily, and often reflect the personality of the author. Blog software usually keeps an archive of older posts, and many blogs let you search that archive for specific terms. Blogs have become a vibrant, fast-growing medium for communication in professional, political, news, trendy, and other specialized web communities.
– Example of a weblog
- Wiki pages
Wiki is a Hawaiian word meaning "quick." It refers to technology that gathers in one place a number of web pages focused on a theme, project, or collaboration. Wikis are generally used when users or group members are invited to develop, contribute to, and update the content of the wiki. Wikis can be password-protected in various ways to control or allow contributions. The most famous wiki is Wikipedia. – Example of a wiki
- Group discussion threads
These are discussion forums in which you can participate, share ideas, and form a community. Most are free, and some are open to new members. Yahoo Groups and Google Groups are both popular; Google Groups includes the former Usenet newsgroups. Blogs (see “Blog posts” above) now fill some of the need for this type of community sharing and information exchange. – Example of a discussion thread
Web document formats:
- PDFs
PDF is short for Portable Document Format, a file format developed by Adobe Systems that is used to capture almost any kind of document with its original formatting intact. Viewing a PDF file requires a PDF reader such as Adobe's free Acrobat Reader, which can be downloaded from Adobe; many browsers can also display PDF files directly or through a plug-in. – Example of a .pdf document
- Word documents (.doc files)
Microsoft Word documents accessible on the Internet – Example of a Word document
- Excel documents (.xls files)
Microsoft Excel spreadsheet files accessible on the Internet – Example of an Excel document
- PowerPoint files
Microsoft PowerPoint slideshow files accessible on the Internet – Example of a PowerPoint slideshow file
4. SUBJECT DIRECTORIES AND SELECTIVE DIRECTORIES
Two types of Web search
tools are available to help you find websites and/or web pages: subject
directories and search engines. Let's examine each
separately.
Web subject directories (such as InfoMine, Librarians’ Internet Index, Google Directory or Yahoo Directory) provide lists of websites
arranged by subject category. The websites included in a subject
directory are chosen by people known as indexers. Each site
in the directory is listed under one or more subject categories, as determined
by the directory's indexers. A brief description of each site listed is
usually included.
Directories are often a good
place to start when you’re looking for information on relatively general
subjects or if you want an overview of what’s available on the
Web on a given subject.
To find websites on general subjects using subject directories:
* browse through the directory’s list of subject categories, OR
* do a keyword search using terms that describe the overall general subject under which your topic falls.
There is wide variation in the number and quality of sites included in
different Web subject directories. Some large directories try to be as
comprehensive as possible, with very extensive listings. However, one
disadvantage of these large directories is that they usually do little
evaluation of the quality of the sites they list, thus making them somewhat
less effective at finding the best sites in a particular subject area.
For that reason, it is wise to use a subject directory that lists only sites known to be high quality. These directories are known as selective directories. In addition to indexing credible websites, selective directories often provide links to leading sites in many subject areas, which, in turn, provide links to more specific high-quality documents on a particular topic within the broader subject area.
Recommended selective directories:
InfoMine (http://infomine.ucr.edu/) -- academic resources
Internet Public Library (http://www.ipl.org/)
Librarians' Internet Index (http://www.lii.org) -- high-quality resources on a range of general subjects
AcademicInfo (http://www.academicinfo.net) -- scholarly sites on a wide range of subjects
Scout Report Archives (http://scout.cs.wisc.edu/archives/) -- academic resources
5. GENERAL SEARCH ENGINES AND SITE-SPECIFIC SEARCH ENGINES
Web search engines (such as Google, Yahoo! Search, and many others)
allow you to search through millions of websites using your own keyword(s).
Websites gathered and indexed by search engines are not selected,
organized or previewed by humans. Instead, their collection of websites is
created entirely by computer programs called spiders (also known
as robots) that continuously scan the Internet looking for sites
to add to their index.
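The idea of a spider can be illustrated with a small simulation. The sketch below assumes a hypothetical in-memory "Web" of three pages instead of fetching real pages over the network. The a.example, b.example, and c.example addresses are made up. Starting from one seed URL, the sketch follows links breadth-first and records which pages contain each word. That word-to-pages table is the kind of index a search engine searches. Real spiders are far more elaborate; this shows only the basic loop, and there is no human selection anywhere in it.

```python
from collections import deque

# Hypothetical mini-Web: page URL -> (page text, links found on the page).
PAGES = {
    "http://a.example/": ("Immigration and the economy", ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("Economy news", ["http://a.example/"]),
    "http://c.example/": ("Illegal immigration statistics", []),
}

def crawl(seed):
    """Follow links breadth-first from a seed URL and build an index
    mapping each word to the set of pages that contain it."""
    index = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        text, links = PAGES.get(url, ("", []))  # a real spider would fetch the page here
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:  # visit each page only once
                seen.add(link)
                queue.append(link)
    return index

index = crawl("http://a.example/")
print(sorted(index["economy"]))
```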
Since the collections of websites indexed by search engines are huge (numbering in the millions) and often have no subject organization at all, it is very important to think carefully about which search words to use and to be aware of the search features available before performing a search. Always look for the "Search Help," "Search Tips," or other pages that explain the features of the search engine you're using. Remember that Web search engines, unlike library online catalogs, do not use a common set of subject headings. Therefore, to use search engines effectively, it is usually best to use very precise search words or phrases, or to combine several search terms using Boolean logic (as discussed in Lesson 3).
Search engines should be used when you have a focused research question in mind or when you’re looking for a specific item of information, such as a known document (e.g., the U.S. Declaration of Independence), an image, or a specific web page. They're not recommended for finding sites on broad subjects, such as "astronomy" or "history." As discussed earlier, Web subject directories should be used to find sites on general subjects.
Finally, there is a special type of search engine you should be aware of.
Sometimes, websites offer their own internal search engine that allows you to
search just that website’s collection of information. These are
known as site-specific search engines.
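Conceptually, a site-specific search engine limits the search to pages on one server. The minimal Python sketch below assumes a small hypothetical index of pages from two made-up websites, www.example.edu and news.example.com. It first keeps only pages whose server name matches the site, then checks that every search term appears. The site_search function is illustrative and is not how any particular website implements its search box.

```python
from urllib.parse import urlsplit

# Hypothetical index covering pages from two different websites.
INDEX = {
    "http://www.example.edu/library/index.html": {"library", "hours", "databases"},
    "http://www.example.edu/library/guides.html": {"research", "guides", "databases"},
    "http://news.example.com/databases.html": {"databases", "security"},
}

def site_search(site, *terms):
    """Return pages on the given server name that contain every term."""
    return sorted(
        url for url, words in INDEX.items()
        if urlsplit(url).hostname == site and all(t in words for t in terms)
    )

print(site_search("www.example.edu", "databases"))
```

The page on news.example.com also contains "databases," but it is left out because it lives on a different server.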
6. COMMON FEATURES OF GENERAL SEARCH ENGINES
Listed below are features common to many search engines, with a particular focus on Google, because Google has one of the largest databases of web pages and because its PageRank™ system is considered to be among the most effective at identifying high-quality pages. Keep in mind that these features may not work the same way -- or even be available -- on every search engine.
AND: Google and many other search
engines assume that a typed space equals AND.
For example, immigration economy
would automatically be understood as immigration AND
economy. Many search engines
use the + sign (often called the "require" sign) in front of words
that must be included in the search results. For example, +immigration
+economy may be used instead of immigration
AND economy. Google and many other search
engines that allow the use of AND and OR require
that they be capitalized. (Thus, it's a good idea to always capitalize
these connectors if you use them.)
OR: use OR between synonyms, equivalent terms, or variant spellings or endings that express the same idea, so that pages using any of them are retrieved. Because OR broadens a search, it should usually be combined with AND or other methods of limiting your search.
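The difference between AND and OR can be simulated in a few lines of Python. This sketch assumes three hypothetical pages, each reduced to the set of words it contains. A real search engine works from a vastly larger index, but the logic is the same. AND keeps only pages that contain every term, while OR keeps pages that contain any of the terms.

```python
# Hypothetical pages, each reduced to the set of words it contains.
PAGES = {
    "page1": {"immigration", "economy", "jobs"},
    "page2": {"immigrants", "economy"},
    "page3": {"immigration", "history"},
}

def search_and(*terms):
    """AND (or a typed space): every term must appear on the page."""
    return {page for page, words in PAGES.items() if all(t in words for t in terms)}

def search_or(*terms):
    """OR: at least one of the terms must appear on the page."""
    return {page for page, words in PAGES.items() if any(t in words for t in terms)}

print(sorted(search_and("immigration", "economy")))    # narrow: one page
print(sorted(search_or("immigration", "immigrants")))  # broad: all three pages
# (immigration OR immigrants) AND economy: OR broadens, then AND narrows again.
combined = search_or("immigration", "immigrants") & search_and("economy")
print(sorted(combined))
```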
Synonyms: Google will also search for words with similar meanings if you put the ~ symbol in front of a word. For example, the search ~food would also find web pages containing words such as recipes, nutrition, or cooking. This feature does not always work effectively and should be used cautiously.
Phrase searching: putting a phrase in quotation marks retrieves documents that contain that exact phrase. For example, "illegal immigration" will retrieve documents containing those two words next to each other, in that order, as a phrase.
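To see why quotation marks matter, the following sketch compares an unquoted search with a phrase search. The unquoted search only requires all the words to be present anywhere on the page. The phrase search requires the words to be adjacent and in order. The two sample sentences and the function names are hypothetical. Splitting text into lowercase words with a regular expression is a simplification of how search engines actually process pages.

```python
import re

# Two hypothetical documents that both contain "illegal" and "immigration".
DOCS = {
    "doc1": "Debate over illegal immigration continues.",
    "doc2": "Immigration rules made the illegal trade harder.",
}

def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all(text, query):
    """Unquoted search: every word appears somewhere, in any order."""
    page_words = set(words(text))
    return all(w in page_words for w in words(query))

def matches_phrase(text, phrase):
    """Quoted search: the words appear next to each other, in order."""
    page, target = words(text), words(phrase)
    n = len(target)
    return any(page[i:i + n] == target for i in range(len(page) - n + 1))

for name, text in DOCS.items():
    print(name, matches_all(text, "illegal immigration"), matches_phrase(text, "illegal immigration"))
```

Both documents match the unquoted search, but only doc1 matches the phrase search.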
Truncation not available: Google (like most major search engines) does not provide truncation (a symbol, usually an asterisk, that allows you to search for all variations of a common root). Instead of truncation, use OR (capitalized) between words with variant endings, e.g., immigrant OR immigrants OR immigration.
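The OR workaround for truncation can be sketched as follows. The function below is hypothetical. Given a root and a word list, it collects every word that begins with the root and joins those words with a capitalized OR, which is roughly what truncation (immigra*) would have matched. In practice you supply the variants yourself, so the small vocabulary here stands in for your own list.

```python
def truncation_as_or(root, vocabulary):
    """Emulate truncation (root*) by listing every known word that starts
    with the root, joined with a capitalized OR."""
    variants = sorted(w for w in vocabulary if w.startswith(root))
    return " OR ".join(variants)

# Hypothetical word list standing in for the variants you think of yourself.
VOCAB = {"immigrant", "immigrants", "immigration", "economy"}
print(truncation_as_or("immigra", VOCAB))
```

This prints immigrant OR immigrants OR immigration, which is the same OR string suggested above.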
Organizing precise searches: When doing a search in Google or any other Web search engine, be sure to use at least one search term for each of the concepts in your research question or topic. You may add parentheses and ANDs to make it easier to see and organize your concepts, but they are not necessary. For example:
Research question: “How does illegal immigration affect the U.S. economy?”
Google search: ("illegal immigration" OR "illegal aliens") AND (economy) AND ("United States" OR U.S.)
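The organizing strategy above is mechanical enough to express in code. This minimal sketch uses a hypothetical build_query helper. It puts each concept's synonyms in one parenthesized OR group and wraps multi-word terms in quotation marks for phrase searching. It then joins the groups with AND. Given the three concepts from the research question, it reproduces the Google search shown above.

```python
def quote(term):
    """Wrap multi-word terms in quotation marks so they are searched as phrases."""
    return f'"{term}"' if " " in term else term

def build_query(concepts):
    """One parenthesized OR group per concept; groups joined by AND."""
    groups = ["(" + " OR ".join(quote(t) for t in terms) + ")" for terms in concepts]
    return " AND ".join(groups)

query = build_query([
    ["illegal immigration", "illegal aliens"],  # concept 1 and a synonym
    ["economy"],                                # concept 2
    ["United States", "U.S."],                  # concept 3 and a variant
])
print(query)
```

Listing one group per concept this way makes it easy to check that no concept has been left out before you run the search.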
Relevance ranking: a method by which a search engine attempts to order search results so that the pages most relevant to your search and/or of the highest quality appear at or near the top of the results list. Search engines' ranking systems are based on various factors. Documents returned from a search can be ranked on such factors as:
7. KEY POINTS TO REMEMBER
· A website is a coherent collection of Web pages linked together.
· URLs have 3 basic parts: the protocol, the server name, and the resource ID.
· The server name ends with a dot and an extension, usually 3 or 2 letters long, called the domain name (or domain type). The domain name is important because it usually identifies the type of organization that created or sponsored the website.
· Looking for information and opinion found on free websites within the Visible Web (as opposed to the Deep Web) is known as general Web surfing. Be cautious, however, because surfing can uncover highly credible sites as well as sites containing very questionable or false information.
· Two types of Web search tools are available to help you find websites and/or web pages: subject directories and search engines.
· Web subject directories provide lists of websites arranged by subject category. The websites included in a subject directory are selected, organized, and previewed by human beings. They’re often a good place to start when you’re looking for information on relatively general subjects or if you want an overview of what is available on the Web on a given subject.
· Selective directories, such as the Librarians' Internet Index, are a type of subject directory that list only sites recognized to be high in academic quality.
· Web search engines (such as Google, Yahoo! Search, and many others) allow you to search through millions of websites using your own keyword(s). Computer programs known as spiders collect and index the websites that a search engine searches. It is appropriate to use search engines when you have a focused research question in mind rather than a broad subject.
· Sometimes, websites offer their own internal search engine that allows you to search just that website’s collection of information. These are known as site-specific search engines.
Go to Tutorial: Using ANDs & ORs In Google Searches
Optional: Additional tutorials on using Web search engines:
- UC Berkeley Tutorial on "The BEST Search Engines";
- "Googling to the Max" exercises (pdf)
last revised: 4-23-08 by
These materials may be used for educational purposes. Please inform and credit the author and cite
the source as: LSCI 100: Introduction to Information Research. All
commercial rights are reserved. Send comments or suggestions to: