SEDAC
ENTRI Environmental Treaties and Resource Indicators
User's Guide to Searching with the ENTRI Search Engine

This document provides a brief introduction to the effective use of CIESIN's search engine. Basic features are described below. Advanced features are described further down the page.
Basic Features:
WORDS AND PHRASES
BOOLEAN SEARCHES

Advanced Features:
IGNORE SPECIAL MEANINGS
PROXIMITY SEARCHES
USING PATTERNS TO SEARCH
SOUNDEX
ABOUT ()

Basic Features
SIMPLE SEARCHES: WORDS AND PHRASES
The simplest kind of search expression is a word, or a phrase. A phrase is a sequence of one or more words. Only those documents that contain the exact word or phrase provided will be selected by the query.

Examples:
mortality
human
population
cause of death
population distribution
Note that, unlike most popular search engines, phrases are not treated as a list of words, but are instead interpreted as a fragment of a sentence. The exact phrase must occur in a document if the search engine is to find it.
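
The difference between phrase matching and word-list matching can be sketched in Python. This is only an illustration of the behavior described above, not the search engine's implementation; the function names, the simple tokenizer, and the case-insensitive comparison are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words (a simplified tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_phrase(document, phrase):
    """Return True if the words of `phrase` occur consecutively in `document`."""
    doc_words = tokenize(document)
    phrase_words = tokenize(phrase)
    n = len(phrase_words)
    if n == 0:
        return False
    return any(doc_words[i:i + n] == phrase_words
               for i in range(len(doc_words) - n + 1))

def contains_all_words(document, phrase):
    """Word-list matching: word order and adjacency are ignored."""
    return set(tokenize(phrase)) <= set(tokenize(document))

doc = "The distribution of the rural population changed."
print(contains_phrase(doc, "population distribution"))     # False
print(contains_all_words(doc, "population distribution"))  # True
```

The document contains both words, so a word-list search would find it, but a phrase search does not because the words never appear side by side in that order.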

BOOLEAN SEARCHES
Many users will already be familiar with the concept of using Boolean expressions in queries. You can use Boolean operators to combine results from simpler queries in powerful ways. There are three Boolean operators: AND, OR, and NOT.

When you write:
word1 AND word2
you are saying that you want to locate all documents that contain both word1 and word2. Documents that contain only one of the two words, or neither, will not be selected.
For example, the search expression:
death AND cause
will select only those documents that contain both the word "death" and the word "cause".
When you write:
word1 OR word2
you are saying that you want to locate all documents that contain either word1 or word2, or both. Documents that contain neither word1 nor word2 will not be selected.
For example, the search expression:
death OR mortality
will select only those documents that contain either the word "death" or the word "mortality".
When you write:
word1 NOT word2
you are saying that you want to locate all documents that contain word1 but not word2. Documents that don't contain word1 will be ignored. Documents containing both word1 and word2 will also be ignored.
For example, the search expression:
death NOT cause
will select only those documents that contain the word "death" but not the word "cause".
You can parenthesize Boolean expressions and combine them into ever more complicated queries.
For example:
(disease AND cause of death) NOT natural causes
would select only documents containing both the word "disease" and the phrase "cause of death", but not documents containing the phrase "natural causes".
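
One way to picture the Boolean operators is as operations on sets of documents. The sketch below uses Python sets and a made-up index (the document numbers and the helper names both, either, and excluding are hypothetical); it illustrates the logic only and says nothing about how the search engine stores or evaluates queries.

```python
# Hypothetical index: each word or phrase maps to the set of
# document numbers that contain it.
index = {
    "disease": {1, 2, 3, 5},
    "cause of death": {2, 3, 4},
    "natural causes": {3},
}

def both(a, b):
    """AND: documents found by both searches."""
    return a & b

def either(a, b):
    """OR: documents found by at least one of the searches."""
    return a | b

def excluding(a, b):
    """NOT: documents found by the first search but not by the second."""
    return a - b

# (disease AND cause of death) NOT natural causes
result = excluding(both(index["disease"], index["cause of death"]),
                   index["natural causes"])
print(sorted(result))  # [2]
```

Documents 2 and 3 contain both "disease" and "cause of death", and document 3 is then dropped because it also contains "natural causes".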
Advanced Features

TELLING THE SEARCH ENGINE TO KEEP IT SIMPLE
As you saw in the previous section, the search engine has a small number of special words, such as AND, OR, NOT, and WITHIN, that modify the meaning of the search expression. An expression such as:
death and taxes

will be interpreted as a search for documents containing both the word "death" and the word "taxes", rather than as a search for the intended phrase "death and taxes".

You can tell the search engine to ignore the special meaning of words in the search expression by writing portions of the search expression in the form {phrase}.

For example:
death {and} taxes
to be {or not} to be
{to be or not to be}
water {within} house
(death {and} taxes AND health) NOT social security
The last example above matches only those documents that contain the phrase "death and taxes" as well as the word "health" but excludes documents that contain the phrase "social security".
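
If you build search expressions in a program, a small helper can add the braces for you. The following Python sketch wraps a whole phrase in braces whenever it contains one of the special words mentioned in this guide, in the same style as {to be or not to be}. The helper name escape_phrase and the word list are assumptions for illustration; the engine may reserve other words as well.

```python
# Special words named in this guide; the engine's full list may be longer.
RESERVED = {"and", "or", "not", "near", "within"}

def escape_phrase(phrase):
    """Wrap the phrase in braces if any of its words has a special meaning."""
    if any(word.lower() in RESERVED for word in phrase.split()):
        return "{" + phrase + "}"
    return phrase

print(escape_phrase("death and taxes"))          # {death and taxes}
print(escape_phrase("population distribution"))  # population distribution

# The escaped phrase can then be combined with real operators.
query = "(%s AND health) NOT social security" % escape_phrase("death and taxes")
print(query)  # ({death and taxes} AND health) NOT social security
```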

WORDS THAT ARE NEAR EACH OTHER IN A DOCUMENT
When words or phrases occur near one another in a document, there is a good chance that they relate to a single topic of interest. A search that checks the nearness of words and phrases is called a "proximity search".

The CIESIN search engine allows the nearness of words to be described in two ways:

word1 NEAR word2
and
NEAR( (word1, word2, ...), n)
where n is some number.

In the first form above, the expression will select only those documents containing word1 and word2, and only when word1 and word2 occur within 100 words of each other somewhere in the document.

For example:
death NEAR cause
The second form is more complicated. The expression will select only those documents containing all of the words word1, word2, ..., and only when all of the words occur in a group no longer than n words in length. In other words, there must be some excerpt that can be taken out of the document, consisting of no more than n words, and that excerpt must contain all the search terms.
Consider the following example:
near((red tide,cause,sewage),50)

First, only documents that contain the phrase "red tide", and the words "cause" and "sewage" will be considered.

Consider the following scenario:
Document 1: ... red tide...(30 words)...sewage...(30 words)...cause...
Document 2: ...cause...(20 words)...red tide...(30 words)...sewage...
With the search expression above, only the second document would match the query. In Document 1 the distance from the first search term to the last is 60 words, which exceeds the limit of 50, while in Document 2 the distance is 50 words, which is within the limit.
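
The comparison of the two documents can be worked through in Python. The sketch below records the word offset at which each search term occurs and measures the smallest span that covers one occurrence of every term. The offsets are simplified so that they reproduce the word counts in the scenario above, and the function names smallest_span and near are hypothetical; how the search engine itself counts words is not described here, so treat the distance measure as an assumption.

```python
from itertools import product

def smallest_span(positions):
    """positions maps each search term to the word offsets where it occurs.

    Returns the smallest distance between the first and last term over all
    ways of picking one occurrence per term, or None if a term is missing.
    """
    if not positions or any(not offsets for offsets in positions.values()):
        return None
    return min(max(combo) - min(combo)
               for combo in product(*positions.values()))

def near(positions, n):
    """True if all terms occur together within a span of at most n words."""
    span = smallest_span(positions)
    return span is not None and span <= n

# Simplified offsets matching the scenario above.
document1 = {"red tide": [0], "sewage": [30], "cause": [60]}
document2 = {"cause": [0], "red tide": [20], "sewage": [50]}

print(smallest_span(document1), near(document1, 50))  # 60 False
print(smallest_span(document2), near(document2, 50))  # 50 True
```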

The search engine also understands sentences and paragraphs. You can use the WITHIN operator to require that two or more words occur in the same sentence or paragraph, as follows:

(death AND cause) WITHIN SENTENCE
(death AND unnatural) WITHIN PARAGRAPH 
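
The effect of WITHIN SENTENCE can be illustrated with a short Python sketch. It splits a document into sentences at ".", "!" or "?" and checks whether any single sentence contains all of the words. The function name within_sentence and the sentence-splitting rule are assumptions for this example; the search engine's own rules for detecting sentences and paragraphs may differ.

```python
import re

def within_sentence(document, words):
    """True if at least one sentence contains every word in `words`."""
    wanted = {w.lower() for w in words}
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document)
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z0-9']+", sentence.lower()))
        if wanted <= tokens:
            return True
    return False

same = "The cause of death was never established."
apart = "The cause was never established. Death followed within days."

print(within_sentence(same, ["death", "cause"]))   # True
print(within_sentence(apart, ["death", "cause"]))  # False
```

Both documents contain "death" and "cause", so death AND cause alone would select both; only the first has the two words in the same sentence.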
USING PATTERNS TO SEARCH

Words can be misspelled, can occur in different tenses, can be pluralized, and can have other forms that make it more difficult to find matches using exact matching of words and phrases. To address these problems, the search engine supports numerous pattern matching tools to allow for more flexible searching. Here we will discuss only a few of them: wildcards, word stemming, soundex, and fuzzy searches.

Wildcard (%)

A wildcard, %, matches any number of characters. It is used when you want to specify only a portion of a word when searching. Examples are as follows:
polluti% matches words beginning with "polluti", such as pollution and polluting.
pol%ing matches words beginning with "pol" and ending with "ing", such as polling, polluting, and politicking.
%lution% matches words containing the sequence of letters "lution", such as pollution, solution, and resolutions.
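
If you are familiar with regular expressions, it may help to think of % as the equivalent of ".*". The Python sketch below translates a wildcard pattern into a regular expression and tries it against a small word list. The function name wildcard_to_regex and the word list are made up for illustration, and case-insensitive matching is an assumption; this is not how the search engine itself evaluates wildcards.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a % wildcard pattern into a compiled regular expression."""
    # Escape the literal pieces, then join them with '.*' where % appeared.
    pieces = [re.escape(piece) for piece in pattern.split("%")]
    return re.compile(".*".join(pieces), re.IGNORECASE)

words = ["pollution", "polluting", "polling", "politicking",
         "solution", "resolutions", "police"]

for pattern in ["polluti%", "pol%ing", "%lution%"]:
    regex = wildcard_to_regex(pattern)
    print(pattern, [w for w in words if regex.fullmatch(w)])

# polluti% ['pollution', 'polluting']
# pol%ing ['polluting', 'polling', 'politicking']
# %lution% ['pollution', 'solution', 'resolutions']
```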
Word Stem ($)
The stem pattern finds words with the same stem form. This is useful for finding "GOING" and "WENT" from "GO", for instance. Examples:
$go matches words having the same stem as "go", including going, gone, and went.
$pollution matches words having the same stem as "pollution", e.g. polluting, pollute, and pollutant.

Soundex (!)

A Soundex query finds words that sound similar. Example:

!hog
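
To give a feel for what "sounds similar" means, here is a Python sketch of the classic American Soundex code, in which each word is reduced to its first letter plus three digits and words with the same code are treated as sounding alike. Whether the search engine uses exactly this variant is an assumption; the sketch is included only to show the general idea.

```python
# Consonant groups of the classic American Soundex scheme.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for letter in letters:
        CODES[letter] = digit

def soundex(word):
    """Return the four-character Soundex code of `word`."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]

print(soundex("hog"), soundex("hawk"), soundex("hug"))  # H200 H200 H200
print(soundex("Robert"), soundex("Rupert"))             # R163 R163
```

Under this scheme, a search for !hog would be expected to also find words such as "hawk" and "hug", since all three share the code H200.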

Fuzzy (?)

The fuzzy pattern finds words with similar form. This is useful for finding mis-typed or mis-OCR'd words. The fuzzy operator is ?. Example:

?dog
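
This guide does not describe how the search engine decides that two words have a "similar form". As a stand-in, the Python sketch below uses edit distance (the number of single-character insertions, deletions, and substitutions needed to turn one word into another) and accepts words within one edit of the search term. The function names, the threshold of one edit, and the word list are all assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + (char_a != char_b))) # substitution
        previous = current
    return previous[-1]

def fuzzy_matches(term, words, max_edits=1):
    """Words that are within `max_edits` edits of `term`."""
    return [w for w in words if edit_distance(term, w) <= max_edits]

# "d0g" imitates an OCR error in which the letter o was read as a zero.
words = ["dog", "dig", "dogs", "d0g", "cat", "door"]
print(fuzzy_matches("dog", words))  # ['dog', 'dig', 'dogs', 'd0g']
```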


A SIMPLE TOOL THAT DOES A LOT
ABOUT ()

About applies word stem, wildcards, and other patterns to find variations on the words and phrases given in the query. It uses a variety of strategies to find the most information that might be relevant to your search expression.

about(temperature)
about(global climate change in the southern hemisphere)
THE AMAZINGLY COMPLICATED FINAL EXAMPLE
In order to illustrate the flexibility that you have in defining search criteria, we offer the following very complicated but potentially useful example:
about(causes of disease that result in unnatural death)
AND ($cause near water)
AND ( (pollut% AND infect%) WITHIN SENTENCE )

CIESIN's search engine is built on the InterMedia Text Cartridge from Oracle Corporation, and supports most of the InterMedia query language. Details on the InterMedia query language can be found in Oracle's InterMedia Text documentation.


Copyright © 1997-.
The Trustees of Columbia University in the City of New York. 

