PANGAEA search

From PANGAEA Wiki

Basic search

Search field on PANGAEA home

The most convenient and fastest way to find data is the search engine on the PANGAEA home page. Every dataset, at the granularity defined by its PI, can be found by keywords and by any expression matching the dataset description. An autocomplete function assists you while typing. Keywords can be combined into Boolean expressions using the same syntax as common web search engines.

A query returns a list of dataset titles, each linking to the full metadata description.

By prefixing a keyword with a tag name from the PANGAEA XML schema (format "prefix:keyword"), the search can be restricted to specific parts of the schema. Exceptions are schema parts such as "attribute" and "reference", which cannot be searched this way.

Filtering of search results

Search results can be filtered using the facets in the left panel.

Additionally, the search results can be filtered by:

  • Geographical coordinates and
  • Date
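
The geographic and date filters can be thought of as a predicate applied to each dataset's metadata. The Python sketch below only illustrates that idea on the client side; it is not PANGAEA's implementation. The `DatasetRecord` class, its fields (one position and one date per dataset), and the sample records are hypothetical, and the bounding box is assumed not to cross the 180° meridian.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    # Hypothetical minimal record; real PANGAEA metadata is much richer.
    title: str
    lat: float
    lon: float
    collected: date


def filter_records(records, south, north, west, east, start, end):
    """Keep records inside a lat/lon box and a closed date range.

    Assumes west <= east, i.e. the box does not cross the 180° meridian.
    """
    return [
        r for r in records
        if south <= r.lat <= north
        and west <= r.lon <= east
        and start <= r.collected <= end
    ]


# Made-up records, for illustration only.
records = [
    DatasetRecord("Core A", 54.2, 7.9, date(2005, 6, 1)),
    DatasetRecord("Core B", -65.0, 10.0, date(1998, 2, 15)),
    DatasetRecord("Core C", 60.1, 5.3, date(2012, 9, 30)),
]
hits = filter_records(records, south=50, north=70, west=0, east=20,
                      start=date(2000, 1, 1), end=date(2010, 12, 31))
print([r.title for r in hits])  # ['Core A']
```

Core B falls outside the latitude range and Core C outside the date range, so only Core A remains.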

Advanced search

Choosing search terms
When choosing search terms keep in mind:

  • Try the obvious first. If you are looking for information on the grain size of sediment, enter "grain size" rather than "sediments".
  • Use words likely to appear in the description of the dataset you want. "Holocene ice Lazarev" gets better results than "Holocene ice extension from the Lazarev Sea shelf".

Capitalization
PANGAEA searches are NOT case sensitive. All letters, regardless of how you type them, will be understood as lower case. For example, searches for "marine geology", "Marine Geology", and "mArInE gEoLoGy" will all return the same results.

Using query operators
By default, PANGAEA Search combines search terms with "AND": every entered term must occur in a matching dataset. To find datasets that contain either of two terms (or both), join them with "OR". For example, enter "falconensis OR bulloides" to get all datasets that contain at least one of the terms.

The use of "AND" between keywords is optional. To combine "AND" and "OR", use parentheses, for example: "Globigerina AND (falconensis OR bulloides)".
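
When queries are generated by a script, small helpers keep the brackets balanced. The Python sketch below uses hypothetical helper names (`all_of`, `any_of`) that are not part of any PANGAEA tool; it only assembles the query string described above.

```python
def all_of(*terms):
    # Join terms with an explicit AND (optional in PANGAEA; a space also means AND).
    return " AND ".join(terms)


def any_of(*terms):
    # Bracket an OR group so it can be nested inside other expressions.
    return "(" + " OR ".join(terms) + ")"


query = all_of("Globigerina", any_of("falconensis", "bulloides"))
print(query)  # Globigerina AND (falconensis OR bulloides)
```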

Excluding terms with "-"
To exclude a keyword, add a minus sign ("-") immediately before the term you want to avoid, and make sure there is a space before the minus sign. For example, "marine geology -organic" excludes all datasets containing "organic".
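
The Python sketch below shows both sides of exclusion: building a query string with a minus-prefixed term, and a toy matcher that spells out the intended semantics (all required terms present, no excluded term present). The function names are hypothetical, and the lower-cased whitespace tokenization is a simplifying assumption; the search engine's own text analysis will differ.

```python
def build_query(required, excluded=()):
    # Required terms are space-separated (implicit AND); each excluded
    # term gets a leading minus, with a space before the minus sign.
    return " ".join(list(required) + ["-" + t for t in excluded])


def matches(text, required, excluded=()):
    # Toy semantics on plain text: every required term is present
    # and no excluded term is present (case-insensitive).
    words = set(text.lower().split())
    return all(t.lower() in words for t in required) and not any(
        t.lower() in words for t in excluded
    )


print(build_query(["marine", "geology"], ["organic"]))  # marine geology -organic
print(matches("Marine geology of the Fram Strait", ["marine", "geology"], ["organic"]))  # True
print(matches("Organic carbon in marine geology cores", ["marine", "geology"], ["organic"]))  # False
```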

Approximate searches
If you do not know the exact spelling of a word, you can search for spelling variants as well as for the keyword itself. To do so, place a tilde ("~") immediately in front of the keyword, for example "~Neogloboqadrina".
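
Spelling variants are commonly measured by edit distance: the number of single-character insertions, deletions, and substitutions needed to turn one word into another. The Python sketch below assumes that model and a threshold of two edits, both chosen for illustration; they are assumptions, not a description of how PANGAEA selects spelling variants. The function names and the small vocabulary are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term, vocabulary, max_edits=2):
    # Keep vocabulary words within max_edits of the (possibly misspelled) term.
    t = term.lower()
    return [w for w in vocabulary if edit_distance(t, w.lower()) <= max_edits]


vocab = ["Neogloboquadrina", "Globigerina", "Globorotalia"]
print(fuzzy_matches("Neogloboqadrina", vocab))  # ['Neogloboquadrina']
```

The misspelling "Neogloboqadrina" is one insertion (the missing "u") away from "Neogloboquadrina", so it is found; the other two names differ in length by more than two characters and cannot match.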

Wildcards
Wildcards stand in for unknown characters in a search term. The following table describes the wildcard characters:

Wildcard | Function | Syntax | Locates
? | Specifies exactly one alphanumeric character. | m?ller | "müller", "miller", and "muller"
* | Specifies zero or more alphanumeric characters. Do not use the asterisk as the first character of a search term, as this makes the search slow. | corp* | "corporate", "corporation", "corporal", and "corpulent"
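
The two wildcards map naturally onto regular expressions: "?" becomes one word character and "*" becomes any run of word characters. The Python sketch below uses that translation to reproduce the table's examples on a small, made-up word list; it is an illustration, not PANGAEA's matcher. It matches case-insensitively, as PANGAEA search does, and the function name is hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    # '?' -> exactly one word character, '*' -> zero or more word characters.
    # Python's \w is Unicode-aware, so it also matches letters such as 'ü'.
    # All other characters are matched literally.
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(r"\w")
        elif ch == "*":
            parts.append(r"\w*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


words = ["Müller", "Miller", "Muller", "Mueller", "corporate", "corpulent", "cop"]
rx = wildcard_to_regex("m?ller")
print([w for w in words if rx.fullmatch(w)])  # ['Müller', 'Miller', 'Muller']
rx = wildcard_to_regex("corp*")
print([w for w in words if rx.fullmatch(w)])  # ['corporate', 'corpulent']
```

"Mueller" is not found by m?ller because "?" stands for exactly one character.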

Phrase searches
Search for complete phrases by enclosing them in double quotation marks. Words enclosed in double quotes ("like this") appear together in all results, exactly as you entered them. Phrase searches are especially useful for fixed expressions and full names.
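
A phrase match requires the query words to occur consecutively and in the given order. The Python sketch below checks exactly that on plain text; its lower-casing and word tokenization are simplifying assumptions, and the function names are hypothetical.

```python
import re


def tokens(text):
    # Lower-case word tokens; a real search engine tokenizes more carefully.
    return re.findall(r"\w+", text.lower())


def contains_phrase(text, phrase):
    # True if the phrase's words appear consecutively and in order.
    doc, ph = tokens(text), tokens(phrase)
    n = len(ph)
    return any(doc[i:i + n] == ph for i in range(len(doc) - n + 1))


a = "Marine geology of the Norwegian Sea"
b = "Geology and marine biology of the Norwegian Sea"
print(contains_phrase(a, "marine geology"))  # True
print(contains_phrase(b, "marine geology"))  # False
```

Both texts contain "marine" and "geology", so a plain keyword search would return both; only the first contains the phrase.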

Searches in specific fields
The PANGAEA XML schema can be used to formulate field-specific queries. To search for a keyword in a specific field, put the field name followed by a colon (":") immediately in front of the term you want to match. Schema parts such as "attribute" and "reference" cannot be searched this way; instead, references can be searched using the reference relation type (e.g., "supplementto" or "relatedto"), and attributes can be searched by using the attribute name as the field name. The most frequently used field names are:

Field name | Function
project: | Search for keywords in projects
project:label: | Match a project label
author: | Search for authors of datasets or assigned references
citation:author: | Search for dataset authors in the citation only
pi: | Search for datasets by Principal Investigator (PI)
citation: | Search for keywords in the citation
relatedto: | Search for keywords in assigned "Related to" references
supplementto: | Search for keywords in assigned "Supplement to" references
year: | Search for datasets or assigned references published in a specific year
citation:year: | Search only for datasets published in a specific year
parameter: | Search for keywords in parameter names
method: | Search for keywords in method names
event:label: | Search for event labels
basis: | Search for a basis, e.g., a ship or research station
campaign: | Search for research campaigns
O2ARegistryURI:* | Search for datasets with an O2A Registry URI (link to registry.awi.de). This is an example of the special case for attributes: the plain wildcard finds all datasets in which this field is actually used.
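
Field-specific terms follow the pattern field name, colon, term, and several of them can be combined like ordinary keywords. The hypothetical helpers below only concatenate strings in that pattern; they deliberately do not check whether a field exists, since attribute names such as O2ARegistryURI can also be used as fields.

```python
def in_field(field, term):
    # Build "field:term", e.g. in_field("project:label", "IMAGES").
    return f"{field}:{term}"


def combine(*parts):
    # Space-separated parts are combined with AND by default.
    return " ".join(parts)


query = combine(
    in_field("project:label", "IMAGES"),
    in_field("citation:author", "Mackensen"),
)
print(query)                            # project:label:IMAGES citation:author:Mackensen
print(in_field("O2ARegistryURI", "*"))  # O2ARegistryURI:*
```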

Query examples

Query | Result
marine | Finds datasets that contain "marine".
marine geology | Finds datasets that contain both "marine" and "geology".
"marine geology" | Quotation marks around a series of words turn it into a phrase: PANGAEA Search then finds only datasets that contain these words in this exact order.
marine geology -organic | Finds datasets that contain both "marine" and "geology" but not "organic".
Globigerina AND (falconensis OR bulloides) | Finds datasets that contain "Globigerina" and either "falconensis" or "bulloides".
~Neogloboqadrina | Finds datasets with "Neogloboquadrina" despite the misspelling.
project:label:IMAGES | Finds datasets that belong to project "IMAGES".
citation:author:Mackensen | Finds datasets authored by "Mackensen".
m?ller | Finds "Müller", "Muller", or "Miller". Use this for characters you cannot type on your keyboard.
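
To run one of these queries from a script or share it as a link, the query has to be URL-encoded. The Python sketch below assumes that the PANGAEA home page accepts the query in a `q` URL parameter (https://www.pangaea.de/?q=...); verify this against the live site before relying on it. The function name is hypothetical and only the standard library is used.

```python
from urllib.parse import urlencode

# Assumption: the search page reads the query from the "q" parameter.
SEARCH_URL = "https://www.pangaea.de/"


def search_url(query: str) -> str:
    # urlencode escapes quotes, brackets, colons and '?', and turns spaces into '+'.
    return SEARCH_URL + "?" + urlencode({"q": query})


print(search_url("Globigerina AND (falconensis OR bulloides)"))
# https://www.pangaea.de/?q=Globigerina+AND+%28falconensis+OR+bulloides%29
```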

PANGAEA Search Results
The results page lists abbreviated dataset descriptions ("thumbs"), each with a link to the full dataset description and links to download the dataset in HTML or text format. The score estimates the relevance of each result: a higher score means that the search terms occur more often and closer together.

Datasets are numbered consecutively and shown ten per page. To see more results, click a page number or the NEXT (or PREV) link above or below the list.
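
With ten hits per page, the number of pages and the ordinal numbers shown on each page follow from simple arithmetic. The Python sketch below assumes 1-based page and hit numbering; the function names and the hit count are hypothetical.

```python
import math

HITS_PER_PAGE = 10  # as described for the results page


def page_count(total_hits: int) -> int:
    # Number of result pages needed for total_hits.
    return math.ceil(total_hits / HITS_PER_PAGE)


def ordinals_on_page(page: int, total_hits: int) -> range:
    # 1-based ordinal numbers of the hits shown on a 1-based page.
    first = (page - 1) * HITS_PER_PAGE + 1
    last = min(page * HITS_PER_PAGE, total_hits)
    return range(first, last + 1)


print(page_count(137))                   # 14
print(list(ordinals_on_page(14, 137)))   # [131, 132, 133, 134, 135, 136, 137]
```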


Data warehouse

Entering the data warehouse

The data warehouse is a tool for combining data from different PANGAEA datasets into one file. When you are logged in, the < Data warehouse > button appears after you submit a query. It links to a page where you can select the geocodes and parameters for an export table. Parameters are ordered by a score that depends on the query.

Example:

Data of a planktonic foraminifer extracted with the data warehouse. Map plotted with ODV and interpolated with the DIVA algorithm.

The following example produces a distribution map of the shells of a planktonic foraminifer in the sediments of the world ocean.

  • go to http://www.pangaea.de
  • log in (or sign up for an account)
  • search for bulloides (species name of a planktonic foraminifer)
  • click on < Data warehouse > (a button on the upper right of the page)
  • choose:
    • Latitude
    • Longitude
    • Depth, sediment [m]
    • Globigerina bulloides [%]
  • click < Start Data Warehouse Query >
  • find the file bulloides.zip on your desktop and extract it. The archive contains a list of all dataset citations in various formats (plain text, RIS/EndNote, BibTeX) and the data file bulloides.tab.
  • start Pan2Applic (needs to be installed first)
  • drag and drop bulloides.tab into the empty window
  • choose Convert/Ocean Data View (ODV needs to be installed first)
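
In the steps above, Pan2Applic and ODV take care of reading the export. If you would rather process bulloides.tab in your own scripts, the Python sketch below splits a PANGAEA-style tab file into a header and data rows. It assumes a layout in which an optional metadata block enclosed in /* and */ is followed by one tab-separated header line and the data lines; it has not been checked against every export variant. The function name and the sample values are made up.

```python
import csv
import io


def read_pangaea_tab(text):
    """Split PANGAEA-style .tab text into (header, rows).

    Assumes an optional metadata block enclosed in /* ... */,
    followed by one tab-separated header line and data lines.
    """
    lines = text.splitlines()
    start = 0
    if lines and lines[0].startswith("/*"):
        # Skip everything up to and including the line that closes the block.
        for i, line in enumerate(lines):
            if line.strip().endswith("*/"):
                start = i + 1
                break
    reader = csv.reader(io.StringIO("\n".join(lines[start:])), delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # drop empty trailing lines
    return header, rows


# Made-up sample in the assumed layout (values are not real data).
sample = (
    "/* DATA DESCRIPTION:\n"
    "Citation:\tExample only\n"
    "*/\n"
    "Latitude\tLongitude\tDepth, sediment [m]\tGlobigerina bulloides [%]\n"
    "-45.5\t12.25\t0.01\t21.3\n"
    "10.0\t-30.0\t0.02\t4.8\n"
)
header, rows = read_pangaea_tab(sample)
print(header[3])  # Globigerina bulloides [%]
print(len(rows))  # 2
```

Values are returned as strings; convert them (for example with float) once you know which columns are numeric.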

Important: You must cite all datasets referenced in the data file and in the citation list!

External services

  • pangaeapy: a Python module to download and analyse metadata as well as data from tabular PANGAEA datasets
  • pangaear: an R client for interacting with PANGAEA