drfindformat

 

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

Find public databases by format

Description

drfindformat searches the Data Resource Catalogue to find entries with EDAM format terms matching a query string.

Algorithm

The first search is of the EDAM ontology format namespace, using the term names and their synonyms. All child terms are automatically included in the set of matches unless the -nosubclasses qualifier is used.

The -sensitive qualifier also searches the definition strings.

The set of EDAM terms is then compared to entries in the Data Resource Catalogue, searching the 'efmt' EDAM format index.
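
The two-stage search described above can be illustrated with a minimal Python sketch. It is not the EMBOSS implementation: the Term class, the toy term list, the parent/child link between the two FASTA terms and the case-insensitive substring matching are all assumptions made for illustration. Only the overall flow (match names and synonyms, optionally definitions, add sub-classes, then look up resources by format) follows the text.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A toy stand-in for an EDAM format term (not the real EDAM schema)."""
    term_id: str
    name: str
    synonyms: list = field(default_factory=list)
    definition: str = ""
    parents: list = field(default_factory=list)


def find_format_terms(terms, query, sensitive=False, subclasses=True):
    """Return the IDs of terms matching the query, optionally with descendants."""
    words = query.lower().split()
    matched = set()
    for term in terms:
        fields = [term.name] + term.synonyms
        if sensitive:
            # Mirrors -sensitive: definitions are searched as well.
            fields.append(term.definition)
        text = " ".join(fields).lower()
        if any(word in text for word in words):  # assumed matching rule
            matched.add(term.term_id)
    if subclasses:
        # Repeatedly add any term that has an already-matched parent.
        changed = True
        while changed:
            changed = False
            for term in terms:
                if term.term_id not in matched and matched.intersection(term.parents):
                    matched.add(term.term_id)
                    changed = True
    return matched


def find_resources(resources, term_ids):
    """Return the resource IDs whose format annotations include a matched term."""
    return [rid for rid, fmts in resources.items() if term_ids.intersection(fmts)]


# Hypothetical toy data: IDs echo the example output, the hierarchy is invented.
TERMS = [
    Term("1929", "FASTA format", ["FASTA"], "FASTA sequence format."),
    Term("2310", "FASTA-HTML", [], "FASTA format wrapped in HTML elements.",
         parents=["1929"]),
    Term("2331", "HTML", [], "HTML format."),
    Term("2376", "RDF", [], "Resource Description Framework format."),
]
RESOURCES = {"dbEST": {"2310", "2331"}, "UniProt": {"1929", "2376"}, "Other": {"2376"}}

hits = find_format_terms(TERMS, "fasta")
print(sorted(hits))
print(find_resources(RESOURCES, hits))
```

In this sketch, passing subclasses=False plays the role of -nosubclasses and sensitive=True the role of -sensitive.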

Usage

Here is a sample session with drfindformat:


% drfindformat fasta 
Find public databases by format
Data resource output file [drfindformat.drcat]: 


Command line arguments

Find public databases by format
Version: EMBOSS:6.5.0.0

   Standard (Mandatory) qualifiers:
  [-query]             string     List of EDAM data keywords (Any string)
  [-outfile]           outresource [*.drfindformat] Output data resource file
                                  name

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -sensitive          boolean    [N] By default, the query keywords are
                                  matched against the EDAM term names (and
                                  synonyms) only. This option also matches the
                                  keywords against the EDAM term definitions
                                  and will therefore (typically) report more
                                  matches.
   -[no]subclasses     boolean    [Y] Extend the query matches to include all
                                  terms which are specialisations (EDAM
                                  sub-classes) of the matched type.

   Associated qualifiers:

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory
   -oformat2           string     Data resource output format

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
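
For scripted use, the qualifiers listed above can be assembled into a command line programmatically. The following Python sketch is only an illustration: the helper name build_drfindformat_command and the output file name fasta.drcat are hypothetical, while the qualifiers themselves (-query, -outfile, -sensitive, -nosubclasses, -oformat2, -auto) are taken from the listing above. Actually running the command assumes an EMBOSS installation on the PATH.

```python
import shlex


def build_drfindformat_command(query, outfile, sensitive=False,
                               subclasses=True, oformat=None):
    """Build an argument list for drfindformat from the documented qualifiers."""
    cmd = ["drfindformat", "-query", query, "-outfile", outfile]
    if sensitive:
        cmd.append("-sensitive")      # also match term definitions
    if not subclasses:
        cmd.append("-nosubclasses")   # do not add EDAM sub-classes
    if oformat is not None:
        cmd += ["-oformat2", oformat]  # e.g. drcat, basic, wsbasic, list
    cmd.append("-auto")               # turn off prompts for unattended runs
    return cmd


cmd = build_drfindformat_command("fasta", "fasta.drcat", sensitive=True)
print(shlex.join(cmd))
# To execute (requires EMBOSS to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell quoting problems when the query contains spaces.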

Qualifier | Type | Description | Allowed values | Default
Standard (Mandatory) qualifiers
[-query] (Parameter 1) | string | List of EDAM data keywords | Any string |
[-outfile] (Parameter 2) | outresource | Output data resource file name | Data resource entry | <*>.drfindformat
Additional (Optional) qualifiers
(none)
Advanced (Unprompted) qualifiers
-sensitive | boolean | By default, the query keywords are matched against the EDAM term names (and synonyms) only. This option also matches the keywords against the EDAM term definitions and will therefore (typically) report more matches. | Boolean value Yes/No | No
-[no]subclasses | boolean | Extend the query matches to include all terms which are specialisations (EDAM sub-classes) of the matched type. | Boolean value Yes/No | Yes
Associated qualifiers
"-outfile" associated outresource qualifiers
-odirectory2, -odirectory_outfile | string | Output directory | Any string |
-oformat2, -oformat_outfile | string | Data resource output format | Any string |
General qualifiers
-auto | boolean | Turn off prompts | Boolean value Yes/No | N
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N
-warning | boolean | Report warnings | Boolean value Yes/No | Y
-error | boolean | Report errors | Boolean value Yes/No | Y
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y
-die | boolean | Report dying program messages | Boolean value Yes/No | Y
-version | boolean | Report version number and exit | Boolean value Yes/No | N

Input file format

None.

Output file format

The output is a standard EMBOSS resource file.

The results can be output in one of several styles by using the command-line qualifier -oformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: drcat, basic, wsbasic, list.

See: http://emboss.sf.net/docs/themes/ResourceFormats.html for further information on resource formats.
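
As an illustration of how drcat-style output might be post-processed, the Python sketch below splits text into records and extracts the format names from the EDAMfmt lines. It assumes the layout seen in the example output in this document (a tag, whitespace, then a value, with records separated by blank lines); it is not an official parser, and the function names parse_drcat and format_names are hypothetical.

```python
def parse_drcat(text):
    """Split drcat-style text into records: dicts mapping tag -> list of values."""
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
                current = {}
            continue
        parts = line.split(None, 1)
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        # Tags such as EDAMfmt and Query repeat, so values are kept in lists.
        current.setdefault(tag, []).append(value)
    if current:
        records.append(current)
    return records


def format_names(record):
    """Return the format names from a record's EDAMfmt lines ('id | name')."""
    return [value.split("|", 1)[1].strip()
            for value in record.get("EDAMfmt", []) if "|" in value]


# Abbreviated sample in the same layout as the example output.
SAMPLE = """\
ID      dbEST
Name    dbEST database of EST sequences
EDAMfmt 2310 | FASTA-HTML
EDAMfmt 2331 | HTML

ID      UniProt
Name    Universal protein resource
EDAMfmt 1929 | FASTA format
EDAMfmt 2376 | RDF
"""

for record in parse_drcat(SAMPLE):
    print(record["ID"][0], format_names(record))
```

The other output formats (basic, wsbasic, list) may use a different layout and are not covered by this sketch.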

Output files for usage example

File: drfindformat.drcat

ID      dbEST
Name    dbEST database of EST sequences
Desc    dbEST is a division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or "Expressed Sequence Tags", from a number of organisms.
URL     http://www.ncbi.nlm.nih.gov/dbEST/
Cat     Not available
Taxon   1 | all
EDAMtpc 0655 | mRNA, EST or cDNA
EDAMdat 0849 | Sequence record
EDAMid  2314 | GI number
EDAMid  1105 | dbEST accession
EDAMfmt 2310 | FASTA-HTML
EDAMfmt 2532 | GenBank-HTML
EDAMfmt 2331 | HTML
Xref    SP_FT | None
Query    Sequence record | GenBank-HTML | dbEST accession | http://www.ncbi.nlm.nih.gov/nucest/%s?report=genbank
Query    Sequence record | HTML {est} | dbEST accession | http://www.ncbi.nlm.nih.gov/nucest/%s?report=est
Query    Sequence record | HTML {docsum} | dbEST accession | http://www.ncbi.nlm.nih.gov/nucest/%s?report=docsum
Query    Sequence record | FASTA-HTML | dbEST accession | http://www.ncbi.nlm.nih.gov/nucest/%s?report=fasta
Query    Sequence record | GenBank-HTML | dbEST accession | http://www.ncbi.nlm.nih.gov/nucest/%s?report=genbank
Query    Sequence record | GenBank-HTML | GI number | http://www.ncbi.nlm.nih.gov/nucest/%s?report=genbank
Query    Sequence record | HTML {est} | GI number | http://www.ncbi.nlm.nih.gov/nucest/%s?report=est
Query    Sequence record | HTML {docsum} | GI number | http://www.ncbi.nlm.nih.gov/nucest/%s?report=docsum
Query    Sequence record | FASTA-HTML | GI number | http://www.ncbi.nlm.nih.gov/nucest/%s?report=fasta
Query    Sequence record | GenBank-HTML | GI number | http://www.ncbi.nlm.nih.gov/nucest/%s?report=genbank
Example dbEST accession | f12345
Example GI number | 706694

ID      REDIdb
Name    RNA editing database (REDIdb)
Desc    Sequences post-transcriptionally modified by RNA editing from primary databases and literature. All editing information such as substitutions, insertions and deletions occurring in a wide range of organisms is stored.
URL     http://biologia.unical.it/py_script/overview.html
Taxon   1 | all
EDAMtpc 0114 | Gene structure and RNA splicing
EDAMdat 2043 | Sequence record lite
EDAMdat 1383 | Sequence alignment (nucleic acid)
EDAMid  2781 | REDIdb ID
EDAMfmt 2310 | FASTA-HTML
EDAMfmt 2331 | HTML
Query    Sequence record lite {REDIdb entry} | HTML | REDIdb ID | http://biologia.unical.it/py_script/cgi-bin/retrieve.py?query=%s
Query    Sequence record lite {REDIdb fasta} | FASTA-HTML | REDIdb ID | http://biologia.unical.it/py_script/cgi-bin/fasta.py?query=%s
Query    Sequence alignment (nucleic acid) {REDIdb overview} | HTML | REDIdb ID | http://biologia.unical.it/py_script/cgi-bin/display.py?query=%s
Query    Sequence alignment (nucleic acid) {REDIdb alignment} | HTML | REDIdb ID | http://biologia.unical.it/py_script/cgi-bin/align.py?query=%s
Example REDIdb ID  | EDI_000000002

ID      UniProtKB_Swiss-Prot
IDalt   SwissProt
Name    Universal protein resource knowledge base / Swiss-Prot
Desc    Section of the UniProt knowledgebase, containing annotated records, which include curator-evaluated computational analysis, as well as, information extracted from the literature
URL     http://www.uniprot.org
Taxon   1 | all


  [Part of this file has been deleted for brevity]

Cat     Other
Taxon   1 | all
EDAMtpc 3052 | Sequence clusters and classification
EDAMtpc 0114 | Gene structure and RNA splicing
EDAMdat 1245 | Sequence cluster (protein)
EDAMid  2347 | Sequence cluster ID (UniRef100)
EDAMid  2348 | Sequence cluster ID (UniRef90)
EDAMid  2349 | Sequence cluster ID (UniRef50)
EDAMfmt 1929 | FASTA format
EDAMfmt 2376 | RDF
EDAMfmt 2331 | HTML
EDAMfmt 2332 | XML
Xref    SP_implicit | UniProt accession
Query    Sequence cluster (protein) | HTML | Sequence cluster ID (UniRef100) | http://www.uniprot.org/uniref/UniRef100_%s
Query    Sequence cluster (protein) | XML | Sequence cluster ID (UniRef100) | http://www.uniprot.org/uniref/UniRef100_%s.xml
Query    Sequence cluster (protein) | RDF | Sequence cluster ID (UniRef100) | http://www.uniprot.org/uniref/UniRef100_%s.rdf
Query    Sequence cluster (protein) | FASTA format | Sequence cluster ID (UniRef100) | http://www.uniprot.org/uniref/UniRef100_%s.fasta
Query    Sequence cluster (protein) | HTML | Sequence cluster ID (UniRef90) | http://www.uniprot.org/uniref/UniRef90_%s
Query    Sequence cluster (protein) | XML | Sequence cluster ID (UniRef90) | http://www.uniprot.org/uniref/UniRef90_%s.xml
Query    Sequence cluster (protein) | RDF | Sequence cluster ID (UniRef90) | http://www.uniprot.org/uniref/UniRef90_%s.rdf
Query    Sequence cluster (protein) | FASTA format | Sequence cluster ID (UniRef90) | http://www.uniprot.org/uniref/UniRef90_%s.fasta
Query    Sequence cluster (protein) | HTML | Sequence cluster ID (UniRef50) | http://www.uniprot.org/uniref/UniRef50_%s
Query    Sequence cluster (protein) | XML | Sequence cluster ID (UniRef50) | http://www.uniprot.org/uniref/UniRef50_%s.xml
Query    Sequence cluster (protein) | RDF | Sequence cluster ID (UniRef50) | http://www.uniprot.org/uniref/UniRef50_%s.rdf
Query    Sequence cluster (protein) | FASTA format | Sequence cluster ID (UniRef50) | http://www.uniprot.org/uniref/UniRef50_%s.fasta
Example Sequence cluster ID (UniRef100) | P02930
Example Sequence cluster ID (UniRef90) | P02930
Example Sequence cluster ID (UniRef50) | P02930

ID      UniProt
IDalt   UniProtKB
Name    Universal protein resource
Desc    A comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
URL     http://www.uniprot.org/
Taxon   1 | all
EDAMtpc 0639 | Protein sequence analysis
EDAMdat 2201 | Sequence record full
EDAMid  3021 | UniProt accession
EDAMfmt 1929 | FASTA format
EDAMfmt 2376 | RDF
EDAMfmt 2188 | uniprot
EDAMfmt 2331 | HTML
EDAMfmt 2332 | XML
Xref    SP_FT | None
Query    Sequence record full | HTML | UniProt accession | http://www.uniprot.org/uniprot/%s
Query    Sequence record full | uniprot | UniProt accession | http://www.uniprot.org/uniprot/%s.txt
Query    Sequence record full | XML | UniProt accession | http://www.uniprot.org/uniprot/%s.xml
Query    Sequence record full | RDF | UniProt accession | http://www.uniprot.org/uniprot/%s.rdf
Query    Sequence record full | FASTA format | UniProt accession | http://www.uniprot.org/uniprot/%s.fasta
Example UniProt accession | P12345
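
The Query lines in the records above appear to hold four fields separated by " | ": a data type, a format, an identifier type and a URL template containing a %s placeholder. The Python sketch below, which rests on that reading of the example output, shows how such a line might be split and how an identifier from an Example line could be substituted into the template. The function names are hypothetical, and the URLs in the catalogue may no longer resolve.

```python
def parse_query_line(line):
    """Split a 'Query' line into data type, format, identifier type and URL."""
    body = line.split(None, 1)[1]  # drop the leading 'Query' tag
    data, fmt, id_type, url = [part.strip() for part in body.split("|", 3)]
    return {"data": data, "format": fmt, "id_type": id_type, "url": url}


def build_url(query, identifier):
    """Substitute the identifier for the %s placeholder in the URL template."""
    return query["url"].replace("%s", identifier)


line = ("Query    Sequence record full | FASTA format | UniProt accession | "
        "http://www.uniprot.org/uniprot/%s.fasta")
query = parse_query_line(line)
print(query["format"])
print(build_url(query, "P12345"))
```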

Data files

The Data Resource Catalogue is included in EMBOSS as local database drcat. The EDAM Ontology is included in EMBOSS as local database edam.

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name    Description
drfinddata      Find public databases by data type
drfindid        Find public databases by identifier
drfindresource  Find public databases by resource
drget           Get data resource entries
drtext          Get data resource entries complete text
edamdef         Find EDAM ontology terms by definition
edamhasinput    Find EDAM ontology terms by has_input relation
edamhasoutput   Find EDAM ontology terms by has_output relation
edamisformat    Find EDAM ontology terms by is_format_of relation
edamisid        Find EDAM ontology terms by is_identifier_of relation
edamname        Find EDAM ontology terms by name
wossdata        Find programs by EDAM data
wossinput       Find programs by EDAM input data
wossoperation   Find programs by EDAM operation
wossoutput      Find programs by EDAM output data
wossparam       Find programs by EDAM parameter
wosstopic       Find programs by EDAM topic

Author(s)

Peter Rice
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.

History

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None.