PDBTOSP documentation


 

CONTENTS

1.0 SUMMARY
2.0 INPUTS & OUTPUTS
3.0 INPUT FILE FORMAT
4.0 OUTPUT FILE FORMAT
5.0 DATA FILES
6.0 USAGE
7.0 KNOWN BUGS & WARNINGS
8.0 NOTES
9.0 DESCRIPTION
10.0 ALGORITHM
11.0 RELATED APPLICATIONS
12.0 DIAGNOSTIC ERROR MESSAGES
13.0 AUTHORS
14.0 REFERENCES



1.0 SUMMARY

Convert raw swissprot:PDB equivalence file to EMBL-like format. Convert swissprot:PDB codes file to EMBL-like format


2.0 INPUTS & OUTPUTS

PDBTOSP parses the swissprot:PDB equivalence table available at http://www.expasy.ch/cgi-bin/lists?pdbtosp.txt and writes the data out in EMBL-like format. The raw (input) file can be obtained by doing a "save as ... text format" from the web page. No changes are made to the data other than changing the format in which it is held. The input and output files are specified by the user.


3.0 INPUT FILE FORMAT

An excerpt from the raw swissprot:PDB equivalence file is shown (Figure 1).

Input files for usage example

File: pdbtosp.txt

  ------------------------------------------------------------------------
   ExPASy Home page   Site Map    Search ExPASy   Contact us    SWISS-PROT

 Hosted by SIB       Mirror                                        USA[new]
 Switzerland         sites:      AustraliaCanada China Korea Taiwan
  ------------------------------------------------------------------------

----------------------------------------------------------------------------
        SWISS-PROT Protein Knowledgebase
        Swiss Institute of Bioinformatics (SIB); Geneva, Switzerland
        European Bioinformatics Institute (EBI); Hinxton, United Kingdom
----------------------------------------------------------------------------

Description: Index of Protein Data Bank (PDB) entries referenced in
             SWISS-PROT
Name:        PDBTOSP.TXT
Release:     40.9 of 31-Jan-2002

----------------------------------------------------------------------------

The PDB database is available at the following URL:

USA:  http://www.rcsb.org/pdb/
EBI:  http://www2.ebi.ac.uk/pdb/

 - Number of PDB entries referenced in SWISS-PROT: 9901
 - Number of SWISS-PROT entries with one or more pointers to PDB: 3260

PDB   Last revision
code  date           SWISS-PROT entry name(s)
____  ___________    __________________________________________
101M  (08-APR-98)  : MYG_PHYCA   (P02185)
102L  (31-OCT-93)  : LYCV_BPT4   (P00720)
102M  (08-APR-98)  : MYG_PHYCA   (P02185)
103L  (31-OCT-93)  : LYCV_BPT4   (P00720)
103M  (08-APR-98)  : MYG_PHYCA   (P02185)
9XIA  (15-JUL-92)  : XYLA_STRRU  (P24300)
9XIM  (15-JUL-93)  : XYLA_ACTMI  (P12851)

----------------------------------------------------------------------------
SWISS-PROT is copyright.  It is produced through a collaboration between the
Swiss Institute  of  Bioinformatics   and the EMBL Outstation - the European
Bioinformatics Institute. There are no restrictions on its use by non-profit
institutions as long as its  content is in no way modified. Usage by and for
commercial entities requires a license agreement.  For information about the
licensing  scheme  see: http://www.isb-sib.ch/announce/ or send  an email to
license@isb-sib.ch.
----------------------------------------------------------------------------

  ------------------------------------------------------------------------
   ExPASy Home page   Site Map    Search ExPASy   Contact us    SWISS-PROT

 Hosted by SIB       Mirror                                        USA[new]
 Switzerland         sites:      AustraliaCanada China Korea Taiwan
  ------------------------------------------------------------------------




4.0 OUTPUT FILE FORMAT

An excerpt from the swissprot:PDB equivalence file in EMBL-like format is shown (Figure 2).

Output files for usage example

File: Epdbtosp.dat

EN   101M
XX
NE   1
XX
IN   MYG_PHYCA ID; P02185 ACC;
XX
//
EN   102L
XX
NE   1
XX
IN   LYCV_BPT4 ID; P00720 ACC;
XX
//
EN   102M
XX
NE   1
XX
IN   MYG_PHYCA ID; P02185 ACC;
XX
//
EN   103L
XX
NE   1
XX
IN   LYCV_BPT4 ID; P00720 ACC;
XX
//
EN   103M
XX
NE   1
XX
IN   MYG_PHYCA ID; P02185 ACC;
XX
//
EN   9XIA
XX
NE   1
XX
IN   XYLA_STRRU ID; P24300 ACC;
XX
//
EN   9XIM
XX
NE   1
XX
IN   XYLA_ACTMI ID; P12851 ACC;
XX
//




5.0 DATA FILES

No data files are used by PDBTOSP.


6.0 USAGE

   Standard (Mandatory) qualifiers:
  [-infile]            infile     This option specifies the name of raw
                                  swissprot:PDB equivalence file (input).
                                  HETPARSE parses this file, which is
                                  available at URL
                                  http://www.expasy.ch/cgi-bin/lists?pdbtosp.txt
  [-outfile]           outfile    [Epdbtosp.dat] This option specifies the
                                  name of swissprot:PDB equivalence file
                                  (EMBL-like format). This is the PDBTOSP
                                  output file.

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

6.1 COMMAND LINE ARGUMENTS

Standard (Mandatory) qualifiers Allowed values Default
[-infile]
(Parameter 1)
This option specifies the name of raw swissprot:PDB equivalence file (input). HETPARSE parses this file, which is available at URL http://www.expasy.ch/cgi-bin/lists?pdbtosp.txt Input file Required
[-outfile]
(Parameter 2)
This option specifies the name of swissprot:PDB equivalence file (EMBL-like format). This is the PDBTOSP output file. Output file Epdbtosp.dat
Additional (Optional) qualifiers Allowed values Default
(none)
Advanced (Unprompted) qualifiers Allowed values Default
(none)

6.2 EXAMPLE SESSION

An example of interactive use of PDBTOSP is shown below. Here is a sample session with pdbtosp


% pdbtosp 
Convert swissprot:PDB codes file to EMBL-like format.
Swissprot:pdb equivalence table file: pdbtosp.txt
Swissprot:pdb equivalence output file [Epdbtosp.dat]: 

Go to the input files for this example
Go to the output files for this example




7.0 KNOWN BUGS & WARNINGS

None.


8.0 NOTES

PDBTOSP is used to create the EMBOSS data file Epdbtosp.dat that is included in the EMBOSS distribution.

8.1 GLOSSARY OF FILE TYPES

FILE TYPE FORMAT DESCRIPTION CREATED BY SEE ALSO
swissprot:PDB equivalence file EMBL-like format. A file containing swissprot identifiers for PDB codes. Included in the EMBOSS distribution N.A.
None


9.0 DESCRIPTION

Some research applications require knowledge of the database sequence(s) that corresponds to the sequence(s) given in a PDB file. A 'swissprot:PDB equivalence' file listing accession numbers and swissprot database identifiers for certain PDB codes is available, but is not in a format that is consistent with flat file formats used for protein structural data in emboss. PDBTOSP parses the swissprot:PDB equivalence in its raw format and converts it to an EMBL-like format.


10.0 ALGORITHM

PDBTOSP relies on finding a line beginning with '____ _' in the input file (all lines up to and including this one are ignored). Lines of code data are then parsed, up until the first blank line.


11.0 RELATED APPLICATIONS

See also

Program name Description
aaindexextract Extract amino acid property data from AAINDEX
allversusall Sequence similarity data from all-versus-all comparison
cathparse Generates DCF file from raw CATH files
cutgextract Extract codon usage tables from from CUTG database
domainer Generates domain CCF files from protein CCF files
domainnr Removes redundant domains from a DCF file
domainseqs Adds sequence records to a DCF file
domainsse Add secondary structure records to a DCF file
hetparse Converts heterogen group dictionary to EMBL-like format
jaspextract Extract data from JASPAR
pdbplus Add accessibility & secondary structure to a CCF file
pdbtosp Convert swissprot:PDB codes file to EMBL-like format
printsextract Extract data from PRINTS database for use by pscan
prosextract Processes the PROSITE motif database for use by patmatmotifs
rebaseextract Process the REBASE database for use by restriction enzyme applications
scopparse Generate DCF file from raw SCOP files
seqnr Removes redundancy from DHF files
sites Generate residue-ligand CON files from CCF files
ssematch Search a DCF file for secondary structure matches
tfextract Process TRANSFAC transcription factor database for use by tfscan



12.0 DIAGNOSTIC ERROR MESSAGES

None.


13.0 AUTHORS

Jon Ison (jison@ebi.ac.uk)
The European Bioinformatics Institute Wellcome Trust Genome Campus Cambridge CB10 1SD UK


14.0 REFERENCES

Please cite the authors and EMBOSS.

Rice P, Longden I and Bleasby A (2000) "EMBOSS - The European Molecular Biology Open Software Suite" Trends in Genetics, 15:276-278.

See also http://emboss.sourceforge.net/

14.1 Other useful references