SIGGENLIG documentation


 

CONTENTS

1.0 SUMMARY
2.0 INPUTS & OUTPUTS
3.0 INPUT FILE FORMAT
4.0 OUTPUT FILE FORMAT
5.0 DATA FILES
6.0 USAGE
7.0 KNOWN BUGS & WARNINGS
8.0 NOTES
9.0 DESCRIPTION
10.0 ALGORITHM
11.0 RELATED APPLICATIONS
12.0 DIAGNOSTIC ERROR MESSAGES
13.0 AUTHORS
14.0 REFERENCES



1.0 SUMMARY

Generates ligand-binding signatures from a CON file (contacts file) of residue-ligand contacts.


2.0 INPUTS & OUTPUTS

SIGGENLIG reads a CON file of residue-ligand contacts, generated by using SITES, and a directory of CCF files (clean coordinate files) containing one CCF file for each protein or domain in the CON file. One or more signature files, each containing a single ligand-binding signature, are generated for each ligand-binding site in the CON file. The user specifies whether 1D (sequence) or 3D (structural) signatures are generated and whether they are 'full-length' (the signature corresponds to the entire ligand-binding site) or 'patch' (the signature corresponds to part of the ligand-binding site). For 3D signatures, the environment definition is specified; for patch signatures, a 'Minimum patch size' and a 'Maximum gap distance' are specified. A 'Window size' is specified for all signatures. The paths of the CCF files (input) and signature files (output) are specified by the user, and the file extensions are specified in the ACD file. A log file is also written.


3.0 INPUT FILE FORMAT

The format of the CON file (contacts file) is described in the SITES documentation. The format of the CCF files is described in the PDBPARSE documentation (proteins) and the DOMAINER documentation (domains).

Input files for usage example

File: ../sites-signature/SITES.con

XX   Residue-ligand contact data (for domains).
XX
TY   LIGAND
XX
EX   THRESH 1.0; IGNORE .; NMOD .; NCHA .;
XX
NE   11
XX
EN   [1]
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG 101;
XX
DE   2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
SI   SN 1; NS 2
XX
CN   MO .; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 52; NRES2 .
XX
S1   SEQUENCE    52 AA;   5817 MW;  D8CCAE0E1FC0849A CRC64;
     ADIEGFTSLA SQCTAQELVM TLNELFARFD KLAAENHCLR IKILGDCYYC VS
XX
NC   SM .; LI 6
XX
LI   ASP 2
LI   PHE 6
LI   THR 7
LI   LEU 44
LI   GLY 45
LI   ASP 46
XX
//
EN   [2]
XX
ID   PDB 1ii7; DOM d1ii7a_; LIG 101;
XX
DE   2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
SI   SN 2; NS 2
XX
CN   MO .; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 65; NRES2 .
XX
S1   SEQUENCE    65 AA;   7395 MW;  75FBE75B22FD3678 CRC64;
     MKFAHLADIH LGYEQFHKPQ REEEFAEAFK NALEIAVQEN VDFILIAGDL FHSSRPSPGT
     LKKAI
XX
NC   SM .; LI 2
XX
LI   HIS 10
LI   ASP 49
XX


  [Part of this file has been deleted for brevity]

NC   SM .; LI 3
XX
LI   ASP 8
LI   HIS 10
LI   ASP 49
XX
//
EN   [10]
XX
ID   PDB 2hhb; DOM .; LIG PO4;
XX
DE   PHOSPHATE ION
XX
SI   SN 1; NS 1
XX
CN   MO .; CN1 1; CN2 .; ID1 D; ID2 .; NRES1 146; NRES2 .
XX
S1   SEQUENCE   146 AA;  15867 MW;  EACBC707CFD466A1 CRC64;
     VHLTPEEKSA VTALWGKVNV DEVGGEALGR LLVVYPWTQR FFESFGDLST PDAVMGNPKV
     KAHGKKVLGA FSDGLAHLDN LKGTFATLSE LHCDKLHVDP ENFRLLGNVL VCVLAHHFGK
     EFTPPVQAAY QKVVAGVANA LAHKYH
XX
NC   SM .; LI 2
XX
LI   VAL 1
LI   LEU 81
XX
//
EN   [11]
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG POP;
XX
DE   PYROPHOSPHATE 2-
XX
SI   SN 1; NS 1
XX
CN   MO .; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 52; NRES2 .
XX
S1   SEQUENCE    52 AA;   5817 MW;  D8CCAE0E1FC0849A CRC64;
     ADIEGFTSLA SQCTAQELVM TLNELFARFD KLAAENHCLR IKILGDCYYC VS
XX
NC   SM .; LI 6
XX
LI   ASP 2
LI   ILE 3
LI   GLU 4
LI   GLY 5
LI   PHE 6
LI   THR 7
XX
//
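
For readers who want to inspect a CON file programmatically, the following Python sketch extracts the identifiers, the site number and the contact residues from each entry. It is a minimal illustration based only on the records visible in the example above (EN, ID, SI, LI and //); the function name parse_con and the dictionary keys are assumptions made for this example and are not part of SITES or SIGGENLIG.

```python
import re

# First entry of the example CON file above (sequence records omitted).
SAMPLE_CON = """\
EN   [1]
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG 101;
XX
DE   2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
SI   SN 1; NS 2
XX
NC   SM .; LI 6
XX
LI   ASP 2
LI   PHE 6
LI   THR 7
LI   LEU 44
LI   GLY 45
LI   ASP 46
XX
//
"""


def parse_con(text):
    """Return one dict per entry with identifiers, site number and contacts."""
    entries = []
    entry = None
    for line in text.splitlines():
        code, rest = line[:2], line[2:].strip()
        if code == "EN":
            entry = {"contacts": []}
        elif entry is None:
            continue  # header records before the first entry
        elif code == "ID":
            # Fields look like "PDB 1cs4; DOM d1cs4a_; LIG 101;"
            for field in rest.split(";"):
                parts = field.split()
                if len(parts) == 2:
                    entry[parts[0].lower()] = parts[1]
        elif code == "SI":
            match = re.search(r"SN\s+(\d+);\s*NS\s+(\d+)", rest)
            if match:
                entry["site"] = int(match.group(1))
                entry["nsites"] = int(match.group(2))
        elif code == "LI":
            name, position = rest.split()
            entry["contacts"].append((name, int(position)))
        elif code == "//":
            entries.append(entry)
            entry = None
    return entries


for e in parse_con(SAMPLE_CON):
    print(e["pdb"], e["dom"], e["lig"], e["site"], e["contacts"])
```

The sketch ignores every record it does not recognise, so it should tolerate the sequence (S1) and contact-summary (NC) records without special handling.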




4.0 OUTPUT FILE FORMAT

The output file (Figure 1) uses the standard signature file format and is described in the SIGGEN documentation. For the ligand-binding signatures generated by SIGGENLIG, however, four additional lines of bibliographic information taken from the CON (input) file are written to a signature file. The records have the following meaning: In addition, where 3D signatures are generated, the following records have different meanings than for 1D signatures (e.g. those generated by using SIGGEN):

Output files for usage example

File: 101.1.F.#.1cs4.d1cs4a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG 101;
XX
DE    2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
IS   SN 1; NS 2
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   6
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   1 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   3 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   T ; 1
XX
GA   0 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   36 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   G ; 1
XX
GA   0 ; 1
XX
NN   [6]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   0 ; 1
//
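
The GA (gap) values in the signature file above appear to follow directly from the LI contact positions of the corresponding CON entry: each gap equals the number of residues between a contact position and the previous one (or the start of the sequence). The Python sketch below reproduces the values under that assumption. The function name full_length_signature and the residue-code table are illustrative only and are not taken from the SIGGENLIG source.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def full_length_signature(contacts):
    """Map (residue name, position) contacts to (one-letter code, gap) pairs.

    Assumption: the gap is the number of residues separating a contact
    position from the previous contact position (or from the sequence start).
    """
    signature = []
    previous = 0
    for name, position in sorted(contacts, key=lambda c: c[1]):
        signature.append((THREE_TO_ONE[name], position - previous - 1))
        previous = position
    return signature


# LI records of entry [1] (PDB 1cs4, ligand 101) in the example CON file.
contacts = [("ASP", 2), ("PHE", 6), ("THR", 7),
            ("LEU", 44), ("GLY", 45), ("ASP", 46)]
print(full_length_signature(contacts))
# [('D', 1), ('F', 3), ('T', 0), ('L', 36), ('G', 0), ('D', 0)]
```

The printed pairs match the AA and GA records of 101.1.F.#.1cs4.d1cs4a_.sig; the same rule also gives the gaps 9 and 38 listed for the 1ii7 site (HIS 10, ASP 49).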

File: 101.2.F.#.1ii7.d1ii7a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1ii7; DOM d1ii7a_; LIG 101;
XX
DE    2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
IS   SN 2; NS 2
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   2
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 1
XX
GA   9 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   38 ; 1
//

File: FOK.1.F.#.1cs4.d1cs4a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG FOK;
XX
DE    FORSKOLIN
XX
IS   SN 1; NS 1
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   1
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   43 ; 1
//

File: HEM.1.F.#.2hhb...sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 2hhb; DOM .; LIG HEM;
XX
DE    PROTOPORPHYRIN IX CONTAINING FE
XX
IS   SN 1; NS 4
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   17
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   Y ; 1
XX
GA   41 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 1
XX
GA   1 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX


  [Part of this file has been deleted for brevity]

XX
GA   0 ; 1
XX
NN   [12]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   3 ; 1
XX
NN   [13]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   V ; 1
XX
GA   1 ; 1
XX
NN   [14]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   N ; 1
XX
GA   3 ; 1
XX
NN   [15]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [16]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   2 ; 1
XX
NN   [17]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   34 ; 1
//

File: HEM.2.F.#.2hhb...sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 2hhb; DOM .; LIG HEM;
XX
DE    PROTOPORPHYRIN IX CONTAINING FE
XX
IS   SN 2; NS 4
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   17
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   30 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   9 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 1
XX
GA   20 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX


  [Part of this file has been deleted for brevity]

XX
GA   0 ; 1
XX
NN   [12]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   3 ; 1
XX
NN   [13]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   V ; 1
XX
GA   1 ; 1
XX
NN   [14]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   N ; 1
XX
GA   3 ; 1
XX
NN   [15]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [16]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   2 ; 1
XX
NN   [17]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   34 ; 1
//

File: HEM.3.F.#.2hhb...sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 2hhb; DOM .; LIG HEM;
XX
DE    PROTOPORPHYRIN IX CONTAINING FE
XX
IS   SN 3; NS 4
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   19
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   M ; 1
XX
GA   31 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   T ; 1
XX
GA   6 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   Y ; 1
XX
GA   2 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX


  [Part of this file has been deleted for brevity]

XX
GA   0 ; 1
XX
NN   [14]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   3 ; 1
XX
NN   [15]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   V ; 1
XX
GA   1 ; 1
XX
NN   [16]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   N ; 1
XX
GA   3 ; 1
XX
NN   [17]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [18]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   2 ; 1
XX
NN   [19]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   34 ; 1
//

File: HEM.4.F.#.2hhb...sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 2hhb; DOM .; LIG HEM;
XX
DE    PROTOPORPHYRIN IX CONTAINING FE
XX
IS   SN 4; NS 4
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   15
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   30 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   9 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 1
XX
GA   20 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX


  [Part of this file has been deleted for brevity]

XX
GA   0 ; 1
XX
NN   [10]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   3 ; 1
XX
NN   [11]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   V ; 1
XX
GA   1 ; 1
XX
NN   [12]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   N ; 1
XX
GA   3 ; 1
XX
NN   [13]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [14]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   2 ; 1
XX
NN   [15]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   34 ; 1
//

File: MG.1.F.#.1cs4.d1cs4a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG MG;
XX
DE    MAGNESIUM ION
XX
IS   SN 1; NS 1
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   4
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   1 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   I ; 1
XX
GA   0 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   2 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   39 ; 1
//

File: MN.1.F.#.1ii7.d1ii7a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1ii7; DOM d1ii7a_; LIG MN;
XX
DE    MANGANESE (II) ION
XX
IS   SN 1; NS 1
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   3
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   7 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 1
XX
GA   1 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   38 ; 1
//

File: PO4.1.F.#.2hhb...sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 2hhb; DOM .; LIG PO4;
XX
DE    PHOSPHATE ION
XX
IS   SN 1; NS 1
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   2
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   V ; 1
XX
GA   0 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   79 ; 1
//

File: POP.1.F.#.1cs4.d1cs4a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG POP;
XX
DE    PYROPHOSPHATE 2-
XX
IS   SN 1; NS 1
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   6
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   1 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   I ; 1
XX
GA   0 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   E ; 1
XX
GA   0 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   G ; 1
XX
GA   0 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [6]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   T ; 1
XX
GA   0 ; 1
//

File: siggenlig.log





5.0 DATA FILES

SIGGENLIG does not use any data files.


6.0 USAGE

Generate ligand-binding signatures from a CON file
Version: EMBOSS:6.6.0.0

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-confile]           infile     This option specifies the name of the CON
                                  file (contact file) (input). A 'contact
                                  file' contains contact data for a protein or
                                  a domain from SCOP or CATH, in the CON
                                  format (EMBL-like). The contacts may be
                                  intra-chain residue-residue, inter-chain
                                  residue-residue or residue-ligand. The files
                                  are generated by using CONTACTS, INTERFACE
                                  and SITES.
   -ccfpdir            directory  This option specifies the location of
                                  protein CCF file (clean coordinate files)
                                  (input). A 'clean cordinate file' contains
                                  protein coordinate and derived data for a
                                  single PDB file ('protein clean coordinate
                                  file') or a single domain from SCOP or CATH
                                  ('domain clean coordinate file'), in CCF
                                  format (EMBL-like). The files, generated by
                                  using PDBPARSE (PDB files) or DOMAINER
                                  (domains), contain 'cleaned-up' data that is
                                  self-consistent and error-corrected.
                                  Records for residue solvent accessibility
                                  and secondary structure are added to the
                                  file by using PDBPLUS. default: ./
   -ccfddir            directory  This option specifies the location of
                                  protein CCF file (clean coordinate files)
                                  (input). A 'clean cordinate file' contains
                                  protein coordinate and derived data for a
                                  single PDB file ('protein clean coordinate
                                  file') or a single domain from SCOP or CATH
                                  ('domain clean coordinate file'), in CCF
                                  format (EMBL-like). The files, generated by
                                  using PDBPARSE (PDB files) or DOMAINER
                                  (domains), contain 'cleaned-up' data that is
                                  self-consistent and error-corrected.
                                  Records for residue solvent accessibility
                                  and secondary structure are added to the
                                  file by using PDBPLUS. default: ./
   -mode               menu       [1] This option specifies the mode of
                                  signature generation. In 'Full-length
                                  signature mode' (mode 1) a single signature
                                  incorporating all residue positions that
                                  contact the ligand plus intervening gaps is
                                  generated. In 'Patch signature mode' (mode
                                  2) one or more signatures corresponding to
                                  'patches' of residue positions are
                                  generated. A patch is a set of residues that
                                  are near-neighbours in sequence and is
                                  described by two user-defined parameters:
                                  minimum patch size and maximum gap distance.
                                  (Values: 1 (Full-length signature mode); 2
                                  (Patch signature mode))
   -type               menu       [1] This option specifies the type of
                                  signature generated. In '1D (sequence)
                                  signature' sequence-based signatures are
                                  generated. In '3D (structural) signature'
                                  structure-based signatures are generated.
                                  (Values: 1 (1D (sequence) signature); 2 (3D
                                  (structural) signature))
*  -patchsize          integer    [5] This option specifies the minimum patch
                                  size. This is the minimum number of contact
                                  positions that must be incorporated in a
                                  signature. (Integer 3 or more)
*  -gapdistance        integer    [2] This option specifies the maximum gap
                                  distance. This is the maximum allowable gap
                                  (residues) between two residue in a patch.
                                  If two contact residues are further than
                                  this distance apart in sequence, they would
                                  not belong to the same patch. (Integer 0 or
                                  more)
*  -environment        menu       [1] This option specifies the environment
                                  definition. See matgen3d documentation for
                                  description of definitions. (Values: 1
                                  (Env1); 2 (Env2); 3 (Env3); 4 (Env4); 5
                                  (Env5); 6 (Env6); 7 (Env7); 8 (Env8); 9
                                  (Env9); 10 (Env10); 11 (Env11); 12 (Env12);
                                  13 (Env13); 14 (Env14); 15 (Env15); 16
                                  (Env16))
  [-sigoutdir]         outdir     [./] This option specifies the location of
                                  signature files (output). A 'signature file'
                                  contains a sparse sequence signature
                                  suitable for use with the SIGSCAN and
                                  SIGSCANLIG programs. The files are generated
                                  by using SIGGEN and SIGGENLIG.
  [-logfile]           outfile    [siggenlig.log] Domainatrix log output file

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-ccfpdir" associated qualifiers
   -extension          string     Default file extension

   "-ccfddir" associated qualifiers
   -extension          string     Default file extension

   "-sigoutdir" associated qualifiers
   -extension2         string     Default file extension

   "-logfile" associated qualifiers
   -odirectory3        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit

6.1 COMMAND-LINE ARGUMENTS

Qualifier | Type | Description | Allowed values | Default

Standard (Mandatory) qualifiers
[-confile] (Parameter 1) | infile | This option specifies the name of the CON file (contact file) (input). A 'contact file' contains contact data for a protein or a domain from SCOP or CATH, in the CON format (EMBL-like). The contacts may be intra-chain residue-residue, inter-chain residue-residue or residue-ligand. The files are generated by using CONTACTS, INTERFACE and SITES. | Input file | Required
-ccfpdir | directory | This option specifies the location of the protein CCF files (clean coordinate files) (input). A 'clean coordinate file' contains protein coordinate and derived data for a single PDB file ('protein clean coordinate file') or a single domain from SCOP or CATH ('domain clean coordinate file'), in CCF format (EMBL-like). The files, generated by using PDBPARSE (PDB files) or DOMAINER (domains), contain 'cleaned-up' data that is self-consistent and error-corrected. Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS. | Directory | ./
-ccfddir | directory | This option specifies the location of the domain CCF files (clean coordinate files) (input). A 'clean coordinate file' contains protein coordinate and derived data for a single PDB file ('protein clean coordinate file') or a single domain from SCOP or CATH ('domain clean coordinate file'), in CCF format (EMBL-like). The files, generated by using PDBPARSE (PDB files) or DOMAINER (domains), contain 'cleaned-up' data that is self-consistent and error-corrected. Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS. | Directory | ./
-mode | list | This option specifies the mode of signature generation. In 'Full-length signature mode' (mode 1) a single signature incorporating all residue positions that contact the ligand plus intervening gaps is generated. In 'Patch signature mode' (mode 2) one or more signatures corresponding to 'patches' of residue positions are generated. A patch is a set of residues that are near-neighbours in sequence and is described by two user-defined parameters: minimum patch size and maximum gap distance. | 1 (Full-length signature mode); 2 (Patch signature mode) | 1
-type | list | This option specifies the type of signature generated. In '1D (sequence) signature' sequence-based signatures are generated. In '3D (structural) signature' structure-based signatures are generated. | 1 (1D (sequence) signature); 2 (3D (structural) signature) | 1
-patchsize | integer | This option specifies the minimum patch size. This is the minimum number of contact positions that must be incorporated in a signature. | Integer 3 or more | 5
-gapdistance | integer | This option specifies the maximum gap distance. This is the maximum allowable gap (residues) between two residues in a patch. If two contact residues are further than this distance apart in sequence, they do not belong to the same patch. | Integer 0 or more | 2
-environment | list | This option specifies the environment definition. See the MATGEN3D documentation for a description of the definitions. | 1 (Env1); 2 (Env2); 3 (Env3); 4 (Env4); 5 (Env5); 6 (Env6); 7 (Env7); 8 (Env8); 9 (Env9); 10 (Env10); 11 (Env11); 12 (Env12); 13 (Env13); 14 (Env14); 15 (Env15); 16 (Env16) | 1
[-sigoutdir] (Parameter 2) | outdir | This option specifies the location of signature files (output). A 'signature file' contains a sparse sequence signature suitable for use with the SIGSCAN and SIGSCANLIG programs. The files are generated by using SIGGEN and SIGGENLIG. | Output directory | ./
[-logfile] (Parameter 3) | outfile | Domainatrix log output file | Output file | siggenlig.log

Additional (Optional) qualifiers
(none)

Advanced (Unprompted) qualifiers
(none)

Associated qualifiers
"-ccfpdir" associated directory qualifiers
-extension | string | Default file extension | Any string | ccf
"-ccfddir" associated directory qualifiers
-extension | string | Default file extension | Any string | ccf
"-sigoutdir" associated outdir qualifiers
-extension2, -extension_sigoutdir | string | Default file extension | Any string | sig
"-logfile" associated outfile qualifiers
-odirectory3, -odirectory_logfile | string | Output directory | Any string |

General qualifiers
-auto | boolean | Turn off prompts | Boolean value Yes/No | N
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N
-warning | boolean | Report warnings | Boolean value Yes/No | Y
-error | boolean | Report errors | Boolean value Yes/No | Y
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y
-die | boolean | Report dying program messages | Boolean value Yes/No | Y
-version | boolean | Report version number and exit | Boolean value Yes/No | N

6.2 EXAMPLE SESSION

An example of interactive use of SIGGENLIG is shown below.


% siggenlig 
Generate ligand-binding signatures from a CON file
Structure contacts file: ../sites-signature/SITES.con
Clean protein structure coordinates directory (optional) [.]: ../pdbplus-signature
Clean domain coordinates directory (optional) [.]: ../domainer-signature
Available modes
         1 : Full-length signature mode
         2 : Patch signature mode
Select mode of operation. [1]: 1
Available types
         1 : 1D (sequence) signature
         2 : 3D (structural) signature
Select type of signature. [1]: 1
Domainatrix signature file output directory [./]: 
Domainatrix log output file [siggenlig.log]: 
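
The same run could presumably be started without prompts by passing every value on the command line together with -auto, which the qualifier list above describes as turning off prompts. The Python sketch below only assembles such a command line from the qualifiers documented in section 6.0; the helper name siggenlig_command is hypothetical, and the command has not been verified against an EMBOSS installation.

```python
import shlex


def siggenlig_command(confile, ccfpdir, ccfddir, sigoutdir="./",
                      logfile="siggenlig.log", mode=1, sigtype=1):
    """Build an argument list for a non-interactive siggenlig run.

    Only qualifiers listed in section 6.0 are used; the defaults mirror
    the values chosen in the interactive session above.
    """
    return [
        "siggenlig",
        "-confile", confile,
        "-ccfpdir", ccfpdir,
        "-ccfddir", ccfddir,
        "-mode", str(mode),        # 1 = full-length, 2 = patch
        "-type", str(sigtype),     # 1 = 1D (sequence), 2 = 3D (structural)
        "-sigoutdir", sigoutdir,
        "-logfile", logfile,
        "-auto",                   # turn off prompts
    ]


cmd = siggenlig_command("../sites-signature/SITES.con",
                        "../pdbplus-signature",
                        "../domainer-signature")
print(shlex.join(cmd))
```

On a machine with EMBOSS installed, the list could be handed to subprocess.run(cmd, check=True); the sketch stops at printing the command so that it runs anywhere.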





7.0 KNOWN BUGS & WARNINGS




8.0 NOTES

8.1 GLOSSARY OF FILE TYPES

FILE TYPE | FORMAT | DESCRIPTION | CREATED BY | SEE ALSO
Clean coordinate file (for protein) | CCF format (EMBL-like). | Protein coordinate and derived data for a single PDB file. The data are 'cleaned-up': self-consistent and error-corrected. | PDBPARSE | Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS.
Clean coordinate file (for domain) | CCF format (EMBL-like). | Protein coordinate and derived data for a single domain from SCOP or CATH. The data are 'cleaned-up': self-consistent and error-corrected. | DOMAINER | Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS.
Contact file (residue-ligand contacts) | CON format (EMBL-like). | Residue-ligand contact data for a protein or a domain from SCOP or CATH. | SITES | N.A.
Signature file | SIG format | Contains a sparse sequence signature suitable for use with the SIGSCAN program. | SIGGEN, LIBGEN | The files are also generated by using SIGGENLIG.



9.0 DESCRIPTION

A protein or a single domain of a protein may contain one or more ligand-binding sites. SIGGENLIG provides an automated means to generate ligand-binding signatures from a CON file (contacts file) of residue-ligand contacts. The signatures generated may be of two types, 1D or 3D. 1D signatures represent protein sites as residue identities, whereas 3D signatures represent sites as residue environments in space.


10.0 ALGORITHM

The user specifies whether 1D (sequence) or 3D (structural) signatures are generated and whether they are 'full-length' (signature corresponds to entire ligand-binding site) or 'patch' (signature corresponds to part of ligand-binding site). For 3D signatures, the environment definition is specified and for patch signatures, a 'Minimum patch size' and 'Maximum gap distance' are specified. A 'Window size' is specified for all signatures.

Definition of full-length and patch signatures
For each ligand-binding site represented in the CON file (input), one or more signatures are generated as follows: (1) a single 'full-length' signature incorporating all residue positions that contact the ligand plus intervening gaps, or (2) one or more signatures corresponding to 'patches' of residue positions.
A patch is a set of residues that are near-neighbours in sequence and is described by two user-defined parameters as follows. (1) Minimum patch size: the minimum number of contact positions that must be incorporated for a patch to be defined. (2) Maximum gap distance: the maximum allowable gap (residues) between two residues in a patch. If two contact residues are further than this distance apart in sequence, they do not belong to the same patch.
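
The patch definition can be sketched in a few lines of Python. This is an illustration of the two parameters, not the SIGGENLIG implementation: the function name find_patches is hypothetical, and the gap is assumed to be the number of residues lying between two consecutive contact positions (the same quantity written in the GA records of the example signature files).

```python
def find_patches(positions, min_patch_size=5, max_gap=2):
    """Group contact positions into patches.

    Consecutive contact positions stay in the same patch while the number
    of residues between them does not exceed max_gap. Groups with fewer
    than min_patch_size contact positions are discarded.
    """
    patches = []
    current = []
    for position in sorted(positions):
        if current and position - current[-1] - 1 > max_gap:
            if len(current) >= min_patch_size:
                patches.append(current)
            current = []
        current.append(position)
    if len(current) >= min_patch_size:
        patches.append(current)
    return patches


# Contact positions of the POP site of 1cs4 in the example CON file.
print(find_patches([2, 3, 4, 5, 6, 7]))
# [[2, 3, 4, 5, 6, 7]]

# Contact positions of the ligand 101 site of 1cs4, with a smaller minimum.
print(find_patches([2, 6, 7, 44, 45, 46], min_patch_size=3))
# [[44, 45, 46]]
```

The defaults of 5 and 2 are those given for -patchsize and -gapdistance. With the default minimum patch size, the second site would yield no patch at all, because no run of near-neighbour contacts reaches five positions.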

Environment definitions.
See the MATGEN3D documentation for the environment definitions.

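Naming of output files. The naming convention of the signature (output) files is as follows:
Ligand identifier.Site number.F or P.Patch number-Total patches.PDB identifier.Domain identifier.Extension
Here F denotes a full-length signature and P a patch signature. In the full-length output files of the usage example, the patch field is written as '#' and a missing domain identifier as '.'. For example,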
101.1.P.1-1.1cs4.d1cs4a_.sig
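
The naming convention can be expressed as a small Python helper. This is a sketch inferred from the file names shown in this document; the function signature_filename and its parameters are hypothetical, and the use of '#' for full-length signatures and '.' for a missing domain identifier is taken from the example output files rather than from the SIGGENLIG source.

```python
def signature_filename(ligand, site, pdb, domain=".", patch=None,
                       total_patches=None, extension="sig"):
    """Compose a signature file name from its dot-separated fields.

    A full-length signature (patch is None) uses mode 'F' and the
    placeholder '#'; a patch signature uses mode 'P' and the field
    'patch number-total patches'.
    """
    if patch is None:
        mode, patch_field = "F", "#"
    else:
        mode, patch_field = "P", f"{patch}-{total_patches}"
    return f"{ligand}.{site}.{mode}.{patch_field}.{pdb}.{domain}.{extension}"


print(signature_filename("101", 1, "1cs4", "d1cs4a_"))
# 101.1.F.#.1cs4.d1cs4a_.sig
print(signature_filename("HEM", 1, "2hhb"))
# HEM.1.F.#.2hhb...sig
print(signature_filename("101", 1, "1cs4", "d1cs4a_", patch=1, total_patches=1))
# 101.1.P.1-1.1cs4.d1cs4a_.sig
```

The three printed names correspond to two of the output files listed in section 4.0 and to the patch example given above.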


11.0 RELATED APPLICATIONS

See also

Program name   Description
echlorop       Report presence of chloroplast transit peptides
elipop         Predict lipoproteins
esignalp       Report protein signal cleavage sites
etmhmm         Reports transmembrane helices
profit         Scan one or more sequences with a simple frequency matrix
prophecy       Create frequency matrix or profile from a multiple alignment
prophet        Scan one or more sequences with a Gribskov or Henikoff profile
seqsearch      Generate PSI-BLAST hits (DHF file) from a DAF file
sigcleave      Report on signal cleavage sites in a protein sequence
siggen         Generate a sparse protein signature from an alignment
sigscan        Generate hits (DHF file) from a signature search
sigscanlig     Search ligand-signature library and writes hits (LHF file)
tmap           Predict and plot transmembrane segments in protein sequences
topo           Draw an image of a transmembrane protein



12.0 DIAGNOSTIC ERROR MESSAGES

None.


13.0 AUTHORS

Jon Ison (jison © hgmp.mrc.ac.uk)
HGMP-RC, Genome Campus, Hinxton, Cambridge CB10 1SB, UK

Waqas Awan
Jon Ison (jison@ebi.ac.uk)
The European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK


14.0 REFERENCES

Please cite the authors and EMBOSS.

Rice P, Longden I and Bleasby A (2000) "EMBOSS - The European Molecular Biology Open Software Suite" Trends in Genetics, 16(6):276-277.

See also http://emboss.sourceforge.net/

14.1 Other useful references

History

Comments

None