SITES documentation
ID   D1CS4A_
XX
EN   1CS4
XX
TY   SCOP
XX
SI   53931 CL; 54861 FO; 55073 SF; 55074 FA; 55077 DO; 55078 SO; 39418 DD;
XX
CL   Alpha and beta proteins (a+b)
XX
FO   Ferredoxin-like
XX
SF   Adenylyl and guanylyl cyclase catalytic domain
XX
FA   Adenylyl and guanylyl cyclase catalytic domain
XX
DO   Adenylyl cyclase VC1, domain C1a
XX
OS   Dog (Canis familiaris)
XX
NC   1
XX
CN   [1]
XX
CH   A CHAIN; . START; . END;
//
ID   D1II7A_
XX
EN   1II7
XX
TY   SCOP
XX
SI   53931 CL; 56299 FO; 56300 SF; 64427 FA; 64428 DO; 64429 SO; 62415 DD;
XX
CL   Alpha and beta proteins (a+b)
XX
FO   Metallo-dependent phosphatases
XX
SF   Metallo-dependent phosphatases
XX
FA   DNA double-strand break repair nuclease
XX
DO   Mre11
XX
OS   Archaeon Pyrococcus furiosus
XX
NC   1
XX
CN   [1]
XX
CH   A CHAIN; . START; . END;
//
XX   Residue-ligand contact data (for domains).
XX
TY   LIGAND
XX
EX   THRESH 1.0; IGNORE .; NMOD .; NCHA .;
XX
NE   11
XX
EN   [1]
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG 101;
XX
DE   2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
SI   SN 1; NS 2
XX
CN   MO .; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 52; NRES2 .
XX
S1   SEQUENCE 52 AA; 5817 MW; D8CCAE0E1FC0849A CRC64;
     ADIEGFTSLA SQCTAQELVM TLNELFARFD KLAAENHCLR IKILGDCYYC VS
XX
NC   SM .; LI 6
XX
LI   ASP 2
LI   PHE 6
LI   THR 7
LI   LEU 44
LI   GLY 45
LI   ASP 46
XX
//
EN   [2]
XX
ID   PDB 1ii7; DOM d1ii7a_; LIG 101;
XX
DE   2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
SI   SN 2; NS 2
XX
CN   MO .; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 65; NRES2 .
XX
S1   SEQUENCE 65 AA; 7395 MW; 75FBE75B22FD3678 CRC64;
     MKFAHLADIH LGYEQFHKPQ REEEFAEAFK NALEIAVQEN VDFILIAGDL FHSSRPSPGT
     LKKAI
XX
NC   SM .; LI 2
XX
LI   HIS 10
LI   ASP 49
XX

[Part of this file has been deleted for brevity]

NC   SM .; LI 3
XX
LI   ASP 8
LI   HIS 10
LI   ASP 49
XX
//
EN   [10]
XX
ID   PDB 2hhb; DOM .; LIG PO4;
XX
DE   PHOSPHATE ION
XX
SI   SN 1; NS 1
XX
CN   MO .; CN1 1; CN2 .; ID1 D; ID2 .; NRES1 146; NRES2 .
XX
S1   SEQUENCE 146 AA; 15867 MW; EACBC707CFD466A1 CRC64;
     VHLTPEEKSA VTALWGKVNV DEVGGEALGR LLVVYPWTQR FFESFGDLST PDAVMGNPKV
     KAHGKKVLGA FSDGLAHLDN LKGTFATLSE LHCDKLHVDP ENFRLLGNVL VCVLAHHFGK
     EFTPPVQAAY QKVVAGVANA LAHKYH
XX
NC   SM .; LI 2
XX
LI   VAL 1
LI   LEU 81
XX
//
EN   [11]
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG POP;
XX
DE   PYROPHOSPHATE 2-
XX
SI   SN 1; NS 1
XX
CN   MO .; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 52; NRES2 .
XX
S1   SEQUENCE 52 AA; 5817 MW; D8CCAE0E1FC0849A CRC64;
     ADIEGFTSLA SQCTAQELVM TLNELFARFD KLAAENHCLR IKILGDCYYC VS
XX
NC   SM .; LI 6
XX
LI   ASP 2
LI   ILE 3
LI   GLU 4
LI   GLY 5
LI   PHE 6
LI   THR 7
XX
//
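The residue-ligand records in the CON listing above can be read with a few lines of code. The following is a minimal Python sketch, not part of EMBOSS. It assumes the file stores one record per line with the two-letter code first, followed by whitespace; the exact column layout is an assumption, since it is not specified here. `parse_con`, `SAMPLE` and the dictionary keys are hypothetical names chosen for illustration.

```python
# Sample entry copied from the listing above, laid out one record per line
# (the line layout is an assumption).
SAMPLE = """\
EN   [1]
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG 101;
XX
DE   2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
NC   SM .; LI 6
XX
LI   ASP 2
LI   PHE 6
LI   THR 7
LI   LEU 44
LI   GLY 45
LI   ASP 46
XX
//
"""


def parse_con(text):
    """Collect the ID, DE and LI records of each entry in a CON file."""
    entries = []
    current = None
    for line in text.splitlines():
        # The record code is the first whitespace-delimited token.
        code, _, rest = line.strip().partition(" ")
        rest = rest.strip()
        if code == "EN":
            # "EN [1]" opens a new entry.
            current = {"entry": int(rest.strip("[]")), "contacts": []}
        elif current is None:
            continue  # file header records before the first entry
        elif code == "ID":
            # "PDB 1cs4; DOM d1cs4a_; LIG 101;" -> key/value pairs
            fields = dict(item.split() for item in rest.split(";") if item.strip())
            current.update(pdb=fields["PDB"], domain=fields["DOM"], ligand=fields["LIG"])
        elif code == "DE":
            current["description"] = rest
        elif code == "LI":
            # "LI ASP 2" -> residue name and residue number
            name, number = rest.split()
            current["contacts"].append((name, int(number)))
        elif code == "//":
            # "//" closes the entry.
            entries.append(current)
            current = None
    return entries


entries = parse_con(SAMPLE)
for entry in entries:
    print(entry["pdb"], entry["ligand"], entry["contacts"])
```

Only the `LI` lines that start a record are treated as contacts; the `LI 6` count inside the `NC` record is ignored because that line's code is `NC`.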
CCF: /homes/user/test/qa/pdbplus-keep/1cs4.ccf HETS:YES NHETS:7 SCOP:YES NDOMS: 1
CCF: /homes/user/test/qa/pdbplus-keep/1ii7.ccf HETS:YES NHETS:5 SCOP:YES NDOMS: 1
CCF: /homes/user/test/qa/pdbplus-keep/2hhb.ccf HETS:YES NHETS:5 SCOP:NO NCHN:4
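The log records above are `KEY:value` pairs, so they can be summarised with a short script. This is a minimal Python sketch under stated assumptions: the field names (`CCF`, `HETS`, `NHETS`, `SCOP`, `NDOMS`, `NCHN`) are taken from the log excerpt, their meanings are not defined here, and values are therefore kept as plain strings. `parse_log` and `SAMPLE_LOG` are hypothetical names.

```python
import re

# Log text copied from the excerpt above.
SAMPLE_LOG = """\
CCF: /homes/user/test/qa/pdbplus-keep/1cs4.ccf HETS:YES NHETS:7 SCOP:YES NDOMS: 1
CCF: /homes/user/test/qa/pdbplus-keep/1ii7.ccf HETS:YES NHETS:5 SCOP:YES NDOMS: 1
CCF: /homes/user/test/qa/pdbplus-keep/2hhb.ccf HETS:YES NHETS:5 SCOP:NO NCHN:4
"""

# A key, a colon, optional whitespace (as in "NDOMS: 1"), then a value.
FIELD = re.compile(r"(\w+):\s*(\S+)")


def parse_log(text):
    """Group KEY:value pairs into one dictionary per CCF file."""
    records = []
    for key, value in FIELD.findall(text):
        if key == "CCF":
            records.append({"CCF": value})  # each CCF field starts a new record
        elif records:
            records[-1][key] = value
    return records


records = parse_log(SAMPLE_LOG)
for record in records:
    print(record)
```

Because a new record starts at every `CCF` field rather than at every newline, the same function also works when the log has been flattened onto a single line.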
Standard (Mandatory) qualifiers:
[-protpath] dirlist [./] This option specifies the location of the protein CCF files (clean coordinate files) (input). A 'clean coordinate file' contains protein coordinate and derived data for a single PDB file ('protein clean coordinate file') or a single domain from SCOP or CATH ('domain clean coordinate file'), in CCF format (EMBL-like). The files, generated by using PDBPARSE (PDB files) or DOMAINER (domains), contain 'cleaned-up' data that is self-consistent and error-corrected. Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS.
[-domaindir] directory [./] This option specifies the location of the domain CCF files (clean coordinate files) (input). A 'clean coordinate file' contains protein coordinate and derived data for a single PDB file ('protein clean coordinate file') or a single domain from SCOP or CATH ('domain clean coordinate file'), in CCF format (EMBL-like). The files, generated by using PDBPARSE (PDB files) or DOMAINER (domains), contain 'cleaned-up' data that is self-consistent and error-corrected. Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS.
[-dcffile] infile This option specifies the name of the DCF file (domain classification file) (input). A 'domain classification file' contains classification and other data for domains from SCOP or CATH, in DCF format (EMBL-like). The files are generated by using SCOPPARSE and CATHPARSE. Domain sequence information can be added to the file by using DOMAINSEQS.
-threshold float [1.0] This option specifies the threshold contact distance. (Any numeric value)
[-outfile] outfile [SITES.con] This option specifies the name of the output file.
-logfile outfile [sites.log] This option specifies the name of the log file.

Additional (Optional) qualifiers: (none)

Advanced (Unprompted) qualifiers:
-dicfile datafile [Ehet.dat] This option specifies the dictionary of heterogen groups in PDB. This file is generated by using HETPARSE and is part of the EMBOSS distribution.
-vdwfile datafile [Evdw.dat] This option specifies the name of the data file with van der Waals radii for atoms in amino acid residues. This file is part of the EMBOSS distribution.

Associated qualifiers:
"-outfile" associated qualifiers
-odirectory4 string Output directory
"-logfile" associated qualifiers
-odirectory string Output directory

General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write first file to standard output
-filter boolean Read first file from standard input, write first file to standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options. More information on associated and general qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
Standard (Mandatory) qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-protpath] (Parameter 1) | This option specifies the location of the protein CCF files (clean coordinate files) (input). A 'clean coordinate file' contains protein coordinate and derived data for a single PDB file ('protein clean coordinate file') or a single domain from SCOP or CATH ('domain clean coordinate file'), in CCF format (EMBL-like). The files, generated by using PDBPARSE (PDB files) or DOMAINER (domains), contain 'cleaned-up' data that is self-consistent and error-corrected. Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS. | Directory with files | ./ |
[-domaindir] (Parameter 2) | This option specifies the location of the domain CCF files (clean coordinate files) (input). A 'clean coordinate file' contains protein coordinate and derived data for a single PDB file ('protein clean coordinate file') or a single domain from SCOP or CATH ('domain clean coordinate file'), in CCF format (EMBL-like). The files, generated by using PDBPARSE (PDB files) or DOMAINER (domains), contain 'cleaned-up' data that is self-consistent and error-corrected. Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS. | Directory | ./ |
[-dcffile] (Parameter 3) | This option specifies the name of the DCF file (domain classification file) (input). A 'domain classification file' contains classification and other data for domains from SCOP or CATH, in DCF format (EMBL-like). The files are generated by using SCOPPARSE and CATHPARSE. Domain sequence information can be added to the file by using DOMAINSEQS. | Input file | Required |
-threshold | This option specifies the threshold contact distance. | Any numeric value | 1.0 |
[-outfile] (Parameter 4) | This option specifies the name of the output file. | Output file | SITES.con |
-logfile | This option specifies the name of the log file. | Output file | sites.log |
Additional (Optional) qualifiers | Description | Allowed values | Default |
(none) | | | |
Advanced (Unprompted) qualifiers | Description | Allowed values | Default |
-dicfile | This option specifies the dictionary of heterogen groups in PDB. This file is generated by using HETPARSE and is part of the EMBOSS distribution. | Data file | Ehet.dat |
-vdwfile | This option specifies the name of the data file with van der Waals radii for atoms in amino acid residues. This file is part of the EMBOSS distribution. | Data file | Evdw.dat |
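The -threshold and -vdwfile qualifiers suggest how a contact might be decided. The Python sketch below illustrates one plausible reading and is not the program's confirmed algorithm: it assumes two atoms are in contact when the gap between their van der Waals surfaces (centre-to-centre distance minus both radii) is no larger than the threshold, and that distances are in angstroms. The radii in `VDW` are hypothetical illustrative values keyed by element, not the contents of Evdw.dat, and `in_contact` is a hypothetical helper name.

```python
import math

# Hypothetical van der Waals radii (assumed to be in angstroms), for illustration only.
# The real values are read from the data file given by -vdwfile.
VDW = {"C": 1.7, "N": 1.55, "O": 1.52}


def in_contact(atom_a, atom_b, threshold=1.0):
    """Return True if the surface-to-surface gap is within the threshold.

    Each atom is a tuple (element, (x, y, z)).
    """
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    # Gap between the two van der Waals spheres; negative if they overlap.
    gap = math.dist(xyz_a, xyz_b) - VDW[elem_a] - VDW[elem_b]
    return gap <= threshold


residue_atom = ("O", (0.0, 0.0, 0.0))
near_ligand_atom = ("N", (4.0, 0.0, 0.0))  # gap of about 0.93
far_ligand_atom = ("N", (4.2, 0.0, 0.0))   # gap of about 1.13

print(in_contact(residue_atom, near_ligand_atom))
print(in_contact(residue_atom, far_ligand_atom))
```

Under this reading, a residue would be reported for a ligand if any of its atoms passes the test against any ligand atom, which is why a larger -threshold value can only add contacts.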
% sites
Generate residue-ligand CON files from CCF files.
Clean protein structure coordinates directories [./]: ../pdbplus-keep
Clean domain coordinates directory [./]: ../domainer-keep
Domain classification file: ../scopparse-keep/all.scop
Threshold contact distance [1.0]: 1
Structure contacts output file [SITES.con]:
Domainatrix log output file [sites.log]:
Entries in HetDic 4306
Entries in Dbase 4306
CCF FILE: /homes/user/test/qa/pdbplus-keep/1cs4.ccf (1/3)
CCF FILE: /homes/user/test/qa/pdbplus-keep/1ii7.ccf (2/3)
CCF FILE: /homes/user/test/qa/pdbplus-keep/2hhb.ccf (3/3)
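The interactive session above can also be driven from a script by passing every value as a qualifier and adding -auto to turn off the prompts. The following Python sketch only assembles the argument list from the qualifiers documented on this page; `sites_command` is a hypothetical helper, and running the command is left to the caller because it needs an EMBOSS installation with `sites` on the PATH.

```python
import shlex


def sites_command(protpath, domaindir, dcffile, threshold=1.0,
                  outfile="SITES.con", logfile="sites.log"):
    """Build a non-interactive `sites` command line as a list of arguments."""
    return [
        "sites",
        "-protpath", protpath,      # directory of protein CCF files
        "-domaindir", domaindir,    # directory of domain CCF files
        "-dcffile", dcffile,        # domain classification file
        "-threshold", str(threshold),
        "-outfile", outfile,
        "-logfile", logfile,
        "-auto",                    # turn off prompts
    ]


cmd = sites_command("../pdbplus-keep", "../domainer-keep",
                    "../scopparse-keep/all.scop")
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it. Keeping the arguments as a list avoids shell quoting problems when a path contains spaces.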
FILE TYPE | FORMAT | DESCRIPTION | CREATED BY | SEE ALSO |
---|---|---|---|---|
Clean coordinate file (for protein) | CCF format (EMBL-like). | Protein coordinate and derived data for a single PDB file. The data are 'cleaned-up': self-consistent and error-corrected. | PDBPARSE | Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS. |
Clean coordinate file (for domain) | CCF format (EMBL-like). | Protein coordinate and derived data for a single domain from SCOP or CATH. The data are 'cleaned-up': self-consistent and error-corrected. | DOMAINER | Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS. |
Contact file (intra-chain residue-residue contacts) | CON format (EMBL-like). | Intra-chain residue-residue contact data for a protein or a domain from SCOP or CATH. | CONTACTS | N.A. |
Contact file (inter-chain residue-residue contacts) | CON format (EMBL-like). | Inter-chain residue-residue contact data for a protein or a domain from SCOP or CATH. | INTERFACE | N.A. |
Contact file (residue-ligand contacts) | CON format (EMBL-like). | Residue-ligand contact data for a protein or a domain from SCOP or CATH. | SITES | N.A. |
van der Waals radii | | A file of van der Waals radii for atoms in amino acid residues. Part of the EMBOSS distribution. | N.A. | N.A. |
Dictionary of heterogen groups | | A file of the dictionary of heterogen groups in PDB. | HETPARSE | N.A. |
Program name | Description |
---|---|
aaindexextract | Extract amino acid property data from AAINDEX |
allversusall | Sequence similarity data from all-versus-all comparison |
cathparse | Generates DCF file from raw CATH files |
cutgextract | Extract codon usage tables from CUTG database |
domainer | Generates domain CCF files from protein CCF files |
domainnr | Removes redundant domains from a DCF file |
domainseqs | Adds sequence records to a DCF file |
domainsse | Add secondary structure records to a DCF file |
hetparse | Converts heterogen group dictionary to EMBL-like format |
jaspextract | Extract data from JASPAR |
pdbparse | Parses PDB files and writes protein CCF files |
pdbplus | Add accessibility & secondary structure to a CCF file |
pdbtosp | Convert swissprot:PDB codes file to EMBL-like format |
printsextract | Extract data from PRINTS database for use by pscan |
prosextract | Processes the PROSITE motif database for use by patmatmotifs |
rebaseextract | Process the REBASE database for use by restriction enzyme applications |
scopparse | Generate DCF file from raw SCOP files |
seqnr | Removes redundancy from DHF files |
ssematch | Search a DCF file for secondary structure matches |
tfextract | Process TRANSFAC transcription factor database for use by tfscan |
Excerpt of log file:
CCF: 000_testdata_new/sites/in/1cs4.ccf HETS:YES NHETS:7 SCOP:YES NDOMS: 1
CCF: 000_testdata_new/sites/in/1ii7.ccf HETS:YES NHETS:5 SCOP:YES NDOMS: 1
See also http://emboss.sourceforge.net/