sigscan documentation


 

CONTENTS

1.0 SUMMARY
2.0 INPUTS & OUTPUTS
3.0 INPUT FILE FORMAT
4.0 OUTPUT FILE FORMAT
5.0 DATA FILES
6.0 USAGE
7.0 KNOWN BUGS & WARNINGS
8.0 NOTES
9.0 DESCRIPTION
10.0 ALGORITHM
11.0 RELATED APPLICATIONS
12.0 DIAGNOSTIC ERROR MESSAGES
13.0 AUTHORS
14.0 REFERENCES



1.0 SUMMARY

Generates a DHF (domain hits file) of hits (sequences) by scanning a signature against a sequence database.


2.0 INPUTS & OUTPUTS

SIGSCAN reads a signature from a protein signature file, scans the signature against a protein sequence database and generates two output files: a DHF file (domain hits file) of hits to database sequences and a DAF file (domain alignment file) of the corresponding signature-sequence alignments. The names of the signature file, DHF file and DAF file are provided by the user. The user also specifies the maximum number of high-scoring hits to be written.


3.0 INPUT FILE FORMAT

The format of the signature file is described in SIGGEN documentation.

Input files for usage example

File: ../siggen-keep/54894.sig

TY   SCOP
XX
TS   1D
XX
CL   Alpha and beta proteins (a+b)
XX
FO   Ferredoxin-like
XX
SF   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
FA   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
SI   54894
XX
NP   15
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 2
XX
GA   12 ; 2
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   P ; 2
XX
GA   1 ; 2
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   P ; 2
XX
GA   26 ; 2
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   T ; 2
XX
GA   15 ; 2
XX
NN   [5]
XX


  [Part of this file has been deleted for brevity]

XX
GA   4 ; 2
XX
NN   [10]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   I ; 2
XX
GA   2 ; 2
XX
NN   [11]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 2
XX
GA   0 ; 2
XX
NN   [12]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   N ; 2
XX
GA   0 ; 2
XX
NN   [13]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   V ; 2
XX
GA   3 ; 2
XX
NN   [14]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   R ; 2
XX
GA   3 ; 2
XX
NN   [15]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 2
XX
GA   2 ; 2
//
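
The record layout of the signature file above can be read with a few lines of code. The following Python sketch is not part of SIGSCAN. It assumes only what is visible in the example: two-letter record codes, 'XX' spacer lines, a '//' terminator, and one group of NN, IN, AA and GA records per signature position. The function name parse_signature and the dictionary keys are hypothetical. The meaning of the two numbers on the AA and GA lines is defined in SIGGEN documentation, so they are stored as uninterpreted pairs.

```python
def parse_signature(text):
    """Parse signature-file text into (header, positions).

    header    : dict of record code -> value for records before the first NN
    positions : list of dicts, one per NN record, in file order
    """
    header = {}
    positions = []
    current = None  # the signature position being filled in, if any
    for line in text.splitlines():
        code, _, value = line.partition(" ")
        code = code.strip()
        value = value.strip()
        if not code or code == "XX":
            continue  # blank, indented, or spacer line
        if code == "//":
            break  # end of the signature entry
        if code == "NN":
            current = {
                "number": int(value.strip("[]")),
                "info": {},
                "residues": [],
                "gaps": [],
            }
            positions.append(current)
        elif code == "IN" and current is not None:
            # e.g. "NRES 1 ; NGAP 1 ; WSIZ 0"
            for field in value.split(";"):
                name, number = field.split()
                current["info"][name] = int(number)
        elif code == "AA" and current is not None:
            # e.g. "H ; 2": kept as (residue, number) without interpretation
            residue, number = (part.strip() for part in value.split(";"))
            current["residues"].append((residue, int(number)))
        elif code == "GA" and current is not None:
            # e.g. "12 ; 2": kept as a pair of integers without interpretation
            first, second = (int(part) for part in value.split(";"))
            current["gaps"].append((first, second))
        elif current is None:
            header[code] = value  # TY, TS, CL, FO, SF, FA, SI, NP ...
    return header, positions
```

Applied to the complete file, len(positions) can be compared with the value of the NP record as a simple consistency check.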

File: swsmall

> Q9WVI4
DDVTMLFSDIVGFTAICAQCTPMQVISMLNELYTRFDHQCGFLDIYKVETIGDAYCVASG
LHRKSLCHAKPIALMALKMMELSEEVLTPDGRPIQMRIGIHSGSVLAGVVGVRMPRYCLF
GNNVTLASKFESGSHPRRINISPTTYQLL
> Q9ERL9
VTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDQQCGELDVYKVETIGDAYCVAGGLH
RESDTHAVQIALMALKMMELSNEVMSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGN
NVTLANKFESCSVPRKINVSPTTYRLLKDCPG
> Q9DGG6
EQVSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEDTKCEKISTLGDCYYCVAG
CPEPRADHAYCCIEMGLGMIKAIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVW
SNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDGKVTERVGQSAVADQLKGLKTYL
I
> Q99396
KELADPVTLIFTDIESSTAQWATQPELMPDAVATHHSMVRSLIENYDCYEVKTVGDSFMI
ACKSPFAAVQLAQELQLRFLRLDWGTTVFDEFYREFEERHAEEGDGKYKPPTARLDPEVY
RQLWNGLRVRVGIHTGLCDIRYDEVTKGYDYYGQTANTAARTESVGNGGQVLMTCETYHS
LSTAERSQFDVTPLGGVPLRGVSEPVEVYQLN
> Q99280
NDSAPKEPTGPVTLIFTDIESSTALWAAHPDLMPDAVATHHRLIRSLITRYECYEVKTVG
DSFMIASKSPFAAVQLAQELQLRFLRLDWETNALDESYREFEEQRAEGECEYTPPTAHMD
PEVYSRLWNGLRVRVGIHTGLCDIRYDEVTKGYDYYGRTSNMAARTESVANGGQVLMTHA
AYMSLSGEDRNQLDVTTLGATVLRGVPEPVRMYQLN
> Q99279
NNNRAPKEPTDPVTLIFTDIESSTALWAAHPDLMPDAVAAHHRMVRSLIGRYKCYEVKTV
GDSFMIASKSPFAAVQLAQELQLCFLHHDWGTNALDDSYREFEEQRAEGECEYTPPTAHM
DPEVYSRLWNGLRVRVGIHTGLCDIIRHDEVTKGYDYYGRTPNMAARTESVANGGQVLMT
HAAYMSLSAEDRKQIDVTALGDVALRGVSDPVKMYQLN
> Q91WF3
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTY
MAATGLNATSGQDTQQDSERSCSHLGTMVEFAVALGSKLGVINKHSFNNFRLRVGLNHGP
VVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVTEETARAL
> Q91WF3
FHSLYVKRHQGVSVLYADIVGFTRLASECSPKELVLMLNELFGKFDQIAKEHECMRIKIL
GDCYYCVSGLPLSLPDHAINCVRMGLDMCRAIRKLRVATGVDINMRVGVHSGSVLCGVIG
LQKWQYDVWSHDVTLANHMEAGGVPGRVHITGATLALL
> Q8VHH7
NNFMLRIGMNKGGVLAGVIGARKPHYDIWGNTVNVASRMESTGVMGNIQVVEET
> Q8VHH7
FNTMYMYRHENVSILFADIVGFTQLSSACSAQELVKLLNELFARFDKLAAKYHQLRIKIL
GDCYYCICGLPDYREDHAVCSILMGLAMVEAISYVREKTKTGVDMRVGVHTGTVLGGVLG
QKRWQYDVWSTDVTVANKMEAGGIPGRVHISQSTMDCLKGEFDVEPGDGGSRCDYLDEKG
IETYLI
> Q8NFM4
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTY
MAATGLNATSGQDAQQDAERSCSHLGTMVEFAVALGSKLDVINKHSFNNFRLRVGLNHGP
VVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVTEET
> Q8NFM4
FHSLYVKRHQGVSVLYADIVGFTRLASECSPKELVLMLNELFGKFDQIAKEHECMRIKIL
GDCYYCVSGLPLSLPDHAINCVRMGLDMCRAIRKLRAATGVDINMRVGVHSGSVLCGVIG


  [Part of this file has been deleted for brevity]

> Q83IL8
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSE
EQVDQLALYAPQATVNRIDNYEVVGKSRPSLP
> Q7P144
VEALKQGTVIDHIPAGEGVKILRLFKLTETGERVTVGLNLVSRHMGSKDLIKVENVALTE
EQANELALFAPKATVNVIDNFEVVKKHKLTLP
> Q7MZ14
VEAIRCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSNRLGKKDLIKIENTFLTE
QQANQLAMYAPNATVNCIENYEVVKKLPINLP
> Q7MX57
VAAIRNGIVIDHIPPTKLFKVATLLQLDDLDKRITIGNNLRSRSHGSKGVIKIEDKTFEE
EELNRIALIAPNVRLNIIRDYEVVEKRQVEVP
> Q7MHF0
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINE
EQASKLALYAPHATVNQIEDYQVVKKLALELP
> Q58801
VKKITNGTVIDHIDAGKALMVFKVLNVPKETSVMIAINVPSKKKGKKDILKIEGIELKKE
DVDKISLISPDVTINIIRNGKVVEKLKPQIP
> P96175
VEAICNGYVIDHIPSGQGVKILRLFSLTDTKQRVTVGFNLPSHDGTTKDLIKVENTEITK
SQANQLALLAPNATVNIIENFKVTDKHSLALP
> P96111
GIKPIENGTVIDHIAKGKTPEEIYSTILKIRKILRLYDVDSADGIFRSSDGSFKGYISLP
DRYLSKKEIKKLSAISPNTTVNIIKNSTVVEKYRIKLP
> P77919
VSAIKEGTVIDHIPAGKGLKVIEILKLGKLTNGGAVLLAMNVPSKKLGRKDIVKVEGRFL
SEEEVNKIALVAPNATVNIIRDYKVVEKFKVEVP
> P74766
VSKIKNGTVIDHIPAGRAFAVLNVLGIKGHEGFRIALVINVDSKKMGKKDIVKIEDKEIS
DTEANLITLIAPTATINIVREYEVVKKTKLEVP
> P57451
VEAIKSGSVIDHIPEYIGFKLLSLFRFTETEKRITIGLNLPSKKLGRKDIIKIENTFLSD
EQINQLAIYAPHATVNYINEYNLVRKVFPTLP
> P19936
VEAIKCGTVIDHIPAQIGFKLLTLFKLTATDQRITIGLNLPSNELGRKDLIKIENTFLTE
QQANQLAMYAPKATVNRIDNYEVVRKLTLSLP
> P08421
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTE
EQVNQLALYAPQATVNRIDNYDVVGKSRPSLP
> P00478
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSE
DQVDQLALYAPQATVNRIDNYEVVGKSRPSLP
> O58452
VSAIKEGTVIDHIPAGKGLKVIEILGLSKLSNGGSVLLAMNVPSKKLGRKDIVKVEGKFL
SEEEVNKIALVAPTATVNIIRNYKVVEKFKVEVP
> O30129
VSKIKEGTVIDHINAGKALLVLKILKIQPGTDLTVSMAMNVPSSKMGKKDIVKVEGMFIR
DEELNKIALISPNATINLIRDYEIERKFKVSPP
> O26938
VKPIKNGTVIDHITANRSLNVLNILGLPDGRSKVTVAMNMDSSQLGSKDIVKIENRELKP
SEVDQIALIAPRATINIVRDYKIVEKAKVRL




4.0 OUTPUT FILE FORMAT

DHF file (domain hits file)
The format of the DHF file (domain hits file) of hit sequences generated by SIGSCAN (Figure 1) is described fully in SEQSEARCH documentation and only summarised here. The file contains two lines per hit. The first line is a description of the hit in 17 text tokens delimited by '^'. The second line contains the protein sequence.
The first 4 tokens refer to the hit (sequence) itself.
The next 9 tokens refer to the domain family, superfamily etc. for which the signature was derived.
The last 4 tokens refer to the hit, specifically, information about the search result.
The individual tokens in each group are listed in SEQSEARCH documentation.
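
The two-lines-per-hit layout and the 4 + 9 + 4 token grouping can be illustrated with a short Python sketch. It is not part of SIGSCAN and assumes only what is stated above and visible in the example output: a header line that starts with '>' and holds tokens separated by '^', followed by one sequence line. The function name read_dhf and the dictionary keys are hypothetical. The tokens are kept as strings in their three groups, since their individual meanings are defined in SEQSEARCH documentation.

```python
def read_dhf(text):
    """Return one dict per hit from DHF-style text (header line + sequence line)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of non-blank lines")
    hits = []
    # Pair each header line with the sequence line that follows it.
    for header, sequence in zip(lines[0::2], lines[1::2]):
        if not header.startswith(">"):
            raise ValueError("expected a '>' header line, got: " + header[:20])
        tokens = [token.strip() for token in header[1:].split("^")]
        if len(tokens) != 17:
            raise ValueError("expected 17 tokens, got %d" % len(tokens))
        hits.append({
            "hit": tokens[0:4],       # first 4 tokens: the hit itself
            "domain": tokens[4:13],   # next 9 tokens: domain classification
            "search": tokens[13:17],  # last 4 tokens: search result
            "sequence": sequence,
        })
    return hits
```

Splitting on '^' rather than on whitespace or commas matters here, because classification names such as the superfamily name in the example contain both spaces and commas.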

DAF file (domain alignment file)
The format of the DAF file (domain alignment file, Figure 2) generated by SIGSCAN is described fully in DOMAINALIGN documentation and is only summarised here.
It conforms to EMBOSS "simple" multiple sequence alignment format and includes domain classification records (in comment lines beginning with '#') for the node for which the signature was generated. The classification records are TY (domain type, either SCOP or CATH), CL (class), FO (fold), SF (superfamily) and FA (family). For CATH domains, AR (architecture) and TP (topology) may also be given. A unique identifier for the node is given after SI.
There are multiple blocks that contain the accession numbers, positions and aligned sequences. An accession number is given for each hit. The positions are the start and end residue positions of the appropriate section of sequence. The sequence uses '-' as a gap character. A 'SIGNATURE' line is given as a markup line underneath the sequence (signature positions are marked with a '*').
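
The pairing of sequence lines with 'SIGNATURE' markup lines can be sketched in Python as follows. This is an illustration only, not SIGSCAN code. It assumes the layout seen in the example output: comment lines begin with '#', a sequence line holds an accession number, a start position, the sequence and an end position separated by whitespace, a markup line holds 'SIGNATURE', '-' and the markup string, and an unused block is written as '.'. The function names read_daf_blocks and signature_columns are hypothetical.

```python
def signature_columns(markup):
    """Return the 0-based columns of a markup string that carry a '*'."""
    return [index for index, char in enumerate(markup) if char == "*"]


def read_daf_blocks(text):
    """Return a list of (accession, start, sequence, markup) tuples.

    One tuple is produced for each sequence line that is immediately
    followed by a non-empty SIGNATURE markup line.
    """
    blocks = []
    pending = None  # the last sequence line seen, waiting for its markup
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or classification/comment record
        fields = line.split()
        if fields[0] == "SIGNATURE":
            if pending is not None and len(fields) >= 3 and fields[2] != ".":
                blocks.append(pending + (fields[2],))
            pending = None
        elif len(fields) >= 4 and fields[2] != ".":
            pending = (fields[0], int(fields[1]), fields[2])
        else:
            pending = None  # empty block marked with '.'
    return blocks
```

For each returned block, signature_columns(markup) gives the columns within that block that are flagged with '*'.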

Output files for usage example

File: SIGSCAN.dhf

> P00478^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^3.20^0.000e+00^0.000e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEDQVDQLALYAPQATVNRIDNYEVVGKSRPSLP
> P08421^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^3.20^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTEEQVNQLALYAPQATVNRIDNYDVVGKSRPSLP
> Q83IL8^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^3.20^0.000e+00^0.000e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEEQVDQLALYAPQATVNRIDNYEVVGKSRPSLP
> Q8Z130^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^3.00^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTDEQVNQLALYAPQATVNRIDNYDVVGKSRPSLP
> Q97B28^.^11^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.53^0.000e+00^0.000e+00
ISKIKDGTVIDHIPSGKALRVLSILGIRDDVDYTVSVGMHVPSSKMEYKDVIKIENRSLDKNELDMISLTAPNATISIIKNYEISEKFKVELP
> P19936^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.40^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQIGFKLLTLFKLTATDQRITIGLNLPSNELGRKDLIKIENTFLTEQQANQLAMYAPKATVNRIDNYEVVRKLTLSLP
> Q7P144^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.40^0.000e+00^0.000e+00
VEALKQGTVIDHIPAGEGVKILRLFKLTETGERVTVGLNLVSRHMGSKDLIKVENVALTEEQANELALFAPKATVNVIDNFEVVKKHKLTLP
> Q8ZB38^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.40^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQIGFKLLSLFKLTATDQRITIGLNLPSKRSGRKDLIKIENTFLTEQQANQLAMYAPDATVNRIDNYEVVKKLTLSLP
> Q9HKM3^.^11^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.40^0.000e+00^0.000e+00
ISKIRDGTVIDHVPSGKGIRVIGVLGVHEDVNYTVSLAIHVPSNKMGFKDVIKIENRFLDRNELDMISLIAPNATISIIKNYEISEKFQVELP
> P74766^.^11^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.20^0.000e+00^0.000e+00
VSKIKNGTVIDHIPAGRAFAVLNVLGIKGHEGFRIALVINVDSKKMGKKDIVKIEDKEISDTEANLITLIAPTATINIVREYEVVKKTKLEVP
> Q8ZTG2^.^11^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.13^0.000e+00^0.000e+00
VSKIENGTVIDHIPAGRALTVLRILGISGKEGLRVALVMNVESKKLGKKDIVKIEGRELTPEEVNIISAVAPTATINIIRNFAVVKKFKVTPP
> Q9UX07^.^11^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.13^0.000e+00^0.000e+00
VSKIRNGTVIDHIPAGRALAVLRILGIRGSEGYRVALVMNVESKKIGRKDIVKIEDRVIDEKEASLITLIAPSATINIIRDYVVTEKRHLEVP
> Q7MZ14^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.07^0.000e+00^0.000e+00
VEAIRCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSNRLGKKDLIKIENTFLTEQQANQLAMYAPNATVNCIENYEVVKKLPINLP
> Q9HHN3^.^11^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.07^0.000e+00^0.000e+00
VSKIQAGTVIDHIPAGQALQVLQILGTNGASDDQITVGMNVTSERHHRKDIVKIEGRELSQDEVDVLSLIAPDATINIVRDYEVDEKRRVDRP
> Q9K1K9^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.07^0.000e+00^0.000e+00
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTIDNFKVVQKRHLNLP
> O58452^.^12^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.93^0.000e+00^0.000e+00
VSAIKEGTVIDHIPAGKGLKVIEILGLSKLSNGGSVLLAMNVPSKKLGRKDIVKVEGKFLSEEEVNKIALVAPTATVNIIRNYKVVEKFKVEVP
> Q87LF7^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.93^0.000e+00^0.000e+00
VEAIKNGTVIDHIPAQIGIKVLKLFDMHNSSQRVTIGLNLPSSALGHKDLLKIENVFINEEQASKLALYAPHATVNQIENYEVVKKLALELP
> Q9KP65^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.93^0.000e+00^0.000e+00
VEAIKNGTVIDHIPAKVGIKVLKLFDMHNSAQRVTIGLNLPSSALGSKDLLKIENVFISEAQANKLALYAPHATVNQIENYEVVKKLALQLP
> P96175^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.73^0.000e+00^0.000e+00
VEAICNGYVIDHIPSGQGVKILRLFSLTDTKQRVTVGFNLPSHDGTTKDLIKVENTEITKSQANQLALLAPNATVNIIENFKVTDKHSLALP
> Q8D1W6^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.73^0.000e+00^0.000e+00
VEAIFGGTVIDHIPAQVGLKLLSLFKWLHTKERITMGLNLPSNQQKKKDLIKLENVLLNEDQANQLSIYAPLATVNQIKNYIVIKKQKLKLP
> Q9JWY6^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.73^0.000e+00^0.000e+00
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTIDHFKVVQKRHLNLP
> P77919^.^12^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.60^0.000e+00^0.000e+00
VSAIKEGTVIDHIPAGKGLKVIEILKLGKLTNGGAVLLAMNVPSKKLGRKDIVKVEGRFLSEEEVNKIALVAPNATVNIIRDYKVVEKFKVEVP
> Q7MHF0^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.60^0.000e+00^0.000e+00
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIEDYQVVKKLALELP
> Q8DCF7^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.60^0.000e+00^0.000e+00
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIEDYQVVKKLALELP
> Q8K9H8^.^10^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.60^0.000e+00^0.000e+00
VEAIKSGSVIDHIPAHIGFKLLSLFRFTETEKRITIGLNLPSQKLDKKDIIKIENTFLSDDQINQLAIYAPCATVNYIEKYNLVGKIFPSLP


  [Part of this file has been deleted for brevity]

FHSLYVKRHQNVSILYADIVGFTQLASDCSPKELVVVLNELFGKFDQIAKANECMRIKILGDCYYCVSGLPVSLPTHARNCVKMGLDMCQAIKQVREATGVDINMRVGIHSGNVLCGVIGLRKWQYDVWSHDVSLANRMEAAGVPGRVHITEATLKHLDKAYEVEDGHGQQRDPYLKEMNIRTYLV
> P51829^.^90^170^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.53^0.000e+00^0.000e+00
FHSLYVKRHQNVSILYADIVGFTRLASDCSPKELVVVLNELFGKFDQIAKANECMRIKILGDCYYCVSGLPVSLPTHARNCVKMGLDICEAIKQVREATGVDISMRVGIHSGNVLCGVIGLRKWQYDVWSHDVSLANRMEAAGVPGRVHITEATLNHLDKAYEVEDGHGEQRDPYLKEMNIRTYLV
> Q03343^.^92^172^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.53^0.000e+00^0.000e+00
MMFHKIYIQKHDNVSILFADIEGFTSLASQCTAQELVMTLNELFARFDKLAAENHCLRIKILGDCYYCVSGLPEARADHAHCCVEMGVDMIEAISLVREVTGVNVNMRVGIHSGRVHCGVLGLRKWQFDVWSNDVTLANHMEAGGRAGRIHITRATLQYLNGDYEVEPGRGGERNGYLKEQCIETFLIL
> Q07093^.^16^96^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.53^0.000e+00^0.000e+00
VTILFSDIVGFTSICSRATPFMVISMLEGLYKDFDEFCDFFDVYKVETIGDAYCVASGLHRASIYDAHRCLDGLKMIDACSKHITHDGEQIKMRIGLHTGTVLAGVVGRKMPRYCLFGHSVTIANKFESGSEALKINVSPTTKDWLTKHEGFEFELQP
> Q08462^.^70^150^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.53^0.000e+00^0.000e+00
DCVCVMFASIPDFKEFYTESDVNKEGLECLRLLNEIIADFDDLLSKPKFSGVEKIKTIGSTYMAATGLSAVPSQEHSQEPERQYMHIGTMVEFAFALVGKLDAINKHSFNDFKLRVGINHGPVIAGVIGAQKPQYDIWGNTVNVASRMDSTGVLDKIQVTEETSLVL
> Q26721^.^94^174^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.53^0.000e+00^0.000e+00
PVTLIFTDIESSTALWAAHPEVMPDAVATHHRLIRTLISKYECYEVKTVGDSFMIASKSPFAAVQLAQELQLCFLHHDWGTNAIDESYQQFEQQRAEDDSDYTPPTARLDPKVYSRLWNGLRVRVGIHTGLCDIRRDEVTKGYDYYGRTSNMAARTESVANGGQVLMTHAAYMSLSAEERQQIDVTALGDVPLRGVPKPVEMYRLN
> Q29450^.^90^170^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.53^0.000e+00^0.000e+00
FHNLYVKRHQNVSILYADIVGFTRLASDCSPKELVVVLNELFGKFDQIAKANECMRIKILGDCYYCVSGLPVSLPNHARNCVKMGLDMCEAIKQVREATGVDISMRVGIHSGNVLCGVIGLRKWQYDVWSHDVSLANRMEAAGVPGRVHITEATLKHLDKAYEVEDGHGQQRDPYLKEMNIRTYLV
> Q8NFM4^.^76^156^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.53^0.000e+00^0.000e+00
FHSLYVKRHQGVSVLYADIVGFTRLASECSPKELVLMLNELFGKFDQIAKEHECMRIKILGDCYYCVSGLPLSLPDHAINCVRMGLDMCRAIRKLRAATGVDINMRVGVHSGSVLCGVIGLQKWQYDVWSHDVTLANHMEAGGVPGRVHITGATLALL
> Q8NFM4^.^68^148^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.53^0.000e+00^0.000e+00
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTYMAATGLNATSGQDAQQDAERSCSHLGTMVEFAVALGSKLDVINKHSFNNFRLRVGLNHGPVVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVTEET
> Q91WF3^.^76^156^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.53^0.000e+00^0.000e+00
FHSLYVKRHQGVSVLYADIVGFTRLASECSPKELVLMLNELFGKFDQIAKEHECMRIKILGDCYYCVSGLPLSLPDHAINCVRMGLDMCRAIRKLRVATGVDINMRVGVHSGSVLCGVIGLQKWQYDVWSHDVTLANHMEAGGVPGRVHITGATLALL
> Q91WF3^.^68^148^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.53^0.000e+00^0.000e+00
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTYMAATGLNATSGQDTQQDSERSCSHLGTMVEFAVALGSKLGVINKHSFNNFRLRVGLNHGPVVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVTEETARAL
> Q97FS4^.^8^88^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.47^0.000e+00^0.000e+00
INSIKNGIVIDHIKAGHGIKIYNYLKLGEAEFPTALIMNAISKKNKAKDIIKIENVMDLDLAVLGFLDPNITVNIIEDEKIRQKIQLKLP
> O60503^.^50^130^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.47^0.000e+00^0.000e+00
VSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEETKCEKISTLGDCYYCVAGCPEPRADHAYCCIEMGLGMIKAIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDGKVIERLGQSVVADQLKGLKTYLI
> P26769^.^1^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.47^0.000e+00^0.000e+00
FHNLYVKRHTNVSILYADIVGFTRLASDCSPGELVHMLNELFGKFDQIAKENECMRIKILGDCYYCVSGLPISLPNHAKNCVKMGLDMCEAIKKVRDATGVDINMRVGVHSGNVLCGVIGLQKWQYDVWSHDVTLANHMEAGGVPGRVHISSVTLEHLNGAYKVEEGDGEIRDPYLKQHLVKTYFV
> P98999^.^52^132^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.47^0.000e+00^0.000e+00
EQVSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEETKCEKISTLGDCYYCVAGCPEPRPDHAYCCIEMGLGMIEAIDQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEKTARYLD
> Q08462^.^1^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.47^0.000e+00^0.000e+00
FHNLYVKRHTNVSILYADIVGFTRLASDCSPGELVHMLNELFGKFDQIAKENECMRIKILGDCYYCVSGLPISLPNHAKNCVKMGLDMCEAIKKVRDATGVDINMRVGVHSGNVLCGVIGLQKWQYDVWSHDVTLANHMEAGGVPGRVHISSVTLEHLNGAYKVEEGDGDIRDPYLKQHLVKTYFV
> Q99279^.^119^199^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.47^0.000e+00^0.000e+00
NNNRAPKEPTDPVTLIFTDIESSTALWAAHPDLMPDAVAAHHRMVRSLIGRYKCYEVKTVGDSFMIASKSPFAAVQLAQELQLCFLHHDWGTNALDDSYREFEEQRAEGECEYTPPTAHMDPEVYSRLWNGLRVRVGIHTGLCDIIRHDEVTKGYDYYGRTPNMAARTESVANGGQVLMTHAAYMSLSAEDRKQIDVTALGDVALRGVSDPVKMYQLN
> Q9DGG6^.^52^132^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.47^0.000e+00^0.000e+00
EQVSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEDTKCEKISTLGDCYYCVAGCPEPRADHAYCCIEMGLGMIKAIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDGKVTERVGQSAVADQLKGLKTYLI
> O02740^.^59^139^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.40^0.000e+00^0.000e+00
DLVTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPKRNGMRHAAEIANMSLDILSSVGTFKMRHMPEVPVRIRIGLHSGPVVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVSHSTVTILRTLGEGYEVE
> O19179^.^57^137^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.40^0.000e+00^0.000e+00
VTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPQRNGQRHAAEIANMALDILSAVGSFRMRHMPEVPVRIRIGLHSGPCVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVNMSTVRILHALDEGFQTEV
> O95622^.^77^157^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.40^0.000e+00^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKVGKTHIKALADFAMKLMDQMKYINEHSFNNFQMKIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQVL
> P19754^.^33^113^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.40^0.000e+00^0.000e+00
FHKIYIQRHDNVSILFADIVGFTGLASQCTAQELVKLLNELFGKFDELATENHCRRIKILGDCYYCVSGLTQPKTDHAHCCVEMGLDMIDTITSVAEATEVDLNMRVGLHTGRVLCGVLGLRKWQYDVWSNDVTLANVMEAAGLPGKVHITKTTLACLNGDYEVEPGHGHERNSFLKTHNIETFFI
> P30803^.^77^157^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.40^0.000e+00^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRVLNEIIADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKVGKTHIKALADFAMKLMDQMKYINEHSFNNFQMKIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQVL
> P40137^.^30^110^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.40^0.000e+00^0.000e+00
VTLLFADIRDFTSLSERLRPEQVVTLLNEYYGRMVEVVFRHGGTLDKFIGDALMVYFGAPIADPAHARRGVQCALDMVQELETVNALRSARGEPCLRIGVGVHTGPAVLGNIGSATRRLEYTAIGDTVNLASRIESLTK
> P40144^.^77^157^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.40^0.000e+00^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKVGKTHIKALADFAMKLMDQMKYINEHSFNNFQMKIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQVL
> P51839^.^59^139^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.40^0.000e+00^0.000e+00
DQVTIYFSDIVGFTTISALSEPIEVVGFLNDLYTMFDAVLDSHDVYKVETIGDAYMVASGLPRRNGNRHAAEIANMALEILSYAGNFRMRHAPDVPIRVRAGLHSGPCVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVSRNTVQALLSLDEGYKIDV

File: SIGSCAN.aln

# DE   Results of signature search
# XX
# TY   SCOP
# XX
# CL   Alpha and beta proteins (a+b)
# XX
# FO   Ferredoxin-like
# XX
# SF   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
# XX
# FA   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
# XX
# SI   54894
# XX
P00478    1      VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
P00478    54     ENTFLSEDQVDQLALYAPQATVNRIDNYEVVGKSRPSLP               106
SIGNATURE -      --*---*--*----*-*----*--***---*---*--*-              
P00478    107    .                                                     159
SIGNATURE -      .                                                    
P00478    160    .                                                     212
SIGNATURE -      .                                                    
P00478    213    .                                                     265
SIGNATURE -      .                                                    
# XX
P08421    1      VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
P08421    54     ENTFLTEEQVNQLALYAPQATVNRIDNYDVVGKSRPSLP               106
SIGNATURE -      --*---*--*----*-*----*--***---*---*--*-              
P08421    107    .                                                     159
SIGNATURE -      .                                                    
P08421    160    .                                                     212
SIGNATURE -      .                                                    
P08421    213    .                                                     265
SIGNATURE -      .                                                    
# XX
Q83IL8    1      VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
Q83IL8    54     ENTFLSEEQVDQLALYAPQATVNRIDNYEVVGKSRPSLP               106
SIGNATURE -      --*---*--*----*-*----*--***---*---*--*-              
Q83IL8    107    .                                                     159
SIGNATURE -      .                                                    
Q83IL8    160    .                                                     212
SIGNATURE -      .                                                    
Q83IL8    213    .                                                     265
SIGNATURE -      .                                                    
# XX
Q8Z130    1      VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
Q8Z130    54     ENTFLTDEQVNQLALYAPQATVNRIDNYDVVGKSRPSLP               106


  [Part of this file has been deleted for brevity]

SIGNATURE -      ---------*---------------*---*--*----*-*----*--***---
P19754    107    VGLHTGRVLCGVLGLRKWQYDVWSNDVTLANVMEAAGLPGKVHITKTTLACLN 159
SIGNATURE -      *---*--*---------------------------------------------
P19754    160    GDYEVEPGHGHERNSFLKTHNIETFFI                           212
SIGNATURE -      ---------------------------                          
P19754    213    .                                                     265
SIGNATURE -      .                                                    
# XX
P30803    1      VAVMFASIANFSEFYVELEANNEGVECLRVLNEIIADFDEIISEDRFRQLEKI 53
SIGNATURE -      -----------------------------------------------------
P30803    54     KTIGSTYMAASGLNDSTYDKVGKTHIKALADFAMKLMDQMKYINEHSFNNFQM 106
SIGNATURE -      ------------------------*-*--------------------------
P30803    107    KIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQVL 159
SIGNATURE -      *---------------*---*--*----*-*----*--***---*---*--*-
P30803    160    .                                                     212
SIGNATURE -      .                                                    
P30803    213    .                                                     265
SIGNATURE -      .                                                    
# XX
P40137    1      VTLLFADIRDFTSLSERLRPEQVVTLLNEYYGRMVEVVFRHGGTLDKFIGDAL 53
SIGNATURE -      ------------------------------*-*--------------------
P40137    54     MVYFGAPIADPAHARRGVQCALDMVQELETVNALRSARGEPCLRIGVGVHTGP 106
SIGNATURE -      ------*---------------*---*--*----*-*----*--***---*--
P40137    107    AVLGNIGSATRRLEYTAIGDTVNLASRIESLTK                     159
SIGNATURE -      -*--*----------------------------                    
P40137    160    .                                                     212
SIGNATURE -      .                                                    
P40137    213    .                                                     265
SIGNATURE -      .                                                    
# XX
P40144    1      VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEDRFRQLEKI 53
SIGNATURE -      -----------------------------------------------------
P40144    54     KTIGSTYMAASGLNDSTYDKVGKTHIKALADFAMKLMDQMKYINEHSFNNFQM 106
SIGNATURE -      ------------------------*-*--------------------------
P40144    107    KIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQVL 159
SIGNATURE -      *---------------*---*--*----*-*----*--***---*---*--*-
P40144    160    .                                                     212
SIGNATURE -      .                                                    
P40144    213    .                                                     265
SIGNATURE -      .                                                    
# XX
P51839    1      DQVTIYFSDIVGFTTISALSEPIEVVGFLNDLYTMFDAVLDSHDVYKVETIGD 53
SIGNATURE -      -----------------------------------------------------
P51839    54     AYMVASGLPRRNGNRHAAEIANMALEILSYAGNFRMRHAPDVPIRVRAGLHSG 106
SIGNATURE -      ------*-*--------------------------*---------------*-
P51839    107    PCVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVSRNTVQALLSLDEGY 159
SIGNATURE -      --*--*----*-*----*--***---*---*--*-------------------
P51839    160    KIDV                                                  212
SIGNATURE -      ----                                                 
P51839    213    .                                                     265
SIGNATURE -      .                                                    
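
The alignment blocks above can be read programmatically. The following is a minimal Python sketch, not part of SIGSCAN. It assumes the layout shown in the listing: a sequence line (accession, start, residues, end) followed by a SIGNATURE line, with '.' chunks as empty padding and '# XX' lines separating hits. It also assumes that '*' marks a residue aligned to a signature position. The function name parse_alignments is hypothetical.

```python
def parse_alignments(text):
    """Return {accession: [(position, residue), ...]} for '*' columns.

    Illustrative reader only; it assumes the block layout shown in the
    listing above and is not an official parser for the format.
    """
    hits = {}        # accession -> {"start": int, "seq": str, "sig": str}
    current = None   # accession of the most recent sequence line
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "#":
            continue  # blank line or '#' line such as the '# XX' separator
        if fields[0] == "SIGNATURE":
            chunk = fields[2] if len(fields) > 2 else ""
            if current is not None and chunk != ".":
                hits[current]["sig"] += chunk
        elif len(fields) >= 4 and fields[1].isdigit():
            current = fields[0]
            rec = hits.setdefault(
                current, {"start": int(fields[1]), "seq": "", "sig": ""}
            )
            if fields[2] != ".":
                rec["seq"] += fields[2]

    result = {}
    for acc, rec in hits.items():
        pairs = zip(rec["seq"], rec["sig"])
        # Position is counted from the start number of the hit's first block.
        result[acc] = [
            (rec["start"] + i, res)
            for i, (res, mark) in enumerate(pairs)
            if mark == "*"
        ]
    return result


# One block copied from the listing above.
example = (
    "P40137    107    AVLGNIGSATRRLEYTAIGDTVNLASRIESLTK                     159\n"
    "SIGNATURE -      -*--*----------------------------                    \n"
)
print(parse_alignments(example))
```

The sketch concatenates the chunks of each hit before pairing residues with signature marks, so a signature that spans several blocks is reported as one list per accession.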




5.0 DATA FILES

SIGSCAN requires a residue substitution matrix.


6.0 USAGE

   Standard (Mandatory) qualifiers:
  [-siginfile]         infile     This option specifies the name of the
                                  signature file (input). A 'signature file'
                                  contains a sparse sequence signature
                                  suitable for use with the SIGSCAN and
                                  SIGSCANLIG programs. The files are generated
                                  by using SIGGEN and SIGGENLIG.
  [-dbsequence]        seqall     This option specifies the name of the
                                  database to search.
   -sub                matrixf    [EBLOSUM62] This option specifies the
                                  residue substitution matrix.
   -gapo               float      [10.0 for any sequence] This option
                                  specifies the gap insertion penalty. The gap
                                  insertion penalty is the score taken away
                                  when a gap is created. The best value
                                  depends on the choice of comparison matrix.
                                  The default value assumes you are using the
                                  EBLOSUM62 matrix for protein sequences, and
                                  the EDNAMAT matrix for nucleotide sequences.
                                  (Floating point number from 1.0 to 100.0)
   -gape               float      [0.5 for any sequence] This option specifies
                                  the gap extension penalty. The gap
                                  extension penalty is added to the standard
                                  gap penalty for each base or residue in the
                                  gap. This is how long gaps are penalized.
                                  Usually you will expect a few long gaps
                                  rather than many short gaps, so the gap
                                  extension penalty should be lower than the
                                  gap penalty. (Floating point number from 0.0
                                  to 10.0)
   -nterm              menu       [1] This option specifies the N-terminal
                                  matching option. This determines how the
                                  first signature position is aligned to a
                                  sequence from the database. (Values: 1
                                  (Align anywhere and allow only complete
                                  signature-sequence fit); 2 (Align anywhere
                                  and allow partial signature-sequence fit); 3
                                  (Use empirical gaps only))
   -nhits              integer    [100] This option specifies the maximum
                                  number of hits to output. (Any integer
                                  value)
  [-hitsfile]          outfile    [SIGSCAN.dhf] This option specifies the name
                                  of the DHF file (domain hits file)
                                  (output). A 'domain hits file' contains
                                  database hits (sequences) with domain
                                  classification information, in the DHF
                                  format (FASTA-like). The hits are relatives
                                  to a SCOP or CATH family (or other node in
                                  the structural hierarchies) and are found
                                  from a search of a sequence database, in
                                  this case, by using SIGSCAN. Files
                                  containing hits retrieved by PSIBLAST are
                                  generated by using SEQSEARCH or various
                                  types of HMM and profile by using LIBSCAN.
  [-alignfile]         outfile    [SIGSCAN.aln] This option specifies the name
                                  of the SAF (signature alignment file)
                                  (output). A 'signature alignment file'
                                  contains one or more signature-sequence
                                  alignments. The file is in DAF format
                                  (CLUSTAL-like) and is annotated with
                                  bibliographic information, either the domain
                                  family classification (for SIGSCAN output)
                                  or ligand classification (for SIGSCANLIG
                                  output). The files generated by SIGSCAN will
                                  contain a signature-sequence alignment for
                                  a single signature against a library of one
                                  or more sequences. The files generated by
                                  using SIGSCANLIG will contain a
                                  signature-sequence alignment for a single
                                  query sequence against a library of one or
                                  more signatures.

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-dbsequence" associated qualifiers
   -sbegin2            integer    Start of each sequence to be used
   -send2              integer    End of each sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-hitsfile" associated qualifiers
   -odirectory3        string     Output directory

   "-alignfile" associated qualifiers
   -odirectory4        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

6.1 COMMAND LINE ARGUMENTS

Standard (Mandatory) qualifiers

[-siginfile] (Parameter 1)
    Description:    This option specifies the name of the signature file (input). A 'signature file' contains a sparse sequence signature suitable for use with the SIGSCAN and SIGSCANLIG programs. The files are generated by using SIGGEN and SIGGENLIG.
    Allowed values: Input file
    Default:        Required

[-dbsequence] (Parameter 2)
    Description:    This option specifies the name of the database to search.
    Allowed values: Readable sequence(s)
    Default:        Required

-sub
    Description:    This option specifies the residue substitution matrix.
    Allowed values: Comparison matrix file in EMBOSS data path
    Default:        EBLOSUM62

-gapo
    Description:    This option specifies the gap insertion penalty. The gap insertion penalty is the score taken away when a gap is created. The best value depends on the choice of comparison matrix. The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAMAT matrix for nucleotide sequences.
    Allowed values: Floating point number from 1.0 to 100.0
    Default:        10.0 for any sequence

-gape
    Description:    This option specifies the gap extension penalty. The gap extension penalty is added to the standard gap penalty for each base or residue in the gap. This is how long gaps are penalized. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap penalty.
    Allowed values: Floating point number from 0.0 to 10.0
    Default:        0.5 for any sequence

-nterm
    Description:    This option specifies the N-terminal matching option. This determines how the first signature position is aligned to a sequence from the database.
    Allowed values: 1 (Align anywhere and allow only complete signature-sequence fit)
                    2 (Align anywhere and allow partial signature-sequence fit)
                    3 (Use empirical gaps only)
    Default:        1

-nhits
    Description:    This option specifies the maximum number of hits to output.
    Allowed values: Any integer value
    Default:        100

[-hitsfile] (Parameter 3)
    Description:    This option specifies the name of the DHF file (domain hits file) (output). A 'domain hits file' contains database hits (sequences) with domain classification information, in the DHF format (FASTA-like). The hits are relatives to a SCOP or CATH family (or other node in the structural hierarchies) and are found from a search of a sequence database, in this case, by using SIGSCAN. Files containing hits retrieved by PSIBLAST are generated by using SEQSEARCH or various types of HMM and profile by using LIBSCAN.
    Allowed values: Output file
    Default:        SIGSCAN.dhf

[-alignfile] (Parameter 4)
    Description:    This option specifies the name of the SAF (signature alignment file) (output). A 'signature alignment file' contains one or more signature-sequence alignments. The file is in DAF format (CLUSTAL-like) and is annotated with bibliographic information, either the domain family classification (for SIGSCAN output) or ligand classification (for SIGSCANLIG output). The files generated by SIGSCAN will contain a signature-sequence alignment for a single signature against a library of one or more sequences. The files generated by using SIGSCANLIG will contain a signature-sequence alignment for a single query sequence against a library of one or more signatures.
    Allowed values: Output file
    Default:        SIGSCAN.aln

Additional (Optional) qualifiers

(none)

Advanced (Unprompted) qualifiers

(none)
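
The interaction between -gapo and -gape can be illustrated with a small worked calculation. The Python sketch below follows the option text literally: one insertion penalty per gap plus one extension penalty for each residue in the gap. This is an assumption; the text does not state whether SIGSCAN charges the extension penalty for the first gapped residue, so the implementation may differ. The function name gap_penalty is hypothetical.

```python
def gap_penalty(length, gapo=10.0, gape=0.5):
    """Penalty for a single gap of `length` residues.

    Assumption: penalty = gapo + gape * length, a literal reading of the
    option descriptions. Defaults are the documented -gapo and -gape values.
    """
    if length < 0:
        raise ValueError("gap length must not be negative")
    if length == 0:
        return 0.0  # no gap, no penalty
    return gapo + gape * length


# One long gap versus several short gaps covering the same number of residues.
one_long_gap = gap_penalty(4)
four_short_gaps = 4 * gap_penalty(1)
print(one_long_gap, four_short_gaps)
```

Under this reading a single four-residue gap costs far less than four one-residue gaps, which is why the extension penalty should be lower than the insertion penalty when a few long gaps are expected.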

6.2 EXAMPLE SESSION

An example of interactive use of sigscan is shown below.


% sigscan 
Generates hits (DHF file) from a signature search
Domainatrix signature file: ../siggen-keep/54894.sig
Name of database to search.: swsmall
Residue substitution matrix [EBLOSUM62]: 
Gap insertion penalty [10]: 
Gap extension penalty [0.5]: 
N-terminal matching options
         1 : Align anywhere and allow only complete signature-sequence fit
         2 : Align anywhere and allow partial signature-sequence fit
         3 : Use empirical gaps only
Select number [1]: 
Max. number of hits to output [100]: 
Domain hits output file [SIGSCAN.dhf]: 
Domainatrix signature alignment output file [SIGSCAN.aln]: 

Signature file read ok
Signature compiled ok
Signature aligned to db ok
Hits file written ok
Alignments file written ok
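
The same search can be requested without prompts by passing every value on the command line. The following Python sketch only assembles and prints such a command; it uses the qualifiers and value ranges listed in section 6.0 together with -auto to turn off prompts. The helper name build_sigscan_command is hypothetical, and the command is not executed here.

```python
import shlex


def build_sigscan_command(siginfile, dbsequence,
                          hitsfile="SIGSCAN.dhf", alignfile="SIGSCAN.aln",
                          sub="EBLOSUM62", gapo=10.0, gape=0.5,
                          nterm=1, nhits=100):
    """Return an argument list for a non-interactive sigscan run.

    Defaults and allowed ranges are taken from the qualifier listing;
    the helper itself is an illustration, not part of EMBOSS.
    """
    if not 1.0 <= gapo <= 100.0:
        raise ValueError("-gapo must be between 1.0 and 100.0")
    if not 0.0 <= gape <= 10.0:
        raise ValueError("-gape must be between 0.0 and 10.0")
    if nterm not in (1, 2, 3):
        raise ValueError("-nterm must be 1, 2 or 3")
    return [
        "sigscan",
        "-siginfile", siginfile,
        "-dbsequence", dbsequence,
        "-sub", sub,
        "-gapo", str(gapo),
        "-gape", str(gape),
        "-nterm", str(nterm),
        "-nhits", str(nhits),
        "-hitsfile", hitsfile,
        "-alignfile", alignfile,
        "-auto",  # turn off prompts
    ]


# Values from the example session above.
cmd = build_sigscan_command("../siggen-keep/54894.sig", "swsmall")
print(shlex.join(cmd))
```

On a system where EMBOSS and sigscan are installed, the returned list could be passed to subprocess.run to perform the search.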





7.0 KNOWN BUGS & WARNINGS

None.


8.0 NOTES

SIGSCAN does not generate p-values or E-values. DHF files of hits with calculated p-values or E-values can be generated by using LIBSCAN, which searches with sparse protein signatures as well as various types of hidden Markov models and other profiles.

If a signature file is generated by hand, the gap data must be listed in order of increasing gap size (see the SIGGEN documentation).
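
A hand-edited signature file can be checked for this ordering before it is used. The Python sketch below is illustrative only. It assumes the record layout of the example signature file in section 3.0: an 'NN   [n]' line opens each signature position, and the first field of each 'GA' line is the gap size. The function name unordered_gap_positions is hypothetical; the SIGGEN documentation remains the authority on the format.

```python
def unordered_gap_positions(sig_text):
    """Return signature positions whose GA lines are not in ascending order.

    Assumption: in 'GA   12 ; 2' the first number is the gap size.
    """
    gaps_by_position = {}
    position = None
    for line in sig_text.splitlines():
        code, _, rest = line.partition(" ")
        rest = rest.strip()
        if code == "NN":
            position = rest.strip("[]")      # '[3]' -> '3'
            gaps_by_position[position] = []
        elif code == "GA" and position is not None:
            gap_size = int(rest.split(";")[0])
            gaps_by_position[position].append(gap_size)
    return [pos for pos, gaps in gaps_by_position.items()
            if gaps != sorted(gaps)]


# Small made-up fragment: position 2 lists gap size 5 before gap size 2.
fragment = """\
NN   [1]
XX
GA   1 ; 2
XX
GA   4 ; 1
XX
NN   [2]
XX
GA   5 ; 1
XX
GA   2 ; 3
"""
print(unordered_gap_positions(fragment))
```

An empty list means every position lists its gaps in ascending order under the stated assumption.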

8.1 GLOSSARY OF FILE TYPES

Domain hits file
    Format:      DHF format (FASTA-like).
    Description: Database hits (sequences) with domain classification information. The hits are relatives to a SCOP or CATH family (or other node in the structural hierarchies) and are found from a search of a discriminating element (e.g. a protein signature, hidden Markov model, simple frequency matrix, Gribskov profile or Henikoff profile) against a sequence database.
    Created by:  SEQSEARCH (hits retrieved by PSIBLAST). SIGSCAN (hits retrieved by sparse protein signature). LIBSCAN (hits retrieved by various types of HMM and profile).
    See also:    N.A.

Domain alignment file
    Format:      DAF format (CLUSTAL-like).
    Description: Sequence alignment of domains belonging to the same SCOP or CATH family (or other node in the structural hierarchies). The file is annotated with domain family classification information.
    Created by:  DOMAINALIGN (structure-based sequence alignment of domains of known structure).
    See also:    DOMAINALIGN alignments can be extended with sequence relatives (of unknown structure) to the family in question by using SEQALIGN.

Hits file
    Format:      Text file of classified hits.
    Description: A list of hits (e.g. from a prediction method) that are classified and rank-ordered on the basis of score, p-value, E-value etc.
    Created by:  ROCON and LIBSCAN (hits from searches of a discriminating element (hidden Markov model, profile or signature) against a sequence database).
    See also:    ROCPLOT is run on the files to perform Receiver Operator Characteristic (ROC) analysis on the hits.

Signature file
    Format:      SIG format.
    Description: Contains a sparse sequence signature suitable for use with the SIGSCAN program.
    Created by:  SIGGEN, SIGGENLIG, LIBGEN.
    See also:    The files are generated by using SIGGEN.
None


9.0 DESCRIPTION

See Blades et al., Ison et al. and Daniel et al. for a description of protein signatures and their application.


10.0 ALGORITHM

The algorithm is based on an approach first described in Daniel et al. (1999), which was applied to the definition of protein families (Ison et al., 2000) and later to automatically generated signatures (Blades et al., 2005).


11.0 RELATED APPLICATIONS

See also

Program name   Description
contacts       Generate intra-chain CON files from CCF files
domainalign    Generate alignments (DAF file) for nodes in a DCF file
domainrep      Reorder DCF file to identify representative structures
domainreso     Remove low resolution domains from a DCF file
interface      Generate inter-chain CON files from CCF files
libgen         Generate discriminating elements from alignments
matgen3d       Generate a 3D-1D scoring matrix from CCF files
psiphi         Calculates phi and psi torsion angles from protein coordinates
rocon          Generates a hits file from comparing two DHF files
rocplot        Performs ROC analysis on hits files
seqalign       Extend alignments (DAF file) with sequences (DHF file)
seqfraggle     Removes fragment sequences from DHF files
seqsearch      Generate PSI-BLAST hits (DHF file) from a DAF file
seqsort        Remove ambiguous classified sequences from DHF files
seqwords       Generates DHF files from keyword search of UniProt
siggen         Generates a sparse protein signature from an alignment
siggenlig      Generates ligand-binding signatures from a CON file
sigscanlig     Searches ligand-signature library & writes hits (LHF file)



12.0 DIAGNOSTIC ERROR MESSAGES

None.


13.0 AUTHORS

Jon Ison (jison@ebi.ac.uk)
The European Bioinformatics Institute Wellcome Trust Genome Campus Cambridge CB10 1SD UK


14.0 REFERENCES

Please cite the authors and EMBOSS.

Rice P, Longden I and Bleasby A (2000) "EMBOSS - The European Molecular Biology Open Software Suite" Trends in Genetics, 16(6):276-277.

See also http://emboss.sourceforge.net/

Automatic generation and evaluation of sparse protein signatures for families of protein structural domains. MJ Blades, JC Ison, R Ranasinghe, and JBC Findlay. Protein Science. 2005 (accepted)

A key residues approach to the definition of protein families and analysis of sparse family signatures. JC Ison, AJ Bleasby, MJ Blades, SC Daniel, JH Parish, JBC Findlay. PROTEINS: Structure, Function & Genetics. 2000, 40:330-341

Alignment of a sparse protein signature with protein sequences: application to fold prediction for three small globulins. SC Daniel, JH Parish, JC Ison, MJ Blades & JBC Findlay. FEBS Letters. 1999, 459:349-352.

14.1 Other useful references