emira
% emira -setparam fasta -project cjejuni_demo -genome accurate -mxti -rns tigr -orh
MIRA fragment assembly program
This is MIRA V2.8.3 (production version).
Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence
Assembly Using Trace Signals and Additional Sequence Information.
Computer Science and Biology: Proceedings of the German Conference on
Bioinformatics (GCB) 99, pp. 45-56.
Mail questions, bug reports, ideas or suggestions to:
bach@chevreux.org
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Parsing parameters: -genomeaccurate -fasta -GE:project=cjejuni_demo -GE:mxti=yes -OUT:orh=yes -GE:rns=tigr
Using quickmode switch -genomeaccurate :
-GE:uti=yes
-AS:mrl=40:nop=4:sep=yes:rbl=4:sd=yes:sdlpo=yes:ugpf=yes
-DP:ure=yes:rewl=30:rewme=2:feip=0;leip=0:tpae=no
-CL:pvc=yes:pvcmla=18:qc=no:mbc=no:emlc=yes:mlcr=25:smlc=30
-SK:bph=16:hss=4:pr=45:mhpr=200
-AL:bip=20:bmin=25:bmax=130:mo=15:ms=30:mrs=65:egp=yes:egpl=low
-CO:rodirs=25:mr=yes:asir=no:mrpg=2:emea=25
amgb=yes:amgbemc=yes:amgbnbs=yes
-ED:ace=no
Using quickmode switch fasta : -GE:lj=fasta
Parameters parsed without error, perfect.
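The quickmode expansion above shows MIRA's parameter syntax. Each switch is a dash, a section code (GE, AS, DP, CL, SK, AL, CO, ED, OUT), a colon, and then colon-separated key=value pairs. When you compare runs, it helps to turn these echoed switches into a lookup table. The helper below is a hypothetical sketch written for this page and is not something MIRA ships. It assumes only the layout visible in this log. It also accepts the single `;` that appears in the echoed -DP line.

```python
import re
from typing import Dict


def parse_mira_params(cmdline: str) -> Dict[str, Dict[str, str]]:
    """Group '-SECTION:key=value:...' tokens by section.

    Hypothetical helper for reading switches echoed in a MIRA log.
    Bare quickmode switches such as '-fasta' are skipped.
    """
    sections: Dict[str, Dict[str, str]] = {}
    for token in cmdline.split():
        if not token.startswith("-") or ":" not in token:
            continue  # bare switch, nothing to split
        section, _, rest = token[1:].partition(":")
        params = sections.setdefault(section, {})
        # The echoed -DP line contains one ';', so accept both separators.
        for pair in re.split(r"[:;]", rest):
            key, sep, value = pair.partition("=")
            if sep:
                params[key] = value
    return sections


parsed = parse_mira_params(
    "-genomeaccurate -fasta -GE:project=cjejuni_demo -GE:mxti=yes "
    "-OUT:orh=yes -GE:rns=tigr"
)
print(parsed)
```

For example, `parse_mira_params("-AS:mrl=40:nop=4")["AS"]["nop"]` returns the string `"4"`. Values are kept as strings because the log mixes numbers, yes/no flags and names.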
Used parameter settings:
General (-GE):
Project name (pro) : cjejuni_demo
Load job (lj) : FASTA file (fasta)
Filecheck only (fo) : No
External quality (eq) : from SCF (scf)
Ext. qual. override (eqo) : No
Discard reads on e.q. error (droeqe): No
Read naming scheme (rns) : TIGR (tigr)
Merge with XML trace info (mxti) : Yes
Use template information (uti) : Yes
EST-assembly start step (ess) : 1
Assembly options (-AS):
Minimum read length (mrl) : 40
Number of passes (nop) : 4
Skim each pass (sep) : Yes
Maximum number of RMB break loops (rbl) : 4
Spoiler detection (sd) : Yes
Last pass only (sdlpo) : Yes
Base default quality (bdq) : Yes
Use genomic pathfinder (ugpf) : Yes
Use emergency search stop (uess) : Yes
ESS partner depth (esspd) : 500
Use emergency blacklist (uebl) : Yes
Use max. contig build time (umcbt) : No
Build time in seconds (bts) : 10000
Strain and backbone options (-SB):
Load straindata (lsd) : No
Load backbone (lb) : No
Start backbone usage in pass (sbuip): 3
Backbone strain name (bsn) : (none)
Backbone file type (bft) : FASTA file (fasta)
Backbone rail length (brl) : 2500
Backbone base quality (bbq) : 0
Also build new contigs (abnc) : Yes
Dataprocessing options (-DP):
Use read extensions (ure) : Yes
Read extension window length (rewl) : 30
Read extension w. maxerrors (rewme) : 2
First extension in pass (feip) : 0
Last extension in pass (leip) : 0
Tag poly A/T at ends (tpae) : No
Polybase window length (pbwl) : 7
Polybase window maxerrors (pbwme) : 2
Polyb. window grace distance (pbwgc): 9
Clipping options (-CL):
Possible vector leftover clip (pvc) : Yes
maximum len allowed (pvcmla) : 18
Quality clip (qc) : No
Minimum quality (qcmq) : 20
Window length (qcwl) : 30
Masked bases clip (mbc) : No
Gap size (mbcgs) : 20
Max front gap (mbcmfg) : 40
Max end gap (mbcmeg) : 60
Ensure minimum left clip (emlc) : Yes
Minimum left clip req. (mlcr) : 25
Set minimum left clip to (smlc) : 30
Parameters for SKIM algorithm (-SK):
Bases per hash (bph) : 16
Hash save stepping (hss) : 4
Percent required (pr) : 45
Maximum hashes in memory (mhim) : 15000000
Max hits per read (mhpr) : 200
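The SKIM settings above say that each hash covers 16 bases (bph) and that a hash is saved every 4 positions (hss). The sketch below is a generic illustration of that idea, not MIRA's actual SKIM code. It indexes only every `step`-th k-mer of one read and scans every position of another read. An error-free overlap of at least k + step - 1 bases (19 with these settings) always contains one sampled k-mer. Sampling can therefore shrink the index without missing such overlaps.

```python
from typing import Dict, List


def sampled_kmers(seq: str, k: int = 16, step: int = 4) -> Dict[str, List[int]]:
    """Index only the k-mers that start at multiples of `step`."""
    index: Dict[str, List[int]] = {}
    for pos in range(0, len(seq) - k + 1, step):
        index.setdefault(seq[pos:pos + k], []).append(pos)
    return index


def count_hits(indexed: str, query: str, k: int = 16, step: int = 4) -> int:
    """Scan every k-mer of `query` against the sampled index of `indexed`."""
    index = sampled_kmers(indexed, k, step)
    return sum(1 for pos in range(len(query) - k + 1) if query[pos:pos + k] in index)


read_a = "ACGT" * 10   # toy 40-base read, not data from this run
read_b = read_a[2:]    # the same sequence shifted by two bases
print(count_hits(read_a, read_b))
```

The toy read is repetitive, so it only shows the mechanics. Seven k-mers are stored for the 40-base read instead of 25, and the shifted read still finds hits.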
Align parameters for Smith-Waterman align (-AL):
Bandwidth in percent (bip) : 20
Bandwidth max (bmax) : 130
Bandwidth min (bmin) : 25
Minimum score (ms) : 30
Minimum overlap (mo) : 15
Minimum relative score in % (mrs) : 65
Extra gap penalty (egp) : Yes
extra gap penalty level (egpl) : low
Max. egp in percent (megpp) : 100
Contig parameters (-CO):
Name prefix (np) : cjejuni_demo
Error analysis (an) : SCF signal (signal)
Reject on drop in relative alignment score (%) : 25
Max. error rate in dangerous zones in % (dmer) : 1
Mark repeats (mr) : Yes
Assume SNP instead of repeats (asir) : No
Minimum reads per group needed for tagging (mrpg) : 2
Minimum neighbour quality needed for tagging (mnq) : 20
Minimum Group Quality needed for RMB Tagging (mgqrt) : 30
End-read Marking Exclusion Area in bases (emea) : 25
Also mark gap bases (amgb) : Yes
Also mark gap bases - even multicolumn (amgbemc) : Yes
Also mark gap bases - need both strands (amgbnbs): Yes
Default template insert size minimum (dismin) : 500
Default template insert size maximum (dismax) : 5000
Edit options (-ED):
Automatic contig editing (ace) : No
Strict editing mode (sem) : No
Confirmation threshold in percent (ct): 50
Directories (-DI):
When loading EXP files:
When loading SCF files:
For writing log files : cjejuni_demo_log
For writing gap4 DA res.: cjejuni_demo_out
Input files (-FI):
When loading EXP fofn : cjejuni_demo_in.fofn
When loading project from PHD : cjejuni_demo_in.phd.1
When loading project from CAF : cjejuni_demo_in.caf
When loading sequences from FASTA : cjejuni_demo_in.fasta
When loading qualities from FASTA quality: cjejuni_demo_in.fasta.qual
When loading straindata : cjejuni_demo_straindata_in.txt
When loading XML trace info files : cjejuni_demo_traceinfo_in.xml
When loading backbone from CAF : cjejuni_demo_backbone_in.caf
When loading backbone from GenBank : cjejuni_demo_backbone_in.gbf
When loading backbone from FASTA : cjejuni_demo_backbone_in.fasta
Output files (-OUTPUT/-OUT):
Result files:
Saved as CAF (orc): Yes
Saved as FASTA (orf): Yes
Saved as GAP4 (directed assembly) (org): Yes
Saved as phrap ACE (ora): Yes
Saved as HTML (orh): Yes
Saved as Transposed Contig Summary (ors): Yes
Saved as simple text format (ort): Yes
Temporary result files:
Saved as CAF (otc): No
Saved as FASTA (otf): No
Saved as GAP4 (directed assembly) (otg): No
Saved as phrap ACE (ota): No
Saved as HTML (oth): No
Saved as Transposed Contig Summary(ots): No
Saved as simple text format (ott): No
Extended temporary result files:
Saved as CAF (oetc): No
Saved as FASTA (oetf): No
Saved as GAP4 (directed assembly) (oetg): No
Saved as phrap ACE (oeta): No
Saved as HTML (oeth): No
Save also singlets (oetas): No
Alignment output customisation:
TEXT characters per line (tcpl): 60
HTML characters per line (hcpl): 60
TEXT characters per line (tegfc): ' '
HTML characters per line (hegfc): ' '
File / directory names:
CAF : cjejuni_demo_out.caf
FASTA : cjejuni_demo_out.unpadded.fasta
FASTA quality : cjejuni_demo_out.unpadded.fasta.qual
FASTA (padded) : cjejuni_demo_out.padded.fasta
FASTA qual.(pad): cjejuni_demo_out.padded.fasta.qual
GAP4 (directory): cjejuni_demo_out.gap4da
ACE : cjejuni_demo_out.ace
HTML : cjejuni_demo_out.html
Simple text : cjejuni_demo_out.txt
TCS overview : cjejuni_demo_out.tcs
Creating directory cjejuni_demo_log ... done.
Creating directory cjejuni_demo_results ... done.
Creating directory cjejuni_demo_info ... done.
Localtime: Fri Jan 15 12:00:00 2010
Loading data normal (probably Sanger type) from FASTA file cjejuni_demo_in.fasta
Counting sequences in FASTA file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Loading sequence data from FASTA file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Loading quality data from FASTA quality file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done.
There haven been 544 reads given, 544 of which have quality accounted for.
Localtime: Fri Jan 15 12:00:00 2010
Checking SCF files (loading qualities only if needed):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done.
0 SCF files loaded ok.
544 SCF files were not found (see 'cjejuni_demo_log/cjejuni_demo_info_scfreadfail.0' for a list of names).
Localtime: Fri Jan 15 12:00:00 2010
Merging data from XML trace info file cjejuni_demo_traceinfo_in.xml ...Num reads: 496
Building hash table ... done.
Localtime: Fri Jan 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Done merging XML data, matched 496 reads.
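The log reports 288 template ids for 544 reads, and 496 of those reads were matched in the XML trace info. Suppose every template (sequencing clone) carries either one or two reads. The log itself does not state this, so treat it as an assumption. Under it, the two counts fix how many templates are paired: 2 × paired + unpaired = 544 and paired + unpaired = 288. The function name below is hypothetical.

```python
from typing import Tuple


def split_templates(reads: int, templates: int) -> Tuple[int, int]:
    """Return (paired, unpaired) templates, assuming 1 or 2 reads per template."""
    paired = reads - templates      # each paired template adds one read beyond the first
    unpaired = templates - paired
    if paired < 0 or unpaired < 0:
        raise ValueError("counts are incompatible with 1-2 reads per template")
    return paired, unpaired


print(split_templates(544, 288))
```

Under that assumption, 256 templates would have both reads present and 32 would have a single read.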
Localtime: Fri Jan 15 12:00:00 2010
Checking SCF files (loading qualities only if needed):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done.
0 SCF files loaded ok.
544 SCF files were not found (see 'cjejuni_demo_log/cjejuni_demo_info_scfreadfail.0' for a list of names).
Starting minimum left vector clip ... done.
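This step enforces a minimum left clip, using -CL:emlc=yes with mlcr=25 and smlc=30 from the Clipping options above. One reading of those parameter descriptions goes like this. A read whose left clip is below 25 bases gets its left clip set to 30, and larger clips are left alone. The sketch uses a hypothetical function name and defaults that mirror this run.

```python
def ensure_min_left_clip(left_clip: int, mlcr: int = 25, smlc: int = 30) -> int:
    """Raise a left clip below `mlcr` to `smlc`; keep clips of `mlcr` or more unchanged.

    Sketch of one reading of -CL:emlc/mlcr/smlc, not MIRA's code.
    """
    return smlc if left_clip < mlcr else left_clip


for clip in (0, 24, 25, 40):
    print(clip, "->", ensure_min_left_clip(clip))
```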
Pool has 544 reads .
Checking reads for trace data:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
No SCF data present in any read, automatic contig editing is now switched off.
544 reads with valid data for assembly.
For the reads that are neither backbones nor rails:
- 0 reads have not enough good bases for assembly.
- 544 reads used for assembly.
- 0 reads have no real quality (see miralog.noqualities).
- mean length of good parts of used reads: 626
Localtime: Fri Jan 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Localtime: Fri Jan 15 12:00:00 2010
Generated 0 unique strain ids for 544 reads.
Localtime: Fri Jan 15 12:00:00 2010
Searching for possible overlaps:
Localtime: Fri Jan 15 12:00:00 2010
We will get 1 partitions.
Progressend: 1088
Now running partitioned skimmer with 1 partitions:
Working on partition 1/1
Will contain read IDs 0 to 543
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Total megahubs: 0
Skim summary:
accepted: 4243
possible: 4607
permbans: 0
Hits chosen: 4243
Localtime: Fri Jan 15 12:00:00 2010
Pre-assembly alignment search for read extension and / or vector clipping:
Making alignments.
Localtime: Fri Jan 15 12:00:00 2010
Aligning possible forward matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Aligning possible complement matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Calculating possible vector leftovers ... done.
Loading confirmed overlaps from disk (will need approximately 1.2 M.):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Sorting confirmed overlaps (this may take a while) ... done.
Generating clusters:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Pre-assembly read extension:
Localtime: Fri Jan 15 12:00:00 2010
Searching possible read extensions:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Changed length of 258 sequences.
Mean length gained in these sequences: 73.2713 bases.
Pre-assembly vector clipping
Performing vector clipping ... done.
Pool has 544 reads .
Checking reads for trace data:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
No SCF data present in any read, automatic contig editing is now switched off.
544 reads with valid data for assembly.
For the reads that are neither backbones nor rails:
- 0 reads have not enough good bases for assembly.
- 544 reads used for assembly.
- 0 reads have no real quality (see miralog.noqualities).
- mean length of good parts of used reads: 660
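Two numbers in this stretch of the log can be cross-checked. Read extension changed 258 reads by 73.2713 bases on average. Over the same stretch, the mean length of the good parts of the 544 used reads went from 626 to 660. The back-of-the-envelope calculation below ignores the vector clipping performed in between. It also ignores the integer rounding of the printed means.

```python
from typing import Tuple


def extension_gain(changed: int, mean_gain: float, pool: int) -> Tuple[float, float]:
    """Total bases added by read extension, and that total spread over the pool."""
    total = changed * mean_gain
    return total, total / pool


total, per_read = extension_gain(258, 73.2713, 544)
print(f"{total:.0f} bases in total, {per_read:.1f} per used read")
```

This gives about 18,904 extra bases, roughly 34.7 per read. That fits the observed rise of 34 in the printed mean.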
Localtime: Fri Jan 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Localtime: Fri Jan 15 12:00:00 2010
Generated 0 unique strain ids for 544 reads.
Localtime: Fri Jan 15 12:00:00 2010
Searching for possible overlaps:
Localtime: Fri Jan 15 12:00:00 2010
We will get 1 partitions.
Progressend: 1088
Now running partitioned skimmer with 1 partitions:
Working on partition 1/1
Will contain read IDs 0 to 543
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Total megahubs: 0
Skim summary:
accepted: 4512
possible: 4913
permbans: 0
Hits chosen: 4512
Localtime: Fri Jan 15 12:00:00 2010
Pass: 1
Making alignments.
Localtime: Fri Jan 15 12:00:00 2010
Aligning possible forward matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Aligning possible complement matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Calculating possible vector leftovers ... done.
Loading confirmed overlaps from disk (will need approximately 1.3 M.):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Sorting confirmed overlaps (this may take a while) ... done.
Generating clusters:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 1
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 544
+[1] t+t++++a+aaaaar
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 1
Contig length: 2467
Avg. contig coverage: 2.36
Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0
IUPAC: 7 Funny: 0 *: 20
Num reads: 7
Avg. read length: 833
Reads contain 5780 bases, 0 Ns and 55 gaps.
-------------------------------------------------
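Each contig statistics block can be checked for internal consistency. In this run, the consensus composition sums to the contig length once IUPAC codes and gap columns (*) are included: 701 + 457 + 592 + 690 + 0 + 7 + 0 + 20 = 2467. The average read length also matches (read bases + read gaps) / reads, rounded down: (5780 + 55) / 7 = 833.57, printed as 833. These relations are inferred from the numbers in this log, not taken from MIRA documentation. The hypothetical parser below extracts the integer fields and tests both relations.

```python
import re
from typing import Dict

PAIR = re.compile(r"(IUPAC|Funny|[ACGTN*]):\s*(\d+)")
READS = re.compile(r"Reads contain (\d+) bases, (\d+) Ns and (\d+) gaps")


def parse_contig_stats(block: str) -> Dict[str, int]:
    """Pull the integer fields out of one 'Contig statistics' block."""
    stats: Dict[str, int] = {}
    for raw in block.splitlines():
        line = raw.strip()
        if line.startswith(("Consensus contains:", "IUPAC:")):
            body = line.replace("Consensus contains:", "")
            for key, value in PAIR.findall(body):
                stats["cons_" + key] = int(value)
        elif line.startswith("Contig length:"):
            stats["length"] = int(line.split(":")[1])
        elif line.startswith("Num reads:"):
            stats["reads"] = int(line.split(":")[1])
        elif line.startswith("Avg. read length:"):
            stats["avg_read_len"] = int(line.split(":")[1])
        else:
            m = READS.match(line)
            if m:
                stats["bases"], stats["ns"], stats["gaps"] = map(int, m.groups())
    return stats


def looks_consistent(s: Dict[str, int]) -> bool:
    """Check the two relations observed in this log (an inference, not a spec)."""
    composition = sum(v for k, v in s.items() if k.startswith("cons_"))
    avg = (s["bases"] + s["gaps"]) // s["reads"]
    return composition == s["length"] and avg == s["avg_read_len"]
```

Pasting a block such as the one for contig 1 into `parse_contig_stats` and passing the result to `looks_consistent` returns `True`. The same holds for the other contigs shown here.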
Localtime: Fri Jan 15 12:00:00 2010
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 2
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 537
+[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[61] ++++++++++++++++++++++++++++++++++++++++++++++++++++++t+++++
[120] ++++++++++++++++++++++++++++++++++++++++++a+++a+++++++++++++
[178] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[238] ++++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++
[296] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[356] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[416] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[476] +++++a++++++++a+a++++++++++++++++a++++++++++++++++++++
RL1
[526] aaaThat's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 2
Contig length: 40028
Avg. contig coverage: 8.66
Consensus contains: A: 13590 C: 5845 G: 6941 T: 13404 N: 0
IUPAC: 24 Funny: 0 *: 224
Num reads: 526
Avg. read length: 659
Reads contain 343983 bases, 0 Ns and 2661 gaps.
-------------------------------------------------
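The progress trace printed while a contig grows appears to follow a simple pattern. The trace for contig 1 contains seven '+' characters, and that contig has 7 reads. For contig 2, the last bracketed counter is [526], and the statistics report 526 reads. The sketch below makes two assumptions, both inferred from this log. First, '+' marks a read accepted into the contig. Second, each [n] is the running count of '+' so far. It does not interpret the letters a, t, p and r. Under those assumptions, a short checker can verify a trace.

```python
import re
from typing import List, Tuple

TOKEN = re.compile(r"\[(\d+)\]|\+")


def check_progress(trace: str) -> Tuple[int, List[Tuple[int, int]]]:
    """Return the number of '+' marks and any [n] counters that disagree with it.

    Each mismatch is (printed counter, '+' count seen so far).
    """
    plus = 0
    mismatches: List[Tuple[int, int]] = []
    for m in TOKEN.finditer(trace):
        if m.group(1) is None:
            plus += 1          # a '+' mark
        elif int(m.group(1)) != plus:
            mismatches.append((int(m.group(1)), plus))
    return plus, mismatches


print(check_progress("+[1] t+t++++a+aaaaar\nRL1"))
```

For the contig 1 trace this reports 7 marks and no mismatching counters, which agrees with "Num reads: 7".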
Localtime: Fri Jan 15 12:00:00 2010
Marking possibly misassembled repeats ...done. Found
- 1 Strong RMB
- 3 Weak RMB
- 0 SNP
positions tagged.Transfering contig RMB permanent pair bans.
Transfering tags to readpool.
The previously assembled contig had grave misassemblies, rebuilding contig 2 now.
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 537
+[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[61] ++++++++++++++++++++++++++++++++++++++++++++++++++++++t+++++
[120] ++++++++++++++++++++++++++++++++++++++++++a+++a+++++++++++++
[178] +++++++++++++++++++++++++++++++++++p+++p++++++++++++++++++++
[236] +++++++++a+++++a++++++++++++++++++++++++++++++++++++++++++++
[294] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[354] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[414] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[474] +++++++++a++++p+a+p+++++++++a+++++a+++++++++++++++++++++
RL1
[524] aaapThat's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 2
Contig length: 40021
Avg. contig coverage: 8.62
Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0
IUPAC: 14 Funny: 0 *: 217
Num reads: 524
Avg. read length: 658
Reads contain 342555 bases, 0 Ns and 2577 gaps.
-------------------------------------------------
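Comparing the two statistics blocks for contig 2 shows what changed when the contig was rebuilt after the strong RMB. The values below are copied from the log, and the sketch only computes the differences.

```python
from typing import Dict

FIRST_BUILD: Dict[str, int] = {
    "length": 40028, "reads": 526, "A": 13590, "C": 5845, "G": 6941,
    "T": 13404, "IUPAC": 24, "star": 224, "bases": 343983, "gaps": 2661,
}
REBUILT: Dict[str, int] = {
    "length": 40021, "reads": 524, "A": 13590, "C": 5845, "G": 6951,
    "T": 13404, "IUPAC": 14, "star": 217, "bases": 342555, "gaps": 2577,
}


def changed_fields(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return after - before for every field whose value changed."""
    return {key: after[key] - before[key] for key in before if after[key] != before[key]}


print(changed_fields(FIRST_BUILD, REBUILT))
```

The rebuilt contig has 2 fewer reads and 7 fewer consensus positions, matching the drop of 7 in gap columns (*). Its IUPAC count fell by 10 while its G count rose by 10.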
Localtime: Fri Jan 15 12:00:00 2010
Marking possibly misassembled repeats ...done. Found
- 0 Strong RMB
- 3 Weak RMB
- 0 SNP
positions tagged.Transfering contig RMB permanent pair bans.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 3
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 13
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 3
Contig length: 805
Avg. contig coverage: 1
Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 805
Reads contain 805 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 4
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 12
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 4
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 5
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 11
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 5
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 6
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 10
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 6
Contig length: 786
Avg. contig coverage: 1
Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 786
Reads contain 786 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 7
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 9
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 7
Contig length: 865
Avg. contig coverage: 1
Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 865
Reads contain 865 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 8
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 8
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 8
Contig length: 963
Avg. contig coverage: 1
Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 963
Reads contain 963 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 9
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 7
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 9
Contig length: 1052
Avg. contig coverage: 1
Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 1052
Reads contain 1052 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 10
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 6
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 10
Contig length: 563
Avg. contig coverage: 1
Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 563
Reads contain 563 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 11
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 5
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 11
Contig length: 893
Avg. contig coverage: 1
Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 893
Reads contain 893 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 12
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 4
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 12
Contig length: 478
Avg. contig coverage: 1
Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 478
Reads contain 478 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 13
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 3
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 13
Contig length: 869
Avg. contig coverage: 1
Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 869
Reads contain 869 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 14
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 2
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 14
Contig length: 973
Avg. contig coverage: 1
Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 973
Reads contain 973 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 15
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 1
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 15
Contig length: 972
Avg. contig coverage: 1
Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 972
Reads contain 972 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.1.txt
Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.1.txt
Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.1.txt
Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.1.txt
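At the end of pass 1, every read in the pool should sit in exactly one contig, and the 'Unused reads' lines allow a quick check. There are 544 unused reads before contig 1, which takes 7. There are 537 before contig 2, which takes 524 after the rebuild. The count then runs from 13 down to 1 for the thirteen single-read contigs 3 to 15. The sketch below uses a hypothetical function name to walk those counts.

```python
from typing import Sequence


def check_read_accounting(pool: int, unused: Sequence[int], used: Sequence[int]) -> bool:
    """Check that each contig consumes exactly the reads the log says it does.

    unused[i] is the 'Unused reads' value printed before contig i+1 and
    used[i] is that contig's 'Num reads'. True means the counts add up to
    every read ending in exactly one contig.
    """
    remaining = pool
    for before, n in zip(unused, used):
        if before != remaining:
            return False
        remaining -= n
    return remaining == 0


unused = [544, 537] + list(range(13, 0, -1))  # contigs 1, 2, then 3..15
used = [7, 524] + [1] * 13                    # rebuilt contig 2 has 524 reads
print(check_read_accounting(544, unused, used))
```

If you use the 526 reads of the first, discarded build of contig 2 instead, the function returns `False`. That is because 537 - 526 = 11 does not match the 13 unused reads printed before contig 3.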
Pass: 2
Performing vector clipping ... done.
Pool has 544 reads .
Checking reads for trace data:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
No SCF data present in any read, automatic contig editing is now switched off.
544 reads with valid data for assembly.
For the reads that are neither backbones nor rails:
- 0 reads have not enough good bases for assembly.
- 544 reads used for assembly.
- 0 reads have no real quality (see miralog.noqualities).
- mean length of good parts of used reads: 660
Localtime: Fri Jan 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Localtime: Fri Jan 15 12:00:00 2010
Generated 0 unique strain ids for 544 reads.
Localtime: Fri Jan 15 12:00:00 2010
Searching for possible overlaps:
Localtime: Fri Jan 15 12:00:00 2010
We will get 1 partitions.
Progressend: 1088
Now running partitioned skimmer with 1 partitions:
Working on partition 1/1
Will contain read IDs 0 to 543
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Total megahubs: 0
Skim summary:
accepted: 4512
possible: 4913
permbans: 0
Hits chosen: 4512
Localtime: Fri Jan 15 12:00:00 2010
Making alignments.
Localtime: Fri Jan 15 12:00:00 2010
Aligning possible forward matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Aligning possible complement matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Calculating possible vector leftovers ... done.
Loading confirmed overlaps from disk (will need approximately 1.3 M.):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Sorting confirmed overlaps (this may take a while) ... done.
Generating clusters:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 1
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 544
+[1] t+t++++a+aaaaar
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 1
Contig length: 2467
Avg. contig coverage: 2.36
Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0
IUPAC: 7 Funny: 0 *: 20
Num reads: 7
Avg. read length: 833
Reads contain 5780 bases, 0 Ns and 55 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 2
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 537
+[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++
[120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++
[176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++
[234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++
[292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++
RL1
[524] aapaThat's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 2
Contig length: 40021
Avg. contig coverage: 8.62
Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0
IUPAC: 14 Funny: 0 *: 217
Num reads: 524
Avg. read length: 658
Reads contain 342548 bases, 0 Ns and 2577 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Marking possibly misassembled repeats ...done. Found
- 0 Strong RMB
- 3 Weak RMB
- 0 SNP
positions tagged.Transfering contig RMB permanent pair bans.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 3
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 13
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 3
Contig length: 805
Avg. contig coverage: 1
Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 805
Reads contain 805 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 4
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 12
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 4
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 5
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 11
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 5
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 6
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 10
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 6
Contig length: 786
Avg. contig coverage: 1
Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 786
Reads contain 786 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 7
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 9
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 7
Contig length: 865
Avg. contig coverage: 1
Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 865
Reads contain 865 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 8
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 8
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 8
Contig length: 963
Avg. contig coverage: 1
Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 963
Reads contain 963 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 9
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 7
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 9
Contig length: 1052
Avg. contig coverage: 1
Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 1052
Reads contain 1052 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 10
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 6
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 10
Contig length: 563
Avg. contig coverage: 1
Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 563
Reads contain 563 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 11
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 5
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 11
Contig length: 893
Avg. contig coverage: 1
Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 893
Reads contain 893 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 12
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 4
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 12
Contig length: 478
Avg. contig coverage: 1
Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 478
Reads contain 478 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 13
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 3
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 13
Contig length: 869
Avg. contig coverage: 1
Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 869
Reads contain 869 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 14
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 2
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 14
Contig length: 973
Avg. contig coverage: 1
Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 973
Reads contain 973 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 15
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 1
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 15
Contig length: 972
Avg. contig coverage: 1
Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 972
Reads contain 972 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.2.txt
Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.2.txt
Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.2.txt
Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.2.txt
Pass: 3
Performing vector clipping ... done.
Pool has 544 reads .
Checking reads for trace data:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
No SCF data present in any read, automatic contig editing is now switched off.
544 reads with valid data for assembly.
For the reads that are neither backbones nor rails:
- 0 reads have not enough good bases for assembly.
- 544 reads used for assembly.
- 0 reads have no real quality (see miralog.noqualities).
- mean length of good parts of used reads: 660
Localtime: Fri Jan 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Localtime: Fri Jan 15 12:00:00 2010
Generated 0 unique strain ids for 544 reads.
Localtime: Fri Jan 15 12:00:00 2010
Searching for possible overlaps:
Localtime: Fri Jan 15 12:00:00 2010
We will get 1 partitions.
Progressend: 1088
Now running partitioned skimmer with 1 partitions:
Working on partition 1/1
Will contain read IDs 0 to 543
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Total megahubs: 0
Skim summary:
accepted: 4498
possible: 4913
permbans: 14
Hits chosen: 4498
Localtime: Fri Jan 15 12:00:00 2010
Making alignments.
Localtime: Fri Jan 15 12:00:00 2010
Aligning possible forward matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Aligning possible complement matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Calculating possible vector leftovers ... done.
Loading confirmed overlaps from disk (will need approximately 1.3 M.):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Sorting confirmed overlaps (this may take a while) ... done.
Generating clusters:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 1
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 544
+[1] t+t++++a+aaaaar
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 1
Contig length: 2467
Avg. contig coverage: 2.36
Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0
IUPAC: 7 Funny: 0 *: 20
Num reads: 7
Avg. read length: 833
Reads contain 5780 bases, 0 Ns and 55 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 2
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 537
+[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++
[120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++
[176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++
[234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++
[292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++
RL1
[524] aapaThat's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 2
Contig length: 40021
Avg. contig coverage: 8.62
Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0
IUPAC: 14 Funny: 0 *: 217
Num reads: 524
Avg. read length: 658
Reads contain 342548 bases, 0 Ns and 2577 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Marking possibly misassembled repeats ...done. Found
- 0 Strong RMB
- 3 Weak RMB
- 0 SNP
positions tagged.Transfering contig RMB permanent pair bans.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 3
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 13
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 3
Contig length: 805
Avg. contig coverage: 1
Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 805
Reads contain 805 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 4
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 12
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 4
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 5
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 11
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 5
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 6
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 10
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 6
Contig length: 786
Avg. contig coverage: 1
Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 786
Reads contain 786 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 7
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 9
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 7
Contig length: 865
Avg. contig coverage: 1
Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 865
Reads contain 865 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 8
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 8
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 8
Contig length: 963
Avg. contig coverage: 1
Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 963
Reads contain 963 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 9
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 7
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 9
Contig length: 1052
Avg. contig coverage: 1
Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 1052
Reads contain 1052 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 10
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 6
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 10
Contig length: 563
Avg. contig coverage: 1
Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 563
Reads contain 563 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 11
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 5
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 11
Contig length: 893
Avg. contig coverage: 1
Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 893
Reads contain 893 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 12
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 4
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 12
Contig length: 478
Avg. contig coverage: 1
Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 478
Reads contain 478 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 13
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 3
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 13
Contig length: 869
Avg. contig coverage: 1
Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 869
Reads contain 869 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 14
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 2
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 14
Contig length: 973
Avg. contig coverage: 1
Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 973
Reads contain 973 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 15
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 1
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 15
Contig length: 972
Avg. contig coverage: 1
Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 972
Reads contain 972 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.3.txt
Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.3.txt
Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.3.txt
Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.3.txt
Localtime: Fri Jan 15 12:00:00 2010
Hunting contig join spoiler ... done.
Localtime: Fri Jan 15 12:00:00 2010
Pass: 4
Performing vector clipping ... done.
Pool has 544 reads .
Checking reads for trace data:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
No SCF data present in any read, automatic contig editing is now switched off.
544 reads with valid data for assembly.
For the reads that are neither backbones nor rails:
- 0 reads have not enough good bases for assembly.
- 544 reads used for assembly.
- 0 reads have no real quality (see miralog.noqualities).
- mean length of good parts of used reads: 660
Localtime: Fri Jan 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Localtime: Fri Jan 15 12:00:00 2010
Generated 0 unique strain ids for 544 reads.
Localtime: Fri Jan 15 12:00:00 2010
Searching for possible overlaps:
Localtime: Fri Jan 15 12:00:00 2010
We will get 1 partitions.
Progressend: 1088
Now running partitioned skimmer with 1 partitions:
Working on partition 1/1
Will contain read IDs 0 to 543
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Total megahubs: 0
Skim summary:
accepted: 4498
possible: 4913
permbans: 14
Hits chosen: 4498
Localtime: Fri Jan 15 12:00:00 2010
Making alignments.
Localtime: Fri Jan 15 12:00:00 2010
Aligning possible forward matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Aligning possible complement matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Calculating possible vector leftovers ... done.
Loading confirmed overlaps from disk (will need approximately 1.3 M.):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Sorting confirmed overlaps (this may take a while) ... done.
Generating clusters:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 1
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 544
+[1] t+t++++a+aaaaar
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 1
Contig length: 2467
Avg. contig coverage: 2.36
Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0
IUPAC: 7 Funny: 0 *: 20
Num reads: 7
Avg. read length: 833
Reads contain 5780 bases, 0 Ns and 55 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 2
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 537
+[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++
[120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++
[176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++
[234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++
[292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++
RL1
[524] aapaThat's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 2
Contig length: 40021
Avg. contig coverage: 8.62
Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0
IUPAC: 14 Funny: 0 *: 217
Num reads: 524
Avg. read length: 658
Reads contain 342548 bases, 0 Ns and 2577 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Marking possibly misassembled repeats ...done. Found
- 0 Strong RMB
- 3 Weak RMB
- 0 SNP
positions tagged.Transfering contig RMB permanent pair bans.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 3
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 13
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 3
Contig length: 805
Avg. contig coverage: 1
Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 805
Reads contain 805 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 4
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 12
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 4
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 5
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 11
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 5
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 6
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 10
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 6
Contig length: 786
Avg. contig coverage: 1
Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 786
Reads contain 786 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 7
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 9
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 7
Contig length: 865
Avg. contig coverage: 1
Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 865
Reads contain 865 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 8
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 8
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 8
Contig length: 963
Avg. contig coverage: 1
Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 963
Reads contain 963 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 9
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 7
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 9
Contig length: 1052
Avg. contig coverage: 1
Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 1052
Reads contain 1052 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 10
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 6
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 10
Contig length: 563
Avg. contig coverage: 1
Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 563
Reads contain 563 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 11
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 5
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 11
Contig length: 893
Avg. contig coverage: 1
Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 893
Reads contain 893 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 12
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 4
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 12
Contig length: 478
Avg. contig coverage: 1
Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 478
Reads contain 478 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 13
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 3
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 13
Contig length: 869
Avg. contig coverage: 1
Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 869
Reads contain 869 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 14
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 2
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 14
Contig length: 973
Avg. contig coverage: 1
Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 973
Reads contain 973 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Building new contig 15
Localtime: Fri Jan 15 12:00:00 2010
Unused reads: 1
+
RL1
That's it for this contig.
Finished building the contig.
Localtime: Fri Jan 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 15
Contig length: 972
Avg. contig coverage: 1
Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0
IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 972
Reads contain 972 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Fri Jan 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Fri Jan 15 12:00:00 2010
Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.4.txt
Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.4.txt
Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.4.txt
Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.4.txt
Assembly finished, saving final results.
Localtime: Fri Jan 15 12:00:00 2010
Saving project statistics to file: cjejuni_demo_info/cjejuni_demo_info_contigstats.txt
Localtime: Fri Jan 15 12:00:00 2010
Saving read tag list to file: cjejuni_demo_info/cjejuni_demo_info_readtaglist.txt
Localtime: Fri Jan 15 12:00:00 2010
Saving contig tag list to file: cjejuni_demo_info/cjejuni_demo_info_consensustaglist.txt
Localtime: Fri Jan 15 12:00:00 2010
Saving project contig<->read list to file: cjejuni_demo_info/cjejuni_demo_info_contigreadlist.txt
Localtime: Fri Jan 15 12:00:00 2010
Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.caf
Localtime: Fri Jan 15 12:00:00 2010
Saving contigs to directory: cjejuni_demo_results/cjejuni_demo_out.gap4da
(first deleting old directory)
(now creating new directory)
(saving contigs)
Done.
Localtime: Fri Jan 15 12:00:00 2010
Saving contigs to FASTA file: cjejuni_demo_results/cjejuni_demo_out.unpadded.fasta
Saving padded contigs to FASTA file: cjejuni_demo_results/cjejuni_demo_out.padded.fasta
Saving contig qualities to FASTA quality file: cjejuni_demo_results/cjejuni_demo_out.unpadded.fasta.qual
Saving padded contig qualities to FASTA quality file: cjejuni_demo_results/cjejuni_demo_out.padded.fasta.qual
Localtime: Fri Jan 15 12:00:00 2010
Saving contigs TCS to file: cjejuni_demo_results/cjejuni_demo_out.tcs
Localtime: Fri Jan 15 12:00:00 2010
Saving SNP analysis to file: cjejuni_demo_info/cjejuni_demo_info_snpanalysis.txt
Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.txt
Localtime: Fri Jan 15 12:00:00 2010
Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.ace
Localtime: Fri Jan 15 12:00:00 2010
Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.html
Localtime: Fri Jan 15 12:00:00 2010
End of assembly process, thank you for using MIRA.
MIRA fragment assembly program
Version: EMBOSS:6.2.0
Standard (Mandatory) qualifiers:
-project string [mira] Default is mira. Defines the project
name for this assembly. The project name
automatically influences the name of input
and output files or directories. E.g. in the
default setting, the file names for the
output of the assembly in FASTA format would
be mira_out.fasta and mira_out.fasta.qual.
Setting the project name to 'MyProject'
would generate MyProject_out.fasta and
MyProject_out.fasta.qual. (Any string)
Additional (Optional) qualifiers: (none)
Advanced (Unprompted) qualifiers:
-paramsfile infile Loads parameters from the filename given.
Allows a maximum of 10 levels of recursion,
i.e. a parameter file may itself contain a
-params option that loads other parameter
files, nested up to 10 levels deep.
-setparam menu [unspecified] Sets parameters suited for
loading sequences from FASTA, PHD or CAF
files. The default is not to specify the
type of input file. (Values: unspecified
(Unspecified); fasta (Fasta); phd (PHD); caf
(CAF))
-expdir directory [.] Defines the directory where mira should
search for experiment files (EXP).
-scfdir directory [.] Defines the directory where mira should
search for SCF files.
-feifile infile [mira_in.fofn] Defines the file of filenames
where the names of the EXP files of a
project are located.
-fpifile infile [mira_in.fofn] Defines the file of filenames
where the names of the PHD files of a
project are located.
-pifile infile [mira_in.phd] Defines the PHD file to load
sequences of a project from.
-faifile infile [mira_in.fasta] Defines the FASTA file to
load sequences of a project from.
-fqifile infile [mira_in.fasta.qual] Defines the fasta file
to load base qualities of a project from.
The order of reads in the quality file does
not need to be the same as in the FASTA or
fofn project, although it saves a bit of
time if it is.
-cifile infile [mira_in.caf] Defines the file to load a CAF
project from. Filename must end with
'.caf'.
-sdifile infile [mira_straindata_in.txt] Defines the file to
load straindata from. Only used in EST
projects (miraEST).
-xtiifile infile [mira_xmltraceinfo_in.xml] Defines the file
to load a trace info file in XML format
from. This can be used both when merging XML
data into loaded files and when loading a
project from an XML trace info file.
-genome menu [normal] Quality grades of de-novo genome
assembly. Draft is quick-and-dirty, suited
for a first look at the approximate coverage
of a running project. Should not be used for
anything else. Normal is the default
parameter set of mira that is able to tackle
most genomes. A bit slower than the draft
version, but includes such options as read
extension and vector remnant clipping.
Accurate is still slower than the normal
mode but should be used for genomes that
pose a problem to the normal mode. (Values:
draft (Draft); normal (Normal); accurate
(Accurate))
-mapping menu [normal] Works like the -genome switches
except they are to be used when performing
mapping assemblies against given backbone
sequences. (Values: draft (Draft); normal
(Normal); accurate (Accurate))
-clipping menu [medium] Three clipping grade modifiers,
from light clipping when working with well
preprocessed sequences to heavy clipping
when the sequences that are being assembled
had only sloppy or no preprocessing. Note 1
- the light version is already included in
the -genome and -mapping switches. Note 2 -
it is recommended that you perform a
thorough preprocessing (clipping sequencing
vector stretches, clipping of low quality
bases, tagging standard repeats etc.) before
assembling sequences. The clipping routines
of mira are more optimised to cope with the
last remnants of wrongly preprocessed
sequences than with sequences having had no
pre-processing at all. (Values: light
(Light); medium (Medium); heavy (Heavy))
-highlyrepetitive boolean [N] A modifier switch for genome data that
is deemed to be highly repetitive. The
assemblies will run slower due to more
iterative cycles that give mira a chance to
resolve nasty repeats.
-highqualitydata boolean [N] A modifier switch when the sequences
that are used are of exceptional quality.
mira will then bump up a few quality
parameters, which should lead to fewer false
positives in the repeat and SNP detection
routines.
-estmode boolean [N] Switches mira to a good initial preset
for assembling EST data. Note that this is
not needed (and even counterproductive) when
used with miraEST.
-horrid boolean [N] Sets a number of parameters useful when
dealing with really horrid data sets. Useful
means that parameters are chosen so that
time and memory consumption do not explode
beyond all hope of the program returning.
Note that MIRA will in most cases return
useful assemblies with this switch, but
these might not be as optimised as with
normal operation. The definition of 'horrid'
is a bit flexible, for example (a) a
genomic project with more than 2,000 reads
that all seem to align partly to each other
but have different repetitive structures, or
(b) EST clusters with a few thousand nearly
identical reads.
-borg boolean [N] Sets several parameters to have mira try
to assemble as many reads as possible. Will
probably slow down the assembly process and
use more memory. 'We are MIRA of borg. You
will be assembled, resistance is futile!'
-lj menu [fofnexp] Defines whether to load and
assemble EXP files from a file of filenames
('mira_in.fofn'), load and assemble FASTA
sequences ('mira_in.fasta') and their
qualities ('mira_in.fasta.qual'), load and
assemble sequences or qualities from a phd
file ('mira_in.phd') or to load a project
from a CAF file ('mira_in.caf') and assemble
or possibly reassemble it. N.B. fofnphd
is not currently available. (Values: fofnexp
(EXP files from a file of filenames); fasta
(Load and assemble FASTA); caf (Load and
assemble CAF); phd (Load and assemble PHD);
fofnphd (PHD files from a file of
filenames))
-fo boolean [N] If set to 'Y', the project will not be
assembled and no assembly output files will
be produced. Instead, the project files will
only be loaded. This switch is useful for
checking consistency of input files.
-mxti boolean [N] Some file formats above (FASTA, PHD or
even CAF and EXP) possibly don't contain all
the info necessary or useful for each read
of an assembly. Should additional
information, such as clipping positions
etc., be available in an XML trace info file
in NCBI format (see File formats), then set
this option to 'Y' and it will be merged into
the loaded data. Please note, quality
clippings given here will override quality
clippings loaded earlier or performed by
mira. Minimum clippings will still be made
by the program, though.
-rns menu [sanger] Defines the centre naming scheme
for read suffixes. Currently, only Sanger
Institute and TIGR naming schemes are
supported out of the box. How to choose?
Please read the documentation available at
the different centres or ask your sequence
provider. In a nutshell, the Sanger scheme
is
'somename.[pqsfrw][12][bckdeflmnpt][a|b|c|...'
(e.g. U13a08f10.p1ca), TIGR scheme is
'somenameTF*|TR*|TA*' (e.g. GCPBN02TF or
GCPDL68TABRPT103A58B). (Values: sanger
(Sanger); tigr (TIGR))
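To make the two patterns above concrete, here is a rough Python sketch that guesses which scheme a read name follows. The regular expressions are simplified assumptions built only from the patterns quoted in the -rns description (the Sanger prefix `.[pqsfrw][12][bckdeflmnpt]` and a TIGR `TF`/`TR`/`TA` marker); the remaining Sanger suffix characters are not checked, and this is not MIRA's own name parser.

```python
import re

# Simplified, hypothetical patterns derived from the -rns description.
# Sanger: name, a dot, one of pqsfrw, 1 or 2, then one of bckdeflmnpt;
# any further suffix characters are deliberately not checked here.
SANGER = re.compile(r"^[^.]+\.[pqsfrw][12][bckdeflmnpt]")
# TIGR: a non-empty name followed by TF, TR or TA, then anything.
TIGR = re.compile(r"^.+?T[FRA]")


def guess_naming_scheme(read_name):
    """Return 'sanger', 'tigr' or 'unknown' for a read name (sketch only)."""
    if SANGER.match(read_name):
        return "sanger"
    if TIGR.match(read_name):
        return "tigr"
    return "unknown"


for name in ["U13a08f10.p1ca", "GCPBN02TF", "GCPDL68TABRPT103A58B", "read_1"]:
    print(name, guess_naming_scheme(name))
```

The Sanger check runs first because it is the stricter pattern; a name that happens to contain an uppercase `TF` somewhere would otherwise be misread as TIGR.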
-eq menu [SCF] Defines the source format for reading
qualities from external sources. Normally
takes effect only when these are not present
in the format of the load_job project (EXP
and FASTA can have them, CAF and PHD must
have them). (Values: none (None); SCF (SCF))
-eqo boolean [N] Only takes effect when 'lj' is fofnexp.
Defines whether or not the qualities from
the external source override the possibly
loaded qualities from the load job project.
This might be of use in case some
post-processing software fiddles around with
the quality values of the input file but
one wants to have the original ones.
-[no]droeqe boolean [Y] Should there be a major mismatch between
the external quality source and the
sequence (e.g. the base sequence read from a
SCF file does not match the originally read
base sequence), should the read be excluded
from assembly or not. If not, it will use
the qualities it had before trying to load
the external qualities (either default
qualities or the ones loaded from the
original source).
-[no]uti boolean [Y] Two reads sequenced from the same clone
template form a read pair with a known
minimum and maximum distance. This feature
will definitely help with contigs
containing lots of repeats. Set this to 'Y'
if your data contains information on insert
sizes. Information on insert sizes can be
given via the SI tag in EXP files (for each
read pair individually), or for the whole
project using dismin and dismax.
-ess integer [1] Controls the starting step of the EST
assembly and is therefore only useful in
miraEST. EST assembly is a three step
process, each with different settings to the
assembly engine, with the result of each
step being saved to disk. If results of
previous steps are present in a directory,
one can easily 'play around' with different
settings for subsequent steps by reusing the
results of the previous steps and directly
starting with step two or three. (Integer
from 1 to 4)
-[no]ps boolean [Y] Controls whether date and time are
printed out during the assembly. Suppressing
it isn't useful in normal operation, only
when debugging or benchmarking.
-lsd boolean [N] Defines whether to load straindata (see
-sdifile). Straindata is a key-value file,
one read per line: first the name of the
read, then the strain name of the organism
the read comes from. It is used by the
program to distinguish and classify the
different types of SNPs appearing in
organisms.
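A minimal Python sketch of reading such a straindata file into a read-to-strain mapping. It assumes, for illustration only, that the read name and strain name are separated by whitespace and that everything after the first run of whitespace is the strain name; blank or one-field lines are simply skipped here.

```python
def parse_straindata(text):
    """Map read name -> strain name from 'readname strainname' lines (sketch)."""
    strains = {}
    for line in text.splitlines():
        fields = line.split(None, 1)  # split on the first run of whitespace only
        if len(fields) < 2:
            continue  # blank or malformed line: ignored in this sketch
        read_name, strain = fields
        strains[read_name] = strain.strip()
    return strains


example = """U13a08f10.p1ca  strainA
U13a08f10.q1ca  strainB
"""
print(parse_straindata(example))
```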
-lb boolean [N] A backbone is a sequence (or a previous
assembly) that is used as a template for the
current assembly. The current assembly
process will first assemble reads to loaded
backbone contigs before creating new
contigs. This feature is helpful for
assembling against previous (and already
possibly edited) assembly iterations, or to
make a comparative assembly of two very
closely related organisms. Please read 'very
closely related' as in 'only SNP mutations
or short indels present'.
-sbuip integer [3] When assembling against backbones, this
parameter defines the pass iteration (see
nop) from which onwards the backbones are
actually used. In the passes preceding this
number, the non-backbone reads will be
assembled together as if no backbones
existed. This allows mira to correctly spot
repetitive stretches that differ by single
bases and tag them accordingly. Rule of
thumb - if backbones belong to the same
strain as the reads to assemble, set to 1.
If backbones are a different strain, then
set sbuip to 1 lower than nop (example - nop
4 and sbuip 3). (Integer 1 or more)
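The rule of thumb above fits in a few lines of Python. This is a hypothetical helper, not part of emira or MIRA; clamping the result to at least 1 (for nop 1) is an assumption that follows from the stated minimum of the parameter.

```python
def suggest_sbuip(nop, same_strain):
    """Suggest a value for -sbuip following the rule of thumb (sketch)."""
    if nop < 1:
        raise ValueError("nop must be at least 1")
    if same_strain:
        return 1  # backbones from the same strain: use them from the first pass
    # Different strain: one lower than nop, but never below the minimum of 1.
    return max(1, nop - 1)


print(suggest_sbuip(4, same_strain=False))
```

For a different-strain backbone and four passes this gives 3, matching the example of nop 4 and sbuip 3.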
-bsn string Defines the name of the strain that the
backbone sequences have. (Any string)
-bft menu [fasta] Defines the filetype of the backbone
file given. Currently (2.8.1) only FASTA,
CAF and GBF files are supported. When GBF
(GenBank files, also named .gbk) files are
loaded, the features within these files are
automatically transformed into
Staden-compatible tags and get passed
through the assembly. (Values: fasta
(Fasta); caf (CAF); gbf (GenBank))
-brl integer [2500] Parameter for the internal sectioning
size of the backbone. Extremely repetitive
sequences may require reducing the default
value, but the default value should work
well in 99.9% of all cases. (Integer from
1000 to 3000)
-bbq integer [-1] Defines the default quality that the
backbone sequences have if they came without
quality values in their files (like in GBF
format or when FASTA is used without .qual
files). A value of -1 causes mira to use the
same default quality for backbones as for
reads. (Integer from -1 to 100)
-[no]abnc boolean [Y] The standard mode of the assembler is to
assemble available reads to a backbone and
make new contigs with the remaining reads.
If this option is set to 'N', the reads that
cannot be assembled into existing contigs
are put as singlets into the assembly, not
forming new contigs.
-mrl integer [40] Minimum length that reads must have to
be considered for the assembly. Shorter
sequences will be filtered out at the
beginning of the process and won't be
present in the final project. (Integer 20 or
more)
-nop integer [3] Defines how many iterations of the whole
assembly process are done. Rule of thumb -
for quick and dirty assembly use 1 (not
recommended). For assembly using read
extensions and / or automatic contig editing
(-ure and -ace) use at least 2. The
recommended setting is 3 or higher, as some
knowledge generated by the assembler can be
used only from the third iteration on. More
than 3 passes might be useful for projects
containing many repetitive elements. See
also -rbl and -mr for parameters that affect
the assembly and disentanglement of
possible repeats. (Integer 1 or more)
-[no]sep boolean [Y] Defines whether the skim algorithm (and
with it also the recalculation of
Smith-Waterman alignments) is called in
between each main pass. If set to 'N',
skimming is done only when needed by the
workflow, either when read extensions are
searched for (-ure) or when possible vector
leftovers are to be clipped (-pvc). Setting
this option to 'Y' is highly recommended,
setting it to 'N' is only for quick and
dirty assemblies.
-rbl integer [2] Defines the maximum number of times a
contig can be rebuilt during main assembly
passes (-nop) if misassemblies, due to
possible repeats, are found. (Integer 1 or
more)
-[no]sd boolean [Y] Default is 'Y' for mira and 'N' for
miraEST. A spoiler can be either a chimeric
read or a read with long parts of
unclipped vector sequence still included
(too long for the -pvc vector
leftover clipping routines). A spoiler
typically prevents contigs from being joined;
MIRA will cut them back so that they present
no more harm to the assembly. Recommended
for mid-to-high coverage genomic
assemblies; not recommended for
assemblies of ESTs as one might lose splice
variants with that. A minimum number of two
assembly passes (-nop) must be run for this
option to take effect.
-[no]sdlpo boolean [Y] Defines whether the spoiler detection
algorithms are run only for the last pass or
for all passes (-nop). Takes effect only if
spoiler detection (-sd) is on.
-bdq integer [10] Defines the default base quality of
reads that have no quality read from a file.
(Integer 0 or more)
-[no]ugpf boolean [Y] MIRA has two different pathfinder
algorithms it chooses from to find its way
through the (more or less) complete set of
possible sequence overlaps; a genomic and an
EST pathfinder. The genomic looks a bit
into the future of the assembly and tries to
stay on safe ground, using a maximum of
information already present in the contig
that is being built. The EST version, on the
contrary, will directly jump at the complex
cases posed by very similar repetitive
sequences and try to solve those first; it
is willing to fall back on brute force when
really bad cases (such as coverage with
thousands of sequences) are encountered.
Generally, the genomic pathfinder will also
work quite well with EST sequences (but
might get slowed down a lot in pathological
cases), while the EST algorithm does not
work so well on genomes. If in doubt,
leave as 'Y' for genome projects and set to
'N' for EST projects.
-[no]uess boolean [Y] Another important switch if you plan to
assemble non-normalised EST libraries, where
some ESTs may reach coverages of several
hundreds or thousands of reads. This switch
lets MIRA save a lot of computational time
when aligning those extremely high coverage
areas (but only there), at the expense of
some accuracy.
-esspd integer [500] Defines the number of potential
partners a read must have for MIRA switching
into emergency search stop mode for that
read. (Integer 1 or more)
-umcbt boolean [N] Defines whether there is an upper limit
of time to be used to build one contig. Set
this to 'Y' in EST assemblies where you
think that extremely high coverages occur.
Less useful for assembly of genomic
sequences.
-bts integer [10000] Only takes effect when -umcbt is set to
'Y'. Defines the time in seconds allotted
to building one contig. (Integer 1 or more)
-[no]ure boolean [Y] Defines whether the read extension
routines are used. These can extend reads
into parts that were clipped away (e.g. by
-qc) if there is enough evidence from
overlapping reads supporting the correctness
of those bases. See also -rewl, -rewme,
-feip and -leip.
-rewl integer [30] Only takes effect when -ure is set to
'Y'. The read extension routines use a
sliding window approach on Smith-Waterman
alignments. This parameter defines the
window length. (Integer 1 or more)
-rewme integer [2] Only takes effect when -ure is set to
'Y'. The read extension routines use a
sliding window approach on Smith-Waterman
alignments. This parameter defines the
maximum number of errors
(disagreements) between the two aligned sequences in
the given window. (Integer 1 or more)
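To illustrate how -rewl and -rewme interact, here is a simplified Python sketch of a sliding-window check over an alignment. It assumes the alignment has already been reduced to one 0/1 flag per column (1 = disagreement) and reports how far, from the start, every window of -rewl columns stays within -rewme errors. MIRA's actual extension routine is not reproduced here; the function name is hypothetical.

```python
def supported_extension(mismatches, rewl=30, rewme=2):
    """Length of the leading stretch in which every window of `rewl`
    columns has at most `rewme` mismatches (simplified sketch)."""
    n = len(mismatches)
    if n <= rewl:
        # Shorter than one window: treat the whole alignment as one window.
        return n if sum(mismatches) <= rewme else 0
    errors = sum(mismatches[:rewl])
    if errors > rewme:
        return 0
    end = rewl
    for start in range(1, n - rewl + 1):
        # Slide the window one column: add the new column, drop the old one.
        errors += mismatches[start + rewl - 1] - mismatches[start - 1]
        if errors > rewme:
            break
        end = start + rewl
    return end


flags = [0] * 60
for pos in (40, 41, 42):
    flags[pos] = 1
print(supported_extension(flags))  # 42
```

With three consecutive disagreements at columns 40 to 42, the first window that contains all three fails, so the supported stretch ends at column 42.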
-feip integer [0] Only takes effect when -ure is set to
'Y'. The read extension routines can be
called before assembly and/or after each
assembly pass (see -nop). This parameter
defines the first pass in which the read
extension routines are called. The default
of 0 tells mira to extend the reads the
first time before the first assembly pass.
(Integer 0 or more)
-leip integer [0] Only takes effect when -ure is set to
'Y'. The read extension routines can be
called before assembly and/or after each
assembly pass (see -nop). This parameter
defines the last pass in which the read
extension routines are called. The default
of 0 tells mira to extend the reads the last
time before the first assembly pass.
(Integer 0 or more)
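A small Python sketch of which passes -feip and -leip select. It assumes that index 0 means "before the first assembly pass" (as stated above) and, as an assumption not spelled out in this page, that an index k of 1 or more means "after pass k"; both helper names are hypothetical.

```python
def extension_calls(feip=0, leip=0):
    """Pass indices at which the read extension routines would run (sketch)."""
    return list(range(feip, leip + 1))  # empty if leip < feip


def describe_calls(passes):
    """Human-readable labels; index 0 is 'before pass 1' (assumed numbering)."""
    return ["before pass 1" if p == 0 else "after pass %d" % p for p in passes]


print(describe_calls(extension_calls(0, 0)))
print(describe_calls(extension_calls(0, 2)))
```

With the defaults (feip 0, leip 0) reads are extended exactly once, before the first assembly pass.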
-tpae boolean [N] This option is useful in EST assembly.
Poly-AT stretches at the end of reads that
were not correctly masked or clipped in
pre-processing steps from external programs
get tagged here. The assembler will not use
these stretches for critical operations.
Additionally, the tags do provide a good
visual anchor when looking at the assembly
with different programs.
-pbwl integer [7] Only takes effect when -tpae is set to
'Y'. Defines the window length within which
all bases (except the maximum number of
errors allowed) must be either A or T to be
considered a polybase stretch. (Integer 1 or
more)
-pbwme integer [2] Only takes effect when -tpae is set to
'Y'. Defines the maximum number of errors
allowed in a given window length such that a
stretch is considered to be a polybase
stretch. The distribution of these errors is
not important. (Integer 1 or more)
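The window test described by -pbwl and -pbwme can be sketched in Python as follows. It assumes, for illustration, that a window must be predominantly A or predominantly T (not a mix of both), and it ignores the -pbwgd search distance; this is not MIRA's tagging code.

```python
def is_polybase_window(window, pbwme=2):
    """True if all but at most `pbwme` bases are A, or all but at most `pbwme` are T."""
    window = window.upper()
    return any(sum(1 for b in window if b != base) <= pbwme for base in "AT")


def polybase_windows(seq, pbwl=7, pbwme=2):
    """Start positions of every window of length `pbwl` that counts as a polybase stretch."""
    return [i for i in range(len(seq) - pbwl + 1)
            if is_polybase_window(seq[i:i + pbwl], pbwme)]


print(polybase_windows("GCGCGCGAAAAAAA"))  # [5, 6, 7]
```

Because the position of the errors does not matter, only the count of non-A (or non-T) bases per window is checked.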
-pbwgd integer [9] Only takes effect when -tpae is set to
'Y'. Defines the number of bases from the
end of a sequence (if masked, from the end
of the masked area) within which a polybase
stretch is looked for without finding one.
(Integer 1 or more)
-[no]pvc boolean [Y] Mira will try to identify possible
sequencing vector relicts present at the
start of a sequence and clip them away.
These relicts are usually a few bases long
and were not correctly removed from the
sequence in data pre-processing steps of
external programs. You might want to turn
off this option if you know (or think) that
your data contains a lot of repeats and the
option below to fine tune the clipping
behaviour does not give the expected
results.
-pvcmla integer [18] The clipping of possible vector relicts
option works quite well. Unfortunately the
bounds of repeats or differences in EST
splice variants sometimes show the same
alignment behaviour as possible sequencing
vector relicts and could therefore also be
clipped. To stop the vector clipping from
mistakenly clipping repetitive regions or
EST splice variants, this option puts an
upper bound to the number of bases a
potential clip is allowed to have. If the
number of bases is below or equal to this
threshold then the bases are clipped. If the
number of bases exceeds the threshold then
the clip is NOT performed. Setting the value
to 0 turns off the threshold i.e. clips are
then always performed if a potential vector
is found. (Integer 0 or more)
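The -pvcmla threshold rule above reduces to a one-line decision. A Python sketch (the function name is hypothetical):

```python
def apply_vector_clip(clip_len, pvcmla=18):
    """Decide whether a potential vector-relic clip of `clip_len` bases is performed."""
    if pvcmla == 0:
        return True  # threshold disabled: always clip when a relic is found
    return clip_len <= pvcmla  # clip only short relics; longer ones may be repeats


print(apply_vector_clip(12), apply_vector_clip(25), apply_vector_clip(25, pvcmla=0))
```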
-qc boolean [N] Default is 'N', but is automatically set
to 'Y' when using the setparam options
'fasta' or 'phd' (can be turned off again by
subsequent options afterwards). This will
let mira perform its own quality clipping
before sequences are entered into the
assembly. The clip function performed is a
sequence end window quality clip with back
iteration to get a maximum number of bases
as useful sequence. Note that the bases
clipped away here can still be used
afterwards if there is enough evidence
supporting their correctness when the option
-ure is turned on.
-qcmq integer [20] This is the minimum quality required of
bases in a window in order to be accepted.
Please be cautious and don't use extreme
values here, because then the clipping will
be too lax or too harsh. Values below 15 and
higher than 35 are disallowed. (Integer
from 15 to 35)
-qcwl integer [30] This is the length of a window in bases
for the quality clip. (Integer 10 or more)
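A simplified Python sketch of a sequence-end window quality clip using -qcmq and -qcwl. It assumes, for illustration, that a window is accepted only when every base in it reaches -qcmq; MIRA's exact acceptance rule and its back-iteration step are not reproduced. The result is the half-open range of bases kept.

```python
def window_quality_clip(quals, qcmq=20, qcwl=30):
    """Return (left, right) bounds of the kept region, or None if no window passes.
    Assumption: a window passes only if every base in it has quality >= qcmq."""
    n = len(quals)
    if n < qcwl:
        return None

    def passes(i):
        return min(quals[i:i + qcwl]) >= qcmq

    # First passing window from the 5' end gives the left clip.
    left = next((i for i in range(n - qcwl + 1) if passes(i)), None)
    if left is None:
        return None
    # Last passing window, scanning back from the 3' end, gives the right clip.
    right_start = next(i for i in range(n - qcwl, -1, -1) if passes(i))
    return left, right_start + qcwl


quals = [5] * 10 + [30] * 50 + [5] * 10
print(window_quality_clip(quals))  # (10, 60)
```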
-[no]mbc boolean [Y] This will let mira perform a 'clipping'
of bases that were masked out (replaced with
the character X). It is generally not a
good idea to use masked bases to remove
unwanted portions of a sequence; the EXP
file format and the NCBI traceinfo format
have excellent possibilities to circumvent
this. But because a lot of pre-processing
software is built around cross_match,
scylla- and phrap-style base masking, the
need arose for mira to be able to handle
this too. mira will look at the start and
end of each sequence to see whether there
are masked bases that should be 'clipped'.
-mbcgs integer [20] While performing the clip of masked
bases, mira will look if it can merge larger
chunks of masked bases that are a maximum
of -mbcgs apart. (Integer 0 or more)
-mbcmfg integer [40] While performing the clip of masked
bases at the start of a sequence, mira will
allow up to this number of unmasked bases in
front of a masked stretch. (Integer 0 or
more)
-mbcmeg integer [60] While performing the clip of masked
bases at the end of a sequence, mira will
allow up to this number of unmasked bases
behind a masked stretch. (Integer 0 or more)
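How -mbcgs, -mbcmfg and -mbcmeg might combine can be sketched in Python. This is a simplified illustration, not MIRA's implementation: it assumes masked bases are uppercase `X`, merges masked runs separated by at most -mbcgs unmasked bases, and clips a stretch at either end only if at most -mbcmfg (start) or -mbcmeg (end) unmasked bases lie outside it. The returned pair delimits the kept bases `seq[left:right]`.

```python
import re


def masked_stretches(seq, mbcgs=20):
    """Runs of 'X', merging runs separated by at most `mbcgs` unmasked bases."""
    merged = []
    for m in re.finditer(r"X+", seq):
        if merged and m.start() - merged[-1][1] <= mbcgs:
            merged[-1][1] = m.end()  # close enough: extend the previous stretch
        else:
            merged.append([m.start(), m.end()])
    return merged


def masked_base_clip(seq, mbcgs=20, mbcmfg=40, mbcmeg=60):
    """Return (left, right) clip positions; bases in seq[left:right] are kept."""
    left, right = 0, len(seq)
    stretches = masked_stretches(seq, mbcgs)
    # Start of read: masked stretch preceded by at most mbcmfg unmasked bases.
    if stretches and stretches[0][0] <= mbcmfg:
        left = stretches.pop(0)[1]
    # End of read: masked stretch followed by at most mbcmeg unmasked bases.
    if stretches and len(seq) - stretches[-1][1] <= mbcmeg:
        right = stretches[-1][0]
    return left, right


print(masked_base_clip("XXXXX" + "ACGT" * 10 + "XXXXX"))  # (5, 45)
```

Popping the start stretch before checking the end keeps a single masked stretch from being counted at both ends of a short read.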
-[no]emlc boolean [Y] If on, ensures a minimum left clip on
each read according to the parameters in
-mlcr and -smlc.
-mlcr integer [25] If -emlc is 'Y', checks whether there
is a left clip whose length is at least the
size specified here. (Integer 0 or more)
-smlc integer [30] If -emlc is 'Y' and the actual left
clip is < -mlcr, then set the left clip of
read to the value given here. (Integer 0 or
more)
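The interplay of -emlc, -mlcr and -smlc, written out as a Python sketch (the function name is hypothetical):

```python
def ensure_min_left_clip(left_clip, emlc=True, mlcr=25, smlc=30):
    """Apply the minimum-left-clip rule to one read's left clip (sketch)."""
    if emlc and left_clip < mlcr:
        return smlc  # clip too short: replace it with the set value
    return left_clip  # long enough (or feature off): keep as is


print(ensure_min_left_clip(10), ensure_min_left_clip(25), ensure_min_left_clip(40))
```

Note that with the defaults a left clip of 10 is raised to 30, while a left clip of exactly 25 is left untouched, because the check is a strict "less than -mlcr".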
-bph integer [14] Default is 14 on 32 bit systems and 16
on 64 bit systems. Controls the number n of
consecutive bases that are used as a word
hash. The higher the value the faster the
search. The lower the value the more weak
matches are found. Values below 10 are not
recommended. (Integer 1 or more)
-hss integer [4] This is a parameter controlling the
stepping increments with which hashes are
generated. This allows for a more
fine-grained search as matches are now found
with at least n+s equal bases (n from -bph,
s the stepping set here) instead of the 2n
of SSAHA. The higher the
value the faster the search. The lower the
value the more weak matches are found.
(Integer 1 or more)
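A generic Python sketch of stepped word hashing as described for -bph and -hss: one read is indexed with words of length -bph taken every -hss bases, and the other read is scanned at every position. This illustrates the general technique under those assumptions; it is not MIRA's skim implementation, and the function names are hypothetical.

```python
def stepped_hashes(seq, bph=16, hss=4):
    """(word, position) pairs for words of length bph taken every hss bases."""
    return [(seq[i:i + bph], i) for i in range(0, len(seq) - bph + 1, hss)]


def shared_word_hits(read_a, read_b, bph=16, hss=4):
    """(pos_a, pos_b) pairs where a stepped word of read_a occurs in read_b."""
    index = {}
    for word, pos in stepped_hashes(read_a, bph, hss):
        index.setdefault(word, []).append(pos)
    hits = []
    # read_b is scanned at every position, so no shared word is skipped on this side.
    for j in range(len(read_b) - bph + 1):
        for pos_a in index.get(read_b[j:j + bph], []):
            hits.append((pos_a, j))
    return hits


print(stepped_hashes("ACGTACGTAC", bph=4, hss=3))
print(shared_word_hits("ACGTACGTAC", "TTACGT", bph=4, hss=3))
```

A larger step stores fewer words (faster, less memory) but needs a longer exact match before any stored word is hit, which is the speed versus sensitivity trade-off described above.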
-pr integer [50] Controls the relative percentage of
exact word matches in an approximate overlap
that has to be reached to accept the
overlap as a possible match. Increasing this
number will decrease the number of possible
alignments that have to be checked by
Smith-Waterman later on in the assembly, but
it might also lead to the rejection of
weaker overlaps (i.e. overlaps that contain
a higher number of mismatches). (Integer 1
or more)
-mhpr integer [200] Controls the maximum number of
possible hits one read can carry over
to the Smith-Waterman alignment
phase. If more potential hits are found,
only the best ones are taken. This is an
important option for tackling projects that
contain extreme assembly conditions. For
example, 5000 reads that are all very
similar would generate around 40 to 50
million possible alignments (forward and
reverse complement). Setting this parameter
to 200 reduces the number of alignments to
check to around 1.5-2 million. As the
assembly increases in passes (-nop),
different combinations of possible hits will
be checked, always the probably best ones
first. So the accuracy of the assembly
should only suffer when lowering this number
too much. (Integer 1 or more)
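Keeping only the best hits per read, as -mhpr does, can be sketched with Python's `heapq.nlargest`. The hit representation (partner read name plus a numeric score, e.g. a count of shared words) is an assumption made for illustration.

```python
import heapq


def cap_hits(hits, mhpr=200):
    """Keep at most `mhpr` hits, highest score first.
    `hits` is an iterable of (partner_read, score) pairs (assumed format)."""
    return heapq.nlargest(mhpr, hits, key=lambda hit: hit[1])


hits = [("read%d" % i, i) for i in range(1000)]
print(cap_hits(hits, mhpr=3))
```

`heapq.nlargest` avoids sorting every candidate when only a few hundred of possibly thousands are kept.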
-bip integer [15] The banded Smith-Waterman alignment
uses this percentage number to compute the
bandwidth it has to use when computing the
alignment matrix. E.g. expected overlap is
150 bases, bip=10 -> the banded SW will
compute a band of 15 bases to each side of
the expected alignment diagonal, thus
allowing up to 15 unbalanced inserts /
deletes in the alignment. INCREASING AND
DECREASING THIS NUMBER - increasing will
find more non-optimal alignments but will
also increase SW runtime between linearly and
quadratically; decreasing will work the other way round
(it might miss a few bad alignments but
gain speed). (Integer from 1 to 100)
-bmin integer [25] Minimum bandwidth in bases to each
side. (Integer 1 or more)
-bmax integer [50] Maximum bandwidth in bases to each
side. (Integer 1 or more)
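The bandwidth arithmetic from the -bip example, combined with -bmin and -bmax, as a Python sketch. Clamping the computed value into the [bmin, bmax] range and rounding down are assumptions for illustration.

```python
def band_half_width(expected_overlap, bip=15, bmin=25, bmax=50):
    """Bases to each side of the expected diagonal for banded SW (sketch)."""
    raw = expected_overlap * bip // 100  # bip percent of the expected overlap
    return max(bmin, min(bmax, raw))     # assumed clamping to [bmin, bmax]


print(band_half_width(150, bip=10, bmin=1, bmax=130))  # 15
print(band_half_width(150, bip=10, bmin=25, bmax=50))  # 25
```

The first call reproduces the 150-base, bip=10 example (15 bases to each side); with the default -bmin of 25, the same overlap would get a band of 25 under this assumption.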
-mo integer [15] Minimum number of overlapping bases
needed in an alignment of two sequences to
be accepted. (Integer 1 or more)
-ms integer [15] Describes the minimum score of an
overlap to be taken into account for
assembly. mira uses a default scoring scheme
for SW align. Each match counts 1, a match
with an N counts 0, each mismatch with a
non-N base -1 and each gap -2. Use a bigger
score to weed out a number of chance
matches, a lower score to perhaps find the
single (short) alignment that might join two
contigs together (at the expense of
computing time and memory). (Integer 1 or
more)
-mrs integer [65] Describes the min percentage of
matching between two reads to be considered
for assembly. Increasing this number will
save memory but one might lose possible
alignments. A maximum of 80 is probably
sensible here. Decreasing below 55 will
probably make memory and time consumption
explode. (Integer from 1 to 100)
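The three thresholds -mo, -ms and -mrs act together as filters on a candidate overlap. The sketch below is hypothetical. It assumes that "percentage of matching" means matching bases divided by overlap length, which MIRA may define differently.

```python
def accept_overlap(overlap_len: int, score: int, matches: int,
                   mo: int = 15, ms: int = 15, mrs: int = 65) -> bool:
    """Hypothetical acceptance test combining -mo, -ms and -mrs.

    Assumption: similarity = matches / overlap_len * 100.
    """
    if overlap_len <= 0:
        return False
    similarity = 100.0 * matches / overlap_len
    return overlap_len >= mo and score >= ms and similarity >= mrs


print(accept_overlap(100, 80, 90))  # True
print(accept_overlap(100, 80, 60))  # False: 60% similarity is below -mrs 65
```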
-egp boolean [N] Defines whether or not to increase
penalties applied to alignments containing
long gaps. Setting this to 'Y' might help in
projects with frequent repeats. On the
other hand, it is definitely disruptive
when assembling very long reads containing
multiple long indels in the called base
sequence ... although this should not happen
in the first place and is a sure sign of
problems lying ahead. When in doubt, set it
to 'Y' for EST projects and de-novo genome
assembly, set it to 'N' for assembly of
closely related strains (assembly against a
backbone). When set to 'N', it is
recommended to have -amgb and -amgbemc both
set to 'Y'.
-egpl menu [low] Has no effect if extra_gap_penalty is
off. Defines an extra penalty applied to
'long' gaps. There are these predefined
levels - 1. low - use this if you expect
your base caller frequently misses two or
more bases. 2. medium - use this if your
base caller is expected to frequently miss
one to two bases. 3. high - use this if your
base caller does not frequently miss more
than one base. For some stages of the EST
assembly process, a special value 'est' is
used. (Values: low (Low); medium (Medium);
high (High); est (EST split splices))
-megpp integer [100] Has no effect if extra_gap_penalty is
off. Defines the maximum extra penalty in
percent applied to 'long' gaps. (Integer
from 1 to 100)
-np string [mira] Contigs will have this string
prepended to their names. (Any string)
-an menu [signal] When adding reads to a contig,
dangerous regions can get an extra integrity
check. none = no extra check. text = check
is only text-based. signal = check is
signal-based; if the SCF trace is not
available, the fallback is 'text'. For the time being, only
regions tagged as ALUS or REPT in the
experiment file are considered dangerous.
(Values: none (None); text (Text); signal
(Signal))
-rodirs integer [15] When adding reads to a contig, reject
the reads if the drop in the quality of the
consensus is > the given value in %. Lower
values mean stricter checking. This value is
doubled should a read be entered that has a
template partner (a read pair) at the right
distance. (Integer from 1 to 100)
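The -rodirs rule, including the doubling for a correctly placed read-pair partner, can be sketched as follows. As an assumption, the quality drop is measured relative to the consensus quality before the read was added. The function and argument names are illustrative only.

```python
def reject_read(consensus_quality_before: float,
                consensus_quality_after: float,
                rodirs: int = 15,
                has_partner_at_right_distance: bool = False) -> bool:
    """Sketch of -rodirs: reject a read if consensus quality drops by more than rodirs %.

    The threshold is doubled when the read has a template partner at the right
    distance. Assumption: the drop is relative to the quality before the read was added.
    """
    threshold = rodirs * 2 if has_partner_at_right_distance else rodirs
    if consensus_quality_before <= 0:
        return False
    drop = consensus_quality_before - consensus_quality_after
    drop_percent = 100.0 * drop / consensus_quality_before
    return drop_percent > threshold


print(reject_read(40, 32))  # True: a 20% drop exceeds 15%
print(reject_read(40, 32, has_partner_at_right_distance=True))  # False: limit is 30%
```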
-dmer integer [1] When adding reads to a contig, reject
the reads if the error in zones known as
dangerous exceeds the given value in %.
Lower values mean stricter checking in these
danger zones. For the time being, only
regions tagged as ALUS or REPT in the
experiment file are considered dangerous.
(Integer from 1 to 100)
-[no]mr boolean [Y] One of the most important switches in
MIRA. If set to 'Y', MIRA will try to
resolve misassemblies due to repeats by
identifying single base stretch differences
and tag those critical bases as RMB (Repeat
Marker Base, weak or strong). This switch is
also needed when MIRA is run in EST mode to
identify possible inter-, intra- and
intra-and-interorganism SNPs.
-asir boolean [N] Only takes effect when -mr is set to
'Y'; its effect also depends on whether
strain data (see -lsd) is present or
not. Usually, mira will mark bases that
differentiate between repeats when a
conflict occurs between reads that belong to
one strain. If the conflict occurs between
reads belonging to different strains, they
are marked as SNP. However, if this switch
is set to 'Y', then conflicts within a
strain are also marked as SNP. This switch
is mainly used in assemblies of ESTs; it
should not be set for genomic assembly.
-mrpg integer [2] Only takes effect when -mr is set to
'Y'. This defines the minimum number of
reads in a group that are needed for the RMB
(Repeat Marker Bases) or SNP detection
routines to be triggered. A group is defined
by the reads carrying the same nucleotide
for a given position, i.e., an assembly with
mrpg=2 will need at least two groups of at
least two reads each (with bases having at
least a quality as defined in -mgqrt) to be
recognised as a repeat marker or a SNP.
Setting this to a low number increases
sensitivity, but might produce a few false
positives, resulting in reads being thrown
out of contigs because of falsely identified
possible repeat markers (or wrongly
recognised as SNP). (Integer 2 or more)
-mgqrt integer [30] Only takes effect when -mr is set to
'Y'. This defines the minimum quality of a
group of bases to be taken into account as
potential repeat marker. The lower the
number, the more sensitive you get, but
lowering below 25 is not recommended as a
lot of wrongly called bases can have a
quality approaching this value and you'd end
up with a lot of false positives. The
higher the overall coverage of your project
the better, and the higher you can set this
number. A value of 35 will probably remove
all false positives, a value of 40 will
probably never show false positives.
(Integer 25 or more)
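The way -mrpg and -mgqrt gate repeat marker and SNP detection in one alignment column can be sketched as a simple count. This is a simplification. It applies -mgqrt per base, whereas the text speaks of the quality of a group, whose exact computation is not given here.

```python
from collections import Counter


def column_is_rmb_candidate(column: list[tuple[str, int]],
                            mrpg: int = 2, mgqrt: int = 30) -> bool:
    """Does one alignment column show a possible repeat marker base or SNP?

    `column` holds (base, quality) for each read covering the position.
    A group is all reads with the same nucleotide; the column qualifies when
    at least two groups reach `mrpg` reads. Assumption: -mgqrt is used as a
    per-base quality filter.
    """
    counts = Counter(base.upper() for base, qual in column if qual >= mgqrt)
    strong_groups = [b for b, n in counts.items() if n >= mrpg]
    return len(strong_groups) >= 2


col = [("A", 40), ("A", 35), ("G", 38), ("G", 33), ("A", 20)]
print(column_is_rmb_candidate(col))  # True: two A and two G reads pass mgqrt
```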
-emea integer [15] Only takes effect when -mr is set to
'Y'. Using the ends of Sanger-type shotgun
sequences is always a bit risky, as wrongly
called bases tend to crowd there or some
sequencing vector relics hang around. It is
even more risky to use
these stretches for detecting possible
repeats, so one can define an exclusion area
where the bases are not used when
determining whether a mismatch is due to
repeats or not. (Integer 0 or more)
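The -emea exclusion area can be pictured as a mask over read positions. The text does not say whether both ends are excluded. This sketch assumes that they are, and that positions are 0-based within the used part of the read.

```python
def usable_for_repeat_detection(pos: int, read_len: int, emea: int = 15) -> bool:
    """Is 0-based position `pos` outside the -emea exclusion areas?

    Assumption: the exclusion area applies to both ends of the read.
    """
    if not 0 <= pos < read_len:
        raise IndexError("position outside the read")
    return emea <= pos < read_len - emea


print(usable_for_repeat_detection(10, 100))  # False: within the first 15 bases
print(usable_for_repeat_detection(50, 100))  # True
```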
-[no]amgb boolean [Y] Determines whether columns containing
gap bases (indels) are also tagged.
-[no]amgbemc boolean [Y] Only takes effect when -amgb is set to
'Y'. Determines whether multiple columns
containing gap bases (indels) are also
tagged.
-[no]amgbnbs boolean [Y] Only takes effect when -amgb is set to
'Y'. Determines whether, for tagging
columns containing gap bases, both strands
need to have a gap. Setting this to 'N' is
not recommended except when working in
desperately low coverage situations.
-dismin integer [500] The minimum distance that read pairs
may be apart. There is an additional error
margin of 10% subtracted from this value
during internal computations. (Integer 0 or
more)
-dismax integer [5000] The maximum distance that read pairs
may be apart. There is an additional error
margin of 10% added to this value during
internal computations. (Integer 0 or more)
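The 10% error margins on -dismin and -dismax widen the accepted read-pair distance window. With the defaults, the effective window becomes 450 to 5,500 bases. The sketch below applies the margins as described; rounding down for values not divisible by 10 is an assumption.

```python
def pair_distance_ok(distance: int, dismin: int = 500, dismax: int = 5000) -> bool:
    """Is a read-pair distance inside the accepted window?

    10% is subtracted from dismin and 10% added to dismax, as described above.
    Integer rounding of the margins is an assumption.
    """
    lower = dismin - dismin // 10
    upper = dismax + dismax // 10
    return lower <= distance <= upper


print(pair_distance_ok(450), pair_distance_ok(5501))  # True False
```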
-ace boolean [N] Once contigs have been built, mira can
call a built-in version of the automatic
contig editor EdIt. EdIt will try to resolve
discrepancies in the contig by performing
trace analysis and correct even hard to
resolve errors. This option is always
useful, but especially in conjunction with
-nop and -ure. Notice: the current
development version has a memory leak in the
editor, therefore the option is not
automatically turned on.
-[no]sem boolean [Y] If set to 'Y' the automatic editor will
not take error hypotheses with a low
probability into account, even if all the
requirements to make an edit are fulfilled.
-ct integer [50] The higher this value, the more strict
the automatic editor will apply its internal
rule set. Going below 40 is not
recommended. (Integer from 1 to 100)
-[no]orc boolean [Y] Output CAF results
-[no]org boolean [Y] Output GAP4 results
-[no]orf boolean [Y] Output FASTA results
-ora boolean [N] Output ACE results
-[no]ort boolean [Y] Output TXT results
-[no]ors boolean [Y] Output TCS results
-orh boolean [N] Output HTML results
-otc boolean [N] Output temporary CAF results
-otg boolean [N] Output temporary GAP4 results
-otf boolean [N] Output temporary FASTA results
-ota boolean [N] Output temporary ACE results
-ott boolean [N] Output temporary TXT results
-ots boolean [N] Output temporary TCS results
-oth boolean [N] Output temporary HTML results
-oetc boolean [N] Output extra temporary CAF results
-oetg boolean [N] Output extra temporary GAP4 results
-oetf boolean [N] Output extra temporary FASTA results
-oeta boolean [N] Output extra temporary ACE results
-oett boolean [N] Output extra temporary TXT results
-oeth boolean [N] Output extra temporary HTML results
-tcpl integer [60] When producing an output in text format
(-ort|ott|oett), this parameter defines how
many bases each line of an alignment should
contain. (Integer 1 or more)
-hcpl integer [60] When producing an output in HTML format
(-orh|oth|oeth), this parameter defines how
many bases each line of an alignment should
contain. (Integer 1 or more)
-gapfda string [gap4da] Defines the extension of the
directory where mira will write the result
of an assembly ready to import into the
Staden package (GAP4) in Direct Assembly
format. The name of the directory will then
be
|
| Qualifier | Type | Description | Allowed values | Default | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Standard (Mandatory) qualifiers | ||||||||||||||
| -project | string | Default is mira. Defines the project name for this assembly. The project name automatically influences the name of input and output files or directories. E.g. in the default setting, the file names for the output of the assembly in FASTA format would be mira_out.fasta and mira_out.fasta.qual. Setting the project name to 'MyProject' would generate MyProject_out.fasta and MyProject_out.fasta.qual. | Any string | mira | ||||||||||
| Additional (Optional) qualifiers | ||||||||||||||
| (none) | ||||||||||||||
| Advanced (Unprompted) qualifiers | ||||||||||||||
| -paramsfile | infile | Loads parameters from the filename given. Allows a maximum of 10 levels of recursion, i.e. a -params option appearing within a file that loads other parameter files | Input file | Required | ||||||||||
| -setparam | list | Sets parameters suited for loading sequences from FASTA, PHD or CAF files. The default is not to specify the type of input file. |
|
unspecified | ||||||||||
| -expdir | directory | Defines the directory where mira should search for experiment files (EXP). | Directory | . | ||||||||||
| -scfdir | directory | Defines the directory where mira should search for SCF files | Directory | . | ||||||||||
| -feifile | infile | Defines the file of filenames where the names of the EXP files of a project are located. | Input file | mira_in.fofn | ||||||||||
| -fpifile | infile | Defines the file of filenames where the names of the PHD files of a project are located. | Input file | mira_in.fofn | ||||||||||
| -pifile | infile | Defines the PHD file to load sequences of a project from. | Input file | mira_in.phd | ||||||||||
| -faifile | infile | Defines the FASTA file to load sequences of a project from. | Input file | mira_in.fasta | ||||||||||
| -fqifile | infile | Defines the fasta file to load base qualities of a project from. Although the order of reads in the quality file does not need to be the same as in the fasta or fofn projects (although it saves a bit of time if they are). | Input file | mira_in.fasta.qual | ||||||||||
| -cifile | infile | Defines the file to load a CAF project from. Filename must end with '.caf'. | Input file | mira_in.caf | ||||||||||
| -sdifile | infile | Defines the file to load straindata from. Only used in EST projects (miraEST). | Input file | mira_straindata_in.txt | ||||||||||
| -xtiifile | infile | Defines the file to load a trace info file in XML format from. This can be used both when merging XML data to loaded files or when loading a project from an XML trace info file. | Input file | mira_xmltraceinfo_in.xml | ||||||||||
| -genome | list | Quality grades of de-novo genome assembly. Draft is quick-and-dirty, suited to get a first look on approximate coverage of a running project. Should not be used for anything else. Normal is the default parameter set of mira that is able to tackle most genomes. A bit slower than the draft version, but includes such options as read extension and vector remnant clipping. Accurate is still slower than the normal mode but should be used for genomes that pose a problem to the normal mode. |
|
normal | ||||||||||
| -mapping | list | Work like the -genome switches except they are to be used when performing mapping assemblies against given backbone sequences. |
|
normal | ||||||||||
| -clipping | list | Three clipping grade modifiers, from light clipping when working with well preprocessed sequences to heavy clipping when the sequences that are being assembled had only sloppy or no preprocessing. Note 1 - the light version is already included in the -genome and -mapping switches. Note 2 - it is recommended that you perform a thorough preprocessing (clipping sequencing vector stretches, clipping of low quality bases, tagging standard repeats etc.) before assembling sequences. The clipping routines of mira are more optimised to cope with the last remnants of wrongly preprocessed sequences than with sequences having had no pre-processing at all. |
|
medium | ||||||||||
| -highlyrepetitive | boolean | A modifier switch for genome data that is deemed to be highly repetitive. The assemblies will run slower due to more iterative cycles that give mira a chance to resolve nasty repeats. | Boolean value Yes/No | No | ||||||||||
| -highqualitydata | boolean | A modifier switch when the sequences that are used are of exceptional quality. mira will then bump up a few quality parameters which should lead to less false positives in the repeat and SNP detection routines. | Boolean value Yes/No | No | ||||||||||
| -estmode | boolean | Switches mira to a good initial preset for assembling EST data. Note that this is not needed (and even counterproductive) when used with miraEST. | Boolean value Yes/No | No | ||||||||||
| -horrid | boolean | Sets a number of parameters useful when dealing with really horrid data sets. Useful means that parameters are chosen to so that time and memory consumption do not explode beyond all hope of the program returning. Note that MIRA will return in most cases useful assemblies with this switch, but these might not be as optimised as with normal operation. The definition of 'horrid' is a bit flexible, for example, (a) a genomic projects with more than 2.000 reads that all seem to align partly to each other but have different repetitive structures or (b) EST clusters with a few thousand almost similar reads. | Boolean value Yes/No | No | ||||||||||
| -borg | boolean | Sets several parameters to have mira try to assemble as many reads as possible. Will probably slow down the assembly process and use more memory. 'We are MIRA of borg. You will be assembled, resistance is futile!' | Boolean value Yes/No | No | ||||||||||
| -lj | list | Defines whether to load and assemble EXP files from a file of filenames ('mira_in.fofn'), load and assemble FASTA sequences ('mira_in.fasta') and their qualities ('mira_in.fasta.qual'), load and assemble sequences or qualities from a phd file ('mira_in.phd') or to load a project from a CAF file ('mira_in.caf') and assemble or eventually reassemble it. N.B. fofnphd is not currently available. |
|
fofnexp | ||||||||||
| -fo | boolean | If set to 'Y', the project will not be assembled and no assembly output files will be produced. Instead, the project files will only be loaded. This switch is useful for checking consistency of input files. | Boolean value Yes/No | No | ||||||||||
| -mxti | boolean | Some file formats above (FASTA, PHD or even CAF and EXP) possibly don't contain all the info necessary or useful for each read of an assembly. Should additional information, such as like clipping positions etc., be available in a XML trace info file in NCBI format (see File formats), then set this option to 'Y' and it will be merged to the data loaded. Please note, quality clippings given here will override quality clippings loaded earlier or performed by mira. Minimum clippings will still be made by the program, though. | Boolean value Yes/No | No | ||||||||||
| -rns | list | Defines the centre naming scheme for read suffixes. Currently, only Sanger Institute and TIGR naming schemes are supported out of the box. How to choose? Please read the documentation available at the different centres or ask your sequence provider. In a nutshell, the Sanger scheme is 'somename.[pqsfrw][12][bckdeflmnpt][a|b|c|...' (e.g. U13a08f10.p1ca), TIGR scheme is 'somenameTF*|TR*|TA*' (e.g. GCPBN02TF or GCPDL68TABRPT103A58B). |
|
sanger | ||||||||||
| -eq | list | Defines the source format for reading qualities from external sources. Normally takes effect only when these are not present in the format of the load_job project (EXP and FASTA can have them, CAF and PHD must have them). |
|
SCF | ||||||||||
| -eqo | boolean | Only takes effect when 'lj' is fofnexp. Defines whether or not the qualities from the external source override the possibly loaded qualities from the load job project. This might be of use in case some post-processing software fiddles around with the quality values of the input file but one wants to have the original ones. | Boolean value Yes/No | No | ||||||||||
| -[no]droeqe | boolean | Should there be a major mismatch between the external quality source and the sequence (e.g. the base sequence read from a SCF file does not match the originally read base sequence), should the read be excluded from assembly or not. If not, it will use the qualities it had before trying to load the external qualities (either default qualities or the ones loaded from the original source). | Boolean value Yes/No | Yes | ||||||||||
| -[no]uti | boolean | Two reads sequenced from the same clone template form a read pair with a known minimum and maximum distance. This feature will definitively help for contigs containing lots of repeats. Set this to 'Y' if your data contains information on insert sizes. Information on insert sizes can be given via the SI tag in EXP files (for each read pair individually), or for the whole project using dismin and dismax | Boolean value Yes/No | Yes | ||||||||||
| -ess | integer | Controls the starting step of the EST assembly and is therefore only useful in miraEST. EST assembly is a three step process, each with different settings to the assembly engine, with the result of each step being saved to disk. If results of previous steps are present in a directory, one can easily 'play around' with different setting for subsequent steps by reusing the results of the previous steps and directly starting with step two or three. | Integer from 1 to 4 | 1 | ||||||||||
| -[no]ps | boolean | Controls whether date and time are printed out during the assembly. Suppressing it isn't useful in normal operation, only when debugging or benchmarking. | Boolean value Yes/No | Yes | ||||||||||
| -lsd | boolean | Straindata is a key value file, one read per line. First the name of the read, then the strain name of the organism the read comes from. It is used by the program to differentiate different types of SNPs appearing in organisms and classifying them. | Boolean value Yes/No | No | ||||||||||
| -lb | boolean | A backbone is a sequence (or a previous assembly) that is used as a template for the current assembly. The current assembly process will first assemble reads to loaded backbone contigs before creating new contigs. This feature is helpful for assembling against previous (and already possibly edited) assembly iterations, or to make a comparative assembly of two very closely related organisms. Please read 'very closely related' as in 'only SNP mutations or short indels present'. | Boolean value Yes/No | No | ||||||||||
| -sbuip | integer | When assembling against backbones, this parameter defines the pass iteration (see nop) from which on the backbones will be really used. In the passes preceding this number, the non-backbone reads will be assembled together as if no backbones existed. This allows mira to correctly spot repetitive stretches that differ by single bases and tag them accordingly. Rule of thumb - if backbones belong to the same strain as the reads to assemble, set to 1. If backbones are a different strain, then set sbuib to 1 lower than nop (example - nop 4 and sbuip 3). | Integer 1 or more | 3 | ||||||||||
| -bsn | string | Defines the name of the strain that the backbone sequences have. | Any string | |||||||||||
| -bft | list | Defines the filetype of the backbone file given. Currently (2.8.1 ) only FASTA, CAF and GBF files are supported. When GBF (GenBank files, also named .gbk) files are loaded, the features within these files are automatically transformed into Staden-compatible tags and get passed through the assembly. |
|
fasta | ||||||||||
| -brl | integer | Parameter for the internal sectioning size of the backbone. Extremely repetitive sequences may require reducing the default value, but the default value should work well in 99.9% of all cases. | Integer from 1000 to 3000 | 2500 | ||||||||||
| -bbq | integer | Defines the default quality that the backbone sequences have if they came without quality values in their files (like in GBF format or when FASTA is used without .qual files). A value of -1 causes mira to use the same default quality for backbones as for reads. | Integer from -1 to 100 | -1 | ||||||||||
| -[no]abnc | boolean | The standard mode of the assembler is to assemble available reads to a backbone and make new contigs with the remaining reads. If this option is set to 'N', the reads that cannot be assembled into existing contigs are put as singlets into the assembly, not forming new contigs. | Boolean value Yes/No | Yes | ||||||||||
| -mrl | integer | Minimum length that reads must have to be considered for the assembly. Shorter sequences will be filtered out at the beginning of the process and won't be present in the final project. | Integer 20 or more | 40 | ||||||||||
| -nop | integer | Defines how many iterations of the whole assembly process are done. Rule of thumb - for quick and dirty assembly use 1 (not recommended). For assembly using read extensions and / or automatic contig editing (-ure and -ace) use at least 2. The recommended setting is 3 or higher, as some knowledge generated by the assembler can be used only from the third iteration on. More than 3 passes might be useful for projects containing many repetitive elements. See also -rbl and -mr for parameters that affect the assembly and disentanglement of possible repeats. | Integer 1 or more | 3 | ||||||||||
| -[no]sep | boolean | Defines whether the skim algorithm (and with it also the recalculation of Smith-Waterman alignments) is called in between each main pass. If set to 'N', skimming is done only when needed by the workflow, either when read extensions are searched for (-ure) or when possible vector leftovers are to be clipped (-pvc). Setting this option to 'Y' is highly recommended, setting it to 'N' is only for quick and dirty assemblies. | Boolean value Yes/No | Yes | ||||||||||
| -rbl | integer | Defines the maximum number of times a contig can be rebuilt during main assembly passes (-nop) if misassemblies, due to possible repeats, are found. | Integer 1 or more | 2 | ||||||||||
| -[no]sd | boolean | Default is 'Y' for mira and 'N' for miraEST. A spoiler can be either a chimeric read or it is a read with long parts of unclipped vector sequence still included (that was too long for the -pvc vector leftover clipping routines). A spoiler typically prevents contigs being joined; MIRA will cut them back so that they present no more harm to the assembly. Recommended for assemblies of mid-to-high coverage genomic assemblies; not recommended for assemblies of ESTs as one might lose splice variants with that. A minimum number of two assembly passes (-nop) must be run for this option to take effect. | Boolean value Yes/No | Yes | ||||||||||
| -[no]sdlpo | boolean | Defines whether the spoiler detection algorithms are run only for the last pass or for all passes (-nop). Takes effect only if spoiler detection (-sd) is on. | Boolean value Yes/No | Yes | ||||||||||
| -bdq | integer | Defines the default base quality of reads that have no quality read from a file. | Integer 0 or more | 10 | ||||||||||
| -[no]ugpf | boolean | MIRA has two different pathfinder algorithms it chooses from to find its way through the (more or less) complete set of possible sequence overlaps; a genomic and an EST pathfinder. The genomic looks a bit into the future of the assembly and tries to stay on safe grounds using a maximum of information already present in the contig that is being built. The EST version, on the contrary, will directly jump at the complex cases posed by very similar repetitive sequences and try to solve those first; it is willing to fall down to brute force when really bad cases (such as coverage with thousands of sequences) are encountered. Generally, the genomic pathfinder will also work quite well with EST sequences (but might get slowed down a lot in pathological cases), while the EST algorithm does not work so well on genomes. If in doubt, leaveas 'Y' for genome projects and set to 'N' for EST projects. | Boolean value Yes/No | Yes | ||||||||||
| -[no]uess | boolean | Another important switch if you plan to assemble non-normalised EST libraries, where some ESTs may reach coverages of several hundreds or thousands of reads. This switch lets MIRA save a lot of computational time when aligning those extremely high coverage areas (but only there), at the expense of some accuracy. | Boolean value Yes/No | Yes | ||||||||||
| -esspd | integer | Defines the number of potential partners a read must have for MIRA switching into emergency search stop mode for that read. | Integer 1 or more | 500 | ||||||||||
| -umcbt | boolean | Defines whether there is an upper limit of time to be used to build one contig. Set this to 'Y' in EST assemblies where you think that extremely high coverages occur. Less useful for assembly of genomic sequences. | Boolean value Yes/No | No | ||||||||||
| -bts | integer | Depending on -umcbt above, this number defines the time in seconds alloted to building one contig. | Integer 1 or more | 10000 | ||||||||||
| -[no]ure | boolean | Defines whether there is an upper limit of time to be used to build one contig. Set this to 'Y' in EST assemblies where you think that extremely high coverages occur. Less useful for assembly of genomic sequences. | Boolean value Yes/No | Yes | ||||||||||
| -rewl | integer | Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the window length. | Integer 1 or more | 30 | ||||||||||
| -rewme | integer | Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the number maximum number of errors (disagreements) between two alignments in the given window. | Integer 1 or more | 2 | ||||||||||
| -feip | integer | Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the first pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the first time before the first assembly pass. | Integer 0 or more | 0 | ||||||||||
| -leip | integer | Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the last pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the last time before the first assembly pass. | Integer 0 or more | 0 | ||||||||||
| -tpae | boolean | This option is useful in EST assembly. Poly-AT stretches at the end of reads that were not correctly masked or clipped in pre-processing steps from external programs get tagged here. The assembler will not use these stretches for critical operations. Additionally, the tags do provide a good visual anchor when looking at the assembly with different programs. | Boolean value Yes/No | No | ||||||||||
| -pbwl | integer | Only takes effect when -tpae is set to 'Y'. Defines the window length within which all bases (except the maximum number of errors allowed) must be either A or T to be considered a polybase stretch. | Integer 1 or more | 7 | ||||||||||
| -pbwme | integer | Only takes effect when -tpae is set to 'Y. Defines the maximum number of errors allowed in a given window length such that a stretch is considered to be a polybase stretch. The distribution of these errors is not important. | Integer 1 or more | 2 | ||||||||||
| -pbwgd | integer | Only takes effect when -tpae is set to 'Y'. Defines the number of bases from the end of a sequence (if masked, from the end of the masked area) within which a polybase stretch is looked for without finding one. | Integer 1 or more | 9 | ||||||||||
| -[no]pvc | boolean | Mira will try to identify possible sequencing vector relicts present at the start of a sequence and clip them away. These relicts are usually a few bases long and were not correctly removed from the sequence in data pre-processing steps of external programs. You might want to turn off this option if you know (or think) that your data contains a lot of repeats and the option below to fine tune the clipping behaviour does not give the expected results. | Boolean value Yes/No | Yes | ||||||||||
| -pvcmla | integer | The clipping of possible vector relicts option works quite well. Unfortunately the bounds of repeats or differences in EST splice variants sometimes show the same alignment behaviour as possible sequencing vector relicts and could therefore also be clipped. To stop the vector clipping from mistakenly clipping repetitive regions or EST splice variants, this option puts an upper bound to the number of bases a potential clip is allowed to have. If the number of bases is below or equal to this threshold then the bases are clipped. If the number of bases exceeds the threshold then the clip is NOT performed. Setting the value to 0 turns off the threshold i.e. clips are then always performed if a potential vector is found. | Integer 0 or more | 18 | ||||||||||
| -qc | boolean | Default is 'N', but is automatically set to 'Y' when using the setparam options 'fasta' or 'phd' (can be turned off again by subsequent options afterwards). This will let mira perform its own quality clipping before sequences are entered into the assembly. The clip function performed is a sequence end window quality clip with back iteration to get a maximum number of bases as useful sequence. Note that the bases clipped away here can still be used afterwards if there is enough evidence supporting their correctness when the option -ure is turned on. | Boolean value Yes/No | No | ||||||||||
| -qcmq | integer | This is the minimum quality required of bases in a window in order to be accepted. Please be cautious and don't use extreme values here, because then the clipping will be too lax or too harsh. Values below 15 and higher than 35 are disallowed. | Integer from 15 to 35 | 20 | ||||||||||
| -qcwl | integer | This is the length of a window in bases for the quality clip. | Integer 10 or more | 30 | ||||||||||
| -[no]mbc | boolean | This will let mira perform a 'clipping' of bases that were masked out (replaced with the character X). It is generally not a good idea to use mask bases to remove unwanted portions of a sequence; the EXP file format and the NCBI traceinfo format have excellent possibilities to circumvent this. But because a lot of pre-processing software is built around cross_match, scylla- and phrap-style base masking, the need arised for mira to be able to handle this too. mira will look at the start and end of each sequence to see whether there are masked bases that should be 'clipped'. | Boolean value Yes/No | Yes | ||||||||||
| -mbcgs | integer | While performing the clip of masked bases, mira will look if it can merge larger chunks of masked bases that are a maximum of -mbcgs apart. | Integer 0 or more | 20 | ||||||||||
| -mbcmfg | integer | While performing the clip of masked bases at the start of a sequence, mira will allow up to this number of unmasked bases in front of a masked stretch. | Integer 0 or more | 40 | ||||||||||
| -mbcmeg | integer | While performing the clip of masked bases at the end of a sequence, mira will allow up to this number of unmasked bases behind a masked stretch. | Integer 0 or more | 60 | ||||||||||
| -[no]emlc | boolean | If set to 'Y', ensures a minimum left clip on each read according to the parameters -mlcr and -smlc. | Boolean value Yes/No | Yes | ||||||||||
| -mlcr | integer | If -emlc is 'Y', checks whether there is a left clip whose length is at least the size specified here. | Integer 0 or more | 25 | ||||||||||
| -smlc | integer | If -emlc is 'Y' and the actual left clip is < -mlcr, then set the left clip of read to the value given here. | Integer 0 or more | 30 | ||||||||||
| -bph | integer | Default is 14 on 32-bit systems and 16 on 64-bit systems. Controls the number n of consecutive bases used as a word hash. The higher the value, the faster the search; the lower the value, the more weak matches are found. Values below 10 are not recommended. | Integer 1 or more | 14 | ||||||||||
| -hss | integer | Controls the step size s with which hashes are generated along a sequence. This allows a more fine-grained search, as matches are found with at least n+s equal bases (n being the word length set by -bph) instead of the 2n required by SSAHA. The higher the value, the faster the search; the lower the value, the more weak matches are found. | Integer 1 or more | 4 | ||||||||||
| -pr | integer | Controls the relative percentage of exact word matches in an approximate overlap that has to be reached to accept the overlap as a possible match. Increasing this number will decrease the number of possible alignments that have to be checked by Smith-Waterman later on in the assembly, but it might also lead to the rejection of weaker overlaps (i.e. overlaps that contain a higher number of mismatches). | Integer 1 or more | 50 | ||||||||||
| -mhpr | integer | Maximum number of possible hits a single read can pass on to the Smith-Waterman alignment phase. If more potential hits are found, only the best ones are kept. This is an important option for tackling projects with extreme assembly conditions. For example, 5000 reads that are all very similar would generate around 40 to 50 million possible alignments (forward and reverse complement); setting this parameter to 200 reduces the number of alignments to check to around 1.5 to 2 million. As the assembly proceeds through its passes (-nop), different combinations of possible hits are checked, always the most promising ones first, so the accuracy of the assembly should only suffer if this number is lowered too much. | Integer 1 or more | 200 | ||||||||||
| -bip | integer | The banded Smith-Waterman alignment uses this percentage to compute the bandwidth of the alignment matrix. For example, with an expected overlap of 150 bases and bip=10, the banded SW computes a band of 15 bases on each side of the expected alignment diagonal, thus allowing up to 15 unbalanced insertions/deletions in the alignment. Increasing this number finds more non-optimal alignments but also increases SW runtime (somewhere between linearly and quadratically); decreasing it works the other way round (a few poor alignments might be missed, but speed is gained). | Integer from 1 to 100 | 15 | ||||||||||
| -bmin | integer | Minimum bandwidth in bases to each side. | Integer 1 or more | 25 | ||||||||||
| -bmax | integer | Maximum bandwidth in bases to each side. | Integer 1 or more | 50 | ||||||||||
| -mo | integer | Minimum number of overlapping bases needed in an alignment of two sequences to be accepted. | Integer 1 or more | 15 | ||||||||||
| -ms | integer | Minimum score of an overlap for it to be taken into account for assembly. mira uses a default scoring scheme for the Smith-Waterman alignment: each match counts 1, a match with an N counts 0, each mismatch with a non-N base counts -1 and each gap -2. Use a higher score to weed out chance matches, or a lower score to perhaps find the single (short) alignment that might join two contigs (at the expense of computing time and memory). | Integer 1 or more | 15 | ||||||||||
| -mrs | integer | Minimum percentage of matching bases between two reads for them to be considered for assembly. Increasing this number saves memory, but possible alignments may be lost. A maximum of 80 is probably sensible here; decreasing it below 55 will probably make memory and time consumption explode. | Integer from 1 to 100 | 65 | ||||||||||
| -egp | boolean | Defines whether to increase the penalties applied to alignments containing long gaps. Setting this to 'Y' might help in projects with frequent repeats. On the other hand, it is definitely disruptive when assembling very long reads containing multiple long indels in the called base sequence, although this should not happen in the first place and is a sure sign of problems lying ahead. When in doubt, set it to 'Y' for EST projects and de-novo genome assembly, and to 'N' for assembly of closely related strains (assembly against a backbone). When set to 'N', it is recommended to set both -amgb and -amgbemc to 'Y'. | Boolean value Yes/No | No | ||||||||||
| -egpl | list | Has no effect if -egp (extra gap penalty) is off. Defines the extra penalty applied to 'long' gaps. There are three predefined levels: 1. low: use this if you expect your base caller to frequently miss two or more bases; 2. medium: use this if your base caller is expected to frequently miss one to two bases; 3. high: use this if your base caller does not frequently miss more than one base. For some stages of the EST assembly process, a special value 'est' is used. | low, medium, high | low | ||||||||||
| -megpp | integer | Has no effect if -egp (extra gap penalty) is off. Defines the maximum extra penalty, in percent, applied to 'long' gaps. | Integer from 1 to 100 | 100 | ||||||||||
| -np | string | Contigs will have this string prepended to their names. | Any string | mira | ||||||||||
| -an | list | When adding reads to a contig, dangerous regions can get an extra integrity check. none: no extra check. text: the check is text-based only. signal: the check is signal-based; if the SCF trace is not available, it falls back to 'text'. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous. | none, text, signal | signal | ||||||||||
| -rodirs | integer | When adding reads to a contig, reject a read if the resulting drop in consensus quality exceeds the given value in %. Lower values mean stricter checking. This value is doubled when the read being added has a template partner (the other read of a read pair) at the right distance. | Integer from 1 to 100 | 15 | ||||||||||
| -dmer | integer | When adding reads to a contig, reject the reads if the error in zones known as dangerous exceeds the given value in %. Lower values mean stricter checking in these danger zones. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous. | Integer from 1 to 100 | 1 | ||||||||||
| -[no]mr | boolean | One of the most important switches in MIRA. If set to 'Y', MIRA will try to resolve misassemblies due to repeats by identifying single base stretch differences and tag those critical bases as RMB (Repeat Marker Base, weak or strong). This switch is also needed when MIRA is run in EST mode to identify possible inter-, intra- and intra-and-interorganism SNPs. | Boolean value Yes/No | Yes | ||||||||||
| -asir | boolean | Only takes effect when -mr is set to 'Y'; its effect also depends on whether strain data (see -lsd) is present. Usually, mira marks bases that differentiate between repeats when a conflict occurs between reads that belong to one strain. If the conflict occurs between reads belonging to different strains, the bases are marked as SNP. However, if this switch is set to 'Y', conflicts within a strain are also marked as SNP. This switch is mainly used in assemblies of ESTs; it should not be set for genomic assembly. | Boolean value Yes/No | No | ||||||||||
| -mrpg | integer | Only takes effect when -mr is set to 'Y'. This defines the minimum number of reads in a group that are needed for the RMB (Repeat Marker Bases) or SNP detection routines to be triggered. A group is defined by the reads carrying the same nucleotide at a given position, i.e., an assembly with mrpg=2 needs at least two groups of two reads each, every read having at least the quality defined in -mgqrt, for the position to be recognised as a repeat marker or SNP. Setting this to a low number increases sensitivity but might produce a few false positives, resulting in reads being thrown out of contigs because of falsely identified repeat markers (or positions wrongly recognised as SNPs). | Integer 2 or more | 2 | ||||||||||
| -mgqrt | integer | Only takes effect when -mr is set to 'Y'. This defines the minimum quality of a group of bases to be taken into account as potential repeat marker. The lower the number, the more sensitive you get, but lowering below 25 is not recommended as a lot of wrongly called bases can have a quality approaching this value and you'd end up with a lot of false positives. The higher the overall coverage of your project the better, and the higher you can set this number. A value of 35 will probably remove all false positives, a value of 40 will probably never show false positives. | Integer 25 or more | 30 | ||||||||||
| -emea | integer | Only takes effect when -mr is set to 'Y'. Using the ends of Sanger-type shotgun sequences is always a bit risky, as wrongly called bases tend to crowd there and sequencing vector remnants may remain. It is even riskier to use these stretches for detecting possible repeats, so one can define an exclusion area whose bases are not used when determining whether a mismatch is due to repeats. | Integer 0 or more | 15 | ||||||||||
| -[no]amgb | boolean | Determines whether columns containing gap bases (indels) are also tagged. | Boolean value Yes/No | Yes | ||||||||||
| -[no]amgbemc | boolean | Only takes effect when -amgb is set to 'Y'. Determines whether multiple columns containing gap bases (indels) are also tagged. | Boolean value Yes/No | Yes | ||||||||||
| -[no]amgbnbs | boolean | Only takes effect when -amgb is set to 'Y'. Determines whether both strands need to have a gap for a column containing gap bases to be tagged. Setting this to 'N' is not recommended except in desperately low coverage situations. | Boolean value Yes/No | Yes | ||||||||||
| -dismin | integer | The minimum distance that read pairs may be apart. There is an additional error margin of 10% subtracted from this value during internal computations. | Integer 0 or more | 500 | ||||||||||
| -dismax | integer | The maximum distance that read pairs may be apart. There is an additional error margin of 10% added to this value during internal computations. | Integer 0 or more | 5000 | ||||||||||
| -ace | boolean | Once contigs have been built, mira can call a built-in version of the automatic contig editor EdIt. EdIt tries to resolve discrepancies in the contig by performing trace analysis and can correct even hard-to-resolve errors. This option is always useful, but especially in conjunction with -nop and -ure. Note: the current development version has a memory leak in the editor, therefore the option is not turned on automatically. | Boolean value Yes/No | No | ||||||||||
| -[no]sem | boolean | If set to 'Y' the automatic editor will not take error hypotheses with a low probability into account, even if all the requirements to make an edit are fulfilled. | Boolean value Yes/No | Yes | ||||||||||
| -ct | integer | The higher this value, the more strict the automatic editor will apply its internal rule set. Going below 40 is not recommended. | Integer from 1 to 100 | 50 | ||||||||||
| -[no]orc | boolean | Output CAF results | Boolean value Yes/No | Yes | ||||||||||
| -[no]org | boolean | Output GAP4 results | Boolean value Yes/No | Yes | ||||||||||
| -[no]orf | boolean | Output FASTA results | Boolean value Yes/No | Yes | ||||||||||
| -ora | boolean | Output ACE results | Boolean value Yes/No | No | ||||||||||
| -[no]ort | boolean | Output TXT results | Boolean value Yes/No | Yes | ||||||||||
| -[no]ors | boolean | Output TCS results | Boolean value Yes/No | Yes | ||||||||||
| -orh | boolean | Output HTML results | Boolean value Yes/No | No | ||||||||||
| -otc | boolean | Output temporary CAF results | Boolean value Yes/No | No | ||||||||||
| -otg | boolean | Output temporary GAP4 results | Boolean value Yes/No | No | ||||||||||
| -otf | boolean | Output temporary FASTA results | Boolean value Yes/No | No | ||||||||||
| -ota | boolean | Output temporary ACE results | Boolean value Yes/No | No | ||||||||||
| -ott | boolean | Output temporary TXT results | Boolean value Yes/No | No | ||||||||||
| -ots | boolean | Output temporary TCS results | Boolean value Yes/No | No | ||||||||||
| -oth | boolean | Output temporary HTML results | Boolean value Yes/No | No | ||||||||||
| -oetc | boolean | Output extra temporary CAF results | Boolean value Yes/No | No | ||||||||||
| -oetg | boolean | Output extra temporary GAP4 results | Boolean value Yes/No | No | ||||||||||
| -oetf | boolean | Output extra temporary FASTA results | Boolean value Yes/No | No | ||||||||||
| -oeta | boolean | Output extra temporary ACE results | Boolean value Yes/No | No | ||||||||||
| -oett | boolean | Output extra temporary TXT results | Boolean value Yes/No | No | ||||||||||
| -oeth | boolean | Output extra temporary HTML results | Boolean value Yes/No | No | ||||||||||
| -tcpl | integer | When producing an output in text format (-ort|ott|oett), this parameter defines how many bases each line of an alignment should contain. | Integer 1 or more | 60 | ||||||||||
| -hcpl | integer | When producing an output in HTML format (-orh|oth|oeth), this parameter defines how many bases each line of an alignment should contain. | Integer 1 or more | 60 | ||||||||||
| -gapfda | string | Defines the extension of the directory where mira will write the result of an assembly ready to import into the Staden package (GAP4) in Direct Assembly format. The name of the directory will then be <projectname>_.<extension> | Any string | gap4da | ||||||||||
| -log | string | Defines the directory where mira will write some log files to. Note that the name of the actual project will be prepended. | Any string | miralog | ||||||||||
| -co | string | Defines the file in CAF format to save an assembled project to. Filename must end with '.caf'. | Any string | mira_out.caf | ||||||||||
| Associated qualifiers | ||||||||||||||
| (none) | ||||||||||||||
| General qualifiers | ||||||||||||||
| -auto | boolean | Turn off prompts | Boolean value Yes/No | N | ||||||||||
| -stdout | boolean | Write first file to standard output | Boolean value Yes/No | N | ||||||||||
| -filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N | ||||||||||
| -options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N | ||||||||||
| -debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N | ||||||||||
| -verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y | ||||||||||
| -help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N | ||||||||||
| -warning | boolean | Report warnings | Boolean value Yes/No | Y | ||||||||||
| -error | boolean | Report errors | Boolean value Yes/No | Y | ||||||||||
| -fatal | boolean | Report fatal errors | Boolean value Yes/No | Y | ||||||||||
| -die | boolean | Report dying program messages | Boolean value Yes/No | Y | ||||||||||
| -version | boolean | Report version number and exit | Boolean value Yes/No | N | ||||||||||
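The window clip described for -qc, -qcmq and -qcwl can be pictured with a small Python sketch. It is not MIRA's code: the acceptance rule (a window's mean quality must reach -qcmq), the inward trim onto the first good base, and the outward back-iteration over bases that still reach -qcmq are all assumptions made for illustration, and quality_clip is a hypothetical helper.

```python
def quality_clip(quals, min_qual=20, window=30):
    """Return (left, right) such that quals[left:right] is the part kept.

    Hypothetical sketch of a sequence-end window quality clip; defaults
    mirror -qcmq=20 and -qcwl=30. Returns (0, 0) if nothing is usable.
    """
    n = len(quals)

    def accepted(start):
        # Assumed rule: the mean quality of the window reaches min_qual
        # (compared as integers to avoid floating-point division).
        return sum(quals[start:start + window]) >= min_qual * window

    starts = [i for i in range(n - window + 1) if accepted(i)]
    if not starts:
        return (0, 0)  # no acceptable window (or read shorter than one window)

    # Left end: start of the first accepted window, moved onto its first
    # good base, then back-iterated outwards over bases reaching min_qual.
    left = starts[0]
    while quals[left] < min_qual:
        left += 1
    while left > 0 and quals[left - 1] >= min_qual:
        left -= 1

    # Right end (exclusive): the mirror image, from the last accepted window.
    right = starts[-1] + window
    while quals[right - 1] < min_qual:
        right -= 1
    while right < n and quals[right] >= min_qual:
        right += 1

    return (left, right)


# Example: 10 poor bases, 50 good bases, 10 poor bases.
read_quals = [5] * 10 + [30] * 50 + [5] * 10
print(quality_clip(read_quals))  # (10, 60)
```

Because an accepted window has a mean quality of at least -qcmq, it always contains at least one base reaching that value, so the inward trim loops are guaranteed to stop inside the window.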
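The -emlc rule is simple enough to state as a function. The sketch below (hypothetical helper ensure_min_left_clip, using the documented defaults -mlcr=25 and -smlc=30) follows the descriptions of -emlc, -mlcr and -smlc literally; where in MIRA's clipping pipeline the rule is applied is not covered here.

```python
def ensure_min_left_clip(left_clip, emlc=True, mlcr=25, smlc=30):
    """Apply the -emlc rule to one read's left clip (hypothetical helper).

    If -emlc is on and the existing left clip is shorter than -mlcr,
    the left clip is set to -smlc; otherwise it is returned unchanged.
    """
    if emlc and left_clip < mlcr:
        return smlc
    return left_clip


for clip in (0, 24, 25, 40):
    print(clip, "->", ensure_min_left_clip(clip))
```

With the defaults, a read whose left clip is 24 ends up with a left clip of 30, while one with a left clip of 25 is left unchanged.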
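A sketch of how -bip, -bmin and -bmax might combine into the number of bases computed on each side of the expected diagonal. Clamping the -bip result into the range from -bmin to -bmax and truncating to whole bases are assumptions based on the descriptions of -bmin and -bmax as minimum and maximum bandwidth; band_half_width is a hypothetical name.

```python
def band_half_width(expected_overlap, bip=15, bmin=25, bmax=50):
    """Bases computed on each side of the expected alignment diagonal.

    Assumption: the -bip percentage of the expected overlap, truncated to
    whole bases, is clamped into the range [bmin, bmax].
    """
    band = expected_overlap * bip // 100
    return max(bmin, min(bmax, band))


print(band_half_width(150, bip=10, bmin=1, bmax=130))  # 15, as in the -bip example
print(band_half_width(150, bip=10))                     # 25, raised to -bmin
print(band_half_width(1000))                            # 50, capped at -bmax
```

Under this reading, the 15-base band of the -bip example only appears when -bmin is relaxed; with the default -bmin of 25, the same 150-base overlap would get 25 bases per side.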
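The default scoring scheme given for -ms can be applied to an existing pairwise alignment as shown below. This only scores a finished alignment and does not perform the Smith-Waterman search itself; treating '-' and '*' as gap characters and scoring any column that contains an N as 0 are assumptions, and both function names are hypothetical.

```python
def alignment_score(a, b, gap_chars="-*"):
    """Score two equal-length aligned sequences with the -ms default scheme.

    match +1, anything aligned with an N 0, mismatch -1, gap -2 per column.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            score -= 2  # a gap in either sequence
        elif x == "N" or y == "N":
            continue    # a match with an N counts 0
        elif x == y:
            score += 1
        else:
            score -= 1  # mismatch between two non-N bases
    return score


def passes_min_score(a, b, ms=15):
    """True if the overlap reaches the -ms threshold (default 15)."""
    return alignment_score(a, b) >= ms


print(alignment_score("ACGTNCGT", "ACCTACGT"))  # 5
print(alignment_score("ACG-T", "ACGAT"))        # 2
```

In the first example, six matches, one mismatch and one column against an N give 6 - 1 + 0 = 5; an overlap this short would be far below the default -ms of 15.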
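The group rule described for -mrpg, with -mgqrt as the quality threshold, can be sketched for a single alignment column. The sketch assumes that -mgqrt is applied per base and that a column qualifies when at least two different bases are each carried by -mrpg or more such reads. MIRA's real RMB/SNP detection takes more into account (strain data, the -emea exclusion areas, gap columns via -amgb and related switches), and has_marker_groups is a hypothetical name.

```python
from collections import Counter


def has_marker_groups(column, mrpg=2, mgqrt=30):
    """Check one alignment column for repeat-marker / SNP candidate groups.

    column: list of (base, quality) pairs, one per read covering the column.
    Assumptions: -mgqrt is applied per base, and the column qualifies when
    at least two different bases are each carried by >= mrpg such reads.
    """
    counts = Counter(base.upper() for base, qual in column if qual >= mgqrt)
    groups = [base for base, n in counts.items() if n >= mrpg]
    return len(groups) >= 2


column = [("A", 35), ("A", 32), ("G", 40), ("G", 31)]
print(has_marker_groups(column))  # True: two groups of two reads each
```

Dropping the quality of one of the G reads below -mgqrt leaves a single qualifying G read, so the column would no longer be flagged with mrpg=2.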
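The 10% error margins of -dismin and -dismax translate into an accepted distance range from 0.9 * dismin to 1.1 * dismax, which with the defaults is 450 to 5500 bases. The sketch below checks a distance against that range with integer arithmetic, so the limits themselves are not affected by floating-point rounding. How MIRA measures the distance between the two reads of a pair is not described here, so the distance argument and the helper name pair_distance_ok are assumptions.

```python
def pair_distance_ok(distance, dismin=500, dismax=5000):
    """True if a read-pair distance lies within the -dismin/-dismax limits.

    Applies the documented 10% margins (0.9 * dismin to 1.1 * dismax) with
    integer arithmetic so that the limits themselves are accepted exactly.
    """
    return dismin * 9 <= distance * 10 <= dismax * 11


for d in (449, 450, 5500, 5501):
    print(d, pair_distance_ok(d))
```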
This directory contains output files.
| Program name | Description |
|---|---|
| emiraest | MIRAest fragment assembly program |
Although we take every care to ensure that the results of the EMBOSS version are identical to those from the original package, we recommend that you check your inputs give the same results in both versions before publication.
Please report all bugs in the EMBOSS version to the EMBOSS bug team, not to the original author.