jaspextract

 

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

Extract data from JASPAR

Description

JASPAR is a collection of transcription factor DNA-binding preferences, modelled as matrices. These can be converted into Position Weight Matrices (PWMs, also called PSSMs), which are used to scan genomic sequences.
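The conversion from a count matrix to a PWM can be sketched as follows. This is a minimal illustration, not jaspscan's own scoring scheme. It assumes the .pfm rows are ordered A, C, G, T (the JASPAR convention), a uniform background of 0.25 per base and a total pseudocount of 1 per column. The function names pfm_to_pwm and scan are hypothetical, and only the forward strand is scored.

```python
import math

BASES = "ACGT"  # assumed row order of a JASPAR .pfm file


def pfm_to_pwm(counts, pseudocount=1.0, background=None):
    """Convert a 4-row count matrix (A, C, G, T) into log2-odds scores.

    The pseudocount is shared between the four bases in proportion to the
    background, which defaults to a uniform 0.25 per base (an assumption).
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    pwm = {b: [] for b in BASES}
    for col in range(len(counts[0])):
        total = sum(row[col] for row in counts)
        for i, base in enumerate(BASES):
            prob = (counts[i][col] + pseudocount * background[base]) / (total + pseudocount)
            pwm[base].append(math.log2(prob / background[base]))
    return pwm


def scan(pwm, sequence):
    """Return (start, score) for every full-length window on the forward strand."""
    width = len(pwm["A"])
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in pwm for b in window):
            continue  # skip windows containing N or other ambiguity codes
        hits.append((start, sum(pwm[b][i] for i, b in enumerate(window))))
    return hits


# Counts copied from JASPAR_CORE/MA0075.1.pfm (Prrx2) in the example output.
MA0075_1 = [
    [52, 59, 0, 0, 58],
    [2, 0, 0, 0, 0],
    [4, 0, 1, 0, 1],
    [1, 0, 58, 59, 0],
]

pwm = pfm_to_pwm(MA0075_1)
best = max(scan(pwm, "GGCAATTAGC"), key=lambda hit: hit[1])
print(best)  # the highest-scoring window starts at 3, the AATTA site
```

Log-odds against a background is only one common way of turning counts into weights; other pseudocount and background choices give different absolute scores.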

JASPAR is the only database with this scope where the data can be used with no restrictions (open-source).

This program splits the JASPAR distribution into its component matrix sets (e.g. JASPAR_CORE, JASPAR_PHYLOFACTS) and copies them into the EMBOSS data directories, performing any necessary conversions.

The home page of JASPAR is: http://jaspar.genereg.net/

The EMBOSS program jaspscan will not work until this program has been run.
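An administrator who wants to confirm that extraction has already happened could look for populated JASPAR_* directories in the EMBOSS data area. The sketch below is a hypothetical helper, not an EMBOSS tool. It assumes the data area is /usr/local/emboss/share/EMBOSS/data (the example path used under "Input file format"; your installation prefix may differ) and that each extracted set contains a matrix_list.txt file, as JASPAR_CORE does in the example output.

```python
from pathlib import Path

# Assumed EMBOSS data location, taken from the example path on this page;
# installations built with a different prefix will differ.
DEFAULT_DATA_DIR = "/usr/local/emboss/share/EMBOSS/data"


def extracted_jaspar_sets(data_dir=DEFAULT_DATA_DIR):
    """Return the names of JASPAR_* subdirectories that hold a matrix_list.txt.

    An empty list suggests jaspextract has not been run for this data area.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.glob("JASPAR_*")
        if entry.is_dir() and (entry / "matrix_list.txt").is_file()
    )


if __name__ == "__main__":
    sets = extracted_jaspar_sets()
    print(sets if sets else "No extracted JASPAR data found; run jaspextract first.")
```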

Running this program may be the job of your system manager.

Usage

Here is a sample session with jaspextract


% jaspextract 
Extract data from JASPAR
JASPAR database directory [.]: jaspar


Command line arguments

Extract data from JASPAR
Version: EMBOSS:6.4.0.0

   Standard (Mandatory) qualifiers:
  [-directory]         directory  The FlatFileDir directory containing the
                                  .pfm files and the matrix_list.txt file

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-directory" associated qualifiers
   -extension1         string     Default file extension

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit

Qualifier              Type       Description                                Allowed values        Default

Standard (Mandatory) qualifiers
[-directory]           directory  The FlatFileDir directory containing the   Directory
(Parameter 1)                     .pfm files and the matrix_list.txt file

Additional (Optional) qualifiers
(none)

Advanced (Unprompted) qualifiers
(none)

Associated qualifiers
"-directory" associated directory qualifiers
-extension1            string     Default file extension                     Any string
-extension_directory

General qualifiers
-auto                  boolean    Turn off prompts                           Boolean value Yes/No  N
-stdout                boolean    Write first file to standard output        Boolean value Yes/No  N
-filter                boolean    Read first file from standard input,       Boolean value Yes/No  N
                                  write first file to standard output
-options               boolean    Prompt for standard and additional values  Boolean value Yes/No  N
-debug                 boolean    Write debug output to program.dbg          Boolean value Yes/No  N
-verbose               boolean    Report some/full command line options      Boolean value Yes/No  Y
-help                  boolean    Report command line options and exit.      Boolean value Yes/No  N
                                  More information on associated and
                                  general qualifiers can be found with
                                  -help -verbose
-warning               boolean    Report warnings                            Boolean value Yes/No  Y
-error                 boolean    Report errors                              Boolean value Yes/No  Y
-fatal                 boolean    Report fatal errors                        Boolean value Yes/No  Y
-die                   boolean    Report dying program messages              Boolean value Yes/No  Y
-version               boolean    Report version number and exit             Boolean value Yes/No  N

Input file format

The input files are contained in the Archive.zip file available from the html/DOWNLOAD directory of the JASPAR website (http://jaspar.genereg.net). After unzipping the archive, specify its all_data/FlatFileDir directory when running jaspextract. It is advisable to first delete any old data files from your EMBOSS data file area, e.g. from the /usr/local/emboss/share/EMBOSS/data/JASPAR_* directories.
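Before running jaspextract it can be worth checking that the directory you intend to pass really is a FlatFileDir. The sketch below is a hypothetical helper, not part of EMBOSS. It assumes, as in the example output further down, that each .pfm file holds four whitespace-separated rows of counts (A, C, G, T in JASPAR's convention) of equal length, and that matrix_list.txt sits alongside them. Counts are read as floats for generality.

```python
from pathlib import Path


def read_pfm(path):
    """Read a .pfm file: four whitespace-separated rows of counts (A, C, G, T)."""
    rows = [
        [float(x) for x in line.split()]
        for line in Path(path).read_text().splitlines()
        if line.strip()
    ]
    if len(rows) != 4:
        raise ValueError(f"{path}: expected 4 rows, found {len(rows)}")
    if len({len(r) for r in rows}) != 1:
        raise ValueError(f"{path}: rows have different lengths")
    return rows


def list_pfms(flat_file_dir):
    """Return the sorted .pfm files in a FlatFileDir, checking matrix_list.txt exists."""
    directory = Path(flat_file_dir)
    if not (directory / "matrix_list.txt").is_file():
        raise FileNotFoundError(f"no matrix_list.txt in {directory}")
    return sorted(directory.glob("*.pfm"))
```

Calling read_pfm on every path returned by list_pfms(".../all_data/FlatFileDir") will raise on the first malformed file, which is usually easier to act on than a failure part-way through extraction.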

Output file format

The output file format is currently the same as the JASPAR distribution format, but with the matrix files separated into directories according to the matrix set they belong to (JASPAR_CORE, JASPAR_CNE and so on).
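The split can be pictured with the short sketch below. It is a hypothetical helper written for illustration, not jaspextract's source. It assumes matrix_list.txt is tab-separated with the matrix ID in the first field, that every line carries a collection "..." annotation (as in the example output), and that each set goes into a directory named JASPAR_ followed by that collection name, matching JASPAR_CORE above. It only copies files and does not attempt any of the conversions jaspextract may perform.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

COLLECTION_RE = re.compile(r'collection "([^"]+)"')


def split_by_collection(flat_file_dir, data_dir):
    """Copy each .pfm into data_dir/JASPAR_<collection>/ with its own matrix_list.txt.

    Returns a dict mapping each directory name to the matrix IDs copied there.
    """
    src = Path(flat_file_dir)
    groups = defaultdict(list)  # directory name -> matrix_list.txt lines
    for line in (src / "matrix_list.txt").read_text().splitlines():
        if not line.strip():
            continue
        match = COLLECTION_RE.search(line)
        if match is None:
            raise ValueError(f"no collection field in: {line[:40]}")
        groups[f"JASPAR_{match.group(1)}"].append(line)

    copied = {}
    for dirname, lines in groups.items():
        target = Path(data_dir) / dirname
        target.mkdir(parents=True, exist_ok=True)
        ids = [line.split("\t", 1)[0] for line in lines]
        for matrix_id in ids:
            shutil.copy2(src / f"{matrix_id}.pfm", target)
        # Each set keeps only the matrix_list.txt lines for its own matrices.
        (target / "matrix_list.txt").write_text("\n".join(lines) + "\n")
        copied[dirname] = ids
    return copied
```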

Output files for usage example

Directory: JASPAR_CNE

This directory contains output files.

Directory: JASPAR_CORE

This directory contains output files, for example MA0070.1.pfm MA0071.1.pfm MA0072.1.pfm MA0073.1.pfm MA0074.1.pfm MA0075.1.pfm MA0076.1.pfm MA0077.1.pfm MA0078.1.pfm MA0079.1.pfm and matrix_list.txt.

File: JASPAR_CORE/MA0070.1.pfm

 5  3 16  1  0 17 17  0  0 16 12  8
 6  9  1  1 18  1  0  0 18  1  0  2
 2  3  1  0  0  0  0  1  0  0  1  2
 5  3  0 16  0  0  1 17  0  1  5  6

File: JASPAR_CORE/MA0071.1.pfm

15  9  6 11 21  0  0  0  0 25
 1  1 12  2  0  0  0  0 25  0
 2  0  4  5  4 25 25  0  0  0
 7 15  3  7  0  0  0 25  0  0

File: JASPAR_CORE/MA0072.1.pfm

 9 17 15 35 23  2  0 28  0  0  0  0 36 15
 8  2  0  1  0 12  0  0  0  0  0 36  0  6
 8  7  3  0  0 13  0  8 36 36  0  0  0 10
11 10 18  0 13  9 36  0  0  0 36  0  0  5

File: JASPAR_CORE/MA0073.1.pfm

 3  1  3  0  7  9  8  4  0 11  4  1  3  4  2  4  4  4  1  4
 8 10  8 11  4  2  3  6 11  0  7 10  8  6  9  5  5  6  7  4
 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  3  2
 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  2  1  1  0  1

File: JASPAR_CORE/MA0074.1.pfm

 3  0  0  0  0  9  4  2  2  5  0  0  1  0  7
 0  0  0  0  9  0  2  4  0  0  0  0  0  9  1
 7 10  9  0  0  1  0  2  8  5 10  0  0  0  2
 0  0  1 10  1  0  4  2  0  0  0 10  9  1  0

File: JASPAR_CORE/MA0075.1.pfm

52 59  0  0 58
 2  0  0  0  0
 4  0  1  0  1
 1  0 58 59  0

File: JASPAR_CORE/MA0076.1.pfm

16  0  0  0  0 20 16  4  1
 1 20 20  0  0  0  0  1  6
 2  0  0 20 20  0  0 15  0
 1  0  0  0  0  0  4  0 13

File: JASPAR_CORE/MA0077.1.pfm

24 54 59  0 65 71  4 24  9
 7  6  4 72  4  2  0  6  9
31  7  0  2  0  1  1 38 55
14  9 13  2  7  2 71  8  3

File: JASPAR_CORE/MA0078.1.pfm

 7  8  3 30  0  0  0  0  0
 9  8 18  0  1  0  0  0 17
 6  4  1  0  0  0 31  2 10
 9 11  9  1 30 31  0 29  4

File: JASPAR_CORE/MA0079.1.pfm

1 2 0 0 0 2 0 0 1 2
1 1 0 0 5 0 1 0 1 0
4 4 8 8 2 4 5 6 6 0
2 1 0 0 1 2 2 2 0 6

File: JASPAR_CORE/matrix_list.txt

MA0071.1	13.1897301896459	RORA_1	Zinc-coordinating	; acc "NP_599023" ; collection "CORE" ; comment "isoform type" ; family "Hormone-nuclear Receptor" ; medline "7926749" ; pazar_tf_id "TF0000047" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" 
MA0077.1	9.07881462267178	SOX9	Other Alpha-Helix	; acc "P48436" ; collection "CORE" ; comment "-" ; family "High Mobility Group" ; medline "9973626" ; pazar_tf_id "TF0000053" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" 
MA0075.1	9.06306510239134	Prrx2	Helix-Turn-Helix	; acc "Q06348" ; collection "CORE" ; comment "-" ; family "Homeo" ; medline "7901837" ; pazar_tf_id "TF0000051" ; species "10090" ; tax_group "vertebrates" ; type "SELEX" 
MA0070.1	14.6408952002356	PBX1	Helix-Turn-Helix	; acc "Q5T486" ; collection "CORE" ; comment "-" ; family "Homeo" ; medline "7910944" ; pazar_tf_id "TF0000046" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" 
MA0079.1	9.7185757452318	SP1	Zinc-coordinating	; acc "P08047" ; collection "CORE" ; comment "-" ; family "BetaBetaAlpha-zinc finger" ; medline "2192357" ; pazar_tf_id "TF0000055" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" 
MA0073.1	22.2782723704014	RREB1	Zinc-coordinating	; acc "Q92766" ; collection "CORE" ; comment "-" ; family "BetaBetaAlpha-zinc finger" ; medline "8816445" ; pazar_tf_id "TF0000049" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" 
MA0079.2	11.1288626921664	SP1	Zinc-coordinating	; acc "P08047" ; collection "CORE" ; comment "Annotations from PAZAR SP1 + SP1_MOUSE + SP1_HUMAN + SP1_RAT in the pleiades genes project (TF0000105, TF0000121, TF0000137, TF0000146)." ; family "BetaBetaAlpha-zinc finger" ; medline "17916232" ; pazar_tf_id "TF0000055" ; species "9606,10090,10116" ; tax_group "vertebrates" ; type "COMPILED" 
MA0078.1	10.5018372361999	Sox17	Other Alpha-Helix	; acc "Q61473" ; collection "CORE" ; comment "-" ; family "High Mobility Group" ; medline "8636240" ; pazar_tf_id "TF0000054" ; species "10090" ; tax_group "vertebrates" ; type "SELEX" 
MA0072.1	17.4248426117905	RORA_2	Zinc-coordinating	; acc "NP_599022" ; collection "CORE" ; comment "isoform type" ; family "Hormone-nuclear Receptor" ; medline "7926749" ; pazar_tf_id "TF0000048" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" 
MA0074.1	20.4511671987138	RXRA::VDR	Zinc-coordinating	; acc "P19793,P11473" ; collection "CORE" ; comment "heterodimer between RXRA and VDR" ; family "Hormone-nuclear Receptor" ; medline "8674817" ; pazar_tf_id "TF0000050" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" 
MA0076.1	14.123230134165	ELK4	Winged Helix-Turn-Helix	; acc "P28324" ; collection "CORE" ; comment "-" ; family "Ets" ; medline "8524663" ; pazar_tf_id "TF0000052" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" 
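Each matrix_list.txt line pairs a matrix ID with a numeric value, the factor name, its structural class and a run of key "value" annotations separated by semicolons. A minimal parser could look like the sketch below. It assumes the five leading fields are tab-separated and that no annotation value contains a double quote. The names parse_matrix_list_line and load_matrix_list are hypothetical, and since the meaning of the numeric second column is not defined on this page, it is kept as a plain float.

```python
import re

# Matches annotations of the form: key "value"
ANNOTATION_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_matrix_list_line(line):
    """Split one matrix_list.txt line into its leading fields and an annotation dict."""
    matrix_id, value, name, tf_class, annotations = line.rstrip("\r\n").split("\t", 4)
    return {
        "id": matrix_id,
        "value": float(value),  # second column; its meaning is not documented here
        "name": name,
        "class": tf_class,
        "annotations": dict(ANNOTATION_RE.findall(annotations)),
    }


def load_matrix_list(path):
    """Parse every non-blank line of a matrix_list.txt file."""
    with open(path, encoding="utf-8") as handle:
        return [parse_matrix_list_line(line) for line in handle if line.strip()]
```

For example, filtering on "9606" in record["annotations"].get("species", "").split(",") selects human entries. The species field can list several taxon IDs, as MA0079.2 does, so it is split rather than compared directly.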

Directory: JASPAR_FAM

This directory contains output files.

Directory: JASPAR_PBM

This directory contains output files.

Directory: JASPAR_PBM_HLH

This directory contains output files.

Directory: JASPAR_PBM_HOMEO

This directory contains output files.

Directory: JASPAR_PHYLOFACTS

This directory contains output files.

Directory: JASPAR_POLII

This directory contains output files.

Directory: JASPAR_SPLICE

This directory contains output files.

Data files

None

Notes

The home page of JASPAR is http://jaspar.genereg.net/. Running this program may be the job of your system manager.


Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0 unless an error is reported.

Known bugs

None.

See also

Program name    Description
aaindexextract  Extract amino acid property data from AAINDEX
cutgextract     Extract codon usage tables from CUTG database
printsextract   Extract data from PRINTS database for use by pscan
prosextract     Processes the PROSITE motif database for use by patmatmotifs
rebaseextract   Process the REBASE database for use by restriction enzyme applications
tfextract       Process TRANSFAC transcription factor database for use by tfscan

Author(s)

Alan Bleasby
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Please report all bugs to the EMBOSS bug team (emboss-bug@emboss.open-bio.org), not to the original author.

History

Completed 23rd July 2007

Target users

This program is intended to be used by administrators responsible for software and database installation and maintenance.

Comments

None