Mammalian PromoterCAD: Data Driven Design of Regulatory DNA for Mammals


Synthetic Promoter Design

Synthetic promoters can be used to control when, where, and how strongly an arbitrary gene is expressed in an organism. The regulation of a promoter is specified by its pattern of cis-regulatory motifs, and several methods have been developed for identifying natural motifs and predicting their functions. Synthetic promoters with novel regulatory functions can be designed by rational motif arrangement. PromoterCAD is a simple web-based UI for data-driven regulatory DNA design. Plugin tools mine regulatory motifs from public databases, and an iterative design workflow lets the user accumulate many motifs within a single synthetic promoter design. The UI arbitrates when different introduced motifs overlap in position. Currently, Mammalian PromoterCAD uses data collected from Mus musculus to aid synthetic mammalian promoter design.
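The idea of accumulating motifs in a baseline sequence while guarding against positional overlap can be illustrated with a short sketch. This is not PromoterCAD's own code, and how the UI actually resolves overlaps is not described here; the sketch simply assumes that a later motif colliding with an earlier one is rejected. The names `Placement`, `overlaps`, and `apply_placements`, and the toy baseline sequence, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """A motif to be written into the promoter (hypothetical structure)."""
    name: str      # label for the motif, e.g. "E-box"
    start: int     # 0-based index into the promoter sequence
    sequence: str  # concrete bases to write at that position

    @property
    def end(self) -> int:
        return self.start + len(self.sequence)  # exclusive end


def overlaps(a: Placement, b: Placement) -> bool:
    """True when two placements share at least one base position."""
    return a.start < b.end and b.start < a.end


def apply_placements(baseline: str, placements: list[Placement]) -> tuple[str, list[Placement]]:
    """Write motifs into the baseline in order, skipping any that collide.

    Assumed policy (not taken from PromoterCAD): first come, first served.
    Returns the edited sequence and the list of rejected placements.
    """
    accepted: list[Placement] = []
    rejected: list[Placement] = []
    seq = list(baseline)
    for p in placements:
        out_of_range = p.start < 0 or p.end > len(baseline)
        if out_of_range or any(overlaps(p, q) for q in accepted):
            rejected.append(p)
            continue
        seq[p.start:p.end] = p.sequence  # same length, so positions are preserved
        accepted.append(p)
    return "".join(seq), rejected


# Toy baseline; a real design would start from a natural promoter sequence.
baseline = "A" * 30
design, rejected = apply_placements(
    baseline,
    [
        Placement("E-box", 5, "CACGTG"),
        Placement("D-box", 8, "TTATGTAA"),   # collides with the E-box above
        Placement("D-box", 15, "TTATGTAA"),
    ],
)
print(design)
print([(p.name, p.start) for p in rejected])
```

Replacing bases in place, rather than inserting them, keeps every other motif at its original distance from the transcription start site, which is why the sketch only needs an interval check to detect conflicts.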

Mammalian PromoterCAD

Publication


Tutorials and Help

Mammalian PromoterCAD extensions and workflow management

The LinkData.org backend provides several extra capabilities (some require user login). (1) Workflow snapshots: users can save their entire design workflow as a snapshot, which can be published on the LinkData site (by default, user workflows are private) or shared for community discussion on Facebook. (2) Human accessible: users can directly access data tables (web, Excel, CSV). Each data table contains a description of its sources and data processing methods, which the user can check. (3) Machine accessible: the LinkData.org API provides programmatic access to all data (RDF, JSON, Turtle, XML). (4) Data extensibility: users are provided with an Excel template that they can use to upload their own data tables. These can be added to PromoterCAD using a graphical input data control system. This allows PromoterCAD to be extended to other organisms, more experimental data, and more cis-regulatory motif identification methods. (5) Code extensibility: users can fork the source code of PromoterCAD to create their own data mining plugins. The source code of Mammalian PromoterCAD (http://app.linkdata.org/update/app1s449i) is licensed under the LGPL-3.0 license (the GNU Lesser General Public License, version 3.0) and the Creative Commons license CC-BY-SA, version 3.0.



Template Files for Adding Mammalian Data to PromoterCAD

We provide template files for making new PromoterCAD data tables. These help users expand the scope of PromoterCAD by adding data for a new organism, new experimental measurements, or new motif identification methods.

Table Data Name | URL | Description | Reference
Mammalian PromoterCAD template files for adding new gene expression data | http://linkdata.org/work/rdf1s1068i | These template files are useful if you would like to add new data to Mammalian PromoterCAD, such as data for a new organism, experimental measurements, or motif identification methods. | -
Mammalian PromoterCAD template files for adding promoter motif data | http://linkdata.org/work/rdf1s1067i | These template files are useful if you would like to add new data to Mammalian PromoterCAD, such as data for a new organism, experimental measurements, or motif identification methods. | -

Data Tables Currently Used by Mammalian PromoterCAD

We collated previously published genomic and transcriptomic data, including information on 679 genes from Mus musculus and 65,000 microarray measurements across 96 tissues, organs, and cell types. All of the data sources and processing steps are described in the linked data tables below.

Table Data Name | URL | Description | Reference

[PromoterCAD Data Table]
Mammalian PromoterCAD Data Table | http://linkdata.org/work/rdf1s918i | This table allows the Mammalian PromoterCAD application to find the appropriate tools, the input data, and the gene rank list. | -

[Raw Table Data]
Gene Atlas | http://biogps.org/downloads/ | Anatomical Mus musculus microarray measurements. The Gene Atlas dataset (GSE10246, GPL1261) includes systematic microarray analysis of mouse GPCR expression in various cell types, including macrophages, and other tissues. The Anatomical Series includes measurements from the main tissues and organs of Mus musculus. | Lattin JE et al, 2008
PEDB | http://promoter.cdb.riken.jp/ | The Mammalian Promoter/Enhancer Database (PEDB) includes three types of regulatory elements that constitute the mammalian circadian clock: Clock/Bmal1-binding elements (E-box) (CACGTG), DBP/E4BP4-binding elements (D-box) (TTATG[T/C]AA), and RevErbA/ROR-binding elements (RRE) ([A/T]A[A/T]NT[A/G]GGTCA). | Kumaki Y et al, 2008

[PEDB (Mammalian Promoter/Enhancer Database) data]
Gene Atlas expression + PEDB promoter motif | http://linkdata.org/work/rdf1s912i | Systematic microarray analysis (Gene Atlas) of mouse GPCR expression in various cell types, including macrophages, and other tissues, combined with computationally predicted clock-controlled elements (PEDB). For each gene, we selected the Gene Atlas probe set with the highest mean intensity across all samples in the dataset as the gene's representative, then sorted the cell panel sample types into cell types (Lymphocytes, Myeloid leukocytes, ..., Derived cells) and organ types (Brain & Neural tissues, Eye, ..., Reproductive organs). The motifs range from -100000 to +100000 relative to the TSS; we focused on the motifs from -10000 to -1. Gene loci without expression data or motif data were removed from this database. | Kumaki Y et al, 2008; Lattin JE et al, 2008
Rank Lists of Gene Atlas expression + PEDB promoter motif | http://linkdata.org/work/rdf1s914i | Rank lists for the combined Gene Atlas and PEDB data table. For each experiment, all genes are ranked by absolute level of mean expression. | Kumaki Y et al, 2008; Lattin JE et al, 2008

[Baseline sequence]
Mouse promoter sequence | http://linkdata.org/work/rdf1s917i | Baseline promoter sequences from M. musculus, based on NCBI genome build 33. | Church DM et al, 2011
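
The processing steps described in the table above (scanning for the three PEDB consensus motifs, keeping hits from -10000 to -1 relative to the TSS, choosing the probe set with the highest mean intensity, and ranking genes by expression) can be sketched roughly as follows. This is an illustrative sketch, not the pipeline used to build the data tables: the function names, the toy sequence, and the probe and gene identifiers are all hypothetical, only the forward strand is scanned, and the upstream sequence is assumed to end at position -1.

```python
import re
from statistics import mean

# Consensus patterns as listed in the PEDB row of the table.
PEDB_MOTIFS = {
    "E-box": "CACGTG",
    "D-box": "TTATG[T/C]AA",
    "RRE": "[A/T]A[A/T]NT[A/G]GGTCA",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Turn '[T/C]' alternatives into a character class and N into any base."""
    pattern = consensus.replace("/", "").replace("N", "[ACGT]")
    return re.compile(pattern)


def scan_upstream(upstream: str, window: int = 10_000) -> list[tuple[str, int]]:
    """Find forward-strand motif hits in a sequence whose last base sits at -1.

    Returns (motif name, start position relative to the TSS) pairs,
    restricted to the range -window..-1 and sorted by position.
    """
    hits = []
    for name, consensus in PEDB_MOTIFS.items():
        for match in consensus_to_regex(consensus).finditer(upstream):
            position = match.start() - len(upstream)  # last base maps to -1
            if -window <= position <= -1:
                hits.append((name, position))
    return sorted(hits, key=lambda hit: hit[1])


def representative_probe(probes: dict[str, list[float]]) -> str:
    """Pick the probe set with the highest mean intensity across all samples."""
    return max(probes, key=lambda probe_id: mean(probes[probe_id]))


def rank_genes(expression: dict[str, float]) -> list[str]:
    """Order genes from highest to lowest expression for one experiment."""
    return sorted(expression, key=expression.get, reverse=True)


# Toy upstream sequence containing one instance of each motif (made up).
upstream = "GG" + "CACGTG" + "AA" + "TTATGCAA" + "TT" + "AAAGTAGGTCA" + "GG"
print(scan_upstream(upstream))

# Hypothetical probe sets for one gene, with intensities across two samples.
print(representative_probe({"probe_a": [10.0, 20.0], "probe_b": [30.0, 5.0]}))

# Hypothetical expression values for one experiment.
print(rank_genes({"GeneA": 5.0, "GeneB": 9.0, "GeneC": 1.0}))
```

A fuller treatment would also scan the reverse complement, since only the E-box consensus is palindromic, and would read real probe intensities from the Gene Atlas download rather than from inline dictionaries.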