PromoterCAD: Data Driven Design of Plant Regulatory DNA


Synthetic Promoter Design

Synthetic promoters can be used to control when, where, and how much of an arbitrary gene is expressed in an organism. The regulation of a promoter is specified by its pattern of cis-regulatory motifs; several methods have been developed for identifying and predicting the functions of natural motifs. Synthetic promoters with novel regulation functions can be designed by rational motif arrangement. PromoterCAD is a simple web-based UI for data-driven regulatory DNA design. Plugin tools are used to mine regulatory motifs from public databases, with an iterative design workflow for the accumulation of many motifs within a synthetic promoter design. The UI arbitrates when different introduced motifs overlap in position. Currently, PromoterCAD uses data collected from Arabidopsis thaliana to aid synthetic plant promoter design.
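The overlap handling mentioned above can be illustrated with a small sketch. The arbitration rule that PromoterCAD actually applies is not described on this page, so the policy here (the first motif placed keeps its position, and later overlapping motifs are reported as rejected) is only an assumption, as are the function name `place_motifs` and the example sequences.

```python
from typing import List, Tuple


def place_motifs(
    baseline: str, motifs: List[Tuple[int, str]]
) -> Tuple[str, List[Tuple[int, str]]]:
    """Write motifs into a baseline sequence at 0-based start offsets.

    Assumed policy (not taken from PromoterCAD): a motif that runs off the
    sequence or overlaps an already placed motif is skipped and reported.
    """
    seq = list(baseline)
    occupied = [False] * len(seq)
    rejected = []
    for start, motif in motifs:
        end = start + len(motif)
        out_of_range = start < 0 or end > len(seq)
        if out_of_range or any(occupied[start:end]):
            rejected.append((start, motif))
            continue
        seq[start:end] = motif  # overwrite the baseline bases
        for i in range(start, end):
            occupied[i] = True
    return "".join(seq), rejected


# Hypothetical 20 bp baseline and three motif requests.
design, rejected = place_motifs(
    "A" * 20, [(2, "CACGTG"), (5, "TATAAA"), (12, "GGGCC")]
)
print(design)    # AACACGTGAAAAGGGCCAAA
print(rejected)  # [(5, 'TATAAA')]
```

Reporting rejected motifs rather than silently dropping them lets the designer choose a new position in the next design iteration.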


Publication


Tutorials and Help

PromoterCAD extensions and workflow management

The LinkData.org backend provides several extra capabilities (some require user login). (1) Users can save their entire design workflow as a snapshot, which can be published on the LinkData site (user workflows are private by default) or shared for community discussion on Facebook. (2) Human accessible: users can directly access data tables (web, Excel, CSV). Each data table contains a description of its sources and data processing methods, which the user can check. (3) Machine accessible: the LinkData.org API provides programmatic access to all data (RDF, JSON, Turtle, XML). (4) Data extensibility: users are provided with an Excel template which they can use to upload their own data tables. These can be added to PromoterCAD using a graphical input data control system. This allows PromoterCAD to be extended to other organisms, more experimental data, and more cis-regulatory motif identification methods. (5) Code extensibility: users can fork the source code of PromoterCAD to create their own data mining plugins. PromoterCAD source code (http://app.linkdata.org/update/app1s335i) is licensed under

the LGPL-3.0 license (the GNU Lesser General Public License, version 3.0), and

the Creative Commons license CC-BY-SA, version 3.0.



Template Files for Adding Plant Data to PromoterCAD

We provide template files for making new PromoterCAD data tables. These help users expand the scope of PromoterCAD by adding data for a new organism, new experimental measurements, or new motif identification methods.

Table Data Name URL Description Reference
PromoterCAD Template files for adding new gene expression data http://linkdata.org/work/rdf1s479i These template files are useful if you would like to add new data to PromoterCAD, such as data for a new organism, experimental measurements, or motif identification methods.  
PromoterCAD Template files for adding promoter motif data http://linkdata.org/work/rdf1s921i These template files are useful if you would like to add new data to PromoterCAD, such as data for a new organism, experimental measurements, or motif identification methods.  

Data Tables Currently Used by PromoterCAD

We collated previously published genomic and transcriptomic data, including information on 21,000 genes from Arabidopsis thaliana, and 1,410,000 microarray data measurements in 20 growth conditions and 79 tissue organs and developmental stages. All of the data sources and processing steps are described in the linked data tables below.
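Two of the processing steps described in the table entries below, collapsing triplicate measurements to a single value and normalizing to the mean of all experiments, can be sketched as follows. This is a minimal illustration, not the processing code used to build the tables. The function names and sample labels are hypothetical, the inputs are assumed to be already log-transformed, and normalization is assumed to mean division by the gene's mean across experiments (the table descriptions do not say whether division or subtraction was used).

```python
from statistics import mean, median
from typing import Dict, List


def summarize_replicates(
    replicates: Dict[str, List[float]], how: str = "median"
) -> Dict[str, float]:
    """Collapse replicate measurements per experiment to one value."""
    reducer = {"median": median, "mean": mean}[how]
    return {experiment: reducer(values) for experiment, values in replicates.items()}


def normalize_to_mean(values: List[float]) -> List[float]:
    """Divide each value by the mean over all experiments (assumed rule)."""
    overall = mean(values)
    return [v / overall for v in values]


# Hypothetical triplicate values for one gene in two tissues.
gene_replicates = {"leaf": [1.0, 2.0, 6.0], "root": [3.0, 3.0, 3.0]}
print(summarize_replicates(gene_replicates, "median"))
print(summarize_replicates(gene_replicates, "mean"))
print(normalize_to_mean([2.0, 4.0, 6.0]))
```

The median is less sensitive than the mean to a single outlying replicate, which is one reason both summaries are offered as separate tables.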

Table Data Name URL Description Reference

[PromoterCAD Data Table]
PromoterCAD Data Table http://linkdata.org/work/rdf1s792i This table allows the PromoterCAD application to find the appropriate tools, the input data, and the gene rank list.  

[Raw Table Data]
AtGenExpress http://arabidopsis.org/servlets/TairObject?type=expression_set&id=1006710873 Developmental Arabidopsis thaliana microarray measurements. AtGenExpress is a high-throughput database which includes microarray measurements of gene expression in several conditions. The Developmental Series includes measurements from the main tissue organs of Arabidopsis thaliana. Schmid M et al, 2005
DIURNAL http://diurnal.mocklerlab.org/about Circadian Arabidopsis thaliana microarray measurements. The DIURNAL project is a microarray gene expression database which was collected over two days (44 hours) at four hour intervals in various growth conditions. These measurements are made on 7-9 day old seedlings, and show gene expression levels across the whole plant. Mockler TC et al, 2007
ATTED-II http://atted.jp/ The ATTED-II database uses gene co-expression analysis of the AtGenExpress data to predict functional 7 base-pair motifs in promoters. Motifs are identified within a promoter region 200 bp upstream of the transcription start site. Obayashi T et al, 2007
PPDB http://ppdb.agr.gifu-u.ac.jp/ppdb/ The Plant Promoter Database (PPDB) uses word frequency analysis to identify 8 base-pair motifs in promoters. Motifs are categorized as TATA-box, Y-patch (a CT-rich region often found in plant TATA box promoters), the initiator near the transcription start site, and cis-regulatory motifs. The regulatory motifs are a group of 308 sequences, each 8 base pairs long, which are used to represent regulatory sites. Yamamoto YY et al, 2007

[Derived Data]
AtGenExpress median data http://linkdata.org/work/rdf1s460i Gene expression data from AtGenExpress was taken in triplicate, and we use the median value. Expression data was used as log-transformed absolute expression values. Schmid M et al, 2005
AtGenExpress mean data http://linkdata.org/work/rdf1s785i Gene expression data from AtGenExpress was taken in triplicate, and we use the mean value. Expression data was used as log-transformed absolute expression values. Schmid M et al, 2005
DIURNAL fitted up with sine wave http://linkdata.org/work/rdf1s457i For DIURNAL gene expression data, we calculated the phase of each gene's expression profile by non-linear best fit to a sine wave with a 24-hour period, and similarly calculated the amplitude. Visually checking the 1000 largest-amplitude genes showed a good fit in all cases, though a few genes clearly deviated from sinusoidal behavior. All had a major period of 24 hours. Mockler TC et al, 2007

[Normalized Data]
AtGenExpress mean normalized http://linkdata.org/work/rdf1s369i Expression data from AtGenExpress is used as values normalized to the mean of all experiments for the AtGenExpress data. Schmid M et al, 2005
DIURNAL fitted by sine wave, mean normalized http://linkdata.org/work/rdf1s370i Expression data from DIURNAL is used as values normalized to the mean expression value over 12 measurements for the DIURNAL data. Mockler TC et al, 2007

[Expression data mashed up with motif sequence data]
Developmental Coexpression (AtGenExpress gene expression + ATTED-II promoter motif) median http://linkdata.org/work/rdf1s584i Developmental Microarray Expression Data (AtGenExpress) of plant developmental tissues, combined with CEG coexpression analysis regulatory (7mer) motif calculations (ATTED-II). We took the median of triplicate measurements from AtGenExpress, then sorted the developmental series into plant organs (Flower, Leaf, Root, Stem, Fruit & Seeds), with one category for seedlings (8 days old or less) and another for whole plants (older than 8 days). Gene Loci without expression data or motif data were removed from this database. Schmid M et al, 2005
Obayashi T et al, 2007
Developmental Genomic (AtGenExpress gene expression + PPDB promoter motif) median http://linkdata.org/work/rdf1s705i Developmental Microarray Expression Data (AtGenExpress) of plant developmental tissues, combined with LDSS sequence analysis of regulatory (8mer) motif calculations (PPDB). We took the median of triplicate measurements from AtGenExpress, then sorted the developmental series into plant organs (Flower, Leaf, Root, Stem, Fruit & Seeds), with one category for seedlings (8 days old or less) and another for whole plants (older than 8 days). For each motif, we calculated the position relative to the TSS as determined experimentally (PPDB). Gene Loci without expression data or motif data were removed from this database. Schmid M et al, 2005
Yamamoto YY et al, 2007
Circadian Coexpression (DIURNAL gene expression + ATTED-II promoter motif) http://linkdata.org/work/rdf1s586i Circadian data collected over two days (44 hours) at four-hour intervals in various growth conditions. We calculated the phase of each gene's expression profile by non-linear best fit to a sine wave with a 24-hour period, and similarly calculated the amplitude. Visually checking the 1000 largest-amplitude genes showed a good fit in all cases, though a few genes clearly deviated from sinusoidal behavior. All had a major period of 24 hours. For each gene locus, we added the CEG motifs as calculated by ATTED-II. Gene loci without expression data or motif data were removed from this database. Mockler TC et al, 2007
Obayashi T et al, 2007
Circadian Genomic (DIURNAL gene expression + PPDB promoter motif) http://linkdata.org/work/rdf1s706i Circadian data collected over two days (44 hours) at four-hour intervals in various growth conditions. We calculated the phase of each gene's expression profile by non-linear best fit to a sine wave with a 24-hour period, and similarly calculated the amplitude. Visually checking the 1000 largest-amplitude genes showed a good fit in all cases, though a few genes clearly deviated from sinusoidal behavior. All had a major period of 24 hours. For each gene locus, we added the LDSS motifs as calculated by PPDB. Gene loci without expression data or motif data were removed from this database. Mockler TC et al, 2007
Yamamoto YY et al, 2007

[Normalized expression data mashed up with motif sequence data]
Developmental Coexpression (AtGenExpress gene expression + ATTED-II promoter motif) normalized http://linkdata.org/work/rdf1s585i Normalized Expression Data of AtGenExpress plant developmental tissues, combined with CEG coexpression analysis regulatory (7mer) motif calculations (ATTED-II). Schmid M et al, 2005
Obayashi T et al, 2007
Developmental Genomic (AtGenExpress gene expression + PPDB promoter motif) normalized http://linkdata.org/work/rdf1s707i Normalized Expression Data of AtGenExpress plant developmental tissues, combined with LDSS sequence analysis of regulatory (8mer) motif calculations (PPDB). Schmid M et al, 2005
Yamamoto YY et al, 2007
Circadian Coexpression (DIURNAL gene expression + ATTED-II promoter motif) normalized http://linkdata.org/work/rdf1s587i Normalized Expression Data of DIURNAL circadian conditions, combined with CEG coexpression analysis regulatory (7mer) motif calculations (ATTED-II). Mockler TC et al, 2007
Obayashi T et al, 2007
Circadian Genomic (DIURNAL gene expression + PPDB promoter motif) normalized http://linkdata.org/work/rdf1s708i Normalized Expression Data of DIURNAL circadian conditions, combined with LDSS sequence analysis of regulatory (8mer) motif calculations (PPDB). Mockler TC et al, 2007
Yamamoto YY et al, 2007

[Gene Rank Lists, sorted by Gene Expression Property]
Rank Lists of Developmental Coexpression (AtGenExpress gene expression + ATTED-II promoter motif) http://linkdata.org/work/rdf1s709i Lists for the AtGenExpress and ATTED-II mashup data table. For each experiment, all genes ranked by absolute level of median expression. Schmid M et al, 2005
Obayashi T et al, 2007
Rank Lists of Developmental Genomic (AtGenExpress gene expression + PPDB promoter motif) http://linkdata.org/work/rdf1s712i Lists for the AtGenExpress and PPDB mashup data table. For each experiment, all genes ranked by absolute level of median expression. Schmid M et al, 2005
Yamamoto YY et al, 2007
Rank Lists of Circadian Coexpression (DIURNAL gene expression + ATTED-II promoter motif) http://linkdata.org/work/rdf1s710i Lists for the Diurnal and ATTED-II mashup data table. For each experiment, all genes ranked by circadian amplitude. Mockler TC et al, 2007
Obayashi T et al, 2007
Rank Lists of Circadian Genomic (DIURNAL gene expression + PPDB promoter motif) http://linkdata.org/work/rdf1s713i Lists for the Diurnal and PPDB mashup data table. For each experiment, all genes ranked by circadian amplitude. Mockler TC et al, 2007
Yamamoto YY et al, 2007
Rank Lists of Developmental Coexpression (AtGenExpress gene expression + ATTED-II promoter motif) normalized http://linkdata.org/work/rdf1s715i Lists for the normalized expression of AtGenExpress and ATTED-II mashup data table. For each experiment, all genes ranked by absolute expression level. Schmid M et al, 2005
Obayashi T et al, 2007
Rank Lists of Developmental Genomic (AtGenExpress gene expression + PPDB promoter motif) normalized http://linkdata.org/work/rdf1s717i Lists for the normalized expression of AtGenExpress and PPDB mashup data table. For each experiment, all genes ranked by absolute expression level. Schmid M et al, 2005
Yamamoto YY et al, 2007
Rank Lists of Circadian Coexpression (DIURNAL gene expression + ATTED-II promoter motif) normalized http://linkdata.org/work/rdf1s716i Lists for the normalized expression of Diurnal and ATTED-II mashup data table. For each experiment, all genes ranked by circadian amplitude. Mockler TC et al, 2007
Obayashi T et al, 2007
Rank Lists of Circadian Genomic (DIURNAL gene expression + PPDB promoter motif) normalized http://linkdata.org/work/rdf1s718i Lists for the normalized expression of Diurnal and PPDB mashup data table. For each experiment, all genes ranked by circadian amplitude. Mockler TC et al, 2007
Yamamoto YY et al, 2007

[PLACE (Plant Cis-acting Regulatory DNA Element) data]
AtGenExpress median gene expression + PLACE promoter motif http://linkdata.org/work/rdf1s740i Developmental Microarray Expression Data (AtGenExpress) of plant developmental tissues, combined with PLACE (Plant Cis-acting Regulatory DNA Elements) motif database. We took the median of triplicate measurements from AtGenExpress, then sorted the developmental series into plant organs (Flower, Leaf, Root, Stem, Fruit & Seeds), with one category for seedlings (8 days old or less) and another for whole plants (older than 8 days). For each motif, we calculated the position relative to the TSS as determined experimentally. Gene Loci without expression data or motif data were removed from this database. Schmid M et al, 2005
Higo K et al, 1999
DIURNAL gene expression + PLACE promoter motif http://linkdata.org/work/rdf1s746i Circadian data collected over two days (44 hours) at four-hour intervals in various growth conditions. We calculated the phase of each gene's expression profile by non-linear best fit to a sine wave with a 24-hour period, and similarly calculated the amplitude. Visually checking the 1000 largest-amplitude genes showed a good fit in all cases, though a few genes clearly deviated from sinusoidal behavior. All had a major period of 24 hours. For each gene locus, we added the motifs of various sizes predicted by the PLACE (Plant Cis-acting Regulatory DNA Elements) database. Gene loci without expression data or motif data were removed from this database. Mockler TC et al, 2007
Higo K et al, 1999
AtGenExpress normalized gene expression + PLACE promoter motif http://linkdata.org/work/rdf1s741i Normalized Expression Data of AtGenExpress plant developmental tissues, combined with PLACE (Plant Cis-acting Regulatory DNA Elements) motif database. Schmid M et al, 2005
Higo K et al, 1999
DIURNAL normalized gene expression + PLACE promoter motif http://linkdata.org/work/rdf1s747i Normalized Expression Data of DIURNAL circadian conditions, combined with PLACE (Plant Cis-acting Regulatory DNA Elements) motif database. Mockler TC et al, 2007
Higo K et al, 1999
Rank Lists of AtGenExpress median gene expression + PLACE promoter motif http://linkdata.org/work/rdf1s748i Lists for the AtGenExpress and PLACE mashup data table. For each experiment, all genes ranked by absolute level of median expression. Schmid M et al, 2005
Higo K et al, 1999
Rank Lists of DIURNAL gene expression + PLACE promoter motif http://linkdata.org/work/rdf1s749i Lists for the Diurnal and PLACE mashup data table. For each experiment, all genes ranked by circadian amplitude. Mockler TC et al, 2007
Higo K et al, 1999
Rank Lists of AtGenExpress normalized gene expression + PLACE promoter motif http://linkdata.org/work/rdf1s750i Lists for the normalized expression of AtGenExpress and PLACE mashup data table. For each experiment, all genes ranked by normalized expression level. Schmid M et al, 2005
Higo K et al, 1999
Rank Lists of DIURNAL normalized gene expression + PLACE promoter motif http://linkdata.org/work/rdf1s751i Lists for the normalized expression of Diurnal and PLACE mashup data table. For each experiment, all genes ranked by normalized circadian amplitude. Mockler TC et al, 2007
Higo K et al, 1999

[Mean data for triplicate expression values of AtGenExpress]
Developmental Coexpression (AtGenExpress gene expression + ATTED-II promoter motif) mean http://linkdata.org/work/rdf1s769i Developmental Microarray Expression Data (AtGenExpress) of plant developmental tissues, combined with CEG coexpression analysis regulatory (7mer) motif calculations (ATTED-II). We took the mean of triplicate measurements from AtGenExpress, then sorted the developmental series into plant organs (Flower, Leaf, Root, Stem, Fruit & Seeds), with one category for seedlings (8 days old or less) and another for whole plants (older than 8 days). Gene Loci without expression data or motif data were removed from this database. Schmid M et al, 2005
Obayashi T et al, 2007
Developmental Genomic (AtGenExpress gene expression + PPDB promoter motif) mean http://linkdata.org/work/rdf1s778i Developmental Microarray Expression Data (AtGenExpress) of plant developmental tissues, combined with LDSS sequence analysis of regulatory (8mer) motif calculations (PPDB). We took the mean of triplicate measurements from AtGenExpress, then sorted the developmental series into plant organs (Flower, Leaf, Root, Stem, Fruit & Seeds), with one category for seedlings (8 days old or less) and another for whole plants (older than 8 days). For each motif, we calculated the position relative to the TSS as determined experimentally (PPDB). Gene Loci without expression data or motif data were removed from this database. Schmid M et al, 2005
Yamamoto YY et al, 2007
AtGenExpress mean gene expression + PLACE promoter motif http://linkdata.org/work/rdf1s774i Developmental Microarray Expression Data (AtGenExpress) of plant developmental tissues, combined with PLACE (Plant Cis-acting Regulatory DNA Elements) motif database. We took the mean of triplicate measurements from AtGenExpress, then sorted the developmental series into plant organs (Flower, Leaf, Root, Stem, Fruit & Seeds), with one category for seedlings (8 days old or less) and another for whole plants (older than 8 days). For each motif, we calculated the position relative to the TSS as determined experimentally. Gene Loci without expression data or motif data were removed from this database. Schmid M et al, 2005
Higo K et al, 1999
Rank Lists of Developmental Coexpression (AtGenExpress gene expression + ATTED-II promoter motif) mean http://linkdata.org/work/rdf1s770i Lists for the AtGenExpress and ATTED-II mashup data table. For each experiment, all genes ranked by absolute level of mean expression. Schmid M et al, 2005
Obayashi T et al, 2007
Rank Lists of Developmental Genomic (AtGenExpress gene expression + PPDB promoter motif) mean http://linkdata.org/work/rdf1s779i Lists for the AtGenExpress and PPDB mashup data table. For each experiment, all genes ranked by absolute level of mean expression. Schmid M et al, 2005
Yamamoto YY et al, 2007
Rank Lists of AtGenExpress mean gene expression + PLACE promoter motif http://linkdata.org/work/rdf1s775i Lists for the AtGenExpress and PLACE mashup data table. For each experiment, all genes ranked by absolute level of mean expression. Schmid M et al, 2005
Higo K et al, 1999

[Baseline sequence]
Arabidopsis promoter sequence http://linkdata.org/work/rdf1s728i Baseline promoter sequences from Arabidopsis, calculated based on the PPDB data and TAIR9 information. Yamamoto YY et al, 2007
Yamamoto YY et al, 2009
Lamesch P et al, 2012
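
The circadian entries in the table above describe fitting each DIURNAL time series to a sine wave with a 24-hour period and ranking genes by amplitude. The sketch below illustrates the idea under stated assumptions; it is not the fitting code used to build the tables. Instead of a general non-linear optimizer, it uses the fact that, with the period fixed, the model is linear in its cosine and sine coefficients, and that for evenly spaced samples covering whole periods (12 points at four-hour intervals cover two 24-hour cycles) the least-squares solution reduces to a simple projection. The function names and gene labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

PERIOD_H = 24.0  # fixed period, as stated in the table descriptions


def fit_sine_24h(
    times_h: Sequence[float], values: Sequence[float]
) -> Tuple[float, float, float]:
    """Fit y ~ mean + amplitude * cos(2*pi*(t - phase_h) / 24).

    Assumes evenly spaced samples spanning a whole number of periods, so
    the least-squares coefficients are plain projections onto cos and sin.
    Returns (mean, amplitude, phase_h), with phase_h in [0, 24).
    """
    n = len(values)
    omega = 2.0 * math.pi / PERIOD_H
    mean = sum(values) / n
    a = 2.0 / n * sum(y * math.cos(omega * t) for t, y in zip(times_h, values))
    b = 2.0 / n * sum(y * math.sin(omega * t) for t, y in zip(times_h, values))
    amplitude = math.hypot(a, b)
    phase_h = (math.atan2(b, a) / omega) % PERIOD_H
    return mean, amplitude, phase_h


def rank_by_amplitude(
    times_h: Sequence[float], series_by_gene: Dict[str, List[float]]
) -> List[str]:
    """Return gene labels sorted from largest to smallest fitted amplitude."""
    amplitude = {g: fit_sine_24h(times_h, v)[1] for g, v in series_by_gene.items()}
    return sorted(amplitude, key=amplitude.get, reverse=True)


# Synthetic example: 12 samples over 44 hours at four-hour intervals.
times = list(range(0, 48, 4))
omega = 2.0 * math.pi / PERIOD_H
series = {
    "gene_a": [5.0 + 0.5 * math.cos(omega * (t - 2.0)) for t in times],
    "gene_b": [5.0 + 2.0 * math.cos(omega * (t - 8.0)) for t in times],
}
print(fit_sine_24h(times, series["gene_b"]))
print(rank_by_amplitude(times, series))
```

For unevenly sampled data, or for series that do not span whole periods, the projection shortcut no longer holds and a general least-squares solver would be needed.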