Fork Your Own GenoCon2 Data Mining Plug-in to Compete or Share, and Find Useful Gene Expression Patterns

The creative activity of the GenoCon2 special programming challenge is to “mine” the GenoCon2 data mashups for new associations and for genes with interesting or useful expression patterns.

Examples include sets of genes that are all expressed in the same growth condition, at the same time of day, and/or in the same part of the plant. We provide basic functions and example workflows that show how to search for a gene expression property (e.g. high expression in the leaf tissue of young plants) and retrieve the corresponding functional DNA sequences to add to a synthetic promoter design.
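As a rough illustration of the first kind of query, the following JavaScript sketch groups genes into sets that share an expression context (growth condition, time of day, tissue) above a chosen expression threshold. It is not one of the provided GenoCon2 functions: the record fields (`gene`, `condition`, `timeOfDay`, `tissue`, `level`), the threshold value, and the gene IDs are hypothetical placeholders, and a real plug-in would read its rows from the LinkData mashups rather than from an in-memory array.

```javascript
// Hypothetical rows: one entry per gene per sampled context.
// Field names and values are illustrative placeholders, not GenoCon2 data.
const expressionRows = [
  { gene: "geneA", condition: "long-day", timeOfDay: "ZT4", tissue: "leaf", level: 120.5 },
  { gene: "geneB", condition: "long-day", timeOfDay: "ZT4", tissue: "leaf", level: 98.0 },
  { gene: "geneC", condition: "long-day", timeOfDay: "ZT4", tissue: "root", level: 150.2 },
  { gene: "geneD", condition: "long-day", timeOfDay: "ZT4", tissue: "leaf", level: 3.1 },
  { gene: "geneE", condition: "short-day", timeOfDay: "ZT12", tissue: "root", level: 88.7 },
];

// Group genes expressed at or above minLevel by their shared
// (growth condition, time of day, tissue) context.
function coexpressedSets(rows, minLevel) {
  const sets = new Map(); // context key -> Set of gene IDs
  for (const row of rows) {
    if (row.level < minLevel) continue; // skip weakly expressed entries
    const key = `${row.condition}|${row.timeOfDay}|${row.tissue}`;
    if (!sets.has(key)) sets.set(key, new Set());
    sets.get(key).add(row.gene);
  }
  return sets;
}

const groups = coexpressedSets(expressionRows, 50);
for (const [context, genes] of groups) {
  console.log(context, [...genes].sort());
}
```

Lowering `minLevel` widens each set; keeping only contexts with at least two genes would be a simple extension if single-gene sets are not of interest.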

We will judge submitted functions on their novelty, usefulness, and quality of implementation. Through social networking, participants can discuss programming functions, find out which functions the more biologically experienced DNA designers would like to have, and even share code under development. We encourage group participation, and our system automatically tracks the history of code forking so that credit can be given to all contributors.

If you want to learn more about building your own biological design program, how our data is structured, how to submit and test your program, or anything else about the programming challenge, please head over to our manuals section for more details!

Background:
We developed a web-based computer-aided design (CAD) system for designing synthetic plant promoter sequences. Users can start with a baseline or natural promoter sequence, then interface directly with biological databases to alter this DNA sequence using a series of modular data mining functions.

We uploaded several genomic datasets related to gene expression and promoter sequences to LinkData.org. This semantic web system makes the data accessible in a variety of formats, both human- and machine-readable, and structures the rows and columns of each data file so that they are easy to hyperlink together, or “mash up”, for cross-dataset analysis. We have created four such mashups from Arabidopsis thaliana databases as a resource for synthetic promoter design.
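Conceptually, a mashup links datasets on a shared key such as a gene identifier. Assuming each dataset can be loaded as an array of row objects with a common `gene` field (an assumption; the actual LinkData column names and export formats may differ), a minimal inner join followed by the "highly expressed in leaf tissue of young plants" query might look like the sketch below. The `mashup` helper, the field names, the `"seedling"` stage label, and all sequences and values are hypothetical placeholders.

```javascript
// Two toy datasets sharing a gene ID column (hypothetical placeholders only).
const expression = [
  { gene: "geneA", tissue: "leaf", stage: "seedling", level: 210.0 },
  { gene: "geneB", tissue: "leaf", stage: "mature", level: 190.0 },
  { gene: "geneC", tissue: "root", stage: "seedling", level: 300.0 },
  { gene: "geneD", tissue: "leaf", stage: "seedling", level: 12.0 },
];
const promoters = [
  { gene: "geneA", sequence: "ACGTACGTTATAAT" },
  { gene: "geneC", sequence: "GGCCAATCTACGTG" },
  { gene: "geneD", sequence: "TTAGGCATACGATA" },
];

// Inner join on a shared key: keep rows present in both datasets and merge
// their fields into one combined record. Assumes one right-hand row per key.
function mashup(left, right, key) {
  const index = new Map(right.map((row) => [row[key], row]));
  const joined = [];
  for (const row of left) {
    const match = index.get(row[key]);
    if (match) joined.push({ ...row, ...match });
  }
  return joined;
}

// Example query: promoter sequences of genes highly expressed in young leaves.
const candidates = mashup(expression, promoters, "gene")
  .filter((r) => r.tissue === "leaf" && r.stage === "seedling" && r.level >= 100)
  .map((r) => ({ gene: r.gene, sequence: r.sequence }));

console.log(candidates);
```

Building a `Map` index on one side keeps the join linear in the number of rows; genes missing from either dataset (here `geneB`, which has no promoter row) simply drop out of the result.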

The LinkData system is also a rapid-deployment platform for web apps, which allowed us to build a simple menu-driven user interface (PromoterCAD). We are challenging programmers to contribute to our system by adding new data mining functions. Participants can either (1) copy and modify function modules (‘plug-ins’) from our source code and then easily add them back into the CAD system using App.LinkData.Org, or (2) write entirely new JavaScript functions that access the data mashups and retrieve useful DNA sequences.