ISMB/ECCB 2009 Day 3

Jul 1, 10:15 am

Yang Huang (NIH) – Graph Theoretical Approach To Study eQTL: A Case Study of Plasmodium falciparum

P. falciparum is the most deadly human malaria pathogen. Little is known about its gene regulation so far; eQTL studies might shed some light on this regulation and on drug resistance. Reference to Daphne's talk: her methods are difficult to deploy here because so little prior information is available for this pathogen.

Hypothesis: SNPs might affect gene expression. Treat expression as a quantitative trait, like height or weight, and identify the associated locus by statistical methods.

For each progeny strain, measure the expression of all genes and the genotypes at predetermined loci. The result is one vector per strain containing its genotype and expression values. Detect locus/gene pairs with a statistical association between the gene's expression and the genotype at that locus.
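The classical scan can be sketched as follows. This is a minimal illustration, not the method from the talk. It assumes two parental alleles coded 'A'/'B' and one expression value per strain, and uses the absolute difference of group means as a stand-in test statistic. `association_scores` and the data layout are hypothetical. The number of scores equals loci × genes, which is what makes the exhaustive approach expensive.

```python
from statistics import mean


def association_scores(genotypes, expression):
    """Score every (locus, gene) pair, as in an exhaustive eQTL scan.

    genotypes:  dict locus -> list with one allele per strain ('A' or 'B')
    expression: dict gene  -> list with one expression value per strain
    Returns dict (locus, gene) -> |mean(expr | allele A) - mean(expr | allele B)|,
    a simple stand-in for a proper test statistic.
    """
    scores = {}
    for locus, alleles in genotypes.items():
        if set(alleles) != {"A", "B"}:
            continue  # monomorphic locus: both groups are needed for a comparison
        for gene, values in expression.items():
            group_a = [v for g, v in zip(alleles, values) if g == "A"]
            group_b = [v for g, v in zip(alleles, values) if g == "B"]
            scores[(locus, gene)] = abs(mean(group_a) - mean(group_b))
    return scores


genotypes = {"L1": ["A", "A", "B", "B"], "L2": ["A", "B", "A", "B"]}
expression = {"g1": [5.0, 5.2, 1.0, 1.1], "g2": [2.0, 2.1, 2.0, 1.9]}
scores = association_scores(genotypes, expression)
print(len(scores))  # 4: one test per (locus, gene) pair, 2 loci x 2 genes
```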

Traditional approach: test all loci against all expression traits. This is comprehensive and unbiased, but it ignores the inherent data structure, is computationally expensive, and loses statistical power because of the sheer number of tests. Alternative approach: GeD, Graph-based eQTL decomposition. Include strain data in the association graph to identify hidden structure (eQTL association cliques) that helps reduce the complexity of the data.

Construct graph

Three types of vertices: genes, strains and loci, with genes linked to strains and strains linked to loci. One node for each distinct locus, two nodes per gene (up- and down-regulated).
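Here is a sketch of the graph construction as an adjacency dictionary. The edge rules are assumptions for illustration, not taken from the talk. Expression values are assumed to be already centred per gene. A gene's 'up' node links to a strain when the value exceeds a hypothetical `threshold`, and its 'down' node links when the value falls below `-threshold`. A strain links to a locus node when it carries a chosen parental allele (hypothetical `allele='A'`) at that locus.

```python
def build_graph(genotypes, expression, threshold=1.0, allele="A"):
    """Build the gene-strain-locus association graph as an adjacency dict.

    Hypothetical edge rules (for illustration only):
      - ('gene', g, 'up')   -- ('strain', s) if expression[g][s] >  threshold
      - ('gene', g, 'down') -- ('strain', s) if expression[g][s] < -threshold
      - ('strain', s)       -- ('locus', l)  if strain s carries `allele` at l
    """
    adj = {}

    def link(u, v):
        # undirected edge: record it in both adjacency sets
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for gene, values in expression.items():
        for s, x in enumerate(values):
            if x > threshold:
                link(("gene", gene, "up"), ("strain", s))
            elif x < -threshold:
                link(("gene", gene, "down"), ("strain", s))
    for locus, alleles in genotypes.items():
        for s, a in enumerate(alleles):
            if a == allele:
                link(("strain", s), ("locus", locus))
    return adj
```

Keeping separate up and down nodes per gene means a clique never mixes strains in which the same gene moves in opposite directions.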

eQTL association cliques

Each clique consists of vertices of the three types (G/S/L) that are fully connected; in addition, each clique is maximal, i.e. it cannot be extended further. Enumerate all maximal bipartite cliques in the graph (Farach-Colton, CPM 2008), then merge cliques that share strain vertices.
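Enumerating maximal bipartite cliques can be illustrated with a deliberately naive closure check. This swaps in brute force for the output-sensitive algorithm cited in the talk (Farach-Colton, CPM 2008). It is exponential in the number of strains, so it only works on toy graphs. A pair (X, Y) is maximal exactly when X is the set of all common neighbours of Y and Y is the set of all common neighbours of X. The `merge_by_strains` helper is a hypothetical reading of the merge step: it joins a gene-strain clique with a strain-locus clique on their shared strains.

```python
from itertools import combinations


def maximal_bicliques(adj, left, right):
    """Return all maximal bicliques (X, Y) with X from `left`, Y from `right`.

    Brute force over non-empty subsets Y of `right` (exponential; toy sizes only).
    """
    found = set()
    right = sorted(right)
    for k in range(1, len(right) + 1):
        for ys in combinations(right, k):
            # X = every left vertex adjacent to all of Y
            xs = frozenset(x for x in left if set(ys) <= adj.get(x, set()))
            if not xs:
                continue
            # closure of X: every right vertex adjacent to all of X
            closure = frozenset(y for y in right if xs <= adj.get(y, set()))
            if closure == frozenset(ys):  # Y cannot be extended either
                found.add((xs, closure))
    return found


def merge_by_strains(gene_cliques, locus_cliques, min_strains=1):
    """Join (genes, strains) and (loci, strains) cliques on shared strains."""
    merged = []
    for genes, gene_strains in gene_cliques:
        for loci, locus_strains in locus_cliques:
            common = gene_strains & locus_strains
            if len(common) >= min_strains:
                merged.append((genes, common, loci))
    return merged
```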

eQTL detection

Heuristic approach on eQTL cliques to look for (locus, gene) pairs with certain patterns; refer to the graph/diagram in the paper. Support is the number of strains in the data set that agree with an identified pattern.
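A small sketch makes the support definition concrete. The pattern encoding is an assumption: a (locus, gene) pattern is taken to mean "strains carrying one allele show the gene up-regulated, strains carrying the other show it down-regulated", and both orientations are tried. The threshold of six supporting strains is the one quoted in the results. For clarity, the sketch scans all pairs exhaustively, whereas the talk applies its heuristic only within eQTL cliques. `support` and `call_eqtls` are hypothetical names.

```python
def support(alleles, directions, pattern=("A", "B")):
    """Count strains agreeing with a (locus, gene) pattern.

    alleles:    per-strain allele at the locus
    directions: per-strain 'up', 'down' or None for the gene
    pattern:    (allele expected with 'up', allele expected with 'down')
    """
    up_allele, down_allele = pattern
    return sum(
        1
        for a, d in zip(alleles, directions)
        if (d == "up" and a == up_allele) or (d == "down" and a == down_allele)
    )


def call_eqtls(genotypes, directions, min_support=6):
    """Keep (locus, gene) pairs whose best-oriented support reaches min_support."""
    calls = []
    for locus, alleles in genotypes.items():
        for gene, dirs in directions.items():
            best = max(support(alleles, dirs, ("A", "B")),
                       support(alleles, dirs, ("B", "A")))
            if best >= min_support:
                calls.append((locus, gene, best))
    return calls
```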

Results

34 progeny strains; eQTLs need to be supported by at least six strains. Significant difference from random background cliques: 1327 eQTLs for 513 probes are significant (p-value adjusted by background model). About 25% overlap with classical eQTL results, but with a similar genomic distribution. Enrichment in chr3 subtelomeric regions; genes in that region are enriched for host interaction.
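The sketch below shows one common way to turn a support value into an empirical p-value against a random background. It swaps in a simple permutation of strain genotype labels as the background. The talk's background model was built from random cliques and may well differ. Adding 1 to both numerator and denominator keeps the estimate from ever being exactly zero. All names are hypothetical.

```python
import random


def support(alleles, directions):
    """Strains agreeing with the pattern 'A -> up, B -> down'."""
    return sum(
        1
        for a, d in zip(alleles, directions)
        if (a == "A" and d == "up") or (a == "B" and d == "down")
    )


def permutation_p(alleles, directions, n_perm=1000, seed=0):
    """Empirical p-value of the observed support against shuffled genotypes."""
    rng = random.Random(seed)
    observed = support(alleles, directions)
    shuffled = list(alleles)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # background: break the genotype/strain link
        if support(shuffled, directions) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```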

Cliques help to detect eQTLs while avoiding a large number of tests; integrating strain information provides a new framework for eQTL studies. Future work: improve the heuristics and identify one-to-many interactions between loci and genes.


Mark Clement (Brigham Young) – GNUMAP: Unbiased Probabilistic Mapping of Next-Generation Sequencing Reads

Hash the target genome into k-mers (indexing) for constant-time lookup. Alignment step: identify possible match locations based on seed hits, followed by a probabilistic Needleman-Wunsch alignment that takes base call quality into account. Each read is represented as a PWM, which allows insertions/deletions to be handled in addition to quality information. If a read (PWM) matches multiple locations, GNUMAP uses a probabilistic assignment, distributing fractions of the read across all possible match locations.
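The indexing and multi-location steps can be sketched as below, under simplifying assumptions. The read is given as a PWM (one dict of base probabilities per position), and candidate locations come from exact k-mer seed hits. An ungapped product of PWM probabilities stands in for GNUMAP's probabilistic Needleman-Wunsch alignment, so indels are ignored here. Function names are hypothetical. Each candidate location receives a fraction of the read proportional to its likelihood.

```python
from collections import defaultdict
from math import prod


def build_kmer_index(genome, k):
    """Hash every k-mer of the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def candidate_starts(read, index, k, genome_len):
    """Seed lookup: each read k-mer hit implies a start position for the whole read."""
    starts = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            starts.add(pos - offset)
    return sorted(s for s in starts if 0 <= s <= genome_len - len(read))


def map_read(pwm, genome, index, k):
    """Split one read across candidate locations in proportion to its likelihood."""
    # most likely base at each read position, used only for seeding
    calls = "".join(max(col, key=col.get) for col in pwm)
    scores = {
        s: prod(col[genome[s + j]] for j, col in enumerate(pwm))  # ungapped stand-in
        for s in candidate_starts(calls, index, k, len(genome))
    }
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()} if total > 0 else {}
```

A read that fits two repeat copies equally well is split 50/50 rather than being assigned at random or discarded.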

Compared to other tools via simulation studies as well as on actual data. It seems to do better at recovering the original location, but it is not quite clear what the ideal problem / data set would be to play to GNUMAP's advantages. No support for paired-end reads currently, but that is planned along with SOLiD support.

Slower but more precise than other mappers. No limit on read length, and it would work with 454 data, but it needs all four base call probabilities (a FASTQ file is not enough).
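Why a FASTQ file falls short can be illustrated by expanding a single base call and Phred score into the four-probability column a PWM needs. FASTQ only provides the error probability 10^(-Q/10) for the called base. How that error mass is spread over the other three bases is not recorded, so this sketch has to assume a uniform split. That missing distribution is exactly what the instrument's full base-call probabilities would supply. `phred_to_column` is a hypothetical helper.

```python
def phred_to_column(base, q):
    """Expand one FASTQ base call and Phred score into P(A), P(C), P(G), P(T).

    Assumption: the error probability is split evenly over the three other bases,
    because FASTQ does not record how it is actually distributed.
    """
    p_err = 10 ** (-q / 10)
    column = {b: p_err / 3 for b in "ACGT"}
    column[base] = 1 - p_err
    return column


read = "ACG"
quals = [30, 20, 10]
pwm = [phred_to_column(b, q) for b, q in zip(read, quals)]
```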


Younghoon Kim (KAIST) – MONET: A Cytoscape plugin for genome-scale network inference from expression profiles using modularization and parallel processing techniques with supercomputing resources

Large-scale expression data alone is not enough; additional information is needed. MONET improves the sample-to-gene ratio by modularization, incorporating pre-existing functional annotation (GO). It identifies functional modules based on annotation and incorporates them into a global network in a divide-and-conquer approach, running a Bayesian network analysis for each module. It identifies seed genes for each condition and expands them based on functional annotation and expression data (details are a bit difficult to follow; I recommend looking at the paper).
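The modularization idea can be sketched as grouping genes by shared annotation term and then learning one model per module in parallel. The grouping rule, the `min_size` cutoff and the `learn_network` callback are hypothetical. The actual Bayesian network learning is left as a plug-in function rather than invented here. With, say, 20 samples and 6000 genes, the sample-to-gene ratio is about 0.003, whereas a 40-gene module gives 0.5. That gap motivates the divide-and-conquer step (illustrative numbers, not from the talk).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def modules_from_annotation(annotations, min_size=2):
    """Group genes into modules by shared annotation term (e.g. a GO term).

    annotations: dict gene -> set of terms; a gene may end up in several modules.
    """
    modules = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            modules[term].add(gene)
    return {term: genes for term, genes in modules.items() if len(genes) >= min_size}


def learn_all(modules, learn_network, workers=4):
    """Run a per-module learner independently and in parallel (divide and conquer)."""
    terms = list(modules)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(learn_network, (modules[t] for t in terms))
        return dict(zip(terms, results))
```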


Shai Lubliner (Weizmann) – Modeling Interactions between Adjacent Nucleosomes Improves Genome-wide Predictions of Nucleosome Occupancy

75-90% of the DNA is associated with nucleosomes, which play an important regulatory role. Nucleosome occupancy is being determined with a number of different methods, resulting in an affinity landscape. Here: a thermodynamic model. Reasonable correlation of 0.65 between model and data despite using only the DNA sequence information.
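A thermodynamic model of this kind can be sketched with a forward-backward recursion over non-overlapping nucleosome placements. Assumptions: `affinity[i]` is a log-affinity in kT units for a nucleosome starting at position i. Each nucleosome covers `w` positions (about 147 bp in reality, tiny here). A configuration's weight is the exponential of its summed affinities, with no interaction between nucleosomes. Occupancy at a position is the probability that any nucleosome covers it.

```python
import math


def occupancy(affinity, w):
    """Nucleosome occupancy under a no-cooperation thermodynamic model.

    affinity[i]: log-affinity of a nucleosome covering positions i .. i+w-1.
    """
    n = len(affinity)
    wt = [math.exp(a) for a in affinity]
    # fwd[i]: partition function of the prefix [0, i)
    fwd = [1.0] * (n + 1)
    for i in range(1, n + 1):
        fwd[i] = fwd[i - 1]                      # position i-1 left free
        if i >= w:
            fwd[i] += fwd[i - w] * wt[i - w]     # a nucleosome ends at i-1
    # bwd[i]: partition function of the suffix [i, n)
    bwd = [1.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        bwd[i] = bwd[i + 1]                      # position i left free
        if i + w <= n:
            bwd[i] += wt[i] * bwd[i + w]         # a nucleosome starts at i
    z = fwd[n]
    # probability that a nucleosome starts exactly at i
    start = [fwd[i] * wt[i] * bwd[i + w] / z if i + w <= n else 0.0 for i in range(n)]
    # occupancy at p: some nucleosome starting in [p-w+1, p] covers p
    return [sum(start[max(0, p - w + 1): p + 1]) for p in range(n)]
```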

Additional interactions are important for chromatin organization: DNA-bending proteins, TFs, histone modifications, etc. The model tries to capture interactions between adjacent nucleosomes (cooperative effects), with the linker length preferences encoded by a function. A 'no cooperation' function serves as the reference / background model. Linker preferences were obtained from in vivo data (right-shifted peak, exponential decay beyond a certain length); five different representative functions were tested.
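Adding a linker-length function f to the same kind of model gives a cooperative version. Every pair of adjacent nucleosomes multiplies the configuration weight by f(linker length), and f = 1 everywhere recovers the no-cooperation model. The quadratic-time recursion below is a sketch. The `exp_linker` and `step_linker` shapes and all their parameters are hypothetical stand-ins for the Exp and Step functions mentioned in the talk, not the fitted ones.

```python
import math


def no_cooperation(linker):
    return 1.0


def exp_linker(linker, amplitude=2.0, scale=10.0):
    # hypothetical shape: extra weight for short linkers, decaying exponentially
    return 1.0 + amplitude * math.exp(-linker / scale)


def step_linker(linker, cutoff=20, boost=2.0):
    # hypothetical shape: constant bonus for linkers shorter than `cutoff`
    return boost if linker < cutoff else 1.0


def cooperative_occupancy(affinity, w, f=no_cooperation):
    """Occupancy when adjacent nucleosomes interact through f(linker length)."""
    n = len(affinity)
    starts = range(n - w + 1)
    wt = [math.exp(a) for a in affinity]
    # fwd[i]: total weight of configurations whose LAST nucleosome starts at i
    fwd = [0.0] * n
    for i in starts:
        fwd[i] = wt[i] * (1.0 + sum(fwd[j] * f(i - j - w) for j in range(0, i - w + 1)))
    # bwd[i]: weight of everything after a nucleosome at i (excluding its own weight)
    bwd = [0.0] * n
    for i in reversed(starts):
        bwd[i] = 1.0 + sum(f(j - i - w) * wt[j] * bwd[j] for j in range(i + w, n - w + 1))
    z = 1.0 + sum(fwd[i] for i in starts)  # empty configuration + all others
    start = [fwd[i] * bwd[i] / z for i in starts]
    return [sum(start[max(0, p - w + 1): p + 1]) for p in range(n)]
```

The linker length is measured as the gap between the end of one nucleosome and the start of the next, so a function that rewards small gaps favours tightly packed arrays.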

Sampled 5000 random configurations from a model instance and compared them to the in vivo data; the functions can represent the underlying cooperative data. Repeated with data samples: add noise and try to fit different models to the sampled occupancy landscape. The model can be fitted if cooperativity is taken into account; there is no success without cooperative interactions.
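Random configurations can be drawn from such a model by stochastic backtracking over suffix partition functions. Walking left to right, a nucleosome is placed at position i with probability exp(affinity[i]) * Z(suffix after it) / Z(suffix from i); otherwise i is left free. For brevity this samples from the no-cooperation model only, with the same hypothetical `affinity`/`w` conventions. Averaging 5000 samples should approximate the exact occupancy.

```python
import math
import random


def suffix_partition(affinity, w):
    """Boltzmann weights and suffix partition functions bwd[i] for [i, n)."""
    n = len(affinity)
    wt = [math.exp(a) for a in affinity]
    bwd = [1.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        bwd[i] = bwd[i + 1]
        if i + w <= n:
            bwd[i] += wt[i] * bwd[i + w]
    return wt, bwd


def sample_configuration(wt, bwd, w, rng):
    """Draw one configuration (list of start positions) from the Boltzmann distribution."""
    n = len(wt)
    i, starts = 0, []
    while i < n:
        if i + w <= n and rng.random() < wt[i] * bwd[i + w] / bwd[i]:
            starts.append(i)   # place a nucleosome at i and skip its footprint
            i += w
        else:
            i += 1             # leave position i free
    return starts


def sampled_occupancy(affinity, w, n_samples=5000, seed=0):
    """Fraction of sampled configurations in which each position is covered."""
    rng = random.Random(seed)
    wt, bwd = suffix_partition(affinity, w)
    counts = [0] * len(affinity)
    for _ in range(n_samples):
        for s in sample_configuration(wt, bwd, w, rng):
            for p in range(s, s + w):
                counts[p] += 1
    return [c / n_samples for c in counts]
```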

Do interactions play a role in vitro and in vivo (i.e., in real biological systems)? For an in vitro validation, the Exp and Step functions work better than the no-cooperativity model and exhibit a strong preference for short linker lengths.

For in vivo examples, the results are similar, with almost all functions doing better than the background model. Repeated for C. elegans with the Exp function only: again a strong preference for short linker lengths.

Biological basis for this preference: shorter linkers allow nucleosomes to interact, energetically favoring their shift away from otherwise better binding positions.


Karim Chine – Computational Biology in the cloud, towards a federative and collaborative R-based platform (Biocep-R)

(Talk actually given by Eamonn Maguire)

Java application built on top of R and Scilab, with a RESTful API, improved graphics, extensibility (plugins, server-side) and distributed resources. Components include R and Bioconductor; GUIs with collaborative views; scripting (R/Python/Ruby); stateless web services; NFS/FTP/S3 storage; cluster/grid support.

  • Server-side, grid-enabled collaborative spreadsheet: multiple clients connected to the same server share the same view.
  • GUI builder using NetBeans.
  • Full cloud support and node worker support. Biocep can automatically add workers on the cloud on demand, given a certain load.
  • Scripting to wrap additional services.

Actually, coverage over at FF is much better ;-)


Other notes

  • BoF meeting on microblogging on FF
  • Lengauer keynote coverage on FF
Oliver Hofmann
