Love the organization of SageCon: excellent twitter coverage (#sagecon), all talks streamed live, program with slides available online. Topics mostly revolve around data management, integrative genomics, tools and databases. I’m merely linking to the presentations of interest and copying and pasting selected twitter notes (mostly from @CameronNeylon, thanks as always!).
Stephen Friend, Sage Bionetworks
Kicks off the program with an overview of challenges and solutions (pdf) in drug development and the ideas around Sage (global coherent data sets, Sage Commons). It’s important we not be naive, nor evangelical about sharing of data. This room has a quarter of the people who have spent their life working on [standards, ontologies, networks….sharing etc. etc.]. People don’t do new things unless the risks are low and the benefits are higher.
Cancer drugs take $1b, 10 years to go to market. Helps 1/4th of patients. Not nearly good enough. Need to move away from single domain/discipline views. Need to integrate multiple layers of data and model. New technologies rapidly emerging. But we’re not prepared. It’s not one method. It’s many methods. Two schools: “Heterogeneous conditions lead to more reproducible behavioral results” vs “need more control and standards”.
Don’t look for changes. Buffers in the system will hide them. Drugs won’t work in phase 2 trials using that model. We need to look for the non-buffering, low redundancy drivers of disease. Need ways to manage, host, integrate, and use vast amounts of data to make this vision reality. We have to find a way to share clinical data with genomic data to really leverage it.
Need to appreciate that it will require decades of evolving representations to get good understanding. Four requirements: Data repository, platform architecture, probabilistic causal network models of disease, rules and governance. Each of our three tasks is audacious: the massive data, the network disease models, the tools needed for the Commons. People could say ‘who are you, to think you can do this?’ The answer is, it’s not one group. Has to be a community. There’s still a territoriality to data. Sage Bionetworks just “blowing wind into the sails of Sage Commons” – community has to believe in value to support value creation.
Sage has new partnership with Xinhua Univ. in China. No details. Close to another deal w/UCSF QB3 at Mission Bay. New things Sage is working on now: systems biology of Huntington’s, breast cancer. Cross-species networks. Sage is working on an agreement with a top publisher to host network models for sharing. Focus of Sage is Global Coherent Datasets: datasets with genome-wide DNA variation, as well as phenotype, over a large number of individuals.
Andrea Califano, Columbia
Presentation on high-grade regulatory networks (pdf). The search for master regulators in glioblastoma. 8 transcription factors explain most of the effect, but the regulatory network is important. Overexpression of each of the eight has only a minor effect. In humans using two of the TFs as markers correlates well with patient survival. Often these markers don’t work well in patients. Implications for therapeutic targets. To link biomarkers to molecular targets need to understand and target regulatory systems.
Need standards for effective sharing of data. Need to actually get the data out there. 80% not currently available. Are we willing to change?
Lee Hood, ISB
On network analysis for prion diseases, along with an overview of approaches to data gathering and handling noise. Data should be global. Different. Dynamic. Integrated. Prion study with 100 million data points. Reduce noise with “subtractive deep biology” – translation: good experiments with many controlled conditions. Time course of multiple datasets, via transcriptomics, pathology, protein interactions, &c. All in eight strains of mouse. Highly successful in identifying differentially expressed genes strongly associated with prion response. See more at their website.
Worried about sequencing whole cancer populations which average out signals and enhance noise.
Project descriptions
Kasarskis and Kupershmidt, NextBio
End-to-End pilot (pdf) on combining data, building and querying models. Trying to identify problems and issues with going from data to model and making all available. Network models will be shared in RDF form. Steep learning curve on leveraging ontologies.
Ilya Kupershmidt talking about using NextBio to correlate breast cancer drivers across public data sets. Based on ‘semantically organized datasets’. Is this the same as annotated datasets? [Sounds like a tour through NextBio] Semantic datasets were curated by NextBio, disease annotations based on SNOMED CT.
Jesse Tennenbaum, Duke Translational Science
Standards and ontologies (pdf). What people think are the three top issues: consistent data format and metadata, representations, ontologies. Background for content, semantics, and syntax. Examples: MIAME, ontologies, XML and Tab formats. Identified standards, a set of open source standards-based tools, some annotations as proof of concept [no monolithic tools, strict data model or prescribed ontologies]. What remains to be done: Formalize some recommended minimal information models, extend and integrate tools. Find a balance between structured annotation versus ease and expressivity. Divide work between curation experts and experimenters.
Carole Goble, U of Manchester, UK
Sage Infrastructure Tools (pdf). Built around Alitora interface to GenePattern, Taverna, Cytoscape, all querying the Sage repository. Core principles: maximize access, use, reuse. Distribute multiple formats, use existing standards and tools, design flexibly, support community collaboration and annotation. Need better APIs to integrate better into tools.
John Wilbanks (Science Commons) covering for Rossini
Intro from the internationalization working group (pdf). Take on law, contracts, privacy. We do not live in a nation-centric world. Sage is a great potential global asset but… interop is a problem. Legal and policy regimes radically different across the world. Particularly privacy rights. Human genetic privacy being treated very differently internationally (where it is being treated at all). Constraints created by the use of human subjects, privacy and identifiability. Privacy means many different things. Also people using complexity as leverage (or an excuse) in negotiation; potential for unexpected problems: e.g. use of clinical or public health data as economic or political weapons. Importance of clear marking of rights associated with data. Transparency and certainty problem.
Conclusions: Life is complex. Norms, contracts, IP, and privacy combine. Standards are promising. Specific cases to drive general
Liz Lyon, Bath
On citation (pdf) of network models. How to cite and credit the work and contribution of people to commons? Citations are still currency in academia. This is a serious problem. What are we citing? Journal articles, very macro, need more granularity. Workflows, visualizations, models, data, annotation, concepts. How? Functionality? Policy? Citation requires some recognized unique ID. Researcher IDs? Data IDs. How to make interoperable? Biggest obstacle: Tenure process; trying to change that mindset.
Draft overview of guidelines.
Jeff Hammerbacher, Cloudera
On open source and open data (pdf). “From narrative to design”: leaving technology aside — have a narrative, collect and structure data, build tools, make models. Finance: A cautionary tale. What happens when sophisticated models have limited (expensive) data, when code is highly guarded. Price for market data was rising as market was tanking – data as a community resource would have been better. There are a number of banks on wall street that have designed their own programming languages.
Everything on the web accessible via http and ftp is open data. Big successful web companies have scalable data management and analysis systems. Connectome project: an automated microtome for microscopy samples built in basement. Dealing with large datasets of unstructured data (900 pB) – hackers, large data, machine learning.
The Sage commons. Large drug companies not that different from banks. Need to learn from Web – open source, open data. The platform is one thing: It is the people who build on the platforms that will make the difference. Focus on the data. “Amazing problems that fall apart when you just count stuff at scale” example of Google dominating translation. Invest in new measurement technologies and store everything. Use existing OS tools and share those you build. 1. Share your data 2. Share your tools 3. Share your results.
Josh Sommer, Chordoma Foundation
On curing his disease. “I’m a 22 year old college dropout…want to talk about what I’ve done to outrun the disease I was diagnosed with”. In college was diagnosed with Chordoma – 7 year average survival, 20-30% cure rate – no effective chemo – unacceptable statistics. “Imagine how frustrating it is to be sitting in a hospital room, unable to access journal articles about disease just been diagnosed with”.
Found only NIH funded researcher who said “I could use some help in the lab…” 3 weeks later was in there learning from scratch. But one person in one lab was not going to solve the problem. Need to scale out to wider set of researchers. Barriers identified: Access to scientific resources, funding, coordination and collaboration, flow of information. Set up the Chordoma foundation in 2007 to systematically address these barriers.
Built a research roadmap – but pace is slow and difficult. Unlikely therefore to directly help those with disease today. “Why do I go all in…” – for many patients hope is a powerful motivator but also perhaps close to tipping point. Technological revolutions, social change (how do we make use of and share data?). Publication system 300 years old, when curing disease unimaginable, so why do we still use same mechanism in C21? [Journal system is entrenched…not an unusual thing to say but it means more from someone 4 years into 7 year average survival].
What would an intelligently designed system of biomedical data exchange accomplish? As an ex engineering student this looks like an optimization problem. Cures/Time. Knowledge turns within a lab can act quickly but knowledge turns between labs far too slow. How to accelerate? Acting as a research broker to promote and catalyze data sharing. But still not good enough. Chordoma foundation accelerates knowledge transfer between labs. Needs to happen on a larger scale.
Trey Ideker, UCSD
Biomarkers based on networks, not loci (pdf). Network biomarkers more informative than individual genes and proteins. Challenges ahead: Informing network models with rich data, including functional interaction data beyond physical interactions. Recognizing the dimension of scale – network models are modular. Translation to individual patients remains difficult.
Hiroaki Kitano, Systems Biology Institute, Okinawa
Software platform for systems drug design (pdf). [Bonus points for the movie on their iPad pathway visualization]. We need a map in the war on disease. How to build? How to maintain? How to use? [Talk revolves around tools, standards and platforms, including a gene annotation tool on the iPad]. Recommend going through the slides for links to Cell Designer, SBML, SBGN, Payao (community-based tagging system), open drug discovery, iPathways and more.
Sam Aparicio, UBC
Molecular Taxonomy of Breast Cancer (pdf). Currently disease divided up based on markers such as HER2. Breast cancer is heterogeneous, and we cannot fully explain the variation. Majority of patients getting unnecessary therapies. “How many diseases is breast cancer?” – very different survival rates, but only know from outcomes data, hard to get.
Motivations for METABRIC – can markers be identified to support therapeutic decisions with high precision? There was data, but datasets are too small, different platforms, long term outcomes data is missing in most. METABRIC has high resolution SNP and CNV data on 2100 tumour samples with outcome data – beginning to identify subgroups. Once you start looking into detailed cancer sequencing can ask “How many diseases does this patient have?”
Brian Yandell, UW Madison
Systems genetics analysis platform (pdf). An analysis pipeline acts on objects, has settings, generates outputs, possibly checksystems. Need to provide collaborative frameworks that let people work with, modify, or contribute to pipelines. The system is all prototypes with some thought toward software design.
Jill Mesirov, Broad
Integrated Bayesian patient stratification (pdf). Want to predict outcome for childhood malignant brain tumour. Treatment is harsh, with serious side effects, but outcome predictors are poor. Employ a hierarchical network approach to identify pathways to identify subtypes. Could rescue 6/15 patients in small study – standard clinical analysis gave ok outcome, model gave poor prediction, therefore treatment chosen.
Couple of notes taken at the first session of Network Biology 2.0 at the Broad. Only managed to sneak in during day one as all seats had sold out — somewhat frustrating when half of the auditorium is empty. Won’t be able to cover the remaining talks, unfortunately. Personal comments in brackets, all errors mine.
George Church (HMS): Technologies for collecting and integrating genome, environment and trait data
[Kicks off with a busy slide thanking about 50 different companies and groups]. Drink your milk, eat your grains (unless you have lactase deficiency, …): a different view on health advice. We have a host of anti-cancer options: prevention, surgery, stem cell/gene therapy, vaccines, synthetic biology strategies, chemical elimination, maintenance (rather than hasten resistance by elimination), combination therapy.
Prevention: avoiding the environmental components (UV, radiation, chemicals, infection, diet). We know of a smaller subset of actionable adult onset variants, testable to some extent by genome sequencing (detecting gaps, inversions, causative mutations). Still not done with the human genome build, gaps of millions of bases in some chromosomes, informative regions not present in hg18/19.
On sequencing: current cost at around $1500 on consumables for a full genome run. Lists 20 companies (half of which still in R&D) now driving sequencing technologies. Crowded field. Highlights Nanopore, IonTorrent.
Challenges: structural variation detection. Methods to get continuity from end-to-end of a chromosome (Mbp). Use dilution libraries, in-situ sequencing (ss-DNA-EM). Extend to the cell and combine with morphological measurements (Balkal et al, Science 316).
Moving from sequencers to Bio-Fab: select synthetic sequences, biochemical kinetics, cell sorting (extension of the polonator project). Systems biology won’t go from genome alone to predictions. Rich trait data, epigenome, etc needed. The problem of finding trait information in dbGap, anonymization, data escape and re-identification. What about cell lines which still have all relevant information?
Medical genomics: about 1500 genes highly predictive and actionable for inherited diseases and cancer. Very few on DTC SNP chips, unfortunately. Reviews recent exome, genome sequencing papers that (re)identified causal genes. Adding in family information to aid in the analysis of 25 genomes. Well-established pathogenic alleles at low allele frequency (less than 4%). Explore via PersonalGenomes. For each PGP member fibroblast cell lines, (sometimes?) IPS cell lines and derived tissues — finding person to person variation in hepatic proteins involved in drug efficacy. A way around the limitations of blood-based diagnostics?
Measuring pathogens directly or indirectly (immune-response test). Time series vaccine experiments to track dynamic response to 11 infectious strains, following VDJ usage and distributions.
[A George Church talk: almost impossible to capture even half of what he is talking about, much less what is covered by the slides. Question on handheld devices: matter of demand. Sees multiple markets (low cost, long reads, high speed, mobility), not all of them covered by a single device.]
Stephen Turner (Pacific Biosciences): Real time DNA sequencing from single polymerase molecules
Focus on the actual technology (should be well-known by now). DNA polymerase as a sequencing engine (750 bp/s, processive, frugal, low-error rate). Need to observe one molecule at a time. Zero-mode wave guide (ZMW) development, avoiding base-labeled nucleotides (inhibitory, background light), labels instead attached to the terminal phosphate and clipped off by polymerase. Lovely movie of 3000 polymerases in action (idling and after activation) at around 3-5 bases/second. Commercial release ‘towards the end of this year’.
3’/5’ ends of sequence templates joined together into a circular template, tried to streamline sample prep workflow. Test run on influenza, under nine hours from sample to sequence results, half of that taken up by RNA prep. Identify strains by looking at individual molecules. [Should be interesting to see entire quasi-species (HIV) in a single sample.]
Second case study: MCF7 transcriptome to identify structural variants using reads of 2+ kb, found known and novel variants. Previous example of E. coli re-sequencing with low error rates, 10 kb read example, uniform coverage (rather than normal distribution), low GC content bias, strobe sequencing, etc. See notes from #AGBT. Sample of the strobe read approach to resolve genomic structures like large-scale inserts at essentially 1-fold coverage.
Future expansions: DNA polymerase kinetics known at every position, real-time kinetic data obtained by the consensus sequence (to overcome stochastic variation noise). Can use this to ‘sense’ methylation, at least in theory. Information spread out around the methylated region, difficult to process. 5-hydroxymethylcytosine can be discriminated from standard methylation (5-methylcytosine) [which could be huge as bisulfite-sequencing might be a dead-end for methylation important to stem cells]. Not limited to methylation (8-oxoguanine, other base modifications leave kinetic footprint).
Extend to RNA sequencing, first proof of principle but with lower overall quality so far, but can also detect RNA nucleotide modifications. Other explorations: use a ribosome instead of a polymerase to trace translation, less photobleaching due to small cell volume. See today’s Nature publication. Next: millions of ZMWs on a single chip (packaged), all running at 3-5 bp/s. [Ouch!]
Unfortunately once again no information on error rates.
Marc Vidal (HMS): Interactome networks and human disease
Michael Cusick covers for Marc Vidal who is attending an ENCODE meeting (under the threat of ‘or else…’). Interactions between cells, genes, evolution, chemicals/environment. What is missing is the (emergent) complexity. Information flow between networks, organization and logical relationships. Focus on networks (interactome, PPI, ignoring RNA/DNA/metabolite interactions) and organization.
Caveat to define global principles: current interactome datasets only cover 5-10% (at most) of all interactions. Binary / static interactions insufficient to model dynamic interactome. We can test all vs all possible PPI (about 400 million binary interactions ignoring splice isoforms), ignoring challenges like low affinities, abundance, transmembrane proteins, modifications, domain vs protein interactions. Used high quality Y2H (with multiple reporters), Affinity Purification. Important not to mix binary with complex interaction information – they have different topological and biological properties. Now at 12,000 genes (25% of the interactome). Can be augmented by Mass Spec.
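A quick back-of-the-envelope check of the "about 400 million" figure, as a minimal sketch. The gene count of 20,000 is an assumption for illustration (the notes do not state the number used in the talk), and counting ordered bait/prey pairs is likewise an assumption about how the space was counted.

```python
def pair_space(n_proteins: int) -> int:
    """Number of ordered bait-prey pairs in an all-vs-all binary screen."""
    return n_proteins * n_proteins


def fraction_covered(n_tested: int, n_total: int) -> float:
    """Share of the full pair space covered when only n_tested proteins are screened."""
    return pair_space(n_tested) / pair_space(n_total)


# Assumed round number of human protein-coding genes (not given in the notes).
ASSUMED_HUMAN_GENES = 20_000

if __name__ == "__main__":
    print(f"{pair_space(ASSUMED_HUMAN_GENES):,} candidate pairs")
    # Screening half of the proteins covers only a quarter of the pair space.
    print(fraction_covered(10_000, ASSUMED_HUMAN_GENES))
```

The point of the second function is that coverage of the interaction space grows with the square of the number of proteins screened, which is why a gene count and a search-space percentage quoted side by side can look mismatched.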
Add experimentally derived confidence values for interactions (Braun, Nat Methods 2009). Assess interaction maps (completeness, assay-sensitivity, sampling sensitivity, specificity as parameters). Can’t get funding for more than four replicates, ten would be ideal.
Projects at CCSB on human data: human PPI, viral vs human for 8 different pathogens, cancer-associated single AA changes and their impact on the interactome, splice isoform screening for multiple interactions. Yeast done (published last year), C.elegans in progress, A.thaliana ongoing collaboration.
Also essential: add in curated information, prediction information into the framework. No such thing as a gold standard, but use literature as reference sets (which themselves are imperfect, have biases). Literature and HT data each abet the other.
Arabidopsis: about 8000 ORFs tested (around 8% of the interaction space). Positive reference set pulled from databases (2 independent publications, re-curated for 118 interactions total). Negative set a random selection of similar size; test against predicted interactions and compare against reference set and Y2H. Even at highest quality Y2H can only recover 35% of interactions, need additional methods and assays (that, and a fair amount of not recoverable interactions in the literature). Tested 2% subspace six times to test for sampling sensitivity. Still increasing number of recovered PPIs at six replicates.
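The observation that recovered PPIs keep increasing at six replicates can be illustrated with a simple saturation model. This is only a sketch under a strong assumption: every detectable interaction is picked up independently with the same probability in each screen. The per-screen probability below is hypothetical, not a number from the talk.

```python
def cumulative_recovery(p_single: float, n_screens: int) -> float:
    """Fraction of detectable interactions seen at least once in n independent screens."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    return 1.0 - (1.0 - p_single) ** n_screens


# Hypothetical per-screen sampling sensitivity, chosen only for illustration.
P_SINGLE = 0.3

if __name__ == "__main__":
    for n in range(1, 11):
        print(n, round(cumulative_recovery(P_SINGLE, n), 3))
```

Under this toy model the curve is still climbing at six screens and flattens only towards ten, which is at least consistent with the remark that four replicates are fundable but ten would be ideal.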
Genotype to phenotype via network perturbation, may help explain complex genetic problems (penetrance, expressivity, pleiotropy, redundancy). Edgetic models (edge-genetic) of molecular dysfunction: remove interactions, not the gene, which would remove all edges. Edgetic defects can be directional. Map in-frame and truncating mutations to find loss of all interactions, or just a few interactions (see: Edgetic perturbation models of human inherited disorders, Zhong et al, Mol Syst Biol. 2009). Example of osteogenesis imperfecta, severity of the disease scales with increased loss of edges.
Reverse edgetics: needs new methods not removing the node (RNAi, etc.). Proposal in Nature Methods, Dreze et al, Edgetic perturbations in C.elegans. Edgetic mutations usually map to protein surface (rather than complete structural shifts in the center), cluster together for a given edge (interaction / docking surface).
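The difference between removing a node and an edgetic perturbation is easy to see on a toy graph. The sketch below uses a made-up four-protein network (the names P, A, B, C are hypothetical) and plain dictionaries rather than any particular graph library.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def make_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected interaction graph from a list of protein pairs."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def remove_node(graph: Graph, node: str) -> Graph:
    """Node removal (e.g. a null allele): the protein and all of its edges disappear."""
    return {n: {m for m in nbrs if m != node} for n, nbrs in graph.items() if n != node}


def remove_edge(graph: Graph, a: str, b: str) -> Graph:
    """Edgetic perturbation: one interaction is lost, the protein itself stays."""
    out = {n: set(nbrs) for n, nbrs in graph.items()}
    out[a].discard(b)
    out[b].discard(a)
    return out


def edge_count(graph: Graph) -> int:
    """Count undirected edges (each edge is stored twice in the adjacency sets)."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2


if __name__ == "__main__":
    toy = make_graph([("P", "A"), ("P", "B"), ("P", "C"), ("A", "B")])
    print(edge_count(toy))                         # 4
    print(edge_count(remove_node(toy, "P")))       # 1
    print(edge_count(remove_edge(toy, "P", "A")))  # 3
```

Knocking out P costs three interactions at once, while the edgetic allele costs one and leaves P free to keep interacting with B and C, which is the kind of graded loss the osteogenesis imperfecta example refers to.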
Robert Weinberg (Whitehead): Malignant progression and the stem cell state (Keynote)
What additional genetic or epigenetic steps have to occur for malignancy? Tumor yield in cell culture can be driven in one direction by different media despite identical introduced genetic changes. Cell of origin differentiation program has a strong influence on tumor potential (metastasis and tumorigenicity). Goes over the model of cancer stem cells (CSCs, a hierarchy rather than stem cell/non-stem cell). Self-renewal cells qualify to seed metastasis. Ratio of CSCs an indicator of malignancy of a tumor?
Micrometastasis to metastasis transition inefficient (transition to a foreign tissue micro-environment). As rare as the steps leading to the initial tumor formation (invasion, intravasation, transport, extravasation etc). What additional mutations or changes are required to acquire all these abilities?
Location of cells in the tumor is a strong determinant of their properties (epithelial-mesenchymal transition, EMT); different micro-environment at the edge of a tumor exposing the cells to signals inducing EMT. Increases experimental complexity.
EMT an ancient program with roles in embryogenesis. Cells undergo changes in morphology, motility (with thousand or more genes undergoing expression changes) driven by Snail, Twist, Slug, Sip1 and other TFs. Twist shuts down epithelial markers, induces mesenchymal ones. Model of cancer cells exploiting this early embryonic program (along with wound healing responses). Shut down Twist in mouse model reduces metastasis amount by 85% (remaining metastasis express high levels of Twist).
The EMT program might be sufficient to drive development up to micrometastasis; probably not sufficient for the last step (adaption to foreign tissue environment during colonization). If Twist expression can be induced in tumor-associated stroma no additional mutations might be needed for invasion-metastasis.
Transform melanocytes with 2nd embryonic transcription factor. Unlike epithelial cells melanocyte transformation generates numerous metastasis. Neural crest derived, TF Slug at 1000-fold higher expression than in breast cancer cells — much easier to induce EMT program. Knocking down Slug drastically reduces metastasis frequency. Verified with third EMT TF (FOXC2). Close association with basaloid breast cancers (triple receptor negative); 44% with high level of FOXC2 in the nuclei (hardly present in luminal breast cancers).
His contribution to a network conference: the EMT interactome. Links to cadherins/catenins, GSC, TGF9, Snail, RAS. Use gene expression patterns of EMT-derived cells to distinguish from fibroblasts. Similar to epithelial stem cells? [Lovely story on why you should be suspicious if results connect two independent research areas of the lab, in his words a potential sign of opportunistic post-docs]. Induction of EMT by Snail, Twist creates CD44hi/CD24lo cells with mesenchymal markers. Operate the stem cell program active in their tissue of origin rather than ‘inventing’ a new one.
EMT confers on cancer cells: physical dissemination, self-renewal powers (and a large descendant population if they survive in new sites). Brief ex-vivo expression of Slug sufficient to have those cells dominate over non-induced cells (mammary gland repopulating cells), form ductal trees in tissue. At least ‘almost indistinguishable’ from stem cells.
Treatment problem: we de-bulk the tumor, stem cells survive and repopulate the tumor mass. Experiment: force epithelial cancer cells to undergo EMT and acquire stem cell traits. Screen for compounds (Lander lab) that kill these cells rather than the precursor cells. Conventional chemo preferentially kills precursor cells, identified 22 compounds, 2 of which available in bulk, which preferentially kill the stem cells. Both anti-parasitic compounds, no idea how this actually works. Not highly toxic, licensed for cows (Salinomycin). Tumor with 5% stem cell frequency, after Paclitaxel treatment 70% of the remaining cells have stem cell properties; Salinomycin reduced the stem cell frequency to 0.2%.
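To put the three stem cell frequencies quoted above side by side, here is a small arithmetic sketch. Only the three percentages come from the notes; the fold-change framing is just one way of reading them.

```python
def fold_change(before_pct: float, after_pct: float) -> float:
    """Ratio of the stem cell frequency after treatment to the frequency before."""
    return after_pct / before_pct


# Percentages as quoted in the notes.
BASELINE_PCT = 5.0
AFTER_PACLITAXEL_PCT = 70.0
AFTER_SALINOMYCIN_PCT = 0.2

enrichment = fold_change(BASELINE_PCT, AFTER_PACLITAXEL_PCT)  # paclitaxel enriches
depletion = fold_change(AFTER_SALINOMYCIN_PCT, BASELINE_PCT)  # salinomycin depletes

if __name__ == "__main__":
    print(f"Paclitaxel: about {enrichment:.0f}-fold enrichment of stem-like cells")
    print(f"Salinomycin: about {depletion:.0f}-fold depletion of stem-like cells")
```

Read this way, conventional chemo pushes the stem-like fraction up roughly 14-fold while Salinomycin pushes it down roughly 25-fold relative to the untreated tumor.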
Floating epithelial cells (after re-culturing, flops-cells) with quite different properties from their parent population. Single clones from CD44hi regenerate the bulk population (CD44lo CD24hi) in culture, SCs differentiate into non-SCs. Unexpectedly non-SCs seem to be able to repopulate the SC population as well (unlikely to be a higher growth rate of a few SCs as they generally grow much slower). Conversion rate can be accelerated by oncogenes. Spontaneous induction of new stem cells in these cultures has implications for therapy targeting SCs selectively: a variety of differentiated cells can be transformed to stem cells. Not only a (cancer) stem cell hierarchy, but shifts in both directions.
Combination therapy that targets all cells of a tumor, not just the SCs. Great talk, hope it is being made available online.
Disclaimer: All notes, comments and links are just cut’n’pasted from Twitter (#AGBT), FF and RSS feeds, blog posts are attributed, Twitter comments are not.
In general, people seem to be cautious over the niche that PacBio is likely to fit in, have written off Helicos, and seem to be excited about the single-mindedness of Complete Genomics and the new technologies such as Ion Torrent’s machine.
General notes
- Summary of the current state of sequencing technology at Genetic Inference. Helicos is noticeably absent.
- Slide used to describe Complete Genomics’ sequencing process
- Details on Ion Torrent’s sequencing chemistry using an ion-sensitive layer (essentially a pH-meter); no light/scanning/cameras required.
- Omics! take on the Pacific Biosciences release, and more coverage from MassGenomics. Lots of skepticism given the lack of hard data.
- From Omics!, something already discussed on the SAM mailing list: “SAM/BAM stores all sorts of information on read pairs, and the strobe sequencing can generate many more than 2 tags per DNA fragment.”
- IBM timed their press release on a new data analysis method well
The workshops and New Technologies session
Tons of data. 2 billion bases per run, 25GB/day with the HiSeq 2000. Work in progress: total human transcriptome, 16 tissues. Massive throughput, plug-and-play reagents, remote access (e.g. iPhone). Two human genomes in one run.
Elliot Margulies: Using 2 HiSeq flowcells rather than 1 moves >10 read coverage from 97% to 98%. Not worth using both; basically, we’re now at the point where one run is overkill for sequencing a genome. SNP chip concordances: 98.2% for GAIIx, 99.7% for one HiSeq FC, >99.9% for both. Depths: 34X, 39X, 77X.
1 million genomes in the next five years, 500 genomes/month this year. Customers are delivered software and data – nothing else. The company will take care of the steps between DNA sample and genetic variant calls. Wants to build 10 sequencing centers worldwide. 50 genomes completed so far. Rade Drmanac from Complete: Read errors very low (0.05%). Errors come from bubbles, dust, DNA damage, somatic variation. RD calls CG’s offering “cloud sequencing”. Throughput per run is now approaching 2 trillion bases. That’s utterly insane. 20-instrument facility could do 100,000 genomes per year.
Jared Roach (ISB): four genomes from Complete from same family. Combining info across samples —> 99.9997% accuracy.
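The notes do not say how information was combined across the family, so the following is only a sketch of one common idea: genotype calls that violate Mendelian inheritance within a family are likely errors. It assumes a quartet of two parents and two children; the site names and genotypes are toy data, not results from the talk.

```python
from typing import Dict, List

Genotype = str  # two alleles at a biallelic site, e.g. "AG"


def mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child could have inherited one allele from each parent."""
    a, b = child[0], child[1]
    return (a in mother and b in father) or (b in mother and a in father)


def flag_suspect_sites(calls: Dict[str, Dict[str, Genotype]]) -> List[str]:
    """Return sites where at least one child's call is impossible given the parents."""
    suspects = []
    for site, g in calls.items():
        children = [g["child1"], g["child2"]]
        if not all(mendelian_consistent(c, g["mother"], g["father"]) for c in children):
            suspects.append(site)
    return suspects


# Toy genotype calls (hypothetical).
toy_calls = {
    "site1": {"mother": "AA", "father": "AG", "child1": "AG", "child2": "AA"},
    "site2": {"mother": "CC", "father": "CC", "child1": "CT", "child2": "CC"},
    "site3": {"mother": "GT", "father": "GT", "child1": "GG", "child2": "TT"},
}

if __name__ == "__main__":
    print(flag_suspect_sites(toy_calls))  # ['site2']
```

At site2 the T allele in the first child cannot come from either parent, so it is either a sequencing error or a very rare de novo mutation; filtering such sites is one way family data can push accuracy beyond what a single genome allows.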
Dr. Zemin Zhang (Genentech): 1 mutation for every 3 cigarettes smoked… good to know. Cancer genomes. AML normal and tumor. Had array data. Seq data: >90% coverage above 10x. 40 and 60x avg coverage. Signed up for more genomes.
Helicos
ChIP-seq, direct-RNA. Runs with over 1e9. Single molecule. Helicos has short reads, high indel error rates, weighs 900 kg and costs $800K. Yeah, this talk is a tough sell.
PacBio
Eric Schadt: sequencing to explore energy-producing bacteria. Added 13X PacBio seq to 19X Illumina seq to assemble a bacterial genome. Strobed reads powerful for spanning repeats. Can integrate DNA variation, molecular traits and phenotypes to construct a probabilistic causal gene network.
LifeTech
Joseph Beechem (Life Tech) just got applause for having the longest talk title of the session. JB is introducing Life Tech’s new single molecule sequencing instrument based on quantum dots. Portable Nanometer sequencer. Tunable read length, tunable accuracy. Table top instrument. Can replenish a batch of polymerase mid-way through a run to replace dead molecules; effectively unlimited read length. Beta instruments will be available to a small set of collaborators by the end of 2010.
Ion Torrent
Jonathan Rothberg of Ion Torrent is a student of history. He sees Second Generation tech as a minicomputer. The box is just a box: Ion Torrent is a chip. Measures H+ as a base is incorporated. No lights, no moving parts. Not a single complexity.
Can leverage developments in the semiconductor industry, not reliant on optical technology like other platforms. Offering two free Ion Torrent instruments to researchers who come up with the best possible applications. Can sequence in hotels and on the backs of donkeys. With wireless you can analyze your data in GeneSifter.
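How a pH-style signal turns into sequence can be sketched as follows. This is a toy model, not Ion Torrent's actual base caller: it assumes nucleotides are flowed over the chip one at a time in a fixed repeating order, and that the H+ signal in each flow is roughly proportional to the number of bases incorporated (so a homopolymer gives a proportionally larger signal). The flow order and the signal values are hypothetical.

```python
from typing import Sequence


def decode_flows(signals: Sequence[float], flow_order: str = "TACG") -> str:
    """Turn per-flow signal intensities into a base sequence.

    Each signal is rounded to the nearest whole number of incorporations;
    a value near 0 means the flowed nucleotide did not match the template.
    """
    bases = []
    for i, signal in enumerate(signals):
        n_incorporated = max(0, round(signal))
        bases.append(flow_order[i % len(flow_order)] * n_incorporated)
    return "".join(bases)


# Hypothetical noisy signals for flows T, A, C, G, T, A, C, G.
toy_signals = [1.02, 0.05, 1.96, 0.10, 0.00, 1.10, 0.03, 0.90]

if __name__ == "__main__":
    print(decode_flows(toy_signals))  # TCCAG
```

The rounding step is where such a scheme gets hard in practice: telling a signal of 5 from a signal of 6 is much less reliable than telling 0 from 1, so long homopolymers are the natural weak spot of this kind of chemistry.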
Talk notes
- Joseph Puglisi, Stanford University School of Medicine, The Molecular Choreography of Translation. Using the PacBio system to track translation.
- Bing Ren, UCSD, Epigenomic Landscapes of Pluripotent and Lineage-Committed Human Cells.
- Jesse Gray, Harvard Medical School, Widespread RNA Polymerase II Recruitment and Transcription at Enhancers During Stimulus-Dependent Gene Expression.
- Keynote: Henry Erlich, Roche Molecular Systems, Applications of Next Generation Sequencing: HLA Typing With the GSFLX System.
- Christopher Mason, Weill Cornell Medical College, Developmental Changes in Human Neocortical Transcriptome Revealed by RNA-Seq. No visible end to gene discovery. The more you sequence the more you see.
- Yardena Samuels, NHGRI, Mutational Analysis of the Melanoma Genome.