Love the organization of SageCon: excellent Twitter coverage (#sagecon), all talks streamed live, program with slides available online. Topics mostly revolve around data management, integrative genomics, tools and databases. I’m merely linking to the presentations of interest and copying and pasting selected Twitter notes (mostly from @CameronNeylon, thanks as always!).
Stephen Friend, Sage Bionetworks
Kicks off the program with an overview of challenges and solutions (pdf) in drug development and the ideas around Sage (global coherent data sets, Sage Commons). It’s important we not be naive, nor evangelical about sharing of data. This room has a quarter of the people who have spent their life working on [standards, ontologies, networks….sharing etc. etc.]. People don’t do new things unless the risks are low and the benefits are higher.
Cancer drugs take $1b, 10 years to go to market. Helps 1/4th of patients. Not nearly good enough. Need to move away from single domain/discipline views. Need to integrate multiple layers of data and model. New technologies rapidly emerging. But we’re not prepared. It’s not one method. It’s many methods. Two schools: “Heterogeneous conditions lead to more reproducible behavioral results” vs “need more control and standards”.
Don’t look for changes. Buffers in the system will hide them. Drugs won’t work in phase 2 trials using that model. We need to look for the non-buffering, low redundancy drivers of disease. Need ways to manage, host, integrate, and use vast amounts of data to make this vision reality. We have to find a way to share clinical data with genomic data to really leverage it.
Need to appreciate that it will require decades of evolving representations to get good understanding. Four requirements: Data repository, platform architecture, probabilistic causal network models of disease, rules and governance. Each of our three tasks is audacious. The massive data, the network disease models, the tools needed for the Commons. People could say ‘who are you, to think you can do this?’ The answer is, it’s not one group. Has to be a community. There’s still a territoriality to data. Sage Bionetworks just “blowing wind into the sails of Sage Commons” – community has to believe in value to support value creation.
Sage has new partnership with Xinhua Univ. in China. No details. Close to another deal w/UCSF QB3 at Mission Bay. New things Sage is working on now: systems biology of Huntington’s, breast cancer. Cross-species networks. Sage is working on an agreement with a top publisher to host network models for sharing. Focus of Sage is Global Coherent Datasets: Datasets with genome-wide DNA variation, as well as phenotype over large number of individuals.
Andrea Califano, Columbia
Presentation on high-grade regulatory networks (pdf). The search for master regulators in glioblastoma. 8 transcription factors explain most of the effect, but the regulatory network is important. Overexpression of each of the eight has only minor effect. In humans, using two of the TFs as markers correlates well with patient survival. Often these markers don’t work well in patients. Implications for therapeutic targets. To link biomarkers to molecular targets need to understand and target regulatory systems.
Need standards for effective sharing of data. Need to actually get the data out there. 80% not currently available. Are we willing to change?
Lee Hood, ISB
On network analysis for prion diseases, along with an overview of approaches to data gathering and handling noise. Data should be global. Different. Dynamic. Integrated. Prion study with 100 million data points. Reduce noise with “subtractive deep biology” (translation: good experiments with many controlled conditions). Time course of multiple datasets, via transcriptomics, pathology, protein interactions, &c. All in eight strains of mouse. Highly successful in identifying differentially expressed genes strongly associated with prion response. See more at their website.
Worried about sequencing whole cancer populations which average out signals and enhance noise.
Project descriptions
Kasarskis and Kupershmidt, NextBio
End-to-End pilot (pdf) on combining data, building and querying models. Trying to identify problems and issues with going from data to model and making all available. Network models will be shared in RDF form. Steep learning curve on leveraging ontologies.
Ilya Kupershmidt talking about using NextBio to correlate breast cancer drivers across public data sets. Based on ‘semantically organized datasets’. Is this the same as annotated datasets? [Sounds like a tour through NextBio] Semantic datasets were curated by NextBio, disease annotations based on SNOMED CT.
Jesse Tennenbaum, Duke Translational Science
Standards and ontologies (pdf). What people think are three top issues: Consistent data format and metadata, representations, ontologies. Background for content, semantics, and syntax. Examples: MIAME, ontologies, XML and Tab formats. Identified standards, a set of open source standards-based tools, some annotations as proof of concept [no monolithic tools, strict data model or prescribed ontologies]. What remains to be done: Formalize some recommended minimal information models, extend and integrate tools. Find a balance between structured annotation versus ease and expressivity. Divide work between curation experts and experimenters.
Carole Goble, U of Manchester, UK
Sage Infrastructure Tools (pdf). Built around Alitora interface to GenePattern, Taverna, Cytoscape, all querying the Sage repository. Core principles: maximize access, use, reuse. Distribute multiple formats, use existing standards and tools, design flexibly, support community collaboration and annotation. Need better APIs to integrate better into tools.
John Wilbanks (Science Commons) covering for Rossini
Intro from the internationalization working group (pdf). Take on law, contracts, privacy. We do not live in a nation-centric world. Sage is a great potential global asset but… interop is a problem. Legal and policy regimes radically different across the world. Particularly privacy rights. Human genetic privacy being treated very differently internationally (where it is being treated at all). Constraints created by the use of human subjects, privacy and identifiability. Privacy means many different things. Also people using complexity as leverage (or an excuse) in negotiation; potential for unexpected problems: e.g. use of clinical or public health data as economic or political weapons. Importance of clear marking of rights associated with data. Transparency and certainty problem.
Conclusions: Life is complex. Norms, contracts, IP, and privacy combine. Standards are promising. Specific cases to drive general
Liz Lyon, Bath
On citation (pdf) of network models. How to cite and credit the work and contribution of people to commons? Citations are still currency in academia. This is a serious problem. What are we citing? Journal articles, very macro, need more granularity. Workflows, visualizations, models, data, annotation, concepts. How? Functionality? Policy? Citation requires some recognized unique ID. Researcher IDs? Data IDs. How to make interoperable? Biggest obstacle: Tenure process; trying to change that mindset.
Draft overview of guidelines.
Jeff Hammerbacher, Cloudera
On open source and open data (pdf). “From narrative to design”: leaving technology aside, have a narrative, collect and structure data, build tools, make models. Finance: A cautionary tale. What happens when sophisticated models have limited (expensive) data, when code is highly guarded. Price for market data was rising as market was tanking – data as a community resource would have been better. There are a number of banks on Wall Street that have designed their own programming languages.
Everything on the web accessible via http and ftp is open data. Big successful web companies have scalable data management and analysis systems. Connectome project: an automated microtome for microscopy samples built in basement. Dealing with large datasets of unstructured data (900 pB) – hackers, large data, machine learning.
The Sage Commons. Large drug companies not that different from banks. Need to learn from Web – open source, open data. The platform is one thing: It is the people who build on the platforms that will make the difference. Focus on the data. “Amazing problems that fall apart when you just count stuff at scale”, for example Google dominating translation. Invest in new measurement technologies and store everything. Use existing OS tools and share those you build. 1. Share your data 2. Share your tools 3. Share your results.
Josh Sommer, Chordoma Foundation
On curing his disease. “I’m a 22 year old college dropout…want to talk about what I’ve done to outrun the disease I was diagnosed with”. In college was diagnosed with Chordoma – 7 year average survival, 20-30% cure rate – no effective chemo – unacceptable statistics. “Imagine how frustrating it is to be sitting in a hospital room, unable to access journal articles about disease just been diagnosed with”.
Found the only NIH-funded researcher, who said “I could use some help in the lab…” 3 weeks later was in there learning from scratch. But one person in one lab was not going to solve the problem. Need to scale out to wider set of researchers. Barriers identified: Access to scientific resources, funding, coordination and collaboration, flow of information. Set up the Chordoma Foundation in 2007 to systematically address these barriers.
Built a research roadmap – but pace is slow and difficult. Unlikely therefore to directly help those with disease today. “Why do I go all in…” – for many patients hope is a powerful motivator but also perhaps close to tipping point. Technological revolutions, social change (how do we make use of and share data?). Publication system 300 years old, when curing disease unimaginable, so why do we still use same mechanism in C21? [Journal system is entrenched…not an unusual thing to say but it means more from someone 4 years into 7 year average survival].
What would an intelligently designed system of biomedical data exchange accomplish? As an ex engineering student this looks like an optimization problem. Cures/Time. Knowledge turns within a lab can act quickly but knowledge turns between labs far too slow. How to accelerate? Acting as a research broker to promote and catalyze data sharing. But still not good enough. Chordoma Foundation accelerates knowledge transfer between labs. Needs to happen on a larger scale.
Trey Ideker, UCSD
Biomarkers based on networks, not loci (pdf). Network biomarkers more informative than individual genes and proteins. Challenges ahead: Informing network models with rich data, including functional interaction data beyond physical interactions. Recognizing the dimension of scale – network models are modular. Translation to individual patients remains difficult.
Hiroaki Kitano, Systems Biology Institute, Okinawa
Software platform for systems drug design (pdf). [Bonus points for the movie on their iPad pathway visualization]. We need a map in the war on disease. How to build? How to maintain? How to use? [Talk revolves around tools, standards and platforms, including a gene annotation tool on the iPad]. Recommend going through the slides for links to CellDesigner, SBML, SBGN, Payao (community-based tagging system), open drug discovery, iPathways and more.
Sam Aparicio, UBC
Molecular Taxonomy of Breast Cancer (pdf). Currently disease divided up based on markers such as HER2. Breast cancer is heterogeneous, and we cannot fully explain the variation. Majority of patients getting unnecessary therapies. “How many diseases is breast cancer?” – very different survival rates, but only know from outcomes data, hard to get.
Motivations for METABRIC – can markers be identified to support therapeutic decisions with high precision? There was data, but datasets are too small, on different platforms, and long-term outcomes data is missing in most. METABRIC has high resolution SNP and CNV data on 2100 tumour samples with outcome data – beginning to identify subgroups. Once you start looking into detailed cancer sequencing can ask “How many diseases does this patient have?”
Brian Yandell, UW Madison
Systems genetics analysis platform (pdf). An analysis pipeline acts on objects, has settings, generates outputs, possibly checksums. Need to provide collaborative frameworks that let people work with, modify, or contribute to pipelines. The system is all prototypes with some thought toward software design.
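To make the pipeline idea concrete, here is a minimal sketch of a pipeline that acts on objects, carries per-step settings, and records outputs. It is not the platform from the talk: the names `Step`, `run_pipeline`, `center` and `threshold` are hypothetical, and recording a SHA-256 checksum per output is my assumption about what the "checks" in the notes refer to.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One pipeline stage: a function plus the settings it runs with (hypothetical)."""
    name: str
    func: Callable[[Any, Dict[str, Any]], Any]
    settings: Dict[str, Any] = field(default_factory=dict)


def checksum(obj: Any) -> str:
    """SHA-256 over a canonical JSON dump, so identical outputs hash identically."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_pipeline(data: Any, steps: List[Step]) -> List[Dict[str, Any]]:
    """Run each step on the previous output; log settings, output and checksum."""
    log = []
    for step in steps:
        data = step.func(data, step.settings)
        log.append({
            "step": step.name,
            "settings": step.settings,
            "output": data,
            "checksum": checksum(data),
        })
    return log


def center(values, settings):
    # Subtract the mean from every value.
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def threshold(values, settings):
    # Keep values whose absolute size reaches the configured minimum.
    return [v for v in values if abs(v) >= settings["min_abs"]]


steps = [Step("center", center), Step("threshold", threshold, {"min_abs": 1.0})]
log = run_pipeline([1.0, 2.0, 3.0, 6.0], steps)
for entry in log:
    print(entry["step"], entry["output"], entry["checksum"][:12])
```

Because each log entry stores the settings next to a checksum of the output, a collaborator who modifies a step can rerun the pipeline and see which downstream outputs changed.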
Jill Mesirov, Broad
Integrated Bayesian patient stratification (pdf). Want to predict outcome for childhood malignant brain tumour. Treatment is harsh with serious side effects, but outcome predictors are poor. Employ a hierarchical network approach to identify pathways to identify subtypes. Could rescue 6/15 patients in small study – standard clinical analysis gave ok outcome, model gave poor prediction, therefore treatment chosen.