Couple of notes taken at the first session of Network Biology 2.0 at the Broad. Only managed to sneak in during day one as all seats had sold out — somewhat frustrating when half of the auditorium is empty. Won’t be able to cover the remaining talks, unfortunately. Personal comments in brackets, all errors mine.
George Church (HMS): Technologies for collecting and integrating genome, environment and trait data
[Kicks off with a busy slide thanking about 50 different companies and groups.] Drink your milk, eat your grains (unless you have lactase deficiency, …): a different view on health advice. We have a host of anti-cancer options: prevention, surgery, stem cell/gene therapy, vaccines, synthetic biology strategies, chemical elimination, maintenance (rather than hastening resistance by elimination), combination therapy.
Prevention: avoiding the environmental components (UV, radiation, chemicals, infection, diet). We know of a smaller subset of actionable adult onset variants, testable to some extent by genome sequencing (detecting gaps, inversions, causative mutations). Still not done with the human genome build, gaps of millions of bases in some chromosomes, informative regions not present in hg18/19.
On sequencing: current cost at around $1,500 in consumables for a full genome run. Lists 20 companies (half of which are still in R&D) now driving sequencing technologies. Crowded field. Highlights Nanopore, Ion Torrent.
Challenges: structural variation detection. Methods to get continuity from end-to-end of a chromosome (Mbp). Use dilution libraries, in-situ sequencing (ss-DNA-EM). Extend to the cell and combine with morphological measurements (Balkal et al, Science 316).
Moving from sequencers to Bio-Fab: select synthetic sequences, biochemical kinetics, cell sorting (extension of the polonator project). Systems biology won’t go from genome alone to predictions. Rich trait data, epigenome, etc. are needed. The problem of finding trait information in dbGaP, anonymization, data escape and re-identification. What about cell lines which still have all relevant information?
Medical genomics: about 1500 genes highly predictive and actionable for inherited diseases and cancer. Very few on DTC SNP chips, unfortunately. Reviews recent exome and genome sequencing papers that (re)identified causal genes. Adding in family information to aid in the analysis of 25 genomes. Well-established pathogenic alleles at low allele frequency (less than 4%). Explore via PersonalGenomes. For each PGP member there are fibroblast cell lines, (sometimes?) iPS cell lines and derived tissues, used for finding person-to-person variation in hepatic proteins involved in drug efficacy. A way around the limitations of blood-based diagnostics?
Measuring pathogens directly or indirectly (immune-response test). Time series vaccine experiments to track dynamic response to 11 infectious strains, following VDJ usage and distributions.
[A George Church talk: almost impossible to capture even half of what he is talking about, much less what is covered by the slides. Question on handheld devices: matter of demand. Sees multiple markets (low cost, long reads, high speed, mobility), not all of them covered by a single device.]
Stephen Turner (Pacific Biosciences): Real time DNA sequencing from single polymerase molecules
Focus on the actual technology (should be well-known by now). DNA polymerase as a sequencing engine (750 bp/s, processive, frugal, low error rate). Need to observe one molecule at a time. Zero-mode waveguide (ZMW) development, avoiding base-labeled nucleotides (inhibitory, background light); labels are instead attached to the terminal phosphate and clipped off by the polymerase. Lovely movie of 3000 polymerases in action (idling and after activation) at around 3-5 bases/second. Commercial release ‘towards the end of this year’.
3’/5’ ends of sequence templates joined together into a circular template, tried to streamline sample prep workflow. Test run on influenza, under nine hours from sample to sequence results, half of that taken up by RNA prep. Identify strains by looking at individual molecules. [Should be interesting to see entire quasi-species (HIV) in a single sample.]
Second case study: MCF7 transcriptome to identify structural variants using reads of 2+ kb, found known and novel variants. Previous example of E. coli re-sequencing with low error rates, 10 kb read example, uniform coverage (rather than normal distribution), low GC content bias, strobe sequencing, etc. See notes from #AGBT. Sample of the strobe read approach to resolve genomic structures like large-scale inserts at essentially 1-fold coverage.
Future expansions: DNA polymerase kinetics known at every position, real-time kinetic data obtained by the consensus sequence (to overcome stochastic variation noise). Can use this to ‘sense’ methylation, at least in theory. Information is spread out around the methylated region and difficult to process. 5-hydroxymethylcytosine can be discriminated from standard methylation (5-methylcytosine) [which could be huge as bisulfite sequencing might be a dead end for methylation important to stem cells]. Not limited to methylation (8-oxoguanine and other base modifications leave a kinetic footprint).
Extend to RNA sequencing: first proof of principle, with lower overall quality so far, but it can also detect RNA nucleotide modifications. Other explorations: use a ribosome instead of a polymerase to trace translation, less photobleaching due to small cell volume. See today’s Nature publication. Next: millions of ZMWs on a single chip (packaged), all running at 3-5 bp/s. [Ouch!]
Unfortunately once again no information on error rates.
Marc Vidal (HMS): Interactome networks and human disease
Michael Cusick covers for Marc Vidal who is attending an ENCODE meeting (under the threat of ‘or else…’). Interactions between cells, genes, evolution, chemicals/environment. What is missing is the (emergent) complexity. Information flow between networks, organization and logical relationships. Focus on networks (interactome, PPI, ignoring RNA/DNA/metabolite interactions) and organization.
Caveat to define global principles: current interactome datasets only cover 5-10% (at most) of all interactions. Binary / static interactions insufficient to model dynamic interactome. We can test all vs all possible PPI (about 400 million binary interactions ignoring splice isoforms), ignoring challenges like low affinities, abundance, transmembrane proteins, modifications, domain vs protein interactions. Used high quality Y2H (with multiple reporters), Affinity Purification. Important not to mix binary with complex interaction information; they have different topological and biological properties. Now at 12,000 genes (25% of the interactome). Can be augmented by Mass Spec.
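The all-vs-all figure is simple arithmetic on proteome size. A minimal sketch, assuming roughly 20,000 protein-coding genes with one isoform each (my assumption, not a number from the talk) and counting each bait/prey orientation separately, as a Y2H matrix would:

```python
def ordered_pairs(n_proteins: int) -> int:
    """Bait x prey matrix size: every protein tested against every protein."""
    return n_proteins * n_proteins


def unordered_pairs(n_proteins: int) -> int:
    """Distinct pairs if A-B and B-A count once (self-interactions included)."""
    return n_proteins * (n_proteins + 1) // 2


# ASSUMPTION: ~20,000 human protein-coding genes, splice isoforms ignored.
N_HUMAN_GENES = 20_000

print(ordered_pairs(N_HUMAN_GENES))    # 400000000
print(unordered_pairs(N_HUMAN_GENES))  # 200010000
```

The ordered count reproduces the 400 million quoted in the talk; counting each pair only once roughly halves it.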
Add experimentally derived confidence values for interactions (Braun, Nat Methods 2009). Assess interaction maps (completeness, assay-sensitivity, sampling sensitivity, specificity as parameters). Can’t get funding for more than four replicates, ten would be ideal.
Projects at CCSB on human data: human PPI, viral vs human for 8 different pathogens, cancer-associated single AA changes and their impact on the interactome, splice isoform screening for multiple interactions. Yeast done (published last year), C. elegans in progress, A. thaliana ongoing collaboration.
Also essential: add curated information and prediction information into the framework. No such thing as a gold standard, but use literature as reference sets (which themselves are imperfect and have biases). Literature and HT data each abet the other.
Arabidopsis: about 8000 ORFs tested (around 8% of the interaction space). Positive reference set pulled from databases (2 independent publications, re-curated for 118 interactions total). Negative set a random selection of similar size; test against predicted interactions and compare against reference set and Y2H. Even at highest quality, Y2H can only recover 35% of interactions, so additional methods and assays are needed (that, and a fair number of interactions in the literature are not recoverable). Tested a 2% subspace six times to check sampling sensitivity. The number of recovered PPIs was still increasing at six replicates.
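One way to look at sampling sensitivity is to count how many distinct interactions have been seen after each repeat of the same screen. A rough sketch with made-up protein pairs (the identifiers and replicate contents are hypothetical toy data, not results from the talk):

```python
from typing import FrozenSet, Iterable, List, Set

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Unordered interaction: A-B and B-A are the same edge."""
    return frozenset((a, b))


def cumulative_recovery(replicates: Iterable[Set[Pair]]) -> List[int]:
    """Number of distinct interactions seen after each successive replicate."""
    seen: Set[Pair] = set()
    counts: List[int] = []
    for hits in replicates:
        seen |= hits
        counts.append(len(seen))
    return counts


# HYPOTHETICAL toy screens of the same subspace, three repeats.
toy_replicates = [
    {pair("P1", "P2"), pair("P2", "P3")},
    {pair("P2", "P1"), pair("P3", "P4")},
    {pair("P1", "P2"), pair("P4", "P5"), pair("P2", "P3")},
]

print(cumulative_recovery(toy_replicates))  # [2, 3, 4]
```

A curve that is still climbing at the last replicate, as reported here for six repeats, means the screen has not yet saturated.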
Genotype to phenotype via network perturbation, may help explain complex genetic problems (penetrance, expressivity, pleiotropy, redundancy). Edgetic models (edge-genetic) of molecular dysfunction: remove individual interactions rather than the gene, which would remove all edges. Edgetic defects can be directional. Map in-frame and truncating mutations to find loss of all interactions, or just a few interactions (see: Edgetic perturbation models of human inherited disorders, Zhong et al, Mol Syst Biol. 2009). Example of osteogenesis imperfecta: severity of the disease scales with increasing loss of edges.
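The difference between losing a node and losing an edge is easy to see on a toy graph. A minimal sketch, using an undirected edge set (so it ignores the directional case mentioned in the talk) and a hypothetical four-protein network of my own:

```python
from typing import FrozenSet, Set

Edge = FrozenSet[str]


def edge(a: str, b: str) -> Edge:
    """Undirected interaction between two proteins."""
    return frozenset((a, b))


def remove_node(edges: Set[Edge], node: str) -> Set[Edge]:
    """Null-like model: losing the protein loses every interaction it has."""
    return {e for e in edges if node not in e}


def remove_edge(edges: Set[Edge], a: str, b: str) -> Set[Edge]:
    """Edgetic model: one interaction is lost, the protein stays in the network."""
    return edges - {edge(a, b)}


# HYPOTHETICAL toy network: protein A has three partners.
network = {edge("A", "B"), edge("A", "C"), edge("A", "D"), edge("B", "C")}

after_null = remove_node(network, "A")
after_edgetic = remove_edge(network, "A", "B")

print(len(after_null))     # 1
print(len(after_edgetic))  # 3
```

Removing A leaves only the B-C edge, while the edgetic perturbation keeps A connected to C and D, which is why the two kinds of mutation could plausibly give different phenotypes.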
Reverse edgetics: needs new methods that do not remove the node (as RNAi etc. do). Proposal in Nature Methods, Dreze et al, Edgetic perturbations in C. elegans. Edgetic mutations usually map to the protein surface (rather than causing complete structural shifts in the center) and cluster together for a given edge (interaction / docking surface).
Robert Weinberg (Whitehead): Malignant progression and the stem cell state (Keynote)
What additional genetic or epigenetic steps have to occur for malignancy? Tumor yield in cell culture can be driven in one direction by different media despite identical introduced genetic changes. Cell of origin differentiation program has a strong influence on tumor potential (metastasis and tumorigenicity). Goes over the model of cancer stem cells (CSCs, a hierarchy rather than stem cell/non-stem cell). Self-renewal cells qualify to seed metastasis. Ratio of CSCs an indicator of malignancy of a tumor?
Micrometastasis to metastasis transition is inefficient (transition to a foreign tissue micro-environment). As rare as the steps leading to the initial tumor formation (invasion, intravasation, transport, extravasation etc). What additional mutations or changes are required to acquire all these abilities?
Location of cells in the tumor is a strong determinant of their properties (epithelial-mesenchymal transition, EMT); different micro-environment at the edge of a tumor exposing the cells to signals inducing EMT. Increases experimental complexity.
EMT is an ancient program with roles in embryogenesis. Cells undergo changes in morphology and motility (with a thousand or more genes undergoing expression changes) driven by Snail, Twist, Slug, Sip1 and other TFs. Twist shuts down epithelial markers, induces mesenchymal ones. Model of cancer cells exploiting this early embryonic program (along with wound healing responses). Shutting down Twist in a mouse model reduces the number of metastases by 85% (the remaining metastases express high levels of Twist).
The EMT program might be sufficient to drive development up to micrometastasis; probably not sufficient for the last step (adaption to foreign tissue environment during colonization). If Twist expression can be induced in tumor-associated stroma no additional mutations might be needed for invasion-metastasis.
Transform melanocytes with a 2nd embryonic transcription factor. Unlike epithelial cells, melanocyte transformation generates numerous metastases. Neural crest derived, TF Slug at 1000-fold higher expression than in breast cancer cells, so it is much easier to induce the EMT program. Knocking down Slug drastically reduces metastasis frequency. Verified with a third EMT TF (FOXC2). Close association with basaloid breast cancers (triple receptor negative); 44% with high levels of FOXC2 in the nuclei (hardly present in luminal breast cancers).
His contribution to a network conference: the EMT interactome. Links to cadherins/catenins, GSC, TGFβ, Snail, RAS. Use gene expression patterns of EMT-derived cells to distinguish from fibroblasts. Similar to epithelial stem cells? [Lovely story on why you should be suspicious if results connect two independent research areas of the lab, in his words a potential sign of opportunistic post-docs]. Induction of EMT by Snail, Twist creates CD44hi/CD24lo cells with mesenchymal markers. Operate the stem cell program active in their tissue of origin rather than ‘inventing’ a new one.
EMT confers on cancer cells: physical dissemination, self-renewal powers (and a large descendant population if they survive in new sites). Brief ex-vivo expression of Slug is sufficient to have those cells dominate over non-induced cells (mammary gland repopulating cells) and form ductal trees in tissue. At least ‘almost indistinguishable’ from stem cells.
Treatment problem: we de-bulk the tumor, stem cells survive and repopulate the tumor mass. Experiment: force epithelial cancer cells to undergo EMT and acquire stem cell traits. Screen for compounds (Lander lab) that kill these cells rather than the precursor cells. Conventional chemo preferentially kills precursor cells; identified 22 compounds, 2 of which are available in bulk, which preferentially kill the stem cells. Both anti-parasitic compounds, no idea how this actually works. Not highly toxic, licensed for cows (Salinomycin). Tumor with 5% stem cell frequency, after Paclitaxel treatment 70% of the remaining cells have stem cell properties; Salinomycin reduced the stem cell frequency to 0.2%.
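To put the three frequencies on one scale, here is a back-of-the-envelope fold-change calculation. It only uses the percentages noted above; the function and constant names are my own, and the numbers are frequencies among surviving cells, not absolute cell counts:

```python
def fold_change(before_pct: float, after_pct: float) -> float:
    """Ratio of stem cell frequency after treatment to frequency before."""
    return after_pct / before_pct


BASELINE = 5.0     # % stem cells in the untreated tumor
PACLITAXEL = 70.0  # % stem cells among remaining cells after Paclitaxel
SALINOMYCIN = 0.2  # % stem cells after Salinomycin

enrichment = fold_change(BASELINE, PACLITAXEL)      # about 14-fold enrichment
depletion = 1 / fold_change(BASELINE, SALINOMYCIN)  # about 25-fold depletion

print(round(enrichment, 1), round(depletion, 1))
```

So the two treatments move the stem cell fraction in opposite directions, by factors of roughly 14 and 25 relative to the untreated tumor.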
Floating epithelial cells (after re-culturing, flops-cells) with quite different properties from their parent population. Single clones from CD44hi regenerate the bulk population (CD44lo CD24hi) in culture; SCs differentiate into non-SCs. Unexpectedly, non-SCs seem to be able to repopulate the SC population as well (unlikely to be due to a higher growth rate of a few SCs, as they generally grow much slower). Conversion rate can be accelerated by oncogenes. Spontaneous induction of new stem cells in these cultures has implications for therapy targeting SCs selectively: a variety of differentiated cells can be transformed to stem cells. Not only a (cancer) stem cell hierarchy, but shifts in both directions.
Combination therapy that targets all cells of a tumor, not just the SCs. Great talk, hope it is being made available online.