Carol Lushbough (U of South Dakota), BioExtract Server
Distributed system to query multiple data sources, save query results, run analytical tools and workflows… and at that point I had to run to a meeting. Apologies, it sounds interesting. Allyson Lister covers the talk on her blog; wish I’d figured out how the system differs from Galaxy and other similar workflow systems.
Philippe Rocca-Serra (EBI) on Standards and Infrastructures for experimental data
Was looking forward to this talk as everyone at work is excited about the BioInvestigation Index, which we are considering deploying for a number of groups.
Uses all the usual suspects (ISA-Tab, MIBBI, OBO) to generate a standards-compliant infrastructure. Infrastructure needs to be scalable given current technologies; we are not yet at quantum physics-scale, but metagenomics and other high-throughput experiments generate complex multi-assay studies that share the general complexity. We want to store the experimental metadata along with the actual data.
Minimal standards address the syntax and semantics for reporting and include a large variety of grass-roots efforts (MGED, PSI, BioPAX, MSI, …) that tend to be fragmented with regard to both standards and systems. Nice comparison of painful processes: dentist visits, tax returns and data submission to repositories. Only too true, particularly given the redundant overhead as soon as multiple repositories and data types get involved. Challenge is to create interoperable reporting standards and overcome technical (and social!) barriers to promote synergies between initiatives.
Proposes to use MIBBI (information to be reported), ISA-Tab (syntax and formats) and OBO (semantics) to address these issues.
- MIBBI: Minimum Information for Biological and Biomedical Investigations; brings together communities to develop minimum information checklists that help researchers and reviewers identify the ideal reporting standard for a given project
- OBO Foundry: Effort to create a set of principles for ontology development, evaluate existing ontologies, identify orthogonal ontologies, support interoperability (mappings, cross-products). Followed by a slide from Barry Smith that maps MIBBI categories to OBO ontologies (CL, GO, FMA, CARO, PATO, etc.)
- ISA infrastructure to pull it all together using tools such as ISA-Tab (extending the MAGE-Tab success beyond microarray data)
Package has a number of tools:
- Configurator: sets metadata and allowed values for each field; researchers don’t get drowned in fields not relevant to their domain
- Creator: Basically a spreadsheet, a natural interface to the tables used to describe the experimental metadata. Ontologies are accessed in real time to ensure consistency (this is not Excel by the way, but a basic Java-based spreadsheet view with added visualization cues, for example to group samples and experiments). Philippe then walks through a sample annotation. Includes a ‘mapping’ functionality to import existing legacy information.
- Validator
- Automator
- Additional integration via an API to R
All that information is then accessible (and query-able) via the BioInvestigation Index.
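To make the Configurator / Validator idea a bit more concrete, here is a minimal Python sketch. It is not the ISA tools' actual code or file format: the field names, the allowed values and the `validate_row` helper are all made up for illustration, and the real tools work with live ontology lookups rather than a hard-coded dictionary.

```python
# Hypothetical configuration: field name -> set of allowed values,
# or None when any non-empty value is accepted.
# These names and terms are illustrative, not taken from ISA-Tab.
CONFIG = {
    "Sample Name": None,
    "Organism": {"Homo sapiens", "Mus musculus"},
    "Assay Type": {"transcription profiling", "metabolite profiling"},
}


def validate_row(row, config=CONFIG):
    """Return a list of human-readable problems found in one metadata row."""
    problems = []
    for field, allowed in config.items():
        value = row.get(field, "").strip()
        if not value:
            problems.append(f"{field}: missing value")
        elif allowed is not None and value not in allowed:
            problems.append(f"{field}: '{value}' is not an allowed value")
    # Fields outside the configuration are reported rather than silently kept.
    for field in row:
        if field not in config:
            problems.append(f"{field}: not part of this configuration")
    return problems


good = {"Sample Name": "S1", "Organism": "Homo sapiens",
        "Assay Type": "transcription profiling"}
bad = {"Sample Name": "S2", "Organism": "yeast", "Batch": "7"}

print(validate_row(good))
for problem in validate_row(bad):
    print(problem)
```

The first row passes with an empty problem list; the second is flagged for a value outside the allowed set, a missing field and a column the configuration does not know about.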
Christoph Best (EBI), Grid/cloud computing for 3D image databases
Initial project to handle electron microscopy, but is applicable to informatics support of biological imaging (confocal etc) in general. Starts off with a general overview of protein / structural databases at the EBI and the microscopy data generation process from sample fixation to image capturing. Interesting aspect of combining multiple 2D images into a 3D representation, including an example that zooms in (no pun intended) to a range of 4 Å and fitting images to known protein peptide backbones.
Data management is tractable, as the data sets are in the MB to GB range after converting the initial EM images to voxel representations. Software is a nice blend of 70s Fortran code with 90s C algorithms. And I thought proteomics had a lack of standards and communities already…
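A quick back-of-the-envelope check of that MB to GB range, as a Python sketch. The grid sizes and the 4 bytes per voxel are my own assumptions for illustration, not numbers from the talk.

```python
def voxel_volume_bytes(nx, ny, nz, bytes_per_voxel=4):
    """Raw size of a dense voxel grid, ignoring headers and compression."""
    return nx * ny * nz * bytes_per_voxel


# Hypothetical cubic grids with 32-bit (4-byte) density values.
for edge in (256, 512, 1024):
    size_mib = voxel_volume_bytes(edge, edge, edge) / 2**20
    print(f"{edge}^3 voxels: {size_mib:.0f} MiB")
```

Under these assumptions a 256-voxel cube comes to 64 MiB and a 1024-voxel cube to 4 GiB, which fits the range mentioned.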
Image enhancement and point / line / surface identification play a crucial role in moving towards high-throughput applications; the alternative is manual inspection and curation of images.
Data has to be deposited in some way: density maps, complementary to PDB coordinate sets, XML metadata. At about 20 entries per month, manual curation is still feasible. Always interesting to get some insight into the problems different fields are struggling with.
Data upload and distribution would benefit from a distributed approach; image analysis is moving towards a virtual research community involving the imaging centers, algorithm people and public / in-house database maintainers.
Kathy Wolstencroft (Manchester), SysMO-DB for sharing and exchanging systems biology data and models
‘Just enough’ exchange of information between consortium members, in this case 91 (!) institutes with different research interests and pet organisms. Aim is to describe biological models with shared mathematical terms. Team needs to retrofit a data access and model handling platform, given that adherence to existing data standards is limited and researchers use their own data and collaboration environments and are reluctant to share data.
Store omics, images, reaction kinetics, models, metadata and analytical results without reinventing the wheel.
- Easiest part: yellow pages (expertise, facilities, data availability). Has an access model, called SysMO-SEEK
- Social approach: get (postdoc) feedback on technical requirements
- Experimental processes: sharing of experimental protocols and bioinformatics workflows (less sensitive than data sharing) following the Nature Protocols format, SBML models. Access to JWS Online through SEEK
- Data: pulling in public data sources (SGD, BRENDA); internal data frequently pulled in from Excel spreadsheets. Handled by JERM (Just Enough Results Model), defined for each data type and applied as a template for access / exporting. Sounds very similar to the ISA-Configurator. Even has a Source Extractor for legacy information, a mapper for databases etc. Annotation is incremental: an initial set comes from the upload, additional information from the owner through the SEEK interface, and curation by consortium members follows at a later stage (hopefully!)
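The ‘template per data type’ idea can be sketched roughly like this in Python. This is only my reading of the approach, not the real JERM schema or SysMO code: the `TEMPLATE` mapping, the column headers and the `extract` helper are all hypothetical, and a CSV string stands in for an exported Excel sheet.

```python
import csv
import io

# Hypothetical "just enough" template for one data type: template field ->
# column header used in a lab's own spreadsheet. Not the real JERM schema.
TEMPLATE = {
    "organism": "Species",
    "condition": "Growth condition",
    "value": "Measurement",
}

# Stand-in for an exported Excel sheet; extra columns are simply ignored.
LEGACY_CSV = """Species,Growth condition,Measurement,Lab notes
S. cerevisiae,glucose limited,0.42,rerun on Tuesday
S. cerevisiae,nitrogen limited,0.17,
"""


def extract(csv_text, template):
    """Keep only the templated fields from each row, renamed to template names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {field: row[column] for field, column in template.items()}
        for row in reader
    ]


records = extract(LEGACY_CSV, TEMPLATE)
for record in records:
    print(record)
```

Only the templated fields survive the extraction; anything else in the lab's sheet (here the notes column) stays behind, which is roughly what ‘just enough’ suggests.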
Group is now collaborating with the ISA team to identify overlap and minimize redundancy. Whole system is available on Google Code; should be a great resource particularly if it starts merging / integrating with ISA.