NIH R35GM143056: Machine Learning and Control Principles for Computational Biology
- PI: Travis Gibson
Abstract: With our increasing ability to measure biological data at scale and the digitalization of health records, computational thinking is becoming ever more important in the biological sciences and healthcare. The research directions proposed in this grant aim to build robust machine learning models and tools for computational biology by incorporating principles and analysis from other engineering fields, such as control, that have a proven record of building robustness into the systems they automate. This increased robustness will save resources during the development of these machine learning models. It will also lead to more reliable diagnostics, clinical tools, and machine-learning-based biological discoveries. We have proposed three future research directions at the intersection of machine learning, control, and computational biology: (a) modeling dynamical systems, (b) robust optimization schemes, and (c) control principles for in vivo modeling of microbial communities. The first proposed research area involves the development of flexible models for performing inference on dynamical systems models with time-series data. Dynamical systems models are able to learn mathematically causal relationships between variables, in contrast to other models whose parameters may capture only correlative relationships. Our flexible models will be differentiable, allowing them to be trained using the same efficient algorithms and hardware that have propelled deep learning models into the spotlight. These differentiable methods will allow us to more easily integrate the uncertainty associated with biological measurements into our models. The second research area aims to develop more robust gradient optimization algorithms, the workhorse for training deep neural networks. Many of the popular algorithms used to train deep neural networks were not explicitly designed to be robust.
By developing more robust optimization techniques, machine learning models trained on disparate data sets at different hospitals or labs will be more reproducible and will require less time for tuning parameters, ultimately saving resources as well. These robust optimization techniques will also aid in the certification of machine-learning-based tools that will ultimately be deployed in the clinic. The third research area we propose is an approach for the discovery and design of robust microbial communities. Communities of commensal, or engineered, bacteria have long been proposed as alternative therapies for the treatment of gut-related illness (“bugs as drugs”). We propose a top-down approach to identifying putative microbial consortia members from time-series experiments with germ-free mice colonized by complex flora. By beginning the consortia design process in vivo, we hope to overcome a challenge that many other attempts at consortia construction have encountered: communities designed in vitro do not reproduce their intended properties once transferred into living host organisms. The tools from this work will be built using open-access software, and all data will be made easily accessible and explorable to the public.