Introduction to the DISCOVER module | TeselaGen Biotechnology Help Center

Overview

TeselaGen’s DISCOVER module provides an interface and compute infrastructure to create, train and execute machine learning (ML) algorithms, using TeselaGen’s proprietary software for fast data loading and processing.

Once you have designed, built and experimentally characterized the genetic constructs in your study, you can apply the machine learning and deep learning tools that TeselaGen has built into DISCOVER. These tools extract useful information from your data in a systematic and autonomous way, so that you can make optimization and redesign decisions for your genetic elements. In this way, TeselaGen users can complete the D-B-T-L (Design, Build, Test, Learn) cycle of bio-manufacturing and evolve their biological systems. The image of the DISCOVER application below shows the different models implemented: Predictions, Evolutions and Generations.

We’ll take a closer look at these AI (Artificial Intelligence) functionalities in future articles.

Architecture

The DISCOVER module is built to help biotech companies design, deploy, train and test state-of-the-art machine learning algorithms. DISCOVER runs on cloud-based hardware and is optimized for compute-intensive ML applications. The module has a frontend interface, developed in React/Redux/MobX, that lets the scientist easily choose among a selection of TeselaGen’s proprietary algorithms. The backend includes an ML engine that can run algorithms that depend on TensorFlow, scikit-learn and PyTorch. TeselaGen's architecture and all its applications run in an environment that meets the high programming standards demanded by TeselaGen's customers. The system is stable and fail-safe, scales easily, and is modularized using Docker containers.

Schema of DISCOVER’s front-end and back-end components.

In addition, DISCOVER meets modern data security guarantees. Each component has a security layer that shields these containers from external access. Every interaction between components is protected, and any data exchange with the outside goes through TeselaGen's APIs over HTTPS or TLS.

Integration with TEST Module

The application of ML to genetics and synthetic biology raises a number of challenges that need to be addressed. Training ML models might require a large amount of data, which can be difficult to acquire. In addition, some ML techniques require extensive computational resources, without which training becomes too time-consuming.

The DISCOVER module provides powerful Python libraries for designing and deploying machine learning and deep learning applications. These include libraries for communicating with the DESIGN and TEST APIs, as well as with our ML task messaging queue and our DISCOVER database. The TEST module takes the open-source Experimental Data Depot (EDD) knowledge base and provides the data needed to train supervised machine learning algorithms. If you want to read more about it, you can go to this article. The output of the DISCOVER module can then guide the next iteration of libraries to be designed and built.
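As a rough illustration of how measured data could guide the next design iteration, the sketch below fits a very simple per-part offset model to a handful of made-up measurements and ranks the combinations that have not been built yet. The part names, titer values and helper functions (`fit_additive_model`, `predict`, `rank_candidates`) are all hypothetical and are not part of TeselaGen's libraries; the models DISCOVER actually uses are not described in this article.

```python
from collections import defaultdict
from itertools import product

# Hypothetical measurements: (promoter, rbs) -> measured titer (arbitrary units).
measurements = {
    ("pA", "r1"): 10.0,
    ("pA", "r2"): 14.0,
    ("pB", "r1"): 20.0,
    ("pB", "r3"): 16.0,
}

def fit_additive_model(data):
    """Estimate the overall mean plus a mean offset for each part."""
    mean = sum(data.values()) / len(data)
    offsets = defaultdict(list)
    for parts, value in data.items():
        for part in parts:
            offsets[part].append(value - mean)
    effects = {part: sum(v) / len(v) for part, v in offsets.items()}
    return mean, effects

def predict(model, parts):
    """Predict a value as the mean plus the offsets of the parts used."""
    mean, effects = model
    return mean + sum(effects.get(part, 0.0) for part in parts)

def rank_candidates(model, candidates, tested):
    """Sort the untested candidates from highest to lowest prediction."""
    untested = [c for c in candidates if c not in tested]
    return sorted(untested, key=lambda c: predict(model, c), reverse=True)

model = fit_additive_model(measurements)
candidates = list(product(["pA", "pB"], ["r1", "r2", "r3"]))
ranking = rank_candidates(model, candidates, measurements)
print(ranking)
```

A toy model like this ignores interactions between parts; it is only meant to show the shape of the loop, in which measured constructs go in and a ranked list of constructs to build next comes out.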

Integration between TeselaGen applications.

Optimizing Assemblies

With TeselaGen's platform, researchers can design combinatorial libraries that include thousands of different variants. For many applications, it is not enough to design and build DNA constructs: researchers also need to be confident that their synthesized combinatorial or hierarchical libraries meet stringent quality assurance criteria. The high-throughput experiments and massive data generation that are common to many biotech and biopharma companies depend crucially on the efficient generation of screening candidates with a high probability of success. TeselaGen’s j5 algorithm, which creates detailed instructions to assemble DNA, relies on the specification of assembly strategies, parameters, and rules that can be tuned to achieve optimal results. This tuning can be guided by the output of bioinformatic tools such as j5 itself, as well as by experimental DNA sequence validation results. For example, our platform allows users to align DNA sequencing runs with their reference designs in order to validate the quality of their synthesized constructs. As we collect these DNA sequence validation datasets, the DISCOVER module can train machine learning models (such as RNNs) that help biologists further refine their designs and assembly simulations.
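To make the idea of validating a sequencing read against its reference design concrete, here is a minimal sketch that uses Python's standard `difflib` module. It is not the alignment method used by the TeselaGen platform, `difflib` is a general-purpose text differ rather than a bioinformatics aligner, and the sequences and the helpers `identity_to_reference` and `differences` are made up for illustration.

```python
from difflib import SequenceMatcher

def identity_to_reference(reference, read):
    """Return the fraction of reference bases that fall in matching blocks."""
    matcher = SequenceMatcher(None, reference, read, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)

def differences(reference, read):
    """List every non-matching region as (tag, start, end, ref_seq, read_seq)."""
    matcher = SequenceMatcher(None, reference, read, autojunk=False)
    return [
        (tag, i1, i2, reference[i1:i2], read[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Hypothetical reference design and a read with one substitution (A -> T).
reference = "ATGGCTAGCAAAGGAGAAGAA"
read = "ATGGCTAGCTAAGGAGAAGAA"

identity = identity_to_reference(reference, read)
diffs = differences(reference, read)
print(f"identity: {identity:.3f}")
print(diffs)
```

Records like these (which positions differ, and how) are the kind of labeled validation data that could later be used to train a model, although real pipelines would rely on a dedicated aligner and on base quality scores.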

A recurrent neural network (RNN) is a network with memory, ideal for modeling sequences such as DNA.
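The "memory" of an RNN can be sketched in a few lines of plain Python. The example below runs the forward pass of a basic (Elman-style) recurrent layer over one-hot encoded DNA, using the update h_t = tanh(W_in x_t + W_rec h_(t-1) + b). The weights are random and untrained, the layer size is arbitrary, and none of this reflects the models DISCOVER actually trains; it only shows how the hidden state carries information from earlier bases forward.

```python
import math
import random

BASES = "ACGT"

def one_hot(base):
    """Encode a single base as a 4-element one-hot vector."""
    return [1.0 if base == b else 0.0 for b in BASES]

def make_weights(rows, cols, rng):
    """Create a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rnn_forward(sequence, w_in, w_rec, bias):
    """Return the final hidden state after reading the whole sequence."""
    hidden = [0.0] * len(bias)
    for base in sequence:
        x = one_hot(base)
        # The new state depends on the current base and the previous state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[k], x))
                + sum(w * hj for w, hj in zip(w_rec[k], hidden))
                + bias[k]
            )
            for k in range(len(bias))
        ]
    return hidden

rng = random.Random(0)  # fixed seed so the run is repeatable
hidden_size = 3
w_in = make_weights(hidden_size, len(BASES), rng)
w_rec = make_weights(hidden_size, hidden_size, rng)
bias = [0.0] * hidden_size

# Same bases, different order in the first two positions.
state_a = rnn_forward("ACGT", w_in, w_rec, bias)
state_b = rnn_forward("CAGT", w_in, w_rec, bias)
print(state_a)
print(state_b)
```

The two sequences end with the same bases, yet their final states differ because the hidden state still reflects the order of the earlier bases. In practice a layer like this would be trained, and would usually be built with a framework such as PyTorch or TensorFlow rather than written by hand.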

Conclusions

TeselaGen has developed a powerful, cloud-based, computer-aided design and build platform for accelerating synthetic biology. Our customers are already using our flexible informatics backbone to guide the construction of synthetic DNA to further the production of immunotherapy biologics, virus-like particles (VLPs), sustainable chemicals, natural products, and plant modifications for enhanced agricultural traits. TeselaGen Biotechnology has set the goal of developing a next-generation software platform that will harness state-of-the-art machine learning to assist customers with the design, build and Synthetic Evolution™ of their biological constructs. As companies seek to scale their synthetic biology efforts, they will benefit from using TeselaGen’s DISCOVER module to optimize their synthetic biology processes. Success will ultimately depend on how well scientists can collect and store experimental data in the TEST module. As our customers work closely with our platform, we will help them make design decisions that accelerate product development.