DISCOVER: Candidate Selection

Solve optimization problems such as finding the best strain or fermentation parameters with TeselaGen's DISCOVER module

Written by Eduardo Abeliuk
Updated over a week ago

The Evolutions tool is the framework that allows TeselaGen users to solve general optimization problems in the real world, such as finding an optimal strain or achieving the best fermentation parameters in a fed-batch process. The output of this tool consists mainly of a list of recommended candidates, or suggested samples, each of which represents a potential solution to the problem at hand. This article briefly describes how these candidates are chosen and how to read the list of suggested samples to get the most out of it.

After the training process finishes, the detail view of the model displays a card containing a table that looks like the following:

In this table, each row represents a potential solution: a new design whose experimental results are not yet known to the algorithm. These recommended candidates are the designs that may be most useful to explore experimentally, according to the predictions of the underlying machine learning model.

The columns of this table can be described as follows:

  • Priority: The position in the ranking of candidates, which the algorithm builds from the prediction and the uncertainty associated with each design.

  • Feature columns: In this example, Enzyme A and Enzyme B are feature columns. These are the columns the user selected as features when the model was submitted. The set of values in these columns should completely define a unique design within the variables the user is exploring.

  • Target column: In this example, Production is the target column: the column with the values obtained from experimental measurements that the user wants to optimize. These values are shown as N/A because the suggested candidates have not yet been evaluated experimentally.

  • Prediction: The mean value predicted for the target by the underlying machine learning model. In this framework, predictions are statistical distributions that can be described by a mean value and an associated spread, or uncertainty. This spread reflects not only the experimental error, but also the model's estimation error for that particular design, given the information available in the training data. The spread is not currently displayed on its own; instead, it is factored into the Acquisition score, which is shown.

  • Acquisition: The algorithm's score, which weights the predicted outcome (mean predicted value) against the information that can be gained from measuring an individual design. This score combines exploitation and exploration criteria, helping to steer the optimization toward global solutions while avoiding local optima. The displayed score is the individual score associated with each candidate, not the group score used for ranking. The latter takes into account that other candidates were already selected, so its information gain differs from the displayed individual score. A minimal sketch of this kind of score appears after this list.

  • Training: Displays "yes" when the row was in the training set and "no" otherwise. By default, all values are "no" because no training points are shown. To display training samples, first disable the recommended-candidates filter with the "Only recommended candidates" switch at the top of the table (this also shows predictions for candidates that weren't selected as recommendations), and then enable the "Show training samples" switch, also at the top of the table.
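
To make the exploitation/exploration trade-off concrete, here is a minimal Python sketch of an acquisition score in the spirit of the one described above. TeselaGen's actual scoring function is internal; this example uses an Upper Confidence Bound (UCB) style formula, where the kappa parameter (a hypothetical name, introduced here for illustration) controls how much weight uncertainty receives relative to the predicted mean.

    import numpy as np

    def acquisition_ucb(mean, std, kappa=1.0):
        """Weight the predicted outcome (mean) against its uncertainty (std).

        Higher kappa favors exploration (uncertain designs); lower kappa
        favors exploitation (designs with high predicted values).
        """
        return mean + kappa * std

    # Two candidates with equal predicted means but different uncertainties:
    means = np.array([10.0, 10.0])
    stds = np.array([0.5, 2.0])
    print(acquisition_ucb(means, stds))  # [10.5 12. ] -- the uncertain one ranks higher

With kappa > 0, a design with a lower predicted mean can still outrank a more certain one, which is exactly the behavior discussed in the FAQ below.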

How are candidates selected?

After training, the model is used to evaluate hundreds or thousands of new samples generated from the combinatorial space of available feature values. This evaluation considers not only the predicted value of the target to optimize, but also the estimated uncertainty of each prediction. The uncertainty represents the information that can be learned from testing a particular sample, and using it can help reach a better optimum after two or more experimental iterations.
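
To illustrate what the combinatorial space of feature values looks like in practice, here is a minimal sketch that enumerates a small candidate pool with Python's standard library. The feature names match the example table above, but the specific levels are hypothetical, and the platform's actual enumeration logic is internal.

    from itertools import product

    feature_values = {
        "Enzyme A": [0.0, 0.5, 1.0],  # hypothetical levels, for illustration only
        "Enzyme B": [0.0, 0.5, 1.0],
    }

    candidates = [
        dict(zip(feature_values, combo))
        for combo in product(*feature_values.values())
    ]
    print(len(candidates))  # 9 unique designs, each fully defined by its feature values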

Once the evaluation is finished, the predicted value and the uncertainty are weighted into a single score (Acquisition) that is used to rank the candidates. Candidates are selected sequentially, and the Acquisition scores are updated dynamically to reflect the expected information gain of each remaining candidate given those already selected.
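
The following sketch illustrates this sequential selection under stated assumptions. The platform's exact update rule is not public, so a simple distance-based penalty stands in for the reduced information gain near already-selected designs, a common heuristic in batch Bayesian optimization.

    import numpy as np

    def select_batch(features, acquisition, batch_size=3, penalty_scale=1.0):
        """Greedily pick candidates, discounting scores near previous picks."""
        scores = acquisition.astype(float)
        selected = []
        for _ in range(batch_size):
            best = int(np.argmax(scores))
            selected.append(best)
            scores[best] = -np.inf  # never pick the same design twice
            # Penalize candidates close to the one just selected: measuring
            # them is expected to yield less new information.
            dist = np.linalg.norm(features - features[best], axis=1)
            scores = scores - penalty_scale * np.exp(-dist)
        return selected

    feats = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
    acq = np.array([5.0, 4.9, 4.5])
    print(select_batch(feats, acq, batch_size=2))  # [0, 2]: the near-duplicate at index 1 is skipped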

Because only the original Acquisition scores are reported in the platform, it may sometimes seem that candidates with high Acquisition scores were omitted from the list of suggested candidates. This happens because, during sequential selection, previously selected candidates can modify the Acquisition scores of the remaining samples.

FAQ

  • Why did the algorithm choose designs with lower predicted outcomes than other designs that weren't selected?

The predicted outcome (the mean predicted value) isn't the only criterion by which recommendations are selected. The algorithm uses the Acquisition score, which combines prediction and uncertainty (exploitation and exploration) to avoid local optima in the long term. This is very useful if you plan to run this tool multiple times, over several experimental rounds. Alternatively, if you are restricted to just one additional experimental round after running the tool, you can rank by the Prediction value alone, which bases the ranking purely on exploitation, as sketched below.
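
For example, assuming the table has been exported to a pandas DataFrame with Prediction and Acquisition columns (the column names mirror the UI, but the export format here is an assumption), the two rankings differ only in the sort key:

    import pandas as pd

    df = pd.DataFrame({
        "Prediction":  [9.8, 8.5, 10.2],   # made-up values, for illustration
        "Acquisition": [12.0, 11.5, 10.9],
    })

    multi_round = df.sort_values("Acquisition", ascending=False)  # explore + exploit
    final_round = df.sort_values("Prediction", ascending=False)   # exploit only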

