AI-dvancing cell and gene therapies: an interview with Manoel Veiga

Written by RegMedNet

In this interview, we spoke to Manoel Veiga about how AI technology translates images into quantitative data and what is involved in implementing high-content screening solutions in your lab.

This interview is part of the RegMedNet In Focus on AI for cell culture. Discover expert opinions on this topic by visiting our feature homepage.


Meet the interviewee

Manoel Veiga, Application Specialist, Life Science Research

Manoel Veiga earned his PhD in physical chemistry at the University of Santiago de Compostela (Spain), working with picosecond and femtosecond time-resolved spectroscopy. Following two postdoctorates at the Complutense University of Madrid (Spain) and the University of Münster (Germany), he worked for PicoQuant (Berlin, Germany) as a senior scientist in the fields of time-resolved spectroscopy, fluorescence lifetime imaging microscopy and fluorescence correlation spectroscopy. Manoel now works as a global application specialist at Evident in Germany, where he focuses on high-content analysis and deep learning.

Questions

What is the difference between AI, machine learning and deep learning?

Artificial Intelligence (AI) is a broad term that encompasses both Machine Learning (ML) and Deep Learning (DL), as well as other disciplines such as robotics. It describes the ability of computers to perform complex tasks commonly associated with humans, for example driving a car, playing chess, recognizing a specific face in a picture or translating a text into another language.

ML is a branch of AI that uses algorithms that can make predictions from input data based on the analysis of many parameters. The interesting thing is that these algorithms are not programmed in the classical way, with a defined set of rules. Instead, they are designed so that they can learn from examples (i.e., the parameter values are optimized against the examples). Hence, they can improve as more example data becomes available and adapt to new situations. The process in which an algorithm is fed with example data to learn how to perform a specific task is called training. For example, let’s imagine that the task is to recognize a rare cell type in a set of images. One way of providing example data is to annotate those rare cells with software tools and indicate the key parameters that differentiate them (texture, intensity, elongation factor, etc.). The algorithm tunes the thresholds of the selected parameters to fit the examples, producing an ML model. When a new image is presented to the ML model, it should be able to detect the “rare” cells in it.
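
The idea of "training as threshold tuning" can be sketched in a few lines. This is a purely illustrative, pure-Python toy, not scanR's actual algorithm: the feature names and values (texture, intensity, elongation) are made up, and the "learning" rule (take the midpoint between class means as each threshold) is deliberately simplistic.

```python
# Toy illustration of classical ML as threshold tuning on hand-picked
# cell parameters. All data and the learning rule are hypothetical.

def train(examples):
    """examples: list of (features_dict, label), label True for rare cells.
    'Learns' one threshold per feature: the midpoint between class means."""
    features = examples[0][0].keys()
    thresholds = {}
    for f in features:
        rare = [x[f] for x, y in examples if y]
        common = [x[f] for x, y in examples if not y]
        thresholds[f] = (sum(rare) / len(rare) + sum(common) / len(common)) / 2
    return thresholds

def predict(thresholds, cell):
    """Classify as rare if most parameters exceed their learned threshold."""
    votes = sum(cell[f] > t for f, t in thresholds.items())
    return votes > len(thresholds) / 2

# Annotated training examples: texture, intensity, elongation per cell.
training = [
    ({"texture": 0.9, "intensity": 210, "elongation": 2.1}, True),
    ({"texture": 0.8, "intensity": 190, "elongation": 1.9}, True),
    ({"texture": 0.2, "intensity": 90,  "elongation": 1.0}, False),
    ({"texture": 0.3, "intensity": 110, "elongation": 1.1}, False),
]
model = train(training)
print(predict(model, {"texture": 0.85, "intensity": 200, "elongation": 2.0}))
```

The key point the toy captures is that the thresholds are not hard-coded by a programmer; they are derived from the annotated examples, so adding more examples changes the model.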

DL is a subset of ML in which the algorithms are deep neural networks, which weigh millions of parameters in a way loosely analogous to the human brain. Unlike in classical ML, the parameters are not selected by the user but reside within the deep neural network architecture. By tuning those millions of parameters, the network can discover patterns and trends, enabling very accurate predictions. On the other hand, to fully exploit their potential, deep neural networks need many more examples than classical ML algorithms and much longer training times. Due to their superior performance in image recognition tasks, they are revolutionizing image analysis.
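
The contrast with the previous example can be made concrete with a miniature neural network. This sketch is not a trained model; the weights are arbitrary placeholder numbers, and a real image-analysis network would have millions of such parameters rather than nine. Its only purpose is to show where the parameters "live": inside the layers, not in user-chosen features.

```python
# Minimal forward pass of a tiny fully connected network. The learnable
# parameters are the weights and biases inside each layer, not
# user-selected image features. Weights here are arbitrary placeholders.
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def forward(inputs, layers):
    """One forward pass: each layer is (weight_matrix, bias_vector)."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

# 2 inputs -> 2 hidden units -> 1 output; 9 parameters in total.
layers = [
    (([0.5, -0.4], [0.3, 0.8]), [0.1, -0.2]),  # hidden layer: 6 parameters
    (([1.2, -0.7],), [0.05]),                  # output layer: 3 parameters
]
output = forward([0.9, 0.1], layers)
print(output)  # a single value between 0 and 1
```

Training a network means adjusting all of those weights and biases against example data; with millions of them, that is what demands the large example sets, long training times and GPU hardware discussed later in the interview.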

How does the AI technology implemented in scanR translate images into quantitative data?

The AI technology implemented in scanR is used in the analysis pipeline for microscopy images, more precisely in the image segmentation and classification steps. Nowadays, image analysis pipelines involve several steps: image processing, object segmentation (i.e., detection of specific objects), extraction of parameters from the segmented objects (area, length, intensities of protein markers, etc.), classification of the segmented objects (by looking at their parameters or directly by an AI prediction) and, finally, comparison of data across several images or samples. The last three steps provide the quantitative data. However, the most difficult steps are the accurate segmentation of the objects of interest and their classification, which is where AI comes into play. Accurate automated segmentation and classification are challenging because microscopy images can vary strongly depending on the sample preparation, imaging modality, noise level, magnification or resolution. If the objects are not properly segmented and classified, the extracted parameters will not resemble reality, and the image analysis will not be reliable.
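
The segmentation and parameter-extraction steps can be illustrated on a tiny synthetic "image". This is a classical threshold-based toy, not scanR's AI segmentation: the image, threshold and extracted parameters are invented, and a real pipeline would operate on large microscopy images with dedicated libraries or trained models.

```python
# Toy segmentation -> parameter-extraction pipeline on a synthetic image.
# Pixels above a threshold are grouped into objects (4-connectivity),
# then per-object parameters (area, mean intensity) are extracted.

IMAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 7, 0],
    [0, 0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0, 0],
]
THRESHOLD = 5  # pixels above this value belong to an object

def segment(img, thr):
    """Label connected foreground pixels via flood fill."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] > thr and labels[y][x] == 0:
                count += 1
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w \
                            and img[cy][cx] > thr and labels[cy][cx] == 0:
                        labels[cy][cx] = count
                        stack += [(cy + 1, cx), (cy - 1, cx),
                                  (cy, cx + 1), (cy, cx - 1)]
    return labels, count

def measure(img, labels, n):
    """Extract per-object parameters: area and mean intensity."""
    stats = {}
    for k in range(1, n + 1):
        pix = [img[y][x] for y in range(len(img)) for x in range(len(img[0]))
               if labels[y][x] == k]
        stats[k] = {"area": len(pix), "mean_intensity": sum(pix) / len(pix)}
    return stats

labels, n = segment(IMAGE, THRESHOLD)
print(n, measure(IMAGE, labels, n))  # 2 objects with their parameters
```

The fragility described in the interview is visible even here: shift the noise level or intensity scale and the fixed threshold mis-segments the objects, after which every downstream parameter is wrong. Replacing this brittle thresholding step with a trained model is exactly where the AI sits in the pipeline.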

scanR comes with pre-trained AI models that recognize, regardless of image variations, the most common objects used in life science image analysis: nuclei, cells and spots. It also provides the training tools to develop deep learning models for the segmentation and classification of any other object of interest in microscopy images.

What is the difference between high-throughput and high-content screening?

High-throughput screening (HTS) and high-content screening (HCS) have in common that they both use automated microscopes for imaging well plates and software for automatic analysis of the large amounts of data generated. Each well of a well plate corresponds to a different sample treatment (genetic, pharmacological, chemical, etc.), and the aim is a fast comparative analysis of the treatments.

HTS and HCS differ in sample throughput and in the richness of information extracted from each sample. HTS prioritizes analyzing a high volume of samples: it can image and analyze thousands of samples in a single day. To work that quickly, only a few parameters are extracted, so the amount of information obtained per sample is limited. These are normally targeted screens in drug discovery, in which the treatments themselves are being screened, and the final goal is to elucidate which treatments affect the cells in the desired manner.

HCS can also have a large throughput, but it prioritizes extracting large amounts of data from each sample at the expense of speed. This is where the term high content comes from (a high content of information). Here, image analysis involves the segmentation and classification of several objects and the extraction of many parameters to gain rich information on the biology. Hence, HCS balances the collection of rich data with high efficiency. Typically, the goal is not drug discovery but understanding the underlying biology, such as biomolecular pathways.

In that sense, HCS can also be performed on a limited number of samples (for example, eight samples in an eight-chamber slide), and what is being screened is not the treatments but the individual cells contained in each chamber. These studies yield very high statistics and a robust interpretation of the underlying biology. Additionally, since so much information can be gathered from a single screen, HCS can drive the discovery of new information rather than merely clarifying existing models.

What’s involved in implementing high-content screening solutions for cell and gene therapy applications?

The first step towards implementing high-content screening is understanding at what stage of your workflow, and more broadly, at what stage of research and development (R&D), you want to implement it. For example, in gene therapies, HCS can provide valuable information in early-stage R&D, where it can help to find the best method for gene modification (CRISPR, viral vector, etc.), or at later stages of R&D, where it can determine the effects of those genetic modifications by checking whether they produce the desired cell behavior.

Once decided, you have to develop workflows for automated sample preparation, image acquisition and image analysis. This can require several iterations, since all three are strongly interdependent: What type of cells will you use? Will it be an end-point assay or a live-cell assay? Do you need fluorescent markers, or can you work label-free? Does classical segmentation work with your cells, or do you need AI? Do general AI models work with the cell lines you are working with, or do you need to develop your own AI models? What are the key parameters you want to extract from your cells?

There are many potential applications for HCS across the regenerative medicine field. HCS can help to evaluate stem cells throughout the differentiation process, helping to determine which stem cells to use, test differentiation protocols or ensure quality control. In tissue engineering, HCS can ensure that you are working with a population of cells likely to succeed at tissue formation and show the impact of culturing those cells on or in different substrates to assess their biological relevance. HCS can also be used to develop quality control procedures throughout the development and production process by helping to determine, and then evaluate, markers that indicate success.

Can researchers who do not work with large image data sets, such as those in pre-clinical stages, still benefit from using the scanR software?

Absolutely. The definition of a large data set is somewhat vague in microscopy. A single microscope image can contain thousands of cells, and a larger stitched image covering a whole well of a 96-well plate can easily reach 50,000 cells. Hence, by imaging just 20 wells of a well plate, one million cells can be imaged and individually analyzed to obtain rich information with very good cell statistics.

Apart from that, the parameters extracted in scanR can be directly arranged in 1D histograms or 2D scatterplots, as in flow cytometry, in which cell populations are easily identified even when they occur at very low percentages. scanR users often discover hidden cell populations they were not aware of when creating scatterplots. But unlike in flow cytometry, these cell populations can be inspected as image data and even revisited in live-cell time-lapse experiments. All of this helps to broaden the understanding of the biology under study. There is a big community of scanR users who have developed an imaging and analysis pipeline, coined Quantitative Image-Based Cytometry (QIBC), that makes use of the interpretation of scatterplot profiles.
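
The cytometry-style gating described here can be sketched generically. This is not scanR's interface: the per-cell intensities below are fabricated, and the "gate" is just an intensity range, but it shows how binning per-cell parameters into a histogram exposes a small subpopulation whose cell IDs can then be traced back to the images they came from.

```python
# Sketch of histogram-based gating on per-cell parameters.
# All intensity values are made up for illustration.

cells = {i: 100 + (i % 7) * 5 for i in range(50)}  # cell_id -> intensity
cells[3] = 250                                      # two rare bright cells,
cells[41] = 255                                     # easy to miss otherwise

def histogram(values, bin_width):
    """Bin values into a 1D histogram: bin start -> count."""
    bins = {}
    for v in values:
        b = int(v // bin_width) * bin_width
        bins[b] = bins.get(b, 0) + 1
    return dict(sorted(bins.items()))

def gate(cells, lo, hi):
    """Return IDs of cells whose intensity falls inside the gate,
    so they can be revisited in the original image data."""
    return [cid for cid, v in cells.items() if lo <= v <= hi]

print(histogram(cells.values(), 25))
print(gate(cells, 200, 300))  # the hidden bright subpopulation: [3, 41]
```

The step that distinguishes image-based cytometry from flow cytometry is the last one: because each gated entry is a cell ID tied to a segmented object, the gate leads back to actual images of those cells rather than just event counts.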

Another great use of scanR is in assay development. The goal when developing an assay is to make it sensitive, specific, robust, reproducible and fast. Let’s say there is an established assay that uses four fluorescent channels and 20 read-out parameters. scanR can help to simplify the assay by identifying the minimum number of parameters needed to get the same results, or, with the help of its AI tools, to evaluate whether the same results are possible label-free, without any fluorescence stains.

What limitations still need to be overcome for Deep Learning-powered image analysis to be fully adopted?

In my opinion, there are two main limitations: the hardware needed to use AI and the perception of AI by some researchers.

To train or apply AI models, a powerful graphics processing unit (GPU) is needed in the computer, which can quickly process the millions of parameters analyzed with DL. To give an example, the segmentation of a simple image using a DL model takes one second or less with a GPU, whereas it takes one minute or more without one. Hence, to analyze hundreds of images in an automated way, GPUs become almost mandatory. Luckily, GPUs are installed in most microscopy workstations, and they are also slowly being introduced in standard computers and even laptops.

Regarding how researchers perceive AI, I have to say that when we started to implement AI in our software packages, some customers approached us saying: “I have a set of images, please show me what AI can do with them,” with no further information provided. AI was expected to be a kind of “magic” that would just look at the images and almost elucidate the underlying biology. Since AI does not work like this, some users were quickly disappointed.

On the other hand, there are also customers who would like to use AI for specific tasks where classical methods fail (e.g., difficult segmentations or classifications) and where AI can really help. However, many customers still think that they need programming skills to develop or use AI models. Fortunately, this is no longer the case, at least with commercial software, but this message has not reached a broad audience yet.

Another key point is understanding how much annotation effort researchers are willing to invest in developing their own AI models. In general, if little effort is invested, the resulting AI models will only work on the data sets they were trained on, and a new AI model needs to be created for each new data set. This produces robust results, but in my opinion it is inefficient. It makes more sense to invest more effort and develop models that will work on a broad range of data sets, for which deep learning is really optimal. Once this initial effort is made, no further training is needed. But to achieve this, users need to understand that they must design deep learning training strategies for their applications, annotating images with as many variations as they expect in their experimental conditions. In my experience, the annotation effort always pays off when using deep learning.


Disclaimer
The opinions expressed in this interview are those of the interviewee and do not necessarily reflect the views of RegMedNet or Future Science Group.
