1 Introduction

In Assisted Reproductive Technology (ART) procedures, eggs are fertilized outside the body. The fertilized eggs, called embryos, are cultivated in a controlled environment before being transferred to the woman. The selection of an embryo for transfer is based on the embryologist’s evaluation of its quality. Embryos are typically assessed using morphological features, such as the cell count specific to a cell stage or the size of the cells, and morphokinetic parameters, such as the duration of the different cell stages [5]. The morphokinetic parameters include the timing of the successive embryonic cell divisions leading chronologically to the 2-cell, 3-cell, 4-cell, 5-cell, 6-cell, 7-cell, 8-cell and 9+-cell stages, and finally to the morula, a compacted structure of 8–16 small cells, followed by the blastocyst, which is made up of about a hundred cells. The cell stages of embryo development are shown in Fig. 1. The duration of the different cell stages has proved to be significant in evaluating embryo quality [18]. A simple way of calculating the duration is to count the number of cells and track cell divisions, which requires continuous monitoring of the developing embryo. Time-lapse technology (TLT) systems, now used in many clinics, provide digital images of embryos at frequent time intervals [14]. In the vast majority of cases, the output from TLT systems is still analysed by embryologists, who manually annotate morphological features, abnormal cleavage patterns that are correlated with embryo quality [6], and the duration of cell stages, thus introducing intra- and interobserver variability [17]. Some TLT systems allow computer-assisted annotation, which might reduce the intra- and interobserver variability among embryologists [9], but use of this feature can incur additional costs.
Recently, the application of object detection algorithms in the field of medical imaging has proven to provide fast and accurate results [10, 12].

Fig. 1. Cell stages of human embryo development.

In this study, we developed an approach to locate cells in images depicting embryonic development. The approach was developed and evaluated on the frames of TLT videos. It counted the number of cells in each TLT frame, tracked the detected cells and cell divisions across consecutive frames, and identified the different cell stages. The approach employed YOLO v5 to detect the cells present in the frames and then tracked each individual cell across cell stages by marking each cell boundary with a distinctly colored circular overlay. The distinct color scheme helped the embryologists to track individual cells and their divisions and to identify cell cleavages over the course of the TLT video. The average processing time of our approach was 8 s per TLT video. The methodology could also detect abnormal cleavage patterns such as direct cleavage [16] and reverse cleavage [11].

We used six performance metrics to evaluate the software’s performance in detecting cell stages. The software performed best for 2-cell stage detection, and its performance decreased as the number of cells inside the embryo increased. The performance of our method was validated by embryologists, who considered the tracking of cells with colored overlays useful. The main contributions of this study are: (i) using our method, embryologists could accurately detect cells, track cell divisions and determine cell cleavage stages up to 5 cells; (ii) our approach has the potential to detect abnormal cleavage patterns in human embryo development; and (iii) the approach could generate accurate annotations of the morphokinetics related to cell cleavages and cell stages in 8 s on average per TLT video, at an average processing rate of 20 frames per second.

2 Methods and Materials

2.1 Data

The dataset was collected retrospectively at Fertilitetssenteret, a fertility clinic in Oslo, Norway, and consisted of TLT videos of human embryo development. The embryos were cultured inside a time-lapse system called Embryoscope™ (Vitrolife, Denmark).

Time-Lapse Imaging. The introduction of TLT in ART practice enables continuous monitoring of embryos throughout their whole culture period. Embryoscope™ is an incubator equipped with a built-in microscope and camera. For each embryo placed inside the incubator, the system took 8-bit images at several focal planes (three or five) every 10–15 min. Each 8-bit image has a resolution of \(500 \times 500\) pixels. With time-lapse imaging (TLI), embryologists gain insight into the morphokinetics of embryo cell development without removing embryos from the incubator [7]. For every TLT video, the embryologists later analyzed each frame (8-bit image) and manually annotated the start of each observed cell stage. The observed cell stages were: 2-cell, 3-cell, 4-cell, 5-cell, 8-cell, 9+-cell, morula and blastocyst. In this study, we used 890 TLT videos, from which we extracted the frames corresponding to the annotated start of each cell cleavage stage. This resulted in a total of 2785 images; each cell stage had 350 images, except for blastocyst with 335 images. We denoted this set as Dataset I and used it to train the object detection algorithm. A second dataset, Dataset II, comprised 11 other TLT videos. We annotated this dataset for the start of the observed cell stages using our methodology. Dataset II was used as an independent dataset.

Abnormal Cleavage Patterns. A successful fertilization between sperm and egg results in a fertilized egg which, over the next few days, undergoes a series of cell divisions and progresses through the cell stages. The embryo should cleave every 12 or 24 h. Thus, by the time an embryo has reached Day 3 of development, it should have between four and eight cells [1]. Continuous monitoring of embryo morphology using TLT has revealed certain abnormal cell cleavage patterns [4]. One such pattern is reverse cleavage, which is defined as a decrease in the number of cells during cell division. This means that cells in a cell stage fuse together to form a single cell (reducing the cell count) and cleave again afterwards [11]. Another abnormal cleavage pattern is direct cleavage, which occurs when a cell divides directly into three or more daughter cells [16]. Such abnormal cleavages correlate with impaired embryo development and implantation potential [13, 19] and should be detected.

Ethical Consideration. Fully anonymized data were collected after approval by the Regional Committee for Medical and Health Research Ethics - South East Norway (REC). All experiments were performed in accordance with the guidelines and regulations of REC and the General Data Protection Regulation.

2.2 Object Detection

Object detection is a fundamental task in image processing. It extends image classification: the method predicts the objects in an image and marks each with a bounding box. It is thus the detection and localization of objects in an image, where the objects belong to predefined classes [2]. In recent years, owing to deep learning (DL), and especially convolutional neural networks (CNNs), object detection models have performed particularly well in the field of medical imaging [12]. The convolutional kernels in the models extract features layer by layer and yield the probabilities of candidate bounding boxes belonging to the different classes. Object detection models can be categorised as one-stage networks, such as You Only Look Once (YOLO) [15], and two-stage networks, such as Fast R-CNN [8]. A two-stage model breaks object detection down into two tasks: it first detects possible object regions and then classifies the image content in those regions into predefined classes [2]. YOLO, as a one-stage network, instead uses an end-to-end neural network that processes the whole picture by dividing it into a grid of N cells of equal size. Each grid cell predicts the probability of the object classes being present in it, along with the object label and the bounding box coordinates relative to the grid cell’s coordinates. The bounding boxes are weighted by the expected probability of each object. YOLO then uses non-maximum suppression to suppress overlapping bounding boxes with lower probability scores. The performance of YOLO in predicting bounding boxes for object classes is measured with the mean Average Precision (mAP), which is the mean of the Average Precision (AP) over all object classes. AP summarizes the precision-sensitivity curve of YOLO v5’s bounding box predictions for one object class in a single value, the average of the precision values along the curve [2].
To apply object detection to real-time videos at a fertility clinic, the algorithm must be fast. YOLO is a much faster algorithm than its counterparts [2]. Thus, in this study, we used YOLO v5 to detect three object classes (cell, morula and blastocyst) in the frames of TLI videos. The annotated locations of the object classes in the training images (Dataset I) and the YOLO v5 predictions on Dataset I were reviewed by embryologists. The mAP was 0.65 for the cell class, 0.78 for morula and 0.80 for blastocyst.
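
To illustrate the suppression step described above, the following is a minimal sketch of greedy non-maximum suppression based on intersection over union (IoU). It is not the YOLO v5 implementation; the function names, the corner-format boxes `(x1, y1, x2, y2)` and the IoU threshold of 0.5 are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept after greedy suppression.

    Boxes are visited from highest to lowest score; a box is kept only if
    it does not overlap an already kept box by more than iou_threshold
    (an assumed value, not taken from the study).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections of one cell and one separate cell.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.68, so only the first and third boxes survive.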

2.3 Colored Circular Overlay Algorithm

This section explains the suggested algorithm for adding colored circular overlays to embryo cells. Our approach first used YOLO v5 to detect the cells present in the frames of TLI videos. Once the bounding boxes, that is, the coordinates of the detected cells, were obtained, we used the OpenCV library to mark each cell boundary with a differently colored circular overlay. After marking the detected cells, the methodology computed the cell count and recorded the coordinates of each cell. The color assigned to a cell was maintained until the cell divided into daughter cells. After that, each daughter cell received its own distinct colored overlay. The methodology recognized the daughter cells as unique individual cells and kept track of them in the succeeding frames using the color of the overlay. After processing a whole TLI video, the methodology produced a new version of the input video in which every frame had colored overlays on the detected cell boundaries.
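
As a rough sketch of how a detected bounding box could be turned into a circular overlay, the snippet below converts a box in normalized center format into a pixel center and radius. The normalized `(xc, yc, w, h)` input, the radius rule (mean of half-width and half-height), the palette and the function name `box_to_circle` are assumptions for illustration; the text does not specify how the circle was derived from the box. The image size defaults to the 500 × 500 pixels stated in Sect. 2.1.

```python
# Hypothetical BGR palette; the colors used in the study are not specified.
PALETTE = [(0, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 0, 255)]


def box_to_circle(xc, yc, w, h, img_w=500, img_h=500):
    """Convert a normalized box (center x, center y, width, height)
    into a pixel center and radius for a circular overlay.

    Assumption: the radius is the mean of the half-width and half-height.
    """
    center = (round(xc * img_w), round(yc * img_h))
    radius = round((w * img_w + h * img_h) / 4)
    return center, radius


center, radius = box_to_circle(0.5, 0.5, 0.2, 0.4)
color = PALETTE[0]
```

With OpenCV installed, the resulting values could be drawn on a frame with `cv2.circle(frame, center, radius, color, 2)`, where `frame` is the image array and the final argument is the line thickness.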

If the cell count remained the same between consecutive frames, our methodology calculated the proximity between each cell in the current frame and the cells detected in the preceding frame. The proximity was calculated from the difference between the coordinates of two cells, the first from the current frame and the second from the preceding frame. If the calculated proximity lay within a specific threshold, the methodology copied the color of the cell in the preceding frame to the matching cell in the current frame. In this way, cells were tracked using colored overlays. The proximity threshold used in our algorithm was 0.10 for a cell count less than 4 and 0.05 for a cell count greater than 4.
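
The color carry-over described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: coordinates are taken to be normalized cell centers, the proximity is taken to be the Euclidean distance, each previous cell is matched at most once, and a count of exactly 4 (which the text leaves unspecified) is given the tighter threshold. The names `proximity_threshold` and `carry_over_colors` are hypothetical.

```python
import math


def proximity_threshold(cell_count):
    # From the text: 0.10 for fewer than 4 cells, 0.05 for more than 4.
    # Assumption: exactly 4 cells also uses 0.05 (not stated in the text).
    return 0.10 if cell_count < 4 else 0.05


def carry_over_colors(prev_cells, curr_centers):
    """Copy overlay colors from the preceding frame to the current frame.

    prev_cells:   list of ((x, y), color) from the preceding frame
    curr_centers: list of (x, y) from the current frame
    Returns one color per current cell, or None when no previous cell
    lies within the proximity threshold.
    """
    threshold = proximity_threshold(len(curr_centers))
    used = set()
    colors = []
    for cx, cy in curr_centers:
        best, best_dist = None, threshold
        for idx, ((px, py), _) in enumerate(prev_cells):
            if idx in used:
                continue
            dist = math.hypot(cx - px, cy - py)
            if dist <= best_dist:
                best, best_dist = idx, dist
        if best is None:
            colors.append(None)
        else:
            used.add(best)
            colors.append(prev_cells[best][1])
    return colors


prev = [((0.30, 0.40), "yellow"), ((0.60, 0.55), "blue")]
curr = [(0.62, 0.56), (0.31, 0.41)]
colors = carry_over_colors(prev, curr)
```

Here both current cells lie close to a cell in the preceding frame, so each inherits that cell's color even though the detection order differs between frames.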

If the cell count differed between consecutive frames, our methodology checked whether the current frame had a higher cell count than the preceding frame. If so, one of the cells might have cleaved into daughter cells. The methodology identified the parent cell in the preceding frame using the same concept of proximity and assigned the parent’s color to the daughter cells, thereby recognizing the frame in which the cell division occurred. The methodology then annotated the current frame as the start of a cell stage, namely the stage corresponding to the number of detected cells. Otherwise, that is, if the cell count of the current frame was lower than that of the preceding frame, the methodology still calculated the proximity between cells and copied the matching color scheme. A lower count in the current frame could indicate an abnormal cleavage or a few cells not being detected by YOLO v5.
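
One way to picture the parent search is the sketch below. It is a simplification under stated assumptions rather than the method as implemented: each current cell is linked to its nearest cell in the preceding frame (Euclidean distance on normalized centers, without the proximity threshold), and a previous cell that is nearest to more than one current cell is treated as the parent. The function names are hypothetical, and the preceding frame is assumed to contain at least one cell.

```python
import math
from collections import defaultdict


def find_dividing_cells(prev_centers, curr_centers):
    """Return {parent index: [daughter indices]} for previous-frame cells
    that are the nearest neighbour of more than one current-frame cell."""
    children = defaultdict(list)
    for j, (cx, cy) in enumerate(curr_centers):
        nearest = min(
            range(len(prev_centers)),
            key=lambda i: math.hypot(cx - prev_centers[i][0],
                                     cy - prev_centers[i][1]),
        )
        children[nearest].append(j)
    return {p: d for p, d in children.items() if len(d) > 1}


def annotate_division(prev_centers, curr_centers):
    """If the cell count increased, label the current frame as the start
    of the stage matching the new count and report the parent cells."""
    if len(curr_centers) <= len(prev_centers):
        return None
    return {
        "stage": f"{len(curr_centers)}-cell",
        "parents": find_dividing_cells(prev_centers, curr_centers),
    }


prev_centers = [(0.30, 0.50), (0.70, 0.50)]
curr_centers = [(0.30, 0.50), (0.65, 0.40), (0.75, 0.60)]
result = annotate_division(prev_centers, curr_centers)
```

In this example the second cell of the preceding frame is the nearest neighbour of two current cells, so it is reported as the parent and the frame is labelled as the start of the 3-cell stage.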

3 Results

To test our methodology, we used Dataset II for cell tracking and for detecting cell cleavage stages. The methodology processed each video in the dataset and generated a corresponding video with colored circular overlays on the detected cells in every frame. The embryologists could track a cell using the color of its overlay. Starting from the first frame, our methodology assigned a distinct color to each cell, and that color was maintained until the cell divided. The daughter cells were then assigned different color overlays from the next frame onwards. Fig. 2 presents a few frames extracted from one video: the top row shows the actual video frames, and the bottom row shows the corresponding frames generated by our methodology. The frames in the bottom row have colored circular overlays marking the boundary of each detected cell, and the same color scheme is maintained until cell division. Cell division can be seen in frames 5 and 7 of Fig. 2, and a distinct colored overlay for each cell can be seen in the succeeding frames 6 and 8.

Fig. 2. Extracted frames from a TLI video of embryo development up to the 4-cell stage. The top row shows the actual video frames and the bottom row shows our method’s output with a colored overlay on each cell. The single cell in frame 1 divided into two cells, as shown in frames 3 and 4. The yellow cell divided in frame 5. From frame 6, our method annotated the 3-cell stage, with each cell in a distinct color. The blue cell starts to divide in frame 7, and the 4-cell stage was annotated from frame 8. (Color figure online)

3.1 Comparison with Embryologists

Two embryologists independently validated the performance of our methodology. To this end, they verified the number of detected cells in each frame of the generated videos. They also verified that the start of each cell stage, as annotated by the methodology, either matched their annotation exactly or differed from it by only a few frames on average. Our methodology detected cells, tracked cell divisions and precisely annotated the start of each cell stage up to the 5-cell stage. For stages with a cell count above five, the annotated start of cleavage was 9 to 10 frames later than the actual start on average. Fig. 3 presents some frames extracted from a video of embryo development up to the 9-cell stage. Our methodology detected cells and tracked cell divisions accurately up to the 5-cell stage, as seen in frames 1 to 8 of Fig. 3. When the cell count exceeded five, the methodology was confused by overlapping cell boundaries and either missed a cell (frame 12 of Fig. 3) or detected an incorrect cell location (yellow circle in frame 9 of Fig. 3).

Fig. 3. Extracted frames from a TLI video of embryo development up to the 9-cell stage. The top row shows the actual video frames and the bottom row shows our method’s output with a colored overlay on each cell. The green cell divided in frame 4. From frame 5, our method annotated the 3-cell stage, and it tracked the cell division from frame 6 (blue overlay). The 4-cell stage was annotated in frame 7. In frame 9, an incorrect cell location was detected (yellow overlay), but the correct cell count was detected in frames 10 and 11. A cell was missed again in frame 12. (Color figure online)

3.2 Cell Counting Performance

Next, we evaluated the performance of our methodology using the following six performance metrics: sensitivity (SENS), precision (PREC), specificity (SPEC), accuracy (ACC), F1-score (F1), and the Matthews correlation coefficient (MCC). Using multiple metrics provides a more reliable and robust insight into the real capabilities of our approach. We measured the efficiency of the methodology in reporting the correct cell count in a frame, tracking cell division and annotating the start of a cell cleavage stage. The results were validated by the embryologists using criteria based on the cell count, the detected cell boundaries, the choice of the correct parent for the daughter cells at cell division, and agreement between our methodology’s annotation and their own annotation of the start of a cell stage. MCC is a reliable statistical rate that gives a high score only if the prediction (a frame belonging to a cell stage) obtains good results in all four confusion matrix categories [3]. MCC measures the agreement between the actual label (the frame annotated by an embryologist as belonging to a cell stage) and the predicted label (the frame annotated by our methodology as belonging to a cell stage). An MCC value lies between −1 and 1. A negative value indicates disagreement between the actual and predicted labels, a value around zero indicates that the model is no better than random, and a positive value indicates agreement, with 1 denoting perfect prediction. Our methodology obtained an MCC of 0.77 for predicting the start of cleavage stages up to the 5-cell stage. We observed that the overlay color sometimes changed abruptly between frames or that the wrong parent was chosen for the daughter cells. We labelled these predictions as incorrect. Next, to quantify the performance of our methodology, we used the performance metrics listed in Table 1. The methodology performed best for the 2-cell stage (precision = 0.91, sensitivity = 0.98, highest F1-score = 0.95).
The detection of the 1-cell stage was quite accurate (precision = 0.99, sensitivity = 0.86, high F1-score = 0.91), but a few instances of the 1-cell stage were misclassified as morula. A few instances of the 4-cell stage were misclassified as the 3-cell or 5-cell stage, but our methodology mostly detected the 4-cell stage accurately (high precision = 0.87, low sensitivity = 0.62, high F1-score = 0.73). More instances of the 3-cell and 5-cell stages were misclassified as other stages, although the detection of these cleavage stages was still better than random: 3-cell (average precision = 0.46, high sensitivity = 0.93, average F1-score = 0.61) and 5-cell (high precision = 1.0, low sensitivity = 0.31, average F1-score = 0.47). For cell stages with a cell count greater than 5, our methodology performed poorly, with sensitivity, precision and F1-score all below 0.40. We therefore did not evaluate our methodology further for these cell stages.
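
For reference, the six metrics can be computed from the four confusion matrix categories of a single cell stage (treated one-versus-rest) as in the sketch below. The function name `binary_metrics` and the example counts are illustrative assumptions; the counts are not taken from Dataset II.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute SENS, PREC, SPEC, ACC, F1 and MCC from confusion counts.

    A ratio with a zero denominator is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    sens = ratio(tp, tp + fn)
    prec = ratio(tp, tp + fp)
    spec = ratio(tn, tn + fp)
    acc = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * prec * sens, prec + sens)
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {"SENS": sens, "PREC": prec, "SPEC": spec,
            "ACC": acc, "F1": f1, "MCC": mcc}


# Made-up counts for one stage, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Because MCC uses all four categories, it stays low when a classifier does well on the positive frames but poorly on the negative ones, which is why it is reported alongside precision and sensitivity.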

Table 1. Evaluation results of the performance metrics on Dataset II for detecting embryo cell cleavage stages using our methodology

We observed a similar pattern in the receiver operating characteristic (ROC) curves for cell stages up to the 5-cell stage. As shown in Fig. 4, the area under the curve (AUC) is largest for the 2-cell stage and smallest for the 5-cell stage. Thus, our methodology performed best in detecting and tracking cell division for the 2-cell stage and worst for the 5-cell stage.

Fig. 4. ROC curve for the software detecting embryo cell cleavage stages on Dataset II.

3.3 Computational Efficiency

We also calculated the processing time taken by our methodology. The processing time included the time needed to process a video from Dataset II and to generate the corresponding video with colored overlays. On average, 8 s were required per video. We divided Dataset II into two groups: (A) videos with up to the 5-cell stage and (B) videos containing cell stages with a cell count greater than five. The average processing time was 4 s for group A and 19 s for group B. The average number of processed frames per second (fps) was 20 for all videos in Dataset II, 8 fps for group A and 33 fps for group B. This is far quicker than the real-time progression of embryos, so the processing time does not pose any practical delay for embryologists using the method for embryo assessment.

3.4 Anomaly Detection

We further evaluated whether our method could detect anomalies in embryo development. In Dataset II, there were two TLI videos with instances of direct cleavage and reverse cleavage. Figure 5 shows frames from one of these videos in which our method detected anomalies. In the direct cleavage, the single cell divided into 3 cells. Reverse cleavage was observed at the 3-cell stage (2 cells fused into one and later divided again into 2 cells) and at the 4-cell stage (2 cells fused into one cell). The abnormal cleavage patterns detected by our methodology were validated by the embryologists as correct detections.
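
The two anomaly types can be related to the per-frame cell counts as in the following sketch. It is an illustrative heuristic, not the validated procedure: a rise of two or more cells between consecutive frames is flagged as a possible direct cleavage and any drop as a possible reverse cleavage. The function name and the example count sequence are assumptions; as noted in Sect. 2.3, a drop can also result from a missed detection, and a rise of two could in principle come from two cells dividing between frames, so flagged frames would still need review.

```python
def flag_abnormal_cleavage(cell_counts):
    """Flag frames whose cell count changes in a way that suggests an
    abnormal cleavage, given one detected cell count per frame."""
    flags = []
    for frame in range(1, len(cell_counts)):
        prev, curr = cell_counts[frame - 1], cell_counts[frame]
        if curr < prev:
            flags.append((frame, "possible reverse cleavage"))
        elif curr - prev >= 2:
            flags.append((frame, "possible direct cleavage"))
    return flags


# Hypothetical counts: 1 -> 3 (direct), 3 -> 2 -> 3 (reverse), 4 -> 3 (reverse).
counts = [1, 1, 3, 3, 2, 3, 4, 3]
flags = flag_abnormal_cleavage(counts)
```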

Fig. 5. Extracted frames from a TLI video of embryo development up to the 4-cell stage. The top row shows the actual video frames and the bottom row shows our method’s output with a colored overlay on each cell. The first two frames from the left show direct cleavage of a single cell into 3 cells. The next three frames show reverse cleavage from 3 cells to 2 cells and back to 3 cells. The last two frames on the right show reverse cleavage from 4 cells to 3 cells.

4 Discussion

Our method detected cells, cell divisions and cleavage stages up to 5 cells. It detected the single-cell (1-cell) stage with high precision, although some misclassification as morula was observed. This could be attributed to the compacted structure of the morula, which closely resembles the 1-cell stage. Our approach performed best in detecting the 2-cell stage, and its performance dropped considerably when detecting cells or reporting cell stages with a cell count greater than five. The methodology detected those cell stages later than their actual cleavage because of the increased overlap between neighbouring cell boundaries. With a higher cell count, the structure of a cell stage becomes more complex and cells tend to lie on top of each other, making cell counting more difficult. The methodology counted two cells as one because YOLO v5 is trained to analyse a 2-D image, so the depth information (3-D view) that would indicate a potential overlap is missing. For the 3-cell and 5-cell stages, the reported values of performance indicators such as sensitivity and precision fluctuated strongly: the 3-cell stage had lower precision and higher sensitivity, while the 5-cell stage had lower sensitivity and higher precision. For these stages, the imbalance in the performance of our approach arose because the overlay color of cells changed abruptly between frames.

After a cell stage had been detected by our approach, YOLO v5 sometimes detected fewer cells in the following frames before the correct count was reported again. Thus, the training dataset for object detection needs to be more comprehensive. If there is noise in the images, or if some situations are not covered by the training data, the robustness of the object detection model is reduced [12]. Our methodology was time efficient and generated videos with colored overlays and annotated cell stages in 8 s on average for Dataset II videos, at 20 fps on average. In comparison, the camera in the time-lapse incubator captures an image of an embryo every 10–15 min. This shows that using our methodology to process TLT videos will not introduce any additional time delay and will support embryologists in decision making. Thus, our approach can be applied in real time.

The methodology can help to reduce the subjectivity associated with the assessment of an embryo’s quality. The methodology also showed potential for detecting abnormal cleavage patterns, which can be useful for embryologists when assessing an embryo’s quality and its viability for transfer to the woman.

5 Conclusion

Object detection proved to be a practical tool for ART. Overall, our approach successfully detected cells, effectively tracked cell divisions and accurately determined cleavage stages up to the 5-cell stage. Our approach was time efficient and can be used for real-time processing of TLI videos without introducing an additional time delay. Tracking cell division with our methodology seems to have potential for detecting abnormal cleavage patterns such as direct or reverse cleavage. Qualitative evaluation by embryologists resulted in the overall verdict that the methodology is useful and seems promising for clinical practice. We also hypothesise that using a larger dataset for training and including images from other focal planes, to provide depth information, will enable our methodology to detect overlapping cells and cell cleavage stages with a cell count greater than five.