Introduction

Fluorescent microscopy is a major tool in biomedical research and diagnostics [6]. By applying immunohistochemistry and immunofluorescent techniques, biomedical samples can be analyzed and classified in terms of their molecular content by targeted antibody staining and subsequent imaging. This approach allows for the investigation of regulatory mechanisms and subsequent genetic or environmental perturbations from the subcellular to the organism level [1] including cellular heterogeneity [10, 13].

A major challenge for the high-throughput analysis of complex images is the reliable segmentation and quantification of target molecules, due to the inevitable presence of noise and luminescence variability in 3D fluorescent acquisitions [2, 11, 14]. Developments in machine-learning and deep-learning approaches [3] address this issue using training data to perform semi-supervised segmentation. While these approaches can provide rather stable and coherent outputs, they require extensive data sets that are often difficult to acquire when working with human samples.

Here, we address the issue of reliable and efficient segmentation of 3D fluorescent images by a novel parameter-free segmentation (PaFSe) approach. Based on our established one-parameter Protein Relative Abundance Quantification Algorithm (PRAQA) tool [7], we present PaFSe as a parameter-free and fully automated approach tailored towards high-content 3D images. In particular, PaFSe provides a reliable parameter selection for PRAQA, which significantly decreases biases resulting from an inaccurate user parameterization. To measure PaFSe performance, we used synthetic scenarios and compared its performance with the machine-learning-based approach Ilastik, as the state-of-the-art tool for segmentation [3]. We also validated the efficacy of PaFSe by analyzing high-content 3D images taken from human post-mortem brain slices of Alzheimer’s disease (AD) patients and age-matched control (CTL). The samples were stained for phosphorylated tau (P-tau) protein, one of the major hallmarks of dementia. Overall, the analysis shows that PaFSe is an efficient and reliable 3D image segmentation tool especially when dealing with large inter- and intra-sample variability.

The manuscript is organized as follows:

  • Section “Methodology” describes the algorithm, and gives the details for the two synthetic scenarios used for subsequent benchmarking and for the real case application to human brain slices.

  • Section “Results” presents the benchmarking of PaFSe including the comparison with Ilastik and the real application including the comparison with manual segmentation.

  • Section “Conclusion” discusses the scope of PaFSe and summarizes the results.

Methodology

Fig. 1
figure 1

PaFSe architecture (A) and pseudocode (B). A The overarching structure of PaFSe is organized in four modules, where the embedding PaFSe module (blue) loads and exports the data, calls the threshold estimator find\(\theta\)Est (magenta), and subsequently calls the PRAQA segmentation module (green) with the image and estimated threshold parameter \(\theta\) which, in turn, uses the GetPixAvg routine (turquoise) for performing convolution operations. B The corresponding pseudocode is described in detail in the “Methodology” section

The “Methodology” section is divided into three parts. First, the implementation of PaFSe is presented by pseudocodes of the different modules. Second, we provide the rationals behind the synthetic scenarios used to benchmark the performance. Finally, we describe the relevant methods used for the real scenario of classifying human post-mortem brain slices leading to the data set presented in the “Results” section.

Pseudocode

PaFSe is organized in four modules. The corresponding software architecture and pseudocode are presented in Fig. 1A, B, respectively. Briefly, PaFSe loads a 3D confocal image and estimates a suitable parameterization for the subsequent segmentation, which is executed by the PRAQA module. In the following, we present the algorithm by a top–down approach which follows the order of function calls during the execution (Fig. 1A).

PaFSe Pseudocode

The PaFSe module (blue box in Fig. 1) takes as input a 3D image I and initializes \(I_\mathrm{out}\) (line 2) for final storage of the segmented image at the end of the execution. Since differences in signal strength (or luminosity) are very common throughout the third dimension in confocal fluorescent microscopy acquisitions, the algorithm processes each slice of the 3D image separately (lines 3–7). For each slice, the algorithm estimates a an optimal threshold value \(\theta\) (line 5) for the subsequent segmentation by PRAQA by calling the function \(find\theta Est\). An option for parallelizing the computation per slice is available in the Matlab implementation. Once all the slices have been processed, the 3D segmented image is returned (line 8).

find\(\theta\)Est Pseudocode

The function to estimate the threshold \(\theta\) (magenta box in Fig. 1) takes as input a slice of the 3D stack as a 2D image \(I_s\). We found that the appropriate value for \(\theta\) required to successfully segment the image by PRAQA strongly depends on the amount of noise in the slice. We thus estimate the slice noise variance (line 2) with the method proposed in [8], which guarantees robust estimations regardless of the image content. We then use this estimate to gather the correct \(\theta\) parameterization, namely \(\theta _\mathrm{est}\) (line 3). The function that maps each noise estimate \(Noise_\mathrm{est}\) to a value \(\theta _\mathrm{est}\) was obtained based on empirical evidence gathered in synthetic scenarios by searching for the best \(\theta\) parameterization that, once fed to PRAQA, achieves the best segmentation in terms of different performance indexes (see Section  “Definitions for Synthetic Scenario Analysis)”. The function fitNoise follows the trend depicted in Fig. 2. We assess the robustness of the estimates in the Results section “Theta Estimation Error”.

PRAQA Pseudocode

The PRAQA module (green box in Fig. 1) takes as input the 2D image \(I_\mathrm{s}\) and the estimated \(\theta _\mathrm{est}\) threshold as \(\theta _\mathrm{c}\) parameter. First, the slice is normalized by mapping the values in a [0, 1] range (line 2) by

$$\begin{aligned} I_\mathrm{norm} = \frac{I_\mathrm{s} - \mathrm{min}(I_\mathrm{s})}{\mathrm{max}(I_\mathrm{s}) - \mathrm{min}(I_\mathrm{s})}. \end{aligned}$$
(1)

In line 3 of the PRAQA module, a predefined list of Structuring Elements (SE) is loaded into the memory—these SE are used to define local pixel neighborhoods. The SE shapes can be changed by the user to improve accuracy if peculiar protein conformation shapes appear in the acquisition, but for all considered scenarios, three circular SEs with radius 4, 6, and 8 produced satisfying results [7]. For each SE, the algorithm calls the routine getPixAvg (line 6), which returns a vector (\(P_\mathrm{avg}\)) containing the value of the average neighborhood intensity defined by the SE for each pixel as described below.

Next, we normalize \(P_\mathrm{avg}\) by scaling each value by the Median Absolute Deviation (MAD) of the vector itself by

$$\begin{aligned} \mathrm{MAD}_{S}(x) = \frac{\left| x_i - \mathrm{med}(x) \right| }{\mathrm{med}(\left| x_i - \mathrm{med}(x) \right| )}. \end{aligned}$$
(2)

This normalization allows to have comparable \(P_\mathrm{avg}\) arrays for all slices. After having binarized each pixel by using \(\theta _\mathrm{c}\) as threshold (line 7), we obtain a set of binarized images equal to the number of SE’s. Eventually, the algorithm considers a pixel as positive if it has been marked as positive in the majority of the SE analyses (line 10 in the pseudocode). Finally, PRAQA returns the segmented slice \(I_\mathrm{seg}\).

GetPixAvg Pseudocode

The routine GetPixAvg (turquoise box in Fig. 1) takes as input the 2D image \(I_\mathrm{s}\) and a SE. First, a moving average filter based on the neighborhood defined by the SE is created (line 2). Then, the matrix is convoluted with \(I_\mathrm{s}\) to create the filtered image (line 3). Finally, the image is returned.

Fig. 2
figure 2

Dependence of \(\theta\) on the noise strength \(\eta\)

Definitions for Synthetic Scenario Analysis

Fig. 3
figure 3

A Zoom in for the sphere scenario with noise intensities \(\eta = [0, 0.5, 1, 2]\) from left to right, respectively. B Zoom in for the spiral scenario with noise intensities \(\eta = [0, 0.5, 0.75]\) from left to right, respectively

To quantify the performance of PaFSe, we benchmarked and compared our method with supervised segmentation algorithms in terms of two performance indexes. For this purpose, we synthetically generated 3D images that resemble ideal acquisitions without noise or intensity changes, and used those images as ground truth to benchmark the algorithms’ efficiency on the identical images with added noise and intensity changes [7]. The synthetic images were generated by randomly placing two different kinds of shapes, either spheres or spirals, of different sizes throughout the whole 3D image, as depicted in Fig. 3.

We used two performance indexes, Balanced Accuracy (BA) and Precision Recall Average (PR) defined by

$$\begin{aligned} \mathrm{BA} = \frac{\mathrm{TPR} + \mathrm{TNR}}{2}, \end{aligned}$$
(3)

and

$$\begin{aligned} PR = \frac{\frac{\mathrm{TPR}}{\mathrm{TPR}+\mathrm{FPR}}+\frac{\mathrm{TPR}}{\mathrm{TPR}+\mathrm{FNR}}}{2}, \end{aligned}$$
(4)

where TPR is the True-Positive Rate, TNR denotes the True-Negative Rate, FPR denotes the False-Positive Rate and FNR denotes the False-Negative Rate. Both BA and PR indexes are suited to deal with imbalanced classes [9] typically observed in fluorescent acquisitions where only a minority of pixels represent the signal of interest.

We used the synthetic images in the Results Sections “Theta Estimation Error” and “Ilastik Comparison” to assess the performance of the \(\theta\) estimation and to compare the new methodology with Ilastik [3] as a standard machine-learning based tool for image analysis. All synthetic images used in Section “Ilastik Comparison” were trained in Ilastik manually by labeling pixels as positive signal and background in the first, middle, and last slices. The Ilastik feature selection that was applied is specified in Table 1. Using features with \(\sigma\) bigger than 3 produced worse results despite the suggestion in the Ilastik manual to select all features. Furthermore, using the automatic feature selection (Wrapper Method) also produced results worse than the aforementioned optimized parameterization.

Table 1 Feature selection used in Ilastik

Methods Used for Real Scenario

To test the performance of the PaFSe algorithm on images obtained from biological tissue, we used anonymized human brain samples, which were provided by the Douglas-Bell BrainBank (Douglas Mental Health University Institute, Montréal, QC, Canada). All experiments were conducted in accordance with the guidelines of the Ethics Board of the Douglas-Bell Brain Bank and the Ethics Panel of the University of Luxembourg. To assess and compare protein abundance between control and disease conditions, we used human post-mortem brain samples from 11 neuropathologically confirmed AD patients, as well as from 11 age-matched CTL without neurodegenerative manifestations. Immunohistochemistry was performed (as previously described [4, 12]) to reveal the presence of AD-typical accumulations of hyperphosphorylated microtubule-associated protein, Tau (P-tau). The quantification of P-tau accumulations inside neurons, the paired helical filaments and the neurofibrillary tangles, are commonly used in neuropathology to estimate the severity of AD [5]. Briefly, hippocampal brain slices of 70\(\upmu\)m thickness were permeabilized and incubated with the mouse anti-AT8 antibody (against P-tau) (1:500, Thermofisher [MN1020]) and subsequently incubated with a secondary donkey anti-mouse antibody coupled to AlexaFluor 647 (1:400, Jackson ImmunoResearch Laboratories, West Grove, PA and Invitro-gen, Molecular Probes, Eugene, OR). A confocal Laser Scanning Microscope (Zeiss 710) with a 20x air objective was used to acquire 3D tile scans of the hippocampal subfield Cornu Ammonis 1 (CA1) with a resolution of 0.208 \(\upmu\)m in the x and y dimension and a resolution of 1 \(\upmu\)m in the z dimension. To prevent the introduction of variables, identical image acquisition parameters were used for all samples. The surface module of Imaris software (9.5.1 and 9.6) was used for a manually tuned segmentation of the AT8 signal. The volumes of the resulting objects (3D reconstruction) were summed up in a final score representing the percentage of the volume of the stack covered by AT8 staining. These volumetric scores are used as a reference for evaluating the performance of PaFSe in a real case scenario (see Section “Real Scenario”).

Results

The Results section is divided into three parts. First, we quantify the performance of PaFSe in estimating the \(\theta\) parameter in the synthetic scenarios. Then, we compare our unsupervised PaFSe approach with Ilastik segmentation. Finally, we present an application on a real scenario by classifying human post-mortem brain samples of dementia patients by comparing manually segmented images with PaFSe segmentations.

Theta Estimation Error

Fig. 4
figure 4

Optimal \(\theta\) evaluated by BA and PR versus the estimated \(\theta\) value per different levels of noise \(\eta\)

Fig. 5
figure 5

Sphere scenario. Left: comparison between the optimal \(\theta\) value (solid line) versus the estimated \(\theta\) value (dashed line) for the different performance indexes. Right: difference in performance between the optimal \(\theta\) value and the estimated \(\theta\) value

Fig. 6
figure 6

Spiral scenario. Left: comparison between the optimal \(\theta\) value (solid line) versus the estimated \(\theta\) value (dashed line) for the different performance indexes. Right: difference in performance between the optimal \(\theta\) value and the estimated \(\theta\) value

To identify the value of \(\theta\) that leads to the best segmentation, we ran PRAQA with \(\theta\) starting from 0 up to 10 with a step size of 0.1 for each 2D image slice. We performed this operation with different levels of noise (\(\eta = 0\) to \(\eta = 3\) with a step of 0.1), and results are based on an average of 30 runs for each specific noise level. We achieved two optimal \(\theta\) values: one based on the best segmentation in terms of BA, and one in terms of PR.

Figure 4 shows the \(\theta\) estimates together with the two optimal \(\theta\) values in dependence on the noise strength \(\eta\). While for the simpler Sphere Scenario, the estimate is extremely close to the optimal value, the Spiral Scenario exhibited more pronounced differences. Nevertheless, given the flexibility of the PRAQA parameterization, the loss in BA and PR is negligible, as shown in Figs. 5 and  6 for the Sphere and Spiral synthetic scenarios, respectively. We observed a loss in performance independent of the noise level \(\eta\), that reaches a maximum of 0.003 in the Sphere Scenario, and 0.035 in the Spiral Scenario.

Ilastik Comparison

Fig. 7
figure 7

Segmentation output comparison between Ilastik and PaFSe with \(\eta = 0.5\) for the Sphere and Spiral scenarios

Table 2 Performance comparison for PaFSe and Ilastik (ILA) for the Sphere scenarios with \(\eta = [0.5,1,2]\)
Table 3 Performance of PaFSe and Ilastik for the Spiral scenarios with \(\eta = [0.5,0.75]\)

The performance results of PaFSe and Ilastik for the Sphere and Spiral scenarios exhibited a loss in performance of only 3–\(4\%\) in both performance indexes for \(\eta = 0.5\) (Tables 2 and Table 3, respectively). As depicted in Fig. 7, PaFSe segmentation resulted less accurate in defining the objects contours but almost free of false-positive pixels’ classification when compared to Ilastik.

The advantage of using circular SE led to a better performance of PaFSe in the Sphere Scenario when high noise is considered (\(\eta = 1\)). For extreme noise conditions (\(\eta = 2\)), Ilastik performed slightly better, although these scenarios would likely fail a general quality assessment of a real microscopy image acquisition.

Real Scenario

Fig. 8
figure 8

3D projection of segmentation output comparison between manual and PaFSe segmentation for P-tau in a brain slice of a CTL subject (top) and of an AD patient (bottom). Shown images represent a surface of 246 \(\times\) 378 \(\upmu\)m \(\times\) 42 \(\upmu\)m brain slices

Fig. 9
figure 9

2D segmentation output comparison between manual and PaFSe segmentation for P-tau in a brain slice of an AD patient. Each column represents the same acquisition region coming from three subsequent slices (note the intensity variation as the slice number increases). The first row represents the original images, and the second row represents the original images with maximized saturation, while the third and fourth rows, respectively, depict the manual and PaFSe segmentation overlayed with the second row. Shown images represent a surface of 90 \(\times\) 90 \(\upmu\)m brain slices

After the performance analysis of PaFSe, we next applied PaFSe to a real application in order to quantify protein abundance in human post-mortem brain slices from AD patients and age-matched CTL subjects. The accumulation of abnormal proteins is a common pathological feature of many neurodegenerative diseases including AD and Dementia with Lewy Bodies. While the mechanisms that drive the aberrant accumulation of these proteins remains elusive, the abundance and regions affected are characteristic for different diseases and are often used for post-mortem diagnoses [5]. AD is characterized by the accumulation and deposition of amyloid beta and P-tau proteins. Hence, a reliable segmentation and quantification of these proteins is crucial for diagnostic and research approaches.

As a proof of concept, we applied PaFSe to 3D image stacks taken from hippocampal samples stained for P-tau of 11 AD patients and 11 CTL subjects, and compared the results to manual segmentation performed by a neuropathologist. For both CTL and AD conditions, PaFSe was able to automatically segment P-tau abundance in a reliable manner and similar to the manual approach of the neuropathology expert as exemplified in Fig. 8, where a maximum intensity projection of a 3D image stack is shown. These examples show that PaFSe exhibits a trend for more positively identified pixels compared to the manual segmentation but in an apparently consistent manner and in agreement with the low intensity signal in the original image. A closer 2D inspection of individual image slices is presented in Fig. 9 where subsequent 2D slices imaged from an AD sample are considered. While the manual segmentation is more coherent with the lower intensity raw image, PaFSe segmentation resolves also subtle structures in greater detail despite the luminescence variability.

To quantitatively compare the manual and PaFSe segmentation, we determined the P-tau density as the amount of positive voxels normalized by the whole volume of the acquisition for both approaches, and subsequently calculated the correlation between these densities (Fig. 10). We found a Pearson Correlation Coefficient of 0.9705 and 0.9276 with the corresponding p values of \(p=0.0000007\) and \(p=0.00003\) for the CTL and AD conditions, respectively. The quantification obtained with the PaFSe segmentation is on average five times larger than the those obtained by manual segmentation, since the manual segmentation was executed in a more conservative way by not considering voxels with lower intensity and thereby neglecting more subtle structures, as shown in Fig. 9. Despite this systematic difference, both quantification led to an identical Mann–Whitney U test p value up to a precision of \(10^{-7}\) when comparing AD and CTL densities. We conclude that while the use of density estimates may be discouraged for model parameterization where a higher precision is required, PaFSe density quantification can be successfully applied to observe and assess differences among different sample categories.

Fig. 10
figure 10

Density quantification scatter plots for P-tau aggregations for CTL and AD conditions, respectively. The grey dashed line represents the optimal correlation

Conclusion

Here, we introduced PaFSe, a parameter-free algorithm for segmenting protein aggregations in 3D fluorescent image acquisitions which is tailored particularly to tackle luminescence variability of fluorescent microscopy images. PaFSe is based on our previously established PRAQA approach [7], where extensive benchmarking has demonstrated its computational efficiency and robustness with respect to intensity changes and noise. A drawback of PRAQA was the choice of the threshold parameter \(\theta\) which could still induce user-triggered biases in the analysis of, e.g., samples of different origin or quality. PaFSe addresses this issue by the additional modules that allow for an automatic estimate of the optimal segmentation parameter \(\theta\) (Section “Pseudocode”) leading to a parameter-free and therefore more robust and user independent approach.

For the establishment of PaFSe and demonstration of its potential, we investigated its performance with synthetic scenarios (Sect. 2.3) and showed a real case application in Section “Definitions for Synthetic Scenario Analysis”. For the performance test of PaFSe, we used the synthetic scenario to assess the \(\theta\) parameter estimates (Section “Theta Estimation Error”) and to compare the new methodology with the semi-supervised approach of Ilastik (Section “Ilastik Comparison”). We found that PaFSe performs similarly to Ilastik in scenarios with low and medium amounts of noise (\(\eta = 0.5\)–1) with a very limited loss in performance and without the need for time consuming and potentially bias-inducing parameterizations and trainings. Finally, in the real case application, we compared the protein quantification of P-tau in human hippocampal samples obtained from PaFSe with a manual segmentation (Section “Real Scenario”). The analysis showed that, despite the linear shift in absolute values, PaFSe quantification yields the same statistical outcome when comparing the abundance of P-tau in CTL and AD populations. Hence, PaFSe as a parameter-free and automated approach is a novel and efficient method to identify differences in protein abundance among diverse sample populations and is able to address luminescence variability observed specifically in acquisitions acquired from thick sections.