Refereed journal article or data article (A1)
H&E Multi-Laboratory Staining Variance Exploration with Machine Learning
List of Authors: Prezja Fabi, Polonen Ilkka, Ayramo Sami, Ruusuvuori Pekka, Kuopio Teijo
Publisher: MDPI
Publication year: 2022
Journal: Applied Sciences
Journal name in source: APPLIED SCIENCES-BASEL
Journal acronym: APPL SCI-BASEL
Volume number: 12
Issue number: 15
Number of pages: 25
DOI: http://dx.doi.org/10.3390/app12157511
URL: https://doi.org/10.3390/app12157511
Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/176244936
Abstract
In diagnostic histopathology, hematoxylin and eosin (H&E) staining is a critical process that highlights salient histological features. Staining results vary between laboratories regardless of the histopathological task, even though the method does not change. This variance can impair the accuracy of algorithms and histopathologists' time-to-insight. Investigating this variance can help calibrate stain normalization tasks to counter this negative effect. Using machine learning, this study evaluated the staining variance between different laboratories on three tissue types. We received H&E-stained slides from 66 different laboratories. Each slide contained kidney, skin, and colon tissue samples stained by the method routinely used in each laboratory. The samples were digitized and summarized as red, green, and blue channel histograms. Dimensions were reduced using principal component analysis. The data projected onto the principal components were passed to the k-means clustering algorithm and the k-nearest neighbors classifier, with the laboratories as the target. The k-means silhouette index indicated that K = 2 clusters had the best separability in all tissue types. The supervised classification result showed laboratory effects and tissue-type bias. Both supervised and unsupervised approaches suggested that tissue type also affected inter-laboratory variance. We suggest that tissue type also be considered when choosing the staining and color-normalization approach.
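The unsupervised part of the pipeline described in the abstract (RGB channel histograms, principal component analysis, then k-means scored with the silhouette index) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the synthetic "slides", the 32-bin histograms, the two retained components, and all function names (`rgb_histogram_features`, `pca_project`, `kmeans`, `silhouette`) are assumptions made for illustration only.

```python
import numpy as np

def rgb_histogram_features(image, bins=32):
    """Concatenate normalized R, G, B histograms of an H x W x 3 uint8 image."""
    feats = []
    for channel in range(3):
        hist, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())  # normalize so image size does not matter
    return np.concatenate(feats)

def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components (via SVD)."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient (b - a) / max(a, b) over all samples."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; report 0 by convention here
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    values = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself from its own cluster
        if not same.any():
            values.append(0.0)  # singleton cluster
            continue
        a = dist[i, same].mean()
        b = min(dist[i, labels == j].mean() for j in clusters if j != labels[i])
        values.append((b - a) / max(a, b))
    return float(np.mean(values))

# Synthetic stand-ins for digitized samples: two made-up "staining styles".
data_rng = np.random.default_rng(42)

def fake_slide(mean_rgb):
    pixels = data_rng.normal(loc=mean_rgb, scale=10.0, size=(16, 16, 3))
    return np.clip(pixels, 0, 255).astype(np.uint8)

slides = [fake_slide((180, 100, 160)) for _ in range(20)]
slides += [fake_slide((120, 60, 140)) for _ in range(20)]

features = np.stack([rgb_histogram_features(s) for s in slides])
projected = pca_project(features, n_components=2)

# Score several candidate cluster counts and keep the best-separated one.
scores = {k: silhouette(projected, kmeans(projected, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print("silhouette by K:", scores)
print("best K:", best_k)
```

Choosing K by the highest mean silhouette mirrors the selection criterion named in the abstract; the supervised k-nearest neighbors step with laboratories as the target is not shown here.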
Downloadable publication: This is an electronic reprint of the original article.