Researchers spend around 70% of their time managing data produced with state-of-the-art methodologies in software and hardware dedicated to observing and analyzing small sample regions. This combination makes geological studies particularly slow and biased, as they sometimes need to evaluate complex rocks and it takes time to apply different perspectives. Furthermore, using various microscopic techniques does not solve the underlying problem: the lack of statistical data backing up the scientific outcome.
Microscopy has been a core component of Geology for several decades, with a reputation for being domain-specific and, at best, semi-quantitative. Workers traditionally acquire and describe snapshots of regions of interest (ROIs, μm² to mm²) that are difficult to manage because the data also come from various software platforms. A whole-slide microscopic description requires jumping across ROIs, connecting observations, and sometimes doing intellectual leapfrogging to link previous descriptions into a genetic interpretation. This mental combination of snapshots and analytical data is inefficient and subject to human bias or protracted "Excel-sheet" examinations.
Hence, conventional workflows mostly end up producing a mountain of imaging, mapping and spot data that has no synergy and ends up stored in an offline database ("curated data") or unused in published papers ("dark data"). We need a platform that gives access to a hierarchically organized database (in the back-end) including image metadata, annotations, ROI shapes, and an image pyramid stack, while seamlessly displaying this information.
In addition, there is a great demand for automation, which is transforming a number of industries (aerospace, automotive, media, etc.) and research fields (medicine, materials science, etc.). This is a great opportunity to keep up with society's needs for sustainable development and with the demands of industry stakeholders (a growing population, investors, employees, NGOs, etc.) for pressing production schedules, predictability, and resource efficiency. Unlike academia, mining and oil companies can afford systematic drilling, sampling and preparation of slides (30-μm-thick rock slabs) that they can use from feasibility studies through medium- to long-term planning.
Increasing their analysis potential with a scientific approach can:
- Streamline academic and industry approaches, resources, and observation scales.
- Strengthen confidence in sampling and analysis results.
- Benefit research institutions by thoroughly popularizing the science behind discoveries and principal investigators.
- Shorten the time for innovation (funding attractor).
- Ease young researchers' participation in remote industry projects.
Whole-slide images, also known as virtual slides, are large, high-resolution images mostly used in Biomedicine and Digital Pathology. They routinely exceed RAM sizes, often occupying tens of gigabytes when uncompressed. However, when working with this type of raster image, it is necessary to display it at its native resolution, such that one pixel in the image occupies exactly one pixel on the screen (e.g.: Adobe Photoshop '100%'). So, large images are converted into multi-resolution versions of themselves, called image pyramids, such that only a small amount of image data is needed at any particular resolution.
An image pyramid slices one massive image into several levels, where each level has a different resolution and is subdivided into reasonably sized tiles, generated, for example, by recursive Gaussian (or Laplacian) filtering and rescaling. The process trades disk space for rendering speed, storing resampled, lower-resolution versions of image files in subfolders alongside the original image. Thus, the ongoing burden of processing and fetching image data in memory is transferred to an up-front processing step.
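The downsampling step can be sketched in a few lines of Python. This is a minimal illustration on a nested-list grayscale image with a simple box filter (the function names are ours); production libraries do this far more efficiently, with tiling and better filters:

```python
def downsample(img):
    """Halve a grayscale image by averaging 2x2 pixel blocks (box filter)."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]

def build_pyramid(img, levels):
    """Return [full-res, 1/2, 1/4, ...] resolution levels of the input image."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

# Synthetic 8x8 grayscale "image":
base = [[float((x + y) % 256) for x in range(8)] for y in range(8)]
pyr = build_pyramid(base, 4)
print([len(level) for level in pyr])  # side lengths: [8, 4, 2, 1]
```

A viewer then picks the level whose resolution is closest to the current zoom and fetches only the tiles covering the field of view.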
However, there is a point of diminishing returns in creating too many levels, since pyramids take up additional disk space (roughly an extra 33% of the original file size). Usually 4 to 5 levels are effective. As a corollary, it is highly desirable to be able to zoom in and out by a factor of 2. This gives enough levels to avoid zooming where there is no pyramid image, while the field of view will not appear downsampled (squeezed) by a low-quality resizing algorithm optimized for speed. The overviews can then be stored in the file header for certain formats or in an external file, and the software can load only the tiles necessary for the current field of view. For example, Google Earth or ESRI servers have already generated the tile sets of satellite photos at various zoom levels and serve those to you. Online latency problems can be minimized if the pyramid is saved locally on disk, as explained, for example, for the QuPath software below.
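The ~33% overhead figure follows from a geometric series: each pyramid level holds a quarter of the pixels of the level above it, so the total extra storage approaches 1/4 + 1/16 + 1/64 + … = 1/3 of the base image. A quick Python check:

```python
# Each pyramid level stores 1/4 of the pixels of the previous level,
# so with n extra levels the overhead is sum_{k=1..n} (1/4)^k -> 1/3.
overhead = sum(0.25 ** k for k in range(1, 6))  # 5 pyramid levels
print(f"{overhead:.4f}")  # already very close to 1/3
```

This also shows why 4 to 5 levels suffice: further levels add almost no storage, but also almost no pixels worth rendering.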
Image pyramids can be built with low-level image processing libraries such as libvips or GDAL (C/C++) that have bindings to common high-level programming languages. They are the engine of the image processing tools of bigger software like QGIS and ArcGIS. There are additional software options, as well as compatibility issues between generated pyramids.
Optical Microscopy with Polarized Light (OM PPL/XPL)
Polarized-light microscopy is used in either transmitted or reflected light for the quantitative or qualitative characterization and identification of optically anisotropic materials. A conventional light microscope can be converted into a polarizing microscope by placing two polarizing elements in the optical system. The first polarizer is located between the light source and the object plane and converts unpolarized light into plane or linearly polarized light. The second polarizer (the analyzer), which is usually rotatable, is positioned between the objective and the eyepiece.
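The extinction produced by the two polarizing elements can be quantified with Malus's law, I = I₀·cos²θ, where θ is the angle between the polarizer and the analyzer. A minimal sketch (the function name is ours, for illustration):

```python
import math

def transmitted_intensity(i0, theta_deg):
    """Malus's law: intensity passed by an analyzer at theta_deg to the polarizer."""
    return i0 * math.cos(math.radians(theta_deg)) ** 2

print(transmitted_intensity(1.0, 0))   # parallel polars: full transmission
print(transmitted_intensity(1.0, 90))  # crossed polars: extinction (~0)
```

With crossed polars (θ = 90°), only light whose polarization was rotated by an anisotropic specimen between the two elements reaches the eyepiece, which is what makes the configuration diagnostic.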
Automatic photomicrograph acquisition has been the technological ceiling of this technique due to the great complexity and variation of the optical properties of mineral species. For instance, colors under transmitted (e.g.: Michel-Lévy interference color chart) or reflected polarized light are sensitive to light beam illumination (optical path), focusing (objective numerical aperture), thickness, anisotropy, twinning, composition, reflectivity, etc., and easily vary when rotating the microscope stage. Nonetheless, this technique has the highest potential for creating software for automatic mineral identification and segmentation that scales studies up to thousands of samples from hundreds of different projects. Computer Vision techniques could assist in selecting grain center points for image segmentation and also benefit grain-integrated spectral processing in other systems (Hrstka T., personal communication).
We can use the Reflected Light (RL) modality for studying specimens that remain opaque even when ground to a thickness of 30 μm. In addition to mineral-specific guidelines using RL with Plane Polarized Light (PPL) and Cross Polarized Light (XPL), experienced users know that:
For RL in PPL, we can work with any mineral. Bireflectance is a potentially quantitative property shown by anisotropic minerals (which are also pleochroic). Useful practices include:
- Immersing the objective front lens in oil
- Careful polishing to avoid producing relief that strongly affects other techniques that we might want to apply like EBSD (see: explanation, video 1, and video 2).
- An adequate illumination level (not excessive, which produces oversaturation)
For RL in XPL, we can only work with anisotropic minerals, because isotropic minerals are extinguished (e.g.: pyrite, galena, sphalerite, etc.). Internal reflections, a qualitative property, require translucent minerals (Fe-sphalerite, proustite-pyrargyrite, etc.), where colors are diagnostic.
Sensors are the heart of digital cameras; they convert the incoming light into signal through charge accumulation, transfer, conversion to voltage, amplification, and analog-to-digital (A/D) conversion. In a CMOS sensor, the charge-to-voltage conversion and voltage amplification are carried out in each pixel, so each pixel has its own charge-to-voltage converter and voltage amplifier circuit. Such pixels are known as active pixel sensors. Afterwards, the pixel voltages are read out line by line. The large quality advantage of CCD over CMOS sensors has narrowed over time.
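The charge-to-voltage-to-digital chain described above can be sketched as a simplified single-pixel model. All constants and parameter names below are illustrative placeholders, not values from any real sensor:

```python
def pixel_readout(photons, qe=0.6, gain_uv=50.0, adc_bits=12, vref_uv=200_000.0):
    """Simplified CMOS pixel: photons -> electrons -> voltage -> digital number.

    qe:       quantum efficiency (fraction of photons converted to electrons)
    gain_uv:  microvolts per electron at the in-pixel amplifier
    adc_bits: A/D converter resolution; vref_uv: A/D full-scale voltage
    """
    electrons = photons * qe              # charge accumulation
    voltage_uv = electrons * gain_uv      # per-pixel charge-to-voltage conversion
    max_dn = 2 ** adc_bits - 1
    # A/D conversion, clipping at full scale (sensor saturation):
    return round(min(voltage_uv / vref_uv, 1.0) * max_dn)

print(pixel_readout(1000))  # digital number for 1000 incident photons
```

In a real CMOS sensor this conversion happens in every pixel in parallel, and only the resulting voltages are multiplexed out line by line, which is the architectural difference from a CCD.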
Scanning Electron Microscopy (SEM EDX/BSE)
Castaing’s equations (1951) were a major breakthrough in calculating sample compositions with SEM-WDS by applying matrix correction coefficients (Z·A·F: atomic number, absorption and fluorescence). Numerous workers have calculated these coefficients theoretically (first principles) and experimentally (standards) for WDS (mostly EPMA) and EDS (mostly SEM) analyses (Heinrich, 1981; Goldstein et al., 2018). Nowadays, EDS detectors combined with field-emission gun (FEG) column technology are the gold standard, providing simultaneous characteristic X-ray acquisition and stable energy and beam calibration across experiments.
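The essence of Castaing's matrix correction can be illustrated with a toy calculation: a measured k-ratio (intensity of the unknown relative to a standard) is scaled by the Z, A and F factors to estimate a mass fraction. All numbers below are invented for illustration; real coefficients come from the theoretical and experimental work cited above:

```python
def zaf_concentration(k_ratio, c_std, z, a, f):
    """First-approximation mass fraction from a measured k-ratio.

    k_ratio: intensity ratio I_unknown / I_standard for one element line
    c_std:   mass fraction of the element in the standard
    z, a, f: atomic-number, absorption and fluorescence correction factors
    """
    return k_ratio * c_std * z * a * f

# Illustrative numbers only (pure-element standard, c_std = 1.0):
c = zaf_concentration(k_ratio=0.42, c_std=1.0, z=1.05, a=1.10, f=0.98)
print(f"{c:.3f}")  # corrected mass fraction of the element
```

In practice the ZAF factors themselves depend on the (unknown) composition, so the calculation is iterated until the estimated composition converges.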
NIST DTSA-II (Ritchie, 2009; Ritchie, 2018), MCXRayLite, and Probe CalcZAF can quantify and simulate spot spectra by tracking the primary electrons in a propagation volume, taking experimental geometries into account. They incorporate various matrix corrections, mass absorption coefficients, and physical model databases used for the calculations. For example, DTSA-II can quantify (MLLSQ fitting to standards, k-ratios/z-factors) and simulate (analytical or Monte Carlo models) spectra, saving the tracks of primary electrons inside the analysed interaction volume.
Diverse physical models can be selected to calculate the random production of photons (characteristic X-rays) and electrons during propagation of the primary beam, which depends on the sample's elemental composition. The most important parameters are the mass absorption coefficient (µ/ρ) and the detector efficiency, which must be the same between analyses. Composition and density are necessary for the calculation, so they are iteratively estimated or set by the user when doing MLLS (Multiple Linear Least Squares) fitting of the unknown spectra and standards, while various associated uncertainties are also determined.
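The MLLS fitting step amounts to an ordinary least-squares fit of the unknown spectrum to reference (standard) spectra. A minimal sketch for two components with synthetic data, solving the 2×2 normal equations directly (real software fits many channels and components and propagates uncertainties):

```python
def mlls_fit2(unknown, ref_a, ref_b):
    """Fit unknown ~= x*ref_a + y*ref_b by least squares (2x2 normal equations)."""
    aa = sum(a * a for a in ref_a)
    bb = sum(b * b for b in ref_b)
    ab = sum(a * b for a, b in zip(ref_a, ref_b))
    ua = sum(u * a for u, a in zip(unknown, ref_a))
    ub = sum(u * b for u, b in zip(unknown, ref_b))
    det = aa * bb - ab * ab
    return (ua * bb - ub * ab) / det, (aa * ub - ab * ua) / det

# Synthetic "spectra": the unknown is an exact 0.3/0.7 mix of the references.
ref_a = [1.0, 2.0, 0.5, 0.0]
ref_b = [0.0, 1.0, 2.0, 1.0]
unknown = [0.3 * a + 0.7 * b for a, b in zip(ref_a, ref_b)]
x, y = mlls_fit2(unknown, ref_a, ref_b)
print(round(x, 3), round(y, 3))  # recovers the mixing coefficients
```

The fitted coefficients play the role of k-ratios relative to the standards; matrix corrections are then applied as described above.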
There are several limitations on quantitative analyses (standard-based or standardless) depending on the acquisition settings used to obtain EDS maps with low-count spectra (e.g.: WSI). More specifically, AZtec software QuantMaps have an outright accuracy of +/- 25 wt.%, as has been demonstrated here. The AZtec algorithms iteratively use the XPP phi-rho-z model and empirical k-ratios ('remote standards'). It is effectively 'standardless', because we do not know the 'remote' dose and detector solid angle/efficiency used when measuring the standards for constructing their 'factory standards database'.
In his web article, S. Burgess cites the Pinard et al. (2019) conference paper. They tackled the dose problem with a pure Co standard and the detector problem with a model based on synchrotron EDS measurements of pure-element/binary-compound standards, following Statham (2009, 2014). Their +/- 2% error comes after efficiency corrections for the detector model, measuring certified materials at 20 kV without light-element calculations.
However, the current challenge is that, for mapping purposes, we do not measure these 'monolithic' materials; we measure mineral assemblages, and the remote standards will have matrix effects, making some of the phases semi-quantitative. This especially affects phases with large 'concentration ratios' between minor/trace and major elements, and vice versa, like pyrite, sphalerite, galena, etc., whose pixels end up with +/- 25% error. This issue might be avoided by testing the approach of combining the HyperSpy and NIST DTSA-II software (see Project 3).
Laser Ablation Inductively-Coupled Plasma Mass Spectrometry with Time Of Flight (LA-ICP-MS-TOF) Mapping
Trace element studies can reveal complexities in compositional zoning and genetic processes that complement petrographic studies (Ginibre et al., 2007; Ubide & Kamber, 2018). This project will use the recent developments on LA-ICP-MS, including systems improvements (Muller et al., 2009; van Malderen et al., 2015; Sylvester and Jackson, 2016; Hendriks et al., 2017), LA mapping (Ulrich et al., 2009; Paul et al., 2012, 2014), data reduction schemes (Paton et al., 2011; Petrus and Kamber, 2012) and post-processing (Petrus et al., 2017) methodologies.
Following Petrus et al. (2017), LA mapping experiments consist of translating the sample stage under the laser to ablate a shallow series of parallel and adjacent grooves (0.3–3 μm) describing rectangular areas. These areas usually extend over a single mineral grain or part of one, scanned several times to increase the elemental menu while keeping adequate accuracies (balanced dwell times). A two-step mapping approach involves monitoring the sample and improving analytical settings before a more detailed and longer data acquisition (Ubide et al., 2015). On the controlling PCs, the resulting MS and laser-log files are saved for processing with the Iolite software v3.6.
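The rastering scheme implies a simple reconstruction step: the time-resolved signal of each scan line becomes one row of pixels in the element map. A minimal sketch with synthetic counts (the actual processing of the MS and laser-log files, including timing alignment, is done in Iolite):

```python
def lines_to_map(signal, pixels_per_line):
    """Fold a 1-D time-resolved signal into a 2-D map, one scan line per row."""
    return [
        signal[i : i + pixels_per_line]
        for i in range(0, len(signal), pixels_per_line)
    ]

# Synthetic counts from 3 adjacent scan lines of 4 pixels each:
signal = [5, 7, 9, 6, 4, 8, 12, 5, 3, 6, 10, 4]
element_map = lines_to_map(signal, pixels_per_line=4)
print(len(element_map), len(element_map[0]))  # 3 rows x 4 columns
```

The pixel width along a line is set by the stage speed and the detector dwell time, while the row spacing equals the groove pitch, so both must be known to give the map correct physical dimensions.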
Trace element analysis with LA-ICP-MS (spot and mapping) experiments is relatively expensive (250 €/day; max. 4 thin sections) and technically demanding. For example, maintenance of a TOF mass spectrometer costs around 40K €/year, compared to 10K €/year for a quadrupole (Chew D., personal communication). In addition, the Iolite software used here for data reduction of the time-resolved spectra has a yearly license cost. It is now an independent software package, but it was previously (v3) a self-contained package in IgorPro (C++).
This project has implemented a new MatLab library for data management, parsing, image processing (registration) and analysis. MatLab is a high-level prototyping programming language (student license) that is very popular for data science and research. The scripts fill software gaps and guide typical analytical workflows, from spot/mapping analysis and image data processing on offline and researcher PCs to organizing and generating report-like plots with statistical analysis in a few seconds.
Specifically, the scripts can be useful for (1 & 2) parsing experiment LA-ICP-MS metadata (from MS, laser-log and Iolite files), (3) gathering map and spot data table exports (from Iolite) into a master table that can also be appended with the previously parsed XY spot coordinates, (4) exporting LA map XYZ images from Iolite (IgorPro), (5) shaping image matrices, registering, plotting and saving LA map experiments, (6) doing quality control of SEM-EDS QuantMap exports (from AZtec) from image matrices (the ROIs' underlying data), (7) massively converting 0–100 wt.% image matrices (from AZtec) into 64-bit float images (*.tif) in an organized folder directory, and (8) doing PCA or HCA statistical analysis flexibly, either from image stacks (e.g.: QGIS-interrogated pixels) or master tables.
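As an illustration of step (3), spot-data exports can be gathered into a master table keyed by spot name. The column names below are invented for the sketch (actual Iolite exports differ), and the document's own scripts do this in MatLab rather than Python:

```python
import csv
import io

def gather_master_table(csv_texts):
    """Merge several CSV exports into one master table keyed by spot name."""
    master = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            spot = row.pop("spot")
            # Rows from later exports append new columns to the same spot:
            master.setdefault(spot, {}).update(row)
    return master

# Two hypothetical exports: trace-element data and XY stage coordinates.
elements = "spot,Cu_ppm,Zn_ppm\nA1,120,45\nA2,98,60\n"
coords = "spot,x_um,y_um\nA1,100,250\nA2,140,250\n"
master = gather_master_table([elements, coords])
print(master["A1"])  # merged row with element and coordinate columns
```

Keying on the spot name is what allows the XY coordinates parsed in steps (1 & 2) to be appended to the same table later.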
Depending on the MSI system, the scripts are intended to be applied progressively from (1) to (6). However, this is rather applicable to single reduced-experiment data (*.csv) from the AZtec or Iolite software, and we have approximated it with a QGIS routine. Thus, the results of (6) will be improved in future work by replacing the QGIS image processing (importation, 'georeferencing', interrogation and exportation) with a more powerful implementation in MatLab itself. Greater detail is given in the conference paper.
Open-source tools are especially important in science due to their transparency and inherent suitability for sharing and extensibility (Swedlow & Eliceiri, 2009). Open-source software is available in online repositories for version management like GitHub, where it can be inspected, improved, installed and reused. In fact, developers who choose to follow the open-science paradigm require instrument-agnostic files that bring scientific communities together and facilitate the correlative microscopy approach. For spectral processing solutions, see Somnath et al. (2019) for a longer explanation.
Overall, image processing is normally less computationally expensive than spectral processing. Hence, it is not surprising that image processing solutions seem to have reached greater development levels (e.g.: see this Coding Party event, 2015). The large variety of software has been narrowed down to a focused group that deserves larger consideration, see below. In a nutshell, TrakEM2 (Cardona & Saalfeld) is used for image registration and stitching of biological tissue slides on the ImageJ platform. QuPath and Cytomine are useful for visualization and machine-assisted (AI) analysis of the slides. HyperSpy and NIST DTSA-II are mostly used in Materials Science applications for S(T)EM spectral deconvolution, calculating and simulating sample compositions with different geometries and models. Nevertheless, all of this research-dedicated software can be adapted for a plethora of applications.
The first two platforms have web user interfaces (UIs) and are tailored for computationally very expensive Deep Learning implementations on servers (i.e.: high-end PCs) or cluster infrastructure. The third is an offline solution that works well for expensive Machine Learning implementations on average personal computers, with image analysis in ImageJ/MatLab (Bankhead et al., 2017). These three work with libraries for opening various virtual slide formats (OpenSlide, Bio-Formats, etc.), drawing and annotating objects (ROIs), and performing image analysis with customizable Artificial Intelligence algorithms. Further, Cytomine (currently in v2) is looking forward to developing new automatic image registration and allows buying terabytes of storage for uploading users' data.
Image stitching plugins
The role of computer-based imaging is very important, as it helps pathologists make decisions. Digital pathology is therefore an intersection of pathology and computing that is capable of replacing conventional microscope-based diagnosis in the near future. Technically, Digital Histopathology is the technique of analyzing high-resolution, digitally scanned histology images, which may then take advantage of computational tools and algorithms (Puri et al., 2016). See this video from the HistomicsTK (Kitware) algorithm library for understanding the workflow of Biomedical Microscopy studies.
In the first step, the whole histology glass slide is scanned with the help of a high-resolution image scanner. Whole-slide imaging systems scan the slides at 20x, 40x, and even 100x with high precision. This image information is then shared with a distant pathologist over a high-speed Internet connection. Consequently, the remote second opinion saves time, cost, and physical transportation of slides, and prevents slide damage.
Astronomy is a natural science that studies celestial objects and phenomena. It uses mathematics, physics, and chemistry in order to explain their origin and evolution. Researchers, helped by amateur collectors and observers, sometimes have access to fossil meteorites and micrometeorites found in sediments, terrestrial impact craters (large bolide impacts), and space mission samples describing presolar grains (cosmic dust). They have to analyse them to understand the parent stars and history of our Galaxy, and the delivery history of extraterrestrial matter to Earth. Thus, observations are coupled with geochemical fingerprinting that traces geological (i.e.: planetary) processes that left chemical and isotopic patterns in the rock record.
Differences in composition can be slight, so we require analytical data of very high precision and accuracy. Following Kamber (2009), it is not surprising that the advancement of geochemical fingerprinting occurred alongside progress in geochemical analysis techniques. The drivers of these advances include the Apollo lunar sample return program, which demanded increasing the quality of geochemical data and pushed towards minimizing the required sample volumes.
The central theory behind materials science involves relating the microstructure of a material to its macroscopic physical and chemical properties. This interdisciplinary field is a syncretic discipline hybridizing metallurgy, ceramics, solid-state physics, and chemistry. Many of the most pressing scientific problems humans currently face are due to the limits of available materials and how they are used. Thus, breakthroughs in material design and discovery are likely to affect the future of technology significantly.
Industrial applications of materials science include materials design, cost-benefit tradeoffs in industrial production of materials, processing methods (casting, rolling, welding, ion implantation, crystal growth, thin-film deposition, sintering, glassblowing, etc.), and analytic methods (characterization methods such as electron microscopy, X-ray diffraction, calorimetry, nuclear microscopy (HEFIB), Rutherford backscattering, neutron diffraction, small-angle X-ray scattering (SAXS), etc.). Besides material characterization, the material scientist or engineer also deals with extracting materials and converting them into useful forms.