Deriving image features for autonomous classification from time-series recurrence plots

Summary

This paper shows how a specific type of time series analysis, the recurrence plot (RP), can be applied to the outer hull of an imaged and pre-segmented object to derive image features suitable for use in classifiers. In addition to the features derived by the well-documented recurrence quantification analysis (RQA), a new set of features was developed based on closed structures (“eyes”) in an RP. The new features were named eye structure quantification (ESQ). Two sets of images were analysed: a) 1023 in-situ plankton images comprising nine different organism classes, and b) 50 algorithmically created geometric shapes for each of five different classes. These images were characterised by standard image features, RQA features and the newly proposed ESQ features. A Linear Discriminant Analysis (LDA) was used to determine discriminative success between the classes of plankton organisms or geometric shapes, respectively. The discriminative success was compared between a model using standard features only and models that additionally included RQA and ESQ features. For the plankton contour line data set, with its high intra- and low inter-class variance, the included features enhanced discriminative success by 3 % to a maximum of 65.8 %. For the data set of geometric shapes an increase of 6.8 % to 95.2 % was observed. Although the overall increase of discriminative success with a linear model was not extraordinarily high, both RQA and ESQ are valuable auxiliary features to split specific classes from the entire population. Thus, they may also be valuable for methods mapping the finite dimensional feature space into higher dimensional spaces (e.g. kernel trick, Support Vector Machines).

Background

Time series are sequences of metered values. Such readings generally have a natural chronology, are non-circular and exhibit a defined start and end for the recorded time interval. Typical examples of time series are e.g. tidal signals, meteorological observations, stock exchange quotations or cardiograms. Tools for the investigation of time series include a large portfolio of forecasting, estimating or classifying methods and the identification of dependencies, harmonic anomalies or recurrences.

In particular, the identification of recurrences reveals whether the current state of a dynamic system retraces previously observed states. Eckmann et al. [1] introduced a visual method to investigate such recurrences. The respective tool is the recurrence plot (RP). It uses the time delay embedding theorem (DET, [2]) to display previously encountered states in a phase space. An advantage of RPs using DET is that they identify not only exact matches but also approximations to the compared template structure within a given precision. Thus, an RP identifies sections of phase space trajectories that converge. The recurrence quantification analysis (RQA) comprises a set of heuristically developed methods to derive numerical characterisations of the complexity of an RP and its small-scale features (e.g. [3–5]). Here we first investigate the use of RP and RQA for automated image discrimination and apply it to the very different field of marine plankton data.

For a wide range of marine investigations it is important to chart distribution, abundance and diversity of major plankton groups and suspended material. Traditional methods include sampling the water column by nets of fine gauze and defined mouth opening. Skilled taxonomists determine and enumerate biota from aliquots under stereo-microscopes. The human eye easily gives a first taxonomic impression based on shape and habitus of an organism. Manual arrangement to best see specific morphological characteristics (e.g. bristles, setae or body appendages) further allows a more precise taxonomic identification. Even if an object cannot be determined to species level a superordinate taxonomic group membership can be assigned; often sufficient for the scientific question at hand.

During the last decades various in-situ plankton imaging systems were developed. Today most of these devices are capable of imaging tiny organisms or particles in sufficient detail for further analyses (e.g. [6]). Although accurate species identification often fails, the major taxonomic group membership can generally be determined. Thus, these approaches add new opportunities to net sampling and have proven to be valuable tools (e.g. [7]). They can partially substitute labour- and cost-intensive net analyses and continuously map fine-scale distributions of dispersed objects in the water column.

However, the sheer amount of images and data confronts researchers with new challenges. In contrast to net sampling, in-situ systems deliver two-dimensional still images which represent information of incident light scattered from imaged objects at arbitrary angles and spatial alignment. Although alignment can be partially controlled by the fluidic design of the sampling chambers, object appearances are still highly variable (e.g. clinging or abducted antennae and body appendages). Fully utilising the advantages of in-situ plankton imaging systems requires sophisticated machine vision approaches that help researchers to handle the flood of information. For this, automatic image feature extraction and classification methods are required that are capable of assigning major group memberships in a way comparable to a human taxonomist.

A variety of algorithms are available to extract numerical features from 2D images and their silhouettes. Standard methods are moments derived from pattern intensity variations, colour information and geometric parameters, like roundness, compactness or elliptical shape equivalents. More sophisticated methods investigate contour lines by Fourier descriptors (e.g. [8]), characteristic inflexions (e.g. [9]) or identification of points of interest in scale space (e.g. [10, 11]).

Although such features are generally invariant to scale, rotation and translation, downstream classification systems often lack high discriminatory power for plankton specimens (e.g. [12]). An important factor is the high multivariate intra-class heteroscedasticity. This high variability is a general challenge when compiling feature sets considering contour lines of plankton species. Depending on illumination, resolution, contrast and orientation the outer contour and tissues appear highly variable. This arises from on-site illumination variations and from the flexibility and agility of body parts and appendages. Thus, the predictability of the contour line’s curve progression is comparable to that of dynamic systems.

Here we present an approach to apply the recurrence plot method on circular contour line data by using a modified embedding, where the contour line data are augmented by recycled elements. The resulting RP is the basis to get a first glimpse about usefulness of RQA scalars as features for automated classification systems. For comparison we used two different image sets. The first set is composed of geometric forms, while the second is compiled from images of plankton specimens and marine snow taken under arbitrary angles and showing high morphological variability.

Methods

Images

Geometric shapes

Two sets of images were used. The first is a generic set of algorithmically created geometric shapes. This data set includes 50 shapes from each of five classes: circles, ellipses, squares, rectangles and triangles (Appendix A: Fig. 4). To minimise the impact of the contour line length the shapes were chosen to have a comparable intra-group perimeter (mean 140.57, SD 1.13).

Plankton images

The second image set contains 1023 images from 21 groups (Table 1). These 21 groups were finally aggregated into 9 higher-level classes, which mainly represent taxonomic or morphological plankton groups and marine snow. This data set is published and freely accessible via the Pangaea data publisher system [13].

Table 1 Taxonomic class sizes used in the analyses

Images were sampled with the Lightframe On-sight Keyspecies Investigation (LOKI) system [6]. The advantage of the LOKI sampling design is the high contrast imaging of minute objects at high magnifications (here ~15 μm per pixel) at very short shutter times (<30 μs) in a physically constrained volume that is transparent before and behind the depth of field. Thus, the system delivers bright and detailed images of taxa that are often destroyed during traditional net sampling. Images were manually classified by declared experts of the respective plankton taxon. The images were taken from a larger subset sampled during an earlier expedition off the coast of Peru (cf. [14]) and represent major plankton classes of the on-site community.

Standard image features

The 8 image features, hereafter referred to as STANDARD (Table 2), were extracted by using the MATLAB functions ‘regionprops’ and ‘graycoprops’. For more detailed information see the MATLAB documentation.

  • Area: Number of pixels within the object’s contour line.

  • Compactness: Quantified by the inverse Patton Shape Index [15], which compares the perimeter of the shape to the perimeter of a standard shape. An index of 1 equals a perfect circle.

  • Contrast: Intensity contrast between neighbouring pixels (zero for constant images).

  • Eccentricity: Eccentricity of an ellipse corresponding best to the object shape.

  • Hu Moments: The seven moment invariants of the object [16], calculated using a script by Gonzalez et al. [17]. In the following only the first Hu moment was used, as higher moments sometimes caused collinearities in the subsequent analyses.

  • Homogeneity: Closeness of the distribution of elements in the normalised grey-level co-occurrence matrix to its diagonal (one for a diagonal matrix).

  • Perimeter: The perimeter of the organism’s shape in pixels.

  • Solidity: Quotient of the number of pixels within the object contour line and the number of pixels in the respective convex hull.
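As a worked illustration of the Compactness feature, the inverse Patton Shape Index can be written as follows. This assumes that the standard shape is a circle of equal area, with A the object area and P its perimeter; the symbols C, A, P, r and a are introduced here for illustration only and do not appear in the original text.

$$ C=\frac{2\sqrt{\pi A}}{P} $$

For a circle of radius r, A = πr² and P = 2πr, so C = 1. For a square of side a, C = 2a√π / (4a) = √π / 2 ≈ 0.886, i.e. less compact than a circle.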

Table 2 Categories of numerical features extracted for each image

Contour line extraction and measurement

For each imaged object the coordinates of the mass centroid are calculated. Additionally, the finite contour outline of the organism is determined. The contour line is a list of length l giving the coordinates of the points at the organism’s outer boundary (Fig. 1a). From the centroid the distance to each point with index i of the contour line is calculated clockwise according to a pre-defined norm. In the following the Euclidean norm was used. Values are tabulated in a list u (Eq. 1) and normalised to 1:

Fig. 1

Schematic workflow. a) Extraction of the outer contour line of the object (red line). The cyan dot indicates the mass centroid. b) For each point of the red contour line the distance to the centroid is measured according to a predefined norm and normalised. The greatest distance is stored as first element in a list u (1). All other distances u (i) are enumerated clockwise from this starting point (blue line). The red line is the distance list augmented by (m-1)*t elements, recycling the beginning of u. Parameters m and t are given by the subsequent embedding. c) From list u (i) a set of m dimensional vectors is derived, each having m elements of u with an equidistant spacing of t. The chronology of u (i) is embedded in v (i). d) A phase space trajectory in m dimensional space can be constructed from v (here shown for an example with m = 3). Numbers attached to some points of the phase space trajectory refer to index i of the original contour line. e) For each point i of the phase space trajectory the distance to any other point j is measured and tabulated. This can be plotted as a colour heat map. f) In a later step it is checked whether the respective distance is greater than a given threshold ε (Heaviside operator). The result is tabulated as a square, symmetric and binary matrix, the recurrence plot (RP). White dots indicate that the distance between v (i) and v (j) is greater than ε. On the main diagonal points are compared against themselves. Thus, the distance is always zero. From the RP a number of numerical features are derived in the subsequent recurrence quantification analysis (RQA, refer to the text). g) The enclosed white coherent areas within an RP have been termed “eyes”. Due to the circular data structure and the above-mentioned augmentation, the truncated eyes along the borders need to be interpreted as connected structures on the opposite sides of the plot. This is displayed by matching colours of associated eyes. This plot serves as a basis for the eye structure quantification (ESQ, see text).

$$ u(i);0<i\le l $$
(1)

The basis for the recurrence quantification analyses thus is a list of distances u, from each contour line point to the centroid. The list is shifted in a way that the first index u (1) represents the maximum distance found; increasing indices clockwise enumerate the subsequent distances (Fig. 1b).
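The construction of the distance list u can be sketched as follows. This is a minimal Python illustration and not the authors' MATLAB implementation. The function name `centroid_distance_list` is hypothetical, the contour is assumed to be already ordered clockwise, and the centroid is approximated by the mean of the contour points, whereas the paper uses the mass centroid of the segmented object.

```python
import math

def centroid_distance_list(contour):
    """Return the normalised, shifted distance list u for a closed contour.

    contour: list of (x, y) tuples, assumed to be ordered clockwise.
    """
    n = len(contour)
    # Assumption: centroid approximated by the mean of the contour points.
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    # Euclidean distance from the centroid to every contour point.
    dist = [math.hypot(x - cx, y - cy) for x, y in contour]
    d_max = max(dist)
    if d_max == 0:
        raise ValueError("degenerate contour: all points coincide")
    start = dist.index(d_max)              # first occurrence of the maximum
    u = [d / d_max for d in dist]          # normalise so that max(u) == 1
    return u[start:] + u[:start]           # rotate so that u[0] is the maximum
```

With this rotation the first list element corresponds to u (1) in Eq. 1, and the remaining elements follow in contour order.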

Embedding

Using the embedding theorem [2] a phase space trajectory in dimension m with m > 1 is created from u. Therefore m values from u are used to create a new vector v of dimension m representing the points of the phase space trajectory. Values used from u are chosen to have equidistant spacing t. As mentioned before the contour line data, in contrast to a time series, represents a circular structure. Therefore the first (m-1)*t elements of u need to be recycled and added to the end of the list u. The length of u becomes l + (m-1)*t. In case of (m-1)*t > l the elements of u need to be re-recycled. This results in a set of vectors v defining the points of the phase space trajectory (Equation 2):

$$ v(i)=\left(\begin{array}{c}u(i)\\ u\left(i+t\right)\\ \vdots \\ u\left(i+\left(m-1\right)*t\right)\end{array}\right);\quad 0<i\le l $$
(2)

Dimension m and time delay t have to be chosen properly prior to analysis. To investigate their impact several tests were performed beforehand for 1 < m ≤ 10 and 1 ≤ t ≤ 10. For the examples given in this paper m = 6; t = 6; ε = 3.0 was used. Sample plots for various parameter combinations are given in Appendix B: Figs. 5, 6 and 7.
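The circular embedding can be sketched in a few lines of Python. The function name `circular_embed` is hypothetical, and indices are zero-based here, whereas Eq. 2 counts from 1. Instead of physically appending the first (m-1)*t elements to u, the sketch uses modular indexing, which gives the same vectors and also covers the re-recycling case (m-1)*t > l.

```python
def circular_embed(u, m, t):
    """Build l embedding vectors of dimension m with delay t from circular data u."""
    l = len(u)
    # u[(i + k*t) % l] wraps around the end of the list, which is equivalent
    # to recycling the beginning of u as described in the text.
    return [[u[(i + k * t) % l] for k in range(m)] for i in range(l)]
```

For example, `circular_embed(u, 6, 6)` would correspond to the parameter choice m = 6, t = 6 used for the examples in this paper.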

RP - Recurrence plot

For each investigated object a matrix R is calculated from the phase space trajectory (Fig. 1d). For each element R (i,j) the norm ǁ · ǁ between the vectors v (i) and v (j) is calculated (Eq. 3). For the results presented in this paper the Euclidean norm was chosen. Before thresholding, R is an l × l square and symmetric matrix that can be displayed as a false colour heat map representing the distances between all points of the phase space trajectory according to the used norm (Fig. 1). For downstream processing the Heaviside step function Θ (·) is applied to identify those distances between phase space trajectory points that fall below a predefined threshold ε. Thus, the recurrence plot is defined as a matrix of binary values given by:

$$ R\left(i,j\right)=\varTheta \left(\varepsilon -\parallel v(i)-v(j)\parallel \right);1\le i\le l\; and\;1\le j\le l $$
(3)

Consequently, each entry on the main diagonal of such a recurrence plot compares a point with itself, so the underlying distance is always 0. Once the Heaviside step function has been applied, all off-diagonal non-zero entries of R indicate pairs of phase space points that are closer than ε and whose separation along the contour line is |i − j|.

Side diagonals parallel to the main diagonal indicate that segments of the contour line are similar in phase space. The length of the similar segment corresponds to the extent of the diagonal along the axis, measured as distance on the contour line. Among the diagonal structures, coherent areas exceeding ε (named “eyes”) can be found (Fig. 1e-f). These patterns within an RP represent major characteristics and are investigated numerically in detail.
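The thresholding step of Eq. 3 can be illustrated with a short Python sketch. The function names are hypothetical, and the convention Θ(0) = 1 (a distance exactly equal to ε counts as recurrent) is an assumption, since the text does not state how ties are handled.

```python
import math

def distance_matrix(v):
    """Pairwise Euclidean distances between all embedding vectors."""
    l = len(v)
    return [[math.dist(v[i], v[j]) for j in range(l)] for i in range(l)]

def recurrence_plot(v, eps):
    """Binary matrix R[i][j] = Theta(eps - ||v(i) - v(j)||), cf. Eq. 3.

    Assumption: Theta(0) is taken as 1, so distances equal to eps are recurrent.
    """
    return [[1 if d <= eps else 0 for d in row] for row in distance_matrix(v)]
```

With Eq. 3 evaluated in this way, the main diagonal entries are 1, because the distance of a point to itself is 0 and therefore below any positive ε.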

RQA - Recurrence quantification analysis

Parameters of the Recurrence Quantification Analysis (RQA, Table 2) were obtained using the Cross Recurrence Plot Toolbox [5, 18]. Values passed in the function call are the embedding vectors v (i), dimension m, time delay t, size of neighbourhood ε and the norm to be used (Euclidean). A total of 13 features were extracted from each RP (Table 2). Details are given in [3–5, 19] or [20]:

  • Clustering coefficient gives the degree to which points of the phase space trajectory tend to cluster.

  • Determinism gives the proportion of recurrent points forming diagonals.

  • Entropy diagonal length gives the Shannon entropy of the probability distribution of the lengths of the diagonals.

  • Laminarity gives the amount of recurrence points forming vertical structures.

  • Longest diagonal length gives the counted length of the longest diagonal.

  • Longest vertical length gives the counted length of the longest vertical.

  • Mean diagonal length gives the average length of the diagonal structures.

  • Recurrence period density gives the periodicity of the signal in the RP.

  • Recurrence rate gives the density of observed recurrence points in the RP.

  • Recurrence times give an estimation of the periodicity in the RP signal.

  • Transitivity gives the probability that two points of the phase space trajectory neighbouring a third are also directly connected.

  • Trapping time gives the average length of the vertical structures.
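To make two of these measures concrete, the following Python sketch computes the recurrence rate and the determinism from a binary, symmetric matrix R. It is an illustration only and not the Cross Recurrence Plot Toolbox code: the function names are hypothetical, the main diagonal is excluded by assumption, the minimum diagonal length of 2 is an assumed default, and diagonals are not continued across the plot borders.

```python
def diagonal_line_lengths(R):
    """Lengths of diagonal runs of 1s above the main diagonal of R.

    Because R is assumed symmetric, the lower triangle mirrors these runs.
    """
    n = len(R)
    lengths = []
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            if R[i][i + offset]:
                run += 1
            else:
                if run:
                    lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths

def recurrence_rate(R):
    """Density of recurrence points among all off-diagonal entries."""
    n = len(R)
    ones = sum(R[i][j] for i in range(n) for j in range(n) if i != j)
    return ones / (n * (n - 1))

def determinism(R, l_min=2):
    """Share of off-diagonal recurrence points on diagonals of length >= l_min."""
    lengths = diagonal_line_lengths(R)
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(x for x in lengths if x >= l_min) / total
```

A toolbox implementation may use different conventions, so values from this sketch need not match the toolbox output exactly.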

ESQ - Eye structure quantification

From the recurrence plot matrix R additional features were derived. In the Eye Structure Quantification (ESQ) the distribution and size of enclosed structures, the so-called ‘eyes’, were measured. Due to the circular structure of an organism’s contour line, opposite sides of the RP need to be interpreted as connected structures. Thus, eyes truncated at the borders of R have to be associated with their counterpart on the opposite side prior to evaluation (Fig. 1g). After identification of associated eyes, the total number of eyes, the mean number of pixels per eye (i.e. mean eye size), the median of the numbers of pixels per eye, and the total number of pixels in all eyes were determined. Increasing eye numbers generally indicate that a high number of independent feature recurrences in phase space are found. These are often associated with repetitive morphological structures of the object, like polychaete parapodia, silica spicules or regular diatom frustule indentations.
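One possible way to associate truncated eyes with their counterparts is to label connected regions on a matrix whose borders wrap around. The Python sketch below follows this idea. It is an assumption about how ESQ could be implemented, not the authors' code: eyes are taken to be regions of zeros in the binary matrix R, 4-connectivity is assumed, and the names `eye_sizes` and `esq_features` are hypothetical.

```python
from collections import deque
from statistics import mean, median

def eye_sizes(R):
    """Pixel counts of 4-connected regions of zeros in R with wrap-around borders."""
    n = len(R)
    seen = [[False] * n for _ in range(n)]
    sizes = []
    for i in range(n):
        for j in range(n):
            if R[i][j] or seen[i][j]:
                continue
            # Breadth-first flood fill starting from an unvisited zero entry.
            seen[i][j] = True
            queue = deque([(i, j)])
            size = 0
            while queue:
                a, b = queue.popleft()
                size += 1
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    x, y = (a + da) % n, (b + db) % n  # opposite borders connect
                    if not R[x][y] and not seen[x][y]:
                        seen[x][y] = True
                        queue.append((x, y))
            sizes.append(size)
    return sizes

def esq_features(R):
    """Total eye count, mean and median eye size, and total eye pixels."""
    sizes = eye_sizes(R)
    if not sizes:
        return {"n_eyes": 0, "mean_size": 0.0, "median_size": 0.0, "total_pixels": 0}
    return {"n_eyes": len(sizes), "mean_size": mean(sizes),
            "median_size": median(sizes), "total_pixels": sum(sizes)}
```

The modulo operation on the neighbour indices is what joins an eye cut off at one border with its continuation on the opposite side.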

LDA - Linear discriminant analysis

A Linear Discriminant Analysis (LDA, [21–23]) was used as classifier. The LDA model was built with the training data set (geometric shapes or plankton images) and tested against itself to investigate the role of the included features. An individual LDA was run for each of the 4 feature combinations (Table 3) and both image sets.

Table 3 Feature combinations used for the different LDA models

LDA results evaluated in this paper are:
  • Coefficients of linear discriminant roots. These values represent the loadings and thus, importance of the individual features during discrimination.

  • Proportions of trace. These values give the variance explained by the respective root. As explained variance decreases with each successive root, we give just the first roots in this paper, although for some LDAs more roots could be given (the number of roots equals the number of classes minus one or the number of included features, whichever is lower).

  • Confusion matrices. They show the rate of true positive and false positive classifications.

  • From the coefficients of linear discriminants, the most important features were identified that best separate objects by the respective root. A feature was considered to be important when its loading reached at least 10 % of the maximum feature loading on either side of a root’s spanned hyperplane.

  • Canonical scores. The scores of the individual objects were plotted to visualise the discriminative success among object classes for the respective roots.
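Two of the evaluation steps listed above can be sketched in Python. The 10 % rule is read here as being applied separately to positive and negative loadings of one root, which is an interpretation of "either side of a root’s spanned hyperplane" and not confirmed by the text. Function names and the example values in the usage note are hypothetical.

```python
def important_features(loadings, fraction=0.10):
    """Select features whose loading reaches `fraction` of the largest loading
    on the same side (positive or negative) of one discriminant root.

    loadings: dict mapping feature name -> coefficient on that root.
    """
    positive = {k: w for k, w in loadings.items() if w > 0}
    negative = {k: -w for k, w in loadings.items() if w < 0}
    selected = []
    for side in (positive, negative):
        if side:
            limit = fraction * max(side.values())
            selected += [k for k, w in side.items() if w >= limit]
    return sorted(selected)

def discriminative_success(confusion):
    """Share of correctly classified objects: diagonal sum over total count."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total
```

For illustration, with made-up loadings `{"Hu1": 8.0, "Area": 0.5, "Solidity": 1.0, "Laminarity": -4.0, "Contrast": -0.1}` the sketch would keep Hu1, Laminarity and Solidity for that root.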

Computational work

Image processing and feature extraction (RQA, ESQ) were performed in Matlab (MathWorks, 2013, v8.1.0.604). The LDA models were implemented in R (www.r-project.org), using the additional package MASS.

Results

LDA - Linear Discriminant Analysis

Geometric shapes

Standard

The first LDA included the STANDARD parameters. The first discriminant root (LD1) explained 72.58 % of the observed variance, while the second (LD2) explained an additional 26.02 % (Table 4). LD3 and LD4 are of less importance, as their cumulative impact is less than 1.5 %. Parameters like Area and Contrast evidently have the least impact on discriminating geometric structures. The confusion matrix shows that 88.4 % of the geometric shapes were classified correctly (Table 5). In the canonical plot (Fig. 2a) rectangles and circles show a clear clustering tendency, while other geometric shape categories show much higher dispersal.

Table 4 LDA Geometric shapes
Table 5 LDA Geometric shapes
Fig. 2

LDA Geometric shapes. Canonical plot of the linear discriminants. The parameters used for the analyses were a) STANDARD, b) RQA, c) STANDARD & RQA and d) STANDARD & RQA & ESQ

RQA

The second LDA included the RQA toolbox parameters, where LD1 explains 73.42 % of the observed variance and LD2 explains 14.13 % (Table 6). The cumulative explanatory power of LD3 and LD4 still comprises approximately 12.5 %. The confusion matrix shows a total discrimination success of 83.6 % (Table 5). The canonical plot shows that rectangles and circles separate from other categories (Fig. 2b), but inter-class discrimination is lower compared to STANDARD. The three other classes separate well, but show a higher dispersal on both roots.

Table 6 LDA Geometric shapes

Standard & RQA

The third LDA included both the STANDARD and RQA parameters. LD1 explains 59.82 % of the observed variance, while LD2 contributes a value as high as 31.63 % (Table 7). Again LD3 and LD4 have negligible explanatory power. The confusion matrix of the model shows a discrimination success of 95.6 % (Table 5). In the canonical plot the classes show clearly separable clusters (Fig. 2c).

Table 7 LDA Geometric shapes

Standard, RQA & ESQ

The fourth LDA included the STANDARD, the RQA and the newly developed ESQ parameters. Again the first two roots show the highest proportions of trace (Table 8), with LD1 explaining 62.32 % of the observed variance and LD2 explaining 29.84 %. The confusion matrix shows a discrimination success of 95.2 %. In the canonical plot the classes again show clearly separable clusters (Fig. 2d). Although some minor differences are observable, the result is comparable to the previous STANDARD & RQA setup.

Table 8 LDA Geometric shapes

Plankton images

Standard

The first discriminant root (LD1) explained 51.93 % of the observed variance, while the second (LD2) contributed with 28.97 % (Table 9). Cumulated proportions of trace of LD3 and LD4 explain less than 16 %. The confusion matrix (Table 10) shows a total discrimination success of 62.8 %. The canonical plot (Fig. 3a) reveals good separation between some classes. Dinoflagellata and Bacillariophyceae separate well from Appendicularia, Vertebrates and Cnidarians. The majority of Crustacea, Annelida, and Mollusca overlap largely with Marine snow.

Table 9 LDA Plankton images
Table 10 LDA Plankton images
Fig. 3

Plankton images. Canonical plot of the linear discriminants. a) STANDARD, b) RQA, c) STANDARD & RQA, and d) STANDARD & RQA & ESQ

RQA

The first discriminant root (LD1) explained 60.66 % of the observed variance, while the second (LD2) contributed an additional 19.77 % (Table 11). LD3 and LD4 together contributed less than 16 %. The confusion matrix (Table 10) shows a total discrimination success of 55.0 %. As in the previous LDA, centroids of Bacillariophyceae and Dinoflagellata separate from the majority of objects in the canonical plot (Fig. 3b). The same is observed for Appendicularia and Vertebrata, although separation on LD2 is more pronounced than in the previous plot on LD1.

Table 11 LDA Plankton images

Standard & RQA

The first discriminant root (LD1) explained 41.58 % of the observed variance, while the second root (LD2) contributed with 31.35 % (Table 12). The cumulative explanatory power of LD3 and LD4 was 20.78 %. The confusion matrix shows a total discriminative success of 66.1 %. In the canonical plot (Fig. 3c) it can be seen that centroids of the formerly identified classes (Bacillariophyceae, Dinoflagellata, Appendicularia and Vertebrata) again separate but are now more spread out in the LD1/LD2 plane, allowing better discrimination.

Table 12 LDA Plankton images

Standard, RQA & ESQ

The first discriminant root (LD1) explained 40.15 % of the observed variance, while the second (LD2) explained 32.18 % (Table 13). Roots LD3 and LD4 contributed with a cumulative observed variance of 20.88 %. In the confusion matrix the total discriminative success is found to be 65.8 % (Table 10). The canonical plot (Fig. 3d) is comparable to the previous LDA (STANDARD & RQA), but shows a slight shift of Vertebrata, better separating from the remaining classes.

Table 13 LDA Plankton images

Importance of the image features

Geometric shapes

The features with highest loadings for LDA image classification of the geometric shapes are listed in Table 14. The most frequently occurring features are Transitivity, Determinism, Hu1, Laminarity, Recurrence rate. Less frequent are Compact, Clustering coefficient, Solidity, while Eccentricity, Homogeneity, Recurrence time 1 and Recurrence period density rarely contribute with high loadings to the LDA.

Table 14 LDA Geometric shapes

Plankton Images

As observed for the set of geometric shapes, a few key features can be identified which frequently contribute with high loadings to the LDA (Table 15). For the plankton images these are Laminarity, Hu1 and Determinism, followed by Homogeneity and Transitivity. Compactness, Clustering coefficient, Recurrence rate, Eccentricity, Entropy diagonal length and Recurrence period density were seldom observed.

Table 15 LDA Plankton images

Discussion

Method

The first principal task of this study was to apply the well-established methods of Recurrence Plots (RP) and Recurrence Quantification Analysis (RQA) in the new context of circular contour line data of an imaged object’s outer hull. To set up the circular contour line data for the proposed methods, each point of the contour line was enumerated and its distance to an arbitrary point was calculated. This arbitrary reference point was static. In contrast to traditional RP and RQA investigations, we augmented the contour line distance data during the embedding process. Thus, the distance data are recycled to allow creating a number of embedding vectors equal to the number of contour line points. As a result, opposite sides of the RP wrap around. This allowed the introduction of the eye structure quantification (ESQ).

Image discrimination

The second principal task was to perform an initial test of these methods on both real-life plankton data of high contour line variability and a synthetic sample data set with similar intra-class structure and symmetry.

The multivariate analyses revealed that neither RQA nor RQA & ESQ are well suited as exclusive features for the classification task at hand. Nevertheless, used in combination with the STANDARD features, they increased discrimination success.

An important feature of the STANDARD feature class was the Hu1 moment, which is invariant to scale, translation and rotation. Therefore it is able to describe the characteristic shape of an organism irrespective of camera rotation in plane view or magnification. One of the key features of the RQA was the Recurrence rate, which simply gives the density of observed recurrences indicating the degree to which the organism’s contour line exhibits repetitions of similar structures (e.g. polychaete parapodia). It is thus a measure of the structural regularity of the organism. The key RQA features Laminarity and Determinism focus on the vertical and diagonal structures. These two features have been shown before to be some of the most characteristic properties of an RP (details on RQA and how to read an RP can be found in [20]). They characterise diagonals and vertical lines and thus the length and type of contour line segment similarity. The key feature Transitivity further gives a probability for the phase space neighbourhood situation.

As successive roots explain less of the observed variance, the general discrimination success is often assessed by plotting the scores of the first roots (Figs. 2 and 3). Within the plots, better clustering of objects of the same class and better separation among classes mean a higher discrimination success. The ability to discriminate between classes of similar shape structure can be improved by using RQA parameters. There is also an indication that the use of ESQ can further improve discrimination between objects of different size classes and regularity (e.g. Appendicularia and Vertebrata vs. Crustacea), but does not improve general classification. However, these improvements can be used to separate at least 1–2 classes from the entire population. After excluding the identified classes, a downstream model with fewer classes allows discrimination to be improved during the next iterations.

Study design

This study is a first conceptual approach to introduce and test the general usability of RQA and ESQ feature sets for image classification. In the overview presented here, some pre-tests and verifications (e.g. [24]) have been intentionally neglected and the approach was directly applied to a highly diverse plankton set. Criticisms may include that objects were analysed using a ‘one-fits-all’ embedding approach and that the analysed diagonal/vertical line length histograms for several features included lengths as low as 2 recurrence points. Nevertheless, it was found that classification systems can benefit from the use of RQA features. Thus, this paper primarily sketches out the method and gives first examples of how to use it. We assume that the ESQ features gain higher importance with decreasing neighbourhood threshold ε. Thus, future work needs to focus on avoidance of potential problems and consideration of specific adaptations. In detail, it seems appropriate to use recurrence analyses with RQA and ESQ specifically in tailored models, to first split distinct classes from the image population. In succeeding steps, better customised RQA and ESQ with adjusted values for m, t and ε can then be used.

It is also obvious that some of the included features, especially those that characterise textural properties, are barely sufficient for proper discrimination of the geometrical line art shapes. Accordingly, the parameter Contrast showed negligible loadings (Tables 4, 6 and 7). However, to date these features are important in automated plankton discrimination and often appear to be among the most important ones [12].

It is also clear that Linear Discriminant Analysis is not the most powerful classification system available for such multivariate data. As an LDA tries to insert separating hyperplanes in a space whose dimension is defined by the number of given variables, linear classifications often fail. Especially for low inter- and high intra-class variances, as generally expected for in-situ plankton images, it is recommended to apply methods for mapping input features into higher dimensional spaces, using the kernel trick (e.g. Support Vector Machines). However, the advantage of an LDA is the simple access to and interpretation of the feature loadings and thus an initial assessment of the importance of the different variables.

Conclusions

It could be shown that the principle of recurrence plots and subsequent analyses can be applied to contour line data of imaged and pre-segmented objects. The tailored embedding algorithm enabled our application to derive new image features for automated classification systems of plankton organisms. Additionally, a new set of features was derived by measurement of contiguous elements of given phase space dissimilarity (eye structures in the recurrence plots).

The discriminative success of the LDA was enhanced by using a combination of standard image features, recurrence quantification analysis features and the newly proposed ESQ features. This improvement was observed both for the synthetic data set of geometric shapes and for the real-world plankton images. The characterisation of images by recurrence quantification analysis and eye structure quantification offers auxiliary image features that cannot be derived from standard image features alone. We recommend using the standard features in combination with the features derived from recurrence analysis to discriminate between classes of plankton. With further improvements, this class of methods may advance automated plankton identification, which represents an important step forward in the effective processing of large numbers of underwater images and in autonomous monitoring stations.

References

  1. Eckmann, J.-P., Kamphorst, S.O., Ruelle, D.: Recurrence Plots of Dynamical Systems. Europhys. Lett. 4, 973–977 (1987)

  2. Takens, F.: Detecting strange attractors in turbulence. In: Rand, D., Young, L.-S. (eds.) Lecture Notes in Mathematics, pp. 366–381. Springer, Warwick (1981). doi://10.1007/BFb0091924

  3. Zbilut, J.P., Webber, C.L.: Embeddings and delays as derived from quantification of recurrence plots. Physics Letters A 171, 199–203 (1992). doi://10.1016/0375-9601(92)90426-M

  4. Webber, C.L., Zbilut, J.P.: Dynamical assessment of physiological systems and states using recurrence plot strategies. J. Appl. Physiol. 76, 965–973 (1994)

  5. Marwan, N., Wessel, N., Meyerfeldt, U., Schirdewan, A., Kurths, J.: Recurrence-plot-based measures of complexity and their application to heart-rate-variability data. Physical Review E 66, 026702 (2002). doi://10.1103/PhysRevE.66.026702

  6. Schulz, J., Barz, K., Ayon, P., Lüdtke, A., Zielinski, O., Mengedoht, D., Hirche, H.-J.: Imaging of plankton specimens with the Lightframe On-sight Keyspecies Investigation (LOKI) system. J. Eur. Opt. Soc. Rapid Publ. 5, 10017s (2010)

  7. MacLeod, N., Benfield, M.C., Culverhouse, P.F.: Time to automate identification. Nature 467, 154–155 (2010)

  8. Persoon, E., Fu, K.S.: Shape discrimination using Fourier descriptors. IEEE Trans. Syst. Man Cybern. 7, 170–179 (1977)

  9. Mokhtarian, F.: Silhouette-Based Isolated Object Recognition through Curvature Scale Space. IEEE Trans. Pattern Anal. Mach. Intell. 17, 539–544 (1995)

  10. Lowe, D.G.: Object recognition from local scale-invariant features. Proceedings of the International Conference on Computer Vision 2, 1150–1157 (1999). doi://10.1109/ICCV.1999.790410

  11. Bay, H., Tuytelaars, T., Van Gool, L.: SURF – Speeded Up Robust Features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) Computer Vision – ECCV 2006, pp. 404–417. Springer Verlag, Berlin Heidelberg (2006)

  12. Hu, Q.: Application of statistical learning theory to plankton image analysis. PhD Thesis, Massachusetts Institute of Technology and Woods Hole Oceanographic Institution, Supervisors: Cabell S. Davis and Hanumant Singh (2006)

  13. Schulz, J., Barz, K., Ayon, P., Hirche, H.-J.: A sample data set of plankton and particles for automated image classification systems sampled off the Peruvian coast. Pangaea Data Publisher System for Earth & Environmental Science, Registration in progress.

  14. Hirche, H.-J., Barz, K., Ayón, P., Schulz, J.: High resolution vertical distribution of the copepod Calanus chilensis in relation to the shallow oxygen minimum zone off northern Peru using LOKI, a new plankton imaging system. Deep Sea Research Part I 88, 63–73 (2014). doi://10.1016/j.dsr.2014.03.00

  15. Patton, D.R.: A Diversity Index for Quantifying Habitat Edge. Wildl. Soc. Bull. 3, 171–173 (1975)

  16. Hu, M.K.: Visual Pattern Recognition by Moment Invariants. IRE Trans. Inf. Theory IT-8, 179–187 (1962)

  17. Gonzalez, R.C., Woods, R.E., Eddins, S.L.: Script: invmoments. In: Digital Image Processing Using MATLAB, Revision: 1.5, Date: 2003/11/21 14:39:19. Prentice-Hall (2004)

  18. Marwan, N.: Cross Recurrence Plot Toolbox for Matlab, Reference Manual. Version 5.17, Release 28.16. http://tocsy.pik-potsdam.de/CRPtoolbox/

  19. Gao, J., Cai, H.: On the structures and quantification of recurrence plots. Physics Letters A 270, 75–87 (2000). doi://10.1016/S0375-9601(00)00304-2

  20. Webber, C.L., Marwan, N.: Recurrence Quantification Analysis. Springer International Publishing (2015)

  21. Fisher, R.A.: The utilization of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179–188 (1936)

  22. Jennrich, R.I.: Stepwise regression. In: Enslein, K., Ralston, A., Wilf, H.S. (eds.) Statistical Methods for Digital Computers. Wiley, New York (1977)

  23. Jennrich, R.I.: Stepwise discriminant analysis. In: Enslein, K., Ralston, A., Wilf, H.S. (eds.) Statistical Methods for Digital Computers. Wiley, New York (1977)

  24. Marwan, N.: How to avoid potential pitfalls in recurrence plot based data analysis. Int. J. Bifurcation Chaos 21, 1003–1017 (2011)

Author information

Corresponding author

Correspondence to Jan Schulz.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JS: Idea, preparation of the manuscript, general programming and statistical evaluation in R. AM: Code implementation in Matlab and R, general analysis, preparation of data sets and general contribution to the manuscript. OZ: General contribution to the manuscript. All authors read and approved the final manuscript.

Appendices

Appendix A

Fig. 4

The set of geometric forms used for the analyses

Appendix B

In the following, recurrence plot panels are shown for different values of m, t and ε. On the left, one sample organism is shown for each of the nine taxonomic/morphologic classes (Appendicularia, Annelida, Bacillariophyceae, Cnidaria, Crustacea, Dinoflagellata, Marine snow, Mollusca, and Vertebrata). The red line marks the extracted contour line of the organism. The cyan dot inside the imaged object area indicates the object’s centroid. The yellow star marks the first entry of the contour line, which is the contour point with the greatest distance to the centroid. The remaining columns show the recurrence plots for varying recurrence parameters.
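The choice of the first contour entry described above can be sketched as follows in Python. The function reorders a closed contour so that it starts at the point farthest from a given centroid. The function name, the example coordinates and the tie-breaking rule (the first farthest point wins) are assumptions for illustration; how the centroid itself is computed is left to the caller.

```python
import math

def start_at_farthest_point(contour, centroid):
    """Rotate a closed contour, given as a list of (x, y) points in
    traversal order, so that it begins at the point with the greatest
    Euclidean distance to the centroid."""
    cx, cy = centroid
    dists = [math.hypot(x - cx, y - cy) for x, y in contour]
    start = max(range(len(contour)), key=dists.__getitem__)
    return contour[start:] + contour[:start]

# Hypothetical contour with one pronounced tip at (5, 1).
contour = [(0, 0), (2, 0), (5, 1), (2, 2), (0, 2)]
ordered = start_at_farthest_point(contour, centroid=(1, 1))
print(ordered[0])
```

Starting every contour at a comparable landmark gives the derived series a defined beginning, so that recurrence plots of different objects are aligned in the same way.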

Fig. 5

Recurrence plots for m = 6, t = 6 and varying ε

Fig. 6

Recurrence plots for t = 6, ε = 3 and varying m

Fig. 7

Recurrence plots for m = 6, ε = 3 and varying t

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Schulz, J., Mentges, A. & Zielinski, O. Deriving image features for autonomous classification from time-series recurrence plots. J. Eur. Opt. Soc.-Rapid Publ. 12, 5 (2016). https://doi.org/10.1186/s41476-016-0003-y

  • DOI: https://doi.org/10.1186/s41476-016-0003-y

Keywords