1 Introduction

In prehistoric days, humans began to communicate with one another through cave paintings, petroglyphs, pictograms, ideograms, etc., where images were the basic foundation of communication. With the advancement of handheld technology, the number of captured images has become innumerable, and these multimedia data are easily uploaded and shared on internet media. Internet statistics[1] indicate that more than 2 billion images were uploaded to the web in 2012, that Facebook hosted over 300 million images in late 2012, and that more than 58 million photos were uploaded to the web every second. Owing to this enormous activity, there is an urgent need for an effective and efficient image retrieval concept.

The main reason for the imperfection of image retrieval systems is that, unlike textual data, image data cannot be made generic: images come in different formats, and the low-level features of the same image in different formats are completely different. Many concepts and systems have been proposed thus far, but a search system needs the ability to think and act as a human does, and this capability is the subject of computer vision. To make a machine think like a human, the clear-cut semantics of all the data available to the system must be provided, so that it knows what it is really searching for. Semantics can easily be provided for numerical data, but for multimedia data, semantic knowledge is provided either visually or textually. So, there is a need to fill the semantic gap[2] between these visual low-level features and textual high-level features.

The main scope of this paper is to create an ontology for images of a specific domain by integrating the domain knowledge and the low-level features of the domain-specific images. In this paper, we consider the asteroideae flower family, which consists of more than 161 000 genres. Among them, nearly 72 flower genres are considered, with the help of lab images from biological universities worldwide. The attributes used to describe the images are the prevalent color of the image, the basic intrinsic pattern of the image and the texton-based contour gradient.

The paper is organized as follows. The related work section gives a brief literature survey of image retrieval approaches. Then, the creation of the ontology entities with respect to the domain knowledge of the asteroideae flower family is elaborated. Finally, frameworks for querying the constructed ontology with different rules are discussed.

2 Related work

Image retrieval has been an active and broad research area since the 1970s[3]. Initially, text-based image retrieval techniques were employed; then, during the 1990s, content-based image retrieval techniques were introduced, and research in this area continues to this day. Image retrieval approaches are broadly classified into text-based, content-based and multimodal approaches. The text-based image retrieval approach[3, 4] has been in practice since the early 1970s. It can be further divided into two broad categories: the traditional text-based approach, where the work has been standardized, and the semantic text-based approach.

In the semantic text-based image retrieval technique, metadata of web pages, such as RSS or the metadata tags of dynamic hyper text markup language (DHTML), is used. Researchers are now working on ontology-based image retrieval techniques[5], where ontologies such as domain ontologies, knowledge ontologies or personalized ontologies are used for image retrieval. Content-based image retrieval, commonly called CBIR, uses the low-level features of an image as variables to compare and search for a given image. The low-level features used are mean, moment, co-occurrence, color, shape, texture, etc. Color, shape and texture are known as global features, and the others are called local features. As images are available in different formats and different pixel arrangements, the features of two images which look alike but are stored in different formats are not always the same. To make the search efficient, intelligent learning techniques, such as supervised or unsupervised learning, have been employed in recent years. Query by image content (QBIC)[6] is a prototype system where the query is an image or a rough sketch. The QBIC system uses the red-green-blue (RGB) composition as the color content feature. Image coarseness, contrast and directionality are used for the texture feature vector. For the shape feature, the authors used the area, circularity, eccentricity, major axis orientation and a set of algebraic moment invariants. VisualSEEk[7] is again a prototype system, where the query for the image retrieval system is the color of the needed image. For precise results, the user has to specify the spatial location, minimum size and color composition. This approach does not provide efficient results for the internet images of this age. The VideoQ[8] prototype system also comes under the query-by-image-or-sketch technique. In addition to considering the color, shape and texture of the image, it mainly concentrates on spatial-temporal constraints: the motion vector of the searched key object is considered. The multimedia analysis and retrieval system (MARS)[9] is one of the biggest projects started by the University of Illinois, supported by NSF/DARPA/NASA, etc. It tries to provide a standard framework for multimedia analysis and retrieval. It also uses the basic content of the image, such as color, texture, shape and layout, and mainly concentrates on new approaches to image segmentation and image shape representation. The system also supports complex queries and additionally targets compressed images; research on this project is still ongoing. In Blobworld, the main functional areas are feature extraction, multidimensional image indexing and retrieval[10]. Features such as the color histogram, color correlogram and wavelets are extracted. Carson et al.[10] use the concept of expectation maximization (EM), an algorithm used to learn a model along with its hidden variables. The extracted features are clustered using Gaussian clustering by determining the likelihood between them, and the minimum description length principle is used to determine the output. In [11], the retrieval procedure uses wavelet feature extraction by the Haar transform. From the transformation, Robels et al.[11] use two types of feature vectors, namely the multi-resolution global color histogram and the multi-resolution local histogram. This system provides relevant results only if it is used locally. Snoek et al.[12] have used both local and global features for retrieval purposes.
Global image features such as the color correlogram, the gray level co-occurrence matrix (GLCM) and the moving picture experts group (MPEG-7) edge histogram descriptor (EHD) were used. Local features such as grid-based color moments, wavelet texture and the histogram of oriented gradients (HOG) were also used. All these features are used for training, with a linear kernel for HOG and non-linear kernels for all the other features. In [13], the image is segmented using the J-segmentation (JSEG) technique. The average color of the image and the Gabor value with respect to the image texture are calculated and normalized, and image similarity is measured using the Earth mover's distance (EMD). As a result, a semantic-template-based decision tree is built, which can be used for retrieval; however, this approach requires a ground truth image to arrive at a conclusion. Thyagharajan and Harikrishnan[14] used a local low-level feature, namely the RGB color information of selected key frames, for the summarization of video concepts.

In the field of image retrieval research, a hybrid approach, i.e., the combination of the text and the content of an image, has also been employed. This fusion approach is called the multimodal image retrieval technique. It is broadly categorized as the fusion of hyper text markup language (HTML) tags with low-level image features, the fusion of image context with low-level features, and the fusion of metadata with low-level features. Romberg et al.[15] tried to combine the low-level details of the image with its high-level features: scale invariant feature transform (SIFT) features are extracted from the image and visual words are created by employing probabilistic latent semantic analysis (pLSA). In [16], a bag of visual words was created from a set of sport images using an improved version of SIFT. In [17], an ontology-based approach to image retrieval was discussed for the domain-specific case of canine animals, but it does not describe the way in which image features are integrated with the ontology.

3 Ontology creation

The mere existence of natural living things can be studied and analyzed efficiently only through ontology. In an ontology, resources are considered as entities and are grouped hierarchically via their relationships. The motivating concept of this paper is to create a low-level feature ontology for images, so that a machine can analyze and study images automatically and thus visualize an image as a human does. This provides a way to develop a semantic-based image retrieval system. There are nearly 250 000 to 400 000 flower genres[18], of which humans can normally identify only 10–20%. The main objective of this paper is to create a domain-specific, low-level feature ontology for one flower family in order to build a satisfactory image retrieval system.

Some of the profound works on flower image retrieval systems are discussed here. Saitoh and Kaneko[19] compared 6 features of the flower and 3 features of the leaf for the retrieval procedure. Fukuda et al.[20] used the shape and its power spectrum graph to identify whether the flower is rounded, has many petals or has clear single petals. Nilsback and Zisserman[21] classified images automatically using techniques such as segmentation, geometric modeling, feature extraction and classification; for classification, they used a multi-kernel SVM classifier. The work in [22] extends [21] with super-pixel-based flower background/foreground segmentation and a linear SVM with bag of visual words (BoVW) feature combinations.

Ontologies enable better communication between humans and machines: they standardize and formalize the meaning of words through concepts. Ontologies are represented in the web ontology language (OWL). In OWL, each concept is represented as a class, a sub-class or an individual. The properties of each class are represented by object properties and data properties, and logical relationships between these entities can be specified using the basics of first order logic. In this paper, the flower family domain of asteroideae is considered. This family can also be called the daisy flower family, as most of the genres belonging to it are categories of daisies. There are nearly 100 different genres in this flower family; among them, 72 are distinct flowers and the others are shrub types. For these genres, the creation of a domain-specific low-level feature ontology is elaborated below.

3.1 Domain knowledge analysis of asteroideae

One of the most heavily populated sub-families of asteraceae is asteroideae, which can also be termed the daisy family. This sub-family can be further classified into different tribes. For this work, the heavily populated tribe called astereae is considered. This tribe has 222 genera and 3 100 species distributed worldwide; among them, 59 genera were considered for this work. These flower species can be categorized with respect to the color of the flower and also with respect to the flower head. On analysis, most of the flowers come under white, yellow or purple. Fig. 1 shows the flower species with respect to their prevalent color.

Fig. 1 Categorization of flowers with respect to corolla colors

As shown in Fig. 2, there are three types of flower heads, i.e., ray florets, disk florets and the combination of both. In ray florets, only the petal arrangement is visible, and there is no restriction on the number of petals around the receptacle. In disk florets (e.g., Aaronsohnia, the best example of this category), only the central part of the flower is visible; they do not have any distinct petals. The combination of ray and disk florets belongs to the last category; most of the flowers in this family come under this group, as shown in Fig. 3.

Fig. 2 Asteroideae flower categorization as per flower head

Fig. 3 Categorization with respect to flower head

Fig. 3 shows the categorization of the asteroideae flower species with respect to ray florets, disk florets and the combination of both. Among them, only 12% of the flowers come under ray florets. The percentages of flower species in the categories are shown in Fig. 4.

Fig. 4 Percentages of asteroideae flower categorization with respect to color and flower head

The overall percentages of yellow, white and purple flowers among asteroideae are shown in Fig. 4. From the graph, yellow and white colored flowers are dominant in the asteroideae flower family. The right side of the graph shows the percentages with respect to the flower head: among disk florets, ray florets and the combination of both, more than 57% of the flowers from the asteroideae flower family belong to the combined disk-and-ray category.

3.2 Classes, subclasses and individual identification

Classes, sub-classes, individuals, object properties, data properties and annotation properties are the entities of an ontology. The asteroideae flower group was categorized into single flowers and bunches of flowers with respect to the arrangement of the flower corolla. Most of the flowers from this family are of yellow, white or purple color, so these categories are subdivided with respect to the defined colors. The top-level hierarchy of the created ontology is shown in Fig. 5.

Fig. 5 Top-level ontology of asteroideae flower domain

The OWL code snippet for this top-level ontology creation takes the following form.

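A minimal RDF/XML sketch, assuming the class names of Fig. 5 (the exact IRIs and serialization details are illustrative):

```xml
<owl:Class rdf:ID="Single_flower"/>
<owl:Class rdf:ID="Bunch_flower"/>

<owl:Class rdf:ID="Yellow_Petalous">
  <rdfs:subClassOf rdf:resource="#Single_flower"/>
</owl:Class>
<owl:Class rdf:ID="White_Petalous">
  <rdfs:subClassOf rdf:resource="#Single_flower"/>
</owl:Class>
<owl:Class rdf:ID="Purple_Petalous">
  <rdfs:subClassOf rdf:resource="#Single_flower"/>
</owl:Class>
```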

Within each color-dependent sub-class, the specific flower genres are identified and made sub-classes of that group. Sample images of the flower groups are shown in Fig. 6. The flowers under yellow, white and purple polypetalous are single flowers with many petal arrangements in the respective colors. This flower family has more than 10 flower genres which are round in shape with a small corolla; these flowers were grouped as round yellow petalous. Under bunch flowers, different groups of yellow and white tiny flowers were identified and grouped.

Fig. 6 Asteroideae flower domain ontology sample

The OWL code snippet for the round yellow petalous sub-class takes the following form.

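Sketched minimally (the parent class IRI is assumed from the hierarchy above):

```xml
<owl:Class rdf:ID="Yellow_Round_Petalous">
  <rdfs:subClassOf rdf:resource="#Yellow_Petalous"/>
</owl:Class>
```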

After creating all the required classes and sub-classes, the instances, which are said to be the individuals of the classes, have to be created. The term individual is analogous to object creation in the object-oriented programming paradigm, so an individual of a class inherits all the properties of its parent classes. Normally, an image retrieval system requires training sets, so that the framework can match a given input image against the trained image set. Likewise, in this ontology creation, each flower genre may require two to three training images. So, for each flower class, say Pentzia, which is a sub-class of Yellow_Round_Petalous, there would be about three individuals, named Pentzia_1, Pentzia_2 and Pentzia_3. The OWL code snippet takes the following form.

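A minimal sketch, assuming the Pentzia class and the individual names used above:

```xml
<owl:Class rdf:ID="Pentzia">
  <rdfs:subClassOf rdf:resource="#Yellow_Round_Petalous"/>
</owl:Class>

<Pentzia rdf:ID="Pentzia_1"/>
<Pentzia rdf:ID="Pentzia_2"/>
<Pentzia rdf:ID="Pentzia_3"/>
```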

Once the basic entities such as classes, sub-classes and individuals are created, the properties of the classes have to be specified, which converts the ontology into a higher-level semantic ontology.

3.3 Syntactic visual feature vector

In [23], MPEG-7 descriptors, such as the dominant color descriptor, edge histogram descriptor and color layout descriptor, together with the texton of an image, were used to represent an image ontology. But, as the values can span more than 256 bits, the query processing part becomes quite complex. To avoid such complexity, certain syntactic visual features were calculated with fewer bits for creating this effective low-level feature-based image ontology. The identified syntactic visual features are the prevalent color representation, the basic intrinsic pattern representation and the contour gradient representation.

In the prevalent color representation, the most profound color of the image is identified. For this, the given image is first quantized by employing an expectation-maximization-based K-means clustering algorithm[24]. Then, the color histogram of the quantized image is calculated, and the color with the maximum histogram value is determined as the most dominant color of the given image.
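As a rough illustration, the following Python sketch substitutes plain K-means from scikit-learn for the EM-based variant of [24]; the file name and cluster count are illustrative:

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def prevalent_color(path, k=8):
    """Return the dominant (R, G, B) of an image after k-color quantization."""
    pixels = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64)
    pixels = pixels.reshape(-1, 3)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    # Histogram of cluster labels: the largest bin is the prevalent color.
    counts = np.bincount(km.labels_, minlength=k)
    return tuple(int(round(c)) for c in km.cluster_centers_[np.argmax(counts)])

# e.g., prevalent_color("pentzia_1.jpg") might return (255, 255, 128) for a yellow flower
```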

In the basic intrinsic pattern representation, the generic pattern of the given flower genre is calculated. For this calculation, more than two different images of the same flower are taken and the flower region is extracted. From the extracted blobs, the grey level co-occurrence matrices of all the images are calculated, and these GLCM values are used to compute the eigenvalues of the matrix. The GLCM is normalized in such a way that, for any number of sample images, it generates 64 eigenvalues; from these, the six positive values are used to determine the texture pattern of the given image. To strengthen the image ontology, the contour gradient of the images is also used as one of the data properties. To determine the contour gradient of a given image, the image is divided into 4 × 4 patches, and the patches with adequate information are identified. The contour gradient of those patches is calculated using the concept of the texton[25], where the fundamental bases of the image are computed by applying 13 different harmonic filters to the image. Each filter response provides a (1 × 80) vector, and these values are clustered to identify the most redundant vector value. Thus, for the given image, there are 13 different contour gradient vector values. Table 1 shows the syntactic features for the yellow polypetalous group of flowers; a sketch of the pattern computation follows the table. The abbreviations used in Table 1 are:

1) PCR = Prevalent color red;

2) PCG = Prevalent color green;

3) PCB = Prevalent color blue;

4) BIP = Basic intrinsic pattern;

5) Ln(BIP) = Natural logarithm of the basic intrinsic pattern;

6) CGA5 = Contour gradient analysis of the 5th filter;

7) CGA10 = Contour gradient analysis of the 10th filter.

Table 1 Syntactic feature analysis of Yellow_Poly_Petalous
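A minimal sketch of the basic-intrinsic-pattern step, assuming skimage's graycomatrix as the GLCM source and a simple averaging of the per-image GLCMs (the paper's exact normalization is not spelled out):

```python
import numpy as np
from skimage.feature import graycomatrix

def intrinsic_pattern(gray_images, levels=64):
    """Summarize a flower genre's texture by eigenvalues of its mean GLCM."""
    glcms = []
    for img in gray_images:                            # 8-bit grayscale flower blobs
        q = (img // (256 // levels)).astype(np.uint8)  # quantize to `levels` gray bins
        g = graycomatrix(q, distances=[1], angles=[0],
                         levels=levels, normed=True)
        glcms.append(g[:, :, 0, 0])
    mean_glcm = np.mean(glcms, axis=0)           # one normalized 64 x 64 matrix
    eig = np.real(np.linalg.eigvals(mean_glcm))  # 64 eigenvalues
    positive = np.sort(eig[eig > 0])[::-1]
    return positive[:6]                          # six positive values kept as BIP
```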

3.4 Properties and their axioms

OWL has two main types of properties: data properties and object properties. These properties provide semantics for the created ontology. Object properties specify the relation between two classes, while data properties relate a literal data type to a particular class. The data properties of this ontology are the three syntactic visual feature vectors described above. Fig. 7 shows the hierarchical semantics of the data property representation. Three main data properties were declared: PC for the prevalent color, BIP for the basic intrinsic pattern representation and CGA for the contour gradient representation. As the prevalent color has three distinct integer values for red, green and blue, a separate data property is declared for each component of the RGB combination. Six different eigenvalues of type double are calculated to identify the pattern of the given image. Finally, there are 13 different gradient vector values of type double. The code snippet for the creation of the PCR data property is shown below.

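A minimal sketch, assuming the property hierarchy of Fig. 7, the domain classes named above and an integer range:

```xml
<owl:DatatypeProperty rdf:ID="PCR">
  <rdfs:subPropertyOf rdf:resource="#PC"/>
  <rdfs:domain rdf:resource="#Single_flower"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#integer"/>
</owl:DatatypeProperty>
```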

In this ontological structure, the same-individual functionality is used to provide the semantics between two instances of the same class, i.e., instances sharing the PC value and the BIP value of that class. To avoid conflicts of values between classes, all the top-level classes need to be made disjoint with respect to each other, as shown below.

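A minimal sketch using OWL's disjointWith construct on the two top-level classes:

```xml
<owl:Class rdf:about="#Single_flower">
  <owl:disjointWith rdf:resource="#Bunch_flower"/>
</owl:Class>
```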

Thus, after creating the whole ontology structure, the corresponding data property values have to be specified. Let us say that the PC value for yellow colored images is (255, 255, 128) with respect to RGB. This can be specified in each individual of the created image ontology, as shown below.

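A minimal sketch, assuming the Pentzia_1 individual and the PCR/PCG/PCB properties introduced above:

```xml
<Pentzia rdf:ID="Pentzia_1">
  <PCR rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">255</PCR>
  <PCG rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">255</PCG>
  <PCB rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">128</PCB>
</Pentzia>
```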
Fig. 7 Data properties hierarchy

4 Query rule generation

An ontology is essentially a representation of a knowledge base[26]. The sentences in a knowledge base can be formalized through logical languages, and the entities in an ontology can be represented through a description language. In this paper, the classes, sub-classes and the relationships of the property entities are expressed using the symbols of first order logic. Before employing a created ontology in an application framework, the completeness of the ontology has to be studied. By mapping the set of generated rules onto the ontology axioms, the completeness and the reasoning factor of the created ontology can be evaluated. The representations of some ontology entities with respect to description logic[27, 28] are listed in this section.

4.1 Class assertion axioms

Sub-classes are represented by the symbol ⊆. Let us define x to be any identified Yellow_Round_Petalous instance; then the class assertion axiom takes the form ∀ (x ⊆ Yellow_Round_Petalous). Likewise, class assertion axioms can be created for all the classes and sub-classes. To represent that the class Single_flower is disjoint with Bunch_flower, we can use the axiom "(Single_flower ∩ Bunch_flower ≡ ⊥)". To specify that the sub-classes of the Single_flower class are disjoint with each other, the axiom could be "∀ (x,y) ((x,y) ⊆ Single_flower) ((x ∩ y) ≡ ⊥)", so that the values belonging to x are not reflected on y. Table 2 shows some of the class axioms.

Table 2 Two and three level class assertions and axioms

4.2 Property assertion axioms

To represent the sub-property assertion, the axiom (PCR, PCG, PCB) ⊆ PC is used. The range and domain of the properties also have to be specified: for data properties, the domain is a class and the range is a data type. The axiom used to define the domain is "((PCR) ∈ (Single_flower, Bunch_flower))" and the range is "((PCR) ∈ (integer))". Each class and its data properties are related using the ontology HasValue syntax, which can be elaborated as

$$\{ x \mid \exists \,(x,{\rm{PC}}) \in ({\rm{Single\_flower}},{\rm{Bunch\_flower}})\}.$$

Table 3 lists some of the property assertions and axioms.

Table 3 Property axiom

4.3 Ontology reasoning

An ontology can be learned either through direct reasoning methods or indirect reasoning methods. The key technique used is logical entailment between the knowledge-based ontology and its entities. At times, the entailment may produce errors due to unexpected equivalences in the ontology and inconsistencies with intuition. These errors tend to introduce incompleteness into the created ontology, so that it does not provide the required results for a given query. Hence, a complete set of rules has to be generated from the given axioms. In this paper, two different sets of rules were generated to check the completeness of the ontology. The created ontology structure can be visualized as the merger of the asteroideae flower domain ontology and the visual feature ontology. Description logic (DL) based queries and a DL query system are used to generate queries with respect to the domain knowledge ontology; this set of queries is called the domain knowledge query set. To extract the classes and sub-classes of the ontology, the following rule axiom is used:

$$<\,?x\;\;{\rm{rdfs{:}subClassOf}}\;\;?y\,>.$$

In the above rule, x and y are variables; this concept is elaborated in Table 4. If we pass a value for the variable x, the sub-classes of that class are listed in the variable y. To extract the data-property-related information, we can use the axiom

$$<\,?x\;\;{\rm{owl{:}DatatypeProperty}}\;\;{\rm{rdf{:}ID}} = {\rm{PCR}}\;\;?R\,>.$$

Table 4 Property axiom

This lists all the PCR values in the variable R.
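Such a rule can equivalently be posed as a SPARQL query. A minimal rdflib sketch, where the ontology file name and the afif namespace are assumptions:

```python
from rdflib import Graph

g = Graph()
g.parse("afif.owl", format="xml")  # the created flower ontology (file name assumed)

q = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX afif: <http://example.org/afif#>
SELECT ?y WHERE { ?y rdfs:subClassOf afif:Single_flower }
"""
for row in g.query(q):
    print(row.y)  # e.g., Yellow_Petalous, White_Petalous, Purple_Petalous
```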

To generate rules with respect to the syntactic visual features, a rule-based classifier can be employed; this query set is called the visual feature query set. For this, the visual feature data are first transferred to a flat file system. From these data, the frequent patterns are determined in order to calculate the support and confidence of the data. By setting the support threshold to 80%, an effective set of rules is generated. To generate queries through reasoning on the ontology, the ontology has to be learned; to determine the functionality of a system with vague knowledge of a set of inputs and outputs, inductive learning can be employed. The rule form, with an example, is shown as

$${\rm{Rule}}\,{\rm{antecedent}} \rightarrow {\rm{Rule}}\,{\rm{consequent}}$$
$$({\rm PCR} = 255) \wedge ({\rm PCG} = 255) \wedge ({\rm PCB} = 128) \rightarrow {\rm Yellow\_Poly\_Petalous} \wedge {\rm Yellow\_Round\_Petalous} \wedge {\rm Yellow\_Petalous}.$$
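A toy sketch of the support computation behind this rule mining; the feature rows and class labels here are illustrative stand-ins for the flat-file export, not the paper's actual table:

```python
from itertools import combinations

# One set of discretized attribute values per training image (illustrative).
rows = [
    {"PCR=255", "PCG=255", "PCB=128", "Yellow_Poly_Petalous"},
    {"PCR=255", "PCG=255", "PCB=128", "Yellow_Round_Petalous"},
    {"PCR=250", "PCG=250", "PCB=250", "White_Petalous"},
]

def support(itemset, rows):
    """Fraction of rows that contain every item of the itemset."""
    return sum(itemset <= r for r in rows) / len(rows)

items = {i for r in rows for i in r}
threshold = 0.6  # the paper sweeps this from 20% to 80%
frequent = [set(c) for n in (1, 2)
            for c in combinations(sorted(items), n)
            if support(set(c), rows) >= threshold]
# Frequent itemsets such as {PCR=255, PCG=255} become rule antecedents.
```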

The performance of the rule generation is measured by varying the support value in the range of 20% to 80%. When the support threshold is set to 20%, nearly 203 rules (equivalent to the size of the given dataset) are generated within 0.004 ms. As the support threshold increases, the number of rules is reduced, but the time taken for computation increases. From the number of generated rules, the performance of the ontology is measured as shown in Fig. 8. However, rule creation alone is not an effective way of measuring the completeness of the ontology; the number of instances covered by the created rules is another measure. This accuracy is calculated as the number of rules that cover the individuals with respect to the total number of rules created. Table 5 shows the accuracy rate.

Fig. 8 Empirical evaluation of generated rules with respect to support values and time of computation

Table 5 Accuracy rate

This ontology can be used as a backend dataset for a semantic web-based search engine. For an image query, the syntactic visual features are extracted and a formalized query with respect to the visual feature query set is generated and matched against the ontology dataset; this provides the information regarding the given flower query image. If the user instead provides a text query, the system formalizes the query with respect to the domain knowledge query set and provides the needed information.

5 Empirical evaluation

The asteroideae flower image feature (AFIF) ontology is evaluated using the gold standard measure[29]. The vocabulary of the ontology is compared with the flower family terms, which provides the evaluation of term precision and recall. The taxonomy of the flower family is compared with the hierarchical structure of the created ontology, and the taxonomical overlap ratio is calculated manually. The overall evaluation is listed in Table 6.

Table 6 Empirical evaluation of AFIF ontology

The created semantic search engine's results were then compared with those of a traditional content-based image retrieval system and a text-based annotation system. The datasets used are Maria Elena et al.'s dataset with 102 flower categories and Hortipedia's flower dataset[30]. The results were compared over different test cases, and the precision of our system was found to be satisfactory. The study is shown in Table 7.

Table 7 Empirical evaluation of our approach

6 Conclusions

Image retrieval is one of the most widely explored research areas. To design an effective image retrieval system with high accuracy, we need to train our system both syntactically and semantically. The combination of textual and visual feature information in an ontology is said to be a multimodal ontology. For the asteroideae flower domain, a multimodal ontology has been created with respect to its domain knowledge and visual features. The syntactic visual features used in this ontology are the prevalent color, the basic intrinsic pattern and the contour gradient of the given images. After the creation of the ontology, its completeness has to be determined by ontology reasoning techniques. For this, a set of query axioms was created and the instance coverage was determined to find the accuracy of the ontology; as a result, we obtained an accuracy of 72%. The created ontology can be used as a backend for any kind of retrieval process, and as it is expressed in OWL, the system can easily be incorporated into semantic web-based image retrieval systems.