1 Introduction

A short, accessible definition of online social networks (OSN) is that they “form online communities among people with common interest, activities, backgrounds, and/or friendships. Most OSN are web based and allow users to upload profiles (text, images, and videos) and interact with others in numerous ways” [1]. Structurally, an OSN is usually modelled as a set of nodes, representing its users, and a set of directed (e.g., “following” activity on Facebook) or undirected (e.g., friendship relationships) edges connecting pairs of nodes [2].

Beyond their research and development role, in which customer feedback is used to solve problems, improve existing offerings or create and diversify categories of products and services, OSN also bring benefits such as word-of-mouth marketing and targeted advertising. In a recent paper about social intelligence and customer experience, Synthesio [3] argues for learning who your customers really are in order to address them with an appropriate campaign. In addition, identifying the “army” of product advocates within different communities and bringing them closer to a specific brand gives a firm a tremendous opportunity to gain more customers and to build a real relationship with each of them on an individual basis [3].

Fig. 1 Papers published per year

Before this can happen, however, the question of how to identify those particular advocates among the “big crowd” of customers on the OSN remains open. To address it, in the following we use elements of grey systems theory to divide the crowd of customers into groups according to how impressionable they are by commercials, ads, videos and comments in OSN.

2 An overview of online social networks

Online social networks (OSN) began to capture the attention of more and more scientists with the rise of Web 2.0. From that moment on, the virtual world developed and people all over the world began to connect with each other and exchange information. Social networks are complex networks made up of many people linked by specific types of connections or interdependencies (friendship, business relations, community, etc.). Studying OSN requires a variety of techniques and methods for understanding and predicting the behavior of their components.

Studies in this field have covered many research areas, such as social issues, e-learning, marketing, communication and reporting, economics and health issues [4–7]. The main social issues addressed by the scientific community concern confidentiality and privacy [8, 9], age gap [10], sentiment analysis [11, 12], social activity [13, 14], addiction [15] and social assets [16]. The impact of word-of-mouth communication was studied by Brown [17] and viral marketing by Subramani and Rajagopalan [18]. The economic aspects of OSN studied include the performance and risks within companies [19], consumers’ decisions [20], e-commerce and online public goods [21]. The interest in health issues generated analyses of medical professionalism [22, 23], patients’ communication and the patient–doctor relationship in OSN [24].

In 2015, 64,672 scientific papers analyzing the social network area were recorded, of which 2687 focused mainly on the study of OSN. Fig. 1 illustrates the evolution of the number of scientific papers published per year. It can be seen that the number of papers has increased in recent years, with most of them being published in 2013 (Fig. 1).

Most of the papers published in the field of OSN were included in the following areas of interest: computer science, engineering, psychology, business economics, sociology, telecommunications, public environment, education research and information science (see Fig. 2).

Fig. 2 Research areas for the papers published in the OSN field

Regarding the document type, most of the papers (7050) were articles, followed by proceedings papers (3721), book reviews (728), meeting abstracts (689), editorial materials (370), reviews (217), letters (66), news items (54), corrections (49) and notes (41)—see Fig. 3.

Fig. 3 Document types of OSN papers

Most of the papers were from USA (5083), followed by China (1184), England (980), Canada (606), Germany (472), Spain (458), Australia (457), Italy (338), Netherlands (320) and South Korea (304). This geographical distribution is presented in Fig. 4.

Fig. 4 Record count of papers: a global view

Correlated with the geographic distribution of articles published in the field of OSN, most of them were written in English, followed by Spanish, German and French—Fig. 5.

The large number of scientific papers published in this area shows that the OSN domain is continuously growing, with many aspects that can still be analyzed.

3 Grey knowledge

Grey systems theory is one of the newest theories in the field of artificial intelligence. It starts from a convention used in control theory, in which an object is considered black when nothing is known about its inner structure and white when this structure is completely known. A grey object is therefore an entity whose structure is only partially known [25].

Fig. 5 The language of the papers published on OSN

By developing its own methods and techniques, grey systems theory succeeds in extracting new knowledge about a specific object, process or phenomenon. Along with grey relational analysis, one of the best known and most widely used methods of grey systems theory, grey clustering is another technique that brings new knowledge about a specific community.

In OSN, more than in other types of communities and networks, grey systems theory is applicable because of the nature of the relationships between the main actors. Forrest [26] identifies two main types of relationships in a system: generative and non-generative. While generative relationships arise from the interactions among the different elements of a system, non-generative relationships are represented by the inner characteristics of these elements [27, 28].

Even though most research focuses on one or the other of these relationship types, some recent advances include both aspects, as together they give more insight into what is really happening at the level of each network [26].

Due to these different approaches related to a system, the amount of knowledge extracted is limited and can be easily regarded as grey [29]. Even more, by adding the human component, through the consumers’ demands and needs, strictly related to preferences, self-awareness, self-conscience, free-will, etc., the study of the knowledge that can be extracted through OSN is becoming more complicated.

For this reason, a new type of knowledge can be identified: grey knowledge, which lies between the two well-known types of knowledge, tacit and explicit, and which continually circulates and transforms within the network. It can be encountered in the internalized and externalized feedback loops that form between different network users, and it accompanies both the external processes (chatting, sending e-mails, etc.) and the internal ones (listening, watching a commercial, reading a comment, evaluating, observing, etc.). Considering everyday activities, it can easily be seen that grey knowledge is the predominant type of knowledge encountered and, therefore, studying it can reveal new information useful for understanding the complexity of OSN.

4 Grey clustering analysis

Assume that there are n objects to be clustered according to m cluster criteria into s different grey classes [26, 30, 31]. A function denoted \(f_j^k \left( \cdot \right) \) is called the “whitenization” weight function of the kth subclass of the jth criterion, with \(i=1,2,\ldots ,n;j=1,2,\ldots ,m;1\le k\le s\) [26]. Consider a typical whitenization function, as described in [25, 30, 31], with four turning points denoted \(x_j^k (1),x_j^k (2),x_j^k (3)\) and \(x_j^k (4)\):

$$\begin{aligned} f_j^k (x)=\left\{ \begin{array}{l@{\quad }l} 0, &{} x\notin [x_j^k \left( 1 \right) ,x_j^k \left( 4 \right) ]\\ \frac{x-x_j^k \left( 1 \right) }{x_j^k \left( 2 \right) -x_j^k \left( 1 \right) }, &{} x\in [x_j^k \left( 1 \right) ,x_j^k \left( 2 \right) ] \\ 1,&{} x\in [x_j^k \left( 2 \right) ,x_j^k \left( 3 \right) ] \\ \frac{x_j^k \left( 4 \right) -x}{x_j^k \left( 4 \right) -x_j^k \left( 3 \right) }, &{} x\in [x_j^k \left( 3 \right) ,x_j^k \left( 4 \right) ] \\ \end{array} \right. \end{aligned}$$
(1)

or the whitenization weight function of lower measure (a particular case of the typical whitenization function presented above, where the first and the second turning points \(x_j^k (1),x_j^k (2)\) are missing):

$$\begin{aligned} f_j^k \left( x \right) =\left\{ {{ \begin{array}{l@{\quad }l} 0, &{} x\notin [0,x_j^k \left( 4 \right) ] \\ 1, &{} x\in [0,x_j^k \left( 3 \right) ] \\ \frac{x_j^k \left( 4 \right) -x}{x_j^k \left( 4 \right) -x_j^k \left( 3 \right) },&{} x\in [x_j^k \left( 3 \right) , x_j^k \left( 4 \right) ] \\ \end{array} }} \right. \end{aligned}$$
(2)

or the whitenization function of moderate measure (also a particular form of the whitenization function, where the second and the third turning points \(x_j^k (2),x_j^k (3)\) coincide):

$$\begin{aligned} f_j^k \left( x \right) =\left\{ \begin{array}{l@{\quad }l} 0, &{} x\notin [x_j^k \left( 1 \right) ,x_j^k \left( 4 \right) ] \\ \frac{x-x_j^k \left( 1 \right) }{x_j^k \left( 2 \right) -x_j^k \left( 1 \right) }, &{} x\in [x_j^k \left( 1 \right) ,x_j^k \left( 2 \right) ] \qquad \\ 1, &{} x=x_j^k \left( 2 \right) \\ \frac{x_j^k \left( 4 \right) -x}{x_j^k \left( 4 \right) -x_j^k \left( 2 \right) }, &{} x\in [x_j^k \left( 2 \right) ,x_j^k \left( 4 \right) ] \\ \end{array} \right. \end{aligned}$$
(3)

or the whitenization weight function of upper measure (another particular form of the whitenization function where the final third and fourth points \(x_j^k (3),x_j^k (4)\) are missing):

$$\begin{aligned} f_j^k \left( x \right) =\left\{ \begin{array}{l@{\quad }l} 0,&{} x<x_j^k \left( 1 \right) \\ \frac{x-x_j^k \left( 1 \right) }{x_j^k \left( 2 \right) -x_j^k \left( 1 \right) },&{} x\in [x_j^k \left( 1 \right) ,x_j^k \left( 2 \right) ] \\ 1,&{} x\ge x_j^k \left( 2 \right) \\ \end{array} \right. \end{aligned}$$
(4)
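
The four whitenization weight functions in Eqs. (1)–(4) can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the study: the function names are hypothetical, the turning points are passed in as plain numbers, and they are assumed to be strictly increasing so that no denominator is zero.

```python
def typical(x, x1, x2, x3, x4):
    """Eq. (1): rises on [x1, x2], equals 1 on [x2, x3], falls on [x3, x4]."""
    if x < x1 or x > x4:
        return 0.0
    if x < x2:
        return (x - x1) / (x2 - x1)
    if x <= x3:
        return 1.0
    return (x4 - x) / (x4 - x3)


def lower_measure(x, x3, x4):
    """Eq. (2): equals 1 on [0, x3], then falls to 0 at x4."""
    if x < 0 or x > x4:
        return 0.0
    if x <= x3:
        return 1.0
    return (x4 - x) / (x4 - x3)


def moderate_measure(x, x1, x2, x4):
    """Eq. (3): triangle that peaks at x2 (second and third points coincide)."""
    if x < x1 or x > x4:
        return 0.0
    if x <= x2:
        return (x - x1) / (x2 - x1)
    return (x4 - x) / (x4 - x2)


def upper_measure(x, x1, x2):
    """Eq. (4): 0 below x1, rises on [x1, x2], equals 1 from x2 onwards."""
    if x < x1:
        return 0.0
    if x < x2:
        return (x - x1) / (x2 - x1)
    return 1.0


# Illustrative turning points on a 1-5 scale (assumed, not taken from the study).
print(typical(1.5, 1, 2, 4, 5))        # halfway up the rising edge
print(lower_measure(2, 1, 3))          # halfway down the falling edge
print(moderate_measure(3, 1, 3, 5))    # at the peak
print(upper_measure(4, 3, 5))          # halfway up the rising edge
```

At the shared boundary points the adjacent branches of each equation agree, so the choice of strict or non-strict comparison at a turning point does not change the returned value.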

The grey clustering analysis can be performed by following the steps below [25, 32–34]:

  • Step 1: Determining the form of the whitenization function \(f_j^k \left( \cdot \right) \), for \(j=1,2,\ldots ,m;1\le k\le s\).

  • Step 2: Attributing a cluster weight \(\eta _j\) to each criterion based on external information such as prior experience or qualitative analysis, with \(j=1,2,\ldots ,m.\)

  • Step 3: Calculating all fixed weight cluster coefficients from the whitenization function \(f_j^k \left( \cdot \right) \) determined at step 1, the cluster weights \(\eta _j \) from step 2 and the observational values \(x_{ij} \) of object i for the jth criterion, with \(i=1,2,\ldots ,n;j=1,2,\ldots ,m;1\le k\le s\):

    $$\begin{aligned} \sigma _i^k =\mathop \sum \limits _{j=1}^m f_j^k ({x_{ij} })\cdot \eta _j \end{aligned}$$
    (5)
  • Step 4: If \(\sigma _i^{k^{*}} =\mathop {\max }\nolimits _{1\le k\le s} \{ {\sigma _i^k } \}\), then object i belongs to the \(k^{*}\)th grey class.
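
Steps 1 to 4 can be sketched in Python as shown below. This is only an illustration under stated assumptions: the function name `grey_fixed_weight_clustering`, the three whitenization functions `low`, `mid` and `high` (with turning points chosen for a 1–5 Likert scale), the weights and the three sample objects are all hypothetical and are not taken from the case study.

```python
def grey_fixed_weight_clustering(X, f, eta):
    """Return (sigma, classes) for observations X.

    X[i][j] : value of object i on criterion j
    f[j][k] : whitenization function of class k for criterion j (step 1)
    eta[j]  : cluster weight of criterion j (step 2)
    """
    n_classes = len(f[0])
    sigma = []
    for row in X:
        # Step 3: sigma_i^k = sum_j f_j^k(x_ij) * eta_j
        sigma.append([
            sum(f[j][k](row[j]) * eta[j] for j in range(len(eta)))
            for k in range(n_classes)
        ])
    # Step 4: assign each object to the class with the largest coefficient
    # (1-based label; ties go to the lowest class index).
    classes = [max(range(n_classes), key=lambda k: s_i[k]) + 1 for s_i in sigma]
    return sigma, classes


# Assumed whitenization functions for answers on a 1-5 scale.
def low(x):   # lower measure: 1 up to 1, falling to 0 at 3
    return min(1.0, max(0.0, (3 - x) / 2))

def mid(x):   # moderate measure: triangle on [1, 5] peaking at 3
    return max(0.0, 1 - abs(x - 3) / 2)

def high(x):  # upper measure: 0 up to 3, rising to 1 at 5
    return min(1.0, max(0.0, (x - 3) / 2))


f = [[low, mid, high], [low, mid, high]]   # same functions for both criteria
eta = [0.6, 0.4]                           # hypothetical criterion weights
X = [[1, 2], [3, 3], [5, 4]]               # three hypothetical respondents

sigma, classes = grey_fixed_weight_clustering(X, f, eta)
print(classes)
```

With these inputs the first respondent falls in the first class, the second in the second and the third in the third; in practice the whitenization functions and weights would come from the analyst's own choices in steps 1 and 2.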

Let us perform the grey clustering analysis in the case study on the OSN users to determine which of them are being impressed by the marketing campaigns, comments, articles, videos, etc., in the online environment and what conclusions can be drawn from studying their personal and cluster characteristics.

5 Case study on OSN users

To conduct the cluster analysis, a questionnaire was administered to users of online social networks, with 211 persons answering all of the questions. Based on the answers, a confirmatory factor analysis was carried out to validate the construct validity and reliability of the questionnaire. The selected factors were then passed through the grey clustering method, yielding three relevant user categories, as shown in the following sections.

5.1 Questionnaire and data

The 211 questionnaire’s respondents can be divided into five age categories: 104 between 18–25 years old, 76 between 26–35 years old, 20 between 36–45 years old, 7 between 46–55 years old and 4 between 56–65 years old; 61.61 % of them being female and 38.39 % male. Along with the questions regarding the personal characteristics, the respondents were asked to answer the following questions, evaluated through a Likert scale taking values between 1 and 5:

  • When I want to buy a product: (DM_1)

    • I buy it immediately without hesitation;

    • I am thinking a while on this opportunity and in a couple of days I decide whether to buy it or not;

    • I am asking for my close friends’ advice;

    • I am asking for my friends and family’s advice;

    • I am asking for advice from friends and family, I am searching other buyers’ comments on internet and on social websites.

  • In general: (DM_2)

    • I make my own decision and I stick to it no matter what happens;

    • I make a set of possible decisions, I analyze them a couple of days and after that I take my decision;

    • I have a set of possible decisions and for validation, sometimes, I ask someone else’s opinion;

    • I discuss the possible decisions set with close friends/co-workers/ family;

    • I always discuss the decisions with other people, read news and comments.

  • When I cannot identify the product I am looking for: (PL_1)

    • I buy another product from the same producer, but from a different assortment;

    • I am looking for another store where I can buy my product;

    • I buy another product from a similar producer;

    • I am looking for new information that can help me in finding what I need;

    • I cannot evaluate this situation;

  • When choosing a particular product/service, I am taking into consideration the following aspects: (please select among: strongly disagree; disagree; undecided; agree; strongly agree)

    • The product and availability term: (P_1);

    • Product’s inner characteristics: (P_2);

    • Package characteristics: (P_3);

    • Brand awareness: (P_4);

    • Information received recently about that product: (P_5).

  • Which of the following actions have you made on friends’ recommendation: (please select the appropriate answer among: never, sometimes, often, usually, always):

    • I have watched a commercial: (INT_1);

    • I have looked for a product promotion campaign: (INT_2);

    • I have informed about an event of a certain company: (INT_3);

    • I have participated on a contest organized by a firm: (INT_4);

    • I have followed that company’s activity on social media: (INT_5).

These questions have been divided into three categories, as it can also be observed from the labels attached to them: Decision making and product placement (DM), Product (P) and Interaction with friends on social networks (INT).

Fig. 6 Latent construct and the measured variables (a, b)

Fig. 7 Latent construct and the measured variables (a, b)

Fig. 8 Latent construct and the measured variables (a, b)

5.2 Model fit through a confirmatory factor analysis

Having the answers to the questionnaire above, a confirmatory factor analysis was conducted to validate its main constructions.

The starting construct contained 13 latent factors (Fig. 6a), but the main confirmatory factor analysis indices took poor values (CMIN/DF of 4.612, GFI of 0.836, AGFI of 0.760, CFI of 0.778, NFI of 0.737, RFI of 0.670, IFI of 0.782, RMSEA of 0.131, etc.), so the construct was restructured as in Fig. 6b, keeping just 10 latent factors in the analysis. For the new latent construct (Fig. 6b), the results for these indices were better than in the first case, but still not satisfactory: CMIN/DF dropped to 1.713, with GFI of 0.953, AGFI of 0.917, CFI of 0.964, NFI of 0.920, RFI of 0.884, IFI of 0.965 and RMSEA of 0.058. Therefore, a further cut in the considered variables was needed.

The P_2 variable was eliminated because of its low loading values (see Fig. 7c). As a result, the indices improved: CMIN/DF of 1.678, GFI of 0.960, AGFI of 0.926, CFI of 0.970, NFI of 0.930, RFI of 0.895, IFI of 0.970, RMSEA of 0.057.

As further improvement was still possible, the P_3 variable was also eliminated (Fig. 8d), leading to the index values presented and analyzed in Tables 1, 2, 3 and 4.

Goodness of fit (GOF)

The goodness of fit indicates how well the specified model reproduces the covariance matrix among the indicator variables, establishing whether there is similarity between the observed and estimated covariance matrices.

One of the first measures of GOF is the Chi-square statistic, which tests the null hypothesis that there is no difference between the two covariance matrices; the null hypothesis is accepted for a probability value >0.050. As Table 1 indicates, this value is exceeded. The improved model has a CMIN/DF of 1.483, below the threshold value of 2.000 (Table 2).

Moreover, the values of GFI and AGFI are above the limit of 0.900, at 0.972 and 0.940, respectively, while CFI, at 0.983 (see Table 3), exceeds 0.900, the value required for a model of such complexity and sample size. As for the other three incremental fit indices, namely NFI, RFI and IFI, the obtained values are above the threshold of 0.900 for NFI and close to 1.000 for RFI and IFI.

As Table 4 shows, the root mean square error of approximation (RMSEA) has a value below 0.100 for the default model, showing that only a small part of the lack of fit is due to misspecification of the tested model rather than to sampling error. The 90 % confidence interval for the RMSEA lies between LO90 of 0.000 and HI90 of 0.085, the upper bound being close to 0.080, indicating a good model fit.
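
As a rough cross-check of the reported fit indices, the RMSEA point estimate can be recovered from CMIN/DF and the sample size. The sketch below assumes the commonly used single-group formula RMSEA = sqrt(max(χ² − df, 0) / (df · (N − 1))), rewritten in terms of CMIN/DF = χ²/df; whether AMOS 22 applies exactly this form is an assumption here, and the function name is hypothetical. N = 211 is the number of respondents in the case study.

```python
import math


def rmsea_from_cmin_df(cmin_df, n):
    """RMSEA point estimate from chi-square/df and sample size n (single group)."""
    return math.sqrt(max(cmin_df - 1.0, 0.0) / (n - 1))


N = 211  # respondents in the case study

# CMIN/DF values reported in Sect. 5.2 for the successive models.
for cmin_df in (4.612, 1.713, 1.678, 1.483):
    print(cmin_df, round(rmsea_from_cmin_df(cmin_df, N), 3))
```

Under this assumption, the first three CMIN/DF values give 0.131, 0.058 and 0.057, which match the RMSEA values quoted for those models in Sect. 5.2, and the final CMIN/DF of 1.483 gives about 0.048, which lies inside the reported 90 % interval.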

Validity and reliability

For testing the construct’s validity and reliability, first of all, the standardized loadings should be analyzed and should be higher than 0.500, ideally 0.700 or higher. In this case, these values are between 0.634 and 0.953, confirming the validity.

Table 1 Result table (AMOS 22 Output)
Table 2 CMIN (AMOS 22 Output)
Table 3 Baseline comparisons (AMOS 22 Output)

The convergent validity is given by two additional measures: the average variance extracted (AVE) and construct reliability (CR). As these two measures are not computed by AMOS 22, they have been determined using the equations presented in the literature [35].

The following values have been obtained for P, DM and INT, respectively: AVE of 0.431, 0.527 and 0.600, and CR of 0.714, 0.793 and 0.909. An AVE of 0.500 or higher indicates adequate convergent validity, while a CR of 0.700 or above suggests good reliability. The AVE for P (0.431) falls below the 0.500 level, whereas all the other values meet their thresholds. On this basis, it can be concluded that the overall construct validity and reliability are good and that the considered measures consistently represent reality.
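
The text does not reproduce the equations from [35], so the sketch below assumes the formulas most commonly used for these two measures: AVE as the mean of the squared standardized loadings, and CR as the squared sum of the loadings divided by that squared sum plus the sum of the error variances (1 − λ²). The function names and the three loadings are hypothetical and serve only to show the arithmetic.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one construct."""
    return sum(l * l for l in loadings) / len(loadings)


def construct_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical standardized loadings for a three-item construct.
loadings = [0.7, 0.8, 0.9]
ave = average_variance_extracted(loadings)
cr = construct_reliability(loadings)

print(f"AVE = {ave:.3f}, adequate: {ave >= 0.5}")
print(f"CR  = {cr:.3f}, adequate: {cr >= 0.7}")
```

The same two functions could be applied to the standardized loadings of the P, DM and INT constructs from the AMOS output to check the values quoted above.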

Table 4 RMSEA (AMOS 22 Output)

5.3 Grey clustering

Using the GSTM 6.0 software, the grey cluster analysis was performed and the results are shown in the following:

figure a

Based on the answers received, it can be concluded that the second grey cluster is formed mostly of impressionable persons who react positively to promotion campaigns in an online environment and who take others’ opinions into account when making a decision.

The members of the second grey cluster have an average age of 24.1 years, are mostly women, have an average of 516 friends on OSN and access the OSN more than once a day. In addition, persons in this category spend more than four hours per day in online social networks and actively participate in forums and discussions in the online environment.

With this information about the most impressionable members of OSN, companies can adapt their strategies to deliver new pieces of information directly to these users [36]. From here, a specific analysis can be carried out for each new user to determine which group they belong to.

A further research direction is the identification of the most important nodes among those that can easily be impressed, using a grey approach similar to the one proposed by Wu et al. [37]. By knowing both the nodes that are easily impressed and those that have great influence in each network, companies can adapt their strategies to better target the OSN audience. Additionally, a storage service [38] can be used to ease access to such a large amount of stored data.

6 Conclusions

OSN are increasingly part of everyday reality. In this context, companies have adapted their strategies to reach their target audience. This paper presents a method for selecting the most impressionable members of a network. For this purpose, a questionnaire was developed, applied and validated in order to better identify these members. Grey clustering was used because the information flowing within the feedback loops in OSN is grey.

As further research, a grey relational analysis will be used to identify the most important and influential node among the most impressionable nodes within an OSN. With this information, each company can adapt or create a specific strategy that targets this person in order to strengthen its competitive position on the market.