
1 Introduction

With the spread of the EC (Electronic Commerce) market, it has become easier for customers to order products and obtain information about them. Owing to this convenience, the market size of EC sites continues to grow [1].

Among the services offered by EC sites, user-contributed reviews have been introduced at many sites. Reviews contain important information about customers’ impressions and experiences, so they are very helpful when a customer cannot decide whether to buy a product. As a result, reviews strongly influence consumers’ decision making. Reviews contain both positive and negative content, and both also factor into consumers’ decisions [2].

In this study, we use reviews of golf products posted on a golf portal site. When consumers refer to golf reviews before purchasing a product, they want to find reviews written by players at their own golf level. However, it is not easy to find such reviews. We therefore compare the characteristic words of the reviews for each golf player level and clarify noteworthy items that appear in the reviews.

2 Data Summary

In this study, we focus on reviews posted by members of a golf portal site, targeting reviews of golf products. Each review includes text describing its content and an evaluation score on a 5-point scale (1 is very bad, 5 is very good). Reviews are generally posted on the portal site after the reviewer has purchased the product. The total number of reviews was 98,265. The review data include the items listed in Table 1.

3 Dataset

First, we removed missing values and outliers from the data. Next, we selected the target products. Specifically, we targeted products with more than 100 reviews during the data period, because it is difficult to identify a product’s characteristics from its reviews unless it has been reviewed a reasonable number of times. As a result, six golf product IDs were selected, all of which belong to the category “Ball”. Table 2 shows the number of reviews posted for each of the six products.
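The selection step above can be sketched as a filter over the review records. This is a minimal illustration, not the authors’ code: the record keys `product_id`, `text` and `score`, and the rule that scores outside the 1 to 5 scale count as outliers, are assumptions made for the example.

```python
from collections import Counter

def select_target_products(reviews, min_reviews=100):
    """Return product IDs with more than `min_reviews` reviews after cleaning.

    Each review is assumed to be a dict with hypothetical keys
    "product_id", "text" and "score" (an integer evaluation score).
    """
    cleaned = [
        r for r in reviews
        # Drop reviews with missing fields (empty text counts as missing).
        if r.get("product_id") is not None and r.get("text") and r.get("score") is not None
        # Treat scores outside the 1-5 scale as outliers (an assumed rule).
        and 1 <= r["score"] <= 5
    ]
    counts = Counter(r["product_id"] for r in cleaned)
    # "More than 100 reviews" is a strict inequality.
    return sorted(pid for pid, n in counts.items() if n > min_reviews)
```

Note that the threshold is applied after cleaning, so a product with exactly 100 valid reviews is excluded.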

Table 1. Data items
Table 2. Number of reviews posted for each product ID

4 Analysis of Characteristics of Review

4.1 Summary Statistics

First, we compiled information on the users who posted reviews for each product ID. The results are shown in Tables 3, 4, 5, 6 and 7.

Table 3. Percentage distribution of reviewers’ average golf scores for each golf product
Table 4. Cross tabulation of golf level vs. product
Table 5. Satisfaction level vs. product
Table 6. Tabulation of evaluation scores
Table 7. Aggregation by golfer type

Table 3 shows the composition ratio by round-score level. Generally, a player whose round score is under 93 is called an advanced player, 93 to 110 an intermediate player, and 111 or higher a beginner.
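The level thresholds above can be written as a small mapping function. This is a sketch for illustration only; the function name `golf_level` and the English level labels are our own choices.

```python
def golf_level(round_score):
    """Map a round score to a player level using the thresholds stated above."""
    if round_score < 93:
        return "advanced"
    if round_score <= 110:
        return "intermediate"
    # Scores of 111 or higher.
    return "beginner"
```

For example, a reviewer whose average score falls in the “83 to 92” band is classified as advanced, and one in the “93 to 100” band as intermediate.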

The results suggest that all six products are used mainly by intermediate players. In particular, for product IDs 3534 and 4156 the most common score band is “93 to 100”, which means these products are often used by intermediate players. For product IDs 3900, 4071, 4088 and 4089, the most common band is “83 to 92”, so these products are usually used by intermediate and advanced players.

Among the six product IDs, 3534 and 4156 have high percentages of both intermediate and advanced users.

  • Product ID 4089 has the highest percentage of reviews from “advanced” users.

  • Product ID 4156 has the highest percentage of reviews from “intermediate” users.

  • Product ID 3534 has the highest percentage of reviews from “beginner” users.

  • Classifying the six products, we defined three groups: “beginner to intermediate,” “intermediate to advanced” and “advanced”.
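The comparisons above (which product has the highest percentage of a given level) amount to a row-normalized cross tabulation followed by a per-column maximum. The sketch below shows this with the standard library; it assumes each review has already been reduced to a `(product_id, category)` pair, and the same two functions would apply unchanged to satisfaction level, evaluation score or golfer type. The data in the test are toy values, not the paper’s results.

```python
from collections import Counter, defaultdict

def row_percentages(pairs):
    """Cross-tabulate (product_id, category) pairs as row percentages.

    Each row (product) sums to 100, as in a "golf level vs. product" table.
    """
    counts = defaultdict(Counter)
    for product_id, category in pairs:
        counts[product_id][category] += 1
    table = {}
    for product_id, row in counts.items():
        total = sum(row.values())
        table[product_id] = {c: 100.0 * n / total for c, n in row.items()}
    return table

def top_product_per_category(table):
    """For each category, return the product with the highest row percentage."""
    categories = {c for row in table.values() for c in row}
    return {
        c: max(table, key=lambda pid: table[pid].get(c, 0.0))
        for c in sorted(categories)
    }
```

Normalizing by row rather than by the grand total keeps products with different review counts comparable.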

Next, we focus on the satisfaction level attached to each review. Satisfaction indicates the feeling the user expresses toward the product.

For all six products, the satisfaction level is mostly “Good”. However, product ID 4156 is less satisfying than the other products.

  • Product IDs 3534 and 4088 have the highest proportion of “Good”.

  • Product ID 4156 has the highest proportion of “Normal”.

  • Product ID 4156 has the highest proportion of “Bad”.

  • For five product IDs (3534, 3900, 4071, 4088 and 4089), the percentage of “Good” is very high.

  • However, product ID 4156 received not only “Normal” and “Bad” ratings but also special icons such as anger and surprise.

Next, we focus on the evaluation score of each review. The evaluation score expresses, on a 5-point scale, how satisfied the user is with the product. Here, 5 is the highest score and 1 is the worst score.

  • Product ID 4089 has the highest proportion of “evaluation score 5.”

  • Product ID 4156 has the highest proportion of “evaluation score 4.”

  • Product ID 4156 has the highest proportion of “evaluation score 3.”

  • For most product IDs, the most common evaluation score is 5.

Next, we focus on the golfer type of each reviewer. Golfer type represents the user’s competitive motivation in golf and is self-selected by the user from three options.

The “Athlete” type has high competitive motivation; these users can be inferred to practice regularly and approach the game with discipline. The “Semi-athlete” type has moderate competitive motivation; these users can be inferred to practice enthusiastically, though not as extensively as the “Athlete” type. The “Enjoy” type has low competitive motivation.

The results show that all six product IDs are reviewed mainly by “Semi-athlete” and “Enjoy” players. In particular, 3534, 3900, 4071 and 4156 are reviewed mainly by “Enjoy” players, whereas 4088 and 4089 are reviewed by both “Semi-athlete” and “Enjoy” players.

  • Product ID 4071 has the highest proportion of “Athlete” users.

  • Product IDs 4088 and 4089 have the highest proportion of “Semi-athlete” users.

  • Product IDs 3900 and 4156 have the highest proportion of “Enjoy” users.

  • Classifying the six products gives three groups: “Enjoy,” “Semi-athlete to Enjoy” and “Athlete to Enjoy.”

4.2 Analysis of Features Included in the Review

We performed natural language processing analysis to clarify the characteristics of the reviews. Natural language processing is widely used to analyze text data, and many studies have targeted reviews on EC sites [3].

First, we compiled the reviews for each product into one document, yielding six documents. Next, we performed morphological analysis on each document using MeCab [4], a Japanese morphological analyzer. From the review sentences, we extracted nouns, verbs, adjectives, and proper nouns, including place names and organization names.
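The document-building and part-of-speech filtering steps can be sketched as follows. To keep the example runnable without MeCab installed, it assumes the analyzer’s output has already been converted into `(surface, pos)` pairs; the English POS labels in `KEEP_POS` are hypothetical stand-ins for the analyzer’s actual tags, not MeCab’s own tag names.

```python
from collections import defaultdict

# Hypothetical English labels standing in for the analyzer's POS tags.
KEEP_POS = {"noun", "verb", "adjective", "proper_noun"}

def build_product_documents(reviews):
    """Concatenate all reviews of each product into one document.

    `reviews` is an iterable of (product_id, review_text) pairs.
    """
    docs = defaultdict(list)
    for product_id, text in reviews:
        docs[product_id].append(text)
    return {pid: "\n".join(texts) for pid, texts in docs.items()}

def filter_tokens(tagged_tokens, keep=KEEP_POS):
    """Keep only the surface forms whose part of speech is in `keep`.

    `tagged_tokens` is a list of (surface, pos) pairs, assumed to come from
    a morphological analyzer after mapping its tags onto `keep`.
    """
    return [surface for surface, pos in tagged_tokens if pos in keep]
```

Merging each product’s reviews into one document means that the TF-IDF scores below compare products with one another rather than individual reviews.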

Next, we identify words that characteristically express each product document (characteristic words). Specifically, we use the TF-IDF method [5] to extract words that appear frequently in one document but rarely in the others. TF-IDF is defined by Eqs. (1) to (3).

$$ \mathrm{TF\text{-}IDF}_{i,j} = tf_{i,j} \times idf_{i} $$
(1)
$$ tf_{i,j} = \frac{n_{i,j}}{\sum_{S} n_{S,j}} $$
(2)
$$ idf_{i} = \log \frac{|D|}{|\{ d : t_{i} \in d \}|} $$
(3)

Here, \( n_{i,j} \) is the number of occurrences of word \( i \) in document \( j \), \( \sum_{S} n_{S,j} \) is the total number of occurrences of all words in document \( j \), \( |D| \) is the total number of documents, and \( |\{ d : t_{i} \in d \}| \) is the number of documents containing word \( i \).
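Eqs. (1) to (3) translate directly into code. The sketch below is our own illustration rather than the authors’ implementation; it assumes each document is already a list of extracted words and uses the natural logarithm, since the base of the logarithm in Eq. (3) is not specified. The helper `top_words` reproduces the “top 20 per product” ranking, breaking ties alphabetically (an arbitrary choice).

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute TF-IDF per Eqs. (1)-(3) for a dict {doc_id: list of words}."""
    n_docs = len(documents)                     # |D|
    doc_freq = Counter()                        # |{d : t_i in d}|
    for words in documents.values():
        doc_freq.update(set(words))
    scores = {}
    for doc_id, words in documents.items():
        counts = Counter(words)                 # n_{i,j}
        total = sum(counts.values())            # sum over S of n_{S,j}
        scores[doc_id] = {
            w: (n / total) * math.log(n_docs / doc_freq[w])   # tf x idf
            for w, n in counts.items()
        }
    return scores

def top_words(scores, doc_id, k=20):
    """Return the k words with the highest TF-IDF in one document."""
    ranked = sorted(scores[doc_id].items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]
```

With only six documents, any word that appears in all six receives an idf of log(6/6) = 0, so words common to every ball product (such as “ball” itself) drop out of the ranking.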

Table 8 shows the 20 characteristic words with the highest TF-IDF values for each product.

Table 8. The top 20 words obtained by the TF-IDF method

5 Discussion

Based on the above results, we describe the reviewer profile of each product ID using Tables 3, 4, 5, 6 and 7.

Looking at the six product IDs as a whole, the proportions of reviewers with average scores of “83 to 92” and “93 to 100” are high. By golf level, the proportions of advanced and intermediate players are high, and satisfaction is mostly “Good” or “Normal”. Most evaluation scores are 4 or 5. By golfer type, the proportions of the “Semi-athlete” and “Enjoy” types are high.

Product ID 3534 has a high percentage of “intermediate” users with an average score of “93 to 100,” and its satisfaction is also high. The proportion of high evaluation scores was also high, and most reviewers were of the “Enjoy” type. We inferred that this product is for players with low competitive motivation, ranging from beginners to advanced players.

Product ID 3900 has a high percentage of intermediate and advanced users with an average score of “83 to 92,” a high satisfaction level, and the highest percentage of “evaluation score 5.” Its reviewers also span many golfer types. We inferred that this product is for users who enjoy golf casually.

Product ID 4071 has a high percentage of intermediate and advanced users with an average score of “83 to 92.” Its satisfaction level is also very high, as is the percentage of “evaluation score 5.” Among golfer types, the percentage of the “Enjoy” type is high. We inferred that this product is for experienced golfers with low competitive motivation.

Product ID 4088 has a high percentage of “advanced” users with an average score of “83 to 92.” Its satisfaction level and percentage of “evaluation score 5” are also high, and the proportion of “Semi-athlete” users was high. We inferred that this product is for experienced golfers and can satisfy both highly and less motivated users. Moreover, we inferred that it suits users who are confident in their golf and have moderate motivation.

Product ID 4089 has a high percentage of “advanced” users with an average score of “83 to 92.” Its percentages of satisfaction and “evaluation score 5” are the highest, and the proportion of the “Semi-athlete” golfer type is high. We inferred that this product is for users with a good feel for golf and moderate motivation.

Product ID 4156 has a high percentage of intermediate and advanced users with average scores of “83 to 92” and “93 to 100.” It has the highest proportions of “Normal” and “Bad” satisfaction ratings, and its ratings also include special icons such as surprise and anger. Correspondingly, its most common evaluation score is 4. We inferred that this product is intended for golfers with low competitive motivation.

Next, we discuss the characteristic words of each product using Table 8.

For product ID 3900, the word “Bridgestone,” the name of a very famous golf manufacturer, appears. The words “yellow” and “orange” also appear, which indicates that the product comes in several colors. The words “sale,” “cost performance” and “great buy” indicate that this product is a bargain version of the manufacturer’s standard product.

For product ID 3534, words such as “cute,” “woman” and “colorful” appear frequently, which suggests that its cute design makes it a golf product aimed at women. It also appears that many people purchase it mainly for use in golf competitions.

For product ID 4071, the words “iron,” “green,” “driver” and “distance” appear, which suggests that reviewers evaluate its shot performance.

For product ID 4088, the word “maker” appears, which suggests that this product was developed by a famous manufacturer. The words “price” and “coupon” indicate that the product is affordable, and the words “happy” and “glad” suggest that it can be used as a gift.

For product ID 4089, the words “wonderful” and “feeling” appear. This product is known for excellent functionality such as feel and durability, which can be inferred to be why these words appear in the reviews.

For product ID 4156, the words “score” and “straight” appear, which indicates that the ball’s performance is evaluated positively. Moreover, the words “trajectory,” “turn,” “iron” and “driver” suggest that reviewers rated its flight distance favorably on tee shots and in approach situations.

Finally, we evaluate each product ID using the reviewer profiles and the characteristic words.

Product ID 3534 has a high percentage of “intermediate” users, and its reviews frequently contain words such as “cute,” “woman” and “colorful.” In addition, its satisfaction and evaluation scores are nearly at their maximum. We evaluated this product as a gift item that pleases women as well.

Product ID 3900 is a bargain version of a famous manufacturer’s product, and its performance is also substantial. We evaluated this product as suitable as a prize for golf competitions.

Product ID 4071 is characterized by excellent performance, and words related to practice and on-course situations appear in its reviews. It appears well suited to users looking for products with particularly good performance. We evaluated this product as suited to both practice and play on the golf course.

Product ID 4088 is a gift product that pleases its recipients. It tends to be chosen as a gift because it is made by a famous manufacturer and is a bargain. We evaluated this product as a gift to make someone happy.

Product ID 4089 is a bargain product with good characteristics. It appears to be prepared as a prize for competitions and to be used both in practice and on the course. We evaluated this product as suitable as a prize for golf competitions.

For product ID 4156, reviewers evaluate the product’s performance, and its cost performance is also good. We evaluated this product as suited to actual golf play.

6 Conclusion

In this study, we focused on reviews posted by members of a golf portal site and targeted reviews of six golf product IDs, selected as the products with more than 100 reviews during the data period. First, we compiled information on the members who posted reviews for each product ID and compared this information across product IDs. Next, we performed natural language processing analysis to clarify the characteristics of the reviews. Finally, we evaluated each product ID using the reviewer profiles and the characteristic words.

In future work, we need to increase the amount of target data, which would make it possible to compare a wider range of golf products and to judge which products suit which users. We also plan to determine whether each review is positive or negative and to estimate the emotions expressed in it.