4.5.1. Quantitative Data Analysis
Table 14 presents an overview of the total number of discrepancies reported (Discr.), the number of false positives (FPs), duplicates (Dupl.), i.e., different discrepancies that describe the same problem, repeated discrepancies (Rep.), i.e., the same discrepancy reported in different script activities, and the number of unique problems (UP). The number of discrepancies reported with UX-Tips was higher than with IHI; consequently, UX-Tips enabled participants to find more unique problems (UP). This may be explained by the fact that, although the techniques have a similar number of items, the dimensions proposed in UX-Tips may lead to the search for problems outside the scope of IHI's dimensions. Note that this also happens in the opposite direction: IHI has some dimensions outside the UX-Tips scope, but to a smaller extent.
We analyzed the discrepancies classified as FPs to better understand what the number of FPs meant for each technique, given that this number differed considerably between them. Regarding the six FP occurrences in UX-Tips, we identified that participant P42 misunderstood the evaluative item VLE2 (The application has/represents values that are important to the user), answering it ambiguously: "Yes, several unnecessary, and what I am looking for, it does not show".
From this report, it is not clear what the participant meant by "unnecessary" and "does not show"; thus, we classified it as an FP. Two discrepancies were classified as not being real problems in the app (e.g., "During the execution of this task, I noticed two options leading to the same destination"). Three other discrepancies were classified as issues external to the reviewed app (e.g., "The app makes me feel a little uncomfortable due to high prices"); in the case of TripAdvisor, the purpose of the app is to inform users, not to set the prices charged by establishments.
Regarding the IHI technique, three FPs stemmed from misinterpreted items of the technique, 22 were considered non-problems in the application, and the remaining 16 were positive aspects reported during the evaluation (e.g., "I found no challenge. It was easy to use all the options"). We counted positive aspects of the application as FPs because the purpose of the evaluation was to identify problems: positive reports indicate which aspects of the application do not have UX issues and thus do not necessarily point to opportunities for improvement.
As for repeated discrepancies (see Table 14, Rep.), 20 occurrences were registered with UX-Tips and 33 with IHI. We observed that the issues considered repeated recur at various points of the user's interaction with the app, causing the user to report the same issue multiple times. We see this as an opportunity to improve the technique: in the Problem Report artifact, we could add a field for the evaluator to indicate whether the problem persists throughout the app or only in part of it, reducing inspectors' time in the problem consolidation stage. Regarding duplicates, UX-Tips had more occurrences than IHI.
This may indicate that UX-Tips led different participants to find the same problems. These results suggest that, besides finding more problems, participants using UX-Tips made fewer errors (their reports converged on the same real problems) than those who used IHI. To illustrate, we took the most reported problem in each technique and analyzed how many participants found it.
With UX-Tips, the problem related to finding a restaurant in the app was the most recurrent, with 35 occurrences among 25 different participants. With IHI, the most recurrent problem was that the application did not have an initial usage guide; this issue was reported 14 times by 11 different participants. These two examples show that the problem most reported through UX-Tips is more practical, focused, and related to the purpose of the evaluated application, whereas the problem most reported through IHI indicates a more technical issue, a general problem that could occur in any application. This could indicate that UX-Tips allows a more context-focused assessment; however, a more in-depth analysis is needed in this regard.
As Figure 5 shows, the UX-Tips group median is almost at the same level as IHI's. We used the Kolmogorov–Smirnov test to check normality, since the sample was larger than 50, and verified that efficiency was normally distributed in both groups (p-value = 0.162 for UX-Tips and p-value = 0.200 for IHI). To determine whether the difference between the samples is significant, we applied the parametric t-test for independent samples [41]. When we compared the two samples, we found a significant difference between the two groups (p-value = 0.003). These results support the rejection of the null hypothesis H01 (p-value < 0.05) and the acceptance of its alternative hypothesis HA1, suggesting that UX-Tips was more efficient than IHI when used to find UX problems in this experiment.
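To make the procedure concrete, the sketch below reproduces the two steps (normality check, then t-test) in Python with SciPy. It is a minimal illustration only: the ux_tips and ihi arrays hold hypothetical efficiency scores (defects/hour), not the study data, and the normality check standardizes each sample before comparing it to N(0, 1); tools that apply the Lilliefors correction may report different p-values.

```python
# Minimal sketch of the efficiency analysis: Kolmogorov-Smirnov normality
# check followed by an independent-samples t-test. The scores below are
# hypothetical placeholders, not the data from this experiment.
from scipy import stats

ux_tips = [7.1, 9.4, 8.2, 10.5, 6.8, 9.9, 8.7, 7.6]  # defects/hour per participant
ihi = [5.3, 6.9, 5.8, 7.2, 4.9, 6.1, 5.5, 6.4]

# KS test against a standard normal after z-scoring each sample.
for name, sample in (("UX-Tips", ux_tips), ("IHI", ihi)):
    _, p_norm = stats.kstest(stats.zscore(sample), "norm")
    print(f"{name}: KS normality p-value = {p_norm:.3f}")  # p > 0.05: normality holds

# Both samples normal, so a parametric t-test for independent samples applies.
t_stat, p_value = stats.ttest_ind(ux_tips, ihi)
print(f"t-test: t = {t_stat:.2f}, p-value = {p_value:.3f}")  # p < 0.05: reject H01
```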
To evaluate the effect size, we computed Cohen's d. Considering Cohen's convention, the effect size for efficiency was medium (d = 0.76) [39]. The mean difference was 2.74 defects/hour, i.e., UX-Tips participants found two to three more defects than IHI participants for each hour dedicated to the UX evaluation task.
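Cohen's d for two independent samples divides the difference between the group means by the pooled standard deviation. The sketch below shows one common formulation, again with hypothetical values; note that variants of d differ in how the variances are pooled.

```python
# Cohen's d with a pooled standard deviation (one common formulation).
# Sample values are hypothetical, as in the previous sketch.
import statistics

def cohens_d(a, b):
    """(mean(a) - mean(b)) / pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

ux_tips = [7.1, 9.4, 8.2, 10.5, 6.8, 9.9, 8.7, 7.6]
ihi = [5.3, 6.9, 5.8, 7.2, 4.9, 6.1, 5.5, 6.4]
print(f"Cohen's d = {cohens_d(ux_tips, ihi):.2f}")  # ~0.5 medium, ~0.8 large (Cohen)
```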
The boxplot with the efficacy distribution per technique suggests that the UX-Tips group was more effective than the IHI group: the UX-Tips group's median was higher than the IHI group's. The number 19 in Figure 5 marks the participant with the best performance in this indicator for UX-Tips. Despite this participant's low industry experience and medium experience with UX and usability, all discrepancies he reported were real problems. We did not find any particular characteristic that could explain this superior result compared to other participants; according to the course instructor, he was a highly engaged student.
We also used the Kolmogorov–Smirnov test to check normality and verified that efficacy was not normally distributed in either group (p-value = 0.014 for UX-Tips and p-value < 0.001 for IHI). To determine whether the difference between the samples is significant, we applied the non-parametric Mann–Whitney test [42], which yielded a p-value of 0.044. This result therefore supports the rejection of the null hypothesis H02 (p-value < 0.05) and the acceptance of its alternative hypothesis HA2, suggesting that the UX-Tips technique was more effective than IHI when used to find UX problems in this experiment.
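A corresponding sketch for the non-parametric comparison, once more with hypothetical efficacy values (percentage of the total known problems each participant found) rather than the study data:

```python
# Mann-Whitney U test for the efficacy comparison; no normality assumption.
# Efficacy values are hypothetical placeholders.
from scipy import stats

ux_tips_eff = [4.2, 5.1, 3.8, 6.0, 4.7, 5.5, 3.5, 5.9]  # % of known problems found
ihi_eff = [2.9, 3.4, 4.1, 2.5, 3.8, 3.1, 2.7, 3.6]

u_stat, p_value = stats.mannwhitneyu(ux_tips_eff, ihi_eff, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat}, p-value = {p_value:.3f}")  # p < 0.05: reject H02
```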
The effect size for efficacy was also medium (d = 0.52) [39]. The mean difference was 1.48 percentage points, i.e., the percentage of the total existing defects found by UX-Tips participants was 1.48 percentage points higher than that found by IHI participants.
Summary of quantitative analysis: Participants using UX-Tips reported more problems than participants using IHI and, therefore, were more effective, which may be explained by the larger number of unique problems they found. They were also more efficient than participants using IHI, which may be related to their higher precision during the evaluation: given their smaller number of false positives, less of their time was wasted on reports that were not real problems.
4.5.2. Summative Content Analysis
We divided the Summative Content Analysis (SCA) into two parts. The first part concerns the results obtained with each technique, through which the participants described the problems they encountered. The second part concerns the analysis of the answers gathered from the feedback questionnaire, which captured the participants' perceptions of using the techniques.
In the first part, we looked at how the problems encountered by the participants affected their UX. Table 15 presents statements about the main problems they identified, associating each statement with the technique the participant used.
As Table 15 shows, almost all participants who described the consequences of the problems for their experience with the app were from the UX-Tips group. This may indicate that the descriptions of the IHI items do not make hedonic attributes explicit to the participants. Thus, by allowing participants to describe their experiences, UX-Tips can help them express how a problem affects UX. This characteristic may be valuable for the design team when investigating the causes of a negative UX.
The second part of the SCA concerns the participants' perception of the techniques, captured through the answers to the feedback questionnaire. Since this questionnaire was available online, some participants did not respond to it (among the respondents, 33 had used UX-Tips and 29 had used IHI). Table 16 presents the distribution of positive (Pos.), negative (Neg.), and neutral (Neu.) answers per technique (Tech.), where a neutral answer means the technique partially helped. The participants' answers were balanced regarding the appropriateness of the techniques and their ability to describe the problems that they found.
Considering that one of the objectives of UX-Tips is to enable the identification of UX problems, it is noteworthy that most participants answered that UX-Tips made it possible to indicate UX problems, confirming the results in Table 15. The same may not hold for IHI: although participants using IHI also indicated that it enabled the description of problems (considering that we used the UX-Tips problem report form for IHI as well), the example in Table 15 shows that few participants reported details about the problems they found.
Regarding ease of use, Table 16 shows that most of the participants using UX-Tips affirmed that it was easy to use. In contrast, the number of participants who indicated that IHI was easy to use was almost the same as the number who indicated that it was not. This difficulty of use might explain the high number of FPs in IHI (see Table 14). Below, we highlight statements about the ease of use of both techniques to present more details on the participants' perceptions of this topic.
“It was easy. The descriptions were well didactic as to what to do.”
—(UX-Tips)
“Yes, the items were very self-explanatory, more direct on the tasks to be performed and questions to be found in the application.”
—(UX-Tips)
“It was easy to use the technique and understand it.”
—(IHI)
“It was not easy, some items were unclear and could be interpreted differently.”
—(IHI)
Summary of Summative Content Analysis: Participants who used IHI did not describe how problems affected their experience, limiting themselves to pointing out the problems without detail. Furthermore, UX-Tips encouraged participants to report their feelings about the problems, showing that the technique aids in assessing hedonic attributes. Lastly, a higher proportion of UX-Tips participants considered it easy to use compared to IHI.