Abstract
Answering product questions promptly and accurately is important for e-commerce applications. Answering them manually (e.g. on community question answering platforms) is slow and does not scale. Recent studies show that product reviews are a good source for real-time, automatic product question answering (PQA). In the literature, PQA is formulated as a retrieval problem in which the goal is to find the reviews most relevant to a given product question. In this paper, we focus on answerability and answer reliability for PQA using reviews. Our investigation is based on the intuition that many questions may not be answerable with a finite set of reviews. When a question is not answerable, a system should return a nil answer rather than a list of irrelevant reviews, which can have a significant negative impact on user experience. Moreover, for answerable questions, only the most relevant reviews that answer the question should be included in the result. We propose a conformal-prediction-based framework to improve the reliability of PQA systems: we reject unreliable answers so that the returned results answer the product question more concisely and accurately, and we return nil answers for unanswerable questions. Experiments on a widely used Amazon dataset show encouraging results for the proposed framework. More broadly, our results demonstrate a novel and effective application of conformal methods to a retrieval task.
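The rejection idea can be illustrated with a minimal sketch of an inductive conformal predictor. This is not necessarily the exact procedure used in the paper: the function names, the nonconformity measure (one minus the model's relevance score) and the calibration set of scores for known-relevant reviews are all assumptions made for illustration. A review is kept only if its conformal p-value for the label "relevant" exceeds the significance level epsilon; if no review passes, the result is empty, which corresponds to a nil answer.

```python
from typing import List, Sequence, Tuple


def conformal_p_value(cal_scores: Sequence[float], test_score: float) -> float:
    """p-value for the hypothesis "this review is relevant".

    cal_scores are model scores of held-out reviews known to be relevant
    (a hypothetical calibration set). Nonconformity is 1 - score, so a
    low score looks "strange" for a relevant review.
    """
    test_alpha = 1.0 - test_score
    # Count calibration examples at least as nonconforming as the test review.
    at_least_as_strange = sum(1 for s in cal_scores if 1.0 - s >= test_alpha)
    return (at_least_as_strange + 1) / (len(cal_scores) + 1)


def filter_reviews(
    cal_scores: Sequence[float],
    candidates: List[Tuple[str, float]],
    epsilon: float,
) -> List[Tuple[str, float]]:
    """Keep reviews whose p-value exceeds epsilon, best score first.

    An empty list plays the role of a nil answer.
    """
    kept = [
        (text, score)
        for text, score in candidates
        if conformal_p_value(cal_scores, score) > epsilon
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    calibration = [0.9, 0.8, 0.7, 0.6]
    reviews = [("battery lasts two days", 0.85), ("arrived on time", 0.1)]
    print(filter_reviews(calibration, reviews, epsilon=0.3))
```

Raising epsilon rejects more reviews, which trades coverage for conciseness; a per-class (Mondrian) variant would calibrate relevant and irrelevant reviews separately.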
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
The original implementation uses a softmax activation function to compute P(r|q), so the probabilities of all reviews sum to one; we replace the softmax with a sigmoid function, so that each review instead yields a valid probability distribution over the positive and negative classes.
- 8.
Following the original papers, a “review” is technically a “review sentence” rather than the full review.
- 9.
To control for quality, we insert one control question with a known answer (taken from the QA pair) in every three questions. Workers who consistently give low scores to these control questions are filtered out.
- 10.
This step is only needed for moqa, as bertqa and fltr already produce probabilities. For moqa, we convert the review score into a probability by applying a sigmoid function to the log score.
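Notes 7 and 10 both rely on the sigmoid function, and the difference from softmax can be made concrete with a small sketch. The function names and example values below are illustrative assumptions, not the authors' code. Softmax normalises across all reviews of a question, so the scores compete with each other; the sigmoid scores each review independently, so every review gets its own probability of being relevant. For note 10, the sketch assumes the moqa score is available as a log score; for a positive raw score s, sigmoid(log s) equals s / (1 + s).

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Normalise across reviews: the outputs sum to one."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Independent probability of the positive class for one review."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)


if __name__ == "__main__":
    logits = [2.0, 2.0, -1.0]  # hypothetical per-review scores
    print(softmax(logits))               # sums to one across reviews
    print([sigmoid(x) for x in logits])  # each value is a separate probability

    # Note 10: a log score mapped to a probability (assumed raw score of 3.0).
    print(sigmoid(math.log(3.0)))
```

With independent probabilities, two equally relevant reviews can both receive a high score, whereas softmax would force them to share the probability mass.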
Acknowledgement
Shiwei Zhang is supported by the RMIT University and CSIRO Data61 Scholarships.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, S., Zhang, X., Lau, J.H., Chan, J., Paris, C. (2021). Less Is More: Rejecting Unreliable Reviews for Product Question Answering. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12459. Springer, Cham. https://doi.org/10.1007/978-3-030-67664-3_34
DOI: https://doi.org/10.1007/978-3-030-67664-3_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67663-6
Online ISBN: 978-3-030-67664-3
eBook Packages: Computer Science (R0)