
Subjective databases

Published: 01 July 2019

Abstract

Online users are constantly seeking experiences, such as a hotel with clean rooms and a lively bar, or a restaurant for a romantic rendezvous. However, e-commerce search engines only support queries involving objective attributes such as location, price, and cuisine, and any experiential data is relegated to text reviews.
To support experiential queries, a database system needs to model subjective data. Users should be able to pose queries that specify subjective experiences in their own words, in addition to conditions on the usual objective attributes. This paper introduces OpineDB, a subjective database system that addresses these challenges. We introduce a data model for subjective databases. We describe how OpineDB translates subjective queries against the subjective database schema by matching the user's query phrases to the underlying schema. We also show how the experiential conditions specified by the user can be combined and the results aggregated and ranked. Through experiments with real hotel and restaurant review data, we demonstrate that subjective databases satisfy user needs more effectively and accurately than alternative techniques.
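
The pipeline outlined in the abstract (match query phrases to schema attributes, combine the experiential conditions, then rank) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the schema, the hotel data, the token-overlap (Jaccard) matcher, and the product used to combine per-condition degrees are hypothetical stand-ins, not OpineDB's actual data model or algorithms.

```python
from dataclasses import dataclass

# Hypothetical subjective schema: each attribute lists phrases that describe it.
SCHEMA = {
    "room_cleanliness": ["clean rooms", "spotless room", "tidy room"],
    "bar_liveliness": ["lively bar", "fun bar scene"],
    "romance": ["romantic dinner", "romantic rendezvous"],
}

@dataclass
class Hotel:
    name: str
    price: float   # objective attribute
    scores: dict   # subjective attribute -> degree of satisfaction in [0, 1]

def tokens(phrase):
    return set(phrase.lower().split())

def match_attribute(query_phrase):
    """Pick the schema attribute whose phrases best overlap the query (Jaccard)."""
    q = tokens(query_phrase)
    def best_overlap(attr):
        return max(len(q & tokens(p)) / len(q | tokens(p)) for p in SCHEMA[attr])
    return max(SCHEMA, key=best_overlap)

def rank(hotels, phrases, max_price):
    """Filter on the objective condition, then rank by combined subjective score."""
    attrs = [match_attribute(p) for p in phrases]
    results = []
    for h in hotels:
        if h.price > max_price:
            continue
        score = 1.0
        for a in attrs:
            # Assumed conjunction: multiply the degrees of the matched attributes.
            score *= h.scores.get(a, 0.0)
        results.append((h.name, score))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Made-up example data.
HOTELS = [
    Hotel("Alpha", 120.0, {"room_cleanliness": 0.9, "bar_liveliness": 0.4}),
    Hotel("Beta", 150.0, {"room_cleanliness": 0.7, "bar_liveliness": 0.8}),
    Hotel("Gamma", 400.0, {"room_cleanliness": 0.95, "bar_liveliness": 0.9}),
]

ranking = rank(HOTELS, ["really clean rooms", "a lively bar"], max_price=200.0)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this sketch the user's own words ("really clean rooms") never appear in the schema verbatim; the matcher maps them to the closest attribute, and the objective price condition is applied as an ordinary filter before ranking.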



Published In

Proceedings of the VLDB Endowment  Volume 12, Issue 11
July 2019
543 pages

Publisher

VLDB Endowment


Qualifiers

  • Research-article

Cited By

  • (2023) Uncovering Synergy and Dysergy in Consumer Reviews. Management Science, 69:4, pages 2339-2360. DOI: 10.1287/mnsc.2022.4443. Online publication date: 1-Apr-2023
  • (2022) Guided Text-based Item Exploration. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 3410-3420. DOI: 10.1145/3511808.3557141. Online publication date: 17-Oct-2022
  • (2022) It’s the Same Old Story! Enriching Event-Centric Knowledge Graphs by Narrative Aspects. Proceedings of the 14th ACM Web Science Conference 2022, pages 34-43. DOI: 10.1145/3501247.3531565. Online publication date: 26-Jun-2022
  • (2022) A distantly supervised approach for enriching product graphs with user opinions. Journal of Intelligent Information Systems, 59:2, pages 435-454. DOI: 10.1007/s10844-022-00717-5. Online publication date: 1-Oct-2022
  • (2021) Exploring Ratings in Subjective Databases. Proceedings of the 2021 International Conference on Management of Data, pages 62-75. DOI: 10.1145/3448016.3457259. Online publication date: 9-Jun-2021
  • (2021) Constructing Explainable Opinion Graphs from Reviews. Proceedings of the Web Conference 2021, pages 3419-3431. DOI: 10.1145/3442381.3450081. Online publication date: 19-Apr-2021
  • (2020) ExtremeReader: An interactive explorer for customizable and explainable review summarization. Companion Proceedings of the Web Conference 2020, pages 176-180. DOI: 10.1145/3366424.3383535. Online publication date: 20-Apr-2020
  • (2020) Snippext: Semi-supervised Opinion Mining with Augmented Data. Proceedings of The Web Conference 2020, pages 617-628. DOI: 10.1145/3366423.3380144. Online publication date: 20-Apr-2020
  • (2020) Teddy: A System for Interactive Review Analysis. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1-13. DOI: 10.1145/3313831.3376235. Online publication date: 21-Apr-2020
