DOI: 10.5555/3692070.3693468

Position: Data-Driven Discovery with Large Generative Models

Published: 21 July 2024

Abstract

With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit the capabilities of large generative models (LGMs) to develop automated systems for end-to-end data-driven discovery--a paradigm encompassing the search and verification of hypotheses purely from a set of provided datasets, without the need for additional data collection or physical experiments. We first outline several desiderata for an ideal data-driven discovery system. Then, through DATAVOYAGER, a proof-of-concept utilizing GPT-4, we demonstrate how LGMs fulfill several of these desiderata--a feat previously unattainable--while also highlighting important limitations in the current system that open up opportunities for novel ML research. We contend that achieving accurate, reliable, and robust end-to-end discovery systems solely through the current capabilities of LGMs is challenging. We instead advocate for fail-proof tool integration, along with active user moderation through feedback mechanisms, to foster data-driven scientific discoveries with efficiency and reproducibility.
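To make the paradigm concrete, the sketch below illustrates one possible generate-verify loop of the kind the abstract describes: a large generative model proposes hypotheses about a provided tabular dataset, a verification tool checks each hypothesis against the data, and a human moderator would review what the system accepts. This is a minimal illustration under stated assumptions, not the paper's DATAVOYAGER system; the propose_hypotheses placeholder (standing in for an LGM call), the Pearson-correlation check, and the demo DataFrame are introduced here purely for exposition.

# Illustrative sketch of an LGM-driven generate-verify loop for tabular data.
# propose_hypotheses stands in for a call to a large generative model; the
# statistical test plays the role of a tool-based verification step.
from dataclasses import dataclass

import pandas as pd
from scipy import stats


@dataclass
class Hypothesis:
    x: str      # predictor column
    y: str      # outcome column
    claim: str  # natural-language statement of the hypothesis


def propose_hypotheses(df: pd.DataFrame) -> list[Hypothesis]:
    """Placeholder for an LGM that reads the schema and proposes hypotheses."""
    cols = df.select_dtypes("number").columns
    return [Hypothesis(x, y, f"{x} is associated with {y}")
            for i, x in enumerate(cols) for y in cols[i + 1:]]


def verify(df: pd.DataFrame, h: Hypothesis, alpha: float = 0.05) -> tuple[bool, float]:
    """Tool-based verification: a Pearson correlation test on the provided data."""
    _, p = stats.pearsonr(df[h.x], df[h.y])
    return p < alpha, p


def discovery_loop(df: pd.DataFrame) -> list[Hypothesis]:
    accepted = []
    for h in propose_hypotheses(df):
        supported, p = verify(df, h)
        print(f"{h.claim}: p={p:.3g} -> {'supported' if supported else 'not supported'}")
        if supported:
            # In a full system a human moderator would review before acceptance.
            accepted.append(h)
    return accepted


if __name__ == "__main__":
    demo = pd.DataFrame({"hours_studied": [1, 2, 3, 4, 5, 6],
                         "exam_score":    [52, 55, 61, 64, 70, 75],
                         "shoe_size":     [9, 7, 10, 8, 9, 7]})
    discovery_loop(demo)

A system of the kind the paper argues for would, at minimum, replace the placeholder with actual LGM calls, control for multiple comparisons across the many candidate hypotheses a model can generate, and route accepted hypotheses through user feedback before reporting them as discoveries.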



Information

Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning
July 2024
63010 pages

Publisher

JMLR.org

Publication History

Published: 21 July 2024

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
