Paul F. Christiano
Person information
- affiliation: OpenAI, USA
- affiliation (PhD 2017): University of California, Berkeley, CA, USA
2020 – today
- 2024
- [i27] Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam S. Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul F. Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. CoRR abs/2401.05566 (2024)
- [i26] Paul F. Christiano, Jacob Hilton, Victor Lecomte, Mark Xu: Backdoor defense, learnability and obfuscation. CoRR abs/2409.03077 (2024)
- [i25] Paul F. Christiano, Jacob Hilton, Andrea Lincoln, Eric Neyman, Mark Xu: Towards a Law of Iterated Expectations for Heuristic Estimators. CoRR abs/2410.01290 (2024)
- 2023
- [i24] Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul F. Christiano, Allan Dafoe: Model evaluation for extreme risks. CoRR abs/2305.15324 (2023)
- 2022
- [c11] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe: Training language models to follow instructions with human feedback. NeurIPS 2022
- [i23] Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe: Training language models to follow instructions with human feedback. CoRR abs/2203.02155 (2022)
- [i22] Paul F. Christiano, Eric Neyman, Mark Xu: Formalizing the presumption of independence. CoRR abs/2211.06738 (2022)
- 2021
- [j2] Zvika Brakerski, Paul F. Christiano, Urmila Mahadev, Umesh V. Vazirani, Thomas Vidick: A Cryptographic Test of Quantumness and Certifiable Randomness from a Single Quantum Device. J. ACM 68(5): 31:1-31:47 (2021)
- [i21] Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul F. Christiano: Recursively Summarizing Books with Human Feedback. CoRR abs/2109.10862 (2021)
- 2020
- [c10] Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul F. Christiano: Learning to summarize with human feedback. NeurIPS 2020
- [i20] Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul F. Christiano: Learning to summarize from human feedback. CoRR abs/2009.01325 (2020)
2010 – 2019
- 2019
- [i19] Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul F. Christiano, Geoffrey Irving: Fine-Tuning Language Models from Human Preferences. CoRR abs/1909.08593 (2019)
- 2018
- [c9] Zvika Brakerski, Paul F. Christiano, Urmila Mahadev, Umesh V. Vazirani, Thomas Vidick: A Cryptographic Test of Quantumness and Certifiable Randomness from a Single Quantum Device. FOCS 2018: 320-331
- [i18] Zvika Brakerski, Paul F. Christiano, Urmila Mahadev, Umesh V. Vazirani, Thomas Vidick: Certifiable Randomness from a Single Quantum Device. CoRR abs/1804.00640 (2018)
- [i17] Geoffrey Irving, Paul F. Christiano, Dario Amodei: AI safety via debate. CoRR abs/1805.00899 (2018)
- [i16] Tom B. Brown, Nicholas Carlini, Chiyuan Zhang, Catherine Olsson, Paul F. Christiano, Ian J. Goodfellow: Unrestricted Adversarial Examples. CoRR abs/1809.08352 (2018)
- [i15] Paul F. Christiano, Buck Shlegeris, Dario Amodei: Supervising strong learners by amplifying weak experts. CoRR abs/1810.08575 (2018)
- 2017
- [b1] Paul Francis Christiano: Manipulation-resistant online learning. University of California, Berkeley, USA, 2017
- [c8] Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei: Deep Reinforcement Learning from Human Preferences. NIPS 2017: 4299-4307
- [i14] Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei: Deep reinforcement learning from human preferences. CoRR abs/1706.03741 (2017)
- 2016
- [c7] Paul F. Christiano: Provably manipulation-resistant reputation systems. COLT 2016: 670-697
- [i13] Paul F. Christiano: Robust Collaborative Online Learning. CoRR abs/1603.06265 (2016)
- [i12] Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermüller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul F. Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron C. Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Melanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian J. Goodfellow, Matthew Graham, Çaglar Gülçehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrançois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Joseph Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph P. Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang: Theano: A Python framework for fast computation of mathematical expressions. CoRR abs/1605.02688 (2016)
- [i11] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul F. Christiano, John Schulman, Dan Mané: Concrete Problems in AI Safety. CoRR abs/1606.06565 (2016)
- [i10] Paul F. Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, Wojciech Zaremba: Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model. CoRR abs/1610.03518 (2016)
- [i9] Chelsea Finn, Paul F. Christiano, Pieter Abbeel, Sergey Levine: A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models. CoRR abs/1611.03852 (2016)
- 2015
- [c6] Benja Fallenstein, Jessica Taylor, Paul F. Christiano: Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence. LORI 2015: 411-415
- [i8] Benja Fallenstein, Jessica Taylor, Paul F. Christiano: Reflective Oracles: A Foundation for Classical Game Theory. CoRR abs/1508.04145 (2015)
- 2014
- [c5] Paul F. Christiano: Open Problem: Online Local Learning. COLT 2014: 1290-1294
- [c4] Paul F. Christiano: Online local learning via semidefinite programming. STOC 2014: 468-474
- [i7] Mihály Bárász, Paul F. Christiano, Benja Fallenstein, Marcello Herreshoff, Patrick LaVictoire, Eliezer Yudkowsky: Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic. CoRR abs/1401.5577 (2014)
- [i6] Paul F. Christiano: Online Local Learning via Semidefinite Programming. CoRR abs/1403.5287 (2014)
- [i5] Paul F. Christiano: Provably Manipulation-Resistant Reputation Systems. CoRR abs/1411.1127 (2014)
- 2013
- [j1] Scott Aaronson, Paul F. Christiano: Quantum Money from Hidden Subspaces. Theory Comput. 9: 349-401 (2013)
- 2012
- [c3] Scott Aaronson, Paul F. Christiano: Quantum money from hidden subspaces. STOC 2012: 41-60
- [i4] Scott Aaronson, Paul F. Christiano: Quantum Money from Hidden Subspaces. CoRR abs/1203.4740 (2012)
- [i3] Scott Aaronson, Paul F. Christiano: Quantum Money from Hidden Subspaces. Electron. Colloquium Comput. Complex. TR12 (2012)
- [i2] Scott Aaronson, Paul F. Christiano: Quantum Money from Hidden Subspaces. IACR Cryptol. ePrint Arch. 2012: 171 (2012)
- 2011
- [c2] Paul F. Christiano, Jonathan A. Kelner, Aleksander Madry, Daniel A. Spielman, Shang-Hua Teng: Electrical flows, laplacian systems, and faster approximation of maximum flow in undirected graphs. STOC 2011: 273-282
- [c1] Paul F. Christiano, Erik D. Demaine, Shaunak Kishore: Lossless Fault-Tolerant Data Structures with Additive Overhead. WADS 2011: 243-254
- 2010
- [i1] Paul F. Christiano, Jonathan A. Kelner, Aleksander Madry, Daniel A. Spielman, Shang-Hua Teng: Electrical Flows, Laplacian Systems, and Faster Approximation of Maximum Flow in Undirected Graphs. CoRR abs/1010.2921 (2010)
last updated on 2024-11-11 21:29 CET by the dblp team
all metadata released as open data under CC0 1.0 license