Towards the Scalable Evaluation of Cooperativeness in Language Models

It is likely that AI systems driven by pre-trained language models (PLMs) will increasingly be used to assist humans in high-stakes interactions with other agents, such as negotiation or conflict resolution. Consistent with the goals of Cooperative …

Characterizing Manipulation from AI Systems

Manipulation is a common concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our interactions with the world, it is important to understand the degree to which AI systems might manipulate humans …

Reclaiming the Digital Commons: A Public Data Trust for Training Data

Democratization of AI means not only that people can freely use AI, but also that people can collectively decide how AI is to be used. In particular, collective decision-making power is required to redress the negative externalities from the …

Harms from Increasingly Agentic Algorithmic Systems

Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate …

Scoring Rules for Performative Binary Prediction

We construct a model of expert prediction where predictions can influence the state of the world. Under this model, we show through theoretical and numerical results that proper scoring rules can incentivize experts to manipulate the world with their …
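The paper's point of contrast is the standard, non-performative setting, where proper scoring rules reward truthful reporting. A minimal sketch of that baseline (my own illustration, not the paper's model) uses the Brier score S(p, y) = 1 − (y − p)², which an expert with true belief q maximizes in expectation by reporting p = q:

```python
def expected_brier(p, q):
    """Expected Brier score of report p when the outcome is 1 with probability q."""
    return q * (1 - (1 - p) ** 2) + (1 - q) * (1 - p ** 2)

q = 0.3  # the expert's true belief (arbitrary example value)
grid = [i / 1000 for i in range(1001)]
best_report = max(grid, key=lambda p: expected_brier(p, q))
print(best_report)  # 0.3: truthful reporting is optimal
```

The paper's performative setting breaks this guarantee: once the report p can shift the outcome probability itself (q becomes a function of p), maximizing expected score can reward manipulating the world rather than reporting beliefs.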

Loss of Control: "Normal Accidents" and AI Systems

A thread in recent work on the social impacts of AI systems is whether certain properties of a domain should preclude the application of such systems to begin with. Incorporating sociological work on accidents, I analyze two such properties: …

The Limits of Global Inclusion in AI Development

Those best-positioned to profit from the proliferation of artificial intelligence (AI) systems are those with the most economic power. Extant global inequality has motivated Western institutions to involve more diverse groups in the development and …

Inverse Policy Evaluation for Value-based Sequential Decision-making

Value-based methods for reinforcement learning lack generally applicable ways to derive behavior from a value function. Many approaches involve approximate value iteration (e.g., $Q$-learning), and acting greedily with respect to the estimates with …

Training Recurrent Neural Networks Online by Learning Explicit State Variables

Recurrent neural networks (RNNs) allow an agent to construct a state-representation from a stream of experience, which is essential in partially observable problems. However, there are two primary issues one must overcome when training an RNN: the …

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a fixed number of future time steps. To learn the value function for horizon h, these …