RESEARCHER

Sanchit Ram Arvind

@sanchitram

UC Berkeley

just a fun projects kinda guy

causal inference, machine learning, data science

Papers

Trade-Only Reinforcement Learning for Asymmetric-Information Catan: An Intermediate Negative Result

We study a modular reinforcement-learning architecture for domestic trade in Settlers of Catan under asymmetric information. Instead of learning full-game play from scratch, the agent learns only when to offer, accept, reject, confirm, or cancel player-to-player trades, while a fixed heuristic backbone controls all non-trade actions. This design isolates whether a learned negotiation module can improve a stable extensive-form game policy. Using the open-source Catanatron simulator, we benchmark four trade-policy variants across two training lengths and multiple random seeds against no-trade and untrained-trade baselines. The learned agents consistently acquire stable trade behavior, making roughly 19 to 20 offers per game and completing about 1 to 2 trades per game in standard evaluations. However, these behaviors do not translate into reliable gains in overall match win rate. Across the broad sweep, trained policies fail to outperform the untrained trade module on average, and performance against stronger opponents remains weak even when trade activity increases. A targeted follow-up with stronger training opponents and longer runs also fails to reverse this pattern. The current evidence therefore supports a negative intermediate finding: a trade-only RL head can learn to negotiate frequently without learning to negotiate profitably. We argue that belief modeling over hidden hands and tighter coupling between trade decisions and downstream planning are likely necessary for trade learning to improve full-game performance.

Sanchit Ram Arvind · Apr 30, 2026
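The modular split described in the abstract can be sketched as a dispatcher. A learned head decides among the five trade actions (offer, accept, reject, confirm, cancel), and a fixed backbone handles everything else. This is a minimal sketch under stated assumptions, not the author's code or the Catanatron API. `ActionKind`, `Action`, `make_hybrid_policy`, and the convention that the trade head returns `None` to defer to the backbone are all hypothetical. The sketch also assumes each decision point presents a list of legal actions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional, Sequence


class ActionKind(Enum):
    # Hypothetical action categories. The five *_TRADE kinds mirror the
    # decisions the abstract assigns to the learned head; BUILD and END_TURN
    # stand in for every non-trade action the heuristic backbone handles.
    OFFER_TRADE = auto()
    ACCEPT_TRADE = auto()
    REJECT_TRADE = auto()
    CONFIRM_TRADE = auto()
    CANCEL_TRADE = auto()
    BUILD = auto()
    END_TURN = auto()


TRADE_KINDS = frozenset({
    ActionKind.OFFER_TRADE, ActionKind.ACCEPT_TRADE, ActionKind.REJECT_TRADE,
    ActionKind.CONFIRM_TRADE, ActionKind.CANCEL_TRADE,
})


@dataclass(frozen=True)
class Action:
    kind: ActionKind
    payload: tuple = ()  # e.g. resources offered/requested; unused in this sketch


# Backbone: picks one of the non-trade legal actions.
Backbone = Callable[[object, Sequence[Action]], Action]
# Trade head: picks one of the legal trade actions, or returns None to defer
# to the backbone (only valid when its can_defer argument is True).
TradeHead = Callable[[object, Sequence[Action], bool], Optional[Action]]


def make_hybrid_policy(trade_head: TradeHead, backbone: Backbone) -> Backbone:
    """Route trade decisions to trade_head and all other decisions to backbone."""
    def decide(state: object, legal: Sequence[Action]) -> Action:
        trades = [a for a in legal if a.kind in TRADE_KINDS]
        others = [a for a in legal if a.kind not in TRADE_KINDS]
        if trades:
            choice = trade_head(state, trades, bool(others))
            if choice is not None:
                if choice not in trades:
                    raise ValueError("trade head returned an illegal action")
                return choice
            if not others:
                # e.g. responding to an incoming offer: only accept/reject are legal
                raise ValueError("trade head must choose when only trade actions are legal")
        # The backbone never sees trade actions, so it stays fixed and trade-agnostic.
        return backbone(state, others)
    return decide


if __name__ == "__main__":
    first_legal = lambda state, legal: legal[0]
    defer_when_possible = lambda state, trades, can_defer: None if can_defer else trades[0]
    policy = make_hybrid_policy(defer_when_possible, first_legal)
    print(policy(None, [Action(ActionKind.OFFER_TRADE), Action(ActionKind.END_TURN)]))
```

In this sketch, only `trade_head` changes between experimental variants. A no-trade baseline would always defer or reject, and an untrained-trade baseline would plug in a randomly initialized head. This mirrors the abstract's point that the design isolates the contribution of the negotiation module.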