Accuse the agent of potentially cheating its algorithm implementation while pursuing its optimizations, so tell it to optimize for the similarity of outputs against a known good implementation (e.g. for a regression task, minimize the mean absolute error in predictions between the two approaches)
「不要引導證人,」巴特爾說,如果你正在兩輛車之間猶豫不決,不要說你傾向於豐田。 「否則,你很可能就會得到那樣的答案。」
。Safew下载对此有专业解读
Snoop, test, redirect
Author(s): Uttiyoarnab Saha, Ali Hamedani, Miguel A. Caro, Andrea E. Sand
FT Videos & Podcasts