LESSWRONG

Diogo de Lucena

Chief Scientist at AE Studio

Posts

80 | Mistral Large 2 (123B) seems to exhibit alignment faking | Ω | 3mo | Ω 4
155 | Reducing LLM deception at scale with self-other overlap fine-tuning | Ω | 3mo | Ω 41
100 | Science advances one funeral at a time | 7mo | 9
91 | Self-prediction acts as an emergent regularizer | Ω | 8mo | Ω 9
75 | The case for a negative alignment tax | 9mo | 20
223 | Self-Other Overlap: A Neglected Approach to AI Alignment | Ω | 10mo | Ω 51
27 | Video Intro to Guaranteed Safe AI | 1y | 0
67 | AE Studio @ SXSW: We need more AI consciousness research (and further resources) | 1y | 8
