Diogo de Lucena
Chief Scientist at AE Studio
Posts
Sorted by New
80 karma · Mistral Large 2 (123B) seems to exhibit alignment faking · Ω · 3mo · 4 comments
155 karma · Reducing LLM deception at scale with self-other overlap fine-tuning · Ω · 3mo · 41 comments
100 karma · Science advances one funeral at a time · 7mo · 9 comments
91 karma · Self-prediction acts as an emergent regularizer · Ω · 8mo · 9 comments
75 karma · The case for a negative alignment tax · 9mo · 20 comments
223 karma · Self-Other Overlap: A Neglected Approach to AI Alignment · Ω · 10mo · 51 comments
27 karma · Video Intro to Guaranteed Safe AI · 1y · 0 comments
67 karma · AE Studio @ SXSW: We need more AI consciousness research (and further resources) · 1y · 8 comments