I guess it’s annoying to have several such journals at the top of rankings lists. It’s similar to how, if you look up a list of Premier League footballers with the highest goals per game, the list will normally be restricted to players who’ve played a certain number of games.
I think that quite often when people say ‘consciousness’ in contexts like this, and especially when they say ‘sentience’, they mean something more like self-awareness than phenomenal consciousness.
Probably they are also not tracking the distinction very carefully, or thinking very deeply about any of this. But still, thinking the problem is ‘will AIs become self-aware?’ is not quite as silly as thinking it is ‘will the AIs develop phenomenal consciousness?’, and I think it’s the former that causes them to say these things.
Impact factor is essentially an average citation count per article over a recent window, so without a minimum article count you could game it by publishing very little and waiting for one big hit paper.
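To make the gaming concrete (this assumes the usual two-year definition, and the numbers are made up):

```latex
% Two-year impact factor (assuming the standard definition):
\mathrm{IF}_Y = \frac{\text{citations received in year } Y \text{ to items from years } Y-1 \text{ and } Y-2}{\text{citable items published in years } Y-1 \text{ and } Y-2}

% Made-up example of gaming it: publish only two papers, one of which is a huge hit.
\mathrm{IF}_Y = \frac{400 + 2}{2} = 201
```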
October 2023 I believe
You can train transformers as diffusion models (example paper), and that’s presumably what Gemini Diffusion is.
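For what it’s worth, here’s a minimal sketch of one common way to do this (masked/absorbing-state discrete diffusion); the architecture and hyperparameters are my own placeholders, not anything from the paper or Gemini Diffusion:

```python
# Minimal sketch of training a transformer as a masked discrete diffusion model.
# All sizes and choices here are illustrative placeholders.
import torch
import torch.nn as nn

VOCAB, MASK_ID, SEQ_LEN, D = 1000, 999, 64, 256

class Denoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D)
        self.pos = nn.Parameter(torch.zeros(SEQ_LEN, D))
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)  # note: no causal mask
        self.out = nn.Linear(D, VOCAB)

    def forward(self, tokens):                    # tokens: (batch, SEQ_LEN) ids
        h = self.embed(tokens) + self.pos
        return self.out(self.encoder(h))          # logits for every position at once

model = Denoiser()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

def training_step(x0):                            # x0: (batch, SEQ_LEN) clean token ids
    t = torch.rand(x0.shape[0], 1)                # random noise level per example
    corrupt = torch.rand_like(x0, dtype=torch.float) < t
    xt = torch.where(corrupt, torch.full_like(x0, MASK_ID), x0)
    logits = model(xt)
    # Train the model to recover the original tokens at the masked positions.
    loss = nn.functional.cross_entropy(logits[corrupt], x0[corrupt])
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

At sampling time you start from an all-mask sequence and iteratively fill in tokens over a series of timesteps.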
It does seem likely that this is less legible by default, although we'd need to look at complete examples of how the sequence changes across time to get a clear sense. Unfortunately I can't see any in the paper.
Wait, is it obvious that they are worse for faithfulness than the normal CoT approach?
Yes, the outputs are no longer produced sequentially over the sequence dimension, so we definitely don't have causal faithfulness along the sequence dimension.
But the outputs are produced sequentially along the time dimension. Plausibly by monitoring along that dimension we can get a window into the model's thinking.
What's more, I think you actually get strict causal faithfulness in the diffusion case, unlike in the normal autoregressive paradigm. The reason is that in the normal paradigm there is a causal path from the inputs to the final outputs via the model's activations which doesn't go through the intermediate produced tokens. Whereas (if I understand correctly)[1] with diffusion models we throw away the model's activations after each timestep and just feed the token sequence back in. So the only way for the model's thinking at earlier timesteps to affect later timesteps is via the token sequence. Any of the standard measures of chain-of-thought faithfulness that look like corrupting the intermediate reasoning steps and seeing whether the final answer changes will in fact say that the diffusion model is faithful (if applied along the time dimension).
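Spelled out, the intervention I have in mind looks something like this (purely illustrative; `denoise_step` and `corrupt` are placeholder functions, not any particular implementation's API):

```python
# Hypothetical sketch of a causal-faithfulness check along the *time* dimension.
# `denoise_step(tokens, t)` stands in for one reverse-diffusion update that takes
# only the current token sequence (no cached activations) and returns the next one.

def faithfulness_probe(x_T, n_steps, denoise_step, corrupt, t_intervene):
    # Clean rollout, recording the intermediate sequence at every timestep.
    trace, tokens = [], x_T
    for t in range(n_steps):
        tokens = denoise_step(tokens, t)
        trace.append(tokens)
    clean_answer = trace[-1]

    # Corrupt the intermediate "reasoning" at one timestep and re-run from there.
    tokens = corrupt(trace[t_intervene])
    for t in range(t_intervene + 1, n_steps):
        tokens = denoise_step(tokens, t)
    corrupted_answer = tokens

    # Later steps see nothing but the token sequence, so any effect of the model's
    # earlier thinking on the final answer must pass through trace[t_intervene];
    # comparing the two answers is therefore a causal test along the time dimension.
    return clean_answer, corrupted_answer
```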
Of course, this causal notion of faithfulness is necessary but not sufficient for what we really care about, since the intermediate outputs could still fail to make the model's thinking comprehensible to us. Is there strong reason to think the diffusion setup is worse in that sense?
ETA: maybe “diffusion models + paraphrasing the sequence after each timestep” is promising for getting both causal faithfulness and non-encoded reasoning? By default this probably breaks the diffusion model, but perhaps it could be made to work.
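Something like this, purely as an illustration of the shape of the idea (`denoise_step` and `paraphrase` are placeholders):

```python
# Hypothetical sketch of "diffusion + paraphrase between timesteps": after each
# denoising update the partially-denoised sequence is restated by a separate
# trusted paraphraser, so reasoning encoded in token choices that doesn't survive
# paraphrasing can't be carried forward to later timesteps.

def sample_with_paraphrase(x_T, n_steps, denoise_step, paraphrase):
    tokens = x_T
    for t in range(n_steps):
        tokens = denoise_step(tokens, t)
        tokens = paraphrase(tokens)   # e.g. a weaker trusted model rewriting the draft
    return tokens
```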
[1] Based on skimming this paper and assuming it's representative.
An interesting development since your shortform was written is that we can now try these ideas out without too much effort via Manifold.
Anyone know of any examples?
I think the first question to think about is how to use them to make CDT decisions. You can create a market about a causal effect if you have control over the decision and you can randomise it to break any correlations with the rest of the world, assuming the fact that you’re going to randomise it doesn’t otherwise affect the outcome (or bettors don’t think it will).
Committing to doing that does render the market useless for choosing policy, but you could randomly decide whether to randomise or to make the decision via whatever process you actually want to use, and have the market be conditional on the former. You probably don’t want to be randomising your policy decisions too often, but if liquidity weren’t an issue you could set the probability of randomisation arbitrarily low.
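Concretely, the scheme would look something like this (a toy sketch; the probability and the way the conditional markets are used are just illustrative):

```python
# Toy sketch of "randomise with small probability, condition the markets on that branch".
import random

P_RANDOMISE = 0.01  # can be made arbitrarily small if liquidity isn't a constraint

def decide(options, market_estimate):
    """
    market_estimate[o]: price of a market on the outcome, conditional on
    "the decision was made by randomisation AND option o was drawn".
    Randomisation breaks correlations with the rest of the world, so these
    prices estimate causal effects.
    """
    if random.random() < P_RANDOMISE:
        # Randomised branch: the only branch on which the conditional markets resolve.
        return random.choice(options)
    # Policy branch: the markets resolve N/A here, but we can still act on their prices.
    return max(options, key=lambda o: market_estimate[o])
```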
Then FDT… I dunno, seems hard.
Do you think of this work as an ELK thing?