
Thanks!

The two qualities (intelligence and objective) are orthogonal. So on the one hand an AGI's problem-solving capabilities may be poor, reasonable, satisfactory, excellent, super-human, etc., while on the other hand the final goal towards which it strives (via its problem-solving abilities) may be anything at all, and therefore aligned or mis-aligned with human preferences to any degree.
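To make the orthogonality point concrete, here is a deliberately toy sketch (all names hypothetical, nothing to do with any real system): the same search procedure takes both a "capability" knob (how far ahead it looks) and an "objective" knob (what it scores outcomes by), and the two can be varied completely independently.

```python
# Toy illustration of orthogonality (all names hypothetical): the agent's
# capability (search depth) and its objective (the scoring function it
# maximises) are independent parameters of the same machinery.

from itertools import product

def best_plan(actions, simulate, score, depth):
    """Enumerate every action sequence of length `depth` and return the
    one whose resulting state the `score` function rates highest."""
    return max(product(actions, repeat=depth),
               key=lambda plan: score(simulate(plan)))

# A trivial "world": the state is just the sum of the chosen numbers.
simulate = lambda plan: sum(plan)

# Two unrelated objectives plugged into the identical agent:
maximise_total  = lambda state: state              # "get as much as possible"
hit_exactly_ten = lambda state: -abs(state - 10)   # "land exactly on 10"

print(best_plan([1, 3, 7], simulate, maximise_total,  depth=2))  # (7, 7)
print(best_plan([1, 3, 7], simulate, hit_exactly_ten, depth=2))  # (3, 7)
```

Raising `depth` makes the agent more capable at either objective without telling you anything about which objective it has -- that independence is what the orthogonality claim amounts to.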

ChatGPT's "core" goal (i.e. GPT-3's core goal) is simply to generate a statistical continuation (relative to its training corpus) of the input prompt, where the input prompt represents the *user's* goal -- which is of course not necessarily aligned with the best long-term interests of mankind. This combined "GPT-3 + user" goal, however, is then modified and contained to some extent by the layer of RLHF (reinforcement learning from human feedback) that OpenAI has applied as a crude attempt at mitigating the combined system's worst instincts.
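As a very rough sketch of that layering (purely illustrative, not OpenAI's actual pipeline), you can think of the base model as producing continuation scores and the RLHF layer as a learned human-preference penalty applied on top:

```python
# Hypothetical sketch of the layering described above -- not OpenAI's real
# pipeline. The base model only does statistical continuation of the prompt;
# the RLHF layer reshapes which continuations actually get produced.

import math, random

def base_continuation_scores(prompt, vocab):
    """Stand-in for GPT-3's core behaviour: score each candidate next token
    purely by how well it statistically continues the prompt (faked here
    with random numbers)."""
    return {token: random.uniform(-1.0, 1.0) for token in vocab}

def rlhf_penalty(prompt, token):
    """Stand-in for the RLHF layer: a learned human-preference signal that
    pushes down continuations human raters disliked."""
    return -5.0 if token == "disallowed_reply" else 0.0

def next_token(prompt, vocab):
    scores = base_continuation_scores(prompt, vocab)
    adjusted = {t: s + rlhf_penalty(prompt, t) for t, s in scores.items()}
    # Softmax over the adjusted scores, then pick the most likely token.
    total = sum(math.exp(v) for v in adjusted.values())
    probs = {t: math.exp(v) / total for t, v in adjusted.items()}
    return max(probs, key=probs.get)

print(next_token("some user prompt", ["helpful_reply", "disallowed_reply"]))
```

In the real system the preference signal is folded in during fine-tuning rather than applied at sampling time, but the division of labour is the same: statistical continuation underneath, a human-preference layer shaping it on top.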


"final goals towards which it strives"

What does it mean for an inanimate, non-sentient, agency-free, non-volitional entity to "strive", or to have "goals"? This whole "alignment" thing strikes me as people letting their anthropomorphizing get the upper hand over their reasoning.


I know, it seems ridiculous. How can a mindless automaton such as a CPU exhibit such high-level behaviour as "striving towards a goal"? Nevertheless, it is plausibly possible to construct many layers of increasingly complex software-implemented behaviour until the system in question genuinely does "strive towards a goal". Please see e.g. (a) https://cambridgeaisocial.org/assets/Slow_down__A_tutorial_on_and_summary_of_the_current_state_of_AI.pdf, followed by (b) https://bigmother.ai/resources/The_BigMother_Manifesto___summary-LATEST-DRAFT.pdf (apologies, the latter is unfinished, but the earlier sections should answer your question).
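For a deliberately trivial example of the bottom-most layer of that idea (a hypothetical sketch of my own, not taken from either document): a feedback loop that repeatedly measures how far it is from a target and acts to shrink that distance. Nothing in it is sentient or volitional, yet "striving towards a goal" is the most natural description of what it does.

```python
# A deliberately trivial, hypothetical example of mechanical "striving":
# measure the distance to a target, act to reduce it, repeat. No sentience,
# no volition -- yet the loop reliably works towards its goal.

def strive_towards(goal, position=0.0, step=0.5, tolerance=1e-3):
    while abs(goal - position) > tolerance:
        error = goal - position
        # Step towards the goal, never overshooting it.
        position += max(-step, min(step, error))
    return position

print(strive_towards(goal=3.2))  # ends (essentially) at 3.2
```

The claim is simply that stacking many such layers, each more sophisticated than the last, eventually yields systems for which goal-directed language is not a metaphor but the only adequate summary of their behaviour.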


I'll take a look at these, but I think you'll agree there's still an AWFUL lot of human agency required to get from where we are now to any such "striving-capable" system. Let's stop trying to blame computers for human failings.


AGI is *hard*. Implementing a high-quality AGI (i.e. one that is maximally safe, maximally benevolent, and maximally intelligent) is infinitely harder than implementing a low-quality AGI (such as ChatGPT - yes, by my definition, ChatGPT qualifies as an AGI, just a very low-quality one).

I wholeheartedly agree that an awful lot of human agency, as you call it, is required to design and build a high-quality AGI. It has taken me almost 40 years simply to *design* a cognitive architecture for an AGI that (I believe) plausibly ticks all the boxes. Even if fully funded, I estimate that it would take 50-100 years to fully and properly implement my design (AGI is a cathedral project - I don't expect to live long enough to see my design actually built). But if you want AGI that is actually safe and benevolent, that (IMO) is how long it will take.

I'm not really trying to blame computers. If anything, it's humans (and human nature) that are the greatest obstacle to the development of maximally benevolent AGI (tribalism, short-term self-interest, etc.). To all intents and purposes, present-day humans are not aligned with future humans.
