xent — a transparent path to AGI
a transparent path to improve the cognitive abilities of language models, toward general intelligence
the idea: cognitive training — a model discovers relevant new skills by creating tasks for itself, with a principled way to measure which tasks are worth creating
done right, this turns a language model into a self-improving system that stays stable and competitive at the same time
what is xent's mission?
build a principled, stable, self-improving system that teaches itself new skills — leading to a generally capable system
the task is to make an environment of environments: the game is to create a game that is useful for a model
how? three ingredients: a space of tasks (the xent games), an algorithm to train on them (frost), and a derived — not designed — way to evaluate the quality of a task (the meta-objective)
what is our thesis?
cognitive training is the formalization of what it means to acquire relevant new skills
scaling it up leads to the emergence of AGI: models teach themselves new skills from within — no external environments needed — and keep improving
they improve in a balanced, organic, competitive way, while keeping a fixed meta-objective — leaving less room for undesirable surprises
what is cognitive training?
  1. realize there is implicit knowledge: models do not explicitly know their own probabilities
  2. formulate games on top of that knowledge: cross-entropy games
  3. train models on those games with frost algorithms, enhancing their capabilities
  4. define a meta-game: the game is to create cross-entropy games
  5. play the meta-game
starting from a sufficiently strong model, the process leads to automatic skill discovery
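As an illustration of steps 1 and 2, here is a minimal sketch of the quantity that cross-entropy games are built on: the cross-entropy a model assigns to a text, the sum of -log2 p over its characters. It assumes a toy character-bigram model with add-one smoothing as a stand-in for a language model; `BigramModel` and `xent` are hypothetical names, not part of any xent codebase. One reading of step 1 is that the model never outputs these numbers as an answer; they have to be read off its probabilities.

```python
import math
from collections import Counter

class BigramModel:
    """Toy character-bigram model with add-one smoothing (a stand-in for an LM)."""

    def __init__(self, corpus: str):
        self.vocab_size = len(set(corpus)) + 1  # +1: one shared slot for unseen chars
        self.pairs = Counter(zip(corpus, corpus[1:]))
        self.prevs = Counter(corpus[:-1])

    def prob(self, prev: str, nxt: str) -> float:
        """p(nxt | prev) under add-one (Laplace) smoothing."""
        return (self.pairs[(prev, nxt)] + 1) / (self.prevs[prev] + self.vocab_size)

def xent(model: BigramModel, text: str) -> float:
    """Cross-entropy of text in bits: sum of -log2 p(char | previous char)."""
    return sum(-math.log2(model.prob(a, b)) for a, b in zip(text, text[1:]))

corpus = "the cat sat on the mat. the dog sat on the log. " * 20
model = BigramModel(corpus)
# A familiar phrase costs fewer bits than the same characters reversed.
print(xent(model, "the dog sat") < xent(model, "the dog sat"[::-1]))  # True
```

A real language model scores tokens rather than characters, but the quantity is the same sum of per-token surprisals.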
how is cognitive training implemented?
three ingredients, each doing one job:
  1. a space of tasks — the xent games — built on the implicit knowledge of language models
  2. a meta-objective on that space — unique, by symmetry arguments — that measures a task's relevance
  3. frost: a fast reinforcement-learning algorithm, designed to train on xent games
how is cognitive training special?
xent games are rich enough to overlap with interesting tasks, yet structured enough that cognitive training is computationally tractable
relevant skill discovery is singled out as the quintessential game models ought to learn: the game of creating games
the meta-objective is fixed a priori: models grow in capability while being unable to rewrite or alter what counts as progress
what is the internal value of a game?
the question: can a model trained on games judge for itself the value of a new game?
informally, the internal value measures how well a game balances relevance to old games with new skill discovery
the remarkable result: there is essentially one consistent expression for that value — the meta-objective is derived, not designed
what about the meta-game?
cognitive training optimizes a meta-objective over the space of xent games — playing a move of the meta-game means creating a xent game
the reward for creating a game is its transfer value, which has two sides: the external value, which is what benchmarks measure, and the internal value, the key novelty, which admits a principled derivation rather than being chosen by hand
surprisingly, there is only one meta-game, up to two hyperparameters
why don't we have AGI yet?
models learn — at a spectacular level — the tasks they are trained on, but stay very weak on some others: they are far more uneven in their abilities than humans
equivalently: they generalize less well than humans away from the points they were trained on
so, what is the weak point of model training? our answer: models only ever train on tasks that are handed to them — they never acquire the skill of finding relevant new tasks
cognitive training targets exactly that
some questions about AGI
why don't we have AGI yet? — models are very uneven: spectacular on the tasks they are trained on, weak elsewhere
what does it mean to get there? — to close that gap: abilities that generalize evenly, the way human abilities do
what is xent's path? — cognitive training, scaled up: models that discover and master relevant new skills by themselves
where are we? — the ingredients are in place: the xent games, the frost algorithm, and the meta-objective
can this be stable? — the meta-objective is fixed a priori: capabilities grow, the goal cannot drift
what are examples of implicit-knowledge questions?
questions a model can pose — and score — using its own probabilities:
counterfactual — what piece of information would change one's view of things?
interestingness — does a piece of information change our view of something?
in-filling — is there a plausible sequence of steps leading from A to B?
originality — given local plausibility, what is the most surprising way to end a story?
synthesis — given a family of texts, do ideas emerge that are in none of them?
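Two of these questions, in-filling and originality, can be sketched with nothing but a model's own surprisals. The sketch again uses a toy character-bigram model as a stand-in for a language model; the function names are hypothetical, and reading "local plausibility" as a cap on per-step surprisal (`max_step_bits`) is an assumption of this sketch, not a definition from the text.

```python
import math
from collections import Counter

class BigramModel:
    """Toy character-bigram model with add-one smoothing (a stand-in for an LM)."""

    def __init__(self, corpus: str):
        self.vocab_size = len(set(corpus)) + 1  # +1: one shared slot for unseen chars
        self.pairs = Counter(zip(corpus, corpus[1:]))
        self.prevs = Counter(corpus[:-1])

    def surprisals(self, text: str) -> list:
        """Per-step surprisal -log2 p(char | previous char), in bits."""
        return [
            -math.log2((self.pairs[(a, b)] + 1) / (self.prevs[a] + self.vocab_size))
            for a, b in zip(text, text[1:])
        ]

def infill(model, start: str, end: str, middles: list) -> str:
    """In-filling: the candidate middle that makes start + middle + end most plausible."""
    return min(middles, key=lambda m: sum(model.surprisals(start + m + end)))

def most_original(model, prefix: str, endings: list, max_step_bits: float):
    """Originality (assumed reading): among endings where no single step exceeds
    max_step_bits, return the one with the highest total surprisal, or None."""
    def steps(ending):
        return model.surprisals(prefix[-1] + ending)  # a bigram only sees one char back
    plausible = [e for e in endings if max(steps(e)) <= max_step_bits]
    return max(plausible, key=lambda e: sum(steps(e)), default=None)

corpus = "the cat sat on the mat. the dog sat on the log. " * 20
model = BigramModel(corpus)
print(infill(model, "the cat ", " on the mat.", ["sat", "dog", "qzx"]))  # sat
print(most_original(model, "the dog sat on the ", ["mat.", "log.", "qzx."], 5.0))  # mat.
```

Here "qzx." is ruled out as locally implausible, and of the two remaining endings the model finds "mat." more surprising after "the dog sat on the". With a real language model the same scoring applies over tokens.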
an example of implicit knowledge
imagine an experiment with two copies of yourself: one that receives a piece of information, one that doesn't
if you could compare how both copies fare in the world afterwards, you'd have a good idea of the value of that information
this is impossible for humans — life is lived once — but for models it can be done at will
learn to play this game well, and you learn to gauge the value of any information — e.g. what difference does it make to read an article (say, the cognitive-training paper)?
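For a model, the two-copies experiment is a pair of scoring passes: measure the cost of some target text once without the information and once with it in context, and compare. A minimal sketch, assuming a toy adaptive unigram model (add-one smoothing over a fixed alphabet, updating its counts as it reads) as a stand-in for a language model, and assuming "how a copy fares" is measured by the cross-entropy of a single target text; `xent_given`, `information_value`, and `ALPHABET` are hypothetical names.

```python
import math
import string
from collections import Counter

# Hypothetical toy alphabet: lowercase letters, space, and period.
ALPHABET = string.ascii_lowercase + " ."

def xent_given(context: str, target: str) -> float:
    """Bits needed to encode `target` after reading `context`, under a toy
    adaptive unigram model with add-one smoothing over ALPHABET."""
    counts = Counter(c for c in context if c in ALPHABET)  # what this copy has read
    seen = sum(counts.values())
    bits = 0.0
    for ch in target:
        if ch not in ALPHABET:
            raise ValueError(f"character {ch!r} is outside the toy alphabet")
        p = (counts[ch] + 1) / (seen + len(ALPHABET))
        bits -= math.log2(p)
        counts[ch] += 1  # the model keeps adapting while it reads the target
        seen += 1
    return bits

def information_value(info: str, target: str, background: str = "") -> float:
    """Two copies, same background: one also read `info`. Positive means
    reading `info` saved bits on `target`; negative means it cost bits."""
    without_info = xent_given(background, target)
    with_info = xent_given(background + info, target)
    return without_info - with_info

target = "zebras graze"
relevant = information_value("zebra zebra zebra", target)
unrelated = information_value("mmmm mmmm", target)
print(f"relevant: {relevant:+.2f} bits, unrelated: {unrelated:+.2f} bits")
```

With these inputs the relevant note lowers the cost of the target, while the unrelated one raises it because it dilutes the counts without helping; a negative value is a legitimate outcome of the experiment, not an error.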