The term EAI is already long dead again, and there is no real approach to understanding LLMs.
It is all actually described very precisely in the two videos I linked. Here is the link to the research papers.
Anthropic's latest interpretability research: a new microscope to understand Claude's internal mechanisms
www.anthropic.com
"Language models like Claude aren't programmed directly by humans—instead, they're trained on large amounts of data. During that training process, they learn their own strategies to solve problems. These strategies are encoded in the billions of computations a model performs for every word it writes. They arrive inscrutable to us, the model's developers. This means that we don't understand how models do most of the things they do."
Essentially, they then posed very simple prompts and tried to trace how a smaller model processed them. They come to the conclusion:
"At the same time, we recognize the limitations of our current approach. Even on short, simple prompts, our method only captures a fraction of the total computation performed by Claude, and the mechanisms we do see may have some artifacts based on our tools which don't reflect what is going on in the underlying model."