
LLMs are a Trap: Beyond the Hype

Mar 6

6 min read


There is nothing more hyped today than AI, or more specifically GenAI, or to be even more specific, Large Language Models (LLMs). Many people talk about LLMs as the next great technological leap and, while they are impressive and get better every day, their capabilities are wildly overstated. Nothing worries me more than when folks start describing LLMs as having agency in the digital world - an AI that can do your shopping, develop your next app, run your business, and so on. That kind of Agentic AI is simply not going to be built on LLMs alone. It’s not just hype – it’s a trap!

The trap is that while LLMs provide a fantastic user experience, the moment you need true agency in the digital world, the inherent limitations of the transformer architecture rear their ugly head. In technology, as in life, there is a golden rule: use the right tool for the right job. Panacea solutions that claim to do everything always come at the cost of not doing the job as well as a purpose-built system. This is especially true for a Generalist AI (GenAI – see what I did there!) like LLMs, as they have a paradoxical duality: while capable of generating remarkably human-like text, they are simultaneously subject to structural constraints that force outputs into the statistically reinforced patterns of their training data. This phenomenon, known by the cool name of Dragon Kings, is rooted in the mathematical architecture of neural networks. It manifests as Eigenvector Traps and Eigenvector Dominance – dominant directions in high-dimensional parameter space that act as gravitational wells for model outputs – and Mode Collapse, a failure to explore the full diversity of possible responses (https://towardsai.net/p/l/gan-mode-collapse-explanation).
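To make the eigenvector-dominance intuition concrete, here is a minimal numpy sketch. It is not a real transformer, and every number in it is made up: the point is simply that repeatedly applying a weight matrix to a state vector pulls that vector towards the matrix's dominant eigenvector, no matter where it started.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric matrix standing in for one layer's weights;
# its top eigenvector defines a dominant direction in state space.
A = rng.normal(size=(8, 8))
W = (A + A.T) / 2

# Power iteration: repeatedly applying W pulls ANY starting vector
# towards the dominant eigenvector, the "gravitational well".
v = rng.normal(size=8)
for _ in range(100):
    v = W @ v
    v /= np.linalg.norm(v)  # keep unit length; only direction matters

# Compare with the true dominant eigenvector from a full decomposition.
eigvals, eigvecs = np.linalg.eigh(W)
dominant = eigvecs[:, np.argmax(np.abs(eigvals))]

# |cosine| close to 1.0 means the state has collapsed onto one
# direction, regardless of the starting vector.
print(abs(v @ dominant))
```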


These limitations are inherent in the fundamentals of transformer architectures, creating persistent biases that can’t be mitigated through conventional methods. They are compounded by autoregressive generation: because each token is predicted from the ones before it, small errors compound over time, leading to outputs that might sound plausible but are logically inconsistent or outright wrong. Each successive token prediction also narrows the effective output distribution, with entropy decreasing across layers, so an LLM progressively loses diversity and context the longer it generates. Not great when you want your Agentic AI to be reliable, scalable and consistently accurate. This is not hyperbole or hype – it is a proven fact.

A few years ago there was a novel attempt to directly optimise the representation space and mitigate dimensional collapse through DirectCLR, but there doesn’t seem to have been much progress on this front. The most common approach used in LLMs, unfortunately, is to fake it. Stochastic autoregressive decoding introduces randomness to give the illusion of variability, but an illusion is all it is: the LLM is still sampling from the same eigenvector space, and that constraint is baked into the architecture. The output may look like it doesn’t regress to the mean, and the new ‘reasoning’ models are very good at faking it, but this limitation is fundamental to how all LLMs work under the hood, no matter how much it’s dressed up.
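As a back-of-the-envelope illustration (the per-token accuracy and the logits below are invented, not measurements), here is how per-token errors compound over a sequence, and why temperature sampling only reshapes the same distribution rather than escaping it:

```python
import numpy as np

# Illustrative numbers only: suppose each generated token is
# "correct" with probability p, independently of the rest.
p = 0.99
for n in (10, 100, 1000):
    print(f"{n:4d} tokens: P(all correct) = {p ** n:.5f}")
# 10 -> ~0.90, 100 -> ~0.37, 1000 -> ~0.00004: errors compound fast.

# Temperature sampling reshapes the next-token distribution but cannot
# add probability mass the model never learned; high or low temperature,
# it samples from the same space.
logits = np.array([4.0, 2.0, 0.5, -1.0])  # made-up next-token logits

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for temp in (0.7, 1.0, 1.5):
    probs = softmax(logits / temp)
    entropy = -(probs * np.log(probs)).sum()
    print(f"T={temp}: probs={probs.round(3)}, entropy={entropy:.3f}")
```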

Great for Chatting, Not for Decision-Making


LLMs are designed primarily to generate text. They can parse vast amounts of data and carry on a conversation based on your prompt, much like someone who’s read every book about fixing cars can discuss the concepts in great detail. But when it comes to actually fixing your car, you’d prefer an actual mechanic with hands-on experience over a book-smart theorist. And don’t even get me started on the ‘code generation’ of LLMs – that’s a rant for another day!


The real problem arises when you start deploying GenAI not just as a user interface but as the decision-maker that executes actions. The problem isn’t just occasional mistakes - the mathematics and underlying architecture of LLMs are fundamentally unsuited to decision-making and execution. Specifically, softmax attention’s rank collapse, spectral decay of the weight matrices’ eigenvalues, and entropy-reducing autoregression are the three biggest culprits, working together to create the inherent limitations of tokenised LLMs.
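The rank-collapse point can be seen in a toy numpy experiment. This is a stack of pure attention maps with no skip connections or MLPs (which real transformers use precisely to fight this effect), so treat it as an intuition pump, not a transformer: composing row-stochastic attention matrices drives the product towards rank one, i.e. every row of the combined map becomes the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_attention(n):
    """A random row-stochastic matrix standing in for one softmax
    attention map (each row is a probability distribution)."""
    e = np.exp(rng.normal(size=(n, n)))
    return e / e.sum(axis=1, keepdims=True)

def rank1_gap(M):
    """Ratio of second to first singular value; 0 means rank one."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[1] / s[0]

# Compose attention maps layer by layer: the gap shrinks towards 0,
# meaning the stacked map collapses onto a single direction.
M = np.eye(16)
for layer in range(1, 17):
    M = random_attention(16) @ M
    if layer in (1, 2, 4, 8, 16):
        print(f"layers={layer:2d}  s2/s1={rank1_gap(M):.2e}")
```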


So what does all of this mean? Well, if you’re expecting to use LLMs for anything beyond having a generalist-level conversation, either personally or professionally, I have bad news and good news for you. The bad news is that you can’t trust them once you get past the hype and expensive parlour tricks. The good news is that there are alternatives to transformer-based LLMs that are better suited to the task. For Agentic AI, which is the buzzword of the year, this is where Neurosymbolic AI comes to the rescue. If you haven’t heard of it yet, read on...


Neurosymbolic AI: The Right Tool for Agentic AI


Neurosymbolic AI takes the best of deep neural networks (the real Intelligence in AI!) and mitigates their black-box limitations by applying symbolic logic - basically, it combines a superhuman ability to pattern-match data with the precision of rule-based reasoning. This creates architectures capable of domain-bound decision-making with mathematical certainty, bypassing the statistical guessing inherent to LLMs. Most commonly and crudely this is applied in Retrieval-Augmented Generation (RAG) architectures over vector-based knowledge graphs of structured data, but that approach can be inflexible and prone to over-fitting its outputs in the name of accuracy. Applying a hybrid of rule-based neurosymbolic systems and domain-aware RAG architectures achieves superior reliability in applications requiring structured reasoning and domain expertise, and this is what’s fundamental to the decision-making of Agentic AI. In fact, this is much closer to how the neurons in our brains process information and make decisions (https://www.jneurosci.org/content/29/27/8675).
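To give a flavour of the hybrid pattern, here is a deliberately tiny sketch. The facts, the three-dimensional "embeddings" and the drug-interaction rule are all invented for illustration: fuzzy vector retrieval proposes candidate facts, and a hard symbolic rule gets the final, non-negotiable say.

```python
import numpy as np

# Everything here is invented: toy 3-d "embeddings" stand in for a
# real vector store, and one hard rule stands in for a rule base.
FACTS = {
    "aspirin_interacts_with_warfarin": np.array([0.9, 0.1, 0.2]),
    "aspirin_treats_headache":         np.array([0.8, 0.3, 0.1]),
    "warfarin_is_anticoagulant":       np.array([0.2, 0.9, 0.4]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, k=2):
    """Neural half: fuzzy top-k retrieval by cosine similarity."""
    return sorted(FACTS, key=lambda f: -cosine(FACTS[f], query_vec))[:k]

def safe_to_prescribe(drug, patient_meds, retrieved):
    """Symbolic half: an explicit rule with a hard, non-probabilistic veto."""
    for med in patient_meds:
        if f"{drug}_interacts_with_{med}" in retrieved:
            return False
    return True

query = np.array([0.85, 0.2, 0.15])  # made-up embedding of "give aspirin?"
hits = retrieve(query)
print(hits, "->", safe_to_prescribe("aspirin", ["warfarin"], hits))
# The retrieval is fuzzy, but the final decision is a deterministic rule.
```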


This change from the generalist approach of current LLMs is not easy though. That is why actual Machine Learning requires actual Data Science to curate the data and explicitly define entity relationships. This can be optimised through unsupervised learning, but it’s still hard and resource-intensive. Much easier to just throw some tokens into an LLM and let it figure things out. Right? As the saying goes – rubbish in, rubbish out!


So what’s the solution? Allow me to introduce you to Domain-Aware Neurosymbolic Agents (DANA). If you want consistency and accuracy, and let’s face it, who doesn’t, then DANA is the best we have currently. You can still use LLMs to convert raw data into structured symbolic representations, but then you iteratively apply domain-specific rules in formal logic using a forward-chaining inference engine, with probabilistic techniques such as Markov chain Monte Carlo handling the uncertain parts.
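To make that concrete, here is a minimal sketch of the pattern. Everything in it - the mock LLM extractor, the two business rules, the fact names - is invented for illustration, and a production system would use a proper rule engine and a real LLM for the extraction step. Facts extracted from raw text are run through explicit if-then rules until no new conclusions can be derived, and every derived fact records the rule that produced it.

```python
def mock_llm_extract(text):
    """Stand-in for an LLM structuring raw text into symbolic facts."""
    facts = set()
    if "overdue" in text:
        facts.add(("invoice", "overdue"))
    if "vip" in text:
        facts.add(("customer", "vip"))
    return facts

# Domain rules in if-then form: IF all antecedents hold THEN add the consequent.
RULES = [
    ({("invoice", "overdue")},                      ("action", "send_reminder")),
    ({("invoice", "overdue"), ("customer", "vip")}, ("action", "escalate_to_human")),
]

def forward_chain(facts):
    """Fire rules repeatedly until a fixed point; keep an audit trace."""
    trace = []
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in RULES:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)
                trace.append((sorted(antecedents), consequent))
                changed = True
    return facts, trace

facts, trace = forward_chain(mock_llm_extract("vip customer, invoice overdue"))
for antecedents, consequent in trace:
    print(f"{antecedents} => {consequent}")  # every conclusion is traceable
```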


The distinct advantages of this approach are:


  • Deterministic Reasoning: Instead of guessing based on probability like LLMs do, Neurosymbolic AI uses explicit if-then rules to reach conclusions. Imagine having an AI with absolute certainty - no guessing, no hallucinations, just clear logic. This is essential when giving Agentic AI tasks that you need to rely on to be done properly.


  • Transparent and Auditable: With symbolic logic, every decision can be traced back to a set of rules, making the AI’s reasoning process open and explainable. Again, this is crucial when giving Agentic AI agency to take action. Neural network architectures were previously limited by hidden parameter weights that led to outcomes that could not be explained, so this is a real step forward.


  • LLMs as Interfaces: LLMs still have immense value, especially in a hybrid architecture. The LLM is used at the UX layer to translate natural language into a format that the symbolic AI engine can understand. This way, you get the best of both worlds: a friendly conversational interface and rock-solid decision-making under the hood (see the sketch after this list).
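Here is that interface pattern end to end, again as a hedged sketch: the two "LLM" functions are mocked with trivial string handling, and the 30-day refund policy is an invented example. Only the middle step makes the decision, and it is a plain, auditable rule.

```python
import re

POLICY_DAYS = 30  # explicit, auditable business rule: refunds within 30 days

def llm_parse(utterance):
    """Mock of the LLM at the UX layer: natural language -> symbols."""
    days = int(re.search(r"(\d+)\s*days", utterance).group(1))
    return {"intent": "refund", "days_since_purchase": days}

def symbolic_decide(query):
    """Deterministic core: one explicit if-then rule, never a guess."""
    if query["intent"] == "refund" and query["days_since_purchase"] <= POLICY_DAYS:
        return "approved", f"{query['days_since_purchase']} <= {POLICY_DAYS} days"
    return "denied", f"{query['days_since_purchase']} > {POLICY_DAYS} days"

def llm_render(decision, reason):
    """Mock of the LLM turning the symbolic verdict back into prose."""
    return f"Your refund request was {decision} (rule applied: {reason})."

print(llm_render(*symbolic_decide(llm_parse("Customer wants a refund after 40 days"))))
# -> Your refund request was denied (rule applied: 40 > 30 days).
```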


Choosing the Right Tool for the Job


While LLMs have transformed how we interact with machines, offering a slick user experience and impressively creative text generation, when it comes to decision-making and having the agency to perform tasks based on reason and logic, they are not the right tool for the job. The tech giants you all know and love are generating a lot of hype by creating larger and larger models and dressing up the inherent limitations of LLMs, but those models all remain prisoners of their underlying architecture, forever oscillating between the same probabilistic outputs. It is a fool’s errand to race towards the much-hyped AGI when we know LLMs are not the right path. The best path we have at the moment is Neurosymbolic AI and other next-generation architectures that evolve beyond the limitations of LLMs. By combining deterministic, rule-based reasoning with innovative approaches, we can build true Agentic AI systems that do more than just chat - they can actually reason, decide, and act with the reliability and transparency that the hype promises. The current hype around LLMs is smoke and mirrors when it comes to Agentic AI systems, and anyone who tells you different has some snake oil to sell you too!


In the end, it’s all about using the right tool for the right job. LLMs are an excellent tool for generating conversation and being creative, even to the point that it all sounds very sophisticated to the layman. But if you want an AI that develops your next app, runs your business or makes life-critical decisions, you need a system that goes beyond the hype – a true Agentic system built on robust, purpose-designed architectures with domain expertise that can be relied on.


At JP Global, we help businesses cut through the AI hype and make strategic, informed decisions about AI and beyond. If you’re looking to integrate AI solutions that actually deliver value, let’s talk.
