What AI Still Cannot Do: An Honest Assessment From a London Team That Has Built 50+ Integrations

7 June 2026 | 24 min read | By Softomate Solutions

After building more than 50 AI integrations for London businesses, the honest answer is that AI still cannot reason, take accountability, or be trusted on facts without a human checking its work. Current models hallucinate at rates between 3% and 19% on frontier systems, and as high as 52% on hard tasks, which means they fabricate confident, false answers often enough to make unsupervised use risky in any setting where a mistake costs money or trust. AI cannot reliably automate inconsistent human processes, understand intent, generalise across unfamiliar domains, or connect cleanly to legacy systems that were never designed for it. In the UK, only around 16% of businesses use AI under a strict definition, and roughly 77% of adopters report no immediate revenue change. The genuine value is time saved, not magic. AI is a powerful assistant with sharp limits, not an autonomous replacement for skilled judgement.

Last updated: June 2026

Why Does AI Still Make Things Up With Total Confidence?

AI makes things up because large language models do not store facts; they predict the most statistically likely next word based on patterns in their training data. When the model has no reliable pattern to draw on, it does not stop and say "I do not know". It generates a plausible-sounding answer anyway, and it delivers that answer with exactly the same tone of authority it uses for things it gets right. This is what the industry calls hallucination, and after 50-plus builds we can tell you it is the single most underestimated risk in business AI.

The numbers are sobering. Independent benchmarks put hallucination rates on frontier models between 3.1% and 19.1%, and broader testing across harder tasks pushes that range to anywhere from 15% to 52%. That means on a tough question, a leading model can be wrong roughly half the time while sounding completely certain. Our view is blunt: anyone selling you "fully autonomous AI" for a task where accuracy matters either does not understand this number or is hoping you do not.

Hallucination is not a bug that a future update will quietly remove. It is a mathematical property of how these systems work. A model trained to always produce an answer will sometimes produce a wrong one, because producing nothing is not in its design. You can reduce the rate with techniques like retrieval-augmented generation, grounding the model in your own verified documents, and tight prompt design, but you cannot drive it to zero.

Here is how the risk changes by task type, based on what we have seen in production:

| Task type | Typical hallucination risk | Safe to run unsupervised? |
|---|---|---|
| Drafting a marketing email | Low impact even if wrong | Often yes |
| Summarising a long document | Medium, may invent detail | Only with human review |
| Answering a factual customer query | High if ungrounded | No, needs grounded data |
| Quoting prices or legal terms | Very high, fabricates figures | Never |
| Calculating tax or financial figures | Severe, confidently wrong | Never, use deterministic code |

The practical lesson is that you must match the task to the failure cost. A hallucinated subject line wastes nothing. A hallucinated VAT figure in a customer quote can cost you a client and a complaint. We design every system around that question first, before we write a single line of integration code. If you want a chatbot that answers customer questions safely, the only honest approach is to ground it in your verified content, which is exactly how we build our AI chatbot development service in London.
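To make "use deterministic code" concrete for the VAT case, here is a minimal sketch in Python. It assumes the UK standard VAT rate of 20% and half-up rounding to the penny on the quote total. The function name `quote_with_vat` is hypothetical, and both the rate and the rounding convention should be checked against the rules that apply to your goods and invoicing.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: UK standard VAT rate of 20%. Reduced and zero rates exist,
# so confirm the rate that applies before relying on this.
STANDARD_VAT_RATE = Decimal("0.20")
PENNY = Decimal("0.01")


def quote_with_vat(net_amount: str, vat_rate: Decimal = STANDARD_VAT_RATE) -> dict:
    """Return net, VAT and gross for a quote, each rounded to the penny.

    Amounts are passed as strings and handled as Decimal so that no
    binary floating-point error creeps into a customer-facing figure.
    """
    net = Decimal(net_amount).quantize(PENNY, rounding=ROUND_HALF_UP)
    vat = (net * vat_rate).quantize(PENNY, rounding=ROUND_HALF_UP)
    return {"net": net, "vat": vat, "gross": net + vat}


if __name__ == "__main__":
    print(quote_with_vat("1249.99"))
```

The same input always produces the same figure, which is the property a language model cannot offer. The model can collect the line items in conversation; a function like this produces the number that goes on the quote.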

Can AI Actually Reason, Or Does It Just Predict Patterns?

AI does not reason in the human sense; it pattern-matches at enormous scale, and the difference matters more than the marketing admits. A model can pass a law exam and then fail to apply a single clause correctly to a real client situation it has not seen before. It has no understanding of what it is saying, no model of cause and effect, and no awareness that it might be wrong. It is task-bound: brilliant inside the boundary of patterns it has absorbed, and unreliable the moment a problem requires genuine cross-domain transfer.

This shows up constantly in real projects. Ask a model to handle a standard refund request and it performs beautifully. Change one variable, say a customer paid partly in store credit and partly by card across two transactions, and the model often produces an answer that is internally fluent but operationally wrong. A human agent reasons from intent: what is the customer actually owed, and what does fairness and policy require? The model reaches for the nearest pattern, which is not the same thing.

Our honest stance is that the word "intelligence" oversells what is happening. These systems are extraordinary statistical engines, not thinking agents. They have no goals, no self-awareness, and no ability to know when they have left the territory they understand. That last point is the dangerous one. A junior employee who is unsure will usually flag it. The model never flags it, because it does not know.

The capabilities split roughly like this:

  • Strong: language fluency, summarisation, translation, drafting, classification, code generation against known patterns, and extracting structured data from messy text.
  • Weak: novel reasoning, multi-step logic with real-world stakes, understanding implicit intent, and recognising the edge of its own competence.
  • Absent: genuine understanding, accountability, common sense outside its training distribution, and the ability to be held responsible for an outcome.

Because of this, we treat AI as a fast, tireless junior assistant that needs direction and review, never as a senior decision-maker. The systems that work best in production keep a human firmly in the loop for anything requiring judgement. That principle underpins everything in our business process automation work in London, where the goal is to remove repetitive effort, not to remove the people who exercise judgement.

Why Does Bad Data Defeat Even the Best AI Model?

AI is only as good as the data it can see, and most UK businesses have data that is fragmented, inconsistent, and trapped in formats AI cannot use cleanly. This is the quiet reason so many AI pilots stall before they reach production. The model is not the bottleneck. The bottleneck is the customer records spread across three spreadsheets, the contracts stored as scanned PDFs, the product details that live only in someone's head, and the CRM where half the fields are blank.

We have seen this on nearly every engagement. A company wants an AI assistant that answers staff questions about policy, pricing, or process. The technology is ready in an afternoon. Then we discover the policies contradict each other across departments, the pricing sheet is six months out of date, and three different documents give three different answers to the same question. The AI will faithfully reflect that chaos back, often blending contradictory sources into one confidently wrong answer. Garbage in, confident garbage out.
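One way to surface those contradictions before any model sees the content is a simple conflict check. The sketch below assumes the facts have already been pulled out of each document as (source, question, answer) records, which in practice is the harder step. The function name and the sample file names are hypothetical.

```python
from collections import defaultdict


def find_contradictions(records):
    """Return questions that have more than one distinct answer.

    records: iterable of (source, question, answer) tuples.
    Result: {question: {answer: [sources giving that answer]}}
    Only questions with conflicting answers are included.
    """
    answers = defaultdict(lambda: defaultdict(list))
    for source, question, answer in records:
        # Normalise case and whitespace so trivial differences are not flagged.
        q = question.strip().lower()
        a = answer.strip().lower()
        answers[q][a].append(source)
    return {q: dict(by_answer) for q, by_answer in answers.items() if len(by_answer) > 1}


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [
        ("staff_handbook.pdf", "Refund window", "30 days"),
        ("sales_faq.docx", "Refund window", "14 days"),
        ("support_wiki", "refund window", "30 days"),
        ("pricing_sheet.xlsx", "Delivery fee", "4.99"),
    ]
    for question, by_answer in find_contradictions(sample).items():
        print(question, by_answer)
```

Each flagged question needs a human owner to decide which answer is the source of truth. Only then is the content fit to ground an assistant.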

There is also the governance dimension. Under UK GDPR and the Data Protection Act 2018, you remain responsible for how personal data is processed, including when an AI system touches it. Feeding customer data into a model without understanding where it goes, how long it is retained, and whether it trains a third party's system is a real compliance risk, not a theoretical one. The Information Commissioner's Office has been clear that data protection law applies fully to AI.

Here is the honest before-and-after we see when data is sorted out first:

| Condition | AI deployed on messy data | AI deployed on cleaned, grounded data |
|---|---|---|
| Answer accuracy | Inconsistent, often contradictory | Reliable within defined scope |
| Trust from staff | Collapses after first bad answer | Grows with use |
| Maintenance effort | Constant firefighting | Periodic content updates |
| Compliance exposure | High, unclear data flows | Controlled and documented |

The honest rule we give every client is this: spend the first phase of any AI project on data, not on the model. Consolidate the sources of truth, fix the contradictions, and decide what the AI is allowed to see. Skip that, and you are not building an assistant, you are building an amplifier for your existing data problems. A well-structured central system makes this far easier, which is one reason we often recommend a custom CRM build in London as the foundation before any AI layer goes on top.

Why Can't AI Just Plug Into My Existing Business Systems?

AI cannot plug straight into most business systems because those systems were built years before anyone designed for AI, and they rarely expose clean, modern ways to connect. Vendors love a demo where the AI "just connects" to everything. In the real world, that connection is the hardest and most expensive part of the project, and it is where the slick demo and the working production system part ways.

The problem is structural. Older accounting packages, bespoke databases, and on-premise line-of-business software often lack proper APIs, the standardised connectors that let modern tools talk to each other. Where APIs do exist, they are frequently incomplete, undocumented, or rate-limited in ways that make real-time AI impractical. We have spent more hours wiring AI into a twelve-year-old system than we have spent on the AI itself, and that is the norm, not the exception.

Then there is the integration sprawl. A typical SME runs a website, a CRM, an email platform, an accounting tool, a booking system, and a handful of spreadsheets, none of which were designed to share data. An AI agent that needs to read from all six and write back to three is not a quick plug-in; it is a custom integration project with error handling, security, and data mapping at every join. Skip the error handling, and the first time one system is briefly offline, your "autonomous" AI silently fails and nobody notices until a customer complains.
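The opposite of a silent failure is a bounded retry that ends in a loud error. A minimal sketch, assuming the downstream call signals an outage by raising `ConnectionError`. The names `call_with_retry` and `IntegrationError` are hypothetical, and the retry counts and delays are placeholders.

```python
import logging
import time

logger = logging.getLogger("integration")


class IntegrationError(RuntimeError):
    """Raised when a downstream system stays unavailable after all retries."""


def call_with_retry(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run operation(); retry on ConnectionError, then fail loudly.

    operation: zero-argument callable that talks to the other system.
    sleep: injectable so tests can skip the real waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                # Surface the failure so an alert or a person picks it up,
                # instead of letting the workflow carry on as if it worked.
                raise IntegrationError(f"gave up after {attempts} attempts") from exc
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
```

Retrying like this is only safe for operations that can be repeated without side effects, such as reads. Anything that writes data or moves money needs its own protection against running twice.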

The realistic integration effort looks like this:

  1. Discovery: map every system, what data it holds, and how it can be accessed. Usually reveals two systems nobody mentioned.
  2. Connectivity: build or configure the connections, often working around missing APIs with middleware.
  3. Data mapping: reconcile different field names, formats, and identifiers between systems.
  4. Error handling: decide what happens when a system is down, a record is malformed, or a call fails.
  5. Security and permissions: ensure the AI can only see and change what it should.

This is unglamorous engineering, and it is exactly why vendor agents that promise to "connect to anything" so often disappoint. They handle the happy path and fall over on the real one. Our position is that integration is where projects are won or lost, which is why we treat it as core engineering rather than an afterthought, both in our GoHighLevel automation services and in broader custom software development for clients whose systems will not bend to off-the-shelf tools.

What Kind of Work Can AI Never Automate Reliably?

AI cannot reliably automate any process that humans perform differently each time, because automation needs a consistent, repeatable pattern and inconsistency gives it nothing to learn. This is the limitation that surprises business owners most. They assume the blocker is technical complexity. Often the real blocker is that the task they want to automate has never actually been standardised; it just lives in the heads of experienced staff who improvise sensibly every time.

Think about how a skilled account manager handles a difficult client. They read tone, recall history, weigh the relationship against the policy, and decide when to bend a rule and when to hold firm. None of that follows a fixed script. It is judgement applied to context, and it changes case by case. You cannot automate a process that has no stable shape, and trying to force one produces rigid, tone-deaf results that damage the very relationships they were meant to help.

Our honest experience is that roughly a third of the things clients first ask us to automate should not be automated at all, at least not until the underlying process is documented and standardised. The other path, automating the genuinely repeatable parts and leaving the judgement to people, is almost always the better return. The win is hybrid, not total replacement.

Here is how we classify work in the first workshop:

| Process characteristic | Automation suitability | Recommended approach |
|---|---|---|
| High volume, identical every time | Excellent | Full automation |
| Repeatable with occasional exceptions | Good | Automate with human escalation |
| Varies by context and judgement | Poor | AI assists, human decides |
| Different every single time | Unsuitable | Keep human, standardise first |
| High emotional or relationship stakes | Unsuitable | Keep human |

The pattern is clear. The nearer the top of that table a process sits, the more AI delivers. The nearer the bottom, the more it costs you in rework, frustration, and damaged trust. We push back hard, and politely, when a client asks us to automate something from the bottom rows, because saying yes would be doing them a disservice. The UK statistics back this up: while around 54% of firms touch AI in some form, only about 11% of SMEs automate operations extensively, which tells you most of the easy, consistent wins are narrower than the hype suggests.

Is AI Safe to Use for Legal, Financial or Healthcare Work?

AI is not safe to use unsupervised for legal, financial, or healthcare work, because in these domains a confident error is not an inconvenience, it is a liability with regulatory and human consequences. We will say this plainly because too few vendors will: if a wrong answer can lead to a fine, a misdiagnosis, a mis-sold product, or a court problem, AI must operate strictly as an assistant under human review, never as the decision-maker.

The regulatory picture in the UK reinforces this. There is no single UK AI Act. Instead the UK has taken a pro-innovation, sector-led approach, with existing regulators such as the Information Commissioner's Office and the Financial Conduct Authority applying their rules to AI within their domains. Crucially, UK GDPR and the Data Protection Act 2018 already restrict solely automated decisions that have a legal or similarly significant effect on a person. In plain terms, you generally cannot let an algorithm make a high-stakes decision about someone with no human involvement and no route to challenge it. If your business serves EU customers, the EU AI Act adds further obligations with extraterritorial reach.

Beyond the law, there is the simple matter of accountability. AI cannot be struck off, fined, or held responsible. When something goes wrong, the regulator looks at you, not the model. That alone should settle the question of who signs off the final decision.

Where AI does add real value in these sectors, used carefully:

  • Drafting: first-pass documents, letters, and summaries that a qualified human then reviews and approves.
  • Research support: surfacing relevant material faster, with the human verifying every source before relying on it.
  • Triage and routing: sorting and prioritising incoming work so people spend time where it matters.
  • Admin reduction: automating the genuinely clerical tasks that surround the regulated work.

Be sceptical of any provider offering "AI that makes the decision" in a regulated field. The responsible design keeps a qualified professional accountable for every output that carries weight. We build to that standard by default, and we will tell a client no rather than ship something that exposes them. If you operate in a regulated sector and want automation that respects those boundaries, our AI automation agency in London designs systems with human sign-off built into the workflow, not bolted on afterwards.

Where Did We Draw the Line on Real Client Builds?

Across 50-plus integrations, the line we drew most often was simple: AI handles the volume, humans handle the judgement, and we never let the model take an irreversible action without a person confirming it. That single rule has prevented more problems than any clever piece of engineering. Below are real patterns from real builds, anonymised, that show where capability ends and supervision begins.

The booking assistant that almost double-charged. A client wanted an AI agent to take bookings and process deposits end to end. The model handled the conversation beautifully. But in testing, a network hiccup during payment caused the agent to retry and nearly charge a customer twice, because it had no real understanding that the first charge might have succeeded. We redesigned it so the AI gathers everything, then a deterministic, non-AI payment step handles money with proper idempotency. AI for conversation, code for cash.
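For readers unfamiliar with idempotency, the idea can be sketched in a few lines. This is an illustration of the general technique, not the code from that build: `PaymentLedger` is a hypothetical in-memory stand-in for a payment provider, and the key format is an assumption. Many payment APIs accept a similar key, so check your provider's documentation for the real mechanism.

```python
import uuid


class PaymentLedger:
    """In-memory stand-in for a payment provider (hypothetical, illustration only)."""

    def __init__(self):
        self._charges = {}  # idempotency_key -> charge record

    def charge(self, idempotency_key: str, customer_id: str, amount_pence: int) -> dict:
        """Create a charge once per key; a retry with the same key returns the original."""
        existing = self._charges.get(idempotency_key)
        if existing is not None:
            # Same key with different details is a caller bug, not a retry.
            if (existing["customer_id"], existing["amount_pence"]) != (customer_id, amount_pence):
                raise ValueError("idempotency key reused with different details")
            return existing
        record = {
            "charge_id": str(uuid.uuid4()),
            "customer_id": customer_id,
            "amount_pence": amount_pence,
        }
        self._charges[idempotency_key] = record
        return record

    def total_charged(self, customer_id: str) -> int:
        """Sum of all charges recorded for one customer, in pence."""
        return sum(
            c["amount_pence"] for c in self._charges.values() if c["customer_id"] == customer_id
        )
```

The key is derived from the booking, for example `"booking-1042-deposit"`, not from the attempt. A retry after a network hiccup then reuses the key and cannot create a second charge.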

The support bot that invented a refund policy. A retailer asked for a chatbot to answer customer questions. Early on, when asked about an unusual return scenario, it confidently described a 90-day refund policy the company did not have. It had blended general training knowledge with the client's content. We rebuilt it to answer only from grounded, approved documents, and to escalate to a human the moment a question fell outside that scope. Hallucination risk dropped to near zero, because the model was no longer allowed to improvise.
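The "answer only from approved content, otherwise escalate" rule can be sketched as a gate in front of the model. The example below is deliberately crude: keyword overlap stands in for a real retrieval step, and the snippets, stopword list, threshold, and function names are all hypothetical. What it shows is the control flow, not a production matcher.

```python
import re

# Hypothetical approved content; in practice this comes from signed-off documents.
APPROVED_SNIPPETS = {
    "returns": "Items can be returned within 30 days with proof of purchase.",
    "delivery": "Standard UK delivery takes 3 to 5 working days.",
}

ESCALATE = "I am not able to answer that. I am passing you to a member of the team."

# Small illustrative stopword list so filler words do not count as matches.
STOPWORDS = {
    "a", "the", "i", "can", "do", "to", "of", "with", "be", "for", "if",
    "by", "is", "have", "how", "many", "after", "within", "my", "you", "in", "get",
}


def _keywords(text):
    """Lowercase alphabetic words in text, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def answer_from_approved(question, snippets=APPROVED_SNIPPETS, min_overlap=2):
    """Return (answer, source_key) from approved content, or (ESCALATE, None)."""
    q = _keywords(question)
    best_key, best_score = None, 0
    for key, text in snippets.items():
        score = len(q & (_keywords(text) | {key}))
        if score > best_score:
            best_key, best_score = key, score
    if best_score < min_overlap:
        # Out of scope: hand over to a person instead of improvising.
        return ESCALATE, None
    return snippets[best_key], best_key
```

A question about the returns window matches the approved snippet. A question about a 90-day refund on a gift card payment matches nothing strongly enough, so it goes to a human.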

The lead-qualifier that needed a human ear. A B2B client wanted an AI voice agent to qualify inbound calls. It worked well for straightforward enquiries, but it could not read the hesitation in a nervous first-time buyer's voice, the cue a good salesperson acts on instantly. The winning design used the voice agent to capture and route calls efficiently, then handed warm, high-intent prospects to a human who could close with empathy.

The common thread across all three:

| What AI did well | Where the human stayed in control |
|---|---|
| Held natural conversations | Final approval of any money movement |
| Captured and structured information | Decisions involving policy exceptions |
| Worked tirelessly at any hour | Reading emotional and relationship cues |
| Routed and prioritised at speed | Anything irreversible or high-stakes |

None of these projects failed. They succeeded precisely because we were honest about the limits up front and designed around them. The clients who get burned are the ones sold a fantasy of full autonomy. The clients who win are the ones who deploy AI where it is genuinely strong and keep people where people are genuinely irreplaceable.

How Do You Decide What to Automate, Assist, or Never Touch?

You decide by scoring each task on two axes: how consistent and repeatable it is, and how costly a mistake would be. That single framework, which we use in the first session with every client, sorts almost any business process into one of three buckets: automate fully, keep a human in the loop, or never automate. It cuts through hype faster than any vendor demo.

The logic is straightforward. Tasks that are highly repeatable and low-risk are ideal for full automation, because consistency gives the AI a stable pattern and a mistake costs little. Tasks that are repeatable but carry real consequences belong in the human-in-the-loop category, where AI does the heavy lifting and a person approves the outcome. Tasks that are inconsistent or high-stakes should stay human, full stop, until and unless the process itself can be standardised and de-risked.

Here is the framework we actually use, with examples:

| Category | Criteria | Example tasks | Our recommendation |
|---|---|---|---|
| Automate fully | Repeatable, low risk, clear rules | Data entry, appointment reminders, FAQ replies, report generation | Build it, monitor it |
| Human in the loop | Repeatable but consequential | Quotes, contract drafts, lead scoring, content drafts | AI drafts, human approves |
| Never automate | Inconsistent or high-stakes | Final legal advice, medical decisions, complex complaints, redundancies | Keep human, AI may assist research only |
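As a rough illustration of the two-axis scoring, the framework can be written as a small function. The 1 to 5 scales and the thresholds are assumptions made for this sketch, not a calibrated rule, and the example scores are invented.

```python
from enum import Enum


class Bucket(Enum):
    AUTOMATE = "automate fully"
    HUMAN_IN_LOOP = "human in the loop"
    NEVER = "never automate"


def triage(repeatability: int, mistake_cost: int) -> Bucket:
    """Sort a task into a bucket from two scores, each 1 (low) to 5 (high).

    Thresholds are illustrative assumptions:
    - inconsistent work (repeatability <= 2) or top-tier stakes stays human
    - repeatable work with real consequences gets human approval
    - repeatable, low-cost work is automated
    """
    if not (1 <= repeatability <= 5 and 1 <= mistake_cost <= 5):
        raise ValueError("scores must be between 1 and 5")
    if repeatability <= 2 or mistake_cost == 5:
        return Bucket.NEVER
    if mistake_cost >= 3:
        return Bucket.HUMAN_IN_LOOP
    return Bucket.AUTOMATE


if __name__ == "__main__":
    # Hypothetical scores for tasks named in the table above.
    for task, scores in {
        "appointment reminders": (5, 1),
        "customer quotes": (4, 3),
        "redundancy decisions": (1, 5),
    }.items():
        print(task, "->", triage(*scores).value)
```

The scores themselves come out of a conversation with the people who do the work. The function only makes the sorting rule explicit and repeatable.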

A word on cost, because it gets ignored. Human-in-the-loop is the right answer for many tasks, but it is not free. Someone has to review the AI's output, and if reviewing takes nearly as long as doing the task from scratch, you have not saved much. We measure this honestly before we build. If the review burden cancels the time saved, we say so and we do not build it. That is the kind of advice you rarely get from someone whose revenue depends on selling you AI.
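The review-burden check is simple arithmetic. A sketch under stated assumptions: every figure below is hypothetical, and `rework_rate` (the share of AI drafts that are rejected and redone by hand) is a parameter introduced for this example.

```python
def net_minutes_saved(manual_minutes, review_minutes, tasks_per_month, rework_rate=0.0):
    """Net monthly minutes saved by an AI-drafts, human-approves workflow.

    manual_minutes: time to do one task entirely by hand
    review_minutes: time to review one AI draft
    rework_rate: fraction of drafts rejected and redone manually (0.0 to 1.0)
    A negative result means the workflow costs more time than it saves.
    """
    per_task_with_ai = review_minutes + rework_rate * manual_minutes
    return (manual_minutes - per_task_with_ai) * tasks_per_month


if __name__ == "__main__":
    # Hypothetical: 20-minute task, 5-minute review, 10% redone, 200 tasks a month.
    print(net_minutes_saved(20, 5, 200, rework_rate=0.1))
    # Hypothetical: 10-minute task, 9-minute review, 20% redone, 100 tasks a month.
    print(net_minutes_saved(10, 9, 100, rework_rate=0.2))
```

In the first case the workflow saves roughly 43 hours a month. In the second the result is negative, which is the signal not to build.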

The honest summary is that AI is a force multiplier on the right tasks and a liability on the wrong ones. The skill is not in the building, it is in the sorting. Get the sorting right and the technology pays for itself. Get it wrong and you have an expensive system nobody trusts. That is why our process starts with this framework, not with a tool.

What Does the Softomate Implementation Process Look Like?

Our implementation process is a five-stage method that starts by deciding what should be automated before we automate anything, so you never pay to build a system that should not exist. We are a London-based AI automation and software development agency in Stanmore, and after 50-plus integrations we have refined a process built around the limits described in this article. We lead with honesty about what AI cannot do, then build only where it genuinely delivers. Every project is a fixed quote, agreed up front, so there are no open-ended bills.

The five stages:

  1. Discovery and process mapping. We map your workflows and apply the automate / assist / never framework, so you know exactly which tasks are worth automating and which are not.
  2. Data and systems audit. We assess your data quality and the systems we need to integrate, and we flag the contradictions and gaps that would otherwise sink the project.
  3. Design and grounding. We design the solution, ground any AI in your verified content, and define where humans stay in the loop. Compliance and data protection are built in here, not bolted on.
  4. Build and integration. We build the automation, wire it into your existing systems with proper error handling and security, and test it hard against the real edge cases, not just the happy path.
  5. Launch, monitor and refine. We deploy, monitor accuracy and performance, and refine. You get documentation and a clear support arrangement.

Indicative timeline and pricing for 2026:

| Stage | Typical duration | What you receive |
|---|---|---|
| Discovery and mapping | 1 to 2 weeks | Automation roadmap and honest go / no-go advice |
| Data and systems audit | 1 week | Integration plan and risk register |
| Design and grounding | 1 to 2 weeks | Solution design and compliance approach |
| Build and integration | 3 to 8 weeks | Working, tested system |
| Launch and refine | Ongoing | Live system, monitoring, support |

On price, we are transparent. A focused AI chatbot or automation project typically starts from around £5,000. A more involved custom AI integration across multiple systems generally starts from around £9,000, with the exact figure fixed in your quote once we understand the scope. A standalone discovery and roadmap engagement, useful if you simply want honest guidance on what to automate, starts from around £1,200 and is credited against the build if you proceed. No retainers you do not need, no surprise invoices, and a clear recommendation even when that recommendation is "do not automate this yet". You can contact us for a fixed quote, or read more about how we work on our about page.

Frequently Asked Questions

Can AI replace my team?

No. After 50-plus integrations, we have never seen AI replace a skilled team. It replaces specific repetitive tasks within roles, freeing people for judgement-led work. AI cannot reason, take accountability, or handle the inconsistent, relationship-driven parts of most jobs. The realistic outcome is augmentation: smaller manual workload, same humans making the important calls.

Is AI safe for legal or financial work?

Only as a supervised assistant, never as the decision-maker. In regulated UK sectors, a confident AI error can mean fines or liability, and UK GDPR restricts solely automated decisions with significant effects on people. Use AI for drafting, research support, and admin, with a qualified professional reviewing and approving every output that carries weight.

What is the hallucination rate of AI?

Frontier models hallucinate roughly 3% to 19% of the time, rising to between 15% and 52% on harder tasks. Hallucination means the model produces a confident but false answer. You can reduce it by grounding the AI in verified data, but you cannot eliminate it, which is why human review matters wherever accuracy counts.

Why do so many AI projects fail to reach production?

Usually because of data and integration, not the AI itself. Fragmented spreadsheets, outdated records, contradictory documents, and legacy systems without proper APIs stall pilots before they scale. In the UK, only around 16% of businesses use AI under a strict definition. The fix is sorting data and systems first, then layering AI on a solid foundation.

Does using AI actually increase revenue?

Rarely in the short term. Around 77% of UK AI adopters report no immediate revenue change, and only about 12% report an AI-attributable revenue rise. The genuine value is time saved and capacity freed, not top-line growth. Treat AI as an efficiency tool, measure hours saved honestly, and expect productivity gains rather than instant sales increases.

What can AI do really well right now?

AI excels at language tasks: drafting, summarising, translating, classifying, extracting structured data from messy text, and generating code against known patterns. It works tirelessly at scale and at any hour. The key is keeping it on repeatable, low-to-medium-risk tasks where a mistake is cheap and a human can review anything consequential before it goes out.

Do I need to clean my data before adding AI?

Almost always, yes. AI reflects the quality of the data it sees, so contradictory or outdated records produce confidently wrong answers. We spend the first phase of every project consolidating sources of truth, fixing contradictions, and deciding what the AI may access. Skipping this step turns AI into an amplifier for your existing data problems.

What does human-in-the-loop actually mean?

It means AI does the work and a person approves the outcome before it takes effect. The AI drafts a quote, scores a lead, or summarises a case, and a human checks and signs off. It is the standard mitigation for hallucination and judgement gaps. It is not free, though, so we measure whether review time cancels the time saved.

How much does a real AI automation project cost in the UK?

A focused AI chatbot or automation project typically starts from around £5,000. A larger custom integration across multiple systems generally starts from around £9,000, fixed in your quote once scope is clear. A discovery and roadmap engagement, if you just want honest guidance on what to automate, starts from around £1,200 and is credited against any build that follows.

Is AI regulated in the UK?

There is no single UK AI Act. The UK uses a pro-innovation, sector-led approach, with regulators like the ICO and FCA applying existing rules to AI in their areas. UK GDPR and the Data Protection Act 2018 already govern automated decisions involving personal data. If you serve EU customers, the EU AI Act may also apply, with extraterritorial reach.

After 50-plus AI integrations for London businesses, the honest picture is clear. AI cannot reason, take accountability, or be trusted on facts without supervision, with hallucination rates running from 3% to as high as 52% on hard tasks. It struggles with bad data, legacy systems, inconsistent processes, and anything high-stakes in legal, financial, or healthcare work, where UK GDPR and sector regulators keep humans firmly in charge. The UK numbers confirm the reality: around 16% adoption under a strict definition, only 11% of SMEs automating extensively, and roughly 77% seeing no immediate revenue change. The value is time saved, not magic. The winning approach is to automate the repeatable and low-risk, keep humans on judgement and high-stakes work, and never let a model take an irreversible action unchecked. Sort tasks correctly first, and AI pays for itself. Sort them wrong, and you build an expensive system nobody trusts.

If you want an honest assessment of what AI can and cannot do for your specific business, before you spend a penny on a build, talk to our London AI automation team for a fixed quote and a no-nonsense recommendation.

Written by Deen Dayal Yadav, Founder of Softomate Solutions, a London-based AI automation and software development agency in Stanmore (HA7). With over 12 years building software and automation systems for UK businesses, and more than 50 AI integrations delivered, I have a clear, first-hand view of where AI genuinely helps and where it quietly fails. Softomate Solutions is registered at Companies House and works with SMEs across London and the UK to automate the right tasks and keep people in control of the rest. Read more about our approach on our about page.

We protect the real names of all clients featured in examples and case studies. Every testimonial is from a real client.
