Tomorrow's agents cannot use yesterday's search

A technical briefing for engineering and AI leaders evaluating next-generation context infrastructure.

THE SHIFT
Search's reader changed — from human to agent
For decades, recall was king because humans consumed the results: forgiving readers who skim a list, skip the junk, and stitch the useful pieces together. Agents don't read like us. They pull results straight into context and attend to everything at once, so a single plausible-but-wrong passage can derail the task, and no amount of reasoning recovers it. Retrieval's job has flipped from "return everything relevant" to "admit only clean, correct context."
The Human Era — Tolerant
Skims, skips the junk, stitches the pieces. Recall was king: return everything plausibly relevant — the human sorts it out and forgives the noise.
Needs forgiveness.
The Agent Era — Fragile
Ingests it all into context and attends in parallel; it can't skip. One plausible-but-wrong passage derails the whole task, and no amount of reasoning recovers it.
Needs precision.
THE INTERSECTION
Search is ancient. What POMA solves is brand new.
Three technologies converged in the last few years — dense vector retrieval, LLMs that can read implicit document structure, and agents that consume retrieval directly. POMA sits exactly where they meet.
1. 1960s–2010s: The keyword era
Inverted indexes, BM25, PageRank. Fifty years of incremental refinement on the same core idea: match terms, rank by frequency.
2. 2013: word2vec
Words become vectors. Semantic similarity becomes computable for the first time.
3. 2022: Vector search at scale
Dense retrieval goes production-ready. Meaning, not keywords, drives what gets returned.
4. 2023: LLMs read structure
Models can now infer hierarchy, section boundaries, and implicit document architecture, not just surface text.
5. 2025: Agents do the searching
Retrieval is no longer a human-facing UI. Agents pull context directly, and it goes straight into reasoning.
6. Now: POMA
Structure-aware ingestion that emits retrieval-ready chunks. Built for the era where agents do the reading.
THE PROBLEM
Every pipeline starts with something broken nobody talks about: the chunks.
A chunk is the atomic unit of retrieval — the exact slice of text your vector database stores, searches, and returns to the agent.
Every ingestion vendor — Azure, AWS Textract, Unstructured, LlamaParse — parses your documents to text, then stops. They hand you a wall of markdown and leave you to split it into chunks yourself. Usually with a free, naive splitter that knows nothing about your document's structure. That broken chunk is what your vector database indexes. That broken chunk is what retrieval runs on. That broken chunk is what your agent reasons over.
Naive splits
500-token blocks that start mid-sentence and end mid-table
Lost context
No section, no hierarchy, no position — just floating text
Dirty text
OCR artifacts, page headers, watermarks embedded as content
You can't fix a bad chunk downstream. The damage is done at ingestion.
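To make the failure mode concrete, here is a minimal sketch of the kind of fixed-size splitter described above. It is an illustration under stated assumptions, not any vendor's implementation: whitespace-separated words stand in for model tokens, and `naive_split` is a hypothetical helper name.

```python
def naive_split(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows of whitespace 'tokens'.

    Words stand in for tokens (an assumption). The splitter knows nothing
    about sentences, headings, or tables, so boundaries fall wherever the
    count happens to land.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


doc = (
    "B. Designing for Security. FDA intends to assess device cybersecurity "
    "based on a number of factors. C. Transparency. A lack of cybersecurity "
    "information may delay safe integration."
)
# Tiny window sizes so the effect is visible on a short sample.
chunks = naive_split(doc, size=12, overlap=3)
for chunk in chunks:
    print(repr(chunk))
```

On this sample the first chunk stops mid-sentence and the second one straddles the "C. Transparency" heading, which is the same pattern as the conventional example in this briefing.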
THE SOLUTION
POMA PrimeCut
Better document ingestion, with chunking built in
Every ingestion vendor hands you text and leaves the chunking to you.

POMA PrimeCut does both: structure-aware ingestion that emits retrieval-ready chunks, not raw markdown.
Same price point as OCR. Fundamentally better output.
CONTEXT IS MADE OF CHUNKS
Same document. Three very different chunks.
Here's the same passage from an FDA cybersecurity guidance document, chunked three different ways.
❌ Conventional
500-token naive split
...an SPDF is one approach to help ensure that the QS regulation is met. Because of its benefits in helping comply with the QS regulation and cybersecurity, FDA encourages manufacturers to use an SPDF, but other approaches might also satisfy the QS regulation.
### B. Designing for Security
When reviewing premarket submissions, FDA intends to assess device cybersecurity based on a number of factors, including, but not limited to, the device's ability to provide and implement the security objectives below throughout the device architecture.
Security Objectives: • Authenticity, which includes integrity • Authorization • Availability • Confidentiality • Secure and timely updatability and patchability
...The risks presented by cybersecurity vulnerabilities; the exploitability of the vulnerabilities; and the risk of patient harm due to vulnerability exploitation.
### C. Transparency
A lack of cybersecurity information, such as information necessary to integrate the device into the use environment...
[truncated — chunk continues across §C]
Spans multiple sections
Heading isolated from its content
Bleeds into unrelated topic (Transparency)

⚠️ Unstructured.io
Incumbent parser
• The device's intended use, indications for use, and reasonably foreseeable misuse;
• The presence and functionality of its electronic data interfaces;
• Its intended and actual environment of use;18
• The risks presented by cybersecurity vulnerabilities;
• The exploitability of the vulnerabilities; and
• The risk of patient harm due to vulnerability exploitation.
No section indication
No position within document
Footnote-marker artifact in main text ("18")

✓ POMA PrimeCut
Hierarchically prefixed chunk
Cybersecurity Guidance for Medical Devices
  └ Guidance for Industry and FDA Staff
      └ B. Designing for Security
          └ The extent to which security requirements, architecture, supply chain, and implementation are needed to meet these objectives will depend on but may not be limited to:
              └ Its intended and actual environment of use:
                  └ The risk of patient harm due to vulnerability exploitation.
What makes it so much better:
Full document path preserved
Self-contained meaning
Zero artifacts, zero ambiguity
The chunk is the unit of retrieval. Its quality determines everything downstream.
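A hierarchically prefixed chunk like the one shown above can be approximated by rendering a passage together with its ancestor headings. The sketch below assumes the parser has already produced the heading path; `prefix_chunk` and the tree layout are illustrative only and are not PrimeCut's actual output format or API.

```python
def prefix_chunk(path: list[str], leaf: str) -> str:
    """Render ancestor headings plus the leaf passage as an indented tree."""
    lines = []
    for depth, node in enumerate([*path, leaf]):
        if depth == 0:
            lines.append(node)  # document title, no indent
        else:
            indent = " " * (2 + 4 * (depth - 1))
            lines.append(f"{indent}└ {node}")
    return "\n".join(lines)


path = [
    "Cybersecurity Guidance for Medical Devices",
    "Guidance for Industry and FDA Staff",
    "B. Designing for Security",
]
leaf = "The risk of patient harm due to vulnerability exploitation."
chunk = prefix_chunk(path, leaf)
print(chunk)
```

Because the path travels inside the chunk text rather than in side metadata, whatever embeds or reads the chunk sees the section context along with the fact itself.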
CONTEXT ROT
You can't out-context a bad chunk
Agents attend to every retrieved token at once, so noise isn't ignored: a distractor actively degrades the answer. And you can't out-context it. Accuracy drops as input grows, often at only 10–30% of the context window, and facts buried mid-context are retrieved worst. This is backed by Chroma's context-rot study and by academic and industry work (Lost in the Middle, NoLiMa, RULER, Databricks).
Benchmark
Tokens at 100% Recall — the key metric for the agentic era
On the public POMA-OfficeQA benchmark, PrimeCut achieves 100% recall using just 23% of the context tokens that naive pipelines need — 4× less noise, cost, and context rot per query.
100%
Databricks + naive chunking
500/100 split/overlap strategy
102%
Unstructured.io
Default "hierarchical" chunking
23%
POMA PrimeCut
Full recall, 4× less context
Public and reproducible: github.com/poma-ai/poma-officeqa (based on the hitherto unsolved Databricks OfficeQA challenge)
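The headline figures relate to each other by simple arithmetic. The short calculation below uses only the percentages quoted in this benchmark, with the naive baseline normalized to 100%; `savings` is a hypothetical helper name.

```python
def savings(pct_of_baseline: float) -> tuple[float, float]:
    """Return (percent reduction, reduction factor) against a 100% baseline."""
    reduction = 100.0 - pct_of_baseline
    factor = 100.0 / pct_of_baseline
    return reduction, factor


reduction, factor = savings(23.0)  # PrimeCut's share of baseline tokens
print(f"{reduction:.0f}% fewer tokens, {factor:.1f}x less context")
```

The factor works out to a little over 4, so the "4×" figure is a rounded-down value.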
WHY IT WORKS
Embedders love structure
Feed an embedder POMA's hierarchical chunksets instead of naive splits — same model, same cosine, only the chunking changes — and the golden answer jumps from rank 1,780 to rank 15: a 119× ranking gain. Here's why it works at the vector level.
Hierarchy is a coordinate system
A chunk reading "Employee Handbook › Benefits › Health Insurance › Enrollment deadline: Dec 15" encodes positional context in the document's architecture. The embedder knows where this fact lives — not just what it says.
Dilution is incoherence, not length
Dilution happens when semantically unrelated content competes for the same vector — a table header, half a footnote, and the start of the next section blur into a useless average. A long, coherent chunk produces a focused embedding. Length is not the enemy. Incoherence is.
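A toy calculation can illustrate the point. It rests on a strong simplifying assumption: that a chunk's embedding behaves roughly like the average of the embeddings of its parts. Real embedders are not linear, and the 3-dimensional vectors below are hand-made, not model output.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean(vectors: list[list[float]]) -> list[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


query = [1.0, 0.0, 0.0]  # stands for the topic being asked about

# Two chunks of equal length (three parts each):
coherent = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [1.0, 0.0, 0.1]]  # all on topic
mixed = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]     # fact + header + footnote

coherent_score = cosine(query, mean(coherent))
diluted_score = cosine(query, mean(mixed))
print(f"coherent: {coherent_score:.2f}  mixed: {diluted_score:.2f}")
```

Both chunks have the same number of parts, yet only the mixed one loses similarity to the query, which matches the claim that incoherence rather than length is what blurs the vector.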
Structural markers are semantic anchors
Section titles and table headers embedded inside the chunk — not as metadata — dramatically improve retrieval. The embedder treats them as anchors that sharpen the vector's meaning.
Fixed-size chunking feeds embedders what they hate
A 500-token block starting mid-sentence, ending mid-table? Blurry vector, poor ranking, hallucinations downstream. PrimeCut's 77% token reduction is what happens when you give embedders input they can work with.
rank 1,780 → 15
Ranking improvement, same model & cosine
119× more relevance
with only the chunking changed (conventional → POMA)
153× improvement
when prefixing full hierarchical "content path" to embeddings on 21,414 exemplary chunks
Source: www.linkedin.com/posts/dr-alexander-kihm_embeddings-rag-ai-activity-7440691119727816704-FDGx
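The 119× figure is the ratio of the two ranks quoted above. As a sketch of how such a rank could be measured, the snippet below sorts chunks by similarity score and reports the 1-based position of the golden answer. The scores and chunk IDs are made up for illustration, and `rank_of` is a hypothetical helper.

```python
def rank_of(gold_id: str, scores: dict[str, float]) -> int:
    """1-based position of gold_id when chunks are sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(gold_id) + 1


# Illustrative cosine scores for four chunks against one query.
scores = {"a": 0.41, "gold": 0.78, "b": 0.83, "c": 0.12}
gold_rank = rank_of("gold", scores)

# Ratio of the two ranks reported in this briefing.
gain = 1780 / 15
print(gold_rank, round(gain))
```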
THE PRODUCT AT A GLANCE
PrimeCut: ingestion plus best-in-class chunking
PrimeCut reads document structure — sections, tables, lists, figures — and emits hierarchical, retrieval-ready chunks instead of arbitrary slices. One product replaces your OCR/parser and the chunking step the others leave to you.
50+ filetypes
PDFs, tables, figures, code, and more, including content most parsers flatten
Hierarchical chunksets
Patented, structure-aware; no meaning lost at boundaries
Shipped and proven
Live API, public benchmark, US patent granted
From €0.003/page
At the OCR price floor, chunking included
POMA GRILL
Try PrimeCut end-to-end — without building a pipeline
Grill is POMA's managed context engine: a full context pipeline you drop into your agent via MCP, SDK, or REST. PrimeCut chunking runs under the hood — ingest a document, query it, get a prompt-ready RetrievalContext block back. No vector DB to provision, no embedding model to choose, no retrieval glue to maintain.
Ingest
POST /grill/ingest — accepts 50+ filetypes via REST, SDK, CLI, or MCP. Grill indexes server-side; no .poma archive to handle.
Query
POST /grill/search returns a sandwich-ordered, token-budgeted RetrievalContext (XML + Markdown) — drop it straight into your LLM prompt.
Comparable by design
Run Grill alongside your existing pipeline on the same queries and documents. The benchmark runs itself.
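The two calls above could be wired up roughly as follows. This is a sketch under several assumptions, not the documented API: the REST base URL is a placeholder, the `x-api-key` header is borrowed from the hosted MCP configuration, `file_base64` appears in this briefing only as an MCP argument, and the `query` field name is hypothetical. The code only builds the requests and does not send them; check poma-ai.com/docs for the real request schema.

```python
import base64
import json
import urllib.request

BASE_URL = "https://api.example.invalid"  # placeholder: real REST base URL not given here


def grill_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a Grill endpoint."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Assumed payload shapes; field names are illustrative.
ingest = grill_request(
    "/grill/ingest",
    {"file_base64": base64.b64encode(b"example document bytes").decode("ascii")},
    api_key="poma_prod_gr_...",
)
search = grill_request(
    "/grill/search",
    {"query": "Which security objectives does FDA list?"},
    api_key="poma_prod_gr_...",
)
print(ingest.get_method(), ingest.full_url)
print(search.get_method(), search.full_url)
```

Sending either request would be a matter of passing it to `urllib.request.urlopen` once the real base URL and schema are filled in.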

Wire it in — two options
Option A — local binary (stdio)
{
  "mcpServers": {
    "poma-grill": {
      "command": "npx",
      "args": ["-y", "@poma-ai/poma-grill-mcp", "-input", "-"],
      "env": { "POMA_API_KEY": "poma_prod_gr_..." }
    }
  }
}
Go binary also available: brew install poma or go install github.com/poma-ai/poma-grill-mcp@latest
Option B — hosted endpoint (no install)
{
  "mcpServers": {
    "poma-grill": {
      "type": "http",
      "url": "https://mcp.poma-ai.com/grill/v1",
      "headers": { "x-api-key": "poma_prod_gr_..." }
    }
  }
}
The hosted server cannot read local file paths: use file_base64, or run Option A for file_path access.
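The rule above can be captured in a small helper that picks the right file argument for each option. The argument names `file_path` and `file_base64` come from this briefing; the helper itself (`file_argument`) and the idea of returning a one-key dictionary are assumptions about how a caller might assemble tool arguments.

```python
import base64
import tempfile
from pathlib import Path


def file_argument(path: str, hosted: bool) -> dict[str, str]:
    """Pick the file argument: inline bytes for hosted, a path for local."""
    if hosted:
        data = Path(path).read_bytes()
        return {"file_base64": base64.b64encode(data).decode("ascii")}
    return {"file_path": str(Path(path).resolve())}


# Demo with a throwaway file standing in for a real document.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"hello")

hosted_arg = file_argument(tmp.name, hosted=True)
local_arg = file_argument(tmp.name, hosted=False)
Path(tmp.name).unlink()  # clean up the temporary file

print(hosted_arg)
print(list(local_arg))
```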
Context-as-a-Service
POMA Grill returns a RetrievalContext — not raw chunks. Sandwich-ordered passages, gap markers, citation metadata. Your agent reasons over signal, not noise.
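This briefing does not define "sandwich-ordered". One plausible reading, given the earlier point that facts buried mid-context are retrieved worst, is that the strongest passages go at the two ends of the context and the weakest in the middle. The sketch below implements that reading only; it is an assumption about the idea, not Grill's actual algorithm.

```python
def sandwich_order(ranked: list[str]) -> list[str]:
    """Place best-ranked items at both ends and the weakest in the middle.

    `ranked` is sorted best-first. Items at even positions fill the front
    in order; items at odd positions fill the back in reverse.
    """
    front: list[str] = []
    back: list[str] = []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]


ordered = sandwich_order(["p1", "p2", "p3", "p4", "p5"])
print(ordered)
```

With five passages ranked p1 (best) to p5 (weakest), the two best land first and last and p5 sits in the middle.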
Docs & quickstart: poma-ai.com/docs — the docs themselves are also served as a separate MCP server (poma-docs MCP), so your agent can query the documentation directly while you build.
Ready to Unlock Your Agent's Full Potential?
POMA PrimeCut is engineered to eliminate context rot, boost recall, and significantly reduce noise in your context pipeline.
Try it with POMA Grill
Plug the PrimeCut magic into your agent in minutes via MCP, SDK, or REST. No pipeline to build. Start at poma-ai.com/docs
Direct contact
Any questions? You can reach out directly to our Founder and Chief Scientist Dr. Alexander Kihm at ak@poma-ai.com
Visit poma-ai.com
Explore more about how our products are redefining context for the agentic era. Visit https://www.poma-ai.com
Let's build a future where every agent operates with 100% recall and zero noise.