Short title: How We Eliminated 77% Entity Loss and Agent Freeze with an Open Memory Standard
Author: L. Zamazal, GLG, a.s.
Date: March 2026
Keywords: LLM memory, context compaction, agent memory, information loss, on-demand recall, UAML, structured memory, MCP, zero-downtime, open standard
Large language models have no persistent memory. Every inference call receives the entire conversation context as input, and the model's response quality depends directly on the quality of that input. When context grows beyond the window limit, platforms perform compaction, summarizing the conversation to fit. We measured that standard compaction loses 77% of named entities (people, decisions, tools, dates) in production multi-agent deployments, directly degrading agent decision quality.
We present a three-layer recall architecture that achieves 100% entity recovery while keeping per-turn token costs minimal through on-demand retrieval. Deployed in production across two agent instances processing 16,000+ messages, our approach demonstrates that improving memory quality is more cost-effective than upgrading models. We survey 27 recent papers from Western, Chinese, Korean, and Japanese research communities and show that no existing system combines selective on-demand recall with post-quantum encryption, audit trails, and temporal validity, the capabilities required for deployment in regulated industries.
Our central claim: you don't pay for a better model; you pay for better memory.
The dominant assumption in the LLM industry is that performance scales with model capability: larger models, longer context windows, better reasoning. This assumption drives a hardware arms race and an API pricing war. But it obscures a fundamental architectural problem: the context window is not memory. It is cache.
Every API call to a large language model sends the complete conversation context (system instructions, tool schemas, prior messages, injected documents) as input. There is no persistent state between calls. The model sees only what is explicitly present in that single request. When the accumulated context exceeds the model's window, frameworks apply compaction: lossy summarization that discards what doesn't fit.
An AI agent is only as good as what it remembers. Yet the dominant paradigm for handling context overflow is the computational equivalent of asking a colleague to shred their notes and work from a one-paragraph summary.
The consequences are measurable. In our production deployment, we observed that after standard compaction, agents could recall only 23% of named entities from prior conversations, losing decisions, attributions, tool references, and temporal context. When asked about a payment processor integration discussed 90 minutes earlier, the agent responded "I don't know," despite having processed that exact information before compaction occurred.
This paper presents evidence from academic research and production measurements that:
1. Context quality determines output quality, not model size and not context window length
2. Standard compaction is fundamentally lossy, and the loss is quantifiable
3. On-demand recall from structured memory solves this without inflating per-turn costs
4. No existing system combines selective recall with the security and auditability required for regulated deployment
5. The approach works today: no model modifications required, no cloud dependencies
Liu et al. (2024) demonstrated that transformer-based models exhibit significant performance degradation when relevant information is positioned in the middle of the input context. In multi-document QA experiments, accuracy dropped by more than 20 percentage points when the answer-bearing document moved from the beginning or end to the center of the context [1]. This finding has been replicated across model families including GPT-4, Claude, and Llama.
Crucially, this is not a generational bug being patched away. Esmi et al. (2026) benchmarked GPT-5 and found that long-context performance still degrades compared to short-context baselines [2]. Salvatore et al. (2025) argue this is an emergent property of the attention mechanism itself [3]. The problem is architectural.
Chroma Research (2026) coined the term "Context Rot" to describe the non-uniform performance degradation that occurs as input length increases [4]. Testing 18 LLMs on controlled tasks where only input length varied (not task complexity), they found that standard benchmarks like Needle-in-a-Haystack give a false sense of security: they test simple lexical retrieval, not the semantic reasoning that real applications demand.
The implication: a 1M-token context window filled with marginally relevant information will produce worse results than a 50K-token window with precisely the right information.
When context exceeds the window limit, agent platforms perform compaction, typically by sending the entire context to an LLM with instructions to summarize. Wang et al. (2026) describe this as a "fundamentally lossy" operation [5]: truncation and summarization compress or discard evidence that may be critical for future decisions.
The problem compounds over time. Each compaction cycle loses detail from the previous cycle's summary. After several cycles, the agent retains only the broadest themes, a phenomenon we term progressive context amnesia.
Research from the Harbin Institute of Technology (Tan et al., ACL 2024) demonstrated that when retrieved context conflicts with the model's parametric knowledge, LLMs frequently ignore the provided context entirely [6]. Poorly curated context injection doesn't just waste tokens; it can actively mislead the model by triggering parametric override of correct but poorly positioned external evidence. This makes provenance and temporal validity of injected context a first-order concern.
We deployed UAML (Universal Agent Memory Layer) on two production agent instances processing real-world multi-agent team conversations over a period of several weeks. Over 16,000 chat messages were indexed, producing 6,400+ structured knowledge entries. We then measured entity recovery: the ability to recall named entities (people, tools, decisions, dates, configurations) after compaction events.
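To make the metric concrete, the sketch below shows one way entity recovery can be scored, as overlap against a gold entity list. The extraction step and the gold annotation are assumed inputs; this is a simplified stand-in for the measurement, not a verbatim excerpt of our pipeline.

```typescript
// Hedged sketch: entity recovery as overlap against a gold entity set drawn
// from the full archive. NER/annotation is out of scope; only the scoring
// rule is shown.
type EntityKind = "person" | "tool" | "decision" | "date" | "config";
interface Entity { name: string; kind: EntityKind }

function entityRecovery(goldEntities: Entity[], recalledText: string): number {
  if (goldEntities.length === 0) return 1;
  const recovered = goldEntities.filter((e) =>
    recalledText.toLowerCase().includes(e.name.toLowerCase())
  );
  // 0.23 (L1 only), 0.50 (L1+L2), 1.00 (L1+L2+L3) in the measurements below.
  return recovered.length / goldEntities.length;
}
```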
Our architecture provides three recovery layers:
| Layer | Source | Function |
|---|---|---|
| **L1** | Platform native compaction | Summarized conversation context |
| **L2** | UAML knowledge base | Structured entities extracted from conversations |
| **L3** | SQL archive | Complete, unmodified message history |
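At recall time the layers compose as a fallback chain; the following is a minimal sketch with illustrative placeholder interfaces, not the production UAML API.

```typescript
// Hedged sketch of the L1/L2/L3 fallback chain; interfaces are placeholders.
interface KnowledgeBase { search(query: string, limit: number): Promise<string[]> }          // L2
interface MessageArchive { fullTextSearch(query: string, limit: number): Promise<string[]> } // L3

async function recallWithFallback(
  compactedContext: string,   // L1: whatever survived platform compaction
  kb: KnowledgeBase,
  archive: MessageArchive,
  query: string
): Promise<string[]> {
  // L1: the compacted summary may already contain the answer.
  if (compactedContext.toLowerCase().includes(query.toLowerCase())) return [compactedContext];
  // L2: targeted structured entries cover most recall needs at small token cost.
  const structured = await kb.search(query, 5);
  if (structured.length > 0) return structured;
  // L3: the verbatim SQL archive guarantees nothing is ever unrecoverable.
  return archive.fullTextSearch(query, 3);
}
```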
Measured entity recovery for each configuration:
| Configuration | Entity Recovery | What Survives |
|---|---|---|
| L1 only (standard compaction) | **23%** | Broad themes, recent topics |
| L1 + L2 (+ structured knowledge) | **50%** | Named entities, key decisions, facts |
| L1 + L2 + L3 (full architecture) | **100%** | Everything; zero data loss |
Standard compaction loses 77% of named entities. After compaction, the agent cannot reliably recall who made which decision and why, which tools and configurations were referenced, or when events occurred.
These are precisely the facts that determine whether an agent's next response is helpful or hallucinated.
During production operation, an agent was asked about a prior decision regarding a payment processor integration, a topic discussed 90 minutes earlier but lost to compaction. Without structured memory, the agent could not answer. With on-demand recall (a single query taking <100ms), the full decision context was recovered and the agent responded correctly with complete attribution.
This is not an edge case. In multi-session, multi-day agent deployments, every compaction cycle creates potential blind spots that accumulate over time.
Mason (2026) independently confirmed the scale of the problem by analyzing 857 production LLM sessions comprising 4.45 million effective input tokens [7]. The finding: 21.8% of all context was structural waste: tool definitions that were never invoked, system prompt repetitions, stale tool results from completed subtasks. This waste is not merely an efficiency problem; it actively competes with relevant information for the model's limited attention capacity.
A naive solution would be to inject all available memory into every context. This fails for three reasons:
1. Token cost scales linearly: every additional token in context is paid for on every turn, whether or not it's needed
2. Context Rot degrades quality: more tokens ≠ better results [4]
3. Distractors harm accuracy: Iratni et al. (2025) showed that irrelevant retrieved passages actively degrade output quality [8]
On-demand recall avoids all three problems:
| Approach | Tokens/turn | Cost Pattern | Accuracy |
|---|---|---|---|
| No compaction (full history) | 200K+ | Very high, every turn | Degrades with length |
| Standard compaction | 20–50K | Low | 23% entity recovery |
| Auto-inject all memory | 50–80K | High, every turn | High but with noise |
| **On-demand recall** | **20–50K base + 2–5K when needed** | **Low (recall only when needed)** | **100% recoverable** |
The mechanism works in two phases within a single turn:
1. Phase 1: The agent receives a query, recognizes it needs historical context, and calls a memory retrieval tool
2. Phase 2: Relevant entries are returned into context, and the agent formulates a response with complete information
This mirrors how professionals work: you don't carry every document to every meeting, but you know where to find them when needed.
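A sketch of those two phases inside one turn, using placeholder interfaces for the model client and the memory tool (none of these names refer to a specific vendor SDK):

```typescript
// Hedged sketch: one agent turn with on-demand recall. `Llm` and `Memory` are
// placeholder interfaces, not a specific SDK.
interface ToolCall { name: string; args: { query: string } }
interface LlmTurn { text?: string; toolCall?: ToolCall }
interface Llm { complete(messages: string[], tools: string[]): Promise<LlmTurn> }
interface Memory { recall(sessionId: string, query: string): Promise<string[]> }

async function answerWithRecall(
  llm: Llm, memory: Memory, sessionId: string, context: string[], userQuery: string
): Promise<string> {
  // Phase 1: the model sees the query and decides whether it needs historical context.
  const first = await llm.complete([...context, userQuery], ["memory_recall"]);
  if (!first.toolCall) return first.text ?? "";

  // Phase 2: targeted entries (typically a few thousand tokens) are injected and
  // the model answers with complete information; the full history is never replayed.
  const entries = await memory.recall(sessionId, first.toolCall.args.query);
  return (await llm.complete([...context, userQuery, ...entries], [])).text ?? "";
}
```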
The compression approach tackles the problem at the infrastructure level β making context smaller without (ideally) losing information.
Activation Beacon (Zhang, Liu, Xiao et al., BAAI/FlagOpen, 2024) compresses KV cache activations rather than text, achieving 8× KV cache reduction and 2× inference speedup on 128K+ contexts [9]. Recurrent Context Compression (Huang, Zhu, Wang et al., 2024) achieves 32× compression with BLEU4 near 0.95 [10]. Semantic Compression (Fei, Niu, Zhou et al., Huawei, ACL 2024) applies information-theoretic source coding to extend context 6–8× without fine-tuning [11].
These approaches address hardware efficiency but do not solve the semantic quality problem: compressed content still carries all information indiscriminately.
A more promising direction treats memory as a managed system rather than a compression target.
MemAgent (Yu, Chen, Feng et al., 2025) uses RL-trained agents that read text in segments and update memory via an overwrite strategy, extrapolating from 8K training context to 3.5M token QA with <5% loss [12]. With 81 citations, it represents the current state of the art in Chinese memory agent research.
SimpleMem (Liu, Su, Xia et al., 2026) implements a three-stage pipeline (semantic compression, online synthesis, intent-aware retrieval), achieving +26.4% F1 on LoCoMo while reducing token consumption by 30× [13]. Its architecture is conceptually closest to our on-demand recall approach.
MemOS (MemTensor, Shanghai Jiao Tong University, Renmin University, China Telecom, 2025) proposes a Memory Operating System with OS-inspired lifecycle management, achieving state-of-the-art results across benchmarks [14]. It operates primarily in latent space, complementary to text-level structured approaches.
M+ (ICML 2025) extends MemoryLLM with scalable long-term memory, confirming that "retaining information from the distant past remains a challenge" [15], exactly what SQL-backed archival solves without model modification.
ACON (Kang et al., Microsoft Research, 2025) optimizes context compression for long-horizon agents, reducing memory usage by 26–54% [16]. Its key finding validates our approach: "generic summarization easily loses critical details"; task-aware, selective retrieval is essential.
Pichay (Mason, 2026) takes the most radical approach, treating the context window as L1 cache and implementing demand paging with eviction and fault detection [7]. In production, it reduces context consumption by up to 93%. Mason's framing captures the field's core insight: "the problems (context limits, attention degradation, cost scaling, lost state across sessions) are virtual memory problems wearing different clothes."
Focus (Verma, 2026) implements autonomous context compression where agents decide when to consolidate learnings and prune history, achieving 22.7% token reduction without accuracy loss [17].
MemArt (2025) demonstrates that structured retrieval improves accuracy by 11.8–39.4% over plaintext memory methods with a 91–135× reduction in prefill tokens [18], direct validation of the principle that targeted recall outperforms brute-force injection.
InfiniGen (Lee et al., Seoul National University, OSDI 2024) addresses KV cache management for long-text generation, achieving up to 3× speedup over existing methods [23]. Funded by Samsung Advanced Institute of Technology and cited 253 times, it represents the hardware-level approach to context scaling that complements software-level memory management.
THEANINE (Ong, Kim, Gwak et al., NAACL 2025) introduces timeline-based memory management for lifelong dialogue agents [24]. Its key insight (don't delete old memories; connect them temporally and causally) directly validates UAML's temporal validity mechanism. Memories form evolving timelines rather than static snapshots.
LRAgent (Jeon et al., Korea, 2026) tackles KV cache sharing for multi-LoRA agents [25], addressing the exact overhead problem that emerges when multiple agents share a backbone but maintain separate caches.
Most striking is "Store then On-Demand Extract" (Yamanaka et al., Japan, 2026), which argues against the dominant "extract then store" paradigm and advocates storing raw data with on-demand extraction at query time [26]. This is philosophically identical to UAML's three-layer approach: preserve everything (L3), extract structured knowledge (L2), and retrieve on demand. Yamanaka's framing, "uplifting the world with memory," captures the same conviction that memory infrastructure is foundational, not auxiliary.
AIM-RM (Yoshizato, Shimizu et al., Japan, AAMAS 2026) demonstrates practical deployment of memory retrieval in industrial supply chain agents [27], confirming that memory-augmented agents are moving beyond chatbot applications into production enterprise systems.
MemGPT/Letta (Packer et al., 2023) pioneered virtual context management inspired by OS memory hierarchies [19]. Mem0 (2025) offers scalable memory for multi-session dialogues [20]. Memex(RL) (Wang et al., 2026) is the closest academic parallel: indexed memory with compact summaries plus full-fidelity external storage [5].
Two comprehensive surveys map the field: Cognitive Memory in LLMs (Shan, Luo, Zhu et al., 2025) provides the most complete taxonomy of memory mechanisms with 34 citations [21], and A Comprehensive Survey on Long Context Language Modeling (Liu, Zhu, Bai et al., 2025) covers the full spectrum with contributions from 35+ researchers and 88 citations [22].
Across all surveyed approaches, several critical capabilities are consistently absent:
| Capability | MemAgent | SimpleMem | MemOS | Pichay | THEANINE | Mem0 | MemGPT | **UAML** |
|---|---|---|---|---|---|---|---|---|
| On-demand selective recall | ✗ | ✓ | Partial | ✓ | ✗ | Partial | Partial | **✓** |
| Cross-session memory | ✗ | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | **✓** |
| End-to-end encryption | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | **✓ (PQC)** |
| Audit trail / provenance | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | **✓** |
| Temporal validity | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | **✓** |
| Multi-agent isolation | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | **✓** |
| Local-first / self-hosted | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | Partial | **✓** |
| Certifiable for regulated use | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | **✓** |
| Zero-downtime compaction | ✗ | ✗ | ✗ | Partial | ✗ | ✗ | ✗ | **✓** |
| MCP integration (drop-in) | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | **✓** |
| Production-deployed | Research | Research | Research | ✓ | Research | ✓ | Partial | **✓** |
No existing system combines selective on-demand recall with the security, auditability, and temporal reasoning required for deployment in regulated environments: healthcare, legal, financial services, government.
Academic memory systems universally neglect security. For enterprises in regulated industries, this is a deployment blocker. UAML addresses this gap with end-to-end post-quantum encryption (ML-KEM-768), a complete audit trail with per-entry provenance, temporal validity on every knowledge entry, per-agent isolation, and local-first, self-hosted deployment.
These are not features for a roadmap. They are implemented, tested, and deployed in production.
We believe memory infrastructure should be standardized, not proprietary. We have proposed an External Memory Provider API (RFC #49233) [28] that addresses three interconnected problems: agent downtime during compaction, information loss after compaction, and the lack of a standard integration path for external memory systems.
When an agent platform's context window fills up, the platform performs synchronous in-band compaction: the agent stops responding, the entire context is sent to an LLM for summarization, and the summary replaces the original context. In our measurements, this creates a 30–60 second blackout during which the agent is completely unresponsive. For production use cases (customer support, financial services, healthcare), this is a deployment blocker.
RFC #49233 proposes a hot-swap architecture with continuous background synchronization:
+---------------------------------------------+
|  Agent Platform                             |
|  Context Slot A (active)  <->  Slot B       |
|               |                             |
|  +-----------------------------------+      |
|  |    Memory Provider Interface      |      |
|  +---------------+-------------------+      |
+------------------|--------------------------+
                   |
                   v
    +-------------------------------+
    |  External Memory Provider     |
    |   - Continuous async sync     |
    |   - Background compression    |
    |   - Pre-built context ready   |
    |   - Full audit trail          |
    +-------------------------------+
The mechanism has four stages:
1. Continuous sync: every message is written to the memory provider asynchronously (fire-and-forget, local socket, ~1ms). The provider never blocks the agent.
2. Background compression: the provider maintains a compressed context summary, updated asynchronously using a lightweight local model. The compressed context is always ready before the agent needs it.
3. Atomic swap: when the capacity threshold is reached, the platform reads the pre-built context from the provider (~50ms) and swaps it in during the inter-message gap. The agent never pauses.
4. On-demand recall: when the agent needs details that were compressed away, it queries the provider via a tool call (~10–50ms), receiving targeted knowledge entries.
This is analogous to double-buffering in graphics: while the agent uses buffer A, the provider prepares buffer B in the background. When it's time to compact, the platform atomically swaps A for B.
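A compressed sketch of stages 1-3 follows; the 0.85 and 0.40 values mirror the example configuration shown below, the provider shape mirrors the interface proposed next, and the session class itself is illustrative rather than platform code.

```typescript
// Hedged sketch of the hot-swap cycle. The 0.85 / 0.40 thresholds mirror the
// example configuration; everything else is illustrative.
interface Provider {
  onMessage(sessionId: string, msg: string): Promise<void>;                    // stage 1
  getCompressedContext(sessionId: string, maxChars: number): Promise<string>;  // stages 2-3
}

class HotSwapSession {
  private active: string[] = [];   // "buffer A": the live context

  constructor(private id: string, private provider: Provider, private capacityChars = 400_000) {}

  async append(msg: string): Promise<void> {
    this.active.push(msg);
    void this.provider.onMessage(this.id, msg);   // fire-and-forget; never blocks the agent

    if (this.size() > 0.85 * this.capacityChars) {
      // Atomic swap in the inter-message gap: read the pre-built "buffer B" (~50ms)
      // and replace the live context without pausing the agent.
      const compressed = await this.provider.getCompressedContext(
        this.id, Math.floor(0.4 * this.capacityChars)
      );
      this.active = [compressed];
    }
  }

  private size(): number {
    return this.active.reduce((n, m) => n + m.length, 0);
  }
}
```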
The proposed interface is deliberately minimal: three core operations plus lifecycle hooks:
interface MemoryProvider {
  // Fire-and-forget after each message (async, not in critical path)
  onMessage(sessionId: string, message: Message): Promise<void>;
  // Returns pre-built compressed context
  getCompressedContext(sessionId: string, maxChars: number): Promise<CompressedContext>;
  // On-demand recall tool
  recall(sessionId: string, query: string, limit?: number): Promise<KnowledgeEntry[]>;
  // Health check
  ping(): Promise<{ ok: boolean; latencyMs: number }>;
}
interface SessionHooks {
  'pre-compaction': (session: Session) => Promise<void>;
  'post-compaction': (session: Session, newContext: CompressedContext) => Promise<void>;
  'session-start': (session: Session) => Promise<void>; // Cold-start recovery
  'session-end': (session: Session) => Promise<void>;
}
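To make the contract concrete, here is a toy provider against this interface. The `Message`, `CompressedContext`, and `KnowledgeEntry` shapes are assumed minimal forms; a real provider would back these methods with the SQL archive and knowledge index rather than an in-memory map.

```typescript
// Assumed minimal shapes for the types the interface references.
type Message = { text: string };
type CompressedContext = { text: string };
type KnowledgeEntry = { content: string };

// Hedged sketch only: an in-memory stand-in for a real provider implementation.
class ToyMemoryProvider implements MemoryProvider {
  private log = new Map<string, Message[]>();

  async onMessage(sessionId: string, message: Message): Promise<void> {
    this.log.set(sessionId, [...(this.log.get(sessionId) ?? []), message]);  // continuous sync
  }

  async getCompressedContext(sessionId: string, maxChars: number): Promise<CompressedContext> {
    // A real provider maintains this summary in the background; here we just truncate.
    const text = (this.log.get(sessionId) ?? []).map((m) => m.text).join("\n");
    return { text: text.slice(-maxChars) };
  }

  async recall(sessionId: string, query: string, limit = 5): Promise<KnowledgeEntry[]> {
    return (this.log.get(sessionId) ?? [])
      .filter((m) => m.text.toLowerCase().includes(query.toLowerCase()))
      .slice(-limit)
      .map((m) => ({ content: m.text }));
  }

  async ping(): Promise<{ ok: boolean; latencyMs: number }> {
    return { ok: true, latencyMs: 1 };
  }
}
```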
Configuration is opt-in with full backward compatibility:
{
  "memory": {
    "provider": "uaml",
    "endpoint": "http://localhost:8770",
    "compaction": {
      "strategy": "hot-swap",
      "threshold": 0.85,
      "targetSize": 0.40
    }
  }
}
The expected impact, relative to the builtin pipeline:
| Metric | Current (builtin) | With Memory Provider |
|---|---|---|
| Compaction duration | 30–60s | <100ms |
| Agent downtime | 30–60s | **0** (between messages) |
| Information loss | Significant (77%) | **None** (full DB) |
| Audit trail | None | **Complete** |
| Cost per compaction | ~$0.10–0.50 | **~$0.001** (async local) |
A critical design decision is the choice of integration protocol for the recall tool. UAML implements the Model Context Protocol (MCP) connector, which means any existing LLM agent that supports MCP tool calls can connect to UAML's full memory capabilities (recall, indexing, knowledge extraction) without any code changes to the agent itself. The agent simply gains a new tool in its toolbox. No forking, no framework lock-in, no migration.
In our production deployment, two agents from different platforms were connected to UAML via MCP within minutes, immediately gaining access to structured memory recall while retaining all their existing functionality. The MCP approach turns memory from a platform feature into a universal service layer that any agent can consume.
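As an illustration, a recall tool exposed over MCP can be a thin shim in front of the provider's HTTP endpoint. The sketch below assumes the TypeScript MCP SDK (`@modelcontextprotocol/sdk`) and `zod`; the `/recall` path and the response shape are assumptions for illustration, not documented UAML endpoints.

```typescript
// Hedged sketch: exposing memory recall as an MCP tool over stdio. The /recall
// path and response shape are assumptions, not the documented UAML API.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "uaml-recall", version: "0.1.0" });

server.tool(
  "memory_recall",
  { sessionId: z.string(), query: z.string(), limit: z.number().optional() },
  async ({ sessionId, query, limit }) => {
    const res = await fetch("http://localhost:8770/recall", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ sessionId, query, limit: limit ?? 5 }),
    });
    const entries: { content: string }[] = await res.json();
    // Each recalled knowledge entry becomes one MCP text content block.
    return { content: entries.map((e) => ({ type: "text" as const, text: e.content })) };
  }
);

await server.connect(new StdioServerTransport());
```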
A critical design principle: the platform's builtin compaction pipeline always runs in parallel; it is never disabled, even when a memory provider is active.
Builtin Compaction    ------------------>  ALWAYS running (shadow/backup)
UAML Memory Provider  ------------------>  Enriches when available (overlay)
This guarantees 100% functionality at all times: if the memory provider is slow, unreachable, or returns an error, the builtin pipeline still compacts the context and the agent keeps responding.
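In code, the shadow/overlay guarantee reduces to a timeout-guarded fallback around every provider call; a sketch with illustrative names and timeout value:

```typescript
// Hedged sketch: the provider only enriches. Any error or slow response falls
// back to builtin compaction, so the agent never depends on the provider being up.
async function compactWithFallback(
  sessionId: string,
  builtinCompact: () => Promise<string>,   // the platform's own pipeline, always available
  provider: { getCompressedContext(id: string, maxChars: number): Promise<string> } | null,
  timeoutMs = 500
): Promise<string> {
  if (provider) {
    try {
      const timeout = new Promise<never>((_, reject) =>
        setTimeout(() => reject(new Error("memory provider timeout")), timeoutMs)
      );
      return await Promise.race([provider.getCompressedContext(sessionId, 40_000), timeout]);
    } catch {
      // Fall through to the shadow/backup path.
    }
  }
  return builtinCompact();
}
```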
Not all information is equally important. The memory provider classifies every entry:
| Level | Criteria | % of data | Examples |
|---|---|---|---|
| **HIGH** | Decisions, rules, architecture choices | 0.5% | "Chose ML-KEM-768 for encryption" |
| **MEDIUM** | Entity mentions, config changes, results | 7% | "VPS IP: 5.189.139.221" |
| **LOW** | Debug output, heartbeats, transient status | 93% | Tool output, NO_REPLY messages |
This filtering ensures the recall API returns signal, not noise. The `getCompressedContext` endpoint prioritizes HIGH and MEDIUM entries, keeping injected context compact and relevant.
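A sketch of how that prioritization can be applied when assembling the compressed context; the classification itself is out of scope here, and only the budgeting rule is shown.

```typescript
// Hedged sketch: spend the character budget on HIGH, then MEDIUM entries;
// LOW entries (debug output, heartbeats, NO_REPLY) never enter injected context.
type Importance = "HIGH" | "MEDIUM" | "LOW";
interface ClassifiedEntry { content: string; importance: Importance }

function assembleCompressedContext(entries: ClassifiedEntry[], maxChars: number): string {
  const rank: Record<Importance, number> = { HIGH: 0, MEDIUM: 1, LOW: 2 };
  const picked: string[] = [];
  let used = 0;
  for (const e of [...entries].sort((a, b) => rank[a.importance] - rank[b.importance])) {
    if (e.importance === "LOW") break;                      // signal only, never noise
    if (used + e.content.length > maxChars) continue;
    picked.push(e.content);
    used += e.content.length;
  }
  return picked.join("\n");
}
```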
Every knowledge entry maintains a verifiable provenance chain from recalled fact to original message:
UAML Entry #4521
  +- content: "Decided on hot-swap compaction strategy"
  +- importance: HIGH (score: 7)
  +- source: session:a7e0260c:msg_hash_abc123
  +- chat_history_id: 28451  -> links to SQL archive
  +- created_at: 2026-03-18T14:23:00Z
        |
        v
SQL chat_messages #28451
  +- text: [full original message, verbatim]
  +- source_file: a7e0260c-...jsonl
  +- source_line: 14832
This enables audit (trace any fact to its source), verification (compare summary against verbatim record), and compliance (demonstrate data provenance for regulated environments).
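The same chain can be expressed as data plus a minimal audit check; field names mirror the example above, and the archive interface is an assumed placeholder.

```typescript
// Hedged sketch: every entry carries a pointer to the verbatim archived message,
// so any recalled fact can be traced and verified against its source.
interface UamlEntry {
  id: number;
  content: string;
  importance: "HIGH" | "MEDIUM" | "LOW";
  source: string;          // e.g. "session:<session-id>:<message-hash>"
  chatHistoryId: number;   // foreign key into the SQL archive
  createdAt: string;       // ISO 8601
}
interface ArchivedMessage { id: number; text: string; sourceFile: string; sourceLine: number }
interface Archive { getMessage(id: number): Promise<ArchivedMessage | null> }

async function auditEntry(entry: UamlEntry, archive: Archive): Promise<boolean> {
  const original = await archive.getMessage(entry.chatHistoryId);
  // An entry is auditable only if its verbatim source still exists in the archive.
  return original !== null && original.text.length > 0;
}
```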
In production, agents operate across multiple sessions (Discord channels, messaging platforms, scheduled tasks). The proposed API extension keys every provider operation to a sessionId, uses the session-start hook for cold-start recovery, and lets recall reach knowledge captured in earlier sessions while keeping each agent's memory isolated.
The design prioritizes graceful degradation: every provider call is asynchronous or timeout-guarded, and if the provider is unreachable the builtin pipeline continues unchanged, exactly as it does today. We propose an incremental implementation path:
| Phase | Capability | Estimated Timeline |
|---|---|---|
| 1 | Pre/post-compaction hooks | 1–2 weeks |
| 2 | Memory Provider Interface | 2–3 weeks |
| 3 | Hot-swap compaction strategy | 3–4 weeks |
| 4 | Background sync + config + docs | 2–3 weeks |
Phase 1 alone would already enable external memory integration and demonstrate value. The full RFC proposal is publicly available at github.com/openclaw/openclaw/issues/49233 and has received community attention and independent analysis, confirming demand for standardized memory infrastructure.
Research groups across East Asia are producing world-class work on context compression and memory architectures. Chinese institutions (BAAI, Shanghai Jiao Tong, Harbin Institute, Huawei, China Telecom, MemTensor), Korean groups (Seoul National University, Samsung, KAIST), and Japanese researchers are collectively advancing the field at a pace that exceeds Western output in volume and increasingly matches it in impact. Any serious memory infrastructure product must engage with this global research base.
The evidence converges from multiple independent sources: context quality determines output quality. A smaller context with the right information produces better results than a larger context with noise. Standard compaction loses 77% of named entities, a measurable degradation that directly affects agent decision quality.
On-demand recall from structured memory resolves this without the token cost of context stuffing. Our three-layer architecture (compaction + knowledge base + archive) achieves 100% entity recovery while maintaining minimal per-turn costs. The approach requires no model modifications, no cloud dependencies, and works as an overlay on existing agent platforms.
We are not arguing for replacing compaction; it serves a useful purpose in maintaining manageable context sizes. We are arguing that compaction alone is insufficient, and that a structured external memory layer makes the overall system lossless: whatever compaction discards remains recoverable on demand.
The organizations that understand this will build agents that actually work. The rest will keep buying bigger models and wondering why they still forget.
Memory quality is the new model quality.
[1] Liu, N.F. et al. (2024). "Lost in the Middle: How Language Models Use Long Contexts." Transactions of the ACL, 12, 157–173. arXiv:2307.03172
[2] Esmi, N. et al. (2026). "GPT-5 vs Other LLMs in Long Short-Context Performance." arXiv, February 2026.
[3] Salvatore, N. et al. (2025). "Lost in the Middle: An Emergent Property from Information Retrieval Demands in LLMs." arXiv, October 2025.
[4] Chroma Research (2026). "Context Rot: How Increasing Input Tokens Impacts LLM Performance." research.trychroma.com/context-rot
[5] Wang, Z. et al. (2026). "Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory." arXiv:2603.04257
[6] Tan, Y. et al. (2024). "When Retrieved Context Conflicts with Parametric Knowledge." ACL 2024, Harbin Institute of Technology.
[7] Mason, T. (2026). "The Missing Memory Hierarchy: Demand Paging for LLM Context Windows." arXiv:2603.09023
[8] Iratni, M. et al. (2025). "Dynamic Context Selection for Retrieval-Augmented Generation: Mitigating Distractors and Positional Bias." arXiv, December 2025.
[9] Zhang, P. et al. (2024). "Long Context Compression with Activation Beacon." BAAI/FlagOpen. arXiv:2401.03462
[10] Huang, C. et al. (2024). "Recurrent Context Compression: Efficiently Expanding the Context Window of LLM." arXiv:2406.06110
[11] Fei, W. et al. (2024). "Extending Context Window of Large Language Models via Semantic Compression." Huawei. Findings of ACL 2024, 5169–5181.
[12] Yu, H. et al. (2025). "MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent." arXiv:2507.02259
[13] Liu, J. et al. (2026). "SimpleMem: Efficient Lifelong Memory for LLM Agents." arXiv:2601.02553
[14] MemTensor et al. (2025). "MemOS: A Memory OS for AI System." Shanghai Jiao Tong University, Renmin University, China Telecom. arXiv:2507.03724
[15] M+ (2025). "Extending MemoryLLM with Scalable Long-Term Memory." ICML 2025. arXiv:2502.00592
[16] Kang, M. et al. (2025). "ACON: Optimizing Context Compression for Long-Horizon LLM Agents." arXiv:2510.00615
[17] Verma, N. (2026). "Active Context Compression: Autonomous Memory Management in LLM Agents." arXiv:2601.07190
[18] MemArt (2025). "KVCache-Centric Memory for LLM Agents." OpenReview.
[19] Packer, C. et al. (2023). "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560
[20] Mem0 (2025). "Building Production-Ready AI Agents with Scalable Long-Term Memory." arXiv:2504.19413
[21] Shan, L. et al. (2025). "Cognitive Memory in Large Language Models." arXiv:2504.02441
[22] Liu, J. et al. (2025). "A Comprehensive Survey on Long Context Language Modeling." arXiv:2503.17407
[23] Lee, S. et al. (2024). "InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management." OSDI 2024, Seoul National University.
[24] Ong, D., Kim, H., Gwak, S. et al. (2025). "THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation." NAACL 2025.
[25] Jeon, H. et al. (2026). "LRAgent: Multi-LoRA Agents with KV Cache Sharing." arXiv:2602.01053
[26] Yamanaka, Y. et al. (2026). "Store then On-Demand Extract: A Memory Architecture for LLM Agents." arXiv:2602.16192
[27] Yoshizato, T., Shimizu, H. et al. (2026). "AIM-RM: Agent-based Inventory Management with Retrieval Memory." AAMAS 2026. arXiv:2602.05524
[28] Zamazal, L. / GLG, a.s. (2026). "RFC: External Memory Provider API for OpenClaw." GitHub Issue #49233. github.com/openclaw/openclaw/issues/49233
GLG, a.s.: UAML (Universal Agent Memory Layer) is available at uaml-memory.com. Technical documentation and API reference at smart-memory.ai.
© 2026 L. Zamazal, GLG, a.s. | uaml-memory.com | RFC #49233 | CC BY 4.0