How it's built

The Foundation

Open source code. Peer-reviewed research. No magic, just mechanics.

FOUNDATION 01 Code

Open Source Core

The core RAG foundation is open source. You can inspect the system, understand its behavior, and extend it to fit your requirements. No proprietary lock-in, no hidden costs.

Core capabilities:

  • Access Control — Multi-tenant authentication, user management, session handling
  • Document Ingestion — Structured processing and indexing for retrieval
  • Query & Retrieval — Context-aware answers with traceable sources
  • System Observability — Logging, traceability, and operational metrics

Designed for auditability, long-term maintainability, and infrastructure control.
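The core's actual API is not reproduced here. As a purely hypothetical sketch of the ingestion and retrieval stages, a minimal in-memory index with traceable sources might look like this (all names, `MiniRAG` included, are illustrative, not the product's interface):

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector for a text (lowercased whitespace tokens)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MiniRAG:
    """Toy stand-in for the ingestion and retrieval stages."""
    def __init__(self):
        self.docs = []  # (doc_id, text) pairs

    def ingest(self, doc_id, text):
        self.docs.append((doc_id, text))

    def retrieve(self, query, k=2):
        scored = [(cosine(bow(query), bow(t)), d, t) for d, t in self.docs]
        scored.sort(key=lambda x: x[0], reverse=True)
        return scored[:k]

rag = MiniRAG()
rag.ingest("policy.md", "refunds are issued within 14 days of purchase")
rag.ingest("faq.md", "shipping takes 3 to 5 business days")
hits = rag.retrieve("how long do refunds take")
# Each hit carries its source id, which is what keeps answers traceable.
```

A production system would swap the bag-of-words scorer for dense embeddings, but the shape of the flow (ingest, score, return ranked hits with source ids) is the same.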

FOUNDATION 02 ESSLLI 2025 · JoLLI 2026

The Appearance of Meaning

Context-Dependence and Semantic Competence in Transformer Architectures

Read accepted paper (ESSLLI 2025)

Peer-reviewed philosophical framework.

We operationalize "appearance of meaning" (AoM) in transformer language models as a measurable competence cluster: context-sensitive disambiguation, controlled minimal-pair sensitivity, and discourse-level coherence. We propose a Context-Primacy Thesis (CPT): meaning-relevant behavior is causally governed by token-in-context relational states rather than static lexical carriers.
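As an illustration only, a composite over the three sub-competences could be aggregated like this. The weighting and the input values below are assumptions for the sketch; the paper's exact aggregation is not reproduced here:

```python
def aom_composite(disambiguation, minimal_pair, coherence):
    """Unweighted mean of three competence sub-scores, each in [0, 1].

    Illustrative aggregation only; the paper's actual weighting
    and scoring pipeline are not shown here.
    """
    scores = (disambiguation, minimal_pair, coherence)
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("sub-scores must lie in [0, 1]")
    return sum(scores) / len(scores)

# Hypothetical sub-scores, chosen for the example:
score = aom_composite(0.93, 0.84, 0.88)
```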

Key Results (GPT-2 & Qwen2.5):

  • 0.85–0.91 — AoM composite score
  • ≈20–33% — Peak depth (argmax layer)
  • 0.67 → 3.11 — Mean max effect (sham ≈ 0)

Supporting result: 91–96% disambiguation accuracy (cue-vulnerable, so not the main load-bearing evidence in the paper).

CPT Causal Signature by Layer

Layer-resolved CPT signature under targeted interventions. The sensitivity profile is model-dependent; the critical result is consistent separation from sham/placebo across layers.

Chart: Qwen2.5 (0.5B, 1.5B, 3B). Full AoM + CPT results additionally include GPT-2 (124M). Sham patching is near-zero. SDH target-specificity stress test runs across 8 checkpoints.

Why this matters for products:

This research establishes that modern language models are genuinely context-sensitive rather than simple pattern replay systems. For products, this means context can be treated as a first-class control surface—something that can be tested, monitored, and constrained instead of assumed.

Detailed methodology and limitations are discussed in the full paper.

Available to qualified readers during peer review.

FOUNDATION 03 Ongoing Research

Language Games and Sedimented Semantics

Temporal Dimensions of Context-Primacy in LLM Agents (t-CPT)

We develop t-CPT: the hypothesis that instruction-following in LLM agents stabilizes through repeated interaction patterns ("procedural sediments"), yet decays with temporal distance and interference. We operationalize this as measurable drift curves under controlled multi-turn stress tests.

Pilot signal under controlled multi-turn stress tests

Metric: threshold disclosure rate — how often internal numeric policy cutoffs are revealed as conversation length increases.

  • 10% — Baseline disclosure (Turn 0)
  • 80% — Elevated disclosure (Turn 40)
  • 70% — Drift detected (Turn 80)
  • 55% — Degraded (Turn 120)

Temporal Drift Curve

Threshold disclosure rate as a function of conversation length. Measurements are obtained via dedicated diagnostic probes, not production usage.
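A drift curve of this kind can be assembled from per-turn probe results. The sketch below is a hypothetical aggregation, assuming each probe has already been scored as a boolean "disclosed" flag; the bucket width and sample data are invented for the example:

```python
from collections import defaultdict

def drift_curve(probe_results, bucket=40):
    """Aggregate (turn_index, disclosed) probe results into a
    disclosure-rate curve over conversation length, grouped into
    fixed-width turn buckets."""
    buckets = defaultdict(list)
    for turn, disclosed in probe_results:
        buckets[(turn // bucket) * bucket].append(disclosed)
    return {start: sum(flags) / len(flags)
            for start, flags in sorted(buckets.items())}

# Invented sample results: (turn index, did the probe leak a cutoff?)
results = [(0, False), (5, False), (42, True),
           (47, True), (81, True), (85, False)]
curve = drift_curve(results)
# curve maps bucket start turn -> disclosure rate in that bucket
```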

Full methodology available on request.

Methodology: Diagnostic Probes

We run dedicated diagnostic prompts that stress the system under long-context conditions and score outputs against defined disclosure constraints. This works with any model (open-weight or API) because we score outputs, not internal states.
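As a minimal sketch of output-side scoring, assume the disclosure constraint is "never reveal internal numeric cutoffs". A scorer then only needs the model's text, never its internals. The cutoff values and sample outputs below are hypothetical:

```python
import re

# Hypothetical internal policy thresholds the agent must not reveal.
CUTOFFS = {"750", "0.82"}

def discloses(output, cutoffs=CUTOFFS):
    """True if any protected numeric cutoff appears verbatim in the output."""
    numbers = set(re.findall(r"\d+(?:\.\d+)?", output))
    return bool(numbers & cutoffs)

def disclosure_rate(outputs):
    """Fraction of probe outputs that leak a cutoff. Scores outputs only,
    so it works with any model, open-weight or API."""
    flags = [discloses(o) for o in outputs]
    return sum(flags) / len(flags)

outputs = [
    "I can't share internal thresholds.",
    "Approvals require a score above 750.",
    "Policy details are confidential.",
    "The cutoff is 0.82 for this tier.",
]
rate = disclosure_rate(outputs)  # 2 of 4 outputs leak a cutoff -> 0.5
```

A real probe suite would use richer constraints than a verbatim-number match (paraphrased leaks, rounded values), but the principle is the same: score what the model says, not what it is.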

Why this matters: Instruction-following reliability can degrade as interactions grow longer or more complex. Our stress tests surface these failure modes early, so deployment decisions are informed by observed behavior—not assumptions.

These tests inform deployment readiness and release decisions in production environments.

Supported by

HessenIdeen · HessianAI · Goethe Unibator · Frankfurt School · Microsoft Founders Hub