Introducing Harbour

The Proactive AI-Native Catalog for Modern Lakehouses

e6data

Data context is scattered across wikis, dashboards, Slack threads, and tribal knowledge. Harbour is the central control plane where all data context lives—so every AI agent, every application, and every team operates from the same source of truth. Connect your data agents and they get rich, consistent context in seconds.

See how it works
AI-Native
Built for autonomous agents, not just dashboards
Interoperable
Spark, Trino, PyIceberg, Databricks—one catalog
Proactive
Detects issues and recommends fixes before anything breaks
Scalable
Horizontally scales for ingest-heavy and analytical workloads

One place for all your data context

Every table creation, schema change, and query pattern automatically enriches a living context graph. No manual lineage work. No external tools. The catalog itself captures relationships, detects patterns, and serves context to every connected agent and application.

[Diagram: the context graph for an "ecommerce" namespace. The orders, customers, and products tables link to the namespace via BELONGS_TO edges and to each other via JOINS edges; their columns (amt, date, uid, email, name, phone, sku, price, cat) hang off each table, with email and phone flagged CONTAINS_PII. A connected AI agent (3 active) reads health status, PageRank scores (e.g. 0.42), and recommendations such as "Add partition on order_date". Legend: AI Agent, Namespace, Table, Column, PII, Relationship.]

Your AI agents are flying blind without centralized context

Every team is building data agents and AI-native applications. But these tools are only as smart as the context they can access. When context is fragmented across dozens of systems, agents make inconsistent, uninformed decisions. Harbour centralizes it all.

Without Harbour

Context is scattered across data catalogs, BI tools, documentation wikis, and Slack threads. Every agent and application builds its own incomplete picture. Nothing is consistent.

With Harbour

One canonical context graph with every relationship, quality signal, and semantic enrichment. Connect an agent and it instantly understands your entire data landscape. One source of truth for all.

Without Harbour

Every data tool maintains its own metadata copy. Lineage in one system, quality metrics in another, documentation in a third. Nothing stays in sync. Agents get stale, contradictory context.

With Harbour

Single source of truth. Lineage, quality, PII classification, importance scores, and recommendations—all in one place, always consistent, always current. Every connected tool sees the same picture.

Without Harbour

PII hides in column names like "col_7" or "user_info". Compliance requires months of manual data classification. Agents have no idea which data is sensitive.

With Harbour

Automatic PII detection classifies columns as email, phone, SSN, or credit card. Domain tagging organizes tables by business context. Agents know what's sensitive before they touch it.

Without Harbour

Connecting Spark, Trino, and Databricks to the same catalog requires custom glue code, format converters, and constant maintenance. Each engine sees a different version of the truth.

With Harbour

Full Iceberg REST Catalog spec compliance. Any Iceberg-compatible engine connects natively. One catalog, every engine, zero glue code. Context flows consistently to all.
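Because the catalog is spec-compliant, a standard Iceberg REST client needs no Harbour-specific properties. The endpoint URL and credential below are placeholders for your own deployment.

```python
# Standard Iceberg REST catalog properties -- nothing Harbour-specific is needed.
def harbour_catalog_properties(uri: str, credential: str) -> dict[str, str]:
    return {"type": "rest", "uri": uri, "credential": credential}

props = harbour_catalog_properties(
    "https://harbour.example.com/api/catalog",  # hypothetical endpoint
    "client-id:client-secret",                  # placeholder credential
)

# With pyiceberg installed, connecting is one call:
# from pyiceberg.catalog import load_catalog
# catalog = load_catalog("harbour", **props)
# orders = catalog.load_table("ecommerce.orders")
```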

Without Harbour

AI agents hit opaque APIs, get cryptic errors, and have no context about data relationships, quality, or importance. They're guessing. Badly.

With Harbour

Rich semantic endpoints give agents full context: table importance scores, related tables, data quality signals, and actionable recommendations. Connect in seconds, not weeks.
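A sketch of how an agent might consume such a payload. The field names and response shape here are assumptions for illustration, not the documented Harbour API.

```python
import json

# Illustrative payload -- field names are assumptions, not the documented schema.
payload = json.loads("""{
  "table": "ecommerce.orders",
  "importance": 0.42,
  "related_tables": ["ecommerce.customers", "ecommerce.products"],
  "quality": {"freshness_hours": 2},
  "recommendations": ["Add partition on order_date"]
}""")

def context_summary(ctx: dict) -> str:
    """Flatten a context payload into one line an agent can drop into its prompt."""
    related = ", ".join(ctx["related_tables"]) or "none"
    recs = "; ".join(ctx["recommendations"]) or "none"
    return (f"{ctx['table']} (importance {ctx['importance']:.2f}), "
            f"related: {related}, recommendations: {recs}")

print(context_summary(payload))
```

The point is that an agent gets joins, quality, and guidance in one response instead of stitching them together from opaque APIs.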

Harbour is a layered control plane that separates intelligence from storage and security from compute. Every layer is independently scalable. Context flows from the graph to every connected agent and engine through rich semantic APIs.

Traditional REST Catalog
Spark / Trino / PyIceberg
→ REST catalog: returns table location only
→ No intelligence: no context, no recommendations, no PII detection
→ S3 / GCS / ADLS

e6 Harbour
AI agents / Spark / Trino / Databricks
→ Context control plane: metadata + context + scoped credentials in one call
→ Context graph + AI engine: PageRank, recommendations, PII, lineage, quality
→ S3 / GCS / ADLS / MinIO

Proactive recommendations, not reactive firefighting

The recommendation engine continuously analyzes metadata patterns across your entire catalog. It detects issues, surfaces optimization opportunities, and provides actionable guidance to every connected agent and team—before anything breaks.

SNAPSHOT MANAGEMENT

Automatic lifecycle optimization

Detects tables with runaway snapshot counts, recommends expiry policies, and suggests metadata compaction. Catches write-heavy patterns before they become production incidents.

SCHEMA & ACCESS INTELLIGENCE

Workload-aware optimization

Identifies unpartitioned tables under heavy scan load, detects missing sort orders, spots idle tables, and recommends join optimizations based on context graph patterns.

HIGH PRIORITY
SNAPSHOT_EXPIRY
Detected > 100 snapshots

Table has accumulated excessive snapshots, indicating a missing lifecycle policy. Configure snapshot expiry to prevent metadata bloat and query-planning slowdowns.
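Acting on this recommendation uses the standard Iceberg `expire_snapshots` Spark procedure; the catalog name, table, and thresholds below are illustrative.

```python
def expire_snapshots_sql(catalog: str, table: str, older_than: str, retain_last: int) -> str:
    """Render the standard Iceberg expire_snapshots procedure call for Spark SQL."""
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{older_than}', "
        f"retain_last => {retain_last})"
    )

stmt = expire_snapshots_sql("harbour", "ecommerce.orders", "2024-01-01 00:00:00", 10)
print(stmt)
# In a Spark session connected to the catalog: spark.sql(stmt)
```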

SCAN_OPTIMIZATION
High scan-to-query ratio

Queries are scanning significantly more data than necessary. Add partition pruning or sort order on frequently filtered columns to reduce I/O.
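A toy version of the signal behind this card: compare bytes scanned to bytes actually returned. The ratio threshold and the example table are illustrative stand-ins, not Harbour's internal heuristic.

```python
def high_scan_ratio(bytes_scanned: int, bytes_returned: int, threshold: float = 10.0) -> bool:
    """Flag queries that read far more data than they return."""
    return bytes_returned > 0 and bytes_scanned / bytes_returned >= threshold

if high_scan_ratio(bytes_scanned=50_000_000_000, bytes_returned=2_000_000):
    # Acting on it is standard Iceberg DDL in Spark SQL, e.g.:
    # spark.sql("ALTER TABLE harbour.ecommerce.orders "
    #           "ADD PARTITION FIELD day(order_date)")
    print("recommend: partition on a frequently filtered date column")
```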

AI INSIGHTS
PARTITION_SUGGESTION
Based on context graph

Filter columns detected in repeated query patterns. Partitioning on these columns would reduce scan volume across all connected engines and agents.

JOIN_OPTIMIZATION
Frequent join patterns

Context graph detected frequent join patterns between tables. Co-locating or pre-computing these joins could significantly reduce latency for connected applications.

One catalog. Every engine. Consistent context.

Connect any Iceberg-compatible engine and it gets the same rich context. No custom integrations. No format converters. Just plug in and go.
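For Spark, "plug in and go" means the standard Iceberg REST catalog configuration keys; only the endpoint URL below is a placeholder for your deployment.

```python
# Standard Spark conf keys for an Iceberg REST catalog.
def spark_rest_catalog_conf(name: str, uri: str) -> dict[str, str]:
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
    }

conf = spark_rest_catalog_conf("harbour", "https://harbour.example.com/api/catalog")
for key, value in conf.items():
    print(f"{key}={value}")

# Applied when building the session:
# builder = SparkSession.builder
# for key, value in conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```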

One control plane for every data workflow

Whether you're building ETL pipelines, deploying AI agents, or running ad-hoc analytics—Harbour is where all your data context lives. Connect any tool and it gets the full picture.

Deploy in minutes. Connect everything.

Harbour runs on Kubernetes, connects to any Iceberg engine, and ships with observability built in. Your AI agents connect through semantic REST endpoints. No agents to install, no sidecars to manage, no vendor lock-in.

Runs with your data stack

Plugs into your existing infrastructure. No migration required.

View the quickstart guide
Storage

S3, GCS, ADLS, MinIO. Credential vending for scoped access.

Compute

Spark, Trino, PyIceberg, Databricks. Any Iceberg engine.

Database

PostgreSQL with Flyway migrations. H2 for development.

Orchestration

Kubernetes-native. Docker Compose for local development.

AI Agents

Semantic REST APIs. Connect any agent in seconds.

Context Graph with PageRank

Every table, column, and relationship mapped in a knowledge graph. PageRank scores surface the most important tables automatically. Connected agents query the graph directly.
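To show why PageRank surfaces hub tables, here is a toy power-iteration over a join graph. The graph, damping factor, and iteration count are illustrative; Harbour's actual graph and scoring are internal.

```python
def pagerank(edges: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a directed join graph."""
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, targets in edges.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        # Dangling nodes redistribute evenly so ranks keep summing to 1.
        dangling = damping * sum(rank[n] for n in nodes if not edges.get(n))
        for n in nodes:
            new[n] += dangling / len(nodes)
        rank = new
    return rank

joins = {"events": ["orders"], "payments": ["orders"], "shipments": ["orders"]}
scores = pagerank(joins)
print(max(scores, key=scores.get))  # orders -- the most-joined table ranks highest
```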

Credential Vending

Temporary, scoped cloud credentials issued with every table load. No long-lived keys, no ambient permissions. S3, GCS, and ADLS supported out of the box.
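A client holding vended credentials should refresh them before expiry. The credential field names below are assumptions about the response shape, shown only to illustrate the pattern.

```python
from datetime import datetime, timedelta, timezone

def needs_refresh(cred: dict, skew: timedelta = timedelta(minutes=5)) -> bool:
    """Refresh ahead of expiry so a client never holds a dead credential."""
    expires = datetime.fromisoformat(cred["expires_at"])
    return datetime.now(timezone.utc) >= expires - skew

# Illustrative shape of a vended credential -- field names are assumptions.
cred = {
    "access_key_id": "ASIA-EXAMPLE",   # short-lived, scoped to one table's prefix
    "secret_access_key": "example",
    "session_token": "example",
    "expires_at": "2099-01-01T00:00:00+00:00",
}
print(needs_refresh(cred))  # False (far-future expiry)
```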

Enterprise Security

OAuth2, API keys, RBAC with namespace-level grants, multi-tenancy, and a complete audit trail. Three security modes: disabled, permissive, enforced.

Resilience Built In

Circuit breaker, rate limiting, idempotency keys, and optimistic concurrency control with pessimistic fallback. Production-grade from day one.

Maintenance Policies

Snapshot expiry, metadata compaction, orphan file cleanup. Hierarchical resolution from table to namespace to catalog. Scheduled execution with dry-run mode.
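Hierarchical resolution can be sketched as nearest-scope-wins lookup; the policy keys below are illustrative.

```python
# Sketch of table -> namespace -> catalog policy resolution.
def resolve_policy(key: str, table: dict, namespace: dict, catalog: dict):
    """Nearest scope wins: table overrides namespace, namespace overrides catalog."""
    for scope in (table, namespace, catalog):
        if key in scope:
            return scope[key]
    return None

catalog_defaults = {"snapshot_retain_last": 20, "orphan_cleanup_days": 7}
ns_policy = {"snapshot_retain_last": 50}   # namespace override
table_policy = {}                          # nothing set at table level

print(resolve_policy("snapshot_retain_last", table_policy, ns_policy, catalog_defaults))  # 50
print(resolve_policy("orphan_cleanup_days", table_policy, ns_policy, catalog_defaults))   # 7
```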

Horizontally Scalable

Deploy multiple replicas behind a load balancer for ingest-heavy and analytical workloads. Stateless architecture with shared PostgreSQL means you scale out by adding pods. Built for Kubernetes autoscaling.