THE PROACTIVE AI-NATIVE CATALOG FOR MODERN LAKEHOUSES
Data context is scattered across wikis, dashboards, Slack threads, and tribal knowledge. Harbour is the central control plane where all data context lives—so every AI agent, every application, and every team operates from the same source of truth. Connect your data agents and they get rich, consistent context in seconds.
See how it works →
Every table creation, schema change, and query pattern automatically enriches a living context graph. No manual lineage work. No external tools. The catalog itself captures relationships, detects patterns, and serves context to every connected agent and application.
Every team is building data agents and AI-native applications. But these tools are only as smart as the context they can access. When context is fragmented across dozens of systems, agents make inconsistent, uninformed decisions. Harbour centralizes it all.
Harbour is a layered control plane that separates intelligence from storage and security from compute. Every layer is independently scalable. Context flows from the graph to every connected agent and engine through rich semantic APIs.
The recommendation engine continuously analyzes metadata patterns across your entire catalog. It detects issues, surfaces optimization opportunities, and provides actionable guidance to every connected agent and team—before anything breaks.
Detects tables with runaway snapshot counts, recommends expiry policies, and suggests metadata compaction. Catches write-heavy patterns before they become production incidents.
Identifies unpartitioned tables under heavy scan load, detects missing sort orders, spots idle tables, and recommends join optimizations based on context graph patterns.
Table has accumulated excessive snapshots, indicating missing lifecycle policies. Configure expiry to prevent metadata bloat and query-planning slowdowns.
Queries are scanning significantly more data than necessary. Add partition pruning or sort order on frequently filtered columns to reduce I/O.
Filter columns detected in repeated query patterns. Partitioning on these columns would reduce scan volume across all connected engines and agents.
Context graph detected frequent join patterns between tables. Co-locating or pre-computing these joins could significantly reduce latency for connected applications.
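A minimal sketch of what consuming recommendations like the ones above might look like. The payload shape, field names, and severity levels here are illustrative assumptions, not Harbour's actual API.

```python
# Hypothetical recommendation payload shape plus a simple triage helper.
# Field names and severity levels are illustrative, not Harbour's real API.
from dataclasses import dataclass

@dataclass
class Recommendation:
    table: str       # fully qualified table name
    kind: str        # e.g. "snapshot_expiry", "partitioning", "join_colocation"
    severity: str    # "info" | "warn" | "critical"
    message: str     # human-readable guidance

def triage(recs: list[Recommendation]) -> list[Recommendation]:
    """Return actionable recommendations first: critical, then warn."""
    order = {"critical": 0, "warn": 1, "info": 2}
    actionable = [r for r in recs if r.severity != "info"]
    return sorted(actionable, key=lambda r: order[r.severity])

recs = [
    Recommendation("sales.orders", "snapshot_expiry", "critical",
                   "1,400 snapshots accumulated; configure expiry."),
    Recommendation("sales.orders", "partitioning", "warn",
                   "Queries filter on order_date; partition to cut scan volume."),
    Recommendation("hr.codes", "idle_table", "info",
                   "No reads in 90 days."),
]
for r in triage(recs):
    print(f"[{r.severity}] {r.table}: {r.message}")
```

Agents can act on the critical items automatically and surface the rest to the owning team.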
Connect any Iceberg-compatible engine and it gets the same rich context. No custom integrations. No format converters. Just plug in and go.
Whether you're building ETL pipelines, deploying AI agents, or running ad-hoc analytics—Harbour is where all your data context lives. Connect any tool and it gets the full picture.
Give autonomous agents rich context about your entire data landscape. Semantic endpoints expose importance, relationships, PII classifications, and quality signals. Agents connect in seconds and make informed decisions instead of guessing.
Run Spark for ETL, Trino for ad-hoc queries, PyIceberg for notebooks, and Databricks for ML—all pointing to one catalog. Schema evolution and time travel work consistently across every engine.
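As a sketch of the "one catalog, many engines" setup, the snippet below points Spark, Trino, and PyIceberg at the same Iceberg REST catalog. The URI and catalog name are placeholders; the property keys follow each engine's standard Iceberg REST configuration.

```python
# One REST catalog URI shared by every engine. The endpoint is hypothetical.
CATALOG_URI = "https://harbour.internal:8181/api/catalog"

# Spark session config (spark.sql.catalog.* properties)
spark_conf = {
    "spark.sql.catalog.harbour": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.harbour.type": "rest",
    "spark.sql.catalog.harbour.uri": CATALOG_URI,
}

# Trino catalog properties (etc/catalog/harbour.properties)
trino_props = {
    "connector.name": "iceberg",
    "iceberg.catalog.type": "rest",
    "iceberg.rest-catalog.uri": CATALOG_URI,
}

# PyIceberg (~/.pyiceberg.yaml or load_catalog(**kwargs))
pyiceberg_conf = {"type": "rest", "uri": CATALOG_URI}

# Every engine resolves tables through the same control plane.
assert {spark_conf["spark.sql.catalog.harbour.uri"],
        trino_props["iceberg.rest-catalog.uri"],
        pyiceberg_conf["uri"]} == {CATALOG_URI}
```

Because all three resolve metadata through the same endpoint, a schema change committed from Spark is immediately visible to Trino and PyIceberg.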
Stop scattering context across tools. Lineage, quality, PII, documentation—it all lives in Harbour. Every team, every tool, every agent reads from the same source of truth. Context stays consistent.
Automatic PII detection, domain classification, RBAC with namespace-level grants, and a complete audit trail. Meet GDPR, CCPA, and SOC 2 requirements without bolting on external tools.
Deploy on Kubernetes with health probes, Prometheus metrics, and structured logging. Maintenance policies automate snapshot expiry and compaction. Self-service for data teams, full control for platform teams.
Credential vending with scoped temporary credentials for every table operation. Optimistic concurrency control. Sub-millisecond metadata cache for high-throughput streaming workloads.
Harbour runs on Kubernetes, connects to any Iceberg engine, and ships with observability built in. Your AI agents connect through semantic REST endpoints. No agents to install, no sidecars to manage, no vendor lock-in.
Plugs into your existing infrastructure. No migration required.
View the quickstart guide →
S3, GCS, ADLS, MinIO. Credential vending for scoped access.
Spark, Trino, PyIceberg, Databricks. Any Iceberg engine.
PostgreSQL with Flyway migrations. H2 for development.
Kubernetes-native. Docker Compose for local development.
Semantic REST APIs. Connect any agent in seconds.
Every table, column, and relationship mapped in a knowledge graph. PageRank scores surface the most important tables automatically. Connected agents query the graph directly.
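To make the importance scoring concrete, here is a toy PageRank over a graph whose edges mean "table A feeds table B". The graph and table names are made up for illustration; Harbour's actual scoring may differ.

```python
# Toy PageRank over a table-dependency graph. Edges and names are invented.
def pagerank(edges, damping=0.85, iters=50):
    nodes = {n for e in edges for n in e}
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or list(nodes)  # dangling nodes spread evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

edges = [("raw.events", "core.sessions"), ("raw.events", "core.users"),
         ("core.sessions", "marts.revenue"), ("core.users", "marts.revenue")]
scores = pagerank(edges)
top = max(scores, key=scores.get)  # the most-depended-on table surfaces first
```

In this tiny graph the downstream mart accumulates the most rank, which matches the intuition that the tables many pipelines depend on are the ones worth protecting.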
Temporary, scoped cloud credentials issued with every table load. No long-lived keys, no ambient permissions. S3, GCS, and ADLS supported out of the box.
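The client side of credential vending can be sketched as follows: the catalog hands back short-lived storage credentials with each table load, and the caller refreshes before they lapse. The field names and the refresh-skew policy here are assumptions for illustration.

```python
# Illustrative holder for vended storage credentials; field names are assumed.
import time

class VendedCredentials:
    def __init__(self, access_key, secret_key, session_token, expires_at):
        self.access_key = access_key
        self.secret_key = secret_key
        self.session_token = session_token
        self.expires_at = expires_at  # unix seconds

    def needs_refresh(self, now=None, skew=60):
        """Refresh a minute early so in-flight requests never expire mid-use."""
        now = time.time() if now is None else now
        return now >= self.expires_at - skew

creds = VendedCredentials("AKIA-EXAMPLE", "secret", "token",
                          expires_at=1_700_000_000)
creds.needs_refresh(now=1_699_999_000)   # False: over a minute of validity left
creds.needs_refresh(now=1_699_999_950)   # True: inside the refresh window
```

Because every credential is scoped to one table operation and expires quickly, a leaked key is worth very little.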
OAuth2, API keys, RBAC with namespace-level grants, multi-tenancy, and a complete audit trail. Three security modes: disabled, permissive, enforced.
Circuit breaker, rate limiting, idempotency keys, and optimistic concurrency control with pessimistic fallback. Production-grade from day one.
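Optimistic concurrency control as described above can be sketched like this: a commit carries the metadata version it was based on, and on conflict the client re-reads and retries. The in-memory "catalog" is a stand-in, not Harbour's implementation.

```python
# Stand-in catalog demonstrating optimistic commits with retry-on-conflict.
class ConflictError(Exception):
    pass

class TinyCatalog:
    def __init__(self):
        self.version = 0
        self.props = {}

    def commit(self, base_version, updates):
        if base_version != self.version:   # another writer committed first
            raise ConflictError(self.version)
        self.props.update(updates)
        self.version += 1
        return self.version

def commit_with_retry(catalog, updates, attempts=3):
    for _ in range(attempts):
        base = catalog.version             # read the current state
        try:
            return catalog.commit(base, updates)
        except ConflictError:
            continue                       # re-read and try again
    raise RuntimeError("gave up after repeated conflicts")

cat = TinyCatalog()
cat.commit(0, {"owner": "etl"})            # a concurrent writer lands first
commit_with_retry(cat, {"ttl": "30d"})     # our commit retries cleanly
```

Paired with idempotency keys, a retried commit is applied exactly once even if the first attempt's response was lost.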
Snapshot expiry, metadata compaction, orphan file cleanup. Hierarchical resolution from table to namespace to catalog. Scheduled execution with dry-run mode.
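The hierarchical resolution mentioned above can be sketched as a simple fallback walk: a table-level setting wins, then its namespace's, then the catalog default. Policy keys and values here are illustrative assumptions.

```python
# Illustrative table -> namespace -> catalog policy resolution.
def resolve_policy(key, table_policy, namespace_policy, catalog_policy):
    """Return the first value set, walking from most to least specific scope."""
    for scope in (table_policy, namespace_policy, catalog_policy):
        if key in scope:
            return scope[key]
    return None

catalog_defaults = {"snapshot_retention_days": 7, "compaction": "weekly"}
sales_namespace = {"snapshot_retention_days": 30}
orders_table = {"compaction": "daily"}

resolve_policy("snapshot_retention_days",
               orders_table, sales_namespace, catalog_defaults)  # 30
resolve_policy("compaction",
               orders_table, sales_namespace, catalog_defaults)  # "daily"
```

Teams override only what they need at the table or namespace level; everything else inherits the catalog default, and dry-run mode shows the resolved policy before anything is deleted.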
Deploy multiple replicas behind a load balancer for ingest-heavy and analytical workloads. Stateless architecture with shared PostgreSQL means you scale out by adding pods. Built for Kubernetes autoscaling.