lakeFS

Pricing from: Contact the product provider
Free Trial
Free version
User corporate size: Small, Medium, Large
User industry
  1. Healthcare and life sciences
  2. Retail and wholesale
  3. Accommodation and food services

What is lakeFS

lakeFS is a version control system for data stored in object storage, providing Git-like branching, commits, and merges over data lake contents. It is used by data engineering and ML teams to create isolated environments for ETL/ELT pipelines, experimentation, and reproducible data processing. The product typically sits in front of S3-compatible storage and integrates with common data tools via APIs, CLI, and SDKs. Its core differentiator is applying version-control semantics to large datasets without moving data into a separate repository format.

Pros

Git-like data branching

lakeFS provides branches, commits, and merges for data in object storage, enabling isolated development and testing workflows for data pipelines. This supports reproducibility and easier rollback compared with ad-hoc folder conventions or timestamped copies. It aligns well with CI-style practices by allowing promotion of data changes between environments. The model is familiar to teams already using Git-based workflows.
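
The branching and rollback behavior described above can be illustrated with a toy in-memory model. This is a minimal sketch and not the lakeFS implementation: `ToyRepo`, `Commit`, the path `sales/2024.parquet`, and the `obj-...` addresses are all hypothetical names chosen for illustration. The point it shows is that a branch is only a pointer to a commit, so creating one copies no data objects.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """Immutable snapshot mapping logical paths to object-store addresses."""
    id: int
    parent: Optional[int]
    message: str
    tree: Dict[str, str]


@dataclass
class ToyRepo:
    """Hypothetical model for illustration only; not how lakeFS is built."""
    commits: Dict[int, Commit] = field(default_factory=dict)
    branches: Dict[str, int] = field(default_factory=dict)            # branch -> commit id
    staged: Dict[str, Dict[str, str]] = field(default_factory=dict)   # uncommitted writes

    def __post_init__(self) -> None:
        self.commits[0] = Commit(0, None, "initial", {})
        self.branches["main"] = 0
        self.staged["main"] = {}

    def create_branch(self, name: str, source: str) -> None:
        # Only the commit pointer is copied; no data objects are duplicated.
        self.branches[name] = self.branches[source]
        self.staged[name] = {}

    def write(self, branch: str, path: str, address: str) -> None:
        self.staged[branch][path] = address

    def commit(self, branch: str, message: str) -> int:
        parent = self.commits[self.branches[branch]]
        tree = {**parent.tree, **self.staged[branch]}
        new_id = len(self.commits)
        self.commits[new_id] = Commit(new_id, parent.id, message, tree)
        self.branches[branch] = new_id
        self.staged[branch] = {}
        return new_id

    def rollback(self, branch: str) -> None:
        # Move the branch pointer back to the parent of its current commit.
        parent = self.commits[self.branches[branch]].parent
        if parent is None:
            raise ValueError("branch is already at the initial commit")
        self.branches[branch] = parent

    def read(self, branch: str, path: str) -> Optional[str]:
        if path in self.staged[branch]:
            return self.staged[branch][path]
        return self.commits[self.branches[branch]].tree.get(path)


repo = ToyRepo()
repo.write("main", "sales/2024.parquet", "obj-001")
repo.commit("main", "load sales")

repo.create_branch("etl-test", "main")                   # isolated environment
repo.write("etl-test", "sales/2024.parquet", "obj-002")
repo.commit("etl-test", "reprocess sales")
```

After these steps `main` still resolves the path to `obj-001` while `etl-test` resolves it to `obj-002`, and calling `repo.rollback("etl-test")` moves the test branch back to the earlier snapshot without touching any stored object.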

Works with object storage

lakeFS is designed to operate on top of S3 and S3-compatible object stores, rather than requiring a separate proprietary storage layer. This lets teams keep data where it already resides and apply versioning semantics at the access layer. It can fit into existing lake architectures and tooling that read/write objects. The approach can reduce the need for full dataset duplication when creating isolated environments.
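
One way to picture "versioning at the access layer" is through object addressing. The sketch below assumes the addressing convention used by the lakeFS S3-compatible endpoint, where the repository name takes the place of the bucket and the branch (or other ref) is the first path component; check the documentation for your deployment before relying on it. The `DataRef` type, the helper names, and the repository `example-repo` are hypothetical.

```python
from typing import NamedTuple


class DataRef(NamedTuple):
    """A versioned object location: repository, ref (branch/commit/tag), path."""
    repo: str
    ref: str
    path: str


def gateway_uri(r: DataRef) -> str:
    # Assumed convention for S3-compatible access: s3://<repo>/<ref>/<path>
    return f"s3://{r.repo}/{r.ref}/{r.path}"


def lakefs_uri(r: DataRef) -> str:
    # URI form used by lakeFS tooling: lakefs://<repo>/<ref>/<path>
    return f"lakefs://{r.repo}/{r.ref}/{r.path}"


def parse_gateway_uri(uri: str) -> DataRef:
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri}")
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected s3://<repo>/<ref>/<path>, got: {uri}")
    return DataRef(*parts)


prod = DataRef("example-repo", "main", "events/date=2024-01-01/part-0.parquet")
test = prod._replace(ref="etl-test")   # same logical path, different branch

print(gateway_uri(prod))
print(gateway_uri(test))
```

Under this convention an existing job that reads `s3://...` paths only needs a different endpoint and a branch name in the path to switch between production and an isolated branch.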

Automation via APIs and CLI

lakeFS exposes programmatic interfaces (API/CLI/SDKs) that support integration into orchestration and DevOps-style automation. Teams can script branch creation, commits, merges, and policy checks as part of pipeline runs. This enables repeatable workflows and governance patterns around data changes. It is suitable for organizations that treat data pipelines as software delivery processes.
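
As a rough illustration of scripting these steps, the sketch below builds the `lakectl` command lines for one pipeline run: create a branch, commit the results, and merge back. It only constructs and prints the commands rather than executing them. The repository name, the `run-<id>` branch naming scheme, and the `pipeline_commands` helper are assumptions for illustration, and the flags should be verified against the `lakectl` version you have installed.

```python
import shlex
from typing import List


def pipeline_commands(repo: str, run_id: str, source: str = "main") -> List[List[str]]:
    """Return lakectl argument lists for one isolated pipeline run (hypothetical helper)."""
    branch = f"run-{run_id}"                      # assumed naming convention
    src = f"lakefs://{repo}/{source}"
    work = f"lakefs://{repo}/{branch}"
    return [
        # 1. Create an isolated branch from the source branch.
        ["lakectl", "branch", "create", work, "--source", src],
        # (the pipeline would write its output to the new branch here)
        # 2. Commit what the pipeline wrote.
        ["lakectl", "commit", work, "-m", f"pipeline run {run_id}"],
        # 3. Merge the run branch back into the source branch.
        ["lakectl", "merge", work, src],
    ]


cmds = pipeline_commands("example-repo", "42")
for argv in cmds:
    print(shlex.join(argv))   # dry run: print instead of executing
```

Keeping the commands as argument lists means an orchestrator could pass each one to `subprocess.run(argv, check=True)` and stop the run at the first failing step, so an unsuccessful pipeline never reaches the merge.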

Cons

Not a source code SCM

Although it uses Git-like concepts, lakeFS is not a full replacement for source code management systems used for application code. It focuses on versioning data objects and metadata rather than providing the broad ecosystem of code review, issue tracking, and developer collaboration features. Most teams still need a separate SCM for code and configuration. This can add operational overhead in coordinating code and data versions.

Operational complexity to run

Deploying lakeFS typically requires running and maintaining a service layer alongside object storage, including configuration, authentication, and integration with existing data tooling. Organizations may need to plan for availability, scaling, and upgrades as part of platform operations. This is more involved than relying solely on native object-store versioning or simple naming conventions. The operational burden can be significant for smaller teams without platform engineering support.

Merge semantics can be hard

Merging changes in large datasets can be more complex than merging source code, especially when multiple pipelines write overlapping partitions or objects. Conflicts may require domain-specific resolution strategies and careful pipeline design. Teams often need conventions and governance to avoid inconsistent states across branches. This can limit how freely teams can use branching in highly concurrent write scenarios.
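
The overlapping-write problem can be made concrete with a generic three-way merge over path-to-object maps. This is a sketch of the general technique, not a description of the merge strategy lakeFS actually uses; the function name, partition paths, and version labels are hypothetical.

```python
from typing import Dict, Set, Tuple

Tree = Dict[str, str]   # logical path -> object version/address


def three_way_merge(base: Tree, ours: Tree, theirs: Tree) -> Tuple[Tree, Set[str]]:
    """Merge two trees against their common ancestor.

    Returns the merged tree (conflicting paths are left out) and the set of
    paths that both sides changed in different ways.
    """
    merged: Tree = {}
    conflicts: Set[str] = set()
    for path in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            chosen = o            # both sides agree (including both deleted)
        elif o == b:
            chosen = t            # only "theirs" changed this path
        elif t == b:
            chosen = o            # only "ours" changed this path
        else:
            conflicts.add(path)   # both changed it, differently
            continue
        if chosen is not None:
            merged[path] = chosen
    return merged, conflicts


base = {"p=1/data": "v1", "p=2/data": "v1"}
ours = {"p=1/data": "v2", "p=2/data": "v1"}                      # pipeline A rewrote p=1
theirs = {"p=1/data": "v3", "p=2/data": "v1", "p=3/data": "v1"}  # pipeline B rewrote p=1, added p=3

merged, conflicts = three_way_merge(base, ours, theirs)
```

Here the new partition `p=3` merges cleanly, but `p=1` is reported as a conflict because both pipelines rewrote it. A common convention for avoiding this is to give each pipeline exclusive ownership of the partitions it writes.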

Plan & Pricing

| Plan | Price | Key features & notes |
| --- | --- | --- |
| Open Source (Self-hosted) | Free forever | Core lakeFS features (data version control, zero-copy branching, atomic merges); run locally; community support (no SLA). |
| lakeFS Cloud (Managed) | Contact sales (free trial available) | Fully managed service; optimized configuration & auto-scaling; high availability; SOC2; enterprise support; data metadata stored in your VPC. Free trial is promoted on the Cloud page (duration not specified on site). |
| Enterprise (Proprietary) | Contact sales | Enterprise-only features (RBAC, SSO, SCIM, IAM roles, Mount capability, Audit logs, Transactional Mirroring, Metadata Search, Support SLA); requires sales engagement. |

Seller details

Treeverse, Inc.
Tel Aviv, Israel
2018
Private
https://lakefs.io/
https://x.com/lakeFS
https://www.linkedin.com/company/treeverse/

Tools by Treeverse, Inc.

lakeFS

Best lakeFS alternatives

Git
P4
Mercurial