
lakeFS
Version control software
DevOps software
Source code management software
Small
Medium
Large
- Healthcare and life sciences
- Retail and wholesale
- Accommodation and food services
What is lakeFS
lakeFS is a version control system for data stored in object storage, providing Git-like branching, commits, and merges over data lake contents. It is used by data engineering and ML teams to create isolated environments for ETL/ELT pipelines, experimentation, and reproducible data processing. The product typically sits in front of S3-compatible storage and integrates with common data tools via APIs, CLI, and SDKs. Its core differentiator is applying version-control semantics to large datasets without moving data into a separate repository format.
Git-like data branching
lakeFS provides branches, commits, and merges for data in object storage, enabling isolated development and testing workflows for data pipelines. This supports reproducibility and easier rollback compared with ad-hoc folder conventions or timestamped copies. It aligns well with CI-style practices by allowing promotion of data changes between environments. The model is familiar to teams already using Git-based workflows.
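Here is a minimal in-memory Python sketch that makes the branching model concrete. It is not lakeFS's storage format or API, and the `Repo` and `Commit` classes and their methods are hypothetical. It assumes that a commit is an immutable map from logical paths to physical object addresses and that a branch is a named pointer to a commit. That is why a new branch can start without copying any data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical immutable snapshot: logical path -> physical object address."""
    message: str
    tree: Dict[str, str]
    parent: Optional["Commit"] = None


@dataclass
class Repo:
    # A branch is only a named pointer to a commit; no objects are copied.
    branches: Dict[str, Commit] = field(default_factory=dict)
    # Uncommitted writes per branch (a simple staging area).
    staged: Dict[str, Dict[str, str]] = field(default_factory=dict)

    @classmethod
    def init(cls, tree: Dict[str, str]) -> "Repo":
        repo = cls()
        repo.branches["main"] = Commit("initial import", dict(tree))
        repo.staged["main"] = {}
        return repo

    def create_branch(self, name: str, source: str) -> None:
        # Zero-copy: the new branch points at the source branch's head commit.
        self.branches[name] = self.branches[source]
        self.staged[name] = {}

    def write(self, branch: str, path: str, address: str) -> None:
        # Writes land in the branch's staging area and are invisible elsewhere.
        self.staged[branch][path] = address

    def read(self, branch: str, path: str) -> Optional[str]:
        # Staged writes shadow the committed tree on the same branch.
        if path in self.staged[branch]:
            return self.staged[branch][path]
        return self.branches[branch].tree.get(path)

    def commit(self, branch: str, message: str) -> Commit:
        head = self.branches[branch]
        new = Commit(message, {**head.tree, **self.staged[branch]}, parent=head)
        self.branches[branch] = new
        self.staged[branch] = {}
        return new


if __name__ == "__main__":
    repo = Repo.init({"events/part-0.parquet": "obj-001"})
    repo.create_branch("etl-test", source="main")
    repo.write("etl-test", "events/part-0.parquet", "obj-002")
    repo.commit("etl-test", "rewrite partition 0")
    print(repo.read("main", "events/part-0.parquet"))      # obj-001
    print(repo.read("etl-test", "events/part-0.parquet"))  # obj-002
```

In this model, rolling back a branch means moving its pointer to `commit.parent`. Committed objects are never modified.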
Works with object storage
lakeFS is designed to operate on top of S3 and S3-compatible object stores rather than requiring a separate proprietary storage layer. Teams can keep data where it already resides and apply versioning semantics at the access layer, so lakeFS can fit into existing lake architectures and tools that read and write objects. Because branching is zero-copy, creating an isolated environment does not require duplicating the full dataset.
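Because lakeFS can present an S3-compatible endpoint, existing readers and writers can often target a branch just by changing the object URI. The helpers below are a hypothetical sketch. They assume the gateway convention `s3://<repository>/<branch-or-ref>/<key>`, where the bucket name is the lakeFS repository and the first path segment selects the branch. Pointing the client's endpoint at the lakeFS server is a separate, deployment-specific setting and is not shown.

```python
from typing import Tuple
from urllib.parse import urlparse


def to_gateway_uri(repository: str, ref: str, key: str) -> str:
    """Build the URI for `key` on branch/ref `ref` (assumed s3://repo/ref/key layout)."""
    if not repository or not ref or "/" in ref:
        raise ValueError("a repository and a single-segment ref are required")
    return f"s3://{repository}/{ref}/{key.lstrip('/')}"


def split_gateway_uri(uri: str) -> Tuple[str, str, str]:
    """Inverse of to_gateway_uri: return (repository, ref, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected s3://<repository>/..., got {uri!r}")
    ref, _, key = parsed.path.lstrip("/").partition("/")
    if not ref or not key:
        raise ValueError(f"URI must contain a ref and a key: {uri!r}")
    return parsed.netloc, ref, key


def on_branch(uri: str, branch: str) -> str:
    """Re-point an existing gateway URI at a different branch, keeping the key."""
    repository, _, key = split_gateway_uri(uri)
    return to_gateway_uri(repository, branch, key)
```

For example, `on_branch("s3://analytics/main/events/part-0.parquet", "etl-test")` returns `"s3://analytics/etl-test/events/part-0.parquet"`. The job code stays the same and only its input and output locations change. The repository name `analytics` is illustrative.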
Automation via APIs and CLI
lakeFS exposes programmatic interfaces (API/CLI/SDKs) that support integration into orchestration and DevOps-style automation. Teams can script branch creation, commits, merges, and policy checks as part of pipeline runs. This enables repeatable workflows and governance patterns around data changes. It is suitable for organizations that treat data pipelines as software delivery processes.
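A common way to script this is a write-audit-publish step. The pipeline creates a branch, lets the job write only to it, commits, runs checks, and merges only on success. The sketch below assumes a hypothetical `DataVersionClient` interface whose four methods you would map onto the real lakeFS SDK or `lakectl` calls you use. `FakeClient` is an in-memory stand-in that only exercises the control flow, and `run_isolated` is an illustrative name, not part of any lakeFS API.

```python
from typing import Callable, Dict, Protocol


class DataVersionClient(Protocol):
    """Hypothetical interface; map each method to your real SDK/CLI call."""
    def create_branch(self, name: str, source: str) -> None: ...
    def commit(self, branch: str, message: str) -> None: ...
    def merge(self, source: str, destination: str) -> None: ...
    def delete_branch(self, name: str) -> None: ...


def run_isolated(client: DataVersionClient, run_id: str,
                 job: Callable[[str], None],
                 check: Callable[[str], bool],
                 target: str = "main") -> bool:
    """Run `job` on a throwaway branch and publish only if `check` passes."""
    branch = f"run-{run_id}"
    client.create_branch(branch, source=target)
    try:
        job(branch)                       # the pipeline writes to the branch only
        client.commit(branch, f"pipeline run {run_id}")
        if not check(branch):             # quality gate before promotion
            return False
        client.merge(branch, target)      # promote the committed changes
        return True
    finally:
        client.delete_branch(branch)      # clean up whether or not we merged


class FakeClient:
    """In-memory stand-in; real lakeFS branching is zero-copy, this is not."""
    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, str]] = {"main": {}}

    def create_branch(self, name: str, source: str) -> None:
        self.data[name] = dict(self.data[source])

    def commit(self, branch: str, message: str) -> None:
        pass  # the fake keeps no history

    def merge(self, source: str, destination: str) -> None:
        self.data[destination].update(self.data[source])

    def delete_branch(self, name: str) -> None:
        del self.data[name]
```

Deleting the branch in `finally` keeps failed runs from piling up. If a job raises, the exception still propagates, and `main` is untouched because the job never wrote to it. Teams that want to inspect failed runs might keep the branch instead and clean it up later.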
Not source code SCM
Although it uses Git-like concepts, lakeFS is not a full replacement for source code management systems used for application code. It focuses on versioning data objects and metadata rather than providing the broad ecosystem of code review, issue tracking, and developer collaboration features. Most teams still need a separate SCM for code and configuration. This can add operational overhead in coordinating code and data versions.
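One lightweight way to reduce that coordination overhead is to record the code version on each data commit. The sketch below assumes that your commit call accepts string key/value metadata. The keys `git_commit` and `pipeline` and the helper names are hypothetical conventions, not something lakeFS defines.

```python
import subprocess
from typing import Dict, Optional


def current_git_sha(repo_dir: str = ".") -> str:
    """Return HEAD of the code repository (requires git on PATH)."""
    return subprocess.run(
        ["git", "rev-parse", "HEAD"], cwd=repo_dir,
        check=True, capture_output=True, text=True,
    ).stdout.strip()


def data_commit_metadata(code_sha: str, pipeline: str,
                         extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build key/value metadata linking a data commit to the code that produced it."""
    if len(code_sha) < 7 or any(c not in "0123456789abcdef" for c in code_sha):
        raise ValueError(f"not a git SHA: {code_sha!r}")
    meta = {"git_commit": code_sha, "pipeline": pipeline}
    meta.update(extra or {})
    return meta


def produced_by(meta: Dict[str, str], code_sha: str) -> bool:
    """True if a data commit's metadata records the given code version."""
    return meta.get("git_commit") == code_sha
```

With this convention, a reviewer can check that a dataset on `main` was produced by the code revision that was actually deployed, without a separate lookup table.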
Operational complexity to run
Deploying lakeFS typically requires running and maintaining a service layer alongside object storage, including configuration, authentication, and integration with existing data tooling. Organizations may need to plan for availability, scaling, and upgrades as part of platform operations. This is more involved than relying solely on native object-store versioning or simple naming conventions. The operational burden can be significant for smaller teams without platform engineering support.
Merge semantics can be hard
Merging changes in large datasets can be more complex than merging source code, especially when multiple pipelines write overlapping partitions or objects. Conflicts may require domain-specific resolution strategies and careful pipeline design. Teams often need conventions and governance to avoid inconsistent states across branches. This can limit how freely teams can use branching in highly concurrent write scenarios.
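Here is a generic object-level three-way merge sketch that shows why overlapping writes cause trouble. It compares each path's content hash on the source and destination branches against their common ancestor. This is the standard technique in general form, and lakeFS's own merge rules and conflict strategies may differ. The function and type names are hypothetical.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, str]  # logical path -> content hash; a missing path means deleted


def three_way_merge(base: Tree, source: Tree, dest: Tree) -> Tuple[Tree, List[str]]:
    """Merge `source` into `dest` relative to their common ancestor `base`.

    A path conflicts when both sides changed it relative to `base` and ended
    up different. Conflicting paths keep the destination's version in the
    returned tree; callers should treat any conflict as a failed merge.
    """
    merged: Tree = {}
    conflicts: List[str] = []
    for path in sorted(base.keys() | source.keys() | dest.keys()):
        b, s, d = base.get(path), source.get(path), dest.get(path)
        if s == d:        # both sides agree (including both deleted)
            result = s
        elif s == b:      # only the destination changed this path
            result = d
        elif d == b:      # only the source changed this path
            result = s
        else:             # both changed it differently: conflict
            conflicts.append(path)
            result = d
        if result is not None:
            merged[path] = result
    return merged, conflicts
```

Consider a partition that both branches rewrote: `base={'p=2/a': 'h2'}`, `source={'p=2/a': 'h2s'}`, `dest={'p=2/a': 'h2d'}`. The call returns `({'p=2/a': 'h2d'}, ['p=2/a'])`. The sketch cannot pick a winner without domain knowledge, which is why partition ownership conventions matter.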
Plan & Pricing
| Plan | Price | Key features & notes |
|---|---|---|
| Open Source (Self-hosted) | Free forever | Core lakeFS features (data version control, zero-copy branching, atomic merges); run locally; community support (no SLA). |
| lakeFS Cloud (Managed) | Contact sales (free trial available) | Fully managed service; optimized configuration & auto-scaling; high availability; SOC2; enterprise support; data metadata stored in your VPC. Free trial is promoted on the Cloud page (duration not specified on site). |
| Enterprise (Proprietary) | Contact sales | Enterprise-only features (RBAC, SSO, SCIM, IAM roles, Mount capability, Audit logs, Transactional Mirroring, Metadata Search, Support SLA); requires sales engagement. |
Seller details
Treeverse, Inc.
Tel Aviv, Israel
2018
Private
https://lakefs.io/
https://x.com/lakeFS
https://www.linkedin.com/company/treeverse/