Data Availability Layer: One of the Critical Layers for Blockchain Scalability and Security
In the context of Ethereum, data availability (DA) moved to the center of the scaling roadmap with Ethereum Improvement Proposal (EIP) 4844, introduced at ETHDenver 2022, although the underlying concept predates that proposal. It is a key concept that significantly influences both the scalability and the security of the Ethereum blockchain. Data availability refers to the accessibility of the transaction data for each block, which network participants such as full node operators need in order to validate blocks. However, requiring every participant in the network to download the complete transaction data for each block does not scale: it imposes a substantial hardware and network burden on participants and limits the blockchain's ability to scale efficiently.
Data availability can be broken down into two fundamental aspects: first, the guarantee that the transaction data for each block is accessible; and second, the ability to verify blocks without downloading the entire transaction dataset. While full nodes download all of a block's transaction data and execute each transaction to check that the block is correct, doing the same is not feasible for light nodes, stateless nodes or L2 rollups. This is why the data availability solutions being developed by different projects are crucial for these participants and for the scalability of the blockchain.
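To make the second aspect concrete, here is a minimal sketch of the arithmetic behind data availability sampling (DAS), one common way light clients gain confidence without downloading a whole block. It assumes a one-dimensional erasure code that doubles the data (k chunks extended to 2k, any k of which suffice to reconstruct), so an adversary who wants to prevent reconstruction must withhold more than half of the extended chunks, and each uniformly random sample then succeeds with probability below 1/2. The function names are illustrative, and other encodings (for example two-dimensional schemes) have different thresholds.

```python
import math

def undetected_probability(samples: int, available_fraction: float = 0.5) -> float:
    """Upper bound on the chance that every random sample succeeds even though
    the block cannot be reconstructed (samples drawn independently, with replacement)."""
    return available_fraction ** samples

def min_samples(target_confidence: float, available_fraction: float = 0.5) -> int:
    """Fewest samples s such that available_fraction ** s <= 1 - target_confidence."""
    if not 0 < target_confidence < 1:
        raise ValueError("target_confidence must be in (0, 1)")
    if not 0 < available_fraction < 1:
        raise ValueError("available_fraction must be in (0, 1)")
    return math.ceil(math.log(1 - target_confidence) / math.log(available_fraction))

if __name__ == "__main__":
    for confidence in (0.99, 0.999999):
        print(confidence, min_samples(confidence))
```

Under these assumptions, `min_samples(0.99)` returns 7 and `min_samples(0.999999)` returns 20. The number of samples does not depend on the block size, which is why sampling scales better than downloading every block in full.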
In this post, I establish a framework for assessing the current landscape of data availability solutions, particularly as they apply to Layer 2 (L2) solutions. To illustrate this framework, I examine three distinct projects: Avail, Celestia and EigenDA. (Note: An affiliate of Protagonist Management LLC is an investor in Layr Labs, Inc., the company that is developing EigenDA.)
Current State of Data Availability Solutions
The landscape of data availability solutions within the modular blockchain ecosystem is evolving rapidly and is one of the most powerful narratives heading into 2024. The projects covered in this post are at various stages of development.
Avail, started by a team at Polygon, is a project aimed at developing modular blockchains that enable developers to create customizable and scalable applications. Positioning itself as a robust base layer with a sharp focus on data availability, Avail has gained significant traction on its testnet, with a mainnet planned to launch sometime in 2024.
Launched in 2023, Celestia stands out as a modular data availability network; its mainnet is currently in beta. The network has made a substantial impact and is considered one of the leading projects in the data availability space.
Originating as a proof of concept within the EigenLayer ecosystem, EigenDA has evolved into a more ambitious project. It was initially intended to be the first AVS (Actively Validated Service) on EigenLayer, but the team realized its potential as a high-performance data availability layer and a critical component of the broader modular blockchain narrative. Currently in the testnet phase, EigenDA plans to launch its mainnet by the end of Q1 2024, marking a significant milestone in its development.
Architecture Highlights
While a detailed architecture deep dive is beyond the scope of this article, here are some architectural highlights of each solution and what distinguishes them from one another.
* Erasure Coding - Erasure coding (EC) is a data protection method that breaks data into fragments, extends them with redundant encoded pieces, and distributes the pieces across different nodes. Because the original data can be reconstructed from a sufficient subset of the pieces, it survives node failures or data corruption.
** Namespaced Merkle Trees - Celestia partitions block data into namespaces, one for each application (e.g., a rollup) using the DA layer. As a result, each application needs to download only its own data and can ignore the data of other applications.
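The namespace idea can be illustrated with a toy tree in which every node records the smallest and largest namespace beneath it, so a reader can skip any subtree whose range excludes its own namespace. This is a minimal sketch under stated assumptions: fixed-length byte namespaces, leaves pre-sorted by namespace, and a made-up hashing layout that is not Celestia's actual format. It also omits the inclusion and completeness proofs a real namespaced Merkle tree provides; all names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    min_ns: bytes                               # smallest namespace in this subtree
    max_ns: bytes                               # largest namespace in this subtree
    digest: bytes
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf: Optional[Tuple[bytes, bytes]] = None  # (namespace, data) for leaf nodes

def _h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def make_leaf(ns: bytes, data: bytes) -> Node:
    # 0x00 prefix separates leaf hashes from inner-node hashes
    return Node(ns, ns, _h(b"\x00", ns, data), leaf=(ns, data))

def make_parent(left: Node, right: Node) -> Node:
    # Leaves are sorted, so the range runs from the left min to the right max
    digest = _h(b"\x01", left.min_ns, left.max_ns, left.digest,
                right.min_ns, right.max_ns, right.digest)
    return Node(left.min_ns, right.max_ns, digest, left, right)

def build_tree(leaves: List[Tuple[bytes, bytes]]) -> Node:
    if not leaves:
        raise ValueError("need at least one leaf")
    namespaces = [ns for ns, _ in leaves]
    if namespaces != sorted(namespaces):
        raise ValueError("leaves must be ordered by namespace")
    level = [make_leaf(ns, data) for ns, data in leaves]
    while len(level) > 1:
        parents = [make_parent(level[i], level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            parents.append(level[-1])  # promote an unpaired node unchanged
        level = parents
    return level[0]

def namespace_data(node: Node, ns: bytes) -> List[bytes]:
    """Collect data for one namespace, skipping subtrees whose range excludes it."""
    if ns < node.min_ns or ns > node.max_ns:
        return []
    if node.leaf is not None:
        return [node.leaf[1]] if node.leaf[0] == ns else []
    return namespace_data(node.left, ns) + namespace_data(node.right, ns)

if __name__ == "__main__":
    root = build_tree([
        (b"\x01", b"rollup-A tx1"),
        (b"\x01", b"rollup-A tx2"),
        (b"\x02", b"rollup-B tx1"),
        (b"\x03", b"rollup-C tx1"),
    ])
    print(namespace_data(root, b"\x02"))  # [b'rollup-B tx1']
```

Looking up rollup B's namespace never descends into the subtree holding only rollup A's data, which is the property that lets each application ignore everyone else's blobs.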
Evaluating Data Availability solutions
In this section I establish criteria that teams building L2 solutions should consider when evaluating the various solutions.
Cost
Understanding the cost structure is crucial for L2 networks, particularly since high data availability costs on Ethereum mainnet are a significant driver for moving to alternative solutions. The costs of a data availability solution typically consist of transaction fees, data storage fees and bandwidth fees, along with optional priority fees for faster processing. To keep costs predictable and low, data availability solutions might offer long-term reservations of storage and bandwidth.
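The fee components above can be combined into a simple cost model for comparing pay-per-blob pricing against a reservation. This is a hedged sketch: the fee structure, field names and every number in the example are hypothetical placeholders in integer fee units, not the actual pricing of Avail, Celestia or EigenDA.

```python
from dataclasses import dataclass

@dataclass
class FeeSchedule:
    # Hypothetical fee components in integer fee units; not real network prices.
    base_fee_per_blob: int          # transaction fee for each blob submission
    storage_fee_per_byte: int       # storage fee per byte over the retention period
    bandwidth_fee_per_byte: int     # fee per byte for dispersing data to DA nodes
    priority_fee_per_blob: int = 0  # optional tip for faster inclusion

def blob_cost(fees: FeeSchedule, blob_bytes: int, use_priority: bool = False) -> int:
    cost = fees.base_fee_per_blob
    cost += blob_bytes * (fees.storage_fee_per_byte + fees.bandwidth_fee_per_byte)
    if use_priority:
        cost += fees.priority_fee_per_blob
    return cost

def on_demand_monthly_cost(fees: FeeSchedule, blob_bytes: int, blobs_per_month: int,
                           use_priority: bool = False) -> int:
    return blob_cost(fees, blob_bytes, use_priority) * blobs_per_month

def reservation_is_cheaper(fees: FeeSchedule, blob_bytes: int, blobs_per_month: int,
                           reservation_price_per_month: int) -> bool:
    # Compare a flat (hypothetical) monthly reservation against paying per blob
    return reservation_price_per_month < on_demand_monthly_cost(fees, blob_bytes, blobs_per_month)

if __name__ == "__main__":
    fees = FeeSchedule(base_fee_per_blob=1_000, storage_fee_per_byte=2,
                       bandwidth_fee_per_byte=1, priority_fee_per_blob=500)
    print(blob_cost(fees, 100_000))                      # 301000
    print(blob_cost(fees, 100_000, use_priority=True))   # 301500
    print(on_demand_monthly_cost(fees, 100_000, 3_000))  # 903000000
```

Integer fee units keep the arithmetic exact; a real comparison would also account for token price volatility, which is one reason reservations make costs more predictable.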
A key strategic consideration is whether L2 networks can pay for these costs in their native tokens, potentially easing the economic strain during initial scaling and user onboarding phases. As data availability layers are still evolving, there's an opportunity for L2 projects to collaborate with these platforms to develop a cost model that benefits all parties involved, ensuring economic feasibility and sustainability.
Performance
Comparing the block size, latency, and throughput of different networks is essential to determine which data availability network best suits your needs.
Block size matters because it determines how much L2 transaction data can be posted to the DA layer in a single block. The larger the block size, the more transactions can be included in one block, which helps keep data availability from becoming a bottleneck in transaction processing. While several data availability solutions target block sizes of around 1 GB in the long run, the block sizes they support initially will be much lower.
Latency is the time from when an L2 solution requests to write a data blob to the DA layer until the availability of that blob can be verified on chain. Lower latency matters because it keeps the DA layer from becoming a bottleneck in the transaction workflow.
Throughput is the number of bytes that can be written to the DA layer within a given time period. Higher throughput is crucial for use cases that require more real-time data streaming, such as gaming or video and audio streaming. It's also important to consider whether the solution can be tuned for higher throughput or lower latency depending on the use case.
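Latency as defined above can be measured with a small harness that submits a blob and polls until its availability is reported as verified. This is a sketch under assumptions: `DAClient`, `submit_blob` and `is_verified` are a hypothetical interface (each DA layer exposes its own API), and the fake client and clock exist only so the example runs deterministically.

```python
import time
from typing import Callable, Protocol

class DAClient(Protocol):
    """Hypothetical client interface; real DA layers expose their own APIs."""
    def submit_blob(self, data: bytes) -> str: ...
    def is_verified(self, blob_id: str) -> bool: ...

def measure_latency(client: DAClient, data: bytes, poll_interval: float = 1.0,
                    timeout: float = 600.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> float:
    """Seconds from submitting a blob until its availability is reported verified.
    The result is only accurate to within one poll_interval."""
    start = clock()
    blob_id = client.submit_blob(data)
    while not client.is_verified(blob_id):
        if clock() - start > timeout:
            raise TimeoutError(f"blob {blob_id} not verified within {timeout} s")
        sleep(poll_interval)
    return clock() - start

class FakeClient:
    """Stand-in client that reports verification after a fixed number of polls."""
    def __init__(self, polls_needed: int):
        self.polls_needed = polls_needed
        self.polls = 0
    def submit_blob(self, data: bytes) -> str:
        return "blob-0"
    def is_verified(self, blob_id: str) -> bool:
        self.polls += 1
        return self.polls > self.polls_needed

class FakeClock:
    """Deterministic clock whose time only advances when sleep() is called."""
    def __init__(self) -> None:
        self.now = 0.0
    def __call__(self) -> float:
        return self.now
    def sleep(self, seconds: float) -> None:
        self.now += seconds

if __name__ == "__main__":
    clock = FakeClock()
    print(measure_latency(FakeClient(3), b"batch", poll_interval=2.0,
                          clock=clock, sleep=clock.sleep))  # 6.0
```

Injecting the clock and sleep functions keeps the harness testable; against a live network you would use the defaults and repeat the measurement to get a distribution rather than a single number.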
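Block size and throughput are linked by simple arithmetic once a block time is fixed. The sketch below assumes full blocks, a fixed block time (a parameter the text above does not specify), a fixed average size for a compressed L2 transaction, and that a rollup receives a fixed share of each block. All numbers in the example are hypothetical, not figures for any of the three networks.

```python
from typing import Tuple

def throughput_bytes_per_sec(block_size_bytes: int, block_time_sec: float) -> float:
    """Raw DA throughput if every block is full."""
    return block_size_bytes / block_time_sec

def l2_tx_capacity(block_size_bytes: int, block_time_sec: float,
                   avg_l2_tx_bytes: int, share: float = 1.0) -> Tuple[int, float]:
    """Return (L2 transactions per DA block, L2 transactions per second) for a
    rollup that gets `share` of each block's space."""
    usable = int(block_size_bytes * share)
    per_block = usable // avg_l2_tx_bytes
    return per_block, per_block / block_time_sec

if __name__ == "__main__":
    # Hypothetical: 2 MB blocks every 10 s, 100-byte compressed L2 transactions
    print(throughput_bytes_per_sec(2_000_000, 10))          # 200000.0
    print(l2_tx_capacity(2_000_000, 10, 100))               # (20000, 2000.0)
    print(l2_tx_capacity(2_000_000, 10, 100, share=0.25))   # (5000, 500.0)
```

The `share` parameter is a reminder that DA block space is shared among many rollups, so the capacity available to any one L2 is usually well below the headline figure.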
Data Availability Guarantees
Data redundancy is crucial for keeping data available when some percentage of nodes are compromised or fail. All three data availability solutions use erasure coding for data protection and redundancy. However, while AvailDA and EigenDA use validity proofs in the form of KZG polynomial commitments to ensure that the erasure-coded data is correct, CelestiaDA relies on fraud proofs for the same purpose.
In addition, EigenDA uses proof of custody: each operator must routinely compute and commit to the value of a function that can only be computed if it has stored all the blob chunks allocated to it over a designated storage period. EigenDA also has a feature called Dual Quorum, in which two separate quorums can be required to attest to the availability of data, making it more secure.
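The redundancy that erasure coding provides can be illustrated with a toy Reed-Solomon-style code: treat k data chunks as evaluations of a polynomial of degree below k, publish its evaluations at n > k points, and rebuild the original from any k of them. This is a minimal sketch assuming a small prime field and integer chunks; it covers only the redundancy part, does not implement KZG commitments or fraud proofs, and its function names are hypothetical.

```python
P = 2**31 - 1  # toy prime modulus; real DA layers use other fields and parameters

def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) passing
    through the given (xi, yi) points, using Lagrange interpolation mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

def extend(data, n):
    """Systematic encoding: chunk i of `data` is the polynomial's value at x = i;
    append its values at x = k, ..., n - 1."""
    k = len(data)
    if not 0 < k <= n:
        raise ValueError("need 0 < len(data) <= n")
    if any(not 0 <= d < P for d in data):
        raise ValueError("each chunk must be an integer in [0, P)")
    points = list(enumerate(data))
    return list(data) + [_interpolate_at(points, x) for x in range(k, n)]

def recover(shares, k):
    """Rebuild the k original chunks from any k distinct (index, value) shares."""
    unique = dict(shares)
    if len(unique) < k:
        raise ValueError("not enough distinct shares to reconstruct")
    points = list(unique.items())[:k]
    return [_interpolate_at(points, x) for x in range(k)]

if __name__ == "__main__":
    data = [7, 3, 9, 1]                 # k = 4 original chunks
    encoded = extend(data, 8)           # 2x extension: 8 chunks
    survivors = [(i, encoded[i]) for i in (1, 4, 6, 7)]  # any 4 of the 8
    print(recover(survivors, 4) == data)  # True
```

With a 2x extension, half of the chunks can be lost and the data is still recoverable. Proving that the extension was computed correctly is the job of the KZG commitments or fraud proofs mentioned above.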
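The Dual Quorum rule can be expressed as a check that every required quorum independently reaches its own attestation threshold. The text does not specify EigenDA's thresholds or how its quorums are composed, so this sketch assumes each quorum is a stake-weighted set of operators with its own threshold; the quorum names, stakes and thresholds in the example are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, List, Set

@dataclass
class Quorum:
    name: str
    stakes: Dict[str, int]   # operator -> stake counted in this quorum
    threshold: Fraction      # minimum fraction of this quorum's stake that must attest

    def attested_fraction(self, signers: Set[str]) -> Fraction:
        total = sum(self.stakes.values())
        if total <= 0:
            raise ValueError(f"quorum {self.name} has no stake")
        signed = sum(s for op, s in self.stakes.items() if op in signers)
        return Fraction(signed, total)

def blob_attested(quorums: List[Quorum], signers: Set[str]) -> bool:
    """True only if every required quorum independently reaches its threshold."""
    return all(q.attested_fraction(signers) >= q.threshold for q in quorums)

if __name__ == "__main__":
    q1 = Quorum("quorum-1", {"a": 40, "b": 30, "c": 30}, Fraction(2, 3))
    q2 = Quorum("quorum-2", {"d": 50, "e": 50}, Fraction(2, 3))
    print(blob_attested([q1, q2], {"a", "b", "d", "e"}))  # True
    print(blob_attested([q1, q2], {"a", "b", "d"}))       # False: quorum-2 at 1/2
```

The design point is that compromising a single quorum is not enough to falsely attest availability; an attacker would need to cross the threshold in every required quorum.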
Security
It's important to monitor the security of data availability solutions on an ongoing basis, not just when evaluating a solution for your L2. Beyond ensuring that the network code and any smart contract code are open source and audited by a reputable auditing firm, there are a few other considerations to keep in mind when evaluating data availability solutions. Firstly, consider what is at stake for node operators to keep them honest. AvailDA and CelestiaDA require operators to stake their native tokens (AVL and TIA respectively), while EigenDA requires operators to restake ETH. The type of asset staked does not by itself determine the quality of the security, so it is important to monitor the value of the stake to ensure that it is large enough to provide cryptoeconomic security.
Secondly, track the incentives for node operators and how many nodes participate in securing the network. As these networks mature, they will compete for market share, and operators will be attracted to networks that offer good economic incentives. Good incentives also draw more node operators into securing the network, making it more decentralized.
Finally, it is very important to understand the fault tolerance of these networks: the percentage of nodes (or, in stake-weighted networks, the share of stake) that would need to collude or fail for the network to be compromised. This number, coupled with the economic incentives, will help you gauge the security of the network.
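Fault tolerance and economic security can be combined into a rough estimate: the fewest operators whose combined stake crosses the protocol's safety threshold, and the value of the stake they hold. This sketch assumes stake-weighted voting (pass equal stakes to count nodes instead), a threshold you take from the protocol's own documentation, and a hypothetical token price; it ignores slashing rules, delegation and correlated operators.

```python
from fractions import Fraction
from typing import Dict, Tuple

def min_colluding_set(stakes: Dict[str, int], threshold: Fraction) -> Tuple[int, int]:
    """Fewest operators whose combined stake exceeds `threshold` of total stake.
    Taking the largest stakes first yields the minimum count.
    Returns (operator_count, combined_stake)."""
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    combined, count = 0, 0
    for stake in sorted(stakes.values(), reverse=True):
        combined += stake
        count += 1
        if Fraction(combined, total) > threshold:
            return count, combined
    raise ValueError("threshold cannot be exceeded (is it >= 1?)")

def cost_of_corruption(combined_stake: int, token_price: float) -> float:
    """Value of the stake held by the colluding set at a (hypothetical) token price."""
    return combined_stake * token_price

if __name__ == "__main__":
    stakes = {"op1": 30, "op2": 25, "op3": 20, "op4": 15, "op5": 10}
    for threshold in (Fraction(1, 3), Fraction(2, 3)):
        count, stake = min_colluding_set(stakes, threshold)
        print(threshold, count, stake, cost_of_corruption(stake, token_price=2.0))
```

A small operator count here signals concentration risk even when the total stake is large, which is why the number of participating nodes and the value at stake are worth tracking together.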
Conclusion
Data availability is set to play a crucial role, especially in the narrative of modular blockchains. L2 solutions being developed will increasingly depend on these data availability solutions to scale effectively. However, this isn't likely to be a 'winner-takes-all' market. Diverse solutions like AvailDA, CelestiaDA and EigenDA each have the potential to capture a share of the market. These solutions might even carve out niche markets for themselves, catering to specific sectors such as gaming or real-world assets (RWA).
An intriguing possibility that isn’t prevalent yet but could emerge is L2 solutions utilizing multiple data availability solutions simultaneously. This approach could be for redundancy or to meet varying Service Level Agreements (SLAs) for different use cases. Looking ahead, we might see data availability solutions evolve beyond just handling transaction data. They could start storing any arbitrary data blobs required by Decentralized Applications (DApps) to deliver their services. Keeping an eye on how these solutions evolve will definitely be interesting.