Blockchain is all about empowering people to take more control over their experiences and their finances without the need for centralized services, but it can only do this if users are able to access the data within it.
When Satoshi Nakamoto designed the world’s first blockchain, accessing its data was relatively easy, but it’s becoming much more difficult in a multichain world made up of hundreds of decentralized networks and no specialized tools for data retrieval. That’s because blockchains, although open and public by design, don’t make it easy for users to access the data they store.
The challenge of accessing on-chain data stems from Nakamoto’s lack of foresight. He probably wasn’t imagining a world populated by hundreds of different blockchains, with Layer-2 networks built on top of them, when he focused on aspects such as Bitcoin’s consensus mechanism and execution. As a result, blockchains lack a reliable, built-in way for users to read all of the data stored on them.
If a decentralized application is trying to check the history of transactions on Solana, for instance, it will need to wade through more than 300 terabytes of data, a figure that keeps growing with every new transaction and block added to the chain.
The most common way to access on-chain data is through an RPC node. RPC stands for “remote procedure call”, and it is the interface that applications use to request data from a node and, through it, from the blockchain. But using RPC calls to search the blockchain is an arduous process, because these decentralized ledgers process and store their data sequentially, one block after another.
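To make that concrete, here is a minimal Python sketch of what RPC requests look like on an EVM chain, assuming a node that speaks the standard Ethereum JSON-RPC API. It only builds request bodies for the eth_getBlockByNumber method; sending them to a node endpoint (which you would supply yourself) is left out. The point it illustrates is that reading a range of blocks means issuing one request per block.

```python
import json


def make_rpc_request(method: str, params: list, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request body, as sent to a node's HTTP endpoint."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    )


def get_block_request(block_number: int, full_transactions: bool = True,
                      request_id: int = 1) -> str:
    """Build an eth_getBlockByNumber request.

    Ethereum JSON-RPC encodes block numbers as hex strings ("quantities"),
    and the boolean asks the node to return full transaction objects.
    """
    return make_rpc_request(
        "eth_getBlockByNumber", [hex(block_number), full_transactions], request_id
    )


# Looking through every transaction in a range means fetching each block in
# turn: one request (and one network round trip) per block.
start, end = 19_000_000, 19_000_004
batch = [get_block_request(n, request_id=n) for n in range(start, end + 1)]
print(len(batch), "requests for", end - start + 1, "blocks")
print(batch[0])
```

The block range is arbitrary; the cost of a scan grows linearly with the number of blocks requested.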
As more blocks are added, the blockchain grows longer and longer, with critical data scattered throughout it and ordered only by when it was created. In practice, this makes blockchains incredibly disorganized as data stores. While it’s still fairly easy to look up data within a specific block or information related to a specific account, it becomes much harder to run complex searches that span many blocks.
Because the data is spread all over the place, it takes a long time to retrieve, which causes problems for decentralized applications whose smart contract logic depends on that data to function as designed.
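A toy model shows why this is slow. The Python sketch below assumes a simplified chain in which each block is just a list of transactions with a sender and a recipient (real block formats are far richer). Finding every transaction that touches one address means reading every block, so the work grows with the length of the chain rather than with the number of matches.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    tx_id: str
    sender: str
    recipient: str


@dataclass
class Block:
    number: int
    txs: list[Tx] = field(default_factory=list)


def scan_for_address(chain: list[Block], address: str) -> tuple[list[tuple[int, str]], int]:
    """Naive lookup: walk the chain block by block, keeping transactions that
    involve `address`. Returns (matches, blocks_read), where matches is a list
    of (block_number, tx_id) pairs."""
    matches = []
    blocks_read = 0
    for block in chain:            # blocks are stored in order, one after another
        blocks_read += 1           # against a real node, each read is an RPC round trip
        for tx in block.txs:
            if address in (tx.sender, tx.recipient):
                matches.append((block.number, tx.tx_id))
    return matches, blocks_read


# Toy chain of 1,000 blocks in which only two transactions involve "alice".
chain = [Block(n) for n in range(1_000)]
chain[3].txs.append(Tx("0xaa", "alice", "bob"))
chain[998].txs.append(Tx("0xbb", "carol", "alice"))

matches, blocks_read = scan_for_address(chain, "alice")
print(matches)       # [(3, '0xaa'), (998, '0xbb')]
print(blocks_read)   # 1000
```

Two matches still cost 1,000 block reads, and an address with no activity at all costs exactly the same.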
EVM-based chains share a common data schema that makes them easier to search, but that doesn’t extend to SVM chains and other popular networks. As a result, anyone trying to work with data at multichain scale is in for some real headaches.
Modern dApps need access to blockchain data because it powers their most advanced features, such as analytics tools and smart contract logic.
DeFi applications, for one, can be vastly improved by accessing multichain data, as this allows them to tap into much deeper liquidity and give users real-time updates about what’s happening across numerous blockchains. NFT marketplaces can also benefit, as access to blockchain data lets them give users more insight into collections on different chains, their prices and what people are doing with them.
Another example is SocialFi protocols such as Farcaster and Lens, which need to store much more than just transaction details, including information about who is following whom and what those users are posting and saying to each other. Fetching all of this data involves some serious searching.
What’s more, easy-to-access blockchain data could pave the way for even more advanced use cases, such as decentralized AI. For instance, large language models could use on-chain data such as social graphs to curate content, identify trends and generate reputation scores for participants based on their previous blockchain interactions.
Blockchain is expected to become the foundation of a new generation of more sophisticated dApps, but for that vision to become a reality, developers need an easy way to access its data.
The keys to the blockchain data kingdom can be found in data indexers, which are protocols that index the entire contents of a blockchain network by scanning every block posted to it. They store this information in a format that’s more consistent and simpler to query, something more akin to an SQL database, for instance. Some of the main examples of these data indexers include The Graph and SQD.
Decentralized indexers such as these act as middlemen for searching blockchains, making the information within them much more accessible to developers.
The advantage of data indexers is that blockchain data can be stored in a more logical, searchable way. A smart contract, for instance, can be stored alongside all of the transaction IDs and block numbers associated with it, making that information easier to retrieve.
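As a rough illustration of that idea, the Python sketch below uses the standard library's sqlite3 module to store hypothetical (block number, transaction ID, contract address) records in an SQL table with an index on the contract column. Real indexers use their own storage engines and schemas; the table, column and contract names here are made up for the example.

```python
import sqlite3

# Hypothetical records an indexer might extract while scanning blocks:
# (block_number, tx_id, contract_address)
raw_events = [
    (100, "0xa1", "0xContractA"),
    (101, "0xb2", "0xContractB"),
    (250, "0xc3", "0xContractA"),
    (990, "0xd4", "0xContractA"),
]


def build_index(rows):
    """Store each transaction under its contract address in an in-memory SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contract_txs (contract TEXT, block_number INTEGER, tx_id TEXT)"
    )
    # An index on (contract, block_number) lets SQLite answer per-contract
    # lookups by seeking into the index instead of scanning every row.
    conn.execute("CREATE INDEX idx_contract ON contract_txs (contract, block_number)")
    conn.executemany(
        "INSERT INTO contract_txs (block_number, tx_id, contract) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def txs_for_contract(conn, contract):
    """Return every (block_number, tx_id) pair recorded for `contract`, oldest first."""
    cur = conn.execute(
        "SELECT block_number, tx_id FROM contract_txs "
        "WHERE contract = ? ORDER BY block_number",
        (contract,),
    )
    return cur.fetchall()


conn = build_index(raw_events)
print(txs_for_contract(conn, "0xContractA"))
# [(100, '0xa1'), (250, '0xc3'), (990, '0xd4')]
```

The scan over the chain happens once, when the index is built; after that, each lookup touches only the rows for the contract being asked about.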
Data indexers are built with high-performance code designed for rapid queries. They consist of a database that stores the data in a more logically organized way, plus APIs for accessing that database. They also rely on an archive node, which holds the chain’s full history, and keep fetching new transactions from it as blocks are added to the chain, so they always have the most up-to-date information.
With a data indexer, developers don’t have to worry about what’s happening across multiple base layers and the L2 networks that sit above them, as all of this activity is stored in a logical way that their dApps can query quickly.
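The update loop can be sketched in Python as follows. FakeArchiveNode is a hypothetical in-memory stand-in for an archive node (a real one would be queried over RPC), and the indexer simply remembers the next block it has not yet processed, so each sync fetches only the blocks added since the previous one. The sketch ignores chain reorganizations, which a production indexer would have to handle.

```python
from dataclasses import dataclass, field


class FakeArchiveNode:
    """Hypothetical stand-in for an archive node that keeps every block since genesis."""

    def __init__(self):
        self.blocks: list[list[str]] = []   # blocks[n] = tx ids included in block n

    def append_block(self, tx_ids):
        self.blocks.append(list(tx_ids))

    def latest_block_number(self) -> int:
        return len(self.blocks) - 1         # -1 while the chain is empty

    def get_block(self, number: int) -> list[str]:
        return self.blocks[number]


@dataclass
class Indexer:
    next_block: int = 0                                    # first block not yet indexed
    tx_to_block: dict[str, int] = field(default_factory=dict)

    def sync(self, node: FakeArchiveNode) -> int:
        """Fetch and index every block added since the last sync; return how many were processed."""
        head = node.latest_block_number()
        processed = 0
        while self.next_block <= head:
            for tx_id in node.get_block(self.next_block):
                self.tx_to_block[tx_id] = self.next_block
            self.next_block += 1
            processed += 1
        return processed


node = FakeArchiveNode()
node.append_block(["0x01"])
node.append_block(["0x02", "0x03"])
indexer = Indexer()
print(indexer.sync(node))           # 2: both existing blocks
node.append_block(["0x04"])
print(indexer.sync(node))           # 1: only the new block is fetched
print(indexer.tx_to_block["0x04"])  # 2
```

In a running service, sync would be called on a timer or whenever the node reports a new block.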
One of the first decentralized data indexers to emerge was The Graph, which lets any dApp tap into its open marketplace for data, fueled by the GRT token. The Graph is built on a complex, decentralized network whose participants include users, who need to access and consume blockchain data.
Other participants include the indexers, who index blockchain data and serve queries on users’ behalf, and the curators, who signal which subgraphs are the most reliable and accurate. Subgraphs are the individual schemas that determine how blockchain information is indexed, structured, made queryable and retrieved.
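Subgraphs are queried with GraphQL. The Python sketch below assumes a hypothetical subgraph whose schema defines a Transfer entity (queried as transfers) with id, from, to, value and blockNumber fields, where addresses are stored as Bytes. The first, orderBy, orderDirection and where arguments follow The Graph's query conventions, but the entity and field names depend entirely on the subgraph being queried, and the endpoint URL is left to the caller.

```python
import json
import urllib.request

# Hypothetical entity and field names: real ones come from each subgraph's schema.
QUERY = """
query RecentTransfers($account: Bytes!, $limit: Int!) {
  transfers(first: $limit, orderBy: blockNumber, orderDirection: desc,
            where: { to: $account }) {
    id
    from
    to
    value
    blockNumber
  }
}
"""


def build_query_body(account: str, limit: int = 5) -> bytes:
    """Package the GraphQL query and its variables as a JSON POST body."""
    payload = {"query": QUERY, "variables": {"account": account, "limit": limit}}
    return json.dumps(payload).encode("utf-8")


def run_query(endpoint: str, account: str, limit: int = 5) -> dict:
    """POST the query to a subgraph endpoint supplied by the caller and decode the JSON reply."""
    req = urllib.request.Request(
        endpoint,
        data=build_query_body(account, limit),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Inspect the request body without touching the network.
print(build_query_body("0x0000000000000000000000000000000000000000", limit=3).decode())
```

Because the subgraph has already indexed and structured the data, the dApp asks for exactly the records it wants instead of scanning blocks itself.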
With The Graph, the work of indexers and curators is incentivized to encourage accurate results at high speed, and honesty is enforced through a dispute system that allows anyone to challenge an indexer and demand proof of its work. If an indexer is shown to be dishonest, its staked GRT is slashed and a share of it goes to the challenger. This gives indexers a strong incentive to act honestly.
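The arithmetic of a successful dispute can be sketched in a few lines of Python. The slash fraction and challenger share used here are hypothetical placeholders, not The Graph's actual protocol parameters, and in this sketch any part of the slashed stake that doesn't go to the challenger is treated as burned; setting the challenger share to 1 sends the whole slashed amount to the challenger.

```python
from decimal import Decimal


def settle_dispute(indexer_stake: Decimal, slash_fraction: Decimal,
                   challenger_share: Decimal) -> tuple[Decimal, Decimal, Decimal]:
    """Return (remaining_stake, challenger_reward, burned) after a successful dispute.

    slash_fraction and challenger_share are hypothetical parameters in [0, 1].
    Decimal is used so token amounts are not subject to binary float rounding.
    """
    if not (0 <= slash_fraction <= 1 and 0 <= challenger_share <= 1):
        raise ValueError("fractions must be between 0 and 1")
    slashed = indexer_stake * slash_fraction
    reward = slashed * challenger_share
    burned = slashed - reward
    return indexer_stake - slashed, reward, burned


remaining, reward, burned = settle_dispute(Decimal("100000"), Decimal("0.025"), Decimal("0.5"))
print(remaining, reward, burned)
```

With these made-up numbers, an indexer staking 100,000 GRT loses 2,500, half of which rewards the challenger.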
Building on the success of The Graph is SQD, one of the most advanced data indexers available. SQD launched last year and aggregates on-chain data into Parquet files, which are then distributed across nodes in a decentralized data lake. It allows anyone to build their own indexer and run it on the SQD network.
With SQD, queries are sent to a worker node that hosts the desired range of data. Schedulers assign worker nodes to specific segments of the blockchain data, and this assignment acts as a detailed map that dApps use to find the data they need. Typically, several worker nodes store data for the same range, so an algorithm is used to distribute query volume fairly among them.
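The routing step can be sketched in Python as follows. The assignment table stands in for a scheduler's output and is entirely hypothetical, as are the worker names, and simple round-robin rotation among replicas is used as a stand-in for whatever load-balancing algorithm SQD actually uses.

```python
import itertools
from collections import defaultdict

# Hypothetical scheduler output: inclusive block ranges -> workers hosting that range.
ASSIGNMENT = {
    (0, 999_999): ["worker-a", "worker-b"],
    (1_000_000, 1_999_999): ["worker-c", "worker-d", "worker-e"],
}


class Router:
    """Send each query to a worker whose assigned range covers the requested block,
    rotating round-robin among the replicas of that range."""

    def __init__(self, assignment):
        self.cycles = {rng: itertools.cycle(workers) for rng, workers in assignment.items()}
        self.load = defaultdict(int)    # number of queries routed to each worker

    def route(self, block_number: int) -> str:
        for (start, end), replicas in self.cycles.items():
            if start <= block_number <= end:
                worker = next(replicas)
                self.load[worker] += 1
                return worker
        raise LookupError(f"no worker hosts block {block_number}")


router = Router(ASSIGNMENT)
for _ in range(6):
    print(router.route(1_500_000))  # worker-c, worker-d, worker-e, then repeats
print(dict(router.load))            # {'worker-c': 2, 'worker-d': 2, 'worker-e': 2}
```

A linear scan over ranges is fine for a handful of segments; with thousands of them, sorting the range starts and searching them with the bisect module would keep lookups logarithmic.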
By aggregating data from numerous networks into one enormous decentralized data lake, SQD makes multichain data much easier to source. What’s more, SQD’s network scales with each new node that’s added, so its capacity can keep up with the bandwidth and data throughput that its rapid growth requires.
As with The Graph, SQD’s network is incentivized so that participants are rewarded for their efforts. This helps its capacity grow in step with demand, keeping query costs to a minimum and avoiding bottlenecks.
By combining decentralized storage, an efficient API and a framework for rapid blockchain data retrieval, SQD has shown that it can manage the growth and sprawl of the multichain world.
With the SQD SDK, developers can extract even more value from SQD’s network, taking advantage of various storage solutions and the ability to index and store data in real time. SQD claims it can index real-time on-chain data up to 1,000 times faster than traditional subgraphs, which would make it one of the most vital cogs in the nascent world of Web3.
This post was last modified on Oct 07, 2024, 12:36 BST