Database structure

When a user wants to add data to the database, he creates a new storage unit and broadcasts it to his peers. The storage unit includes (among other things):

• The data to be stored. A unit may include more than one data package called a message. There are many different types of messages, each with its own structure. One of the message types is payment, which is used to send bytes or other assets to peers.

• Signature(s) of one or more users who created the unit. Users are identified by their addresses. Individual users may (and are encouraged to) have multiple addresses, like in Bitcoin. In the simplest case, the address is derived from a public key, again similar to Bitcoin.

• References to one or more previous units (parents) identified by their hashes.

References to parents is what establishes the order (only partial order so far) of units and generalizes the blockchain structure. Since we are not confined to one-parent– one-child relationships between consecutive blocks, we do not have to strive for near-synchrony and can safely tolerate large latencies and high throughputs: we’ll just have more parents per unit and more children per unit. If we go forward in history along parent-child links, we’ll observe many forks when the same unit is referenced by multiple later units, and many merges when the same unit references multiple earlier units (developers are already used to seeing this in git). This structure is known in graph theory as directed acyclic graph (DAG). Units are vertices, and parent-child links are the edges of the graph.

Figure 1. Storage units connected into a DAG. Arrows are from child to parent, G is the genesis unit.

In the special case when new units arrive rarely, the DAG will look almost like a chain, with only occasional forks and quick merges. Like in blockchains where each new block confirms all previous blocks (and transactions therein), every new child unit in the DAG confirms its parents, all parents of parents, parents of parents of parents, etc. If one tries to edit a unit, he will also have to change its hash. Inevitably, this would break all child units who reference this unit by its hash as both signatures and hashes of children depend on parent hashes.

Therefore, it is impossible to revise a unit without cooperating with all its children or stealing their private keys. The children, in turn, cannot revise their units without cooperating with their children (grandchildren of the original unit), and so on. Once a unit is broadcast into the network, and other users start building their units on top of it (referencing it as parent), the number of secondary revisions required to edit this unit hence grows like a snowball. That’s why we call this design Byteball (our snowflakes are bytes of data). Unlike blockchains where issuing a block is a rare event and only a privileged caste of users is in practice engaged in this activity, in a new Byteball unit starts accumulating confirmations immediately after it is released and confirmations can come from anyone, every time another new unit is issued.

There is no two-tier system of ordinary users and miners. Instead, users help each other: by adding a new unit its author also confirms all previous units. Unlike Bitcoin, where an attempt to revise a past transaction requires a large computational effort, an attempt to revise a past record in Byteball requires coordination with a large and growing number of other users, most of whom are anonymous strangers. The immutability of past records is therefore based on the sheer complexity of coordinating with such a large number of strangers, who are difficult to reach, have no interest in cooperation, and where every single one of them can veto the revision.

By referencing its parents, a unit includes the parent. It doesn’t include the full content of the parent; rather, it depends on its information through the parent’s hash. In the same way, the unit indirectly depends on and therefore includes the parents of the parent, their parents, and so on, and every unit ultimately includes the genesis unit.

There is a protocol rule that a unit cannot reference redundant parents – that is such parents that one parent includes another. For example, if unit B references unit A, then unit C cannot reference both units A and B at the same time. A is already, in a way, contained within B. This rule removes unnecessary links that don’t add any new useful connectivity to the graph.

Last updated