Riccardo Marchesin Universita` degli Studi di Trento Trento, Italy
Scalable UTXO Smart Contracts via Fine-Grained Distributed State
Xxxxxxx Xxxxxxxxxx
Universita` degli Studi di Cagliari Cagliari, Italy xxxx@xxxxx.xx
Xxxxxxxx Xxxxxxxxx Universita` degli Studi di Trento Trento, Italy
Xxxxxxx Xxxxxx
Universita` degli Studi di Trento Trento, Italy xxxxxxx.xxxxxx@xxxxx.xx
Index Terms—Blockchain, smart contracts, hybrid UTXO- account model, parallel validation, Cardano
arXiv:2406.07700v1 [xx.XX] 11 Jun 2024
Abstract—Current UTXO-based smart contracts face an efficiency bottleneck, requiring any transaction sent to a contract to specify the entire updated contract state. This requirement becomes particularly burdensome when the contract state contains dynamic data structures, such as maps, which are needed in many use cases for tracking users interactions with the contract. The problem is twofold: on the one hand, a large state in transactions implies a large transaction fee; on the other hand, a large centralized state is detrimental to the parallelization of transactions, which should be one of the main selling points of UTXO- based blockchains compared to account-based ones. We propose a technique to efficiently execute smart contracts on an extended UTXO blockchain, which allows the contract state to be distributed across multiple UTXOs. In this way, transactions only need to specify the part of the state they need to access, reducing their size (and fees). We also show how to exploit our model to parallelize the validation of transactions on multi-core CPUs. We implement our tech- nique and provide an empirical validation of its effectiveness.
1. Introduction
Current smart contract platforms can be roughly parti- tioned in two main classes: account-based (e.g., Ethereum) and UTXO-based (e.g., Cardano). Besides the well-known differences in the programming style of smart con- tracts [1], these two paradigms have fundamental dif- ferences that affect the performance and scalability of smart contracts. A well-known weakness of the account- based model is that it hinders the parallelization of val- idator nodes: indeed, detecting when two transactions are parallelizable is an hard problem, and approaches that try to improve the node performance require complex optimistic execution or static analysis techniques [2], [3]. By contrast, in the UTXO model detecting when two transactions are parallelizable is straightforward: it just amounts checking that their inputs satisfy the Xxxxxxxxx conditions [4]. However, in smart contract platforms based on the UTXO model, like Cardano, smart contracts are typically represented as a single transaction output: there- fore, transactions representing actions on the same smart contract are never parallelizable, since they spend the same input. Another typical problem is that transactions acting on a UTXO contract must include the whole up- dated contract state. When the contract state gets large
(e.g., for contracts storing user data in maps), this is a clear performance bottleneck. Besides that, transaction size is usually capped, so in some scenarios one can easily reach a point when further contract updates are forbidden. This problem can be mitigated by segregating the state, i.e. storing only the state hash on the blockchain. While this might reduce the transaction footprint on the blockchain, validators still need to access the whole state corresponding to such hashes, hence this approach still requires a significant amount of network communications. Summing up, current UTXO models of smart contracts scale poorly when the contract state grows large.
Distributing the contract state. In this paper we propose a framework to improve the performance and scalability of contracts in the UTXO model. The key idea it so distribute the contract state over multiple UTXOs. As a basic example, if the contract state contains two variables, we store their values in distinct UTXOs: in this way, contract actions accessing different variables can be parallelized. More in general, we distribute arbitrary contract states, including those comprising unbounded data structures. This is done in a fine-grained way: e.g., maps are distributed point-wise w.r.t. their finite support. While distributing the contract state has a clear advan- tage in parallelization, maintaining the contract balance in UTXOs is problematic. Of course, keeping the entire contract balance in a single UTXO introduces a single point of synchronization, hindering parallelization. Even splitting the balance across multiple UTXOs does not completely solve the problem: for instance, consider a contract with a balance of 2 tokens, split in two distinct UTXOs. If we want to perform two actions, each with- drawing 1 token from the contract, it may happen that both of them attempt to spend the same UTXO, hence only one of them will be valid, hindering parallelization again. This problem can manifest itself more frequently when a withdrawal requires spending several UTXOs of lesser value; indeed, a conflict arises when two actions attempt to spend the same output, and this becomes more likely when the actions need to consume many UTXOs. To address this problem, we introduce a new UTXO design, which separates the contract state (kept in UTXOs) from the contract balance (which is kept in an account-based fashion). In this way, withdrawing from the contract (i.e. from its account) never causes conflicts, provided the balance is sufficient. We dub our model hybrid UTXO (hUTXO), since it combines features of both models.
Thwarting forgeries. While distributing the contract state and isolating the balance enables the parallel valida- tion of transactions, this additional complexity can expose contracts to new kinds of attacks. Recall that in a standard UTXO model, an adversary can always create a new UTXO with arbitrary fields (including the token amount, as long as the adversary spends the needed tokens). If the contract state is encoded in UTXOs with zero tokens, as in our proposal, it becomes possible for an adversary to craft an UTXO encoding a corrupted part of the state, at the sole cost of the transaction fee. The adversary could then fire a contract action operating on the corrupted state and on the actual balance, possibly withdrawing more tokens than intended. To thwart these attacks, we use a combination of techniques:
• in our hUTXO model, we introduce contract identi- fiers: the same identifier is shared by all the UTXOs contributing to the contract state and by the associ- ated balance account.
• at the hUTXO level, we enforce validity conditions on new transactions: one can spawn new UTXOs with pre-existing contract identifiers only when they are vetted by a script of a previous UTXO that is already linked to that contract identifier.
• finally, we use scripts to ensure that the new UTXOs are coherent with the contract logic.
From high-level contracts to hUTXO. The last item above highlights a possible weakness of our proposal: since programming in the UTXO model is notoriously error-prone [5], developers must be extremely careful so to avoid introducing vulnerabilities that would pave the way to forgery attacks. To simplify the development of smart contracts in the hUTXO model, we propose a high- level smart contract language, dubbed HURF, with a com- piler to hUTXO transactions. HURF abstracts from the hUTXO transaction structure, modelling smart contracts as sets of rules reading and updating the contract state. The contract state can include basic values (e.g. integers or bitstrings) and maps from basic values to basic values. A contract rule can verify a precondition on the contract state, update the state by performing a list of variable/map assignments, and transfer currency from/to users. The HURF compiler guarantees that the scripts in the hUTXO transactions resulting from compilation enable exactly the state/balance updates programmed by the HURF rules. Overall, this thwarts the above-mentioned forgery attacks.
×
Parallel validation. The distribution of contract state and the parallelization of transaction validation achieves a substantial performance speedup. To measure it, we have implemented a simulator of a hUTXO validator, which can either validate transactions sequentially, or in parallel across multiple CPU cores. As a benchmark, we a used a real-world crowdfunding contract, for which we have given two implementations: one with a centralized state, and with a distributed one. This contract is particularly relevant to study distribution and parallelization, since it features an unbounded data structure to store the crowd- funding contributors and their donations. The experiments show a 13 speedup from the centralized version with se- quential validation to the distributed version parallellized on 8 cores.
Contributions. We summarise our contributions below:
• A high-level smart contract language, HURF, and its compiler to the hUTXO model (Sections 3 and 4).
• A parallel algorithm for hUTXO validators in the hUTXO model (Section 5).
• An implementation of a hUTXO validator simulator, featuring sequential and parallel validation, and a series of experiments with a crowdfunding contract (Sections 5 and 6).
2. The hUTXO blockchain model
In this section we present hUTXO (standing for hybrid UTXO), a UTXO-based blockchain model extended with contract state and contract balance accounts. This model is similar to Xxxxxxx’x eUTXO model [6]–[8], in that a transaction output contains a datum field which can be used to store contract data. As in the eUTXO model, covenants [9]–[11] are used to ensure that that the datum is updated according to the smart contract behaviour. We further extend the eUTXO model by adding a ctrId field to the transactions, which represents the ID of the contract which is being affected by the transaction. If no contract is being involved, ctrId is set to zero. Our transaction outputs are then augmented with a boolean inContract field, marking which outputs belong to the contract specified by ctrId .
Finally, we model the balance of contracts as a map form contract IDs to currency. This map is not stored inside the UTXOs nor inside the transactions. Instead, it is a part of the implicit blockchain state, as it happens in the account-based model followed by e.g. Ethereum. This makes our blockchain model a hybrid UTXO-account one.
In our model, we will use data of the following primitive types:
Bool, N, PubK, Hash, TokenId, Script, Data
The meaning of most of these data types is straightfor- xxxx. We only mention that Script is the type of the redeeming scripts, while Data is a generic type that can hold arbitrary data (concretely represented as a bitstring). Our model is formalized in Figure 1. There, we de- scribe the format of Inputs, Outputs, and transactions (Tx). Below, we provide some intuition about the fields
of a transaction T: Tx.
• Outputs store tokens (denoted by field value: Value) and data (denoted by field datum: Data). Each output has a validator script, i.e. a program that allows it to specify the condition under which it can be either spent or checked by another transaction. The inContract flag determines whether the output and its data are part of a contract or not.
• Every input references a previous unspent output. The spent flag determines whether the output is spent (i.e., T grabs its tokens) or to checked (i.e., T just reads its fields, including the datum, leaving it un- spent1). The output script must be satisfied regardless
1. The ability of reading output data without spending the output is similar to Xxxxxxx’x checked inputs [12].
=
=
TxId def Hash CtrId def Hash
=
Wallet def TokenId → N
= (txId :
OutputRef def TxId, index : N)
= (from:
TimeInterval def N, to: N)
5) For each input of T (including those with spent = false), if it refers to the output o then evaluating o.validator yields true.
6) For each input of T, if it refers to some output o in T′
with o.inContract = true, then T′.ctrId = T.ctrId .
7) If all inputs of T refer to outputs with inContract =
false, then:
a) If all outputs of T have inContract = false, then
def
Output
= (value:
Value, validator : Script,
T.ctrId = 0.
datum: Data, inContract : Bool)
=
Input def (outRef : OutputRef, redeemer : Data, spent : Bool)
def
Tx
= (in: List[Input], out : List[Output], signers: List[PubK],
validityTime: TimeInterval, fee: N, ctrId : CtrId)
Accounts def
b) Otherwise, if some output of T has inContract = true, then T.ctrId is equal to the hash of the outRef of the first input of T with spent = true.
8) If T.ctrId = 0, let vin be the sum of all the o.value where o is referred by an input of T with spent = true. Moreover, let vout be equal to the sum of all
Ledger def
= CtrId → Value
List[Tx], accounts: Accounts, time: N)
the o′.value of outputs o′ of T, and v
fee
be a value of
= (txs:
Figure 1: Types for the hUTXO model.
T.fee units of the native cryptocurrency. We require that vin ≥ vout + vfee.
9) If T.ctrId 0, then let bal = A(T.ctrId ) be the
of the spent flag. An input also contains a redeemer : a piece of data that can be checked by the referred output script.
• The validityTime field specifies time constraints for the validation of the transaction.
• The fee field specifies the amount of native cryp- tocurrency that the transaction pays to validators.
• The ctrId field uniquely identifies a contract in the blockchain. More specifically, an output o with o.inContract = true in T is considered part of the contract specified by T.ctrId . Indeed, the current contract state is collectively represented by all such unspent outputs, distributed across all the transactions with that ctrId .
The Ledger state comprises the list of the transactions that have been appended so far (txs), the contract balances (accounts), and the time, abstractly represented as a natu- ral. The accounts field is a mapping that associates to each contract ID the amount of tokens it owns. Since contracts have their balance represented in the accounts map, there is no need to also store their tokens in the UTXOs. For this reason, the outputs having inContract = true usually carry a value of zero tokens. Note that such value-less outputs can still be useful, since they can hold a part of the contract state in their datum, as well as a validator script to implement the contract behaviour.
B
B
B A
Validity of hUTXO transactions. Below we list the conditions needed for the validity of a transaction T: Tx in a given ( , , t): Ledger. As an auxiliary notion, we say that an output is unspent in if there is no transaction T in that has an input referring to that output with spent = true. Then, we say that a transaction T is valid when:
1) All inputs of T refer to unspent outputs in B.
current contract balance, and let vin , xxxx , vfee as in Item 8. We require that vin + bal ≥ vout + vfee.
We now discuss the above validity conditions. Item 1 requires that the inputs have not already been spent by a previous transaction. Item 2 prevents T from double spending the same output. Item 3 requires T to spend at least one previous output. In this way, T can be appended at most once to the ledger. Item 4 ensures that the time constraints are respected. Item 5 requires that T satisfies each script o.validator of each output o referred to in the inputs of T. Items 6 and 7 control the creation of new contract outputs, so that they can not be forged. More specifically, Item 6 restricts the outputs that can be created inside an existing contract ctrId , by requiring that some input refers to a previous output o in the same contract. This makes it possible for o.validator to impose its own requirements on the new outputs, rejecting output forgeries by allowing only those allowed by the contract logic. By contrast, Item 7 handles the case where a fresh contract is deployed: this happens when there are no contract inputs but some contract outputs. Here T.ctrId is mandated to be a fresh contract ID, so that one can not maliciously reuse a previous ID and forge outputs inside such contract. Items 8 and 9 ensure that the currency provided by the inputs is enough to cover for all the outputs and the fees. If some contract is involved by the transaction, then the contract balance can also contribute to pay outputs and fees.
B A
B A
Updating the ledger. Appending a valid transaction T makes the ledger state ( , , t) evolve into a new state ( T, ′, t′). The accounts map is unchanged if T is not a contract action (i.e., T.ctrId = 0), while it is updated at point γ = T.ctrId otherwise:
A
′(γ) = (1)
(
B
2) The inputs of T with spent = true refer to distinct outputs in . Note instead that the inputs of T with spent = false may refer to the same output of any other input.
3) At least one input of T has spent = true.
4) The current time t falls within the validityTime
interval of T.
A(γ) + vin − vout − vfee if γ = T.ctrId 0
A(γ) otherwise
≥
where xxx , vout , vfee are defined as in Item 8 of the previous paragraph. The time update is non-deterministic, with the only constraint that t′ t. We do not specify further how time is handled in the model: it could be either
tied to real time or to the block height of the ledger. Our model is independent from this choice.
Scripts. For our purposes the specific script language is immaterial, so hereafter we do not explicitly provide its syntax and semantics. Given a transaction T referring to an output o in its inputs, we just assume that the script o.validator can read:
• a set of preconditions that must be met in order for the rule to be fired;
• an effect that can modify the contract state σ and transfer tokens to users, affecting the contract balance w accordingly.
To formalize the contract state, we first define the base values BVal as
(a) all the fields of T;
=
(b) all the fields of the sibling outputs, i.e. those referred
BVal def Bool
| Z | String
to by the inputs of T (including o itself).
In particular, by item (a) it follows that the script can access the redeemer field in the input of T referring to o. This is analogous to the way witnesses are used in Bitcoin. Moreover, the script can access both o (by item (b)) and each output o′ of T (by item (a)). This makes it possible, for instance, to enforce a covenant by checking that o′.validator = o.validator and that o′.datum is related to o.datum according to the specific
Then, we make state σ associate each variable x to a base value σ(x): BVal, and each each map m to a function from tuples of base values to a single base value, i.e.:
σ(m): BVal∗ → BVal
To denote the function σ(m) we will write
σ(m) = [(l1, · · · , l1 ) '→ v1] · · · [(lk, · · · , lk) '→ vk]
covenant logic one wants to implement. Furthermore, by item (b), o.validator can allow T to spend o only if T simultaneously spends a sibling o′′ of a wanted form.
1 n 1 n
for the function that takes the value vi at the point
(li , · · · , li ), and the default value 0 at any other point.
More specifically, o′′ can be required to be a contract
1 n
Accessing the state of a map requires one to specify
output by checking o′′.inContract .
Efficiency of the validating nodes. Moreover, the state maintained by blockchain nodes, i.e. the set of UTXOs, is already sufficient to implement the script semantics.
3. Smart contracts model
Although the transactions validation mechanism out- lined in Section 2 allows one to define programmatic exchanges of tokens, it does not provide a sharp notion of smart contract. This is because the above-mentioned programmatic exchanges can result from the interaction of multiple different scripts distributed among several transactions: in this sense, the smart contract could be seen as an emerging high-level behaviour of the low-level behaviour of these transactions.
In this section we provide a higher-level model of smart contracts, HURF (hUTXO Rule Format), where the programmatic exchange of tokens is specified by a set of rules. We will show in Section 4 how to compile and execute HURF contracts on hUTXO blockchains. In this way, HURF provides a lower bound to the expressive- ness of hUTXO contracts, as any HURF contract can be effectively implemented in hUTXO.
Smart contract as rules. Syntactically, a HURF con- tract comprises:
• a declaration of the variables and maps that contribute to its state (σ), possibly including their initial values;
• a set of rules (R) that specify how the contract state
σ and balance w can be updated.
Figure 2 shows a simple example of contract. Users exe- cute contracts by repeatedly choosing and firing contract rules. Each rule R consists of three parts:
• a signature providing the rule name and the sequence of formal parameters to be instantiated by users upon firing the rule;
the map name m and the point (l1, . . . , ln) where it is accessed. We will refer to the pair (m, (l1 . . . , ln)) as a map location.
The contract balance w is a Wallet (i.e., a map from token types to non-negative integers) that represents the amount of tokens owned by the contract.
Expressions. Expressions e appearing in preconditions and effects have the following form:
• boolean, integer, and string constants;
• e1 op e2, where op is an arithmetic or boolean operator;
• not e, to negate a boolean;
• if e0 then e1 else e2, conditional;
• m[e1,...,ek], accessing the map m at point
(e1,...,ek);
• hash(e1,...,ek), hashing a tuple of values;
• e1 @ e2, concatenating two strings;
• len(e), the length of a string;
• substr(e, e1, e2), taking the substring from character in position e1 to the one in position e2 of a given string e;
• toStr(e1,...,ek), converting a tuple of values to a string;
• signedBy(e), checking whether user e authorizes the rule invocation;
• validFrom, validTo, the validity temporal bounds for the rule invocation;
• rule parameters and state variables.
Note that the language of expressions has no operators to iterate over maps. This ensures that the evaluation time of expressions does not depend on the size of maps.
W.l.o.g. there are no expressions to obtain the contract balance: if needed, the developer can record in contract variables the balances of the different tokens stored by the contract. However, doing so may affect the paralleliz- ability of contracts, so these variables must be dealt with carefully.
contract Bank { map m;
deposit (x, a) {
receive ( x: T); // precondition m[ a] = m[ a]+x; // effect
}
withdraw (x, a) {
require ( m[ a] >=x and signed By ( a)); m[ a] = m[ a]- x | a. send ( x: T);
}
}
Figure 2: A simple bank contract.
Preconditions. A rule precondition is a list of require- ments of the following forms:
contract Crowdfund { map m;
var owner = " pub_key_owner "; var threshold = 1000 ;
var t_withdraw = 10; var t_refund = 20;
fund (x, a) {
receive ( x: T);
m[ a]=m[ a]+x;
}
withdraw ( x) { require ( signed By ( owner)
and valid From >= t_withdraw and validTo < t_refund
and x >= threshold ); owner. send ( x: T);
}
refund ( a) { require ( valid From >= t_refund ); m[ a ]=0 | a. send ( m[ a]: T);
• zero or more receive(e:T), enabling the rule only }
if some user sends exactly e tokens of type T to the }
contract account.
• one require(e), enabling the rule only if e evalu- ates to true.
Figure 3: A crowdfunding contract.
Effects. The effect of a rule allows to simultaneously assign values to contract variables and maps, while also transferring assets from the contract balance to a given user.
The effect, denoted as S1 | ... | SN, simultane- ously executes statements Si of the following forms:
• x = e, assigning a value to variable x;
• m[e1,...,ek] = e’, updating the map m at point
(e1,...,ek);
• e.send(e’:T), sending e’ tokens of type T from the contract balance to a user (whose public key is) e. If the contract balance is insufficient, the rule cannot be fired.
We stress that the execution of statements is simulta- neous: all the expressions in the effect are first evaluated in the old state, which is then updated. To ensure that such update is well-defined, we postulate that in each rule, each variable and map location is assigned at most once. More specifically, we postulate that effects satisfy the following:
• if the effect comprises x = e and x = e’, then e
and e’ must be equal;
• if the effect comprises m[e1,...,ek] = e and m[f1,...,fk] = e’, with different expressions e and e’, then the rule require precondition must check that
not ( e1 == f1 and ... and ek == fk)
Hereafter, for simplicity we will not explicitly include this check in our rules.
m that associates the users’ public keys to the amount of
Example: bank. We specify in Figure 2 a bare-bone bank contract. The contract state consists of a single map
the owner to withdraw tokens from the contract, but only if the amount is greater than the given threshold. The owner can invoke this rule only within a time window, i.e. the interval between t withdraw and t refund. Note that withdraw fails if the donations have not reached the threshold. Finally, the rule refund allows donors to redeem their donations after the deadline t refund has passed.
⟨ ⟩
⟨ ⟩ ⟨ ⟩
Contract behaviour. To describe the behaviour of a contract, besides its state and balance we must also con- sider the tokens held by users. These are needed to provide inputs to the contract (to fulfill its receive conditions), and to transfer tokens from the contract to users (through the send statement). We denote by A, w x the fact that a user A holds a deposit w of tokens. The name x is used to discriminate between different deposits of the same user. For instance, in the same system configuration we can have both A, [1: T] x and A, [2: T] y for two distinct deposits of 1: T and 2: T both owned by A. Invoking a rule spends deposits, burning their name, while executing send creates new deposits with fresh names.
⟨ ⟩
As usual, several contracts might be simultaneously active, possibly instantiating the same set of rules multiple times. To account for that, we model a contract instance as R, σ, w γ where R is the set of contract rules, and γi is a unique identifier of the instance.
We then define the system configuration as:
Γ = ⟨R1, σ1, w1⟩γ1 | · · · | ⟨Rn, σn, wn⟩γn |
⟨A1, w1′ ⟩x1 | · · · | ⟨Ak, wk′ ⟩xk | t
i
i
i γi
i
i xi
where ⟨R , σ , w ⟩ are the contract instances, ⟨A , w′⟩
tokens of type T they have deposited. The contract has two rules: deposit allows users to transfer tokens to the contract, and withdraw allows them to get their funds back.
Example: crowdfunding. We specify in Figure 3 a crowdfunding contract. The rule fund allows anyone to deposit tokens in the contract. The rule withdraw allows
are the deposits, and t is the current time. In the initial configuration there are no contract instances, and the timestamp is zero.
Configurations advance through actions. An action can perform one of the following:
⟨ ⟩
• deploy a new contract instance, spending some value w + fee from users’ deposits to add R, σ, w γ to the configuration, where γ is a fresh identifier;
• fire a contract rule, affecting the state and balance of a contract instance, while possibly consuming and/or creating deposits;
• allow users to redistribute the tokens in their deposits;
• make the time pass.
We detail below the actions of the second kind, called contract actions. Contract actions must specify all the data needed to uniquely determine its behavior. In particular, each contract action must specify:
• the unique identifier γ of the affected instance;
• the name of the rule in R that is being invoked, as well as its actual parameters;
• the signers who authorize the action. There can be any number of signers (possibly zero);
• the names of each deposit that is spent to satisfy each
receive precondition in the invoked rule;
• an additional deposit name, specifying a fee paid to fire the action. Including this fee also ensures that actions can not be replayed;
• the validity interval, constraining the time at which the rule can be performed.
Ethereum transactions do not. This is because Ethereum
We remark a few differences between the data in- cluded in our actions and those occurring in Ethereum transactions. First, our actions allow multiple signers, while in Ethereum each transaction is signed by a single externally-owned account. Second, our actions can spend multiple deposits and transfer tokens from different users to the contract. In Ethereum, the tokens transferred to the contract can only come from the sender’s account. Third, our actions include an explicit validity interval, while
4.1. Representing the state and balance
We represent the state σ and balance w of the contract
γ as follows:
• The balance w is simply stored as a Wallet within the accounts map of the Ledger. Formally, we have
accounts(γ) = w
• The state σ is instead stored across multiple unspent contract outputs for γ. To achieve this, we proceed in two steps: first, we flatten σ into a single map Σ from hashes to values. Then, we distribute Σ across multiple contract outputs.
· · ·
· · · '→ · · ·
Flattening σ to Σ. The state σ associates variables x to values σ(x) = v, and maps m to functions σ(m) = [(l1, , ln) v] . We flatten this structure, creating a new state map Σ that associates the hash of each variable name x and the hash of each map location (m, (l1, , ln)) to their respective values. More formally, we represent each variable association
σ(x) = v
with the association
Σ(hash("var_x")) = v (2)
Similarly, we represent each map association
1 n 1 n
σ(m) = [(l1, · · · , l1 ) '→ v1] · · · [(lk, · · · , lk) '→ vk]
with the associations
1
n
1
Σ(hash("map_m["@toStr(l1 · · · l1 )@"]")) = v
contracts can access the current block number, and then use this information to implement the wanted time con-
Σ(hash("map_m["@toStr(lk · · · lk)@"]")) = vk
straints. The above-mentioned differences between our 1 n
model and Ethereum are due to our underlying compila- tion target, which is a hUTXO-based blockchain instead of an account-based one.
When an action invokes a rule, if its preconditions are satisfied, the configuration is updated accordingly. The deposits spent by the action are removed from the config- uration, while new deposits are added to implement the send effects. The contract balance w is updated accord- ing to the receive preconditions and send effects. The contract state σ is updated according to the assignments to variables and maps specified in the effect.
4. Compiling HURF to hUTXO
In this section we show how to compile HURF con- tracts into the hUTXO model.
Recall that a contract with rules R can be instantiated
We let the map Σ associate all the other hashes to the default value 0. This representation Σ of the state σ is well defined, provided that there are no hash collisions. Since such collisions are extremely unlikely we simply assume that they will never happen. We assume that the codomain of the hash function hash is the open interval (minH, maxH ). We will denote such values in hexadeci- mal format: minH = 00... and maxH = ff....
We provide an example in Table 1, showing how a state σ is flattened to Σ.
· · ·
· · ·
Encoding Σ as outputs. To represent the map Σ as contract outputs we first notice that Σ takes the default value 0 everywhere, except for some hashes which we denote by h1 hn. For simplicity, let those n hashes be ordered, so that h1 < h2 < < hn. We can then visualize Σ as follows:
0 h ∈ (minH, h1)
⟨ ⟩
multiple times. Each instance R, σ, w γ is associated to a unique contract identifier γ. At the hUTXO level, the rules R and the current state σ of the contract γ are represented by those UTXOs whose inContract flag is true and whose enclosing transaction has ctrId = γ. We will refer to such outputs as the contract outputs for γ.
We first show how to represent the contract state and balance on the ledger and how to update them. Then, we show how to compile contract rules into output scripts.
Σ(h) =
v1 h = h1
· · ·
0 h ∈ (hi−1, hi)
vi h = hi
0 h ∈ (hi, hi+1)
· · ·
vn h = hn
0 h ∈ (hn, maxH )
(4)
We distribute Σ on the ledger by creating an output for each of the cases shown above, amounting to 2n + 1
single output oh0,h1 (where h0 < h < h1), and
produce three outputs: o′h ,h, o′h, and o′h,h , splitting
0 1
outputs. This is exemplified in Figure 4.
Among the generated outputs, n are used to certify that Σ(hi) has a specific non-default value vi. The remaining n+1 outputs are used to certify that for any h in the open interval (hi, hi+1), Σ(h) takes the default value 0.
We define state outputs as follows:
• oh is the output certifying that Σ takes a non-default value v in point h. We set the datum field of oh to ["non-default", h, v].
• oh′,h′′ is the output certifying that Σ takes the default value in the open interval (h′, h′′). We set the datum field of oh′,h′′ to ["default", h′, h′′].
In both cases, we make these outputs have value = 0 and inContract = true. Finally, we let script = scriptstate as described in Section 4.3.
4.2. Reading and updating the state
To implement the contract behavior, we will use suit- able transaction scripts to verify that the contract evolves according to the contract rules. In order to do so, such scripts need to read and update the contract state. This is realized by making the redeeming transaction include inputs referring to those state outputs that represent the part of the state that is being accessed. More in de- tail, reading from the state is done by inputs having spent = false, while updating the state is done by inputs having spent = true. In the case of updates, the redeeming transaction must also include new state outputs to represent the updated state.
h
h0,h1
0
1
∈
Reading the state. In order to read the value of a contract variable x or of a map access m[v,...], we first compute the hash h of the relevant key as in Equations (2) and (3), and then craft a transaction including among its inputs the state output that certifies the value of Σ(h), i.e. either o or some o with h (h , h ). This allows the scripts to access the datum field, and read the value of Σ(h). Note that this reading operation must not spend the output, so we set spent = false in the input.
Updating Σ: intuition. To explain how the state can be updated, we first consider the case where the update only involves a single point Σ(h), and then generalize this to the general case where multiple points are updated.
In order to update Σ on a point a transaction needs to spend the old state outputs and to create new ones. This is handled in four possible ways, depending on whether the old value Σ(h) = v is the default or not, and on whether the new value Σ′(h) = v′ is the default or not:
̸
• If Σ(h) = v = 0 and we want to update it to Σ′(h) = v′ 0, then the transaction needs to consume the output oh and create a new output o′h containing the new value v′.
′ ̸
• If Σ(h) = v = 0 and we want to update it to Σ (h) = 0, then the transaction needs to consume three outputs oh0,h, oh, and oh,h1 and merge them
the default range into two and adding a new non- default output o′h containing the new value v′.
• Finally, if Σ(h) = 0 and we want to update it to
Σ′(h) = 0 (this can happen due to the effect of a rule), the transaction does not need to consume nor create any output. However a non-spending input is still needed to ensure that the old value is actually the default: this input is oh0,h1 , where h0 < h < h1.
Problems with updating the state on multiple points. If we need to update Σ on multiple points, we might be tempted to handle each update independently, applying the technique presented above on each point. However, this does not always work.
For instance, let Σ be such that Σ(h′) = 1 and Σ(h′′) = 2, with h′ < h′′ and Σ(h) = 0 for any other h. If we try to update both h′ and h′′ to 0 within a single transaction, we would need to redeem output oh′,h′′ twice (once to update Σ(h′) and once again to update Σ(h′′)). This is already problematic. Furthermore, the first update would create the output ominH ,h′′ while the second one would create the output oh′,maxH . However, these intervals overlap, so the new outputs would fail to correctly represent the updated state Σ(′). Indeed, a correct transaction should instead produce a single output ominH ,maxH .
This is not the only case in which these simultaneous
updates are problematic: consider the opposite case, in which Σ is 0 everywhere, and we want a single transaction to change both Σ(h′) and Σ(h′′) to 1. Following the technique described above, this transaction would produce both the triple of state outputs ominH ,h′ , oh′ , oh′,maxH , and another triple ominH ,h′′ , oh′′ , oh′′,maxH . Once again, this does not correctly represent the updated state. Indeed, the output ominH ,h′′ would wrongly claim that Σ′(h′) = 0 (and similarly oh′,maxH would claim Σ′(h′′) = 0).
'→
Updating Σ in the correct way. Here we detail an algorithm that, given a list of updates u = (h1
· · · '→
v1) (hn vn) and an initial state Σ, constructs a list of inputs in and a list of outputs out that a transaction would need to include in order to perform the updates in u to the state. Intuitively, this algorithm suitably splits and/or joins the intervals affected by the updates.
'→
We assume that the list of updates u = (h1
· · · '→
v1) (hn vn) given as input to our algorithm is sorted, so that if i < j then hi < hj. Note that, by requiring a strict inequality, we effectively forbid multiple updates at the same point (consistently with the fact that we want to perform all of them simultaneously).
We start by constructing the list of inputs in affected by the updates u. Given an initial state Σ, the algorithm repeats the following steps to add inputs to the initially empty list in.
1) If u is empty, terminate.
'→
2) If u = (h '→ v)u′, with Σ(h) ̸= 0 and v 0, then
into a single range of default values, creating an
0 1
output o′h ,x .
• If Σ(h) = 0 and we want to update it to Σ′(h) =
append oh to in and remove (h v) from u.
'→ ̸
3) If u = (h 0)u′, with Σ(h) = 0, then find h′, h′′
for which there exist outputs oh′,h, oh, oh,h′′ . Append
v′ 0, then the transaction needs to consume the oh′,h to in unless it already occurs in it. Then append
State σ of a contract
Name Value
x 3
y 0
m [1 → 10][6 → 100]
Name Key Key hash h Σ(h)
x var x ed5ba7. . . 3
y var y bd1138. . . 0
m[1] map m[1] ea00c6. . . 10
m[6] map m[6] 9c90ac. 100
Hash keys and corresponding values, defining
Σ.
Default | Non-default | |
Key | from hash to hash | Key hash value |
— map m[6] — map m[1] — var x — | 0 9c90a. . . 9c90a. . . ea00c. . . ea00c. . . ed5ba. . . ed5ba. . . fffff. . . | 9c90a. . . 100 ea00c. . . 10 ed5ba. . . 3 |
Outputs representing the contract. Since y takes the value 0, it is already represented in the default ranges.
TABLE 1: From contract state to the outputs
'→
oh, and oh,h′′ to in. Finally, remove (h 0) from
u.
'→
4) If u = (h v)u′, with Σ(h) = 0 then find h′, h′′ for which there exists an output oh′,h′′ with h′ < h < h′′. Append oh,h′ to in unless it already occurs in it. Then remove (h '→ v) from u.
Constructing the outputs list out is done by the par- tial function Out defined in Figure 7. When called as Out(in, u) the function checks whether the list of inputs in can support the state updates u, and in such case it returns the list of outputs out to be included in the transaction to drive the wanted state update. Intuitively, this function iterates in parallel over the lists in and u, finding the points where the input intervals need to be split or merged, producing suitable outputs in the process.
For instance, let h1 < h2 < h3 < h4, and consider the following state Σ:
1 if h = h
Figure 5: The state can be modified drastically with a single transaction.
Σ(h) =
1
3 if h = h3
if
Example: multiple state updates. We show in Figure 5
4 h = h4
0 otherwise
'→ '→
Let u = (h2 2)(h3 0) be the wanted state updates. Assume that Σ is currently represented on the blockchain by the outputs omin ,h , oh , oh ,h , oh , oh ,h , oh ,
the effect of a transaction T1 that updates the state at six distinct points, consuming eight state outputs while spawning six new ones. The inputs and outputs in the fig- ure are those computed by the input and output generation algorithms for the updates
H 1 1 1 3 3 3 4 4
oh4,maxH . We generate a transaction to update the state. Running the input generation algorithm we obtain the
u = (h1 '→ 0, h2 '→ v, h3 '→ 0, h4 '→ v, h5 '→ 0, h6 '→ v)
list of inputs in = oh1,h3 oh3 oh3,h4 . Running the output
with v 0. Taken individually, the update part (h1 '→ 0)
1 2 2 2 4
generation algorithm as Out(in, u) we obtain out = o′h ,h o′h o′h ,h . Indeed, the algorithm splits oh1,h3 to account for h2 being changed from the default value to 2,
would require the state outputs o0,1, o1, o1,3 to be merged, but the update part (h2 '→ v) would instead require o1,3 to be split. The net effect of (h1 '→ 0, h2 '→ v) is that
2
4
and merges oh3 into the output o′h ,h
default interval.
representing a new
the new outputs include o′0,2, o′2. A similar interaction is caused by (h3 '→ 0, h4 '→ v). By contrast, the update
Out∗(oh′,h′′ , ε) = oh′,h′′
(
Out∗(oh′,h′′ in, (h '→ v)u) =
= oh′,h′′ Out(in, (h '→ v)u) if h > h′′
Out(oh′,h′′ in, (h '→ v)u) otherwise
Out(ε, ε) = ε
(
Out(oh′,h′′ in, (h '→ v)u) =
= Out∗(oh′,h′′ in, u) if v = 0, h ∈ (h′, h′′)
oh′,hohOut∗(oh,h′′ in, u) if v 0, h ∈ (h′, h′′)
Out(ohin, (h '→ v
0)u) = o′hOut(in, u)
Figure 6: As long as they do not consume the same part of the state, transactions can be parallelized.
'→ '→
part (h5 0, h6 v) does not require a split, since o6
already associates h6 to a non-default value.
We remark that the updates u modify the state simulta- neously, therefore we bundle them in a single transaction. Indeed, regardless of the size of u, the inputs and outputs generation algorithms always craft a single transaction that performs them all.
Example: parallel updates. Figure 6 shows how two transactions updating the state at different points can be validated in parallel. There, T1 updates the state at points h1, h2 while T2 updates the state at point h4. The inputs of the two transactions do not overlap, so a parallel validation is possible.
'→
Note that updating the state at different points is not, in general, sufficient to guarantee that two transactions are parallelizable. Indeed, had T1 performed the update (h3 0), we would still have disjoint update points (h3 vs. h4) but both transactions would attempt to consume o3,5 (T1 to merge it with o2,3, T2 to split it). This makes the two transaction conflict: only one of them can be appended to the current blockchain, and their validation can not be made in parallel. After one of them is put on the blockchain, the other update can still be performed, but using a different transaction involving the new UTXOs.
4.3. Contract logic
We now show how to implement the contracts of Section 3 on the blockchain. In Sections 4.1 and 4.2 we saw how to represent, read and update the state. Here, starting from a set of rules R, we create a set of contract outputs that constrain the state updates to follow the rules. To deploy a contract, we append a transaction Tinit with a fresh contract identifier, as required by item 7 of the transaction validity conditions (see page 3). The transac- tion Tinit contains a contract output oR for each contract rule R, as well as a set of state outputs to represent the
initial contract state, as explained in Section 4.1.
Once the contract is deployed, one can execute a rule
R by appending a transaction T referring to the output oR
Out(oh′,hohoh,h′′ in, (h '→ 0)u) = Out∗(oh′,h′′ in, u)
Figure 7: Definition of the partial function Out. In those cases not explicitly handled by the equations above, the operation fails. The definition exploits the auxiliary func- tion Out∗.
as a non-spending input. This causes the script oR.script to be executed. We define such script so to verify that T correctly updates the current contract state as mandated by rule R. This is done by inspecting all the inputs and the outputs of T. In Figure 8 we show the structure of T.
We now describe oR in more detail. We set its fields as follows: value = 0, datum = "logic", inContract = true. Further, oR.script verifies that:
• oR is included as the first input in the redeeming transaction. This ensures that only one rule (i.e., R) is executed by the redeeming transaction T.
• The first input of T specifies spent = false. Overall, this ensures that oR can never be consumed.
• The second input of T spends a non-contract output and its value is equal to T.fee. This allows the input to pay transaction fees without affecting the contract state and balance.
• If k is the cumulative number of variables and map accesses that are read by rule R, then the script checks that the next k inputs specify spent = false and refer to contract outputs with a "default" or "non-default" datum, reading the values in the process. The script checks that the hash-keys in the input datums are the expected ones (so that we do not read some other part of the state). Moreover, the rule parameters are read from the redeemer field of the first input of T. After this step, the script is able to evaluate all the expressions occurring in R.
• If n is the number of receive preconditions of R, then the script checks that the next n inputs spend non-contract outputs contributing the exact values specified in those receive preconditions.
• The expression in the require precondition of R
evaluates to true.
• If m is the number of send statements in R, then the script checks that the first m outputs in T set inContract = false and pay the intended amount of tokens to the intended recipients, as specified by the
seventh access (without spending them) the state outputs relative to the values read by the contract. More precisely, we have: ohy for y, ohz for z, oha for a, oha,hm[14] for
−
'→ '→
m[1] (since ha < hm[1] < hm[14]), and oh
m[27]
for m[x],
Figure 8: Structure of a contract transaction rule.
• Finally, the script checks that the remaining inputs and outputs perform the state update mandated by the rule. To this purpose, the script first computes the list of the remaining inputs in and the list of state up- dates u. The script checks that the inputs in in have spent = true. Then it computes out = Out(in, u) as per Figure 7. If that fails, then T is rejected. Xxx- xxxxxx, the script verifies that the remaining outputs of T are equal to out. In this way, this step ensures
in that order. The values stored in the outputs referenced from these inputs are then used throughout the script to evaluate the expressions. The eight input must spend an output contributing 15: T0 to satisfy the receive(z:T0) precondition. Finally, inputs from the ninth to the twelfth spend the outputs relative to the state updates. Note that the updates are u = (hw 0), (hm[15] 7) since in the state σ the expression m[x]-y evaluates to 3 3 = 0, while z evaluates to 15 and 7+m[1] evaluates to 7 + 0 = 7. To achieve the wanted update u, we include in T the inputs in specified by the input generation algorithm of Section 4.2. These include ohy,hw , ohw , ohw,hz which are needed for hw '→ 0 and ohm[14],hm[27] which is needed for hm[15] '→ 7. These inputs have spent set to true.
'→
'→y z
We also generate the outputs for the updates, accord- ing to Out(in, u). Since hw 0 causes w to change from a non-default value to the default one, the inputs ohy,hw , ohw , ohw,hz are merged into a single new out- put o′h ,h . Instead, since hm[15] 7 causes m[15] to change from the default value to a non-default one,
m[14] m[27]
the input oh ,h is split into three new outputs
that there are no extraneous inputs or outputs w.r.t.
o′h
m[14]
,hm[15]
, o′h
m[15]
, o′h
m[15]
,hm[27] .
those needed to implement the intended state update. By construction, when the “logic” output oR is re- ferred as an input in T, the contract state must be updated
according to rule X. We prevent the contract state to be modified in any other way from those specified by the rules. This is done by making the script scriptstate of all state outputs to require the presence of some “logic” output oR. More precisely, scriptstate checks that in the redeeming transaction T the first input refers to a contract output whose datum is "logic".
Example. Consider a contract with variables a,w,y,z
and map m, having the following rule:
example ( x) { receive ( z: T0 ); require (z >10);
The outputs of T also include an output to implement the send statement, transferring 1: T1 to the public key "pubkey_a" (the result of the evaluation of a). This output has inContract set to false, and the owner of the corresponding private key can freely spend it.
After the transaction is appended to the blockchain, the state of the contract is updated to
σ′(y) = 3 σ′(w) = 0 σ′(z) = 15 σ′(a) = "pubkey_a"
σ′(m) = [14 '→ 1][15 '→ 7][27 '→ 3]
and the contract balance is updated to:
w′ = [15: T0, 99: T1]
⟨ ⟩
Contract deployment. To deploy a new contract in-
w=m[ x]- y | m[ z]=7+ m [1] | a. send (1: T1 );
}
stance R, σ, w γ we need to append a suitable trans- action Tinit to the blockchain. The first input of Tinit
Consider the execution of the rule in the state σ where:
σ(y) = 3 σ(w) = 2 σ(z) = 15 σ(a) = "pubkey_a"
σ(m) = [14 '→ 1][27 '→ 3]
and let the contract balance be w = [100: T ].
must spend a previous non-contract output, contributing
the fees. We set Tinit.fee to its value. As specified by the validity conditions (item 7 at page 3), the hash of this input also determines a fresh contract identifier γ. Accordingly, we set Tinit.ctrId to this value. We then fill
The flattened state map
1
Σ takes as input the hashes of
the outputs of Tinit, adding one contract output oR for
each rule R ∈ R, as well as the sequence of state outputs
variable names and map location. These are generated as
omin
,h , xx , oh ,h , oh , . . . , oh ,max
according to the
in Equations (2) and (3). Assume that they are ordered as
H 0 0
0 1 1 n H
follows:
minH < hy < hw < hz < ha <
< hm[1] < hm[14] < hm[15] < hm[27] < maxH
We now craft a transaction T that executes the rule with parameter x = 27. The first input of T must refer to the output oexample that encodes the rule logic. Moreover, this input must have redeemer set to 27, specifying the value of x. The second input of T must spend a previous output contributing the fees. Inputs from the third to the
initial state σ of the contract. These outputs have zero value, and have as their scripts the ones described before. If the required initial balance w is non-zero, we also include in Tinit one or more (non-contract) spent input, whose overall value amounts to w. Since all the outputs of Tinit have zero value, the currency received from the inputs is transferred to the contract balance. The redeemer fields of all the inputs is set so to satisfy the scripts of the outputs they are spending. The spent field of all the inputs is set to true. Finally, Xxxxx is signed by all the users whose authorization is required to spend the inputs.
4.4. Discussion
We now discuss several aspects of our approach, as well as a few possible variants.
Security of the compiler. The security of our approach hinges on precisely controlling the form of the contract outputs, i.e. those UTXOs having inContract = true and the contract ctrId = γ in their transaction. Indeed, it is crucial that such UTXOs correctly represent the current contract state as well as the logic operations corresponding to contract rules. More specifically, the contract outputs must either have datum = "logic" and check the correct application of a contract rule, or have a datum of one the forms "default"/"non-default" and contribute to represent the contract state as in Equation (4).
Breaking this invariant would allow a malicious user to disrupt the execution of the contract. Adding a new "logic" contract output allows updating the contract state without following the contract rules. Removing a "logic" contract output would instead prevent the execution of a contract rule. Moreover, adding a spurious state output ("default"/"non-default") could cause confusion on the value of a contract variable or map location: reading the state could then use one of the two different values on the blockchain, potentially leading to vulnerabilities. Finally, removing a state output in an uncontrolled way can prevent reading and updating that part of the state, effectively preventing the execution of the rules which attempt that.
This invariant on the contract outputs is enforced at multiple levels. First, our blockchain model prevents users from freely creating new UTXOs in an existing contract. To see why, assume we have an existing contract with ctrId = γ. The validation conditions for new transactions (see page 3) allow new contract outputs to be created only in two cases:
• Item 6 allows to create new contract outputs o′, but only when at least one input refers to a previous contract output o having the same ctrId . Hence, one can create contract outputs o′ with ctrId = γ only when the previous script o.script accepts it.
• Item 7 allows to create new contract outputs o′, but
only when there are no input referring to a contract output o, and when the ctrId of o′ is generated from a spent input. This makes ctrId a fresh contract iden- tifier because of the no-double-spending property. Hence, this ctrId must be different from γ, making the contract output o′ belong to a new contract.
Consequently, the creation of new contract outputs inside ctrId = γ can only be done with the permission of one of the previous output scripts within the contract. This allows to enforce our invariant at the script level, by making the contract-related scripts verify the newly created contract outputs. Indeed, no new "logic" outputs are accepted, and the new "default"/"non-default" outputs are only accepted when they are part of a state update mandated by some contract rule.
Simplicity of the scripts. As discussed in Section 4.3 our scripts have to perform several checks on the fields of the redeeming transaction T and the UTXOs referred to by
T. We remark that such checks do not involve any other
We note that the mentioned checks would also be feasible in a loop-free script language. Indeed, the most complex check our scripts need to perform is the one on the outputs defined in Figure 7. While that algorithm is presented as a recursive function, the number of recursive calls it performs is at most 2n where n is the number of the assignments in the contract rule effect, which is statically known. It is therefore feasible to unfold the recursion up to that level and obtain an equivalent loop-free script.
'→
Improving the compiler. The transactions produced by our compiler can be optimized and made slightly more efficient in a few scenarios. For instance, when updating the state according to u = (h v) it is possible that the new value v is actually the same as the old value. Our compiler always consumes the old state output o and creates a new one o′, even the value does not change. In such case, it would suffice reading o using a transaction input with spent = false and check that the old value is indeed v. Doing so would be more efficient since it would generate smaller transactions (one fewer output), possibly reducing the fees. Further, it would increase the opportunities of parallelization by avoiding conflicts with other transactions reading o (see Section 5).
Our compiler can also generate multiple inputs for the same part of the state, as it happens when a contract variable is both read and updated. This could be optimized by omitting the redundant inputs, at the cost of some additional complexity in the scripts.
Closing contracts. Our contracts do not feature any means of termination. Once deployed, a contract lives forever: its logic outputs and state outputs are never fully removed from the UTXO set. At best, the number of state outputs can decrease when setting some part of the state to the default value. We could extend our contract model and compiler to allow a complete shutdown of the contract. To achieve that, a special "logic" output could verify that a suitable programmed precondition for the shutdown is satisfied, and instantly consume all the "logic" outputs, effectively forbidding any new firing of contract rules. At the same time, a new "logic" output odel is spawned to start the consumption of all the existing state outputs, following the order given by the hashes stored in their datum fields, from minH to maxH . This can be realized by making odel keep the hash h of the next state output to consume in odel.datum. The only way to spend odel would then be to consume the outputs odel, oh,h′ , oh′ (for any h′) and produce a new output o′del with a new hash h′
in its datum. In this way, the deletion process can continue
until maxH is found, at which point no new output needs to be produced, completing the contract shutdown.
Avoiding spam transactions. Our logic and state out- puts have their value field set to 0, so that creating a new state output only costs the transaction fee. This might allow contracts to generate an excessively large amount of UTXOs, polluting the UTXO set that validator nodes must handle. To incentivize a more conservative use of state storage, fees could be increased for state outputs.
For instance, the fee could be made proportional to the transaction size as in Cardano [13].
Alternatively, a small “collateral” value could be de- posited in the state outputs’ value. Such collateral could then be refunded when the state output is consumed (including when the contract is shut down). This would be analogous to the “rent” mechanism used by Xxxxxx [14].
5. Parallel validation of transactions
Validating transaction blocks is one of the most com- putationally expensive task a blockchain node performs. Modern hardware features multi-core processors that can execute many threads concurrently. To improve the per- formance of block validation it is therefore important to exploit the parallelism offered the hardware.
Many smart contract platforms, however, are not easily amenable to the parallel validation of block transactions. A first issue that hinders parallelization is given by read/write conflicts (data dependencies) on the contract state. For instance, in Ethereum, transactions usually only contain the contract address, the name of the called method, and its arguments. If the transaction block contains multiple calls to the same contract, it may happen that the first call modifies the state in a way that affects the second call. In such case, running both methods in parallel can lead to an invalid contract state. It is also difficult to efficiently detect those cases where parallelization would be possible. Conflicts can also be caused by contract-to-contract calls: an Ethereum contract can invoke other contracts’ methods though their addresses. This makes it even harder to detect data dependencies, since they might not be confined to the state of the contract mentioned in the transaction. For these reasons, even though several techniques have been proposed to make Ethereum transactions more concurrent (e.g., by exploiting optimistic execution [2], [15] or static analysis [4] techniques), the obtained speedup is limited.
Our approach addresses these issues. When executing a contract rule, the transaction has to explicitly refer to the UTXOs representing the part of the state which is relevant for its execution. In this way, the transaction must list the part of the state that is being read (using inputs with spent = false), and the part of the state which is being updated (using inputs with spent = true). Consequently, it is easy for validators to detect when transactions have no data dependencies: it is enough to check that, when a UTXO o is spent by some input, no input (spent or not) of other transactions refers to o.
Our contract model does not feature contract-to- contract calls, avoiding the conflicts between the states of different contracts. Note in passing that, even if one opted to extend our model so to allow contract-to-contract calls, one would still be able to detect conflicts easily, since the part of the state which is being affected must still be explicitly mentioned in the transaction inputs.
While conflicts can be caused by attempting to read or write the same part of the state, they can also arise from attempting to spend the contract balance. For instance, when two transactions spend 2/3 of the tokens in the contract balance, only one transaction can be appended to the blockchain. This kind of conflicts is present in account-based blockchains like Ethereum, and also in
our model since we use accounts to represent contract balances.
Even in the presence of such balance-related conflicts, in certain cases we can still perform the parallel validation of transactions. To this purpose, we first compute the tokens that a transaction is going to consume from the contract balance, as specified in Equation (1). Then, if such tokens are still available in the contract balance, we make a thread validate the transaction while temporarily marking those tokens as frozen, making them unavailable for other transactions. If the transaction is ultimately ac- cepted, the tokens are consumed, while on rejection they are thawed. Effectively, this is a conservative execution approach, since we only validate transactions when we are sure there is enough contract balance for them.
6. Implementation and experiments
We implemented our approach to distributed smart contracts, developing a proof-of-concept tool named DISCO SIM. Our implementation features:
• a blockchain node simulator for our hUTXO model, comprising two alternative transaction validators: a “sequential” one that only uses one thread, and a “parallel” one that can exploit a thread pool;
• routines to simulate the on-chain execution of the scripts produced by our HURF compiler;
• routines to generate the transactions needed to deploy and execute HURF contracts;
• a few test cases, including a “centralized” crowdfund- ing contract whose state is stored in a single UTXO datum, and its “distributed” variant whose state is instead stored across many UTXOs (as discussed in Section 3).
DISCO SIM simulates the core of a blockchain node, focusing on the validation of transactions, while omitting the components related to the consensus protocol (Proof- of-Work or Proof-of-Stake), the communications with other nodes, and the permanent storage of the previously validated blocks.
More precisely, running DISCO SIM performs the following actions. We start by generating a large vector of transactions to be validated, according to the chosen test case. We start a timer, and proceed to validate the transactions in the vector. If the “sequential” validator was selected, we scan the vector and validate the transactions one by one, in a single thread. Otherwise, if the “parallel” validator was selected, a thread pool is created. The main thread scans the vector searching for the longest prefix of conflict-free transactions, as discussed in Section 5, sending them to the worker threads for validation. When the worker threads complete their task, the main thread commits the successfully validated transactions on the ledger, and repeats the whole process. Finally, after the whole transaction vector has been processed, we stop the timer and report its value.
Note that the choice of the cryptographic primi- tives used by validators may substantially impact their performance. Indeed, the main computational burden in validating a transaction comes from the verification of signatures and the computation of hashes. Choosing an unrealistically fast algorithm for signature verification or
8K users | Distributed 8K users | Distributed 50K users | |
Sequential | 6.378s | 1.375s | 9.236s |
Parallel (1+1 threads) | 4.389s | 1.410s | 9.075s |
Parallel (1+2 threads) | 4.925s | 0.756s | 4.950s |
Parallel (1+3 threads) | 7.541s | 0.577s | 3.629s |
Parallel (1+4 threads) | 8.146s | 0.504s | 3.055s |
Parallel (1+5 threads) | 8.112s | 0.504s | 3.004s |
Parallel (1+6 threads) | 9.071s | 0.513s | 3.017s |
Parallel (1+7 threads) | 9.161s | 0.489s | 2.930s |
TABLE 2: Performance measurements for the crowdfunding test case.
hash would then invalidate the experimental comparison between the sequential and the parallel implementations of validators, since the communication time between threads would nullify the advantage given by parallelization. To make our evaluation realistic, we use the same crypto- graphic primitives used by Xxxxxxx, i.e. BLAKE2b-512 for hashing, and ed25519 for digital signatures. Still, in order to simplify the generation of transactions in our test cases we opted for two shortcuts, which have a negligible impact on the resulting evaluation.
First, we chose to use sequential numbers instead of hashes as transaction IDs, since it makes it easier to generate chains of transactions that refer to each other. The actual cryptographic hash function is still used for all other purposes, including reading and updating the contract state as explained in Section 4, and in particular in Equations (2) and (3) which define the expected hashes that occur in state outputs.
Second, we chose not to sign the transactions: instead, when a signature on a transaction would need to be checked, we resort to verifying a fixed signature on a fixed message. We ensured that this step was not optimized away when compiling our tool, so not to invalidate the performance measurements.
Our tool was developed in Rust, and amounts to approximately 3000 Lines of Code and 110KB. The test cases instead amount to 1200 LoC and 41KB.
We performed a performance test to compare the sequential and parallel validators, using a Crowdfunding contract whose state involves a map to keep track of the donors and the amount they donated. We compare two distinct implementations of the contract:
• a “centralized” implementation that keeps the entire state (i.e., the donors map) in the datum field of an output, copying it each time a new donor is added;
• a “distributed” implementation that uses the approach of Section 4 to distribute the state across multiple outputs.
Our experiments compare the two implementations of the hUTXO validator (sequential vs. parallel) with respect to the two implementations of the Crowdfunding contract (centralized vs. distributed). In all experiments, each donor deposits some tokens in the contract, and then receives them back as a refund after a deadline.
The results in Table 2 show the average of ten runs on an 8-core PC with 8GB of RAM. The notation “1 + n threads” describes a main thread plus n worker threads. In our experiments, we first observed that the centralized contract is significantly less efficient: having to replicate the full state in each output requires more time and more memory. Note that each deposit operation adds an entry in the donor map, making it larger, so the overall consumed memory is quadratic w.r.t. the number of users. We also tried the centralized contract with 12K users, but it ran out of memory. By contrast, the distributed contract scales linearly, so we could run it with a much larger number of users. Table 2 shows the results for 50K users, but we also performed a single run with 500K users without running out of memory (in 28.5s with 1 + 7 threads).
We then observed that the centralized contract gen- erally runs faster with the sequential validator. For this contract, the parallel validator becomes slower when the thread pool becomes larger. This trend was expected, since the transactions in the centralized contract form a long chain where each one redeems the output of the previous one, preventing parallelization. Increasing the number of threads only increases the contention between them.
Finally, we observed that the distributed contract in- stead benefits from the parallel validator, which becomes faster with more threads. Indeed, the distributed contract test features many independent transactions since the state is distributed, allowing the parallel validator to exploit its threads to save time.
References
[1] L. Bru¨njes and X. X. Xxxxxx, “UTxO- vs account-based smart con- tract blockchain programming paradigms,” in ISoLA, ser. LNCS, vol. 12478. Springer, 2020, pp. 73–88.
[2] X. X. Xxxxxxxxx, X. Xxxxxxxx, X. Xxxxxxx, and X. Xxxxxxxx, “Adding concurrency to smart contracts,” in ACM Symposium on Principles of Distributed Computing (PODC). ACM, 2017, pp. 303–312.
[3] X. Xxxxxxx, X. Xxxxx, and I. Xxxxxx, “Practical smart contract sharding with ownership and commutativity analysis,” in ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI). ACM, 2021, pp. 1327–1341.
[4] X. Xxxxxxxxxx, X. Xxxxxxxx, and X. Xxxxxx, “A theory of transaction parallelism in blockchains,” Log. Methods Comput. Sci., vol. 17, no. 4, 2021.
[5] X. Xxxxxxxxxx, X. Xxxxxxxxx, X. Xxxxxxxx, X. Xxxxx, X. X. Xxxxx,
R. Xxxxxxxx, X. Xxxxx, X. Xxxxx, X. Xxxxx, X. Xxxxx, X. Xxxxx`,
X. Xxxxxxxxx, X. Xxxxxxx, and X. Xxxxxx, “Smart contract languages: a comparative analysis,” CoRR, vol. abs/2404.04129, 2024. [Online]. Available: xxxxx://xxx.xxx/00.00000/xxXxx.0000. 04129
[6] Xxxxxxx, “EUTXO handbook,” xxxxx://xxxxxxxx.xxx/0xx00x0x- 73ac-4c9b-844b-f215dcce0628/EUTXOhandbook for EC.pdf, 2022.
[7] M. M. T. Xxxxxxxxxxx, X. Xxxxxxx, X. XxxXxxxxx, X. Xxxxxxxxx,
X. X. Xxxxx, and X. Xxxxxx, “The extended UTXO model,” in Financial Cryptography and Data Security Workshops, ser. LNCS, vol. 12063. Springer, 2020, pp. 525–539.
[8] X. Xxxxxxx, X. Xxxxxxx, X. Ja¨a¨ger, X. Xxxxxx, X. Xxxxxxxxx,
X. Xxxx, and X. XxXxx, “Formal specification of the Cardano blockchain ledger, mechanized in Agda,” in Workshop on Formal Methods for Blockchains (FMBC), 2024.
[9] M. Mo¨ser, I. Xxxx, and X. X. Xxxxx, “Bitcoin covenants,” in Xxxxx- cial Cryptography Workshops, ser. LNCS, vol. 9604. Springer, 2016, pp. 126–141.
[10] X. X’Xxxxxx and X. Xxxxxxxxx, “Enhancing Bitcoin transactions with covenants,” in Financial Cryptography Workshops, ser. LNCS, vol. 10323. Springer, 2017.
[11] X. Xxxxxxxxxx, X. Xxxxx, and X. Xxxxxx, “Bitcoin covenants un- chained,” in ISoLA, ser. LNCS, vol. 12478. Springer, 2020, pp. 25–42.
[12] X. X. Xxxxx, “Reference inputs,” 2021, xxxxx://xxxx.xxxxxxx.xxx/xxx/ CIP-31.
[13] “Plutus fee estimator,” xxxxx://xxxx.xxxxxxx.xxx/xxxxxxx-xxxxxxx/ tools/plutus-fee-estimator, 2023.
[14] X. Xxxxxxxxx, “Solana: A new architecture for a high performance blockchain v0.8.13,” 2017. [Online]. Available: xxxxx://xxxxxx.xxx/ solana-labs/whitepaper/blob/master/solana-whitepaper-en.pdf
[15] P. S. Xxxxxx, X. Xxxxxx, X. Xxxx, X. Xxxxxx, and X. Xxxxxx, “Enti- tling concurrency to smart contracts using optimistic transactional memory,” in ICDCN, 2019, p. 508.