The Scilla checker

The Scilla checker (scilla-checker) works as a compiler frontend, parsing the contract and performing a number of static checks including typechecking.

Phases of the Scilla checker

The Scilla checker operates in distinct phases, each of which can perform checks (and potentially reject contracts that do not pass the checks) and add annotations to each piece of syntax:

  • Lexing and parsing reads the contract code and builds an abstract syntax tree (AST). Each node in the tree is annotated with a location from the source file in the form of line and column numbers.
  • ADT checking checks various constraints on user-defined ADTs.
  • Typechecking checks that values in the contract are used in a way that is consistent with the type system. The typechecker also annotates each expression with its type.
  • Pattern-checking checks that each pattern-match in the contract is exhaustive (so that execution will not fail due to no match being found), and that each pattern can be reached (so that the programmer does not inadvertently introduce a pattern branch that can never be reached).
  • Event-info checks that messages and events in the contract contain all necessary fields. For events, if a contract emits two events with the same name (_eventname), then their structure (the fields and their types) must also be the same.
  • Cashflow analysis analyzes the usage of variables, fields and ADTs, and attempts to determine which fields are used to represent (native) blockchain money. No checks are performed, but expressions, variables, fields and ADTs are annotated with tags indicating their usage.
  • Payment acceptance checking checks contracts for payment acceptances. It raises a warning if a contract has no transitions which accept payment. Also, the check raises a warning if a transition has any code path with potentially multiple accept statements in it. This check does not raise an error since it is not possible via static analysis to know for sure in all cases whether multiple accept statements would be reached if present, since this can be dependent on conditions which are only known at run-time. The Scilla checker only performs this check if only -cf flag is specified on the command line, i.e. it is performed together with the cashflow analysis. For instance, fungible token contracts don’t generally need accept statements, hence this check is not mandatory.
  • Sanity-checking performs a number of minor checks, e.g., that all parameters to a transition or a procedure have distinct names.

Annotations

Each phase in the Scilla checker can add an annotation to each node in the abstract syntax tree. The type of an annotation is specified through instantiations of the module signature Rep. Rep specifies the type rep, which is the type of the annotation:

module type Rep = sig
  type rep
  ...
end

In addition to the type of the annotation, the instantiation of Rep can declare helper functions that allow subsequent phases to access the annotations of previous phases. Some of these functions are declared in the Rep signature, because they involve creating new abstract syntax nodes, which must be created with annotations from the parser onward:

module type Rep = sig
  ...

  val mk_id_address : string -> rep ident
  val mk_id_uint128 : string -> rep ident
  val mk_id_bnum    : string -> rep ident
  val mk_id_string  : string -> rep ident

  val rep_of_sexp : Sexp.t -> rep
  val sexp_of_rep : rep -> Sexp.t

  val parse_rep : string -> rep
  val get_rep_str: rep -> string
end

mk_id_<type> creates an identifier with an appropriate type annotation if annotation is one of the phases that has been executed. If the typechecker has not yet been executed, the functions simply create an (untyped) identifier with a dummy location.

rep_of_sexp and sexp_of_rep are used for pretty-printing. They are automatically generated if rep is defined with the [@@deriving sexp] directive.

parse_rep and get_rep_str are used for caching of typechecked libraries, so that they do not need to be checked again if they haven’t changed. These will likely be removed in future versions of the Scilla checker.

As an example, consider the annotation module TypecheckerERep:

module TypecheckerERep (R : Rep) = struct
  type rep = PlainTypes.t inferred_type * R.rep
  [@@deriving sexp]

  let get_loc r = match r with | (_, rr) -> R.get_loc rr

  let mk_id s t =
    match s with
    | Ident (n, r) -> Ident (n, (PlainTypes.mk_qualified_type t, r))

  let mk_id_address s = mk_id (R.mk_id_address s) (bystrx_typ address_length)
  let mk_id_uint128 s = mk_id (R.mk_id_uint128 s) uint128_typ
  let mk_id_bnum    s = mk_id (R.mk_id_bnum s) bnum_typ
  let mk_id_string  s = mk_id (R.mk_id_string s) string_typ

  let mk_rep (r : R.rep) (t : PlainTypes.t inferred_type) = (t, r)

  let parse_rep s = (PlainTypes.mk_qualified_type uint128_typ, R.parse_rep s)
  let get_rep_str r = match r with | (_, rr) -> R.get_rep_str rr

  let get_type (r : rep) = fst r
end

The functor (parameterized structure) takes the annotation from the previous phase as the parameter R. In the Scilla checker this previous phase is the parser, but any phase could be added in-between the two phases by specifying the phase in the top-level runner.

The type rep specifies that the new annotation is a pair of a type and the previous annotation.

The function get_loc merely serves as a proxy for the get_loc function of the previous phase.

The function mk_id is a helper function for the mk_id_<type> functions, which create an identifier with the appropriate type annotation.

The mk_rep function is a helper function used by the typechecker.

Prettyprinting does not output the types of AST nodes, so the functions parse_rep and get_rep_str ignore the type annotations.

Finally, the function get_type provides access to type information for subsequent phases. This function is not mentioned in the Rep signature, since it is made available by the typechecker once type annotations have been added to the AST.

Abstract syntax

The ScillaSyntax functor defines the AST node types. Each phase will instantiate the functor twice, once for the input syntax and once for the output syntax. These two syntax instantiations differ only in the type of annotations of each syntax node. If the phase produces no additional annotations, the two instantiations will be identical.

The parameters SR and ER, both of type Rep, define the annotations for statements and expressions, respectively.

module ScillaSyntax (SR : Rep) (ER : Rep) = struct

  type expr_annot = expr * ER.rep
  and expr = ...

  type stmt_annot = stmt * SR.rep
  and stmt = ...
end

Initial annotation

The parser generates the initial annotation, which only contains information about where the syntax node is located in the source file. The function get_loc allows subsequent phases to access the location.

The ParserRep structure is used for annotations both of statements and expressions.

module ParserRep = struct
  type rep = loc
  [@@deriving sexp]

  let get_loc l = l
  ...
end

Typical phase

Each phase that produces additional annotations will need to provide a new implementation of the Rep module type. The implementation should take the previous annotation type (as a structure implementing the Rep module type) as a parameter, so that the phase’s annotations can be added to the annotations of the previous phases.

The typechecker adds a type to each expression node in the AST, but doesn’t add anything to statement node annotations. Consequently, the typechecker only defines an annotation type for expressions.

In addition, the Rep implementation defines a function get_type, so that subsequent phases can access the type in the annotation.

module TypecheckerERep (R : Rep) = struct
  type rep = PlainTypes.t inferred_type * R.rep
  [@@deriving sexp]

  let get_loc r = match r with | (_, rr) -> R.get_loc rr

  ...
  let get_type (r : rep) = fst r
end

The Scilla typechecker takes the statement and expression annotations of the previous phase, and then instantiates TypeCheckerERep (creating the new annotation type), ScillaSyntax (creating the abstract syntax type for the previous phase, which serves as input to the typechecker), and ScillaSyntax again (creating the abstract syntax type that the typechecker outputs).

module ScillaTypechecker
  (SR : Rep)
  (ER : Rep) = struct

  (* No annotation added to statements *)
  module STR = SR
  (* TypecheckerERep is the new annotation for expressions *)
  module ETR = TypecheckerERep (ER)

  (* Instantiate ScillaSyntax with source and target annotations *)
  module UntypedSyntax = ScillaSyntax (SR) (ER)
  module TypedSyntax = ScillaSyntax (STR) (ETR)

  (* Expose target syntax and annotations for subsequent phases *)
  include TypedSyntax
  include ETR

  (* Instantiate helper functors *)
  module TU = TypeUtilities (SR) (ER)
  module TBuiltins = ScillaBuiltIns (SR) (ER)
  module TypeEnv = TU.MakeTEnv(PlainTypes)(ER)
  module CU = ScillaContractUtil (SR) (ER)
  ...
end

Crucially, the typechecker module exposes the annotations and the syntax type that it generates, so that they can be made available to the next phase.

The typechecker finally instantiates helper functors such as TypeUtilities and ScillaBuiltIns.

Cashflow Analysis

The cashflow analysis phase analyzes the usage of a contract’s variables, fields, and ADT constructor, and attempts to determine which fields and ADTs are used to represent (native) blockchain money. Each contract field is annotated with a tag indicating the field’s usage.

The resulting tags are an approximation based on the usage of the contract’s fields, variables, and ADT constructors. The tags are not guaranteed to be accurate, but are intended as a tool to help the contract developer use her fields in the intended manner.

Running the analysis

The cashflow analysis is activated by running scilla-checker with the option -cf. The analysis is not run by default, since it is only intended to be used during contract development.

A contract is never rejected due to the result of the cashflow analysis. It is up to the contract developer to determine whether the cashflow tags are consistent with the intended use of each contract field.

The Analysis in Detail

The analysis works by continually analysing the transitions and procedures of the contract until no further information is gathered.

The starting point for the analysis is the incoming message that invokes the contract’s transition, the outgoing messages and events that may be sent by the contract, the contract’s account balance, and any field being read from the blockchain such as the current blocknumber.

Both incoming and outgoing messages contain a field _amount whose value is the amount of money being transferred between accounts by the message. Whenever the value of the _amount field of the incoming message is loaded into a local variable, that local variable is tagged as representing money. Similarly, a local variable used to initialise the _amount field of an outgoing message is also tagged as representing money.

Conversely, the message fields _sender, _origin, _recipient, and _tag, the event field _eventname, the exception field _exception, and the blockchain field BLOCKNUMBER, are known to not represent money, so any variable used to initialise those fields or to hold the value read from one of those fields is tagged as not representing money.

Once some variables have been tagged, their usage implies how other variables can be tagged. For instance, if two variables tagged as money are added to each other, the result is also deemed to represent money. Conversely, if two variables tagged as non-money are added, the result is deemed to represent non-money.

Tagging of contract fields happens when a local variable is used when loading or storing a contract field. In these cases, the field is deemed to have the same tag as the local variable.

Tagging of custom ADTs is done when they are used for constructing values, and when they are used in pattern-matching.

Once a transition or procedure has been analyzed, the local variables and their tags are saved and the analysis proceeds to the next transition or procedure while keeping the tags of the contract fields and ADTs. The analysis continues until all the transitions and procedures have been analysed without any existing tags having changed.

Tags

The analysis uses the following set of tags:

  • No information: No information has been gathered about the variable. This sometimes (but not always) indicates that the variable is not being used, indicating a potential bug.
  • Money: The variable represents money.
  • Not money: The variable represents something other than money.
  • Map t (where t is a tag): The variable represents a map or a function whose co-domain is tagged with t. Hence, when performing a lookup in the map, or when applying a function on the values stored in the map, the result is tagged with t. Keys of maps are assumed to always be Not money. Using a variable as a function parameter does not give rise to a tag.
  • T t1 … tn (where T is an ADT, and t1 … tn are tags): The variable represents a value of an ADT, such as List or Option. The tags t1 … tn correspond to the tags of each type parameter of the ADT. (See the simple example further down.)
  • Inconsistent: The variable has been used to represent both money and not money. Inconsistent usage indicates a bug.

Library and local functions are only partially supported, since no attempt is made to connect the tags of parameters to the tag of the result. Built-in functions are fully supported, however.

A simple example

Consider the following code snippet:

match p with
| Nil =>
| Cons x xs =>
  msg = { _amount : x ; ...}
  ...
end

x is used to initialise the _amount field of a message, so x gets tagged with Money. Since xs is the tail of a list of which x is the first element, xs must be a list of elements with the same tag as x. xs therefore gets tagged with List Money, corresponding to the fact that the List 'A type has one type parameter.

Similarly, p is matched against the patterns Nil and Cons x xs. Nil is a list, but since the list is empty we don’t know anything about the contents of the list, and so the Nil pattern corresponds to the tag List (No information). Cons x xs is also a list, but this time we do know something about the contents, namely that the first element x is tagged with Money, and the tail of the list is tagged with List Money. Consequently, Cons x xs corresponds to List Money.

Unifying the two tags List (No information) and List Money gives the tag List Money, so p gets tagged with List Money.

ADT constructor tagging

In addition to tagging fields and local variables, the cashflow analyser also tags constructors of custom ADTs.

To see how this works, consider the following custom ADT:

type Transaction =
| UserTransaction of ByStr20 Uint128
| ContractTransaction of ByStr20 String Uint128

A user transaction is a transaction where the recipient is a user account, so the UserTransaction constructor takes two arguments: An address of the recipient user account, and the amount to transfer.

A contract transaction is a transaction where the recipient is another contract, so the ContractTransaction takes three arguments: An address of the recipient contract, the name of the transition to invoke on the recipient contract, and the amount to transfer.

In terms of cashflow it is clear that the last argument of both constructors is used to represent an amount of money, whereas all other arguments are used to represent non-money. The cashflow analyser therefore attempts to tag the arguments of the two constructors with appropriate tags, using the principles described in the previous sections.

A more elaborate example

As an example, consider a crowdfunding contract written in Scilla. Such a contract may declare the following immutable parameters and mutable fields:

contract Crowdfunding

(*  Parameters *)
(owner     : ByStr20,
max_block : BNum,
goal      : Uint128)

(* Mutable fields *)
field backers : Map ByStr20 Uint128 = ...
field funded : Bool = ...

The owner parameter represents the address of the person deploying the contract. The goal parameter is the amount of money the owner is trying to raise, and the max_block parameter represents the deadline by which the goal is to be met.

The field backers is a map from the addresses of contributors to the amount of money contributed, and the field funded represents whether the goal has been reached.

Since the field goal represents an amount of money, goal should be tagged as Money by the analysis. Similarly, the backers field is a map with a co-domain representing Money, so backers should be tagged with Map Money.

Conversely, both owner, max_block and funded represent something other than money, so they should all be tagged with Not money.

The cashflow analysis will tag the parameters and fields according to how they are used in the contract’s transitions and procedures, and if the resulting tags do not correspond to the expectation, then the contract likely contains a bug somewhere.