Using meaningful data types

Defining data structures

Data structures can be defined by combining or providing alternatives between other data structures or primitive types. The simplest data definition is the 'record' type, which is similar to a C struct.

data Hedgehog = newHedgehog(String name, Int age, Int spines, Bool alive);

newHedgehog is a constructor function, which is otherwise treated like a normal function, and creates an instance of the Hedgehog type. Type names must begin with a capital letter to distinguish them from type variables (more on type variables later). The Hedgehog type can be used as follows:

Void main() {
    x = newHedgehog("Henry the hedgehog",6, 20000, true);
    putStr( x.name + " is " + x.age + " years old");

    x.name = "Fred"; // Assignment works too
    birthday(p);
}

Void birthday(Hedgehog h) {
    if (h.alive) {
        h.age += 1;
    }
}

Custom types may be used as list elements exactly as a normal type, and all normal list operations and functions work as expected.

// gets an array of hedgehogs
[Hedgehog] family() {
    family = [];
    for name@i in ["Henry","Fred","Sally"] {
      push(family,newHedgehog(name,i+2,10000+(i*4000),true));
    }
    return family;
}

As shown in this example, the '.' operator can be used to extract parts of a type for retrieval or setting. There is no need, in the birthday function to use the var keyword, because as in the basic tutorial, the function parameter is only a shallow copy.

The namespaces for types and functions are separate, so the constructor function may be named after the type:

data Person = Person(String name, Int age);

which may, since the two are the same, be shortened to

data Person(String name, Int age);

The remainder of this tutorial looks at polymorphic and algebraic types, where the real power of Kaya's type system is.

Polymorphic types

Where a type definition is not strongly dependent on the component types, a type variable may be used. Type variables must begin with a lower-case letter, and represent any type (though always the same type in the same instance)

data Pair<a,b>(a fst, b snd);
data SamePair<a,a>(a fst, a snd);

The Pair data type pairs together two pieces of arbitrary data. The SamePair type is similar, but has the added restriction that both pieces must have the same type. While the type definition contains a type variable, any particular instance of the type has a fixed type - the following code is not allowed.

arr = [];
arr[0] = Pair(3,2.2);
arr[1] = Pair("foo",false);
// compile-time error because the types do not match

Polymorphic types in functions

Functions can also have polymorphic types. For example, this function will reverse any array.

[a] reverse([a] xs) {
    pos=size(xs)-1;
    x=0;
    while(pos>=0) {
	newxs[x]=xs[pos];
	x=x+1;
	pos=pos-1;
    }
    return newxs;
}

The allMatch function from the basic tutorial can be rewritten like this:

Bool allMatch([a] xs, Bool(a) test) {
    for x in xs {
        if (!test(x)) {
            return false;
        } 
    }
    return true;
}

The example of code using this function to determine if all integers in an array are odd will still work exactly as before.

A polymorphic function must not use any of its polymorphic arguments in a more specific way:

// this is not allowed
public Void printAnything(a val) {
  putStrLn(String(val));
}

Polymorphic functions as types of function parameters.

A function may take parameters that are themselves polymorphic functions, for example:

b apply(b(a) fn, a val) {
    return fn(val);
}

Neither the return type nor any of the arguments of a polymorphic function parameter may be Void.

Types with multiple constructors

It is possible for a type to have more than one constructor function (the practical upper limit is 65536 constructor functions). These are known as algebraic data types (ADTs), commonly found in functional languages such as Haskell or Ocaml, but rarely found in imperative languages. They often provide a natural way to describe data where an object can take one of many different forms (for example, a binary tree can be either empty or have two subtrees).

A simple linked list can be defined as:

data List<a> = nil
             | cons(a head, List<a> tail);

Note that this particular type is defined recursively. The vertical bar divides the two constructors:

This data type can then be initialised - in this case to a list of integers - with:

list = cons(5,cons(3,cons(10,nil)));

This makes a three-element linked list. Obviously a list of arbitrary length could be initialised this way, but the numbers of brackets involved might make a clearer method appropriate!

Having put the data into the list, retrieving it requires telling the difference between the two constructors. The case statement is used for this. The following function (recursively) prints a list of Strings.

Void printList(List<String> list) {
    case list of {
        cons(head,tail) -> putStrLn(head);
                           printList(tail);
      | nil -> return; // stop at this point.
    }
}

On execution, the program will examine each clause (consisting of a pattern and code to execute if the pattern matches) in turn. Code for the first pattern which matches the form of list will be executed. If no pattern matches, a Missing_Case exception is thrown. An underscore indicates a part of a pattern which can match anything --- the following function uses a match anything case to return a default value if the list is empty.

a getHead(List<a> list, a defval) {
    case list of {
        cons(head,tail) -> return head;
        | _ -> return defval;
    }
}

Arbitrarily complex patterns are allowed. For example, the following function returns the second element in a list of integers, provided that the first element is 0, and throws an exception otherwise:

Int getSecondWhenFirstZero(List<Int> xs) {
    case xs of {
        cons(0,cons(val, _)) -> return val;
        | _ -> throw(Invalid_Value);
    }
}

Use of case statements is useful for avoiding complex nested if statements.

In versions of Kaya before 0.2.7 this syntax was not available, and only simple patterns were allowed. The old form of patterns (with default used instead of _) is now deprecated and will soon be removed. In the new pattern syntax, default no longer has any special meaning.

Other methods of extracting values

Since the List type has a parameter called head, it's possible to writelist.head to extract the head element from a List<a> called list. If the list was constructed using the nil constructor function, it won't have a head element, and a run-time error will be thrown. Therefore, using case statements is much better for types with multiple constructors.

Extracting parameters by name is not possible if the fields are unnamed, and this may sometimes be desirable:

data Tree<a> = leaf(a)
             | node(Tree<a>,Tree<a>);

The let keyword may also be used to extract particular values from a data type. If the data type has multiple constructors, then this will fail if the wrong constructor is used. Underscores may be used to specify values to be ignored in the extraction.

tree = node(node(leaf(1),leaf(10)),
            node(node(leaf(3),leaf(5)),leaf(4)));

let node(node(_,aleaf),node(_,leaf(leafval))) = tree;
// aleaf = leaf(10); leafval = 4;

Function overloading

Because the printing method for a variable depends on its type, we sadly can't make a single list printing function for all types of list. However, you can make multiple list printing functions with the same name:

Void printList(List<String> list) {
    case list of {
        cons(head,tail) -> putStrLn(head);
                           printList(tail);
      | nil -> return; // stop at this point.
    }
}

Void printList(List<Int> list) {
    case list of {
        cons(head,tail) -> putStrLn(String(head));
                           printList(tail);
      | nil -> return; // stop at this point.
    }
}
  
Void printList(List<Float> list) {
    case list of {
        cons(head,tail) -> putStrLn(String(head));
                           printList(tail);
      | nil -> return; // stop at this point.
    }
}

Provided the functions have different types, this is allowed. This is currently only a convenience feature - it's not yet possible to call printList on a List<a> and have it "do the right thing" when the program is run.

An example from the standard prelude

The following two functions from the standard prelude both return the absolute value of a number.

Int abs(Int x) {
    if (x < 0) {
        return -x;
    } else {
        return x;
    }
}

Float abs(Float x) {
    if (x < 0.0) {
        return -x;
    } else {
        return x;
    }
}

A call to the abs function will automatically use the right function for the type of the input or output.

i = abs(-5); // calls Int(Int) version
j = abs(-4.5); // calls Float(Float) version
Int k = abs(3.7); // error - there is no abs() function of type Int(Float)!

Data types in web applications

Web applications have two major data types. HTMLDocument, which represents a HTML document, including the HTTP headers it is sent with, and ElementTree, which represents an element in HTML, and its structure. If you're unfamiliar with the terminology, in <p>Welcome to <em>this</em> paragraph</p>, <p> is the opening tag of the "p" element, </p> is the closing tag, and everything between them is the contents of the element (which includes some text and another element).

The body of the HTMLDocument is built up out of nested ElementTrees, in a similar way to a web browsers DOM representation of the document.

Iteration over data types using 'for'

To allow the syntax for x@i in collection { ... } there must be a traverse function defined for that data type. Traverse functions must be of the form, for a data type Collection<a>, Void traverse(Bool(a, Int) block, Collection<a> collection). For example, for the List data type above, the following traverse function is used:

public Void traverse(Bool(a, Int) block, List<a> list) {
    i = 0;
    repeat case list of {
        nil() -> break;
        | cons(x,list) -> if (!block(x, i)) { return; } i++;
    }
}

The function should iterate over the data structure in a sensible way, calling the block function for each item, and incrementing the index variable after doing so. The block function returns false if traversal should terminate (i.e., if a break statement has been executed). The function should then return cleanly when iteration is complete. Users should never need to call the traverse function directly.

Some examples of data types

'Maybe' (representing possible failure)

data Maybe<a> = nothing | just(a val);

This type is defined in the standard Prelude, and is used to represent the possibility of failure. Because the nothing value can be stored, this often makes more sense than throwing an exception. For example, a set of preferences between options may be expressed as [Maybe<Option>].

data Option(Int optid, String optname);
Void printPrefs([Maybe<Option>] preferences) {
    for pref@idx in preferences {
        putStr("Preference "+(idx+1)+": ");
        case pref of {
            nothing -> putStrLn("unspecified");
            |just(opt) -> putStrLn(opt.optname);
        }
    }
}

Binary Trees

data Tree<a> = Leaf(a val) 
             | Node(Tree<a> left, Tree<a> right);

Directory Trees

data Content = FileEntry(String fname)
             | DirEntry(String fname, [Content] subtree);

Exercises

  1. Create a data type (or set of data types) to represent a pack of playing cards. Shuffle the pack, and print out the cards in the new order.
  2. Write a function to find the largest item in a list, where the type of the list is [a] (i.e. any type). Have a look at the sort documentation if you need ideas, but remember this can be solved more efficiently than sorting the list and taking the last item!
  3. Create a data type for an arbitrary tree (i.e. each node may have zero or more descendants), and write a function to traverse the tree and print it in a way that describes the hierarchy (e.g. indent successive levels by a couple of spaces, for which the rep function may be useful.
kaya@kayalang.org | Last modified 4 September 2008 | Supported by Durham CompSoc | Powered by Kaya