Meaningful data types

Kaya's powerful type system makes it easy to meaningfully represent complex data structures in your program. This helps readability and maintainability, as well as helping to catch a large class of errors at compile time. Type inference means that the need to explicitly write types down is minimised.

Static typing with type inference

Kaya is statically-typed, which makes it much easier to ensure data integrity and catch potential mistakes at compile time rather than run time. Unlike many statically-typed languages, however, Kaya has type inference (and automatic type conversion where safe) to keep the extra typing needed to a minimum. In general, you should never need to specify types explicitly outside function and data declarations.

String printSquares(Int num) {
    output = ""; 
    for i in [1..num] { 
        // types of 'i' and 'output' are obvious, so no need to give them.
        square = i*i; // 'square' is another Int
        output += i+"*"+i+"="+square+"\n";
// because the type of the whole expression is String, and Int->String
// conversions are 'safe', there's no need to explicitly say 
// String(i)+"*"+String(i)+etc...
    }
    return output;
}

Primitive types

Kaya provides the following primitive types (in addition to numerous other data types defined in the standard library)

Additionally, automatically-sized lists of any type (including containing other lists) may be constructed.

jigsaw = [[]];
for x in [0..xsize] {
  for y in [0..ysize] {
    jigsaw[x][y] = getPiece(x,y);
  }
}

Algebraic data types

Kaya supports algebraic data types (ADTs), commonly found in functional languages such as Haskell or Ocaml, but rare in imperative languages. These allow easy construction of meaningful data types to represent arbitrarily complex data.

Linked lists

A linked list consists of a sequence of nodes, each pointing to the next node. A list node is either empty, or a pair of a value and a pointer to the next node. This can be represented succinctly as a Kaya ADT as follows:

data List<a> = nil | cons(a head, List<a> tail);

The List data type has constructors nil and cons, which behave as ordinary functions. ADTs can be inspected by pattern matching with the case construct:

xs = cons(5,cons(6,nil));
case xs of {
     nil -> putStrLn("Empty");
   | cons(x,nil) -> putStrLn("One element, "+x);
   | cons(x, cons(y, ys)) -> putStrLn("Two or more elements");
}

'Maybe' in Kaya

A simple and common ADT is the 'Maybe' type, used to represent expected failure in a function. It is defined as:

data Maybe<a> = nothing | just(a val);

This function uses it in an equivalent to libcgi's or PHP's strpos function.

Maybe<Int> strpos(String needle, String haystack) {
    if (length(needle) > length(haystack)) {
        return nothing;
    }
    for i in [0..length(haystack)-length(needle)] {
        if (substr(haystack,i,length(needle)) == needle) {
            return just(i);
        }
    }
    return nothing;
}

In this example, the return type when the needle is not found in the haystack is very different to the type when it is found. This allows code calling this function to be written in a way that can be at least partly checked at compile time.

For comparison, the C libcgi strpos() function always returns 'int', returning -1 if the needle is not found in the haystack. The PHP strpos() function is even more confusing, returning 'false' if the needle is not found in the haystack, and the dynamic typing then makes it easy to confuse 'found at position 0' and 'not found'.

While both C and PHP have a way to distinguish the two cases, errors in implementation when a programmer uses the strpos() function will only be found at run-time. In Kaya, the use of the 'Maybe' type lets some obvious errors in implementation be caught at compile-time.

Representing complex data with ADTs

It is possible to define complex data types using ADTs, and so accurately and safely handle information. For example, the following code is used (with minor modifications) in the Kaya standard library to describe XML-like trees:


public data ElementTree([Element] elements, 
                        String name,
                        TinyDict<String,String> attributes);
public data Element = SubElement(ElementTree nested) |
                      CData(String cdata);

The content of an XML element is made up of a sequence of other elements and of 'anonymous elements' containing text. The Element data type allows these two possibilities to be represented meaningfully. It is then possible, for example, to traverse the tree to extract all element text:

String extractText(ElementTree el) {
    extracted = "";
    for sub in el.elements {
        case sub of {
            SubElement(nested) -> extracted += extractText(nested);
            | CData(cdata) -> extracted += cdata;
        }
    }
    return extracted;
}

By storing data in meaningful types, the code required to use the data can be kept short and easy to debug.

Polymorphic functions and data types

Both data types and functions may be polymorphic, provided that their actions do not need to depend on the properties of a specific data type. For example, the Maybe type above can contain a possible value of any other type (though any particular instance will always have the same type). The deref function is defined as:

public a deref(Maybe<a> v) {
  case v of {
    nothing -> throw(CantDerefNothing);
    | just(a) -> return a;
  }
}

The type variable a is used to make the function generic so that it can act on Maybe<Int>, Maybe<String>, and Maybe<UserDefinedType> equally.

References to functions can also be polymorphic. The any function checks to see if any of the values in an array match a specified predicate.

public Bool any(Bool(a) pred, [a] xs) {
    for x in xs {
	if (pred(x)) {
	    return true;
	}
    }
    return false;
}

The compiler will check at compile time that the predicate function supplied takes a parameter of the same type as the array members, and so this single function can be used on any data type.

Overloading functions

In some cases, the behaviour of a function is dependent on the input type, but several functions have equivalent behaviour across several input types. For example, the following functions both convert their input value into an Int.

Int sum([Int] ns) {
    sum = 0;
    for n in ns {
        sum += n;
    }
    return sum;
}

Int sum(List<Int> l) {
    case l of {
        nil -> return 0;
        | cons(head,tail) -> return head+sum(tail);
    }
}

Because the parameter types of these two functions are different, then they can both be called simply with total = sum(numbers); and the compiler will pick the appropriate function based on the type of the numbers variable.

kaya@kayalang.org | Last modified 29 November 2011 | Supported by Durham CompSoc | Powered by Kaya