Skip to content

Working with pointers

So far we’ve only considered functions manipulating numeric values. Specifications become more interesting when pointers are involved, because the safety of memory accesses via pointers has to be taken into account.

CN uses separation logic resources and the concept of ownership to reason about memory accesses. A resource, intuitively, represents the permission to access a region of memory.

Let’s look at a simple example. The function read takes an integer pointer p and returns the pointee value.

exercises/read.c
unsigned int read (unsigned int *p)
{
  return *p;
}

Running cn test on this example produces an error that looks like this:

Testing read::read:
FAILED

************************ Failed at *************************
function read, file ./read-exec.c, line 19
Load failed.
  ==> 0x12294ca70[0] (0x12294ca70) not owned
********************** Failing input ***********************

unsigned int* p = (unsigned int*)(12294ca70);
read(p);

For the read *p to be safe, we need to know that the function has permission to access the memory pointed to by p. We next explain how to represent this permission.

RW resources

Given a C-type T and pointer p, the resource RW<T>(p) asserts ownership of a memory region at location p of the size of the C type T.

In this example, we can ensure the safe execution of read by adding a precondition that requires ownership of RW<unsigned int>(p), as shown below. (The take ... = part will be explained shortly.) Since reading the pointer does not disturb its value, we can also add a corresponding postcondition, whereby read returns ownership of p after it is finished executing, in the form of another RW<unsigned int>(p) resource.

solutions/read.c
unsigned int read (unsigned int *p)
/*@ requires take P = RW<unsigned int>(p);
    ensures take P_post = RW<unsigned int>(p);
@*/
{
  return *p;
}

This specification can be read as follows:

  • any function calling read has to be able to provide a resource RW<unsigned int>(p) to pass into read, and

  • the caller will receive back a resource RW<unsigned int>(p) when read returns.

Pointee values

In addition to reasoning about memory accesed by pointers, we likely also want to reason about the actual values that the pointers point to. The take P = in the precondition assigns the name P to the pointee value of p.

That means we can use the pointee values from the pre- and postcondition to strengthen the specification of read. We add two new postconditions specifying

  1. that read returns P, the initial pointee value of p, and
  2. that the pointee values P and P_post before and after execution of read (respectively) are the same.
exercises/read2.c
unsigned int read (unsigned int *p)
/*@ requires take P = RW<unsigned int>(p);
    ensures take P_post = RW<unsigned int>(p);
            return == P;
            P_post == P;
@*/
{
  return *p;
}
Aside (for separation-logic experts)

Aside. In standard separation logic, the equivalent specification for read could have been phrased as follows (where \return binds the return value in the postcondition):

∀p.
∀P.
{ p ↦ P }
read(p)
{ \return. ∃P_post. (p ↦ P_post) and return = P and P = P_post }

CN’s take notation is just an alternative syntax for quantification over the values of resources, but a useful one: the take notation syntactically restricts how these quantifiers can be used to ensure CN can always infer them.

Exercises

Exercise: Write a specification for double_it, which takes a pointer p and returns double the pointee value. Running cn test on this correct implementation should succeed,

exercises/double_it.c
unsigned int double_it (unsigned int *p)
{
  unsigned int n = *p;
  unsigned int m = n + n;
  return m;
}
while running cn test on this incorrect implementation
  unsigned int n = *p;
  unsigned int m = n + n;
  *p = 0;
  return m;
and this incorrect implementation
  unsigned int n = *p;
  unsigned int m = n + n + n;
  return m;
should fail.

Writing through pointers

We next have an example where data is written to a pointer. The function incr takes a pointer p and increments the value in the memory cell that it points to.

exercises/slf0_basic_incr.c
void incr (unsigned int *p)
/*@ requires take P = RW<unsigned int>(p);
    ensures take P_post = RW<unsigned int>(p);
            P_post == P + 1u32;
@*/
{
  unsigned int n = *p;
  unsigned int m = n+1;
  *p = m;
}

The precondition binds the initial pointee value to P. The postcondition binds the value after function execution to P_post, and uses this to express that the value p points to is incremented by incr: P_post == P + 1i32.

Exercises

Exercise: Write a specification for inplace_double, which takes a pointer p and doubles the pointee value. Make sure your postcondition captures the function's intended behavior.

exercises/slf3_basic_inplace_double.c
void inplace_double (unsigned int *p)
{
  unsigned int n = *p;
  unsigned int m = n + n;
  *p = m;
}

No memory leaks

In the specifications we have written so far, functions that receive resources as part of their precondition also return this ownership in their postcondition.

Let’s try the read example from earlier again, but without such a postcondition:

exercises/read.broken.c
unsigned int read (unsigned int *p)
/*@ requires take P = RW<unsigned int>(p); @*/
{
  return *p;
}

CN rejects this program with the following message:

Testing read::read:
FAILED

************************ Failed at *************************
function read, file ./read.broken-exec.c, line 26
Postcondition leak check failed, ownership leaked for pointer 0x1201e8a78

********************** Failing input ***********************

void* p0 = malloc(4);
*((unsigned int*)p0) = 17;
unsigned int* p = (unsigned int*)(p0);
read(p);

The error from cn test report tells us (1) in which function the error occurred, (2) what happened ("ownership leaked"), and (3) a failing input — i.e., a snippet of C code that will construct a heap state on which the test fails in this way.

What went wrong here is that, given the above specification, read leaks memory: it takes ownership, does not return it, but also does not deallocate the RW memory or otherwise dispose of it. CN requires that every resource passed into a function has to be either returned to the caller or else destroyed by deallocating the RW area of memory (as we shall see later).

CN’s motivation for this choice is its focus on low-level systems software in which memory is managed manually; in this context, memory leaks are typically very undesirable. As a consequence, function specifications have to do precise bookkeeping of their resource footprint and, in particular, return any unused resources back to the caller.

Disjoint memory regions

When functions manipulate multiple pointers, we can assert ownership of each one, just like before. But there is an additional twist: simultaneously owning resources for two pointers implies that these pointers refer to disjoint regions of memory.

Consider this example to see when disjointness matters:

exercises/five_six.c
unsigned int five_six(unsigned int *p, unsigned int *q) 
/*@ requires take P = RW<unsigned int>(p);
             take Q = RW<unsigned int>(q);
    ensures  take P_post = RW<unsigned int>(p);
             take Q_post = RW<unsigned int>(q);
             return == 5u32;
@*/
{
    *p = 5;
    *q = 6;
    return *p;
}

The postcondition claims that the function returns 5. Observe that this is only true when p and q are disjoint; otherwise, the write to q would override the write to p. In CN, we can make this assumption for free — no extra work is needed to assert that the pointers are disjoint.

Exercises

Exercise: Write a specification for the function transfer, shown below.

exercises/slf8_basic_transfer.c
void transfer (unsigned int *p, unsigned int *q)
{
  unsigned int n = *p;
  unsigned int m = *q;
  unsigned int s = n + m;
  *p = s;
  *q = 0;
}

Ownership of structured objects

So far, our examples have worked with just integers and pointers, but larger programs typically also manipulate compound values, often represented using C structs. Specifying functions manipulating structs works in much the same way as with basic types.

For instance, the following example uses a struct point to represent a point in two-dimensional space. The function transpose swaps a point’s x and y coordinates.

exercises/transpose.c
struct point { unsigned int x; unsigned int y; };

void transpose (struct point *p) 
/*@ requires take P = RW<struct point>(p);
    ensures take P_post = RW<struct point>(p);
            P_post.x == P.y;
            P_post.y == P.x;
@*/
{
  unsigned int temp_x = p->x;
  unsigned int temp_y = p->y;
  p->x = temp_y;
  p->y = temp_x;
}

Here we have RW<struct point>(p) resources, with the appropriate type of p filled in. Accordingly, P and P_post are values with type struct point, i.e., they are records with members x and y. The postcondition asserts the coordinates have been swapped by referring to the members of P and P_post individually.