
The C3 Blog

More on error handling in C3

Originally from: https://dev.to/lerno/more-on-error-handling-in-c3-3bee

When we left off, C3 was looking like this:

int! index = atoi(readLine());
if (index) {
  printf("Thx for the number\n");
  // Index is now int.
  ...
}

I somewhat off-handedly mentioned that some sort of guard statement would be needed to extract the error, and that we would also need to handle things like index && index > 0.

As usual in language design, things become less easy the more you flesh out the spec.

The first obvious problem is using if (index) for unwrapping.

Here's a problematic piece of code:

bool! b = someCall();
// Is this checking if b is true or non error?
if (b) { ... }

A way around this would be to explicitly indicate the success check:

bool! b = someCall();
// Use the ? to indicate unwrapping
if (b?) { ... }

This seems fine, but now that we've made b? do implicit unwrapping, we're making some pretty complicated things possible:

bool! b = someCall();
if (i > 0 && b? && ((b = someCall())? || i > 100)) { ... }

In the example above the compiler has to figure out that b might possibly have an error...

To deal with this we need real, full flow typing, which increases the complexity of implementing the compiler by quite a bit. And that's not the only problem: flow typing means types change implicitly. Take a quick look at the code above: is it easy to see that b will not be unwrapped in the body?

So flow typing has both advantages and disadvantages.

One of the core principles I try to follow when building this language is that it should not be hard to write a compiler for it. It is by necessity a multi-pass compiler, but everything else is nice to keep simple.

There are ways to do that. For example, unwrapping might require what other languages call an "if-let":

bool! b = someCall();
if (bool b1 = b?) { ... }

Here there is no implicit unwrap; it's just another variable introduced in the scope. This is all well and good, but pretty verbose. It would be nice to have a shortcut for the bool b = b? case.

Again the language design becomes more complex than one likes. C3 has a pretty flexible if statement that allows you to write things like:

if (int a = foo(), b = bar(), int c = baz()) { ... }

However in this case only the final result (that of baz()) counts. If it looked like this:

if (int a = foo()?, b = bar(), int c = baz()) { ... }

We'd have to make sure that the call to foo() didn't return an error AND that baz() was non-zero.

So what should we do?

It's time to take a step back and review our options without making assumptions that we unwrap things with if.

First let us construct our guard statement – the one taking a block to execute if there is an error:

int! i = ...
catch (err = i)
{
   ... handle the error ...
}

We can do some very simple flow typing here:

  1. If a variable is caught using a catch,
  2. and the catch has a jump at all exits,
  3. then the variable can be typed as the non-failable version of itself after the catch.

int! i = ...
catch (err = i)
{
   ...
   return;
}
// i is int here

So that works. This is much easier than if we had overloaded if to handle error unwrapping. What if we introduce try to be like if, but only for unwrapping?

try (int j = i) 
{ 
 ... only executes if i is not an error ...
}
try (i) 
{ 
 ... i is implicitly unwrapped to int ...
}

So to wrap up, here are some elements of the error handling:

int! i = ...

// Default value if it is an error
int j = i else 0;

// Jump on error
int k = i else return;

// Check error
try (i)
{
  printf("i was: %d\n", i);
}

// Conditional execution
// this line is only called
// if i is not an error.
printf("i was: %d\n", i);

// Composition:
bool! b = checkFoo(getFoo(i));
int! l = i + 1;

// Returning something that may be an error
if (z > 0) return i;

// Check if error
bool wasError = check(i);

// Check if success
bool wasSuccess = try(i);

// Returning an error
return MyError!;

The error handling still has some ways to go, but it's getting closer to something that also handles the various possible corner cases and not just the simplest use cases.

A new error handling paradigm for C3

Originally from: https://dev.to/lerno/a-new-error-handling-paradigm-for-c3-2geo

The C3 programming language is getting increasingly complete. It is a language very close to C, similar to the C2 language.

The current state of C3

C3 of today has an error system inspired by Midori and Herb Sutter's C++ error proposal. It has lots of similarities with Zig's errors as well.

Here is an example from the documentation:

error RandomError {
  NORMAL,
  EXCEPTIONAL
}

func int mayThrowError() throws RandomError {
  if (rand() > 0.5) throw RandomError.NORMAL;
  if (rand() > 0.99) throw RandomError.EXCEPTIONAL;
  return 1;
}

func void testMayError() throws
{
  // all throwable sites must be annotated with "try"
  try mayThrowError(); 
}

func void testWithoutError() {
  try testMayError();

  // Catching will catch any try above in the scope.
  catch (error e) {
    case RandomError.NORMAL:
      io.printf("Normal Error\n");
    case RandomError.EXCEPTIONAL:
      io.printf("You win!\n");
    default:
      io.printf("What is this error you're talking about?\n");                 
  }
}

This might at first glance look like exceptions, but it is value based and a function like:

func int getFoo() throws RetrieveError;

Corresponds to the C code:

RetrieveError getFoo(int *result);

So throws are really return values.
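
To make that concrete, here is roughly what using such a function looks like from plain C. This is only a sketch; the enum values, the dummy body and the caller are made up for illustration:

#include <stdio.h>

typedef enum { RETRIEVE_OK, RETRIEVE_NOT_FOUND } RetrieveError;

/* The "throwing" function is just a function returning the error code,
   with the actual result delivered through an out parameter. */
RetrieveError getFoo(int *result)
{
  *result = 42; /* dummy implementation for illustration */
  return RETRIEVE_OK;
}

int main(void)
{
  int foo;
  RetrieveError err = getFoo(&foo); /* the "throw" is only a return value */
  if (err != RETRIEVE_OK)
  {
    printf("getFoo failed: %d\n", err);
    return 1;
  }
  printf("foo = %d\n", foo);
  return 0;
}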

Why isn't this sufficient?

Compared to exceptions I find this pretty good. Places where errors can occur are clearly marked, and we're using value-based return values under the hood.

However, the flow here is clearly exception-style. Personally I like the explicit control given by C's return values, but they are not always convenient. For a function in C you usually end up with one of four cases:

  1. No errors returned, just return the result.
  2. It may fail, so return boolean, result as "out" parameter. Maybe use errno or similar to get details.
  3. It may fail in many ways, so return the error code, result as an "out" parameter.
  4. It may fail in many ways, return the result (usually a pointer), the error is an out parameter and will be set if it fails.

These are typically only simple in the case that no result is needed.
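
Sketched as plain C declarations, the four shapes might look something like this (a declaration-only sketch; the names and error enums are hypothetical, just to illustrate the patterns):

#include <stdbool.h>
#include <stddef.h>

typedef struct Foo Foo;
typedef enum { PARSE_OK, PARSE_OVERFLOW, PARSE_BAD_CHAR } ParseError;
typedef enum { FOO_NOT_FOUND = 1, FOO_NO_PERMISSION } FooError;

/* 1. No errors possible: just return the result. */
size_t my_strlen(const char *s);

/* 2. May fail: return a bool, result as an "out" parameter (details perhaps in errno). */
bool parse_int(const char *s, int *out);

/* 3. May fail in several ways: return the error code, result as an "out" parameter. */
ParseError parse_int_detailed(const char *s, int *out);

/* 4. May fail in several ways: return the result (often a pointer), error as an "out" parameter. */
Foo *open_foo(const char *path, FooError *err);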

Go improves on this by using tuple returns, which folds 2-4 into a single case. However there's another problem – that of having multiple calls which would throw errors. In C it might end up looking like this:

if (doSomething() != OK) goto ERR;
if (doSomethingElse() != OK) goto ERR;
cannotFailProc();
if (blah() != OK) goto ERR;
return true;
ERR:
... error handling ...
return false;

This contrasts with exception style code which can be much easier to read:

try
{
  doSomething();
  doSomethingElse();
  cannotFailProc();
  blah();
  return true;
}
catch
{
   ... error handling ...
}

Go has made some efforts to improve the rather infamous cascade of if (err != nil) { ... } code, but hasn't really made any major progress.

Designing a new language, this has frustrated me: exceptions are known to have issues, but so do "return values".

There's also the idea to use sum types, e.g. Result<MyResult, Error> and pass them around. Swift was even built with optionals where "no value" meant an error... but that was so unergonomic that they later introduced an exception style error handling very similar to what C3 currently provides.

Using Result would usually mean writing things like getMaybeThrowingInt().flatMap(i => get(i)).flatMap(val => val.openFile), where each invocation only happens if the previous result is not an error.

However, this Result-based code would often look rather different from the normal "error free" code. So that looked like a dead end as well.

Frustrations and an idea

Trying out the error handling in C3, I was frustrated with how ugly simple functions would look when they only had a single possible error.

Consider the simple task of looking up the index of an item in an array and using it.

error SearchError {
  ELEMENT_NOT_FOUND;
}
func int indexOfFoo(Foo[] f, int i) throws SearchError
{ .... }

func void test(Foo[] f)
{
  int i = try indexOfFoo(f, 1);
  printf("Name1: %s", f[i].name);
  catch (SearchError e)
  { 
    printf("Name1 could not be found\n");
  }
}

Using an error like this here feels all wrong. (Java infamously returns -1 rather than using an exception for this sort of code.)

Some Go style tuple return would probably have given us:

func void test(Foo[] f)
{
  int i, bool success = indexOfFoo(f, 1);
  if (!success)
  {
    printf("Name1 could not be found\n");
    return;
  }    
  printf("Name1: %s", f[i].name);
}

And this feels more reasonable. Not because of the code size, but because it feels weird to introduce jumps in the code just to handle the fact that there is the possibility of a special "not found" index.

So what do I really want? The Go version translates everything to values rather than implicit jumps. This is similar to how Result works. Maybe there is a way?

What if we introduce a built-in sum type: "a type + error"? For example, int! would be the same as Result<int, Error> in languages using Result.

If we rewrite our code:

error ElementNotFoundError;

func int! indexOfFoo(Foo[] f, int i)
{ .... }

func void test(Foo[] f)
{
  int! i = indexOfFoo(f, 1);
  guard (i) // Only executes if i is an error
  {
    printf("Name1 could not be found\n");
    return;
  }
  // i implicitly becomes "int" due to the guard.
  printf("Name1: %s", f[i].name);
}

We can do more with this though! If we define that a statement relying on a "Result" also becomes a "Result" we get this:

func void test(Foo[] f)
{
  int! i = indexOfFoo(f, 1);
  Foo! foo = f[i];
  printf("Name1: %s", foo.name);
}

In a "Result" based language that would translate to something like:

Result<int, Error> i = indexOfFoo(f, 1);
Result<Foo, Error> foo = i.flatMap(i => f[i]);
Result<void, Error> res = foo.flatMap(foo =>
  printf("Name1: %s", foo.name)
);

Let's look at another example. Here is a sample C# program to illustrate its exceptions:

static void Main(string[] args)
{
  int index;
  int value = 100;
  int[] arr = new int[10];
  try
  {
    Console.Write("Enter a number: ");
    index = Convert.ToInt32(Console.ReadLine());
    arr[index] = value;
  }
  catch (FormatException e)
  {
    Console.Write("Bad Format ");
  }
  catch (IndexOutOfRangeException e)
  {
    Console.Write("Index out of bounds ");
  }
  Console.Write("Remaining program ");
}

C3 has no out of bounds error, but we can make a method for it:

error IndexOutOfBoundsError;

func void! int[].set(int[]* array, int index, int value)
{
  if (index < 0 || index >= array.size) return! IndexOutOfBoundsError;
  array[index] = value;
}

Let's assume atoi returns a ConversionError:

func void main()
{
  int value = 100;
  int[100] arr;
  console::write("Enter a number: ");
  int! index = atoi(readLine());
  guard (arr.set(index, value))
  {
    case ConversionError:
      printf("This is not a number %s\n", error.string);
    case EofError:
      printf("Input closed.\n");
    case IndexOutOfBoundsError:
      printf("Index out of bounds.\n");
    default:
      printf("Unknown error.\n");
  }
  printf("Remaining program\n");
}

If we just want to ignore all errors:

func void main()
{
  int value = 100;
  int[100] arr;
  console::write("Enter a number: ");
  int! index = atoi(readLine());
  arr.set(index, value);
  printf("Remaining program\n");
}

If we want to be explicit about following the happy case with nesting, here's a variant:

func void main()
{
  int value = 100;
  int[100] arr;
  console::write("Enter a number: ");
  int! index = atoi(readLine());
  if (index) {
    printf("Thx for the number\n");
    // Index is now int.
    // catch (index) - Invalid
    if (arr.set(index, value)) {
      printf("All worked fine!\n");
    }
  }
}

Some unresolved questions

In the text above I use guard to "get" the error from the "Result". Other variants could be to use catch or iferr as the keyword. Maybe even use unwrapping with !, e.g. if (i!).

Similarly, the conditional extraction in the if might have issues. If we have Foo*! f, then if (f) might be assumed to also check that f is not null, whereas the correct check would be the rather odd-looking if (f && f) (!).

Shortcuts for defaults on error are needed. Some possibilities:

int i = atoi(readLine()) ?: 0
int i = atoi(readLine()) else 0;
int i = atoi(readLine()) !! 0;
int i = atoi(readLine()) || 0;

And for rethrows:

int i = atoi(readLine())!;
int i = atoi(readLine()) else return!;
int i = atoi(readLine())!!;
int i = try atoi(readLine());

I use return! for returning errors. Again, it could use raise, throw or exit instead. Or an exclamation mark after the error, e.g. return SomeError!.

While there are a lot of things left to figure out, I at least feel like there is a real alternative here that might be a candidate to replace the current error handling in C3.

Thoughts on numeric literal type inference rules for a C-like programming language

Originally from: https://dev.to/lerno/thoughts-on-numeric-literal-type-inference-rules-for-a-c-like-programming-language-4fpb

For the C3 language I’m working on I wanted to improve on C’s integers.

Recent languages have gravitated towards removing many implicit casts: Rust, Swift, Go, Zig and Odin all fall into this camp.

Studying Go in particular is illuminating: its designers recognized that removing implicit casts creates usability issues, so they changed numeric literals to be BigInts, implicitly convertible into any sufficiently large integer type.

Swift, Zig and Odin all pick up this idea, but in slightly different ways.

Zig uses what it calls peer type resolution to describe how the conversion from “compile time integer” (the BigInt representation) occurs in various circumstances, such as in binary expressions. This is basically picking a common type that all sub-expressions can coerce into. As an example, adding a variable of type i32 to the constant 123 will convert the constant from BigInt to i32; the common type here is i32, and this is the type the constant will be cast to. When adding i32 and i64 the common type is instead i64, and so on.

This peer type resolution breaks down in expressions like:

var y : i32 = if (x > 2) 1 else 2;

In this example 1 and 2 are both compile time BigInt values, but the expression is a runtime one and needs a definite integer type.

In Go, this situation is resolved by falling back on a default type size: int. In Go this is a 32-bit or 64-bit value depending on platform.

Zig doesn't have that, so other strategies must be employed. As of 0.6.0, Zig will parse the above, but will not accept either of the following.

if (x > 2) 1 else 2;
var y : i32 = 1 + if (x > 2) 1 else 2;

Peer type resolution might also at times create odd results. Consider the following Zig code:

var foo : i16 = 0x7FFF;
var bar : i32 = foo + 1;

// The above is equivalent to:
var foo : i16 = 0x7FFF;
var temp : i16 = foo + 1;
var bar : i32 = temp;

Because of peer type resolution this would overflow. One could imagine behaviour more similar to what C would often give you, by resolving the add foo + 1 after casting both operands to the type of bar.
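
For comparison, here is what plain C does with the literal translation of that example: the usual arithmetic promotions widen both operands to int before the add, so it does not overflow. This is standard C behaviour, shown as a small complete program:

#include <stdio.h>

int main(void)
{
  short foo = 0x7FFF;
  /* C promotes both foo and 1 to int before adding, so the addition
     happens in 32 bits and bar becomes 32768 rather than wrapping. */
  int bar = foo + 1;
  printf("%d\n", bar); /* prints 32768 */
  return 0;
}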

For Odin, Swift and Go this example is more obvious, because there are no widening conversions. In Go for example we would have to write this:

var foo int16 = 0x7FFF
var bar int32 = int32(foo + 1)

In this case it's clear that in foo + 1 neither foo nor 1 is an int32, so the fact that bar becomes -32768 is expected.

It's important to emphasize that these are conventions; there is not strictly any right or wrong. Rather, it's a trade-off between convenience and how prone the rules are to cause bugs.

C3 – like Zig – allows safe widening conversions, and so it needs to decide whether it should follow the Zig behaviour or the "convert first" approach. To me the Zig behaviour is a bit counter-intuitive; I'd prefer the widening to happen before the addition. Consider int a = b + c where b and c are signed chars: is it intuitive that this add should overflow when b and c both equal 64? I think it's better to avoid overflows where possible, and doing the widening first helps with that.

To achieve this we use bi-directional type checking. This works by "pushing down" the expected type and optionally casting to the expected type. As we saw in the Zig example, it does that for assignment, but retains peer type resolution when we nest deeper into expressions.

To illustrate this, here is some C3 code. It should look fairly familiar – note that, like in Java, C# and D, the sizes of the types are well defined; an int is always 32 bits:

short foo = 0x7FFF;
int bar = foo + 1;
// The above is equivalent to:
short foo = 0x7FFF;
int bar = cast(foo, int) + cast(1, int);

What happens during semantic analysis in C3 is this:

  1. Found declaration of bar.
  2. Analyse the init expression of the declaration, passing down the type int.
  3. The init expression is a binary add; analyse the left side, passing down the type int.
  4. The left side is smaller than int, so promote it to int.
  5. Analyse the right side, passing down the type int.
  6. The right side is a compile time integer; try to convert it to int (this would be a compile time error if the value did not fit in an int).
  7. The binary sub-expressions are now resolved. Find a common type for both sides, which in this case is int.
  8. Implicitly cast the binary expression to int if necessary (not necessary in this case).
  9. Check that the type of the binary expression matches int.
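
Purely to illustrate the shape of this algorithm, here is a tiny C sketch. The type lattice, function names and simplifications are all invented for this example (compile time integers are not modelled) and this is not the actual C3 compiler code:

#include <stdbool.h>
#include <stdio.h>

/* A toy type lattice: a larger value means a wider type. TYPE_NONE means
   "no expected type was pushed down". */
typedef enum { TYPE_NONE, TYPE_SHORT, TYPE_INT, TYPE_LONG } Type;

static bool can_widen(Type from, Type to)
{
  return from != TYPE_NONE && to != TYPE_NONE && from <= to;
}

/* Steps 3-8 for a binary add: push the expected type into both sides,
   widen where that is allowed, then fall back to peer resolution for
   the common type. The caller (step 9) checks the result against the
   expected type and reports an error if it doesn't fit. */
static Type analyse_add(Type lhs, Type rhs, Type expected)
{
  if (can_widen(lhs, expected)) lhs = expected; /* step 4 */
  if (can_widen(rhs, expected)) rhs = expected; /* steps 5-6 */
  return lhs > rhs ? lhs : rhs;                 /* steps 7-8 */
}

int main(void)
{
  /* short + a small literal (modelled as short here) with int pushed down
     resolves to int, matching the walkthrough above. */
  Type result = analyse_add(TYPE_SHORT, TYPE_SHORT, TYPE_INT);
  printf("resolved: %s\n", result == TYPE_INT ? "int" : "something else");
  return 0;
}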

So let's have a look at a case when conversion isn't possible and a compile time error occurs.

long foo = 1;
int bar = foo + 2;

Semantic analysis in this case goes like this:

  1. Found declaration of bar.
  2. Analyse the init expression of the declaration, passing down the type int.
  3. The init expression is a binary add; analyse the left side, passing down the type int.
  4. The left side is long; it cannot be implicitly cast to int.
  5. Analyse the right side, passing down the type int.
  6. The right side is a compile time integer; try to convert it to int (this would be a compile time error if the value did not fit in an int).
  7. The binary sub-expressions are now resolved. Find the common type of both sides, which in this case is long.
  8. Since the right hand side is int, cast it to long.
  9. The resulting binary expression now has the type long.
  10. Check that the type of the binary expression matches int. It doesn't, and it cannot be implicitly converted to int either, so a compile time error "long cannot be implicitly converted to int" is displayed.

Given the above examples it might seem like one could simply do away with any "peer type resolution".

However, there are cases where top down resolution fails. Here is an example:

short x = ...
if (x + 1 < 100) { ... }

In this case we don't have a type hint. This is what happens when resolving the comparison.

  1. Analyse the left hand side of the comparison, with the type passed down as NULL.
  2. Analyse x + 1 with the type passed down as NULL.
  3. The left hand side is short and the right hand side is a compile time integer. The common type is short.
  4. The right hand side of the addition is implicitly cast to short.
  5. The left hand side of the comparison now has the type short.
  6. Analyse the right hand side of the comparison; it is a compile time integer.
  7. Find a common type between short and the compile time integer, which is short.
  8. Implicitly cast the right hand side of the comparison to short.
  9. Analysis is complete.

That is not to say that these two methods are sufficient(!). Here is an example from gingerBill (the author of Odin): ((x > 1 ? 3000 : 2 + 1000) == 2). The problem here is that all values are compile time integers, so no type hint can be found anywhere.

This can either be considered a compile time error because the expression is "under typed", or it can default to some integer type: either the register sized integer (Go's strategy) or something similar. C3 currently picks the first option (a compile time error) because of the difficulty of picking a good default that doesn't accidentally cause odd behaviour, and because this case is so uncommon.

An evolution of macros for C

Originally from: https://dev.to/lerno/an-evolution-of-macros-for-c-59b5

(This text was previously published on Medium)

I’ve been trying for a long time to think up a good macro system that could replace or extend the C preprocessor and yet be just as easy and approachable.

There has recently been a lot of interesting work on alternatives to C and C++, and consequently those languages have tried to fix both macros and templates. In some languages, like Rust, there’s a very rich set of tools to extend the language; the inspiration here has clearly come from languages like LISP, where macros have been a way to expand the language itself. There are other approaches though: both Zig and Jai use compile time execution of the language to avoid any specialized macro syntax. Zig is notable for making this a large part of the language.

In many ways those macro systems are to C’s what a strongly typed language is to C’s weakly typed one. The increased type and error control also means added complexity in defining macros. If we write a language that overall is much stricter than C, then this is both fine and necessary. But for C, do we really want to constrain ourselves like that?

What would a sort of “incremental” improvement for a macro system for C look like? Can we make a minimal extension that doesn’t feel like we’re making a whole new language?

Let us make an attempt!


Step 1, we could make multiline #defines more readable by adding { } to implicitly allow line breaks:

#define foo(a, b) \
  int x = run_foo(a, b); \
  if (x > 0) printf("We got foo!\n");

// => 

#define foo(a, b) {
  int x = run_foo(a, b);
  if (x > 0) printf("We got foo!\n");
}

Step 2, accidentally shadowing other variables is bad. Let’s create unique variable names on demand by a prefix of $ inside a #define:

#define foo(a, b) {
  int $x = run_foo(a, b);
  if ($x > 0) printf("We got foo!\n");
}

foo(1, 2); // $x expands to __foo_x_1
foo(100, 20); // $x expands to __foo_x_2 (increment by one for each time expanded)

Step 3, __typeof__ is needed for a lot of nice macros, so let’s lift it to a sanctioned typeof function. We can now rewrite GCC’s max:

#define max(a,b) \
   ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
     _a > _b ? _a : _b; })

#define max(a, b) {
  ({
    typeof(a) $a = (a);
    typeof(b) $b = (b);
    $a > $b ? $a : $b;  
  })
}

This is about as far as we should take #define. For more advanced macros we need a new syntax.

Step 4, let’s define non-preprocessor macros using a new macro keyword.

macro foo(&a, &b) {
  int $x = run_foo(a, b);
  if ($x > 0) printf("We got foo!\n");
}

Note the slightly odd “&” prefix. This means we import the entire expression or variable into the scope. Without it we simply use the value, so here are two equivalent versions of max:

macro max(&a, &b) {
  typeof(a) $a = (a); // (1)
  typeof(b) $b = (b); // (2)
  return $a > $b ? $a : $b; // Return automatically makes this an expression statement.
}

// If we do not use &a, &b then we get evaluated values 
// instead, making it look like
// an untyped version of a static inlined function. The macro 
// below is exactly equivalent to the one at the top.
macro max(a, b) {
  return a > b ? a : b;
}

Since the macro is largely hygienic, break and return are meaningless in the top scope of the body. For that reason we can reuse return to indicate that the macro returns a value, that is, that it should be treated as an expression. This allows us to skip ({ }).
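
For reference, this is what the ({ }) statement expression buys us in GNU C today: the classic max can already be used directly as an expression, and the return form above is meant to give the same thing without the wrapper. A small complete example (requires GCC or Clang, since statement expressions are an extension):

#include <stdio.h>

/* GNU C statement-expression version of max (a GCC/Clang extension). */
#define max(a, b) \
   ({ __typeof__ (a) _a = (a); \
      __typeof__ (b) _b = (b); \
      _a > _b ? _a : _b; })

int main(void)
{
  int x = 3, y = 7;
  int m = max(x, y); /* the macro is used directly as an expression */
  printf("%d\n", m); /* prints 7 */
  return 0;
}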

Step 5: Wrapping something "inside" of a macro is a pain, so for our final extension, let’s define a trailing body parameter that can be expanded:

macro for_from_to(a, b, macro body) {
  for (typeof(a) $x = a; $x <= b; $x++) {
    body();
  }    
}

for_from_to(1, 100) {
  printf("Again!\n"); 
}

// expands to:

for (int __for_from_to_x_1 = 1; __for_from_to_x_1 <= 100; __for_from_to_x_1++) {
  printf("Again!\n");
}

macro for_from_to(a, b, macro($v) body) {
  for (typeof(a) $x = a; $x <= b; $x++) {
    body($x);
  }
}

for_from_to(1, 100) {
  printf("Loop: %d\n", $v);
}
}

// expands to:

for (int __for_from_to_x_1 = 1; __for_from_to_x_1 <= 100; __for_from_to_x_1++) {
  printf("Loop: %d\n", __for_from_to_x_1);
}

The trailing body is expanded inside of the macro as if it was a macro itself.


In the examples above I’ve tried to extend and expand on the C macros rather than replacing it. The new macro function is simply an evolved subset of #define that can be parsed as normal C except for the lack of types (giving a compiler the ability to issue a lot more errors directly at the macro definition).

This is not the only direction we could have taken the language. Another approach could have been to make it possible to parameterize static inlined functions instead, solving part of the macro problem domain with generic functions. In this direction we also find parameterized (generic) structs.

That, however, would bring a significant change to the language. Similarly, a “Rust-like” macro system could offer both expressiveness and safety, but it would be more of a revolution than an evolution.

Sometimes the latter is what you want.

How to procrastinate while working hard

Originally from: https://dev.to/lerno/how-to-procrastinate-while-working-hard-4l5f

Refactoring is an important part of programming. If you are maintaining a non-trivial code base you need to constantly remove cruft and improve on solutions, otherwise the code will slowly rot.

When working on improving abstractions and code quality there is also a lure that is mostly ignored: over-engineering. The urge to add code that feels “magical” and just does things in an extremely elegant way. You can find examples in amazing C++ templates, or in some awesomely elegant Swift code that might use some combination of operator overloading, generics and pattern matching. It might look cool, but over-engineering is dangerous.

It’s dangerous because you can spend days on that “perfect abstraction” which might be elegant on the surface — but your teammates will have a less pleasant time trying to figure out how to debug or extend it later on.

It’s dangerous because all that time you spent might make you reluctant to find easier solutions, or throw it away when it’s no longer needed.

It’s dangerous because that complexity disguised as abstraction is making your code less maintainable and also less easy to understand.

It’s dangerous because you might have thrown away bug free code and replaced it with something new and untested because you thought it might look more elegant.

But most of all it’s dangerous because it’s so damned satisfying to just find those beautiful abstractions. It’s so much fun that we forget how dangerous it is.

So when you feel the urge — remember restraint. The “magically cool things” your language can do are usually exactly those parts that you should stay clear of.

To OO or not to OO

Originally from: https://dev.to/lerno/to-oo-or-not-to-oo-5dm5


(a previous version of this text was posted on Medium)

Around 2018, partly due to following Jonathan Blow’s work on Jai and the obvious lack of OO in that language, I started to rethink whether OO is a good thing or not. I ended up listening to — and reading — various criticisms leveled at OO and pondered the problem quite a bit. I had also started following Bob Nystrom’s “Crafting Interpreters” series, whose second part is written in very clear and easy to understand procedural C. It reminded me of how directly aimed at the problem my code used to be.

Before OO there was just writing code

I started programming in 1982. I was 9 years old and programming on home computers was almost always BASIC (or assembler, when you wanted to get some speed). It was the era of the first “home computers”. Computer magazines would print listings of games and applications. This was part “getting programs on the cheap” and part “learning to program”.

BASIC isn’t exactly structured programming, at least not those early versions. “Structured programming” was limited to statements like GOSUB 1340. Still, you could definitely build things with it. Games and applications, all were written in BASIC. The limitation was usually the memory of the machine (typically 16 kb) rather than the structure. It might not have been elegant code, but it got things done.

Eventually I would pick up Pascal, and even though later iterations of BASIC improved a lot on the 8-bit implementations, Pascal was just so much better. It was more powerful, and more importantly it made it really easy to write clear and structured code. But even so, program design just wasn't much different from writing assembler or BASIC. You started at one end and built things until they were done. It was really easy to just "get things done".

I eventually learned a bit of C++, but the OO part of the language mostly escaped me. It wasn't until I got to Java that things changed — and at the time I believed it to be for the better...

Java and “real” OO

I used to tell people that I didn’t understand object oriented programming until I learned Java. That might not be quite accurate, but it’s true that I didn’t attempt any “Object Oriented Design” until I learned Java.

Java really forced you to do objects. I myself started out when applets were the new hot thing, and the language was still at version 1.0.2. It was cool and magical. Objects were these small units you could craft to do things, almost like programming small robots. Instead of just plain programming you did this design where independent objects were talking to each other and producing a result. It was awesome. And of course, it was also a lie.

With Java going mainstream, the articles and books on how to do real OO flourished. The point seemed to be that one should throw away most things about procedural programming. It was supposedly bad in the same way unstructured programming had been to procedural programming. The more we thought in terms of objects, the better things were. It was clear that nirvana was near — accessible to those lucky people who got to program Java for a living.

Object Oriented modelling nightmares

In my free time I was working on the next iteration of a complex online game that was initially written as a BBS door (a BBS online game, back when we used dial-up modems). The original version had been written in QBasic, and I had also done a rewrite in Turbo Pascal that covered about 80% of the game. Writing the game had taken evenings and weekends spread over about half a year.

The QBasic version was tricky as you had to pass all global data explicitly between implementation files, and each file had a size limit — so it had to be split with this huge declaration of globals on top of each file. The Pascal version was so much easier to write. You didn’t need to explicitly pass globals and it was straightforward to pass things to procedures — whereas for QBasic you had to pass parameters and results in global variables you especially set aside for the purpose. Obviously the Java version — with OO goodness — must be even easier than Pascal to implement!

That turned out to be “not quite true”. I wrote page after page on the design of the classes: what each entity would know, what state each would contain, and how each would act on other entities. It felt really cool, but it was also complex, and every version of the object model seemed to have some problem where eventually everything needed to know everything and every single class would be hideously complex, with little way to ensure consistency. It felt like an impossible task.

Doing real things

Meanwhile I started working professionally as a programmer. I wrote in Perl and Java, and learned Objective-C and Ruby. With Objective-C I discovered that the “OO” of Java/C++ was just one brand of OO. In Java the idea was to create tiny classes that were assembled into a whole, but in ObjC objects were instead used as high level “glue” between larger components written internally in straight procedural C code. The fine grained classes of Java would be considered the very opposite of good design in Objective-C. So, if OO was to be to procedural programming what procedural was to unstructured — how come there wasn't even a consensus on how to do “proper OO”?

My pet project was still a failure. I still tried to model things the Java way and I kept failing. But then I tried writing it in Ruby and something unexpected happened.

Getting things done

In Java it’s easy to just get caught in the work of writing your classes. Java is comparatively verbose, so just writing a bunch of classes, writing getters and setters seems like you got quite a bit of work done. That’s why it was easy just to waste time testing out models and still feel like you’re getting somewhere.

In Ruby, on the other hand, writing a class is trivial. Even modelling a lot of them isn’t much work. If you’re lazy you can even generate a bunch of them using metaprogramming. Ruby is very, very quick to do prototyping in.

Suddenly I couldn’t fool myself anymore. After implementing parts of the model in Ruby it was suddenly clear that I wasn’t adding anything of value by creating those game model classes. It was work, but it wasn’t real work. I needed a new idea.

At the time I was working professionally on poker servers, and it was clear that a poker game instance was simply a data structure with the deck, the current bets and the players. Player actions were simply commands acting on this data according to certain rules. Maybe this idea could work…? As a prototype, in my Ruby code I simply used a nested hash map — no model objects at all. For each action the player took, I would simply invoke the corresponding method, which would directly edit values in the branches of the map. A very procedural approach — even though I didn't think of it like that at the time.

Some good things immediately resulted from this design: it was dead simple to add an “undo”-functionality, where the changes to the data could be rolled back. The data tree was naturally dead simple to serialize and deserialize (compare the headache of my old designs where each single class needed to implement serialization/deserialization on its own), and I could also track exactly what changes were made in order to assemble updates to logged in players.

I had solved my big OO problem… by adopting a more procedural mindset.

Not quite there yet

Despite solving the problem, I didn’t actually realize that the problem was OO. I just thought I had found a very clever solution.

Professionally, even though I could do all the “fancy” OO solutions with reflection, polymorphism and so on, my own OO style tended to favour simple, explicit and obvious solutions. But I would pick those solutions because my experience showed me they were the best, not because I realized there were any problems with OO.

A new problem

I had built my game server and it was fine. It was trivial to extend and dead simple to add more features. There was just one problem: the client.

I had written clients before, but on a smaller scale. As long as you have no more than, say, 10–15 screens you can get away with most designs. My game client had over 50 distinct dialog screens and many different states. Things were getting messy.

I had my model and my views and my controllers, and still I felt I had no control: there was just so much state everywhere. Because this is the essence of Java/C++ style OO: split the state into small pieces and let each object manage its specific part of the application state. It's really a spectacularly bad idea, as complexity roughly corresponds to the square of the number of different interacting states. In addition, it's very tempting to simply let the state be implicit in some (combination of) member variable values. “Why have an explicit state when you can check the value of a variable to figure it out?”

A conclusion

In procedural programming you tend to keep your state where you can see it. Unlike OO, where you're encouraged to split the state up and hide it, you pretty much have to keep it explicit, and that really is a good thing. That's not to say that using objects is necessarily bad. Objects are a very powerful tool for building UI rendering hierarchies for one thing, and namespacing together with chaining can create very smooth and readable code: compare urlescape(substr(string, 0, strlen(string) - 2)) to string.substr(0, -2).urlescape() (there can still be an argument that the former is clearer though!). However, object oriented design with objects that keep state or act on other objects — that is where OO goes all wrong.

There is also the (mostly forgotten) Objective-C style of OO, which happens to be even better for building GUIs than Java/C++, as the late binding of the dispatch and the runtime push it significantly closer to being a scripting language. Sadly Apple, the former champions of Objective-C, have largely forgotten what the idea behind ObjC really was, and are now replacing it with Swift, which adopts the OO style of Java/C++.

Still, there are languages trying to get back to basics. Golang is one, and many of the other new “system programming languages” also qualify. Go in particular (despite my reservations regarding the language) disproves the myth that “it's not possible to build large scale products with procedural programming”. The increasing popularity of the language might create a crack in the idea that OO is “inevitable”.

However, Java-style OO is deeply entrenched and shows little inclination of disappearing anytime soon. It will be interesting to see what the future brings.