Skip to content

The C3 Blog

Are modules without imports "considered harmful"?

Originally from: https://c3.handmade.network/blog/p/8337-are_modules_without_imports_considered_harmful

Can you really do a module system without import statements? And should you? If you’re like me you’d probably initially dismiss the idea: “surely that can only work for very simple examples!”

But someone filed an issue to add it to C3, so I had to motivate why it would be difficult / impossible to do well (this actually ended with me redesigning the module system quite a bit). – But the question whether it’s possible stuck with me.

Why it shouldn't work

Let’s quickly review the problems with no imports (where modules are loaded automatically).

1. Ambiguities

The classic example is the function “open”, which is would clash with open in all other modules, making it necessary to use the full module names:

module foo;
fn File* open(char* filename) { … }

module bar;
fn Connection* open(char* url) { … }

module baz;
fn void test()
{
   open(“foo.txt”); // Which one is intended?
}

2. Bad search & code completion

When all files are basically importing everything then every public function should be listed for code completion or search.

If you'd just match your own code it wouldn’t be so bad, but add to that the whole standard library + any library you’re importing… you'll get a lot of matches.

3. Compiling more than necessary

Some languages use imports to figure out exactly what files to compile. Implicitly having everything imported is means everything needs to be analyzed during compilation.

4. Dependencies are not obvious

Explicit imports help both readers of the source code and things like IDEs to limit the files the current file depends on in a simple way.

Summing it up

All in all the situation looks pretty grim so there’s a reason why we don’t see this.

There are outliers: pre-namespace PHP, and from what I’ve heard there’s a Prolog variant which has a form of auto import as well. Unfortunately these examples offer very little in terms of encouragement.

Making a try

Despite this I found that I personally couldn’t really dismiss the idea entirely, for my own peace of mind I had to make sure it wasn’t possible. Let’s revisit the problems:

1. Ambiguities

In this case I actually had the problem halfway solved: in C3 all functions are expected to be called with at least a partial path qualifier.

To call the function foo() from module std::bar in another module you have to write bar::foo() to call it (std::bar::foo() works as well).

I haven't seen the idea of using abbreviated module paths elsewhere, and so it seems to be a novel invention. It should be possible to implement in any namespace scheme where namespaces are separate from types.

However for C3 structs and other user defined types do not require any qualifiers. The reasoning is that type names in general tend to be fairly unique except where two libraries trying to abstract the same thing (for example two file IO libraries will probably both use File as a type name somewhere)

Name collisions are rare with explicit imports, but for implicit imports this might become a real issue.

module foo::io;
struct File { ... }

module std::io;
struct File { ... }

module bar;
File *f; // Is which File is this?

Fortunately we can introduce a simple feature to help us out: if we reintroduce import but change its meaning so that it simply makes the imported module’s types and functions preferred over modules that aren’t imported when doing type resolution.

So returning to the example with File: rather than to have to type foo::io::File to disambiguate it from std::io::File we simply add import foo::io to the start of the file:

module bar;
import foo::io;

File *f; // This is foo::io::File

If we sort of squint at it this is actually a little like Java’s imports work: they only add possibility to use the imported classes without qualifiers.

This seems like (1) could be considered solvable for any language that are fine with path qualifiers in front of functions and globals like in C3.

3. Compiling more than necessary

For reasons that will become apparent later, let's jump to this point first.

Trying to solve this requires us to look at our compilation model in general. For the more extreme version of this, let’s assume that all our libraries are in source form rather than precompiled. We can say we roughly have 3 types of source code: the application code, external libraries and the standard library.

In C3 you already specify the libraries you want to add by specifying the libraries you need for the project. The problem here are projects that bring in their own dependencies.

There’s an simple model we could use here:

  • the application code only sees what is public in the actual libraries imported.
  • the external libraries are resolved seeing only the dependencies they have and not the application code

Let’s say you have a library which allows you to set up an HTTPS service, which in turn uses a crypto library: your application code will not see the crypto library and the HTTPS service will not see other libraries that the application code uses.

To summarize:

  1. Application code: sees library and standard library public types, variables and functions.
  2. Library: sees only public declarations of its own dependencies and the standard library.
  3. Standard library: only sees itself.

Here we're moving dependencies and imports from the source files into the build configuration.

Unfortunately, in practice we will likely still parse most of the code and start decide what needs to be lowered into actual code after analysis. In other words this is not necessarily a win. Parsing and semantic analysis is a small part of the compile time so avoiding doing it for some code doesn't necessarily help much.

Java "modules"

Taking a detour now: Java has a large standard library and typically frameworks have a fair amount of additional dependencies. To address this Java introduced “modules” in Project Jigsaw (not to be confused with the Java packages that are used with import). Jigsaw modules are essentially creating groups of Java packages documented in a special file that also specifies dependencies to other “modules”. The idea is to drastically reduce the number of packages that need to be bundled for an application.

This is very similar to the compilation issue above. By providing a file which in detail describes what parts of the libraries the application uses, the compiler can actually begin with those library definitions before lexing and parsing starts. So in your app you could perhaps not just define the libraries you want to use, but also specify the subset of the modules we are actually depending on. In practical terms we define in a single place what our imports are and the compiler just needs to work with this subset. This is sort of an analogue of keeping a precompiled header in C with all the external library headers you want to use in the project. Although we're not necessarily reducing the compile time more, we're making the job a lot simpler for the compiler.

2. Bad search & code completion

Armed with this we can go back to the question of search: if we now use these package dependency files we've suddenly reduced the lookup for code completion to the subset of packages we actually use in our project, which effectively resolves this issue.

4. Dependencies are not obvious

We’re also ready to tackle the dependencies because we're now in a much better situation than with per-file imports: we can see all dependencies our project has, and also what dependencies the libraries we depend on have by inspection of a few files.

If libraries split their dependencies into multiple groups we can also get a reduction in the number of libraries we need for compilation.

As an example, let us envision a http server library which both supports http and https. The latter depends on a cryptography library which contains multiple types of algorithms. If the library is split into multiple modules, then we can perhaps let the http part simply depend on a TCP library, whereas the https also depends on some cryptography algorithms, but perhaps only in use for https.

Depending on how much granularity there is, something not using https might avoid the download of the cryptography library, and even if https is included, packages with deprecated hash and crypto algorithms do not need to be included to compile the https library.

Does this mean it works?

It seems like for most module systems it could work – given that the caveats listed are satisfied.

But should one do it? I would hedge my bets and say "possibly". Regular imports requires less of the language and is the proven approach, but I believe I've shown that "modules without imports" could still be up for consideration when designing a language.

Comments


Comment by Christoffer Lernö

Can you really do a module system without import statements? And should you? If you’re like me you’d probably initially dismiss the idea: “surely that can only work for very simple examples!”

But someone filed an issue to add it to C3, so I had to motivate why it would be difficult / impossible to do well (this actually ended with me redesigning the module system quite a bit). – But the question whether it’s possible stuck with me.

Why it shouldn't work

Let’s quickly review the problems with no imports (where modules are loaded automatically).

1. Ambiguities

The classic example is the function “open”, which is would clash with open in all other modules, making it necessary to use the full module names:

module foo;
fn File* open(char* filename) { … }

module bar;
fn Connection* open(char* url) { … }

module baz;
fn void test()
{
   open(“foo.txt”); // Which one is intended?
}

2. Bad search & code completion

When all files are basically importing everything then every public function should be listed for code completion or search.

If you'd just match your own code it wouldn’t be so bad, but add to that the whole standard library + any library you’re importing… you'll get a lot of matches.

3. Compiling more than necessary

Some languages use imports to figure out exactly what files to compile. Implicitly having everything imported is means everything needs to be analyzed during compilation.

4. Dependencies are not obvious

Explicit imports help both readers of the source code and things like IDEs to limit the files the current file depends on in a simple way.

Summing it up

All in all the situation looks pretty grim so there’s a reason why we don’t see this.

There are outliers: pre-namespace PHP, and from what I’ve heard there’s a Prolog variant which has a form of auto import as well. Unfortunately these examples offer very little in terms of encouragement.

Making a try

Despite this I found that I personally couldn’t really dismiss the idea entirely, for my own peace of mind I had to make sure it wasn’t possible. Let’s revisit the problems:

1. Ambiguities

In this case I actually had the problem halfway solved: in C3 all functions are expected to be called with at least a partial path qualifier.

To call the function foo() from module std::bar in another module you have to write bar::foo() to call it (std::bar::foo() works as well).

I haven't seen the idea of using abbreviated module paths elsewhere, and so it seems to be a novel invention. It should be possible to implement in any namespace scheme where namespaces are separate from types.

However for C3 structs and other user defined types do not require any qualifiers. The reasoning is that type names in general tend to be fairly unique except where two libraries trying to abstract the same thing (for example two file IO libraries will probably both use File as a type name somewhere)

Name collisions are rare with explicit imports, but for implicit imports this might become a real issue.

module foo::io;
struct File { ... }

module std::io;
struct File { ... }

module bar;
File *f; // Is which File is this?

Fortunately we can introduce a simple feature to help us out: if we reintroduce import but change its meaning so that it simply makes the imported module’s types and functions preferred over modules that aren’t imported when doing type resolution.

So returning to the example with File: rather than to have to type foo::io::File to disambiguate it from std::io::File we simply add import foo::io to the start of the file:

module bar;
import foo::io;

File *f; // This is foo::io::File

If we sort of squint at it this is actually a little like Java’s imports work: they only add possibility to use the imported classes without qualifiers.

This seems like (1) could be considered solvable for any language that are fine with path qualifiers in front of functions and globals like in C3.

3. Compiling more than necessary

For reasons that will become apparent later, let's jump to this point first.

Trying to solve this requires us to look at our compilation model in general. For the more extreme version of this, let’s assume that all our libraries are in source form rather than precompiled. We can say we roughly have 3 types of source code: the application code, external libraries and the standard library.

In C3 you already specify the libraries you want to add by specifying the libraries you need for the project. The problem here are projects that bring in their own dependencies.

There’s an simple model we could use here:

  • the application code only sees what is public in the actual libraries imported.
  • the external libraries are resolved seeing only the dependencies they have and not the application code

Let’s say you have a library which allows you to set up an HTTPS service, which in turn uses a crypto library: your application code will not see the crypto library and the HTTPS service will not see other libraries that the application code uses.

To summarize:

  1. Application code: sees library and standard library public types, variables and functions.
  2. Library: sees only public declarations of its own dependencies and the standard library.
  3. Standard library: only sees itself.

Here we're moving dependencies and imports from the source files into the build configuration.

Unfortunately, in practice we will likely still parse most of the code and start decide what needs to be lowered into actual code after analysis. In other words this is not necessarily a win. Parsing and semantic analysis is a small part of the compile time so avoiding doing it for some code doesn't necessarily help much.

Java "modules"

Taking a detour now: Java has a large standard library and typically frameworks have a fair amount of additional dependencies. To address this Java introduced “modules” in Project Jigsaw (not to be confused with the Java packages that are used with import). Jigsaw modules are essentially creating groups of Java packages documented in a special file that also specifies dependencies to other “modules”. The idea is to drastically reduce the number of packages that need to be bundled for an application.

This is very similar to the compilation issue above. By providing a file which in detail describes what parts of the libraries the application uses, the compiler can actually begin with those library definitions before lexing and parsing starts. So in your app you could perhaps not just define the libraries you want to use, but also specify the subset of the modules we are actually depending on. In practical terms we define in a single place what our imports are and the compiler just needs to work with this subset. This is sort of an analogue of keeping a precompiled header in C with all the external library headers you want to use in the project. Although we're not necessarily reducing the compile time more, we're making the job a lot simpler for the compiler.

2. Bad search & code completion

Armed with this we can go back to the question of search: if we now use these package dependency files we've suddenly reduced the lookup for code completion to the subset of packages we actually use in our project, which effectively resolves this issue.

4. Dependencies are not obvious

We’re also ready to tackle the dependencies because we're now in a much better situation than with per-file imports: we can see all dependencies our project has, and also what dependencies the libraries we depend on have by inspection of a few files.

If libraries split their dependencies into multiple groups we can also get a reduction in the number of libraries we need for compilation.

As an example, let us envision a http server library which both supports http and https. The latter depends on a cryptography library which contains multiple types of algorithms. If the library is split into multiple modules, then we can perhaps let the http part simply depend on a TCP library, whereas the https also depends on some cryptography algorithms, but perhaps only in use for https.

Depending on how much granularity there is, something not using https might avoid the download of the cryptography library, and even if https is included, packages with deprecated hash and crypto algorithms do not need to be included to compile the https library.

Does this mean it works?

It seems like for most module systems it could work – given that the caveats listed are satisfied.

But should one do it? I would hedge my bets and say "possibly". Regular imports requires less of the language and is the proven approach, but I believe I've shown that "modules without imports" could still be up for consideration when designing a language.

Fixing lexical scopes... badly

Originally from: https://c3.handmade.network/blog/p/8272-fixing_lexical_scopes..._badly

Last time we were looking at this example:

macro int testmacro(int x)
{
  int z = 2;
  for (int i = 0; i < x; i++)
  {
    z *= 2;
  }
  return z;
}     

fn int test(int y)
{
  int z = getValue(y);
  int x = @testmacro(z);
  return z + x;
}

In our previous solution, we had the variables in an array, where each scope would keep track of the current and last local. Before entering the testmacro, this list is [int y, int z, int x], but entering the macro we would get [int y, int z, int x, int x, int z]. Which would mean shadowing.

A naive solution would be name mangling, let's say macro names are prefixed with something: [int y, int z, int x, int _testmacro$x, int _testmacro$z. Our lookup must then:

  1. Lookup with the macro prefix.
  2. If not found, lookup without the macro prefix, but in this case only accept globals.

Aside from not actually solving later problems, it's complex for no real benefit, because we can essentially insert a sentinel in the list: [int y, int z, int x, SENTINEL, int x, int z].

Now when we scan back we always stop at the sentinel value. This means that entering the macro scope we simply push the sentinel value on the stack of locals (this is not the only way to introduce the same effect, but it's the simplest version to explain). When looking up locals in the array we can now stop as soon as we reach either the first element OR the sentinel value.

Problem solved?

Resolution without hierarchies

If your macro resolution only takes values, then this solution is sufficient. However, often we want to use macros to provide an expression that only conditionally is evaluated. In C3 we use # in front of the variable identifier to indicate an unresolved expression.

macro foo(#expr)
{
  return #expr * #expr;
}
fn int test(int z)
{
  return @foo(call(z));
  // => return call(z) * call(z);
}

Now we're running into problems. Both z and call should be resolved in the test function scope. Ooops.

What happens if we tag the #expr with the current scope? This seems like it could work, but in C3, like with GCC statement expressions, we can introduce new variables.

macro int two_times(#expr)
{
  int w = 1;
  #expr;
  #expr;
  return w;
}

fn void test2(int z)
{
  @two_times({| int y = z; call(y); |});
}

So we go into two_times with [int z], then add w for [int z, SENTINEL, int w]. Now when we evaluate two_times we would like something like this: [int z, int y, SENTINEL, int w]. That is, we slip in a new scope in the function scope, and not in the macro scope we pushed.

Trying a hack

What we might realize here is that if we evaluate expr just to the declaration before entering, so that all declarations ar resolved, we might just get the behaviour we want. So something like this:

  1. Enter test2 scope
  2. Push z
  3. Start evaluating the macro call.
  4. Take the macro call argument and only check the declarations.
  5. Enter expr scope
  6. Push y
  7. Resolve z
  8. Resolve y
  9. Pop expr scope
  10. Pass in this pre-checked expression into the macro.
  11. Enter the two_times scope
  12. Push w
  13. Copy #expr and insert it.
  14. Evaluate #expr - which will not need a lookup
  15. Copy #expr and insert it.
  16. Evaluate #expr - which will not need a lookup
  17. Lookup w
  18. Pop the macro scope
  19. Pop the test2 scope

This scheme looks like it would work, but there are questions: what if the declarations inside should not be resolved the same way twice? What if the expr instead looks like:

@two_times({|
  $if (@some_compile_time_macro(...)):
    int y = 0;
  $else:
    int z = 0;
  $endif;
  $if ($defined(y)):
    y = 1;
  $endif;
|});

Here it's not clear that two invocations of the same expr will even lower to the same declarations! So we can't do the lookup ahead of time.

The alternative is to completely evaluate expr, not just the declarations. It's a possible solution, but the corner cases with this approach are hard to foresee.

Summary

If our macros only take values then we can retain a simple model for symbol lookup using a single stack. However, if we can provide expressions or even statements, then these need to not only resolve symbols in the original scope but also possibly introduce them. Pre-checking expressions do not work well with compile time evaluation, since they may change every evaluation.

But maybe there is some way to salvage the model? We'll look at that next.

Comments


Comment by Christoffer Lernö

Last time we were looking at this example:

macro int testmacro(int x)
{
  int z = 2;
  for (int i = 0; i < x; i++)
  {
    z *= 2;
  }
  return z;
}     

fn int test(int y)
{
  int z = getValue(y);
  int x = @testmacro(z);
  return z + x;
}

In our previous solution, we had the variables in an array, where each scope would keep track of the current and last local. Before entering the testmacro, this list is [int y, int z, int x], but entering the macro we would get [int y, int z, int x, int x, int z]. Which would mean shadowing.

A naive solution would be name mangling, let's say macro names are prefixed with something: [int y, int z, int x, int _testmacro$x, int _testmacro$z. Our lookup must then:

  1. Lookup with the macro prefix.
  2. If not found, lookup without the macro prefix, but in this case only accept globals.

Aside from not actually solving later problems, it's complex for no real benefit, because we can essentially insert a sentinel in the list: [int y, int z, int x, SENTINEL, int x, int z].

Now when we scan back we always stop at the sentinel value. This means that entering the macro scope we simply push the sentinel value on the stack of locals (this is not the only way to introduce the same effect, but it's the simplest version to explain). When looking up locals in the array we can now stop as soon as we reach either the first element OR the sentinel value.

Problem solved?

Resolution without hierarchies

If your macro resolution only takes values, then this solution is sufficient. However, often we want to use macros to provide an expression that only conditionally is evaluated. In C3 we use # in front of the variable identifier to indicate an unresolved expression.

macro foo(#expr)
{
  return #expr * #expr;
}
fn int test(int z)
{
  return @foo(call(z));
  // => return call(z) * call(z);
}

Now we're running into problems. Both z and call should be resolved in the test function scope. Ooops.

What happens if we tag the #expr with the current scope? This seems like it could work, but in C3, like with GCC statement expressions, we can introduce new variables.

macro int two_times(#expr)
{
  int w = 1;
  #expr;
  #expr;
  return w;
}

fn void test2(int z)
{
  @two_times({| int y = z; call(y); |});
}

So we go into two_times with [int z], then add w for [int z, SENTINEL, int w]. Now when we evaluate two_times we would like something like this: [int z, int y, SENTINEL, int w]. That is, we slip in a new scope in the function scope, and not in the macro scope we pushed.

Trying a hack

What we might realize here is that if we evaluate expr just to the declaration before entering, so that all declarations ar resolved, we might just get the behaviour we want. So something like this:

  1. Enter test2 scope
  2. Push z
  3. Start evaluating the macro call.
  4. Take the macro call argument and only check the declarations.
  5. Enter expr scope
  6. Push y
  7. Resolve z
  8. Resolve y
  9. Pop expr scope
  10. Pass in this pre-checked expression into the macro.
  11. Enter the two_times scope
  12. Push w
  13. Copy #expr and insert it.
  14. Evaluate #expr - which will not need a lookup
  15. Copy #expr and insert it.
  16. Evaluate #expr - which will not need a lookup
  17. Lookup w
  18. Pop the macro scope
  19. Pop the test2 scope

This scheme looks like it would work, but there are questions: what if the declarations inside should not be resolved the same way twice? What if the expr instead looks like:

@two_times({|
  $if (@some_compile_time_macro(...)):
    int y = 0;
  $else:
    int z = 0;
  $endif;
  $if ($defined(y)):
    y = 1;
  $endif;
|});

Here it's not clear that two invocations of the same expr will even lower to the same declarations! So we can't do the lookup ahead of time.

The alternative is to completely evaluate expr, not just the declarations. It's a possible solution, but the corner cases with this approach are hard to foresee.

Summary

If our macros only take values then we can retain a simple model for symbol lookup using a single stack. However, if we can provide expressions or even statements, then these need to not only resolve symbols in the original scope but also possibly introduce them. Pre-checking expressions do not work well with compile time evaluation, since they may change every evaluation.

But maybe there is some way to salvage the model? We'll look at that next.

Implementing lexical scopes in a simple but wrong way

Originally from: https://c3.handmade.network/blog/p/8271-implementing_lexical_scopes_in_a_simple_but_wrong_way

The problem: implementing lexical scope in a programming language.

Here is an example.

fn void test(int y)
{
  int x = 123;
  for (int i = 1; i < y; i++)
  {
    int z = foo(i + x); // The first z
    if (z > 100) x++;    
  }
  // The second z
  int z = x + 1; 
  // z += i; <- Error since i is not in scope 
}

Formulated like this, the problem is fairly simple. One solution is like this:

Decl *locals[MAX_LOCALS];
Scope scopes[MAX_SCOPES];
Scope *next_scope = &scopes[0];
Decl **next_local = &locals[0];

void push_scope()
{
  *next_scope = { .local_start = next_local, .local_end = next_local };
  // TODO Check we didn't reach max 
  next_scope--;
}
void pop_scope()
{
  // Move the current local to the start.
  next_scope--;
  next_local = next_scope.local_start;
}

void push_local(Decl *decl)
{
  // TODO check we didn't reach max
  *next_local = decl;
  next_local++;
  next_scope[-1].local_end = next_local;
}

This solution isn't the simplest we could do, but it has some advantages knowing both the end and the beginning of the locals. Doing a lookup we do a linear search back from the last local:

Decl *find_decl(const char *interned_name)
{
  Decl **current = next_local;
  while (current != &locals[0])
  {
    current--;
    if (current[0].name == interned_name) return current[0];
  }
  return NULL;
}

This might not seem super fast, but variables are often used near their definition and the number of locals are usually fairly limited anyway.

Roughly this is what happens in this algorithm as it goes through the code above:

1. Push function scope

scope: [{ 0, 0 }]
decl: []

2. The param is added

scope: [{ 0, 1 }]
decl: [int y]

3. Push int x

scope: [{ 0, 2 }]
decl: [int y, int x]

4. Push "for" scope

scope: [{ 0, 2 }, { 2, 2 }]
decl: [int y, int x]

5. Push int i

scope: [{ 0, 2 }, { 2, 3 }]
decl: [int y, int x, int i]

6. Push "for body" scope

scope: [{ 0, 2 }, { 2, 3 }, { 3, 3 }]
decl: [int y, int x, int i]

7. Push int z

scope: [{ 0, 2 }, { 2, 3 }, { 3, 4 }]
decl: [int y, int x, int i, int z]

8. Pop "for body" scope

scope: [{ 0, 2 }, { 2, 3 }]
decl: [int y, int x, int i]

9. Pop "for" scope

scope: [{ 0, 2 }]
decl: [int y, int x]

10. Push int z

scope: [{ 0, 3 }]
decl: [int y, int x, int z]

11. Pop function scope

scope: []
decl: []

It's fairly straightforward to see how - after defining z - we can lookup x and i in step 7.

This model is simple and easy to understand – but we skipped something, didn't we? Yes, the lookup of the function foo. In this scheme, anything not found in the locals, will then first look through the current file, then the module and then finally any imports. (This is not how C works, but that is because in C everything is just a single flat file in the end, so we just "look through" the file).

Macros

So far so good, and if this was all we had, the story would end here. But there's an added complexity. What if we have macros?

Here is a simple example:

macro int testmacro(int x)
{
  int z = 2;
  for (int i = 0; i < x; i++)
  {
    z *= 2;
  }
  return z;
}     

fn int test(int y)
{
  int z = getValue(y);
  int x = @testmacro(z);
  return z + x;
}

Assuming we want hygienic macros we cannot simply copy the macro into the function, names and all:

// Broken version
fn int test(int y)
{
  int z = getValue(y);
  int x = z;
  int z = 2;
  for (int i = 0; i < x; i++)
  {
    z *= 2;
  }
  int _macro_result = z;
  int x = _macro_result;
  return z + x;
}

In C3 there is an expression block, working similar to a statement expression in GCC C. If this was used together with allowing shadowing, we'd get something that halfway worked:

// Works but...
fn int test(int y)
{
  int z = getValue(y);
  int x = {|
    int x = z;
    int z = 2;
    for (int i = 0; i < x; i++)
    {
      z *= 2;
    }
    return z;
  |};
  return z + x;
}

This seems to work, but what if we wrote this incorrect macro:

macro testmacro(int x)
{
  return y * x;
}

If we fold that:

fn int test(int y)
{
  int z = getValue(y);
  int x = {|
    int x = z;
    return y * x;
  |};
  return z + x;
}

So now y is leaking into the macro, which is implicitly capturing y. This is not what we want (and as we'll see later this is just the tip of the iceberg).

Next I'll show a way to extend our simple scope system to make sure that the macros are indeed hygenic, and show other things that breaks.

Comments


Comment by Christoffer Lernö

The problem: implementing lexical scope in a programming language.

Here is an example.

fn void test(int y)
{
  int x = 123;
  for (int i = 1; i < y; i++)
  {
    int z = foo(i + x); // The first z
    if (z > 100) x++;    
  }
  // The second z
  int z = x + 1; 
  // z += i; <- Error since i is not in scope 
}

Formulated like this, the problem is fairly simple. One solution is like this:

Decl *locals[MAX_LOCALS];
Scope scopes[MAX_SCOPES];
Scope *next_scope = &scopes[0];
Decl **next_local = &locals[0];

void push_scope()
{
  *next_scope = { .local_start = next_local, .local_end = next_local };
  // TODO Check we didn't reach max 
  next_scope--;
}
void pop_scope()
{
  // Move the current local to the start.
  next_scope--;
  next_local = next_scope.local_start;
}

void push_local(Decl *decl)
{
  // TODO check we didn't reach max
  *next_local = decl;
  next_local++;
  next_scope[-1].local_end = next_local;
}

This solution isn't the simplest we could do, but it has some advantages knowing both the end and the beginning of the locals. Doing a lookup we do a linear search back from the last local:

Decl *find_decl(const char *interned_name)
{
  Decl **current = next_local;
  while (current != &locals[0])
  {
    current--;
    if (current[0].name == interned_name) return current[0];
  }
  return NULL;
}

This might not seem super fast, but variables are often used near their definition and the number of locals are usually fairly limited anyway.

Roughly this is what happens in this algorithm as it goes through the code above:

1. Push function scope

scope: [{ 0, 0 }]
decl: []

2. The param is added

scope: [{ 0, 1 }]
decl: [int y]

3. Push int x

scope: [{ 0, 2 }]
decl: [int y, int x]

4. Push "for" scope

scope: [{ 0, 2 }, { 2, 2 }]
decl: [int y, int x]

5. Push int i

scope: [{ 0, 2 }, { 2, 3 }]
decl: [int y, int x, int i]

6. Push "for body" scope

scope: [{ 0, 2 }, { 2, 3 }, { 3, 3 }]
decl: [int y, int x, int i]

7. Push int z

scope: [{ 0, 2 }, { 2, 3 }, { 3, 4 }]
decl: [int y, int x, int i, int z]

8. Pop "for body" scope

scope: [{ 0, 2 }, { 2, 3 }]
decl: [int y, int x, int i]

9. Pop "for" scope

scope: [{ 0, 2 }]
decl: [int y, int x]

10. Push int z

scope: [{ 0, 3 }]
decl: [int y, int x, int z]

11. Pop function scope

scope: []
decl: []

It's fairly straightforward to see how - after defining z - we can lookup x and i in step 7.

This model is simple and easy to understand – but we skipped something, didn't we? Yes, the lookup of the function foo. In this scheme, anything not found in the locals, will then first look through the current file, then the module and then finally any imports. (This is not how C works, but that is because in C everything is just a single flat file in the end, so we just "look through" the file).

Macros

So far so good, and if this was all we had, the story would end here. But there's an added complexity. What if we have macros?

Here is a simple example:

macro int testmacro(int x)
{
  int z = 2;
  for (int i = 0; i < x; i++)
  {
    z *= 2;
  }
  return z;
}     

fn int test(int y)
{
  int z = getValue(y);
  int x = @testmacro(z);
  return z + x;
}

Assuming we want hygienic macros we cannot simply copy the macro into the function, names and all:

// Broken version
fn int test(int y)
{
  int z = getValue(y);
  int x = z;
  int z = 2;
  for (int i = 0; i < x; i++)
  {
    z *= 2;
  }
  int _macro_result = z;
  int x = _macro_result;
  return z + x;
}

In C3 there is an expression block, working similar to a statement expression in GCC C. If this was used together with allowing shadowing, we'd get something that halfway worked:

// Works but...
fn int test(int y)
{
  int z = getValue(y);
  int x = {|
    int x = z;
    int z = 2;
    for (int i = 0; i < x; i++)
    {
      z *= 2;
    }
    return z;
  |};
  return z + x;
}

This seems to work, but what if we wrote this incorrect macro:

macro testmacro(int x)
{
  return y * x;
}

If we fold that:

fn int test(int y)
{
  int z = getValue(y);
  int x = {|
    int x = z;
    return y * x;
  |};
  return z + x;
}

So now y is leaking into the macro, which is implicitly capturing y. This is not what we want (and as we'll see later this is just the tip of the iceberg).

Next I'll show a way to extend our simple scope system to make sure that the macros are indeed hygenic, and show other things that breaks.

C3 Feature List

Originally from: https://c3.handmade.network/blog/p/8211-c3_feature_list

I wrapped up the bitstruct code last week, and together with that removed the virtual type. Happily this make the language nearly feature complete for 1.0.

This also means I finally have the means to write a fairly solid "what's different from C" list. It's not written in stone, and some things may change, but since the dev from now on is going to be about fleshing out existing features rather than adding new features, any major new features are unlikely.

So without further ado, here's the list of changes from C:

  • Module system
  • Optional contracts
  • Semantic macros
  • Templates through generic modules
  • Subarrays (slices)
  • Foreach
  • Distinct types (similar to typedef but the type is distinct)
  • Compile time evaluation
  • Compile time reflection of types
  • Defer
  • Arrays as values
  • Struct sub-typing (similar to embedded structs in Go)
  • Built-in SIMD vectors
  • Overloadable foreach, allowing types to define custom foreach.
  • Less permissive implicit type conversions and safer widenings
  • Subarray assign/set (e.g. foo[1..3] = 3)
  • Language support for error values
  • Type methods (dot-syntax invocation)
  • Implicit deref on . (removes ->)
  • Bitstructs (well-defined bit packing)
  • Expression blocks (similar to GCC statement expressions)
  • Enum associated values
  • Opt-in structural typing
  • Integer types have well defined bit width
  • 2cc, 4cc, 8cc literals
  • Base64 and hex literals
  • Signed overflow is wrapping
  • Most C UB moved to "implementation defined behaviour"
  • Signed integers are 2s complement
  • Typesafe varargs
  • "Any" type
  • Fewer operator precedence levels
  • Build system included

And yes, with some exceptions you can play with these features today.

Comments


Comment by Christoffer Lernö

I wrapped up the bitstruct code last week, and together with that removed the virtual type. Happily this make the language nearly feature complete for 1.0.

This also means I finally have the means to write a fairly solid "what's different from C" list. It's not written in stone, and some things may change, but since the dev from now on is going to be about fleshing out existing features rather than adding new features, any major new features are unlikely.

So without further ado, here's the list of changes from C:

  • Module system
  • Optional contracts
  • Semantic macros
  • Templates through generic modules
  • Subarrays (slices)
  • Foreach
  • Distinct types (similar to typedef but the type is distinct)
  • Compile time evaluation
  • Compile time reflection of types
  • Defer
  • Arrays as values
  • Struct sub-typing (similar to embedded structs in Go)
  • Built-in SIMD vectors
  • Overloadable foreach, allowing types to define custom foreach.
  • Less permissive implicit type conversions and safer widenings
  • Subarray assign/set (e.g. foo[1..3] = 3)
  • Language support for error values
  • Type methods (dot-syntax invocation)
  • Implicit deref on . (removes ->)
  • Bitstructs (well-defined bit packing)
  • Expression blocks (similar to GCC statement expressions)
  • Enum associated values
  • Opt-in structural typing
  • Integer types have well defined bit width
  • 2cc, 4cc, 8cc literals
  • Base64 and hex literals
  • Signed overflow is wrapping
  • Most C UB moved to "implementation defined behaviour"
  • Signed integers are 2s complement
  • Typesafe varargs
  • "Any" type
  • Fewer operator precedence levels
  • Build system included

And yes, with some exceptions you can play with these features today.

Your users will do what you make easy

Originally from: https://c3.handmade.network/blog/p/8208-your_users_will_do_what_you_make_easy

Recently was thinking about Java and reflection and how it actually ended up causing the proliferation of "enterprise-y frameworks" written in the language.

Java reflection & serialization

Java early on had built-in serialization. It was very generic but could serialize basically anything in the object graph. The output was also very verbose.

This functionality sort of set the bar for what was to come. So when there came more libraries that wanted to serialize and deserialize with Java, then everyone said "oh, yeah this is cool, it's better than what's built in!".

And then someone said "hey, now that we have this easy serialization and deserialization - guess what, we can do it from config files, that way we get more flexibility!" – And people were at this point already accepting that they were serializing and deserializing from some generic format that wasn't really tailored for their code. So what could go wrong with a generic configuration that wasn't tailored for their code?

Next you got the proliferation of configs everywhere, where you have to mess around with config files in Java because no one builds APIs to actually programmatically configure things.

– And this just comes from the easy accessibility of reflection and serialization in early Java.

Contrast this to C, where you don't have all those tools and have to build your config readers by hand. – So you'll again start shopping for libraries, but since there is no vertically integrated stack that does everything from reading your config to assembling your objects, odds are that you'll at least limit yourself to what you need. Heck, you might even build it yourself.

The tragedy of Objective-C

A similar thing occurred with ObjC in the iPhone gold rush. A lot of Java/C++ programmers arrived to the platform, and in Java/C++, OO is usually objects all the way down. – Because hey there's no problem doing that and in fact that's how things are built with OO in those languages: just split things into increasingly finer grained classes and objects.

ObjC was intended to be used in a different way though. ObjC OO is C with an OO layer for interop. You're supposed to write 95+% C with ObjC as the high level "script-like" glue, you don't have to be afraid of C structs or enums.

Also, writing ObjC classes takes more time than doing the same with Java/C++, and again that wasn't a problem because you weren't supposed to use ObjC classes in the same way.

So then all the C++/Java folks showed up and boy did they moan about how hard ObjC was to use... well because they tried to use 5% C and 95% OO. The result was both slow to write and slow to execute. Basically they were trying to write Java in Objective–C.

Now what did Apple do?

  1. Automated refcounting - don't let the poor Java developers think about memory management of their gazillion objects, where before (in canonical ObjC projects) memory management had been a very small concern (objects were rarely allocated/deallocated as they were large scale structures in sane ObjC programs) this became a huge problem with Java style OO.
  2. Properties - if you use ObjC classes instead of normal C structs where you should be using C structs, then it's a pain writing accessors. So add code to automate that to allow people to easier write bad ObjC code in a Java style!
  3. Since these devs continued to complain, introduce Swift, which was advertised as "ObjC without the C" but was "C++ without the C". Swift allowed devs to work in a C++/Java way the way they were used to. Swift was also a hugely complex language that was slower than ObjC and lacked a lot of good stuff that ObjC had, but hey, you can't have everything, right?
Where did it go wrong?

It seems that as soon as you create an alternative that is easier than doing it in some other way, people will flock around that alternative, even if it is isn't very good practice in the long run (like the built in Java serialization). And in fact trying to "fix" that to make the wrong choice less problematic is just legitimizing the wrong choice - like Apple did when they made it easier to write Java-style Objective-C.

As a language designer, one often runs into "that's easy to add" sort of features. But it seems to me that one has to be careful to not make things easy people aren't suppose to use.

If I add objects and classes to C3, but say "I only include this for doing things that really works well with OO, like GUI toolkits", then should I really blame people coming form an OO background that they start building everything with objects? If I make it just as easy as building with the preferred procedural style?

Conclusion

I don't really like opinionated languages, but I also realize that one does have some responsibility to make things easy that should be used, and hard if they shouldn't be used (often).

As an example, let's say there are two ways to solve a problem in a language: A and B. If A is the best way (maintainability, fit with the other language mechanisms etc), then it should always be easier to do, even if it that means deliberately making B harder than it needs to be. "Making B easier because it has very low cost for me to implement" is not a neutral act of design, but an implicit endorsement.

The user will look at the programming language and think that what is the easiest thing to do is the best thing to do, and violating that is really letting the users down.

Comments


Comment by Christoffer Lernö

Recently was thinking about Java and reflection and how it actually ended up causing the proliferation of "enterprise-y frameworks" written in the language.

Java reflection & serialization

Java early on had built-in serialization. It was very generic but could serialize basically anything in the object graph. The output was also very verbose.

This functionality sort of set the bar for what was to come. So when there came more libraries that wanted to serialize and deserialize with Java, then everyone said "oh, yeah this is cool, it's better than what's built in!".

And then someone said "hey, now that we have this easy serialization and deserialization - guess what, we can do it from config files, that way we get more flexibility!" – And people were at this point already accepting that they were serializing and deserializing from some generic format that wasn't really tailored for their code. So what could go wrong with a generic configuration that wasn't tailored for their code?

Next you got the proliferation of configs everywhere, where you have to mess around with config files in Java because no one builds APIs to actually programmatically configure things.

– And this just comes from the easy accessibility of reflection and serialization in early Java.

Contrast this to C, where you don't have all those tools and have to build your config readers by hand. – So you'll again start shopping for libraries, but since there is no vertically integrated stack that does everything from reading your config to assembling your objects, odds are that you'll at least limit yourself to what you need. Heck, you might even build it yourself.

The tragedy of Objective-C

A similar thing occurred with ObjC in the iPhone gold rush. A lot of Java/C++ programmers arrived to the platform, and in Java/C++, OO is usually objects all the way down. – Because hey there's no problem doing that and in fact that's how things are built with OO in those languages: just split things into increasingly finer grained classes and objects.

ObjC was intended to be used in a different way though. ObjC OO is C with an OO layer for interop. You're supposed to write 95+% C with ObjC as the high level "script-like" glue, you don't have to be afraid of C structs or enums.

Also, writing ObjC classes takes more time than doing the same with Java/C++, and again that wasn't a problem because you weren't supposed to use ObjC classes in the same way.

So then all the C++/Java folks showed up and boy did they moan about how hard ObjC was to use... well because they tried to use 5% C and 95% OO. The result was both slow to write and slow to execute. Basically they were trying to write Java in Objective–C.

Now what did Apple do?

  1. Automated refcounting - don't let the poor Java developers think about memory management of their gazillion objects, where before (in canonical ObjC projects) memory management had been a very small concern (objects were rarely allocated/deallocated as they were large scale structures in sane ObjC programs) this became a huge problem with Java style OO.
  2. Properties - if you use ObjC classes instead of normal C structs where you should be using C structs, then it's a pain writing accessors. So add code to automate that to allow people to easier write bad ObjC code in a Java style!
  3. Since these devs continued to complain, introduce Swift, which was advertised as "ObjC without the C" but was "C++ without the C". Swift allowed devs to work in a C++/Java way the way they were used to. Swift was also a hugely complex language that was slower than ObjC and lacked a lot of good stuff that ObjC had, but hey, you can't have everything, right?
Where did it go wrong?

It seems that as soon as you create an alternative that is easier than doing it in some other way, people will flock around that alternative, even if it is isn't very good practice in the long run (like the built in Java serialization). And in fact trying to "fix" that to make the wrong choice less problematic is just legitimizing the wrong choice - like Apple did when they made it easier to write Java-style Objective-C.

As a language designer, one often runs into "that's easy to add" sort of features. But it seems to me that one has to be careful to not make things easy people aren't suppose to use.

If I add objects and classes to C3, but say "I only include this for doing things that really works well with OO, like GUI toolkits", then should I really blame people coming form an OO background that they start building everything with objects? If I make it just as easy as building with the preferred procedural style?

Conclusion

I don't really like opinionated languages, but I also realize that one does have some responsibility to make things easy that should be used, and hard if they shouldn't be used (often).

As an example, let's say there are two ways to solve a problem in a language: A and B. If A is the best way (maintainability, fit with the other language mechanisms etc), then it should always be easier to do, even if it that means deliberately making B harder than it needs to be. "Making B easier because it has very low cost for me to implement" is not a neutral act of design, but an implicit endorsement.

The user will look at the programming language and think that what is the easiest thing to do is the best thing to do, and violating that is really letting the users down.

Removing features, gaining freedom

Originally from: https://c3.handmade.network/blog/p/8168-removing_features%252C_gaining_freedom

I recently removed untyped literals, and that complexity is now leaving me free to actually add code where there's a tangible benefit to it.

It might not be immediately obvious that having untyped literals adds complexity to a language. For C3, with its C-like implicit conversions and untyped macros, it adds more complexity than in languages like Go, where explicit conversions are enforced.

The sheer amount of complexity I had added to just do this surprised even me though. For example, every expression needed to pass down a "target type" just in case the analysis encountered an untyped literal to give a type to. In some cases this was far from trivial, and analysis had to be done in a certain order to prevent unnecessary error situations, where the incorrect order would only be obvious when the expressions where combined in some special way.

As I removed the code I also discovered I could change how the "failables" (error unions) were handled, which further reduced complexity. This in turn allowed me to do simplifications in tracking constant and "pure" expressions.

Lots of code that I had left half unfinished – with special, problematic, cases to solve another day – would after refactoring easily cover all cases and be smaller.

The code now seems so much easier to grasp, and it seems crazy I didn't removed this feature - that at the most saved a little typing here and there - earlier. But the problem was that it was a gentle creep. I did the untyped literals early, so when I added more complex features I didn't have a comparison with how it would look with untyped literals removed.

When I look at the code now, I see something that more easily can accommodate added features and behaviours. The semantic analysis had before by necessity been much more coupled due to type flow going both bottom up and top down.

There's a lesson here – which is that a seemingly simple and "nice to have" feature might by accident end up making a code base more than twice as complex to read and reason about. And that complexity cost takes away the resources the compiler (and the language) has for other features – features that may actually have a cost that is equal to its benefits.

In removing this small feature with little practical benefit, I am now free to actually add a little complexity where it really helps the users.

Comments


Comment by Christoffer Lernö

I recently removed untyped literals, and that complexity is now leaving me free to actually add code where there's a tangible benefit to it.

It might not be immediately obvious that having untyped literals adds complexity to a language. For C3, with its C-like implicit conversions and untyped macros, it adds more complexity than in languages like Go, where explicit conversions are enforced.

The sheer amount of complexity I had added to just do this surprised even me though. For example, every expression needed to pass down a "target type" just in case the analysis encountered an untyped literal to give a type to. In some cases this was far from trivial, and analysis had to be done in a certain order to prevent unnecessary error situations, where the incorrect order would only be obvious when the expressions where combined in some special way.

As I removed the code I also discovered I could change how the "failables" (error unions) were handled, which further reduced complexity. This in turn allowed me to do simplifications in tracking constant and "pure" expressions.

Lots of code that I had left half unfinished – with special, problematic, cases to solve another day – would after refactoring easily cover all cases and be smaller.

The code now seems so much easier to grasp, and it seems crazy I didn't removed this feature - that at the most saved a little typing here and there - earlier. But the problem was that it was a gentle creep. I did the untyped literals early, so when I added more complex features I didn't have a comparison with how it would look with untyped literals removed.

When I look at the code now, I see something that more easily can accommodate added features and behaviours. The semantic analysis had before by necessity been much more coupled due to type flow going both bottom up and top down.

There's a lesson here – which is that a seemingly simple and "nice to have" feature might by accident end up making a code base more than twice as complex to read and reason about. And that complexity cost takes away the resources the compiler (and the language) has for other features – features that may actually have a cost that is equal to its benefits.

In removing this small feature with little practical benefit, I am now free to actually add a little complexity where it really helps the users.

Fixing "bugs" in our proposal

Originally from: https://c3.handmade.network/blog/p/8138-fixing_bugs_in_our_proposal

When we last left off we had our "attempt 3" which seemed like a promising candidate to research. We expected some problems and here are two:

  1. What is the behaviour of when there is both "top down" casts and peer casts? This was never discussed. For example: x_u64 = a_i32 + b_u32. As we will see, making the wrong choice here will create overflow bugs.
  2. When are constants folded? This matters since we only recursively applied casts the right-hand side is a binary expression. But does that check happen before or after constant folding? The proposal seems to imply it's after, but that would give weird semantics.

Top down casts & peer casts

Our "test" here is this expression:

int32_t a = ...;
uint32_t b = ...;
uint64_t c = a + b;

// Peer -> top down:
uint64_t c = (uint64_t)a + (uint64_t)(int32_t)(b)

For some surprising results, set b = 0x80000000U and a = 1 → c = 0xffffffff80000001ULL. Top down → peer gives us our expected c = 0x80000001ULL.

It's very important that the resolution works in this order:

  1. Check that a and b are valid widening safe expressions (see previous article for definition).
  2. Evaluate a and b in isolation, before actually evaluating the binary expression.
  3. Cast both to uint64_t (note that this differs from peer resolution which would preserve the signedness)
  4. Semantically check the binary expression.

This gives us uint64_t c = (uint64_t)a + (uint64_t)b which presumably is what we want.

Another thing we need to note though, is that only integer widening + signedness changes happen this way. For example something like ptrdiff_t x = ptr1 - ptr2 shouldn't start making casts on the pointers!

When constant folding happens

Constants folded are either constants or literals (assume int 32 bit, long 64 bit):

const int FOO = 1;
const int BAR = 2;
int y = ...;
int i = ...;
int j = ...;
long a1 = y + (i + j); // Error, needs explicit cast.
long a2 = y + (FOO + BAR); // Error?
long a3 = y + (1 + 2); // Error?

Just arguing for consistency would seem to require that semantics are the same for variables and constants, and by extension also literals. This might be a little disappointing, but consider if one would expect a to be -1, which would be the case if we would fold constants first:

long a = INT_MAX * 2 + 1;

With late folding the result becomes:

long a = INT_MAX * 2 + 1; // Error, needs explicit cast.
long a2 = INT_MAX * 2; // => 0xfffffffe
long a3 = INT_MAX + 1; // => 0x80000000

Which is probably as good as it gets.

Another case is when the left hand side unsigned. In this case we do not actually perform the top down widening, but instead have a general cast on the result:

uint b = INT_MAX * 2 + 1; // => 0xffffffff

Conclusion

So far so good, there is no showstopper here, but this serves as a reminder and example of how easy it is to forget implications.

Comments


Comment by Christoffer Lernö

When we last left off we had our "attempt 3" which seemed like a promising candidate to research. We expected some problems and here are two:

  1. What is the behaviour of when there is both "top down" casts and peer casts? This was never discussed. For example: x_u64 = a_i32 + b_u32. As we will see, making the wrong choice here will create overflow bugs.
  2. When are constants folded? This matters since we only recursively applied casts the right-hand side is a binary expression. But does that check happen before or after constant folding? The proposal seems to imply it's after, but that would give weird semantics.

Top down casts & peer casts

Our "test" here is this expression:

int32_t a = ...;
uint32_t b = ...;
uint64_t c = a + b;

// Peer -> top down:
uint64_t c = (uint64_t)a + (uint64_t)(int32_t)(b)

For some surprising results, set b = 0x80000000U and a = 1 → c = 0xffffffff80000001ULL. Top down → peer gives us our expected c = 0x80000001ULL.

It's very important that the resolution works in this order:

  1. Check that a and b are valid widening safe expressions (see previous article for definition).
  2. Evaluate a and b in isolation, before actually evaluating the binary expression.
  3. Cast both to uint64_t (note that this differs from peer resolution which would preserve the signedness)
  4. Semantically check the binary expression.

This gives us uint64_t c = (uint64_t)a + (uint64_t)b which presumably is what we want.

Another thing we need to note though, is that only integer widening + signedness changes happen this way. For example something like ptrdiff_t x = ptr1 - ptr2 shouldn't start making casts on the pointers!

When constant folding happens

Constants folded are either constants or literals (assume int 32 bit, long 64 bit):

const int FOO = 1;
const int BAR = 2;
int y = ...;
int i = ...;
int j = ...;
long a1 = y + (i + j); // Error, needs explicit cast.
long a2 = y + (FOO + BAR); // Error?
long a3 = y + (1 + 2); // Error?

Just arguing for consistency would seem to require that semantics are the same for variables and constants, and by extension also literals. This might be a little disappointing, but consider if one would expect a to be -1, which would be the case if we would fold constants first:

long a = INT_MAX * 2 + 1;

With late folding the result becomes:

long a = INT_MAX * 2 + 1; // Error, needs explicit cast.
long a2 = INT_MAX * 2; // => 0xfffffffe
long a3 = INT_MAX + 1; // => 0x80000000

Which is probably as good as it gets.

Another case is when the left hand side unsigned. In this case we do not actually perform the top down widening, but instead have a general cast on the result:

uint b = INT_MAX * 2 + 1; // => 0xffffffff

Conclusion

So far so good, there is no showstopper here, but this serves as a reminder and example of how easy it is to forget implications.

Attempting new C3 type conversion semantics

Originally from: https://c3.handmade.network/blog/p/8134-attempting_new_c3_type_conversion_semantics

As a reminder, in the last article I listed the following strategies:

Strategies for widening
  1. Early fold, late cast (deep traversal).
  2. Left hand side widening & error on peer widening.
  3. No implicit widening.
  4. Peer resolution widening (C style).
  5. Re-evaluation.
  6. Top casting but error on subexpressions.
  7. Common integer promotion (C style).
  8. Pre-traversal evaluation.
Strategies for narrowing
  1. Always allow (C style).
  2. Narrowing on direct literal assignment only.
  3. Original type is smaller or same as narrowed type. Literals always ok.
  4. As (3) but literals are size checked.
  5. No implicit narrowing.
Strategies for conversion of signed/unsigned
  1. Prefer unsigned (C style).
  2. Prefer signed.
  3. Disallow completely.

To begin with, we'll try the "Early fold, late cast" for widening. This is essentially a modification on the peer resolution widening.

Attempt 1 - "Early fold, late cast"

Here we first fold all constants, then "peer promote" the types recursively downwards. How this differs from C is illustrated below:

int32_t y = foo();
int64_t x = bar();
double w = x + (y * 2);
// C semantics:
// double w = (double)(x + (int64_t)(y * 2));
// Early fold, late cast:
// double w = (double)(x + ((int64_t)y * (int64_t)2));

So far this looks pretty nice, but unfortunately the "early fold" part has less positive consequences:

int32_t y = foo();
int64_t x = bar();
int32_t max = INT32_MAX;
double w = x + (y + (INT32_MAX + 1));
double w2 = x + (y + (max + 1));
// Early fold, late cast:
// double w = (double)(x + ((int64_t)y + (int64_t)-2147483648));
// double w2 = (double)(x + ((int64_t)y 
//             + ((int64_t)max + (int64_t)1)));

The good thing about this wart is that it is at least locally observable - as long as it's clear from the source code what will be constant folded.

All in all, while slightly improving the semantics, we can't claim to have solved all the problems with peer resolution while the semantics got more complicated.

Attempt 2 "Re-evaluation"

This strategy is more theoretical than anything. With this strategy we attempt to first evaluate the maximum type, then re-evaluate everything by casting to this maximum type. It's possible to do this in multiple ways: deep copying the expression, two types of evaluation passes etc.

Our problematic example from attempt 1 works:

int32_t y = foo();
int64_t x = bar();
int32_t max = INT32_MAX;
double w = x + (y + (INT32_MAX + 1));
// => first pass type analysis: int64_t + (int32_t + (int + int))
// => max type in sub expression is int64_t
// => double w = (double)(x + ((int64_t)y 
//               + ((int64_t)INT_MAX + (int64_t)1)));

However, we somewhat surprisingly get code that are hard to reason about locally even in this case:

// elsewhere int64_t x = ... or int32_t x = ... 
int32_t y = foo();
double w = x + (y << 16);
// If x is int64_t:
// double w = (double)(x + ((int64_t)y << 16));
// If x is int32_t
// double w = (double)(x + (y << 16));

Here it's clear that we get different behaviour: in the case of int32_t the bits are shifted out, whereas for int64_t they are retained. There are better (but more complicated) examples to illustrate this, but it is proving that even with "ideal" semantics, we lose locality.

Rethinking "perfect"

Let's go back to the example we had with Java in the last article:

char d = 1;
char c = 1 + d; // Error
char c = (char)(1 + d); // Ok

In this case the first is ok, but the second isn't. The special case takes care of the most common use, but for anything non-trivial an explicit cast is needed.

What if we would say that this is okay:

int64_t x = foo();
int32_t y = bar();
double w = x + y

But this isn't:

int64_t x = foo();
int32_t y = bar();
double w = x + (y + y);
//             ^^^^^^^
// Error, can't implicitly promote complex expression
// of type int32_t to int64_t.

Basically we give up widening except for where we don't need to recurse deep into the sub-expressions. Let's try it out:

Attempt 3: Hybrid peer resolution and no implicit cast

We look at our first example.

int32_t y = foo();
int64_t x = bar();
double w = x + (y * 2); // Error

This is an error, but we can do two different things in order to make it work, depending on what behaviour we want:

// A
double w = x + (y * 2LL); // Promote y
// B
double w = x + (int64_t)(y * 2); // Promote y * 2

Yes, this some extra work, but also note that suddenly we have much better control over what the actual types are compared to the "perfect" promotion. Also note that changing the type of x cannot silently change semantics here.

The second example:

int32_t y = foo();
int64_t x = bar();
int32_t max = INT32_MAX;
double w = x + (y + (INT32_MAX + 1)); // Error

Again we get an error, and again there are multiple ways to remove this ambiguity. To promote the internal expression to use int64_t, we just have to nudge it a little bit:

double w = x + (y + (INT32_MAX + 1LL)); // Ok!

And of course if we want C behaviour, we simply put the cast outside:

double w = x + (int64_t)(y + (INT32_MAX + 1)); // Ok!

Note here how effective the type of the literal is in nudging the semantics in the correct direction!

Adding narrowing to "attempt 3"

For narrowing we have multiple options, the simplest is following the Java model, although it could be considered to be unnecessarily conservative.

The current C3 model uses a "original type / active type" model. This makes it hard do any checks, like accepting char c1 = c + 127 but rejecting char c1 = c + 256 on account of 256 overflowing char.

If this type checking system is altered, it should be possible to instead pass a "required max type" downwards. The difference is in practice only if literals that exceed the left-hand side are rejected.

For most other purposes they work, and seem fairly safe as they are a restricted variant of C semantics. There is already an implementation of this in C3 compiler.

Handling unsigned conversions

The current model in C3 allow free implicit conversions between signed and unsigned types of the same size. However, the result is the signed type rather than the unsigned.

However, naively promoting all types to signed doesn't work well:

ushort c = 2;
uint d = 0x80000000;
uint e = d / 2; // => c0000000 ????
// If it was e = (uint)((int)d / (int)2);

Consequently C3 tries to promote to the signedness of the underlying type:

uint e = d / 2; 
// e = d / (uint)2; 
// => 0x40000000

Given the lack of languages with this sort of promotion, there might very well be serious issues with it, but in such cases we could limit unsigned conversion to the cases where it's fairly safe. For now though, let's keep this behaviour until we have more solid examples of it creating serious issues.

Summarizing attempt 3:

  1. We define a receiving type, which is the parameter type the expression is passed as a value to, or the variable type for in the case of an assignment. This receiving type may also be empty. Example: short a = foo + (b / c), in this case the type is short.
  2. We define a widening safe expression as any expression except: add, sub, mult, div, rem, left/right shift, bit negate, negate, ternary, bit or/xor/and.
  3. We define a widening allowed expression as a binary add, sub, mult, div where the sub expressions are widening safe expressions.
  4. Doing a depth first traversal on sub expressions, the following occurs in order:
  5. The type is checked.
  6. If the type is an integer and has a smaller bit width than the minimum arithmetic integer width, then it is cast to the corresponding minimum integer with the same signedness.
  7. If the expression is a binary add/sub/div/mod/mult/bit or/ bit xor/bit and then the following are applied in order.
  8. If both sub expressions are integers but of different size, check the smaller sub expression. If the smaller sub expression is a widening safe expression insert a widening cast to the same size, with retained signedness. Otherwise, if the smaller sub expression is a widening allowed expression, insert a widening cast to the same size with retained signedness on the smaller expression's two sub expressions. Otherwise this is a type error.
  9. If one sub expression is an unsigned integer, and the other is signed, and the signed is non-negative constant literal, then the literal sub expression is cast to type of the unsigned sub expression.
  10. If one sub expression is unsigned and the other is signed, the unsigned is cast to the signed type.
  11. If one sub expression is bool or integer, and the other is a floating point type, then the integer sub expression is cast to the
  12. Using the receiving type, ignoring inserted implicit casts, recursively check sub expressions:
  13. If the type is wider than the receiving type, and the sub expression is a literal which does not fit in the receiving type, this is a literal out of range error.
  14. the receiving type, and the sub expression is a not a literal, and the sub expression's type is wider than the receiving type then this is an error.
  15. If an explicit cast is encountered, set receiving type for sub expression checking to empty.

Wheew! That was a lot and fairly complex. We can sum it up a bit simpler:

  1. Implicit narrowing only works if all sub expressions would fit in the target type.
  2. Implicit widening will only be done on non-arithmetic expressions or simple +-/%* binary expressions. In the second case the widening is done on the sub expressions rather than the binary expression as a whole.
  3. Integer literals can be implicitly cast to any type as long as it fits in the resulting type.

Conclusion

We tried three different semantics for implicit conversions using typed literals. A limited hybrid solution (our attempt 3) doesn't immediately break and consequently warrants some further investigation, but we're extremely far from saying that "this is good". Quite the opposite, we should treat as likely broken in multiple ways, where our task is to find out where.

We've already attempted various solutions, just to find corner cases with fairly awful semantics, even leaving aside implementation issues.

So it should be fairly clear that there are no perfect solutions, but only trade-offs to various degrees. (That said, some solutions are clearly worse than others)

It's also very clear that a healthy skepticism is needed towards one's designs and ideas. What looks excellent and clear today may tomorrow turn out to have fatal flaws. Backtracking and fixing flaws should be part of the design process and I believe it's important to keep an open mind, because as you research you might find that feature X which you thought was really good / really bad, is in fact the opposite.

When doing language design it's probably good to be mentally prepared to be wrong a lot.

(Also see the follow up, where we patch some of the problems!)

Comments


Comment by Christoffer Lernö

As a reminder, in the last article I listed the following strategies:

Strategies for widening
  1. Early fold, late cast (deep traversal).
  2. Left hand side widening & error on peer widening.
  3. No implicit widening.
  4. Peer resolution widening (C style).
  5. Re-evaluation.
  6. Top casting but error on subexpressions.
  7. Common integer promotion (C style).
  8. Pre-traversal evaluation.
Strategies for narrowing
  1. Always allow (C style).
  2. Narrowing on direct literal assignment only.
  3. Original type is smaller or same as narrowed type. Literals always ok.
  4. As (3) but literals are size checked.
  5. No implicit narrowing.
Strategies for conversion of signed/unsigned
  1. Prefer unsigned (C style).
  2. Prefer signed.
  3. Disallow completely.

To begin with, we'll try the "Early fold, late cast" for widening. This is essentially a modification on the peer resolution widening.

Attempt 1 - "Early fold, late cast"

Here we first fold all constants, then "peer promote" the types recursively downwards. How this differs from C is illustrated below:

int32_t y = foo();
int64_t x = bar();
double w = x + (y * 2);
// C semantics:
// double w = (double)(x + (int64_t)(y * 2));
// Early fold, late cast:
// double w = (double)(x + ((int64_t)y * (int64_t)2));

So far this looks pretty nice, but unfortunately the "early fold" part has less positive consequences:

int32_t y = foo();
int64_t x = bar();
int32_t max = INT32_MAX;
double w = x + (y + (INT32_MAX + 1));
double w2 = x + (y + (max + 1));
// Early fold, late cast:
// double w = (double)(x + ((int64_t)y + (int64_t)-2147483648));
// double w2 = (double)(x + ((int64_t)y 
//             + ((int64_t)max + (int64_t)1)));

The good thing about this wart is that it is at least locally observable - as long as it's clear from the source code what will be constant folded.

All in all, while slightly improving the semantics, we can't claim to have solved all the problems with peer resolution while the semantics got more complicated.

Attempt 2 "Re-evaluation"

This strategy is more theoretical than anything. With this strategy we attempt to first evaluate the maximum type, then re-evaluate everything by casting to this maximum type. It's possible to do this in multiple ways: deep copying the expression, two types of evaluation passes etc.

Our problematic example from attempt 1 works:

int32_t y = foo();
int64_t x = bar();
int32_t max = INT32_MAX;
double w = x + (y + (INT32_MAX + 1));
// => first pass type analysis: int64_t + (int32_t + (int + int))
// => max type in sub expression is int64_t
// => double w = (double)(x + ((int64_t)y 
//               + ((int64_t)INT_MAX + (int64_t)1)));

However, we somewhat surprisingly get code that are hard to reason about locally even in this case:

// elsewhere int64_t x = ... or int32_t x = ... 
int32_t y = foo();
double w = x + (y << 16);
// If x is int64_t:
// double w = (double)(x + ((int64_t)y << 16));
// If x is int32_t
// double w = (double)(x + (y << 16));

Here it's clear that we get different behaviour: in the case of int32_t the bits are shifted out, whereas for int64_t they are retained. There are better (but more complicated) examples to illustrate this, but it is proving that even with "ideal" semantics, we lose locality.

Rethinking "perfect"

Let's go back to the example we had with Java in the last article:

char d = 1;
char c = 1 + d; // Error
char c = (char)(1 + d); // Ok

In this case the first is ok, but the second isn't. The special case takes care of the most common use, but for anything non-trivial an explicit cast is needed.

What if we would say that this is okay:

int64_t x = foo();
int32_t y = bar();
double w = x + y

But this isn't:

int64_t x = foo();
int32_t y = bar();
double w = x + (y + y);
//             ^^^^^^^
// Error, can't implicitly promote complex expression
// of type int32_t to int64_t.

Basically we give up widening except for where we don't need to recurse deep into the sub-expressions. Let's try it out:

Attempt 3: Hybrid peer resolution and no implicit cast

We look at our first example.

int32_t y = foo();
int64_t x = bar();
double w = x + (y * 2); // Error

This is an error, but we can do two different things in order to make it work, depending on what behaviour we want:

// A
double w = x + (y * 2LL); // Promote y
// B
double w = x + (int64_t)(y * 2); // Promote y * 2

Yes, this some extra work, but also note that suddenly we have much better control over what the actual types are compared to the "perfect" promotion. Also note that changing the type of x cannot silently change semantics here.

The second example:

int32_t y = foo();
int64_t x = bar();
int32_t max = INT32_MAX;
double w = x + (y + (INT32_MAX + 1)); // Error

Again we get an error, and again there are multiple ways to remove this ambiguity. To promote the internal expression to use int64_t, we just have to nudge it a little bit:

double w = x + (y + (INT32_MAX + 1LL)); // Ok!

And of course if we want C behaviour, we simply put the cast outside:

double w = x + (int64_t)(y + (INT32_MAX + 1)); // Ok!

Note here how effective the type of the literal is in nudging the semantics in the correct direction!

Adding narrowing to "attempt 3"

For narrowing we have multiple options, the simplest is following the Java model, although it could be considered to be unnecessarily conservative.

The current C3 model uses a "original type / active type" model. This makes it hard do any checks, like accepting char c1 = c + 127 but rejecting char c1 = c + 256 on account of 256 overflowing char.

If this type checking system is altered, it should be possible to instead pass a "required max type" downwards. The difference is in practice only if literals that exceed the left-hand side are rejected.

For most other purposes they work, and seem fairly safe as they are a restricted variant of C semantics. There is already an implementation of this in C3 compiler.

Handling unsigned conversions

The current model in C3 allow free implicit conversions between signed and unsigned types of the same size. However, the result is the signed type rather than the unsigned.

However, naively promoting all types to signed doesn't work well:

ushort c = 2;
uint d = 0x80000000;
uint e = d / 2; // => c0000000 ????
// If it was e = (uint)((int)d / (int)2);

Consequently C3 tries to promote to the signedness of the underlying type:

uint e = d / 2; 
// e = d / (uint)2; 
// => 0x40000000

Given the lack of languages with this sort of promotion, there might very well be serious issues with it, but in such cases we could limit unsigned conversion to the cases where it's fairly safe. For now though, let's keep this behaviour until we have more solid examples of it creating serious issues.

Summarizing attempt 3:

  1. We define a receiving type, which is the parameter type the expression is passed as a value to, or the variable type for in the case of an assignment. This receiving type may also be empty. Example: short a = foo + (b / c), in this case the type is short.
  2. We define a widening safe expression as any expression except: add, sub, mult, div, rem, left/right shift, bit negate, negate, ternary, bit or/xor/and.
  3. We define a widening allowed expression as a binary add, sub, mult, div where the sub expressions are widening safe expressions.
  4. Doing a depth first traversal on sub expressions, the following occurs in order:
  5. The type is checked.
  6. If the type is an integer and has a smaller bit width than the minimum arithmetic integer width, then it is cast to the corresponding minimum integer with the same signedness.
  7. If the expression is a binary add/sub/div/mod/mult/bit or/ bit xor/bit and then the following are applied in order.
  8. If both sub expressions are integers but of different size, check the smaller sub expression. If the smaller sub expression is a widening safe expression insert a widening cast to the same size, with retained signedness. Otherwise, if the smaller sub expression is a widening allowed expression, insert a widening cast to the same size with retained signedness on the smaller expression's two sub expressions. Otherwise this is a type error.
  9. If one sub expression is an unsigned integer, and the other is signed, and the signed is non-negative constant literal, then the literal sub expression is cast to type of the unsigned sub expression.
  10. If one sub expression is unsigned and the other is signed, the unsigned is cast to the signed type.
  11. If one sub expression is bool or integer, and the other is a floating point type, then the integer sub expression is cast to the
  12. Using the receiving type, ignoring inserted implicit casts, recursively check sub expressions:
  13. If the type is wider than the receiving type, and the sub expression is a literal which does not fit in the receiving type, this is a literal out of range error.
  14. the receiving type, and the sub expression is a not a literal, and the sub expression's type is wider than the receiving type then this is an error.
  15. If an explicit cast is encountered, set receiving type for sub expression checking to empty.

Wheew! That was a lot and fairly complex. We can sum it up a bit simpler:

  1. Implicit narrowing only works if all sub expressions would fit in the target type.
  2. Implicit widening will only be done on non-arithmetic expressions or simple +-/%* binary expressions. In the second case the widening is done on the sub expressions rather than the binary expression as a whole.
  3. Integer literals can be implicitly cast to any type as long as it fits in the resulting type.

Conclusion

We tried three different semantics for implicit conversions using typed literals. A limited hybrid solution (our attempt 3) doesn't immediately break and consequently warrants some further investigation, but we're extremely far from saying that "this is good". Quite the opposite, we should treat as likely broken in multiple ways, where our task is to find out where.

We've already attempted various solutions, just to find corner cases with fairly awful semantics, even leaving aside implementation issues.

So it should be fairly clear that there are no perfect solutions, but only trade-offs to various degrees. (That said, some solutions are clearly worse than others)

It's also very clear that a healthy skepticism is needed towards one's designs and ideas. What looks excellent and clear today may tomorrow turn out to have fatal flaws. Backtracking and fixing flaws should be part of the design process and I believe it's important to keep an open mind, because as you research you might find that feature X which you thought was really good / really bad, is in fact the opposite.

When doing language design it's probably good to be mentally prepared to be wrong a lot.

(Also see the follow up, where we patch some of the problems!)

How to break a + b + c

Originally from: https://c3.handmade.network/blog/p/8107-how_to_break_a__b__c

When one is deviating from language semantics, one sometimes accidentally break established, well-understood semantics. One of the worst design mistakes I did when working on C3 was to accidentally break associativity for addition.

Here is some code:

// File foo.h
typedef struct {
   unsigned short a;
   unsigned short b;
   unsigned short c; 
} Foo;
// File bar.c
unsigned int fooIt(Foo *foo)
{
   unsigned int x = a + b + c;
   return x;
}
// File baz.c
int calculate()
{
   Foo foo = { 200, 200, 200 };
   assert(fooIt(foo) == foo.b + foo.c + foo.a);
   return fooIt(foo);
}

I've written this with pure C syntax, but we're going to imagine deviating from C semantics.

In particular we're going to say:

  1. if two operands in a binary expression (e.g. a + b) are of the same bit width n, the operation will be performed with wrapping semantics. So if we have two variables that are unsigned char and we add them, then the maximum value is 255. Similar with unsigned short will yield a maximum of 65535.
  2. Implicit widening casts will be performed as long as they do not affect signedness.

In our example above fooIt(foo) will return 600 regardless whether we are using C or this new language with different semantics.

But let's say someone found this code to be memory inefficient. b and c should never be used with values over 255 (for one reason or the other). They alter the file foo.h in the following way, which passes compilation:

typedef struct {
   unsigned char a;
   unsigned char b;
   unsigned short c; 
} Foo;

You go to work and make changes and discover that suddenly your assert is trapping. You look at calculate and find no changes to that code. Similarly bar.c with fooIt. You find out that fooIt(foo) now returns 344, which makes no sense to you.

Finally the only candidate left is the change to Foo, but the data in Foo is the same and your assert is doing the same calculation as fooIt... or is it?

It turns out that with the non C semantics above, the computer will calculate unsigned int x = a + b + c in the following way:

  1. a + b mod 2^8 => 144
  2. 144 + c mod 2^16 => 344

In your assert on the other hand, we swapped the order:

  1. b + c mod 2^16 => 400
  2. 400 + a mod 2^16 => 600

The new semantic silently broke associativity and the compiler didn't warn us a single bit. This is a spooky action at a distance which you definitely don't want. Neither the writer of Foo, nor of fooIt, nor you could know that this would be a problem, it only breaks when the parts come together.

But "Wait!", you say, "There are many languages allowing this 'smaller than int size adds' addition by default, surely they can't all be broken?" – and you'd be right.

So what is the difference between our semantics and non-broken languages like Rust? If your guess is "implicit widening", then you're right.

And doesn't this seem strange? I mean it's not related to why the associativity breaks, but it's still the culprit. Because what happens if we don't have the widening?

Well fooIt would stop compiling for one:

unsigned int fooIt(Foo *foo)
{
   unsigned int x = a + b + c; 
   //               ^^^^^^^^^
   // Error: cannot add expression of type unsigned char
   // to expression of type unsigned short   
   return a;
}

And of course it would be known that changing Foo would be a possibly breaking change.

So what can be learned?

Designing new language semantics isn't trivial. Few consequences are easily recognizable at the beginning. One needs to be ready to drop semantics if they later turn out to have issues one didn't count on, even if they "work in most cases".

Comments


Comment by Christoffer Lernö

When one is deviating from language semantics, one sometimes accidentally break established, well-understood semantics. One of the worst design mistakes I did when working on C3 was to accidentally break associativity for addition.

Here is some code:

// File foo.h
typedef struct {
   unsigned short a;
   unsigned short b;
   unsigned short c; 
} Foo;
// File bar.c
unsigned int fooIt(Foo *foo)
{
   unsigned int x = a + b + c;
   return x;
}
// File baz.c
int calculate()
{
   Foo foo = { 200, 200, 200 };
   assert(fooIt(foo) == foo.b + foo.c + foo.a);
   return fooIt(foo);
}

I've written this with pure C syntax, but we're going to imagine deviating from C semantics.

In particular we're going to say:

  1. if two operands in a binary expression (e.g. a + b) are of the same bit width n, the operation will be performed with wrapping semantics. So if we have two variables that are unsigned char and we add them, then the maximum value is 255. Similar with unsigned short will yield a maximum of 65535.
  2. Implicit widening casts will be performed as long as they do not affect signedness.

In our example above fooIt(foo) will return 600 regardless whether we are using C or this new language with different semantics.

But let's say someone found this code to be memory inefficient. b and c should never be used with values over 255 (for one reason or the other). They alter the file foo.h in the following way, which passes compilation:

typedef struct {
   unsigned char a;
   unsigned char b;
   unsigned short c; 
} Foo;

You go to work and make changes and discover that suddenly your assert is trapping. You look at calculate and find no changes to that code. Similarly bar.c with fooIt. You find out that fooIt(foo) now returns 344, which makes no sense to you.

Finally the only candidate left is the change to Foo, but the data in Foo is the same and your assert is doing the same calculation as fooIt... or is it?

It turns out that with the non C semantics above, the computer will calculate unsigned int x = a + b + c in the following way:

  1. a + b mod 2^8 => 144
  2. 144 + c mod 2^16 => 344

In your assert on the other hand, we swapped the order:

  1. b + c mod 2^16 => 400
  2. 400 + a mod 2^16 => 600

The new semantic silently broke associativity and the compiler didn't warn us a single bit. This is a spooky action at a distance which you definitely don't want. Neither the writer of Foo, nor of fooIt, nor you could know that this would be a problem, it only breaks when the parts come together.

But "Wait!", you say, "There are many languages allowing this 'smaller than int size adds' addition by default, surely they can't all be broken?" – and you'd be right.

So what is the difference between our semantics and non-broken languages like Rust? If your guess is "implicit widening", then you're right.

And doesn't this seem strange? I mean it's not related to why the associativity breaks, but it's still the culprit. Because what happens if we don't have the widening?

Well fooIt would stop compiling for one:

unsigned int fooIt(Foo *foo)
{
   unsigned int x = a + b + c; 
   //               ^^^^^^^^^
   // Error: cannot add expression of type unsigned char
   // to expression of type unsigned short   
   return a;
}

And of course it would be known that changing Foo would be a possibly breaking change.

So what can be learned?

Designing new language semantics isn't trivial. Few consequences are easily recognizable at the beginning. One needs to be ready to drop semantics if they later turn out to have issues one didn't count on, even if they "work in most cases".