

The downsides of compile time evaluation

Originally from: https://c3.handmade.network/blog/p/8590-the_downsides_of_compile_time_evaluation

Macros and compile time evaluation are popular ways to extend a language. While macros fell out of favour by the time Java was created, they've returned to the mainstream in Nim and Rust. Zig has compile time execution, and Jai has both compile time execution and macros.

At one point I assumed that the more power macros and compile time execution provided, the better. I'll try to break down why I no longer think so.

Code with meta programming is hard to read

Macros and compile time execution form a set of meta programming tools, and in general meta programming has very strong downsides when it comes to maintaining and refactoring code. To understand code with meta programming, you must first resolve the meta program in your head; not until you do so can you think about the runtime code. This is exponentially harder than reading normal code.

Bye bye, refactoring tools

It's not just you as a programmer who needs to resolve the meta programming – any refactoring tool needs to do the same in order to perform refactorings safely, even simple ones such as variable renames.

And if a name is created through some meta code, the refactoring tool would basically need to reprogram your meta program to stay correct, which is unreasonably complex. This is why everything from preprocessor macros to reflection simply won't refactor correctly with tools.
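
To make this concrete, here's a minimal sketch (DECLARE_GETTER and Point are made up for illustration; the snippet is valid C and C++) of how token pasting creates identifiers that no rename tool can find in the source:

#include <stdio.h>

struct Point { int x; int y; };

/* Token pasting: get_<field> only exists after preprocessing; the
   identifier get_x is never spelled out anywhere in the source. */
#define DECLARE_GETTER(field) \
    int get_##field(const struct Point *p) { return p->field; }

DECLARE_GETTER(x) /* expands to: int get_x(const struct Point *p) { ... } */
DECLARE_GETTER(y)

int main(void)
{
    struct Point p = { 1, 2 };
    /* A tool asked to rename get_x would have to rewrite the macro
       itself; a plain textual rename cannot work. */
    printf("%d %d\n", get_x(&p), get_y(&p));
    return 0;
}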

Making it worse: arbitrary type creation

Some languages allow arbitrary types to be created at compile time. Now the IDE can't even know what the types look like unless it runs the meta code, and if the meta code is arbitrarily complex, the IDE must be equally complex to "understand" the code. While the meta programming evaluation might be nicely ordered when running the compiler, a responsive IDE will try to compile source files incrementally, which means it has to compile more code than it otherwise would just to get the ordering correct.

Code and meta code living together

Many languages try to make code and meta code look very similar, which leads to lots of potential confusion. Is a given identifier a compile time variable (which may change during compilation, so any expression containing it might be compile time resolved) or is it a real runtime variable?

Here's some Zig code (cmd_fns is assumed to be a comptime-known list of name/function pairs) – how easy is it to identify the meta code?

fn performFn(comptime prefix_char: u8, start_value: i32) i32  {
    var result: i32 = start_value;
    comptime var i = 0;
    inline while (i < cmd_fns.len) : (i += 1) {
        if (cmd_fns[i].name[0] == prefix_char) {
            result = cmd_fns[i].func(result);
        }
    } 
    return result;
}

I've tried to make it easier in C3 by not mixing meta and runtime code syntax. This is similar to how macros in C are conventionally written in all upper case to avoid confusion:

macro int performFn(char $prefix_char, int start_value)
{
    int result = start_value;
    // Prefix $ all compile time vars and statements
    $for (var $i = 0; $i < CMD_FNS.len; $i++):
        $if (CMD_FNS[$i].name[0] == $prefix_char):
            result = CMD_FNS[$i].func(result);
        $endif;   
    $endfor;   
    return result;
}

The intention with the separate C3 syntax is that the approximate runtime code can be found by removing all lines starting with $:

macro int performFn(char $prefix_char, int start_value)
{
    int result = start_value;


            result = CMD_FNS[$i].func(result);


    return result;
}

Not elegant, but the intention is to maximize readability. In particular, look at the "if/$if" statement. In the top example you can only infer that it is compile time evaluated and folded by looking at the definitions of i and prefix_char. In the C3 example, the $if itself guarantees the constant folding, and the compiler will report an error if the boolean expression inside the () isn't compile time folded.

Extending syntax for the win?

A popular use for macros is extending syntax, but this often goes wrong. Even if you have a language with a macro system that does this well, what does it mean? It means that suddenly you can't look at something like foo(x) and make assumptions about it. In C without macros we can assume that neither x nor any other local variable will change (unless they have been passed by reference to some function prior to this), and that the code will resume running after the foo call (except if setjmp/longjmp is used). With C++ we can assume less, since foo may throw an exception, and x might implicitly be passed by reference.

The more powerful the macro system, the less we can assume. Maybe it's pulling variables from the calling scope and changing them? Maybe it's returning from the current context? Maybe it's formatting the drive? Who knows. You need to know the exact definition or you can't read the local code, and this undermines the idea of most languages.
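
Even plain C macros can break these assumptions; a toy sketch (RESET_AND_BAIL is a made-up macro):

/* A macro can silently mutate the caller's locals and even return
   from the enclosing function - nothing in the call syntax reveals it. */
#define RESET_AND_BAIL(x) do { (x) = 0; return -1; } while (0)

int process(int value)
{
    if (value < 0)
        RESET_AND_BAIL(value); // looks like a function call, but assigns and returns
    return value * 2;
}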

In a typical language you know what "breaks the rules": all the built-in statements like if, for and return. Then there is a way to extend the language that follows certain rules: functions and types. This forms the common language understood by a developer – what "knowing a language" is about: you know the syntax and semantics of the built-in statements.

If the language extends its syntax, then every code base becomes a DSL which you have to learn from scratch. This is similar to having to buy into some huge framework in the JS/Java-space, just worse.

The point is that while we're always extending the language in some sense, doing this through certain limited mechanisms like functions works well, but the more unbounded the extension mechanism, the harder the code becomes to read and understand.

When meta programming is needed

In some cases meta programming can make code more readable. If the problem is something like having a pre-calculated list for fast calculations, or types defined from a protocol, then code generation can often solve it. Languages could improve this with better compiler support for triggering codegen.
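
As a sketch of the codegen route (the file names and the table are invented for illustration), a standalone generator run as a pre-build step can emit a plain header, keeping the main program free of meta code:

// gen_table.cpp - hypothetical pre-build step that writes sin_table.h,
// so the program includes a plain, readable array instead of computing
// it with compile time evaluation.
#include <cmath>
#include <cstdio>

int main(void)
{
    const double pi = 3.141592653589793;
    std::printf("// Generated by gen_table.cpp - do not edit.\n");
    std::printf("static const double SIN_TABLE[256] = {\n");
    for (int i = 0; i < 256; i++)
        std::printf("    %.17g,\n", std::sin(i * 2.0 * pi / 256.0));
    std::printf("};\n");
    return 0;
}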

In other cases meta programming can be replaced by running code at startup. Having "static init", like Java's static blocks, helps for cases where libraries need to do initialization.
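
A minimal C++ sketch of the startup alternative (the same invented table as above): the logic stays ordinary, debuggable runtime code that runs once before main().

#include <array>
#include <cmath>

static std::array<double, 256> make_sin_table()
{
    std::array<double, 256> t{};
    const double pi = 3.141592653589793;
    for (int i = 0; i < 256; i++)
        t[i] = std::sin(i * 2.0 * pi / 256.0);
    return t;
}

// A dynamic initializer of a global runs at startup, before main(),
// much like a Java static block runs when the class is loaded.
static const std::array<double, 256> SIN_TABLE = make_sin_table();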

If none of those options work, there is always copy-paste.

Summary

So to summarize:

  • Code with meta programming is hard to read (so minimize and support readability).
  • Meta programming is hard to refactor (so adopt a subset that can work with IDEs).
  • Arbitrary type creation is hard for tools (so restrict it to generics).
  • Same syntax is bad (so make meta code distinct).
  • Extending syntax with macros is bad (so don't do it).
  • Codegen and init at runtime can replace some use of compile time.

Macros and compile time execution can be made extremely powerful, but this power is tempered by huge drawbacks. A good macro system is not measured by what you can do with it, but by whether it manages to balance readability with the necessary features.

Comments




Comment by Christoffer Lernö

Having macro meta syntax that is different from the regular syntax helps, but whenever compile time and runtime code mix, the readability goes down. Trying to keep two sets of states in your head at the same time is not trivial and affects code reading.

If you do plain code generation with a code generator (and actually produce a final source file), you have fewer restrictions on how you express the code generation than when the generating code lives in the same source as the code it produces.

If you run code at startup rather than at compile time, you will have an easier time understanding it and inspecting what it produces.

And so on.

Using compile time evaluation for these things creates a very generic in-language tool, and such a tool will by necessity be less easy to work with than a specialized one (such as a custom code generator). These drawbacks need to be taken into account and balanced against the advantages.

"auto" is a language design smell

Originally from: https://c3.handmade.network/blog/p/8587-auto_is_a_language_design_smell

It's increasingly popular to use type inference for variable declarations.

– and it's understandable; after all, who wants to write something like Foobar<Baz<Double, String>, Bar> more than once?

I would argue that "auto" (or your particular language's equivalent) is an anti-pattern when the type is fully known.

When is type inference used?

Few are arguing for replacing:

int i = get_val();

by

auto i = get_val();

The latter is longer and gives less information. Still, some "auto all the things!" fanatics argue that it is right: maybe some day you change what get_val() returns, and then you need to change one less place – so rather than getting a type error where the function is invoked, you get it later at some other place, making it extra hard to debug...

But most people will argue it's mainly for when the type gets complex. For example:

std::map<std::string,std::vector<int> >::iterator it = myMap.begin();
// vs
auto it = myMap.begin();

Another important use is when you write macros or templates and the type has to be inferred. Here's a C3 example:

// No type inference
macro @swap1(&a, &b)
{
  $typeof(a) temp = a;
  a = b;
  b = temp; 
}
// vs
macro @swap2(&a, &b)
{
  var temp = a;
  a = b;
  b = temp; 
}

So we have two common cases:

  • When the type is unknown
  • When the type name grows long and complex.

Where do long type names come from?

No one is arguing against the use of type inference when the type isn't known or generic – this use makes perfect sense.

– But there is a problem with the auto it = myMap.begin() use, where type inference is a desired shorthand only because the type names are too long.

Type names only become long because parameterized types usually carry their parameterization in their name (well, some Java "enterprise" code manages long type names anyway, but that's beside the point).

This inevitably causes type signatures to blow up. It's usually possible to write typedefs to make the types shorter, but few do, because it's convenient to just use the type directly with its parameters rather than adding type definitions – plus the parameterization is sometimes actually helpful for determining whether the type matches a particular generic function.
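
For instance, in C++ a single alias declaration removes the pressure to write auto at every use site (IntListMap and count_entries are made-up names):

#include <map>
#include <string>
#include <vector>

// One alias at the declaration site keeps every later mention short
// and explicit, without inferring the type away.
using IntListMap = std::map<std::string, std::vector<int>>;

int count_entries(const IntListMap &myMap)
{
    int n = 0;
    for (IntListMap::const_iterator it = myMap.begin(); it != myMap.end(); ++it)
        n += (int)it->second.size();
    return n;
}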

So basically, the way we parameterize types in most languages causes the type name blowup that is then mitigated with type inference.

Again, the problem with type inference

I'm not going to rehash the arguments made here: https://austinhenley.com/blog/typeinference.html. I am mostly in agreement with them.

I think the most important thing is that the type declarations locally document the assumptions in the code. If I ever need to "hover over a variable in the IDE to find the type" (as some suggest as a solution), it means that the type is unclear from the local code context. Since the type of a variable is fundamental to how the code works, this should never be unclear – which is why the type declaration serves as strong support for code reading. (Explicit variable types also make it easy to text search for type usage and for the IDE to track types.)

While this is bad, the problem with long type signatures often makes up for it. Type inference becomes a necessity because of how parameterized types work.

I would strongly object to introducing type inference in languages that don't have issues with long type names, such as C (or C3), because fundamentally it makes code less clear to read and, consequently, bugs harder to catch.

The design smell

"auto" is a language design smell because it is typically a sign of the language having types parameterized in a way that makes them inconveniently long.

The type inference thus becomes a language design band-aid which lets people ignore tackling the very real issue of long type names.

If long type names are bad, why is everyone doing it?

Unfortunately there is an added complication: there aren't many good alternatives. Requiring typedefs in order to use parameterized types works, but is not particularly elegant.

There are other possibilities that could be explored, such as eliding the parameterization completely but retaining the rest of the type (e.g. iterator it = myMap.begin()), and similar ideas that straddle both inference and explicit types, trying to get the best of both worlds.

Such explorations are uncommon though, which the "auto" style type inference is probably to blame for. A popular band-aid is easier to apply than to find a more innovative solution.


The case against a C alternative

Originally from: https://c3.handmade.network/blog/p/8486-the_case_against_a_c_alternative

Like several others, I am writing an alternative to the C language (if you've read this blog before, this shouldn't be news!). My language (C3) is fairly recent; there are others: Zig, Odin, Jai and older languages like eC. Looking at C++ alternatives there are languages like D, Rust, Nim, Crystal, Beef, Carbon and others.

But is it possible to replace C? Let's consider some arguments against.

1. C language toolchain

The C language is not just the language itself but all the developer tools developed for it. Do you want to do static analysis on your source code? There are a lot of people working on that for C. Tools for detecting memory leaks, data races and other bugs? There are a lot of those too, even if your language has better tooling out of the box.

If you want to target some obscure platform, it most likely assumes you're using C.

The status of C as the lingua franca of today's computing makes it worthwhile to write tools for it, so there are many tools being written.

If someone has a working toolchain set up, why risk switching languages? A "better C" must bring a lot of added productivity to justify the time spent setting up a new toolchain – if that is even possible.

2. The uncertainties of a new language

Before a language has matured, it's likely to have bugs and might change significantly to address problems with the language semantics. And is the language even as advertised? Maybe it promises something like "great compile times" or "faster than C" – only for these goals to turn out to be hard to reach as the language adds its full set of features.

And what about maintainers? Sure, an open source language can be forked, but I doubt many companies are interested in using a language that they further down the line might be forced to maintain.

Betting on a new language is a big risk.

3. The language might just not be good enough

Is the language even addressing the real pain points of C? It turns out that people don't always agree on what the pain points of C are. Memory allocation, array and string handling are often tricky, but with the right libraries and a sound memory strategy the pain can be minimized. Is the language possibly addressing problems that advanced users don't really worry about? If so, its actual value might be much lower than expected.

And worse, what if the language omits crucial features that are present in C – features that advanced C programmers rely on? This risk increases if the language designer hasn't used C a great deal but comes from C++, Java etc.

4. No experienced developers

A new language will naturally have a much smaller pool of experienced developers. For any midsize to large company that's a huge problem: the more developers there are available, the better companies like it.

Also, while the company has experience recruiting for C developers, it doesn't know how to recruit for this new language.

5. The C ABI is the standard for interoperability

If the language can't easily call – or be called by – C code, then anyone using the language will have to do extra work for pretty much anything that interfaces with outside code. This is potentially a huge disadvantage.

"Better X" doesn't matter

So those are some of the downsides of not picking C, to be offset by the advantages of picking the alternative. However, language designers often overestimate how big an advantage their added "features" bring. Here are some common "false advantages":

1. Better syntax

Having a "better syntax" than C is mostly subjective. Different syntax is also a huge disadvantage: now you can't copy code from C; you might even have to rewrite every single line. No company will adopt a language because it has slightly better syntax than C.

2. Safer than C

Any C alternative will be expected to be on par with C in performance. The problem is that C has practically no checks, so any safety checks put into the competing language will have a runtime cost, which is often unacceptable. This leads to a strategy of only having checks in "safe" mode, where the "fast" mode is just as "unsafe" as C.

There are some exceptions: "foreach" avoids manually written bounds checks and so is automatically safer. Similarly, slices help in writing checks compared to "pointer + len" (or worse: null terminated arrays).
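
A hedged sketch of why slices make such checks cheap to get right (IntSlice is an invented type, not any particular language's):

#include <cassert>
#include <cstddef>

// Pointer and length travel together, so the bounds check lives in
// one place instead of being repeated ad hoc at every call site.
struct IntSlice
{
    int *ptr;
    std::size_t len;

    int &operator[](std::size_t i)
    {
        assert(i < len); // active in "safe" builds, compiled out with NDEBUG
        return ptr[i];
    }
};

int sum(IntSlice s)
{
    int total = 0;
    for (std::size_t i = 0; i < s.len; i++)
        total += s[i];
    return total;
}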

3. Programmer productivity

First of all, pretty much every language ever has made vacuous claims of "higher programmer productivity". The problem is that for a business this usually doesn't matter. Why? Because the actual programming is not the main time sink. In a business, what takes time is figuring out what the task really is. So something like a 10% or 20% "productivity boost" won't even register. A 100% increase in productivity might show, but even that isn't guaranteed.

What matters?

So if these don't matter, what does? For a business it's whether despite the downsides the language can help the bottom line: "Is this valuable enough to outweigh the downsides?"

But if "better x" doesn't help - what does? Well... "killer features": having unique selling points that C can't match.

Look at Java: when it was released it offered the following features that most of the competing languages couldn't give you:

  • OO done cleanly (OO was hot at the time)
  • Threading out of the box (uncommon at the time)
  • "Write once run anywhere"
  • "Run your code in the browser"
  • Garbage collection built in
  • Network programming
  • Good standard library
  • Free to use

That's not just one but eight(!) killer features. How many of those unique selling points do the C alternatives have? Fewer than Java had, at least!

The next killer feature

So my argument is that a common way for a language to get adopted is by being the only language you can use for something: Dart for Flutter, JS for scripting the browser, Java for applets, ObjC for Mac and iOS apps.

Even if those monopolies disappear over time, they help the language become known and used.

Similarly there are examples where frameworks have been popular enough to boost languages, Ruby and Python are good examples.

So looking at our example languages, Jai's strategy of bundling a game engine seems good: anyone using it will have to learn Jai, so if the engine is good enough people will learn the language too.

But aside from Jai, is any C alternative really looking to pursue killer features? And if a language doesn't have one, how does it prove the switch from C is worth it? It can't.

Conclusion

The "build it and they will come" idea is tempting to believe in, but there is frankly little reason to drop C unless the alternative has important unique features and/or products that C can't hope to match.

While popularity and enthusiasm are helpful, they cannot replace proven value. In the end, all that matters is whether using a language can produce more tangible value for developers than C, for at least a large subset of what C is used for. While developers may be excited by new languages, that enthusiasm doesn't translate to business value.

So no matter how exciting that C alternative may look, it probably will fail.


Optional syntax

Originally from: https://c3.handmade.network/blog/p/8460-optional_syntax

In C3, optionals are built into the language. They're not run of the mill optionals, as they carry an "optional result value". This makes them more like "Result" types than plain optionals.

In C3 you declare a variable holding an optional using the ! suffix:

int! x = ...

We can now assign either to the real value, or to the optional result:

int! x = 1; // x is a real value
x = MyRes.MISSING!; // x is assigned an optional
// x = MyRes.MISSING; <- Error: cannot assign "MyRes" to int

If we think of it in terms of a "Result":

Result<int> x;
x.result = 1; // x = 1
x.error = MyRes.ERR; // x = MyRes.ERR!
x.result = MyRes.ERR; // x = MyRes.ERR - fails

So the "clever" ! suffix here is used to assign to the "error" part of the Result. Unfortunately, the suffix is hard to read at the end of a line, where ! and ; often blur together. For that reason I regularly revisit this syntax to see if I can improve on it.

It's used in two cases:

  1. assignment: x = MyRes.ERR!
  2. return: return MyRes.ERR!

While (2) could be replaced by something like return! MyRes.ERR or even raise MyRes.ERR, the assignment is not as easily tackled.

Naive ideas could be to use some symbol salad like:

int! x !!= MyRes.MISSING;
int! x <!= MyRes.MISSING;
int! x <- MyRes.MISSING;
// I'm going to exclude
// int! x := MyRes.MISSING
// as it is used as regular assign in most languages.

Or we could allow those return statements to have a different meaning in an assignment:

int! x = raise MyRes.MISSING;
int! x = return! MyRes.MISSING;

While it's possible, it creates an odd effect if we consider this example:

int! x;
return x = return! MyRes.MISSING;

This also illustrates that x = MyRes.MISSING! should be thought of as implicitly doing x = { 0, MyRes.MISSING }.

With that understanding, we can see how it works:

x = MyRes.MISSING!; // x = { 0, MyRes.MISSING }
return MyRes.MISSING!; // return { 0, MyRes.MISSING }
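
A hedged C++ sketch of this mental model (Fault and OptionalInt are invented names):

// The optional is a value plus a fault slot; assigning a fault
// clears the value part, mirroring x = { 0, MyRes.MISSING }.
enum class Fault { NONE, MISSING, ERR };

struct OptionalInt
{
    int value = 0;
    Fault fault = Fault::NONE;
};

void examples(OptionalInt &x)
{
    x = { 1, Fault::NONE };    // x = 1;
    x = { 0, Fault::MISSING }; // x = MyRes.MISSING!;
}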

So really the proper way would be to always translate the !, like this:

x = fault MyRes.ERR;
return fault MyRes.ERR;

Which is a mouthful. One could of course contract that return fault into something like:

x = fault MyRes.ERR;
throw MyRes.ERR;

Unfortunately, this builds in the assumption that a return may not return an optional, which of course it can:

int! x = ...
return x; // Optional, so it may be like a "throw" or not

If we want to be super clear we can do something like this:

int! x = ...
if (y) return? x; // Might return an optional
if (z) return! MyRes.MISSING; // Will return an optional.
if (w) return w; // Will not return an optional

Due to the ? being a rethrow, we could require this:

int! x = ...
if (y) return x?; // Use rethrow to make the type int
if (z) return! MyRes.MISSING; // Will return an optional.
if (w) return w; // Will not return an optional

So the question here is whether this adds anything over the original:

int! x = ...
if (y) return x;
if (z) return MyRes.MISSING!; 
if (w) return w;

These are questions that need quite a bit of C3 error handling code to decide, so for now things have to stay as they are.


Why implicit imports fails

Originally from: https://c3.handmade.network/blog/p/8448-why_implicit_imports_fails

As previously discussed, it might be possible to do implicit imports, so that using Foo would implicitly import it. In C3, thanks to the overall rules, this leads to few ambiguities (go back to that blog post to review how it works).

After using this for quite a while, I ended up concluding that fully implicit imports are bad. You want enough high level importing that there is some documentation of what is included, hinting at the possible origin of types.

An example is when you read some code that relies on an external graphics library and encounter a type like Point or Vector2. At that point you can't be sure whether the type comes from the external library or from some obscure part of the standard library. The same goes for something like Socket or Connection: is that Socket from the standard library's networking module, or from some external imported library? If the standard library is big enough, you can't know for sure – and finding out is not easy.

So you want at least a high level import – though perhaps not at import std::net::socket granularity, but rather something like import std::net or import raylib at the top of the file – enough to make it easy to find the types and functions.

So the new updated scheme has wildcard inclusion by default (so import std::net would include all the sub modules).

In addition, I've also made modules implicitly import any other module with the same top domain. So code in std::net::socket would see the code in std::net::http without the need for an explicit import.

This means that if you start a project with some top module, for example mygame, then in the module mygame::gameloop you'll automatically import mygame::maths and mygame::data.

There are some issues with the latter. In particular, all of the standard library modules would see all other standard library modules! It's quite possible to address that, but first I want to make sure it's a problem in practice. Even completely implicit imports "almost worked", so maybe this isn't much of a problem.


Imports and modules

Originally from: https://c3.handmade.network/blog/p/8417-imports_and_modules

When talking about packages / modules, I think it's useful to start with Java. As a language with C/C++-style syntax but with an import / module system from the beginning, it ended up being very influential.

Importing a namespace or a graph

Interestingly, the import statement in Java doesn't actually import anything. It's a simple namespace folding mechanism, allowing you to use something like java.util.Random as just Random. The fact that you can use a fully qualified name later in the source code to implicitly use another package means that the imports do not fully define the dependencies of a Java source file.

In Java, given a collection of source files, all must be compiled to determine the actual dependencies. However, we can imagine a different model where the import statements create a dependency graph, starting from the source file that is the main entry point. In this model we may have N source files, but not all are even compiled, since only a subset M can be reached through the import graph.

This latter model allows some extra features. For example, we can build the feature where importing a source file may also implicitly cause a dynamic or static library to be linked. Because only the source code in the graph is compiled, we'll only get the extra link parameter if the imports reach the source file carrying that parameter.

The disadvantage is that the imports need to have a clear way of finding the additional dependencies. This is typically done with a file hierarchy or strict naming scheme, so that importing foo.bar allows the compiler to easily find the file or files that define that particular module.

Folding the import

For module systems that allow sub modules, so that there's both foo.bar and foo.baz, the problem of verbosity appears: do we really want to type std.io.net.Socket everywhere? I think the general consensus is that this is annoying.

The two common ways to solve this are namespace folding and namespace renaming, but I'm going to present one more which I term namespace shortening.

Namespace folding is the easiest: you import std.io.net and can now use Socket unqualified. This is how it works in Java. However, we should note that in Java any global or function is actually prefixed with its class name, which means that even when folding the namespace, your globals and "functions" (static methods) end up having a prefix.

To overcome collisions and the shortcomings of namespace folding, there's namespace renaming, where the import explicitly renames the module in file scope: std.io.net might become n, and you then use n.Socket rather than the fully folded or fully qualified name. The downside is having to name this namespace alias. Naming things well is known to be one of the harder problems in programming, and it can also add to the confusion if the alias is chosen differently in different parts of the program, e.g. n.Socket in one file and netio.Socket in another.

A way to address the renaming problem is to recognize that usually only the last namespace element is sufficient to distinguish one function from another, so we can allow an abbreviated namespace, allowing the shortened namespace to be used in place of the full one. With this scheme std.io.net.open_socket(), io.net.open_socket() and net.open_socket() are all valid as long as there is no ambiguity (for example, if an import made foo.net.open_socket() available in the current scope, then net.open_socket() would be ambiguous and a longer path, like io.net.open_socket() would be required). C3 uses this scheme for all globals, functions and macros and it seems successful so far.

Lots of imports

In Java, imports quickly became fairly onerous to write, since using a class foo.bar.Baz would often mean using another class like foo.bar.Bar, and now both needed to be imported. While wildcard imports helped a bit, they would pull in more classes than necessary, so inspecting the import statements would obscure the actual dependencies.

As a workaround, languages like D added the concept of re-exported imports (D calls this feature "public imports"). So in our foo.bar.Baz case, it could import foo.bar.Bar and re-export it. So that an import of foo.bar.Baz implicitly imports foo.bar.Bar as well. The downside here again is that it's not possible from looking at the imports to see what the actual dependencies are.

A related feature is implicit imports determined by the namespace hierarchy. So for example in Java, any source file in the package foo.bar.baz has all the classes of foo.bar implicitly folded into its namespace. This folding goes bottom up, but not the other way around. So while foo.bar.baz.AbcClass sees foo.bar.Baz, Baz can't access foo.bar.baz.AbcClass without an explicit import.

An experiment: no imports

For C3 I wanted to try going completely without imports. This was feasible mainly due to two observations: (1) type names tend to be fairly universally unique, and (2) methods and globals are usually unique given a shortened namespace. So given Foo and foo::some_function(), these should mostly be unique without the need for imports. This is a completely implicit import scheme.

This is complemented by the compiler requiring the programmer to explicitly say which libraries should be used for compilation. So imports could be said to be done globally for the whole program in the build settings.

This certainly works, but has a drawback: let's say a program relies on a library like Raylib. Raylib in itself will introduce a lot of types and functions, and while it's no problem to resolve them, it could be confusing for a casual reader ("Oh, a Vector2 – is this part of the C3 standard library?"), whereas having an import raylib; at the top would immediately hint to the reader where Vector2 might be found.

Wildcard imports for all?

The problem with zero imports suggests an alternative: wildcard imports as the default. import raylib; would be the standard kind of import and would recursively import everything in raylib, and similarly import std; would get the whole standard library. This would be more for the reader of the code to find the dependencies than necessary for the compiler.

One problem with this design is the sub module visibility rules: what do foo::bar::baz and foo::bar see?

Java would allow foo::bar::baz to see the foo::bar parent module, but not vice versa. However, looking at the actual usage patterns, it seems to make sense to make this bidirectional, so that all are visible to each other.

But if parent and child modules are visible to each other, what about sibling modules? E.g. does foo::bar::baz see foo::bar::abc? In actual use cases there are arguments both for and against. And if we have sibling visibility, what about foo::def and foo::bar::abc? Should they be visible to each other? And if not, do such rules get complicated?

To create a more practical scenario, imagine that we have the following:

  1. std::io::file::open_filename_for_read() a function to open a file for reading
  2. std::io::Path representing a general path.
  3. std::io::OpenMode a distinct type for a mask value for file or resource opening
  4. std::io::readoptions::READ_ONLY a constant of type OpenMode

Let's say this is the implementation of (1):

fn File* open_filename_for_read(char[] filename)
{
  Path* p = io::path_from_string(filename);
  defer io::path_free(p);
  return file::open_file(p, readoptions::READ_ONLY);
}

Here we see that std::io::file must be able to use std::io and std::io::readoptions. The readoptions sub module needs std::io but not the file sub module. Note how C3 uses functions in sub modules the way other languages typically use static methods. If we want to avoid excessive imports in this case, then file needs sibling and parent visibility, whereas the readoptions use only requires parent visibility.

Excessive rules around visibility are hard to implement well, hard to test and hard to remember, so it might be preferable to simply say that a module has visibility to any other module under the same top module. The downside would of course be that visibility is much wider than what's probably desired (e.g. std::math having visibility to std::io).
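
Under that rule, visibility would look something like this (module names are only for illustration):

module std::math; // Same top module as std::io, so it sees std::io, std::io::file, …
module std::io;   // Likewise sees std::math, whether that is desired or not.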

Conclusions and further research for C3

Like everything in language design, imports and modules come with a lot of trade-offs. Import statements may be used to narrow down the dependency graph, but at the same time a language with a lot of imports doesn't necessarily use them in that manner. For namespace folding it matters a lot whether functions are usually grouped as static methods or free functions. Imports can be used to implicitly determine things like linking arguments, in which case the actual import graph matters.

For C3, the scheme with implicit imports works thanks to library imports also being restricted by build scripts, but high level imports could still improve readability. However, such a scheme would probably need recursive imports, which raises the question of implicit imports between sub modules. For C3 in particular this is an important usability concern, as sub modules are used to organize functions and constants more than is common in many other languages. This is the area I'm currently researching, but I hope that within a few weeks I can have a design candidate.


Do you know why your language will fail?

Originally from: https://c3.handmade.network/blog/p/8341-do_you_know_why_your_language_will_fail

Looking at old presentations of programming languages that never managed to catch on, I am often very interested in figuring out just why they failed.

Why is this useful?

I think it's useful for language designers to consider why some things fail and why some things succeed. In the end a language is serving some intended group of users [1], so ask the question "why didn't it succeed in doing that?"

I believe it's an important thing to ask, because the answer often isn't that "the language was bad". It often wasn't a bad language, but there was still something it failed to do which prevented people from using it.

It also implies that in order to actually serve a group of users (the presumed goal of a language), we not only need to create a good language, but also a language which succeeds in reaching those users.

In order to succeed at language design we must not only make sure that the language is good, but also ensure that there is a way for the intended users to make use of it.

Why do languages fail?

The obvious and most common way a language can fail is by never being completed. It doesn't matter how good the features are if the language can't be implemented.

The second big thing I see is the "build it and they will come" thinking. That is, the idea that all you need to do is to write a sufficiently good language and then somehow that should be enough to make the language universally adopted by everyone. Unfortunately reality does not work like that.

Looking at successful languages, there is no real common pattern. Corporate backing helps, but isn't a guarantee. Lots of initial interest is good, but doesn't mean the language will be a success, and so on.

But while it's difficult to clearly make out the road to success, we can still study other languages for clues on how to avoid the paths leading to failure.

In the end the failure to understand why languages fail is the biggest reason why languages fail.


[1] Zig has "together we serve the users" in its mission statement – addressing exactly this.


Are modules without imports "considered harmful"?

Originally from: https://c3.handmade.network/blog/p/8337-are_modules_without_imports_considered_harmful

Can you really do a module system without import statements? And should you? If you’re like me you’d probably initially dismiss the idea: “surely that can only work for very simple examples!”

But someone filed an issue to add it to C3, so I had to explain why it would be difficult / impossible to do well (this actually ended with me redesigning the module system quite a bit). – But the question of whether it's possible stuck with me.

Why it shouldn't work

Let’s quickly review the problems with no imports (where modules are loaded automatically).

1. Ambiguities

The classic example is the function “open”, which would clash with open in all other modules, making it necessary to use the full module names:

module foo;
fn File* open(char* filename) { … }

module bar;
fn Connection* open(char* url) { … }

module baz;
fn void test()
{
   open("foo.txt"); // Which one is intended?
}

2. Bad search & code completion

When all files are basically importing everything, every public function has to be listed for code completion or search.

If you'd just match your own code it wouldn’t be so bad, but add to that the whole standard library + any library you’re importing… you'll get a lot of matches.

3. Compiling more than necessary

Some languages use imports to figure out exactly which files to compile. Implicitly having everything imported means everything needs to be analyzed during compilation.

4. Dependencies are not obvious

Explicit imports help both readers of the source code and tools like IDEs by limiting, in a simple way, which files the current file depends on.

Summing it up

All in all the situation looks pretty grim, so there's a reason why we don't see this.

There are outliers: pre-namespace PHP, and from what I’ve heard there’s a Prolog variant which has a form of auto import as well. Unfortunately these examples offer very little in terms of encouragement.

Making a try

Despite this I found that I personally couldn't really dismiss the idea entirely; for my own peace of mind I had to make sure it wasn't possible. Let's revisit the problems:

1. Ambiguities

In this case I actually had the problem halfway solved: in C3 all functions are expected to be called with at least a partial path qualifier.

To call the function foo() from module std::bar in another module you have to write bar::foo() to call it (std::bar::foo() works as well).
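
As a minimal sketch (the calling module is invented for illustration):

module std::bar;
fn void foo() { … }

module my_app;
fn void test()
{
    bar::foo();      // Shortened path: the trailing part of the module path.
    std::bar::foo(); // The full path works as well.
}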

I haven't seen the idea of using abbreviated module paths elsewhere, and so it seems to be a novel invention. It should be possible to implement in any namespace scheme where namespaces are separate from types.

However, in C3 structs and other user defined types do not require any qualifiers. The reasoning is that type names in general tend to be fairly unique, except where two libraries try to abstract the same thing (for example, two file IO libraries will probably both use File as a type name somewhere).

Name collisions are rare with explicit imports, but for implicit imports this might become a real issue.

module foo::io;
struct File { ... }

module std::io;
struct File { ... }

module bar;
File *f; // Which File is this?

Fortunately we can introduce a simple feature to help us out: we reintroduce import, but change its meaning so that it simply makes the imported module’s types and functions preferred over modules that aren’t imported when doing type resolution.

So returning to the example with File: rather than having to type foo::io::File to disambiguate it from std::io::File, we simply add import foo::io to the start of the file:

module bar;
import foo::io;

File *f; // This is foo::io::File

If we sort of squint at it, this is actually a little like how Java’s imports work: they only add the possibility to use the imported classes without qualifiers.

So it seems that (1) can be considered solvable for any language that is fine with path qualifiers in front of functions and globals, like C3.

3. Compiling more than necessary

For reasons that will become apparent later, let's jump to this point first.

Trying to solve this requires us to look at our compilation model in general. For the more extreme version of this, let’s assume that all our libraries are in source form rather than precompiled. We can say we roughly have 3 types of source code: the application code, external libraries and the standard library.

In C3 you already specify the libraries you want to add in the project settings. The problem here is libraries that bring in their own dependencies.

There’s a simple model we could use here:

  • the application code only sees what is public in the libraries actually imported;
  • the external libraries are resolved seeing only the dependencies they have, and not the application code.

Let’s say you have a library which allows you to set up an HTTPS service, which in turn uses a crypto library: your application code will not see the crypto library and the HTTPS service will not see other libraries that the application code uses.

To summarize:

  1. Application code: sees library and standard library public types, variables and functions.
  2. Library: sees only public declarations of its own dependencies and the standard library.
  3. Standard library: only sees itself.

Here we're moving dependencies and imports from the source files into the build configuration.
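
As a sketch of what that could look like, here is a hypothetical build configuration (the format and keys are invented for illustration, not an actual c3c schema):

application project file (hypothetical):
    dependencies = [ "https_service" ]

https_service library manifest (hypothetical):
    dependencies = [ "crypto" ]

The application resolves only against the public declarations of https_service; crypto remains visible to https_service alone.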

Unfortunately, in practice we will likely still parse most of the code and only decide after analysis what needs to be lowered into actual code. In other words, this is not necessarily a win. Parsing and semantic analysis are a small part of the compile time, so avoiding them for some code doesn't necessarily help much.

Java "modules"

Taking a detour now: Java has a large standard library, and frameworks typically have a fair amount of additional dependencies. To address this, Java introduced “modules” in Project Jigsaw (not to be confused with the Java packages that are used with import). Jigsaw modules essentially group Java packages, described in a special file (module-info.java) that also specifies dependencies on other “modules”. The idea is to drastically reduce the number of packages that need to be bundled for an application.

This is very similar to the compilation issue above. By providing a file which describes in detail what parts of the libraries the application uses, the compiler can actually begin with those library definitions before lexing and parsing starts. So in your app you could perhaps not just define the libraries you want to use, but also specify the subset of the modules you actually depend on. In practical terms, we define in a single place what our imports are, and the compiler just needs to work with this subset. This is sort of an analogue of keeping a precompiled header in C with all the external library headers you want to use in the project. While we're not necessarily reducing the compile time further, we're making the job a lot simpler for the compiler.

2. Bad search & code completion

Armed with this we can go back to the question of search: if we use these package dependency files, we've suddenly reduced the lookup for code completion to the subset of packages we actually use in our project, which effectively resolves this issue.

4. Dependencies are not obvious

We’re also ready to tackle the dependencies, because we're now in a much better situation than with per-file imports: by inspecting a few files we can see all the dependencies our project has, and also what dependencies the libraries we depend on have.

If libraries split their dependencies into multiple groups we can also get a reduction in the number of libraries we need for compilation.

As an example, let us envision an http server library which supports both http and https, where the latter depends on a cryptography library containing multiple types of algorithms. If the library is split into multiple modules, then we can perhaps let the http part depend only on a TCP library, whereas the https part also depends on the cryptography algorithms it actually uses.

Depending on how much granularity there is, something not using https might avoid downloading the cryptography library entirely, and even when https is included, packages with deprecated hash and crypto algorithms do not need to be included to compile the https library.
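
Continuing the hypothetical manifest sketch from above, the split could look like this (library names invented for illustration):

http library manifest (hypothetical):
    dependencies = [ "tcp" ]

https library manifest (hypothetical):
    dependencies = [ "tcp", "crypto_tls" ]

A project using only http never pulls in crypto_tls, and a package of deprecated algorithms, say crypto_legacy, is never needed to compile the https library.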

Does this mean it works?

It seems like for most module systems it could work – given that the caveats listed are satisfied.

But should one do it? I would hedge my bets and say "possibly". Regular imports require less of the language and are the proven approach, but I believe I've shown that "modules without imports" could still be up for consideration when designing a language.
