
2024

Regarding programming forums and such

Originally from: https://c3.handmade.network/blog/p/8863-regarding_programming_forums_and_such

An observation: I notice that by virtue of people being mostly anonymous, a curious effect occurs on programming discords (and by extension elsewhere):

People who are "chat savvy" (or whatever we should call being good at writing in a way that parallels being socially adept in real life) are able to dominate discussions by virtue of this skill.

In addition, people tend to "cluster" when it comes to opinions, so that followers of such persons may sway others ("everyone else agrees on this").

However, such savvy has nothing to do with actual programming skill or knowledge. Many of these "leaders" are in fact fairly inexperienced, if not outright beginners. Age is similarly obfuscated, so that teenagers might be taken for old hands and middle-aged persons for teenagers.

The most obvious example is when someone new comes to a Discord and enthusiastically starts presenting ideas. If these ideas are not "approved" by the leaders, the person might immediately be mocked and treated as a beginner or an idiot.

I've seen this play out several times. On one occasion, a 60+ year old gentleman presented a language and compiler in which he had solved several long-standing practical problems encountered over the course of his career. He was laughed at and ridiculed as knowing nothing about programming or real-world problems by kids a third his age, because he wasn't presenting it in the way that was "expected" in that community. It was painful to see.

Outside of direct bullying, criticizing well-known people in the business is a favorite pastime. You often find people confidently deriding others as "not having any experience", "not knowing what they're talking about", "just making things up" etc. These critics are very sure of themselves, with the aforementioned followers echoing those feelings. So you end up with a bunch of 16-year-olds deriding 50+ year old programmers with multiple hit games under their belt as "not knowing anything about programming", and collecting pats on the back from the crowd for saying something so profound!

So what is my point? Nothing really, except for these observations and the conclusion that there is no wisdom of the crowds online; you need to find truth on your own.


How bad is LLVM really?

Originally from: https://c3.handmade.network/blog/p/8852-how_bad_is_llvm_really

LLVM used to be hailed as a great thing, but with language projects such as Rust, Zig and others complaining it's bad and slow and they're moving away from it – how bad is LLVM really?

What is LLVM?

LLVM of today is not just a compiler backend, it's a whole toolchain, and the project also provides a linker (lld), a C compiler available as a library (Clang) and much more.

Except for the anomaly of Zig (Zig also uses the entire Clang as a library), most language projects simply use the LLVM backend, and possibly also the lld linker.

LLVM, Clang, lld and most other parts of the project are written in C++, with a C API available for LLVM, but not for most of the other libraries.

The speed problem

When Clang was released, a selling point was that it compiled faster than GCC. Since then this has slipped a bit, and GCC and Clang are now about equally slow.

The problem is not optimized builds - most people accept that optimized builds will compile slowly. No, the problem is that unoptimized builds compile slowly. How slow? LLVM codegen and linking take over 98% of the total compilation time for the C3 compiler when codegen is single-threaded with no optimizations.

If codegen is two orders of magnitude slower than parsing, lexing and semantic checking combined, you can see why compiler writers might not be totally happy with LLVM's performance.

Why is LLVM slow?

First a disclaimer: I have only read a bit of the LLVM source code and haven't contributed anything beyond a few small fixes, so I'm not an expert.

However, it seems to me that LLVM has a fairly traditional C++ OO design. One thing this results in is an abundance of heap allocations. An early experiment switching the C3 compiler to mimalloc improved LLVM running times by a whopping 10%, which could only be true if memory allocations were a large contributor to the runtime cost. I would have expected LLVM to use arena allocators, but that doesn't seem to be the case for most code.

Heap allocations aside, C++ and similar languages often invite certain inefficient patterns. It's easy to just rely on high-level constructs to solve problems:

Need to check if a list has duplicates? No problem, just grab a hash set and check!

Except that if the list typically has only 2-3 entries, a linear search might be much faster and requires no setup. It doesn't matter how clever and fast the hash set is (and LLVM's optimized containers usually are fast): if no container was needed in the first place, its speed is irrelevant.

It's not necessarily bad code, but it's not code that is likely to be highly performant.

Why is LLVM "bad"?

LLVM has other warts too. First up, the documentation isn't particularly great. It's not worse than that of many other libraries I've used, so this is more of a "we all wish it were better, because understanding the backend is hard enough as it is".

More fundamentally though, LLVM is very much a backend for C/C++. While LLVM has test suites, Clang is ultimately the product under the LLVM umbrella that really tests the backend. This results in codegen paths not exercised by Clang being notoriously unreliable, as well as often poorly optimized (passing structs around by value, for instance).

Another consequence is that LLVM often has mandatory UB where C/C++ has it. For example, integer division by zero is currently inescapable undefined behaviour in LLVM – which is bad if your language wanted to define x / 0 as 0, for example. Another example is when i << x overflows because x is equal to or larger than the bit width of i. This yields a poison value in LLVM, so if you wanted the result to be, say, 0, you would have to add a select on every such shift, as there is no way to request well-defined behaviour. At least in this case the result is a poison value and not UB. C/C++, of course, considers i << x undefined behaviour in these overflow cases.

So: bugs, not-so-great documentation and assumption of C/C++ semantics are probably the main complaints I've seen.

The problem with alternatives

The alternatives to LLVM that keep popping up are Cranelift, QBE etc. However, at the moment none of them offers the same kind of complete solution that LLVM provides - and some of them are slower than LLVM! If you have already started using LLVM's advanced features, you will struggle with feature parity, not to mention the more limited platform support.

Integrating with GCC is an alternative, but it doesn't solve the compilation speed problem, nor the other "bad" things about LLVM.

At this point, a lot of projects start thinking about writing their own backend, and honestly this is probably a better alternative than using anything incomplete off the shelf at the moment, as it ensures there isn't some missing functionality that turns out to be impossible to work around later.

So while there are some promising upcoming backends (Tilde Backend comes to mind), there isn't really a drop-in replacement for LLVM today.

LLVM the good parts

While LLVM has these downsides, we shouldn't lose track of what it actually brings to the table. It's a full-fledged backend that is far more field-tested than anything one could hope to write oneself. It's reliable in the sense that it's not going away tomorrow or in five years. Buried in LLVM + Clang is a treasure trove of domain knowledge that a single developer can't be expected to accumulate on their own.

Being able to use LLVM is a huge service to language developers. What it lacks in speed it wins back in completeness.

Final words

We all love to complain about LLVM. It's far from perfect, not least in regard to speed. But at the same time, it allows language designers to build compilers that produce production-quality machine code on a wide variety of platforms. So really, starting out with LLVM is a good idea. Once there is a backend that works, there is plenty of time to explore other backends without any pressure.

So is LLVM bad? Well it has its bad parts, but it's also probably the best backend you can pick for your compiler when you start out (not counting transpiling to C).

You can worry about the bad parts later.

Comments



Comment by Christoffer Lernö

Lexing, parsing and analysis are about 1-2% of the entire compile time when compiling C3 code with no optimizations. The rest is LLVM + linking, where linking is a small part of the time.

You can compare some compiler benchmarks here: https://github.com/nordlow/compiler-benchmark

Not that such a benchmark really shows the compilation time you'd get on general code, but it gives a rough order of magnitude between different compilers (and consequently compiler backends, as that is where most of the time is spent for something like C).

Syntax - when in doubt, don't innovate

Originally from: https://c3.handmade.network/blog/p/8851-syntax_-_when_in_doubt%252C_don%2527t_innovate

One of the most attractive things about language design is being able to tweak the syntax of a language at its most fundamental level, so not surprisingly you'll see language designers coming up with all sorts of alternatives to conventional syntax.

The problem is that it takes a while – I'd say a year at least – to figure out whether some particular new syntax is good. It often takes less time to figure out that it's bad, but in some cases that might not be obvious until very late. So just because syntax isn't immediately bad doesn't mean you won't discover problems with it later.

Even worse, it's hard to weed out false negatives: sometimes syntax might appear to be "bad" simply because it is unfamiliar.

For that reason I think a good rule of thumb when working on syntax might be "when in doubt, do not innovate".

Just like other language features should "carry their weight" (that is, their value should outweigh their cost), so should syntax. "It's setting the language apart" or "I like how it looks" is fairly low on the value scale if the language is intended for use by others. If you're not sure whether some new syntax is necessary then it's better to wait until you know if it is. Meanwhile, there are established syntax conventions out there you can lean on.

New syntax shines where it enables (possibly new and innovative) language features to be expressed cleanly and clearly. It is probably better to prioritize such syntax innovations than, say, inventing new symbol combinations for arithmetic.


C3 0.5.3 Released

Originally from: https://c3.handmade.network/blog/p/8848-c3_0.5.3_released

It's almost 2 months since 0.5.0 was released and we're now at 0.5.3. This is the change list from 0.5.2:

Changes / improvements

  • Migrate from using actual type with GEP, use i8 or i8 array instead.
  • Optimize foreach for single element arrays.
  • Move all calls to panic due to checks to the end of the function.

Fixes

  • Single module command line option was not respected.
  • Fixed issue with compile time defined types (String in this case), which would crash the compiler in certain cases.
  • Projects now correctly respect optimization directives.
  • Generic modules now correctly follow the implicit import rules of regular modules.
  • Passing an untyped list to a macro and then using it as a vaarg would crash the compiler.
  • Extern const globals now work correctly.

Stdlib changes

  • init_new/init_temp deprecated, replaced by new_init and temp_init.

What about 0.5.1 and 0.5.2?

Unfortunately I never blogged about those. So here is a short recap on what happened in 0.5.1 and 0.5.2:

Changes / improvements

  • Allow trailing comma in calls and parameters #1092.
  • Improved error messages for const errors.
  • Do not link with debug libraries unless using static libraries.
  • Add 'print-linking' build option.
  • System linker may be used even if the target arch is different from current.
  • Slice -> array/vector works for constant slice lengths.

Fixes

  • Fixes issue where single character filenames like 'a.c3' would be rejected.
  • Better errors when index type doesn't match len() when doing user defined foreach.
  • Fixes to to_int for hexadecimal strings.
  • Fixed issue when using a generic type from a generic type.
  • Bug with vector parameters when the size > 2 and modified.
  • Missing error on assigning to in-parameters through subscripting.
  • Inference of a vector on the lhs of a binary expression would cause a crash.
  • Fixes to PriorityQueue
  • On Aarch64 use the correct frame pointer type.
  • On Aarch64 macOS, ensure the minimum version is 11.0 (Big Sur)
  • Fixes to the yacc grammar.
  • Dsym generation on macOS will correctly emit -arch.
  • Stacktrace on signals on Linux when backtrace is available.

Stdlib changes

  • Allow to_int family functions take a base, parsing base 2-10 and 16.
  • delete and delete_range added to DString.
  • Splitter iterator added.
  • splitter and iterator String methods.
  • load_new, load_buffer and load_temp std::io::file functions.

0.5 has feature stability guarantees, so any code written for 0.5.0 will work on all of 0.5.x.

If you want to read more about C3, check out the documentation: https://c3-lang.org or download it and try it out: https://github.com/c3lang/c3c
