2023

Say hello to C3 0.5

Originally from: https://c3.handmade.network/blog/p/8824-say_hello_to_c3_0.5

C3 is a programming language that builds on the syntax and semantics of the C language, with the goal of evolving it while still retaining familiarity for C programmers. It's an evolution, not a revolution: the C-like language for programmers who like C.

It is finally time to release C3 0.5. This version is the first version of the C3 compiler (and by extension, the C3 language) which is feature-stable.

Before 0.5, the language could change between patch releases of the same minor version, so the 0.4.1 version of the compiler might not compile code written for 0.4.20, and vice versa.

From 0.5 and forward this changes: each future version will have its own branch where bug fixes will happen, but otherwise the features are frozen. New features will be reserved for the dev and master branches. Consequently, as we announce 0.5, work will actually move on to 0.6 which is where the active development will happen.

This allows people to pick a version to confidently work with, knowing that there will be no changes to language semantics or the standard library.

Feature complete

With 0.5, the C3 language itself can also be considered feature complete, and for 0.6, 0.7, 0.8 and 0.9 the focus will be on the standard library. A good standard library should address real-life use cases, solving the issues users commonly encounter.

In order to properly know what those use cases are, a diverse set of projects must be written in C3. And for people to build non-trivial projects in C3 without problems, there must be some stability guarantees for the compiler itself. This is what 0.5 provides, and why we now move on to refining the standard library.

Explore C3

Interested in trying out C3 0.5? Learn more on the language's official site: https://c3-lang.org. Obtain the compiler from GitHub at https://github.com/c3lang/c3c and join the community shaping the future of the C3 programming language.

Comments


Comment by Christoffer Lernö

The change list for 0.5:

Changes / improvements

  • Trackable allocator with leak allocation backtraces.
  • $defined can take a list of expressions.
  • $and: compile time "and" which does not check expressions after the first one that fails to check.
  • $is_const returns true if an expression is compile time const.
  • $assignable returns true if an expression may be implicitly cast to a type.
  • $checks and @checked removed, replaced by an improved $defined
  • Asm string blocks use AT&T syntax for better reliability.
  • Distinct methods changed to separate syntax.
  • 'exec' directive to run scripts at compile time.
  • Project key descriptions in --list command.
  • Added init-lib to simplify library creation.
  • Local const work like namespaced global const.
  • Added $$atomic_fetch_* builtins.
  • vectors may now contain pointers.
  • void! does not convert to anyfault.
  • $$masked_load / $$masked_store / $$gather / $$scatter for vector masked load/store.
  • $$select builtin for vector masked select.
  • Added builtin benchmarks via the benchmark and compile-benchmark commands and the @benchmark attribute.
  • Subtype matching in type switches.
  • Added parentof typeid property.
  • Slice assignment is expanded.
  • Enforced optional handling.
  • Better dead code analysis, and added dead code errors.
  • Exhaustive switches over enums have better analysis.
  • Globals may now be initialized with optional values.
  • New generic syntax.
  • Slice initialization.
  • $feature for feature flags.
  • Native stacktrace for Linux, MacOS and Windows.
  • Macro ref parameters are now of pointer type and ref parameters are not assignable.
  • Added nextcase default.
  • Added $embed to embed binary data.
  • Ad hoc generics are now allowed.
  • Allow inferred type on method first argument.
  • Fix to void expression blocks
  • Temporary objects may now invoke methods using ref parameters.
  • Delete object files after successful linking.
  • Compile time subscript of constant strings and bytes.
  • @if introduced, other top level conditional compilation removed.
  • Dynamically dispatched interfaces with optional methods.
  • $if now uses $if <expr>: syntax.
  • $assert now uses $assert <expr> : <optional message>
  • $error is syntax sugar for $assert false : "Some message"
  • $include and $echo no longer have mandatory () around the arguments.
  • $exec for including the output of files.
  • assert no longer allows "try unwrap"
  • Updated cpu arguments for x86
  • Removed support for ranged case statements that were floats or enums, or non-constant.
  • nextcase with a constant expression that does not match any case is an error.
  • Dropped support for LLVM 13-14.
  • Updated grammar and lexer definition.
  • Removal of $elif.
  • any / anyfault may now be aliased.
  • @stdcall etc removed in favor of @callconv
  • Empty fault definitions are now an error.
  • Better errors on incorrect bitstruct syntax.
  • Internally use a wildcard type rather than an optional wildcard.
  • Experimental scaled vector type removed.
  • Disallow parameterized attributes without parameters, e.g. define @Foo() = { @inline }.
  • Handle @optreturn contract, renamed @return!.
  • Restrict interface style functions.
  • Optional propagation and assignment '!' and '?' are flipped.
  • Add l suffix (alias for i64).
  • Allow getting the underlying type of anyfault.
  • De-duplicate string constants.
  • Change @extname => @extern.
  • define and typedef removed.
  • define is replaced by def.
  • LLVM "wrapper" library compilation is exception free.
  • private is replaced by attribute @private.
  • Addition of @local for file local visibility.
  • Addition of @public for overriding default visibility.
  • Default visibility can be overridden per module compile unit, e.g. module foo @private.
  • Optimized macro codegen for -O0.
  • Addition of unary +.
  • Remove possibility to elide length when using ':' for slices.
  • Remove the : and ; used in $if, $switch etc.
  • Faults have an ordinal.
  • Generic module contracts.
  • Type inference on enum comparisons, e.g. foo_enum == ABC.
  • Allow {} to initialize basic types.
  • String literals default to String.
  • More const modification detection.
  • C3L zip support.
  • Support printing object files.
  • Downloading of libraries using vendor "fetch".
  • Structural casts removed.
  • Added "native" option for vector capability.
  • $$shufflevector replaced with $$swizzle and $$swizzle2.
  • Builtin swizzle accessors.
  • Lambdas, e.g. a = int(x, y) => x + y.
  • $$FILEPATH builtin constant.
  • variant renamed any.
  • anyerr renamed anyfault.
  • Added $$wasm_memory_size and $$wasm_memory_grow builtins.
  • Add "link-args" for project.
  • Possible to suppress entry points using --no-entry.
  • Added memory-env option.
  • Use the .wasm extension on WASM binaries.
  • Update precedence clarification rules for ^|&.
  • Support for casting any expression to void.
  • Win 32-bit processor target removed.
  • Insert null-check for contracts declaring & params.
  • Support user defined attributes in generic modules.
  • --strip-unused directive for small binaries.
  • $$atomic_store and $$atomic_load added.
  • usz/isz replaces usize and isize.
  • @export attribute to determine what is visible in precompiled libraries.
  • Disallow obviously wrong code returning a pointer to a stack variable.
  • Add &^| operations for bitstructs.
  • @noinit replaces = void to opt-out of implicit zeroing.
  • Multiple declarations are now allowed in most places, e.g. int a, b;.
  • Allow simplified (boolean) bitstruct definitions.
  • Allow @test to be placed on module declarations.
  • Updated name mangling for non-exports.
  • defer catch and defer try statements added.
  • Better errors from $assert.
  • @deprecated attribute added.
  • Allow complex array length inference, eg int[*][2][*] a = ....
  • Cleanup of cast code.
  • Removal of generic keyword.
  • Remove implicit cast enum <-> int.
  • Allow enums to use a distinct type as the backing type.
  • Update addition and subtraction on enums.
  • @ensure checks only non-optional results.
  • assert may now take varargs for formatting.
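
Several of these changes show up together in ordinary code. Here is a minimal, hypothetical sketch: the def alias, lambda, and multiple-declaration forms follow the items listed above, while module and identifier names are invented for illustration and details may differ from the actual 0.5 syntax.

module demo;
import std::io;

// 'def' replaces 'define' for aliases; here aliasing a function type.
def BinOp = fn int(int, int);

fn void main()
{
    // Lambda syntax as listed above: a = int(x, y) => x + y
    BinOp add = int(x, y) => x + y;
    // Multiple declarations in one statement are now allowed: int a, b;
    int a, b;
    a = 20;
    b = 22;
    io::printfn("%d", add(a, b));
}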

Stdlib changes

  • Tracking allocator with location.
  • init_new/init_temp for allocating init methods.
  • DString.printf is now DString.appendf.
  • Tuple and Maybe types.
  • .as_str() replaced by .str_view()
  • Added math::log(x , base) and math::ln(x).
  • Hashmap keys implicitly copied if copy/free are defined.
  • Socket handling.
  • csv package.
  • Many random functions.
  • Updated posix/win32 stdlib namespacing
  • process stdlib
  • Stdlib updates to string.
  • Many additions to List: remove, array_view, add_all, compact etc
  • Added dstringwriter.
  • Improved printf formatting.
  • is_finite/is_nan/is_inf added.
  • OnStack allocator to easily allocate a stack buffer.
  • File enhancements: mkdir, rmdir, chdir.
  • Path type for file path handling.
  • Distinct String type.
  • VarString replaced by DString.
  • Removal of std::core::str.
  • JSON parser and general Object type.
  • Addition of EnumMap.
  • RC4 crypto.
  • Matrix identity macros.
  • compare_exchange added.
  • printfln and println renamed printfn and printn.
  • Support of roundeven.
  • Added easings.
  • Updated complex/matrix, added quaternion maths.
  • Improved support for freestanding.
  • Improved windows main support, with @winmain annotations.
  • SimpleHeapAllocator added.
  • Added win32 standard types.
  • Added saturated math.
  • Added @expect, @unlikely and @likely macros.
  • Temp allocator uses memory-env to determine starting size.
  • Temp allocator is now accessed using mem::temp(), heap allocator using mem::heap().
  • Float parsing added.
  • Additions to std::net, ipv4/ipv6 parsing.
  • Stream api.
  • Random api.
  • Sha1 hash function.
  • Extended enumset functionality.
  • Updated malloc/calloc/realloc/free removing old helper functions.
  • Added TrackingAllocator.
  • Add checks to prevent incorrect alignment on malloc.
  • Updated clamp.
  • Added Clock and DateTime.
  • Added posix socket functions.

Fixes

  • Structs returned from macros and then indexed into directly could previously be miscompiled.
  • Naked functions now correctly handle asm.
  • Indexing into arrays would not always widen the index safely.
  • Macros with implicit return didn't correctly deduct the return type.
  • Reevaluating a bitstruct (due to checked) would break.
  • Fix missing comparison between any.
  • Fix issue of designated initializers containing bitstructs.
  • Fix issue of designated initializers that had optional arguments.
  • Fixed ++ and -- for bitstructs.
  • Fix to bug where library source files were sometimes ignored.
  • Types of arrays and vectors are consistently checked to be valid.
  • Anonymous bitstructs check of duplicate member names fixed.
  • Assignment to anonymous bitstruct members in structs.
  • Fix casts on empty initializers.
  • Fix to DString reserve.
  • Fix where aliases did not do arithmetic promotion.
  • @local declarations in generic modules available by accident.
  • Fixes missing checks to body arguments.
  • Do not create debug declaration for value-only parameter.
  • Bug in alignment for atomics.
  • Fix to bug when comparing nested arrays.
  • Fix to bug when a macro is using rethrow.
  • Fixes bug initializing a const struct with a const struct value.
  • Fixes bug when void is passed to an "any"-vararg.
  • Fixed defer/return value ordering in certain cases.
  • Fixes to the x64 ABI.
  • Updates to how variadics are implemented.
  • Fixes to shift checks.
  • Fixes to string parsing.
  • Bug when rethrowing an optional from a macro which didn't return an optional.
  • Fixed issues with ranged cases.
  • Disallow trailing ',' in function parameter list.
  • Fixed errors on flexible array slices.
  • Fix of readdir issues on macOS.
  • Fix to slice assignment of distinct types.
  • Fix of issue casting subarrays to distinct types.
  • Fixes to split, rindex_of.
  • List no longer uses the temp allocator by default.
  • Remove test global when not in test mode.
  • Fix sum/product on floats.
  • Fix error on void! return of macros.
  • Removed too permissive casts on subarrays.
  • Using C files correctly places objects in the build folder.
  • Fix of overaligned deref.
  • Fix negating a float vector.
  • Fix where $typeof(x) { ... } would not be a valid compound literal.
  • Fix so that using var in if (var x = ...) works correctly.
  • Fix int[] -> void* casts.
  • Fix in utf8to16 conversions.
  • Updated builtin checking.
  • Reduce formatter register memory usage.
  • Fixes to the "any" type.
  • Fix bug in associated values.
  • More RISC-V tests and fixes to the ABI.
  • Fix issue with hex floats assumed being double despite f suffix.
  • Fix of the tan function.
  • Fixes to the aarch64 ABI when passing invalid vectors.
  • Fix creating typed compile time variables.
  • Fix bug in !floatval codegen.
  • Fix of visibility issues for generic methods.
  • Fixes to $include.
  • Fix of LLVM codegen for optionals in certain cases.
  • Fix of $vasplat when invoked repeatedly.
  • Fix to $$DATE.
  • Fix of attributes on nested bitstructs.
  • Fix comparing const values > 64 bits.
  • Defer now correctly invoked in expressions like return a > 0 ? Foo.ABC! : 1.
  • Fix conversion in if (int x = foo()).
  • Delay C ABI lowering until requested to prevent circular dependencies.
  • Fix issue with decls accidentally invalidated during $checked eval.
  • Fold optional when casting slice to pointer.
  • Fixed issue when using named arguments after varargs.
  • Fix bug initializing nested struct/unions.
  • Fix of bool -> vector cast.
  • Correctly widen C style varargs for distinct types and optionals.
  • Fix of too aggressive codegen in ternary codegen with array indexing.

Comment by Christoffer Lernö

It allows the language to be easily parsable. The classic problem in a C-like grammar is that it is ambiguous with respect to types vs variables. In C this is typically solved using the "lexer hack", where the parser feeds types back into the lexer. Other methods include outlawing certain types of expressions and using infinite lookahead, which is the method D uses, for example.

In C3, the distinct naming rules for types disambiguate the grammar, making it LL(1). Also see here: https://c3-lang.org/faq/#syntax-language-design

So to be clear, it's not about trying to enforce some arbitrary naming standard, but rather to simplify the grammar. Picking PascalCase for the types was pretty much the only possible choice. I might write a blog post about this some time.
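
To make the ambiguity concrete: in C, the statement Foo * x; can parse either as a declaration of x as a pointer to Foo, or as the expression Foo times x, and only a symbol table can tell the difference. In C3, the case rules decide it from the first token alone. This is an illustrative sketch, not compilable code:

// C: is this a declaration or a multiplication?
// The parser cannot know without consulting the symbol table.
Foo * x;

// C3: an identifier starting with an upper-case letter must be a type,
// a lower-case one must be a value, so each line is unambiguous:
Foo* x;    // always a declaration
foo * x;   // always a multiplication expression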

Too much power, too poor accuracy - the story of $checks in C3

Originally from: https://c3.handmade.network/blog/p/8810-too_much_power%252C_too_poor_accuracy_-_the_story_of_checks_in_c3

Recently C3 lost its $checks() function. It would take any sequence of declarations and expressions, and if it failed to semantically check anywhere, return false.

It was an extremely powerful and flexible way of testing pretty much anything at compile time. Some examples:

// Test if a value may be indexed:
$checks(a[0]);
// Test if something supports addition:
$checks(a + a);
// Test if you can assign something to the type of another variable
$checks(b = a);
// Test if you can call a function with the values of two variables
$checks(foo(a, b));
// Check if a type has a particular field
$checks(Foo x, x.my_field);
// Check if a type is ordered
$checks(Foo x, x < x);

In essence, $checks was a Swiss Army knife for compile-time validation, making it redundant to employ multiple compile-time functions like $defined(x). So, why did we part ways with $checks (and its contract counterpart @checked)?

Well, it turns out that with power also comes a lack of clarity. Take the $checks(foo(a, b)) call, for example; it could potentially fail for a multitude of reasons:

  1. foo might not be visible in the scope.
  2. foo might need to be called with the module name, e.g. my_module::foo.
  3. foo might not be a callable variable pointer or function.
  4. a might not be visible in the scope.
  5. b might not be visible in the scope.
  6. foo might take fewer or more than 2 arguments.
  7. There could be a type mismatch between a and the first parameter of foo.
  8. There could be a type mismatch between b and the second parameter of foo.

So while we might have wanted to test for only some of these, the call can fail for any of the listed reasons, and there is no way to determine which one, unless we move the expression out of $checks and test it separately so that it errors in the same way.

While this is a problem when writing the $checks call, it also poses a problem when refactoring: it is hard to tell when you accidentally change something that breaks inside a $checks, causing it to reject legitimate parameters.

So $checks unfortunately combines power with inexactness. In fact, its power comes from being inexact and just bundling all the implicit checks together.

The alternative solution

C3 already had $defined(...), which would do a lightweight check of whether a variable or a field was defined. Its functionality had almost completely been eclipsed by $checks(...), but it now got a new life: $defined would semantically check all but the outermost part of a nested expression. The final expression would then be conditionally checked.

The new behaviour was reminiscent of $checks, but would only have a single "tested" semantic check. For example, $defined(foo(a, b)) would return true if it checked correctly, and false only if "foo" wasn't callable or didn't accept 2 arguments.
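
As a sketch of how the reworked $defined might be used inside a macro (the macro name and message are invented; the $if and $error forms follow the 0.5 change list, and the exact syntax may differ):

macro sum_or_fail(a, b)
{
    // Only the outermost operation, the '+', is conditionally checked;
    // 'a' and 'b' themselves must still check correctly.
    $if $defined(a + b):
        return a + b;
    $else
        $error "arguments do not support +";
    $endif
}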

The downside is that $defined must be carefully crafted to correctly do each "test" it supports.

But all in all, this is a substantial upgrade to correct compile time checking, which is very important in C3.


Addition: without $checks the various examples instead become:

// Test if a value may be indexed:
$defined(a[0]);
// Test if something supports addition:
types::is_numerical($typeof(a))
// Test if you can assign something to the type of another variable
$assignable(a, $typeof(b));
// Test if you can call a function with the values of two variables
$defined(foo(a, b));
// Check if a type has a particular field
$defined(Foo{}.my_field);
// Check if a type is ordered
Foo.is_ordered

Some guidelines to new syntax design

Originally from: https://c3.handmade.network/blog/p/8778-some_guidelines_to_new_syntax_design

Syntax discussions tend to be highly contextual. The syntax of a language is not a standalone, separate entity, but rather interacts with what type of algorithmic solutions you envision users to employ. On top of that, one must be aware that syntax shapes the solutions users will prefer, in sometimes unpredictable ways.

This makes completely new syntax very hard to analyze. And also hard to write any guidelines for.

That said, I think there are some things we can say about syntax design, to form some very simple (and obvious) guidelines:

  1. In general, an easy-to-parse syntax tends to be easier for a user to read quickly than a complex-to-parse syntax.
  2. Newly invented syntax will initially be harder for people to grok than established syntax. So it works against you if you want experienced programmers to understand it "at a glance".
  3. Newly invented syntax makes the language feel more "different" (unique, inventive, etc.) than established syntax. So it is good if you want to make the language stand out as being different at a glance.
  4. It's harder to know the downsides of newly invented syntax. So much more research is needed, and it's important to be ready to change it down the line if it doesn't work out.
  5. One's personal opinion of what constitutes "nice-looking syntax" is very unlikely to be the objectively most accurate one, so be aware that the "beautiful" syntax might be hideous to someone else.

Happy hacking!

Compile-time and short-circuit evaluation

Originally from: https://c3.handmade.network/blog/p/8773-compile-time_and_short-circuit_evaluation

Recently a user had a problem with the following code in C3:

$if $foo != "" && $foo[0] != '_':
    ...
$endif

As a reminder, compile time evaluation is distinguished using a $ sigil, so in this case the idea was to check whether the compile time variable $foo was an empty string, and if it wasn't, compare the first character with '_'.

If $foo is indeed an empty string, this code will fail at compile time.

This is because constant folding in C3 follows semantic evaluation: a binary expression will first type check both sub-expressions before the && is evaluated. That is, at compile time there is no short-circuit evaluation.
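
Since there is no compile-time short-circuiting, the original check has to be split so that the indexing is only semantically checked once the string is known to be non-empty, for example by nesting:

$if $foo != "":
    $if $foo[0] != '_':
        ...
    $endif
$endif

(The 0.5 change list also adds a $and builtin that stops checking after the first failing expression, which could express the same thing in a single condition; its exact call syntax is not shown here.)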

The curious effect of short-circuit evaluation

We could say that for && we only evaluate the left-hand side, and if that one is false, we don't evaluate the rest. This is perfectly legitimate behaviour, BUT it would mean that this would pass semantic checking as well:

if (false && okeoefkepofke[3.141592])
{
    ...
}

Why? Because constant folding would need to work the same way: we evaluate the first part to false, so now we never check the expression okeoefkepofke[3.141592].

So now we have this big piece of code that is wrong and never checked...

But obviously no one would write that, right? Except that something like this is quite reasonable code:

macro foo($foo)
{
  if ($foo && abc()) { ... }
}

This problem is not unique: anyone using a dynamically typed scripting language will be familiar with this exact problem. And the solution – if you care about the code actually working – is to write more tests.

Trying to eat the cake and keep it

One possibility to consider is to have short-circuit behaviour only in compile-time constant environments:

// Const global? Don't evaluate the right hand side.
const bool FOO = false && foewkfoewkf[fefeji]; 

fn void test()
{
    // Compile time conditional? Don't evaluate the right hand side
    $if false && foofoekfe[kfiejfie]:
        ...
    $endif
    // And same with switch:
    $switch
        $case false && fokeokfe[ofkeofk]:
            ...
    $endswitch
    // But this would be an error:
    bool b = false && fokefoek[ofofke]; // Error!
}

But if "never short-circuiting" is annoying and unexpected, and "always short-circuiting" requires much more testing, this "a little of both" creates a corner in the language that can be just as problematic as the former two. Having expression evaluation behave differently depending on where it's evaluated is likely to confuse even experienced users.

As usual, language design is a trade-off

For C3, semantic checking is prioritized over compile time convenience. I think everyone who's been working with macros in C3 knows the lazy evaluation of macros can easily hide bugs already, and having short-circuiting constant evaluation would just magnify this problem.

There are languages that instead consistently use short-circuiting constant evaluation at compile time. This allows leveraging the feature for all conditional compilation. Where C3 uses $if or $switch with very clear "this is evaluated at compile time" blocks to make compile-time bugs easier to find, other languages may prefer to streamline the look of the code, letting compile-time and runtime evaluation blur together while consistently following the same rules. While this comes at the aforementioned added cost of testing, it might be a trade-off their users prefer.

Inspirations for C3's features

Originally from: https://c3.handmade.network/blog/p/8723-inspirations_for_c3%2527s_features

When designing a new programming language, research is incredibly important. While research can mean investigating new syntax and new semantics, most of it is actually looking at other languages' features and seeing if anything worked particularly well and whether it could be useful for your own language.

C3 is derived from C2, which in turn is an evolution of C, so the basis of the language itself is clear. But what about the features on top of C – where do they come from? I thought it might be amusing to list the features and where they originated.

Features and where they come from

Modules – Java was probably the primary inspiration for a lot of it, since it has a very simple and well understood system with packages. However, Java's imports are actually only about visibility, not about really importing anything, so there are very clear differences. I've written more in detail here.

Generic modules - This was inspired by macro based container libraries in C, as well as ASTEC's "@module" macro.

Faults/optionals - Originally this was similar to Zig's design, but took inspiration from Herbceptions, Haskell/Rust results, and C and Go error handling, blending them into something original.

Macros - This was based largely on ASTEC but added things like iteration.

Struct subtyping - This is a Plan9 feature that also ended up in Go. I got it from reading about the Plan9 C compiler.

Slices - This exists in many languages, it's hard to say what languages I based it on.

Slicing syntax - The ^1 syntax comes from C#, otherwise it's mostly D with some looks at Swift and Odin.

Contracts - While a lot of languages try to add a bit of contract support, Eiffel is the language I looked at. Placing the contracts in the docs was a change for C3.

Def - I started by looking at D. The inclusion of "distinct" types comes from Odin. The restriction that function types may only be accessed through def is from C2. The idea that generic modules are instantiated using def occurs in earlier languages; I remember looking at Ada in particular.

Reflection - I'd say Jai got the ball rolling here, with some additional inspiration from Odin. It was clear from the start that reflection like Java or Objective-C was out of the question, and that Jai's runtime information was more than I wanted. I read about reflection in other languages as well, with D having quite a bit of influence on the syntax.

Operator overloading - I certainly looked at overloading in C++, D and other languages, but in the end the result was a bit in between everything.

Dynamic calls - This is from Objective-C.

Undefined behaviour - The C3 attitude to UB is strongly influenced by Odin, but doesn't go quite that far.

Implicit conversions - Originally this borrowed from Zig, but after a lot of research, it ended as a unique blend of C and Java ideas, without the need for untyped literals.

Precedence rules - Just trying to avoid retaining the poor precedence rules of C.

Project files - Derived from C2, but modified.

Any and typeid types - mainly inspired by Odin.

Enum associated values - derived from Java enums.

Bitstructs - inspired by PacketC.

Extended switch - pattern matching in many languages.

Flowtyping to unwrap - Java / Kotlin in JetBrains' IDEs.

Foreach - ObjC and Java originally. The idea (and syntax!) to allow getting values by ref comes from PHP.

Base64 and hex literals - Inspired by language "wish lists" on the web :D.

Zero init by default - Ultimately Odin convinced me this was a good idea.

Array/slice arithmetics - A subset of Odin and D functionality.

Type methods - An extension of C2 struct functions.

Attributes - Based on C2 attributes.

Defer - Based on Swift and Jai defer. The extensions defer catch and defer try were added on top. While Zig has an errdefer which works like defer catch, the C3 feature was developed without knowledge of that Zig addition(!)

Special syntax for compile time - Mostly driven by a need to make compile time clearer than compile time code in Zig.

Visibility rules - I did lots of research on this, so it's hard to say where it comes from. Certainly some I made up for C3. "Public by default" comes from Odin. Some ideas for export and visibility came from D.

Raw strings - I experimented with a lot of different styles, ultimately I picked Go style from comparing with Odin. Escaping a single backtick by having two in a row is also from some language, but unfortunately I don't recall which one.

Ranges in initializers - This is a GCC extension.

Expression block - This is a variant of the GCC statement expression that I changed to be a self-contained block where return only jumps out of the block. So it's an evolution of the GCC feature.

Ranges in case statements - Yes, this is a GCC extension as well.

Named arguments - Probably borrowed from Swift originally.

Trailing macro body - This is a unique functionality, but it is somewhat similar to trailing body lambdas in Ruby and later Swift.

Lambdas - These are syntactically very similar to Java's lambdas. But of course C3 does not capture closures.

Static initializers and finalizers - Syntactically somewhat derived from Java static blocks.

Function syntax - This is from C2, but in shortened form (C2 uses func).

Allocators - Influences from Jai, Odin and Zig, but ultimately C3 picks its own trade off.

Temp allocators - Mostly based off Odin originally.

Inline asm - Mostly based on MSVC inline asm.

Final words

On top of the above, C3 is of course indebted to all the people I've engaged in language discussions with over the years. I should mention Jon Goodwin (Cone) and Andrey Penechko (Vox) in particular, but I want to thank everyone who helped with thoughts and feedback (and complaints!) over the years.

Thank you!


If you are curious about C3 you can try it at https://learn-c3.org or download the compiler from https://github.com/c3lang/c3c

P.S. A bonus tidbit: the use of printn and printfn instead of println and printfln comes from F#


Language design bullshitters

Originally from: https://c3.handmade.network/blog/p/8721-language_design_bullshitters

Inevitably people will ask "what language should I choose for my compiler?".

The answer is really: "you can use any language, so all things being equal, pick one you're good at."

Of course there are caveats:

  1. You want it to go really fast? Then C is better than Python.
  2. Are you making a DSL? Then you probably want to do it in the host language.
  3. Do you want to experiment with some parsing techniques? Then some languages might be a better fit than others

... and so on.

So when someone says something like "C is a bad choice for writing a compiler" as a general statement, you know they are just making it up as they go along.

The C3 compiler is written in C, and there is frankly no other language I could have picked that would have been a substantially better choice. Sure, writing it in C2 or Odin would certainly have avoided some of C's warts, but the difference would not have been significant. And doing an OO-style C++, or worse, Java, would just have pushed the compiler toward being slower and more bloated, with no additional benefit other than that there are more Java programmers than C programmers.

"Say you are bad at programming without saying you're bad at programming"

So what are the arguments against C?

"C memory management is hard".

My god, if you think your compiler has to have a lot of calls to free and that is the hard part about writing a compiler, then you have ABSOLUTELY ZERO business handing out advice on compilers - or programming.

(Memory allocation can be handled in different ways in a compiler, the simplest being arena allocators.)

"C doesn't have feature X, so it will be a nightmare writing a compiler for it"

This is the prime argument for people arguing for writing the compiler in Ocaml or some other functional language. "C doesn't have extensive support for pattern matching, how can I use [my preferred technique] without that??? IT'S IMPOSSIBLE AND NO ONE SHOULD TRY IT"

If you point out that there are plenty of compilers written in C, the argument becomes "yes, but they are old and it's not modern to use C".

A carnival of made up arguments

There is no lack of people who want to give advice on language design. Even language designers who actually know what they are talking about struggle to give good advice that is applicable to your particular design if you ask about it. It's just hard.

With that in mind, guess what the quality of advice is from people who just have some theoretical knowledge of compiler and language design? Yes, it is as bad as you might guess.

Also, somewhat unfortunately, the group with the least experience tends to have the most time to argue for things. Of course their arguments are made up of what they happen to think is true and what they read on blogs they liked.

To sum it up

So you want to write a compiler? Get some advice for language design? Well, do ask, but keep in mind that most of what you read is just trash advice made up by people who don't actually know what they are talking about. Especially on forums where there aren't many people who actually write compilers. You'll get bad advice in places dedicated to programming language design as well, but your odds of picking up some good advice are better.

And the proof is in the pudding: if you actually look at which compilers exist and what languages they're written in, you at least know what's been proven to be production ready. And do look at compiler performance too, because that will matter if you're serious about the project.

And to me, if I find out that someone is making things up, then clearly other things they say aren't trustworthy either. Language design seems to be one of those things people like to have opinions on because they know the risk of being called out for lying is low.

Comments




Comment by Christoffer Lernö

Absolutely, I don't know if I needed to add that. General purpose languages should all be easy to make languages in.

And while asm isn't exactly the nicest abstraction to write a compiler in, that used to be what a lot of (AST-less) compilers were written in back in the day.

Updating keywords for 0.5

Originally from: https://c3.handmade.network/blog/p/8685-updating_keywords_for_0.5

I’ve been working on shaving off the rough corners in the C3 syntax for version 0.5, and one of the changes I'm likely to make is replacing variant and anyerr with any and anyfault.

"variant" was originally chosen because it wasn't intended for frequent use – unlike most any types in other languages. In addition I liked the idea that "any" could be used as a variable name.

As for anyerr, it was chosen while I still called the failure result error. anyerr was shorter than Zig’s anyerror, and I've in general been happy with the name. The abbreviation doesn't affect readability or clarity.

As the optional/result semantics matured however, it became increasingly clear that error (or a shorter "err") made a bad keyword. With its novel semantics it doesn't quite represent an error, and it was important to highlight this. That was why the keyword was changed to fault instead of error.

I wasn't sure about fault, so I tried variants of it - including reusing enums (enum MyResult : anyerr { ... }), but everything I tried was in practice less clear than fault.

So to avoid too many different terms anyfault is likely going to replace anyerr. While I would have liked to shorten it, I've found no good way to abbreviate “fault” (unlike “error” -> "err"). Fortunately, anyerr/anyfault is not used frequently. Currently in the standard library it is just used in two locations. This is in contrast with Zig, where anyerror is a common return type.

The experiment using variant rather than any largely failed: I never really needed any as a variable name, and where the type was used variant felt less clear than any would have been.

This also gives the language a consistent pair:

any
anyfault

While consistency in name isn't a requirement, it's always nice to have when you can.

Most importantly, the lesson here is that it is fine to pick some keywords and try them out, and it's fine to change them. Neither anyfault nor any were choices I could know were "right" from the beginning. Rather, they are choices that only experience could reveal.

Don't expect your first syntax and keyword choices to be the best ones, but you also need to decide on something to get started. No matter how much bikeshedding you do, you can't really predict the feel of a choice until you try it for real.


Some language design lessons learned

Originally from: https://c3.handmade.network/blog/p/8682-some_language_design_lessons_learned

Language design lessons learned

As you work on a programming language, you'll come to realize things about language design that aren't easy to come by any other way than actually working on a language.

Here are some lessons I learned that were applicable to C3.

1. Make the language easy to parse for the compiler and it will be easy to read for the programmer

If you stop and think about it, this isn't strange: when we read we do so in a way similar to that of a parser, scanning ahead visually. So if the parser needs little lookahead, so does a human reader.

Lots of people approaching language design are obsessed with finding the parser algorithm that can accept the most types of grammars.

This is completely counterproductive: better to restrict your grammar to LL(1) to make it easy to read.

2. Lexing, parsing and codegen are all well covered by textbooks. But how to model types and do semantic analysis can only be learned by studying compilers.

This has a lot to do with the fact that semantic analysis and types are intrinsically linked to the language semantics, so it's not possible to establish general rules that apply to all languages.

This means that the best resources when starting out are actual compilers for similar languages. The design space here is huge, so lots of different designs are possible even for the same language, but having some references when starting out is invaluable.

For C3 I looked at Clang, TCC, C2 and Cone.

3. Inventing a completely new language construct should only be done if it is absolutely necessary.

This might seem controversial: why would one build a language that isn't inventing something?

But it turns out there is a lot of value in remixes: C++ is C + Simula, C is B + types, Kotlin is an evolved Java etc.

Value is in the combination of features, not in some perceived "new" functionality. Also, for established features it's often possible to make improvements because the problems are known, but with a new feature you will have to figure them out as you go along. The first language with a particular feature is rarely the language which implements it the best.

4. Don’t take advice from other language designers

What is good for one language might be a horrible idea in another. It is hard to describe a language's goals and ideas, so even if they take the time, they will not understand the nuances of your design.

I have seen so much bad advice over the years.

There is also a lot of unsolicited advice. People will tell you:

  1. What features must be included in a "modern" language.
  2. What type of parser you must use (everything else is old and bad)
  3. What programming language to write the compiler in (e.g. "it's impossible to write a compiler in C")
  4. What paradigm all new languages must be (OO, functional etc)

5. “Better syntax” is subjective and never a selling point.

What you see over and over again is people spending a lot of time building languages that are reskins of existing languages: e.g. "like Java but with better syntax".

What they have in common is that some superficial changes to syntax are argued to be a huge selling point for the language. Often these changes are things that most people would disagree with, but for the individual designer they are elegant or "simple".

6. Macros are easy to make powerful but hard to make readable.

The difficulty designing macros is not to make them flexible enough but rather to make them limited in the right way, so that they are readable while still being useful.

There is a difficult trade-off to be made, as greater flexibility makes it harder to know what the macro can do, which reduces readability. Different languages will naturally make different trade-offs, but "macros can do anything!" is rarely a good idea.

7. There will always be people who hate your language no matter what.

It's the wrong paradigm, it has the declarations in the wrong order, it doesn't have a GC or it has a GC, it has RAII or it doesn't have RAII. Anything may be a reason for others to dismiss any language.

Keep in mind that there are people who hate each of these languages as well: C, C++, Go, Rust, Pascal, Haskell, OCaml, Swift, Objective-C, Ruby, Python, Java, C#, JavaScript, Typescript, PHP, Kotlin, Scala, and any other popular language.

8. It is much easier to iterate semantics before they're implemented

Doing a writeup of some semantics allows you to iterate quickly on the design. Changing semantics often means lots of changes to a compiler, so it's painful to change them once they're already in the language. Writing code for your imagined semantics is a powerful tool to experiment with lots of variations.

9. It is much easier to evaluate syntax using it for a real task

In contrast to (8), no amount of bike-shedding of syntax can replace actually trying out syntax for some real examples. Often the conclusions are surprising, with the a priori "best" syntax having problems in real life scenarios.

Summary

These are some lessons I've learned while working on C3. Are they applicable in general or not? Maybe, maybe not. After all, (4) says not to take advice from other language designers, so if you're a language designer do keep in mind they might not apply. 😜


Four ways to go when you need a variably sized list in C3

Originally from: https://c3.handmade.network/blog/p/8654-four_ways_to_ways_when_you_need_a_variably_sized_list_in_c3

In this blog post we'll review four standard ways to handle the case when you need a list with a size which is only known at runtime.

Use a generic List allocated on the heap

import std::io;
import std::collections::list;

// We create a generic List that holds doubles:
define DoubleList = List(<double>);

fn double test_list_on_heap(int len)
{
  DoubleList list;   // By default will allocate on the heap
  defer list.free(); // Free memory at exit with a defer statement.
  for (int i = 0; i < len; i++)
  {
    // Append each element
    list.push(i + 1.0); 
  }
  double sum = 0;
  foreach (d : list) sum += d;
  return sum;
}

We can use list.init(len) if we have some default length in mind, otherwise it's not necessary.

Use a generic List allocated with the temp allocator

Here we instead use the temp allocator to allocate and manage memory. The @pool() { ... } construct will release all temporary allocations inside of the body block.

fn double test_list_on_temp_allocator(int len)
{
  @pool()
  {
    DoubleList list;
    list.temp_init();   // Init using the temp allocator
    for (int i = 0; i < len; i++)
    {
      list.push(i + 1.0);
    }
    double sum = 0;
    foreach (d : list) sum += d;
    // No need to free explicitly!
    return sum;
  };
}

Allocate an array on the heap

This is the conventional way to do it in C if the length is unknown. Note how we can use defer to write the allocation and the free together, to avoid forgetting to free if there are multiple exits.

fn double test_array_on_heap(int len)
{
  double[] arr = mem::new_array(double, len);
  defer free(arr); // Free at function exit.
  for (int i = 0; i < len; i++)
  {
    arr[i] = i + 1.0;
  }
  double sum = 0;
  foreach (d : arr) sum += d;
  return sum;
}

Allocate an array on the temp allocator

Using the temp allocator is about as close as we get to allocations being free when we need arbitrarily long lists:

fn double test_array_on_temp_allocator(int len)
{
  @pool()
  {
    // The array will be released when exiting `pool()`
    double[] arr = mem::temp_array(double, len); 
    for (int i = 0; i < len; i++)
    {
      arr[i] = i + 1.0;
    }
    double sum = 0;
    foreach (d : arr) sum += d;
    return sum;
  };
}

Summary

We looked at four standard ways to use arbitrarily long lists in C3. Two of them used a growable list, which is important if you might not know the exact length in advance. The other two use simple arrays.

This also contrasted using the temp allocator with the heap allocator. In a later blog post I'll discuss the allocators in more detail.

A gist with the full code can be found here.


A look at modules (in general + in the context of C3)

Originally from: https://c3.handmade.network/blog/p/8650-a_look_at_modules_in_general__in_the_context_of_c3

Despite being a general concept, modules are often very different from language to language. One major reason for this is that overall language semantics puts many constraints on how modules may work. However, despite these constraints there is a lot of specific design work required.

I'm going to look at modules in general and also talk a little about how C3 modules work.

An initial observation

When making a module system, one first has to decide whether a module is a separate concept or not, because if the language has static variables and functions attached to a type, there is actually already a sort of module system present.

Here is a short snippet written in the C2 language to illustrate this:

// File bar.c2
module bar;
// Plain function
func int get_one() {
    return 1;
}  

// File foo.c2
module foo;
import bar;
type Bar struct {
  int x;
}  
// Static function
func int Bar.get_one() {
    return 1;
}

func void test() {
    int a = Bar.get_one();
    int b = bar.get_one();
}

The type here acts as a namespace in itself. If we extend the type with static variables, we can similarly emulate namespaced global variables.

Most languages with methods on their types gladly accept this ambiguity, but one can draw the conclusion that modules are not needed and only structs are necessary. This is the approach taken by Zig. The downside is that it also leads to counter-intuitive things such as "a file is a struct" and having to explicitly arrange sub-modules in a hierarchy.

The other way to resolve the ambiguity is to have type methods, but abolish static methods and globals. This is the approach of C3. The downside is that some things that are naturally static, such as a method Foo.new_instance() or a constant Foo.MAX_VALUE, can't be expressed.

We can also note that Java, while having "packages", uses classes as the primary namespacing mechanism for free functions and constants, which is a bit more relaxed than Zig's approach, since the hierarchy is external.

Sub-modules and paths

Flat vs hierarchical

The module namespace can be flat, with a single module name, or hierarchical, where modules have sub-modules. While flat modules are nice to work with and easy to implement, there is much more contention for unique names. This can mean that module names need to be longer to ensure uniqueness, e.g. mylib_io for the flat case versus mylib::io for the hierarchical one. But hierarchical modules in general have an even worse problem with length: e.g. std.debug.print("Hello, world!\n", .{}); (with apologies to Zig).

Aliasing and import

The obvious solutions to long names are aliasing and namespace imports. Here is again a C2 example:

import networking as net; // Aliasing
import filesystem local; // Namespace import


// Equivalent:
doSomething(); // Namespace import
filesystem.doSomething();

// Equivalent:
net.connect(); // Aliased
networking.connect();

The downside of aliasing is that aliases may differ between authors and implementations. So while someone might alias networking to net, someone else uses nw. This together with the difficulty of naming aliases makes it a less attractive solution. Full namespace import avoids naming issues, but makes it much less clear what are local functions and what is implemented elsewhere.

C3 path shortening

C3 has a hierarchical module system but employs path shortening: the first part of a module path may be elided, so std::net::sockets::new_from_url(url) can be used as sockets::new_from_url(url) as long as it is not ambiguous.

Requiring at least the sub-module name in the path is a design decision to avoid the readability problems mentioned with namespace imports. In the example "new_from_url(url)" on its own lacks the context that the "sockets::" prefix gives.

Surveying other languages, it's clear that types usually contain sufficient context in their names. For this reason types are exempt from the prefix requirement in C3.

Note how something similar happens in Java in practice: java.math.BigInteger is the import, you then use BigInteger, but you call static "functions" namespaced: BigInteger prime = BigInteger.probablePrime(128, rnd);

In the Java case this comes from import java.math.BigInteger being an actual namespace import, but the classes themselves then provide a second layer of namespacing.
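The two layers are easy to see in runnable Java (the class name below is made up for the example):

```java
import java.math.BigInteger;
import java.util.Random;

public class TwoLayers {
    public static void main(String[] args) {
        // Layer 1: the import folds the package namespace, so we can
        // write BigInteger instead of java.math.BigInteger from here on.
        Random rnd = new Random();
        // Layer 2: the class acts as a namespace for its static methods;
        // probablePrime is reached through the class name.
        BigInteger prime = BigInteger.probablePrime(16, rnd);
        // probablePrime returns a prime with exactly the requested bit length.
        System.out.println(prime.bitLength());
    }
}
```

The import removes the package prefix, but the class-level prefix always remains, which is the "second layer" of namespacing.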

Visibility

The other major component of modules is visibility between modules. Note that nothing says explicit imports are necessary: with full paths, the correct types, functions and variables can be found anyway.

With "import" statements the most common scheme is this:

  • Modules not imported: no visibility.
  • Module imported: public declarations are visible.

Hierarchical visibility

As a complement to the above, in hierarchical module systems a module may see non-public declarations in sub-modules and/or parent modules.

The desire for this feature arises from wanting to separate the visible "API layer" module from the internal "implementation layer" modules, which contain implementation details that may change over time.

The downside of this method of letting modules peek into other modules is the need to build this into the hierarchy.

"Friend" visibility

An alternative to hierarchical visibility is to declare "friend" modules that may access the module. This has fewer constraints than trying to fit modules neatly into the right sort of hierarchy just to get the correct visibility between them.

There is still the drawback that in order to "friend" another module, the module needs to know of that other module.

Becoming a "friend"

Often the concept of visibility is conflated with some idea of "internal safety": "I make this private to make it safe from other modules". This is stretching the metaphor too far. Visibility and access modifiers are there to help the user of the types use and override functionality in the correct way. "Public" communicates that a function is made for general consumption; "private" means internal consumption, and that it is not part of the surface API of the functionality.

However, if one knows what one is doing then circumventing these protections can be useful. For example:

  • There may be a bug that can be circumvented by calling private methods.
  • One may want to exploit the particular functionality of a specific version of a library.
  • One may want to modify behaviour for some other reason that the author did not foresee.

Languages often have convoluted ways of circumventing visibility in these cases, e.g. calling functions using reflection in Java, precisely because the need does arise.

The obvious way is then for a module to be able to declare itself a friend of another module. A C3 example:

module test;
fn void fn_private() @private {}

module foo;
import test @public; // Override visibility

fn void main()
{
    // This is not an error due to the "@public" import.
    test::fn_private();
}

We can note that C3 has public by default. It is possible to set a different default:

module test2 @private;

fn void fn_private() {}
fn void fn_public() @public {} // Explicitly needs @public!

Visibility levels

To talk about visibility at all we need at least two levels to differentiate between. Usually these are public and private, where public means visible outside of the module and private means visible only inside of it.

In fact, we could stop here, because in most cases this is all we need. For this reason it's possible to encode this not in a keyword but in the name itself: Go's "uppercase means public" and Dart's "leading underscore means private" (note: I considered the latter for C3).

Between "private" and "public"

If we want hierarchical visibility, then we need another level above private but below public, indicating that something is available to other modules (below or above) in the hierarchy.

Similarly, for the "friend" module visibility we need a visibility level for this behaviour. As an example Rust has pub(in path) and pub(crate) (although note that both of those are somewhat constrained).

Below "private"

If modules may span multiple source files, there is the possibility of another visibility level, where visibility is restricted to the file with the declaration. This is C's static, Swift's fileprivate and C3's @local (Note: while C3 could have used static for globals and functions, it's a poor name for type visibility. This is why @local was chosen instead).

This is not exhaustive: depending on language features, more visibility levels might be possible. For C3, with the import @public override, having "public", "private" and "local" seems to cover most use cases.

Imports

While imports are usually a good way to determine dependencies, this is not guaranteed. As an example: while most Java programmers may think of Java's import as importing classes, all it actually does is fold namespaces.

The point here is that while import may roughly correspond to the dependency graph, it's not guaranteed to do so exactly. This means that imports are usually simply a way to limit the pollution of the current namespace.

This is very valuable though; in fact it is a variant of the public/private division: importing picks the set of modules that can be accessed (= is public to the current module).

Narrow imports

In the Java world, wildcard imports (e.g. import java.util.*) are by tradition considered bad. Instead Java source files often contain a litany of single-class imports. This is such a problem that most IDEs offer both to hide the list of imports and to manage it for you.

In the Java case the tangible benefit claimed is that if you do something like this:

import java.util.*;
import java.sql.*;

You have a problem if you try to use Date, since it's now ambiguous.

Having written a lot of Java code that works with databases, I can confidently say that the problem here is not the imports but the reuse of Date in both Java packages. If the java.sql class had a reasonable name like SqlDate, this import would not have been a problem AND there would be no confusion when trying to use java.util.Date and java.sql.Date in the same code, which happens quite often.

So the fact that the above is touted as a reason just shows how weak the arguments are for narrow imports in Java.
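To make the clash concrete, here is a minimal Java file (hypothetical class name) showing both the ambiguity and the fully qualified escape hatch:

```java
// Both wildcard imports compile fine on their own; only the *use* of an
// ambiguous simple name is an error.
import java.util.*;
import java.sql.*;

public class DateClash {
    public static void main(String[] args) {
        // Date d = new Date(0); // would not compile: "reference to Date is ambiguous"
        // The only way out is fully qualified names:
        java.util.Date utilDate = new java.util.Date(0);
        java.sql.Date sqlDate = new java.sql.Date(0);
        System.out.println(utilDate.getClass().getName());
        System.out.println(sqlDate.getClass().getName());
    }
}
```

Note that even with narrow imports, code mixing both types ends up writing the fully qualified names anyway, which is why the naming, not the wildcard, is the real culprit.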

HOWEVER, if a language uses import to actually pull in dependencies, then narrow imports are likely better. It's important to note that this isn't necessarily the case: it's not true in Java, nor in C3.

No imports?

One might think that dumping all modules into the current namespace would be unworkable, but if we use the full path to types and functions, there are no ambiguities. Even C3's abbreviated paths work fine in general.

The downside is that things like code completion are now going to match EVERYTHING in all modules, which makes for a much worse experience. This also affects things like error messages. Imports help the compiler (and an IDE) make better guesses and in general just be more friendly.

A middle ground

In C3 imports are implicitly wildcard, so import std::io will also import sub-modules of std::io. It's also possible to have more than one import on a single line, e.g. import std::io, std::math;. To me this seems like a reasonable compromise.

More controversially, C3 modules will implicitly import parent and child modules. So std::io::socket could implicitly import std::io, std and the child module std::io::socket::channel. I am not sure of this feature and it might go away. That said, because there is no sibling module import (e.g. std::io does not implicitly import std::math), the namespace pollution is still fairly low.

Dependency resolution

If imports do not resolve the actual dependency graph, then all code must at least be parsed and analysed. For the C3 compiler this is not a problem, since lexing, parsing and semantic analysis are a fraction of the total compilation time. However, it's desirable to generate code only for the parts that are in use.

Exports

We have one more problem: just because a function is public doesn't mean it should be exported in a library.

We can illustrate this with a simple example: let's say we want to build a simple web scraper which creates a list of all the image URLs on a web page. To do so we use a module which handles http + https and write a thin layer on top with a single function that takes a string and returns a list of strings with the URLs. In other words, we only have a single function that we want to export.

But if we create a static library with this functionality and naively export the public functions, we will get not just our single function, but the public functions of the http module as well... plus the public functions of anything the http module uses!

While the linker might strip unused code when creating an executable, even in this case we will still generate code that is not used.

Explicit exports

The first necessary feature is to be able to mark functions and globals as exported. Note that being exported is orthogonal to public/private: public and private are about source-level visibility, while export is about library and linker visibility.

Because exported functions are usually public, some languages conflate public and export, making export simply a variant of "public". (In C3, @export makes a function or global exported; it has no effect on visibility between modules.)

Entry points => dependency graph

With export we're now able to build a real dependency graph. For a regular executable the main function can be considered the entry point; otherwise we use the functions marked @export to trace dependencies.
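The tracing itself is ordinary graph reachability from the entry points. Here is a sketch in Java over an invented toy call graph for the scraper example above; this illustrates the idea, not the C3 compiler's actual code:

```java
import java.util.*;

public class Reachability {
    // Invented toy call graph: function name -> functions it calls.
    static Map<String, List<String>> calls = Map.of(
        "scrape_urls", List.of("http_get", "parse_html"),
        "http_get",    List.of("tcp_connect"),
        "parse_html",  List.of(),
        "tcp_connect", List.of(),
        "http_post",   List.of("tcp_connect") // public, but unreachable from the entry point
    );

    // Breadth-first walk from the exported entry points; only the
    // functions we reach need code generation.
    static Set<String> reachable(List<String> entryPoints) {
        Set<String> seen = new LinkedHashSet<>(entryPoints);
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            for (String callee : calls.getOrDefault(work.poll(), List.of())) {
                if (seen.add(callee)) work.add(callee);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // "scrape_urls" plays the role of the single @export function.
        System.out.println(reachable(List.of("scrape_urls")));
    }
}
```

Anything outside the returned set, like http_post above, never needs code generation even though it is public.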

Summary

  • We have looked at how static methods and members overlap with module-namespaced functions and globals. This means namespacing can be done with modules, with static methods and members, or with a combination thereof. C3 uses modules only.
  • Modules may be flat or hierarchical. C3 uses a hierarchical module namespace.
  • Various methods may be used to reduce repetitive module prefixing. Aliasing and namespace inlining are common. C3 uses path shortening.
  • The simplest visibility semantics have only public and private.
  • Accessing "private" functions is useful, and there are various solutions.
  • One method is adding a special visibility level to let a parent or child module access private functions.
  • Another method is defining other modules as "friends" that may access private functions as if they were public.
  • C3 allows a module to import private functions of other modules.
  • C3 has three visibility levels: @public, @private and @local. "Local" means it is local to the current module section.
  • Imports can be narrow or wide. C3 prefers wildcard imports. Narrow imports are mostly useful when imports directly infer the dependency graph.
  • Exports need to be different from "all of the public functions".
  • C3 uses @export to mark declarations to export.

If you want to try out C3, you can test it here: https://learn-c3.org.

Comments


Comment by Christoffer Lernö

Despite being a general concept, modules are often very different from language to language. One major reason for this is that overall language semantics puts many constraints on how modules may work. However, despite these constraints there is a lot of specific design work required.

I'm going to look at the modules in general and also talk a little about how C3 modules work.

An initial observation

When making a module system one first have to decide whether a module is a separate concept or not. Because if the language has the idea of static variables and functions attached to a type there is actually already a sort of module system present.

Here is a short snippet written in the C2 language to illustrate this:

// File bar.c2
module bar;
// Plain function
func int get_one() {
    return 1;
}  

// File foo.c2
module foo;
import bar;
type Bar struct {
  int x;
}  
// Static function
func int Bar.get_one() {
    return 1;
}

func void test() {
    int a = Bar.get_one();
    int b = bar.get_one();
}

The type here acts as namespace in itself. If we extend the type with static variable we can similarly emulate namespaced global variables.

Most languages with methods on their types gladly accept this ambiguity, but one can draw the conclusion that modules are not needed and only structs are necessary. This is the approach taken by Zig. The downside is that it also leads to counter-intuitive things such as "a file is a struct" and having to explicitly arrange sub-modules in a hierarchy.

The other way to resolve the ambiguity is to have type methods, but abolish static methods and globals. This is the approach of C3. The downside is that some methods that are naturally static, such as Foo.new_instance() or constants Foo.MAX_VALUE can't be expressed.

We can also note that Java, while having "packages" use classes as the primary namespacing mechanism for free functions and constants, which is a bit more relaxed than Zig's approach, since the hierarchy is external.

Sub-modules and paths

Flat vs hierarchal

The module namespace can be flat with a single module name or hierarchal, where modules have sub-modules. While flat modules are nice to work with and easy to implement, there is much more contention for unique names. This can mean that module names may need to have longer names to require uniqueness, e.g. mylib_io for the flat module and mylib::io for the hierarchal. But hierarchal modules in general have an even worse problem with length: e.g. std.debug.print("Hello, world!\n", .{}); (with apologies to Zig).

Aliasing and import

The obvious solutions to long names are aliasing and namespace imports. Here is again a C2 example:

import networking as net; // Aliasing
import filesystem local; // Namespace import


// Equivalent:
doSomething(); // Namespace import
filesystem.doSomething();

// Equivalent:
net.connect(); // Aliased
networking.connect();

The downside of aliasing is that aliases may differ between authors and implementations. So while someone might alias networking to net, someone else uses nw. This together with the difficulty of naming aliases makes it a less attractive solution. Full namespace import avoids naming issues, but makes it much less clear what are local functions and what is implemented elsewhere.

C3 path shortening

C3 has a hierarchal module system but employs path shortening. This is basically that the first part of a module path may be elided: std::net::sockets::new_from_url(url) can be used as sockets::new_from_url(url) as long as it is not ambiguous.

Requiring at least the sub-module name in the path is a design decision to avoid the readability problems mentioned with namespace imports. In the example "new_from_url(url)" on its own lacks the context that the "sockets::" prefix gives.

Surveying other languages it's clear that usually contain sufficient context in their names. For this reason they are exempt from the prefix requirement in C3.

Note how something similar happens in Java in practice: java.math.BigInteger is the import, you then use BigInteger, but call static "functions" namespaced: BigInteger prime = BigInteger.probablePrime(128, rnd);

In the Java case this comes from import java.math.BigInteger being an actual namespace import, but then the classes themselves provide a second layer or namespacing.

Visibility

The other major component to modules is visibility between modules. Note that nothing is saying that explicit imports are necessary: with full paths the correct types, functions and variables may be found anyway.

With "import" statements the most common scheme is this:

  • Modules not imported: no visibility.
  • Module imported: public declarations are visible.

Hierarchal visibility

As a complement to the above in hierarchal module systems, a module may see non-public declarations in sub modules and/or parent modules.

The desire to have this feature arise from wanting to separate the visible "api layer" module and the internal "implementation layer" modules that which contains implementation details that may change over time.

The downside of this method for modules to peek into other modules is the need to build this into the hierarchy.

"Friend" visibility

As an alternative to the above hierarchal visibility above is to declare "friend" modules that may access the module. This has fewer constraints than trying to fit modules neatly into the right sort of hierarchy just to get the correct visibility between modules.

There is still the drawback that in order to "friend" another module, the module needs to know of that other module.

Becoming a "friend"

Often the concept of visibility is conflated with some idea of "internal safety": "I make this private to make it safe from other modules". This is trying to interpolate the metaphor too far. Visibility and access modifiers are there to help the user of the types to use / override functionality in the correct way. "Public" communicates that this function is made for general consumption, "private" means internal consumption and it not being part of the surface API of the functionality.

However, if one knows what one is doing then circumventing these protections can be useful. For example:

  • There may be a bug that can be circumvented by calling private methods.
  • One may want to exploit the particular functionality of a specific version of a library.
  • One may want to modify behaviour for some other reason that the author did not foresee.

Often languages have convoluted ways of circumventing visibility in these cases, e.g. calling functions using reflection in Java, just because the need does arise.
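As a sketch of the Java case mentioned above: reflection can invoke a private method from outside its class (the Library class and internalDetail method here are hypothetical names):

```java
import java.lang.reflect.Method;

class Library {
    // Internal detail, not part of the surface API.
    private static String internalDetail() { return "secret"; }
}

public class FriendViaReflection {
    public static void main(String[] args) throws Exception {
        // Look up the private method and explicitly disable access checks.
        Method m = Library.class.getDeclaredMethod("internalDetail");
        m.setAccessible(true);
        System.out.println(m.invoke(null)); // prints "secret"
    }
}
```

Convoluted indeed, compared to simply declaring the intent in an import.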

The obvious way is then for a module to be able to declare itself the friend of a module. A C3 example:

module test;
fn void fn_private() @private {}

module foo;
import test @private; // Override visibility

fn void main()
{
    // This is not an error due to the "@private" import.
    test::fn_private();
}

We can note that C3 has public by default. It is possible to set a different default:

module test2 @private;

fn void fn_private() {}
fn void fn_public() @public {} // Explicitly needs @public!

Visibility levels

To talk about visibility at all we need at least two levels to differentiate between. Usually these are public and private, where public means visible outside of the module and private being visible only inside of the module.

In fact, we could stop here, because in most cases this will be all we need. For this reason it's possible to not encode visibility in a keyword but in the name itself: Go's "uppercase means public" and Dart's "leading underscore means private" (note: I considered the latter for C3).

Between "private" and "public"

If we want hierarchical visibility, then we need another level above private but below public, indicating that something is available to other modules (below or above) in the hierarchy.

Similarly, for the "friend" module visibility we need a visibility level for this behaviour. As an example Rust has pub(in path) and pub(crate) (although note that both of those are somewhat constrained).

Below "private"

If modules may span multiple source files, there is the possibility of another visibility level, where visibility is restricted to the file with the declaration. This is C's static, Swift's fileprivate and C3's @local (Note: while C3 could have used static for globals and functions, it's a poor name for type visibility. This is why @local was chosen instead).

This is not exhaustive: depending on language features more visibility levels might be possible. For C3 with import @private, having "public", "private" and "local" seems to cover most use cases.

Imports

While imports are usually a good way to determine dependencies, this is not guaranteed. As an example: while most Java programmers may think of Java's import as importing classes, all it actually does is fold namespaces.

The point here is that while import may roughly correspond to the dependency graph, it's not guaranteed to do so exactly. This means that imports are usually simply a way to limit the pollution of the current namespace.

This is very valuable though; in fact it is a variant of the public / private division: importing is picking a set of modules that can be accessed (= is public to the current module).

Narrow imports

In the Java world, wildcard imports (e.g. import java.util.*) are by tradition considered bad. Instead, Java source files often contain a litany of single-class imports. This is such a problem that most IDEs offer to both hide the list of imports and manage it for you.

In the Java case the tangible benefit claimed is that if you do something like this:

import java.util.*;
import java.sql.*;

You have a problem if you try to use Date, since it's now ambiguous.

Having written a lot of Java code that works with databases, I can confidently say that the problem here is not the imports, but the reuse of the name Date in both Java packages. If the java.sql class had had a reasonable name like SqlDate, this import would not have been a problem AND there would be no confusion when trying to use a java.util.Date and a java.sql.Date in the same code, which happens quite often.

So the fact that the above is touted as a reason just shows how weak the arguments are for narrow imports in Java.
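For completeness, here is how the clash plays out: with both wildcard imports the simple name Date is ambiguous, but fully qualified names always resolve it:

```java
import java.util.*;
import java.sql.*;

public class DateClash {
    public static void main(String[] args) {
        // Writing just "Date" here would be a compile error (ambiguous),
        // but fully qualified names pick out each class unambiguously.
        java.util.Date utilDate = new java.util.Date(0);
        java.sql.Date sqlDate = new java.sql.Date(0);
        System.out.println(utilDate.getTime() == sqlDate.getTime()); // prints true
    }
}
```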

HOWEVER, if a language uses import to actually pull in dependencies, then narrow imports are likely better. It's important to note that this isn't necessarily the case: it's not true in Java, nor is it true in C3.

No imports?

One might think that dumping all modules into the current namespace would be unworkable, but if we already use the full path to types and functions, there are no ambiguities. Even C3's abbreviated paths work fine in general.

The downside is that now things like code completion are going to match EVERYTHING in all modules, which makes for a much worse experience. This also affects things like error messages. Imports help the compiler (and an IDE) make better guesses and in general just be more friendly.

A middle ground

In C3 imports are implicitly wildcard, so import std::io will also import sub modules to std::io. It's also possible to have more than one import in a single row, e.g. import std::io, std::math;. To me this seems like a reasonable compromise.

More controversially, C3 modules will implicitly import parent and child modules. So std::io::socket could implicitly import std::io, std and the child module std::io::socket::channel. I am not sure of this feature and it might go away. That said, because there is no sibling module import (e.g. std::io does not implicitly import std::math), the namespace pollution is still fairly low.

Dependency resolution

If imports do not resolve the actual dependency graph, then all code must be at least parsed and analysed. For the C3 compiler this is not a problem, since lexing, parsing and semantic analysis are a fraction of the total compilation time. However, it's desirable to emit code only for the parts that are actually in use.

Exports

We have one more problem: just because a function is public doesn't mean it should be exported in a library.

We can illustrate this with a simple example: let's say we want to build a simple web scraper which creates a list of all the image URLs on a web page. To do so we use a module which handles http + https and write a thin layer on top with a single function that takes a string and returns a list of strings with the URLs. In other words, we only have a single function that we want to export.

But if we create a static library with this functionality and naively export the public functions, we will get not just our single function, but the public functions of the http module as well... plus the public functions of anything the http module uses!

While the linker might strip unused code when creating an executable, even in this case we will still generate code that is not used.

Explicit exports

The first necessary feature is to be able to mark functions and globals as being exported. Note that being exported is orthogonal to public / private: public and private are about source-level visibility, while export is about library and linker visibility.

Because exported functions are usually public, some languages conflate public and export, making export simply a variant of "public". (In C3, @export makes a function or global exported; it has no effect on visibility between modules.)

Entry points => dependency graph

With export we're now able to build a real dependency graph. For a regular executable the main function can be considered the entry point; otherwise we use the functions marked @export to trace dependencies.
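The idea can be sketched as a simple reachability walk over the call graph. This is a hedged illustration, not the C3 compiler's implementation; the call graph and the function names (scrape, httpGet, httpPost) are invented to mirror the web scraper example above:

```java
import java.util.*;

public class Reachability {
    // Hypothetical call graph: function name -> functions it calls.
    static Map<String, List<String>> calls = Map.of(
        "main", List.of("scrape"),
        "scrape", List.of("httpGet"),
        "httpGet", List.of(),
        "httpPost", List.of()  // public in the http module, but never used
    );

    // Walk the call graph from the entry points (main and/or @export
    // functions) and collect everything reachable.
    static Set<String> reachable(Collection<String> entryPoints) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String fn = work.pop();
            if (seen.add(fn)) work.addAll(calls.getOrDefault(fn, List.of()));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Only reachable functions need code generation:
        // httpPost is public, yet excluded.
        System.out.println(reachable(List.of("main")).contains("httpPost")); // prints false
    }
}
```

Everything outside the reachable set can be skipped during code generation, which is exactly what exporting "all public functions" fails to achieve.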

Summary

  • We have looked at how static methods and members overlap with module-namespaced functions and globals. This means namespacing can be done with modules, static methods and members, or a combination thereof. C3 uses modules only.
  • Modules may be flat or hierarchical. C3 uses a hierarchical module namespace.
  • Various methods may be used to reduce repetitive module prefixing. Aliasing and namespace inlining are common. C3 uses path shortening.
  • The simplest visibility semantics only has public and private.
  • Accessing "private" functions is useful, and there are various solutions.
  • One method is adding a special visibility level to let a parent or child module access private functions.
  • Another method is defining other modules as "friends" that may access private functions as if they were public.
  • C3 allows a module to import private functions of other modules.
  • C3 has three visibility levels: @public, @private and @local. "Local" means local to the current module section.
  • Imports can be narrow or wide. C3 prefers wildcard imports. Narrow imports are mostly useful when imports directly determine the dependency graph.
  • Exports need to be different from "all of the public functions".
  • C3 uses @export to mark declarations to export.

If you want to try out C3, you can test it here: https://learn-c3.org.