Skew

Skew is a web-first, cross-platform programming language with an optimizing compiler.

What is it?

Skew is a programming language for building cross-platform software.
It compiles to straightforward, readable source code in other languages and is designed to be easy to integrate into a mixed-language code base.
The main focus of the project has been to develop solid, production-quality code for the web.
Skew looks like this:

The language and compiler are in development and are still somewhat subject to change.
The compiler is bootstrapped, which means it's written in Skew and compiles itself.
It currently contains production-quality JavaScript generation and working C# generation.
C++ generation is next and is already in progress.

Why use it?

The intent is to use this language for the platform-independent stuff in an application and to use the language native to the target platform for the platform-specific stuff.
When done properly, the vast majority of the code is completely platform-independent and new platforms can be targeted easily with a small platform-specific shim.

Fast compile times:
Code compiles at the speed of a browser refresh.
Web development still feels like web development despite using an optimizing compiler with static typing.
This is in contrast to many other comparable compile-to-JavaScript languages.

Natural debugging experience:
Debugging is done in a single language using the platform-native debugger.
No need to try to debug a multi-language app with a debugger that only understands one language.

Easy integration:
Generated code is very readable and closely corresponds with the original.
Language features allow for the easy import and export of code to and from the target language.

Fast iteration time:
In addition to a fast compiler and a good debugging experience, garbage collection is used instead of manual memory management.
This eliminates a whole class of time-consuming bugs that get in the way of the important stuff.

Native code emission:
For native targets, application logic is compiled directly to native code and is not interpreted in a virtual machine.
Native apps don't have to pay for JIT warmup time and native app performance is not at the whim of heuristics.
The generated code can be compiled using industry-standard compilers that leverage decades of optimization work.

Cons:

Lack of IDE support:
IDE support is planned but is a significant undertaking and will not materialize for a while.
Developers who normally lean heavily on IDEs will be less efficient than usual.

Immaturity:
This is a new programming language and hasn't stood the test of time.
There will likely be many rough edges both in the language design and in the tools.
Many planned features are not yet implemented.

Lack of community:
New programming languages don't have the wealth of searchable Q&A data that established programming languages have.
Solutions to random issues are likely not available online.

No cross-platform multithreading:
Multithreading is not a language feature and needs to be done in the target language.
This limits multithreading opportunities to cleanly separable tasks like image decoding.

Lack of low-level features:
Features such as memory layout, move semantics, destructors, and vector instructions are intentionally omitted.
These features don't map well to all language targets and their emulation is expensive.
Use of these features is limited to imported library routines implemented in the target language.

Getting Started

Installation

First, install node if you don't have it already.
The release version of the compiler is cross-compiled to JavaScript and node is a JavaScript runtime.
Installing node also installs a package manager called npm that can be used to install the compiler:

npm install -g skew

The compiler command is called "skewc".
Run "skewc --help" to see a list of all available command-line flags.
Example usage:

skewc src/*.sk --output-file=compiled.js --release

Examples

Here are some simple examples to start from.
Each example demonstrates how to do input and output using imported code from the target language:

JavaScript (browser):
Create a file called "sparks.sk" that looks like this:

Invoke the compiler to generate JavaScript code.
Any compilation errors will show up here:

skewc sparks.sk --output-file=sparks.js

Create another file called "index.html" to serve the compiled code:

<body></body>
<script src="sparks.js"></script>

That's it!
Open "index.html" in a browser to see the app.
To build a production-ready version of your app, just add the "--release" flag to the end of the compiler invocation:

skewc sparks.sk --output-file=sparks.js --release

JavaScript (node):
Create a file called "calculator.sk" that looks like this:

Invoke the compiler to generate JavaScript code.
Any compilation errors will show up here:

skewc calculator.sk --output-file=calculator.js

That's it!
Run the generated code to see the app:

node calculator.js

C#:
Create a file called "calculator.sk" that looks like this:

Invoke the compiler to generate C# code.
Any compilation errors will show up here:

All of these examples use the "dynamic" type for simplicity, but you'll likely want type imports for real work.
Skew will eventually have a package manager and type imports will live there.
I've put some HTML5 type imports on GitHub for convenience in the meantime.

An important thing to keep in mind when developing with Skew is that the compiler does dead code elimination.
This means the compiler won't generate any output if nothing is marked for export.
You can either indicate a "main" function with the "@entry" annotation as is done in the examples above, or you can use the "@export" annotation to ensure certain functions are exported.

To get a better feel for the language, it's probably helpful to take a look at the different examples of Skew code in the live editor in addition to looking at the documentation below.

IDE support

Install the skew package for syntax highlighting, inline errors, type tooltips on hover, and other nice IDE features.
Visual Studio Code is a cross-platform Chromium-based text editor that works on Windows, OS X, and Linux and has nothing to do with the original Visual Studio product.
Installing extensions in Visual Studio Code is a little strange: bring up the in-app command line, then type "ext install skew-vscode" and press enter when it loads.

Follow these instructions to install syntax highlighting.
Right-clicking and using the "Goto Definition" command sort of works (it looks for all declarations with that name).

Language Reference


Skew has a handful of built-in types with special literals.
The code below is not valid (Skew only allows declarations at the top level) but it demonstrates the various kinds of literals and their types:

Variables are declared with the "var" keyword.
Unlike C-like languages, the type comes after the variable name.
This makes blocks of variables much more readable because the variable name is more important than the variable type.
If the type is omitted, the type is inferred from the assigned value.

var explicitlyTyped int = 0
var implicitlyTyped = 0

Read-only variables are declared with the "const" keyword instead of the "var" keyword.
This is similar to the "final" keyword in Java.
It is not similar to the "const" keyword from C++; the object referenced by the variable can still be mutated.
The compiler just ensures the variable cannot be reassigned.

Constant values can be overridden at compile time using the "--define" flag.
For example, the flag "--define:readOnly=2" would cause a constant declared as "const readOnly = 0" to be initialized to 2 instead of 0.
This is useful for supporting multiple build configurations through conditional compilation.

Functions are declared with the "def" keyword.
Parentheses are not used for functions that don't take any arguments, both when declaring the function and when calling it.
This cuts out a lot of the unnecessary clutter present in other languages while still remaining unambiguous (functions must always be called and cannot be used by value).
A function that doesn't return anything simply omits the return type instead of using a special "void" type like C does.

Multiple declarations can exist for the same function, although there can only be one implementation.
This is sometimes useful when organizing code for clarity and is especially useful when combined with conditional compilation.
It's sort of comparable to forward declarations in C/C++ although they are order-independent and don't need to come first.

Unlike in C, all conditional statements omit the parentheses surrounding the condition and require braces:

@export
def test(a bool, b bool) {
  if a {}
  else if b {}
  else {}
}

If statements cannot be used as expressions.
Use C-style conditional expressions instead:

@export
def test(a bool) int {
  return a ? 1 : 2
}

Skew has several different types of loops.
Each loop below loops from 0 to 4 inclusive:

One unusual thing about Skew is how it handles return statements where the value is on the next line.
This case is trivial in semicolon-terminated languages like C because the value comes before the semicolon.
In languages like JavaScript, automatic semicolon insertion gets in the way and causes the code to behave incorrectly (return undefined instead of the value).
Skew doesn't suffer from this issue despite not having semicolons or automatic semicolon insertion.
In Skew, an expression following a return statement is considered to be the returned value.
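For comparison, here is the JavaScript pitfall described above, written as plain hand-written JavaScript (it is not Skew compiler output, and the function names are only for illustration). Automatic semicolon insertion ends the statement right after "return", so the value on the next line is never returned:

```javascript
// Automatic semicolon insertion puts a semicolon right after "return",
// so this function returns undefined and "42" is unreachable.
function valueOnNextLine() {
  return
    42;
}

// Keeping the value on the same line as "return" behaves as intended.
function valueOnSameLine() {
  return 42;
}

console.log(valueOnNextLine()); // undefined
console.log(valueOnSameLine()); // 42
```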

A lambda expression is an anonymous function that can be stored in a variable.
They use a different syntax than regular functions.
Regular functions declared using "def" are not lambda expressions and cannot be cast to a lambda type since they cannot be stored in a variable in some language targets (Java, for example).
The "=>" symbol is used to separate the argument list and the function body.

Lambda types are specified using the "fn" keyword.
The syntax is similar to the syntax for defining a function except argument names are not included (only argument types) and parentheses are required.

Since packaging up and manipulating little bits of code is so convenient, there are several shortcuts for creating more concise lambda expressions.
The argument and/or return types can be omitted when they can be inferred from context.

A class provides an object template that can be used to create instances of that class.
A constructor is a function inside a class called "new" with no return type.
The constructor's job is to set up the new class instance with initial values for all instance variables.
Use "self" to refer to the class instance.

When not explicitly declared, constructors are automatically generated with one argument for each instance variable without a default value in declaration order.
This greatly simplifies defining objects in many situations.
For example, the above code can be simplified to the code below since the constructor can be generated automatically.
Unlike Java, constructors are just members of the type they construct and don't require a special operator to invoke:

Skew's object model is similar to Java.
A class can inherit from at most one base class and can implement any number of interfaces.
The symbols ":" and "::" are used instead of the "extends" and "implements" keywords.

Overriding a function can be error-prone in the presence of function overloading.
To fix this, overriding a function must be done using the "over" keyword instead of the "def" keyword.
The compiler checks that all functions declared using "over" are actually overriding something.
That way changing the type signature of the overridden function is a compile error unless the type signature of the overriding function is also updated.
The overridden function can be called from within the overriding function using the "super" keyword.

Unlike Java, all type declarations are "open", which means members from duplicate type declarations for the same type are all merged together at compile time into a single type.
This allows for large type declarations to be better organized.
In the example below, the "ChunkedBuffer" type can be made to implement the "Encoder" interface using a separate declaration.

Generics are implemented using type erasure to ensure a compact and readable implementation in the generated code.
The current implementation is pretty simple.
There isn't any type inference yet, and also no advanced features like covariant or contravariant conversions.

An enum is a type whose values are compile-time integer constants.
Enums automatically convert to ints but ints don't automatically convert to enums.
When referencing an enum value, the enum type can be omitted when it can be inferred from context.
This leaves just a leading "." character.

For convenience, each enum type automatically generates a "toString" function if one isn't present.
Enums can have instance functions just like any other object type.
Instance functions are automatically rewritten as global functions during compilation that take the enum as an extra first argument.
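To make this lowering concrete, the sketch below hand-writes in JavaScript roughly what it amounts to. The "Color" enum, its members, and the "Color_" helper names are hypothetical, and this is not the exact code the Skew compiler emits: enum values are plain integers, "toString" is a table lookup, and an instance function becomes a global function that takes the enum value as its first argument.

```javascript
// Hypothetical "enum Color { RED, GREEN, BLUE }" lowered to integer constants.
const Color = { RED: 0, GREEN: 1, BLUE: 2 };

// Stand-in for the automatically generated "toString" function.
const Color_names = ["RED", "GREEN", "BLUE"];
function Color_toString(self) {
  return Color_names[self];
}

// A hypothetical instance function "isWarm" rewritten as a global function
// that takes the enum value as an extra first argument.
function Color_isWarm(self) {
  return self === Color.RED;
}

const c = Color.GREEN;
console.log(c + 1);             // 2 (enums convert to ints)
console.log(Color_toString(c)); // "GREEN"
console.log(Color_isWarm(c));   // false
```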

There is also built-in support for integer flags.
Use a "flags" declaration instead of an "enum" declaration.
Unlike normal enums, these special enums get "~", "&", "|", "^", and "in" operators.
The literal "0" also stands for the empty set.
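A flags value is essentially a bit set stored in an integer. The hand-written JavaScript sketch below uses hypothetical flag names and assumes each member gets its own bit; it is not actual compiler output. It shows what each operator computes: "|" is union, "&" is intersection, "^" is symmetric difference, "~" is complement, and the "in" operator is a membership test done with a bitwise AND.

```javascript
// Hypothetical "flags Permission { READ, WRITE, EXECUTE }", assuming each
// member is assigned its own bit so members can be combined into a set.
const READ = 1 << 0;
const WRITE = 1 << 1;
const EXECUTE = 1 << 2;
const NONE = 0; // the literal "0" stands for the empty set

// Membership test corresponding to the "in" operator.
function has(set, flag) {
  return (set & flag) !== 0;
}

let perms = READ | WRITE;        // union
perms = perms & ~WRITE;          // remove WRITE with complement + intersection
const toggled = perms ^ EXECUTE; // symmetric difference toggles EXECUTE

console.log(has(perms, READ));      // true
console.log(has(perms, WRITE));     // false
console.log(has(toggled, EXECUTE)); // true
console.log(has(NONE, READ));       // false
```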

Casting is done using the "as" operator.
Casts are useful for converting between primitive types and also for downcasting from a base class to a derived class.
Downcasts are unchecked and have no performance impact in dynamic language targets, where they disappear entirely.

String interpolation is the term for a special escape sequence that embeds an arbitrary expression inside a string.
This is a useful shortcut because it will automatically call the "toString" function on the expression.

Protected access works for any named scope including namespaces and interfaces.

All variables and functions inside a class declaration are attached to instances of that class.
To scope global variables and functions inside the class name, put them inside a namespace with the same name as the class.
This does what the "static" keyword does in many other languages.

This works for all types with instances including enums and interfaces.

Skew has quite a few operators.
Many of them are described in detail after this.
The ones that may be unfamiliar and that aren't described in detail are the "**" exponent operator, the "%%" modulus operator, and the ">>>" unsigned right shift operator.
Here's the complete list for quick reference:

An operator is overloaded by declaring an instance function on the target type with that operator name.
For binary operators, the compiler checks for operator overloads on the left operand.
Operator overloading can make code more readable in many cases, but can also dramatically reduce readability when used incorrectly.
Use with good judgement.

Another interesting operator is the "<=>" comparison operator.
It returns an integer that is less than zero if the left operand is less than the right, greater than zero if the left operand is greater than the right, and zero if the left operand is equal to the right.
It is used by the compiler to automatically implement the "<", ">", "<=", and ">=" operators, which cannot be implemented manually.
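The idea can be sketched in hand-written JavaScript with a hypothetical "Version" type (this is not compiler output): a single three-way comparison is enough to derive all four relational operators, because each one only looks at the sign of the result.

```javascript
// Hypothetical type whose "compare" method plays the role of "<=>".
class Version {
  constructor(major, minor) {
    this.major = major;
    this.minor = minor;
  }

  // Negative if this < other, positive if this > other, zero if equal.
  compare(other) {
    if (this.major !== other.major) return this.major - other.major;
    return this.minor - other.minor;
  }
}

// The four relational operators derived from the sign of "compare".
const lt = (a, b) => a.compare(b) < 0;
const gt = (a, b) => a.compare(b) > 0;
const le = (a, b) => a.compare(b) <= 0;
const ge = (a, b) => a.compare(b) >= 0;

const v1 = new Version(1, 4);
const v2 = new Version(2, 0);
console.log(lt(v1, v2), gt(v1, v2), le(v1, v1), ge(v2, v1)); // true false true true
```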

The "?.", "??", and "?=" operators make handling nullable values easier and safer.
They are inspired by the same operators from C#.
The expression "a?.b" is short for "a != null ? a.b : null", the expression "a ?? b" is short for "a != null ? a : b", and the expression "a ?= b" is short for "a != null ? a : a = b".
The compiler ensures sub-expressions are only evaluated once.
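In plain JavaScript, the single-evaluation guarantee can be sketched with a temporary variable. This is a hand-written illustration with hypothetical names ("getNode", "nullDot", "nullAssign"); the compiler's actual output may look different.

```javascript
let evaluations = 0;

// Stand-in for an expression with side effects, such as a function call.
function getNode() {
  evaluations++;
  return { value: 42 };
}

// "getNode()?.value": the receiver is stored in a temporary so the call
// runs only once, and null is produced instead of reading from null.
function nullDot() {
  const temp = getNode();
  return temp != null ? temp.value : null;
}

// "cache.entry ?= b": assign only when the current value is null, and
// produce the resulting value either way.
function nullAssign(cache, b) {
  return cache.entry != null ? cache.entry : (cache.entry = b);
}

console.log(nullDot(), evaluations); // 42 1

// "a ?? b": keep "a" unless it is null. Written inline so the right side
// is only evaluated when it is needed.
const a = null;
const joined = a != null ? a : "fallback";
console.log(joined); // "fallback"

const cache = { entry: null };
console.log(nullAssign(cache, 7), nullAssign(cache, 9)); // 7 7
```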

The "==" and "!=" operators cannot be overloaded because they test for reference equality when using generics and this would lead to subtle bugs since generics are implemented with type erasure.
The "&&" and "||" operators cannot be overloaded because of their special short-circuit evaluation behavior.
The "=" operator cannot be overloaded since that would be too confusing in a language that lacks value types.
If "=" could be overloaded, objects would be copied on assignment but would still be passed by reference as arguments to function calls.

The syntax for list and map literals isn't just for the native list and map types; it's also available to user-defined types.
The simplest way of doing this is with special constructors called "[new]" and "{new}" that take lists as arguments.
The compiler puts all expressions inside the list or map literal in a list and passes it to the constructor.

The more flexible way of doing this is with the special "[...]" and "{...}" instance functions.
The compiler constructs a new object of that type and builds a chain of calls to those special instance functions with one call for each element in the list or map literal.
This allows the type to support different types of values in the same literal.

XML-style object initialization syntax is supported as a convenient way to initialize trees of objects.
Each tag constructs an instance of the type corresponding to the tag name.
All attributes in the XML tag become assignments to variables on that instance.
Child elements are appended using special instance functions called "<>...</>".

Wrapped types can also be used to add encapsulation without the overhead of additional allocation.
Instance functions added to wrapped types are automatically rewritten as global functions during compilation that take the instance as an extra first argument.

Another use of wrapped types is to provide a nice object-oriented API on top of an index into an array of data.
This is sort of analogous to pointers in C, although, unlike in emscripten, they don't all need to live in the same address space, and they aren't subject to fragmentation issues.
Pointer-style wrapped types can generate very tight code in release mode.
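As a rough picture of what a pointer-style wrapped type reduces to, here is a hand-written JavaScript sketch. The "Particle" type, its fields, and the "Particle_" function names are hypothetical, and the real compiler output may differ: a particle is just an int index into parallel typed arrays, and each instance function becomes a global function that takes that index as its first argument.

```javascript
// Storage for all particles: one typed array per field.
// (No capacity or bounds checking in this sketch.)
const CAPACITY = 1024;
const particleX = new Float64Array(CAPACITY);
const particleY = new Float64Array(CAPACITY);
let particleCount = 0;

// Allocate a particle and return its index, which plays the role of a pointer.
function Particle_new(x, y) {
  const self = particleCount++;
  particleX[self] = x;
  particleY[self] = y;
  return self;
}

// Instance functions rewritten as globals taking the "pointer" first.
function Particle_move(self, dx, dy) {
  particleX[self] += dx;
  particleY[self] += dy;
}

function Particle_x(self) {
  return particleX[self];
}

const p = Particle_new(1.5, 2);
Particle_move(p, 0.5, -1);
console.log(p, Particle_x(p), particleY[p]); // 0 2 1
```

No objects are allocated per particle, which is why this style can produce tight code once the small functions are inlined.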

Wrapped types can also be used to create safe immutable wrappers for other APIs.
If they use function inlining correctly, these wrappers won't have any runtime performance overhead.
For example, the following code implements a simple read-only list type that is just as efficient as a built-in mutable list in release mode:

Top-level if statements allow for conditional code compilation.
Like all declarations, top-level if statements are also order-independent and are evaluated from the outside in.
Conditions must be compile-time constants but are fully type-checked, unlike the C preprocessor.
Including preprocessing as part of the syntax tree ensures that there aren't syntax errors hiding in unused code branches.
Constant values can be overridden at compile time using a command-line argument such as "--define:TRACE=true".

Another form of conditional compilation is the "@skip" annotation.
Annotating something with "@skip" means that all call sites and their argument evaluations will be removed at compile time and will not be present in the output.
This is cleaner and less error-prone than using top-level if statements for conditional compilation of functions because all uses are type-checked even when unused.

There is a special type called "dynamic" that tells the compiler to ignore type errors and pass all dynamically-typed code through verbatim to the target language.
It is mainly useful when interacting with external code, especially when targeting a dynamic language.

The "dynamic" type poisons everything it touches.
For example, "x + 1" has a dynamic type if "x" is dynamic because the compiler can't tell what type the expression will be ("x" could hold an int, a double, a string, or even something nonsensical like an object).
This makes it easy to bypass the compiler's type system by casting the value to the "dynamic" type first.
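Plain JavaScript shows why no static type can be assigned to "x + 1" when "x" is dynamic: the same expression produces an integer, a fractional number, or a string depending on the runtime value.

```javascript
// The same "x + 1" expression evaluated with different runtime values.
const results = [1, 1.5, "1", {}].map((x) => x + 1);
console.log(results); // 2, 2.5, "11", "[object Object]1"
```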

Pros:

Makes it easy to import large libraries without tons of type declarations.

Provides a way to bypass compiler checks when there's a type system mismatch with the target language.
For example, C# supports "in" and "out" modifiers on generic type parameters and Skew does not.

Used by the compiler to silence further errors when an error occurs during compilation (expressions with compile errors always have type "dynamic").

Cons:

Mistakes cause run-time errors instead of compile-time errors.

Most compiler features don't work on dynamically typed code (implicit function calls, operator overloading, function overloading, type wrapping, inlining, constant folding, etc.).
Statically-typed values can still be passed through a dynamically-typed environment but must be cast back to their static type before being used or they likely won't work correctly.

Values of dynamic type can also be used to emit constructor calls.
This is done using the "new" property just like for statically-typed objects.
Notice how function calls in the example below must be explicitly invoked via "()" unlike normal functions.
This is because the compiler has no type information so it can't distinguish functions from variables.

The keyword "dynamic" can also be used as an expression with a property name following it.
This causes the following name to be emitted as an identifier with a dynamic type.
It's mainly useful as an inline shortcut to avoid an imported declaration.

A double is a 64-bit non-nullable floating-point number.
A number literal must have a "." or an exponential "e" in it to be a double, otherwise it's an int.
The non-finite double constants are "Math.NAN" and "Math.INFINITY".

A string is an immutable sequence of Unicode code units.
Multi-line strings are allowed without needing any special syntax.
String literals must be double-quoted (single quotes are for C-style character literals, which turn into an int with the Unicode code point for that character).
Valid escape sequences are "\0", "\r", "\n", "\t", "\\", "\"", "\'", and hex-style "\xFF".

Strings are stored using the natural string encoding for the target language.
For example, JavaScript uses UTF-16 and C++ uses UTF-8.
All indices are in terms of code units since code unit access can be done in constant time.
To work in platform-independent code points instead, use the code point APIs and the "Unicode.StringIterator" class.
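In JavaScript, where strings are UTF-16, the difference between code units and code points is easy to see with a character outside the Basic Multilingual Plane. This is plain JavaScript using built-in string methods, not Skew's "Unicode.StringIterator" API:

```javascript
const s = "a\u{1F600}"; // "a" followed by an emoji that needs a surrogate pair

// Length and indexing work in UTF-16 code units (constant-time access).
console.log(s.length);                     // 3
console.log(s.charCodeAt(1).toString(16)); // "d83d" (high surrogate)

// Iterating a string walks code points instead.
const codePoints = Array.from(s, (c) => c.codePointAt(0).toString(16));
console.log(codePoints);                   // "61", "1f600"
```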

enum Target {
  NONE
  CPLUSPLUS
  CSHARP
  JAVASCRIPT
}
# This is set to the current language target
const TARGET Target
# This is true when the compiler is passed the "--release" flag
const RELEASE bool

# This causes a runtime failure when passed "false". Calls to assert
# will be completely removed in release mode, so arguments won't
# be evaluated. This is accomplished with the "@skip" annotation.
def assert(truth bool)

These annotations trigger special behaviors in the compiler for the symbols they annotate.

# Using "@alwaysinline" warns if inlining wasn't possible
def @alwaysinline
def @neverinline
# These cause warnings when the annotated symbol is used
def @deprecated
def @deprecated(message string)
# There can only be one active entry point during compilation. It must take
# no arguments or take a List<string> and return either nothing or an int.
def @entry
# Imported code is assumed to exist and will not appear in the compiled result
def @import
# Exported code isn't dead stripped, minified, or otherwise altered
def @export
# This influences overload resolution for ambiguous matches
def @prefer
# Change the name of the annotated symbol during code generation
def @rename(name string)
# This causes calls to the function to be completely removed after type
# checking. This means the arguments will not be evaluated at the call site.
def @skip
# This is meant to be used for other annotations. When those annotations are
# on a function, "@spreads" causes those annotations to be propagated to
# other functions when that function is inlined inside them.
def @spreads
# This causes a C# "using" statement to be emitted when the symbol is used
def @using(name string)
# This causes a C++ "#include" directive to be emitted when the symbol is used
def @include(name string)

Compiler Optimizations

The compiler currently includes quite a few different optimizations aimed at reducing code size when compiling to JavaScript to speed up network transfer time.
Most of these passes can be enabled individually with compiler flags but they are all enabled with the "--release" flag.


The compiler uses a complex lattice of rules to compact the generated JavaScript code in release mode similar to the advanced compilation mode in Google Closure Compiler.
However, the compiler can often do better than Google's compiler due to certain language features (explicit integer types, for example).
This reduces application startup time by transferring less data over the network.
The Skew compiler is also a whole lot faster than Google's compiler so advanced optimizations can be enabled by default.

When not enabled as part of the "--release" flag, this pass can be selectively enabled with the "--js-mangle" flag.

The compiler includes a global symbol motion and renaming pass in release mode to reduce symbol name length similar to the advanced compilation mode in Google Closure Compiler.
Namespace nesting is removed and symbols are renamed to the shortest possible names using frequency analysis.
Unrelated properties and local variables can reuse the same names, which makes those names even more common and easier for gzip to compress.
This reduces application startup time by transferring less data over the network.
It also serves as a form of obfuscation if you care about that sort of thing.
Renaming is not done for symbols with an "@import" or "@export" annotation and for names accessed off of the special "dynamic" type.
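A minimal sketch of frequency-based renaming in hand-written JavaScript: count how often each renamable symbol is used, then hand the shortest generated names to the most frequently used symbols. The input format and the name generator are assumptions for illustration; this sketch ignores scoping, reuse of names across unrelated symbols, and reserved words, all of which a real pass has to handle.

```javascript
// Generate "a".."z", "A".."Z", then "aa", "ab", ... (bijective base 52).
const ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
function nameForIndex(index) {
  let name = "";
  index++;
  while (index > 0) {
    index--;
    name = ALPHABET[index % ALPHABET.length] + name;
    index = Math.floor(index / ALPHABET.length);
  }
  return name;
}

// uses: one entry per use site of a renamable symbol.
function renameByFrequency(uses) {
  const counts = new Map();
  for (const name of uses) counts.set(name, (counts.get(name) || 0) + 1);

  // Most frequent first; ties broken alphabetically so output is deterministic.
  const sorted = [...counts.keys()].sort(
    (a, b) => counts.get(b) - counts.get(a) || (a < b ? -1 : 1)
  );

  const renamed = new Map();
  sorted.forEach((name, i) => renamed.set(name, nameForIndex(i)));
  return renamed;
}

const uses = ["position", "velocity", "position", "update", "position", "velocity"];
const renamed = renameByFrequency(uses);
console.log(renamed); // position -> "a", velocity -> "b", update -> "c"
```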

When not enabled as part of the "--release" flag, function inlining can be selectively enabled with the "--inline-functions" flag.

Constant folding does partial evaluation of the code at compile-time.
This often opens up opportunities for further optimizations such as dead code elimination inside the "else" branch of an "if true" statement.
Constant folding works well in combination with function inlining.

The code above becomes this JavaScript code in release mode.
Notice how all getters have been inlined, the call to the constructor with constant arguments became a single integer literal, and the call to the constructor inside "opaque" has been partially evaluated by performing "r << 16 | g << 8 | b" at compile time:

When not enabled as part of the "--release" flag, this pass can be selectively enabled with the "--fold-constants" flag.
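For illustration, here is a minimal constant folder over a tiny expression tree in hand-written JavaScript. The node shapes are hypothetical and unrelated to the Skew compiler's internal representation. Binary operations whose operands both fold to integer literals are evaluated at compile time, using 32-bit integer semantics; anything involving a variable is left in place with its folded children.

```javascript
// Node shapes: { kind: "int", value }, { kind: "name", name },
// and { kind: "binary", op, left, right }.
const OPS = {
  "+": (a, b) => (a + b) | 0,
  "-": (a, b) => (a - b) | 0,
  "*": (a, b) => Math.imul(a, b),
  "<<": (a, b) => a << b,
  "|": (a, b) => a | b,
};

function fold(node) {
  if (node.kind !== "binary") return node;
  const left = fold(node.left);
  const right = fold(node.right);
  if (left.kind === "int" && right.kind === "int" && node.op in OPS) {
    return { kind: "int", value: OPS[node.op](left.value, right.value) };
  }
  return { kind: "binary", op: node.op, left, right };
}

const lit = (value) => ({ kind: "int", value });
const bin = (op, left, right) => ({ kind: "binary", op, left, right });

// "r << 16 | g << 8 | b" with r = 255, g = 128, b = 0 folds to one literal.
const color = bin("|", bin("|", bin("<<", lit(255), lit(16)), bin("<<", lit(128), lit(8))), lit(0));
console.log(fold(color)); // { kind: "int", value: 16744448 }

// "x + 2 * 3" is only partially folded because "x" is not a constant.
const partial = fold(bin("+", { kind: "name", name: "x" }, bin("*", lit(2), lit(3))));
console.log(partial.right); // { kind: "int", value: 6 }
```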

JavaScript only has one set of arithmetic operators and they operate on floating-point values.
CPUs are much faster when working with integers so JIT compilers try really hard to figure out when the code is using numbers that behave like 32-bit integers so they can generate machine code for 32-bit integers instead of floating-point numbers and get a big speedup.

However, this guesswork isn't a perfect solution.
The code is now at the whim of the JIT heuristic engine, which may or may not be able to guess correctly.
These guesses are likely not even consistent between successive runs of the same code due to different optimization decisions being made based on changing timing measurements and/or execution flow.
Missing these optimizations can sometimes cause drastic performance issues, especially inside tight inner loops of math-heavy code.
Even if the JIT guesses that a particular arithmetic operator can be an integer correctly, it often doesn't have enough range information to be able to eliminate bounds checks, so the generated assembly is littered with overflow checks and deoptimization bailouts.

Skew has actual integer operations, even when compiling to JavaScript.
It ensures every integer arithmetic operation is wrapped in a bitwise operation.
All modern JITs omit bounds checks when they can prove the checks are unnecessary, which is always the case for integer addition, subtraction, division, and remainder operations when wrapped in a bitwise operation.
Integer multiplication can't actually be done with the floating-point "*" operator because 64-bit doubles don't have enough bits of precision to represent the full result, but luckily modern JavaScript engines provide the built-in "Math.imul" function to efficiently do 32-bit multiplication.

When given that JavaScript, V8 produces the following optimized assembly code (stack checks and tagging and stuff is omitted for clarity).
Notice how the operands remain integers throughout the computation:

This transformation is always performed for correctness reasons, even in debug mode.
Floating-point arithmetic operations are not equivalent to their integer counterparts and cannot be substituted without compromising the semantics of the code.
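As a hand-written sketch of the idea (the helper names are hypothetical and this is not the exact output of the Skew compiler), these JavaScript functions show how wrapping each result in a bitwise operation gives 32-bit integer semantics, and why multiplication needs "Math.imul":

```javascript
// "| 0" converts the double result to a signed 32-bit integer.
const add = (a, b) => (a + b) | 0;
const sub = (a, b) => (a - b) | 0;
const div = (a, b) => (a / b) | 0; // truncates toward zero like C
const mul = (a, b) => Math.imul(a, b);

console.log(add(0x7fffffff, 1)); // -2147483648 (wraps like a 32-bit int)
console.log(div(-7, 2));         // -3

// The exact product of two large ints needs more than the 53 bits of
// precision a double has, so "*" followed by "| 0" loses the low bits.
console.log((0x7fffffff * 0x7fffffff) | 0); // 0 (wrong)
console.log(mul(0x7fffffff, 0x7fffffff));   // 1 (correct low 32 bits)
```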

Virtual functions have overhead, mostly because they prevent inlining.
Devirtualization detects functions that are unnecessarily virtual and rewrites them to global functions instead, which can then be optimized further.
This is also what allows function inlining to work on instance functions.

The code above becomes this JavaScript code in release mode.
Notice how the call through the IOne interface, which only has one implementation, was completely devirtualized and inlined.
This is only possible because the IOne interface was able to be removed.

When not enabled as part of the "--release" flag, this pass can be selectively enabled with the "--globalize-functions" flag.
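The hand-written JavaScript below sketches the shape of this transformation using a hypothetical "Circle" class (it is not actual compiler output): a method that nothing overrides becomes a global function that takes the receiver as its first argument, so call sites become direct calls that an inliner can then expand.

```javascript
// Before: "area" is a method, so calls go through a property lookup.
class Circle {
  constructor(radius) {
    this.radius = radius;
  }
  area() {
    return Math.PI * this.radius * this.radius;
  }
}

// After: nothing overrides "area", so it becomes a global function with
// the receiver passed explicitly as the first argument.
function Circle_area(self) {
  return Math.PI * self.radius * self.radius;
}

const c = new Circle(2);
console.log(c.area() === Circle_area(c)); // true

// Once the call is direct, inlining can replace it with the function body.
const inlined = Math.PI * c.radius * c.radius;
```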

This removes unused code at the function level.
The compiler starts from functions with the "@export" and "@entry" annotations and finds all transitive dependencies.
Any code that wasn't reached is treated as dead and removed from the output.
This reduces application startup time by transferring less data over the network.
This is also enabled in debug mode because lots of extra unused library code makes debugging harder.
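The reachability idea can be sketched as a simple graph walk in hand-written JavaScript. The call graph format and function names are hypothetical; the walk starts from the roots (here standing in for functions marked "@entry" and "@export") and everything it never visits is dead.

```javascript
// Call graph: each function maps to the functions it references.
const callGraph = {
  main: ["parse", "render"],
  parse: ["tokenize"],
  tokenize: [],
  render: ["drawText"],
  drawText: [],
  onResize: ["render"],  // exported, so it is also a root
  debugDump: ["render"], // never referenced from a root: dead
};

// Iterative depth-first search collecting every reachable function.
function findLive(graph, roots) {
  const live = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const name = stack.pop();
    if (live.has(name)) continue;
    live.add(name);
    for (const callee of graph[name] || []) stack.push(callee);
  }
  return live;
}

const live = findLive(callGraph, ["main", "onResize"]);
const dead = Object.keys(callGraph).filter((name) => !live.has(name));
console.log(dead); // "debugDump"
```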