Throughout this blog, * denotes a notation heavy segment in the talk. These can be difficult to summaries quickly without typesetting. See the authors paper online for now and I'll try to find speaker slides at the later time.

PLMW Organizers - Intro

This is the 4rd PLMW workshop and we are very pleased to see you all. Mentoring workshops are now being much more common. This year we have funded 75 students. We would like the thank the speakers, sponsors and organisers (including Annabel :)

Video from Derek

POPL proceedings are now available online

Nate Foster (Cornell University) - You and Your Graduate Research

Prelude:

- doing a PhD is not easy, many do not finish. This is a survivors story
- some of my comments many be US specific
- acknowledgements (including Jorge Cham from PhD comics :))

Motivation

THis is useful from everyone, not just undergraduates.

Why you shouldn't do it?
- money: nick shows a graph of salaries in university, football coaching is clearly the right way to go for a big salary.
- start-ups: Tech start-ups are now a top topic, Oculus VR example and you don't need a PhD for this
- respect: Education is highly valued and there's respect from degrees, but no one will call you doctor exact your mom.
- to stay in school: Undergraduate was fun so lets stay in school, a PhD program is totally different.
- being a professor: Let all become a professor, demonstration of the numbers of academics verse other careers

Why you should?
- opens opportunities
- cool jobs in industry: leadership, work on cool problems, you can work on large systems and with real users
- freedom: applies on any level, freedom to choose interesting problems, opportunities to work on the big problems

What is a PhD?

Comparing degrees at different levels:
- high school: basics for life
- undergrad: broad knowledge in a field
- professional: advanced knowledge and practical skills
- PhD: advanced knowledge and a research contribution.

PhD is open edge

PhD is a transformation - we start with intelligent people who aren't researchers. Its a apprentice based scheme to being a researcher. Its not an easy or painless process.

PhD Success

pick an institution
The community that your in will shape your work,
Very important factors include advisor, opportunities and peers. Typically less important factors are finances, institution and location.

Dive into Research
Don't be distracted by other things e.g. classes and teaching
Pick great problems

"It is better to do the right problem the wrong way than the wrong problem the right way" - Hamming

Stay Engaged
Working with others means it easier, as you can motivate each other.
Know when its time to switch topics - good research are versatile and able to switch quickly
Independence - you need to know when the time is right to go rogue and ignore your adviser
Reaching escape velocity to graduate

Peter Müller (ETH Zürich) - Building automatic program verifiers

verification: Given a program P and a specification S, prove that all execution of P satisfies S
automatic verification: Develop an algorithm that decide whether P satisfies S

We can't prove halting => we are finished :)

Lets start again ...

Peter demonstrates of verification tool called Viper, using the typical account balance and transfer example. The verifier gives two error, we add preconditions and the verifier successes. We then use the verifier for thread safety, again fixing a concurrency bug with locking.

Automatic verification is not b/w. We can have semi-automatic verification: we can guide the verifier. The complexity of the code varies greatly, most success in the area focuses on a small area. We build verifiers from weaker guarantees. Verification can working with modularity (or maybe it will not). Varying complexity of properties.

7 Stage process

Define the research goals
What is the state of the art?
Find examples the area in the problem space that is not currently handled (well) by current verifiers.

Find Reasoning Principle
How to explain the correctness of the code to a friend? For the loop, we would use a loop invariant.
This is very different to techniques like model checking, I wouldn't say to friend "I have checked 15 billion possible evaluation paths and are all valid".

Break Down Arguments
Decompose the correctness argument: what do I need to prove for each program and what can I reuse between programs. How can I modularise the program.

Design Specification Language
Designing a language to express intended behaviour, we must know who the user will be and their level of experience. Find the right compromise between expressibility and simplicity, that is right for the intended audience.

Design Program Logic
Determine which properties need to be checked and which properties may be assumed.

Evaluate the Solution
Firstly an experimental evaluation, does the solution work on the example code you were executing. Then he meta-theory like the soundness proof.

Research Direction
Either develop now technical for new languages and feature (e.g. the recent interest in event-bases programming due to android.) or work in general infrastructure for utilise in many areas.

Frank Pfenning (Carnegie Mellon University) - Proof theory and its role in programming language research

I will show you why every PL researcher needs a bit of proof theory.

How do we write correct programs ?
We don't :)

In practice, programming and informal reasoning go hand in hand, we use mental logical assertions, e.g. after calling sort the list is sorted. It's vital to decompose the problem into part so we can reason locally.

We need to develop programming language with the logic of the programmer in mind. Think about your least favourite programming language and why the operational or logical model of the languages doesn't agree with you. Reasoning is an integral part of programming. We need to co-design the language and logic for reasoning about it.

Logic is computation so the key is to design coherent logical and operations semantics.

Demonstration of the naive algorithm to compare equivalence by walking through the stages and comparing. This algorithm is looking for bi simulation. This algorithm has quadratic complexity. Instead we can used HK (Hopcroft and Kerp) algorithm.

[I got a bit lost in the theory of coalgebras for the rest of the talk, sorry]*

]]>http://www.syslog.cl.cam.ac.uk/2015/01/14/programming-languages-mentoring-workshop-plmw/feed/0New Directions in Operating Systemshttp://www.syslog.cl.cam.ac.uk/2014/11/25/new-directions-in-operating-systems/
http://www.syslog.cl.cam.ac.uk/2014/11/25/new-directions-in-operating-systems/#commentsTue, 25 Nov 2014 14:51:45 +0000http://www.syslog.cl.cam.ac.uk/?p=1871Notes from the New Directions in Operating Systems 1 day conference at the Shoreditch Village Hall in London.

Rump kernels and {why,how} we got here

Writing drivers and filesystems is hard. Want to reuse this existing code, continuing to benefits from updates. A rump kernel runs on a hypervisor. A normal OS kernel with most of the code removed. Does not provide threads, scheduler, exec, VM. Can therefore run anywhere and integrates with anything. Some glue code connects to hypervisor. Application can add libc, etc, if desired. The syscall interface contains much useful logic (e.g. path resolution), so want to keep that too.

Was able to compile to JavaScript and debug driver using FireBug. Fetched a web-page in the Linux kernel. Also runs on Xen (Mini-OS) and bare metal.

Genode – OS security by design

Reduce TCB (Linux has a very large attack surface, lots of code running with high privilege).

Accountability and utilisation. Linux tries to provide an illusion of unlimited resources, with hacks like the OOM killer.

Usable security. e.g. SE-Linux is unusable.

A GENODE system is structured as a tree, with parents owning their children. Parents provide resources to their children from their own resources and control what the children can do. Each application only needs to trust those components it depends on (e.g. its ancestors and services it uses).

Combined with virtualisations, can run e.g. Linux and BSD as processes. Also takes components from many places: Linux TCP stack, Rump kernel filesystems, etc, each running as a GENODE process.

Security is based on object capabilities. e.g. a child can tell its parent that it provides a service (e.g. the GUI) and other children can retrieve them. For example, xterm asks for access to the GUI. It’s parent, the user session, tags the request with the application name and passes it to its parent (init), which forwards it to the GUI. Once the capability is granted, the application can use it directly without going via its parents.

Processes can trade resources. For example, a client can make a request on a server and pass the server the resources needed for the request.

VirtualBox has been ported (shows it booting Windows). Not optimised yet, but was able to play a video under it (slightly jerky).

Using a modified libc, can run a Unix environment without needing a virtualised kernel. Uses Vi to edit the GENODE background, showing integration with the rest of the system.

Runs a web browser (Qt app). The File Open dialog shows only a few files - the browser can’t access anything else. Browser can run GENODE processes as (isolated) plugins. Uses a nested GENODE GUI to run a 3D demo. The plugin is also isolated from the browser - the browser can’t get key presses from it (not sure how this works).

Current focus is making it usable as a dev system, adding a capability GUI.

New ideas about old OS security

- embedded
- ubiquitous
- many computers per person rather than many users per computer
- security still neglected
- software liability still disclaimed
- lots of malware

Lots of access control models. Hard for OS vendors to decide what to implement. => Linux LSM, FreeBSD MAC, etc. LSM not very composable, but MAC is. If policies disagree, it uses the most restrictive one.

Filtering system calls is not a good approach, due to concurrency. Instead every kernel component talks to the MAC framework. Attach labels to many kernel objects. They are useful for many different security frameworks.

MAC originally controlled access by users, but on modern devices it’s applications that are the key subjects.

Debugging security is hard. Need good tracing. DTrace useful here.

Capsicum. Applications get compromised. Want to contain the attacker. Object capabilities for FreeBSD (extends POSIX by removing ambient authority). File descriptors become capability. Now a production feature in FreeBSD 10. Linux patches have been posted to linux-kernel. Linux seccomp required 11,300 lines of code, compared to 100 on Capsicum (doing what?).

Libraries can use sandboxing even when their application doesn’t.

Compartmentalization is Neat!
- but different ways to break things up
- has overhead
- hardware does this badly
- so, put capability support into the architecture!
- now, userspace sandboxing with little os intervention
- BERI/CHERI
- can we do fine-grained cpu-supported compartments and not touch kernel?
- How does tagged memory interact with virtual-memory systems?
- How should syscalls work? Sould be blox them?
- How do signals work?

Configuration at a distance
- No physical access.
- No remote access.
- Not user-like. Programmatic.

Increasingly managing higer-level systems
- Not individual machines
- Servers are cattle not pets (fields and farms)
- Autoscaling groups (AWS)
- Mesos is a system which does this aggregation
- An operating system for a data center
- Kubernetes
- An ocean of user containers
- Scheduled and dynamically packed into nodes

Software installation done
More interesting to control installed software
- start a docker container
- run app on mesos
- set up security groups on AWS
- digitalOcean

I want a pony
- Managing an autoscaling CoreOS/Atomic cluster in AWS
- with etcd/consul
- immutable instances
- with the network in VPC/Weave
- with docker container arranged by Kubernetes
- All from Puppet DSL

Conclusions
- Future here but not evenly distributed
- These tools look like the future
- Manage not just provision
- In Search of Certainty

Exploring a new way to manage systems with ostree and Atomic project

Ostree: read-only system in /usr (with symlinks to /var for mutable stuff). To upgrade, you must reboot. Can also reboot to old version. “Git for filesystem” (can branch, but not diff).

Want to protect base system from containers for OpenShift. Also, protect containers form each other. Uses SE-Linux for this (svirt).

Example: setting up a static web service, plus wiki and cache. Will put everything in a different container. Web designer used Fedora 20, so want to use the same. Want to isolate mediawiki from the network, except for the cache service. Using S3 storage for mediawiki. Want to make sure that updates are tracked, not just hacks on the production server. Everything is described in a JSON file (not shown).

Hard to play around with transport layer if you have to recompile the kernel every time.

Example: switch positions of TCP port and sequence number headers. If both endpoints are modified they will still work, but to monitoring systems on the path it will be difficult to track them (looks like the port keeps increasing).

Network Stack in Userspace

It takes a long time to get a new network feature (e.g. a TCP option) deployed and widely used. In the filesystem world, FUSE makes it easy to test new features easily, even though performance isn’t great.

Options:

Containers: but, the network stack is still shared with the host

Library OS

NUSE is a library OS version of a normal network stack. Patch to the kernel tree adds new arch “arch/sim”. Hijacks application’s system calls to redirect to NUSE so that existing apps don’t need to be changed.

Jitsu: Just-in-Time Summoning of Unikernels

In distributed systems, complexity is the killer. We need to get rid of all these layers. We all depend on infrastructure that suffers security breaks again and again. Let’s get rid of POSIX and just keep network protocols as the standards.

Example: the slide stack is being server from a little ARM box (attached to the laptop by a network cable).

Although Mirage often runs of Xen, Mirage is more general than that. e.g. there is a JavaScript backend. You can start by building against Unix with Linux sockets, then remove this and use direct stacks.

Mirage aims to be contained, compact and efficient. And type-safe.

Each Xen guest has a flat memory layout and uses a single core. We trust all the code running in the unikernel, relying on type-safety, and try to protect against external threats.

Some common unikernels (DNS server, web server, etc) are around half a MB (smaller with dead-code elimination). Unikernels boot fast enough that they can be used like processes. Small enough to store the binaries in Git. Example: git pull to update web-site. Can git bisect to track down problems. Can detect a commit to a Git repository fixing a security bug and recompile within seconds.

How to program computers (kos)

Kos is a tiny OS written in K. Very fast by avoiding bloat. Terse syntax. KDB database written in K. Violates what is generally considered best practices in programming - food for thought.

DIOS, a distributed operating system for your data centre

Malte Schwarzkopf, University of Cambridge

Too many abstractions are bad for scalability, scheduling, data locality, data-flow tracking security, optimisations. Get away from POSIX - make an API for datacentres. Distributed OS.

DIOS adds 11 syscalls to POSIX. UUIDs for global object names. Kernel provides objects, but provides additional data about them. e.g. where stored. Can be used for optimisations. DIOS runs in the Linux kernel, but is portable enough that it could be ported to BSD.

Example: loads dios kernel module and uses it for map reduce (word counting). Could run across machines, but for demo all running on one laptop.

Trying to make an OS popular: a cautionary tale

Experience doing marketing for Ubuntu (to 2012). Ubuntu’s founding goal was originally to fix bug #1 - the dominance of Windows. Wanted to centralise everything around Ubuntu/launchpad and bring open source to the masses.

Plan was Canonical would work closely with manufacturers.

Problems:

OSS communities suspicious as launchpad not open source.

Git beat bzr.

PS makers scared of MS.

Intel had nothing to gain.

Public indifferent. Hard to interest people outside the core enthusiasts.

Ubuntu diversification confused people.

Lessons:

Focus on a clear message.

Beware of big companies setting the direction. e.g. Ubuntu-for-TV is not a community driven idea.

Don’t rely on other people to finish your project when it’s not their key focus.

Unifying the two? -- concurrent programs are easily parallelized: should we use the same primitives?Java, C# and others do but it's a bad idea. At scale, death by context-switching.

Fibers for concurrency (cheap! -- have millions) and Domains for parallelism (expensive! -- have ~#core ones)

Concurrency primitives that are proposed are non-monadic. MVar: a blocking variable. Can use MVars to define an "async" function.

Demo using a Fibonacci function: One recursive call is done using "async". Parallelize it to 48 cores -- get diminishing returns as expected, but the speedup is impressive.

Q: Do you keep spawning even when no domains are free?
A: Yes. Spawning creates a new fiber which is extremely cheap (minor allocation of 30 words).

Q: Can you manipulate domains from within the program?
A: Yes.

Multiple domains run in parallel; fibers are balanced between domains. Creating domains is heavyweight.

OCaml is very fast for immutable data and functional programs -- try to keep this unchanged. Mutability is more complicated in a multithreaded system! Use a descendant of Doligez-Leroy.

Each domain has a private minor heap; no pointers exist between minor heaps. The major heap is shared. The major heap can point into minor heaps. Can do completely independent minor collections. Shared heap: mostly-concurrent, inspired by VCGC.

Status: bootstraps, but GC needs more testing, tuning and benchmarking. Bytecode only at the moment. Weak references, finalizers etc. are still missing.

Q: Are you exposing a raw memory model?
A: We'll probably settle on enforcing an ordering of stores by using a memory fence on platforms which don't support this natively.

Q: Your collector stops all of the mutator sets. Is there any way of doing collection in parallel with mutators?
A: Have collection in parallel, then stop the world to verify (?)

Q: If fiber does a blocking system call, does that block the entire domain?
A: Yes at the moment, but there are plans to change that.

Q: Can what you are doing be used in monadic libraries in LWT or Async?
A: Yes but that would probably not a good idea because users of these libraries don't assume that their code is running concurrently.

Q: Is it possible to make sure that a particular fiber is in the same or is not in the same domain as other fibers?
A: Priorities, affinities to domains etc. are very interesting and we are just starting to provide these.

Ephemerons meet OCaml GC, by François Bobot (CEA)

François Bobot presenting "Ephemerons meet OCaml GC"

Memoization -- computing a function with a cache while avoiding memory leaks: if a key is not needed anymore, we want to remove the entry from the cache. Particular case: heap-allocated keys not needed anymore means not reachable (apart from the cache).

Porting process: used old Python and new OCaml parts together; communication via JSON. Literal translation first; refactor later in OCaml.

Results: similar LOC, 10x faster, reliable, great community!

Q: Can 0install deal with multiple versions of the same package?
A: For OCaml packages, use OPAM.

Q: How often did you run into type errors when porting to OCaml?
A: Porting the SAT solver was a little tricky. None and null are handled better in the OCaml code and all kinds of network errors are handled now.

Q: Are you still using objects?
A: Currently getting rid of objects and making things more immutable.

Transport Layer Security purely in OCaml (*), by Hannes Mehnert (presenting, University of Cambridge), David Kaloper Meršinjak (University of Nottingham)

Authentication (X.509): client has set of trust anchors (CA certificates); server has certificate signed by a CA; during handshake client receivers server certificate chain; client verifies that server certificate is signed by a trust anchor.

C: The team that discovered the Heartbleed bug ran their test framework against this library, and found five bugs in Linux but none in this library.

Q: What about timing and interaction with GC?
A: Best practice is not to use and data-dependent branches and to use the same memory access patterns. No data dependent allocations. But the preferred way is to do this in the language.

Q: How about tests?
A: We have a test suite but we'd like to automatically generate tests and run them against other implementations to be able to do comparisons.

Haskell has type classes, purity, laziness where OCaml is different. But what's really different is library search: Hoogle. Can search by name, by type, or both.

OCamlBrowser, OCaml API Search are very limited so Jun built OCamlScope.

OCamlBrowser works only for locally compiled code, uses OCaml typing code. OCaml API search was based on OCamlBrowser but has been discontinued.

Previously there were many problems with trying to build an OCaml API search. Now cmt/cmti files give compiled AST with locations, and OPAM gives unified installations; compiler-libs make it easier to access OCaml internals.

Classic programming problem: in let x = a in b, we want to catch exceptions raised by a but not those raised by b. In OCaml 4.02, the match .. with .. construct is extended to analyse both the value of a and exceptions raised during a's evaluation.

This binding is traditionally treated like any other definition. This can cause subtle type errors with applicative functors; accesses sometimes go through more indirections; linking problems.

In OCaml 4.02, these constructs are treated specially during type-checking, compilation and linking.

Discussion of the application of this to libraries.

3. Mutability of strings

For historical reasons, the string type is mutable in place and has two broad uses. It can be used to represent text, or as an I/O buffer (mutable). OCaml programmers usually keep these two uses distinct. But mistakes happen, and user code may be malicious.

In the Lafosec study (ANSSI), there are a few examples of OCaml code where one can mutate someone else's string literals, or make a Hashtable key disappear.

New solution by Damien Doligez: in 4.02, there are two base types -- string for text, and bytes for byte arrays. By default, the two are synonymous but incompatible if option -safe-string is given.

In default mode, all existing code compiles but get warnings when using String functions that mutate in place.

In -safe-string mode, the values of type string are immutable (unless unsafe coercions are used); I/O code and imperative constructions of strings need to be rewritten to use type bytes and library functions from Bytes.

OPAM 1.2: "The Platform Release" -- solver errors are explained in plain English rather than boolean formulae, more expressive queries (reverse dependencies and recursive); can clone the source code and repo file for any OPAM package.

Steady growth in the number of released OPAM packages. Growth in number of contributors is much slower.

One More Thing: Assemblage -- an very alpha eDSL to describe OCaml projects. OCaml as host language with Merlin auto-completion. Can introspect the project description to generate build rules. Very lightweight, integrates easily.

Q: Do you have anything to help with ARM cross-compilation?
A: Global build "glue" still required to support cross-compilation (Assemblage will help with this)

Q: Any platform is a (false?) promise to define a set of supported libraries. What's OPAM's take on this?
A: We as the developers of the platform don't want to be the people who define what the current set of packages considered as "stable" is. We just build the tools.

Results of the experiment: 18 new contributions (patches) from people who probably wouldn't have sent their patches in the old system, 18 new reviews! So Github attracted a new crowd of contributors.

Some Github pull requests were handled very quickly (small fixes and changes). Developer meetings help taking decisions. Pull requests are most effective during initial development: before the feature freeze (after that the team is busy with getting the release out).

Negative results: to be effective, external users should be told more about the development cycle. Github was used mostly by people used to git.

What if you had a distributed database with git-like semantics? -- E.g. commit history

Problem: merges!

Have an Obj and a Git backend. Implementations of various persistent data structures with merge function: prefix trees, mergeable queues, mergeable ropes.

A Case for Multi-Switch Constraints in OPAM, by Fabrice Le Fessant (INRIA)

Fabrice Le Fessant making the case for multi-switch constraints in OPAM

OPAM: the official way to install OCaml packages. Builds a CUDF universe with packages available for the switch, and the dependency constraints between these packages. Send the universe to the CUDF solver, then apply the solution to the switch.

Multi-switch constraints: add a switch prefix to each package name. allow dependency constraints between packages with different switch prefixes

Nullable type t? includes NULL and values of type t. Provide type inference algorithm featuring HM polymorphism that statically guarantees that NULL cannot be used as a regular value (algorithm's soundness has been proved).

Effects descriptor: a set of atomic effects. Inference: infer types using the OCaml compiler, then bottom-up analysis; fixpoint for mutually recursive definitions. Then represent effects as monads in Coq.

There is one monad per descriptor of effects -- how do we compose them? Impossible in general, but doable when we restrict the shape of the monads.

Conclusions: hard to make compiler work on real examples; functors are complex!

Q: The granularity of effects matters a lot when you use this to prove properties of your OCaml code.
A: Yes.

Q: How can I use this to prove properties about recursion -- e.g. whether they are tail-recursive?
A: (?)

Q: Do you handle local references?
A: Not at this moment.

Q: You only support part of the Pervasives library. Which parts do you not support?
A: Can't currently pass effectful functions as arguments, so for example List.iter is not supported in the usual use case.

GPGPU programming. Two main frameworks: Cuda and OpenCL. Use different languages to write kernels (Assembly or C/C++ subset) and to manage kernels (more or less any general purpose language can be used here).

SPOC compiles to Cuda or OpenCL. It contains the Sarek DSL and a runtime.

This is all possible with OPAM 1.2! (Check opam --help, look for --criteria)

You can help: try different profiles, test expressivity of the preference language, help debug OPAM.

Conclusion: package managers are complex; a very hard part is dependency solving! Modern package managers must share common components, in particular dependency solvers and the user preference language.

Q: Does this keep track of explicitly installed packages?
A: Yes, OPAM knows which packages were installed directly (the "root set") and which were installed in order to satisfy dependencies.

Q: Why don't other package managers use this technology now?
A: Package managers are core parts of any system so people tend to be very resilient to change.

Simple, efficient, sound-and-complete combinator parsing for all context-free grammars, using an oracle (*), by Tom Ridge (University of Leicester)

The P3 combinator parsing library. Can handle all context-free grammars (CFGs). Scannerless or can use a lexer. Good theoretical basis, but slow.

Example given.

Memoized counting -- you can't do this with any other parser!

Compute actions over all good parse trees; there are exponentially many such parse trees, but this doesn't have to take exponential time!

Disambiguation: directly encode in code with using "option".

Supports modular combination, e.g. "helper parsers".

Comparison with Happy: it's faster, in some cases ridiculously so.

Long term aim: general parser, verified correctness and performance, usable in the real world.

Welcome

Welcome, this year we worked closely with the OCaml workshop to give the two workshops they're own focus, this workshop is more theory focused then the OCaml workshop. This workshop doesn't just include features in current SML, but also features that could be included in the future. We welcome all ML's from the ML family. We will be publishing the proceeding and they will be available free online.

Session 1: Module Systems

Chair: Didier Rémy

Andreas Rossberg

Abstract: We propose a redesign of ML in which modules are first-class values. Functions, functors, and even type constructors are one and the same construct. Likewise, no distinction is made between structures, records, or tuples, including tuples over types. Yet, 1ML does not depend on dependent types, and its type structure is expressible in terms of plain System F-omega, in a minor variation of our F-ing modules approach. Moreover, it supports (incomplete) Hindley/Milner-style type inference. Both is enabled by a simple universe distinction into small and large types.

We are rethinking the design of ML. We talk about ML as a language, its actually at least 2.5 languages. There is the core language and the module language. There's even a 3rd language of type expression. This are all syntactically different.

Who likes the type expression syntax in ML ? No-one

Modules in ML, as 2nd class. Modules are more verbose but powerful, leaving us with some difficult decisions. There has been work in packing modules as fist class values.

We want a classless society !
I want the expressive power of modules, with style.

I propose 1ML - first class modules language/ unified ML

No more staring with core language and add modules constructs. Instead we start with a module language and then add some core constructs.

Example of explicit typed 1ML: Similar to ML plus two functor types, one for pure and one for inprue. Type declaration syntax is replaced with anonymous modules.

Examples of types as modules: This common in module IL's, its not too new.

This extends to type constructors as applicative functors.

The unavoidable map functor example :)

A more interesting example: The textbook example of selection something at runtime. This could be done in OCaml but it would be much more verbose.

Example of collection to demonstrate associated types. Examples of how this can again be achieved in OCaml but its still more verbose.

Example code using higher kinded polymorphism, which couldn't be do in OCaml

I am using F-ing modules semantics and collapsing syntax and semantics.

Abstract: We promote the use of type-level module aliases, a trivial extension of the ML module system, which helps avoiding code dependencies, and provides an alternative to strengthening for type equalities.

Type-level module aliases - a simple feature in OCaml 4.02

Writing ```module S = String``` in signature will alias S to string.

Introducing the new inferred type-level module aliases and how this extending the subtyping relation.

The concept of type-level modules alias appears first in Traviatta (ICFP '06), used to allow type interface of recursive modules.

Later we discovered that type-level modules aliases were a good match for OCaml-style applicative functors.

Helping applicative functors was probably not enough to justify a new feature. However remember that the original goal was to simplify program analysis.

Modules as name spaces: Sharing of types allow grafting a modules somewhere else in the hierarchy.

This idea only working in theory: for separate compilation things backtrack. We have a risk of name conflict in are libraries as forest of modules. -pack is monolithic, using modules as name spaces creates large interfaces (e.g. JS core).

How to other language handle this issue: SML using a compilation manage which use special files using a dedicated syntax.

Example of manual packing

Now adding type-level module alias. We do not need to duplicate the module signatures.

Induced dependencies - a new compilation flag -no-alias-deps for 4.02.

3 Steps:
- Create a mapping unit whose role is only to map short names to prefixed names
- open this unit
- create an export unit

Compared to using -pack, this approach requires rename source files to unique name and add an open statement.

In core/async this divides the size of compiled interfaces by 3,

Currently, type level modules aliases can be created only for a limited subset of module paths, this excludes feature like functor applications, opaque coercions, functor arguments and recursive modules.

C: This also reduces required false dependences.

Q: What's the meta theory of this ?
A: This wasn't part of this work but were working into normalisation in the OCaml compiler

C: We are also using kinda thing for backpack in Haskell

Q: Currently we use ocamldep, what will happen to this tool ?

Session 2: Verification

Chair: Anil Madhavapeddy

Well-typed generic smart-fuzzing for APIs (Experience report)

Abstract: In spite of recent advances in program certification, testing remains a widely-used component of the software development cycle. Various flavors of testing exist: popular ones include unit testing, which consists in manually crafting test cases for specific parts of the code base, as well as quickcheck-style testing, where instances of a type are automatically generated to serve as test inputs.

These classical methods of testing can be thought of as internal testing: the test routines access the internal representation of the data structures to be checked. We propose a new method of external testing where the library only deals with a module interface. The data structures are exported as abstract types; the testing framework behaves like regular client code and combines functions exported by the module to build new elements of the various abstract types. Any counter-examples are then presented to the user. Furthermore, we demonstrate that this methodology applies to the smart-fuzzing of security APIs.

I work for a start-up, work on hardware security models (HSM). Some hackers have been about to steal lots of money due to issues with HSM.

Cryptosense Workflow

Testing -> Learning -> Model-checking -> ...

We will focus on Testing, we automatically test API's using QuickCheck (a combinator library to generate test case in Haskell), But to generate tests we need to know how to generate our types/data structures.

Improving the CakeML Verified ML Compiler (Research presentation)

Abstract: The CakeML project comprises a mechanised semantics for a subset of Standard ML and a formally verified compiler. We will discuss our plans for improving and applying CakeML in four directions: optimisations, new primitives, a library, and verified applications.

Introducing the CakeML team, we are verification people.

Why ML for verification: its clean and high level, its easy to reason about so can we then formally reason about it.

CakeML: a subset of ML which is easy to reason about, with verified implementation and demonstrate how easy it is to generate verified cakeML

How can we make proof assistant into trustworthy and practical program development platforms.
functions in HOL -> CakeMl -> Machine code

2 part to this talk:
- verified compiler
- making the compiler better

Highlights of CakeML compiler:
Most verified compiler work from source code to AST to IR to bytecode to Machine code. We take a difference approach and can do both dimensions in full.

The compiler phases are simple, we use only a few hops from IR to x86. Bytecode simplified proofs of real-eval-loop but made optimisation impossible.
We are now add many more IR's to optimise, like common compilers. Currently we only have x86 backend but we are adding ARM etc..

C: Inlining and specialisation of recursive functions are the key optimisations that you should focus on

C: You could use your compiler (written in cakeML) to evaluate the value of various optimisations

Q: Why don't you support functors ?
A: Its not a technical issue, this just a lot of work and its at the top of the priorities stack.

Session 3: Beyond Hindley-Milner

Chair: Jacques Garrigue

The Rust Language and Type System (Demo)

Felix Klock; Nicholas Matsakis (Mozilla Research)

Abstract: Rust is a new programming language for developing reliable and efficient systems. It is designed to support concurrency and parallelism in building applications and libraries that take full advantage of modern hardware. Rust's static type system is safe and expressive and provides strong guarantees about isolation, concurrency, and memory safety.

Rust also offers a clear performance model, making it easier to predict and reason about program efficiency. One important way it accomplishes this is by allowing fine-grained control over memory representations, with direct support for stack allocation and contiguous record storage. The language balances such controls with the absolute requirement for safety: Rust's type system and runtime guarantee the absence of data races, buffer overflows, stack overflows, and accesses to uninitialized or deallocated memory. In this demonstration, we will briefly review the language features that Rust leverages to accomplish the above goals, focusing in particular on Rust's advanced type system, and then show a collection of concrete examples of program subroutines that are efficient, easy for programmers to reason about, and maintain the above safety property.

If time permits, we will also show the current state of Servo, Mozilla's research web browser that is implemented in Rust.

Motivation: We want to implement a next gen web browser servo

See the rusty playpen online http://www.rust-lang.org/

We want a language for systems programming, C/C++ dominate this space. We want sound type checking. We identify classic errors and using typing to fix them.

Well-typed programs help to assign blame.

Systems programmers want to be able to predict performance.

Rust syntax is very similar to OCaml. Types have to explicit on top-level functions. Rust has bounded polymorphism, we don't have functor, this is more similar to typeclasses.

Value model between OCaml and Rust is very different. Rust in-lines storage but must pick the size of the largest variant.

To move or copy ? Move semantics

The Copy bound expresses that a type is freely copyable and its checked by the compiler.

Doing web-based data analytics with F# (Informed Position)

Abstract: With type providers that integrate external data directly into the static type system, F# has become a fantastic language for doing data analysis. Rather than looking at F# features in isolation, this paper takes a holistic view and presents the F# approach through a case study of a simple web-based data analytics platform.
Experiments can have a life of its own independent of a large-scale theory. This is a relevant case study, theory and language independent and demonstrates a nice combination of language features

Q: These language feature are good for specific application, do they make the language worse for other features ?
A: No the extensions use minimal syntax

Q: Is there version contrainting on type providers for data source ?
A: No

The rest of the posts will follow soon

Session 4: Implicits

Chair: Andreas Rossberg

Implicits in Practice (Demo)

Nada Amin (EPFL); Tiark Rompf (EPFL & Oracle Labs)

Abstract: Popularized by Scala, implicits are a versatile language feature that are receiving attention from the wider PL community. This demo will present common use cases and programming patterns with implicits in Scala.

Modular implicits (Research presentation)

Leo White; Frédéric Bour (University of Cambridge)

Abstract: We propose a system for ad-hoc polymorphism in OCaml based on using modules as type-directed implicit parameters.

Abstract: In this talk, we will go through how MirageOS lets the programmer build modular operating system components using a combination of functors and metaprogramming to ensure portability across both Unix and Xen, while preserving a usable developer workflow.

Compiling SML# with LLVM: a Challenge of Implementing ML on a Common Compiler Infrastructure (Research presentation)

Katsuhiro Ueno; Atsushi Ohori (Tohoku University)

Abstract: We report on an LLVM backend of SML#. This development provides detailed accounts of implementing functional language functionalities in a common compiler infrastructure developed mainly for imperative languages. We also describe techniques to compile SML#'s elaborated features including separate compilation with polymorphism, and SML#'s unboxed data representation.

Session 6: No longer foreign

Chair: Oleg Kiselyov

Akinori Abe; Eijiro Sumii (Tohoku University)

Abstract: While advanced type systems--specifically, dependent types on natural numbers--can statically ensure consistency among the sizes of collections such as lists and arrays, such type systems generally require non-trivial changes to existing languages and application programs, or tricky type-level programming. We have developed a linear algebra library interface that guarantees consistency (with respect to dimensions) of matrix (and vector) operations by using generative phantom types as fresh identifiers for statically checking the equality of sizes (i.e., dimensions). This interface has three attractive features in particular.

(i) It can be implemented only using fairly standard ML types and its module system. Indeed, we implemented the interface in OCaml (without significant extensions like GADTs) as a wrapper for an existing library.

(ii) For most high-level operations on matrices (e.g., addition and multiplication), the consistency of sizes is verified statically. (Certain low-level operations, like accesses to elements by indices, need dynamic checks.)

(iii) Application programs in a traditional linear algebra library can be easily migrated to our interface. Most of the required changes can be made mechanically.

To evaluate the usability of our interface, we ported to it a practical machine learning library (OCaml-GPR) from an existing linear algebra library (Lacaml), thereby ensuring the consistency of sizes.

SML3d: 3D Graphics for Standard ML (Demo)

John Reppy (University of Chicago)

Abstract: The SML3d system is a collection of libraries designed to support real-time 3D graphics programming in Standard ML (SML). This paper gives an overview of the system and briefly highlights some of the more interesting aspects of its design and implementation.

Closure

and we headed of to the ICFP Industrial Reception ...

]]>http://www.syslog.cl.cam.ac.uk/2014/09/05/ml-family-workshop/feed/0ICFP 2014: Day 3http://www.syslog.cl.cam.ac.uk/2014/09/03/icfp-2014-day-3/
http://www.syslog.cl.cam.ac.uk/2014/09/03/icfp-2014-day-3/#commentsWed, 03 Sep 2014 05:16:41 +0000http://www.syslog.cl.cam.ac.uk/?p=1819Good Morning from the 3rd and final day of the 19th International Conference on Functional Programming, as ever Leo and I (Heidi) are here to bring you the highlights.

Keynote (Chair: Edwin Brady)

Depending on Types

Stephanie Weirich (University of Pennsylvania

Is GHC a dependently type PL ? Yes*

The story of dependently typed Haskell ! We are not claiming that every Agda program could be ported to Haskell

Example: Red-black trees (RBT) from Agda to haskell

All the code for today's talk is on github: https://github.com/sweirich/dth

Example of insertion into RBT from Okasaki, 1993.

How do we know that insert preserves RBT invariants ?
We are going to use types

Examples RBT in Agda from Licata. We can now turn this into GHC using GADT's and datatype promotion.

Example: repository is an integer; allow adding one. Everything works out nicely.

Next example: repository is a natural number; allow adding one. Problem: can't always subtract one (which would be the inverse of adding one)! -- solved using singleton types.

Closing thoughts: computation vs. homotopy. There is a tension between equating terms by identifications and distinguishing them by computations. Analogy: function extensionality already equates bubble sort and quicksort -- they are the same function but different programs. Computation is finer-grained than equality.

C: Very understandable talk! But I still need to see a talk / paper where Homotopy Type Theory guides me to something new.
A: The functorial part comes for free within HoTT. Also we're forced to do everything functorially.

Pattern Matching without K

Dependent pattern matching is very powerful, but HoTT is incompatible with dependent pattern matching. The reason: it depends on the K axiom (uniqueness of identity proofs), which HoTT does not allow. But not all forms of dependent pattern matching depend on the K axiom!

Dependent pattern matching. Have a dependent family of types -- example of pattern matching given with dependent pattern matching and without.

K is incompatible with univalence.

How to avoid K: don't allow deleting reflexive equations, and when applying injectivity on an equation c s = c t of type D u, the indices u should be self-unifiable.

Possible extensions: detecting types that satisfy K (i.e. sets -- allow deletion of "obvious" equations like 2 = 2); implementing the translation to eliminators (this is being worked on or is done but not yet released); extending pattern matching to higher inductive types.

Conclusion: by restricting the unification algorithm, we can make sure that K is never used -- no longer have to worry when using pattern matching for HoTT!

Q: What was your experience contributing to Agda?
A: Not a huge change -- not too hard.

Q: Is there a different way of fixing the "true = false" proof?
A: K infects the entire theory.

Q: You said the criterion is conservative. Have you tried recompiling Agda libraries? What did you observe?
A: 36 errors in the standard library, 16 of which are easily fixable, 20 are unfixable or it is unknown how to fix them.

Q: Is K infectious?
A: Yes -- when activating the "without K" option, you cannot use a library that defines something equivalent to K.

Q: How do you prove termination of structurally recursive functions?
A: Map to numbers (e.g. list to length)

Q: Refinement types for call-by-push-value?
A: There is some work on refinement types which does not depend on the evaluation strategy.

Q: In the vector algorithm algorithm library, every function had to be annotated.
A: Here the first argument usually increases rather than decreasing so had to write a custom metric but this was simple.

Q: How does this interact with other features of Haskell like GADTs?
A: LiquidHaskell works on Core so there should be no problem.

Q: Did you find any bugs in your termination analysis of different libraries?
A: Yes a tiny bug in Text, has been fixed now.

A Theory of Gradual Effect Systems

Programs produce results and side-effects. Can use type and effect systems to control this. Here, functions have an effect annotation -- a set of effects that the function may cause. Expressions are also annotated with their effects.

Get into trouble when passing an effectful function to another function: as a safe approximation, the higher-order function must always assume that the function passed in may have any possible effect.

Introduce "unknown" effect (upside-down question mark). Combine this with a generic type-and-effect system which comes with "adjust" and "check" funtions.

Use abstract interpretation to lift adjust and check from static to gradual typing. Idea: extend the domains of sets to include "unknown" side-effects.

Define concretization function from consistent privileges to the feasible set of effect sets.

Introduce internal language: with "has" and "restrict" constructs.

Future work: implementation (currently targeting Scala); Blame

Q: Can you always derive the concretization function?
A: It is provided by us.

Q: If you're doing dynamic checking and dealing with arrays, but you can't raise errors -- what are you supposed to do?
A: You don't want to produce a side effect when you're not allowed to do so.

Q: Does your system allow parameterization on effect sets?
A: No, not in this paper. The implementation requires it though so this is future work.

Session 12: Dependent Types (Chair: Ulf Norell)

How to Keep Your Neighbours in Order

Conor McBride (University of Strathclyde)

Conclusion first: Push types in to type less code!

A very engaging and amusing talk. Conor demonstrates (with lots of Agda code!) how to use types to build correct binary heaps and 2-3 trees.

Q: What does "push requirements in" mean?
A: Start from the requirements. The information that's driving the maintenance of the requirements is passed in. (?)

A Relational Framework for Higher-Order Shape Analysis

Gowtham Kaki (Purdue); Suresh Jagannathan (Purdue)

Shape analysis in functional languages: ADTs track shapes of data. Types can document shape invariants of functions -- but we want to be able to refined these constraints!

Can we refine ML types to express and automatically verify rich relationships? Need a common language to express fine-grained shapes. Build up from relations and relational algebra. Example: in-order and forward-order.

Relational operators are union and cross
Predicates are equality and inclusion

But: relational types for polymorphic and higher-order functions must be general enough to relate different shapes at different call sites. Examples: id, treefoldl : a tree -> b -> (b -> a -> b) -> b

Consider id : a-> a -- shape of argument is the shape of its result.

Functions can be parameterized over relations. Also get parametric relations: relations can be parameterized over relations.

We consider the "effectively propositional fragment" of many-sorted first-order logic. Key observation for translation: a fully instantiated parametric relation can be defined in terms of its component non-parametric relations.

Implementation: CATALYST, tycon.github.io/catalyst/

Q: Does your type system allow the user to introduce new relations?
A: Yes.

Q: Can you compile any kind of parametric relation or are there restrictions?
A: You can also write relations on infinite data structures.

Session 13: Domain Specific Languages II (Chair: Yaron Minsky)

There is no Fork: an Abstraction for Efficient, Concurrent, and Concise Data Access

Imagine you a web program, the server side code for generating a webpage, with many sources. For efficient, we need concurrency, batching, caching. We could brute force, but that destroys modularity and we its messy.

What about using explicitly concurrency ? It really on the programmer too much e.g. they forget to fork, add false dependancies.

A rule engine: input, evaluate rules and return.

Consider coding a blog, example code including Fetch monad. The concurrency is implicit in the code, just use monand and applicative.

No manual batching, again implicit in code.

Demonstrating code for map and applicative. The monadic bind gives a tree of dependencies for IO.

This was developed for the haxl project at FB, migrating from an in-house DSL called FXL to Haskell. The code framework is open-source, github.com/facebook/haxl and cabal install haxi.

Implicit parallelism is hard but data fetching doesn't suffer from this problem.

Q: Is binder linear in depth of blocked requests ?
A: Take it offline

Q: Is there accidental sequential code ?
A: We haven't rolled it out yet

Introducing a trivial toy language for examples. Defined a simple deep and shallow embedding for the toy language. Shallow embedding must be compositional. The toy language have eval and print, print is messy to add to shalow embedding.

Dependent interpretations is introduced for the deep embedding and again is ok to shallowing embedding but not great. Context senstive interpretation can again be expressed as a fold.

Deep embedding are ADT's, shallow embeddings are folds over that AS.

Q: Has you method for shallow embedding sacrificed extensibility ?
A: Yes but we can work around it

Q: What about binders, what about recursion ? You example is too simple.
A: Not sure

Distilling Abstract Machines

Setting: lambda-calculus. There has been a lot of research into the relationship between abstract machines and substitution. We introduce the Linear Substitution Calculus which decomposes beta-reduction. It is equipped with a notion of structural equivalence.

Then define reduction strategies on it: by name / by value (LR / RL) / by need require a specific form of evaluation context together with specific versions of the multiplicative and the exponential step.

Distilleries: choice of deterministic multiplicative and exponential evaluation rule and an equivalence relation which is a strong bisimulation; an abstract machine with transitions defined by the chosen rules; a decoding function. Moral: the LSC is exactly the abstract machnie "forgetting" the commutative transitions. In all cases, distillation preserves complexity.

Q: What does the fourth possible combination of multiplicative and exponential evaluation correspond to (both by value)?
A: That would be a sort of call-by-value, depending on how you define the context.

Q: Can you bring some order to the seemingly unrelated evaluation contexts?
A: With these contexts, you tell the machine where to look for the next redex. You can even do non-deterministic machines. At some point, your choice is limited if you want execution to be deterministic.

Q: From reduction semantics to an abstract machine is a mechanical process. Is it possible to find an algorithm that generates distilleries?
A: I can do it by hand ... don't know if there is an algorithm.

Student Research Competition Award Presentation

ICFP 2015 Advert & Closing

ICFP 2015 will take place in Vancouver, Canada: icfpconference.org/icfp2015

The venue is great -- connectivity, location etc.

New: will have a Mentoring Workshop for beginning graduate students doing research. Workshop proposals and paper submission deadlines are on the website.

]]>http://www.syslog.cl.cam.ac.uk/2014/09/03/icfp-2014-day-3/feed/0ICFP 2014: Day 2http://www.syslog.cl.cam.ac.uk/2014/09/02/icfp-2014-day-2/
http://www.syslog.cl.cam.ac.uk/2014/09/02/icfp-2014-day-2/#commentsTue, 02 Sep 2014 06:54:37 +0000http://www.syslog.cl.cam.ac.uk/?p=1791This is day 2 of the 19th International Conference on Functional Programming. Leo and I(Heidi) are here bright and early to bring you the highlights from Gothenburg, Sweden. We are not the only livebloggers here at ICFP, check out Edward Yang's excellent notes.

Keynote (Chair: Jesse Tov)

Instead of an abstract, the keynote has a bibliography of interesting work in the area

Contracts are infectious

Example of racket code, working in a call-by-value language here a point-in function. Demonstration of contract violation including a blame, from applying bools instead of integers. Robby's adds an contract to guard access to point-in and the blame now points to the top-level.

Graph demonstrating how the number contracts in Racket in git has constantly increased. Robby identifiers the outliers e.g. big libraries with lots of contracts have been introduced. Most commits add 1 contract, if they add any.

Everyone is the audience is thinking, why use contracts when you can use types, as identified by Robby.

Contracts Verse Types: The previous example is used with types instead of contracts and negative integers are used to demonstrate how types are insufficient. He introduces a dependant contract, hinting at a comparison to dependent types. We have the arguments at the time of checking unlike dependant types hence its actually possible in real languages.

Switch to an example of a function called dc, which takes a drawing function and bonding box and draws as a side effect. Robby introduces a broken function, which introduces state pollution. We can't check that state after equals the state before, but this would be too late. Here's a contract example, generating a random state and testing for state pollution, this doesn't give us complete corrects but its good enough in this contexts.

Don't think about contracts in terms of projections, but in terms of boundaries (work by Christos)

Addition of contacts to simply typed lambda calculus, in particular how monitors add contracts to boundaries. Example of if we have contract (x =< 3), how values of x > 3 causes the function to abort. If we have another contract to checks f's arguments and return values are x=<3, we allow the function to passed though to with extra monitors on return values.

What happens if we have a bug in the monitor ?

Applications: How can contracts work if you don't give have access to the code the other side of the boundary. We can use the contract exercise function to generate random functions to test monitors. We have an notion of whether one contract is stronger than another.

Gradual typing: How we interfere between typed and untyped languages, see how we do it with Typed Racket.
Static Analysis: Havic, exhaustive boundary checking

This presentation is actually written in racket, all output is real output.

Example of 2D-tree data definition and writing a function to find the nearest point.

What's hard of students ?

- Type directed development is difficult even though there's mechanical rules to apply
- "Purpose statements" are hard, student just ignore them and read code, this breaks down then they get to recursion

CS community has build many models of real world artifices, build by engineers instead of computer scientists. E.g. memory models, processor architectures etc.. This take many postdoc years to build.

Models developed inside theorem prover or PL and then hand ported, porting maybe machine assisted (perl scripts :)). This language to language translation is ugly. We have theorem prover lock-in. Porting requires experiments in both systems. There it not a library of off the shelf components.

Introducing Lem:
Source -> Lem -> HOL, OCaml, Documentation
Lem syntax is similar to OCaml
Lem is a pragmatic tool, we are happy to accept code that causes problems for TP's
Example Lem including highlighting how whitspace and comments are presented

get lem from cl.cam.ac.uk/~pes20/lem

Q: How to you support naming and binding ?
A: We would like to add nominal naming

Q: Miranda dealt with ADTs rather differently. It later turned out that the Haskell and Miranda approach can be translated into each other.
A: Miranda's ADT system might run into trouble with type families...

Q: How many packages would need role annotations?
A: Not clear how to assess that. Probably very few (2?).

Q: How about Safe Haskell?
A: Coercible is not yet in the safe subset.

Q: Why not put these role annotations in the kind syntax?
A: Too big a change to the kind system. Makes it less flexible.

80s: Cardelli, Wand formulate type inference as a two-stage process: generating and solving a conjunction of equations. Benefits: higher-level thinking: equations and conjunction instead of substitutions and composition; also greater modularity: constraint solving in library, generation by user. But new variables still created via global side effects.

Constraint solving: on paper, every constraint can be rewritten until it is either solved or you get an error.

A problem: submitting a closed ML term to the generator yields a closed constraint which the solver rewrites to false or true (is well typed or is not well typed).

Question: can one perform elaboration without compromising the modularity and elegance of the constraint-based approach?

A "low level" solution: the generator could produce a pair of a constraint and a template for an elaborated term, sharing mutable placeholders for evidence so that after the constraint is solved, the template can be "solidified" into an elaborated term.

This approach has three stages: generation, solving, solidification; each user construct is dealt with twice -- not very elegant.

Can we describe generation and solidification in a unified manner? Yes: give the user a DSL to express computations which generate constraints and read their solutions simultaneously. The implementation of this is simple, and allows the user to define inference and elaboration in one inductive function.

Conclusion: solution is simple, modular, elegant, performant. Could be used in other settings potentially -- e.g. higher-order pattern unification?

Q: Would this work with lazy evaluation?
A: Not sure.

C: Notation of the algorithm is awkward.
A: Yes, this shows that we need a nicer way to express computations with applicative functors.

Q: How about a constructive proof instead of returning true or false?
A: Possible but then the proof tree would have to be inspected -- would still need to do work twice.

Higher order arrows (with switch) are inherently dynamic, can replace arrow function. Arrows with switch are equivalent to Monad. Makes it harder to optimize.

Why arrows with switch? -- "power start and stop".

Contribution: introduce resettability and non-interfering choice. Get general resettability and arrowized recursion.

Example: IntegralReset. Want to be able to reset it with an event. Easy to do with switch: just replace the integral function. Without switch we can simulate a reset but we can't modify the function itself. Not nice: doesn't generalize.

Resetting state: state is in delays and loops.

General resettability: can take any signal function and transform it into a settable signal function.

Presents "settable laws": identity, uniformity, default.

Other example: pausing an arrow. IntegralWhen, a signal function that performs an integral only under a given condition. Can't do this with arrow choice. Solution: use non-interfering choice.

Causal Commutative Arrows (CCA) can be heavily optimized. Allows choice but not switch. Reduces arrows to one of two forms.

Summary: introduced settability, a new model for controlling FRP state; non-interfering choice allows new forms of expression and arrowized recursion. Switch is really only needed for true higher order expressions!

Q: Is settability like a transistor?
A: Yes.

Q: Are there other circuit design ideas that we could re-use?
A: Not sure. Interesting idea.

Q: Can you use CCA factorization with GHC e.g. using rewrite rules?
A: Unresolved so far... There is a Template Haskell library for doing something like this.

Speedup as a function of memory usage has diminishing returns. Want to find sweetspot on the speedup vs. memory usage curve.

Type-directed translation: extend SML types with Changeable / Stable.

Need a technique to control granularity of the dependency graph. Treat blocks of data as a single reference cell. Fixed block size leads to disproportionately slow update time -- so instead use probabilistic chunking scheme.

Applications and Evaluation: incremental PageRank, graph connectivity, social circles. Very high speedups achieved in prototype implementations.

Future work: parallel self-adjusting computation.

Q: Have you done experiments with larger changes?
A: Yes.

Q: Is it realistic for companies to use 10x more memory in order to speed up computations?
A: It's a trade-off.

Q: What would be the speedup when you don't use additional memory?
A: Zero (?)

Q: This looks like it's intrinsically sequential. How do you plan to parallelize this?
A: It's all about dependency tracking -- dependencies can be used to decide where to parallelize.

Q: How would a non-pure merge sort perform?
A: Need to check.

Q: Can you incrementalize the whole or parts of SML?
A: Interesting question. Non-pure parts of SML would be problematic.

ICFP Contest Presentation by Duncan Coutts and Nicolas Wu

Created by members of the Department of Computer Science at Oxford University and Well-Typed.

Simulator and debugger given -- in the "lingua franca of the internets", JavaScript (actually written in Haskell and the compiled using GHCJS).

Before the contest, pointless hints were tweeted.

Major panic as the git-based site deployment system went haywire just when the contest started, resolved manually. Next 5 hours spent with all hands on deck. Participants could see specification diffs on GitHub.

Judging process: collected and verified all submissions. Had enough resources to run all vs all rather than multiple-round tournament (one team's lambda man against another team's ghosts and the other way round). Ran 200220 games.

Simon's observations: 1. an unequal partnership made this possible 2. good titles are powerful!

Ralf: used inspiration from different fields.

]]>http://www.syslog.cl.cam.ac.uk/2014/09/02/icfp-2014-day-2/feed/0ICFP 2014: Day 1 (another perspective)http://www.syslog.cl.cam.ac.uk/2014/09/01/icfp-2014-day-1-another-perspective/
http://www.syslog.cl.cam.ac.uk/2014/09/01/icfp-2014-day-1-another-perspective/#commentsMon, 01 Sep 2014 09:33:51 +0000http://www.syslog.cl.cam.ac.uk/?p=1773This is day 1 of the 19th International Conference on Functional Programming. Leo and I (Heidi) are here live blogging from Gothenburg, Sweden. He posts is here, we will combine them (with photos :)) in the coming weeks.

the beautiful conference location

Opening

Good morning and time to get started. Official definition of ICPF: its a forum for discussion of researchers. 466 people have registered for the conference or associated events. 21 sponsors giving $60K of sponsorship.

DAPRA, US funding organisation with rotating project management so constant surprises. Clean slate formal methods for proving security for embedded systems. Power, TV, medical devices, phones and vehicles, all have remote security risks with amusing exploits. Car is a computer on wheels, with many ECU's contacted with buses both high and low speed. If you can connect to the network, you can control anything under software control, getting to the network is easier then you'd think e,g using the diagnostic plug under the steering wheel. Almost all the car systems are under software control, include breaks. You can get access remotely by 4 ways: e.g. via bluetooth, allstar, virus to mechanises diagnostic equipment, via music to buffer overfull car software.

The traditional fix: air gaps & obscurity. Air gaps a no longer the case, cars need to be networked e.g. for remote monitoring.
Lots of documentation for this systems nowadays

The desktop approach fix: this approach barely works on desktop. The security issues with security code, approach 1/3 in recent watch list.
Current solutions are millions of LOC, extensive monitoring need extra CPU. Embedded codes goes though lots of review, no "patch every Tuesday" approach.

The new fix: Formal method based approach. Why now ? Lots of tools & infrastructure which is now mature and advances in SAT solves in later 10 years.

composition means how do you get high assurance components and build a high assurance systems ?
Started with Pem testing 4 vehicles, 2 prep and 2 open source. The systems had been engineered to be easy to connect so this wasn't a surprise. Hijacking demonstration of AI drown using aircrack. This can be quite a shock to the user. e.g care driver. Ever 18 months, the red teams will report on how much effort is required to hack. Starting with monolithic quad. They began by upgrading processor and refactor monolithic software, the H/W abstraction layer was a big hit with the community. Gradually swapping out legacy code to high assure code.

How should you do if you know your under a DoS attack ? it depends on context

Its been proved that system is memory safe, ignore malformed messages, ignores non-authenticated and all good messages will reach motor control.

"The most secure UAV on the planet" :)

AADL - language for architecture specification used to propagate system wide properties down until an assurance it reached. A formal systematics haves been added to AADL and provided using resolute tool, assurance cases and synthesis tools. The glue code can be generated, this is typically the weak point.

Galois haven written two DSL's in Haskell, They have created Ivory for C like code and Tower for tasks/connections.

NICTA have written a fully functionally code micro kernel.

Boeing are using the tools build to develop the unmanned little bird. There on track for running real HACKEMS code on a helicopters in 2 years. There's a linux partition for mission specific code, the red team will have full access to this partition.

Q: Are you working on the wrong problem ? We don't have the talented people to make this commonplace, this isn't cool enough
A: Hacking demonstrations get people interested. The US government is considering mandating that all cars are networked, but this is dramatic
security issues, remote code execution

Q: Why is the hardware so constrained ?
A: It is very high scale, there's very small margins

Q: What did this cost, what this research want to do something differ to what management want ?
A; $70 million, we discuss but that why its good to NFS

Q: How much more expensive will high assures systems be ?
A: We don't know yet, that what we are working on. We focus on re-usability, open source, scale to reduce costs.

Q: What's the multi-core story ?
A: Its much more difficult to verify multi-core, bowing want single core, maybe they won't be available soon.

Q:
A: AR Don was started with an existing open source platform

Q: How will this approach scale
A: 80 % quadcoper is high assurance. But we don't even now how many CoL there is in car ? Linux is big, but we can add high assurance code progressively.

Q: How can insure and secure software co-exist ?
A: That's exactly what we are testing with the red team and linux partition.

Q: How do you know that hardware isn't vulnerable ?
A: Our project is big enough, other teams work this question.

Session 1: Domain Specific Languages I (Chair: Anil Madhavapedd)

Challenges: Its an embedded systems, its high real
time system (1 millsec), system needs to be safe and secure.
Constrains: 3 engineers and 18 months, similar projects are 50K CoL.

There are billions of embedded systems, the 1 pence cost sensitivity it very high. Basically choice between C, C or C.
Typical secure issues, difficult to push a patch.

Approach: We would love to use Haskell or OCaml, but GC is not compatible with hard real-time. C or C++ (a subsec), a trade-off between safety and developer productivity.

Build your own tools, starting with a clean state language, correct by construction. We built a embedded DSL compiler in just a few month. The language is embedded in Haskell and compiles to C, its OS and called Ivory. Ivory is safe subset of C with memory & type system, no undefined behaviour. No heap, you can only allocate to the stack or global. We embedded in Haskell so we can utilise its type checker. Haskell is ivory macro language.

Modules are composed to expressions and global, there's a global name space.

Tower DSL - for organising whole applications. Build on top off ivory, to combine procedures into applications.

The quadcopter story - we have almost build all code in ivory. The bulk of the code is building drivers, that was about 10 kloc, the application was only 3 kloc. We needed to operate with existing the messaging format, we mangled some python code to generate ivory. Totally about 48 kloc.

Evaluation:
The code was delivered to the red team, and now no undefined/buffer overflows etc.. There exists some broad system level issues.

This was a large complex application, but we build high assurance code very quickly. Embedding DSL in Haskell allowed us to build compositional programs.

Source code is at ivorylang.org

Q: You and OCaml both target ARM, so why not us OCaml
A: We wanted compatibility with niche systems

Q: Did you run out of memory ? how do you communication this to the programmer ?
A: We don't currently have way to determine maximum stack depth, we are working on it.

Policy language for describing network policies. A quick intro to basic networking, we traditionally we use distributed protocols such as OSPF but now we have a centralized server to coordinate. We us openflow 1.0, each switch is a simple predicate and action table. There isn't just 1 table, different tables for many difference purposes e.g. different layer. Some switches have reconfigurable pipelines. We are now looking at OpenFlow 2.0 to switch into two phrases: configuration (like flashing router) and population.

We have developed a core calculus for pipelines, we have hardware models for 3 switch architectures and add virtual pipelines to our compiler.

The language:

example using the Broadcom trident switch, each table is a function from a packet to a set of packets

The control can connect to many switchers, the programmer can express a virtual pipeline as a minimum for each physical pipeline

example controller
m:(typ, port) + r:(dst MAC, port)

We developed 3 general techniques we designed for "RMT" architecture
1. Consolidate packet duplication
2. Refactor field modification
3. find field size

The syntax for the language, see paper. We are inspired by the frenetic family of languages.

We have added concurrency and thus race conditions, we using typing judgements for read and write permissions on each field.
Our normalisation lemma handles rules such as no overlap between read and write sets.

Q: Your plus operator seems to compile after parallelism
A: Its not compiled away just implicit

Session 2: Static Analysis (Chair: Ken Friis Larse)

SeLINQ: Tracking Information Across Application-Database Boundaries

Speaker: Daniel Schoepe (Chalmers University of Technology)

Motivation: Information flow control for computation & database queries. We given data to many complex application that we don't trust. Communication bounds better application is weak spot, so we consider end-to-end. Our contribution is end-to-end information flow with type checker and compiles to F#.

Typically we interface with SQL database, if we don't do it properly then there's lots if issues. Now people write code to generate inference code with SQL

Information flow code should into lead private data. Non- interference means private data channels shouldn't interact with public data channels. Language-integrated query using a toy language (subset of F#) with includes meta-programming. Security policies and security levels are expressed in the type system. We can then annotate databases columns, e.g. name is public but age is private.

Low equivalence relation is equivalence taking into consideration the new types for security levels. We can then use this relation to encode non-interference.

The type system is based on flowCaml, by Pottier and Simonet, 2003.

Our type checker and compiler implementation is written in Haskell using BNFC, we support ADT's .

Using a movie rental database as an example, we consider addresses as private but cities as public. We can query for all rental from a city with public but if we then try to get addresses, the type checker will reject this.

Q: Do you support more complex labelling schemes ?
A: Yes, we support lattices

Q: Can we classify fields or rows ?
A: No

Q: The exists function you demonstrated doesn't depend on type of elements, does it ?
A: No, but we can express that

A software system is a family of programs with a configuration, making error messages are difficult to read. We start with a program family, select a program and analysis but variational analysis takes a different route. We will look at the variational analysis today (VA):

Session 3: Binding Structure (Chair: Tarmo Uustal)

Programming is hard. Meta programming allows us to program to program, but I can introducing name issues. Alpha equivalent inputs, should produce alpha equivalent outputs. We have developed a binding safe programming language, with notation to express complex binding. Romeo expresses binding information in types.

Example remeo code:
stx = (Lamda (a) (+ a 1))

Romeo automatically renames a (as its not exposed) to makes sure no names clash.

Again, have a look at the paper for cool code examples

Romeo is a binding safe language that support term languages with complex binding structure

Q: In meta programming we want effects, how do deal with e.g. meta programming for loop invariant code motion?
A: Remeo is side-effect free, effects in an meta programming we'll take this offline

Q: Is this online ?
A: Not yet

Q: This is very similar to lamdba M ?
A: This was your inspiration but its not powerful enough as we have syntax cases

Q: Hygienic macro expansion ?
A: We indirectly mentioned it

Q: Thanks for getting starting with an example
A: Thanks

Q: What's the complexity of expansing remeo code ?
A: potentially n^2, we think we could perform the substitutions lazily to improve this

Q: can we represent practical thing ?
A: We don't have a story for possible racket macro's you could want, some are simple

Research focus on correctness of optimisations but there's a lot less work on how we check that the transformation is actually an improvement.
Example of adding an accumulator to th
e List.rev function

The worker wrapper theroms captures induction.

I'm headiing off and direct the reader over to Leo at http://www.syslog.cl.cam.ac.uk/2014/09/01/icfp-2014-day-1/

]]>http://www.syslog.cl.cam.ac.uk/2014/09/01/icfp-2014-day-1-another-perspective/feed/0ICFP 2014: Day 1http://www.syslog.cl.cam.ac.uk/2014/09/01/icfp-2014-day-1/
http://www.syslog.cl.cam.ac.uk/2014/09/01/icfp-2014-day-1/#commentsMon, 01 Sep 2014 08:08:59 +0000http://www.syslog.cl.cam.ac.uk/?p=1771This is day 1 of the 19th International Conference on Functional Programming.

Opening by the General Chair, Johan Jeuring

Quick facts about ICFP 2014:

340 people registered for ICFP

466 people registered for any event

most people are from Sweden, UK and USA

21 sponsors, $60k in sponsoring

Keynote: Using Formal Methods to Enable More Secure Vehicles -- DARPA's HACMS Program

Kathleen Fisher (Tufts University)

Four and a half years long program, 2012--2017.

Many everyday objects today are computers! Infrastructure, medical devices, refrigerators, cars ... examples given for these things being hacked.

Focuses on cars: "computer on wheel" -- 100s of interconnected computing units. Everything is under control of software: braking, accelerating, locking.

First attack vector was via diagnostic bus but this requires physical control (you need to be in the car). Later found that physical control is not necessary to take over a car (Bluetooth, emergency number, audio CD).

Documentation for car systems exist and spare parts can be ordered online and analyzed (so no air gap, no security by obscurity).

Structure of the HACMS program: technical areas are vehicle experts, OS (building on verified sel4 kernel), control systems, research integration (management, composition: how to put high-assurance components together to create a bigger high-assurance component?, integration), red team (hints at problem of how to assess the security of a system -- currently a heuristic is: how long does it take for hackers to take over the system?).

Air team works on helicopter and quadcopter; ground team works on cars.

Demo video: WiFi-controlled quadcopter being taken over using aircrack-ng.

Every 18 months the red team gets access to the system and tries to find vulnerabilities.

Started with monolithic software, no RTOS, no security. After first iteration: use faster processor, put FreeRTOS and HAL under still monolithic Ardupilot software. Next iterations: generate more and more code from new DSL.

SMACCMCopter 18 months assessment: it flies! Air team can prove system-wide security properties like memory safety, ignoring malformed / non-authenticated messages; all good messages received will reach the motor controller. Red team found no security flaws in sex weeks with full access to source code. "Probably the most secure UAV on the planet" ("kinda scary").

Research integration (Rockwell Collins) used AADL to describe how hardware and software components fit together. Developed tools (AGREE, Resolute) to reason about the system. They are used to check assumptions about the system (example: there is only one way to communicate with the quadcopter).

Control systems (Galois) created Ivory and Tower, open-source EDSLs for synthesizing safe low-level code and tasks, respectively. Designed and built SMACCMCopter, the first high-assurance UAV software, in under two engineer-years.

OS (NICTA) proved full functional correctness of a fast OS microkernel, sel4 (now open source). Also working on eChronos, a high-assurance RTOS product line, and formally verified OS components (drivers, file system etc). CAmkES is a system for connecting and configuring these components.

Air team in phase two tries to re-architecture (split) the system so it runs on a flight computer and a mission computer as is usual in real world scenarios. Red team will then try to gain control over the flight computer from the mission computer.

More detailed description of individual tasks follows.

Tech transition: a huge problem. The cyber-physical systems industry know they have a cyber-security problem but there are many barriers to adoption of HACMS-like technology.

C: Make it cool! Come up with a catastrophic event!
A: Well this is a response to "you can take over a car wirelessly". Another aspect: it will probably be mandated soon for cars to take to each other -- but remote code execution was never considered.

Q: Why not have quadcores in the car?
A: Very price-sensitive; the industry has tiny margins.

Q: How much does this project cost?
A: $70m.

Q: Research: top-down or bottom-up, which is better?
A: We need both. DARPA is bottom-up, NSF is top-down.

Q: You expect to pay more for high-assurance code. What is the cost? Especially for the second, third, ... project.
A: High-assurance code will always be more expensive. We hope to bring down the additional cost by developing tools that aid building high-assurance code. But have to consider: traditionally we pay people to write sloppy code; the cost from running sloppy code is never properly accounted for! Maybe high-assurance software is cheaper in the long run.

Q: Why did you replace the Arduino processor?
A: We wanted to start with an existing open source hardware platform.

Q: (bit unclear)
A: We're close to having one high-assurance system, the SMACCMCopter. Still far away from having a high-assurance car for example (one problem: no-one knows how many lines of code a normal car has!)

Q: How do you combine secure and unsecure code?
A: That's one of the challenges of course.

Q: How do you know the hardware does what you think it does?
A: In this project, we simply trusted the hardware. Other people are working on hardware issues.

Embedded systems are everywhere. They're basically computers from the 80s: small and cheap. Development tools are from the 80s too (C). You get all the security flaws you'd expect -- buffer overflows etc. Can't push patches, and there are more attack surfaces than ever.

Alternative: build your own tools. Start with clean slate, use language approach, correct by construction. Built an embedded DSL in just a few months: Ivory, embedded in Haskell, compiles to safe subset of C. It's memory safe and allows no undefined or implementation defined behavior. No heap, only global and stack allocation.

(Side story: a third-party Python tool generating C output was modified to output Ivory code as well; this code could then be type checked to make sure it was correct. This way, the Python tool didn't need to be trusted.)

Red team results: found no buffer overflow, no undefined behaviours, no denial of service. After five months they found two subtle architecture level bugs. Lifted the layer of abstraction: no need to worry about low-level problems, can concentrate on architecture.

Conclusion. Quickly and cheaply built a large, complex application; caught lots of errors early by Haskell type checking, could focus on application design; used Haskell as a macro system making it possible to build compositional programs.

Q: How is this an improvement over the OCaml code generator?
A: Cross-compilation was a specific concern, making sure the code runs on 8-bit, 16-bit, 32-bit architectures.

Q: Restricted amount of memory: how did you deal with that?
A: Don't yet have a way to assess the maximum stack use of a program. But we think it is solvable.

Q: You construct correct-by-construction code that is a subset of C, e.g. you don't do pointer arithmetic.
A: Yes.

Q: Do all arrays have a size known at compile time?
A: Yes.

Q: Type error messages were a problem?
A: Yes.

Q: You don't allow undefined behavior. How do you prevent overflows?
A: This is not done at the type level; instead we insert assertions in the C code (at the moment ~2500 assertions). Hope to analyze and discharge these assertions statically in the future.

Q: How about Rust?
A: Big fan. Rust was in its infancy when the project started two years ago. But when using Haskell we get type checking for our macros.

Review on networking: have an enterprise network with hosts, connected to a switch. In a SDN, the controller tells the switch which rules to use to deal with packets. Openflow 1.0 uses prioritised rules for this. Has predicates and actions. But switch hardware is not that simple: lots of tables in a switch pipeline. New configurable designs available now.

Openflow 2.0: splits interaction between controller and switch into two phases: 1. configuration (think flashing an FPGA); 2. population of the tables.

Intel "flexpipe" supports reconfigurability. Each table can match on and write to every field. Also supports concurrency; introduce concurrency operator to model this.

What about the controller? Easiest for programmer to define virtual pipeline that describes the requirements.

New operator: packet duplication -- copy and run packet through multiple components.

Example: compilation to the barefoot "RMT" architecture -- three passes: 1. consolidate packet duplication (output ports must be known at the time of packet duplication, also need to add metadata to model processing as it would have taken place in the virtual pipeline); 2. refactor field modification; 3. place generated tables into physical tables (use commutativity of concurrency and other tricks -- the algorithm is a combination of brute force and dynamic programming).

Metatheory. Predicates form a Boolean algebra; grammar given for Predicates and Policies. Have a type system and small-step semantics.

Motivation: confidential data is entrusted to many application which are composed of many components. Information leakage often occurs at boundaries, e.g. SQL injections. Need to look at confidentiality as end-to-end security to prevent attaches.

Q: Can I make the security policy express things like "only certain people should be able to access this"
A: No, can't do this at the moment.

Q: Could this be embedded into a computational framework where one could assist the system in showing that some access is in fact safe when the type system's approximation cannot show it?
A: That would be an interesting extension.

Q: Do you deal with aggregation?
A: This has not been considered yet.

Type-Based Parametric Analysis of Program Families

Sheng Chen (presenting), Martin Erwig (Oregon State University)

Can often think of programs as "program families" which by configuration can be refined to single programs. But build & install may fail in unexpected ways. Trying all combinations of options is infeasible.

Idea: use variational analysis directly on the program family / variational program. Introduce a framework for doing variational analysis in the type system.

Q: Have you tried to do effect analysis in this framework?
A: The framework should be expressive enough to do this but it hasn't been done yet.

Session 3: Binding Structure (Chair: Tarmo Uustal)

Romeo: A System for More Flexible Binding-Safe Programming

Paul Stansifer (presenting), Mitchell Wand (Northeastern University)

Introduction: metaprogramming allows us to automatically generate programs. Name binding is a problem -- example given. Recent survey found that 8 out of 9 metaprogramming systems had name binding problems.

Binding safety: alpha-equivalent inputs should produce alpha-equivalent outputs. In theory this works: FreshML, MetaML. But these only support simple binding forms like λ. Real languages have more than λ, e.g. let* in Racket, "local" construct in SML.

Conclusion: Romeo is a binding-safe language that supports term languages with complex binding structure.

Q: What about de Bruijn indices? Effects in the metalanguage?
A: Romeo is side-effect free.

Q: Can I play with Romeo?
A: I'll try to convince my supervisor to release the source code. Currently the implementation is out of sync with the formalism.

Q: This looks similar to λ-m.
A: Lambda-m is not powerful enough. We give you syntax case.

C: Thanks for starting with a concrete example.
A: Thanks to my peers for the criticism!

Q: How does hygienic macro expansion relate to this?
A: If your macro gets larger and larger, there is less safety -- hygienic macros are all about protecting macros from each other.

Q: Complexity of expanding Romeo code?
A: O squared, so that's not good. Macro systems have the same problem though. Trick: perform actual substitution lazily, and we think the same technique could be used with Romeo also.

Q: How much expressiveness does Romeo have?
A: Can we express all Racket macros -- probably, we think most are just simple extensions of what we have right now.

Two subproblems: 1. what is the concrete function called from some computed function call? -- already handled by 0CFA; 2. does the potential inlining point have a compatible environment? -- "reflow", JITs.

Contribution: determine when higher-order inlining is safe. Key insight: don't check "is the environment the same?" but "could any of the free variables have been rebound between the closure capture location and the call site / potential inlining point?". This supersedes ad-hoc higher-order inlining tricks.

Q (SPJ): You showed that the analysis is cheap but what do we get?
A: Benchmarks are in the paper.

Q: If I want to use this: is it clear for me where the optimization will kick in?
A: Well, yes, if you're the compiler writer (laughs from the audience). But this is a problem indeed.

Q: I usually pass closures to "map" and "fold", so these will have many call sites. Can this still be optimized in this framework?
A: "map" and "fold" are usually handled in a special way by the compiler anyway.

Q: Is it imaginable to prove that the optimization actually makes the program run faster?
A: Next presentation will try to answer this. But it's very difficult.

Q: What's the relationship to Milton's optimizations?
A: Not entirely sure. Is defunctionalization or CFA-style analysis better? An interesting question.

Context: lots of research into optimization focuses on correctness: is meaning preserved?; comparatively little work on improvement -- ensuring programs become "better".

Example: list reversal function. How can we prove that the improved program is both faithful to the original version and also more efficient?

The Worker/Wrapper transformation: Original program has type A, worker has type B, wrapper has type B -> A.

Formally, given

f : A -> A, g : B -> B, abs : B -> A, rep : A -> B

and

abs . rep = id, g . rep = rep . f

then

fix f = abs (fix g)

Reasoning about efficiency. Need some way to compare the cost of two programs. Naive approach: count steps taken to evaluate terms to some normal form. But this fails in a lazy context! Need precongruence.

Precongruence: a preorder and a congruence -- this excludes negative interaction between improvements at different points in the program.

Improvement theory: counting steps, also have a notion of improvement within a constant factor.

Structure of improvement proofs is essentially the same as that of correctness proofs.

Conclusions: we can make rigorous arguments about the efficiency of lazy programs; make performance guarantees for a general-purpose optimization.

Further work: structured recursion; turn it into a typed theory (advantage: only need to worry about contexts where the types make sense); look at improvement of space usage and other resources; quantify improvement (how much faster? which optimization should be applied if there are alternatives?); automate it.

Q: Do you have completeness results?
A: It's not clear whether Worker/Wrapper improvement proofs can be translated from fixpoint style to improvement theory.

Q: Do you have any examples where Worker/Wrapper transformation makes things slower?
A: Of course we have come across these, but we wouldn't put that in the paper! (Laughs from the audience). [Tree reversal example follows]

Q: Problem in GHC is the cost of building closures. How is this reflected in your framework?
A: Read early papers on improvement theory.

Q: How about local improvements?
A: This work does not look at the surroundings.

Q: Would things change dramatically if you applied this to a call by value language?
A: There has been some work on that.

Session 5: Context Dependence (Chair: Yukiyoshi Kameyama)

Compositional Semantics for Composable Continuations: From Abortive to Delimited Control

Paul Downen, Zena M. Ariola (University of Oregon)

Classical control: callcc is the classic control operator, going back to Scheme. Corresponds to classical logic (the way the λ-calculus corresponds to intuitionist logic).

Delimited control: delimit the scope of effects; continuations compose like functions. These little changes make delimited control vastly more expressive than classical control.

Classical control in detail. Operational semantics of callcc as an extension of CBV lambda-calculus given. Alternatively, we can understand callcc using an equational theory. But: it's weaker than operational semantics. Some programs can be evaluated to a value but equational theory for callcc cannot reach a value! How can we know that we have the whole context?

Calling a continuation never returns -- it "jumps". A jump is the same when inside a larger evaluation context so it delimits the usable extent of a continuation.

λμ syntactically distinguishes jumps as "commands".

Relaxing the syntax. Λμ collapses term/command distinction. It is interpreted with the same rules only that we have more expressive meta-variables. No new constructs, no new rules. As typed calculi, Λμ is considered equivalent to Parigot's λμ. What happens if you ignore types?

shift and reset: common basis for delimited control. Continuations return and can be composed like functions.

λ-calculus + shift + reset is a true subset of Λμ. What's missing?

Difference between shift and shift0: the latter removes its surrounding delimiter.

λ-calculus + shift0 + reset0 is equivalent to Λμ.

Λμ as a framework for delimited control gives provable observational guarantees about the operators, e.g. idempotency of reset.

More in the paper: can get nice reasoning principles using call-by-value, call-by-name, call-by-need; equational correspondence with compositional transformations.

Q: What do you mean by equality in this presentation?
A: Observational equality -- this is about tools for reasoning about programs rather than expressivity of the calculi.

Q: λμ has a well known type system but to introduce delimited control you had to relax the syntax. Is there a type system for this?
A: (?)

Coeffect calculus for bounded reuse: annotate each variable in the environment with the number of its uses in the expression, essentially tracking the number of variable uses.

Variable, abstraction and contraction rules shown.

Another possibility: coeffect calculus with historical values. Replace addition with max operation in the contraction rule.

Environment related: coeffect calculus for implicit parameters. Example: implicit parameter for time zone and time. We can calculate how many parameters are still required (i.e. not given by the current environment). The corresponding rule is non-deterministic.

What's in the paper? Unified system: coeffect scalar structure (a generalized semiring structure) and shape-indexed coeffects (context splitting and merging, per-variable or whole-context). Also have monadic and comonadic semantics -- not simple as λ-calculus is asymmetric (many to one)!

How to achieve modularity, type-safety and reuse without requiring a PhD in type theory? Bruno Oliveira proposes algebras as an alternative to algebraic datatypes and OO hierarchies, more specifically: variants of F-Algebras. Fold Algebras (using products of functions) and F-Algebras (using sums of products) are isomorphic.

An example of encoding generalized algebraic datatypes (GADTs) using type classes in a way similar to Church encodings due to Hinze from 2003, when there were no GADTs in Haskell.

Similarly, in OO, Church encodings were used to model the visitor pattern.

The Expression Problem: a problem of modularity. Given an existing type of expressions, how do you extend it (e.g. how do you extend data Exp = Lit Int | Add Exp Exp with multiplication)?

Solution 1: encode algebras as type classes, extend using subclasses.

Solution 2: use F-Algebras (Datatypes à la Carte)

In OO languages, can also solve the Expression Problem with algebras: use interface inheritance.

All current solutions using algebras still require heavy encodings. In the future: programming language support for algebras?

WGP Session 2

Generic Constructors and Eliminators from Descriptions

Larry Diehl (presenting), Tim Sheard (Portland State University)

Abstract: Dependently typed languages with an “open” type theory introduce new datatypes using an axiomatic approach. Each new datatype introduces axioms for constructing values of the datatype, and an elimination axiom (which we call the standard eliminator) for consuming such values. In a “closed” type theory a single introduction rule primitive and a single elimination rule primitive can be used for all datatypes, without adding axioms to the theory.

Abstract continued: We review a closed type theory, specified as an A GDA program, that uses descriptions for datatype construction. Descriptions make datatype definitions first class values, but writing programs using such datatypes requires low-level understanding of how the datatypes are encoded in terms of descriptions. In this work we derive constructors and standard eliminators, by defining generic functions parameterized by a description. Our generic type theory constructions are defined as generic wrappers around the closed type theory primitives, which are themselves generic functions in the A GDA model. Thus, we allow users to write programs in the model without understanding the details of the description-based encoding of datatypes, by using open type theory constructions as an internal domain-specific language (IDSL).

We often write very similar functions for structurally similar data types, e.g. "add" for natural numbers and "append" for lists. Natural numbers and lists have the same recursive structure and a coherent property, a "length" attribute:

add (length ml) (length nl) = length (append ml nl)

A simple projection function exists from lists to natural numbers: the "length" function.

The motivation for this paper was to add ornaments to ML. Naturals/lists example: an ornament is defined by a projection function from the ornamented type to the bare type, e.g. "length". Declare an ornament as

ornament from length : a list -> nat

The system checks whether recursive structure is preserved.

Syntactic lifting is how "append" can be derived from "add" almost automatically. Given

Applications of this: for refactoring (see question further down), or to derive an evaluation function (uniquely) from "conv", a bijective function between expression types.

More complex data structures can be lifted too (e.g. Sets and Maps); so can higher-order functions. Lifting for GADTs is also possible and is automatic for some invariants given the expected type of the function.

There are three types of program specialization: partial evaluation, supercompilation and isomorphic specialization. Idea: we have A, B, P such that P : A -> B; also A', B' which are isomorphic to A and B, respectively, and P' : A' -> B' which can be computed more efficiently than P. Potentially there are multiple options for A', B' and P'. How can we use P' on A and B? (A and B: 'domain' data; A', B': 'core' data)

The example used is matrix-vector multiplication. Matrices, for example, can be represented as dense (vector of vectors), This is implemented as an EDSL (embedded domain specific language) in Scala. Here we can abstract over dense and sparse vectors and matrices. The specialization can take place at runtime, and we can allow for alternative specialized versions. Specialization can lead to faster code.

Generic composition of isomorphisms -- idea: build isomorphisms for each constructor in the core language. These isomorphisms are first class citizens of the framework.

How it works: staged method invocation with graphs.

Take-home agenda:

Develop algorithm in EDSL

Identify isomorphic representations in the domain language

Implement ADTs using a core language and isomorphic representations

Generate representation-specific implementations in the core language

Implementation can be found at github.com/scalan.

Q: How did you proceed? Paper includes theory and implementation notes
A: Original idea was to use graphs, that was the starting point

Q: You talk about large performance increases compared to original Scala implementation. Have you compared with C/C++?
A: No.

Examples are given for each monad, arrow, and idiom; the control flow and data flow trees show the dynamic / static nature.

Static computations are useful as they allow optimisations and low-level implementations (arrows correspond approximately to circuits, for example).

Key contribution: using Flow Effects, we can write monad/arrow/idiom code in a single style -- this is very different from Haskell, where different notations exist for each.

A handler is an interpreter for an abstract computation. It is defined as a fold over a computation tree, specifying how return values and operations are interpreted. Given a monad handler, we can derive an arrow and idiom handler.

Q: Can it be inferred from a Flow Effects expression whether the computation it defines is a monad, an arrow or an idiom?
A: Yes.

Q: What's the status of the Flow Effects syntax?
A: Not finalised yet -- the syntax shown in the paper is a "mathematical" syntax with braces, indices etc.

A short discussion followed on whether a unifying notation is a good idea -- maybe it is a good thing that we can tell from the syntax whether a computation is a monad, an arrow or an idiom?

Scoping Rules on a Platter -- A Framework for Understanding and Specifying Name Binding

Name binding: for a given reference (use), where is the corresponding declaration (what it refers to)?

In order to understand and unambiguously define name binding rules, a number of name binding combinators has been developed.

Scope combinators are: hiding, merging, opening, weak hiding.

Together they provide a language that describes how scopes are combined to look up a reference. These are then used to describe how a specific language does name binding. The example is C++: the standard devotes over 60 pages in total to name binding rules (and it turns out both Clang and GCC get operator resolution wrong).

Discussant: Ilya Sergey

Q: Could we use attribute grammars for name binding?
A: Finding a grammar for C++ has been attempted many times... It's hard to take a principled approach and applying to C++.

Q: Did you find inconsistencies in the C++ standard when you tried to collect the scoping rules?
A: No, to our surprise -- it's complex but consistent.

Q: Did you come up with any ideas to simplify the rules along the way?
A: There don't seem to be any ways to simplify the rules (except using a different language).

The subtyping constraint is easily expressed as a type class with an injection function inj and a projection function prj. But this approach treats :+: asymmetrically, doesn't inspect the left-hand side and is ambiguous.

Contribution of this paper: a re-implementation of :<: which behaves as expected (f :<: g <==> there exists a unique injection from f to g), avoids ambiguous subtyping (by rejecting multiple occurrences of signatures) and allows expressing isomorphism.

Implementation: Have a type-level function Embed taking signatures f, g as arguments. Produces proof object p for f :<<: g -- a relaxed version of :<: allowing ambiguity -- and checks whether p proves f :<: g. We can then derive implementations of inj and prj using the proof object as an oracle in instance declarations.

The code for Embed, Sub and helper type classes is shown. It's all contained in the "compdata" package.

With this approach we gain lots of expressive power, e.g. control over how the search proceeds, at the price of more type level programming.

C: A solution for the error messages apparently exists. Also, Idris does this really nicely (it's also much more complicated).

Q: Motivation: want to use sums of products. Interesting tradeoff by maybe focusing on coproducts of products?
A: ?

Q: Would instance chains have solved your problem?
A: Yes.

Q: All this type hackery for deriving injections and projections -- how about just providing them every time?
A: Of course you can do that, but it's inconvenient and obvious from the type. If you need more flexibility, you can still do it manually.

Q: No canonical ordering for injection (eg sum of int and bool) -- is this a problem?
A: Yes, at the moment you have to annotate.

True Sums of Products

Edsko de Vries, Andres Löh (presenting) (Well-Typed LLP)

A new approach to datatype-generic programming in Haskell; notable features: faithful representation of datatypes as n-ary sums of n-ary products; a library of high-level combinators to encourage concise and reusable functions; separation of metadata in the representation.

Motivation was to improve on existing models: GHC.Generics representation is too constrained (binary sums and products) but at the same time too flexible (arbitrary nesting). Also has noisy metadata everywhere. Functions end up making implicit assumptions about the structure of the data which is not guaranteed by the type system.

Discussant: Patrik Janssen

Q: What kind of code does this generate?
A: Not looked at yet. But lots of implicit arguments are potentially being passed around at runtime.

Q: Can this be done better in Agda?
A: Library was completely developed in Haskell. Can probably be done more nicely in Agda, but in this case the development was driven by an actual need to deal with certain generics problems in Haskell.

Q: n-ary lists are left-biased: don't you run into similar performance problems as with binary representation?
A: List-based representation gives uniform encoding for all constructors. But probably not ideal for data types with thousands of constructors.

]]>http://www.syslog.cl.cam.ac.uk/2014/08/31/icfp-2014-day-0-wgp/feed/0Ionel Gog receives Google Fellowship in Distributed Systemshttp://www.syslog.cl.cam.ac.uk/2014/06/19/ionel-gog-receives-google-fellowship-in-distributed-systems/
http://www.syslog.cl.cam.ac.uk/2014/06/19/ionel-gog-receives-google-fellowship-in-distributed-systems/#commentsThu, 19 Jun 2014 14:35:15 +0000http://www.syslog.cl.cam.ac.uk/?p=1736Every year, Google award a number of PhD fellowship in Computer Science. These highly competitive awards support "some of the most outstanding graduate researchers in computer science across the globe" (Google). The SRG is fortunate to have Ionel Gog, a second-year PhD student, receiving the 2014 Google Europe Fellowship in Distributed Systems.

Ionel works on distributed data centre infrastructure for "big data" processing: his most recent project is the Circe workflow manager, which dynamically maps high-level declarative workflow descriptions to a range of different data processing systems -- such as Hadoop, Spark, GraphChi, PowerGraph or Naiad, -- choosing the best system automatically. Ionel also contributes to the R2D2 low-latency network interconnect, the Firmament cluster scheduler and the DIOS operating system as part of the SRG's CamSaS initiative for building a scalable systems software stack.