1. A new GHC option (`-XSafe`) enabling a "Safe" dialect of Haskell in which GHC rejects any source code that might produce unsafe effects.

19

1. A new GHC option (`-XSafe`) enabling a "Safe" dialect of Haskell in which GHC rejects any source code that might produce unsafe effects or otherwise subvert the type system.

20

20

21

21

2. Another new GHC option (`-XTrustworthy`) indicating that, though a module may invoke unsafe functions internally, the set of exported symbols cannot be used in an unsafe way. The `-XTrustworthy` option makes a small extension to the syntax of import statements, adding a `safe` keyword:

Roughly speaking, import declarations are either safe or unsafe, while modules are classified as either trusted or untrusted. A safe import declaration causes compilation to fail if the imported module is not trusted. In the Safe dialect, all import declarations are implicitly safe. When compiling with `-XTrustworthy`, import declarations with the `safe` keyword are safe, and those without unsafe. A module can be trusted only if it was compiled with `-XSafe` or `-XTrustworthy`, but there are additional restrictions as discussed below under '''module trust'''.

25

Roughly speaking, import declarations are either safe or unsafe, while modules are classified as either trusted or untrusted. A safe import declaration causes compilation to fail if the imported module is not trusted. In the Safe dialect, all import declarations are implicitly safe. When compiling with `-XTrustworthy`, import declarations with the `safe` keyword are safe, while those without the keyword are unsafe. A module can be trusted only if it was compiled with `-XSafe` or `-XTrustworthy`, but there are additional restrictions as discussed below under '''module trust'''.

26

26

27

27

== Safety Goal ==

28

28

29

As long as no module compiled with `-XTrustworthy` contains a vulnerability, the safe dialect's goal is to guarantee the following properties:

29

As long as no module compiled with `-XTrustworthy` contains a vulnerability, the Safe dialect's goal is to guarantee the following properties:

30

30

31

31

* '''Referential transparency.''' Functions in the Safe dialect must be deterministic. Moreover, evaluating them should have no side effects, and should not halt the program (except by throwing uncaught exceptions or looping forever).

The Safe dialect is intended to be of use for both normal (trusted) and untrusted code. Authors of trusted modules may wish to include `{-# LANGUAGE Safe #-}` pragmas to ensure they do not accidentally invoke unsafe actions (directly or indirectly), or to allow other Safe code to import their modules.

38

38

39

Note that the safe dialect does not prevent use of the symbol `IO`. If an untrusted module exports an `IO` action, that action may have arbitrary side effects. Simply compiling a module with `-XSafe` does not meaningfully restrict the effects of exported `IO` actions. Hence, an application A importing an untrusted but safe module M may safely invoke pure functions from M, but must avoid executing `IO` actions unless some other mechanism has been used to ensure M can only construct `IO` actions that conform to A's security goal. (See `-ultrasafe` below for one such mechanism.)

39

The safe dialect does not prevent use of the symbol `IO`. If an untrusted module exports an `IO` action, that action may have arbitrary side effects, regardless of the `-XSafe` option. Hence, while an application A importing a safe but possibly malicious module M may safely invoke pure functions from M, it must avoid executing `IO` actions exported by M unless some other mechanism ensures those actions conform to A's security goal. (See `-ultrasafe` below for one such mechanism.)

40

40

41

41

== Module trust ==

42

42

43

Recall that a safe import of a module M fails unless M is trusted. M is trusted when two conditions hold:

44

45

1. M is compiled with `-XSafe` or `-XTrustWorthy`, in which case we say M is ''trustable'', and

43

Recall that a safe import of a module M fails unless M is trusted. M is trusted when two conditions hold:

44

45

1. M was compiled with `-XSafe` or `-XTrustWorthy`, in which case we say M is ''trustable'', and

46

46

47

47

2. The modules on which M depends (transitively) for safety all reside in packages that the application trusts.

48

48

49

Determining trust requires two modifications to the way GHC manages modules. First, the interface file format must change to record whether each module is trustable, and if so what its trust dependencies are. Second, we need run-time options to specify which packages are trusted by an application.

50

51

We therefore change the interface file format to record for each module M whether or not M is trustable. If M is trustable, the interface file contains a list of ''trust dependencies'', which are (package, module) pairs on which the current application depends for safety. The trust dependencies of a module are determined as follows:

49

Determining trust requires two modifications to the way GHC manages modules. First, the interface file format must change to record whether each module is trustable, and if so what its trust dependencies are. Second, we need compiler options to specify which packages are trusted by an application.

50

51

We therefore extend the interface file format to record for each module M whether or not M is trustable. If M is trustable, the interface file contains a list of ''trust dependencies'', which are (package, module) pairs on which the current application depends for safety. The trust dependencies of a module are determined as follows:

52

52

53

53

* ''If a module is compiled with `-XTrustworthy`'', then its trust dependencies always include itself, plus the union of trust dependencies of all modules imported with the `safe` keyword.

* If a module was compiled with neither `-XSafe` nor `-XTrustworthy`, or, regardless of these options, if a module contains a `{-# LANGUAGE Untrustworthy #-}` pragma, then the interface file marks the module as not trustable, so there are no trust dependencies.

58

58

59

Currently, in any given run of the compiler, GHC classifies each package as either exposed or hidden. To incorporate trust, we now allow a second bit specifying whether each package is trusted or untrusted. This bit will be controllable by two new options to `ghc-pkg`, `trust` and `distrust`, which are analogous to `expose` and `hide`. (The trust bit is meaningless while a package is hidden, but will stay in the package database for when packages are exposed.)

59

Currently, in any given run of the compiler, GHC classifies each package as either exposed or hidden. To incorporate trust, we add a second bit specifying whether each package is trusted or untrusted. This bit will be controllable by two new options to `ghc-pkg`, `trust` and `distrust`, which are analogous to `expose` and `hide`. (The trust bit is meaningless while a package is hidden, but will stay in the package database for when the package is exposed.)

60

60

61

61

On the command line, several new options control which packages are trusted:

* A number of functions can be used to violate safety. Many of these have names prefixed with `unsafe` (e.g., {{{unsafePerformIO}}}, {{{unsafeIterleaveIO}}}, {{{unsafeCoerce}}} {{{unsafeSTToIO}}}, ...). However, not all unsafe functions fit this pattern. For instance, {{{inlinePerformIO}}} and {{{fromForeignPtr}}} from the `bytestring` package are unsafe.

80

80

81

* Code that defines hand-crafted instances of {{{Typeable}}} can violate safety by causing {{{typeOf}}} to return identical results on two distinct types, then using {{{cast}}} to coerce between the two unsafely. ''Are there other classes? Perhaps {{{Data}}} should also be restricted? Simon says {{{Ix}}} doesn't need to be protected anymore.''

81

* Code that defines hand-crafted instances of {{{Typeable}}} can violate safety by causing {{{typeOf}}} to return identical results on two distinct types, then using {{{cast}}} to coerce between the two unsafely. ''Are there other classes? Perhaps {{{Data}}} should also be restricted? Simon says {{{Ix}}} doesn't need to be protected anymore.''

82

82

83

83

* Certain exposed constructors of otherwise mostly safe data types allow unsafe actions. For instance, the {{{PS}}} constructor of {{{Data.ByteString.ByteString}}} contains a pointer, offset, and length. Code that can see the pointer value can act in a non-deterministic way by depending on the address rather than value of a {{{ByteString}}}. Worse, code that can use {{{PS}}} to construct {{{ByteString}}}s can include bad lengths that will lead to stray pointer references.

* {{{TemplateHaskell}}} is also particularly dangerous, as it can cause side effects even at compilation time.

88

88

89

* The {{{OverlappingInstances}}} extension can be used to violate semantic consistency, because untrusted code can potentially redefine a type instance (by containing a more specific instance definition) in a way that changes the behaviour of code importing the untrusted module.

89

* The {{{OverlappingInstances}}} extension can be used to violate semantic consistency, because malicious code could redefine a type instance (by containing a more specific instance definition) in a way that changes the behaviour of code importing the untrusted module.

* {{{OPTIONS_GHC}}} is dangerous in unfiltered form, as it could potentially make packages trusted that the invoking user doesn't trust.

93

* {{{OPTIONS_GHC}}} is dangerous in unfiltered form. Among other things, it could use `-trust` to trust packages the invoking user doesn't in fact trust.

94

94

95

95

* The {{{StandaloneDeriving}}} extension can be used to violate constructor access control by defining instances of {{{Read}}} and {{{Show}}} to examine and construct data values with inaccessible constructors.

* {{{-XSafe}}} should disallow the {{{TemplateHaskell}}}, {{{StandaloneDeriving}}}, {{{GeneralizedNewtypeDeriving}}}, and {{{CPP}}} language extensions, as well as {{{RULES}}} and {{{SPECIALIZE}}} pragmas.

105

* {{{-XSafe}}} should disallow the {{{TemplateHaskell}}}, {{{StandaloneDeriving}}}, {{{GeneralizedNewtypeDeriving}}}, and {{{CPP}}} language extensions, as well as the {{{RULES}}} and {{{SPECIALIZE}}} pragmas.

106

106

107

107

* Overlapping instance declarations must either all reside in modules compiled without `-XSafe`, or else all reside in the same module. It violates semantic consistency to allow Safe code to change the instance definition associated with a particular type.

* {{{OPTIONS_GHC}}} pragmas will have to be filtered. Some options, (e.g., `-fno-warn-unused-do-bind`) are totally fine, but many others are likely problematic (e.g., {{{-cpp}}}, which provides access to the local file system at compilation time, or {{{-F}}} which allows an arbitrary file to be executed, possibly even one named {{{/afs/}}}... and hence entirely under an attacker's control).

110

110

111

* Libraries will progressively need to be updated to export safe interfaces, which may require moving unsafe functions into separate modules, or adding new {{{ {-# LANGUAGE Safe #-} }}} modules that re-export a safe subset of symbols. Ideally, most modules in widely-used libraries would eventually contain either {{{ {-# LANGUAGE Safe -#} }}} or {{{ {-# LANGUAGE Trustworthy -#} }}} pragmas, except for internal modules or a few modules exporting unsafe symbols. Maybe haddock could add some indicator to make it obvious which modules are safe.

112

113

* The {{{-XTrustworthy}}} command-line option and corresponding pragma do not increase what a module can do--the only effect is to mark the module's interface file as trustable and add the module to its own trust dependency set. If both {{{-XTrustworthy}}} and {{{-XSafe}}} are supplied, then the language is still restricted to the Safe dialect and any unsafe actions will still cause a compilation error. A plausible use for both pragmas simultaneously is to prune the list of trusted modules--for instance, if a module imports a bunch of trusted modules but does not use any of their trusted features, or only uses those features in a very limited way. If the code happens also to be safe, the programmer may want to add {{{-XSafe}}} to catch accidental unsafe actions.

114

115

* The option `{-# LANGUAGE Untrustworthy -#}` is not incompatible with `{-# LANGUAGE Safe -#}`. The former causes the interface file to be marked not trustable, while the latter causes the source code to be confined to the safe dialect. `Untrustworthy` should be used in seemingly safe modules that export constructors that would cause other modules to do unsafe things. (The `PS` constructor discussed above is an example of a dangerous constructor that could potentially be defined in a module that happily compiles with `-XSafe`.)

111

* Libraries will progressively need to be updated to export trustable interfaces, which may require moving unsafe functions into separate modules, or adding new `{-# LANGUAGE Trustworthy #-}` modules that re-export a safe subset of symbols. Ideally, most modules in widely-used libraries will eventually contain either `{-# LANGUAGE Safe -#}` or `{-# LANGUAGE Trustworthy -#}` pragmas, except for internal modules or a few modules exporting unsafe symbols. Maybe haddock can add some indicator to make it obvious which modules are trustable and show the trust dependencies.

112

113

* When `-XSafe` and `-XTrustworthy` are used together, the language is restricted to the Safe dialect. The effect of `-XTrustworthy` is to change the trust dependency set. Specifically, the trust dependency set will include the module itself. However, rather than include the union of trust dependency sets of all imported modules, only dependencies of modules imported with the `safe` keyword are added to the current module's set. A plausible use for both pragmas simultaneously is to prune the list of trusted modules--for instance, if a module imports a bunch of trusted modules but does not use any of their trusted features, or only uses those features in a very limited way. If the code happens also to be safe, the programmer may want to add `-XSafe` to catch accidental unsafe actions.

114

115

* The option `{-# LANGUAGE Untrustworthy -#}` is also not incompatible with `{-# LANGUAGE Safe -#}`. The former causes the interface file to be marked not trustable, while the latter causes the source code to be confined to the Safe dialect. `Untrustworthy` should be used in seemingly safe modules that export constructors that would allow other modules to do unsafe things. (The `PS` constructor discussed above is an example of a dangerous constructor that could potentially be defined in a module that happily compiles with `-XSafe`.)

116

117

== Intended uses ==

118

119

We anticipate the Safe dialect and corresponding options being used in several ways.

120

121

=== Enforcing good programming style ===

122

123

Over-reliance on magic functions such as `unsafePerformIO` or magic symbols such as `#realWorld` can lead to less elegant Haskell code. The Safe dialect formalizes this notion of magic and prohibits its use. Thus, people may encourage their collaborators to use the Safe dialect, except when truly necessary, so as to promote better programming style.

124

125

=== Restricted IO monads ===

126

127

When defining interfaces for possibly malicious plugin modules, the interface can require the plugin to provide a computation in a monad that allows only restricted IO actions. For instance, consider defining an interface for a module `Danger` provided by an untrusted programmer. `Danger` should be allowed to read and write particular files (by name), but should not be able do any other form of IO, even though we don't trust it's author not to try.

128

129

We define the plugin interface so that it requires `Danger` to export a single computation, `Danger.runMe`, of type {{{RIO ()}}}, where `RIO` is a new monad defined as follows:

130

131

{{{

132

-- Either or both of the following pragmas would do

133

{-# LANGUAGE Trustworthy #-}

134

{-# LANGUAGE Safe #-}

135

136

module RIO (RIO(), runRIO, rioReadFile, rioWriteFile) where

137

138

-- Notice that symbol UnsafeRIO is not exported from this module!

139

140

newtype RIO a = UnsafeRIO { runRIO :: IO a }

141

142

instance Monad RIO where

143

return = UnsafeRIO . return

144

(UnsafeRIO m) >>= k = UnsafeRIO $ m >>= runRIO . k

145

146

-- Returns True iff access is allowed to file name

147

pathOK :: FilePath -> IO Bool

148

pathOK file = {- Implement some policy based on file name -}

149

150

rioReadFile :: FilePath -> RIO String

151

rioReadFile file = UnsafeRIO $ do

152

ok <- pathOK file

153

if ok then readFile file else return ""

154

155

rioWriteFile :: FilePath -> String -> RIO ()

156

rioWriteFile file contents = UnsafeRIO $ do

157

ok <- pathOK file

158

if ok then writeFile file contents else return ()

159

}}}

160

161

We compile `Danger` using the `-XSafe` flag. `Danger` can import module `RIO` because `RIO` is marked `Trustworthy`. Thus, `Danger` can make use of the `rioReadFile` and `rioWriteFile` functions to access permitted file names.

162

163

The main application then imports both `RIO` and `Danger`. To run the plugin, it calls {{{RIO.runRIO Danger.runMe}}} within the IO monad. The application is safe in the knowledge that the only IO to ensue will be to files whose paths were approved by the `pathOK` test. We are relying on the fact that the type system and constructor privacy prevent `RIO` computations from executing `IO` actions directly. Only functions with access to privileged symbol `UnsafeRIO` can lift `IO` computations into the `RIO` monad.

164

165

[Note that as shown, RIO could fall victim to TOCTTOU bugs or symbolic links, but the same approach applies to more secure monads.]

166

167

=== Restricted `IO` imports ===

168

169

An alternate approach to sandboxing possibly malicious plugins is to allow the code to execute `IO` actions, but to limit the primitive `IO` actions such code can import. In this case, the plugin module `Danger` must be compiled with `-ultrasafe`. Moreover, it will import a module such as the following:

170

171

{{{

172

{-# LANGUAGE Trustworthy #-}

173

174

module SafeIO (rioReadFile, rioWriteFile

175

, module RestrictedPrelude) where

176

177

import RestrictedPrelude -- Subset of Prelude without IO actions

178

179

-- Returns True iff access is allowed to file name

180

pathOK :: FilePath -> IO Bool

181

pathOK file = {- Implement some policy based on file name -}

182

183

rioReadFile :: FilePath -> IO String

184

rioReadFile file = do

185

ok <- pathOK file

186

if ok then readFile file else return ""

187

188

rioWriteFile :: FilePath -> String -> IO ()

189

rioWriteFile file contents = do

190

ok <- pathOK file

191

if ok then writeFile file contents else return ()

192

}}}

193

194

In this case, the type of `Danger.runMe` will be `IO ()`. However, because `-ultrasafe` implies `-distrust-all-packages`, the only modules `Danger` can import are trustable modules whose entire trust dependency set lies in the current package. Let's say that `SafeIO` and `Danger` are the only two such modules. We then know that the only IO actions `Danger.runMe` can directly execute are `rioReadFile` and `rioWriteFile`.