haskell - How do I avoid memory problems when writing to file using the Writer monad?

问题描述:

I am building some moderately large DIMACS files, however with the method used below the memory usage is rather large compared to the size of the files generated, and on some of the larger files I need to generate I run in to out of memory problems.

import Control.Monad.State.Strict

import Control.Monad.Writer.Strict

import qualified Data.ByteString.Lazy.Char8 as B

import Control.Monad

import qualified Text.Show.ByteString as BS

import Data.List

main = printDIMACS "test.cnf" test

test = do

xs <- freshs 100000

forM_ (zip xs (tail xs))

(\(x,y) -> addAll [[negate x, negate y],[x,y]])

type Var = Int

type Clause = [Var]

data DIMACSS = DS{

nextFresh :: Int,

numClauses :: Int

} deriving (Show)

type DIMACSM a = StateT DIMACSS (Writer B.ByteString) a

freshs :: Int -> DIMACSM [Var]

freshs i = do

next <- gets nextFresh

let toRet = [next..next+i-1]

modify (\s -> s{nextFresh = next+i})

return toRet

fresh :: DIMACSM Int

fresh = do

i <- gets nextFresh

modify (\s -> s{nextFresh = i+1})

return i

addAll :: [Clause] -> DIMACSM ()

addAll c = do

tell

(B.concat .

intersperse (B.pack " 0\n") .

map (B.unwords . map BS.show) $ c)

tell (B.pack " 0\n")

modify (\s -> s{numClauses = numClauses s + length c})

add h = addAll [h]

printDIMACS :: FilePath -> DIMACSM a -> IO ()

printDIMACS file f = do

writeFile file ""

appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])

B.appendFile file b

where

(s,b) = runWriter (execStateT f (DS 1 0))

i = nextFresh s - 1

j = numClauses s

I would like to keep the monadic building of clauses since it is very handy, but I need to overcome the memory problem. How do I optimize the above program so that it doesn't use too much memory?

网友答案:

If you want good memory behavior, you need to make sure that you write out the clauses as you generate them, instead of collecting them in memory and dumping them as such, either using lazyness or a more explicit approach such as conduits, enumerators, pipes or the like.

The main obstacle to that approach is that the DIMACS format expects the number of clauses and variables in the header. This prevents the naive implementation from being sufficiently lazy. There are two possibilities:

The pragmatic one is to write the clauses first to a temporary location. After that the numbers are known, so you write them to the real file and append the contents of the temporary file.

The prettier approach is possible if the generation of clauses has no side effects (besides the effects offered by your DIMACSM monad) and is sufficiently fast: Run it twice, first throwing away the clauses and just calculating the numbers, print the header line, run the generator again; now printing the clauses.

(This is from my experience with implementing SAT-Britney, where I took the second approach, because it fitted better with other requirements in that context.)

Also, in your code, addAll is not lazy enough: The list c needs to be retained even after writing (in the MonadWriter sense) the clauses. This is another space leak. I suggest you implement add as the primitive operation and then addAll = mapM_ add.

网友答案:

As explained in Joachim Breitner's answer the problem was that DIMACSM was not lazy enough, both because the strict versions of the monads was used and because the number of variables and clauses are needed before the ByteString can be written to the file. The solution is to use the lazy versions of the Monads and execute them twice. It turns out that it is also necessary to have WriterT be the outer monad: