Monthly Archives: August 2011

Recursive-Regex is the end result of a weekend of playing with the code I published on Thursday about adding named dispatch functions to CL-PPCRE regular expressions. I kept at it and I think that this approach might have some promise for building up a library of reusable regexp/matcher chunks. I also found that this made it somewhat easier to obtain results from the regular expression search because I get back a full parse tree rather than the bindings typically supplied by CL-PPCRE.

I have it somewhat documented, loadable and testable, with all my current tests passing. There is even a recursive regex csv parser defined in the default dispatch table (mostly as a simple, but practical proof of concept).

A while ago I posted about my adventures playing with CL-PPCRE filter functions. In the previous blog post I destructively modify a cl-ppcre parse tree to add a filter function that can handle matching matched pairs of parentheses (a typical example of what regular expressions are NOT capable of). In this post I formalize that example into something that could be more broadly applied with less understanding of the underlying mechanics.

To begin with I define a function create-scanner-with-filters that will handle creating these special scanners for me. My idea is to provide a table of functions that should be called when we see certain strings inside of the regular expression. Because there are already named groups (see *allow-named-registers*) that can have parameters and that CL-PPCRE is already parsing for me, I decided to tie into the named registers to handle my function dispatching. This has the added niceness that whatever your filter matches is going to be stored in a register.

An overview of this process: parse the regex; for every named-register node whose name has a function in the table, replace its third element (usually a regex whose match will be stored in a register) with our specialized filter function; then compile the new scanner and return it to the end user. I also decided that the regex that is the body of the named group should be available to the filter, and in most cases it should probably be used as part of the filter function.
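To make the tree rewrite concrete, here is a minimal sketch in plain Common Lisp that does not load CL-PPCRE at all. The tree in `*toy-tree*` is my assumption of roughly what the parser produces for `function\s*(?<parens>)`, and `make-dummy-matcher`, `*toy-table*` and `rewrite-named-registers` are hypothetical names used only for this illustration.

```lisp
;; Hypothetical stand-in for a real matcher factory: it receives the body
;; regex of the named group and returns a closure for a (:filter ...) node.
(defun make-dummy-matcher (body-regex)
  (declare (ignorable body-regex))
  (lambda (pos) pos))

;; Dispatch table: register name -> matcher factory.
(defparameter *toy-table* `(("parens" . ,#'make-dummy-matcher)))

(defun rewrite-named-registers (tree table)
  "Return a copy of TREE in which each (:named-register NAME BODY) whose NAME
is in TABLE becomes (:named-register NAME (:filter FN))."
  (cond ((atom tree) tree)
        ((and (eql (first tree) :named-register)
              (assoc (second tree) table :test #'string-equal))
         (let ((factory (cdr (assoc (second tree) table
                                    :test #'string-equal))))
           `(:named-register ,(second tree)
                             (:filter ,(funcall factory (third tree))))))
        (t (mapcar (lambda (item) (rewrite-named-registers item table))
                   tree))))

;; Assumed shape of the parse tree for "function\\s*(?<parens>)".
(defparameter *toy-tree*
  '(:sequence "function"
    (:greedy-repetition 0 nil :whitespace-char-class)
    (:named-register "parens" :void)))
```

Calling `(rewrite-named-registers *toy-tree* *toy-table*)` leaves the first three elements alone and swaps the `:void` body of the "parens" register for a `(:filter #<function>)` node. Registers whose names are not in the table pass through unchanged.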

If I continue to play with this, I might eventually release it as a library, but for now it stands well on its own.

Without further ado:

(cl-interpol:enable-interpol-syntax)
(declaim (optimize (debug 3)))

;; TODO: group binds in body expressions
;; TODO: propagate current scanner options to body scanners
(defun make-matched-pair-matcher (open-char close-char)
  "Will create a regex filter that can match arbitrary pairs of matched characters such as (start (other () some) end)"
  (lambda (body-regex)
    (setf body-regex
          (if (eql body-regex :void)
              nil
              (cl-ppcre:create-scanner
               `(:SEQUENCE :START-ANCHOR ,body-regex :END-ANCHOR))))
    (lambda (pos)
      ;;(format T "TEST3 ~A ~A ~%" cl-ppcre::*reg-starts* cl-ppcre::*reg-ends*)
      (iter
        (with fail = nil)
        (with start = pos)
        (with cnt = 0)
        ;; only read a character while POS is still inside the string,
        ;; C is NIL once we have run off the end
        (for c = (when (< pos (length cl-ppcre::*string*))
                   (char cl-ppcre::*string* pos)))
        (if (first-iteration-p)
            (unless (eql c open-char) (return fail))
            ;; went past the string without matching
            (when (>= pos (length cl-ppcre::*string*)) (return fail)))
        (cond
          ((eql c open-char) (incf cnt))
          ((eql c close-char)
           (decf cnt)
           (when (zerop cnt)
             ;; found our last matching char
             (if (or (null body-regex)
                     (cl-ppcre:scan body-regex cl-ppcre::*string*
                                    :start (+ 1 start) :end pos))
                 (return (+ 1 pos))
                 (return fail)))))
        (incf pos)))))

(defun default-dispatch-table ()
  "Creates a default dispatch table with a parens dispatcher that can match pairs of parentheses"
  `(("parens" . ,(make-matched-pair-matcher #\( #\)))))

(defun create-scanner-with-filters (regex &optional (function-table (default-dispatch-table)))
  "Allows named registers to refer to functions that should be in the place of the named register"
  (let* ((cl-ppcre:*allow-named-registers* T)
         (p-tree (cl-ppcre:parse-string regex)))
    (labels ((dispatcher? (name)
               "Return the dispatcher function registered under NAME in the table, if any"
               (cdr (assoc name function-table :test #'string-equal)))
             (mutate-tree (tree)
               "Changes the scanner parse tree to include any filter functions specified in the table"
               (typecase tree
                 (null nil)
                 (atom tree)
                 (list
                  (aif (and (eql :named-register (first tree))
                            (dispatcher? (second tree)))
                       ;; keep the register name, swap the body for our filter
                       `(:named-register ,(second tree)
                                         (:filter ,(funcall it (third tree))))
                       (iter (for item in tree)
                         (collect (mutate-tree item))))))))
      ;; mutate the regex to contain our matcher functions
      ;; then compile it
      (cl-ppcre:create-scanner (mutate-tree p-tree)))))

(defparameter *example-function-phrase*
  "some times I like to \"function (calling all coppers (), another param (), test)\" just to see what happens")

(defun run-examples ()
  "Just runs some examples
expected results:
 ((\"function (calling all coppers (), another param (), test)\"
   #(\"(calling all coppers (), another param (), test)\"))
  (\"function (calling all coppers (), another param (), test)\"
   #(\"(calling all coppers (), another param (), test)\"))
  (NIL))
"
  (flet ((doit (regex)
           (multiple-value-list
            (cl-ppcre:scan-to-strings
             (create-scanner-with-filters regex)
             *example-function-phrase*))))
    (list (doit #?r"function\s*(?<parens>)")
          (doit #?r"function\s*(?<parens>([^,]+,)*[^,]+)")
          (doit #?r"function\s*(?<parens>not-matching-at-all)"))))
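The heart of the matcher is just a depth counter, so it can be tried without CL-PPCRE. The following is a minimal standalone sketch of that counting logic using only standard Common Lisp; `matched-pair-end` is a hypothetical helper written for this illustration, not part of the code above. It follows the same contract as a filter function: take a position, return the position just past the match, or NIL on failure.

```lisp
(defun matched-pair-end (string pos &optional (open-char #\() (close-char #\)))
  "Return the index just past the CLOSE-CHAR that balances the OPEN-CHAR at POS,
or NIL if STRING has no OPEN-CHAR at POS or the pair is never closed."
  (when (and (< pos (length string))
             (char= (char string pos) open-char))
    (loop with depth = 0
          for i from pos below (length string)
          for c = (char string i)
          do (cond ((char= c open-char) (incf depth))
                   ((char= c close-char)
                    (decf depth)
                    ;; depth back at zero: this closes the opening char at POS
                    (when (zerop depth)
                      (return (1+ i)))))
          ;; ran off the end of the string without balancing
          finally (return nil))))

;; (matched-pair-end "f (a (b) c) d" 2) ; => 11
;; (matched-pair-end "f (a (b c" 2)     ; => NIL
```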

PS. I don’t claim this is actually worth anything, only that I had fun doing it.

I have quite a few database-driven web applications that make heavy use of tabular imports and exports (from their primary database, other databases, and exterior data sources such as CSVs). The data-table data structure provides column, row, and cell access for getting and setting values, as well as functionality to create composite data-tables by retrieving and combining subsections of existing data-tables. This library also aims to ease type coercion from strings to common-lisp types.

I had many scattered, poorly tested, not easily runnable pieces of CSV code. I was unhappy with this situation, so I decided to refactor all of this into a single project. Once I had written tests for it and had a library, I thought I might release it. This project started as extensions and bugfixes on arnesi’s CSV.

I then looked around and saw there are other CSV libraries out there that probably mostly accomplished what I had set out to do. However, I already had my code that was tested, had an easier license (BSD), and provided a framework to interact with my other libraries and systems, so I figured why not just release it anyway.

The only interesting code in this library (to me) is that I managed to make the read/write-csv functions accept a string, pathname, or stream as the first argument and I managed to make sure that streams get closed if these functions created them (file streams for example), but not if the stream was passed in. Nothing great, but I had fun writing it.
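For readers curious what that looks like, here is a minimal sketch of the idea in standard Common Lisp. It is not the library's code: `call-with-input-source` and `read-all-lines` are hypothetical names, and I am assuming here that a string argument holds the data itself rather than a file name.

```lisp
(defun call-with-input-source (source fn)
  "Call FN with an input stream for SOURCE, which may be a stream, a pathname,
or a string of data. Streams opened here are closed on exit; a stream passed
in by the caller is left open."
  (etypecase source
    ;; caller owns this stream, so we must not close it
    (stream (funcall fn source))
    ;; we open the file, so WITH-OPEN-FILE closes it for us
    (pathname (with-open-file (in source :direction :input)
                (funcall fn in)))
    ;; assumption: a string is the data, not a file name
    (string (with-input-from-string (in source)
              (funcall fn in)))))

(defun read-all-lines (source)
  "Hypothetical consumer: collect every line of SOURCE as a list of strings."
  (call-with-input-source
   source
   (lambda (in)
     (loop for line = (read-line in nil nil)
           while line
           collect line))))
```

The point of routing everything through one function is that the ownership rule lives in a single place: whoever opens the stream closes it.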

Another nicety I would like to continue to build out in this library is its integration with other related libs (like CLSQL). I have code to handle exporting database queries as CSVs as well as code to handle importing CSVs into databases both serially and in bulk. I also use data-tables to have a lisp representation of the just-parsed data-table and to coerce that table of string values into relevant common-lisp types.
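As a rough illustration of that last step, here is a small sketch of coercing freshly parsed string cells into Lisp values. It is only a sketch under my own assumptions, not the data-table API: `coerce-cell` and `coerce-rows` are hypothetical names, the guessing is done per cell, and only blanks and integers are recognized.

```lisp
(defun coerce-cell (string)
  "Guess a Lisp value for one CSV cell: NIL for a blank cell, an integer when
the whole cell parses as one, otherwise the original string."
  (let ((trimmed (string-trim '(#\Space #\Tab) string)))
    (cond ((zerop (length trimmed)) nil)
          ;; PARSE-INTEGER signals an error on junk; treat that as "not a number"
          ((ignore-errors (parse-integer trimmed)))
          (t string))))

(defun coerce-rows (rows)
  "Apply COERCE-CELL to every cell of ROWS, a list of lists of strings."
  (mapcar (lambda (row) (mapcar #'coerce-cell row)) rows))

;; (coerce-rows '(("1" "bob" "") ("-7" "1.5" " 42 ")))
;; => ((1 "bob" NIL) (-7 "1.5" 42))
```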

We use SBCL as our primary Common Lisp Implementation. It is a great runtime, but there is always room for improvement. Nikodemus Siivola is currently fundraising for threading improvements. If you love free, awesome common lisp implementations, please support this project.

A commonly experienced error when using CLSQL in a web environment is database connections conflicting with each other from simultaneous web requests. These problems arise because, by default, clsql standard-db-objects keep a reference to the connection they were queried / created from and reuse this database connection (rather than a new one you may have provided with clsql-sys:with-database). This means that two separate threads could try to use the same database connection (provided through clsql-sys:with-database or by having objects queried from the same connection accessed in multiple threads / http requests).

We solved this problem by introducing a clsql-sys::choose-database-for-instance method (available in the clsql master branch from http://git.b9.com/clsql.git; this branch will eventually be released as CLSQL6). Then in our web applications we define the following class and method override. Usually I then pass this name to clsql-orm or use it as a direct superclass of any of my web def-view-classes. After this, I just use with-database to establish dynamic connection bindings and everything pretty much works out (as these dynamic bindings are not shared across threads).