Category Archives: Programming

A common complaint from a co-worker is not being able to find relevant library functionality. We have libraries that do some tasks well, but if you haven’t used one before, how are you to know it is there? Moreover, how do you find what you are looking for among all of the utility libraries currently loaded?

After seeing Peter Seibel’s Manifest screencast, I was struck by the idea that you could index all the docstrings to provide a powerful search tool. I don’t know about powerful yet, but this idea has turned into at least a search tool: Manifest-Search. It is the product of one day’s hacking and so should not be construed as the be-all-end-all Common Lisp search tool; however, it is at least a step in that direction.

I would like to eventually get this integrated more fully with both quicklisp and manifest, but that is all in the future. I think it would be amazing to search for functionality I need and get documentation for a library I have not yet installed but that is distributed by quicklisp.

In the first released version of access I defined the setf versions as (defun (setf accesses) (new o &rest keys) ...). In order to make this work for plists and alists (where adding a key can result in a new HEAD element), I was forced to return the updated object rather than the “new” value that setf usually returns. I was unhappy with this oddity at the time but didn’t know how to fix it directly (obviously some macrology was in order to capture the “place” being modified).

Today I looked into the docs for define-setf-expander and saw how to transform my code into “correct” setfs. To do this I transformed my previous setf functions into set-access and set-accesses, which return (values new-value possibly-new-object). I then defined my setf expanders in terms of calling those functions and setting the place passed in to possibly-new-object. It took a little while to figure out, and I’m still not entirely sure I wrote the optimal Common Lisp for this. However, I was able to elide the outer setf from expressions like (setf pl (setf (access pl 'one) 'new-val)) in the tests, and now (setf (access pl 'one) 'new-val) returns 'new-val as would be expected.
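To illustrate the technique with a minimal sketch (a hypothetical PL-GET accessor, not access’s actual internals): the helper returns both the new value and the possibly-new list head, and the setf expander writes that head back into the original place while still returning the new value.

```lisp
;; Hypothetical plist accessor used only to demonstrate the
;; define-setf-expander technique described above.
(defun pl-get (plist key)
  (getf plist key))

(defun set-pl-get (new plist key)
  "Returns (values new-value possibly-new-plist)."
  (setf (getf plist key) new)   ; may cons a new head onto PLIST
  (values new plist))

(define-setf-expander pl-get (place key &environment env)
  (multiple-value-bind (temps vals stores store-form access-form)
      (get-setf-expansion place env)
    (let ((key-tmp (gensym "KEY"))
          (new     (gensym "NEW"))
          (val     (gensym "VAL"))
          (obj     (gensym "OBJ")))
      (values
       (append temps (list key-tmp))
       (append vals  (list key))
       (list new)
       ;; store form: update, write the possibly-new head back into
       ;; PLACE, then return the new value as setf callers expect
       `(multiple-value-bind (,val ,obj)
            (set-pl-get ,new ,access-form ,key-tmp)
          (let ((,(first stores) ,obj))
            ,store-form)
          ,val)
       `(pl-get ,access-form ,key-tmp)))))

;; (let ((pl (list :a 1)))
;;   (setf (pl-get pl :b) 2)   ; returns 2, and PL now contains :b
;;   pl)
```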

There were some requests for more, better examples of where access might be useful:

My html components have a plist representing direct html attributes. I update these with (setf (accesses ctl 'attributes 'name) "myFormName") and its corollary (accesses ctl #'attributes :name). Note that both forms work even though one uses a local symbol and one a keyword (they are compared by symbol-name so that I can think about it less). Also, I am ok referring to the attributes function by name or by function object (both will result in calling the attributes function on ctl).

Another example from the web domain: I often store a reference to a database object on the control that is responsible for displaying it. Thus getting the database primary key off of the data for a control can be (accesses client-form 'data 'adwolf-db:accountid). This allows me (where useful) to ignore the difference between an existing database object and a new one that hasn’t been saved yet (for things like putting the id in the url, the difference is irrelevant).

While not currently implemented this way, my group-by library, which groups items into nested alists or hash tables, could potentially use access to handle the different implementations.

Access is a Common Lisp library I just culled out of our immense utility mud ball and refactored into a library all its own. Access provides a single unified api for getting and setting values in common data structures. As such, you can access a specific key from an alist stored in a hashtable stored in the slot of an object as (accesses o 'k1 'k2 'k3). It also supports setting values: (setf (accesses o 'k1 'k2 'k3) "new-val"). Obviously there are some limitations to this approach, but for me, with my coding conventions, I don’t tend to run into them (see the README for details).

Access has removed some of my need for forms like (awhen a (awhen (fn1 it) (fn2 it))), replacing them with (access a 'fn1 'fn2). To me, it allows me to more accurately express what I am trying to do while ignoring the vagaries of shifting implementation details. It also eases setting values in nested objects, because it handles propagating the value up the chain rather than my having to do that myself (i.e. adding a new key-value pair to the front of an alist stored in an object automatically saves the resulting alist back in the object). I don’t expect that this is tasteful coding, but it is easier, and it allows me to not get mired down trying to decide whether I want an alist, plist, hashtable, or object, because the cost to change it later is essentially zero.
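A small sketch of that idea (the data here is hypothetical; as noted above, keys are matched by symbol-name, so :attributes, 'attributes, etc. all work):

```lisp
;; Hypothetical nested plist: an inner plist stored under :attributes.
(defparameter *ctl* (list :attributes (list :name "myFormName")))

;; Read through both layers with one call.
(accesses *ctl* :attributes :name)

;; Setting a missing key conses a new head onto the inner plist;
;; accesses propagates that new head back up into *ctl* for us.
(setf (accesses *ctl* :attributes :id) "form-1")
```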

Performance is rarely an issue in the apps that I tend to write. However, if it were, I would not use access, as it does significant type and dispatch analysis that could be avoided by using the access functions specific to the data structure in use.

A dot syntax familiar to those who use javascript/python/ruby type languages is available as well. This transforms calls like foo.bar.bast into (accesses foo 'bar 'bast). I don’t use this syntax, as I tend to prefer the lisp function-call syntax, but it seems to be an oft requested / discussed feature, and I had fun writing the code.

One of our many WordPress installs was not allowing you to crop images. I tracked this down to the image failing to load, which in turn was caused by an extra \r\n preceding the image content. This extra line break appears when an included php file ends in ?>\r\n. Because php writes any content outside of a php tag to the output stream, this causes an extra newline to precede any other content you might have been trying to send (such as a jpeg image). This can cause all sorts of problems, in this case corrupting the JPEG output.

To fix this problem I investigated how to get grep to search in multiline mode (install pcregrep). I then had the trouble that $ matches end of line rather than end of file. After some googling I found that \z will match end of file, and with that I was off to the races. A pcregrep expression along those lines will find php files with pesky trailing-space issues.
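The exact command did not survive in this copy of the post; the following is my guess at what it likely looked like (the pattern and flags are a reconstruction, not the original):

```shell
# -M enables multiline matching, -r recurses, -l lists matching files.
# \z anchors at end of file, so this finds php files whose closing ?>
# tag is followed by trailing whitespace/newlines before EOF.
pcregrep -rlM '\?>\s+\z' --include='\.php$' .
```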

The offending plugin in my case was an older version of wp-e-commerce (which is not easily upgradeable). After finding all the files with trailing whitespace and removing it, I could now crop images in wordpress again.

In my previous blog post, I discussed how recursive-regex has been maturing, but that it still wasn’t, and was not intended to be, a competitor to actual parser toolkits. I wanted to quantify this assumption and present a head-to-head analysis of using recursive-regex vs using cl-yacc. I chose cl-yacc because I already had a working css3-selector parser implemented. My existing cl-yacc implementation is based on the published CSS3-Selectors Grammar (though modified some to get it to work).

CL-YACC Parser

I found implementing the parser in cl-yacc to be fairly tedious, time
consuming, and error prone, even for such a small language as
css-selectors. It doesn’t help that I tried to do it using the
published css3 grammar and lex files, which are somewhat awkward (e.g.
open parens are in the lexer and close parens are in the grammar).

I wrote a not very well documented (f)lex file reader to read in the
existing CSS3 flex file and build a lexer compatible with cl-yacc.
After getting a valid lexer, I started working with the published
grammar to convert it to a format cl-yacc would approve of. Along the
way I was finding the syntax for cl-yacc pretty cumbersome, so I used
some read-time execution to turn forms like the first below into forms
like the second (macro-expanded) one, which are valid cl-yacc
productions. (This is not a great idea or great code, but it did
simplify the parser definition for me.)
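The original forms did not survive in this copy of the post. As a rough guess at the shape of that read-time helper (the RULE calls visible in the parser definition), it could look something like this, binding each term in a production to a like-named variable in the action lambda:

```lisp
;; A guessed sketch, not the actual css-selectors helper. Evaluated at
;; read time as e.g. #.(rule (:|:| :IDENT) (list :pseudo ident)), it
;; produces the (term ... (lambda ...)) production shape cl-yacc's
;; DEFINE-PARSER expects. The real helper must additionally
;; disambiguate repeated terms (note NTH-SIGN-0 / INTEGER-1 in the
;; grammar below).
(defmacro rule (production &body body)
  (let ((args (mapcar (lambda (term) (intern (symbol-name term)))
                      production)))
    `'(,@production
       (lambda ,args
         (declare (ignorable ,@args))
         ,@body))))
```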

The difficulties I had in the implementation mostly concerned where
white space could appear, in which productions, at which levels of the
grammar. It was quite a task to get an unambiguous grammar while
keeping all my extra, unimportant white space parsable. After I had
finally managed to rid my grammar of reduce/reduce conflicts, I was
able to parse my language, and it seemed like a fairly peppy parser.

The only problem I really have with this implementation is that it
seems like it is totally illegible after the fact. Even knowing about
parsers and having written this one, I don’t feel comfortable
modifying the language. It seems like it would be difficult to get
working again. Thankfully, I don’t anticipate having to do much
language rewriting in this case.

CL-YACC parser definition (lexer not shown)

(yacc:define-parser *css3-selector-parser*
  (:start-symbol selector)
  (:terminals (:|,| :|*| :|)| :|(| :|>| :|+| :|~| :|:| :|[| :|]| :|=| :|-|
               :S :IDENT :HASH :CLASS :STRING :FUNCTION :NTH-FUNCTION
               :INCLUDES :DASHMATCH :BEGINS-WITH :ENDS-WITH :SUBSTRING
               :integer))
  (:precedence ((:left :|)| :s :|,| :|+| :|~|)))
  (selector #.(rule (or-sel) or-sel))
  (or-sel
   #.(rule (comb-sel :|,| spaces or-sel) (list :or comb-sel or-sel))
   #.(rule (comb-sel) comb-sel))
  (comb-sel
   #.(rule (and-sel combinator comb-sel) (list combinator and-sel comb-sel))
   ;; need to handle trailing spaces here to avoid s/r
   #.(rule (and-sel spaces) and-sel))
  (combinator
   (:s (constantly :child))
   (spaces :|>| spaces (constantly :immediate-child))
   (spaces :|~| spaces (constantly :preceded-by))
   (spaces :|+| spaces (constantly :immediatly-preceded-by)))
  (and-sel
   #.(rule (and-sel simple-selector) (list :and and-sel simple-selector))
   #.(rule (simple-selector) simple-selector))
  (simple-selector
   #.(rule (:HASH) `(:hash ,(but-first hash)))
   #.(rule (:CLASS) `(:class ,(but-first class)))
   #.(rule (:IDENT) `(:element ,ident))
   (:|*| (constantly :everything))
   (attrib #'identity)
   (pseudo #'identity))
  (attrib
   #.(rule (:|[| spaces :IDENT spaces :|]|) `(:attribute ,ident))
   #.(rule (:|[| spaces :IDENT spaces attrib-value-def spaces :|]|)
       `(:attribute ,ident ,attrib-value-def)))
  (attrib-value-def
   #.(rule (attrib-match-type attrib-value)
       (list attrib-match-type attrib-value)))
  (attrib-match-type
   #.(rule (:|=|) :equals)
   #.(rule (:includes) :includes)
   #.(rule (:dashmatch) :dashmatch)
   #.(rule (:begins-with) :begins-with)
   #.(rule (:ends-with) :ends-with)
   #.(rule (:substring) :substring))
  (attrib-value
   #.(rule (:ident) ident)
   #.(rule (:string) (but-quotes string)))
  (pseudo
   #.(rule (:|:| :IDENT) (list :pseudo ident))
   #.(rule (:|:| :FUNCTION spaces selector :|)|)
       (list :pseudo (but-last function) selector))
   #.(rule (:|:| :NTH-FUNCTION spaces nth-expr spaces :|)|)
       `(:nth-pseudo ,(but-last nth-function) ,@nth-expr)))
  (nth-expr
   #.(rule (:ident)
       (cond ((string-equal ident "even") (list 2 0))
             ((string-equal ident "odd") (list 2 1))
             (T (error "invalid nth subexpression"))))
   #.(rule (nth-sign :integer)
       (list 0 (if (string-equal nth-sign "-")
                   (* -1 (parse-integer integer))
                   (parse-integer integer))))
   #.(rule (nth-sign :integer :ident)
       (let (extra-num)
         (cond ((string-equal "n" ident) T)
               ;; this is because our lexer will recognize n-1 as a valid
               ;; ident but n+1 will hit the rule below
               ((alexandria:starts-with-subseq "n" ident)
                (setf extra-num (parse-integer (subseq ident 1))))
               (T (error "invalid nth subexpression in (what is ~A)" ident)))
         (list (or (if (string-equal nth-sign "-")
                       (* -1 (parse-integer integer))
                       (parse-integer integer))
                   0)
               (or extra-num 0))))
   #.(rule (nth-sign :integer :ident nth-sign :integer)
       (when (and integer-1 (null nth-sign-1))
         (error "invalid nth subexpression 2n+1 style requires a sign before the second number"))
       (list (or (if (string-equal nth-sign-0 "-")
                     (* -1 (parse-integer integer-0))
                     (parse-integer integer-0))
                 0)
             (or (if (string-equal nth-sign-1 "-")
                     (* -1 (parse-integer integer-1))
                     (parse-integer integer-1))
                 0))))
  (nth-sign
   #.(rule (:|+|) :|+|)
   #.(rule (:|-|) :|-|)
   #.(rule () ()))
  (spaces (:S) ()))

Recursive-Regex Parser

After having written recursive-regex, I wanted a way to beat some of
the bugs out as well as have a good example of what I could use this
tool for, and with what performance characteristics. To accomplish
this task, I converted the CL-YACC grammar and lex file into a single
parser definition and
wrote some code to turn that file into the recursive dispatch functions.

REX: a recursive expression file format based (loosely) on lex

I had a really shoddy lex-ish reader for the existing lex-based lexer.
I converted this to a similar tool (without the C blocks) for defining
named recursive expressions. These files have the .rex extension.
They consist of options, inline definitions, and named productions.
The definitions for css-selectors are in css3.rex.

Once I had my recursive expression definition, getting the parser was
a fairly easy task. Along the way I added some code to minimize the
parse tree results by promoting children when a parent had only a
single child that matched the same length as the parent. I also
improved the parse tracing, so that I could observe and debug what
the expression was doing while it was matching. With the tree
minimization in place, I also had to revise many of my tests.

Performance Numbers

These are the performance numbers for the two parsers, each parsing 6
short inputs 1000 times. Also included is (a version of) that
output. (Recursive-expressions return CLOS object trees, which in
the results below have been converted to a list representation for
ease of viewing.) As you can see, the recursive-expressions version
is ten times slower and uses twenty times the memory.

TL/DR

In conclusion, I am certainly not going to replace my working, fast,
memory-efficient cl-yacc parser with my recursive-expressions parser.
However, if I wanted a working, legible (maybe) parser definition
that matches as I intuitively expect, I might use
recursive-expressions. Because I am so used to using regexes for
matching, if performance were not an issue, I would probably always
prefer the recursive-expressions version. I could also see the
recursive-expressions solution being a nice prototyping tool to help
develop a cl-yacc parser.

Obviously some of these opinions are going to be biased, because
I wrote one of these libraries and not the other.

CL-YACC

Pros

Pretty quick and very memory efficient parser

Easy to customize parser output (each production has a lambda body to build whatever output is necessary)

Theoretically well grounded

Cons

It’s hard to write unambiguous grammars

Not exceedingly helpful with its suggestions for how to fix your ambiguities

Recursive Regex

Pros

Relatively easy to get working

The lexer and parser are the same tool, built on top of cl-ppcre, which
you presumably already know, and which has a good test environment (regex-coach, the repl)

Parses ambiguous grammars

Has reasonable parser tracing built in, so debugging can be somewhat easier

Cons

Not very concerned with parse time or memory consumption

Parses ambiguous grammars

Bad pathological cases (exponential search)

Currently no architecture for modifying the parse tree other than an
after-the-fact rewrite

While this started as a toy to scratch an intellectual itch, I think this project is potentially a nice midpoint between a full-blown parser framework and regular expressions. Grammars are hard to get right, though, so if you are writing your own language you might want to investigate something from the cliki parser generators page (e.g. cl-yacc).

Recursive-Regex is the end result of a weekend of playing with the code I published on Thursday about adding named dispatch functions to CL-PPCRE regular expressions. I kept at it and I think that this approach might have some promise for building up a library of reusable regexp/matcher chunks. I also found that this made it somewhat easier to obtain results from the regular expression search because I get back a full parse tree rather than the bindings typically supplied by CL-PPCRE.

I have it somewhat documented, loadable and testable, with all my current tests passing. There is even a recursive regex csv parser defined in the default dispatch table (mostly as a simple, but practical proof of concept).

A while ago I posted about my adventures playing with CL-PPCRE filter functions. In that post I destructively modified a cl-ppcre parse tree to add a filter function that can match pairs of matched parentheses (a typical example of what regular expressions are NOT capable of). In this post I formalize that example into something that can be more broadly applied with less understanding of the underlying mechanics.

To begin with I define a function create-scanner-with-filters that will handle creating these special scanners for me. My idea is to provide a table of functions that should be called when we see certain strings inside of the regular expression. Because there are already named groups (see *allow-named-registers*) that can have parameters and that CL-PPCRE is already parsing for me, I decided to tie into the named registers to handle my function dispatching. This has the added niceness that whatever your filter matches is going to be stored in a register.

An overview of the process: parse the regex; for any named-register node whose name has a function in the table, replace its third element (usually a regex whose match will be stored in a register) with our specialized filter function; compile the new scanner; and return it to the end user. I also decided that the regex forming the body of the named group should be available to the filter, and in most cases should probably be used as part of the filter function.

If I continue to play with this, I might eventually release it as a library, but for now it stands well on its own.

Without further ado:

(cl-interpol:enable-interpol-syntax)
(declaim (optimize (debug 3)))

;; TODO: group binds in body expressions
;; TODO: propagate current scanner options to body scanners
;; (assumes the iterate library's ITER and an anaphoric IF, e.g. arnesi's AIF)

(defun make-matched-pair-matcher (open-char close-char)
  "Will create a regex filter that can match arbitrary pairs of matched
characters such as (start (other () some) end)"
  (lambda (body-regex)
    (setf body-regex
          (if (eql body-regex :void)
              nil
              (cl-ppcre:create-scanner
               `(:SEQUENCE :START-ANCHOR ,body-regex :END-ANCHOR))))
    (lambda (pos)
      ;; (format T "TEST3 ~A ~A ~%" cl-ppcre::*reg-starts* cl-ppcre::*reg-ends*)
      (iter
        (with fail = nil)
        (with start = pos)
        (with cnt = 0)
        (for c = (char cl-ppcre::*string* pos))
        (if (first-iteration-p)
            (unless (eql c open-char) (return fail))
            ;; went past the string without matching
            (when (>= pos (length cl-ppcre::*string*))
              (return fail)))
        (cond
          ((eql c open-char) (incf cnt))
          ((eql c close-char)
           (decf cnt)
           (when (zerop cnt)
             ;; found our last matching char
             (if (or (null body-regex)
                     (cl-ppcre:scan body-regex cl-ppcre::*string*
                                    :start (+ 1 start) :end pos))
                 (return (+ 1 pos))
                 (return fail)))))
        (incf pos)))))

(defun default-dispatch-table ()
  "Creates a default dispatch table with a parens dispatcher that can
match pairs of parentheses"
  `(("parens" . ,(make-matched-pair-matcher #\( #\)))))

(defun create-scanner-with-filters
    (regex &optional (function-table (default-dispatch-table)))
  "Allows named registers to refer to functions that should be in the
place of the named register"
  (let* ((cl-ppcre:*allow-named-registers* T)
         (p-tree (cl-ppcre:parse-string regex)))
    (labels ((dispatcher? (name)
               "Return the name of the dispatcher from the table if applicable"
               (cdr (assoc name function-table :test #'string-equal)))
             (mutate-tree (tree)
               "Changes the scanner parse tree to include any filter
functions specified in the table"
               (typecase tree
                 (null nil)
                 (atom tree)
                 (list
                  (aif (and (eql :named-register (first tree))
                            (dispatcher? (second tree)))
                       `(:named-register ,(second tree)
                                         (:filter ,(funcall it (third tree))))
                       (iter (for item in tree)
                         (collect (mutate-tree item))))))))
      ;; mutate the regex to contain our matcher functions, then compile it
      (cl-ppcre:create-scanner (mutate-tree p-tree)))))

(defparameter *example-function-phrase*
  "some times I like to \"function (calling all coppers (), another param (), test)\" just to see what happens")

(defun run-examples ()
  "Just runs some examples
expected results:
 ((\"function (calling all coppers (), another param (), test)\"
   #(\"(calling all coppers (), another param (), test)\"))
  (\"function (calling all coppers (), another param (), test)\"
   #(\"(calling all coppers (), another param (), test)\"))
  (NIL))"
  (flet ((doit (regex)
           (multiple-value-list
            (cl-ppcre:scan-to-strings
             (create-scanner-with-filters regex)
             *example-function-phrase*))))
    (list (doit #?r"function\s*(?<parens>)")
          (doit #?r"function\s*(?<parens>([^,]+,)*[^,]+)")
          (doit #?r"function\s*(?<parens>not-matching-at-all)"))))

PS. I don’t claim this is actually worth anything, only that I had fun doing it.

I have quite a few database-driven web applications that make heavy use of tabular imports and exports (from their primary database, other databases, and external data sources, e.g. CSVs). This data structure provides column, row, and cell access for getting and setting values, as well as functionality to create composite data-tables by retrieving and combining subsections of existing data-tables. The library also aims to ease type coercion from strings to Common Lisp types.

I had many scattered, not well tested, not easily runnable pieces of CSV code. I was unhappy with this situation and decided to refactor all of it into a single project. I wrote tests for it, and since I then had a library, I thought I might release it. The project started as extensions and bugfixes to arnesi’s CSV code.

I then looked around and saw that there are other CSV libraries out there that probably accomplish most of what I had set out to do. However, I already had code that was tested, carried a more permissive license (BSD), and provided a framework for interacting with my other libraries and systems, so I figured why not release it anyway.

The only interesting code in this library (to me) is that I managed to make the read/write-csv functions accept a string, pathname, or stream as the first argument and I managed to make sure that streams get closed if these functions created them (file streams for example), but not if the stream was passed in. Nothing great, but I had fun writing it.
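The idiom described above can be sketched like this (a simplified illustration, not the library’s actual internals; the function and argument names are hypothetical):

```lisp
;; Accept a stream, pathname, or string; close only streams we opened.
(defun call-with-csv-input (input fn)
  "Call FN with a character input stream derived from INPUT."
  (etypecase input
    (stream                      ; caller's stream: use it, leave it open
     (funcall fn input))
    (pathname                    ; we opened it, WITH-OPEN-FILE closes it
     (with-open-file (s input :direction :input)
       (funcall fn s)))
    (string                      ; parse directly from an in-memory string
     (with-input-from-string (s input)
       (funcall fn s)))))

;; e.g. (call-with-csv-input "1,2,3" #'read-line)
```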

Other niceties I would like to continue building out in this library are its integrations with related libs (like CLSQL). I have code to handle exporting database queries as CSVs, as well as code to handle importing CSVs into databases both serially and in bulk. I also use data-tables to get a lisp representation of the just-parsed data and to coerce that table of string values into relevant Common Lisp types.