Details

Description

With data reader support, it's impossible to write a program to read an arbitrary stream of Clojure forms. For example, the following code will fail with the current 1.4.0-beta1 tagged literal support:

#point [0 2]

It might be enough to require that the read side define a reader for point, but what if we do not wish to require that both sides of the conversation know about the #point tag beforehand? Using the identity function as a default allows the reader to handle the literal form as is even when it doesn't recognize a tag (or even care to translate it).

The change from the current behavior is that missing tagged literal parsers are no longer error conditions.

Activity

I'd like to preserve the unknown literal tag as well as the value. That would allow a program to recreate the printed representation. A map would work as the value. You'd get something like: {:unknown-literal point :value [0 2]}. If you needed to pass that on to some other process, you could easily write it in the original literal form. Perhaps the key for literal tag should be namespace qualified to avoid possible confusion with user data. Another benefit of returning a map for an unknown literal tag is that equality tests still seem reasonable: (not= #foo "abc" #bar "abc").

Steve Miner
added a comment - 12/Feb/12 10:30 AM - edited I'd like to preserve the unknown literal tag as well as the value. That would allow a program to recreate the printed representation. A map would work as the value. You'd get something like: {:unknown-literal point :value [0 2]}. If you needed to pass that on to some other process, you could easily write it in the original literal form. Perhaps the key for literal tag should be namespace qualified to avoid possible confusion with user data. Another benefit of returning a map for an unknown literal tag is that equality tests still seem reasonable: (not= #foo "abc" #bar "abc").

It would be convenient if I could handle unknown tags using some sort of catch-all key in *data-readers* (say 'default). The associated function should take two arguments: the tag and the literal value. If there is no 'default key in *data-readers*, then an error would be thrown (same as Clojure 1.4).

I think it's a simple way to allow the programmer to take control without having to add new API or data types. It's just one distinguished key ('default, :default something like that) and one additional line of doc.

Steve Miner
added a comment - 18/Sep/12 11:01 AM - edited It would be convenient if I could handle unknown tags using some sort of catch-all key in *data-readers* (say 'default). The associated function should take two arguments: the tag and the literal value. If there is no 'default key in *data-readers*, then an error would be thrown (same as Clojure 1.4).
I think it's a simple way to allow the programmer to take control without having to add new API or data types. It's just one distinguished key ('default, :default something like that) and one additional line of doc.
I can provide a patch.

This needs to be addressed. We should follow the advice given in the edn docs:

If a reader encounters a tag for which no handler is registered, the implementation can either report an error, call a designated 'unknown element' handler, or create a well-known generic representation that contains both the tag and the tagged element, as it sees fit. Note that the non-error strategies allow for readers which are capable of reading any and all edn, in spite of being unaware of the details of any extensions present.

We can get both of the latter by having a preinstalled default unknown element handler that returns the generic representation. "identity" is out since it loses the tag. Using a map with namespaced keys is somewhat subtle. An TaggedElement record type is also possible.

Issues are - what happens when more than one library tries to supply a default? If the system supplies one, perhaps it's best to only allow dynamic rebinding and not static installation. Or error on conflicting default handlers, or first/last one wins (but first/'last' might be difficult to predict).

Rich Hickey
added a comment - 21/Sep/12 9:17 AM - edited This needs to be addressed. We should follow the advice given in the edn docs:

If a reader encounters a tag for which no handler is registered, the implementation can either report an error, call a designated 'unknown element' handler, or create a well-known generic representation that contains both the tag and the tagged element, as it sees fit. Note that the non-error strategies allow for readers which are capable of reading any and all edn, in spite of being unaware of the details of any extensions present.

We can get both of the latter by having a preinstalled default unknown element handler that returns the generic representation. "identity" is out since it loses the tag. Using a map with namespaced keys is somewhat subtle. An TaggedElement record type is also possible.
Issues are - what happens when more than one library tries to supply a default? If the system supplies one, perhaps it's best to only allow dynamic rebinding and not static installation. Or error on conflicting default handlers, or first/last one wins (but first/'last' might be difficult to predict).

Minimal patch that adds support for a default data reader in *data-readers*. If an unknown tag is read, we look up the 'default reader in *data-readers and call it with two arguments: the tag and the value. By default, there is no default reader and you get an exception as in Clojure 1.4.

Steve Miner
added a comment - 18/Oct/12 8:54 PM Minimal patch that adds support for a default data reader in *data-readers*. If an unknown tag is read, we look up the 'default reader in *data-readers and call it with two arguments: the tag and the value. By default, there is no default reader and you get an exception as in Clojure 1.4.

An alternative patch that establishes a default data reader to handle the case of an unknown tag. The default reader returns a map with metadata to define the :unknown-tag. This preserves the information for the unknown data literal, but keeps a simple and convenient format.

Steve Miner
added a comment - 18/Oct/12 8:57 PM An alternative patch that establishes a default data reader to handle the case of an unknown tag. The default reader returns a map with metadata to define the :unknown-tag. This preserves the information for the unknown data literal, but keeps a simple and convenient format.

I am guessing that you consider these two patches alternatives, not cumulative. I am marking as screened the "default-in-data-readers" patch, which allows users to specify a 'default handler.

The other patch "unknown-data-reader", which specifies a built-in Clojure handler for unknown tags, is not screened. It specifies a default handler that returns a metadata-annotated map. If there is to be a default handler, I think it would need to return something disjoint from data, e.g. a tagged interface of some kind (or at least a namespaced name in the map.)

It would be a great help to have a little discussion along with the patches.

Stuart Halloway
added a comment - 19/Oct/12 5:27 AM Steve,
I am guessing that you consider these two patches alternatives, not cumulative. I am marking as screened the "default-in-data-readers" patch, which allows users to specify a 'default handler.
The other patch "unknown-data-reader", which specifies a built-in Clojure handler for unknown tags, is not screened. It specifies a default handler that returns a metadata-annotated map. If there is to be a default handler, I think it would need to return something disjoint from data, e.g. a tagged interface of some kind (or at least a namespaced name in the map.)
It would be a great help to have a little discussion along with the patches.

Yes, the two patches are separate alternatives. The "default-in-data-readers" patch just adds the special treatment for the 'default key in *data-readers* without providing a system default. This allows the application programmer to provide a catch-all handler for unknown data literal tags. Libraries should be discouraged from setting a 'default handler. Conflicts with the 'default key in data_readers.clj are handled just like other keys so it would be a bad thing for multiple libraries to try to take over the 'default data reader. Libraries can instead provide implementations and let the application programmer do the actual setting of the 'default key.

The second patch ("unknown-data-reader") implements 'default similarly, but also provides for a 'default reader in default-data-readers. My unknown-data-reader returns a map. I found it safer and simpler to deal with keywords instead of symbols for unknown tag – Clojure doesn't like symbols for unknown packages and you never know what you might get in data literal tags. After experimenting with with my original idea of having a map with two keys (essentially :tag and :value), I decided that I preferred the single entry map with the key derived from the tag and the value preserved as read. Adding the metadata to define the :unknown-tag provides enough information for the application programmer to deal with the maps unambiguously. I think the single entry maps are easier to read.

The alternative that I did not pursue would be to use a new record type as the default data type. My second patch could be used as a basis for that approach. We just need to replace my unknown-data-reader function with one that creates a record (TaggedElement).

Steve Miner
added a comment - 19/Oct/12 8:01 AM Yes, the two patches are separate alternatives. The "default-in-data-readers" patch just adds the special treatment for the 'default key in *data-readers* without providing a system default. This allows the application programmer to provide a catch-all handler for unknown data literal tags. Libraries should be discouraged from setting a 'default handler. Conflicts with the 'default key in data_readers.clj are handled just like other keys so it would be a bad thing for multiple libraries to try to take over the 'default data reader. Libraries can instead provide implementations and let the application programmer do the actual setting of the 'default key.
The second patch ("unknown-data-reader") implements 'default similarly, but also provides for a 'default reader in default-data-readers. My unknown-data-reader returns a map. I found it safer and simpler to deal with keywords instead of symbols for unknown tag – Clojure doesn't like symbols for unknown packages and you never know what you might get in data literal tags. After experimenting with with my original idea of having a map with two keys (essentially :tag and :value), I decided that I preferred the single entry map with the key derived from the tag and the value preserved as read. Adding the metadata to define the :unknown-tag provides enough information for the application programmer to deal with the maps unambiguously. I think the single entry maps are easier to read.
The alternative that I did not pursue would be to use a new record type as the default data type. My second patch could be used as a basis for that approach. We just need to replace my unknown-data-reader function with one that creates a record (TaggedElement).

I don't like the 'default entry, especially if it is usually wrong for a library to set it.

Having no bindable var default makes editing data_readers.clj an outstanding chore for everyone.

It is unlikely a single special override in data_readers.clj is suitable for every reading situation, even in the same app.

If there is a known TaggedElement type, then people need only be able to opt out of that.
So if there were a default-tagged-reader function of (tag read-element) that built a TaggedElement (or, alternatively, threw, if people prefer), people could just rebind that in a particular reading context.

There's no perfect default, but in most situations getting unknown data is an error. I personally would default to that, and provide a read-tagged-element fn people could bind in when trying to implement read-anything.

Rich Hickey
added a comment - 19/Oct/12 5:43 PM I don't like the 'default entry, especially if it is usually wrong for a library to set it.
Having no bindable var default makes editing data_readers.clj an outstanding chore for everyone.
It is unlikely a single special override in data_readers.clj is suitable for every reading situation, even in the same app.
If there is a known TaggedElement type, then people need only be able to opt out of that.
So if there were a default-tagged-reader function of (tag read-element) that built a TaggedElement (or, alternatively, threw, if people prefer), people could just rebind that in a particular reading context.
There's no perfect default, but in most situations getting unknown data is an error. I personally would default to that, and provide a read-tagged-element fn people could bind in when trying to implement read-anything.

old patches deleted. This revised patch introduces a var *default-data-reader-fn* which can be used to handle unknown tags. By default it is nil and an exception is thrown for the unknown tag as in Clojure 1.4. If it's non-nil, the function is called with the tag and value. I chose the name so that it contained 'data-reader', which makes it search friendly. I wanted to commit this separately from any attempt to provide a built-in catch-all reader and new record type as that might be more contentious.

Steve Miner
added a comment - 20/Oct/12 10:03 AM old patches deleted. This revised patch introduces a var *default-data-reader-fn* which can be used to handle unknown tags. By default it is nil and an exception is thrown for the unknown tag as in Clojure 1.4. If it's non-nil, the function is called with the tag and value. I chose the name so that it contained 'data-reader', which makes it search friendly. I wanted to commit this separately from any attempt to provide a built-in catch-all reader and new record type as that might be more contentious.

The patch for *default-data-reader-fn* was committed for Clojure 1.5 beta1. The programmer can now specify a catch-all reader, which solves the main issue. However, there is no built-in default reader so unknown tags still cause exceptions to be thrown as in Clojure 1.4.

I think this bug can be closed. Default reader implementations could be provided by a contrib library. Or someone can open a new bug if you want the language to provide a built-in default reader.

Steve Miner
added a comment - 02/Nov/12 9:42 AM - edited The patch for *default-data-reader-fn* was committed for Clojure 1.5 beta1. The programmer can now specify a catch-all reader, which solves the main issue. However, there is no built-in default reader so unknown tags still cause exceptions to be thrown as in Clojure 1.4.
I think this bug can be closed. Default reader implementations could be provided by a contrib library. Or someone can open a new bug if you want the language to provide a built-in default reader.