IDs containing slashes cause problems

Description

Although VUFIND-508 solves problems with most special characters in Solr IDs, slashes still seem to cause issues (in both VuFind 1.x and 2.x). This is the result of standard Apache functionality; it can be worked around by adding "AllowEncodedSlashes on" to the <VirtualHost> section used by VuFind.

Demian Katz
added a comment - 25/Jul/12 1:54 PM
This article explains the cause of this problem:
http://www.jampmark.com/web-scripting/5-solutions-to-url-encoded-slashes-problem-in-apache.html
It sounds like Apache's "AllowEncodedSlashes" directive may be the most straightforward solution.

Demian Katz
added a comment - 27/Jul/12 12:02 PM
Thanks to Thomas Schwaerzler for actually experimenting with this. He found that after adding the "AllowEncodedSlashes NoDecode" directive to his Apache configuration, VuFind would receive the IDs. However, some modifications to web/services/Record/Record.php were necessary to decode the slash:

$_id = str_replace('%2F', '/', $_REQUEST['id']);
if (!($record = $this->db->getRecord($_id))) { // ...

This is a step in the right direction, but it unfortunately sets up the (very unlikely but possible) situation in which VuFind can't distinguish between an ID containing the string "%2F" and an ID containing a slash.

Perhaps there is a way to further refine the Apache and mod_rewrite configuration to make this work properly... it would be nice to find a simple solution that doesn't set up any weird special cases.

Thomas Schwaerzler
added a comment - 19/Oct/12 12:14 PM - edited
Since I had the problem again with another partner, I chose this quite comfortable workaround: replacing the "/" in marc.properties like this:

id = 001, (pattern_map.id_remove_slash), first
# remove first occurrence of "/"
pattern_map.id_remove_slash.pattern_0 = (.+)/(.+)=>$1_$2

Certainly, for multiple occurrences of "/" the expression would have to be modified. To also cover trailing or leading slashes, maybe something like this would be needed:

pattern_map.id_remove_slash.pattern_0 = (.+)?/(.+)?=>$1_$2 # untested
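
As a rough illustration of the caveat about multiple slashes, here is a minimal Python sketch of how a pattern shaped like (.+)/(.+) behaves. It is only an analogue: SolrMarc uses Java regular expressions and may apply pattern maps differently, and the function names are hypothetical. With a greedy first group, a single pass rewrites only one slash, and it is the last one in the ID.

```python
import re


def replace_one_slash(record_id):
    """Python analogue of the pattern (.+)/(.+) => $1_$2 (hypothetical helper)."""
    # The first group is greedy, so it swallows everything up to the LAST slash.
    return re.sub(r"(.+)/(.+)", r"\1_\2", record_id)


def replace_all_slashes(record_id):
    """Replace every slash, including leading and trailing ones."""
    return record_id.replace("/", "_")


print(replace_one_slash("abc/123"))   # abc_123
print(replace_one_slash("a/b/c"))     # a/b_c (one slash is left over)
print(replace_all_slashes("a/b/c"))   # a_b_c
print(replace_all_slashes("/abc/"))   # _abc_
```

Whatever mapping is used at index time, every link or lookup that starts from the original identifier has to apply the same mapping.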

Demian Katz
added a comment - 28/Aug/13 8:34 AM
The 27/Jul/12 comment applies to 1.x code, but I can think of two possible solutions in VuFind 2.x:
Solution a) Add an event to the HathiTrust Solr back-end to decode slashes when requesting records.
Solution b) Customize your indexing rules to translate slashes into %2F at index-time.
I'd still love to find a more seamless solution!

David Maus
added a comment - 28/Aug/13 8:43 AM - edited
"Solution a) Add an event to the HathiTrust Solr back-end to decode slashes when requesting records."
Hm. I don't think this will work. A search event cannot change the function arguments, only the ParamBag.
"This is a step in the right direction, but it unfortunately sets up the (very unlikely but possible) situation that VuFind can't tell between an ID containing the string "%2F" and an ID containing a slash."
An ID containing the literal sequence "%2F" would be encoded as "%252F" as a query parameter, while an ID containing "/" would be encoded as "%2F". In theory there shouldn't be a problem.
Request parameter => urldecode() => --- => urlencode() => Solr => --- => urlencode() => Create Link
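
The encoding argument above can be checked with a small sketch. This uses Python's urllib.parse rather than PHP's urlencode()/urldecode(), and the sample IDs and the helper name are made up for illustration. It only covers the case where the receiving side decodes each value exactly once.

```python
from urllib.parse import quote, unquote


def encode_id(record_id):
    """Percent-encode an ID for use in a URL, escaping "/" and "%" as well."""
    return quote(record_id, safe="")


slash_id = "abc/123"      # hypothetical ID containing a real slash
literal_id = "abc%2F123"  # hypothetical ID containing the literal text "%2F"

encoded_slash = encode_id(slash_id)
encoded_literal = encode_id(literal_id)

print(encoded_slash)    # abc%2F123
print(encoded_literal)  # abc%252F123

# A single decode step recovers each original ID, so the two stay distinct.
print(unquote(encoded_slash) == slash_id)      # True
print(unquote(encoded_literal) == literal_id)  # True
```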

Demian Katz
added a comment - 28/Aug/13 8:56 AM - edited
Regarding the %2F problem I mention, if memory serves, the problem is that "AllowEncodedSlashes NoDecode" means that Apache does not decode slashes but it does decode everything else... so an encoded / remains %2F, but %252F also resolves to "%2F". It is possible that I am mistaken about this -- it is a very confusing situation and it has been several months since I actually ran tests myself -- but that is my current recollection.
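
To make the recollected behavior concrete, here is a minimal Python simulation. It is not Apache code: unescape_keep_slash is a hypothetical helper that decodes every percent-escape except %2F, which is the behavior described above, and it handles ASCII escapes only. Under that assumption, the two different request strings become identical before the application sees them.

```python
import re


def unescape_keep_slash(raw):
    """Decode percent-escapes but leave %2F untouched (simulated behavior)."""
    def decode(match):
        hex_pair = match.group(1)
        if hex_pair.lower() == "2f":
            return match.group(0)        # keep the encoded slash as-is
        return chr(int(hex_pair, 16))    # ASCII-only decoding for this sketch
    return re.sub(r"%([0-9A-Fa-f]{2})", decode, raw)


from_slash_id = unescape_keep_slash("abc%2F123")      # ID was "abc/123"
from_literal_id = unescape_keep_slash("abc%252F123")  # ID was "abc%2F123"

print(from_slash_id)    # abc%2F123
print(from_literal_id)  # abc%2F123

# Replacing "%2F" with "/" afterwards maps both requests to the same ID.
print(from_slash_id.replace("%2F", "/"))    # abc/123
print(from_literal_id.replace("%2F", "/"))  # abc/123
```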

Demian Katz
added a comment - 28/Aug/13 12:55 PM - edited
I've just received a report that VF2 works with "AllowEncodedSlashes on" instead of "AllowEncodedSlashes NoDecode" with no code changes needed. It's possible that the NoDecode solution was only necessary for 1.x because of the more complicated mod_rewrite rules in the old version. I'll have to do some testing of my own, but perhaps we can just put "AllowEncodedSlashes on" in httpd-vufind.conf (if it's allowed at that level) and close the ticket.

EDIT: I tried to test this out, both putting the directive in httpd-vufind.conf and in my top-level Apache configuration, but I got 404 errors in both cases. I will see if I can find out more information on the successful configuration.

David Maus
added a comment - 28/Aug/13 3:58 PM
The AllowEncodedSlashes directive can only be applied in the virtual host or server config context: https://httpd.apache.org/docs/2.2/mod/core.html#AllowEncodedSlashes

Demian Katz
added a comment - 29/Aug/13 9:17 AM
Thanks, David and Joe -- I have confirmed that everything works properly when "AllowEncodedSlashes on" is added within the appropriate <VirtualHost> section of the Apache configuration. This is not something we can resolve from within the VuFind code, but I have added a note to the installation documentation, so I believe that we can now close this issue.