How to debug solr exceptions/warnings for Alfresco (Part 2)

In this post I dig a bit deeper into how you can debug Solr exceptions and warnings. In part 1 of this blog series we looked briefly at a timeout problem and how to fix it in the general case.

This post will show how to get more information about what could be causing these errors.

We will look at an example where we get the following error in the Solr log. The way of debugging would be the same or similar for other kinds of errors:

2015-04-28 15:02:50,413 WARN [org.alfresco.solr.tracker.CoreTracker] Node index failed and skipped for 8727989 in Tx 9223372036854775807
org.json.JSONException: Unterminated string at character 1866
at org.json.JSONTokener.syntaxError(JSONTokener.java:413)
at org.json.JSONTokener.nextString(JSONTokener.java:244)
at org.json.JSONTokener.nextValue(JSONTokener.java:344)
at org.json.JSONObject.<init>(JSONObject.java:206)
at org.json.JSONTokener.nextValue(JSONTokener.java:347)
at org.json.JSONArray.<init>(JSONArray.java:125)
at org.json.JSONTokener.nextValue(JSONTokener.java:351)
at org.json.JSONObject.<init>(JSONObject.java:206)
at org.alfresco.solr.client.SOLRAPIClient.getNodesMetaData(SOLRAPIClient.java:774)
at org.alfresco.solr.tracker.CoreTracker.indexNode(CoreTracker.java:2376)
at org.alfresco.solr.tracker.CoreTracker.reindexNodes(CoreTracker.java:1057)
at org.alfresco.solr.tracker.CoreTracker.updateIndex(CoreTracker.java:566)
at org.alfresco.solr.tracker.CoreTrackerJob.execute(CoreTrackerJob.java:45)
at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:563)

This warning gives us no real information about what is causing the error, just that it concerns the node with id 8727989 in transaction 9223372036854775807 and that it has something to do with parsing JSON.

So the natural next step is to fetch information about the node, which in this case we will do by manually fetching the JSON the same way Solr does.

If we look at getNodesMetaData in SOLRAPIClient.java, we can see that Solr makes a request to api/solr/metadata to fetch metadata about a node. This returns a JSON string with the metadata for the node.

We want to do the same for this particular failing node, which means that we need to construct a JSON request to http://localhost:8080/alfresco/service/api/solr/metadata.

By default the Solr endpoints are protected by SSL. To make things easier to debug you can disable SSL on Solr by following the guide in the Alfresco Documentation: http://docs.alfresco.com/4.2/tasks/running-without-ssl.html.

I used the Chrome plugin Postman for performing this action, but curl or any other tool capable of making a REST POST request will do.
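If you prefer a script over Postman, the request can be sketched in Python roughly like this. Note the assumptions: the post does not show the request body, so the {"nodeIds": [...]} shape is my guess at what getNodesMetaData sends and should be checked against SOLRAPIClient.java for your Alfresco version. The function names are made up for this example, and the URL assumes SSL has been disabled as described above.

```python
import json
import urllib.request

# Endpoint from this post; assumes SSL is disabled for local debugging.
METADATA_URL = "http://localhost:8080/alfresco/service/api/solr/metadata"


def build_metadata_request(node_id, url=METADATA_URL):
    """Build (but do not send) a POST request for one node's metadata."""
    # ASSUMPTION: the body shape {"nodeIds": [...]} is not shown in the post;
    # verify it against SOLRAPIClient.getNodesMetaData before relying on it.
    body = json.dumps({"nodeIds": [node_id]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_raw_metadata(node_id):
    """Send the request and return the response as raw text.

    The text is deliberately NOT parsed here, because the whole point is
    that the returned JSON may be invalid.
    """
    request = build_metadata_request(node_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_raw_metadata(8727989) against a running repository should give you the same text that Solr tries to parse, which you can then paste into a validator.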

I copy-pasted the returned JSON into a JSON validator (http://jsonformatter.curiousconcept.com/) to see what the validation result was. It complained about an invalid character in the part of the JSON that contains the document name, and what we can spot there is that a UTF-8 character has not been correctly escaped.

Comparing with the file name from the metadata, we see that it should be “Test \u0015 document.pdf”, whereas the parentAssocs part says “Test \x15 document.pdf”, and \x15 is not a valid escape sequence in JSON. Further investigation showed that this is a bug in solr.lib.ftl, which comes from the fact that Freemarker does not translate Unicode characters below 0x20. Quick fix: remove the special character from the filename (which probably should not be there anyway) and reindex the node. The issue has been reported to Alfresco Support and if it becomes a Jira issue I’ll link it in this article.
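The same check can be done locally instead of in an online validator. The following is a minimal sketch using Python's standard json module. The helper name find_json_error and the two sample strings are made up for illustration, modelled on the file names above; they are not the actual response from Alfresco.

```python
import json


def find_json_error(text, context=20):
    """Return None if text is valid JSON, else (position, nearby text)."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        # err.pos is the character offset where the parser gave up.
        start = max(0, err.pos - context)
        return err.pos, text[start:err.pos + context]


# Raw strings, so the backslashes reach the parser exactly as written.
good = r'{"name": "Test \u0015 document.pdf"}'  # valid JSON escape
bad = r'{"name": "Test \x15 document.pdf"}'     # \x is not a JSON escape

print(find_json_error(good))  # valid, so nothing to report
print(find_json_error(bad))   # offset and the text around the bad escape
```

Feeding the raw text from the metadata endpoint into find_json_error points you at the offending offset in the same way as the "Unterminated string at character 1866" message in the stack trace, but with the surrounding text included.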

I made a patch to solr.lib.ftl which fixes this issue, and when running the same REST call again we get valid JSON back from the metadata endpoint.
