There are multiple SolrRequestParser implementations: FormDataRequestParser, MultipartRequestParser, RawRequestParser, SimpleRequestParser, and StandardRequestParser. By default, Solr uses StandardRequestParser.

In StandardRequestParser.parseParamsAndFillStreams, if the request is a GET or HEAD request, it parses the query string and creates a SolrParams. Please refer to how it parses the query string for details.

If it is an ordinary form POST request, StandardRequestParser.parseParamsAndFillStreams uses FormDataRequestParser to parse the form data and create a SolrParams. The following curl request is handled by FormDataRequestParser:

curl -d "stream.body=<add><doc><field name='contentid'>content1</field></doc></add>&clientId=client123&batchId=1" http://host:port/solr/update

If the data is uploaded as a file, like below:

curl http://host:port/solr/update -F "fieldName=@data.xml"

the fieldName doesn't matter and can be anything. StandardRequestParser.parseParamsAndFillStreams uses MultipartRequestParser, which uses Apache Commons FileUpload to create a fileupload.FileItem and then creates a servlet.FileItemContentStream from it.

How does it determine whether the request is multipart? It calls ServletFileUpload.isMultipartContent(req), which checks whether the content type starts with "multipart/".

For a POST request that is in neither of the formats mentioned above, it uses RawRequestParser, which creates a servlet.HttpRequestContentStream from the request.

Then, in SolrRequestParsers.buildRequestFrom, it reads stream.file, stream.body, and stream.url, and constructs a ContentStreamBase.FileStream, StringStream, or URLStream respectively.
The file that stream.file points to must be local to the Solr server.

Subclasses of ContentStreamBase

HttpRequestContentStream
Wraps an HttpServletRequest as a ContentStream:
public InputStream getStream() throws IOException { return req.getInputStream(); }

FileItemContentStream
Wraps an org.apache.commons.fileupload.FileItem as a ContentStream.

ContentStreamBase.FileStream
ContentStreamBase.URLStream
ContentStreamBase.StringStream
DocumentAnalysisRequestHandlerTest.ByteStream

Using curl to send requests to Solr

curl -d "stream.body=<add><doc><field name=\"id\">id1</field></doc></add>&clientId=client123" http://host:port/solr/update
curl -d "stream.body=<add><commit/></add>&clientId=client123" http://host:port/solr/update

Error:
The value of -d must be enclosed in double quotes, because it contains special characters such as <. Otherwise the shell reports an error:

curl -d stream.body=<add><doc><field name=\"id\">id1</field></doc></add>&clientId=client123 http://host:port/solr/update
< was unexpected at this time.

Inside the stream body, attribute values must be enclosed in escaped double quotes, like \"id\". The following request will fail:

curl -d "stream.body=<add><doc><field name=id>id1</field></doc></add>&clientId=client123" http://host:port/solr/update
org.apache.solr.common.SolrException: Unexpected character 'i' (code 105) in start tag Expected a quote
 at [row,col {unknown-source}]: [1,23]
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character 'i' (code 105) in start tag Expected a quote
 at [row,col {unknown-source}]: [1,23]
 at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)

Correct Usage:
Use "" to enclose the value of -d.
Use \ to escape special characters: " becomes \", and \ becomes \\.

curl -d "stream.body=2,0,1,0,1,\"c:\\\",1,0,\"c:\",0,1,16 %0D%0A 2,0,1,0,1,\"x:\\\",2,0,\"x:\",0,1,16 &separator=,&fieldnames=omiited&literal.id=9000&stream.contentType=text/csv;charset=utf-8&commit=true" http://localhost:8080/solr/update/csv
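The parser selection described earlier can be summarized in a small, Solr-independent sketch. This is a simplified model for illustration only, not Solr's actual code: the class ParserChoice and the enum Kind are hypothetical names, and it assumes that an "ordinary form POST" means the content type application/x-www-form-urlencoded (which is what curl -d sends by default). Methods other than GET, HEAD, and POST are not covered by this note and are reported as NOT_COVERED.

```java
import java.util.Locale;

// Hypothetical, simplified model of the dispatch described in the text; not Solr's code.
final class ParserChoice {
    enum Kind { QUERY_STRING_ONLY, FORM_DATA, MULTIPART, RAW, NOT_COVERED }

    static Kind choose(String method, String contentType) {
        String m = method.toUpperCase(Locale.ROOT);
        String ct = contentType == null ? "" : contentType.toLowerCase(Locale.ROOT).trim();

        // GET/HEAD: only the query string is parsed into parameters.
        if (m.equals("GET") || m.equals("HEAD")) {
            return Kind.QUERY_STRING_ONLY;
        }
        if (!m.equals("POST")) {
            return Kind.NOT_COVERED; // assumption: other methods are outside this sketch
        }
        // Same test as ServletFileUpload.isMultipartContent: prefix "multipart/".
        if (ct.startsWith("multipart/")) {
            return Kind.MULTIPART;   // handled by MultipartRequestParser
        }
        // Assumption: form posts are identified by this content type.
        if (ct.startsWith("application/x-www-form-urlencoded")) {
            return Kind.FORM_DATA;   // handled by FormDataRequestParser
        }
        return Kind.RAW;             // anything else: RawRequestParser
    }
}
```

For example, the curl -F upload above sends a multipart/form-data content type and maps to MULTIPART, while a POST with Content-Type text/xml falls through to RAW.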

Design principle: SolrParams defines common behavior, but doesn't define how it should be implemented, that is, which data structure backs it.

SolrParams.wrapDefaults wraps two SolrParams: the primary one and an additional default one. If get(name) can't find a value in the first, it gets the value from the default SolrParams.

In org.apache.solr.handler.RequestHandlerBase.init(NamedList), the handler reads the params in the defaults section into the variable defaults, the params in the appends section into the variable appends, and the params in the invariants section into the variable invariants.

In org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, SolrQueryResponse), it wraps defaults, appends, and invariants into the SolrParams of the request:
SolrPluginUtils.setDefaults(req,defaults,appends,invariants);

As a result, by the time org.apache.solr.handler.RequestHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse) runs, the params from the request are already combined with the defaults, appends, and invariants params configured for the request handler in solrconfig.xml.
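The fallback lookup behind wrapDefaults can be illustrated without Solr on the classpath. The sketch below is a stand-in, not Solr's implementation: LayeredParamsDemo, Params, and effective are hypothetical names, parameters are single-valued strings, and the appends section is left out for brevity. It assumes the usual precedence in which invariants win over request params, which in turn win over defaults.

```java
import java.util.Map;

// Minimal stand-in for the layered lookup; not Solr's SolrParams.
final class LayeredParamsDemo {
    /** Read-only parameter source (hypothetical, single-valued). */
    interface Params {
        String get(String name);
    }

    static Params of(Map<String, String> values) {
        return values::get;
    }

    /** Look in params first; fall back to defaults when no value is found. */
    static Params wrapDefaults(Params params, Params defaults) {
        return name -> {
            String value = params.get(name);
            return value != null ? value : defaults.get(name);
        };
    }

    /** Assumed precedence: invariants, then request params, then defaults. */
    static Params effective(Params request, Params defaults, Params invariants) {
        return wrapDefaults(invariants, wrapDefaults(request, defaults));
    }
}
```

With request params {rows=5}, defaults {rows=10, df=text}, and invariants {wt=xml}, looking up rows yields the request value, df falls back to the default, and wt comes from the invariants even if the request tries to override it.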

The SolrParams in RequestHandler.handleRequestBody is a DefaultSolrParams; through it, you can get the key/value pairs configured in solrconfig.xml.

NamedList
In the NamedList passed to the RequestHandler.init() method, if you want to access a value defined in the defaults section, you can:
1. Run super.init(args);, which reads the defaults section into the SolrParams variable defaults; then call defaults.get("update.contentType");
2. Or run NamedList<Object> defaultNl = (NamedList<Object>) args.get("defaults"); and then read defaultNl.

If you run SolrParams params = SolrParams.toSolrParams(args); the result is a MapSolrParams, not a DefaultSolrParams. This means it doesn't merge the top-level configuration, defaults, appends, and invariants into one SolrParams, so params.get("update.contentType"); returns null.

Change SolrParams in a request
ModifiableSolrParams newParams = new ModifiableSolrParams(req.getParams());
req.setParams(newParams);

Set Content Type
1. Set the default content stream type for a request handler in solrconfig.xml:
<lst name="defaults"><str name="update.contentType">application/xml</str></lst>
2. Set the default content stream type for a request handler in code, in the init(NamedList) method of the request handler:
setAssumeContentType("application/xml");
3. Set the content stream type in the URL: &stream.contentType=application/xml

How SolrParams is created:

At the Chinese New Year celebration party at our company, the host gave us a word puzzle: given 5 characters, S, N, A, K, E (as this year is the Year of the Snake), write down all words that are composed only of these 5 characters; each character can occur zero or more times. This is a fun algorithm question, and it can be solved using a Trie, as described below.

We read words from a dictionary file and build a Trie. To get all words composed of the candidate characters, we traverse the Trie in depth-first order: for each valid character in the first layer, we iterate over all valid characters in the second layer, and so on.

When constructing this Trie:
If the Trie is going to be searched multiple times for different candidate characters, we can insert all words into it.
If we only answer this question once, we only insert words that are composed solely of the candidate characters.

You can review the complete code on GitHub.
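As a minimal sketch of the idea above (not the author's GitHub code): the hypothetical class CandidateTrie stores all words, and wordsFrom walks the Trie depth-first, descending only into children whose character is one of the candidates. It assumes the words and the candidate string use the same letter case; a TreeMap keeps the children sorted so the output order is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a Trie searched with a restricted alphabet.
final class CandidateTrie {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    /** Returns every stored word made up only of characters in candidates. */
    List<String> wordsFrom(String candidates) {
        List<String> result = new ArrayList<>();
        collect(root, candidates, new StringBuilder(), result);
        return result;
    }

    private static void collect(Node node, String candidates,
                                StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> entry : node.children.entrySet()) {
            char c = entry.getKey();
            if (candidates.indexOf(c) < 0) {
                continue; // prune: this branch uses a character we may not use
            }
            path.append(c);
            collect(entry.getValue(), candidates, path, out);
            path.setLength(path.length() - 1); // backtrack
        }
    }
}
```

After inserting "snake", "sneak", "seen", "as", "ask", "task", and "apple", calling wordsFrom("snake") returns [as, ask, seen, snake, sneak]; "task" and "apple" are pruned at the first character that is not a candidate.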

Radix/PATRICIA Trie
A Radix/PATRICIA Trie is a space-optimized trie data structure in which each node with only one child is merged with its child. This makes it much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes.
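A compact sketch of such a trie is shown below. It is an illustration written for this note, not a reference implementation: the class RadixTrie and its members are hypothetical names, edges carry whole substrings instead of single characters, and only insert and contains are supported. When a new word shares only part of an edge label, the edge is split at the end of the common prefix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal radix trie: edges are labeled with substrings.
final class RadixTrie {
    private static final class Node {
        final Map<Character, Edge> edges = new HashMap<>(); // keyed by first char of label
        boolean isWord;
    }

    private static final class Edge {
        String label;
        Node target;

        Edge(String label, Node target) {
            this.label = label;
            this.target = target;
        }
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        String rest = word;
        while (!rest.isEmpty()) {
            Edge edge = node.edges.get(rest.charAt(0));
            if (edge == null) {
                // No edge starts with this character: hang the remainder as one edge.
                Node leaf = new Node();
                leaf.isWord = true;
                node.edges.put(rest.charAt(0), new Edge(rest, leaf));
                return;
            }
            int common = commonPrefixLength(rest, edge.label);
            if (common < edge.label.length()) {
                // Split the edge: the shared part stays, the tail moves below a new node.
                Node middle = new Node();
                String tail = edge.label.substring(common);
                middle.edges.put(tail.charAt(0), new Edge(tail, edge.target));
                edge.label = edge.label.substring(0, common);
                edge.target = middle;
            }
            node = edge.target;
            rest = rest.substring(common);
        }
        node.isWord = true;
    }

    boolean contains(String word) {
        Node node = root;
        String rest = word;
        while (!rest.isEmpty()) {
            Edge edge = node.edges.get(rest.charAt(0));
            if (edge == null || !rest.startsWith(edge.label)) {
                return false;
            }
            node = edge.target;
            rest = rest.substring(edge.label.length());
        }
        return node.isWord;
    }

    private static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }
}
```

Inserting "snake" and then "snack" leaves a single edge "sna" from the root, followed by the two edges "ke" and "ck", instead of one node per character.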

Recently I have been trying to package embedded Jetty, solr.war, and solr.home in one package, and to start and shut down the embedded Jetty server dynamically.

For how to package embedded Jetty, solr.war, and solr.home in one package and reduce its size, please refer to:
Part 1: Shrink Solr Application Size
Part 2: Use Proguard to Shrink Solr Application Size
Part 3: Use Pack200 to Shrink Solr Application Size

This article talks about how to start and shut down the embedded Jetty server dynamically. There are several things we need to consider:

1. How to make sure only one instance is running?
For each unzipped package, the user should only run it once; if the user clicks run.bat again, it should report that the application is already running.
I found this article; the basic idea is to create a lock file and lock it while the application is running. When the application ends or is closed, release the lock and delete the lock file. If we are not able to create and lock the file, it means the application is already running.

2. Dynamic Port
If the port the user gives is not available, perhaps because it is already occupied by another application, we try to find a free port in a range.
How to check whether a port is available? We can try to create a ServerSocket bound to that port: if it throws an exception, the port is already occupied; if not, it is a free port.
ServerSocket socket = new ServerSocket(port);

3. How to start and shut down embedded Jetty
To make sure Jetty binds exclusively to the port, we need to create a SelectChannelConnector and call setReuseAddress(false).
To make the Jetty server shut itself down on a valid request, add a ShutdownHandler with SHUTDOWN_PASSWORD. Please refer to the article.
Later, users can call http://host:port/shutdown?token=SHUTDOWN_PASSWORD&_exitJvm=true; _exitJvm=true makes the JVM exit as well, so the application ends.
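The first two points (single instance and port probing) can be sketched with the JDK alone. The class StartupGuards and its method names are hypothetical, and the Jetty wiring itself is not shown here. tryAcquire returns null when the lock file is already locked, and findFreePort tries the preferred port first and then scans an inclusive range, returning -1 if nothing is free. Note that a port found free by probing can still be taken by another process before the server binds to it.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helpers for the single-instance check and the port probe.
final class StartupGuards {
    /** Tries to lock lockFile exclusively; returns null if it is already locked. */
    static FileLock tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(
                lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = channel.tryLock(); // null: held by another process
            if (lock == null) {
                channel.close();
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            // Already locked inside this JVM.
            channel.close();
            return null;
        }
    }

    /** A port counts as free if a ServerSocket can be bound to it right now. */
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Returns preferred if free, else the first free port in [rangeStart, rangeEnd], else -1. */
    static int findFreePort(int preferred, int rangeStart, int rangeEnd) {
        if (isPortFree(preferred)) {
            return preferred;
        }
        for (int port = rangeStart; port <= rangeEnd; port++) {
            if (isPortFree(port)) {
                return port;
            }
        }
        return -1;
    }
}
```

The application would keep the returned FileLock for its whole lifetime and, on shutdown, close the lock's channel and delete the lock file. A null result from tryAcquire is the cue to report that the application is already running.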

When writing code, I have made many simple mistakes, so I am writing them down here to remind myself not to make the same mistakes again.

Copy&Paste is evil.
Most of the time, the problem happens when I copy and paste code, change it, but forget to change some places. Take time to check and review the code before compiling or running tests. This can save a lot of time.

Boolean condition
if(!valid) or if(valid).
if(a.equals(b)) or if(!a.equals(b))
if(map.isEmpty()) or if(!map.isEmpty())

Use &&, not &
str != null & str.equalsIgnoreCase("true")
This throws an NPE when str is null, because & evaluates both operands.

Forget else statement.
Think about what should be done in the else branch.

Forget default in switch

NullPointerException
Use Optional to avoid NPE.
Use NPE-safe methods, like Objects.equals, CollectionUtils.isEmpty, StringUtils.equals etc.
Forget to initialize a variable, especially an instance variable.
Check for null, and handle the case.
NPE when unboxing:
int value = Long or (Long)obj; the Long or obj may be null.
Float maxScore = null;
maxScore = docList.maxScore(); // if we use float here, it may throw NullPointerException
Check whether the collection is null before using a for-loop or iterator.
for(String str: strList)

Forget to shut down the thread pool and wait for it to finish
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.MINUTES);
Where to put executor.shutdown() or server.shutdown()
We have to wait until all tasks are done or submitted.

Add object more than one time
if (obj != null && (Long) obj == 0) { sortedNL.add(label, queryValue); queryValue = new NamedList<Object>();}
sortedNL.add(label, queryValue == null ? new NamedList<Object>() : queryValue);

Forget to check for preconditions, null pointers - Defensive programming
Defensive programming teaches us to check method arguments explicitly whenever we are in doubt.

When to call super.method()
Understand when we should call super.method and why.
In MyUpdateRequestHandler.init(NamedList),

In this case, I have to call super.init(args) last, because the init method in the parent calls createDefaultLoaders, and my subclass overrides createDefaultLoaders, which needs the parameter clientIdParamName. If I call super.init(args) first, clientIdParamName would still be null in createDefaultLoaders, which is not expected.

Map Key
Once you put a key/value pair in a hash map, you should not change the key, ever, in any way that changes its hash code. If the key is changed so that it generates a new hash code, you will not be able to locate the correct bucket in the HashMap that contains the key/value pair.

Throw exceptions to signal exceptional conditions instead of using null flags

Memory Leak
- A non-static inner or anonymous class holds a reference to the outer class.
- A lambda creates an implicit reference only when it uses some method or field from the enclosing class.
- Use a static inner class + WeakReference to the rescue.

Custom Map Key or Set object
- Check whether hashCode and equals are implemented correctly.

TreeMap/TreeSet
- Check whether the compareTo method is implemented correctly.

Mutable key or object in Map or Set
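The super.init(args) ordering problem described above can be reproduced without Solr. In the sketch below, BaseHandler and MyHandler are hypothetical stand-ins for the real request handler classes, and the superFirst flag exists only so that both orderings can be compared side by side.

```java
import java.util.Map;

// Hypothetical stand-ins showing why the position of super.init(args) matters.
final class InitOrderDemo {
    static class BaseHandler {
        String loaders;

        void init(Map<String, String> args) {
            // The parent calls an overridable method during initialization.
            loaders = createDefaultLoaders();
        }

        String createDefaultLoaders() {
            return "default";
        }
    }

    static class MyHandler extends BaseHandler {
        private final boolean superFirst;
        String clientIdParamName;

        MyHandler(boolean superFirst) {
            this.superFirst = superFirst;
        }

        @Override
        void init(Map<String, String> args) {
            if (superFirst) {
                super.init(args); // too early: clientIdParamName is still null
            }
            clientIdParamName = args.get("clientIdParamName");
            if (!superFirst) {
                super.init(args); // the override now sees the configured value
            }
        }

        @Override
        String createDefaultLoaders() {
            return "loader(" + clientIdParamName + ")";
        }
    }
}
```

With args containing clientIdParamName=clientId, calling super.init first leaves loaders as "loader(null)", while calling it last produces "loader(clientId)".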

Command
http://ss64.com/nt/

type filename

Sleep some time
sleep 5
timeout 5
These 2 commands are not available on every Windows machine; in practice, we can use ping to cause a delay.
ping 1.1.1.1 -n 1 -w 1000 >NUL 2>NUL

start command: runs a command in a separate window.
call: calls one batch program from another without stopping the parent batch program.

find
find [/v] [/c] [/n] [/i] "string" [[Drive:][Path]FileName[...]]
/v (reverse), /c (count), /n (show line number), /i (case-insensitive)
for %f in (*.bat) do find "PROMPT" %f
dir c:\ /s /b | find "CPU"

sort
/r, /+n

more
+n : Displays first file beginning at the line specified by n.
/c, /s
The following commands are accepted at the more prompt:
SPACEBAR  Display next page
ENTER  Display next line
f  Display next file
=  Show line number
p n  Display next n lines
s n  Skip next n lines
q, ?

Windows Batch

Read until there are 3 lines in file result
@ECHO OFF
rem Delayed expansion (!x!) is needed to read x inside the parenthesized block
setlocal EnableDelayedExpansion
:readFileLoop
if exist result (
  set /a "x = 0"
  for /F "tokens=*" %%L in (result) do set /a "x = x + 1"
  if "!x!" EQU "3" ( goto :end )
) > NUL 2>&1
goto :readFileLoop
:end
@ECHO ON

Using batch files

If statement
CompareOp : EQU, NEQ, LSS, LEQ, GTR, GEQ

For - for /?
for {%variable|%%variable} in (set) do command [CommandLineOptions]
Use %variable to carry out for from the command prompt. Use %%variable to carry out the for command within a batch file.
Directories only: for /D
Recursive: for /R
Iterating a range of values: for /L
Iterating and file parsing: for /F
eol=c  Specifies an end of line character (just one character).
skip=n  Specifies the number of lines to skip at the beginning of the file.
delims=xxx
tokens=x,y,m-n
for /F "eol=; tokens=2,3* delims=, " %i in (myfile.txt) do @echo %i %j %k
This command parses each line in myfile.txt, ignoring lines that begin with a semicolon and passing the second and third token from each line to the FOR body (tokens are delimited by commas or spaces). The body of the FOR statement references %i to get the second token, %j to get the third token, and %k to get all of the remaining tokens.

Using batch parameters
%0-%9
When you use batch parameters in a batch file, %0 is replaced by the batch file name, and %1 through %9 are replaced by the corresponding arguments that you type at the command line. To access arguments beyond %9, you need to use the shift command.
modifiers: %~f1, %~d1, %~p1, %~n1, %~x1, %~s1, %~a1, %~t1, %~z1

setlocal/endlocal
Use setlocal to change environment variables when you run a batch file. Environment changes made after you run setlocal are local to the batch file. Cmd.exe restores the previous settings when it either encounters an endlocal command or reaches the end of the batch file.

Parse Command Args
:parseParams
set key=%~1
set value=%~2
if "%key%" == "" goto :eof
if "%key%" == "-MY_JAVA_OPTIONS" (
  shift
  shift
  if "%value%" == "" (
    echo Empty value for -MY_JAVA_OPTIONS
    goto end
  )
  SET MY_JAVA_OPTIONS=%value%
  goto parseParams %*
) else (
  rem skip some other parameters
  shift
  goto parseParams %*
)
goto :eof

:parseArgs
REM recursive procedure to split off the first two tokens from the input
if "%*" NEQ "" (
  for /F "tokens=1,2,* delims== " %%i in ("%*") do call :assignKeyValue %%i %%j & call :parseArgs %%k
)
goto :eof

:assignKeyValue
if /i "%1" EQU "-Xmx" (
  SET Xmx=%2
) else if /i "%1" EQU "-Xms" (
  SET Xms=%2
)
goto :eof

SubString: SET var=%var:~10,5%
http://geekswithblogs.net/SoftwareDoneRight/archive/2010/01/30/useful-dos-batch-functions-substring-and-length.aspx

Save current path and restore it later
set currentPath=%cd%

We are all familiar with the auto-completion feature provided by IDEs. For example, in Eclipse, if we type Collections.un, Eclipse lists all methods that start with "un", such as unmodifiableCollection, unmodifiableList, etc.

So how can we implement this feature?

How can we repeatedly and efficiently find all strings that start with a given prefix?

Answer:

We need to preprocess the list of strings so that later we can search it quickly.

One way is to sort the string list in alphabetical order. Then, when searching with a prefix (say "app"), we binary search this list twice: the lower index is the position of the first string that is not less than "app", and the higher index is the position of the first string that is not less than "apq" (the prefix with its last character incremented). All strings in the range [lower index, higher index) are the strings that start with the prefix.

Each query takes O(log n) time to locate the range, where n is the number of strings in the list, plus the time to read out the matches.
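A minimal sketch of the sorted-list approach follows. The class SortedPrefixSearch is a hypothetical name; the upper bound is found by binary searching for the prefix with its last character incremented ("apq" for "app"). It relies on String.compareTo ordering and assumes the last character of the prefix is not '\uffff'.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: prefix lookup with two binary searches.
final class SortedPrefixSearch {
    private final List<String> sorted;

    SortedPrefixSearch(List<String> words) {
        sorted = new ArrayList<>(words);
        Collections.sort(sorted); // preprocessing step, done once
    }

    /** Returns the strings that start with prefix, in sorted order. */
    List<String> startingWith(String prefix) {
        int lower = lowerBound(prefix);
        int upper = prefix.isEmpty() ? sorted.size() : lowerBound(successor(prefix));
        return sorted.subList(lower, upper);
    }

    /** Index of the first element that is not less than key. */
    private int lowerBound(String key) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** "app" becomes "apq"; assumes the last char is not '\uffff'. */
    private static String successor(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }
}
```

For the words ape, app, apple, application, apply, apq, apr, and banana, startingWith("app") returns [app, apple, application, apply].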

A better way is to build a tree (a trie) from the string list. For example, the path for the string "append" would look like this:

        [root node (flag)]
              /
             a
            / \
        [ST]   p
                \
                 p   -- return all strings from this subtree
                /
               e
                \
                 n
                / \
               d   [Sub Tree]
              /
[leaf node (flag)]

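So when we search for all strings that start with "app", we walk down this tree along a, p, p and return all strings in the subtree of the second p node. The time to locate that node depends only on the length of the prefix and has nothing to do with the length of the string list; after that, the cost is proportional to the size of the matching subtree. This is much better.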
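The tree-based lookup can be sketched as follows. This is an illustration for this note rather than the code from the repository: PrefixTrie is a hypothetical name, and children are kept in a TreeMap so that results come back in alphabetical order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of prefix lookup (auto-completion) with a trie.
final class PrefixTrie {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord; // the "flag" in the diagram: a string ends here
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    /** Returns all stored strings that start with prefix. */
    List<String> startingWith(String prefix) {
        List<String> result = new ArrayList<>();
        Node node = root;
        // Step 1: walk down along the prefix; cost depends only on its length.
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return result; // no string has this prefix
            }
        }
        // Step 2: collect every string in the subtree below that node.
        collect(node, new StringBuilder(prefix), result);
        return result;
    }

    private static void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> entry : node.children.entrySet()) {
            char c = entry.getKey();
            path.append(c);
            collect(entry.getValue(), path, out);
            path.setLength(path.length() - 1); // backtrack
        }
    }
}
```

After inserting unmodifiableList, unmodifiableCollection, sort, and shuffle, startingWith("un") returns [unmodifiableCollection, unmodifiableList].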

The complete algorithm and test code, along with many other algorithm problems and solutions, are available on GitHub.