I am trying to index some CSV files that reside on HDFS. I am using the MapReduceIndexerTool and following the tutorial "MapReduce Batch Indexing with Cloudera Search". As advised in an earlier post, I created a small subset of my data (only 500 records), and, as also advised, I used the --dry-run option to make sure everything is OK before the actual indexing takes place. My dry run completes successfully.

I also added the following statement to my morphline script:

sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR}}

I still get the same error.

This is very frustrating: what I am doing should be fairly common, and many people must have solved this problem before, so I am not sure what is happening. I was able to use the same schema successfully with Solr 6.2 on my Windows machine.

Could you share the following, if possible?
- the collection name and the collection's schema.xml
- the indexer configuration used (indexer_def.xml?)
- the morphline configuration used
- the command line used to launch the batch indexing
- a small CSV sample

This might help me pinpoint the issue (or confirm that everything looks fine on my side).

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which can be
         when you want your sorting to be case insensitive -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
  </analyzer>
</fieldType>

<!-- since fields of this type are by default not stored or indexed,
     any data added to them will be ignored outright. -->
<fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

So if you need these fields, you must add them to the collection's schema.

- 2 : if you don't need these fields, then you have to delete them before passing the record to Solr. The "sanitizeUnknownSolrFields" command is the right way to do this, but you misplaced it in your morphline.

It should go inside the commands [ ] list, after the readCSV command and, of course, before the loadSolr command.
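A minimal sketch of that placement, assuming a standard Kite Morphlines setup as used by Cloudera Search. The collection name, the zkHost address and the CSV column names below are hypothetical placeholders, not values from this thread, and ignoreFirstLine assumes the CSV files have a header row:

```
# Hypothetical values: replace collection and zkHost with your own.
SOLR_LOCATOR : {
  collection : my_collection
  zkHost : "zk01.example.com:2181/solr"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      # 1. Parse each CSV line into a record.
      #    Column names are placeholders; list your real CSV columns here.
      {
        readCSV {
          separator : ","
          columns : [id, name, price]
          ignoreFirstLine : true   # assumes a header row
          trim : true
          charset : UTF-8
        }
      }

      # 2. Remove record fields that the collection's schema.xml does not know.
      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }

      # 3. Only now hand the cleaned record to Solr.
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]
```

The ordering is the point: readCSV creates the fields, sanitizeUnknownSolrFields strips the ones Solr would reject, and loadSolr only ever sees the cleaned record. If the sanitize step sits anywhere else, it cannot clean the record before loadSolr receives it.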