ZORAIDA HIDALGO SANCHEZ wrote:

Hello,

I am having a weird problem while processing events coming from a file with this format:

UTF-8 Unicode (with BOM), English text, with CRLF line terminators

Some of the events in the file contain this text: "Marés". While some events are sent correctly without being cut by Flume, others arrive incomplete. What is more, once one event has been cut, the process stops sending any further events, and we end up with incomplete files on HDFS. We have tried to isolate the problem: using a file roll sink instead of HDFS, removing all the interceptors, etc. However, we still see the same problem. Apparently the troublesome event does not contain any hidden weird character, and the files are generated automatically, so we would expect that if some malformed input came from one event, it would come from the others too.
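A quick way to rule hidden characters in or out is to inspect the raw bytes around the suspect text. A minimal sketch, assuming the spooled file is named input.csv (a placeholder):

# what the OS thinks the file is
file -b input.csv

# hex dump of the lines containing the suspect text;
# in UTF-8, the "é" of "Marés" should appear as the byte pair c3 a9
grep 'Marés' input.csv | hexdump -C | head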

We really appreciate any hint that you could give us.

Thanks.

Israel Ekpo replied:

Hello Zoraida,

What sources are your events coming from?

I have a feeling they are coming from the SpoolingDirectory source and the events contain newline characters (the event delimiter).

If this is the case, you are going to see the events split up whenever the parser encounters the delimiter.

Author and Instructor for the Upcoming Book and Lecture Series
Massive Log Data Aggregation, Processing, Searching and Visualization with Open Source Software
http://massivelogdata.com
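A sketch of how one might check for embedded newlines from the shell, assuming the spooled file is input.csv (a placeholder name):

# show control characters explicitly: a \r\n pair marks a real CRLF line ending,
# while a bare \n in the middle of a record would split that record into two events
od -c input.csv | head -40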

ZORAIDA HIDALGO SANCHEZ replied:

Hi Israel,

thanks for your response. We already checked this. Doing :set list with the vi editor, our events look like this:

"line1field1";"line1field2";"line1fieldN"$
"lineNfield1";"lineNfield2";"lineNfieldN"$

There are no event delimiters ($) between the fields of an event. I have tried forcing the encoding (because I believe these files, which are generated by our customer, are converted from ASCII to UTF-8 with a BOM, and they could contain characters with more bytes than expected):

agent.sources.rpb.inputCharset = UTF-16
agent.sources.rpb.deserializer.maxLineLength = 250
agent.sources.rpb.deserializer.outputCharset = UTF-16

but if I use a maxLineLength of this size (250), then lots of events are truncated (even though the maximum number of characters per line is 250):

13/08/27 17:03:34 WARN serialization.LineDeserializer: Line length exceeds max (250), truncating line!

If I take a look into the generated file, there are unrecognized characters (��) and events have been cut in a random way (there are lines with only 3 characters).

I have tried increasing the maxLineLength parameter, but I end up getting a Java heap space exception :(

Again, thanks. Any help will be very appreciated.
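For what it is worth, decoding UTF-8 bytes as UTF-16 pairs bytes up arbitrarily, which would be consistent with both the replacement characters and line boundaries landing in random places. A sketch for measuring how long the lines really are, assuming the file is input.csv (a placeholder):

# length of the longest line as decoded under the current locale (GNU wc)
wc -L input.csv

# length of the longest line in raw bytes, ignoring multi-byte decoding
LC_ALL=C awk '{ if (length($0) > max) max = length($0) } END { print max }' input.csv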

Israel Ekpo replied:

The default value for the available memory specified in $FLUME_HOME/bin/flume-ng is very small (20 MB).

So, in your $FLUME_HOME/conf/flume-env.sh file, try increasing your Java memory to a higher number (at most 50% of the available RAM):

JAVA_OPTS="-Xms4096m -Xmx4096m -XX:MaxPermSize=4096m"

Then, in your agent configuration file:

Increase the maximum line length per event (deserializer.maxLineLength) to a much higher number (like 5000).

Also change the output encoding to UTF-8.

Let's make sure that the input encoding matches the encoding of the original event. This can cause problems if it is not the right one.
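Put together, a minimal sketch of these suggestions as agent properties, reusing the rpb source name from this thread (the UTF-8 inputCharset value is an assumption, to be replaced by whatever encoding the files actually use):

agent.sources.rpb.inputCharset = UTF-8
agent.sources.rpb.deserializer.maxLineLength = 5000
agent.sources.rpb.deserializer.outputCharset = UTF-8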

Let's see if these changes make a difference.

ZORAIDA HIDALGO SANCHEZ replied:

Sorry for the delay. I tried your suggestion, but it still does not work. I have noticed that if I do not specify the input/output encoding, the error is the same (it always stops at the same event, cutting it at the same character, and then stops processing the rest of the file). However, comparing the resulting file with the one we get when specifying the encoding, we have noted some differences. Specifically, some events are split into two events because a line break is introduced (this happens when specifying the encoding). It looks like our files are not UTF-8, but the OS recognizes them as UTF-8 (some of them have a BOM and others do not). However, Flume does not recognize them as UTF-8 because of some weird character.
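A sketch of how one might check which files carry a BOM and which do not, assuming the spool directory is /var/spool/flume (a placeholder path):

# print the first three bytes of each file; ef bb bf is the UTF-8 BOM,
# while fe ff or ff fe would indicate UTF-16
for f in /var/spool/flume/*.csv; do
  printf '%s: ' "$f"
  head -c 3 "$f" | od -An -tx1
done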

ZORAIDA HIDALGO SANCHEZ replied:

* First, I removed the BOM character. I don't know for what reason they were adding an F0FF BOM, which means UTF-16BE, whereas the command "file -b" says the file is UTF-8.
* I used iconv to convert the files to the encoding I suspected they really are:
  cat file.csv | iconv -c -f UTF-8 -t ISO-8859-1 >> file.csv.iso
* I ran Flume with this configuration:
  agent.sources.rpb.inputCharset = ISO-8859-1
  agent.sources.rpb.deserializer.maxLineLength = 300
  agent.sources.rpb.deserializer.outputCharset = UTF-8

The resulting file has all the events from the original file! However, some lines have been added. Using diff, I have seen that Flume is splitting some events into two different lines (only 9 out of 180000, but still). The other thing I have observed is that the resulting file contains the ^M character (and no, the one obtained by using iconv does not contain it).
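The ^M is the carriage return from the CRLF line terminators surviving into the output. A sketch for counting and, if necessary, stripping them, assuming the sink output file is file.out (a placeholder):

# count the lines that still contain a carriage return (shown as ^M in vi)
grep -c $'\r' file.out

# remove the carriage returns if downstream consumers expect plain LF endings
tr -d '\r' < file.out > file.out.unix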

ZORAIDA HIDALGO SANCHEZ replied:

Good news: the 9 lines being cut were because of maxLineLength (when a line is truncated, the remainder is added below as a different event). Great, so I can definitely deal with those files and Flume, and the final configuration is:
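(The configuration block itself was cut off in the archive. Judging from the rest of the thread, it is presumably the deserializer settings from the previous message applied directly to the original files; a reconstruction, not the verbatim original:)

agent.sources.rpb.inputCharset = ISO-8859-1
agent.sources.rpb.deserializer.maxLineLength = 300
agent.sources.rpb.deserializer.outputCharset = UTF-8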

By the way, it works without removing the BOM character or converting to ISO-8859-1. It was a matter of indicating the right encoding to Flume (and iconv was the tool that I used to "discover" it). Hope it helps someone.

One more question. From the Flume 1.4 documentation:

deserializer.maxLineLength    2048    Maximum number of characters to include in a single event. If a line exceeds this length, it is truncated, and the remaining characters on the line will appear in a subsequent event.

So do I need to specify values like 300? If I do not specify it, my events get truncated.

Thanks,
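For reference, the explicit setting from the working configuration above, whose truncate-and-continue behaviour matches the "added below as a different event" effect described earlier in the thread:

agent.sources.rpb.deserializer.maxLineLength = 300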
