hive-user mailing list archives

Hi Furcy,
That's a lot of information. Thanks a lot!
On Feb 13, 2015 3:40 PM, "Furcy Pin" <furcy.pin@flaminem.com> wrote:
> Hi Sreeman,
>
> Unfortunately, I don't think that Hive's built-in formats can currently read
> CSV files with fields enclosed in double quotes.
> More generally, having ingested quite a lot of messy CSV files myself,
> I would recommend writing a MapReduce (or Spark) job
> to clean your CSV before giving it to Hive. This is what I did.
> The other kinds of issues I've run into include:
>
> - Files not encoded in UTF-8, making special characters unreadable for
> Hive
> - Some lines with missing or extra columns, which can shift your
> columns and ruin your stats
> - Some lines with unreadable characters (probably data corruption)
> - I even got some lines with Java stack traces in them
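A minimal Python sketch of the kind of pre-Hive check described above, assuming a three-column file like the one in the original question. `classify_lines` and `EXPECTED_COLUMNS` are hypothetical names; a real cleaning job would apply the same per-record logic inside MapReduce or Spark. It parses one physical line at a time, so a quoted value containing a newline would be rejected rather than rejoined, and the sample rows are made up for illustration.

```python
import csv
import io
import unicodedata

EXPECTED_COLUMNS = 3  # hypothetical: set this to your table's column count


def classify_lines(raw_lines, expected=EXPECTED_COLUMNS):
    """Return (good_rows, rejected), where rejected holds (line_no, reason) pairs."""
    good, rejected = [], []
    for line_no, raw in enumerate(raw_lines, start=1):
        # Encoding problem: bytes that are not valid UTF-8.
        try:
            text = raw.decode("utf-8").rstrip("\r\n")
        except UnicodeDecodeError:
            rejected.append((line_no, "not valid UTF-8"))
            continue
        # Unreadable characters: control characters other than tab.
        if any(unicodedata.category(ch) == "Cc" and ch != "\t" for ch in text):
            rejected.append((line_no, "control characters"))
            continue
        # Missing/extra columns; quoted commas are honoured by the csv module.
        # Stack-trace lines usually end up here as well.
        fields = next(csv.reader(io.StringIO(text)), [])
        if len(fields) != expected:
            rejected.append((line_no, f"expected {expected} columns, got {len(fields)}"))
            continue
        good.append(fields)
    return good, rejected


# Made-up sample input, one entry per physical line of the file.
sample = [
    b'sree,12345,"payment made,but it is not successful"\n',
    b"only,two\n",
    b"bad,\xff\xfe,bytes\n",
    b"\tat org.example.Foo.bar(Foo.java:42)\n",
]
good, rejected = classify_lines(sample)
print(good)
print(rejected)
```

Rejected lines are kept with a reason rather than silently dropped, so you can inspect what was thrown away before loading the rest into Hive.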
>
> I hope your CSV is cleaner than that. If you have control over how it is
> generated, I would recommend replacing your current separator with a tab
> (and replacing inline tabs with \t) or something like that.
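One way to apply that suggestion, as a Python sketch: re-read the quoted CSV and write it back out tab-separated, turning inline tabs into the two characters `\t`. Escaping backslashes and line breaks as well is an assumption added here (not part of the suggestion) so that every record stays on one line and the escapes can be decoded unambiguously. `csv_to_tsv` and `escape_field` are hypothetical helpers; how, or whether, a given Hive table decodes these escapes depends on its SerDe settings and is not claimed here.

```python
import csv
import io

# Escape the backslash first, otherwise the escapes added afterwards would be doubled.
_ESCAPES = [("\\", "\\\\"), ("\t", "\\t"), ("\n", "\\n"), ("\r", "\\r")]


def escape_field(value):
    """Escape one field so it contains no raw tab or line break."""
    for raw, escaped in _ESCAPES:
        value = value.replace(raw, escaped)
    return value


def csv_to_tsv(csv_text):
    """Re-emit (possibly quoted) CSV text as tab-separated lines."""
    rows = csv.reader(io.StringIO(csv_text))
    lines = ["\t".join(escape_field(field) for field in row) for row in rows]
    return "".join(line + "\n" for line in lines)


print(csv_to_tsv('sree,12345,"payment made,but it is not successful"\n'))
print(csv_to_tsv('a,"x\ty"\n'))
```

Because the csv module reassembles quoted fields that span several physical lines, a value with an embedded newline comes out as a single TSV record containing the two characters `\n`.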
>
> There might already be some open-source data-cleaning tools out there.
> I plan to release mine one day, maybe once I've migrated it to Spark, and
> if my company agrees.
>
> If you're lazy, I've heard that Dataiku Studio (which has a free version) can
> do this kind of thing, though I have never used it myself.
>
> Hope this helps,
>
> Furcy
>
> 2015-02-13 7:30 GMT+01:00 Slava Markeyev <slava.markeyev@upsight.com>:
>
>> You can use LazySimpleSerDe with ROW FORMAT DELIMITED FIELDS TERMINATED
>> BY ',' ESCAPED BY '\\'. See the DDL documentation for details:
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
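To make that concrete, here is a simplified Python model (not Hive's own parser) of splitting a line on ',' with '\' as the escape character, as in the DDL above. Under this model, a comma inside a value must be written as `\,` in the file itself; double quotes get no special treatment, so the quoted line from the original question would still split into four fields. `split_escaped` is a hypothetical name.

```python
def split_escaped(line, sep=",", esc="\\"):
    """Split on sep; esc makes the following character literal. Quotes are ordinary."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == esc:
            # Take the next character as-is; a dangling escape at the end is kept.
            current.append(next(chars, esc))
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


escaped_line = r"sree,12345,payment made\,but it is not successful"
quoted_line = 'sree,12345,"payment made,but it is not successful"'
print(split_escaped(escaped_line))  # 3 fields
print(split_escaped(quoted_line))   # 4 fields: the quotes do not protect the comma
```

So the escape approach only helps if whoever produces the file writes the backslashes.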
>>
>> On Thu, Feb 12, 2015 at 8:19 PM, Sreeman <sreebalineni@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> How are you all creating Hive/Impala tables when the CSV file has some
>>> values with commas inside them? For example:
>>>
>>> sree,12345,"payment made,but it is not successful"
>>>
>>> I know the OpenCSV SerDe exists, but it is not available in Hive versions
>>> earlier than 0.14.
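The problem in a nutshell, sketched in Python with the standard `csv` module: splitting the sample line on every comma yields four values, while a quote-aware parser (the kind of handling the OpenCSV SerDe is meant to add) yields the expected three. Python's `csv` module stands in here for illustration only; it is not what Hive uses.

```python
import csv
import io

line = 'sree,12345,"payment made,but it is not successful"'

# Splitting on every comma, ignoring quotes.
naive = line.split(",")

# Quote-aware parsing with Python's standard csv module.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```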
>>>
>>
>> --
>>
>> Slava Markeyev | Engineering | Upsight
>> Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>
>>
>