[ https://issues.apache.org/jira/browse/ASTERIXDB-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326139#comment-15326139
]
Abdullah Alamoudi commented on ASTERIXDB-1264:
----------------------------------------------
YES.
That would be the first (and probably most time consuming) step :)
> Feed didn't release lock if the ingesting hit some exceptions
> -------------------------------------------------------------
>
> Key: ASTERIXDB-1264
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1264
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: Feeds
> Reporter: Jianfeng Jia
> Assignee: Abdullah Alamoudi
> Attachments: cc.log, nc.log
>
>
> This issue was discussed on the mailing list. I am copying it here to make it easier to track
and share.
> I hit a weird issue that is reproducible, but only if the data contains duplicates and
is also large enough. Let me explain it step by step:
> 1. The dataset is very simple and has only two fields.
> DDL AQL:
> {code}
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
> create type t_test as closed{
> fa: int64,
> fb : int64
> }
> create dataset ds_test(t_test) primary key fa;
> create feed fd_test using socket_adapter
> (
> ("sockets"="nc1:10001"),
> ("address-type"="nc"),
> ("type-name"="t_test"),
> ("format"="adm"),
> ("duration"="1200")
> );
> set wait-for-completion-feed "false";
> connect feed fd_test to dataset ds_test using policy AdvancedFT_Discard;
> {code}
> ——————————————————————————————
> The AdvancedFT_Discard policy ignores the exception from the insertion and keeps
ingesting.
> 2. Ingest the data with a very simple socket adapter client that reads the records one by one
from an adm file. The source is here: https://github.com/JavierJia/twitter-tracker/blob/master/src/main/java/edu/uci/ics/twitter/asterix/feed/FileFeedSocketAdapterClient.java
> The data and the app package are provided here: https://drive.google.com/folderview?id=0B423M7wGZj9dYVQ1TkpBNzcwSlE&usp=sharing
> To feed the data you can run:
> ./bin/feedFile -u 172.17.0.2 -p 10001 -c 5000000 ~/data/twitter/test.adm
> -u for the server URL
> -p for the server port
> -c for the number of lines you want to ingest
> 3. After ingestion, all requests on ds_test hang. There is no exception
and no response for hours. However, the system can still answer queries on other datasets,
such as Metadata.
> The data contains some duplicate records, which should trigger the insert exception.
If I lower the count from 5000000 to, say, 3000000, there is no problem, although
that subset contains duplicates as well.
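To narrow down the reproduction, it may help to know how many duplicate primary keys fall within the first N lines of the input. The following is a minimal sketch, not part of the linked client: the class name DuplicateKeyCounter and the method countDuplicates are hypothetical, and it assumes each line holds one record in which the key appears as "fa": <integer>.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts repeated "fa" keys in the first `limit` lines. */
class DuplicateKeyCounter {
    // Assumption: the primary key is written as "fa": <integer> on each line.
    private static final Pattern FA = Pattern.compile("\"fa\"\\s*:\\s*(-?\\d+)");

    static long countDuplicates(Iterable<String> lines, long limit) {
        Set<Long> seen = new HashSet<>();
        long read = 0;
        long duplicates = 0;
        for (String line : lines) {
            if (read >= limit) {
                break; // mirrors the -c line count passed to the feed client
            }
            read++;
            Matcher m = FA.matcher(line);
            // Set.add returns false when the key was already seen.
            if (m.find() && !seen.add(Long.parseLong(m.group(1)))) {
                duplicates++;
            }
        }
        return duplicates;
    }
}
```

Running this with limits of 3000000 and 5000000 would show how many more duplicates the larger run contains; for a large file, the lines can be streamed rather than loaded into memory at once.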
> Answer from [~amoudi] :
> I know exactly what is going on here. The problem you pointed out is
> caused by the duplicate keys. If I remember correctly, the main issue is
> that locks that are placed on the primary keys are not released.
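The general failure mode described above can be illustrated with a small sketch. This is not AsterixDB code: KeyLockSketch, insertLeaky, insertSafe and DuplicateKeyException are hypothetical names, and the per-key ReentrantLock only stands in for whatever lock manager the real system uses. It shows how a lock taken on a primary key stays held if the insert throws before the release, and how a try/finally avoids that.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; none of these names come from AsterixDB. */
class KeyLockSketch {
    static class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(long key) {
            super("duplicate key: " + key);
        }
    }

    private final Map<Long, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Map<Long, Long> store = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(long key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    /** Leaky variant: the unlock is skipped when the insert throws. */
    void insertLeaky(long key, long value) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        doInsert(key, value); // a duplicate key throws here
        lock.unlock();        // never reached on exception
    }

    /** Safe variant: the lock is released on every exit path. */
    void insertSafe(long key, long value) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        try {
            doInsert(key, value);
        } finally {
            lock.unlock();
        }
    }

    private void doInsert(long key, long value) {
        if (store.putIfAbsent(key, value) != null) {
            throw new DuplicateKeyException(key);
        }
    }

    /** True if some thread currently holds the lock for this key. */
    boolean isLocked(long key) {
        ReentrantLock lock = locks.get(key);
        return lock != null && lock.isLocked();
    }
}
```

With the leaky variant, a second insert of the same key leaves the lock held, so any other thread that later needs that key blocks indefinitely. That would be consistent with requests on ds_test hanging while other datasets keep responding, though the actual cause in the feed pipeline still needs to be confirmed.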
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)