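Hi all,

I want to describe a phenomenon that happens on our HBase cluster. I use put(List<Put>) to insert many records with HLog writing enabled, and some time later I delete all of these records with HLog writing disabled. One week later, when I scanned the table, I found that some of the records I had deleted had reappeared.

It is an interesting case. In my opinion, if we delete data without writing to the HLog, then when a region server fails, the log will be replayed on another region server.

Can anyone tell me: if I persist in deleting records without writing to the HLog, is there a way to prevent these records from reappearing some time later?

Cheers!
--
Bing Jiang
weibo: http://weibo.com/jiangbinglover
BLOG: http://blog.sina.com.cn/jiangbinglover
BLOG: http://www.binospace.com
National Research Center for Intelligent Computing Systems
Institute of Computing technology
Graduate University of Chinese Academy of Science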

Some time later?

Time of course is relative, so I have to ask what occurred between the write and the delete. How much time passed? Did you have any compactions in between the write and the delete?

Why are you not consistent in your use of the WAL?


I have it on my list of things to do to allow deferred WAL flush as a per-operation option (right now it's a CF option).

You really do not want to do anything with the WAL off. If you use deferred flush there is still a chance that this might happen (the RS could die in the few seconds after a Delete before it is flushed to the WAL), but it should be a rare occurrence.

-- Lars
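The window Lars describes can be illustrated with a toy model. This is only a sketch in plain Java, not HBase code: the class `DeferredWalSketch` and its methods are hypothetical names made up for illustration, and the "sync" is a simple list copy standing in for the periodic background flush of the log.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a write-ahead log whose edits are synced lazily (NOT HBase code). */
class DeferredWalSketch {
    private final List<String> pending = new ArrayList<>(); // appended, not yet synced
    private final List<String> durable = new ArrayList<>(); // survives a crash

    /** Append an edit; with deferred flush it only reaches the in-memory buffer. */
    void append(String edit) {
        pending.add(edit);
    }

    /** Periodic background sync: everything buffered so far becomes durable. */
    void sync() {
        durable.addAll(pending);
        pending.clear();
    }

    /** Simulated crash: unsynced edits vanish; returns what a replay would see. */
    List<String> crashAndReplay() {
        pending.clear();
        return new ArrayList<>(durable);
    }

    public static void main(String[] args) {
        DeferredWalSketch wal = new DeferredWalSketch();
        wal.append("PUT row1");
        wal.sync();                 // the put is now durable
        wal.append("DELETE row1");  // still buffered when the server dies
        System.out.println(wal.crashAndReplay()); // prints [PUT row1]
    }
}
```

In this model the delete is lost only if the crash falls between `append` and the next `sync`, which is why the exposure is a matter of seconds rather than the whole lifetime of the memstore.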


Thanks for all your suggestions and discussion. One idea occurs to me: why not check or restore the WAL when a compaction executes? If it did, HBase could drop some unused HLogs, which I think would be effective for this issue. Please correct me if I am wrong.

---Bing


On our HBase cluster, I tested deleting records with and without the HLog. My test is attached. The results show why I decided to delete rows without the HLog.


Sorry Bing, I am not clear about what you are suggesting with 'One idea occurs to me why not check or restore wal when compaction executes. If it does, hbase can drop some unused hlog'.

Could you be more clear? Are you trying to read the WAL while a compaction is going on?

Regards
Ram


I think that when a compaction is triggered, if the records have already been flushed to HDFS, it is pointless to retain the HLog from before that timestamp. In other words, suppose some rows are deleted and then a compaction executes; by that point the rows no longer exist. So the HLog from before the compaction's timestamp is no longer useful, and we can drop those unused WALs.

This is my own view; please correct me if I am wrong.

---Bing


Hi Bing,

I think you are referring to a memstore flush. The HLog represents the set of changes that are in the memstore (in RAM) but not yet in an HFile on disk.

I am pretty sure there is no flaw in the flush/compaction logic when it comes to deletes.

If you do not write the deletes to the WAL and the RS crashes, it is expected that deletes that were not flushed to disk are lost.

(And there's also HBASE-6059, which in some cases resurfaces deleted data even when it was flushed to the WAL.)

-- Lars

________________________________
From: Bing Jiang <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Wednesday, November 21, 2012 8:36 PM
Subject: Re: delete rows without writing HLog may be appear in the future?

> I think that when a compaction is triggered, if the records have already
> been flushed to HDFS, it is pointless to retain the HLog from before that
> timestamp. In other words, suppose some rows are deleted and then a
> compaction executes; by that point the rows no longer exist. So the HLog
> from before the compaction's timestamp is no longer useful, and we can drop
> those unused WALs.
> This is my own view; please correct me if I am wrong.
> ---
> Bing

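The behaviour Lars describes (the HLog covers only what is in the memstore, so an unlogged delete is lost on a crash) can be sketched with a small model. This is an illustration under stated assumptions, not HBase code: `RegionSketch` and all its methods are hypothetical, deletes are modelled as simple removals rather than tombstones, and a flush is assumed to make the matching WAL entries unnecessary.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy region model (NOT HBase code): flushed state, a memstore, and a replayable WAL. */
class RegionSketch {
    private final Set<String> flushedRows = new HashSet<>();   // rows visible from "HFiles"
    private final List<String[]> memstore = new ArrayList<>(); // unflushed {op, row} edits
    private final List<String[]> wal = new ArrayList<>();      // edits that survive a crash

    void put(String row, boolean writeToWal) { apply("PUT", row, writeToWal); }

    void delete(String row, boolean writeToWal) { apply("DELETE", row, writeToWal); }

    private void apply(String op, String row, boolean writeToWal) {
        String[] edit = {op, row};
        if (writeToWal) wal.add(edit); // skipped when the caller disables the WAL
        memstore.add(edit);
    }

    /** Memstore flush: edits reach "disk", so their WAL entries are no longer needed. */
    void flush() {
        for (String[] e : memstore) applyTo(flushedRows, e);
        memstore.clear();
        wal.clear();
    }

    /** Crash: the memstore is lost, then whatever is in the WAL is replayed into it. */
    void crashAndRecover() {
        memstore.clear();
        memstore.addAll(wal);
    }

    /** A row is visible if it survives the flushed state plus the memstore edits. */
    boolean exists(String row) {
        Set<String> view = new HashSet<>(flushedRows);
        for (String[] e : memstore) applyTo(view, e);
        return view.contains(row);
    }

    private static void applyTo(Set<String> rows, String[] e) {
        if (e[0].equals("PUT")) rows.add(e[1]); else rows.remove(e[1]);
    }

    public static void main(String[] args) {
        RegionSketch region = new RegionSketch();
        region.put("row1", true);      // put with the WAL
        region.flush();                // put is now on disk
        region.delete("row1", false);  // delete without the WAL
        System.out.println(region.exists("row1")); // false: the delete is in the memstore
        region.crashAndRecover();
        System.out.println(region.exists("row1")); // true: the delete was never logged
    }
}
```

In the real client of that era, I believe the per-operation switch is `setWriteToWAL(false)` on `Put` and `Delete`; the boolean parameter here merely stands in for it. The model also shows why flush frequency matters: once an unlogged delete has been flushed, a later crash can no longer lose it.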

If I set hbase.hregion.preclose.flush.size to zero, can HBase guarantee that an HRegionServer will execute a final flush when it quits?

As Lars says, the issue relates to the memstore flush. I have checked that the default value of 'hbase.hregion.preclose.flush.size' is 1024 * 1024 * 5. If we cannot bear the low performance of deletes with the HLog enabled, is setting 'hbase.hregion.preclose.flush.size' to zero another option?

Thanks.

----Bing


Which version of HBase? Coprocessors are available only in 0.92 and above.

Regards
Ram

On Wed, Nov 21, 2012 at 8:50 PM, Bing Jiang <[EMAIL PROTECTED]> wrote:

> We need to be sure that puts are safe, but deletes must be quick and
> low-latency.

Yes, HBase ran a compaction between the batch put and the deletes. Any ideas?

On Nov 21, 2012 11:10 PM, "Michael Segel" <[EMAIL PROTECTED]> wrote:

> Did you have any compactions in between the write and the delete?


In our apps, deletes are frequent and eventually happen to every record, so if they write to the HLog, performance and response time will suffer. In fact, we can tolerate a few failed deletes, but recently I have found more records that were deleted some time ago (for example, one week ago) reappearing. That makes me wonder what to do next: delete with the HLog, or put without the HLog...

I'm not sure why you would be seeing the data after a delete. If what you said is true, then either the delete got lost or the delete happened before the insert (which doesn't make sense because the delete should have thrown an exception...)

I am also confused by what you mean when you say the delete has to be low latency. What's the timing difference between writing a delete to the WAL and bypassing the WAL?

Also, I am concerned by your statement that you do the delete, it looks like it was deleted, and only a week or two later it's back. That doesn't make sense, because the data written to the WAL would have long since been flushed, and if you then delete, the delete flag should still have remained. Can you check the timestamp of the cell?

Something isn't right.

On Nov 21, 2012, at 9:37 AM, Bing Jiang <[EMAIL PROTECTED]> wrote:

> On Nov 21, 2012 11:19 PM, "Kevin O'dell" <[EMAIL PROTECTED]> wrote:
>
>> Bing,
>>
>> I am curious to hear more about Mike's question. Why are you not using
>> the WAL for your deletes?
>>
>> --
>> Kevin O'Dell
>> Customer Operations Engineer, Cloudera

I would recommend deleting with the HLog enabled. But let's say your writes are minimal: you should only have 32 hours (the default) of WAL around at any time before it is all flushed because of too many HLogs. So you should not have week-old deletes coming back. One thought would be to raise your WAL size but lower the total number of WALs kept. This will retain the same amount of data but should flush fully more quickly.
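As a rough illustration of where a 32-hour figure can come from: on a lightly written region server, logs tend to roll on the timer rather than on size, so the window of unflushed edits is roughly the number of logs kept times the roll period. The sketch below assumes the defaults of that era, `hbase.regionserver.maxlogs` = 32 and `hbase.regionserver.logroll.period` = 1 hour; check your own `hbase-site.xml` before relying on these values. The class and method names are made up for the example.

```java
import java.util.concurrent.TimeUnit;

/** Back-of-the-envelope estimate; the inputs are assumed defaults, not values read from a cluster. */
final class WalRetention {
    /** Approximate hours of edits that can sit in WALs before the log count forces flushes. */
    static long retentionHours(int maxLogs, long rollPeriodMillis) {
        return TimeUnit.MILLISECONDS.toHours((long) maxLogs * rollPeriodMillis);
    }

    public static void main(String[] args) {
        int maxLogs = 32;                 // assumed hbase.regionserver.maxlogs
        long rollPeriodMs = 3_600_000L;   // assumed hbase.regionserver.logroll.period (1 hour)
        System.out.println(retentionHours(maxLogs, rollPeriodMs) + " hours");
    }
}
```

Under the same assumptions, lowering the number of logs kept to 8 would shrink the window to about 8 hours, which is the effect of keeping fewer WALs.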

Kevin, yes, I agree about the number of WALs kept. Lowering it will cause more frequent flushes, and that should lower the occurrence of this issue. The more often flushes happen, the lower the probability of this issue occurring.

But still, if an RS crashes after doing the WAL-less deletes, then the puts are bound to reappear. That is why I thought of the CP-based approach. Please correct me if I am wrong.
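The crash scenario can be sketched with a toy model. This is not HBase code: `ToyRegion` and its methods are hypothetical, a `null` value stands in for a delete marker, and a flush is simplified to "apply everything in memory to durable storage and discard the log".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of one region: a WAL, an in-memory store, and flushed files. Not HBase code. */
final class ToyRegion {
    private final List<String[]> wal = new ArrayList<>();            // durable log of {row, value}
    private final Map<String, String> memstore = new HashMap<>();    // lost on a crash
    private final Map<String, String> storeFiles = new HashMap<>();  // durable, flushed data

    void put(String row, String value, boolean writeToWal) {
        if (writeToWal) wal.add(new String[] {row, value});
        memstore.put(row, value);
    }

    void delete(String row, boolean writeToWal) {
        put(row, null, writeToWal); // a null value acts as the delete marker
    }

    /** Persist the in-memory edits; the logged edits are then no longer needed. */
    void flush() {
        for (Map.Entry<String, String> e : memstore.entrySet()) {
            if (e.getValue() == null) storeFiles.remove(e.getKey());
            else storeFiles.put(e.getKey(), e.getValue());
        }
        memstore.clear();
        wal.clear();
    }

    /** Simulate a region server crash: memory is lost, then the WAL is replayed. */
    void crashAndRecover() {
        memstore.clear();
        for (String[] edit : wal) memstore.put(edit[0], edit[1]);
    }

    String get(String row) {
        return memstore.containsKey(row) ? memstore.get(row) : storeFiles.get(row);
    }

    /** Put with WAL, flush, delete (with or without WAL), crash, then read the row back. */
    static String demo(boolean deleteWithWal) {
        ToyRegion region = new ToyRegion();
        region.put("row1", "v1", true);
        region.flush();
        region.delete("row1", deleteWithWal);
        region.crashAndRecover();
        return region.get("row1");
    }

    public static void main(String[] args) {
        System.out.println("delete without WAL -> " + demo(false));
        System.out.println("delete with WAL    -> " + demo(true));
    }
}
```

In this model the WAL-less delete exists only in memory, so a crash before the next flush loses it and the flushed put becomes visible again; the logged delete survives the replay.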

There are some hooks provided while log splitting happens. Use the preWALRestore and postWALRestore hooks. But you need a way to know that the LogEdit that comes for the Put is for a record that was already deleted.

With this you can achieve what you want.

Disclaimer: I have not tried this yet.

Regards
Ram



ramkrishna vasudevan 2012-11-21, 15:00
