On Friday 26 November 2004 11:22, Per Liden wrote:
> On Thu, 25 Nov 2004, Per Liden wrote:
>
> [...]
>
> > I'm having problems with DRBD getting stuck at around 99-100% during an
> > initial/full sync. This seems to be happening about 8 out of 10 times.
>
> After some further testing it seems that I managed to resolve the issue.
>
> Changes I made to my configuration:
> - Removed LVM (DRBD now runs directly on top of my hda10 device).
> - Changed meta-data to "internal" (instead of hda9 [0]).
> - Filesystem used on top of /dev/drbd0 is now reiserfs instead of ext3.
> (I thought I should mention it, even if the choice of filesystem
> shouldn't have anything to do with my sync problem).
>
> So far I've done three full syncs without getting stuck. Unfortunately I
> did all the above changes in one go, so I can't really say if it was LVM
> or the separate meta-data partition that caused the problem. My guess is
> LVM though.
>
> Whether I can live without LVM is something I'll have to look into...
>
> [...]
>
> > Interesting to note is that the nodes seem to have different ideas about
> > how much data needs to be synchronized, i.e.:
> > Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource (need
> > to sync 60812372 KB [15203093 bits set]). vs.
> > Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget (need
> > to sync 60558500 KB [15139625 bits set]).
>
> After my reconfiguration I haven't seen anything like this again. Every
> time a sync is initiated both nodes have a common understanding of the
> number of bytes that need to be synchronized.
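
The mismatch in the two quoted log lines can be checked with a small
script. This is only a sketch: the regular expression and the helper
name parse_resync are made up for illustration, and the 4 KB-per-bit
ratio is not taken from any documentation, it is simply what both
quoted lines work out to (15203093 * 4 = 60812372).

```python
import re

# Hypothetical pattern for the "need to sync" part of the log lines quoted above.
SYNC_RE = re.compile(r"need\s+to\s+sync\s+(\d+)\s+KB\s+\[(\d+)\s+bits set\]")

def parse_resync(line):
    """Return (kilobytes, bits) from a 'Resync started' kernel log line."""
    m = SYNC_RE.search(line)
    if m is None:
        raise ValueError("not a resync line: %r" % line)
    return int(m.group(1)), int(m.group(2))

# The two messages from the quoted mail, with the line wrap removed.
source = ("Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource "
          "(need to sync 60812372 KB [15203093 bits set]).")
target = ("Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget "
          "(need to sync 60558500 KB [15139625 bits set]).")

src_kb, src_bits = parse_resync(source)
tgt_kb, tgt_bits = parse_resync(target)

# Both lines agree with 4 KB per bit (an observation, not a documented constant).
assert src_kb == 4 * src_bits
assert tgt_kb == 4 * tgt_bits

diff_bits = src_bits - tgt_bits
diff_kb = src_kb - tgt_kb
print("nodes disagree by %d bits (%d KB)" % (diff_bits, diff_kb))
```

For the lines above the two nodes disagree by 63468 bits, i.e. 253872 KB.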
I think I have resolved Eugene Crosser's issue by now; it should be
fixed by the patch attached to this e-mail. I am waiting for Eugene
to confirm that the issue is solved.

I do not think it has anything to do with whether you use LVM or not.
It has to do with whether you have application IO during the _start_
of the resync process or not.

I will write a longish explanation of the bug and the fix to the list
as soon as I either have Eugene's confirmation or have found the time
to reproduce it here in the office.

( Today there is an outbreak of some stupid Windows worm, and we have
to take care of the systems of our paying customers first... )

If you could confirm this behavior (bug triggered by app IO during
the start of the resync) and that p4 fixes it, this would help a lot...
-philipp
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria http://www.linbit.com :
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p4
Type: text/x-diff
Size: 2008 bytes
Desc: not available
Url : http://lists.linbit.com/pipermail/drbd-user/attachments/20041126/74e70bef/attachment.diff