Currently it's CPU-intensive for several reasons:
1) It doesn't yet use the native CRC code
2) It makes several unnecessary copies and byte buffer allocations, both in the client and in the DataNode

There are open JIRAs for these, and I have a preliminary patch which helped a lot, but it hasn't been high priority. On most clusters, writing becomes network-bound before being CPU-bound. On the other hand, as 10GbE is becoming fairly common, this will probably be more important soon. Hoping to find time to get back to finishing the patches in the next few months.
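To make the second point concrete, here is a minimal sketch (class and method names are mine, not from the patch) of the kind of avoidable per-chunk overhead being described. `java.util.zip.CRC32` stands in for the JNI-backed native CRC code the JIRAs refer to; the copy path produces the same checksum while burning extra CPU on an allocation and an array copy:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class CrcCopyDemo {
    // Checksum the data in place, straight from the backing array.
    static long checksumDirect(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Checksum after an extra ByteBuffer allocation and copy -- the kind of
    // avoidable per-packet overhead discussed above. Same result, more CPU.
    static long checksumWithCopy(byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(data.length);
        buf.put(data);
        buf.flip();
        byte[] copy = new byte[buf.remaining()];
        buf.get(copy);
        CRC32 crc = new CRC32();
        crc.update(copy, 0, copy.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "one checksum chunk of a block, say".getBytes();
        // prints "true": identical checksums, the copy path just did more work
        System.out.println(checksumDirect(data) == checksumWithCopy(data));
    }
}
```

Multiplied across every 512-byte checksum chunk of every packet on the write path, in both the client and the DataNode, copies like this add up.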

> Currently it's CPU-intensive for several reasons:
> 1) It doesn't yet use the native CRC code
> 2) It makes several unnecessary copies and byte buffer allocations, both in
> the client and in the DataNode
>
> There are open JIRAs for these, and I have a preliminary patch which helped
> a lot, but it hasn't been high priority.

Can you attach the CRC patch there? https://issues.apache.org/jira/browse/HDFS-3528
I will finish it.

> Hoping to find time to get back to finishing the patches in the next few months.

Todd, just attach these patches to the JIRA; they do not even need to apply cleanly to trunk. I will get them finished within a day. I do not have months to spare waiting for the work to be done by you. If you do not want to share these patches, that is still fine with me; we can do this work alone as well. I just need a word from you.

It is definitely buggy, it might not actually be faster, and it probably isn't well commented. But feel free to have a go at it.

-Todd

On Thu, Nov 29, 2012 at 7:17 AM, Radim Kolar <[EMAIL PROTECTED]> wrote:
>
>> Hoping to find time to get back to finishing the patches in the next few
>> months.
>
> Todd, just attach these patches to the JIRA; they do not even need to apply
> cleanly to trunk. I will get them finished within a day. I do not have
> months to spare waiting for the work to be done by you. If you do not want
> to share these patches, that is still fine with me; we can do this work
> alone as well. I just need a word from you.

> It is definitely buggy, it might not actually be faster, and it
> probably isn't well commented. But feel free to have a go at it.

Thank you for your code; I got it merged with trunk. HDFS is crap code: private methods are not documented at all, and the unit tests are a joke. I made some random code changes and some were not detected by the unit tests. What methods are you using for testing?

On Tue, Dec 4, 2012 at 9:07 AM, Radim Kolar <[EMAIL PROTECTED]> wrote:
>
>> It is definitely buggy, it might not actually be faster, and it
>> probably isn't well commented. But feel free to have a go at it.
>
> Thank you for your code; I got it merged with trunk. HDFS is crap code:
> private methods are not documented at all, and the unit tests are a joke.
> I made some random code changes and some were not detected by the unit
> tests. What methods are you using for testing?

If you're just going to insult us, please stay away. We don't need your help unless you're going to be constructive.

> Agree. Want to write some?

It's not about writing patches, it's about getting them committed. My experience is that getting something committed takes months even for a simple patch. I have about 10 patches floating around and none of them was committed in the last 4 weeks. They are really simple stuff. I haven't tried to go with a more elaborate patch because the Bible says: if you fail at an easy thing, you will fail at a hard thing too.

I am thinking day by day that I really need to fork Hadoop, otherwise there is no way to move it forward to where I need it to be.

>> Agree. Want to write some?
>
> It's not about writing patches, it's about getting them committed. My
> experience is that getting something committed takes months even for a
> simple patch. I have about 10 patches floating around and none of them was
> committed in the last 4 weeks. They are really simple stuff. I haven't
> tried to go with a more elaborate patch because the Bible says: if you fail
> at an easy thing, you will fail at a hard thing too.

There is inertia; nobody is happy with it, but that's the price of having something that's designed to keep PB of data safe.

> I am thinking day by day that I really need to fork Hadoop, otherwise there
> is no way to move it forward to where I need it to be.

A lot of the early Hadoop projects chose this path. Once you get out of sync with the Apache code you have two problems:
- keeping your branch up to date with all the fixes and features you want
- testing

On Tue, Dec 4, 2012 at 6:00 PM, Radim Kolar <[EMAIL PROTECTED]> wrote:
> It's not about writing patches, it's about getting them committed. My
> experience is that getting something committed takes months even for a
> simple patch. I have about 10 patches floating around and none of them was
> committed in the last 4 weeks.

Could you share a list of Jiras you're concerned about? I've seen a few patches you provided that got committed just fine, and I've seen a few patches that I thought didn't have a strong justification that didn't get committed, and I think I've seen a few Jiras that I thought were a good idea that haven't been committed yet due to outstanding review feedback or lack of a committer who can volunteer to do the work.

I'm not saying that the Hadoop process is perfect, far from it, but from where I sit (like you I'm a contributor but not yet a committer) it seems to be working OK so far for both you and me. Some things could be better, but the current fairly conservative process has the benefit of keeping trunk in a really sane, safe state.

> They are really simple stuff. I haven't tried to go with a more elaborate
> patch because the Bible says: if you fail at an easy thing, you will fail
> at a hard thing too.
>
> I am thinking day by day that I really need to fork Hadoop, otherwise there
> is no way to move it forward to where I need it to be.

Forking is tempting, but working with the community is really powerful. You've got plenty of successful jiras under your belt, let's just keep on truckin' and build a better Hadoop.

Todd asked a pretty reasonable question that I don't see an answer to: where will murmur3 actually be used? We generally don't add code, even if it's good code that we're sure to need someday, until there's an actual user for it.

There needs to be a complete, up-to-date patch uploaded. This one seems to have two patches that need to be applied to get a working commit -- HADOOP-9041.patch and fsinit-unit.txt. Also the latter has a misspelled classname: Initialization is spelled with a "t" rather than a "c".

It would be really good to develop a JUnit test, failing reliably both under mvn and Eclipse, that shows the problem, to avoid regressions in the future... even if the unit test has to do moderately unclean things to force the failure. (But that's not a hard requirement; if it's really impossible to do, the current situation is OK.)

I don't understand this patch at all. Since it makes the constructor vacuous, why not just delete the constructor entirely? If avoiding the possible "could be null" makes other code simpler, go ahead and include the simplification in this patch. (See below for more on including stuff in a single jira.)

Generally if Jenkins posts a -1 on a patch, you should follow up with a comment explaining why it's OK for this patch to fail the given test. For example I had a change recently that fixed an intermittent test failure, so I didn't need to add a test. Jenkins said "-1 no tests included" and I commented "fixes TestFoo intermittent failures".

One of the ways the community has compensated for the heavyweight JIRA process is to allow a single JIRA to include more change than I would normally put into a git commit. I do my development locally in a per-jira branch "hdfs1337" with normal small git-style commits, and then when I'm ready to post a patch I "git diff upstream/trunk..hdfs1337 > hdfs1337.txt" to squash all the sane git commits into a single large diff to upload.
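That workflow can be sketched end to end against a throwaway repository; "hdfs1337", the local "trunk" branch (standing in for upstream/trunk), and the file names here are all illustrative:

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.email you@example.com
git config user.name you

# Seed a stand-in for upstream/trunk.
echo base > Foo.java
git add Foo.java
git commit -qm "trunk state"
git branch -m trunk

# One branch per jira, with normal small commits while developing.
git checkout -qb hdfs1337
echo step1 >> Foo.java; git commit -qam "small commit 1"
echo step2 >> Foo.java; git commit -qam "small commit 2"

# Squash the whole branch into one flat diff to upload to the JIRA:
git diff trunk..hdfs1337 > hdfs1337.txt
grep -q '^+step2' hdfs1337.txt && echo "patch ready"
```

The flat diff carries every change from both small commits, so reviewers and Jenkins see one patch while the local history stays fine-grained.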

> I'm not saying that the Hadoop process is perfect, far from it, but
> from where I sit (like you I'm a contributor but not yet a committer)
> it seems to be working OK so far for both you and me.

It does not work OK for me. It's way too slow. I got just 2k LOC committed and patches are still floating around. That is the real and sad result of half a year of cooperation. I know that contributor patches are low priority in every project, but this is too low a priority for me.

> Some things could be better, but the current fairly-conservative process
> has the benefit of keeping trunk in a really sane, safe state.

If you want to keep code in a safe state you need:
1. good unit tests
2. high unit test coverage
3. clean code
4. documented code
5. good javadoc

> You've got plenty of successful jiras under your belt, let's just keep on
> truckin' and build a better Hadoop.

The only successful work was the rework of Todd's patch, because it made HBase about 30% faster.

> If you want to keep code in a safe state you need:
> 1. good unit tests
> 2. high unit test coverage
> 3. clean code
> 4. documented code
> 5. good javadoc

Plus good functional tests, which explore the deployment state of the world, especially different networks. Once you get into HA you also need the ability to trigger server failures and network partitions as part of a test run.

