winutils.exe is not included in Hadoop bin tarball

Description

I don't have a Windows environment, but one user who tried the 2.2.0 release on Windows reported that the released tarball doesn't contain "winutils.exe", so no commands can be run. I have confirmed that winutils.exe is indeed not included in the 2.2.0 bin tarball.

Steve Loughran
added a comment - 10/Oct/16 09:21 The workaround is to use the https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1 binaries; for some reason my Windows VM has stopped compiling the native bits of Hadoop, so I haven't yet done a build for 2.7.4 (or 3.0.0-alpha, though that's failing in the Java builds, so the problem there is more fundamental).
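For anyone applying this workaround, the sketch below shows where a Windows client expects to find the downloaded winutils.exe. It assumes, based on our reading of Hadoop's Shell class, that the `hadoop.home.dir` system property takes precedence over the `HADOOP_HOME` environment variable and that the binary must sit in that directory's `bin` subfolder. The `WinutilsLocator` class and its method are hypothetical, and treating an empty value as unset is our own choice.

```java
import java.io.File;
import java.util.Optional;

/** Hypothetical helper: resolves where winutils.exe is expected to live. */
public final class WinutilsLocator {

    private WinutilsLocator() {}

    /** Returns the expected winutils.exe path, or empty if no Hadoop home is configured. */
    static Optional<File> expectedWinutils(String homeDirProperty, String hadoopHomeEnv) {
        // Prefer the hadoop.home.dir system property; fall back to HADOOP_HOME.
        String home = (homeDirProperty != null && !homeDirProperty.isEmpty())
                ? homeDirProperty
                : hadoopHomeEnv;
        if (home == null || home.isEmpty()) {
            return Optional.empty();
        }
        // winutils.exe is expected in the bin subdirectory of the Hadoop home.
        return Optional.of(new File(new File(home, "bin"), "winutils.exe"));
    }

    public static void main(String[] args) {
        Optional<File> winutils = expectedWinutils(
                System.getProperty("hadoop.home.dir"), System.getenv("HADOOP_HOME"));
        if (!winutils.isPresent()) {
            System.out.println("Neither hadoop.home.dir nor HADOOP_HOME is set");
        } else if (winutils.get().isFile()) {
            System.out.println("Found " + winutils.get());
        } else {
            System.out.println("Missing " + winutils.get());
        }
    }
}
```

Running it with `-Dhadoop.home.dir=C:\hadoop` (a placeholder path) checks the property-based lookup before starting a job.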

Chris Nauroth
added a comment - 22/Jun/16 16:24 Historically, going back several years, Hadoop did use Cygwin to achieve partial compatibility on Windows. Ultimately, that approach fell short though. Cygwin makes a set of its own implementation choices on how to map Windows semantics to Unix semantics, and those implementation choices didn't always match with Hadoop's requirements. A few notable areas I remember were mapping NTFS ACLs to Unix-style file permissions, file locking, and child process management. That led to the decision to implement our own native code layer where we could control the semantics.
Consolidating the native code layer down to just hadoop.dll, without winutils.exe, would simplify this.

Romain Manni-Bucau
added a comment - 22/Jun/16 09:40 An alternative is to support Cygwin. I think a lot of Windows developers have it installed, so detecting CYGWIN or CYGWIN_VERSION as an environment variable could allow using bash-style commands instead of Windows ones in all cases where winutils is expected but missing.
wdyt?
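The Cygwin fallback proposed here could look roughly like this sketch for one operation, changing file permissions. It assumes winutils.exe offers a `chmod` subcommand (to our understanding, Hadoop's Shell helpers invoke it that way) and uses the `CYGWIN`/`CYGWIN_VERSION` variables named in the comment. Note that `CYGWIN` is an optional Cygwin runtime setting, so its absence does not prove Cygwin is missing. `ChmodCommandBuilder` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: prefer winutils.exe, else fall back to Cygwin's chmod. */
public final class ChmodCommandBuilder {

    private ChmodCommandBuilder() {}

    /** Heuristic Cygwin detection via the environment variables suggested in the comment. */
    static boolean looksLikeCygwin(Map<String, String> env) {
        return env.containsKey("CYGWIN") || env.containsKey("CYGWIN_VERSION");
    }

    /** Builds a permission-change command; winutilsPath is null when winutils.exe was not found. */
    static List<String> chmodCommand(Map<String, String> env, String winutilsPath,
                                     String octalMode, String path) {
        if (winutilsPath != null) {
            // Assumed winutils form: winutils.exe chmod <mode> <path>
            return Arrays.asList(winutilsPath, "chmod", octalMode, path);
        }
        if (looksLikeCygwin(env)) {
            // Fall back to the bash-style chmod that Cygwin puts on the PATH.
            return Arrays.asList("chmod", octalMode, path);
        }
        throw new IllegalStateException(
                "winutils.exe not found and no Cygwin environment detected");
    }

    public static void main(String[] args) {
        // winutils missing, Cygwin detected: the bash-style command is chosen.
        List<String> cmd = chmodCommand(
                Collections.singletonMap("CYGWIN", "nodosfilewarning"),
                null, "755", "/cygdrive/c/data/file.txt");
        System.out.println(cmd); // prints [chmod, 755, /cygdrive/c/data/file.txt]
    }
}
```

In real use the map would be `System.getenv()`. This only covers command selection; translating Windows paths to Cygwin form (for example with `cygpath`) and reconciling Cygwin's permission semantics with Hadoop's are left out, and the latter is exactly the mismatch Chris Nauroth describes.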

Steve Loughran
added a comment - 20/Jun/16 09:08 No real problem, except that the current build/release process is done on Linux systems; we'd have to coordinate it more, and there isn't currently a policy in place that the releases must come with Windows runtimes. Feel free to join in on the Hadoop common-dev list and push for it... even though not many people run Hadoop clusters on Windows, it is more relevant for standalone things downstream, where people want to use their tools on Windows.

Romain Manni-Bucau
added a comment - 19/Jun/16 20:56 +1, the GitHub issue is more about GitHub itself than about the signing. What is blocking putting the binaries in the repo to allow Maven packaging? We do it for TomEE, and as it is limited to a few files, it is acceptable and easy enough, I think.

Steve Loughran
added a comment - 19/Jun/16 20:33 I did sign the JARs, with the same GPG key that's listed in my Hadoop committer credentials; it was built off the ASF commit ID, on a dedicated VM that I use for building and testing Hadoop stuff. You can trust it as much as any other binary that comes from me, and I'm sure your build already passes through code I've written. The main issue with GitHub is durability: how long can you trust it to be there?
What we are discussing is getting rid of winutils entirely and moving to a JAR containing the native libs, which are then unzipped depending on the platform... the way snappy does. That way, it's just a JAR in the package or up on Maven. Volunteers to help implement/test that are welcome.
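A minimal sketch of the snappy-style packaging described here, assuming a hypothetical `/native/<os>/<arch>/` resource layout inside the JAR and a hypothetical `BundledNativeLoader` class: the loader picks the resource for the current platform, copies it to a temporary file, and loads it with `System.load`. It is not the layout snappy-java or Hadoop actually uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

/** Hypothetical loader for native libraries bundled as JAR resources. */
public final class BundledNativeLoader {

    private BundledNativeLoader() {}

    /** Maps the os.name system property to a short directory name. */
    static String osFolder(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) return "windows";
        if (os.startsWith("mac")) return "mac";
        if (os.startsWith("linux")) return "linux";
        throw new UnsupportedOperationException("Unsupported OS: " + osName);
    }

    /** Platform-specific file name, e.g. hadoop.dll or libhadoop.so. */
    static String libraryFileName(String os, String libName) {
        switch (os) {
            case "windows": return libName + ".dll";
            case "mac":     return "lib" + libName + ".dylib";
            default:        return "lib" + libName + ".so";
        }
    }

    /** Resource path inside the JAR, e.g. /native/windows/amd64/hadoop.dll. */
    static String resourcePath(String osName, String osArch, String libName) {
        String os = osFolder(osName);
        return "/native/" + os + "/" + osArch + "/" + libraryFileName(os, libName);
    }

    /** Copies the current platform's library out of the JAR and loads it. */
    static Path extractAndLoad(String libName) throws IOException {
        String resource = resourcePath(System.getProperty("os.name"),
                System.getProperty("os.arch"), libName);
        try (InputStream in = BundledNativeLoader.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("No bundled native library at " + resource);
            }
            // A fresh directory lets the file keep its original name.
            Path dir = Files.createTempDirectory("native-libs");
            Path target = dir.resolve(resource.substring(resource.lastIndexOf('/') + 1));
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            // deleteOnExit runs in reverse registration order: file first, then directory.
            dir.toFile().deleteOnExit();
            target.toFile().deleteOnExit();
            // System.load needs an absolute path to a file on disk, not a JAR entry.
            System.load(target.toAbsolutePath().toString());
            return target;
        }
    }
}
```

The copy step exists because the JVM can only load a native library from a file on disk, not directly from inside a JAR. On Windows the loaded DLL stays locked, so `deleteOnExit` may not actually remove it.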

Romain Manni-Bucau
added a comment - 19/Jun/16 18:31 Steve Loughran, that's what I'm doing at the moment (patching Beam to build on Windows), but it would be saner and better to rely on an ASF binary, or in the worst case a central one (like Maven Central), rather than a GitHub one.

Steve Loughran
added a comment - 04/Nov/14 16:06 This problem is going to continue unless/until the Hadoop releases include the native Windows libs. Perhaps we could:
- build the Windows binaries for every release, off the -src release, and put them up alongside the hadoop.tar
- release them shortly after the Hadoop release
- in the meantime, create the Hadoop libs for each release and put them up somewhere (inside/outside Apache) for people who want them.

Tsuyoshi Ozawa
added a comment - 16/Dec/13 08:41 This is the current workaround:
http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
IIUC, all users need to build winutils.exe and hadoop.dll themselves. Is this the intended approach?
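For users who build (or download) these files themselves, a small pre-flight check can confirm both are in place. It assumes, as is common in Windows setups, that hadoop.dll is placed next to winutils.exe in `%HADOOP_HOME%\bin`; Hadoop actually loads hadoop.dll through `java.library.path`, so other layouts can also work. `NativeBinariesCheck` is a hypothetical name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check that the native Windows binaries are present. */
public final class NativeBinariesCheck {

    /** Files assumed to be required in the bin directory. */
    static final String[] REQUIRED = {"winutils.exe", "hadoop.dll"};

    private NativeBinariesCheck() {}

    /** Returns the required files that are missing from binDir, in REQUIRED order. */
    static List<String> missingBinaries(File binDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!new File(binDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.err.println("HADOOP_HOME is not set");
            System.exit(1);
        }
        List<String> missing = missingBinaries(new File(home, "bin"));
        System.out.println(missing.isEmpty()
                ? "All native binaries present"
                : "Missing: " + missing);
    }
}
```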