Great start! Thank you very much for taking care of this ticket.
I know this is a work in progress but some notes:

I see a lot of references to hive even in files where occurrences have been replaced (see @HIVE_DAEMON@ in bigtop-packages/src/common/hcatalog/hcat_server.sh for instance)

typo: +HIVE_USER="hcatalgo"

I don't see the python parameter being used in the install script

in bigtop-packages/src/deb/hcatalog/hcatalog.postinst, no need to create the warehouse manually. There is now a script to do that. But we should not do these calls at configure time anyway. We cannot assume there is a hadoop cluster running and in some cases, may even create a /user/hive/warehouse on local filesystem. The metastore rename can also be removed since there is no upgrade from any previous version (but I guess this is a left over the original hive files used as a template)

No need for the "+Obsoletes: %
{name}

-webinterface" in the spec file

No need to require hadoop >= 0.20.2 in the spec file since Apache bigtop has never provided a version less than 0.20.2

I see hcatalog providing a hive binary. Is this on purpose or a work in progress? IF so, it should conflicts with hive package

In the install script, I see you are moving hcat binary to /usr/bin/. How does that work regarding JAVA_HOME? Is there any need to write a wrapper script to source hcatalog default file and bigtop-utils' java home autodetection script?

Bruno Mahé
added a comment - 10/Feb/12 22:08 Great start! Thank you very much for taking care of this ticket.
I know this is a work in progress but some notes:
I see a lot of references to hive even in files where occurrences have been replaced (see @HIVE_DAEMON@ in bigtop-packages/src/common/hcatalog/hcat_server.sh for instance)
typo: +HIVE_USER="hcatalgo"
I don't see the python parameter being used in the install script
in bigtop-packages/src/deb/hcatalog/hcatalog.postinst, no need to create the warehouse manually. There is now a script to do that. But we should not do these calls at configure time anyway. We cannot assume there is a hadoop cluster running and in some cases, may even create a /user/hive/warehouse on local filesystem. The metastore rename can also be removed since there is no upgrade from any previous version (but I guess this is a left over the original hive files used as a template)
No need for the "+Obsoletes: %
{name}
-webinterface" in the spec file
No need to require hadoop >= 0.20.2 in the spec file since Apache bigtop has never provided a version less than 0.20.2
I see hcatalog providing a hive binary. Is this on purpose or a work in progress? IF so, it should conflicts with hive package
In the install script, I see you are moving hcat binary to /usr/bin/. How does that work regarding JAVA_HOME? Is there any need to write a wrapper script to source hcatalog default file and bigtop-utils' java home autodetection script?
Again, thank you very much!

Roman Shaposhnik
added a comment - 12/Apr/12 17:59 Giri, great progress. A couple of questions still:
do you plan to work on Debian side?
you reference HCatalog 0.4.0, but I don't see it anywhere at Apache – where are you getting the tarballs from?
why is there (cd build) at the very end of the do-component-build ?
the man page reads like a Hive man page, not an hcatalog one
given that HCatalog is meant to be a better Hive metastore I'm surprised not to see any init.d scripts, what gives?

Giridharan Kesavan
added a comment - 13/Apr/12 02:32 Giri, great progress. A couple of questions still:
do you plan to work on Debian side?
I can start workin on the debian pkg in the next couple of days.
you reference HCatalog 0.4.0, but I don't see it anywhere at Apache – where are you getting the tarballs from?
I used the locally build hcatalog tar based on 0.4 branch. Since Hcatalog-359 and Hcatalog-342 are commited the next release of hcatalog can be packaged using bigtop.
why is there (cd build) at the very end of the do-component-build ?
the man page reads like a Hive man page, not an hcatalog one
Let me upload another patch by fixing the above two comments.
given that HCatalog is meant to be a better Hive metastore I'm surprised not to see any init.d scripts, what gives?
Hcatalog is the client to the hive metatstore. hcatalog requires hive-metastore to be up and running and uses the hive configs to talk to the metatstore.

Konstantin Boudnik
added a comment - 20/Sep/12 19:05 Giri,
I think we need to make sure that the component versions are parametrized and only coming from a central BOM declaration.
ant -Dforrest.home=$FORREST_HOME -Dhcatalog.version=0.4.0 -Dhbase.version=0.92.0 tar "$@"
Otherwise, every new release would be a nightmare'ish thing.

Bruno Mahé
added a comment - 22/Sep/12 04:50 Took a look at the latest patch and some comments:
The man page does not have any mention of Apache or incubating for projects
The default file should not export HIVE_HOME or HIVE_CONF
Let's not add more work to Anatoli. Could you separate $PREFIX from the other variables in the install script? See his recent patches for exemples
In the control file, Maintainer should be Apache Bigtop , not just Bigtop
Shouldn't the copyright file mention that Apache HCatalog (incubating) is in incubation (see the filed upstream-name)?
postinst script has a typo:
postinst script for hcatalot
preinst script has the following line:
# workaround for https://issues.cloudera.org/browse/DISTRO-223
. I am not sure to see the relationship with pig conf dir's symlink.
The spec file defines hadoop_username but I don't see it used anywhere. Please delete that line if it is not used.
In the spec file, are you sure you want to comment the following line 30:
#%global initd_dir %{_sysconfdir}/rc.d/init.d
? I also don't see any init script for Apache HCatalog (incubating) server (it would also need some user to be created)
If the wrapper script for Apache Hcatalog (incubating) sources /etc/default/hadoop, that means that the hadoop package should be a dependency
Other than that, it looks nice!

Shouldn't the copyright file mention that Apache HCatalog (incubating) is in incubation (see the filed upstream-name)?

I looked at couple of other incubating projects to see the value for upstream-name property, but it didnt help much. Could you please point to a specific proj that I can looks for examples?

hcatalog doesnt have any services and hence no rc scripts. Hcatalog uses the hive-metastore.
Hcat uses hive-metastore; In the v4 version of the patch hive was added as a dependency and not hadoop, reason being hive has a dependency on hadoop. With the latest patch I ve added hadoop and hive-metastore(to be precise) as a dependency to hcatalog.

I ve removed post, pre and the prerm scripts as hcatalog doesn't requires any configuration.

Giridharan Kesavan
added a comment - 26/Sep/12 00:54 Thanks Bruno, for the comments.
Shouldn't the copyright file mention that Apache HCatalog (incubating) is in incubation (see the filed upstream-name)?
I looked at couple of other incubating projects to see the value for upstream-name property, but it didnt help much. Could you please point to a specific proj that I can looks for examples?
hcatalog doesnt have any services and hence no rc scripts. Hcatalog uses the hive-metastore.
Hcat uses hive-metastore; In the v4 version of the patch hive was added as a dependency and not hadoop, reason being hive has a dependency on hadoop. With the latest patch I ve added hadoop and hive-metastore(to be precise) as a dependency to hcatalog.
I ve removed post, pre and the prerm scripts as hcatalog doesn't requires any configuration.
uploading v5 patch addressing comment

Konstantin Boudnik
added a comment - 26/Sep/12 05:55 Shouldn't the copyright file mention that Apache HCatalog (incubating) is in incubation (see the filed upstream-name)?
I think copyright is similar to the license boiler plate, so there's no difference IMO between incubating and a TPL projects.

Giridharan Kesavan
added a comment - 07/Nov/12 01:54 I updated the patch to include all the comments. Can someone help commit this? or If you see something is missing I can take a look and amend that patch.

Roman, I did'nt post any new patch. Last comment was on sep/21 and I uploaded a patch on sep/25 addressing the comments.
If you or someone has any comment/suggestion for patch posted on sep/25 (which being the latest) I can incorporate that.

Giridharan Kesavan
added a comment - 07/Nov/12 22:39 Roman, I did'nt post any new patch. Last comment was on sep/21 and I uploaded a patch on sep/25 addressing the comments.
If you or someone has any comment/suggestion for patch posted on sep/25 (which being the latest) I can incorporate that.

Roman Shaposhnik
added a comment - 08/Nov/12 18:21 - edited Giri, sorry I got confused. Here's what needs to happen to the latest patch you've attached:
you need to make sure that the versions of Hadoop, Hive and Pig are taken from the Bigtop BOM (it seems that you're already doing it for HBase)
please make sure to transition DEB side of the packaging to the new DEB packaging format that we've adopted in truk ( BIGTOP-713 )
please make sure to provide at least a skeleton of package testing manifest for HCatalog (for an example of how it is done see BIGTOP-744 )
please make sure to provide a modest amount of integration tests for HCatalog (for an example of how it is done see BIGTOP-736 )
Other than that – it looks pretty good! I'm looking forward to seeing HCatalog in one of the Bigtop releases.
P.S. For the #3 and #4 feel free to file subtasks for this JIRA

It looks like hcatalog 0.4 doesn't work well with hadoop2 and requires couple of patches. At the same time my latest patch has changes for debhelper 7.0.5 and these changes are not available on the 0.3 branch yet. So Im not sure.

Giridharan Kesavan
added a comment - 02/Mar/13 03:52 It looks like hcatalog 0.4 doesn't work well with hadoop2 and requires couple of patches. At the same time my latest patch has changes for debhelper 7.0.5 and these changes are not available on the 0.3 branch yet. So Im not sure.
Should I upload another patch just for 0.3 branch?

Konstantin Boudnik
added a comment - 19/Mar/13 00:28 This is the work to bring HCatalog into Hadoop 2.X land
Am I getting it correctly that the your latest patch essentially renders all previous patches outdated?

Konstantin Boudnik, not really I guess. If the existing stuff works for the branch-0.3 perhaps there may be reasons to use it. On the other hand, perhaps we can just backport the patch that will end up in Bigtop 0.6.0 to avoid unnecessary fork between Bigtop 0.3.1 and Bigtop 0.6.0.

I'll leave it up to the RM's decision. Please let me know if you need any of my help backporting it (it should be a really pretty clean backport).

Roman Shaposhnik
added a comment - 19/Mar/13 00:46 Konstantin Boudnik , not really I guess. If the existing stuff works for the branch-0.3 perhaps there may be reasons to use it. On the other hand, perhaps we can just backport the patch that will end up in Bigtop 0.6.0 to avoid unnecessary fork between Bigtop 0.3.1 and Bigtop 0.6.0.
I'll leave it up to the RM's decision. Please let me know if you need any of my help backporting it (it should be a really pretty clean backport).

+1 (non-committer)! Two questions, though. This appears to assume the presence of init.d.tmpl - is this intended to go atop BIGTOP-805, or will those changes be back-ported into this patch first?

Also (and this is more for my own curiosity than a review of your patch) - does the default generate_stop() function work well for stopping HCatalog? I hadn't looked very closely at the defaults for those functions because I had assumed they were rather specific to the Hadoop daemons (hence having overridden them for other daemons). But after a longer look through docs on LSB's start_daemon (which seem a little scant and inconsistent), it seems that either "stop" is a subcommand of start_daemon for which I can find no documentation, or start_daemon is entirely the wrong choice here, and I'm surprised it's working for Hadoop. Let's chat offline if my confusion is irrelevant to the patch, but I just wanted to make sure `service hcatalog-server stop` could be expected to work.

Sean Mackrory
added a comment - 19/Mar/13 02:14 - edited +1 (non-committer)! Two questions, though. This appears to assume the presence of init.d.tmpl - is this intended to go atop BIGTOP-805 , or will those changes be back-ported into this patch first?
Also (and this is more for my own curiosity than a review of your patch) - does the default generate_stop() function work well for stopping HCatalog? I hadn't looked very closely at the defaults for those functions because I had assumed they were rather specific to the Hadoop daemons (hence having overridden them for other daemons). But after a longer look through docs on LSB's start_daemon (which seem a little scant and inconsistent), it seems that either "stop" is a subcommand of start_daemon for which I can find no documentation, or start_daemon is entirely the wrong choice here, and I'm surprised it's working for Hadoop. Let's chat offline if my confusion is irrelevant to the patch, but I just wanted to make sure `service hcatalog-server stop` could be expected to work.

Bruno Mahé
added a comment - 19/Mar/13 08:18 - edited
bigtop-packages/src/common/hcatalog/hcatalog-server.default should not define HADOOP_PREFIX or HIVE_HOME. I see that HADOOP_HOME is a hack, but does it have to be in Apache HCatalog (incubating)?
bigtop-packages/src/common/hcatalog/hcatalog-server.svc. Can we name the pid in a consistent way? something like s/hcat.pid/hcatalog-server.pid/
The man page does not have any reference to incubating/incubator. The man page also has some double spaces. Shouldn't also the copyright be updated?
In the wrapper script I still see the old way to use bigtop-detect-java home. I thought this was updated recently?
bigtop-packages/src/common/hcatalog/webhcat-server.default should not define HIVE_HOME or HADOOP_HOME
bigtop-packages/src/common/hcatalog/webhcat-server.svc. Can we name the pid in a consistent way? something like s/webhcat.pid/webhcat-server.pid/ or hcatalag-webhcat.pid
bigtop-packages/src/deb/hcatalog/control still has some references to Apache Bigtop (incubating)
bigtop-packages/src/deb/hcatalog/copyright, I believe that upstream name should contain (incubating)
bigtop-packages/src/rpm/hcatalog/SPECS/hcatalog.spec. (incubating) is not mentioned in appropriate places.
bigtop-packages/src/rpm/hcatalog/SPECS/hcatalog.spec. Should we give the same treatment than Apache Hadoop regarding the use of Requires(pre): hcatalog for the daemons packages?
bigtop.mk. release notes variable should have (incubating)

Roman Shaposhnik
added a comment - 19/Mar/13 17:57 bigtop-packages/src/common/hcatalog/hcatalog-server.default should not define HADOOP_PREFIX or HIVE_HOME. I see that HADOOP_HOME is a hack, but does it have to be in Apache HCatalog (incubating)?
bigtop-packages/src/common/hcatalog/webhcat-server.default should not define HIVE_HOME or HADOOP_HOME
bigtop-packages/src/common/hcatalog/hcatalog-server.svc. Can we name the pid in a consistent way? something like s/hcat.pid/hcatalog-server.pid/
bigtop-packages/src/common/hcatalog/webhcat-server.svc. Can we name the pid in a consistent way? something like s/webhcat.pid/webhcat-server.pid/ or hcatalag-webhcat.pid
These 2 are an upstream HCatalog issues. I don't think we can workaround effectively in Bigtop. I filed HCATALOG-636 and also updated the patch to reference it.
The man page does not have any reference to incubating/incubator. The man page also has some double spaces. Shouldn't also the copyright be updated?
Thanks. I fixed those. Not sure about double spaces – it renders great by man. As for the date – it looks like a policy with Linux man pages to list the starting one.
In the wrapper script I still see the old way to use bigtop-detect-java home. I thought this was updated recently?
Great point. Fixed.
bigtop-packages/src/deb/hcatalog/control still has some references to Apache Bigtop (incubating)
bigtop-packages/src/deb/hcatalog/copyright, I believe that upstream name should contain (incubating)
bigtop-packages/src/rpm/hcatalog/SPECS/hcatalog.spec. (incubating) is not mentioned in appropriate places.
bigtop.mk. release notes variable should have (incubating)
Great point. Fixed.
bigtop-packages/src/rpm/hcatalog/SPECS/hcatalog.spec. Should we give the same treatment than Apache Hadoop regarding the use of Requires(pre): hcatalog for the daemons packages?
I don't see what would be the danger. Do you know of any downside in always doing that?

This appears to assume the presence of init.d.tmpl - is this intended to go atop BIGTOP-805, or will those changes be back-ported into this patch first?

Great point. It does indeed go on top of BIGTOP-805. Cos, if you want this to be in Bigtop 0.3.1 you'd also have to backport the bits of BIGTOP-805 that have to do with init.d templating.

Also (and this is more for my own curiosity than a review of your patch) - does the default generate_stop() function work well for stopping HCatalog? I hadn't looked very closely at the defaults for those functions because I had assumed they were rather specific to the Hadoop daemons (hence having overridden them for other daemons). But after a longer look through docs on LSB's start_daemon (which seem a little scant and inconsistent), it seems that either "stop" is a subcommand of start_daemon for which I can find no documentation, or start_daemon is entirely the wrong choice here, and I'm surprised it's working for Hadoop. Let's chat offline if my confusion is irrelevant to the patch, but I just wanted to make sure `service hcatalog-server stop` could be expected to work.

Hm. I see where you're coming from. This, however, appears to be more of a generic question rather than related to this patch. Would you mind opening up a new JIRA to address this concern about our init.d template?

Roman Shaposhnik
added a comment - 19/Mar/13 18:00 This appears to assume the presence of init.d.tmpl - is this intended to go atop BIGTOP-805 , or will those changes be back-ported into this patch first?
Great point. It does indeed go on top of BIGTOP-805 . Cos, if you want this to be in Bigtop 0.3.1 you'd also have to backport the bits of BIGTOP-805 that have to do with init.d templating.
Also (and this is more for my own curiosity than a review of your patch) - does the default generate_stop() function work well for stopping HCatalog? I hadn't looked very closely at the defaults for those functions because I had assumed they were rather specific to the Hadoop daemons (hence having overridden them for other daemons). But after a longer look through docs on LSB's start_daemon (which seem a little scant and inconsistent), it seems that either "stop" is a subcommand of start_daemon for which I can find no documentation, or start_daemon is entirely the wrong choice here, and I'm surprised it's working for Hadoop. Let's chat offline if my confusion is irrelevant to the patch, but I just wanted to make sure `service hcatalog-server stop` could be expected to work.
Hm. I see where you're coming from. This, however, appears to be more of a generic question rather than related to this patch. Would you mind opening up a new JIRA to address this concern about our init.d template?

Konstantin Boudnik
added a comment - 19/Mar/13 18:51 Cos, if you want this to be in Bigtop 0.3.1 you'd also have to backport the bits of BIGTOP-805
BIGTOP-805 isn't committed yet, so I guess I might be going with the Giri's patch if the timing is demanding.

Peter Linnell
added a comment - 08/Apr/13 03:43 +1
Giri, 0001- BIGTOP-12 -Add-HCatalog-to-Bigtop-branch-0.3.1.patch looks good to me. I'd like Cos or Roman to have a second set of eyeballs on it before committing.
Thanks for following the long and winding road on this one