Filipe Manana (Inactive)
added a comment - 19/Feb/13 11:14 AM This is an installer issue.
The file2 module in 2.0.0 doesn't have the function ensure_dir/1, it was added in 2.0.1.
This means after upgrade, a file2.beam from 2.0.0 is still being used.

Farshid Ghods (Inactive)
added a comment - 19/Feb/13 11:22 AM Andrei,
can you compare this file between fresh 2.0.1 and 2.0 installations to rule out a buildbot issue?
I also wonder whether this is reproducible on 64-bit.

Filipe Manana (Inactive)
added a comment - 20/Feb/13 8:34 AM I don't understand your analysis here.
The Erlang VM loads files ending in .beam, so I don't get why the installer/upgrader or whatever is producing files ending with .beam_xxx.
From the last big ls output shown, I don't see how you conclude it's the right or wrong file2.beam module.
I would grab file2.beam from 2.0.0, calculate its md5 (with md5sum, for example), do the upgrade, then check after the upgrade that there's only one subdirectory with a file2.beam and that that file2.beam has a different md5 checksum. In case of multiple file2.beam files (in different subdirectories), the Erlang VM can pick the wrong file (from 2.0.0).
The Erlang stack trace pasted before clearly (for anyone familiar with Erlang) shows that the loaded file2.beam module doesn't have the function ensure_dir/1 - that means the VM loaded a module version from 2.0.0:
error: undef
stacktrace: [{file2,ensure_dir,
["/opt/couchbase/var/lib/couchbase/data/@indexes/default/tmp_d3bab589b07b8f398f39483b2e415d07_main/e9e434b58f3fac3310504249a00254f9.sort"]},
{couch_set_view_util,do_new_sort_file_path,2},
{couch_set_view_updater,'-maybe_flush_merge_buffers/2-fun-1-',7},
{lists,foldl,3},
{couch_set_view_updater,flush_writes,1},
{couch_set_view_updater,'-update/8-fun-1-',15}]
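
The checksum comparison suggested above can be sketched in Python. This is only an illustration under stated assumptions: the helper names (`md5_of`, `find_modules`, `upgrade_looks_clean`) are hypothetical, and the install root to scan (for example `/opt/couchbase`, taken from the path in the stack trace) is a guess at where the module lives.

```python
import hashlib
import os


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_modules(root, name="file2.beam"):
    """Map every file under root whose basename is `name` to its MD5."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            path = os.path.join(dirpath, name)
            found[path] = md5_of(path)
    return found


def upgrade_looks_clean(found, old_md5):
    """True when exactly one copy exists and it differs from the pre-upgrade checksum."""
    return len(found) == 1 and old_md5 not in found.values()
```

Record the checksum of file2.beam before the upgrade, then call `upgrade_looks_clean(find_modules(root), old_md5)` afterwards. A `False` result means either a stale copy or more than one copy is present.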

Filipe Manana (Inactive)
added a comment - 20/Feb/13 9:05 AM - edited That doesn't match the Erlang stack trace, which clearly says the module doesn't have the ensure_dir/1 function.
Maybe the Erlang VM was not restarted, which would prevent reloading the module's code in the VM.
Check with the people responsible for the installer / upgrade. There's nothing I can do here, unless the same thing would happen on a clean 2.0.1 install, in which case it would be a problem on my side. Nothing changed on index file formats, file names, etc, everything's fully compatible.

Steve Yen
added a comment - 21/Feb/13 4:13 PM > Maybe the Erlang VM was not restarted, which would prevent reloading the module's code in the VM
There might be something to that.
Looking at the logs from: http://qa.hq.northscale.net/view/2.0.1/job/centos-32-2.0-upgrade/33/consoleFull
I see many "shutdown failed" reports, like...
[2013-02-19 05:35:06,094] - [remote_util:1203] INFO - Stopping couchbase-serverNOTE: shutdown failed
[2013-02-19 05:35:06,094] - [remote_util:1203] INFO - {badrpc,nodedown}
Oddly, that trace also has mention of 1.8.1, too.
I need to get myself a centos 32-bit VM. In the meanwhile, perhaps we have the wrong consoleFull logs URL (or I'm not reading it right)?
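
A quick way to count such reports in a saved copy of the console log is sketched below. The marker strings are copied from the two log lines quoted above. The function name `find_shutdown_failures` is hypothetical, and so is the assumption that the log is available as plain text.

```python
FAILURE_MARKERS = ("shutdown failed", "{badrpc,nodedown}")


def find_shutdown_failures(log_text):
    """Return (line_number, line) pairs for lines containing a failure marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            hits.append((number, line.strip()))
    return hits
```

Line numbers are 1-based so they can be matched against the saved log in an editor.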

Strangely, even single-node upgrade failed for me with build 161 on that VM. (But, we're up to build 164 already) I had a single default bucket, zero items, with 2 design docs each with 2 views on it...

Bin Cui
added a comment - 21/Feb/13 6:46 PM On the same 10.1.4.22 setup, I tried the following three scenarios and I haven't seen the problem:
1. with data but no design doc
2. with data and with design doc
3. no data but with design doc

Farshid Ghods (Inactive)
added a comment - 21/Feb/13 6:55 PM Bin,
after upgrade did you get results from the index ?
the bug description says the test expects N rows in the view but it receives 0 rows due to an error

Farshid Ghods (Inactive)
added a comment - 21/Feb/13 6:58 PM Also, this occurred when running a test against a 2-node cluster.
The test is checked in. I or someone from QE can help kick off the test if needed.

Bin Cui
added a comment - 21/Feb/13 7:19 PM Hard to pin the problem on clustering, but who knows.
For a single node, here are my test steps:
1. Install the 2.0.0 1976 version, and populate the gamesim samples with design docs and views. Make sure queries return the right data.
2. rpm -U the 2.0.1 161 version. Queries return the same results from either the UI or the curl command that Andrei mentioned.
Maybe we need to try a two-node cluster ...

Please note that the VMs have 2 GB of RAM (a request to upgrade them was sent long ago).

you can try with ./testrunner -i centos-32-2.0-upgrade.ini -t newupgradetests.MultiNodesUpgradeTests.offline_cluster_upgrade,initial_version=2.0.0-1976-rel,nodes_init=1,ddocs-num=3,upgrade_version=2.0.1-164-rel

Bin Cui
added a comment - 22/Feb/13 12:19 PM On 10.1.4.24, I repeated the same steps that I did on 10.1.4.22. I don't see any problems. These were manual steps without any automation.
We need to pay attention to the test script itself, though.

Bin Cui
added a comment - 22/Feb/13 4:12 PM Verified that file2.beam is updated accordingly during upgrade. However, an Erlang process may hold file2.beam during the upgrade if indexing is underway, and that may lead to upgrade failure. We need a better understanding of this.

Andrei Baranouski
added a comment - 25/Feb/13 6:36 AM The test doesn't start indexing before the upgrade. The bucket contains only 1K items. After the upgrade, the test runs queries without any stale param.
On centos 32 I get the issue, and the logs contain no mention that the index was completed:
cat /opt/couchbase/var/lib/couchbase/logs/couchdb.1| grep "updater finished"
[root@localhost couchbase]#
Also, I tried to create a new ddoc/view after the upgrade and still can't get a result. I see in the UI that indexing was triggered, but it never actually started. See screenshots.
http://10.1.4.24:8092/_set_view/default/_design/upgrade-test-view0/_get_utilization_stats
{"total_indexing_time":0,"useful_indexing_time":0,"wasted_indexing_time":0,"updates":0,"updater_interruptions":0,"compaction_time":0,"compactions":0,"compactor_interruptions":0,"replica_utilization_stats":{"total_indexing_time":0,"useful_indexing_time":0,"wasted_indexing_time":0,"updates":0,"updater_interruptions":0,"compaction_time":0,"compactions":0,"compactor_interruptions":0}}
On other OSs I see that the index was run and completed after the upgrade.
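
The stats response above can be checked programmatically. This is a minimal sketch: the field names `updates` and `total_indexing_time` come from the JSON shown, but the function name `index_never_ran` is hypothetical, and treating zero in both fields as "the updater never ran" is an assumption rather than documented behaviour.

```python
import json


def index_never_ran(stats_json):
    """True when the stats report no updater runs and no indexing time at all."""
    stats = json.loads(stats_json)
    return stats["updates"] == 0 and stats["total_indexing_time"] == 0
```

Passing the body returned by the `_get_utilization_stats` URL above would yield `True` for the response pasted in this comment.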

Farshid Ghods (Inactive)
added a comment - 25/Feb/13 11:09 AM Spoke with Andrei,
this issue is NOT reproducible with centos 64-bit ,
also the test as he says does not start indexing at all until upgrade is completed.
i recommend adding this to release note and adding the workaround in case someone hits this issue

Jin Lim
added a comment - 25/Feb/13 1:17 PM Start with the installer group for gathering more succinct workaround information, Bin can you provide a clear description of what would be an ideal workaround for this issue? If it has nothing to do with the installer (applies to other team, etc) please assign it back to Jin. Thanks!

kzeller
added a comment - 04/Mar/13 4:23 PM Added to 2.0.1 RN as:
<para>When you upgrade from Couchbase Server 2.0.0 to 2.0.1 on Linux the install may not replace the
<filename>file2.beam</filename> with the latest version. This will cause indexing and querying to fail.
The workaround is to install 2.0.1 and then manually restart Couchbase Server with the following commands:
</para>
<programlisting>
sudo /etc/init.d/couchbase-server stop
sudo /etc/init.d/couchbase-server start
</programlisting>
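
For automated upgrade runs, the same stop/start sequence could be scripted. The sketch below assumes the init script path from the release note. The function names and the injectable `run` parameter are hypothetical and exist only so the sequence can be dry-run without touching a real server.

```python
import subprocess

INIT_SCRIPT = "/etc/init.d/couchbase-server"


def restart_commands(use_sudo=True):
    """Build the stop and start command lines, in that order."""
    prefix = ["sudo"] if use_sudo else []
    return [prefix + [INIT_SCRIPT, action] for action in ("stop", "start")]


def restart_couchbase(run=subprocess.check_call, use_sudo=True):
    """Run stop then start; check_call raises if either command fails."""
    for command in restart_commands(use_sudo):
        run(command)
```

Passing `run=print` shows the commands without executing them.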

Aleksey Kondratenko (Inactive)
added a comment - 02/May/13 4:12 PM The fix is on the way and I believe (if I understand the question correctly) it's worth mentioning in the release notes that this is fixed, that MB-7770 no longer applies, and that it need not be worked around.

Aleksey Kondratenko (Inactive)
added a comment - 02/May/13 4:22 PM Also let me note that closing this bug, which was not fixed at all but just had a documented workaround, was wrong. Without the fix we're about to merge, we would still have that bug and would still need to document it as a known issue for 2.0.2, 2.1, and so on.

Aleksey Kondratenko (Inactive)
added a comment - 02/May/13 4:45 PM I'm not going to tell you what to do. I just pointed out that whatever was done was not quite right.
And my intuition tells me that mixing closed-because-fixed and closed-because-documented is perhaps a questionable practice. But I'm not going to argue about that. It's not my business.

kzeller
added a comment - 02/May/13 4:51 PM Well we have a new process in place for bugs so that tickets that need to be release-noted or documented can be tagged, doc'd but will remain open for engineering unless noted. Information session this Monday.
We have been very messy in the past on handling doc requirements/fixes using any system (but then again this is not your business) ; )
On a separate issue, which IS your business. You say it's worth mentioning we fixed this in 2.0.2 so I will add it to RN 2.0.2 as a fix and reference this ticket.......

kzeller
added a comment - 06/May/13 6:21 PM added and commented out as 2.0.2 RN fix:
<para>In the past Couchbase Server 2.0.0 upgrades on Linux the install did not replace the
<filename>file2.beam</filename> with the latest version. This will cause indexing and querying to fail.
This has been fixed.</para>