accumulo-user mailing list archives

Thanks Josh. That was helpful; yes, a migration to Hadoop 2 is in our future!
In the end, I decided to start a new instance, as you ended up suggesting,
and bulk importing.
Thanks for the help!
On Tue, Feb 24, 2015 at 1:33 PM, Josh Elser <josh.elser@gmail.com> wrote:
> Ok, that helps a bit. A few things:
>
> > "Could not create ServerSocket.." error as it can't connect to the
> tserver.
>
> Note that this is a *server* socket. This means that the server (master or
> tabletserver) failed to bind the socket it was going to use for its Thrift
> server -- not that it failed to connect out to another process. This means
> that Accumulo will not work, as the processes can't communicate with each
> other or with clients. The error message should make it fairly obvious why
> the exception was thrown. Hopefully, the process killed itself too.
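The distinction is easy to see with plain sockets. A minimal sketch (generic Python, not Accumulo code) showing that "could not create server socket" is a *bind* failure on the listening side, typically because something else already holds the port:

```python
# Sketch: a "server socket" error means the listening port could not be
# bound, not that an outbound connection to another server failed.
import socket

first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))      # let the OS pick a free port
first.listen(1)
port = first.getsockname()[1]

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Binding a second listener to the same port fails with EADDRINUSE,
    # analogous to a tserver whose Thrift port is already taken.
    second.bind(("127.0.0.1", port))
except OSError:
    print("could not create server socket: port", port, "already in use")
finally:
    second.close()
    first.close()
```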
>
> > Hadoop 1.2.1
>
> Hadoop 1 doesn't have the best track record when it comes to ensuring that
> a file is actually written to disk when we request it to be (a big part of
> the reason we suggest moving to Hadoop 2 when you can). A hard poweroff can
> result in corrupt Accumulo files in HDFS.
>
> You can try adding dfs.datanode.synconclose=true to your hdfs-site.xml,
> which might help protect against this, but I'm not sure of the error
> handling when the local disk actually runs out of space. HDFS' reserved
> space configuration can help remove this worry by making HDFS refuse writes
> before the underlying file system itself is full.
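For reference, the two safeguards mentioned above would go into hdfs-site.xml roughly like this (the 1 GB reserved value is an illustrative number, not a recommendation -- size it for your disks):

```xml
<!-- hdfs-site.xml: sketch of the two safeguards discussed above -->
<property>
  <!-- fsync DataNode block files on close, so a hard poweroff is less
       likely to leave partially written Accumulo files behind -->
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>
<property>
  <!-- bytes per volume reserved for non-HDFS use, so HDFS rejects writes
       before the local file system itself fills up (1 GB as an example) -->
  <name>dfs.datanode.du.reserved</name>
  <value>1073741824</value>
</property>
```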
>
> > I deleted the wal logs, hoping that it would revert to what was in
> /accumulo/tables
>
> Deleting the WALs also isn't doing what you expect it to :). The WALs,
> especially for the metadata table, are extremely important and are needed
> to ensure that data is not lost (if WALs for the metadata table are lost,
> the table might be in an inconsistent state that Accumulo can't
> automatically recover from).
>
> This is probably why your tables are not coming online.
>
> Recovering your existing instance might not be worth the hassle. It's
> likely easier to just move the RFiles in HDFS out of the way, and then
> reimport them into a reinitialized Accumulo.
>
> An outline of how to do this can be found at
> http://accumulo.apache.org/1.6/accumulo_user_manual.html#_hdfs_failure
> under the *Q* "The metadata (or root) table has references to a corrupt
> WAL". If you need some more guidance than what is listed there, please
> feel free to ask!
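As a rough outline of that recovery path (placeholders in angle brackets; the default /accumulo root in HDFS is assumed -- the manual section linked above is the authoritative version):

```shell
# Sketch only -- adjust paths and names for your instance.

# 1. Move the existing RFiles out of Accumulo's directory tree
hadoop fs -mkdir /accumulo-recovery
hadoop fs -mv "/accumulo/tables/<old-table-id>" /accumulo-recovery/

# 2. Reinitialize Accumulo (this discards the old metadata, users, and config)
accumulo init

# 3. In the Accumulo shell: recreate users and tables, then bulk import.
#    importdirectory takes a source dir, a failure dir, and a setTime flag:
#      createtable <table>
#      importdirectory /accumulo-recovery/<old-table-id>/<tablet-dir> /tmp/import-failures true
```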
>
> Kina Winoto wrote:
>
>> Hi Josh,
>>
>> > Versions of Hadoop and Accumulo:
>> Hadoop 1.2.1
>> Accumulo 1.6.1
>> > Are the accumulo.metadata/!METADATA and/or accumulo.root tables online?
>> Nope.. I tried to scan the tables -- it just hangs
>> > Have you checked the logs of the Master and/or TabletServer for any
>> exceptions?
>> The master log says it's locked for a read operation (an info message). I
>> tried to shut down the master with accumulo admin -f stopMaster, but it's
>> still unhappy.
>> The tserver log doesn't have any exceptions. However, if I run accumulo
>> tserver -a localhost, then I'll get a "Could not create ServerSocket.."
>> error as it can't connect to the tserver.
>>
>> For more context, I ran into all of this because I'm running this on a
>> vm and I ran out of disk space so Accumulo could no longer write to the
>> wal reliably and then checksums weren't matching up. After I created
>> more space on my vm, I deleted the wal logs, hoping that it would revert
>> to what was in /accumulo/tables, but then ran into this error where I
>> have zero tablets.
>>
>> Thanks for any suggestions on what to do next!
>>
>> - Kina
>>
>> On Tue, Feb 24, 2015 at 11:13 AM, Josh Elser <josh.elser@gmail.com
>> <mailto:josh.elser@gmail.com>> wrote:
>>
>> Hi Kina,
>>
>> Can you share some more information?
>>
>> * Versions of Hadoop and Accumulo
>> * Are the accumulo.metadata/!METADATA and/or accumulo.root tables
>> online?
>> * Have you checked the logs of the Master and/or TabletServer for
>> any exceptions?
>>
>> - Josh
>>
>> Kina Winoto wrote:
>>
>> Hi,
>>
>> I'm running a local instance of accumulo with just one tablet
>> server. I
>> got into a rut and now I don't have any tablets. There is data
>> still in
>> hdfs but I assume the data is corrupted so the tablets aren't
>> being
>> assigned to the tablet server. Is there a way I can force a
>> tablet to be
>> assigned? I don't mind giving up a portion of my data (or all of
>> it) at
>> this point. I'd just rather not have to reinitialize accumulo and
>> recreate all the users and set up all my tables again. Maybe I
>> can force
>> a tablet assignment and then delete the tables that are corrupted?
>>
>> I've encountered a similar issue on a many-node cluster and
>> would like
>> to know if my only option is to reinitialize accumulo.
>>
>> Thanks!
>>
>> - Kina
>>
>> —
>> Sent from Mailbox <https://www.dropbox.com/__mailbox
>> <https://www.dropbox.com/mailbox>>
>>
>>
>>