> > For performance reasons, you prefer the second option that I rejected> > where users give a file and the system finds the deletes from there. I> can> > buy that.>> That's simpler at least to understand and debug, the logs from ORC alone> are enough to find consistency issues.>> The rest of the details are implicit to the implementation, beyond a base> file and the current transaction state.>> This is nearly exactly how the LLAP ACID cache patch does today, which> looks the cache up on the base file and applies local transaction state per> query (i.e valid txns list which hides the committed deletes from an older> query).>> > I don’t follow your last comment about ROW__ID being projected out to> the> > user. ORC isn’t currently hiding that field from the reader is it?>> In general, a BI tool of some kind over ACID probably cares about the data> and not the metadata about which rows belong to which transaction in> general.>> Hiding ROW__ID makes the consumer side of the reader identical between> ACID and non-ACID, unless it is being read by a "SELECT FOR UPDATE" reader.>> Cheers,> Gopal>>>

> For performance reasons, you prefer the second option that I rejected

> where users give a file and the system finds the deletes from there. I can> buy that.

That's simpler at least to understand and debug, the logs from ORC alone are enough to find consistency issues.

The rest of the details are implicit to the implementation, beyond a base file and the current transaction state.

This is nearly exactly how the LLAP ACID cache patch does today, which looks the cache up on the base file and applies local transaction state per query (i.e valid txns list which hides the committed deletes from an older query).

> I don’t follow your last comment about ROW__ID being projected out to the> user. ORC isn’t currently hiding that field from the reader is it?

In general, a BI tool of some kind over ACID probably cares about the data and not the metadata about which rows belong to which transaction in general.

Hiding ROW__ID makes the consumer side of the reader identical between ACID and non-ACID, unless it is being read by a "SELECT FOR UPDATE" reader.

I’ve been looking at the OrcFile.createReader method and thinking aboutwhat I will need to do to read acid files. The first thing that strikes meis that createReader takes a file. But for acid, you need to pass thedirectory because it needs to look for any relevant delta files. Acid alsorequires a ValidTxnList. We can add that to the ReaderOptions.

It seems the best way to do this is to add a new methodOrcFile.createAcidReader that takes a directory. I don’t like that theuser has to make a different call in the acid case. But the user will haveto set the ValidTxnList in the reader options anyway, so the user willalready have to have split logic.

Every way I could think of for createReader to decide if it was dealingwith an acid directory or a non-acid file seemed to create jumbledsemantics.Does the user pass a directory for the acid case but a file for non-acid?Yuck.Does the user pass a base file in the acid case and the code walks up thepath to find the relevant directory? Seems error prone and slow.

Related to this is my assumption that I will need to write a newimplementation of Reader and RecordReader that understand acid. This seemsbetter than putting a bunch of branches into the existing code to try tohandle both cases.

> > The first thing that strikes me is that createReader takes a file.> > But for acid, you need to pass the directory because it needs to look> for any relevant delta files.>> The ACID 2.x impl, the InputFormat gets a directory - but a Reader should> still be getting an individual file.>> In fact, it should be getting something smaller than a File (ideally an> HDFS block) which is encoded as FileSplit (path, offset, len) and a> ValidTxnList.>> In the original ACID impl, multiple delta files get a single Reader, while> in the new ACID 2.x impl the concept of a "base file + deltas" is> irrelevant.>> All insert deltas are equivalent base files - the base concept is (1> Stripe) + (Relevant deletes) == 1 vector reader.>> There's no need to know which delete deltas were already read unlike the> UPDATE ops (i.e Split #1 and Split #2 can load their deletes independently,> without worrying about double row outputs).>> If a delete delta which was loaded is not found in the input split, it has> no effect on the reader's output correctness.>> > I don’t like that the user has to make a different call in the acid case.>> You need to identify within ORC whether the file provided is a base file> or a delta insert file.>> If the file is a base_xx, then all deletes exist in ./delete_deltas_*/>> if the file is named delta_xx, then all deletes exist in ../delete_deltas*/>> Those are strictly enforced by the ACID implementation & can serve as easy> assumptions.>> Other than that, a non-null ValidTxnList is all it should take.>> > the user will already have to have split logic.>> The part where the logic-splits off is into InputFormat - detecting> compaction during split generation is strictly the InputFormat's problem.>> There's a bit of magic there which is in plain sight, like how the INSERT> OVERWRITE works transactionally (HIVE-14988).>> For me, the clear division is to look at this problem as "Details about> file names" (includes HIVE-14535) and "Details about a Stripe" (Reader +> valid-txns + deletes application).>> Everything in the middle is just the same as regular ORC, like PPD.>> From the other side of the mirror, the flat ORC API is pretty much a Null> ROW__ID pruned already, with 0 deletes and Long.MAX watermark in the> ReaderOptions.>> > implementation of Reader and RecordReader that understand acid>> There's an "*" to most of the above - a reader which intends to modify the> data might need a different API, to be explicit that the ROW__ID is> projected out to the user.>> Cheers,> Gopal>>>> On 9/13/17, 3:48 PM, "Alan Gates" <[EMAIL PROTECTED]> wrote:>> I’ve been looking at the OrcFile.createReader method and thinking about> what I will need to do to read acid files. The first thing that> strikes me> is that createReader takes a file. But for acid, you need to pass the> directory because it needs to look for any relevant delta files. Acid> also> requires a ValidTxnList. We can add that to the ReaderOptions.>> It seems the best way to do this is to add a new method> OrcFile.createAcidReader that takes a directory. I don’t like that the> user has to make a different call in the acid case. But the user will> have> to set the ValidTxnList in the reader options anyway, so the user will> already have to have split logic.>> Every way I could think of for createReader to decide if it was dealing

There's no need to know which delete deltas were already read unlike the UPDATE ops (i.e Split #1 and Split #2 can load their deletes independently, without worrying about double row outputs).

If a delete delta which was loaded is not found in the input split, it has no effect on the reader's output correctness.

> I don’t like that the user has to make a different call in the acid case.

You need to identify within ORC whether the file provided is a base file or a delta insert file.

If the file is a base_xx, then all deletes exist in ./delete_deltas_*/

if the file is named delta_xx, then all deletes exist in ../delete_deltas*/

Those are strictly enforced by the ACID implementation & can serve as easy assumptions.

Other than that, a non-null ValidTxnList is all it should take.

> the user will already have to have split logic.

The part where the logic-splits off is into InputFormat - detecting compaction during split generation is strictly the InputFormat's problem.

There's a bit of magic there which is in plain sight, like how the INSERT OVERWRITE works transactionally (HIVE-14988).

For me, the clear division is to look at this problem as "Details about file names" (includes HIVE-14535) and "Details about a Stripe" (Reader + valid-txns + deletes application).

Everything in the middle is just the same as regular ORC, like PPD.

From the other side of the mirror, the flat ORC API is pretty much a Null ROW__ID pruned already, with 0 deletes and Long.MAX watermark in the ReaderOptions.

> implementation of Reader and RecordReader that understand acid

There's an "*" to most of the above - a reader which intends to modify the data might need a different API, to be explicit that the ROW__ID is projected out to the user.

Cheers,Gopal

On 9/13/17, 3:48 PM, "Alan Gates" <[EMAIL PROTECTED]> wrote:

I’ve been looking at the OrcFile.createReader method and thinking about what I will need to do to read acid files. The first thing that strikes me is that createReader takes a file. But for acid, you need to pass the directory because it needs to look for any relevant delta files. Acid also requires a ValidTxnList. We can add that to the ReaderOptions.

It seems the best way to do this is to add a new method OrcFile.createAcidReader that takes a directory. I don’t like that the user has to make a different call in the acid case. But the user will have to set the ValidTxnList in the reader options anyway, so the user will already have to have split logic.

Every way I could think of for createReader to decide if it was dealing with an acid directory or a non-acid file seemed to create jumbled semantics. Does the user pass a directory for the acid case but a file for non-acid? Yuck. Does the user pass a base file in the acid case and the code walks up the path to find the relevant directory? Seems error prone and slow.

Related to this is my assumption that I will need to write a new implementation of Reader and RecordReader that understand acid. This seems better than putting a bunch of branches into the existing code to try to handle both cases.

Thoughts?

Alan.

Gopal Vijayaraghavan 2017-09-14, 03:47

NEW: Monitor These Apps!

Apache Lucene, Apache Solr and all other Apache Software Foundation project and their respective logos are trademarks of the Apache Software Foundation.
Elasticsearch, Kibana, Logstash, and Beats are trademarks of Elasticsearch BV, registered in the U.S. and in other countries. This site and Sematext Group is in no way affiliated with Elasticsearch BV.
Service operated by Sematext