
Hi, so here's a data structure I want: O(1) access to any element at any offset, like an array.

Additionally, I want to be able to insert an arbitrary length of content at an arbitrary offset efficiently, i.e., effectively O(1) ordered index and insert (ordered not based on content, but based on the splice point of insertion).

You can get one, but not the other as far as I can see. I think there may be some trick in counting and tagging elements that would make this possible, but I don't know of any.
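To make the trade-off concrete, here's a minimal sketch of the array baseline (Python; `splice` is a made-up helper, not something from the thread). Reads at any offset are O(1), but an insert at offset X must shift everything after X:

```python
# Sketch: the array baseline. Access at any offset is O(1), but an
# insert at offset X shifts every element after X, so it's O(N - X).

def splice(arr, offset, new_items):
    """Insert new_items into arr at offset, shifting the tail over."""
    arr[offset:offset] = new_items   # cost ~ len(arr) - offset
    return arr

data = list(range(10))
splice(data, 4, ["a", "b"])
print(data[5])   # -> 'b': any offset is readable in O(1)
```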

To help frame the thinking, let's presume that I have an ordered set of 10^14 elements (say, 4 petabytes worth) spanned over a roomful of machines. Now I insert a set of size 1000 at offset, say, 47 281 923 050 192. Then I do another 10^5 inserts in the same spirit.

Different question: is there some notion of information cost? Basically, what I'm proposing would be a substantial change in the nature of the structure essentially for free.

I think you'll either have to pay for it in increased algorithmic access time or in a required reindexing... as if there were some fundamental minimal computational unit cost for the operation that can only be transferred, not avoided.

For instance, I could do a lazy computation, but in the aggregate it comes out more expensive than a naive reindex.

I think doing things this way would lead one to presuppose a class of solutions: it presumes that your algorithm can be fundamentally deconstructed into these primitives, and that these primitives are the best way of expressing the solution.

As such, I think one would only get the "going through the front door" type of answer while trying to construct a system from these building blocks as the base tool.

For instance, let's say a solution used an exotic rolling hash over some kind of Y-fast trie to do some skipping/filtering (like an indexed form of Boyer-Moore). How would you express it with these?

Um, the trie is built with trees, which themselves are built with lists. The rolling hash only changes the value of what's inside to something that is easier to organize into a trie.

So here's a problem:
In order for a list to support O(1) access to its elements, we need an absolute system of indexing with N indexes. It could be the way we arrange elements in memory, which would make the index a structural one, but it would still be a global index.

When we insert an object in the middle of this list at position X, we'd have to update at least N-X indexes, since everything after X is now at a different position. You can visualize this absolute index as depending on ALL previous elements, so each element (which knows its index) contains information about ALL elements preceding it (their count).

If we want to allow O(1) insert, then we need relative indexing instead, where each element only knows which element is next on the list (or that it's the last). Because only the affected element (at the insertion point) is altered, we can do the insert in O(1). This, though, means that we have to ask each element for the next one, since we have no other way of knowing it.
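A minimal sketch of that relative-indexing case (my own illustration, not the poster's code): with a handle on the node at the splice point, the insert touches only that node, but finding the node at offset X means walking X links.

```python
# Sketch: relative indexing. Insertion after a known node is O(1),
# but locating the node at offset X is O(X), since no global index exists.

class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.nxt = nxt

def insert_after(node, value):
    """O(1): only the splice-point node is touched."""
    node.nxt = Node(value, node.nxt)

def node_at(head, offset):
    """O(offset): we have to ask each element for the next one."""
    for _ in range(offset):
        head = head.nxt
    return head

head = Node(0, Node(1, Node(2)))
insert_after(node_at(head, 1), "spliced")   # cheap insert, linear lookup
```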

So here's the thing: you can't win. You can speed up the insertion or the access, but you can't get both. Speeding up one requires shifting information to other places, making the other operation slower.

Unless you don't care about the order of the elements. Then we can have the index defined by an inherent property of the input data, so object A always has index N. This would be a hash table. But since you want splicing, I don't think this is what you want at all.
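As a quick illustration of that last point (a sketch of mine, using Python's built-in dict as the hash table): every operation is ~O(1), but the key is a property of the element itself, so there is no "element at offset X" and no splice point.

```python
# Sketch: content-based indexing. Lookup, insert, and delete are ~O(1),
# but the structure has no order, so "insert at offset X" is meaningless.

table = {}
table["object A"] = {"payload": 42}   # the index is derived from the data
print("object A" in table)            # O(1) membership test
# ...but there is no table[47]: no offsets, no first/next, no splicing.
```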

Please be careful insinuating that people aren't helping you by telling you something can't be done. We all hear about Einstein's relativity, but little about how he spent his last years trying to find a grand theory that worked without quantum mechanics. If such a structure existed, why wouldn't it be used everywhere? Hash sets are far more limited than what you want, and yet they are extremely well-known structures simply because they are so fast at everything they do.

When we insert an object in the middle of this list at position X, we'd have to update at least N-X indexes, since everything after X is now at a different position.

Nope. You could update the way you compute the index.

If we want to allow O(1) insert, then we need relative indexing instead, where each element only knows which element is next on the list (or that it's the last).

You are presuming that one element would have enough information to know about the next. What if you needed a set of elements to get this information, where the next pointer is a calculation over, say, 4 elements? We get to an O(log log n) solution pretty quickly with this (usually).

So here's the thing: you can't win. You can speed up the insertion or the access, but you can't get both. Speeding up one requires shifting information to other places, making the other operation slower.

The neophyte way of thinking about this only makes it appear impossible. But this may be nothing more than a convincing illusion, resulting from an ingrained habit of thought about the thing.

This does not preclude a radically different and completely novel way of dealing with it, one that is fundamentally at odds with, and exhaustively different from, the usual approach in form and function.

Even the mathematical proof (presented elsewhere in the thread) only shows that "when the problem (of accessing information) is presented in this way, here are the boundaries". It doesn't imply that a rephrasing of the problem, one which gets to the same solution but is mathematically distinct in approach (a different class of problem with effectively the same outcome), is out of the question.

That's why in the problem set above I removed "delete" and any ordering other than by splice point. That may actually change the problem in a fundamentally important way.

I understand your point, but my whole point is that to implement O(1) access, each element of the list must contain the positional information of all the elements that precede it, or follow it.
This also means that each element's information (its position) is held by all elements that precede or follow it.
This is because O(1) access needs absolute positioning: given elements #2 and #5, I know there are exactly 2 elements in between.
In a linked list, elements are not indexed; there is no absolute positioning (only relative). So given two elements, I don't know how many elements are in between without traversing the list. Each element only holds information about at least one adjacent element, though it may hold information about more of the following nodes (as in skip lists).

Consider that all operations on list elements occur relative to the position of one element. Even in arrays, we are using an offset from the first node.

Now, with this, let's look at the operations in terms of K, the largest number of element nodes that know the relative position of any specific node:
Access:
If we are at A and wish to find B: if A knows about B, we can make the jump directly. The probability that A knows where B is comes out to K/N, so we expect about N/K jumps.
O(N/K) (as K approaches N we get O(1); as K approaches 1 we get O(N))

Insertion:
When we insert A, we have to update the information of all the other elements that need to know its relative position. That number can be as large as K, so insertion is O(K).

So now we need a value of K such that both K and N/K are constant. No such value exists: their product is always N, so as N grows, at least one of the two must grow with it. Therefore you can't create a list that allows both arbitrary insertion and arbitrary access in O(1).

But lookmeat, you might say, what about an array to which I only append data? I would be able to access any element in O(1), and insertion would only take O(1). At which point I'd take a deep breath and explain: insertions are not arbitrary in that list; my statement only applies if both access and insertions are arbitrary. The list you describe is one where elements have information about all elements that precede them (if I am at arr[5] I know there are at least 5 elements before it, but I don't know if this is the end of the list or there are more). When I append at the end, no one knows, or cares, about the last element, so the insertion time is indeed O(1). Basically I'm forcing everyone to play in the best case for insertion in an array, but I'm not really breaking the limits set above.
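One way to see the K trade-off concretely (my own hedged sketch, not something proposed in the thread): since K * (N/K) = N, the best you can do is balance the two at K = sqrt(N), i.e., max(K, N/K) >= sqrt(N). A blocked list with blocks of roughly sqrt(N) elements sits exactly at that balanced point, giving O(sqrt N) for both access and insert:

```python
# Sketch: sqrt-decomposition ("BlockedList" is a made-up name). Blocks of
# ~sqrt(N) elements give O(sqrt N) access AND O(sqrt N) insert.

class BlockedList:
    def __init__(self, items, block=64):       # block ~ sqrt(expected N)
        self.block = block
        self.blocks = [list(items[i:i + block])
                       for i in range(0, len(items), block)] or [[]]

    def __getitem__(self, offset):             # O(N / block) block hops
        for b in self.blocks:
            if offset < len(b):
                return b[offset]
            offset -= len(b)
        raise IndexError(offset)

    def insert(self, offset, value):           # O(block) shifting
        for idx, b in enumerate(self.blocks):
            if offset <= len(b):
                b.insert(offset, value)        # shifts at most ~2*block items
                if len(b) > 2 * self.block:    # split oversized blocks
                    half = len(b) // 2
                    self.blocks[idx:idx + 1] = [b[:half], b[half:]]
                return
            offset -= len(b)
        self.blocks[-1].append(value)          # past the end: append

bl = BlockedList(range(1000), block=32)
bl.insert(500, "spliced")
print(bl[500])   # -> 'spliced'
```

This is roughly what unrolled linked lists do; it doesn't beat the bound above, it just sits at its minimum.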

tl;dr: There is a logic behind why you can't. Stop saying that it's nay-saying that stifles innovation. That's like saying that Gödel was a party pooper with his Incompleteness Theorem, or that Turing's Halting Problem is limiting creativity and innovation. Learn the limitations of what you want to do, make the problem more specific, and see if that gives you a chance to do something.

There are solutions, such as ropes and skip lists, that can give you better efficiency in some operations, at the cost of speed in others. Start understanding those and see if they can apply to your problem.
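To make that concrete, here's a hedged sketch of my own (the thread only names the structures) of the kind of thing a rope or an indexable skip list buys you: a balanced tree that stores subtree sizes, so both "element at offset X" and "insert at offset X" run in O(log N) expected time, with order defined purely by splice point, as the original question asks. This one is a treap:

```python
import random

# Sketch of an order-statistic treap: every node stores its subtree size,
# so offset lookups and splice-point inserts both take O(log N) expected.

class Node:
    __slots__ = ("value", "prio", "size", "left", "right")
    def __init__(self, value):
        self.value, self.prio = value, random.random()
        self.size, self.left, self.right = 1, None, None

def size(t):
    return t.size if t else 0

def pull(t):
    t.size = 1 + size(t.left) + size(t.right)
    return t

def split(t, k):
    """Split t into (first k elements, the rest)."""
    if not t:
        return None, None
    if size(t.left) < k:
        t.right, rest = split(t.right, k - size(t.left) - 1)
        return pull(t), rest
    first, t.left = split(t.left, k)
    return first, pull(t)

def merge(a, b):
    """Concatenate two treaps, keeping heap order on priorities."""
    if not a or not b:
        return a or b
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return pull(a)
    b.left = merge(a, b.left)
    return pull(b)

def insert_at(t, offset, value):
    """O(log N) expected: splice one element in at the given offset."""
    left, right = split(t, offset)
    return merge(merge(left, Node(value)), right)

def get(t, offset):
    """O(log N) expected: the element at the given offset."""
    while t:
        if offset < size(t.left):
            t = t.left
        elif offset == size(t.left):
            return t.value
        else:
            offset -= size(t.left) + 1
            t = t.right
    raise IndexError(offset)

root = None
for i in range(10):
    root = insert_at(root, i, i)       # build 0..9 by appending
root = insert_at(root, 5, "spliced")   # arbitrary-offset insert
print(get(root, 5))                    # -> 'spliced'
```

A rope is the same idea with chunks of the sequence at the leaves; an indexable skip list stores skip widths instead of subtree sizes. None of them reach O(1) for both operations, but O(log N) on 10^14 elements is about 47 node visits, which is usually what "efficiently" ends up meaning in practice.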

Don't you think that if such a list existed, it'd already be used instead of trees, hash tables, linked lists, arrays, etc.? My neophyte intuition was that such a solution existed at the cost of space, but now, having actually investigated the problem, I've begun to understand it better.

Yes, both index and insert being, in aggregate, effectively O(1), and forming an ordered set (as in, you have a notion of first, last, next, current, and previous). If this doesn't exist, then why not?

Disclaimer: I intuitively feel it can't, but I can't really back up this feeling with any hard math.

There may be something really crafty, though; like arranging the elements in such a way that you can permute input parameters to, say, a Cooley-Tukey FFT that can somehow compact multiple inserts into a coherent, concise transformation (by, say, arranging the insertions in a way that can factor down the O(N log N) time, i.e., recursive trellis modulation acting as the offsets to form a transitive reduction).

Just a note: you still have the uniform-depth problem with this solution (not knowing the granularity and range of the recursion)... and that's something I see consistently, but there may be a way around it. I just don't know.

Oh yes, I thought of something like that. If you make the array cyclic, then the beginning and end can be wherever you want them to be, as long as you maintain an offset. But then what do you do after the second insert? And now we are back to the same problem.
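For what it's worth, here's a hedged sketch of that cyclic-array idea (Python, names made up): one logical rotation is free because it's absorbed into the offset, but the trick is spent after that single use; a genuine mid-array insert still has to shift elements.

```python
# Sketch: a cyclic array. Moving the logical start is O(1) because only
# the offset changes -- but there's only one offset to hide changes in.

class CyclicArray:
    def __init__(self, items):
        self.buf = list(items)
        self.start = 0                        # the one free "splice point"

    def __getitem__(self, i):                 # still O(1) access
        return self.buf[(self.start + i) % len(self.buf)]

    def rotate(self, k):                      # O(1): shift the logical start
        self.start = (self.start + k) % len(self.buf)

c = CyclicArray([0, 1, 2, 3, 4])
c.rotate(2)
print(c[0], c[4])   # -> 2 1
```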

It's like there is information (the insert) that has a minimal cost, almost like a "thermodynamic law of information": you can transform it into various things, but you can't ignore it.

Just to be clear - you don't just want O(1) read/write/update/delete speeds, but you also want to be able to traverse all N elements in O(N) time, yes? I'm pretty sure that's what your second paragraph is asking for, in a kinda round-about way.

You can get O(1) read/write/update/delete speeds if you can control the size of the table and you're willing to give up the linear-time traversal (i.e., you can't efficiently retrieve the records in any particular order).
So if you don't really need the efficient traversal you can solve your problem with off-the-shelf library parts today :)

If it takes you O(1) to read a single element, but finding the next one requires reading all of them at O(N), then reading all N elements in order costs N * O(N) = O(N^2) (essentially) to traverse N elements.

You can't do it with trees (of any kind), tries, heaps, linked lists, hashes... You may be able to pull it off with a hypergraph, but I can't think of how.

Unless I'm totally forgetting something, this isn't an easy question. More than likely it's not possible, and there's probably a proof of that out there.

Perhaps you are crediting me with phrasing the problem well, and if so, then I thank you.

I'm trying to add associated entries to a list, essentially adding a depth to it.

Basically I want to insert an unspecified number of elements after a given offset and then be able to index to an offset without re-indexing or the complexity being pegged to the number of operations.

It sounds like something you could prove impossible via induction (using Cantor's countable sets), but I don't know if it's been done. Maybe a proof by contraposition would work here too. I'm really rusty on this stuff, honestly.