Could you tell me whether SciDB supports concurrent reads and writes?
I am currently using netCDF to store my large data set from parallel Python. netCDF supports parallel reads, but not parallel writes.
Best regards,

( 3 ) Readers can still be working while data is being written to a SciDB array, but the writer’s data is only visible to readers once the writer’s transaction is complete and committed.

In other words, we support concurrent readers with at most one writer.

I wasn’t aware that netCDF provided any transactional guarantees. Would you be so kind as to post a link to a location where this is described? Doing transactions right is not easy and my hat would be off to that team if they’ve achieved it.

Transactional quality of service guarantees mean a lot more than concurrent readers / writers. They mean …

If the change you’re making to your data files fails for some reason … such as the writer program crashes … transactional systems guarantee that it will leave the data unaffected. This is atomicity.

If you have a series of small writes – appending data to the end of the data set, say – concurrent readers are completely unaware of these changes. In fact, concurrent readers may have different views of the data, depending on when they start, relative to write operations. This is transactional isolation.

Once you’ve written the data, SciDB makes copies of it to ensure that if one copy disappears when a piece of your hardware blows up, we have a spare. That’s durability.

And when you write to a SciDB array, we ensure that any data in the system complies with any of the rules you’ve said the array should obey. That’s consistency.

The important point is that all of this happens without developers and users needing to do anything at all.
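To make the atomicity and isolation points concrete, here is a minimal in-memory sketch in Python. It is only an illustration under assumptions, not how SciDB is implemented: `VersionedArray`, `snapshot` and `begin_write` are hypothetical names made up for this example. Readers always get the last committed version, at most one writer runs at a time, and a writer that fails leaves the committed data untouched.

```python
import threading


class VersionedArray:
    """Toy store: many readers, at most one writer, commit-on-success."""

    def __init__(self, initial=()):
        self._lock = threading.Lock()        # guards the committed version
        self._write_lock = threading.Lock()  # allows only one writer at a time
        self._committed = tuple(initial)     # immutable, so safe to share

    def snapshot(self):
        """Return the last committed version; never sees pending writes."""
        with self._lock:
            return self._committed

    def begin_write(self):
        """Start a write transaction (use as a context manager)."""
        return _WriteTxn(self)


class _WriteTxn:
    def __init__(self, store):
        self._store = store
        self._pending = None

    def __enter__(self):
        self._store._write_lock.acquire()
        # Work on a private copy; readers keep seeing the committed tuple.
        self._pending = list(self._store.snapshot())
        return self

    def append(self, value):
        self._pending.append(value)

    def __exit__(self, exc_type, exc, tb):
        try:
            if exc_type is None:
                # Commit: publish the whole change in one step.
                with self._store._lock:
                    self._store._committed = tuple(self._pending)
            # On an exception the pending copy is simply dropped (rollback).
        finally:
            self._store._write_lock.release()
        return False  # do not swallow the writer's exception


if __name__ == "__main__":
    arr = VersionedArray([1, 2])
    with arr.begin_write() as txn:
        txn.append(3)
        print(arr.snapshot())  # a reader during the write
    print(arr.snapshot())      # a reader after the commit
```

A reader that took its snapshot before the commit keeps its older view, which is the "different views of the data, depending on when they start" behaviour described above.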

Now … after reading your note, I went and had a look at netCDF, paying specific attention to how concurrent reader / writer access is handled. I first headed here … unidata.ucar.edu/software/ne … brary.html … where it says this:

At most one process should have a netCDF dataset open for writing at one time. The library is designed to provide limited support for multiple concurrent readers with one writer, via disciplined use of the nc_sync function and the NC_SHARE flag. If a writer makes changes in define mode, such as the addition of new variables, dimensions, or attributes, some means external to the library is necessary to prevent readers from making concurrent accesses and to inform readers to call nc_sync before the next access.

The intent was to allow sharing of a netCDF dataset among multiple readers and one writer, by having the writer call nc_sync after writing and the readers call nc_sync before each read. For a writer, this flushes buffers to disk. For a reader, it makes sure that the next read will be from disk rather than from previously cached buffers, so that the reader will see changes made by the writing process …
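The discipline described in that passage can be illustrated with ordinary files. The sketch below is an analogy only: it does not use the netCDF library, and `Writer`, `Reader` and their `sync` methods are hypothetical stand-ins for processes that call nc_sync. It assumes the writer's data is smaller than Python's default file buffer, so nothing reaches the file until the writer syncs, and the reader serves from its own cache until it syncs too.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<d")  # one little-endian float64 per record


class Writer:
    def __init__(self, path):
        self._f = open(path, "ab")  # buffered append; creates the file

    def append(self, value):
        # Small writes stay in the user-space buffer until sync().
        self._f.write(RECORD.pack(value))

    def sync(self):
        # Writer side: push buffered data out to the file.
        self._f.flush()
        os.fsync(self._f.fileno())

    def close(self):
        self._f.close()


class Reader:
    def __init__(self, path):
        self._path = path
        self._cache = []
        self.sync()

    def sync(self):
        # Reader side: discard the cache and re-read from the file.
        with open(self._path, "rb") as f:
            data = f.read()
        usable = len(data) - len(data) % RECORD.size  # skip a partial record
        self._cache = [RECORD.unpack_from(data, i)[0]
                       for i in range(0, usable, RECORD.size)]

    def values(self):
        return list(self._cache)  # served from cache, not from the file


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "shared.bin")
        writer, reader = Writer(path), Reader(path)
        writer.append(1.5)
        reader.sync()
        before = reader.values()  # writer has not synced yet
        writer.sync()
        stale = reader.values()   # reader has not synced yet
        reader.sync()
        after = reader.values()   # both sides have synced
        writer.close()
        return before, stale, after


if __name__ == "__main__":
    print(demo())
```

Note that nothing here tells the reader when a sync is needed; that coordination has to come from outside, which is the same gap the quoted passage leaves to the application.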

So what this says (to me) is that if you’re really careful, you can implement one-writer / multiple-readers access to your data in netCDF files. But the actual implementation is entirely up to you. And for various technical reasons, I don’t think the underlying mechanism is especially viable for large numbers of concurrent readers. For example, if you have multiple concurrent readers accessing a data set, each of them will require a distinct copy of the data in memory. It’s also not clear to me how a reader ought to determine when to call nc_sync, as that might be a pretty expensive operation.

Mind you, none of these observations should be read as me pointing at netCDF and laughing. Scientific file formats are designed and implemented with very specific use-cases in mind, and they prioritize the interests of different kinds of users from the ones who generally use SciDB. If you’ve got the time and skill to implement your own concurrent access control in your own programs and doing so is critical to your application – I can think of all kinds of reasons that it might be – then you should absolutely use those tools in the best way you can.

But most of SciDB’s users aren’t programmers. So they appreciate not having to deal with all these details.

Good luck with your netCDF work! I’m always curious to learn more about how people use netCDF.

This is a good lesson for me. Now I understand the difference between netCDF and SciDB.
My code is not very complex, and I use a netCDF file to share data between different processors because of limited RAM.
There is also a netcdf4-python library which, like SciDB, is quite simple to use.
I hope concurrent writes can be implemented in SciDB.