I also made something like this a while ago. I decided to go for the two-row solution: that way you don't need super columns. Cassandra is really good at reading, so this should not be an issue.

Thanks Andrey and Chris. It sounds like we don't necessarily have to use composite columns. From what I understand about dynamic CF, each row may have completely different data from other rows; but in our case, the data in each row is similar to other rows; my concern was more about the homogeneity of the data between columns.

In our original supercolumn-based schema, one special supercolumn called "metadata" contains a number of subcolumns holding metadata that describes each collection (e.g. number of documents). The rest of the supercolumns in the same row are the IDs of the documents belonging to the collection, and for each document supercolumn, the subcolumns contain the document content as well as metadata on the individual document (e.g. its checksum).

To move away from the supercolumn schema, I could either create two CFs, one to hold metadata and the other document content, or create just one CF that mixes metadata and document content in the same row and uses composite column names to identify whether a particular column is metadata or a document. I am just wondering if you have any input on the pros and cons of each schema.
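
To make the single-CF option concrete, here is a rough in-memory sketch in Python. It does not talk to Cassandra at all; it only models one row as a list of columns sorted by composite name, which is roughly how a composite comparator would order them. The "meta" and "doc" prefixes and the helper names build_row and slice_prefix are made up for illustration, not part of any schema discussed above.

```python
from bisect import bisect_left

def build_row(metadata, documents):
    """Flatten collection metadata and documents into one sorted row.

    Column names are tuples (hypothetical layout):
      ("meta", <name>)           for collection-level metadata
      ("doc", <doc_id>, <field>) for per-document content and metadata
    """
    columns = {}
    for name, value in metadata.items():
        columns[("meta", name)] = value
    for doc_id, fields in documents.items():
        for field, value in fields.items():
            columns[("doc", doc_id, field)] = value
    # Sorting by the tuple mimics ordering by composite column name.
    return sorted(columns.items())

def slice_prefix(row, prefix):
    """Return the contiguous run of columns whose name starts with prefix."""
    names = [name for name, _ in row]
    start = bisect_left(names, prefix)
    result = []
    for name, value in row[start:]:
        if name[:len(prefix)] != prefix:
            break
        result.append((name, value))
    return result

row = build_row(
    {"doc_count": "2"},
    {"d1": {"content": "a", "checksum": "x"},
     "d2": {"content": "b", "checksum": "y"}},
)
collection_meta = slice_prefix(row, ("meta",))
first_doc = slice_prefix(row, ("doc", "d1"))
```

With this layout, the collection metadata and any single document are each one contiguous slice of the same row. With two CFs you would instead do one read per CF, but neither row mixes two kinds of columns.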

You can iterate over the columns in a single row to get a state's city names and their zip codes, and you can do a get_range_slices on all keys, with the column slice starting and ending on the city name, to find the zip codes for all cities with the given name.
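
A small sketch of those two access patterns, again in plain Python rather than against a real cluster. It assumes the row key is the state and each column name is a (city, zip) composite, which is my guess at the layout being described. The sample data and the function names cities_in_state and zips_for_city are only illustrative.

```python
# In-memory stand-in for the column family (assumed layout):
# row key = state, columns = sorted (city, zip) composite names.
cf = {
    "KY": [("Paris", "40361")],
    "TX": [("Austin", "73301"), ("Paris", "75460")],
}

def cities_in_state(cf, state):
    """Iterate over the columns of a single row: one state's cities and zips."""
    return list(cf.get(state, []))

def zips_for_city(cf, city):
    """Mimic a range scan over all row keys with a column slice on the city name."""
    found = []
    for state in sorted(cf):
        for name, zip_code in cf[state]:
            if name == city:
                found.append((state, zip_code))
    return found

texas = cities_in_state(cf, "TX")
paris = zips_for_city(cf, "Paris")
```

The first lookup touches one row. The second has to visit every row key, which is the cost of the get_range_slices approach when there are many states.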