For a while now I've been collecting some thoughts on what to do next for the Persistence Extension in the form of a mind map, and it keeps growing. So I believe it's about time to ask for your help with prioritizing and to brainstorm the next great features.

Scriptable Data Sets

This might be helpful when dealing with time-sensitive data, for instance. For example, you could define a last login date of yesterday using Groovy:

{code}
useraccount:
  - firstname: Clark
    lastname: Kent
    username: superman
    password: LexLuthor
    lastlogin: "groovy:new Date().previous()"
{code}
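To make the `groovy:` prefix idea a bit more concrete, here is a rough Java sketch of how such values might be resolved while a data set is loaded. `ScriptableValueResolver` and the prefix-to-evaluator map are hypothetical names, not part of the extension. The script engine is abstracted as a plain `Function` so the sketch runs without Groovy on the classpath.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a cell value of the form "<engine>:<script>"
// (e.g. "groovy:new Date().previous()") is handed to a pluggable evaluator;
// anything else is returned unchanged as plain data.
final class ScriptableValueResolver {

    private final Map<String, Function<String, Object>> evaluators;

    ScriptableValueResolver(Map<String, Function<String, Object>> evaluators) {
        this.evaluators = evaluators;
    }

    /** Returns the evaluated script result, or the raw value if no registered prefix is present. */
    Object resolve(String rawValue) {
        int colon = rawValue.indexOf(':');
        if (colon > 0) {
            Function<String, Object> evaluator = evaluators.get(rawValue.substring(0, colon));
            if (evaluator != null) {
                // Everything after the first colon is the script body
                return evaluator.apply(rawValue.substring(colon + 1));
            }
        }
        // Plain data such as "Clark", or a value that merely contains a colon ("http://...")
        return rawValue;
    }
}
```

In a real setup the `groovy` evaluator could wrap a JSR-223 engine obtained with `new ScriptEngineManager().getEngineByName("groovy")`, assuming Groovy's JSR-223 support is on the test classpath. `ScriptEngine.eval` throws a checked `ScriptException`, so the lambda would have to wrap it.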

Another thing (just an idea for now) is to make it easier to verify relationships. Since ids are generated during test execution, it's not really the best approach to assume anything about them. It would be much more convenient to use some sort of business key to reference other rows / entities. Let me show you a possible approach for @ShouldMatchDataSet:

Entity / object oriented way of defining data sets

DBUnit data sets are great, but they are a really low-level, row-oriented description of the data. As an alternative, I was thinking about providing a YAML / JSON based representation of the objects, something like:

Integration with DB migration tools

The Hibernate schema auto-creation feature is simply great for prototyping and testing, but for a living application it's sometimes not enough. Hence there are quite a few migration tools available for Java projects, such as

Why should we integrate them with the Arquillian Persistence Extension? I see at least these reasons:

As a replacement for Hibernate / SQL schema creation (create-drop or custom scripts). An additional benefit would be a quick check that your migration scripts are valid (a sort of integration test, to assure that your whole process of deploying the new version will work smoothly).

Some of the tools, such as Liquibase, offer DB insertions, so people could use a format they are already familiar with instead of DBUnit data sets.

This might actually result in a nice module split and introduce an extension mechanism (yeah, extensions of an extension), so we could end up with the following structure:

arquillian-persistence-core

arquillian-persistence-dbunit

arquillian-persistence-liquibase

....
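One way the "extensions of an extension" idea could work is a small service provider interface in the core module, with each format module (DBUnit, Liquibase, ...) registering an implementation through the JDK's standard `ServiceLoader`. This is only a sketch: `DataSeedingProvider` and `ProviderRegistry` are hypothetical names, and choosing a provider by resource name is just one possible selection rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical SPI that would live in arquillian-persistence-core.
interface DataSeedingProvider {
    /** Whether this provider understands the given resource, e.g. judged by its file name. */
    boolean supports(String resourceName);

    /** Short name, e.g. "dbunit" or "liquibase". */
    String name();
}

// Hypothetical registry the core would consult before seeding or migrating the database.
final class ProviderRegistry {

    private final List<DataSeedingProvider> providers = new ArrayList<>();

    /** Collects providers that modules such as arquillian-persistence-dbunit declare in META-INF/services. */
    static ProviderRegistry fromClasspath() {
        ProviderRegistry registry = new ProviderRegistry();
        for (DataSeedingProvider provider : ServiceLoader.load(DataSeedingProvider.class)) {
            registry.register(provider);
        }
        return registry;
    }

    void register(DataSeedingProvider provider) {
        providers.add(provider);
    }

    /** First registered provider that supports the resource, if any. */
    Optional<DataSeedingProvider> providerFor(String resourceName) {
        return providers.stream().filter(p -> p.supports(resourceName)).findFirst();
    }
}
```

With this split, the core stays free of DBUnit and Liquibase dependencies. Adding a module to the test classpath is what makes its format available.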

Other ideas

Unitils has a nice utility assertion to verify that the JPA mapping is consistent with the database.

I definitely agree that relying on synthetic ids in the data assertions is brittle. In fact, in some cases it may not even be possible. I've seen integration test suites where the data is not reset after each run because it's needed in order for the system to function. Instead, data is just appended to the table and assigned the next available id (values in unique fields still need to be unique, of course, for example by including a timestamp).

When deciding on the match syntax, I like to look around and build on what is familiar to the developer. One possible model to emulate is the Drools DSL syntax. It uses a matching syntax that could be applied here in spirit.

The presence of brackets in the right-hand side (RHS) would signal that you are dealing with a reference (quotes would be required to distinguish regular data that contains brackets). The syntax pattern is:

tablename(columnname1 = "value1", columnname2 = "value2", ...)
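To check that the proposed syntax can be told apart from plain data, here is a rough Java sketch of parsing a reference and turning it into a parameterized lookup for the generated id. `RowReference`, `parse` and `toLookupSql` are hypothetical names. The sketch works on raw cell strings, doesn't support escaped quotes, and treats unquoted brackets without any `column = "value"` pairs as a malformed reference.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: a value such as  customer(name = "ACME", country = "US")
// references another row by business key instead of by synthetic id.
final class RowReference {

    // The whole value must be tablename(...); a value that starts with a quote
    // (regular data that happens to contain brackets) does not match.
    private static final Pattern REFERENCE = Pattern.compile("^\\s*(\\w+)\\s*\\((.*)\\)\\s*$");
    // columnname = "value" pairs inside the brackets
    private static final Pattern PAIR = Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    final String table;
    final Map<String, String> businessKey;

    private RowReference(String table, Map<String, String> businessKey) {
        this.table = table;
        this.businessKey = Collections.unmodifiableMap(businessKey);
    }

    /** Returns null when the value is plain data rather than a reference. */
    static RowReference parse(String value) {
        Matcher m = REFERENCE.matcher(value);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> key = new LinkedHashMap<>(); // keeps declaration order
        Matcher pair = PAIR.matcher(m.group(2));
        while (pair.find()) {
            key.put(pair.group(1), pair.group(2));
        }
        if (key.isEmpty()) {
            throw new IllegalArgumentException("Reference without business key: " + value);
        }
        return new RowReference(m.group(1), key);
    }

    /** Parameterized query that could look up the generated id at test time. */
    String toLookupSql(String idColumn) {
        String where = businessKey.keySet().stream()
                .map(column -> column + " = ?")
                .collect(Collectors.joining(" AND "));
        return "SELECT " + idColumn + " FROM " + table + " WHERE " + where;
    }
}
```

For example, `RowReference.parse("customer(name = \"ACME\", country = \"US\")").toLookupSql("id")` yields `SELECT id FROM customer WHERE name = ? AND country = ?`, with the business key values bound as parameters.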

You could take this a step further and use the Drools match variable assignment concept to keep the references DRY:

I also like the idea of allowing the values to be derived from scripts. That would accommodate the case I referenced in my first reply about having to create unique values for fields. Here's an example:

{code}
customer:
  - name: "groovy:'ACME_' + System.currentTimeMillis()"
{code}

However, I'd prefer to remove the quotes and call out more explicitly that it's a script. In YAML, nodes may be labeled with a type or tag using an exclamation point followed by a string, which would accommodate embedded scripts nicely:

{code}
customer:
  - name: !groovy |
      'ACME_' +
      System.currentTimeMillis()
{code}

You'd have to play with the parser to see what it allows, but hopefully something like that will work.

Supporting a script location (and perhaps even a function in that script file) might be a nice addition as well. That externalizes the code and allows for reuse (though I guess the classpath already serves that purpose).

The only hang-up there is that YAML might try to weave this tag information into its own structure rather than use it for the purpose we intend. I'm not sure whether the two models will conflict.

While the Hibernate schema generation (and import.sql) has certainly proven useful on many occasions, I actually find it somewhat of a burden because it is tied to the choice of ORM vendor. While other vendors, like EclipseLink, provide equivalent functionality, it's never *exactly* the same. That's where a tool dedicated to this purpose is much better, and Arquillian is all about portability. Additionally, with such a tool you likely have the chance to get in there and tweak or customize the behavior, which is not true of the feature in Hibernate and EclipseLink.

I recently read an article on Liquibase in NFJS magazine and came away with a very good impression. That would be my first choice.

Feel free to split the persistence module into multiple artifacts. Having a few artifacts is better than ending up with a monolithic project you can't untangle in the long run. I would recommend keeping the modules under a single GitHub repository (and thus a unified release) for now. Drone is also a single repository with a multi-module build.

After working through this feedback, I think a good next step is to identify some test scenarios and think about whether we have the features we need to address the requirements.

For instance,

When the application deploys, it should successfully run the migration scripts, such as adding new columns.

I should be able to validate the records in the database without relying on synthetic ids.

Anyway, those are just some ideas. If we keep a catalogue of those stories, I think it not only helps you stay grounded but also helps motivate people to contribute, because they see not just the what (the issue report) but also the why. The why is more important than anything.

Thanks a lot for this tremendous feedback! Once I'm done with the Alpha4 release (mainly bug fixes and some improvements; it should be ready pretty soon), I will focus on prioritization and research these topics further. I will share the roadmap in this thread. Expect more questions in the next few weeks.

We checked the capabilities of the framework to validate the database state and found it currently unsuitable for our needs. The problem is that we have quite a few timestamps which are set during test execution. These columns are not nullable in the DB, so they have to be present in the data sets. Since we cannot determine the timestamp values in advance, the only option would be to ignore these columns while matching. This feature is implemented in DBUnit and should be provided by the persistence extension as well (e.g. as an optional parameter list in @ShouldMatchDataSet).
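The "list of column names to ignore" variant can be sketched in plain Java, independent of DBUnit (which itself offers this through its column filters, e.g. `DefaultColumnFilter`). `LenientRowMatcher` is a hypothetical name, and rows are modelled as simple column-to-value maps only for illustration. Only the columns listed in the expected row are compared, and excluded columns accept any actual value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: compare expected rows with actual rows while ignoring
// columns whose values are only known at run time (e.g. timestamps set by the test).
final class LenientRowMatcher {

    private final Set<String> excludedColumns;

    LenientRowMatcher(String... excludedColumns) {
        this.excludedColumns = new HashSet<>(Arrays.asList(excludedColumns));
    }

    /** Compares only the columns present in the expected row, skipping excluded ones. */
    boolean rowMatches(Map<String, Object> expected, Map<String, Object> actual) {
        for (Map.Entry<String, Object> cell : expected.entrySet()) {
            if (excludedColumns.contains(cell.getKey())) {
                continue; // ignored column: any actual value is accepted
            }
            if (!actual.containsKey(cell.getKey())
                    || !Objects.equals(cell.getValue(), actual.get(cell.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** Rows are compared pairwise in order, so both lists must be sorted the same way. */
    boolean tableMatches(List<Map<String, Object>> expected, List<Map<String, Object>> actual) {
        if (expected.size() != actual.size()) {
            return false;
        }
        for (int i = 0; i < expected.size(); i++) {
            if (!rowMatches(expected.get(i), actual.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```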

I suppose we need some way to reference an IColumnFilter (or an abstraction over it). I think that's cleaner than a string-based list in the annotation. Perhaps the annotation should allow a reference to a filter implementation. This may be way off, but how about something like:

We could also consider making this attribute an array type to allow multiple filters to be specified. Another solution is to use a composite filter (i.e., multiple filters combined under a single filter implementation).
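Here is a rough sketch combining both ideas: an annotation attribute that references filter classes, and a composite filter built from them. Everything here is hypothetical. `ColumnFilter` stands in for an abstraction over DBUnit's `IColumnFilter`, and `@MatchFilters` stands in for an attribute that would really live on @ShouldMatchDataSet. Filter classes are assumed to have a no-arg constructor.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical abstraction over DBUnit's IColumnFilter, so tests are not coupled to DBUnit.
interface ColumnFilter {
    /** true if the column should take part in the comparison */
    boolean accept(String tableName, String columnName);
}

// Hypothetical array-valued attribute referencing filter implementations.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MatchFilters {
    Class<? extends ColumnFilter>[] value();
}

/** Combines several filters: a column is compared only if every filter accepts it. */
final class CompositeColumnFilter implements ColumnFilter {
    private final List<ColumnFilter> filters;

    CompositeColumnFilter(List<ColumnFilter> filters) {
        this.filters = new ArrayList<>(filters);
    }

    @Override
    public boolean accept(String tableName, String columnName) {
        return filters.stream().allMatch(f -> f.accept(tableName, columnName));
    }

    /** Instantiates the filter classes named on the annotation via their no-arg constructors. */
    static CompositeColumnFilter from(MatchFilters annotation) throws ReflectiveOperationException {
        List<ColumnFilter> instances = new ArrayList<>();
        for (Class<? extends ColumnFilter> type : annotation.value()) {
            instances.add(type.getDeclaredConstructor().newInstance());
        }
        return new CompositeColumnFilter(instances);
    }
}

// Example filter: skip timestamp-like columns ("lastlogin" or anything ending in "_at").
final class IgnoreTimestamps implements ColumnFilter {
    @Override
    public boolean accept(String tableName, String columnName) {
        String column = columnName.toLowerCase(Locale.ROOT);
        return !(column.equals("lastlogin") || column.endsWith("_at"));
    }
}

// Example filter: skip an audit column.
final class IgnoreAuditColumns implements ColumnFilter {
    @Override
    public boolean accept(String tableName, String columnName) {
        return !columnName.equalsIgnoreCase("modified_by");
    }
}

// Usage: how a test method might declare its filters.
class UserAccountTest {
    @MatchFilters({IgnoreTimestamps.class, IgnoreAuditColumns.class})
    void shouldUpdateLastLogin() {
    }
}
```

The array attribute and the composite are not mutually exclusive. The extension could always wrap whatever the array lists into one composite, so the matching code only ever sees a single filter.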

** In fact, you gave me an idea about how to phrase reviews like this in the future. Something like: "Is there anything about the software that does not meet your demands or expectations?" That really helps bring out the ideas, I think.

Thanks for more great feedback! I'm considering some annotation-based customization as part of ARQ-718, and in fact, evolving @ShouldMatchDataSet from a strict mode to a more liberal one has been on my list for a while. I would also like to take this opportunity to thank you again for the great mail discussion about the limitations of Alpha3. This kind of feedback is what I was looking for! Kudos!

@Dan, thanks for this great suggestion. I think we should provide both: an abstraction over DBUnit's filtering capabilities (so we're not coupled to DBUnit), with a reasonable subset of filter types provided out of the box and the possibility for users to define their own custom / composite filters, but also a simple way to give a list of column names to filter. For the much-awaited Alpha4 I will provide the latter, and I'll put the filter implementations on the roadmap!

Another feature suggestion: allow the database to be cleared and populated by separate scripts/datasets. We have multiple DBUnit datasets, and none of them use every table in the database. As a result, DBUnit was unable to clear the database when a test run used a different dataset than the previous test because, as noted in the documentation, DBUnit doesn't touch any tables not present in the dataset. Assume table A is referenced by tables B and C, and the last dataset to be loaded put data into tables A and B, but the new dataset loads tables A and C. If you ask DBUnit to do a DELETE_ALL or CLEAN_INSERT, this results in foreign key constraint violations, because DBUnit tries to empty table A, but doesn't know it needs to empty table B first.

One could solve this by referencing every table in every dataset, of course, but if the schema changes, this requires you to change EVERY dataset, and if you extract a subset of production data for testing, you have to modify the dataset before you can use it. One could also break the data down into smaller datasets and make certain sets "depend" on other sets, but that would be a headache to maintain. Hence, our solution was to define a DBUnit dataset that contains one empty element for every table, then use that special dataset to clear the database, followed by loading whatever dataset is actually needed. This ensures that if the schema changes, we only have to update one file, no dependencies between datasets are needed, and any extract can be used without modification.
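The core of the problem above is ordering: table B must be emptied before table A because B references A. As one possible direction for the clean-up strategy (not what DBUnit or the extension currently does), the order could be derived from foreign key metadata with a depth-first topological sort. In a real implementation, that metadata could come from JDBC's `DatabaseMetaData.getImportedKeys`. `DeleteOrder` is a hypothetical name. Cyclic references, including self-references, are rejected here rather than handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: order tables so that DELETEs never violate foreign keys.
final class DeleteOrder {

    /**
     * @param references each table mapped to the tables it references,
     *                   e.g. "B" -> ["A"] means B has a foreign key to A
     * @return parents first: a safe order for INSERTs
     */
    static List<String> insertOrder(Map<String, List<String>> references) {
        Set<String> done = new LinkedHashSet<>(); // keeps the order tables were finished in
        Set<String> inProgress = new HashSet<>();
        for (String table : references.keySet()) {
            visit(table, references, done, inProgress);
        }
        return new ArrayList<>(done);
    }

    /** Children first: the reverse of the insert order. */
    static List<String> deleteOrder(Map<String, List<String>> references) {
        List<String> order = insertOrder(references);
        Collections.reverse(order);
        return order;
    }

    static List<String> deleteStatements(Map<String, List<String>> references) {
        return deleteOrder(references).stream()
                .map(table -> "DELETE FROM " + table)
                .collect(Collectors.toList());
    }

    private static void visit(String table, Map<String, List<String>> references,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(table)) {
            return;
        }
        if (!inProgress.add(table)) {
            throw new IllegalStateException("Cyclic foreign keys involving " + table);
        }
        for (String parent : references.getOrDefault(table, List.of())) {
            visit(parent, references, done, inProgress); // finish referenced tables first
        }
        inProgress.remove(table);
        done.add(table);
    }
}
```

For the example above (B and C both reference A), the delete order comes out as C, B, A, so A is only emptied once nothing points at it. Run against the full schema, this needs no "all tables" data set, at the cost of querying the metadata.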

Regarding clean-up: currently it's done using the DELETE_ALL operation. You can turn off constraints (if your database supports it) using, for instance, the initStatement property (the example below works for HSQL DB):

Alternatively, you can use the @UsingScript annotation to run your custom script before the test executes. It's triggered before @UsingDataSet, so you can use it to clean / tune your DB for the given test case before you seed data.

I hope it helps.

Many thanks for describing your use case and the approach you used. This will definitely help me improve the clean-up strategy I'm currently working on.