Jump to:

It's WAY too early to start working on this, but I want to get this task identified. We need to create an upgrade path for any affected modules, CCK plus any core fields that get into D7 (profile? poll? taxonomy? whatever gets in). I realize CCK is contrib but the last time we moved part of CCK into core we found there were core implications for getting the upgrade working right and I assume the same will be true this time.

One thing that will help is that in every case the new storage location for the data will be different from the current storage location, so the data can live in both places (or be retained in its old location as long as necessary). This is a little different from the D5 upgrade path, where core used the same table name we were already using in CCK.

We'll need a matching issue in CCK for the CCK-specific part of this task, but I'm not going to create one until we get further along.

The CCK to Field API upgrade process will run in D7. We will therefore need D7 code that understands at least the D6 CCK content_* tables. This includes at least the D6 CCK CRUD functions to identify which fields and instances exist.

We will have a loop that iterates through the D6 CCK fields and instances and creates D7 fields and instances. We'll have to assume that all node types have been upgraded already.

We will have D7 field modules but not D6 field modules. Not all field types will map 1:1. For example, a D6 text field with allowed values will be converted to a D7 list field. This suggests that each D7 field module will need to register itself as being able to import certain kinds of D6 fields, and it will have to embed whatever info about the D6 CCK layout is necessary to do the upgrade. In fact this will probably look a lot like the "Field Conversion" logic for field_update_field(). So when we loop through all the D6 CCK fields, we'll call hook_field_upgrade_from_d6_to_d7($d6_field). In the allowed values case, perhaps list.module will implement that hook and handle any text or number fields with allowed values.
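To make the shape of that hook concrete, here's a minimal sketch of how list.module might claim D6 text fields with allowed values. The hook name comes from this proposal; the $d6_field keys and the exact type mapping are assumptions, not a real API:

```php
/**
 * Hypothetical hook implementation: list.module claims D6 text/number
 * fields that defined an allowed values list. The $d6_field keys are
 * assumed from D6 CCK conventions, not an actual D7 API.
 */
function list_field_upgrade_from_d6_to_d7($d6_field) {
  // Only handle D6 fields that used an allowed values list.
  $has_allowed_values = !empty($d6_field['global_settings']['allowed_values']);
  if (!$has_allowed_values || !in_array($d6_field['type'], array('text', 'number_integer'))) {
    // Not ours; some other module may claim this field.
    return FALSE;
  }
  // A D6 text field with allowed values becomes a D7 list field.
  field_create_field(array(
    'field_name' => $d6_field['field_name'],
    'type' => $d6_field['type'] == 'text' ? 'list_text' : 'list_integer',
    'settings' => array(
      'allowed_values' => $d6_field['global_settings']['allowed_values'],
    ),
  ));
  return TRUE;
}
```

The TRUE/FALSE return is one way the upgrade loop could know whether a field has been claimed or still needs a handler.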

I guess hook_field_upgrade() can both create the D7 field and migrate the data into it. Or perhaps those should be separate passes/separate loops.

I do not think this can be a normal hook_upgrade_N() function because it is unlikely that all field modules a site is using will be available to upgrade all at once, so the upgrade process will require multiple passes. Thus, it should be a module with a UI, perhaps showing all the D6 fields & modules, the D7 fields and modules they map to, and the status of each (upgraded/not upgraded). As Karen points out, we have the luxury of being able to keep the D6 data tables forever, so the upgrade can be re-run multiple times until it is right. This also means that any errors during upgrade can be handled by just deleting the D7 field(s) and running the upgrade again.

Better title.
Thing is, we now know this will need a process separate from update.php and possibly a UI (see #3), and thus will need to be provided in a contrib module.
As such, this does not fully belong to the core queue.

The reason for posting this in the core issue queue is that we may need the assistance of core in making things work even if the upgrade path is handled by the contrib module. Plus it serves as a reminder to everyone that even though fields are in core in D7, the fields you had in D6 will only be migrated to their D7 versions by the CCK contrib module; core will NOT be doing any field migration. I suspect that will be a point of confusion for a lot of people.

As I mentioned in the original post:

I realize CCK is contrib but the last time we moved part of CCK into core we found there were core implications for getting the upgrade working right and I assume the same will be true this time.

So this issue is to discuss what needs to happen and whether or not we need any changes to core to make that work. You could certainly make the case that it does not belong in the core queue unless we find we need core fixes, so I'll defer to whatever seems best about that, but I also think it is good to keep this in a place where it will get lots of attention. And I would make the case that anytime core takes over all or part of a contrib module that core has some responsibility to make sure the transition goes smoothly, and you can't tell if it will until the upgrade path has been written and tested.

If a CCK_to_D7 contrib module isn't ready the day D7 ships, it means people cannot upgrade D6 sites on day 1, but can start fresh D7 sites. I'm not sure this qualifies as 'release blocker' (or that drieschick will delay a release for this reason when the day comes)

1. I think this does definitely qualify as a release blocker, and is in fact the very definition of one.
2. I think anyone who thinks this upgrade path shouldn't be part of core itself is smoking a particularly strong variation of crack. Can someone explain why this makes sense?

Tricky and difficult code is exactly where you want to leverage the advantages of the Drupal core queue with multiple eyeballs on it. I've no idea why it would be desirable to push this code to a dark corner in contrib.

"2. I think anyone who thinks this upgrade path shouldn't be part of core itself is smoking a particularly strong variation of crack. Can someone explain why this makes sense?"

I don't see why we want to teach D7 core the intricacies of upgrading the D6 Date contrib module to the D7 Date contrib module, which isn't even written yet and may not be before D7 ships. Sure, core will provide some mechanisms for an upgrade path, and we need to validate that they are sufficient before release. But we can't expect a single magical CCK to Field API upgrade path to exist separate from all the contrib modules that provide CCK/Field API functionality.

The "more hands and eyeballs" argument is IMO the only one for putting CCK_to_D7 in core. Other than that, I really don't see why core should be burdened with this.

- CCK D6 is contrib, and core D6 has no fields to migrate. It's perfectly fine if you need a contrib module to migrate your contrib D6 fields after switching to D7.
- as Barry outlined in #3 and #16, the upgrade cannot be a regular, one-off process running 'silently' along with other update.php updates (because core or even one single contrib module can't predict how to migrate date fields or wicked_contrib_field_type fields). The upgrade process needs to be iterative, provided by a cck_upgrade module that exposes a page with the list of D6 fields that it's able to migrate given the current state of enabled modules. Press 'submit', eligible fields are batch-migrated.
Until date.module D7 (or a submodule) is enabled and implements hook_cck_upgrade() or whatever, stating that it knows how to migrate date fields, your D6 date fields are not visible in D7. Come back to the 'migrate fields' page when you have the module.

<catch> webchick: still agree with http://drupal.org/node/366364#comment-2169648 ?
<Druplicon> http://drupal.org/node/366364 => Upgrade path for D6 CCK fields => Drupal, field system, critical, active, 23 comments, 2 IRC mentions
<webchick> catch, Less and less every day. :P
<webchick> catch, My perfectionist side wants to still believe it, but my pragmatist side says the only way that's ever getting done is to release D7 without it.
<webchick> Else we wouldn't be sitting here 4 months later with you asking me that question. ;)
<catch> webchick: I couldn't find any issue anywhere for it apart from that one.
<catch> webchick: so I will downgrade it.
<catch> webchick: can I quote you?

While technically this isn't a critical issue, a failure to address it, in either core or contrib, could well negate all the good work of the D7CX movement.

A lack of contributed modules prevented a lot of site builders from using D6 early, and hence D6 wasn't used much to begin with.
A consequence was that D6 didn't get good early testing by site builders [as opposed to developers].
The D7CX movement could/should mean that site builders can give D7 a spin and see how it feels.

As webchick notes in the alpha 2 announcement.

You can help us reach the final release date sooner by testing this alpha and providing feedback.

An upgrade path for D6 CCK fields would broaden the available field of testers.
Some people are 'selfish' in the sense that they would only bother trying an alpha if they can test it for their own use case.
Many people who maintain one or two sites might not bother to test the alpha unless they can run a test on a copy of their own site.

Similarly, while D7 could be released, it won't be 'ready' for many unless there is an upgrade path.

I'm sorry to say that I won't be able to lead the effort on the CCK upgrade path myself.
I couldn't find the time to move forward on this in the last 4 months, and things are not getting lighter in the forthcoming weeks :-/.

a. The first is CCK field modules being upgraded to D7. Apart from the several that were added to core, user reference and node reference are all ported. This is light years ahead of D6 porting efforts (as is Views). There are sites being built (although not launching just yet) with the Drupal 7 field API right now.

b. The second is the data migration between D6 CCK and the Field API, so that people can port existing Drupal 6 sites with CCK fields to Drupal 7. This is unlikely to slow Drupal 7 adoption too much - people are generally much slower upgrading existing sites to newer releases than they are starting new sites - however it's absolutely critical that this is in place fairly soon, because without it there'll be very few D6 sites able to port to D7 at all.

This is pretty much what yched said in #14.

I've re-titled this issue to make it more explicit that b. is the missing piece. Also, Barry's thoughts in #3 sound eerily like table wizard and migrate...

> We will have a loop that iterates through the D6 CCK fields and instances and creates D7 fields and instances. We'll have to assume that all node types have been upgraded already

I am (slowly) working on prototype code to do this. I'm planning a hook_schema sort of thing that defines what fields the system should create, before running a conversion process. I don't have that much free time to work on this, so if anyone wants to collaborate on this, I'll put code I have so far on github.

I am hoping to make time to do some work on this but will absolutely not have time to do it alone, so we need to keep code posted in a place where multiple people can work on this collaboratively. I don't use github and don't have time to figure that out on top of working on this code. If you have made a start on this, please post what you have.

The whole point of github is that it's good for working collaboratively. I find it's much easier to see successive commits than muck about posting one patch after another. But I agree that you don't want to be having to learn something new at this point :)

So here's a zipfile :)

This is a module because I started out with the intention of making a helper module for contrib modules to use to migrate data to FieldAPI (eg user terms, term images, oldskool image).
It will need some work to go inside the update system in such a way that it can be used by core or contrib.

The idea is that you define migration plans which are then run as a batch. Each plan tells you: what fields and field instances to create, how to get the objects to manipulate, and then, with a loaded object, how to transfer data from one lot of object properties to another. E.g. your custom module might load stuff into $node->myprop, and this should go into $node->field_new_field. The plan tells you which nodes to do this to, and how to get the data for the old property if there are no longer hook_load() implementations to do this from old tables.
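As a rough illustration, a plan definition along those lines might look like this. The key names are guesses based on the description above, not the actual field_convert code:

```php
// Hypothetical migration plan: move a legacy $node->myprop value into a
// new Field API field. All key and callback names are illustrative only.
$plan = array(
  'title' => 'My legacy property to field',
  // What to create, in ordinary Field API format.
  'field' => array('field_name' => 'field_new_field', 'type' => 'text'),
  'instances' => array(
    array('entity_type' => 'node', 'bundle' => 'article'),
  ),
  // How to find the objects to manipulate.
  'list_callback' => 'mymodule_plan_list_nodes',
  // Stand-in for the vanished D6 hook_load(): put the old data on the object.
  'load_callback' => 'mymodule_plan_load_myprop',
  // With a loaded object, move data between object properties.
  'process_callback' => 'mymodule_plan_process',
);

// E.g. the process callback just shuffles one property into another.
function mymodule_plan_process($node) {
  $node->field_new_field['und'][0]['value'] = $node->myprop;
}
```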

More explanations are in the code comments. There's a demo plan for dummy data, and I just made a very rough start on taxonomy, which, as one of the more complex cases, should help thrash out anything this system is missing.

I'm out rest of today but around tomorrow morning (0800-1200 UTC) if anyone wants to chat about this.

Thanks so much!! Yes I know github is good for collaboration, but I suspect our best chance of getting other people involved is keeping the work here on d.o.

I am simultaneously working on this, getting Date ported (to help make sure complex contrib fields will work), and trying to put together some documentation on how to update field modules. All this stuff is intertwined, so it all needs to be done at once.

My work will be sporadic, but I will post back anything I come up with as I go along. If anyone else wants to dive into this, don't wait on me, go for it :)

I haven't even looked at it yet. Whatever part is core code needs to be here as a patch. Whatever part is contrib can live in CCK, which is where I think we always figured the contrib upgrade would live.

My code is not aimed at just CCK, but any module that stores something that could migrate to FieldAPI.
Some good examples would be user terms, taxonomy image, image attach. All the code to help this should be in core.

A couple of notes about what I posted: I've not done the batch looping properly yet, nor implemented all the stuff that's in the info array such as the postprocessor. I think the whole plan needs pre- and post- callbacks too for cleanup, table deletion, etc.

I'm really glad to see that there has been some progress on this. I will try to pitch in. Like KarenS and yched, I worry about my ability to have the time to lead it. Unfortunately we're the three most likely candidates. :-/

Here's the current status from my end. I am porting Date module to D7 and documenting the update steps as I go as a guide for other developers. That's getting me back into the code and giving me a complex contrib field that can be used in the update.

I'm planning to create a D6 database that contains nodes with all the kinds of fields that need to be updated, using every possible combination of fields and widgets: body using the teaser splitter, body text with no teaser splitter, taxonomy tags, taxonomy lists, text fields, number fields, nodereference fields, userreference, filefields, imagefields, date fields, fields that have allowed values lists, fields that use php for allowed values, fields with default values, etc. When I wrote the D5->D6 update I found it helped a LOT to have a specific database I could keep re-using to try the results. If someone is interested in moving this along and wants to help out but doesn't know enough about the code to do the update, creating a database like that would be a great project.

I'm trying to carve out a few hours a day to keep this project moving forward.

- creates a field for each vocab
- creates an instance for each node type the vocab applied to on D6
- TODO: the serialized data properties so the field knows which vocab it is: the API needs to be able to handle the $field.data stuff.
- UNTESTED: bring over old data from {term_node} into the new fields.

@joachim : Upgrading of taxonomy data is supposed to be already handled in core updates - except, well, it's currently broken : #706842: Improve comments for the taxonomy upgrade path :-/.
At any rate, this specific job is about migrating core D6 data (taxonomy), as opposed to contrib data (D6 CCK fields), so it has to be handled in core updates.

@Karen: the sample db you mention would be great to ship in the CVS code for testing purposes. Not sure if it could be used in a simpletest automated testing scenario, but having it at least available for everyone to try out and enrich sounds invaluable.

I'm not sure we're going to be able to use this in update hooks, but I'm going to take a stab at what you have and see what I can come up with. I now need a database to update, so I guess I'll go ahead and create a simple one to start with.

Taking a first swing through this. First, we have three kinds of field migration to think about:

1) Transforming items that were not fields to make them into fields (i.e. taxonomy, node body)
2) Transforming D6-style fields into D7-style fields where the new fields have become core fields.
3) Transforming D6-style fields into D7-style fields where the new fields will still be contrib fields.

Trying to deal with all those situations in the same code will make it unnecessarily complicated. I think #1 should be addressed in update hooks of the related modules, using code that will specifically handle that transformation. There are already update hooks for body and taxonomy (notwithstanding that they may need more work). That seems like the best place for them.

We can create some code that handles #2 and #3 coherently. The biggest difference between them is that for #2 we know the new module is available and working, for #3 we have to allow for the likelihood that there is no D7 version of the module available at the time of the D6-D7 update, and that those modules will get updated over time (or perhaps never get updated) and we can't use those fields and make those conversions until the modules are available.

Again, for #2 and #3 we have information about the former state of the field, all its settings, from the values we have stored in the database. We can actually figure out most, if not all, of the new settings for the field from those values. We just have to allow for the possibility that some modules may need to intervene to move a $field setting to be an $instance setting, or something like that. So maybe we do what we know and provide a hook for the module to change what it wants before saving the new field settings.

And for #2 and #3 we have the data in the current tables that we can use to create the new tables. If I'm not missing anything, I think we can almost automatically create the new data tables from those values, as long as we have the 'columns' information from the field settings table. We pull the data stored in per-content tables out into new per-field tables and change nid to entity_id, vid to revision_id, add deleted=0, etid=1, and language=und, along with the field columns. For per-field tables we create a new table with the new name and the new column names.
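In principle, that per-field copy could be a single INSERT ... SELECT per table. A sketch follows; the table names, the value column, and the bundle join are assumptions based on the description above, and a real version would derive the columns from the stored 'columns' info:

```php
// Rough sketch: copy one D6 per-field CCK table into its D7 data table.
// Table and column names here are illustrative, not the final schema.
function cck_upgrade_copy_per_field_table($field_name) {
  $old = 'content_' . $field_name;     // D6 per-field table.
  $new = 'field_data_' . $field_name;  // Assumed D7 data table name.
  $col = $field_name . '_value';       // Assumed single value column.
  db_query("INSERT INTO {" . $new . "} (etid, entity_id, revision_id, bundle, delta, deleted, language, " . $col . ")
    SELECT 1, c.nid, c.vid, n.type, c.delta, 0, 'und', c." . $col . "
    FROM {" . $old . "} c INNER JOIN {node} n ON n.vid = c.vid");
}
```

The join on {node} is only there to fill in the bundle (node type) for each row; everything else comes straight across from the D6 table.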

So what part of this should be in core? I can move the update code to CCK if CCK is going to do all these updates. Or we can make this a core module that creates the update system and provide ways for contrib modules to invoke it.

Agreed that 1) is a different animal. Data upgrades for 'X as field', where X was not a CCK field in D6, are a job for the module that implements X, be it core or contrib. From the examples we've had in core so far (body, taxo, file uploads), those involve business logic specific to the task at hand, and I'm not sure there's much to share with 2) and 3) (D6 CCK field -> D7 Field API field).

For 2) and 3), I don't think I see any reason why any of this should live in core ?

I think I was considering the following workflow - hook names are obviously negotiable :

hook_cck_upgrade_upgradable($D6_field) lets modules say 'I'm able to convert that field'

a page at admin/field/upgrade or whatever presents a table listing all the D6 CCK fields available, with an 'Upgrade' checkbox at the beginning of each row, and a 'Submit' button. Rows for which the hook above returned all FALSE are disabled.

On submit, a batch API process is launched, migrating each one of the selected fields
for each field :
- One batch op creates the D7 $field and $instance arrays :
The op calls hook_cck_upgrade_definitions($D6_field), returning array('field' => $D7_field, 'instances' => $array_of_D7_instances), and then performs the field_create_field() and field_create_instance() calls.
- One multistep batch op migrates the field values :
For each D6 value, the op calls hook_cck_upgrade_value($D6_field, $D7_field, $D7_instance, $value), returning the D7 value to insert, and then inserts the value.
- Once all values are migrated, the D6 fields and data are moved out of the way "somehow" (simply removed from the db ? marked as 'upgraded' ?), for the next time you visit admin/field/upgrade.
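Put together, the three hooks in this workflow might look like the following for a hypothetical date field. The hook names come from this comment; the field types, bundles, and value structure are assumptions:

```php
// 'I'm able to convert that field': drives the Upgrade checkboxes.
function date_cck_upgrade_upgradable($d6_field) {
  return $d6_field['type'] == 'date';
}

// Batch op 1: the D7 definitions to create for this D6 field.
function date_cck_upgrade_definitions($d6_field) {
  return array(
    'field' => array(
      'field_name' => $d6_field['field_name'],
      'type' => 'datetime',
    ),
    // One instance per node type the D6 field was attached to.
    'instances' => array(
      array('entity_type' => 'node', 'bundle' => 'event'),
    ),
  );
}

// Batch op 2, called once per stored D6 value: return the D7 value.
function date_cck_upgrade_value($d6_field, $d7_field, $d7_instance, $value) {
  return array('value' => $value['value']);
}
```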

When crazy_contrib_field_type.module becomes available for D7, stating in its hook_cck_upgrade_upgradable() that it knows how to upgrade fields of type 'crazy_contrib_field_type', then all fields of this type become 'upgradable' the next time you visit admin/field/upgrade.

This workflow, however, assumes that upgrading any given field is entirely hardcodable, without the need for any user-defined choice or adjustment (e.g., $field['foobar'] in D6 might translate to either $instance['setting']['barbaz'] = 'a' or $instance['setting']['barbaz'] = 'b').
If that's not the case, there might be a need for an additional 'configure migration' step before running the batch. I'm not sure we need to consider this case right now, though.

Upgrading taxonomy is two things:
- upgrade term and vocabulary tables to become objects and bundles. That's currently outside this API.
- upgrade vocabulary node type associations and term_node association to become fields and field data. This API can handle that easily, and already does.

My concern is that making this into an API that will convert anything into a field is going to make it much more complicated than it needs to be, and this has already dragged on too long. Changing the core field updates to use this API would also involve re-writing core code that is already committed and in use, which will be another slow-down.

And this code is not really doing the same thing as what yched described. You are not using the information in the database to see what needs to be converted; you are expecting modules to announce themselves and provide all their conversion information. That will be a lot more work for module developers, since each of them will have to understand how to create this data array. And we don't have any way to see what is and is not converted: we only see fields that have provided information, and nothing about fields that are not converted. (Plus they apparently still show up as available for conversion after their data is converted.)

I can't get this code working (it's making assumptions about what is in my database and throwing fatal errors), so I am going to try going in the direction that yched and Barry outlined earlier: providing an automatic list of all the fields in the system, where the ability to convert them is only turned on if the module is installed, and then automatically converting them based on the information we have in the old database. Which reminds me that the test database I mentioned earlier should include fields from several modules that are not available, so we can be sure that case is handled correctly.

If we can assume all the field settings and data are in expected places in the database, I'm hopeful that we can handle the migration automatically for most modules without the developers needing to do anything. At most, they will have to intervene if they do unusual, out of the ordinary things. That will make it easier for developers which will speed up the process of getting all those modules ported.

Another nice thing about this approach (the table with a list of fields and a flag to indicate which are converted) is that we should be able to let people migrate a field, roll it back if there are problems, and then try again later, something that would not be possible with a one-time update done in an update hook.

> Changing the core field updates to use this API would also involve re-writing core code that is already committed and in use,

I thought it was broken... :/

> That will be a lot more work for module developers

I disagree. Module developers have only to provide basic information about fields, rather than figure out how to create them and correctly migrate their data.

> I can't get this code working, it's making assumptions about what is in my database and throwing fatal errors

It's still pretty rough around the edges. So far it's working on test data that I've chucked into a D7 install by grabbing it out of an entirely separate D6 install, which is quick and dirty but quicker than doing a whole D6-7 upgrade. I am still poking this whole concept with a stick to see if it works, but I think it can. It's prototype code, but it doesn't need *that* much cleaning up.

> And we don't have any way to see what is and is not converted, we only see fields that have provided information, we can't see anything about fields that are not converted.

CCK should get some special handling to list all existing D6 CCK fields as migration plans. Shouldn't be too hard to do.

> (Plus they apparently still show up as available for conversion after their data is converted).

Yup, not implemented that yet... Not too hard -- let each plan specify a table name: eg taxonomy says 'if term_node exists then I have not run yet'.

Ultimately, this is my opinion on the matter: I maintain or co-maintain several contrib modules that should really move to FieldAPI in D7. I am not going to write several upgrade paths which at their heart are identical. If every contrib module has to rebuild this one wheel, it's a colossal waste of Drupal developing time across the community. I am doing this once, and once only.

> Changing the core field updates to use this API would also involve re-writing core code that is already committed and in use,

I thought it was broken... :/

It happens to be broken currently, but that doesn't make it as good as unwritten and nonexistent ;-)

I am not going to write several upgrade paths which at their heart are identical. If every contrib module has to rebuild this one wheel, it's a colossal waste of Drupal developing time across the community. I am doing this once, and once only.

In practice, the bits that are identical are fairly minor IMO, merely a general structure (create fields and instances, then loop over all nodes). There aren't many common parts to abstract out of the upgrade paths for the 'node body as field', 'comment body as field', and 'taxo as field' updates that currently exist in HEAD. Each time, the trickiest part is the business logic specific to the kind of data you're migrating to be a field.
Abstracting out a generic framework able to handle all cases of 'data migration of X as field'
a) is a tough job (that's some quite 'meta' code you have in #45)
b) instead of writing a couple 'regular' (ok, multipass) update funcs, requires module authors to learn the framework
c) is not required for and won't directly serve the goal discussed in this thread (migrate D6 fields)

So as Karen said, I think this is a different topic. 'Migrate CCK D6 fields' is an issue with a fairly well defined scope, yet complex and challenging enough, let's not have it depend on first setting up a generic 'Migrate anything as field' solution.

I've simplified the plan array somewhat by having the API get information from hook_entity_info(), and I've added a way for plans to check if they've already run. Updated code on github.

I understand what you're both saying about the complexity here, but I really think it's worth it.
The $plan array is really just a few basic keys followed by chunks reproduced almost verbatim from what you need to pass to the Field API. I could maybe bring it fully in line, so that module authors wanting to use this would just create a template field by hand on a demo site, export it as code, and copy-paste it into their plan definition.

The workflow I have is exactly what you describe in #50, and CCK can sit on top of it and provide further simplification: in other words, the CCK field migration is a special case which is easier because CCK already knows a lot about its own fields.

One question I'm not sure how to deal with is the matter of loaders that are gone by the time we are on D7. For example, in my taxonomy_migrate_pre_process() I'm having to do the work of the old D6 function that loads terms for a given node, because on a D7 installation that code has vanished. The same problem will happen to any module, CCK included: you no longer need this code on D7, so it's ripped out, but the migration system needs it. Any thoughts?

1) I think you are still going down the road of creating something that will, or can, run in an update hook. All three of us, Yves, Barry, and I, think this has to be handled outside the update system in a way that it can be run on demand only when certain criteria are met. It's harder to write code for update hooks because many functions are not available there, and there is no way to allow users (or other modules) to intervene. And they run once only and never again.

2) The three of us have a visualization of a workflow that should work, but it will only work for fields, and it's not the same as the workflow you are creating here.

3) All your changes are on git and I don't have time to keep going over to see what has changed. I had time to work on this this weekend and had an idea of how to approach the problem, but it's different than yours, so now I'm kind of stymied.

I think the approach that Barry, Yves, and I have been discussing belongs in contrib anyway, so I'm going to work on some code in CCK to do it the way we've been discussing. If you get something working as an API for anything to be a field and it goes into core, I'll see if we can adapt our code to use it. This way I can still get something done in what's left of the weekend.

The CCK field update module will reconstruct the old fields by reading the data in the content field tables and assembling it into the right format, then create the new fields by massaging that array into the new format and using the field API to create the tables. It will create the new data tables by copying the old data tables, which are already structured nearly the way we need them. Those techniques won't work for things that weren't fields to begin with. The CCK module can also re-create some CCK D6 functions and processes if that becomes necessary, and be turned off once all the fields are migrated.

Also, there's nothing keeping you from creating this 'anything as a field' API as a contrib module; it doesn't have to go into core. As a contrib module you can quit worrying about taxonomy and focus on creating a method for non-field contrib modules to convert their field handling.

The code posted in this issue won't run without fatal errors, so I'm changing this to 'needs work'. If you have better code on git that you want reviewed you can post it here and change the status.

> 1) I think you are still going down the road of creating something that will, or can, run in an update hook. All three of us, Yves, Barry, and I, think this has to be handled outside the update system in a way that it can be run on demand only when certain criteria are met.

What I've made so far is in fact an on-demand system because it seemed the quickest thing to build to prototype my idea. I had assumed anything of this sort should run in an update hook, but if the on-demand is generally deemed preferable then that's great! One less thing to figure out!

> 2) The three of have a visualization of a workflow that should work, but it will only work for fields, and it's not the same as the workflow you are creating here.

The workflow I have is really just the same as yched outlined above. There are minor differences in things like "returning array('field' => $D7_field, 'instances' => $array_of_D7_instances)", but that's easily changed.

> 3) All your changes are on git and I don't have time to keep going over to see what has changed.

Go here: http://github.com/joachim-n/field_convert and click 'Download source'. You will get a zipfile identical to what I could upload here. I have not yet posted a patch because I don't yet know how to best work this into core. That's one of the things I was hoping for help with.

> The CCK field update module will create the old fields by reading the data in the content fields tables and constructing it into the right format and create the new fields by massaging that array into the new format and using the field api to create the tables.

Right - so CCK needs a generic preprocessor that all CCK field plans use to load up their data.

> Those techniques won't work for things that weren't fields to begin with.

They already do! I migrated test data from user_terms module to taxonomy fields created on users!

re Karen :
I've been thinking that maybe the script doesn't need to build the $D6_field array in the exact format CCK D6 builds and manipulates it (merges field + instance info).

Since the 'create fields and instances' step will need to iterate over the instances of a given field to build separate $D7_field and $D7_instance arrays, maybe it would be easier to have the script work on the raw data as loaded from the D6 tables content_node_field and content_node_field_instance (keeping $D6_field and $D6_instance separate), rather than running the records through the D6 API, which merges them into a series of $D6_field_plus_instance arrays that then have to be sorted out (e.g., which of these are the 'same field').

Internally, the D6 raw db data is closer to the $D7_field / $D7_instance structure than the $D6_field_plus_instance API structure.
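Working on the raw records could be sketched roughly like this. This assumes the D6 content_* tables survive the core upgrade untouched; the function name and the exact set of serialized columns are illustrative, not from the actual patch:

```php
/**
 * Load raw D6 CCK definitions, keeping fields and instances separate.
 * A sketch only; column names follow the D6 CCK schema.
 */
function field_migrate_load_d6_definitions() {
  $d6_fields = array();
  $result = db_query('SELECT * FROM {content_node_field}', array(), array('fetch' => PDO::FETCH_ASSOC));
  foreach ($result as $row) {
    // Serialized columns need unserializing before use.
    $row['global_settings'] = unserialize($row['global_settings']);
    $d6_fields[$row['field_name']] = $row;
  }
  $d6_instances = array();
  $result = db_query('SELECT * FROM {content_node_field_instance}', array(), array('fetch' => PDO::FETCH_ASSOC));
  foreach ($result as $row) {
    $row['widget_settings'] = unserialize($row['widget_settings']);
    // Keyed by field name, then by node type (= D7 bundle).
    $d6_instances[$row['field_name']][$row['type_name']] = $row;
  }
  return array('fields' => $d6_fields, 'instances' => $d6_instances);
}
```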

This issue is far too important to derail into discussions about optimal approaches. joachim, what you're working on sounds cool, but please open a separate issue for it (and please, for the love of God, post PATCHES instead of forcing people to go to some off-site thing and learn how to interact with it if you want anyone at all to review the code). Let's get the basic update path working here, in this issue, in the way that the three field API maintainers have already discussed and agreed upon.

Here's a start, lots left to do. Based on webchick's comments above, I gave it a 'Field' namespace and put it in the modules folder of the Field module.

There is code to pull the D6 style information out of the tables for each field and manipulate it into a D7-style format. There are several helper functions to use in figuring out where the old table and columns are for each field to make it easier to migrate data. There is a simple checklist page of all the fields where you can select a field to launch a process of creating the fields and instances and then copying the data from the old tables to the new tables.

I added a drupal_alter() to the field and instance arrays so modules can intervene to change values that are wrong. CCK or other field modules could use those to clean things up.
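What that intervention point could look like (the alter hook names here are illustrative, not necessarily the ones in the committed code):

```php
// After assembling the candidate D7 definitions from the D6 data, give
// other modules a chance to adjust them before creation. The $d6_* arrays
// are passed as context so implementations can inspect the source values.
drupal_alter('field_migrate_field', $field, $d6_field);
drupal_alter('field_migrate_instance', $instance, $d6_instance);

/**
 * Example implementation a field module might provide, e.g. to fix up
 * a widget type that was renamed between D6 and D7.
 */
function mymodule_field_migrate_instance_alter(&$instance, $d6_instance) {
  if ($instance['widget']['type'] == 'optionwidgets_select') {
    $instance['widget']['type'] = 'options_select';
  }
}
```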

Many of the formatter names, widget type names, and even field type names may have changed from D6 to D7 and all that has to be cleaned up.

And none of the UI discussed above is in place, where you would be able to see which fields are migrated and which are not and maybe roll back a migration.

I also did not yet write the code to copy the data from the old tables to the new ones, but added code to figure out which tables and columns are affected.

BTW, joachim, please DO go create a project for your code. I think it will be useful for other modules and it would be a shame to lose all that work. I just don't think it is helpful for THIS migration.

I really tried to see if I could do things in the same way you did them in your API, but it was making everything harder and more complicated. We can use a lot of shortcuts when migrating former fields and your API forces us to do things in more complicated ways to match the workflow that is needed for data that was never in fields. For instance, your code is loading in a list of nodes and updating them one by one to move data to the new tables. We don't have to do anything like that to migrate data for former fields. We can just copy the existing field table data to new locations, the new tables that the Field API created.

Latest update. I added status and error messages, found that we have to be sure max_length is set for text fields or there will be SQL errors when you try to create a field with them, and took a first stab at creating a query to migrate the data to the new field tables.

- It might be helpful to use var names like $D6_field, $D7_instance, instead of just $field and $instance, which get a little confusing ;-)

- I think we changed the semantics of values between D6 'multiple' and D7 'cardinality' (the value for 'unlimited' has changed, if I'm not mistaken), so we cannot just copy the value.
Also, cardinality is a property of the instance in D7, while it was a property of the field in D6, so it has to be set in field_migrate_get_instances(), and needs the $D6_field as a param, not just the instance.

This is possibly a sign that the creation of the D7 fields and instances from the D6 definitions is difficult to split into independent "a) do fields b) do instances" steps, and should instead be done globally (you need the D6 field and instances to create both the D7 field and the D7 instances).
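The 'multiple' to 'cardinality' mapping could be as simple as this sketch. It assumes the D6 convention (0 = single value, 1 = unlimited, N > 1 = exactly N values) and the D7 FIELD_CARDINALITY_UNLIMITED constant:

```php
/**
 * Map a D6 'multiple' setting to a D7 'cardinality' value.
 *
 * D6: 0 = single value, 1 = unlimited, N > 1 = exactly N values.
 * D7: 1..N = that many values, FIELD_CARDINALITY_UNLIMITED = unlimited.
 */
function field_migrate_cardinality($d6_multiple) {
  if ($d6_multiple == 1) {
    return FIELD_CARDINALITY_UNLIMITED;
  }
  // 0 becomes a single value; any N > 1 carries over unchanged.
  return $d6_multiple == 0 ? 1 : $d6_multiple;
}
```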

- You can get the D7 db API to return result rows directly as associative arrays, instead of objects that you then manually cast to arrays:
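Along the lines of:

```php
// D7 DBTNG: pass a 'fetch' option so each row comes back as an array.
$result = db_query('SELECT * FROM {content_node_field_instance}', array(), array('fetch' => PDO::FETCH_ASSOC));
foreach ($result as $row) {
  // $row is already an associative array; no (array) cast needed.
}
```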

- The direct 'old table to new table' copy of field data is faster, but I'm not sure we can avoid a value-by-value migration:
The values might need some custom massaging between D6 and D7 - that's the hook_cck_upgrade_value() step in #50 above.
Also, having the writing of values go through the regular field_attach_update() would let the script support migrating fields to storage engines other than the core default SQL one (there is already a mongodb storage engine out there: http://drupal.org/project/mongodb) or play nicely with PBS.
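A value-by-value pass through the Field Attach API might look roughly like this. The loader function is hypothetical, and hook_cck_upgrade_value() is the proposed (not yet existing) hook from #50:

```php
/**
 * Migrate one node's values for one field through field_attach_update(),
 * so non-SQL storage engines are supported. A sketch only.
 */
function field_migrate_node_values($node, $d6_field, $d7_field, $d7_instance) {
  // Hypothetical helper that reads the raw D6 values for this node.
  $d6_values = field_migrate_load_d6_values($node, $d6_field);
  $items = array();
  foreach ($d6_values as $delta => $value) {
    // Let the owning field module massage each value between D6 and D7.
    $items[$delta] = module_invoke($d7_field['module'], 'cck_upgrade_value', $value, $d6_field, $d7_field);
  }
  $node->{$d7_field['field_name']}[LANGUAGE_NONE] = $items;
  // Writing through the regular attach API routes the data to whatever
  // storage engine the D7 field uses.
  field_attach_update('node', $node);
}
```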

There are no D6 $fields created here, so there aren't separate D6 fields and D7 fields: the function that reads the D6 tables immediately transforms the D6 values into their D7 equivalents and splits them between the field and the instance to match the way they are used in D7. The drupal_alter() there is so other modules can tweak the transformation. Those functions create the $field and $instance arrays used throughout the code.

Those arrays are incomplete; they have only the settings passed through from the D6 table. They are then passed to field_create_field() and field_create_instance(), which finally create complete field and instance arrays, called $new_field and $new_instance. This follows the discussion above that there is no reason to bother creating the D6-style field in this code: it is easier not to, and as far as I can see we don't need it, so I didn't.
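Concretely, something along these lines (the field name, type, and bundle are made-up examples):

```php
// The incomplete arrays only need the defining keys; the create
// functions fill in defaults and return the completed definitions.
$field_values = array(
  'field_name' => 'field_body_d6',   // example name
  'type' => 'text_long',
  'cardinality' => 1,
);
$new_field = field_create_field($field_values);

$instance_values = array(
  'field_name' => 'field_body_d6',
  'entity_type' => 'node',
  'bundle' => 'article',             // the upgraded D6 node type
  'label' => 'Body (migrated)',
);
$new_instance = field_create_instance($instance_values);
```

field_create_field() also creates the new field data tables, which is what makes the later table-to-table copy possible.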

But I can see your point about the names being confusing. Calling it $D6_field would be wrong, since it doesn't have the original D6 field values and it isn't formatted like the D6 field was, but some name to indicate it is not yet a complete D7 field would be useful. I am not sure how to describe it. Maybe call them $field_values and $instance_values.

That means that your suggestion to pass the D6 field and D7 field and instance to a value callback won't work: we don't have the D6 field available. I'm also not sure why the value of a field would change between D6 and D7. It doesn't happen in any of the core fields as far as I know, and I can't see any reason it should happen for other fields. Do we want to even suggest that developers try to change values during this migration? I'm struggling to see a use case for that.

As to 'array('fetch' => PDO::FETCH_ASSOC)', I knew there was a way to do that but couldn't find it at the time, so I'll make that change.

I missed the change in cardinality, I'll have to fix that. The code that retrieves the D6 settings out of the old field tables is already grabbing values from both tables so it is easy enough to get the cardinality value and pass it to the instance. It is already doing that with 'required'.

- rearranged plan properties so chunks of it that are just FieldAPI settings look exactly like FieldAPI arrays. (Apart from a 'bundles' shortcut :)
- farmed out big chunks of code to helper functions, so it's much easier to get an overview of what the batch process does; it also makes the debug code simpler.
- general cleaning up.

I talked to Angie about this at DrupalCon and we agree that this upgrade code should go into CCK instead of core, particularly because it won't be a simple one-step update process. I am working on getting my earlier code working with the latest HEAD, and then I will commit it to CCK where we can work on it more. When I do that, I'll create a CCK issue and come back here and post a link to it.

We should preserve fields, bundles, field instances, and field data from CCK to Field API. Everything else, including widget selection, widget settings, field settings, display settings, etc., we should not bother with, leaving them for a human to fix.

Rationale:

Upgrading from D6 to D7 will not be a completely automated process for all but the simplest sites. The simplest sites do not have many fields and custom settings, so re-creating them will not be that big a burden, whereas writing the code to completely automate it would be an enormous one. Compare the Views 1 to Views 2 upgrade process, which never existed at all. People re-created their Views and survived.

bjaspan's approach seems reasonable to me. In general, the Drupal philosophy is to preserve *data* from major release to major release, and that's what we're doing. Preserving the data with clear expectations is better than trying to preserve everything, with inflated expectations and lots of bugs.

I agree totally that we cannot do an automated update. The approach I have come up with is a list of all the fields that exist in the D6 tables with a checkbox next to each of them so you can select and convert them. If they are already converted or if there is no D7 module available yet the checkbox is disabled, so you only have the option to convert fields that do not yet exist in D7 where there is a D7 module available.

So this is a page that you will have to re-visit over and over until all your field modules are available and all your fields are converted.

I actually think we can do a reasonable migration of the settings and the data, so that is what my code is doing -- pulling up the D6 field settings, re-arranging them so they are in the places that they belong in the D7 code, and then creating a D7 field from those values. That also then creates the D7 field table, and it is then fairly easy to just copy the data from the D6 table to the D7 table.
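The "copy the data" step could, in the simple SQL-to-SQL case, be a single INSERT ... SELECT. All table and column names below are illustrative; the real ones depend on the field definition and on the D7 field data table naming:

```php
// Direct copy sketch for a single-column file/image field.
// D6 per-field table: content_field_image (nid, vid, delta, field_image_fid).
// D7 data table: field_data_field_image, created by field_create_field().
db_query("INSERT INTO {field_data_field_image}
    (entity_type, bundle, entity_id, revision_id, language, delta, field_image_fid)
  SELECT 'node', 'story', nid, vid, 'und', delta, field_image_fid
  FROM {content_field_image}
  WHERE field_image_fid IS NOT NULL");
```

This is the fast path; as noted above, fields whose values need massaging between D6 and D7, or that live in non-SQL storage, would still need a value-by-value pass instead.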

There may be things that won't work for some complicated fields, and people will always want to double-check the new settings to see if they want to make any changes in light of the new way that D7 works, but based on what I have so far I actually think we can get this working reasonably well for most situations.

I'm creating a database of all kinds of D6 field settings and data to use for testing the migration and doing some more tweaking of the code. Please try it out and see what you think when it's ready. I'll post back here when it is.

Another point. Since this is not automatic it can be skipped. We can use this code to attempt a migration. The original data will live on in the original table, so we can also roll back a migration (delete the field and the new field data table). So people can choose to skip this process completely for one or more fields (since it isn't automatically invoked) and do something different, like re-creating their fields from scratch or creating some custom code to migrate things in a different way.

I was asked if the Migrate module could be used to do this. Someone could also explore that alternative.

I don't think it's accurate to compare re-creating fields to re-creating views. Missing views won't make your site non-functional. Missing data will. And most sites I've seen have far more fields than views, so it would take much longer to create them from scratch. And we would still have to create some process that would allow them to say that a new field they just created has data in a D6 table that should be copied over into the D7 table after the field is created.

I created a CCK issue at #781088: Updating CCK Fields and Data from D6 to D7 and have committed some code for a Content Migrate module. If anyone wants to take things in a different direction you are welcome to do so, I just thought I'd try to get something functional and available.