Charles,
While reading one of the recent threads, I realized that my Merge code was written before your custom column code. The merge code finds the first record selected (the "dest" destination record), then loops through the other selected book records (the "src" source records) in the order they were selected. For each src record, it copies the book formats in that record (provided that format doesn't already exist in the dest record) into the dest record.

Then it loops through the src records again to merge in the metadata. For each type of metadata there is a check to see whether it exists in the source metadata information (src_mi) but not in the dest (dest_mi); if so, it's written into the dest record. It occurred to me that there are no checks (at least none written by me) for user-defined metadata. Thus, merge would lose that data if it's not in the src record. Some types of metadata are handled in a special way (comments are appended; authors may be "Unknown," so they exist but are still handled as if they don't; etc.).

All this "action" happens starting at line 802 of calibre\gui2\actions.py (pun not intended) in the merge_metadata(self, dest_id, src_ids) function.

Here's a quick example of the merger of the cover:

Code:

if src_mi.cover and not dest_mi.cover:
    dest_mi.cover = src_mi.cover
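The special cases mentioned above (comments appended, "Unknown" authors treated as absent) can be sketched as a standalone function. The dict-based records and field names here are simplified stand-ins for illustration, not calibre's actual Metadata class:

```python
# Sketch of the special-case merge rules described above, using plain
# dicts in place of calibre Metadata objects (an assumption for clarity).

def merge_special_fields(dest, src):
    # Comments: append rather than replace.
    if src.get('comments'):
        if dest.get('comments'):
            dest['comments'] = dest['comments'] + '\n' + src['comments']
        else:
            dest['comments'] = src['comments']
    # Authors: 'Unknown' exists, but is handled as if it doesn't.
    if src.get('authors') and src['authors'] != ['Unknown']:
        if not dest.get('authors') or dest['authors'] == ['Unknown']:
            dest['authors'] = src['authors']
    return dest
```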

I did a brief test with a user-defined Yes/No custom column, and it was not merged. I wonder if you would look it over and add appropriate tests to merge in the user-defined custom columns. I'm not up to speed on how to handle your new metadata, but I suspect it would take you only a few minutes to fix.

Will do, but perhaps not for a few days. My wife found me a short consulting gig, and in a fit of madness I said yes. Now I must deliver on it.

No problem. No one has found it yet. If I get time, I may look at how you handle your metadata. With the previous limited set of predefined metadata, I just checked all the metadata fields. With the new user-defined fields, I suppose I'd have to ask for a list, then cycle through that list. Some fields may have to be handled in special ways. I don't know how to do either - ask for the list, or figure out which items might need special handling during merge.

(Maybe "a few minutes" to do this was excessively optimistic, even for one of your talents.)

Look at the new class library.field_metadata, available as db.field_metadata. This is a dict, keyed by the attribute name. The data fetched using the key has everything needed, including the datatype and its column number for fetching and storing the information.

Something like:

Code:

for key in db.field_metadata:
    if db.field_metadata[key]['is_custom']:
        col_num = db.field_metadata[key]['col_num']
        # Now do what is needed, according to type. rec_index is used
        # to get the value you are working with. Something like:
        from_record_value = db.get_custom(from_id, num=col_num)
        # process ...
        db.set_custom(to_id, val, num=col_num)

Quote:

Look at the new class library.field_metadata, available as db.field_metadata. This is a dict, keyed by the attribute name.

It looks like that includes non-custom keys as well. Correct? Will "for key in db.field_metadata:" loop through all fields, including, e.g., title and authors? If I'm going back into the code again, it might make sense to use a single loop for all metadata, instead of keeping the current non-custom tests and adding a separate loop for the 'is_custom' fields. That way, if the old fields are ever renamed, or a new non-custom field is added, it will still be merged (default rule: if it doesn't exist in the destination but does exist in a source, merge it in).
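The single-loop idea might be sketched as follows. This is only an illustration of the default rule; the plain dicts standing in for field_metadata and the book records are assumptions, not calibre's real structures:

```python
# Sketch: one loop over every field in a field_metadata-style dict,
# applying the default rule -- copy src into dest only when dest is
# empty and src is not.

def unified_merge(field_metadata, dest, src):
    for key in field_metadata:
        if not dest.get(key) and src.get(key):
            dest[key] = src[key]
    return dest
```

Special cases (tags, comments, series) would still need their own branches inside the loop, dispatched on the field's datatype.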

Can you think of any of your field types that need special handling? I've forgotten all of the different types, but if you have one for text, like comments, should a src be appended to the dest or ignored?

Any other types that should be appended or that have defaults that should be overwritten (like the default author "Unknown")?

Quote:

It looks like that includes non-custom keys as well. Correct? Will "for key in db.field_metadata:" loop through all fields, including, e.g., title and authors? If I'm going back into the code again, it might make sense to use a single loop for all metadata, instead of keeping the current non-custom tests and adding a separate loop for the 'is_custom' fields. That way, if the old fields are ever renamed, or a new non-custom field is added, it will still be merged (default rule: if it doesn't exist in the destination but does exist in a source, merge it in).

Yes, it has all the fields, or at least it is supposed to. I wondered whether you could unify the processing. I found in the search code that I could.

If you are going to play with standard fields, then you will want to know about field_metadata[key]['rec_index']. That field is the index into the _data record. You would use db.get(id, rec_index, row_is_id=True) (found in library/caches.py) to get the value of that field for a given db_id. Don't use the db.set function in caches.py, because the data won't be written to the DB.
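The rec_index idea can be modelled in a few lines. The tiny _data table and get() function below are runnable stand-ins that mirror the db.get(id, rec_index, row_is_id=True) call described above; they are not the real cache API:

```python
# Stand-in model: field_metadata maps a field name to its index in the
# flat per-book data record; a value is fetched by book id plus index.

field_metadata = {
    'title':   {'rec_index': 0},
    'authors': {'rec_index': 1},
}
_data = {42: ['The Title', ['An Author']]}  # keyed by book id

def get(book_id, rec_index):
    # analogous to db.get(id, rec_index, row_is_id=True)
    return _data[book_id][rec_index]

value = get(42, field_metadata['authors']['rec_index'])
```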

Quote:

Can you think of any of your field types that need special handling? I've forgotten all of the different types, but if you have one for text, like comments, should a src be appended to the dest or ignored?

This is really a requirements issue. Are tags (text, is_multiple=True) merged, or do they overwrite? Are comments merged, or is one overwritten? What happens with series indices? The equivalent custom fields should have the same processing.

My guess is that text (non-tag, is_multiple=False), bool, int, float, and date columns should overwrite. Make sense to you?

Quote:

Any other types that should be appended or that have defaults that should be overwritten (like the default author "Unknown")?

This is really a requirements issue. Are tags (text, is_multiple=True) merged, or do they overwrite?

Tags are merged.

Quote:

Are comments merged, or is one overwritten?

Comments are appended.

Quote:

What happens with series indices?

There is currently no independent test for series index. Series index is treated as a pair with series name: if the series name is empty in the destination, then both series name and index are written into the dest from the src. If the dest already has a series name, the index is never changed during merge.
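The pairing rule just described can be sketched as a small function. The dict-based records are stand-ins for calibre's metadata objects, used only to make the rule concrete:

```python
# Sketch of the series rule: name and index travel as a pair, and both
# are copied only when the destination has no series name at all.

def merge_series(dest, src):
    if not dest.get('series') and src.get('series'):
        dest['series'] = src['series']
        dest['series_index'] = src.get('series_index')
    # If dest already has a series, its index is never touched.
    return dest
```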

Quote:

The equivalent custom fields should have the same processing.

Agreed, to the extent possible.

Quote:

My guess is that text (non-tag, is_multiple=False), bool, int, float, and date columns should overwrite. Make sense to you?

Yes - ("overwrite" empty field in dest, if src is not empty)

There seem to be four types of text:

tag-like (merged, like tags)

comments-like (appended)

plain-jane text (if dest is empty, fill from the first src that is not empty)

series-like (???)
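Those four behaviours could be dispatched from one function. The subtype names below are labels from this discussion, not calibre datatypes, and the series-like branch would additionally need to carry its index along, per the pairing rule used for standard series:

```python
# Sketch: map the four text subtypes discussed above to merge actions.

def merge_text(subtype, dest_val, src_val):
    if subtype == 'tag-like':            # merge, like tags
        return sorted(set(dest_val or []) | set(src_val or []))
    if subtype == 'comments-like':       # append
        if dest_val and src_val:
            return dest_val + '\n' + src_val
        return dest_val or src_val
    if subtype in ('plain', 'series-like'):
        # fill dest from src only if dest is empty
        return dest_val or src_val
    raise ValueError('unknown text subtype: %s' % subtype)
```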

As to series-like text, how does this differ from plain-jane text? I'm thinking of someone who has set up a secondary custom series in one field, with an associated (in his mind) number field for ordering. It looks like, even if the series-like text field is associated with an int or float, it would still be OK to treat the fields independently. As long as the user enters something in his number field whenever he enters something in the series-like text field, it wouldn't be overwritten and the pair wouldn't be decoupled.

In the same way that series differs from (say) publisher. Series custom fields have an associated index.

Quote:

I'm thinking of someone who has set up a secondary custom series in one field with an associated (in his mind) number field for ordering. It looks like even if the series-like text field is associated with an int or float, it would still be OK to treat the fields independently. As long as the user enters something in his number field when he enters something in the series-like text field, it wouldn't be overwritten and the pair wouldn't be decoupled.

There isn't much you can do to maintain association correctness with columns that are paired in the user's head, so your assertion/statement makes sense. However, as series custom fields are physically paired with an index, you should apply the same rule you applied for standard series.

You might not notice the pairing for series custom fields. These fields carry two pieces of information. The first is the series name, which acts like a text (is_multiple=False) field. The second is the series index, which in this case is stored in the connection record in the DB, making it (sort of) a column in the books table. You can get its value from the book view -- field_metadata.cc_series_index_column_for() will give you the field number -- or by using db.get_custom_extra() (in custom_columns.py). The method db.set_custom() has a keyword parameter used for setting the index field at the same time the series is set.
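Applying the standard-series rule to a custom series column might then look like the sketch below. The Db class is a minimal runnable stand-in whose method names mirror the calls mentioned above (get_custom, get_custom_extra, set_custom with an extra keyword); it is not the real calibre database API:

```python
# Stand-in database modelling a custom series column: the series name
# lives in 'custom', its index in 'custom_extra', keyed by (book, col).

class Db:
    def __init__(self):
        self.custom = {}        # (book_id, col_num) -> series name
        self.custom_extra = {}  # (book_id, col_num) -> series index

    def get_custom(self, book_id, num):
        return self.custom.get((book_id, num))

    def get_custom_extra(self, book_id, num):
        return self.custom_extra.get((book_id, num))

    def set_custom(self, book_id, val, num, extra=None):
        # Series name and index are set together, never separately.
        self.custom[(book_id, num)] = val
        self.custom_extra[(book_id, num)] = extra

def merge_custom_series(db, dest_id, src_id, col_num):
    # Same rule as standard series: copy name and index as a pair,
    # and only when the destination has no series name.
    if not db.get_custom(dest_id, num=col_num):
        name = db.get_custom(src_id, num=col_num)
        if name:
            db.set_custom(dest_id, name, num=col_num,
                          extra=db.get_custom_extra(src_id, num=col_num))
```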

Quote:

Look at the new class library.field_metadata, available as db.field_metadata. This is a dict, keyed by the attribute name. The data fetched using the key has everything needed, including the datatype and its column number for fetching and storing the information.

Something like:

Code:

for key in db.field_metadata:

Charles,

I looked at this after we spoke. I saw a field that I think was called "datatype" that was populated with one of the 9 types of data that could be specified as user-defined data.

Does there happen to be a defined list of all datatypes that I can cycle through in a for: loop, something like:

Code:

for datatype in list_of_all_available_datatypes:

I'd like to catch any new datatypes in that loop beyond the current 9, if any are ever added.
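One defensive approach is to collect the datatype of every field actually present in field_metadata and flag any that the merge code doesn't recognize. The KNOWN_DATATYPES list below is this sketch's own whitelist, not an official calibre constant, and the datatype names in it are illustrative:

```python
# Sketch: catalogue the datatypes present in a field_metadata-style
# dict and report any a merge routine does not know how to handle.

KNOWN_DATATYPES = {'text', 'comments', 'series', 'bool',
                   'int', 'float', 'datetime', 'rating'}

def unknown_datatypes(field_metadata):
    seen = {fm['datatype'] for fm in field_metadata.values()
            if 'datatype' in fm}
    return seen - KNOWN_DATATYPES
```

A merge routine could warn (rather than silently drop data) whenever this returns a non-empty set, which would catch any datatype added in the future.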

There are 8 there, but there are 9 in the pulldown list of new field types the user can create. I vaguely recall a "text*" datatype; I suspected it was the series name and indicated there was a related series number. Most of the other types seem self-explanatory. Did I miscount, or recall incorrectly?

How do I get/set the series_index associated with a custom field of datatype 'series'?

Get the value with db.get_custom_extra(...)

Set the value with db.set_custom(..., extra=series_index). You set the series and the series_index together. They cannot be set separately.

Quote:

Is there an easy way to merge tag-like text from two records when is_multiple is True? With tags, it's just:

Code:

for tag in from_mi.tags:
    to_mi.tags.append(tag)

Duplicate tags are automatically removed.
Thanks.

Two things:

1) Your code does not remove duplicates. The append() method adds the tag to the end of the list, with no check for duplicates. I assume that later you call db.set_tags(); that method checks for duplicate tags in the input list, which is why your code works.

There are two ways to deal with duplicates: one with a set, and one with an 'in' check. The set code works because sets automatically remove duplicates, and would be something like:

Code:

to_mi.tags = list(set(to_mi.tags) | set(from_mi.tags))

The list code would look something like:

Code:

for tag in from_mi.tags:
    if tag not in to_mi.tags:
        to_mi.tags.append(tag)

If you are willing to let the db code cull the duplicates, then you can use

Code:

to_mi.tags.extend(from_mi.tags)

2) The custom code also removes duplicates, so you can use similar code. It would be something like: