Inspired by an answer to another question, I am reminded of the various reasons why it's difficult to have only one BIB file. I would like to have a set of command line tools that I could script to take my 'master' BIB and convert it for a specific situation. I am aware of bibtool which is probably powerful enough that it can meet most of my needs, but I haven't figured out how to use it...

Here are some common tasks that could be automated this way. Of course, the strategic goal would be solved 'the right way' by using a smarter BST (or something like biblatex) but often one is constrained (or at least it is much more convenient) to use a publisher's broken BST and make a few tweaks to the database:

Truncate all papers with more than 5 authors, and keep only the first 3 authors of any truncated papers (because you are running low on space)

Remove a field (month, url, ISSN, ISBN) because a journal has a broken BST that makes a mess of these

Remove a field conditionally (eg, remove title only for articles but not books)

Expand/collapse @string references (e.g., I have 2 files, one apsjour.bib with @STRING{prl = {Phys. Rev. Lett.}} and one fulljour.bib with @STRING{prl = {Physical Review Letters}}, so I can write \bibliography{apsjour,articlebib} or \bibliography{fulljour,articlebib} as required, but I certainly don't expect my co-authors to deal with this convention).

Remove archiveprefix, eprint, primaryclass from @articles with a page number

Remove DOIs with weird characters in them because the documentclass doesn't do enough escaping (seriously, Wiley, is 10.1002/(SICI)1521-3978(200005)48:5/7<531::AID-PROP531>3.0.CO;2-# really a good idea for something that needs to end up in a URL?)

A tool that is flexible enough to do more sophisticated things would be nice, so long as the flexibility isn't at the price of usability (cf bibtool)
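As a rough illustration of the first task in the list, the author truncation could be sketched in Python along the following lines. This is only a sketch under stated assumptions: it operates on the value of a single author field rather than on a whole .bib file, the function names split_names and truncate_authors are invented for this example, names are assumed to be separated by a lowercase "and", and it relies on the BibTeX convention that a trailing "and others" is typeset as "et al." by the standard styles.

```python
import re

# Separator between names in a BibTeX author field (lowercase "and" assumed).
NAME_SEPARATOR = re.compile(r'\s+and\s+')


def split_names(author_value):
    """Split an author field on top-level 'and', ignoring 'and' inside braces."""
    names = []
    depth = 0   # current brace nesting depth
    start = 0   # start index of the name being collected
    i = 0
    while i < len(author_value):
        ch = author_value[i]
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
        elif depth == 0:
            m = NAME_SEPARATOR.match(author_value, i)
            if m:
                names.append(author_value[start:i].strip())
                i = start = m.end()
                continue
        i += 1
    names.append(author_value[start:].strip())
    return names


def truncate_authors(author_value, max_authors=5, keep=3):
    """Keep the first `keep` names if there are more than `max_authors`."""
    names = split_names(author_value)
    if len(names) <= max_authors:
        return author_value
    return ' and '.join(names[:keep] + ['others'])


if __name__ == '__main__':
    print(truncate_authors('A and B and C and D and E and F'))
    # prints: A and B and C and others
```

Applying this to a real database still needs something that locates the author field of each entry, which is where a proper BibTeX parser earns its keep.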

There are probably other use cases that I've temporarily forgotten. I feel like I'm forever making journal-specific or even paper-specific modifications of bibtex databases. JabRef makes it fairly easy, but still...

There are lots of cute tools living on CTAN in tex-archive/biblio/bibtex/utils but I feel that there must be some other place where the serious tools hide. I can't be the only one with these problems, can I? (Feel free to tell me that I'm approaching this the wrong way and let me know your personal strategy for dealing with the above issues without using commandline tools! This includes, as Jukka Suomela suggests in his answer, tools for editing the generated BBL file instead of editing the BIB.)

Here are some problems that I already know of solutions for:

Process an AUX file to keep only the cited entries (many solutions, including bibextract)

3 Answers

Have a look at biber which in the current 1.5 dev version on SourceForge has a new "tool" mode which allows you to use biber's reencoding and source mapping features independently of biblatex. The source mapping features are what you mainly need from your description and this is all documented in the PDF manual. I can provide specific examples if you have specific questions. biber will do everything you mention above apart from the @string expansion which would be possible to add but as you say, it's fairly idiosyncratic.

Of course, you can do this dynamically with biber too - with the changes being applied as the .bib is read but the .bib is not touched. The new tool mode allows you to write the changed .bib to another file without writing a .bbl.

For example, here is how in tool mode to tackle points 2, 3, 5 and 6 in your examples. Point 1 is better handled semantically with biblatex and its max/min names options. Create a biber.conf with:

Which will look in the default locations for your biber.conf and will output a file called file_bibertool.bib.

This is also all possible, as I said, dynamically using the biber.conf as you process the file normally into a .bbl with biber and also the whole mapping functionality is available in biblatex through macros (see \DeclareSourcemap in the biblatex documentation) if you wanted to do this on a per-document basis dynamically.

Just to add that the latest biber (1.6, currently in beta on SourceForge) has an expanded tool mode which can reformat the .bib as well as apply semantic rules as above. The development version manual on SF has several tool mode examples now.
– PLK, Feb 21 '13 at 19:41

I do not know of any non-programmer tools that can do the things you ask for, so I think you will need a bib scripting toolkit, and pybib is the most commonly used of those. If you are not familiar with Python but do know Perl, then btOOL may suit you better.
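For the simpler tasks in the question, such as removing a field, optionally only for certain entry types, a few lines of scripting may already go a long way. The following is a minimal sketch that uses only the Python standard library, not pybib or btOOL. The names drop_fields and skip_value are hypothetical, and the code assumes a tidy database: brace-delimited entries, each field starting on its own line, and no line inside a field value that itself looks like "name = ...".

```python
import re

# A field assignment at the start of a line, e.g. "  month = ".
FIELD_START = re.compile(r'^[ \t]*([a-z][a-z0-9_-]*)[ \t]*=[ \t]*', re.I | re.M)
# The start of an entry, e.g. "@article{".
ENTRY_START = re.compile(r'@([A-Za-z]+)\s*\{')


def skip_value(text, pos):
    """Return the index just past the field value that starts at pos."""
    depth = 0
    in_quotes = False
    i = pos
    while i < len(text):
        ch = text[i]
        if ch == '{':
            depth += 1
        elif ch == '}':
            if depth == 0:
                if not in_quotes:
                    return i        # closing brace of the entry itself
            else:
                depth -= 1
        elif ch == '"' and depth == 0:
            in_quotes = not in_quotes
        elif ch == ',' and depth == 0 and not in_quotes:
            return i + 1            # include the separating comma
        i += 1
    return i


def drop_fields(bib_text, fields, only_types=None):
    """Remove the named fields, optionally only from the given entry types."""
    fields = {f.lower() for f in fields}
    types = {t.lower() for t in only_types} if only_types else None
    entries = [(m.start(), m.group(1).lower())
               for m in ENTRY_START.finditer(bib_text)]
    out = []
    pos = 0
    for m in FIELD_START.finditer(bib_text):
        if m.start() < pos or m.group(1).lower() not in fields:
            continue
        if types is not None:
            # Entry type of the nearest entry that starts before this field.
            seen = [t for start, t in entries if start < m.start()]
            if not seen or seen[-1] not in types:
                continue
        end = skip_value(bib_text, m.end())
        if bib_text[end:end + 1] == '\n':
            end += 1                # do not leave an empty line behind
        out.append(bib_text[pos:m.start()])
        pos = end
    out.append(bib_text[pos:])
    return ''.join(out)
```

With this, drop_fields(text, ["month", "issn"]) removes those fields everywhere, while drop_fields(text, ["title"], only_types=["article"]) leaves book titles alone. Anything beyond such tidy input is better left to a real parser.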

Thanks. I wasn't aware that pybib is intended as a scripting toolkit. This isn't advertised on the homepage, nor is there any documentation of this usage. Is there some documentation I am missing, preferably with some simple examples along the lines of what is in my question?
– Lev Bishop, Aug 10 '10 at 14:01

There does not appear to be any specific documentation, but all the tools in the distribution are based on the Python classes. If you know Python, it should be easy to understand how it works just by looking at the existing tools.
– Taco Hoekwater, Aug 11 '10 at 12:39

I have never had a situation in which I must provide a Bibtex file that compiles flawlessly with a journal-supplied BST file. Do you really need to do that?

For conference and journal submissions, one typically just sends a PDF file, not the source code. For the final version, in my experience it's usually enough to provide a Latex source code in which I have already replaced the \bibliography command by the contents of the Bibtex-generated *.bbl file.
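The substitution described above is easy to script. Here is a minimal sketch in Python, assuming a single uncommented \bibliography{...} command in the source; the function name inline_bbl is made up for this example.

```python
import re

# Matches \bibliography{...} but not \bibliographystyle{...},
# because the opening brace must follow the command name directly.
BIBLIOGRAPHY_COMMAND = re.compile(r'\\bibliography\{[^}]*\}')


def inline_bbl(tex_source, bbl_source):
    """Replace the first \\bibliography{...} command with the .bbl contents."""
    # A function is used as the replacement so that backslashes in the
    # .bbl text are inserted literally instead of being read as escapes.
    return BIBLIOGRAPHY_COMMAND.sub(
        lambda _match: bbl_source.strip(), tex_source, count=1)
```

In practice one would read, say, paper.tex and paper.bbl (hypothetical file names), call inline_bbl on their contents, and write the result to a new file for submission. The \bibliographystyle line is left in place by this sketch and can be deleted by hand if the publisher objects to it.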

Nobody will see how I created the *.bbl file, so I can do anything I want:

Take the journal-supplied BST file and hack it a bit. It may be much faster to simply remove some buggy parts from the BST file than to tweak your Bibtex database. This makes a lot of sense if you need to deal with the same buggy BST file frequently.

Just take the final *.bbl file and edit it directly. Particularly useful to handle some one-off special cases.

Another trick that I have used to handle some very exceptional cases is to have two entries in my Bibtex database for the same paper (obviously with different keys, like "foo" and "foo2"). This is useful if you'd like to decide case-by-case which version to use in which paper, independently of other issues such as different BST files.

Thanks. I agree that it's only the final .bbl that matters. Hacking on the journal BST is easy enough for removing a field, but my BST skills are not really great enough to do more sophisticated programming. Editing the .bbl could be a valid solution, but this merely moves my question down a level :-) Do you know a set of command-line tools that are smart enough to fix up the .bbl file in the ways I described? I imagine it's harder, because the .bib format at least pretends to be a database, but .bbl is generally rather unstructured (makebst-generated BST aside).
– Lev Bishop, Aug 10 '10 at 13:58

"For the final version, in my experience it's usually enough to provide a Latex source code in which I have already replaced the \bibliography command by the contents of the Bibtex-generated *.bbl file." I wish everyone did that.
– Charles Stewart, Aug 10 '10 at 15:16