docutils-develop

On 2011-12-06, SourceForge.net wrote:
[ docutils-Patches-3434355 ]
Fix --record-dependencies crashing on non-ascii filenames
> Initial Comment:
> e.g. for the following example
> ---- 8< ---- (hello.txt)
> .. include:: мир.txt
> ---- 8< ---- (мир.txt)
> World
> it crashes:
...
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
> Fix it by writing a counterpart to `decode_path()` and using it when
> writing dependency filenames out.
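For illustration, a rough sketch (Python 3 syntax, not the actual patch) of what such an `encode_path()` counterpart might look like; the name mirrors `docutils.utils.decode_path()`, but the body here is only an assumption about the intended behaviour:

```python
import sys

def encode_path(path):
    """Encode a unicode path for writing to a byte-oriented record file.

    Try the filesystem encoding first; fall back to UTF-8 when the path
    contains characters the filesystem encoding cannot represent.
    """
    encoding = sys.getfilesystemencoding() or 'utf-8'
    try:
        return path.encode(encoding)
    except UnicodeEncodeError:
        return path.encode('utf-8')
```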
>>Comment By: Günter Milde (milde)
> Date: 2011-12-06 04:18
> Message:
> We overlooked one important point:
> per definitionem, the dependencies file saves URLs, not path names.
Actually, the issue is even more complex:
test_dependencies carries this comment:
# docutils.utils.DependencyList records relative URLs, not platform paths,
# so use "/" as a path separator even on Windows (not os.path.join).
while config.txt says:
Path to a file where Docutils will write a list of files that the
input and output depend on [#dependencies]_, e.g. due to file
inclusion. [#pwd]_ The format is one filename per line. This
option is particularly useful in conjunction with programs like
``make``.
The mentioning of ``make`` suggests to me that "filename" in config.txt
implies platform paths.
@David:
What is the format of the "filename" list in a DependencyList
(URL, path, writer-dependent, ...)?
What is the preferred encoding for non-ASCII characters in filenames?
* sys.getfilesystemencoding() -> good for ``make`` etc.
* 'utf8' -> comprehensive encoding, deterministic, similar to the
  config file encoding
* settings.input-encoding or settings.output-encoding -> configurable,
  requires a new option in DependencyList
* settings.dependency-list-encoding -> requires a new option in
  DependencyList and a new config setting
?
If the canonical format is URL, should DependencyList.add() quote with
urllib.quote() and test_dependencies.py test this is the case?
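To illustrate what such quoting would do to a non-ASCII filename (shown with `urllib.parse.quote`, the Python 3 counterpart of the `urllib.quote` named above):

```python
from urllib.parse import quote, unquote

name = 'мир.txt'
quoted = quote(name)   # each UTF-8 byte becomes %XX
print(quoted)          # %D0%BC%D0%B8%D1%80.txt
assert unquote(quoted) == name
```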
Günter

On 2011-12-06, Kirill Smelkov wrote:
> On Tue, Dec 06, 2011 at 01:33:56PM +0000, Guenter Milde wrote:
>> Actually, the issue is even more complex:
>> test_dependencies carries this comment:
>> # docutils.utils.DependencyList records relative URLs, not platform paths,
>> # so use "/" as a path separator even on Windows (not os.path.join).
>> while config.txt says:
>> Path to a file where Docutils will write a list of files that the
>> input and output depend on [#dependencies]_, e.g. due to file
>> inclusion. [#pwd]_ The format is one filename per line. This
>> option is particularly useful in conjunction with programs like
>> ``make``.
>> The mentioning of ``make`` suggests to me that "filename" in config.txt
>> implies platform paths.
> I can only second that "filename" means platform paths - only in that
> form are they useful for ``make``, as you said. I actually hit the bug
> because --record-dependencies is used in my doc build system.
> If you decide to change --record-dependencies into URLs, could you
> please also continue to provide a way to still record dependencies as
> native file paths.
`Platform paths` seems the suitable format of the entries: According to
the specification in config.txt as well as actual behaviour in the case
of HTML export, DependencyList records files that were touched during the
document conversion, i.e. "files to watch for updates".
However:
The comment in the test reflects current behaviour: some dependencies
*are* stored in the List and written to the file as relative URLs
because this is the format required by the "image" directive and the
"stylesheet" configuration setting. This is masked by the fact that on
Unix, a URL without "scheme" part and a platform-path use the same syntax
for simple cases (no spaces or special chars).
This means that if we agree on "use platform paths", we need to change
the code in directives/images.py (and the test).
Also, it would make things clearer if record_dependencies.add() were
called *after* reading the respective file. Then no dependency is
recorded in case of, e.g., IO errors (this requires a change to the
test, too).
The encoding used in the "record" file should be chosen so that ``make``
works wherever it is available. (How do you put (or reference) the
dependencies in the Makefile?)
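As one possible answer to the Makefile question: a build script might turn the record file into a make prerequisite list along these lines (a sketch; the target name and file list are made up, and filenames containing spaces would additionally need escaping for make):

```python
def make_rule(target, record_lines):
    """Build a make rule 'target: dep1 dep2 ...' from record-file lines."""
    deps = ' '.join(line.strip() for line in record_lines if line.strip())
    return '%s: %s' % (target, deps)

print(make_rule('out.html', ['hello.txt\n', 'мир.txt\n']))
# out.html: hello.txt мир.txt
```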
Suggestion
==========
clarify wording
---------------
config.txt defines:
_`record_dependencies`
Path to a file where Docutils will write a list of files that the
input and output depend on [#dependencies]_, e.g. due to file
inclusion. [...]
This could mean a list of files required to view or process the output
document (e.g. in a browser), i.e. everything *not* embedded. Very useful
for, e.g., archiving/packing/moving the output document after creation.
However, the footnote contradicts this interpretation
.. [#dependencies] Some notes on the dependency recorder:
* Images are only added to the dependency list if the
reStructuredText parser extracted image dimensions from the file.
* Stylesheets are only added if they are embedded.
Together with the explanation
This option is particularly useful in conjunction with programs like
``make``.
the interpretation as "files to watch for updates" seems intended.
LaTeX writer
------------
With LaTeX output, the specification does not match the current
behaviour:
* For practical reasons, the output of the LaTeX writer is
considered merely an *intermediate* processing stage. The
dependency recorder records all files the *rendered* file
(e.g. in PDF or DVI format) depends on. Thus, images and
stylesheets are both unconditionally recorded as dependencies
when using the LaTeX writer.
I don't know the "practical reasons" that led to this specification.
However, despite it, the LaTeX writer, too, only records stylesheets
if they are embedded.
I propose to remove this "exception clause" for the LaTeX writer because it
complicates matters without need. Arguments against the "exception clause"
include:
1. non-consistent behaviour across writers
2. in the ``make`` use case, changes to e.g. figure files do not require
a re-run of the rst2latex document conversion (except when extracted
dimensions were used).
LaTeX has its own means to record files used for the latex2(pdf|ps|dvi)
conversion.
3. LaTeX searches inclusions with its own special rules along special
paths:
* Recording the non-expanded form (e.g. "fourier" for the style sheet
"/usr/share/texmf-texlive/tex/latex/fourier/fourier.sty") is not
very helpful for external programs as the expansion depends on the
context (package/style vs. included file vs. graphic) --> an
external program needs to parse the latex output anyway.
* Doing the expansion in Docutils adds complexity that is only
required if the referenced files are to be embedded in the output.
I suggest the following patch to the specification:
--- config.txt (Revision 7244)
+++ config.txt (Arbeitskopie)
@@ -387,11 +387,11 @@
--output-encoding, -o``.
_`record_dependencies`
- Path to a file where Docutils will write a list of files that the
- input and output depend on [#dependencies]_, e.g. due to file
- inclusion. [#pwd]_ The format is one filename per line. This
- option is particularly useful in conjunction with programs like
- ``make``.
+ Path to a file where Docutils will write a list of files that were
+ required to generate the output, e.g. included files or embedded
+ style sheets [#dependencies]_. [#pwd]_ The format is one file path per
+ line. This option is particularly useful in conjunction with
+ programs like ``make``.
Set to ``-`` in order to write dependencies to stdout.
@@ -1436,20 +1436,9 @@
do the overriding explicitly, by assigning ``None`` to the other
settings.
-.. [#dependencies] Some notes on the dependency recorder:
+.. [#dependencies] Images are only added to the dependency list if the
+ reStructuredText parser extracted image dimensions from the file.
- * Images are only added to the dependency list if the
- reStructuredText parser extracted image dimensions from the file.
-
- * Stylesheets are only added if they are embedded.
-
- * For practical reasons, the output of the LaTeX writer is
- considered merely an *intermediate* processing stage. The
- dependency recorder records all files the *rendered* file
- (e.g. in PDF or DVI format) depends on. Thus, images and
- stylesheets are both unconditionally recorded as dependencies
- when using the LaTeX writer.
-
.. [#footnote_space] The footnote space is trimmed if the reference
style is "superscript", and it is left if the reference style is
"brackets".
Thanks,
Günter

Dear David,
On 2011-12-06, David Goodger wrote:
> On Tue, Dec 6, 2011 at 08:33, Guenter Milde <milde@...> wrote:
>> What is the format of the "filename" list in a DependencyList
>> (URL, path, writer-dependent, ...)?
> What the comment says: relative URLs.
> I don't understand the issue well enough to give a better answer.
...
>> If the canonical format is URL, should DependencyList.add() quote with
>> urllib.quote() and test_dependencies.py test this is the case?
> I leave the issue in your able hands.
Thanks for your fast answer and the confidence.
See my reply to Kirill for my suggestion to solve the issue.
Günter

I must say that I don't understand each and every detail. But as we
(Günter and I) were discussing similar issues on the other list, I
would nevertheless like to contribute.
First I'd like to point to
http://article.gmane.org/gmane.text.docutils.devel/2306/match=dependency
That's where the dependency scanner had been introduced. Reading that
helped me to understand.
Speaking for Windows:
>> test_dependencies carries this comment:
>>
>> # docutils.utils.DependencyList records relative URLs, not platform paths,
>> # so use "/" as a path separator even on Windows (not os.path.join).
>
>That was r4519, in 2006.
A "normalized" or "standardized" format makes the most sense to me.
"/" looks fine - I haven't understood the details of the difference
between URLs and platform paths. You probably have so it's ok.
Windows has no genuine 'make' - so we don't have a clear target
format. We should rather think of a broad variety of tools that would
like to reuse the dependency list.
If you use sys.getfilesystemencoding() on Windows as encoding for the
file content you end up with encoding "mbcs". And that's a synonym for
'ascii'. See:
http://coverage.livinglogic.de/Lib/encodings/mbcs.py.html
Ok, writing the dependency list with encoding:mbcs and
errorhandling:backslashreplace will work. But most tools won't be able
to read those files.
That's why I think: Make the 'dependencylist.txt' UTF-8 encoded -
always!
The worst that can happen is that a tool cannot handle the file
directly. Then an intermediate transformation will be necessary, and
this can be done comparably easily knowing that the encoding is either
plain ASCII or UTF-8. Having to decode backslashreplaced strings puts
up a much bigger hurdle.
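A quick check of that hurdle: UTF-8 round-trips losslessly, while a backslashreplace'd name no longer matches the file on disk:

```python
name = 'картина.jpg'

# UTF-8: lossless round trip
assert name.encode('utf-8').decode('utf-8') == name

# backslashreplace: one-way mangling - the escapes are literal text
mangled = name.encode('ascii', 'backslashreplace')
assert mangled == b'\\u043a\\u0430\\u0440\\u0442\\u0438\\u043d\\u0430.jpg'
assert mangled.decode('ascii') != name
```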
Martin
--
http://mbless.de

On 2011-12-07, Martin Bless wrote:
> A "normalized" or "standardized" format makes the most sense to me.
> "/" looks fine - I haven't understood the details of the difference
> between URLs and platform paths. You probably have so it's ok.
> Windows has no genuine 'make' - so we don't have a clear target
> format. We should rather think of a broad variety of tools that would
> like to reuse the dependency list.
> If you use sys.getfilesystemencoding() on Windows as encoding for the
> file content you end up with encoding "mbcs". And that's a synonym for
> 'ascii'. See:
> http://coverage.livinglogic.de/Lib/encodings/mbcs.py.html
Not really: the Python doc (library/codecs.html) says:
    Codec  Aliases  Operand type    Purpose
    mbcs   dbcs     Unicode string  Windows only: Encode operand
                                    according to the ANSI codepage
                                    (CP_ACP)
and howto/unicode.html says:
On Windows, Python uses the name “mbcs” to refer to whatever the
currently configured encoding is.
Encoding filenames (under Python on Windows) with "mbcs" will result in
filenames that could be used as-is by other applications - as long as
"whatever the currently configured encoding is" supports all characters of
the file name (this excludes, e.g., cyrillic names under CP_850).
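"mbcs" delegates to the current ANSI codepage, so its behaviour cannot be reproduced off Windows; cp1252 serves here as a portable stand-in to show the failure mode for Cyrillic names:

```python
name = 'картина.jpg'

# Western-European names encode fine in a Western codepage ...
assert 'ÄÖÜäöüß'.encode('cp1252') == b'\xc4\xd6\xdc\xe4\xf6\xfc\xdf'

# ... but Cyrillic names are simply not representable
try:
    name.encode('cp1252')
    raise AssertionError('unexpectedly encodable')
except UnicodeEncodeError:
    pass
```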
> Ok, writing the dependency list with encoding:mbcs and
> errorhandling:backslashreplace will work. But most tools won't be able
> to read those files.
> That's why I think: Make the 'dependencylist.txt' UTF-8 encoded -
> always!
> The worst that can happen is that a tool cannot handle the file
> directly. Then an intermediate transformation will be necessary. And
> this can be done comparably easy knowing that the coding is either
> plain Ascii or UTF-8. Having to decode backslashreplaced strings puts
> up a much bigger hurdle.
The advantage of UTF-8 is that it is universal: an external application
is much more likely to understand UTF-8 than Python's
"backslashreplace".
The disadvantage is that it does not work without re-coding in case the
file system encoding is *not* utf8.
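The re-coding itself is cheap, though; e.g. for a hypothetical koi8-r file system (file name and target encoding chosen only for illustration):

```python
# As written by Docutils: one UTF-8 encoded filename per line
record = 'мир.txt\n'.encode('utf-8')

# Re-code for a tool that expects the (koi8-r) filesystem encoding
recoded = record.decode('utf-8').encode('koi8-r')
assert recoded.decode('koi8-r') == 'мир.txt\n'
```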
(As I use utf8 encoded filenames, I have no problem with either approach,
but we need to find a consensus.)
Günter

[Guenter Milde] wrote:
>> If you use sys.getfilesystemencoding() on Windows as encoding for the
>> file content you end up with encoding "mbcs". And that's a synonym for
>> 'ascii'. See:
>> http://coverage.livinglogic.de/Lib/encodings/mbcs.py.html
>
>Not really: the Python doc (library/codecs.html) says:
True. I was wrong and misread that code. There is really something
important going on. Very different from just "plain ascii".
>Encoding filenames (under Python on Windows) with "mbcs" will result in
>filenames that could be used as-is by other applications - as long as
>"whatever the currently configured encoding is" supports all characters of
>the file name (this excludes, e.g., cyrillic names under CP_850).
I have doubts. I've never encountered such an encoding in Windows
batch files or so. OTOH I don't have enough insight to lay out and
explain the whole story.
Now, what is that "mbcs" encoding? I have dug one level deeper and
looked into the sources: Python-2.7.2/Objects/unicodeobject.c. Around
line 3838 there is this section:
/* --- MBCS codecs for Windows -------------------------------------- */

#if SIZEOF_INT < SIZEOF_SIZE_T
#define NEED_RETRY
#endif

/* XXX This code is limited to "true" double-byte encodings, as
   a) it assumes an incomplete character consists of a single byte, and
   b) IsDBCSLeadByte (probably) does not work for non-DBCS multi-byte
      encodings, see IsDBCSLeadByteEx documentation. */

static int is_dbcs_lead_byte(const char *s, int offset)
And so on. "double-byte encodings" is an important clue, I think, as
it leads to:
http://msdn.microsoft.com/en-us/library/cc194788.aspx
The whole chapter is "nice", like at
http://msdn.microsoft.com/en-us/library/cc194786.aspx
as it says
"""
[...] The reasons for this complexity are historical, and as
technology evolves, working with character encodings will get easier.
A band of international thinkers has created a standard for the future
called Unicode, which newer operating systems such as Microsoft
Windows NT have adopted. [...]
"""
Which reminds us that there were times when Unicode was just an
upcoming standard for the future created by "international thinkers".
>The advantage of UTF-8 is that it is universal: an external application
>is much more likely to understand UTF-8 than Python's
>"backslashreplace".
+2!
>The disadvantage is that it does not work without re-coding in case the
>file system encoding is *not* utf8.
I know that Windows internally works with "wide" encodings with
16 bits = one char. I don't use Outlook, but I know that if you save an
Outlook mail it has a double-byte encoding, which can't be read by
other "normal" programs. And I know that for historical reasons "wide"
file names in the Windows API have 16 bits per character. But that's
always "internal". All "normal" files are ANSI (cp1252) or UTF-8.
Hhm, wait: I've seen Unicode files with always two bytes per char as
well.
At this point I sort of wanted to "give up". But then I had this idea
for a test:
test.py
=======
import codecs
f2 = codecs.open('test.txt', 'w', 'mbcs')
f2.write(u'ÄÖÜäöüß\n')
f2.write(u'\u043a\u0430\u0440\u0442\u0438\u043d\u0430.jpg\n')
for u in range(ord(u'\u2030'), ord(u'\u205F')):
    f2.write(unichr(u))
f2.close()
produces 'test.txt'
=============
ÄÖÜäöüß
???????.jpg
?'??`????????????/??????????????????????????
'test.txt' has ANSI=cp1252 encoding. And all those question marks you
see in this posting are right in there - no artefact of mailers or so.
Aha, and now I probably know why there is that "???????.jpg" in the
output of test_dependencies.py: it is that Cyrillic file name.
This tells me: 'mbcs' is of no use in this case. It can't cope in a
reasonable way with filenames that are not ANSI - which they very well
can be.
I think this strongly favours UTF-8!
I think it's well worth the effort to find a good solution. Very
exciting and instructive!
Thanks,
Martin
--
http://mbless.de

On 2011-12-08, Martin Bless wrote:
> [Guenter Milde] wrote:
>>Encoding filenames (under Python on Windows) with "mbcs" will result in
>>filenames that could be used as-is by other applications - as long as
>>"whatever the currently configured encoding is" supports all characters of
>>the file name (this excludes, e.g., cyrillic names under CP_850).
> I have doubts. I've never encountered such an encoding in windows
> batch files or so. OTH I have not enough insight to layout and explain
> the whole story.
From just browsing over the code, I suppose that the "mbcs.py" codec
submodule delegates the encoding job to some Windows system library.
This means that "mbcs" stands for an encoding that is configured
outside of Python (and not necessarily known to Python).
> Now, what is that "mbcs" encoding? I have digged one level deeper and
> looked into sources:
...
I'll pass over the details for now, as I want to look at the problem
first from a user's perspective.
>>The advantage of UTF-8 is that it is universal: an external
>>application is much more likely to understand UTF-8 than Python's
>>"backslashreplace".
> +2!
>>The disadvantage is that it does not work without re-coding in case
>>the file system encoding is *not* utf8.
> I know that Windows internally works with "wide" encodings with
> 16 bits = one char. I don't use Outlook, but I know that if you save
> an Outlook mail it has a double-byte encoding, which can't be read by
> other "normal" programs. And I know that for historical reasons "wide"
> file names in the Windows API have 16 bits per character. But that's
> always "internal". All "normal" files are ANSI (cp1252) or UTF-8.
> Hhm, wait: I've seen Unicode files with always two bytes per char as
> well.
> At this point I sort of wanted to "give up". But then I had this idea
> for a test:
> test.py
> =======
> import codecs
> f2 = codecs.open('test.txt', 'w', 'mbcs')
> f2.write(u'ÄÖÜäöüß\n')
> f2.write(u'\u043a\u0430\u0440\u0442\u0438\u043d\u0430.jpg\n')
> for u in range(ord(u'\u2030'), ord(u'\u205F')):
>     f2.write(unichr(u))
> f2.close()
> produces 'test.txt'
>=============
> ÄÖÜäöüß
> ???????.jpg
> ?'??`????????????/??????????????????????????
> 'test.txt' has ANSI=cp1252 encoding. And all these question marks you
> see in this posting are right in there. No artefact by mailers or so.
> Aha, and now I probably know, why there is that "???????.jpg" in the
> output of test_dependencies.py. It is that cyrillic file name.
Yes. It seems the codec works similarly to Python's "replace" encoding
error handler.
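That hypothesis is easy to check, with cp1252 standing in for the ANSI codepage: the "replace" handler yields exactly the question marks seen in test.txt:

```python
# Seven Cyrillic characters become seven question marks, just like
# the "???????.jpg" in the test.txt output above
assert 'картина.jpg'.encode('cp1252', 'replace') == b'???????.jpg'
```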
Also, I suppose on a different Windows version or, e.g., on a Windows
system configured for Russian or Greek, the expansion of "mbcs" (and
hence the actual encoding of test.txt) can be different.
> This tells me: 'mbcs' is of no use in this case.
> It can't cope in a reasonable way with the case that filenames are not
> ANSI. Which they very well can.
Of course it does not make sense to test proper storage of
"non-encodable" file names; we should just test whether Docutils
somehow overcomes the problem or aborts with an error.
On the other hand, we have to distinguish testing from real usage:
Your link to the "thread that started it all" helped me to establish the
use-case for the --record-dependencies feature: automatic generation/update
of rules for the "make" program.
The important point is that the user should be able to copy and paste
the filenames from the record file (or your test.txt example above)
into an "open file" dialogue or the DOS box and be able to open the
referenced file. (A more practical use case would be a batch file or
some other program that works with files given to it in the form of a
file of filenames.)
This means that if the system cannot work with some filenames, there is
no need to store them "unmangled" in the "records" file generated by
running a Docutils front-end with --record-dependencies=records.txt.
(BTW: The configuration setting only decides whether the dependency list
is written to a file. Independent of the config value, the dependency
list is always generated as a Python object holding Unicode strings and
can be accessed from a Python wrapper.)
> I think this strongly favours UTF-8!
To find out whether a utf8-encoded dependency list can be useful on a
Windows system with cp1252 filenames, I would like to know whether a
program or a batch file could use the information to open a file with a
"non-encodable" name.
Can you create a text file in e.g. notepad and save it under a Cyrillic or
Greek name (like картина.txt)?
If yes, can you see this file in the file manager?
If yes, is it shown with the correct cyrillic characters?
If no, what does it look like?
Can you open the file from the file manager?
Can you copy-paste it to a file-open dialogue in some other program?
Can you open it from the command line (DOS box)? (what does the pasted name
look like there?)
After all, I prefer a solution that could be easily used by a "make"
implementation or a similar program on all platforms.
> I think it's well worth the effort to find a good solution. Very
> exciting and instructive!
Thanks for your patience and testing efforts,
Günter

[Guenter Milde] wrote:
>Can you create a text file in e.g. notepad and save it under a Cyrillic or
>Greek name (like картина.txt)?
... and so on.
>Thanks for your patience and testing efforts,
No problem, don't worry. I have the endurance and I'm interested in a
good solution. And I must say I'm learning a lot myself.
I need a bit of time to find several things out, and then I'll be back
here.
Martin
--
http://mbless.de

Hello Günter,
to cut a long story short: The answer is, well, not 42, but UTF-8!
I've done a lot of (re)search and testing now and I'm absolutely sure
that nothing else makes sense.
Answering your last questions first:
[Guenter Milde] wrote:
>The important point is that the user should be able to copy and paste
>the filenames from the record file
This works with UTF-8 in the Windows Explorer.
>Can you create a text file in e.g. notepad and save it under a Cyrillic or
>Greek name (like картина.txt)?
Yes.
>If yes, can you see this file in the file manager?
Yes.
>If yes, is it shown with the correct cyrillic characters?
Yes.
>Can you open the file from the file manager?
Yes.
>Can you copy-paste it to a file-open dialogue in some other program?
Maybe. I can't test all. Windows is not Linux!
>Can you open it from the command line (DOS box)? (what does the pasted name
>look like there?)
No, yes, see below.
Let me try to remember my thoughts and findings here:
A friend of mine is a real expert and consultant for all kinds of
complicated server stuff; he is a "Microsoft Certified something" as
well. He is off for holidays right now but has already signalled that
he probably can't help. That fits with everything you learn from
Googling: it's either Unicode or "useless" if you use filenames with
characters from outside the 256 ANSI chars.
The biggest enlightenment for me was to find out that Windows 7 in a
"DOS box" can handle almost everything. All you have to do is to change
the font from "Rasterschriftart" (raster font) to "Consolas". If you do
that it will show Cyrillic glyphs even if your codepage is still 850.
Crazy, isn't it?
And, even better, you can switch to codepage 65001, which means
Unicode!
Now the biggest disappointment: Python doesn't work with that setting.
Firstly it complains that "cp65001" is not a known encoding. If I
tweak the encodings/aliases.py Python starts but crashes on the next
command. The first answer given here describes the whole tragedy very
well:
http://stackoverflow.com/questions/7014430/getting-python-to-print-in-utf8-on-windows-xp-with-the-console
Does it work better with Python 3.2 instead of 2.7? No, it's all the
same mess.
The new "Powershell" thing that is supposed to offer better scripting
than CMD (Dosbox) has exactly the same Problems. How the third method
windows offers, the WSH (windows script host) is used by Python I'm
not sure of. I guess it's the way PythonWin works.
There are third party shells als well. They are offering an unicode
entry method.
With Cygwin I can install a Linux(-like) system within Windows. Or I
can install several "unxtools", that is, Unix tools compiled for
Windows. With these I would never try to use anything other than ANSI
or Unicode.
With ``CMD /u /c dir > listing-UCS2.txt`` I can tell CMD to produce
Unicode output. But this only works if it is piped, and it's created in
a two-byte transfer encoding, not UTF-8. That's no problem, though, if
you copy and paste in a text editor.
"cp65001" is a famous infamous search term for google as well:
http://www.google.de/search?q=cp65001
But I had the impression that the situation is going to get better
with Python 3.x
Ok, I've reached the end of that path.
Ceterum censeo ... the dependency list should be in UTF-8 encoding!
Martin
... in a hurry ... hoping there aren't too many typos ...
--
http://mbless.de

On 2011-12-12, Martin Bless wrote:
> to cut a long story short: The answer is, well, not 42, but UTF-8!
> I've done a lot of (re)search and testing now and I'm absolutely sure
> that nothing else makes sense.
You convinced me. (With a little help from David A. Wheeler:
http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html.)
Encoding the dependency_record file in utf8 is
* simple, safe, portable
* works out of the box in many cases
* not specific to Python, easy to recode (see example in config.txt)
* backwards compatible with the "ascii" encoding used in releases <= 0.8.
The alternative would be guesswork (locale_encoding or
sys.getfilesystemencoding() if not 'mbcs').
See Revision 7256 for the implementation, test, and documentation.
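For readers without the r7256 diff at hand, a minimal sketch of the adopted behaviour (this mimics, but is not, the actual docutils.utils.DependencyList code): each file is recorded once, and the record file is written in UTF-8 on every platform:

```python
import io

class DependencyListSketch:
    """Record dependency filenames, optionally writing them to a file.

    Sketch of the r7256 behaviour: the record file is always UTF-8,
    independent of the platform's filesystem encoding.
    """
    def __init__(self, output_file=None):
        self.file = (io.open(output_file, 'w', encoding='utf-8')
                     if output_file else None)
        self.list = []

    def add(self, *filenames):
        for name in filenames:
            if name not in self.list:     # record each file only once
                self.list.append(name)
                if self.file is not None:
                    self.file.write(name + '\n')

    def close(self):
        if self.file is not None:
            self.file.close()
```

(The real setting additionally accepts ``-`` to write the list to stdout, as config.txt notes above.)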
I hope this fixes the remaining issues.
Günter

[Guenter Milde] wrote:
>You convinced me.
Very good - I'm feeling relieved.
>(With a little help from David A. Wheeler:
>http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html.)
Wow! What an essay. That's an important source of information.
>Encoding the dependency_record file in utf8 is
>
>* simple, safe, portable
>* works out of the box in many cases
>* not specific to Python, easy to recode (see example in config.txt)
>* backwards compatible with the "ascii" encoding used in releases <= 0.8.
>
>The alternative would be guesswork (locale_encoding or
>sys.getfilesystemencoding() if not 'mbcs').
I couldn't have said that better.
>See Revision 7256 for the implementation, test, and documentation.
Good job!
>I hope this fixes the remaining issues.
YES, it does:
E:\> python alltests.py
Ran 1178 tests ...
OK
Victory in the end - no test failures left.
Martin
--
http://mbless.de

On 2012-01-10, Kirill Smelkov wrote:
> On Wed, Dec 07, 2011 at 10:49:49AM +0000, Guenter Milde wrote:
>> On 2011-12-06, Kirill Smelkov wrote:
>> > On Tue, Dec 06, 2011 at 01:33:56PM +0000, Guenter Milde wrote:
>> `Platform paths` seems the suitable format of the entries: According to
>> the specification in config.txt as well as actual behaviour in the case
>> of HTML export, DependencyList records files that were touched during the
>> document conversion, i.e. "files to watch for updates".
...
>> The encoding used in the "record" file should be chosen so that ``make``
>> works wherever it is available. (How do you put (or reference) the
>> dependencies in the Makefile?)
> First of all sorry for looong delay with replying and thanks for
> choosing utf8/make approach together with Martin
> (http://repo.or.cz/w/docutils.git/commitdiff/5fe99c434c93fb9e2fe7950b6db587df85e08e8d).
> Regarding your question on how to integrate dependency tracking into
> Makefile here is how I do it:
...
> That's how it works.
I am glad to hear that our long discussion led to an approach that
works in practice.
> ~~~~
> I know you use TeX, so maybe below info would be a bit useful too
...
This looks interesting (and quite complex). If this is (or there is) a
generic version of your toolchain, maybe we can add it to the sandbox
or link to it from some other place in the Docutils link list.
Some pointers:
* a nice Python package for compiling LaTeX documents is rubber
https://launchpad.net/rubber
* Sphinx - for projects comprising more than a single document.
Thanks,
Günter

On Tue, Jan 10, 2012 at 02:47:42PM +0000, Guenter Milde wrote:
> On 2012-01-10, Kirill Smelkov wrote:
> > On Wed, Dec 07, 2011 at 10:49:49AM +0000, Guenter Milde wrote:
> >> On 2011-12-06, Kirill Smelkov wrote:
> >> > On Tue, Dec 06, 2011 at 01:33:56PM +0000, Guenter Milde wrote:
>
> >> `Platform paths` seems the suitable format of the entries: According to
> >> the specification in config.txt as well as actual behaviour in the case
> >> of HTML export, DependencyList records files that were touched during the
> >> document conversion, i.e. "files to watch for updates".
> ...
> >> The encoding used in the "record" file should be chosen so that ``make``
> >> works wherever it is available. (How do you put (or reference) the
> >> dependencies in the Makefile?)
>
> > First of all sorry for looong delay with replying and thanks for
> > choosing utf8/make approach together with Martin
> > (http://repo.or.cz/w/docutils.git/commitdiff/5fe99c434c93fb9e2fe7950b6db587df85e08e8d).
>
> > Regarding your question on how to integrate dependency tracking into
> > Makefile here is how I do it:
>
> ...
>
> > That's how it works.
>
> I am glad to hear that our long discussion led to an approach that
> works in practice.
Thanks
> > ~~~~
>
> > I know you use TeX, so maybe below info would be a bit useful too
> ...
>
> This looks interesting (and quite complex). If this is (or there is)
> a generic version of your toolchain, maybe we can add it to the
> sandbox or link to it from some other place in the Docutils link list.
The tools are generic. Makefile snippets could be placed into e.g.
doc-rules.mk with the intent to be included from a Makefile, and thus
shared between projects.
I don't mind adding this stuff to the sandbox.
> Some pointers:
>
> * a nice Python package for compiling LaTeX documents is rubber
> https://launchpad.net/rubber
>
> * Sphinx - for projects comprising more than a single document.
Thanks. I know about rubber and sphinx.
If I recall correctly rubber wants to substitute itself for make, not to
integrate with it, and my approach is to do everything (not only doc,
and sometimes doc depends on code, tables, etc...) via make.
As to Sphinx, it's maybe good, but what we have here is not multiple
documents, but one document comprising several "source" files, so I
see no reason to distance myself further from plain Docutils.
Thanks again for the pointers. I may be wrong somewhere - if so, could
you please correct me?
Kirill