Any time I try to convert a .txt ebook, I get no formatting in the result. Typically I try to convert from .txt to mobi. Is there any reason why this might be? I can't quite figure it out. I've attached an example of such a file.

You have to select the method you want to use to format your text file within the Text Input area during conversion. For the file you attached you should check Treat each line as a paragraph. Hover over each option to get an explanation.

Applying Markdown to plain text source files can also be used to good effect with calibre. This can help improve the look-and-feel of the final book considerably, including automatically produced multilevel TOCs and formatting for headings/sub-headings. It takes about 15-30 mins per book. If you're interested then there are several threads in this forum that will get you started.

Here is the problem, as I see it. All the text editors that I am aware of use a CR/LF sequence to mark the end of a line. Calibre looks for two consecutive CR/LFs to identify a paragraph but totally ignores the single CR/LF, making it impossible to create bullet lines.

Hi John, perhaps you did not investigate the link quoted in your post thoroughly enough.

I have attached your TXT file updated with a little Markdown to produce the EPUB also in the attached zip.

You will see that it has:

Bullet points (note the asterisks in the TXT)

The address at the top is also formatted correctly (note the 2 extra spaces at the end of each line).

The default is to use two new line markers as the paragraph boundary. This example illustrates why:

Code:

This is all
one paragraph

This is also all one paragraph. We use
two new line markers to separate
paragraphs for the following reason.

If I make a
list of items separated by
a single new line marker then,
I can't tell if it's a paragraph
or a list. So I assume it's
a paragraph because they're
more common.

Your two options are to put a second new line after each item in the list or, as jackie_w suggested, to use Markdown to get a higher degree of formatting.

Also, only Windows uses this sequence. Unix-based systems (both Kovid and I use Linux, GRiker uses OS X) use LF only. Apple's OS 9 and earlier used CR only to denote a new line. TXT input must support all of these variations, including any combination of the above new line markers within the same file. Because of this we cannot do something like: CR/LF denotes a new line and CR only denotes items in a list. Internally, TXT input converts all new line markers to LF. This solves the problem of different operating systems using different markers and allows TXT input to easily match against a single new line character.
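
To make the "convert everything to LF" step concrete, here is a minimal Python sketch. It is only an illustration of the idea, not calibre's actual code, and the function name `normalize_newlines` is made up for this post.

```python
def normalize_newlines(text: str) -> str:
    """Turn Windows (CRLF) and old Mac OS (CR) line endings into LF.

    Hypothetical helper for illustration; not taken from calibre.
    """
    # Handle CRLF first, otherwise its CR would become a second LF
    # and every Windows line ending would look like a blank line.
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    mixed = "line one\r\nline two\rline three\nline four"
    # repr() makes the otherwise invisible markers visible.
    print(repr(normalize_newlines(mixed)))
```

After this step a file that mixes all three conventions looks the same as a pure LF file, which is why a lone CR can no longer carry a meaning of its own.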

I'll admit I haven't written a line of code since C was a pup nor do I wish to. I'll also admit I can be pretty dense at times. But I just don't get it.

I took the test file that I attached earlier and replaced all CR+LF with a single LF and later with a single CR. All three versions displayed the same using the Calibre Viewer (name and address lines displayed on a single line).

So here is my question: If two consecutive EOL markers (CR+LF, CR or LF) can be detected and used as a paragraph marker, why can't a single instance be detected and used as a single EOL marker? Here is a list of allowed EOL terminators; all others are ignored.

By the way, I really don't have a problem. I use a text editor to create address book, prescription and other personal info files to keep handy. Text (.txt) files display just fine on all my eReaders if I transfer them directly. Calibre will convert WordPad (.rtf) files to epub or mobi correctly if you use Shift+Enter to terminate single lines. So I'm in good shape.

It seems I wasted my time writing post #8 advising you exactly how to fix your TXT file in about 10 seconds. You seem to be hung up on CR and LF which are a non-issue.

No, you did not waste your time. In fact, I had read the link info and examined your Markdown attachment carefully. I meant to thank you for your effort but something got in between - laziness, maybe.

My original reply was to Kovid. After reading the documentation this statement made me wonder; "by default calibre only groups lines in the input document into paragraphs. The default is to assume one or more blank lines are a paragraph boundary:" Here is my quibble - why group lines into only paragraphs. Why not into paragraphs and single lines? It is certainly possible and then markdowns would only be necessary if you wanted to add other basic formatting.

By the way, showing that it could be done was what all that focus on CR and LF was all about - which was the issue.

I think that the key point is that in most modern ebook formats (that are HTML based) end-of-line is simply treated as white space that is equivalent to a single space. This is great for reflowing text which is what they aim to support to fit a wide variety of screen sizes and allow font zooming.

You then need some special logic for recognizing paragraph breaks, which then get their own special tag (typically <p> or an equivalent). The whole concept of end-of-line therefore tends to be stripped out.
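
A rough Python sketch of that logic, under the assumption that a blank line is the only paragraph signal. The name `block_paragraphs_to_html` is hypothetical and this is not how calibre itself is written; it just shows single line breaks collapsing into spaces while blank lines become <p> boundaries.

```python
import html
import re


def block_paragraphs_to_html(text: str) -> str:
    """Group lines into <p> elements, using blank lines as boundaries.

    Illustrative only; the function name and approach are assumptions.
    """
    # Normalize line endings so only LF needs to be matched.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank (or whitespace-only) lines separate paragraphs.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # A single line break inside a block is just white space.
        joined = " ".join(part.strip() for part in block.split("\n") if part.strip())
        if joined:
            paragraphs.append(f"<p>{html.escape(joined)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "This is all\none paragraph\n\nThis is also\nall one paragraph."
    print(block_paragraphs_to_html(sample))
```

With this approach an address typed on three consecutive lines comes out as one reflowed line, which matches what was reported earlier in the thread.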

My original reply was to Kovid. After reading the documentation this statement made me wonder; "by default calibre only groups lines in the input document into paragraphs. The default is to assume one or more blank lines are a paragraph boundary:"

Kovid isn't weighing in very much because I'm the author and maintainer of TXT input. For TXT files paragraphs are the only reliable components that they can be broken down into.

Quote:

Originally Posted by Jabby

Here is my quibble - why group lines into only paragraphs. Why not into paragraphs and single lines? It is certainly possible and then markdowns would only be necessary if you wanted to add other basic formatting.

There are two parts to this. The easy part is that Markdown was chosen as the method for adding formatting to TXT files. It is easy, quick, and the markup even looks good when just viewing it as a standard text file. We have one all-purpose formatting method that handles pretty much every case short of using HTML. Adding other formatting methods that do the same thing is unnecessary.

1) Many TXT documents (look at Project Gutenberg) have this formatting:

Code:

I am all one
paragraph split
along multiple lines
with a single new
line character.

By your line ending description above it would turn into:

Code:

<p>
I am all one<br />
paragraph split<br />
along multiple lines<br />
with a single new<br />
line character.<br />
</p>

The whole point of TXT input is to take a fixed placement document and turn it into a reflowable format. calibre's conversion process actually requires this. Input -> reflowable intermediate format -> Output.

By doing the above, you've removed the entire idea of a reflowable paragraph that changes layout to fit the page width. The TXT input is based on intent. Novels are the typical input, and it is designed to handle the majority of their formatting cases. There is a "Treat each line as a paragraph" option, plus Markdown, to handle corner cases such as yours.
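
For comparison, here is a small Python sketch of what a "Treat each line as a paragraph" style of processing could look like. It is a guess at the general idea, not calibre's implementation, and `lines_to_paragraphs` is a made-up name.

```python
import html


def lines_to_paragraphs(text: str) -> str:
    """Emit one <p> element per non-empty input line.

    Hypothetical illustration of a line-per-paragraph mode.
    """
    # Normalize line endings so only LF needs to be handled.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return "\n".join(
        f"<p>{html.escape(line.strip())}</p>"
        for line in text.split("\n")
        if line.strip()
    )


if __name__ == "__main__":
    address = "John Doe\r\n1 Main St\r\nSpringfield"
    print(lines_to_paragraphs(address))
```

This keeps an address or a simple list on separate lines, but it would also chop a hard-wrapped Project Gutenberg paragraph into one short paragraph per line, which is why it works as an opt-in setting rather than the default.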

2) I'm not 100% clear, but if you are implying that we should allow mixed CR/LF characters (aside from the standard Windows CRLF) within the document to denote different meanings, such as LFLF for a paragraph and CR for a new line: no. CR and LF are invisible characters. They are all treated the same because that's how the majority of text editors treat them. Many users will edit a file on Windows and then on, say, OS X. Some editors will convert all new lines to the system's standard, and some insert their system's new line where indicated while still displaying correctly. Users will become very confused when viewing their converted documents if different lines behave in different ways, especially when they can't see that there is a CR instead of an LF character in the source TXT file. In this case, telling a user to open their document in a hex editor is not acceptable.