Yesterday I converted about 220 articles from the Stanford Encyclopedia of Philosophy into epub and transferred them to my Kobo. I was not surprised to see that it appeared to get stuck somewhere in "Processing Content". By the very tedious and laborious process described by others, I was eventually able to transfer all but two of the files.

This really is unacceptable. Perhaps the Kobo team is unable to prevent their "Processing Content" stage from freezing. But they should provide better information to help people work around their inadequate software. Instead of saying what percentage it thinks it has completed, the Kobo could tell us what file it is currently processing. We could then strongly suspect that the file it last reports is the one it is stuck on. And it could provide a list of the files it processed without any problem. It could probably even actually transfer those files instead of transferring NO files if it has trouble with one file. (If you get to "90%", for example, on 200 files, NO file at all will be transferred after all that work!)

Putting this issue aside, what was the problem with those two files? I looked into the original html trying to find something that those files had and none of the other files had. And indeed there was something: the values of the "name" attribute in various <a> tags have colons in them. For example, "Carnap:23a" was used as a reference to a paper Carnap published in 1923. This was carried over into the epub files produced by Calibre's command-line ebook-convert. I don't know whether this is allowed in html or in epub's xhtml, except that every other program I tried accepts it: all browsers, and all the epub readers I have on my pc.

The only thing that did not like it, and could not tell me why, and either hung or crashed in processing such a file was the Kobo.

I changed all these attribute values and the Kobo finally allowed me to add the files.

A couple of other comments. First: even if colons are not allowed (I don't feel too much like checking this out now, but I can't imagine why they would not be), that does not excuse the way the Kobo failed. If the colons were the real objection, it should just tell us there is a syntax error in the epub file. It should behave much better in this case, not hang or crash.

Second: it took me a lot longer to do this because once I found the first file it could not handle (about fifty files in), I put it aside and kept going with the laborious trial and error process until I got all but one of the remaining files on. Only then did I look at the two files. Had I taken the first problematic file when I found it, and looked for an anomaly in it, I might have been able to avoid much of the rest of the long process. Hopefully others may take a smarter path in this regard.

However, Kobo must fix this. Not just the problem with colons, but also the messages it gives just before it crashes or hangs on an epub it cannot handle.


You may well have struck upon something here. I had a web page from Wikipedia, saved in htm via Word and converted to epub by Calibre that would consistently freeze up the Kobo on page 6. Eventually I deleted everything after page 5, as that was all I really needed, and now the Touch can handle it.

I just went back and checked the original Word file; there are three instances of "Template:" in the offending page. So even though the epub imported into the Touch ok, it looks as though the Touch may freeze on rendering colons in tags.

Hi BensonBear, some of my epubs cannot be processed by my Kobo. I can identify which books seem to be corrupted (by copying them one by one). How can I "repair" them? I have not created them from html.

Unfortunately there is no general answer to that. There is no real way to know what the Kobo is getting confused by. You can examine the files in the epub zip file to see if there is something unusual, but it could be anything as far as we know. It seems at least one other person has a very similar problem (colons in the names in <a> tags), so you can search for that in the html files. You can just unzip the epub file (it is just a zip file), use whatever editor you want, and then rezip. Or it might be easier to get an epub editor such as those mentioned (calibre also has one) and use that to examine and then edit the html files.
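For anyone who would rather search than eyeball, here is a rough sketch of that check in Python. It only looks for colons in name/id attribute values inside the html files of an epub. The function name `find_colon_anchors` is my own invention, the regex is a heuristic rather than a real xhtml parser, and there is no guarantee that colons are what your particular Kobo is choking on.

```python
import re
import zipfile

# Matches name="..." or id="..." whose value contains a colon.
# This is a rough heuristic, not a real XHTML parser.
COLON_ATTR = re.compile(
    r'\b(name|id)\s*=\s*(["\'])([^"\']*:[^"\']*)\2', re.IGNORECASE
)

HTML_SUFFIXES = (".html", ".xhtml", ".htm")


def find_colon_anchors(epub_path):
    """Return (member, attribute, value) tuples for attributes containing colons.

    epub_path may be a file name or a file-like object; an epub is a zip file.
    """
    hits = []
    with zipfile.ZipFile(epub_path) as zf:
        for member in zf.namelist():
            if not member.lower().endswith(HTML_SUFFIXES):
                continue
            text = zf.read(member).decode("utf-8", errors="replace")
            for match in COLON_ATTR.finditer(text):
                hits.append((member, match.group(1).lower(), match.group(3)))
    return hits
```

Calling `find_colon_anchors("book.epub")` (the file name is a placeholder) gives a list you can print or count. An empty list means this particular pattern is not present, not that the file is fine.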

Quote:

Originally Posted by plib

I just went back and checked the original Word file; there are three instances of "Template:" in the offending page. So even though the epub imported into the Touch ok, it looks as though the Touch may freeze on rendering colons in tags.

Are these name attributes in <a> tags? I am guessing not, since at least your file was parsed without complaint. It seems like Kobo might have a problem with colons in names in other locations as well. (btw I don't know what Word does, but I would not introduce extra middlemen in this process if not necessary.) Can you try replacing the "Template:" with something else? If it is just a name attribute value, any unique replacement should be okay.

Quote:

I agree that it's a very inelegant way of handling errors

I believe Kobo is not "handling" an error here, it is *committing* an error. Even if colons are not allowed in name attributes (and why should they not be? everybody else seems to use them and handle them well), Kobo is treating the files incorrectly, and this is not a moot point, since it makes using the Kobo a real pain if one has such files.

There's the special thing about them that in CSS, the colon needs to be
"escaped", usually as \: (i.e., a backslash is written before the
colon). The reason is that the colon is a special character in CSS
syntax in pseudo-class and pseudo-element selectors.

Not sure if that helps any but using a colon in a class name is not something I would ever do.
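To make the escaping concrete, here is a tiny sketch in Python. The helper name `css_escape_colons` is made up for illustration, and it only deals with colons, not with the other characters CSS treats specially.

```python
def css_escape_colons(name):
    """Put a backslash before each colon so the name can be used in a CSS selector."""
    return name.replace(":", "\\:")


selector = "#" + css_escape_colons("Carnap:23a")
print(selector)  # prints #Carnap\:23a
```

Without the backslash, a CSS parser would try to read everything after the colon as a pseudo-class.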


I don't know how this will quote but here is what seems to be the offending passage from the original Word htm file, which was a direct copy/paste from the Wikipedia page.

The items with the colons are the vertical "v", "d", "e" references which include "Template:Honorverse" and "Template_talk:Honorverse" tags. I removed these plus everything following, which I didn't really need for the book/short story chronological sequence anyway, and it now behaves fine in the Kobo.

Maybe you're right and someone from Kobo will read this. I did originally send it to their tech support department, but heard nothing back.


Yes, this is the same problem, except from the href side instead of the anchor side. The names that my files had trouble with were internal links defining an anchor to use in an href, whereas these are external links used in an href. Same problem. If the links are internal anchors, you can simply remove all the colons in the offending attribute values, provided they remain unique, which they most likely will. If they are external urls like these, and you need to link to them, you are in trouble unless you can rename the items they are linked to.

In this case you can simply remove the colons and keep the rest of the file. The corresponding external links will be lost but they probably are not included in your epub conversion anyway.
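For the internal-anchor case, a minimal sketch of the renaming in Python might look like the following. It is not what I actually ran. The function name `strip_anchor_colons` is hypothetical, it assumes double-quoted attribute values, it works on one html file at a time (so links from one file into another are not updated), and it will also touch any other tag that happens to have a name or id containing a colon.

```python
import re

# name="..." or id="..." attributes (double-quoted values only, for brevity).
ANCHOR_ATTR = re.compile(r'\b(name|id)="([^"]*)"')
# href values that contain a "#fragment" part.
HREF_ATTR = re.compile(r'\bhref="([^"#]*)#([^"]*)"')


def strip_anchor_colons(html, replacement="_"):
    """Replace colons in anchor names and in the same-file hrefs that point at them."""
    names = {m.group(2) for m in ANCHOR_ATTR.finditer(html)}
    taken = {n for n in names if ":" not in n}
    mapping = {}
    for old in sorted(n for n in names if ":" in n):
        new = old.replace(":", replacement)
        # Keep the renamed anchors unique, otherwise links would collide.
        candidate, suffix = new, 1
        while candidate in taken:
            suffix += 1
            candidate = f"{new}{replacement}{suffix}"
        taken.add(candidate)
        mapping[old] = candidate

    def fix_anchor(m):
        return f'{m.group(1)}="{mapping.get(m.group(2), m.group(2))}"'

    def fix_href(m):
        target, fragment = m.group(1), m.group(2)
        if target:
            # Points into another file or an external URL: leave it alone.
            return m.group(0)
        return f'href="#{mapping.get(fragment, fragment)}"'

    html = ANCHOR_ATTR.sub(fix_anchor, html)
    return HREF_ATTR.sub(fix_href, html)
```

External urls without a fragment, like the Wikipedia "Template:" links, are left untouched by this; those have to be removed or edited by hand.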

I should repeat: it is too much to expect the Kobo people to fix all problems of this nature. However, they really should recover from parsing errors in a better manner, so that a person transferring many documents is told by the Kobo exactly which document it cannot handle, instead of having to work for hours trying to find it by trial and error.

As noted elsewhere, many of the issues along these lines appear to occur with EPUBs that have been created by the users as opposed to those available commercially.

The problem may be in the resulting size or the method of conversion. That said, however, although we have the standards for EPUB widely available, the end-user tools that we have to make the conversion still appear to be in their infancy. (The standard tool available, Sigil, is at version 0.5x.)

I wonder therefore, whether the issue (bug) is with the quality of EPUB files and the tools we use to make the conversion?

Quote:

As noted elsewhere, many of the issues along these lines appear to occur with EPUBs that have been created by the users as opposed to those available commercially.

If that is true, that is probably because when someone makes an epub available commercially, they fiddle around with it until it works on the various ereaders they are trying to sell it on.

Quote:

The problem may be in the resulting size or the method of conversion. That said, however, although we have the standards for EPUB widely available, the end-user tools that we have to make the conversion still appear to be in their infancy. (The standard tool available, Sigil, is at version 0.5x.)

Size? Method of conversion? What is wrong with the size in the epubs I made with calibre? They are no larger than they have to be as can be easily seen if one unzips them and looks at the files. "Method of conversion?" Don't know what that means. I do know that if I remove the colons from the attribute values of the name field in the <a> tags, like I said in the original post, the epubs work fine on the Kobo. Perhaps colons are not allowed in these locations in epubs, and the Kobo is way ahead of all those other epub readers that stupidly read these files fine, when instead they could hang or crash like the Kobo does when encountering them?

Quote:

I wonder therefore, whether the issue (bug) is with the quality of EPUB files and the tools we use to make the conversion?

I highly doubt it. Note also that there is clearly a problem with the Kobo processing content: it HANGS or CRASHES. No program should do that. And if it hangs or crashes while processing 1000 files, it copies NONE of them successfully, even if it has successfully processed 999 files and the problem is in the last file. It gives no clue which file it has had a problem with, leaving the user with a real headache of a task.

None of these very poor features can be attributed to the tools users have used to make epubs. Epubs which, by the way, in my case, can be read by every other epub reader I have (okular, sigil, fbreader, calibre).

@Benson; have you followed the normal advice for getting in touch with Kobo and reporting this issue and what you have found, or have just posted here hoping it will be seen?

I have not tried to contact Kobo directly about any of the things I consider to be shortcomings, as it would require a lot of work to articulate clearly, and I would have to prioritize this issue along with all the others, and from what I have read here such correspondence does not seem to have much of a detectable effect.

Also, I am sure they are already aware of this problem. In fact, before reporting it now I would have to check it out on the new release. I am only replying to posts about it now out of a sense of obligation, because I posted the original post here.

I personally don't have any strong "hope" that it is solved, since I am not using the Kobo anymore anyway except as a platform to play around with.