Mixed Line Ending Detection. Possible?

I learned that it is possible to have files with mixed line endings. I made a file with 3 different line endings (LF, CRLF, CR).
Results: LF takes priority, so the file is recognized as an LF file even if it contains other line endings (the user cannot know unless line-ending display is turned on and they go through the file line by line).
CRLF takes priority over CR, and CR comes last.

This order makes perfect sense and is the way it should be. I also understand that no file is supposed to have mixed line endings like this, and that operating systems do not use anything mixed (Windows uses CRLF; macOS and Linux use LF).

BUT it is possible to have files with mixed line endings, so I would like to be able to detect whether a file is in such a condition (without having to turn on line-ending display and go through it line by line).

So ideally you’d have some code watching to see if you ever get a “bad” line ending in your editor tab buffer. “bad” is defined as:

Your file is Windows (CR LF) (see status bar) and you get either an LF line-ending or a CR line-ending (somehow) in its editing buffer

Your file is UNIX (LF) and you get either a CR LF line-ending or a CR line-ending in its editing buffer

Your file is MAC (CR) and you get either a CR LF line-ending or an LF line-ending in its editing buffer
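For what it's worth, the three "bad" cases above boil down to one check: find every line ending and compare it with the type the status bar reports. Here is a minimal sketch in plain Python (not tied to any editor API). The names EOL_RE, count_line_endings and find_mismatches are made up for illustration, and it assumes the text was read without newline translation (for example with open(path, newline="")).

```python
import re
from collections import Counter

# Matches one line ending; CRLF is listed first so it is not split into CR + LF.
EOL_RE = re.compile(r"\r\n|\r|\n")

EOL_NAMES = {"\r\n": "CRLF", "\n": "LF", "\r": "CR"}


def count_line_endings(text):
    """Return a Counter mapping line-ending names to how often each occurs."""
    return Counter(EOL_NAMES[m.group()] for m in EOL_RE.finditer(text))


def find_mismatches(text, expected):
    """Return (line_number, eol_name) for every ending that is not `expected`.

    `expected` is one of "CRLF", "LF" or "CR"; line numbers are 1-based.
    """
    mismatches = []
    for line_number, m in enumerate(EOL_RE.finditer(text), start=1):
        name = EOL_NAMES[m.group()]
        if name != expected:
            mismatches.append((line_number, name))
    return mismatches


# A "Windows" buffer that somehow picked up one LF and one CR ending.
sample = "one\r\ntwo\nthree\rfour\r\n"
print(find_mismatches(sample, "CRLF"))  # [(2, 'LF'), (3, 'CR')]
```

An empty result means the buffer is consistent with its declared type; anything else gives you the line numbers to go look at.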

However, one of the problems with doing this, at least with one of the scripting languages, is what’s known as the “big file problem”. As long as your files are relatively small, scanning them often (like when you make a change that could potentially put a wrong line-ending in) runs fairly quickly, but with large files the scan time becomes noticeable as a sluggish user interface.

So then you make compromises. You can scan a file for mismatches when it is first opened. You can scan it at save time. You can come up with some complex algorithm to scan pieces of it at different times. Your choice.
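As one illustration of the open-time or save-time option, here is a rough sketch that counts line endings from disk in fixed-size chunks, so the whole file never has to sit in memory at once. It is plain Python, the names scan_stream and is_mixed are hypothetical, and it assumes an ASCII-compatible encoding such as UTF-8 or ANSI (it would miscount UTF-16). The only subtle part is a CR that lands at the very end of a chunk, which has to wait for the next chunk before it can be classified.

```python
import io


def scan_stream(stream, chunk_size=64 * 1024):
    """Count CRLF, LF and CR line endings in a binary stream, chunk by chunk."""
    counts = {"CRLF": 0, "LF": 0, "CR": 0}
    pending_cr = False  # True when the previous chunk ended with a CR
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        if pending_cr:
            # Decide whether the held-back CR was the first half of a CRLF.
            if chunk[:1] == b"\n":
                counts["CRLF"] += 1
                chunk = chunk[1:]
            else:
                counts["CR"] += 1
            pending_cr = False
        if chunk.endswith(b"\r"):
            # Hold the trailing CR back until the next chunk has been seen.
            pending_cr = True
            chunk = chunk[:-1]
        crlf = chunk.count(b"\r\n")
        counts["CRLF"] += crlf
        counts["LF"] += chunk.count(b"\n") - crlf
        counts["CR"] += chunk.count(b"\r") - crlf
    if pending_cr:
        counts["CR"] += 1  # the file ended with a lone CR
    return counts


def is_mixed(counts):
    """True when more than one kind of line ending was seen."""
    return sum(1 for n in counts.values() if n) > 1


# Tiny chunk size on purpose, to exercise the chunk-boundary handling.
demo = scan_stream(io.BytesIO(b"a\r\nb\nc\rd\r\n"), chunk_size=2)
print(demo, is_mixed(demo))
```

For a real file you would pass a handle opened in binary mode, e.g. open(path, "rb"). Whether this is fast enough for your big files is something you'd have to measure.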

That being said, normally it is a difficult task to get a wrong line-ending in your editor buffer. Scintilla normally takes care of keeping the line-endings consistent and correct. For example, if you copy lines out of a UNIX file buffer and into a Windows file buffer, Scintilla will convert the line endings from LF to CRLF during the paste. So…is it a really big issue after all?

If it is that hard to determine that there are bad line endings, then we might as well not bother with it at all!

It’s not necessarily hard, there just isn’t some magic that makes it especially easy.

I actually do this line-ending mismatch detection for myself, using pretty much the same technique used in this thread. I monitor what is happening in the currently viewable area, figuring that this is the likeliest place for a mismatched line ending to get inserted. If it is detected, the code turns on visible line-endings (normally I have that turned off) to really hit me with the problem.

Perfect? No. Really easy? No. Really hard? No. Acceptable? Yes.

By the way, here is a good way to get Unix line-endings in your Windows files: Copy some multiline text from the PythonScript console window, then use the Clipboard History panel to paste it into a Windows editor tab! BOOM! LF-only line-endings in your Windows buffer. Note that this series of events goes around Scintilla’s watchful paste protection.

That code is wrapped up into something “bigger” that I have–not worth it to me to extract it. In general, I share code here in a couple of cases:

1. I already have the code done to do a certain specific thing when people post asking how it can be accomplished

2. Somebody posts an idea (that I hadn’t previously thought of) that can be of use to me; thus I write the code that implements it and share it

An example of case #2 is your request for this. I hadn’t thought of that previously, but when you brought it up, I thought “I can benefit from having that”…and now that it’s done I like it even more than I thought I would.

So…yea, if you wanna work on coding it yourself, I can give you hints if you get stuck, but with the framework from that other code a lot of the “hard” part is done already.

Hi Guy, yea, this comes up a lot it seems, but people seem to want something that runs automagically–and unfortunately running a regular-expression replacement doesn’t fit in that scheme (hmmm, maybe a regex replacement macro that is run at save time–can NppExec do that? dunno…). However, take heart in that the regexes from your posting above do appear in the code for this; here’s a sample:

“normally it is a difficult task to get a wrong line-ending in your editor buffer”

I just thought I’d point out that with an errant regular-expression replacement, it is actually quite easy to get wrong line-endings in your buffer. For example, if your replacement expression involves line-ending characters, make sure you get it correct for your file-type, as shown above. Scintilla does nothing to protect you from messing up your files in this case (like it does when pasting). Absolute power (regex) corrupts absolutely! :-) Or maybe I should say that it can!
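To make the regex pitfall concrete, here is a small sketch using Python's re module rather than the editor's Replace dialog, so treat it as an analogy only. The sample text and the names normalize_eols and doc_eol are invented for the example. It shows a replacement that hard-codes a bare LF into a CRLF document, the same replacement done with the document's own line ending, and a one-pass normalization that repairs the damage.

```python
import re

# Matches one line ending; CRLF comes first so it is treated as a single unit.
EOL_RE = re.compile(r"\r\n|\r|\n")


def normalize_eols(text, eol):
    """Rewrite every line ending in `text` to `eol` (the CRLF, LF or CR string)."""
    # A function is used as the replacement so `eol` is inserted literally.
    return EOL_RE.sub(lambda m: eol, text)


windows_text = "alpha;beta\r\ngamma;delta\r\n"

# Errant replacement: split at ";" using a bare LF inside a CRLF document.
broken = re.sub(";", "\n", windows_text)

# Safer replacement: insert whatever line ending the document already uses.
doc_eol = "\r\n"
correct = re.sub(";", lambda m: doc_eol, windows_text)

print(repr(broken))   # 'alpha\nbeta\r\ngamma\ndelta\r\n'
print(repr(correct))  # 'alpha\r\nbeta\r\ngamma\r\ndelta\r\n'
print(normalize_eols(broken, "\r\n") == correct)  # True
```

The normalization step is blunt (it rewrites every ending, not just the wrong ones), which is fine as a repair but is not a substitute for noticing the mismatch in the first place.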

Oh, so when I paste things into Scintilla, Scintilla automatically converts whatever EOL my paste has to what the document is set to (which I can see in the status bar).

I didn’t know that but I guess that makes sense and protects me from mixed line endings.
I like that.

I still would like a script to be able to highlight my mixed EOLs though (just the same way I highlight my line-final whitespace), or at least turn on the EOL display like your script does. (which you aren’t pasting here = ( )