Update 2013/05/28: The selnolig package is now on the CTAN. Comments and critiques always welcome! If you wish to contact me about any aspects of the package, please use the email address that's given at the bottom of the title page of the package's user guide. (Relative to the state of the package described in the question below, I've managed to squash at least one bug, and I suggest better work-arounds for the remaining bugs -- at least the ones I'm aware of!)

I'm in the process of readying a LuaLaTeX package for "official" release to the CTAN, but I need to squash a few remaining bugs first. The bug described in this question concerns incorrect behavior of my package that occurs if the fontspec package is loaded; if fontspec is not loaded, none of the problems described here occur. Obviously, asking potential users of my package not to load fontspec is not an option. Incidentally, the identifier string of the LuaTeX version on my system is "beta-0.70.2-2012062819", distributed with MacTeX2012. For much more information about the full selnolig package, which does automated, selective suppression of typographic ligatures, please see New package, selnolig, that automates suppression of typographic ligatures.

The MWE (see image below, and code provided below image) illustrates several instances of failure to perform ligature suppression if -- and apparently also only if -- fontspec is loaded. Specifically, ligature suppression fails for:

a word that's followed immediately by a comment (%) sign

the last word in the argument of a command such as \footnote and \section

a word that immediately precedes the start of an environment such as enumerate and itemize

the final word of an \item statement, i.e., the final word before the next \item statement and/or the environment's closing \end{enumerate/itemize} directive

A common theme of these problems is that they occur if the word in question (plus any trailing punctuation characters) is at the very end of some environment, group, or argument to some macro. In all cases, a "remedy" of sorts is to insert either a space, a blank line, or a space plus something like \vphantom{x} [!]. Clearly, these remedies are not real solutions but merely kludgy hacks, and I certainly wouldn't contemplate asking users of my package to implement these hacks.

My questions are, then:

How can I make my Lua code more robust to whatever is being done by the fontspec package (or some package that's loaded by fontspec)?

Is there a way to load fontspec (or some of the packages called by fontspec) so as to suppress the interference with my Lua code?

Or, have I discovered a bug in fontspec (or one or more of the packages loaded by fontspec) that needs to be fixed anyway?

% !TEX TS-program = lualatex
\documentclass{article}
% If the next line is commented out, everything works fine!
\usepackage{fontspec}
\RequirePackage{luatexbase,luacode,expl3}
% Load lua code
\directlua{ require("ld-orig.lua") } % see below for contents of ld-orig.lua
% Define the user macro "nolig"
\providecommand\nolig[2]{ \directlua{
suppress_liga( "\luatexluaescapestring{#1}",
"\luatexluaescapestring{#2}" )}}
% Provide a ligature suppression rule
% (the full package obviously provides many more such macros)
\nolig{lfful}{lf|ful} % shelfful -> shelf|ful
% Just for this MWE:
\usepackage[textheight=8cm]{geometry}
\setlength\parindent{0pt}
\pagestyle{empty}
\begin{document}
Two shelffuls of \TeX-related books: it works!
\bigskip
% word to be de-ligated is followed immediately by % (comment character)
Ligature suppression doesn't work here: shelfful%
% leaving a space between word and % makes it work even if fontspec is loaded
But it does work in this case: shelfful %
\bigskip
bad\footnote{This doesn't work: shelfful.} % w/o space
good\footnote{But this does work: shelfful. \vphantom{x}} % w/ space and \vphantom directive
\bigskip
% Two more problem cases: (i) last word before start of an
% itemize/enumerate environment, (ii) last word of an \item
one shelfful, two shelffuls % no ligature suppression for "shelffuls"
\begin{itemize}
\item shelfful % no ligature suppression here either
\item shelfful \vphantom{x} % inserting space and \vphantom does the trick...
\end{itemize}
% problem also occurs in arguments of sectioning commands
\section*{sad shelfful} % again no ligature suppression
\subsection*{happy shelfful } % adding space at end of argument makes it work!
\end{document}

Contents of ld-orig.lua:

--- Credits to Patrick Gundlach, Taco Hoekwater, and Steffen Hildebrandt!
local glyph = node.id('glyph')
local glue = node.id("glue")
local whatsit = node.id("whatsit")
local userdefined
for n,v in pairs(node.whatsits()) do
  if v == 'user_defined' then userdefined = n end
end
local identifier = 123456 -- any unique identifier
local noliga={}
debug=false -- default: don't write debugging info to log file
function debug_info(s)
  if debug then
    texio.write_nl(s)
  end
end
local blocknode = node.new(whatsit, userdefined)
blocknode.type = 100
blocknode.user_id = identifier
function process_ligatures(nodes,tail)
  local s={}
  local current_node=nodes
  local build_liga_table = function(strlen,t)
    local p={}
    for i = 1, strlen do
      p[i]=0
    end
    for k,v in pairs(t) do
      -- debug_info("Match: "..v[3])
      local c= string.find(noliga[v[3]],"|")
      local correction=1
      while c~=nil do
        -- debug_info("Position "..(v[1]+c))
        p[v[1]+c-correction] = 1
        c = string.find(noliga[v[3]],"|",c+1)
        correction=correction+1
      end
    end
    -- debug_info("Liga table: "..table.concat(p, ""))
    return p
  end
  local apply_ligatures=function(head,ligatures)
    local i=1
    local hh=head
    local last=node.tail(head)
    for curr in node.traverse_id(glyph,head) do
      if ligatures[i]==1 then
        -- debug_info("Current glyph: "..unicode.utf8.char(curr.char))
        node.insert_before(hh,curr, node.copy(blocknode))
        hh=curr
      end
      last=curr
      if i==#ligatures then
        -- debug_info("Leave node list on position: "..i)
        break
      end
      i=i+1
    end
    if(last~=nil) then
      -- debug_info("Last char: "..unicode.utf8.char(last.char))
    end--]]
  end
  for t in node.traverse(nodes) do
    if t.id==glyph then
      s[#s+1]=string.lower(unicode.utf8.char(t.char))
    elseif t.id== glue then
      local f=string.gsub(table.concat(s,""),"[\\?!,\\.]+","") -- add all interpunction
      local throwliga={}
      for k, v in pairs(noliga) do
        local count=1
        local match= string.find(f,k)
        while match do
          count=match
          -- debug_info("pattern match: "..f .." - "..k)
          local n = match + string.len(k)-1
          table.insert(throwliga,{match,n,k})
          match= string.find(f,k,count+1)
        end
      end
      if #throwliga==0 then
        -- debug_info("No ligature substitution for: "..f)
      else
        -- debug_info("Do ligature substitution for: "..f)
        local ligabreaks=build_liga_table(f:len(),throwliga)
        apply_ligatures(current_node,ligabreaks)
      end
      s={}
      current_node=t
    end
  end
end
function suppress_liga(s,t)
  noliga[s]=t
end
function drop_special_nodes (nodes,tail)
  for t in node.traverse(nodes) do
    if t.id == whatsit and t.subtype == userdefined and t.user_id == identifier then
      node.remove(nodes,t)
      node.free(t)
    end
  end
end
luatexbase.add_to_callback("ligaturing", process_ligatures,"Filter ligatures", 1)
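
To make the role of build_liga_table more concrete, here is a minimal sketch in plain Lua (no LuaTeX needed) of the position arithmetic it performs for the rule from the MWE. The helper name break_table is hypothetical and not part of ld-orig.lua; it handles a single rule and uses plain-text searches, which is a simplification of the listing above.

```lua
-- Standalone sketch; break_table is a hypothetical name used for illustration.
-- word:    the lower-cased word collected from the glyph nodes
-- pattern: the search string, e.g. "lfful"
-- rule:    the same string with "|" at each ligature break, e.g. "lf|ful"
local function break_table(word, pattern, rule)
  local p = {}
  for i = 1, #word do p[i] = 0 end
  local start = string.find(word, pattern, 1, true)    -- plain-text search
  while start do
    local bar = string.find(rule, "|", 1, true)
    local correction = 1
    while bar do
      -- a 1 marks the glyph that gets a blocking node inserted before it
      p[start + bar - correction] = 1
      bar = string.find(rule, "|", bar + 1, true)
      correction = correction + 1                      -- each "|" shifts later positions by one
    end
    start = string.find(word, pattern, start + 1, true)
  end
  return table.concat(p)
end

print(break_table("shelfful", "lfful", "lf|ful"))  --> 00000100
```

The 1 in sixth position corresponds to the second "f" of "shelfful", i.e., the blocking node would go between the two f's, giving shelf|ful.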

Postscript: The solution to the bug described in this posting. The key sequence in the Lua code given above that caused the bug was:

for t in node.traverse(nodes) do
  if t.id==glyph then
    s[#s+1]=string.lower(unicode.utf8.char(t.char))
  elseif t.id==glue then
    ...

All that was required to fix the bug was to change this code snippet to:

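-- kern and rule are not defined in the listing above; they need to be
-- set up like glyph and glue, e.g. local kern = node.id("kern")
for t in node.traverse(nodes) do
  if t.id==glyph then
    s[#s+1]=string.lower(unicode.utf8.char(t.char))
  end
  if ( t.id==glue or t.next==nil or t.id==kern or t.id==rule ) then
    ...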

The point is that the sequence of characters that needs to be processed by selnolig can end in ways other than just with some amount of (TeX) "glue" (e.g., whitespace). Another way for the sequence to end is for the word to be the very last item being processed, e.g., if it's the last word in the argument of a command such as \section{}; in that case, t.next will be equal to nil. Finally, the two remaining conditions -- t.id==kern and t.id==rule -- are provided in case a user has inserted a "kern" or "rule" item manually.
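
The effect of the changed condition can be tried out without LuaTeX. The following is only a simulation in plain Lua: the tables merely mimic node lists, and make_list and words_seen are hypothetical helper names that do not appear in the package. It collects the words that would be handed to the ligature-suppression step, once with the glue-only test and once with the extended test.

```lua
-- Plain-Lua simulation; these tables only imitate LuaTeX nodes.
local GLYPH, GLUE, KERN, RULE = "glyph", "glue", "kern", "rule"

-- Build a singly linked list from a string: spaces become glue, all else glyphs.
local function make_list(text)
  local head, prev
  for ch in text:gmatch(".") do
    local n = { id = (ch == " ") and GLUE or GLYPH, char = ch }
    if prev then prev.next = n else head = n end
    prev = n
  end
  return head
end

-- Return the words that reach the processing step.
-- fixed == false: only glue ends a word (behaviour of the original loop)
-- fixed == true:  glue, kern, rule, or the end of the list ends a word
local function words_seen(head, fixed)
  local words, s = {}, {}
  local t = head
  while t do
    if t.id == GLYPH then s[#s + 1] = t.char end
    local boundary = (t.id == GLUE)
    if fixed then
      boundary = boundary or t.next == nil or t.id == KERN or t.id == RULE
    end
    if boundary then
      if #s > 0 then words[#words + 1] = table.concat(s) end
      s = {}
    end
    t = t.next
  end
  return words
end

local list = make_list("one shelfful")   -- no trailing glue, as in \section*{...}
print(table.concat(words_seen(list, false), ","))  --> one
print(table.concat(words_seen(list, true), ","))   --> one,shelfful
```

With the glue-only test, the final word is collected but never processed, which matches the failing cases in the MWE; adding a trailing space (i.e., a glue node) makes both variants agree.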

"...release to the CTAN, but I need to squash a few remaining bugs first...." Things were different in my day:-)
– David Carlisle, Apr 7 '13 at 12:56

If you move the code following the elseif t.id== glue then after the end of the node.traverse() loop, some problems go away. Didn't check for side effects though. The case with the % doesn't have glue at the end, so the t.id == glue will never be true in that case.
– topskip, Apr 7 '13 at 13:31

I think (I’m only speculating here) that the use of the ligaturing callback is the culprit; you should have used pre_linebreak_filter and hpack_filter. But your code needs to be modified: process_ligatures() needs to return the modified head.
– Khaled Hosny, Apr 7 '13 at 14:25

\label{sec:selffulling} stuff shouldn’t be an issue regardless of the callback you use; by the time you are working on the node list it will not be there (it never makes it to the node list, it is handled much earlier).
– Khaled Hosny, Apr 7 '13 at 17:37

for t in node.traverse(nodes) do
  if t.id==glyph then
    s[#s+1]=string.lower(unicode.utf8.char(t.char))
  elseif t.id== glue then
    ...
    (process ligatures)
    ...
  end
end

makes it clear that only a glue activates the ligature processing.

I'd suggest using a different kind of looping for ligature processing.

The difference between fontspec activated or not is the following: with fontspec deactivated, the ligaturing callback disables all ligaturing. What you see is not the effect of the command \nolig, but a general "no ligature" mode. Try words like fluffiest fish and you see that. With fontspec enabled, the result is "always ligatures" unless you block them with the code you use.

So the ligaturing callback is not the perfect way to deal with the situation, I am afraid. You could however call node.ligaturing() at the beginning of the callback and then do what you are doing. But that would probably interfere with fontspec.

+1 for the viznodelist module!! Your detailed analysis is very helpful indeed. Real quick: (i) Would you mind posting new images that also show what happens -- if anything -- between the "f" and "f" in "shelfful"? (Currently, the images start with the second "f", but don't show anything to the left of that second "f".) (ii) If it is only glue that activates the (de)ligaturing loop, why do the ligature suppression algorithms work unless fontspec (or, possibly, one or more of the packages loaded by fontspec) is loaded?
– Mico, Apr 7 '13 at 16:22

Many thanks for the expanded answer. I've familiarized myself with the viznodelist module. As you've noted, the string shelfful% generates the same viz-output irrespective of whether fontspec is loaded or not; specifically, there's a "glyph box" for the first f, a "disc box" (for the discretionary hyphen), and a "glyph box" for the second f. However, we know that with fontspec loaded, there will be an "ff" ligature, whereas there won't be a ligature in the case without fontspec. So there must be a difference lurking somewhere not caught by viznodelist. :-(
– Mico, Apr 8 '13 at 18:31