I have a document that contains both Chinese and English text, and each language has its own font. I am trying to find out whether any Chinese text has the English font applied to it. If so, I need to apply a specific color to that text.

You'll need to put together a list of all possible Chinese characters. You can search for ranges of characters in GREP using their Unicode values. So in the script above, something like this will find the basics:

You don't need to list all the words! You just need a list of all the possible characters. Type them into the GREP find field. Set the find formatting to the English font. Now do a GREP search.

The result will be all Chinese characters that have the English font applied to them.
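To make the logic of that search concrete outside of InDesign, here is a minimal Python sketch. It assumes the text has already been split into runs of `(text, font_name)`; that data model, the function names and the "Hypothetical Chinese Font" name are made up for illustration and are not part of any InDesign API. Only the two most common Han blocks are checked to keep it short.

```python
# Font name taken from later in this thread; everything else is hypothetical.
ENGLISH_FONT = "ITC Avant Garde Gothic Std"

# Assumption: only the two most common Han blocks, to keep the sketch short.
HAN_RANGES = [(0x3400, 0x4DBF), (0x4E00, 0x9FFF)]


def is_han(ch):
    """True if the character's code point falls in one of HAN_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HAN_RANGES)


def flag_misfonted(runs, english_font=ENGLISH_FONT):
    """Return (run_index, char_offset, char) for each Han character
    that sits in a run formatted with the English font."""
    hits = []
    for i, (text, font) in enumerate(runs):
        if font != english_font:
            continue
        for j, ch in enumerate(text):
            if is_han(ch):
                hits.append((i, j, ch))
    return hits


runs = [
    ("Hello ", ENGLISH_FONT),
    ("世界", "Hypothetical Chinese Font"),
    ("ok 字", ENGLISH_FONT),
]
print(flag_misfonted(runs))
```

In InDesign itself the same idea is expressed by combining the GREP pattern with the font in the Find Format settings, as described above.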

My screenshot below is showing some dummy text where a condition is applied to the found text:

I think the correct way would be to search for Unicode ranges, as you already suggested in reply 3.

And apply something to it. A condition like the one I presented in my screenshot would be great.

And one could choose a color for the condition that can also be printed or exported to PDF.

FWIW: There is really no need for applying a character style.

On the contrary: We should spare character styles for other kinds of formatting, because maybe character styles are already in use and it would be destructive to apply a character style for the found Chinese characters.

The GREP expression could include the ranges of the blocks containing Han ideographs, as suggested on GitHub.

| Block | Range | Comment |
|---|---|---|
| CJK Unified Ideographs | 4E00-9FFF | Common |
| CJK Unified Ideographs Extension A | 3400-4DBF | Rare |
| CJK Unified Ideographs Extension B | 20000-2A6DF | Rare, historic |
| CJK Unified Ideographs Extension C | 2A700-2B73F | Rare, historic |
| CJK Unified Ideographs Extension D | 2B740-2B81F | Uncommon, some in current use |
| CJK Unified Ideographs Extension E | 2B820-2CEAF | Rare, historic |
| CJK Compatibility Ideographs | F900-FAFF | Duplicates, unifiable variants, corporate characters |
| CJK Compatibility Ideographs Supplement | 2F800-2FA1F | Unifiable variants |

But the ranges above are not sufficient, as the following little experiment shows: I copied some Chinese text from the net (no idea what it says) to my InDesign page and ran the following GREP on it:

What is obviously missing are punctuation characters, brackets and quotation marks. Plus, at least in this text sample, a simple blank character that maybe should not be there (second line of the Chinese text).
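One way to see exactly which characters slip through is to list everything in a text sample that the Han ranges do not cover, together with its code point and Unicode name. The following Python sketch does that; the sample string is made up for illustration (it is not the text from my experiment) and the function names are hypothetical.

```python
import unicodedata

# Inclusive code point ranges of the Han ideograph blocks listed earlier.
HAN_RANGES = [
    (0x4E00, 0x9FFF), (0x3400, 0x4DBF), (0x20000, 0x2A6DF),
    (0x2A700, 0x2B73F), (0x2B740, 0x2B81F), (0x2B820, 0x2CEAF),
    (0xF900, 0xFAFF), (0x2F800, 0x2FA1F),
]


def is_han(ch):
    """True if the character falls inside one of the Han ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HAN_RANGES)


def uncovered_characters(text):
    """Return (hex code point, Unicode name) for each distinct character
    that the Han ranges miss, ignoring line breaks, sorted by code point."""
    seen = {ch for ch in text if not is_han(ch) and ch not in "\r\n"}
    return [("%04X" % ord(ch), unicodedata.name(ch, "UNNAMED"))
            for ch in sorted(seen)]


sample = "你好 (世界)，再见。"  # made-up sample text
for code, name in uncovered_characters(sample):
    print(code, name)
```

For this sample the leftovers are a mix of plain ASCII characters (space, parentheses) and CJK punctuation, which is the same kind of mix I saw in my pasted text.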

Yes, I have some character styles applied within the document. Also, my idea is not to build a GREP list of all Chinese characters, because I might miss something anyway. So instead I am checking whether each character is English or not; if it is not, I mark it with a swatch.

As you can see from my example, the pasted Chinese text uses some typical characters that are ALSO used with English text. Among them are a pair of brackets (0028 and 0029) and a "stray" blank (0020). So it's not "English or not", whatever English means. E.g., German would share the same characters with English. But not the other way around.

Thank you for your suggestions; they are really helpful. Yes, the punctuation is a problem as well. Using GREP will reduce the manual work a little, but I need to look at the other characters, like punctuation, separately.

I feel bad that I am not able to provide the client document to you. Sorry about that.

One of the toughest things could be to find out whether some text that is meant to be English is falsely typed with FULLWIDTH characters somewhere in the range FF10 to FF5A. Then you would need to map the FULLWIDTH characters to "normal" characters if you want to apply "ITC Avant Garde Gothic Std", which would not contain FULLWIDTH characters.

Same for FULLWIDTH digits perhaps.

E.g., a FULLWIDTH DIGIT ZERO (FF10) could be mapped to DIGIT ZERO (0030). But that will depend on the individual case, of course.
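The mapping itself is simple arithmetic: FULLWIDTH digits and Latin letters sit exactly 0xFEE0 above their "normal" counterparts (FF10 minus FEE0 is 0030). Here is a minimal Python sketch of that mapping. As an assumption on my side, it only converts digits (FF10-FF19) and Latin letters (FF21-FF3A, FF41-FF5A) and deliberately leaves the FULLWIDTH punctuation in between alone, since that may be wanted in the Chinese text. The names are made up.

```python
# Distance between a FULLWIDTH character and its ASCII counterpart.
FULLWIDTH_OFFSET = 0xFEE0


def build_fullwidth_map():
    """Translation table for FULLWIDTH digits and Latin letters only.
    Assumption: FULLWIDTH punctuation is left untouched on purpose."""
    table = {}
    for lo, hi in ((0xFF10, 0xFF19), (0xFF21, 0xFF3A), (0xFF41, 0xFF5A)):
        for cp in range(lo, hi + 1):
            table[cp] = cp - FULLWIDTH_OFFSET
    return table


FULLWIDTH_MAP = build_fullwidth_map()


def to_halfwidth(text):
    """Replace FULLWIDTH digits and letters with their ASCII equivalents."""
    return text.translate(FULLWIDTH_MAP)


print(to_halfwidth("ＡＢＣ１２０"))
```

Whether such a replacement is safe depends on the document, so it would be wise to review the found instances first instead of changing everything in one go.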