Interactive Text Hooker - new text extraction tool

08-25-2010, 06:08 AM

Interactive Text Hooker (ITH) is a tool to help you extract text from Japanese games.
It works much like AGTH, so if you are familiar with AGTH you will find ITH easy to work with.
Right now ITH is not quite stable and is still under development. Please help me test it and report any bugs you find.
Suggestions for new features or improvements are also welcome.
Latest ITH 2.3 (2011.7.9). ITH64 1.0 (2011.5.15). 3.0 test.
Latest engine support module(10.15).

System requirements:
Intel Pentium 4 or later processor. The recommended OS is Windows XP or later.
Technically, your processor must support SSE2 and your OS must support common control library version 6.
ITH is also expected to work on 64-bit Windows.

Basic usage:

Please put ITH.exe, ITH.dll and ITH_engine.dll in the same folder.
To let ITH extract text from a game, click the Process button;
it opens a dialog with a process list. Find your target and click Attach.
When you no longer want to extract text from that game, click Detach to tell ITH to stop extracting.
After attaching, the shorter drop-down list will contain the PID and name of the game you selected.
If ITH could extract some text, the longer drop-down list in the main window will have more than one item.
When you select one, its text will appear in the big text box.
Each item is called a thread. Go through every item to find the one whose text matches
the text in the game.

User-defined hooks:

Also called UserHook; AGTH uses this term in its thread.
When the default hooks don't give you the text you want, you will need to install a user-defined hook.
A special string is needed to tell ITH about the hook you want to install.
This string is called an H-code in AGTH terms. It usually depends on the game and its version.
Refer to the AGTH help and Freaka's video tutorial for further information.
ITH can handle AGTH H-codes, so if you have an AGTH H-code for a game, it will work in ITH as well.
Type this string into the process list (the shorter drop-down list) and press Enter, and a new hook will be installed.

Thread window:

You can manage thread linking information and comments here. Thread linking is a mechanism for merging threads.
Select the sender thread at the top, then select a thread in the "link to" list.
Click Set and a link will be created to that thread. Note that cyclic links, such as 1>2>3>1, are not permitted.
The link list shows all threads on this chain, one by one.
"Last sentence" contains the last sentence from a thread.
"Comment" is text that describes the thread.
After you comment on a thread, its name will change in the main window.

Hook window:

Clicking Hook in the main window opens a dialog to help you manage hooks.
It is intended for advanced users who are familiar with H-code internals.

H-code defined by Setx: /H[X]{A|B|W|S|Q|H}[N][data_offset[*drdo]][:sub_offset[*drdo]]@addr[:module[:{name|#ordinal}]]
addr -> Hook Address. data_offset -> Data Offset (when data_offset is negative, subtract a further 4 from it in this window, e.g. enter -8 for EAX, although EAX is still -4 in the H-code).
*drdo -> Data Indirection when it follows data_offset, Split Indirection when it follows sub_offset.
sub_offset -> Split Parameter; 4 is likewise subtracted when it is negative.
Module/Function Base (ITH original): fill in the hash values of the module and function name here. Enter a string in the blank on the right,
click Hash Module/Function, and the hash value is calculated and filled into these two blanks.
The checkboxes on the left enable the corresponding function.
The checkboxes on the right correspond to the charset options.
A -> Big Endian (ITH differs from the AGTH definition), B -> None, W -> Unicode,
S -> String, Q -> String & Unicode, H -> Hex value, N -> No context.
Last Char (ITH original): takes a string pointer and extracts the last character of that string.
Click Generate Code to see the H-code of this hook at the bottom.
Note that the module and name are strings in an AGTH H-code, but ITH cannot recover the strings from their hashes.
Click Remove Hook to remove the currently selected hook from the target process and clear all threads from that hook.
Click Modify Hook to modify the currently selected hook. In fact, the original hook is removed and ITH inserts a new hook based on the parameters in the hook window.
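
To make the H-code layout above easier to read, here is a rough sketch that splits a code into its fields. It is not ITH's or AGTH's real parser. The function name parse_hcode and the dictionary keys are made up for illustration, and it assumes that offsets and addresses are written in hexadecimal (which is what the register values such as -20 for EDI suggest).

```python
import re

# Hypothetical parser for the H-code grammar quoted above; not ITH's own code.
# Assumption: offsets and addresses are hexadecimal, with an optional minus sign.
_HEX = r"-?[0-9A-Fa-f]+"
HCODE_RE = re.compile(
    r"^/H(?P<x>X)?(?P<type>[ABWSQH])(?P<n>N)?"
    rf"(?:(?P<data_offset>{_HEX})(?:\*(?P<data_ind>{_HEX}))?)?"
    rf"(?::(?P<sub_offset>{_HEX})(?:\*(?P<sub_ind>{_HEX}))?)?"
    r"@(?P<addr>[0-9A-Fa-f]+)"
    r"(?::(?P<module>[^:]+)(?::(?P<function>.+))?)?$"
)


def parse_hcode(code):
    """Split an H-code string into its fields (illustrative only)."""
    m = HCODE_RE.match(code.strip())
    if m is None:
        raise ValueError(f"not a recognizable H-code: {code!r}")

    def to_int(s):
        # Missing optional fields stay None; present ones are read as hex.
        return None if s is None else int(s, 16)

    return {
        "x_flag": m["x"] is not None,
        "type": m["type"],
        "no_context": m["n"] is not None,
        "data_offset": to_int(m["data_offset"]),
        "data_indirection": to_int(m["data_ind"]),
        "sub_offset": to_int(m["sub_offset"]),
        "sub_indirection": to_int(m["sub_ind"]),
        "addr": to_int(m["addr"]),
        "module": m["module"],
        "function": m["function"],
    }
```

For example, parse_hcode("/HA4@123:foo.bar:abc") gives type A, data offset 4, address 0x123, module foo.bar and function abc. The hashed form with ! separators is not handled by this sketch.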

Profile:

After attaching to a process, you can add it to the profile. ITH will record its path.
If you enable auto inject in the option window, ITH will monitor processes and attach to any whose path has been recorded.
You can also assign up to 4 user-defined hook codes (H-codes) to a record.
If you enable auto insert in the option window, ITH will insert these hooks after attaching.
A hook code that contains a module/function name will be transformed to use hash values.
Both forms represent the same hook. The module name is case insensitive while the function name is case sensitive.
The original : is changed to ! to indicate that a hash value follows.
e.g. /HA4@123:foo.bar:abc -> /HA4@123!BD097770!C5840063

On the left is a list of all games you have attached to and added to the profile.
The other three boxes store information about remotely downloaded profiles.
Click Refresh to list all profiles stored locally. You can update this list with the updater.
Click a game on the left and click Find; ITH will look up the corresponding profile by the executable's hash value.
Then click Import to copy all the information and insert the hooks.

Option:

Split time: Time interval after which a line break is inserted. At least 100.
Process delay: ITH checks one process per interval to see whether it is in the profile. At least 50.
If there are N processes running on your system, one round takes N*PD.
This is the longest ITH can take to find a process in the profile after it has launched.
Inject delay: How long ITH waits before attaching once a process in the profile is found. At least 1000.
Insert delay: How long ITH waits before inserting hooks after attaching. At least 200.
Auto attach: ITH will attach to processes in the profile automatically.
Auto insert: ITH will insert hooks automatically after attaching. Note that auto insert does not work unless auto attach is enabled.
All times are in milliseconds.
Suppress: Enables the repetition suppression function. This handles cases like ABCABCABC.
Clipboard: ITH will copy the last sentence to the clipboard, so other tools that monitor the clipboard can make use of it.
Here "last sentence" means the characters from right after the last line break to the current character.
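
As a quick sanity check on the timing options above, this small sketch adds up the delays. Adding the inject and insert delays on top of the N*PD scan round is my reading of the descriptions, not something taken from ITH's code, and the function name is made up for illustration.

```python
# Minimum values as listed in the option descriptions (milliseconds).
MIN_PROCESS_DELAY = 50
MIN_INJECT_DELAY = 1000
MIN_INSERT_DELAY = 200


def worst_case_attach_ms(n_processes, process_delay=50,
                         inject_delay=1000, insert_delay=200):
    """Estimate the worst-case delays after a profiled game starts.

    Assumption: the delays simply add up one after another.
    """
    if process_delay < MIN_PROCESS_DELAY:
        raise ValueError("process delay must be at least 50 ms")
    if inject_delay < MIN_INJECT_DELAY:
        raise ValueError("inject delay must be at least 1000 ms")
    if insert_delay < MIN_INSERT_DELAY:
        raise ValueError("insert delay must be at least 200 ms")

    scan_round = n_processes * process_delay   # N*PD for one full round
    attached_by = scan_round + inject_delay    # then wait before attaching
    hooks_by = attached_by + insert_delay      # then wait before inserting hooks
    return {"scan_round": scan_round,
            "attached_by": attached_by,
            "hooks_by": hooks_by}
```

With 80 running processes and the minimum settings, one round takes 4000 ms, so under this reading hooks would be inserted at most about 5200 ms after the game starts.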

Global filter: A customizable filter that applies to all threads.
Currently only a single-character policy is implemented.
I may introduce more complex rules into ITH in the future.
All characters in the filter list are filtered out before text is dispatched to the corresponding thread,
so those characters will not appear in the final output.

A full-width space at the beginning is filtered by default. If it appears in the middle of a sentence,
add it to the global filter list explicitly.
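
The single-character policy can be pictured roughly like this. This is only a sketch and not ITH's implementation. The function name is made up, and it assumes that "full space" means the full-width space U+3000 and that "at the beginning" means the start of the text being filtered.

```python
FULL_WIDTH_SPACE = "\u3000"  # assumption: this is the "full space" meant above


def apply_global_filter(text, filter_chars):
    """Drop every character listed in filter_chars (illustrative only)."""
    # Leading full-width spaces are removed by default.
    text = text.lstrip(FULL_WIDTH_SPACE)
    # Everything in the filter list is dropped wherever it appears.
    return "".join(ch for ch in text if ch not in filter_chars)
```

A full-width space in the middle of a sentence survives unless it is in the list: apply_global_filter("\u3000あ\u3000い", "") keeps the middle space, while passing "\u3000" as the filter list removes it as well.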

Miscellaneous:

Top: ITH stays on top of other windows while this button is pushed in.
Clear: Wipes out the text in the current thread.
Save: Saves the profile for the current game.
This includes UserHooks, thread links, thread comments, and the currently selected thread.

Suspend/Terminate thread: You can suspend or terminate a thread of a process.
Select a thread and an operation type, then click Execute.
There is a box in the upper-right corner of the process dialog.
If you enter a function address there, the operation will be performed on all threads with that start address.

ITH is able to attach to multiple processes at the same time, although this seems useless for now.
If you close ITH while a program it is attached to is still running,
ITH will automatically attach to that program again the next time you open it.

Link: You can type L[num1]-[num2] in the command line (without the brackets, numbers only).
ITH will make a link from thread num1 to thread num2.
All text that thread num1 receives will also be sent to thread num2.
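
Here is a small sketch of how the link command and the "no cyclic link" rule from the thread window could fit together. It is not ITH's code. It assumes each thread has at most one outgoing link and reads the thread numbers as decimal, which ITH itself may not do (it could treat them as hexadecimal, for example). All names are made up for illustration.

```python
import re

LINK_RE = re.compile(r"^L(\d+)-(\d+)$")


def parse_link_command(cmd):
    """Turn 'L1-2' into (1, 2). Decimal numbers are an assumption."""
    m = LINK_RE.match(cmd.strip())
    if m is None:
        raise ValueError(f"not a link command: {cmd!r}")
    return int(m.group(1)), int(m.group(2))


def add_link(links, src, dst):
    """Record src -> dst in the dict 'links', refusing links that close a cycle."""
    node = dst
    # Walk the existing chain from dst; reaching src means a cycle like 1>2>3>1.
    while node is not None:
        if node == src:
            raise ValueError(f"link {src}>{dst} would create a cycle")
        node = links.get(node)
    links[src] = dst


def receivers(links, start):
    """List the thread that receives text plus every thread it is forwarded to."""
    chain = [start]
    while chain[-1] in links:
        chain.append(links[chain[-1]])
    return chain
```

After add_link(links, 1, 2) and add_link(links, 2, 3), text arriving at thread 1 reaches receivers(links, 1), which is [1, 2, 3], and add_link(links, 3, 1) is rejected.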

ITH removes single-character repetition, i.e. cases like AAABBBCCC....
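
To show the difference between the two kinds of repetition (AAABBBCCC here, and ABCABCABC for the Suppress option), here is a rough sketch. The detection rules are my own simple guesses and not the algorithm ITH actually uses, and the function names are made up.

```python
def collapse_char_repetition(text):
    """AAABBBCCC -> ABC when every character is repeated the same k times."""
    n = len(text)
    for k in range(n, 1, -1):  # try the largest repeat count first
        # Each block of k characters must consist of one repeated character.
        if n % k == 0 and all(text[i] == text[i - i % k] for i in range(n)):
            return text[::k]
    return text


def collapse_phrase_repetition(text):
    """ABCABCABC -> ABC when the whole text is one phrase repeated."""
    n = len(text)
    for p in range(1, n // 2 + 1):  # try the shortest phrase first
        if n % p == 0 and text == text[:p] * (n // p):
            return text[:p]
    return text
```

Text without a uniform repetition is returned unchanged, so collapse_char_repetition("ABC") stays "ABC". A naive rule like this cannot tell a real doubled character from an artifact when the whole text happens to be uniformly doubled.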

ITH64:

Based on worldwide data taken from Windows Update during June 2010, 46% of Windows 7 PCs run the 64-bit edition of Windows 7.
It is likely that more and more game engines will have a 64-bit version. One already exists (CMVS64).
Neither the current ITH nor AGTH can hook a 64-bit process, since both are 32-bit programs.
ITH64 is designed to address this problem. It is a native 64-bit program, and its internal architecture has been reworked to fit the 64-bit environment.
Although it is possible for ITH64 to hook 32-bit processes, I want to leave that task to the original ITH for now.
In other words, ITH64 will NOT hook ANY 32-bit process. Please use the original ITH instead.
Maybe at some point I will write a compatibility layer. Then you will need only ITH64 for all your hooking tasks.

Usage of ITH64 is almost the same as the original ITH. The only difference is how registers are represented in H-codes.
The original H-code has the following register map:
EAX -> -4, ECX -> -8 ... EDI -> -20
The new 64-bit version is as follows:
RAX -> 0, RCX -> -8 ... RDI -> -38, R8 -> -40 ... R15 -> -78
It becomes zero-based and the increment changes from 4 to 8.
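
The two register maps can be written out as a small sketch. Only the endpoints listed above come from this post. The registers in between are filled in using the usual x86 encoding order (AX, CX, DX, BX, SP, BP, SI, DI), which is an assumption that happens to match those endpoints. The function names are made up for illustration.

```python
# Assumed register order (standard x86 encoding order).
REGS32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]
REGS64 = (["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]
          + [f"R{i}" for i in range(8, 16)])


def offset32(reg):
    """Original H-code: starts at -4 for EAX and steps by 4."""
    return -4 * (REGS32.index(reg.upper()) + 1)


def offset64(reg):
    """ITH64 H-code: zero-based and steps by 8."""
    return -8 * REGS64.index(reg.upper())


def hcode_hex(value):
    """Format an offset the way it appears in an H-code (hex, sign in front)."""
    return f"-{-value:X}" if value < 0 else f"{value:X}"
```

For example, hcode_hex(offset32("EDI")) gives "-20" and hcode_hex(offset64("R15")) gives "-78", matching the maps above.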

Example code for the current CMVS64 engine:
/HA-40:-48@4E050:cmvs64.exe
This means that at 4E050 in module cmvs64, r8 contains the data and r9 stores the split parameter.
Be aware of the architecture differences when writing H-codes for ITH64.
I strongly recommend that new codes use a base-offset style to indicate the real address,
not only because addresses have become longer, but also to avoid problems when the target module is mapped at a random address.

Why ITH:

AGTH is a big success in text extraction.
With its UserHook function it can solve more than 95% of current text extraction issues.
But new games usually need an H-code for AGTH to work, and ordinary users have no way to write one.
ITH is designed to recognize many more game engines than AGTH and insert the proper hooks automatically.

1)ITH can now detect many popular game engines.
Currently KiriKiri, BGI, RealLive, ShinaRio, CMVS, MAJIRO, rUGP, Malie, NitroPlus, Lune, QLIE,
Apricot, CandySoft, AB2Try, Debonosu, System40, CIRCUS, AtelierKaguya, Waffle, YU-RIS,
TinkerBell, AbelSoftware, SofthouseChara, LiveMaker, Bruns, CaramelBox, Pensil.
More will be added later. If you find an engine ITH currently can't detect, feel free to request it here.
I will then study that engine and try to find a way to detect it.
Generally speaking, ITH works well without special codes for more than 70% of newly released games.

3)ITH is able to insert multiple UserHooks into the target process, while AGTH can insert only one.

4)ITH can join threads together as you wish (the Link function), while AGTH joins many together, sometimes including useless threads.
Since ITH is able to insert multiple UserHooks, this also means you can join text from different hooks together.
This is useful when the text processing function appears in different places.

5)ITH can detach from a process and remove/modify UserHooks while the process is running.
You don't need to restart the process when you find you have inserted the wrong hooks.
Bad hooks won't crash the process; they just yield an error message.
This means you can use trial and error to guess the hook code more efficiently.

6)ITH is open source and under development. More features will be added in future versions.

a) AGTH has an option to hook common system routines (/X?); ITH currently only hooks APIs in GDI32.dll.

IMPORTANT note:
I have submitted this program to VirusTotal, and some anti-virus software reports ITH as malware.
I use NOD32 and it reports nothing here. ITH uses some aggressive techniques that may also be used in viruses.
ITH requires administrative privileges to function properly, which means it has the potential to damage your computer.
I promise that the original ITH will not
1)spy on programs other than those you tell it to attach to,
2)create/modify/move/delete any files without an explicit prompt, other than "ITH.pro" and "ITH.ini", which reside in its folder
(in the case of ITH64, it will create "ITH64.pro" and "ITH64.ini" respectively),
3)create/write/delete any system registry keys,
4)send/receive any information through the network.
Make sure you have checked the hash values to ensure it is the original version.

From 2.2 on, the source code of ITH is under GPLv3. Older source is no longer available.
ITH is written in C++ and inline assembly, compiled with VC10.0.
A ready-to-compile project pack is also uploaded. Please get ntdll.lib and msvcrt.lib from the latest WDK.
Since I began developing ITH with VS10.0, it may be inconvenient for those on versions below 10.0.
I develop this program on Windows 7 64-bit, so it is expected to work well under both 32-bit and 64-bit OSes.

Comment

Great to see someone else playing with this kinda thing. Especially someone who releases source code.

Two things:
1) Might want to share your project files, to encourage other people to mess with your code.
2) If you want to be able to send text directly to TA and have it treated either like the clipboard (Single auto-translated context) without actually using the clipboard, or send text to TA like the internal hooking does (Contexts treated independently, so can set up TA's filters on a per-game basis - TA's are pretty bad, though, so might be better off with the first option), happy to put together a sockets interface to do so. No pipes, though. I really don't care for their locking behavior. This is something I've been thinking about doing for a while. Might be cool to make it so other programs can get text from TA, too, for things like devOSD.

Nice thing about not using the clipboard is you turn off TA's clipboard context, and then can modify the text in TA a bit, copying it to/from 3rd party windows without worrying about TA grabbing it when you don't want it to do so.

Comment

@kactaplb: See the section "Why ITH or why not". I hope it helps you get a brief picture of ITH compared with AGTH.

@ScumSuckingPig: I have arranged my project files and uploaded a pack.
As for designing an interface, I'm very glad to see TA provide a standardized API that lets third parties join in.
At first I also tried sockets for IPC, but every time I attached to a process it took a while to establish a connection.
Then I changed to named pipes, which seem to do this faster. I think I will try to write a separate DLL to handle
all socket communication and so prevent ITH itself from generating any network traffic.

@hyakki: Thank you for pointing that out.
Actually ITH does the same now. A manager dialog is planned;
maybe future versions will provide a more convenient way.

Comment

AGTH has a great feature. You can just add it to the shortcut for a game and that's it. Then you can use Windows to start the game. There's no better way to start games, because Windows is optimized for starting applications. I hope you add this functionality to ITH in the future, because using the GUI to attach to a process is bothersome.

Comment

ITH is now able to detect some popular game engines. Currently KiriKiri, BGI and RealLive.
I'm not sure whether the current heuristic method works well for every game.
If you find a game ITH doesn't work with, please post it here and I will try to improve it.
Current identification rules:
KiriKiri: game folder contains data.xp3
BGI: game folder contains bgi.hvl
RealLive: process name contains the substring "reallive"
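
The three rules above can be sketched like this. This only illustrates the heuristics as listed and is not the code in ITH's engine module. The function names, the case-insensitive matching and the order in which the rules are tried are my assumptions.

```python
import os


def identify_engine(file_names, process_name):
    """Guess the engine from the game folder's file names and the process name."""
    files = {name.lower() for name in file_names}
    if "data.xp3" in files:
        return "KiriKiri"
    if "bgi.hvl" in files:
        return "BGI"
    if "reallive" in process_name.lower():
        return "RealLive"
    return None  # unknown engine; a user-defined hook may be needed


def identify_engine_in_folder(game_dir, process_name):
    """Convenience wrapper that lists the folder on disk first."""
    return identify_engine(os.listdir(game_dir), process_name)
```

For example, identify_engine(["data.xp3", "game.exe"], "game.exe") returns "KiriKiri", while a folder with none of these markers and an unrelated process name returns None.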
Games I have tested:

Comment

Thanks for your work. From your log I noticed a fatal bug in my program...
When the PID of a process has 3 digits (<1000), engine detection may not work properly.
It is now fixed, and a new button has been added to switch the copy-to-clipboard function on and off.
The hex value issue is also resolved.

Comment

@setsumi: Thanks for pointing that out. It's now fixed. It happened when BGI doesn't import TextOut or GetGlyphOutline.
I forgot to return from some branches. I think I will test the games I list here more carefully from now on.

The System40 engine from Alicesoft is now supported.
Tested on [080229][Alicesoft]超昂閃忍ハルカ