I have a very tricky question. It looked easy at first, but I have not been able to achieve it. I have a PS file with millions of records in it. Consider that the file has two keys. The file is sorted ascending on one of the keys. I need to group the records on the second key without changing their order.

For example, the input file has data like this:

Code:

2AAA
2BBB
1CCC
2DDD
4EEE
6FFF
5GGG
4HHH

The file is sorted on bytes 2 to 4.

I need to group the data on the 1st byte, but without sorting it. I want the output data to look like this.
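To pin the requirement down, here is a minimal Python sketch of the regrouping applied to the sample input above. It illustrates the logic only and is not SORT control statements. It assumes the group key is the first byte, that groups appear in the order in which each key is first seen, and that the data fits in memory (which the real file would not); `group_in_first_seen_order` is a hypothetical helper name.

```python
def group_in_first_seen_order(records, key):
    """Group records by key, keeping groups in order of first appearance
    and records within each group in their original order."""
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for rec in records:
        groups.setdefault(key(rec), []).append(rec)
    # Flatten the groups back into a single list of records.
    return [rec for group in groups.values() for rec in group]


sample = ["2AAA", "2BBB", "1CCC", "2DDD", "4EEE", "6FFF", "5GGG", "4HHH"]
grouped = group_in_first_seen_order(sample, key=lambda r: r[0])
print(grouped)
# ['2AAA', '2BBB', '2DDD', '1CCC', '4EEE', '4HHH', '6FFF', '5GGG']
```

Under those assumptions, 2DDD moves up to join the other "2" records and 4HHH moves up behind 4EEE, while everything else keeps its relative position.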

The distance between records with the same key is not always the same, or small. The file I need to do this for is huge (about 1 million records). It's a VB file with record length 28237. The input data would look like this (sample of the first 30 bytes).

The first 8 bytes are always the same. I need to group the data on the 18 bytes starting at position 9, without sorting the data.

I tried taking the FIRSTDUP records using ICETOOL and then joining them to the actual file using SORT. Even then, the data gets sorted on the key specified in the join keys. (And I learnt that we cannot join without sorting the data.)

I should have thought of this before. My bad. I was trying to add a sequence number to the huge file and work it out from there. Now I have got the output I needed. Here is what I did.

1. Added a sequence number to the huge file and created a temp file with just the keys.
2. Used ICETOOL to extract the first entry for each key from the key file.
3. Joined the huge file with the key file and built my desired output.
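One possible reading of those three steps, sketched in Python rather than ICETOOL/JOINKEYS. The post does not say how the joined records were ordered, so the final ordering on (first sequence number of the key, own sequence number) is an assumption, as is the hypothetical function name `regroup_via_key_file`.

```python
def regroup_via_key_file(records, key):
    # Step 1: number every record and build a "key file" of (key, seq) pairs.
    numbered = list(enumerate(records, start=1))
    key_file = [(key(rec), seq) for seq, rec in numbered]

    # Step 2: keep only the first entry for each key.
    first_seq = {}
    for k, seq in key_file:
        first_seq.setdefault(k, seq)

    # Step 3: join each record to the first sequence number of its key, then
    # order by (first sequence number of the key, own sequence number).
    joined = [(first_seq[key(rec)], seq, rec) for seq, rec in numbered]
    joined.sort()
    return [rec for _, _, rec in joined]


sample = ["2AAA", "2BBB", "1CCC", "2DDD", "4EEE", "6FFF", "5GGG", "4HHH"]
result = regroup_via_key_file(sample, key=lambda r: r[0])
print(result)
```

The own sequence number keeps records of the same key in their original relative order, so the join can reorder freely without losing the input sequence.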

You haven't yet addressed the fact that your big file is VB. You'll have to use OUTFIL VTOF to get your fixed-length key file.

You're sorting the big boy twice, and reading it all another time.

I think I can get rid of the two SORTs and save on a lot of data-movement as you add sequence numbers, but it looks like it might be a once-off task, so if you have a working solution, unless you have to sit and watch it for 20 hours, you'd be good already.

A simple way to do it, with two SORTs, is to SORT on the key (with EQUALS) and then use IFTHEN=(WHEN=GROUP to propagate the first sequence number of a key across all records of that key, then SORT on that sequence number and strip them off.
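The logic of that two-SORT approach can be sketched in Python (an illustration only, not control statements). Python's `sorted` is stable, which stands in for EQUALS here; `itertools.groupby` stands in for WHEN=GROUP, and `regroup_with_two_sorts` is a hypothetical name.

```python
from itertools import groupby


def regroup_with_two_sorts(records, key):
    # Add a sequence number to every record.
    numbered = [(seq, rec) for seq, rec in enumerate(records, start=1)]

    # SORT 1: stable sort on the key, so records with equal keys keep
    # their original relative order (the EQUALS analogue).
    by_key = sorted(numbered, key=lambda item: key(item[1]))

    # WHEN=GROUP analogue: push the first sequence number of each key
    # group onto every record of that group.
    pushed = []
    for _, group in groupby(by_key, key=lambda item: key(item[1])):
        group = list(group)
        first_seq = group[0][0]
        pushed.extend((first_seq, rec) for _, rec in group)

    # SORT 2: stable sort on the pushed sequence number, then strip it off.
    by_first_seq = sorted(pushed, key=lambda item: item[0])
    return [rec for _, rec in by_first_seq]


sample = ["2AAA", "2BBB", "1CCC", "2DDD", "4EEE", "6FFF", "5GGG", "4HHH"]
two_sort_result = regroup_with_two_sorts(sample, key=lambda r: r[0])
print(two_sort_result)
```

Both sorts need to be stable: the first so that the lowest sequence number lands on the first record of each group, the second so that records sharing a pushed number stay in their original order.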

Extracting the keys becomes useful if you want to avoid the SORTs. Basically, you remove the data from where you don't want it (which doesn't change the original order of the first reference to each key) and insert the removed data after the last record of the first group of that key.

Can you please explain it more?? I am not familiar with IFTHEN=(WHEN=GROUP.

IFTHEN=(WHEN=GROUP is a way to mark a group of records. It comes with PUSH, which is similar in action to OVERLAY but can only use data from the current record or the specialised ID and SEQ (ID is a sequence number per group, SEQ a sequence number within the group).

It is documented in the SyncSORT manual, and you will find examples here and in the DFSORT part of the forum, and through your favourite internet search engine.

DFSORT has KEYBEGIN for WHEN=GROUP. SyncSORT does not/may not, but it can be emulated by a SEQNUM with RESTART= and then BEGIN= for zero in that position.
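As a rough picture of what ID and SEQ give you once a new group starts on each change of key, here is a small Python sketch. It assumes the input is already in key order and that both numbers start at 1, as I understand the defaults; `push_id_and_seq` is a hypothetical name, not a SORT feature.

```python
def push_id_and_seq(records, key):
    """Return (group_id, seq, record) triples: group_id numbers the groups
    from 1, seq numbers the records within each group from 1."""
    out = []
    group_id = 0
    seq = 0
    previous = object()  # sentinel that never equals a real key
    for rec in records:
        k = key(rec)
        if k != previous:  # key change: a new group begins here
            group_id += 1
            seq = 0        # restart the within-group sequence
            previous = k
        seq += 1
        out.append((group_id, seq, rec))
    return out


in_key_order = ["1CCC", "2AAA", "2BBB", "2DDD", "4EEE", "4HHH"]
marked = push_id_and_seq(in_key_order, key=lambda r: r[0])
for row in marked:
    print(row)
```

The restarting `seq` is the same idea as the emulation: a sequence number that restarts on key change marks the first record of each group by its starting value (1 in this sketch), and that value is what the group test looks for.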

Find the documentation, find some examples, experiment. If you have problems, ask a new question rather than continuing this one.

It would be great to know all the things which are now in 1.4.x which weren't there previously, and which of those are documented, or just work.

If you have something of a list, we can make it a "sticky" on this forum and extend it as more information becomes available. JNFnCNTL on JOINKEYS is supported but, I think, not documented, for instance.