OK, I have made a start. Not sure it is right, but getting there. Just reading comments at the moment.

Each page starts with 'OggS' and the comments normally begin in the second page.

1. Open the file and search for the second 'OggS'.
2. From this point, search for 'vorbis'.
3. The next four bytes are a little-endian number, so read these four bytes as a long.

In a hex editor you will see, as an example:

76 6F 72 62 69 73 34 00 00 00

which is 'vorbis' followed by a four-byte number. Read the right way round it is 00 00 00 34 (52 in decimal; remember it is little endian).

4. Read in the next 52 bytes as a UTF-8 encoded string. This is the vendor string.

5. Read the next four bytes as a little-endian number. This gives you the number of comments in the file.

6. The four bytes after this are the length of the next comment.

7. Read in this number of bytes as a UTF-8 encoded string.

Repeat steps 6 and 7 for the number of comments retrieved in step 5.

All comments are in the form 'TITLE=Tell Me Why'

The part before the = is the comment's title (its field name, not to be confused with the TITLE tag itself); the part after it is the comment content.
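The steps above can be sketched roughly as follows. This is Python rather than PureBasic, just to keep it short, and it assumes the whole comment header is already in one contiguous buffer with no page headers in between. The function name parse_vorbis_comments is made up for the illustration.

```python
import struct

def parse_vorbis_comments(packet):
    """Return (vendor, [(name, value), ...]) from the bytes of a comment header.

    Assumption: 'packet' holds the complete comment header in one piece.
    """
    pos = packet.find(b"vorbis")
    if pos < 0:
        raise ValueError("no 'vorbis' marker found")
    pos += len(b"vorbis")

    def read_u32():
        # Four bytes, little endian (steps 3, 5 and 6).
        nonlocal pos
        (value,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        return value

    def read_text():
        # A length followed by that many bytes of UTF-8 (steps 4 and 7).
        nonlocal pos
        length = read_u32()
        text = packet[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_text()
    comments = []
    for _index in range(read_u32()):
        # Split 'TITLE=Tell Me Why' at the first '=' only.
        name, _sep, value = read_text().partition("=")
        comments.append((name, value))
    return vendor, comments
```

Splitting at the first = only means a value that itself contains an = stays intact.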

Some files have picture metadata in them. I have not sorted this out yet, but it is still read the same way.

Writing new comments I have not looked into yet, but I think it goes like this: copy the file up to the comment start into a temp file; add the new comments, including the total and the length for each; then, from the end of the comments in the original, copy all remaining bytes to the temp file; delete the original file; and rename the temp file to the original name.
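As a rough sketch of the "add new comments including total and length for each" part only, here is the vendor string and comment list turned back into bytes. It is in Python for brevity, and build_comment_block is a made-up name. It assumes the caller adds whatever surrounds the block (the leading bytes up to and including 'vorbis', and the trailing framing byte the Vorbis spec asks for), and it does nothing about Ogg pages or their checksums.

```python
import struct

def build_comment_block(vendor, comments):
    """Serialise a vendor string and (name, value) pairs, lengths first."""

    def with_length(text):
        # Four-byte little-endian length followed by the UTF-8 bytes.
        raw = text.encode("utf-8")
        return struct.pack("<I", len(raw)) + raw

    block = with_length(vendor)
    block += struct.pack("<I", len(comments))  # total number of comments
    for name, value in comments:
        block += with_length(name + "=" + value)
    return block
```

Lengths are taken after encoding to UTF-8, so accented characters are counted in bytes, not characters.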

Here is the code for a module, and an example of its use, to read ogg comments. Please feel free to suggest improvements.

Code:

DeclareModule oggTags

  Structure tag
    Title.s
    Value.s
  EndStructure

  Declare GetComments(FileName.s, List Comments.tag())

EndDeclareModule

Module oggTags

  Structure ByteArray
    byte.b[0]
  EndStructure

  Procedure.i QuickSearch (*mainMem.ByteArray, mainSize.i, *findMem.ByteArray, findSize.i, startOff.i=0)
    ; -- Simplification of the Boyer-Moore algorithm;
    ;    searches for a sequence of bytes in memory
    ;    (not for characters, so it works in ASCII mode and Unicode mode)
    ; in : *mainMem: pointer to memory area where to search
    ;      mainSize: size of memory area where to search (bytes)
    ;      *findMem: pointer to byte sequence to search for
    ;      findSize: number of bytes to search for
    ;      startOff: offset in <mainMem>, where the search begins (bytes)
    ; out: offset in <mainMem>, where <findMem> was found (bytes);
    ;      -1 if not found
    ;      Note: The first offset is 0 (not 1)!
    ;
    ; after <http://www-igm.univ-mlv.fr/~lecroq/string/node19.html#SECTION00190>, 31.8.2008
    ; (translated from C to PureBasic by Little John)
    Protected i.i, diff.i
    Protected Dim badByte.i(255)

    ; Preprocessing
    For i = 0 To 255
      badByte(i) = findSize + 1
    Next
    For i = 0 To findSize - 1
      badByte(*findMem\byte[i] & #FF) = findSize - i
    Next

  Procedure.q FindInFile (infile.i, *find, findSize.i, startOff.q=0, bufferSize.i=4096)
    ;Code From Purebasic Forum By littleJohn
    ; -- Looks in <infile> for byte sequence at *find;
    ;    works in ASCII mode and Unicode mode.
    ; in : infile    : number of a file, that was opened for reading
    ;      *find     : pointer to byte sequence to search for
    ;      findSize  : number of bytes to search for
    ;      startOff  : offset in the file where the search begins (bytes)
    ;      bufferSize: size of used memory buffer (bytes)
    ; out: offset in the file, where byte sequence at *find was found (bytes),
    ;      -1 if byte sequence at *find was not found in <infile>,
    ;      -2 on error
    ;      Note: The first offset is 0 (not 1)!
    Protected *buffer
    Protected offset.q, move.i, bytes.i

What's not entirely clear to me is how to handle ogg files with multiple streams inside them. Do you know if an ogg file can contain multiple vorbis streams?

I also read that a comment header can span multiple pages. Do you know if the multiple pages of a comment header always follow each other directly when there are multiple streams, or can a page from another stream be multiplexed in between them?

The biggest problem I have had is the terminology used: it differs from one explanation to another, so this is my explanation.

An ogg file is split into pages. Every page begins with "OggS" followed by a version number byte (always 0). Each page ends when another page starts.

The little routine I wrote looks for a pattern in a file and returns a list of offsets, one for the position in the file of each occurrence of the pattern.

So for example the third offset returned is the start of the third page and is also the end of the second page.
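The idea of turning pattern offsets into page boundaries can be sketched like this. It is Python, not my PureBasic routine, and it works on a buffer already in memory. The names find_offsets and page_spans are made up. Bear in mind "OggS" could also turn up by chance inside the audio data, so treat this as a rough guide only.

```python
def find_offsets(data, pattern=b"OggS"):
    """Return the offset of every occurrence of pattern in data."""
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets

def page_spans(data, offsets):
    """Pair each page start with its end (the next start, or end of data)."""
    ends = offsets[1:] + [len(data)]
    return list(zip(offsets, ends))
```

With three pages found, the second span ends exactly where the third one starts.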

Now each page actually contains two parts, a header and segments.

First the segments. A segment is a block of 0 to 255 bytes. Each page can contain up to 255 segments.

Now the header. The header is of variable length. You will read that the header is fixed at 27 bytes, but it also has a segment table after the 27 bytes, whose length is determined by the number of segments used.

It is the 27 bytes you are interested in for the streams. Here is a header structure to explain.

The interesting parts are the stream serial number and the page number. I find it easier to call this the sequence: it counts up from 0 for the first page of the stream, in increments of 1.
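For anyone who prefers to see the 27 bytes spelled out, here is a minimal sketch of reading them in Python with the struct module. The field names and the function parse_page_header are my own labels for the illustration, not official ones.

```python
import struct
from collections import namedtuple

# Field names are labels chosen for this sketch.
PageHeader = namedtuple(
    "PageHeader",
    "capture version header_type granule_position serial sequence crc segment_count",
)

# '<' = little endian, no padding: 4 + 1 + 1 + 8 + 4 + 4 + 4 + 1 = 27 bytes.
HEADER_FORMAT = "<4sBBqIIIB"

def parse_page_header(data, offset=0):
    """Return (header, segment_table) for the page starting at offset."""
    header = PageHeader(*struct.unpack_from(HEADER_FORMAT, data, offset))
    if header.capture != b"OggS":
        raise ValueError("not at the start of an ogg page")
    table_start = offset + struct.calcsize(HEADER_FORMAT)
    segment_table = data[table_start:table_start + header.segment_count]
    return header, segment_table
```

The segment table is returned separately because its length depends on the last of the 27 bytes.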

To assemble a stream completely you would have to go through the whole file, finding all the pages with the same stream serial number, and then assemble them in the order dictated by the page number (sequence).

If there is a second stream there will be pages with a different stream serial number.
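A rough sketch of that assembly, again in Python and on a buffer in memory. The function names are made up, and it assumes the pages sit back to back from the start of the buffer with intact headers. It does not check the CRC.

```python
import struct
from collections import defaultdict

def iter_pages(data):
    """Yield (serial, sequence, body) for each page found back to back."""
    offset = 0
    # Stops quietly at the first position that is not a page start.
    while offset + 27 <= len(data) and data[offset:offset + 4] == b"OggS":
        # Serial is at bytes 14-17 and sequence at bytes 18-21 of the header.
        serial, sequence = struct.unpack_from("<II", data, offset + 14)
        segment_count = data[offset + 26]
        table = data[offset + 27:offset + 27 + segment_count]
        body_start = offset + 27 + segment_count
        body_end = body_start + sum(table)  # each table entry is a segment size
        yield serial, sequence, data[body_start:body_end]
        offset = body_end

def group_streams(data):
    """Return {serial: page bodies joined in sequence order}."""
    streams = defaultdict(list)
    for serial, sequence, body in iter_pages(data):
        streams[serial].append((sequence, body))
    return {
        serial: b"".join(body for _seq, body in sorted(pages, key=lambda page: page[0]))
        for serial, pages in streams.items()
    }
```

Sorting by sequence means the pages of one stream come out in order even if pages of another stream are mixed in between.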

Now vorbis. As I understand it, a vorbis stream is comments followed by stream data. Comments are not mixed up in the vorbis stream; they are only at the beginning.

So for any vorbis stream, find the length of the comments and then you have the start of the stream data.

Of course, we are talking vorbis in an ogg file here. I can find nothing which guarantees the position of an ogg page, so, given the stream serial and sequence, they can be anywhere in any order.

Except page 0, of course. Pages of different streams can then be mixed and are sorted out when read.

Comments can logically span multiple pages; this is worked out using the segment table in a page header.

Vorbis comments are always first in a vorbis stream, so find the page of the stream you are interested in, then calculate the length of the comments section.

Here is a typical segment table

10 D3 FF FF FF FF FF FF FF FF FF FF FF FF FF FF

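The first byte is 10 (16 decimal), showing that 16 segments are used in this page. (This confused me at first, as it looks like only 16 - 1 are used; the 10 is the count itself, so this row shows only the first 15 of the 16 entries.)

After the 10, start a running total at zero (not at 10) and read each byte in turn. If the byte is FF, add it to the total and carry on. If it is less than FF, add it to the total and stop: the total is the length of this section, i.e. the comments in this case.

In the example the next byte is D3, which is < FF, so the comments use D3 (211) bytes.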

The next byte in a vorbis stream is the start of the stream data.

If every entry in the segment table is FF, you have to move on to the next page in the stream and keep reading to get the comments length.
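The running total over the segment table can be sketched as below (Python, made-up function name). It takes the table entries only, without the count byte in front, and also reports how many bytes are left over for a section that carries on into the next page.

```python
def packet_lengths(segment_table):
    """Return (lengths of sections finished in this page, bytes carried over).

    segment_table holds the table entries only, not the count byte before them.
    """
    lengths = []
    total = 0
    for entry in segment_table:
        total += entry
        if entry < 0xFF:
            # An entry below FF closes the current section.
            lengths.append(total)
            total = 0
    # A non-zero total here belongs to a section that continues on the next page.
    return lengths, total
```

For the table in the example, D3 followed by fifteen FF entries, this gives one finished section of 211 bytes and 3825 bytes belonging to a section that continues on the following page.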

Hope this helps

CD

_________________Any intelligent fool can make things bigger and more complex. It takes a touch of genius — and a lot of courage to move in the opposite direction.

When talking about ogg vorbis we are talking about two separate things.

ogg is a container which can hold many types of file. I imagine it as a piece of paper with writing on it, which can be seen as a standard text file with the extension .txt. I can put this sheet into an envelope; now, to read the text, I have to open the envelope to get at the piece of paper. The envelope in this case is called .ogg, so to read the text file I have to take it out of the ogg.

Confusion starts as vorbis have hogged the ogg envelope (forgive the pun).

You can pack any file into an ogg container.

vorbis is a data compression format, especially for audio files I think. You could have a .vorbis file, the inside of which would look like:

vorbis comments|vorbis bitstream

ogg vorbis is a vorbis bit stream packed into an ogg container.

Ah more mud.

CD


I think the biggest problem when editing comments would be handling comment packets that span multiple pages, especially if the updated comment packet consists of a different number of pages compared to the original one. Since the crc also seems to include the page number, you would have to recalculate the crc for all pages where the page number changes.
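To make the crc point concrete, here is a small Python sketch of renumbering one page and recalculating its checksum. It uses the Ogg CRC settings as I understand them from the Ogg spec (polynomial 0x04C11DB7, start value 0, no bit reversal, no final XOR, CRC field zeroed while calculating). The function names are made up, and the bit-by-bit loop is written for clarity, not speed.

```python
def ogg_crc(data):
    """CRC-32 as used by ogg pages: polynomial 0x04C11DB7, start value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _bit in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

def renumber_page(page, new_sequence):
    """Return a copy of one whole page with a new page number and a fresh crc."""
    buf = bytearray(page)
    buf[18:22] = new_sequence.to_bytes(4, "little")  # page number field
    buf[22:26] = bytes(4)                            # crc field is zero while calculating
    buf[22:26] = ogg_crc(buf).to_bytes(4, "little")
    return bytes(buf)
```

The checksum runs over the whole page, header and data, which is why changing only the page number still forces a new crc.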

Luckily, comments on an audio file rarely exceed 2 KB, but I am looking at taking it easy and padding the comments page to get 8 KB of comment space, then adding the original or edited comments back. There is only one page to deal with then. I have to remember that a cover art image can be encoded as a comment, so the size available is limited (resized images, not whole hi-def images).

The first page of a stream always seems to be loaded with 15 segments of preamble for the bitstream, with the comments normally taking up just one or two segments. So, with 255 segments available in the page, padding the comments section to use 32 segments should not be a problem. I cannot see comments taking up more than this in every ogg vorbis file.

I am doing this to extend the ogg player I am writing here: https://www.purebasic.fr/english/viewtopic.php?f=12&t=73239. I have thought about what to do when people want more info than comments can deliver: maybe add a link to a relevant CDDB page as a comment, and add a link to a personal music database on the client's machine where they can store as much as they like. There needs to be a balance between what is easily achievable and client expectations.

Most other ogg players will simply ignore these comments; some may display them, but they will only have meaning in my little player.

CD


;The offset is iLoop i.e. where in this loop we are at plus the amount of data already read
FoundAt() = iLoop + DataAmount

EndIf

Next iLoop

If DataRead = BufferSize ;Not End Of File
  If SearchPosition = 0 ;If this is the first block to stop negative FileSeek()
    SearchPosition = BufferSize - (SLength - 1)
    ;SLength - 1 just in case string found in last few bytes so next block read will not include whole string
    ;but will include the whole string if the string crosses the read boundary
  Else
    ;Same as above but totaling all data read
    SearchPosition = SearchPosition + (BufferSize - (SLength - 1))
  EndIf

;Set DataAmount to be actual amount of data read for this loop
DataAmount = SearchPosition

;SLength is the length of the buffer required for the string in a particular format
SLength = StringByteLength(StringToFind,#PB_UTF8) ;Zero string terminator not counted
*SearchBuffer = AllocateMemory(SLength)
PokeS(*SearchBuffer,StringToFind,SLength,#PB_UTF8|#PB_String_NoZero)

;Used to search in a .txt file
FindPatternInFile(FileToSearch,*SearchBuffer,SLength,OffSets())

ShowStreamSerial(FileToSearch)

EndIf

Select an ogg file.

The debug output shows each page found, with its stream serial and segments used.

I do not have a multistream ogg file, so my serial is always the same, but if you have one then two different serials will show on the pages where they are. It does not count the number of streams or save offsets to load each stream.

Cd


After some more reading online... If I understand correctly, an ogg file can have multiple streams but only one vorbis stream at a time. Chaining, however, is allowed, so you can have multiple vorbis streams sequentially in one ogg file that should be played one after the other.

Do you have an ogg file with embedded artwork? I tried to find one online to test with but can't find one.

If I am reading the vorbis spec correctly, there are three required vorbis (not ogg) headers, each beginning with 'vorbis' (after a one-byte packet type). The last of these finishes the ogg page on which it resides, with the audio data beginning on the next ogg page.

The file I sent you does indeed have comments spanning two pages. So, with a variable number of pages depending on the length of the comments, I will have to redo all pages in the ogg file if the number of pages changes. Argh!

Thanks wilbert, great.


The idea is that a vorbis stream in an ogg file can be extracted. It has four sections in total.

1. An identity header
2. A comments header
3. A header for the codec
4. The audio stream

Can you explain a bit more what exactly you want? Extracting a stream, if the ogg file has multiple streams, should be as simple as extracting all pages with the same bitstream serial number. For your initial idea of editing a comment tag, this shouldn't be required. Editing the comments header and updating the page number of all following pages should be sufficient.