It could be good to have a small client along with it, just like BearSSL has brssl. At least for testing that it works across the whole chain.

If libgbt needs other libraries, this starts a library chain: someone writes a more "dev friendly" wrapper library over libgbt to build their client, which is in turn used in a GUI widget to stream videos, itself used in a larger program.

What about an open and free implementation of sha1.c and md5.c in the repo?

Maybe it could skip md5 support entirely, as it is quite deprecated in the RFC.

Having a client shipped with it is indeed a good idea. The library is still too young for that, though.
It's funny you're talking about a free impl. of SHA1, as I just started importing sha1.c from libtomcrypt!
For MD5, we could do the same if needed.

r4ndom is working on the bencode() function right now. I wrote a small program to dump a torrent's content to stdout, but it's not really useful. I don't think it's a good idea to include such tools in the repo, as we'd end up with many tools for no practical reason. I prefer keeping them outside the tree until the lib is feature complete.

As for the prefixes, I settled on simply 'b' for bencoding-related functions. And no, a list of prefixes in a comment shouldn't be needed, as it should be obvious when reading the existing code. That's also why the first patches someone submits should be reviewed by existing contributors.

(06-08-2017, 07:10 AM)z3bra Wrote: As for the prefixes, I settled on simply 'b' for bencoding related function

Perfect :)

(06-08-2017, 07:10 AM)z3bra Wrote: to include them in the repo

Maybe some could be turned into tests.

It may be a bit early to think about it, but this came on its own while reading about bittorrent.
Once we start to transfer data from a peer, how do we store it? Here is a proposition:

One approach is to store the parts in files and directories: a part gets downloaded to a memory buffer, and once it is complete, it gets saved to the disk as <hash of torrent>/<hash of the part>:

Code:

|-- 2072a695613e5103d9ac03c2885c5e2656cb5ff0          # hash of the torrent #1
|   |
|   |-- 80c50e142f978130d9a69b4a15267896f0d72abe      # parts of the torrent
|   |-- d345da49668dbfc05d559e18e5d5975951fc41ac      # named after their hash
|   `-- 376456435d4ceec3acb6ab963107280ef80aca1b      # one file per part
|
`-- 903e39fd73579cfe5a2d97daa6ec9bcc61cd01cc          # hash of the torrent #2
    |
    `-- 376456435d4ceec3acb6ab963107280ef80aca1b      # has the same part as #1

Advantages:

It permits multiple workers (threads, processes...) to read a torrent's parts at the same time.

Very intuitive and transparent for the end user or the dev writing a client: save part <phash> of a torrent <thash> == save to somedir/thash/phash

Very easy to check the integrity of the parts:
filename_of_part == computed_hash_of_part

Low memory usage: only the part currently being downloaded needs to be cached in a memory buffer, the rest goes to the disk.
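
Here is a rough sketch of that layout in C, just to show how little code it takes. The names hash_hex() and save_part() are made up for the example and are not part of libgbt; it assumes 20-byte SHA1 hashes and that the top directory already exists. The part is written under a temporary name and renamed at the end, so that a reader never sees a half-written part.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define HASH_LEN 20 /* SHA1 digest size in bytes */

/* Write the hex form of a 20-byte hash into out (41 bytes, NUL included). */
void
hash_hex(const unsigned char hash[HASH_LEN], char out[2 * HASH_LEN + 1])
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < HASH_LEN; i++) {
		out[2 * i]     = digits[hash[i] >> 4];
		out[2 * i + 1] = digits[hash[i] & 0x0f];
	}
	out[2 * HASH_LEN] = '\0';
}

/*
 * Save a complete part held in memory as <dir>/<thash>/<phash>.
 * Hypothetical helper: returns 0 on success, -1 on error.
 */
int
save_part(const char *dir, const unsigned char thash[HASH_LEN],
    const unsigned char phash[HASH_LEN], const void *data, size_t len)
{
	char th[2 * HASH_LEN + 1], ph[2 * HASH_LEN + 1];
	char tdir[1024], path[1024], tmp[1024];
	const char *p = data;
	ssize_t n;
	int fd;

	hash_hex(thash, th);
	hash_hex(phash, ph);
	if (snprintf(tdir, sizeof(tdir), "%s/%s", dir, th) >= (int)sizeof(tdir) ||
	    snprintf(path, sizeof(path), "%s/%s", tdir, ph) >= (int)sizeof(path) ||
	    snprintf(tmp, sizeof(tmp), "%s.tmp", path) >= (int)sizeof(tmp))
		return -1; /* path too long */

	/* one directory per torrent, created on first use */
	if (mkdir(tdir, 0755) < 0 && errno != EEXIST)
		return -1;
	if ((fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0)
		return -1;
	while (len > 0) {
		n = write(fd, p, len);
		if (n < 0) {
			close(fd);
			unlink(tmp);
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	/* rename() replaces the target atomically on the same filesystem */
	if (close(fd) < 0 || rename(tmp, path) < 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```

Checking the integrity is then a matter of hashing the file content (with whatever SHA1 implementation ends up in the tree) and comparing the hex result with the file name.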

Disadvantages:

Memory usage could be improved by writing data to the files as it is received, at the price of atomicity, or by keeping a file that lists the complete parts, but that is less intuitive.

Parts that are identical across multiple torrents get downloaded twice.

What other problems do you see? Do you want another design?

To avoid downloading shared parts twice, it would be possible to store every single part in one directory common to all the torrents, but then a race condition could occur: if two processes/threads download the same part at the same time, the first one to finish writes it to the disk while the other is still fetching the same data.

Instead, before starting a download, a worker could look in the part directories of the other torrents to see whether the part already exists there.
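
Such a lookup could be sketched like this, assuming the one-directory-per-torrent layout above. find_part() is a made-up name, not a libgbt function, and it only checks that a file with the right name exists (it does not re-hash it).

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Look for a part named phash (hex string) in every torrent directory
 * under dir. Returns 1 and stores the path of the first copy found in
 * out, 0 if there is no copy (out is then unspecified), -1 if dir
 * cannot be read.
 */
int
find_part(const char *dir, const char *phash, char *out, size_t len)
{
	struct dirent *e;
	struct stat st;
	DIR *d;
	int found = 0;

	if ((d = opendir(dir)) == NULL)
		return -1;
	while (!found && (e = readdir(d)) != NULL) {
		if (e->d_name[0] == '.')
			continue; /* skip ".", ".." and hidden entries */
		if (snprintf(out, len, "%s/%s/%s", dir, e->d_name, phash) >= (int)len)
			continue; /* path does not fit in out */
		if (stat(out, &st) == 0 && S_ISREG(st.st_mode))
			found = 1;
	}
	closedir(d);
	return found;
}
```

If a copy is found, the worker can hard-link or copy it into its own torrent directory instead of asking a peer for it.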

I would go for the simplest way, with one directory per torrent, which still permits optimizations.

After talking a bit on ##bittorrent @freenode, I learned how clients seem to implement it:

Some put parts in multiple files in some way or another (like above).

But most are putting the parts directly in the torrent file:

1 - Write parts at the beginning of the torrent file (the full data blob, not the .torrent metainfo file), and sort them as they come:

Let's say I have part 3: I put it at the beginning of a new file

Then I get part 1, I put it in front of part 3

Finally I get part 2, and put it in-between

This temporarily saves space on disk, at the risk of a "not enough space on disk" error later if the file gets too big. You need to keep track of where the parts are, and wait until the sorting is finished before reading/writing further parts. It looks quite complex.

2 - Or they allocate storage for the file (such as an empty 2GB file) and fill it with the parts as they come, writing them at the correct offset. This way is much simpler: as you have a list of which part goes where, there is no sorting involved: look up where the part should go, and you also know where to read it from.
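
A small sketch of approach 2 in C, to show the offset arithmetic. The function names are made up for the example. It assumes every part has the same nominal length piece_len (only the last one may be shorter), so part number index starts at byte index * piece_len. Note that ftruncate() only sets the file size and may leave a sparse file; posix_fallocate() would be the call to actually reserve the blocks.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create the data file and give it its final size. Returns fd or -1. */
int
alloc_data_file(const char *path, off_t total)
{
	int fd;

	if ((fd = open(path, O_RDWR | O_CREAT, 0644)) < 0)
		return -1;
	if (ftruncate(fd, total) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Write part number index at its final position. 0 on success, -1 on error. */
int
write_part_at(int fd, size_t index, size_t piece_len,
    const void *data, size_t len)
{
	const char *p = data;
	off_t off = (off_t)index * (off_t)piece_len;
	ssize_t n;

	while (len > 0) {
		n = pwrite(fd, p, len, off);
		if (n < 0)
			return -1;
		p += n;
		off += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Read a part back from the same offset, e.g. to send it to a peer. */
int
read_part_at(int fd, size_t index, size_t piece_len, void *data, size_t len)
{
	char *p = data;
	off_t off = (off_t)index * (off_t)piece_len;
	ssize_t n;

	while (len > 0) {
		n = pread(fd, p, len, off);
		if (n <= 0)
			return -1; /* error or unexpected end of file */
		p += n;
		off += n;
		len -= (size_t)n;
	}
	return 0;
}
```

pwrite() and pread() take the offset as an argument and do not move the file position, so several workers can share the same descriptor without seeking.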

With this latter approach, multifile torrents can be handled in two ways:

Fill one big file with all the parts, and split it when all the parts are there.

Directly split the parts as they come.
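
Splitting the parts as they come boils down to mapping a part's byte range onto the files, since the parts are cut from the concatenation of all the files in metainfo order. Here is a sketch of that mapping; struct span and part_spans() are made-up names for the example, and the file lengths are assumed to have been read from the metainfo already.

```c
#include <stddef.h>
#include <stdint.h>

struct span {
	size_t   file;   /* index into the torrent's file list */
	uint64_t offset; /* where to write inside that file */
	uint64_t length; /* how many bytes of the part go there */
};

/*
 * Split the byte range covered by part number index over the files of a
 * multifile torrent. file_len[] holds the nfiles file lengths in
 * metainfo order. Fills out[] with at most max spans and returns how
 * many were written.
 */
size_t
part_spans(const uint64_t *file_len, size_t nfiles, uint64_t piece_len,
    uint64_t index, struct span *out, size_t max)
{
	uint64_t start = index * piece_len; /* range in the concatenated data */
	uint64_t end = start + piece_len;
	uint64_t fstart = 0;                /* where the current file begins */
	size_t i, n = 0;

	for (i = 0; i < nfiles && n < max; i++) {
		uint64_t fend = fstart + file_len[i];
		uint64_t lo = start > fstart ? start : fstart;
		uint64_t hi = end < fend ? end : fend;

		if (lo < hi) { /* the part overlaps this file */
			out[n].file = i;
			out[n].offset = lo - fstart;
			out[n].length = hi - lo;
			n++;
		}
		fstart = fend;
	}
	return n;
}
```

For example, with files of 5, 10 and 3 bytes and 8-byte parts, part 0 gives 5 bytes at the start of file 0 plus 3 bytes at the start of file 1. The last part comes out shorter on its own, because the range is clipped at the end of the last file.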

These two approaches (1- and 2-) have an advantage: no need to keep the part files (which cost a lot of storage [EDIT] and inodes). On the other hand, if the final file is moved, it cannot be seeded anymore.

If it was me, I would still do one file per part, but you are not me. :)