Hi y'all,

Alex Akselrod and I would like to propose a new light client BIP for consideration:

* https://github.com/Roasbeef/bips/blob/master/gcs_light_client.mediawiki

This BIP proposal describes a concrete specification (along with reference implementations [1][2][3]) for the much discussed client-side filtering reversal of BIP-37. The precise details are described in the BIP, but as a summary: we've implemented a new light-client mode that uses client-side filtering based off of Golomb-Rice coded sets. Full nodes maintain an additional index of the chain, and serve this compact filter (the index) to light clients which request them. Light clients then fetch these filters, query them locally, and _maybe_ fetch the block if a relevant item matches. The cool part is that blocks can be fetched from _any_ source once the light client deems it necessary. Our primary motivation for this work was enabling a light client mode for lnd [4] in order to support a more lightweight back end, paving the way for the usage of Lightning on mobile phones and other devices. We've integrated neutrino as a back end for lnd, and will be making the updated code public very soon.

One specific area we'd like feedback on is the parameter selection. Unlike BIP-37, which allows clients to dynamically tune their false positive rate, our proposal uses a _fixed_ false-positive rate. Within the document, it's currently specified as P = 1/2^20. We've done a bit of analysis attempting to optimize the following sum: filter_download_bandwidth + expected_block_false_positive_bandwidth. Alex has made a JS calculator that allows y'all to explore the effect of tweaking the false positive rate in addition to the following variables: the number of items the wallet is scanning for, the size of the blocks, the number of blocks fetched, and the size of the filters themselves. The calculator computes the expected bandwidth utilization using the CDF of the geometric distribution. The calculator can be found here: https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical script he's been running on actual data, and the results seem to match up rather nicely.
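As a rough, non-normative sketch of the quantity being optimized: a client always downloads the filter, and additionally downloads the block whenever at least one of its m watched items false-positive matches, which happens with probability 1 - (1-P)^m (the geometric CDF evaluated at m). The helper below is illustrative only; the filter and block sizes in main are made-up numbers:

package main

import (
	"fmt"
	"math"
)

// expectedBandwidth estimates the per-block cost a light client pays:
// the filter is always fetched, and the block is fetched whenever at
// least one of m watched items matches falsely. With per-item rate p,
// that probability is 1 - (1-p)^m.
func expectedBandwidth(filterBytes, blockBytes float64, m int, p float64) float64 {
	pFetch := 1 - math.Pow(1-p, float64(m))
	return filterBytes + blockBytes*pFetch
}

func main() {
	p := 1.0 / (1 << 20) // the fixed P = 1/2^20 from the proposal
	// Hypothetical inputs: a 15 KB filter, a 1 MB block, 1000 watched items.
	fmt.Printf("%.2f bytes/block\n", expectedBandwidth(15000, 1e6, 1000, p))
}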

We were excited to see that Karl Johan Alm (kallewoof) has done some (rather extensive!) analysis of his own, focusing on a distinct encoding type [5]. I haven't had the time to dig into his report yet, but I think I've read enough to extract the key difference in our encodings: his filters use a binomial encoding _directly_ on the filter contents, while we instead create a Golomb-coded set with the contents being _hashes_ (we use siphash) of the filter items.
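To make the construction concrete, here is a minimal, non-normative sketch in Go of the encode side: hash each item into a uniform range, sort, then Golomb-Rice code the deltas (quotient in unary, P-bit remainder). The BIP itself specifies SipHash keyed by the block hash and the exact serialization; the stand-in hash below is illustrative only:

package main

import (
	"fmt"
	"sort"
)

// bitWriter is a tiny MSB-first bit accumulator.
type bitWriter struct {
	bytes []byte
	nbits uint
}

func (w *bitWriter) writeBit(b uint64) {
	if w.nbits%8 == 0 {
		w.bytes = append(w.bytes, 0)
	}
	w.bytes[len(w.bytes)-1] |= byte(b&1) << (7 - w.nbits%8)
	w.nbits++
}

func (w *bitWriter) writeBits(v uint64, n uint) {
	for i := n; i > 0; i-- {
		w.writeBit(v >> (i - 1))
	}
}

// buildGCS sketches the filter construction: each item is hashed to a
// value uniform in [0, N*2^P), the values are sorted, and the deltas
// between successive values are Golomb-Rice coded with parameter P.
// hashItem stands in for the keyed siphash the BIP specifies.
func buildGCS(items [][]byte, p uint, hashItem func([]byte, uint64) uint64) []byte {
	n := uint64(len(items))
	modulus := n << p // N * 2^P
	vals := make([]uint64, 0, n)
	for _, it := range items {
		vals = append(vals, hashItem(it, modulus))
	}
	sort.Slice(vals, func(i, j int) bool { return vals[i] < vals[j] })

	var w bitWriter
	var last uint64
	for _, v := range vals {
		delta := v - last
		last = v
		for q := delta >> p; q > 0; q-- { // unary quotient
			w.writeBit(1)
		}
		w.writeBit(0)                    // unary terminator
		w.writeBits(delta&((1<<p)-1), p) // P-bit remainder
	}
	return w.bytes
}

func main() {
	// Toy stand-in hash (FNV-1a); the real filter uses SipHash keyed by
	// the block hash.
	toyHash := func(b []byte, modulus uint64) uint64 {
		var h uint64 = 14695981039346656037
		for _, c := range b {
			h = (h ^ uint64(c)) * 1099511628211
		}
		return h % modulus
	}
	filter := buildGCS([][]byte{[]byte("a"), []byte("b"), []byte("c")}, 20, toyHash)
	fmt.Printf("%d items -> %d filter bytes\n", 3, len(filter))
}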

Using a fixed fp=20, I have some stats detailing the total index size, as well as averages for both mainnet and testnet. For mainnet, using the filter contents as currently described in the BIP (basic + extended), the total size of the index comes out to 6.9GB. The breakdown is as follows:

* total size: 6976047156
* total avg: 14997.220622758816
* total median: 3801
* total max: 79155
* regular size: 3117183743
* regular avg: 6701.372750217131
* regular median: 1734
* regular max: 67533
* extended size: 3858863413
* extended avg: 8295.847872541684
* extended median: 2041
* extended max: 52508

In order to consider the average+median filter sizes over a more recent portion of the chain, the stats are:

* total size: 2753238530
* total avg: 5918.95736054141
* total median: 60202
* total max: 74983
* regular size: 1165148878
* regular avg: 2504.856172982827
* regular median: 24812
* regular max: 64554
* extended size: 1588089652
* extended avg: 3414.1011875585823
* extended median: 35260
* extended max: 41731

Finally, here are the testnet stats, which take into account the increase in the maximum filter size due to segwit's block-size increase. The max filter sizes are a bit larger due to some of the habitual blocks I created last year when testing segwit (transactions with 30k inputs, 30k outputs, etc.):

* total size: 585087597
* total avg: 520.8839608674402
* total median: 20
* total max: 164598
* regular size: 299325029
* regular avg: 266.4790836307566
* regular median: 13
* regular max: 164583
* extended size: 285762568
* extended avg: 254.4048772366836
* extended median: 7
* extended max: 127631

For those that are interested in the raw data, I've uploaded a CSV file of raw data for each block (mainnet + testnet), which can be found here:

* mainnet (14MB): https://www.dropbox.com/s/4yk2u8dj06njbuv/mainnet-gcs-stats.csv?dl=0
* testnet (25MB): https://www.dropbox.com/s/w7dmmcbocnmjfbo/gcs-stats-testnet.csv?dl=0

We look forward to getting feedback from all of y'all!

-- Laolu

[1]: https://github.com/lightninglabs/neutrino
[2]: https://github.com/Roasbeef/btcd/tree/segwit-cbf
[3]: https://github.com/Roasbeef/btcutil/tree/gcs/gcs
[4]: https://github.com/lightningnetwork/lnd/

Thanks for sending this proposal! I look forward to having a great discussion around this.

- Eric


Quick comment before I finish reading it completely: looks like you have a way to match the input prevouts being spent, which is rather nice from a "watch for this output being spent" PoV.


Thanks Eric! We really appreciated the early feedback you gave on the initial design.

One aspect which isn't in this BIP draft is direct support for unconfirmed transactions. I consider this an important UX feature for mobile phones, and something which I've personally seen matter when on-boarding new users to Bitcoin. This was brought up in the original "bfd" mailing list chain [1]. Possible solutions are: a new, beefier INV message which contains enough information to be able to identify relevant outputs created in a transaction, or a "streaming" p2p extension that allows light clients to receive notifications of mempool inclusion based on only (pkScript, amount) pairs.

Within the integration for lnd, we specifically use this feature to be able to watch for when channels have been closed within the network graph, or when channels _directly_ under our control have been spent (either unilateral channel closure, or a revocation breach).


Post by Olaoluwa Osuntokun via bitcoin-dev:
One aspect which isn't in this BIP draft is direct support for unconfirmed transactions. I consider this an important UX feature for mobile phones, and something which I've personally seen matter when on-boarding new users to Bitcoin.

Totally agree. My first thought is maybe you could keep BIP37 filtering as optional for unconfirmed transactions. Since you're only interested in incoming transactions in this case, you can create one big filter with all your wallet's addresses and reuse that filter. The BIP37 privacy issues mainly come up when trying to get the filter to match both incoming and outgoing transactions, which is not needed in this case.

OTOH, if you download the block from the same peer that you gave a BIP37 filter, then they could probably test the txs in the block against both filters. :/


Totally agree. My first thought is maybe you could keep BIP37 filtering as optional for unconfirmed transactions. Since you're only interested [...]

Really bad for privacy. Data for transactions at the tip is only 14kb/s -- potentially less if segwit is in use and you're not getting witnesses. Is that really that burdensome?

FWIW, leaving a mobile browser just running while pointed at some websites seems to use more traffic than that, just loading advertising. :)

I agree with Greg and Laolu; BIP-37 filtering for transactions is no better than for blocks and completely destroys privacy.

A constant stream of transactions is OK, but even cheaper for light clients would be Laolu's proposal of streaming more tx data than existing inv messages but less than existing tx messages.

We could make a bit field of things to include in every inv-with-metadata message, such as (a sketch of such a message follows this list):

* witness data
* scriptSig data pushes
* scriptPubKey
* hash of scriptPubKey (unnecessary if full scriptPubKey is sent)
* scriptPubKey data pushes
* etc.
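To make the bit-field idea concrete, here is a small illustrative sketch; the flag names and the TxMeta layout are invented for illustration and are not part of any existing P2P message:

package main

import "fmt"

// Hypothetical field flags for an inv-with-metadata message, mirroring the
// list above. None of these constants exist in the protocol today.
const (
	FieldWitness       uint32 = 1 << iota // witness data
	FieldScriptSigPush                    // scriptSig data pushes
	FieldScriptPubKey                     // full scriptPubKey
	FieldSPKHash                          // hash of scriptPubKey
	FieldSPKPush                          // scriptPubKey data pushes
)

// TxMeta sketches what a node might stream per transaction: more than an
// inv (which carries only the txid) but less than the full tx.
type TxMeta struct {
	TxID    [32]byte
	Fields  uint32   // which of the optional fields are populated
	SPKHash [][]byte // e.g. one hash per output, present if FieldSPKHash is set
}

func main() {
	want := FieldSPKHash | FieldWitness
	fmt.Printf("requested field mask: %05b\n", want)
}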

This way a full node might be able to tell what application (or type of application) a light client is running, but not the client's addresses or outputs, except maybe when the client originates transactions.

Totally agree. My first thought is maybe you could keep BIP37 filtering as optional for unconfirmed transactions. Since you're only interested [...]

Really bad for privacy. Data for transactions at the tip is only 14kb/s -- potentially less if segwit is in use and you're not getting witnesses. Is that really that burdensome? FWIW, leaving a mobile browser just running while pointed at some websites seems to use more traffic than that, just loading advertising. :)

As I mentioned, the issue is the potential collision with the block filter, but bloom filters by themselves aren't inherently bad for privacy. They just reduce the anonymity set. The reason BIP37 doesn't work for privacy is how the filters have to be used/abused to make it work. That's not what I mentioned above.


Oops, realized I made a mistake. These are the stats for Feb 2016 until about a month ago (since height 400k iirc).

-- Laolu


I've pushed a series of updates to the text of the BIP repo linked in the OP. The fixes include: typos, components of the specification which were incorrect (N is the total number of items, NOT the number of txns in the block), and a few sections which have been clarified.

The latest version also includes a set of test vectors (as CSV files) which, for a series of fp rates (1/2 to 1/2^32), include the following for 6 testnet blocks (one of which generates a "null" filter):

* The block height
* The block hash
* The raw block itself
* The previous basic+extended filter header
* The basic+extended filter header for the block
* The basic+extended filter for the block

The size of the test vectors was too large to include in-line within the document, so we put them temporarily in a distinct folder [1]. The code used to generate the test vectors has also been included.



Really wish I'd known you were working on this a few weeks ago, but such is life. Hopefully I can provide some useful feedback.

On Fri, Jun 2, 2017 at 4:01 AM, Olaoluwa Osuntokun via bitcoin-dev wrote:

Post by Olaoluwa Osuntokun via bitcoin-dev:
Full nodes maintain an additional index of the chain, and serve this compact filter (the index) to light clients which request them. Light clients then fetch these filters, query them locally, and _maybe_ fetch the block if a relevant item matches.

Is it necessary to maintain the index all the way to the beginning of the chain? When would clients request "really old digests" and why?

Post by Olaoluwa Osuntokun via bitcoin-dev:
One specific area we'd like feedback on is the parameter selection. [...] Alex has made a JS calculator that allows y'all to explore the effect of tweaking the false positive rate in addition to the following variables: the number of items the wallet is scanning for, the size of the blocks, the number of blocks fetched, and the size of the filters themselves. The calculator computes the expected bandwidth utilization using the CDF of the geometric distribution: https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical script he's been running on actual data, and the results seem to match up rather nicely.

I haven't tried the tool yet, and maybe it will answer some of my questions.

On what data were the simulated wallets based? How did false positive rates for wallets with lots of items (pubkeys etc.) play out? Is there a maximum number of items for a wallet before it becomes too bandwidth-costly to use digests?

Post by Olaoluwa Osuntokun via bitcoin-dev:
We were excited to see that Karl Johan Alm (kallewoof) has done some (rather extensive!) analysis of his own, focusing on a distinct encoding type [5]. [...] his filters use a binomial encoding _directly_ on the filter contents, while we instead create a Golomb-coded set with the contents being _hashes_ (we use siphash) of the filter items.

I will definitely try to reproduce my experiments with Golomb-coded sets and see what I come up with. It seems like you've got a little less than half the size of my digests for 1-block digests, but I haven't tried making digests for all blocks (and lots of early blocks are empty).

On the BIP proposal itself:

In Compact Filter Header Chain, you mention that clients should download filters from nodes if filter_headers is not identical, and ban offending nodes. What about temporary forks in the chain? What about longer forks? In general, I am curious how you will deal with reorgs and temporary non-consensus-related chain splits.

I am also curious if you have considered digests containing multiple blocks. Retaining a permanent binsearchable record of the entire chain is obviously too space-costly, but keeping the last X blocks as binsearchable could speed up syncing for clients tremendously, I feel.

It may also be space-efficient to ONLY store older digests in chunks of e.g. 8 blocks. A client syncing up and finding a match in an 8-block chunk would have to grab those 8 blocks, but if it's not recent, that may be acceptable. It may even be possible to make 4-, 2-, 1-block digests on demand.

How fast are these to create? Would it make sense to provide digests on demand in some cases, rather than keeping them around indefinitely?

Post by Karl Johan Alm via bitcoin-dev:
Really wish I'd known you were working on this a few weeks ago, but such is life. Hopefully I can provide some useful feedback.

Your feedback is greatly appreciated!


Post by Karl Johan Alm via bitcoin-dev:
Is it necessary to maintain the index all the way to the beginning of the chain? When would clients request "really old digests" and why?

Without a soft fork, this is the only way for light clients to verify that peers aren't lying to them. Clients can request headers (just hashes of the filters and the previous headers, creating a chain) and look for conflicts between peers. If a conflict is found at a certain block, the client can download the block, generate a filter, calculate the header by hashing together the previous header and the generated filter, and ban any peers that don't match. A full node could prune old filters if you wanted, and recalculate them as necessary, as long as you keep the filter header chain info; really old filters are unlikely to be requested by correctly written software, but you can't guarantee every client will follow best practices either.
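A minimal sketch of that conflict-resolution step, assuming double-SHA256 for the header chain (the draft defines the exact hash):

package main

import (
	"crypto/sha256"
	"fmt"
)

// filterHeader chains a filter to the previous filter header:
// header_n = H(hash(filter_n) || header_{n-1}). Double-SHA256 is assumed.
func filterHeader(filter []byte, prevHeader [32]byte) [32]byte {
	filterHash := sha256.Sum256(filter)
	filterHash = sha256.Sum256(filterHash[:])
	buf := append(filterHash[:], prevHeader[:]...)
	h := sha256.Sum256(buf)
	return sha256.Sum256(h[:])
}

// On a cfheaders conflict between two peers at some height, the client
// fetches the block itself, rebuilds the filter, recomputes the header,
// and bans whichever peer advertised a header that doesn't match.
func resolveConflict(rebuilt []byte, prev [32]byte, peerA, peerB [32]byte) (banA, banB bool) {
	want := filterHeader(rebuilt, prev)
	return peerA != want, peerB != want
}

func main() {
	var genesisPrev [32]byte // all-zero previous header for the first filter
	h := filterHeader([]byte{0x01, 0x02}, genesisPrev)
	fmt.Printf("header: %x\n", h[:8])
}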

I haven't tried the tool yet, and maybe it will answer some of my questions.

Post by Karl Johan Alm via bitcoin-dev:
On what data were the simulated wallets based? How did false positive rates for wallets with lots of items (pubkeys etc.) play out? Is there a maximum number of items for a wallet before it becomes too bandwidth-costly to use digests?

The simulations are based on completely random data within given parameters. For example, it will generate a wallet of a specified size and generate blocks of specified size with a specified number of transactions of specified format, all guaranteed to not match the wallet. It then tries to match the wallet and tracks the filter size and the bandwidth used by block downloads, which are all due to false positives. The maximum wallet size can be millions or more of addresses and outpoints before the filter isn't worth it.

I published the simulation code at https://gist.github.com/aakselrod/0ee665205f7c9538c2339876b0424b26, but the calculation code gives you the same results (on average, but very close with a big enough sample size) much faster.

Post by Karl Johan Alm via bitcoin-dev:
I will definitely try to reproduce my experiments with Golomb-coded sets and see what I come up with. It seems like you've got a little less than half the size of my digests for 1-block digests, but I haven't tried making digests for all blocks (and lots of early blocks are empty).

Filters for empty blocks only take a few bytes, and sometimes zero when the coinbase output is a burn that doesn't push any data (an example will be in the test vectors that I'll have ready shortly).

On the BIP proposal itself:

Post by Karl Johan Alm via bitcoin-dev:
In Compact Filter Header Chain, you mention that clients should download filters from nodes if filter_headers is not identical, and ban offending nodes. What about temporary forks in the chain? What about longer forks? In general, I am curious how you will deal with reorgs and temporary non-consensus-related chain splits.

The cfheaders messages give you the hash of the final block for which there's a header in the message. This means you can ignore the message as necessary rather than ban the peer, or track cfheaders for multiple forks if desired.
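As a small illustrative sketch of "track cfheaders for multiple forks" (the tracker type and field names here are invented, not part of the proposal):

package main

import "fmt"

type hash32 = [32]byte

// cfHeaderTracker keeps filter-header chains keyed by the final block hash
// each cfheaders response covers, so a response for a stale or competing
// tip can be kept or ignored instead of triggering a ban.
type cfHeaderTracker struct {
	byTip map[hash32][]hash32
}

func (t *cfHeaderTracker) onCFHeaders(stopHash hash32, headers []hash32, knownTips map[hash32]bool) {
	if !knownTips[stopHash] {
		return // unknown fork tip: ignore rather than ban
	}
	t.byTip[stopHash] = headers
}

func main() {
	t := &cfHeaderTracker{byTip: make(map[hash32][]hash32)}
	var tip hash32
	tip[0] = 0xaa
	t.onCFHeaders(tip, []hash32{{0x01}}, map[hash32]bool{tip: true})
	fmt.Println("tracked forks:", len(t.byTip))
}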

Post by Karl Johan Alm via bitcoin-dev:
I am also curious if you have considered digests containing multiple blocks. Retaining a permanent binsearchable record of the entire chain is obviously too space-costly, but keeping the last X blocks as binsearchable could speed up syncing for clients tremendously, I feel.

We hadn't (or I hadn't) until we read your recent post/paper, and are considering it now.

Post by Karl Johan Alm via bitcoin-dev:
It may also be space-efficient to ONLY store older digests in chunks of e.g. 8 blocks. A client syncing up and finding a match in an 8-block chunk would have to grab those 8 blocks, but if it's not recent, that may be acceptable. It may even be possible to make 4-, 2-, 1-block digests on demand.

This is also something we (or at least I) hadn't considered before your recent post. We have been working on this for a few months now, so didn't have time to work on trying out and possibly incorporating the idea before release.

Post by Karl Johan Alm via bitcoin-dev:
How fast are these to create? Would it make sense to provide digests on demand in some cases, rather than keeping them around indefinitely?

They're pretty fast and can be pruned if desired, as mentioned above, as long as the header chain is kept.

Post by Alex Akselrod via bitcoin-dev:
Without a soft fork, this is the only way for light clients to verify that peers aren't lying to them. Clients can request headers (just hashes of the filters and the previous headers, creating a chain) and look for conflicts between peers. [...]

Ahh, so you actually make a separate digest chain with prev hashes and everything. Once/if committed digests are soft-forked in, it seems a bit overkill, but maybe it's worth it. (I was always assuming committed digests in coinbase would come after people started using this, and that people could just ask a couple of random peers for the digest hash and ensure everyone gave the same answer as the hash of the downloaded digest.)

I noticed an increase in FP hits when using real data sampled from real scriptPubKeys and such. Address reuse and other weird stuff. See "lies.h" in the github repo for experiments, and the initial part of main in chainsim.c where wallets get random stuff from the chain.

Post by Alex Akselrod via bitcoin-dev:
Filters for empty blocks only take a few bytes, and sometimes zero when the coinbase output is a burn that doesn't push any data (an example will be in the test vectors that I'll have ready shortly).

I created digests for all blocks up until block #469805 and actually ended up with 5.8 GB, which is 1.1 GB lower than what you have, but may be worse perf-wise on false positive rates and such.

Post by Alex Akselrod via bitcoin-dev:
They're pretty fast and can be pruned if desired, as mentioned above, as long as the header chain is kept.

For comparison, creating the digests above (469805 of them) took roughly 30 mins on my end, but using the kstats format, so probably higher on an actual node (should get around to profiling that...).

Post by Karl Johan Alm via bitcoin-dev:
I am also curious if you have considered digests containing multiple blocks. Retaining a permanent binsearchable record of the entire chain is obviously too space-costly, but keeping the last X blocks as binsearchable could speed up syncing for clients tremendously, I feel.

Originally we hadn't considered such an idea. Grasping the concept a bit better, I can see how that may result in considerable bandwidth savings (for purely negative queries) for clients doing a historical sync, or catching up to the chain after being inactive for months/weeks.

If we were to pursue tacking this approach onto the current BIP proposal, we could do it in the following way:

* The `getcfilter` message gains an additional "Level" field. Using this field, the range of blocks to be included in the returned filter would be 2^Level. So a level of 0 is just the single filter, 3 is 8 blocks past the block hash, etc.

* Similarly, the `getcfheaders` message would also gain a similar field with identical semantics. In this case each "level" would have a distinct header chain for clients to verify. (A small sketch of the level-to-range mapping follows.)
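A hypothetical sketch of the level semantics described above, where a level-L request covers 2^L blocks starting at the requested height:

package main

import "fmt"

// filterRange sketches the proposed "Level" extension to getcfilter: a
// level-L request covers the 2^L blocks starting at the given height, so
// level 0 is a single block and level 3 covers 8 blocks.
func filterRange(startHeight uint32, level uint8) (uint32, uint32) {
	span := uint32(1) << level
	return startHeight, startHeight + span - 1
}

func main() {
	lo, hi := filterRange(400000, 3)
	fmt.Printf("level 3 filter covers blocks %d..%d\n", lo, hi)
}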

For larger blocks (like the one referenced at the end of this mail) full construction of the regular filter takes ~10-20ms (most of this spent extracting the data pushes). With smaller blocks, it quickly dips down to the nanosecond-to-microsecond range.

Whether to keep _all_ the filters on disk, or to dynamically re-generate a particular range (possibly most of the historical data), is an implementation detail. Nodes that already do block pruning could discard very old filters once the header chain is constructed, allowing them to save additional space, as it's unlikely most clients would care about the first 300k or so blocks.

Yep, this is only a hold-over until when/if a commitment to the filter is soft-forked in. In that case, there could be some extension message to fetch the filter hash for a particular block, along with a merkle proof of the coinbase transaction to the merkle root in the header.
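As a sketch of what verifying such a proof could look like, assuming Bitcoin's usual double-SHA256 merkle tree with the coinbase at index 0 (so it is the left child at every level):

package main

import (
	"crypto/sha256"
	"fmt"
)

func dsha(b []byte) [32]byte {
	h := sha256.Sum256(b)
	return sha256.Sum256(h[:])
}

// verifyCoinbaseProof checks a merkle branch for the coinbase transaction
// against the block header's merkle root. With a filter-hash commitment in
// the coinbase, this plus the 80-byte header is enough to authenticate a
// filter without trusting the serving peer.
func verifyCoinbaseProof(coinbaseTxid [32]byte, branch [][32]byte, root [32]byte) bool {
	node := coinbaseTxid
	for _, sibling := range branch {
		node = dsha(append(node[:], sibling[:]...))
	}
	return node == root
}

func main() {
	// Two-leaf toy tree: root = H(cb || other).
	cb, other := dsha([]byte("coinbase")), dsha([]byte("other"))
	root := dsha(append(cb[:], other[:]...))
	fmt.Println(verifyCoinbaseProof(cb, [][32]byte{other}, root))
}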

Post by Karl Johan Alm via bitcoin-dev:
I created digests for all blocks up until block #469805 and actually ended up with 5.8 GB, which is 1.1 GB lower than what you have, but may be worse perf-wise on false positive rates and such.

Interesting, are you creating the equivalent of both our "regular" and "extended" filters? Each of the filter types consumes about ~3.5GB in isolation, with the extended filter type on average consuming more bytes due to the fact that it includes sigScript/witness data as well.

It's worth noting that those numbers include the fixed 4-byte value for "N" that's prepended to each filter once it's serialized (though that doesn't add a considerable amount of overhead). Alex and I were considering instead using Bitcoin's var-int encoding for that number. This would result in using a single byte for empty and small filters, 3 bytes for most filters (< 2^16 items), and 5 bytes for the remainder of the cases. (A sketch of the encoding is below.)
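For reference, Bitcoin's CompactSize var-int encoding referred to above; this is standard and not specific to the proposal:

package main

import (
	"encoding/binary"
	"fmt"
)

// writeCompactSize encodes n with Bitcoin's variable-length integer:
// 1 byte below 253, 3 bytes up to 2^16-1, 5 bytes up to 2^32-1, else 9.
func writeCompactSize(n uint64) []byte {
	switch {
	case n < 253:
		return []byte{byte(n)}
	case n <= 0xffff:
		b := []byte{0xfd, 0, 0}
		binary.LittleEndian.PutUint16(b[1:], uint16(n))
		return b
	case n <= 0xffffffff:
		b := []byte{0xfe, 0, 0, 0, 0}
		binary.LittleEndian.PutUint32(b[1:], uint32(n))
		return b
	default:
		b := make([]byte, 9)
		b[0] = 0xff
		binary.LittleEndian.PutUint64(b[1:], n)
		return b
	}
}

func main() {
	// 0 items (empty filter) -> 1 byte; a typical filter's N -> 3 bytes.
	fmt.Println(len(writeCompactSize(0)), len(writeCompactSize(40000)))
}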

Post by Karl Johan Alm via bitcoin-dev:
For comparison, creating the digests above (469805 of them) took roughly 30 mins on my end, but using the kstats format, so probably higher on an actual node (should get around to profiling that...).

Does that include the time required to read the blocks from disk, or just the CPU computation of constructing the filters? I haven't yet kicked off a full re-index of the filters, but for reference, this block [1] on testnet takes ~18ms for the _full_ indexing routine with our current code+spec.


I see the inner loop of construction and lookup are not free of non-constant divmod. This will result in implementations being needlessly slow (especially on ARM, but even on modern x86_64 a division is a 90-cycle-ish affair).

I believe this can be fixed by using this approach: http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/, which has the same non-uniformity as mod but needs only a multiply and shift. Otherwise, fast implementations will have to implement the code to compute bit-twiddling-hack exact division, which is kind of complicated (e.g. via the technique in "{N}-bit Unsigned Division via {N}-bit Multiply-Add" by Arch D. Robison).

Shouldn't all cases in your spec where you have N=transactions be n=indexed-outputs? Otherwise, I think your golomb parameter and false positive rate are wrong.

Very cool, I wasn't aware of the existence of such a mapping!

Correct me if I'm wrong, but from my interpretation we can't use that method as described, as we need to output 64-bit integers rather than 32-bit integers. A range of 32 bits would constrain the number of items we could encode to ~4096 to ensure that we don't overflow with fp values such as 20 (which we currently use in our code).

If filter commitments are to be considered for a soft fork in the future, then we should definitely optimize the construction of the filters as much as possible! I'll look into that paper you referenced to get a feel for just how complex the optimization would be.

Post by Gregory Maxwell via bitcoin-dev:
Shouldn't all cases in your spec where you have N=transactions be n=indexed-outputs? Otherwise, I think your golomb parameter and false positive rate are wrong.

Yep! Nice catch. Our code is correct, but the mistake in the spec was an oversight on my part. I've pushed a commit [1] to the BIP repo referenced in the OP to fix this error.

I've also pushed another commit to explicitly take advantage of the fact that P is a power of two within the coding loop [2].

-- Laolu

[1]: https://github.com/Roasbeef/bips/commit/bc5c6d6797f3df1c4a44213963ba12e72122163d
[2]: https://github.com/Roasbeef/bips/commit/578a4e3aa8ec04524c83bfc5d14be1b2660e7f7a


Had a chat with gmax off-list and came to the realization that the method _should_ indeed generalize to our case of outputting 64-bit integers. We'll need to do a bit of bit twiddling to make it work properly. I'll modify our implementation and report back with some basic benchmarks.
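For concreteness, the 64-bit variant could look like the following sketch (using Go's math/bits); this illustrates Lemire's reduction, not the final implementation:

package main

import (
	"fmt"
	"math/bits"
)

// fastReduce maps a uniform 64-bit hash x into [0, n) with a multiply and
// shift instead of x % n, as in Lemire's "fast alternative to the modulo
// reduction": the result is the high 64 bits of the 128-bit product x*n.
// It has the same slight non-uniformity as mod, with no division.
func fastReduce(x, n uint64) uint64 {
	hi, _ := bits.Mul64(x, n)
	return hi
}

func main() {
	// Same range as the filter construction: n = N * 2^P.
	n := uint64(5000) << 20
	fmt.Println(fastReduce(0xdeadbeefcafebabe, n))
}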


Very interesting. I would like to consider how this compares to another light client type with rather different security characteristics, where each client would receive, for each transaction in each block (a rough sketch of such a per-transaction record follows this list):

* The TXID (uncompressed)
* The spent outpoints (with TXIDs compressed)
* The pubkey hash (compressed to a reasonable amount of false positives)
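The field layouts and compression scheme aren't specified in this thread, so the types below are purely illustrative:

package main

import "fmt"

// TxNotification sketches the per-transaction record described above; the
// prefix lengths are invented for illustration.
type TxNotification struct {
	TxID           [32]byte // uncompressed txid, verifiable against the merkle root
	SpentOutpoints []CompressedOutpoint
	PubKeyHashes   [][4]byte // truncated hashes, sized for acceptable false positives
}

type CompressedOutpoint struct {
	TxIDPrefix [8]byte // compressed txid of the spent output's transaction
	Index      uint32
}

func main() {
	n := TxNotification{}
	fmt.Println(len(n.TxID))
}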

A rough estimate would indicate this to be about 2-2.5x as big per block as your proposal, but it comes with rather different security characteristics, and would not require download since genesis.

The client could verify the TXIDs against the merkle root with a much stronger (PoW) guarantee compared to the guarantee based on the assumption of peers being distinct, which your proposal seems to make. Like your proposal, this removes the privacy and processing issues from server-side filtering, but unlike your proposal, retrieval of all txids in each block can also serve as a basis for fraud proofs and (disprovable) fraud hints, without resorting to full block downloads.

I don't completely understand the benefit of making the outpoints and pubkey hashes (weakly) verifiable. These only serve as notifications and therefore do not seem to introduce an attack vector. Omitting data is always possible, so receiving data is a prerequisite for verification, not an assumption that can be made. How could an attacker benefit from "hiding notifications"?

I think client-side filtering is definitely an important route to take, but is it worth compressing away the information to verify the merkle root?

Regards,
Tomas van der Wansem
bitcrust

Post by Tomas via bitcoin-dev:
A rough estimate would indicate this to be about 2-2.5x as big per block as your proposal, but it comes with rather different security characteristics, and would not require download since genesis.

Our proposal _doesn't_ require downloading from genesis, if by "downloading" you mean downloading all the blocks. Clients only need to sync the block+filter headers; then (if they don't care about historical blocks) they will download filters from their "birthday" onwards.

Post by Tomas via bitcoin-dev:
The client could verify the TXIDs against the merkle root with a much stronger (PoW) guarantee compared to the guarantee based on the assumption of peers being distinct, which your proposal seems to make.

Our proposal only makes a "one honest peer" assumption, which is the same as any other operating mode. Also, as clients still download all the headers, they're able to verify PoW conformance/work as normal.

Post by Tomas via bitcoin-dev:
I don't completely understand the benefit of making the outpoints and pubkey hashes (weakly) verifiable. These only serve as notifications and therefore do not seem to introduce an attack vector.

Not sure what you mean by this. Care to elaborate? Full nodes can nearly undetectably lie by omission, causing a denial of service, which can lead to undesirable failure modes in applications whose safety critically relies on responding to certain on-chain events.

Post by Tomas via bitcoin-dev:
I think client-side filtering is definitely an important route to take, but is it worth compressing away the information to verify the merkle root?

That's not the case with our proposal. Clients get the _entire_ block (if they need it), so they can verify the merkle root as normal. Unless one of us is misinterpreting the other here.

-- Laolu

Post by Tomas via bitcoin-dev
I don't completely understand the benefit of making the outpoints and pubkey hashes (weakly) verifiable. These only serve as notifications and therefore do not seem to introduce an attack vector.

Not sure what you mean by this. Care to elaborate? Additionally, full nodes can nearly undetectably lie by omission, causing a denial of service, which can lead to undesirable failure modes in applications whose safety critically relies on responding to certain on-chain events.

I understand that the compact header chain is used to mitigate against this, but I am unsure about the use cases and trade-offs.

For a normal wallet, the only thing I can imagine an attacker could do is pretend a transaction did not confirm yet, causing nuisance.
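
One way the filter-header chain provides that mitigation, sketched here with a hypothetical peer interface rather than the proposal's actual wire messages: every peer should report the same filter header for a given block, so a single honest peer is enough to surface a conflict, at which point the client can fetch the block, compute the true filter itself, and drop the lying peer.

package sketch

import "bytes"

// filterHeaderSource is a hypothetical view of a peer serving the
// chained per-block filter commitments.
type filterHeaderSource interface {
	filterHeader(height int32) [32]byte
}

// consistentAt reports whether all peers agree on the filter header at
// a given height; one honest peer is enough to expose a lie.
func consistentAt(peers []filterHeaderSource, height int32) bool {
	ref := peers[0].filterHeader(height)
	for _, p := range peers[1:] {
		h := p.filterHeader(height)
		if !bytes.Equal(h[:], ref[:]) {
			// On conflict (resolution not shown): fetch the full block,
			// compute the true filter locally, and ban the mismatching peer.
			return false
		}
	}
	return true
}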

Several times. It's been debated if unconfirmed transactions are necessary, methods of doing more private filtering have been suggested, along with simply not filtering unconfirmed transactions at all. My collected data suggests that there is very little use of BIP37 at present, based on incoming connections to nodes I know end up in the DNS seed responses (no "SPV" clients do their own peer management).

Post by Andreas Schildbach via bitcoin-dev
I'm not sure if this has been brought up elsewhere in this thread. This proposal doesn't seem to be a complete replacement of BIP37: it doesn't provide a filter for unconfirmed transactions like BIP37 does. That means that most light clients will continue to use BIP37 even if they may use this BIP as a supplement. Otherwise users would not get timely notification of incoming payments any more.

Why would it not be needed? Any SPV client (when used as a payment-receiver) requires this from a simple usability point of view.

I think many users would be willing ...
a) … to trade higher privacy (using client side filtering) for not having the „incoming transaction“ feature
b) – if they want 0-conf – to fetch all inved transactions

/jonas

Another number: I'm answering dozens of support inquiries about delayed/missing transactions per day. Over the 7 years of Bitcoin Wallet's existence, I estimate about 50000 inquiries.

On the other hand, I remember only 1 (one) inquiry about the privacy problems of BIP37 (or privacy at all).

From a regular user's point of view, privacy is a non-issue. Sure, everyone would take it for free, but certainly not if it a) delays incoming payments or b) quickly eats up your traffic quota.

IMO, privacy is something developers should make sure users have. Also, I think today's SPV wallets should make users more aware of the possible privacy implications.

Do users know that, if they pay for a good in a shop while using the shop's WiFi, the shop owner as well as the ISP can use that data and combine it with the user profile (and ~ALL FUTURE purchases made with the same wallet, IN ANY LOCATION, online or in person)?

Do users know that ISPs (cellular; including Google) can completely link the used Bitcoin wallet (again: all purchases, including future ones) with the user profile already well known to the ISP, including credit-card data, and may sell that Bitcoin data to any other data-mining company?

If you use BIP37, you basically give your transaction history (_ALL TRANSACTIONS_, including transactions in the future) to everyone.

Post by Andreas Schildbach via bitcoin-dev
From a regular user's point of view, privacy is a non-issue. Sure, everyone would take it for free, but certainly not if it a) delays incoming payments or b) quickly eats up your traffic quota.

This may be true because they are not aware of the ramifications, and I don't think client-side filtering is a drop-in replacement for today's smartphone SPV model.

This has been brought up several times in the past, and I agree with Jonas' comments about users being unaware of the privacy losses due to BIP37. One thing also mentioned before, but not in the current thread, is that the entire concept of SPV is not applicable to unconfirmed transactions. SPV uses the fact that miners have committed to a transaction with work to give the user an assurance that the transaction is valid; if the transaction were invalid, it would be costly for the miner to include it in a block with valid work.

Transactions in the mempool have no such assurance, and are costlessly forgeable by anyone, including your ISP. I wasn't involved in any debate over BIP37 when it was being written up, so I don't know how mempool filtering got in, but it never made any sense to me. The fact that lots of lite clients are using this is a problem, as it gives false assurance to users that there is a valid but yet-to-be-confirmed transaction sending them money.
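
To put a rough number on that asymmetry (the difficulty below is an assumed value, approximately the level around the time of this thread):

package main

import "fmt"

func main() {
	// Assumed difficulty, roughly the level around the time of this
	// thread; substitute the current value as desired.
	const difficulty = 7.0e11

	// Forging a *confirmed* transaction means producing a block header
	// that meets the target: ~difficulty * 2^32 expected hashes.
	expectedHashes := difficulty * float64(1<<32)
	fmt.Printf("expected hashes to forge one confirmation: ~%.1e\n", expectedHashes) // ~3.0e21

	// Forging a *mempool* transaction requires no work at all.
	fmt.Println("expected hashes to forge a mempool tx:     0")
}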

Most SPV wallets make it quite clear that unconfirmed transactions are just that.

The reason that BIP37 presents a long list of problems is that it is a client-server scenario wedged into a peer-to-peer network. The only possible excuse for this design was an implementation shortcut.

As this thread and others demonstrate, reproducing this design flaw will not eliminate the problems. The fact that there are many wallets dependent upon it is an unfortunate consequence of the original sin, but it is not likely to last. There is no rationale for node operators to support wallets apart from their own. As a node implementer interested in privacy, security and scalability, I would never waste the time to code BIP37, or any client-server feature, into the P2P protocol, especially one that delegates some aspect of validation.

Other nodes (servers) provide independent, securable, client-server interfaces. Many of these are made available as community servers for use at no charge. They could also provide mechanisms for operator payment without polluting the P2P network.

However as a community we should be working toward full node wallets. A secured personal node/server can support remote mobile wallets with security, privacy and no wasted bandwidth. And if we avoid counterproductive increases in blockchain growth rate, full nodes will eventually be able to run on mobile platforms with no difficulty whatsoever. A wallet that delegates full validation to node operators is just another centralization pressure that we do not need.

Why would it not be needed? Any SPV client (when used as a payment-receiver) requires this from a simple usability point of view.

I think many users would be willing ...
a) … to trade higher privacy (using client side filtering) for not having the „incoming transaction“ feature
b) – if they want 0-conf – to fetch all inved transactions

You seem to misunderstand the use case. If you send me a transaction while both of us are using our phones, then I need to be able to have immediate feedback on the transaction being broadcast on the network. This is not about zero-conf; this is simply seeing what is happening while it is happening.

Additionally, when the transaction that is meant for my wallet is broadcast, I want my SPV wallet to parse and check the actual transaction. It is not just to see that *something* was actually sent, but also to be able to see how much is being paid to me, and maybe whether the transaction is marked as RBF-able, etc.

Really basic usability: provide information to your users when you can, should they want it, and on by default.
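
As a sketch of what the "parse and check the actual transaction" step above buys the wallet once the raw transaction is in hand, with simplified stand-in structs rather than a real wire-format parser: the wallet can total the outputs paying to its own scripts, and check BIP125 opt-in replaceability (any input sequence below 0xfffffffe).

package sketch

type txOut struct {
	value    int64  // satoshis
	pkScript []byte // scriptPubKey
}

type txIn struct {
	sequence uint32
}

type tx struct {
	ins  []txIn
	outs []txOut
}

// amountToWallet sums the outputs paying to scripts the wallet
// controls, so the UI can show "how much is being paid to me".
func amountToWallet(t *tx, isMine func(pkScript []byte) bool) int64 {
	var total int64
	for _, out := range t.outs {
		if isMine(out.pkScript) {
			total += out.value
		}
	}
	return total
}

// signalsRBF reports BIP125 opt-in replaceability: any input with a
// sequence number below 0xfffffffe.
func signalsRBF(t *tx) bool {
	for _, in := range t.ins {
		if in.sequence < 0xfffffffe {
			return true
		}
	}
	return false
}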

I see this use case. But I did receive bank wire transfers for the last decades without _immediately_ knowing that someone sent funds to me. I personally would ALWAYS trade the higher bandwidth consumption (300MB mempool filtering) or slower notification time (maybe ~1h) for preserving privacy. I agree there are use cases where you want immediate notification; those use cases could probably be solved without throwing away privacy („parsing“ all transactions and running in the background).

Just to give you a number: based on the statistics of the Bitcoin Wallet app, there are at least 2 million wallets depending on BIP37. Not all would need instant notification, but based on the daily support enquiries, instant notification is the most asked-for property of Bitcoin.

Yes, users probably like this feature, and client-side filtering is not a drop-in replacement for BIP37.

We should also consider: BIP37 works because node operators are willing to offer that service for free (which may change over time). BIP37 consumes plenty of horsepower (disk/CPU) from nodes. Filtering a couple of days of blocks (assume 1000+) eats lots of resources for something that has no direct long-term value for Bitcoin (the filter data is unique per client and will be "thrown away" [it can't be used by other peers]). The same applies to the mempool (filtering a mempool of a couple of hundred MB each time the HD gap limit has been exceeded or the app gets sent to the foreground again).
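
A toy comparison of the two server-side cost models (client count and rescan depth below are assumptions): under BIP37 every client's bloom filter is unique, so the node repeats the block scan per client and the work is thrown away, while a committed GCS filter is built once per block and the same bytes serve every client.

package main

import "fmt"

func main() {
	const (
		clients     = 100  // assumed concurrently syncing clients
		rescanDepth = 1000 // assumed blocks each client asks to filter
	)

	// BIP37: each client's bloom filter is unique, so each request
	// forces its own pass over the blocks; nothing is shared.
	bip37BlockScans := clients * rescanDepth

	// GCS: each block's filter is computed once, cached, and reused
	// for every client (clients match it locally).
	gcsBlockScans := rescanDepth

	fmt.Println("BIP37 block scans:", bip37BlockScans) // 100000
	fmt.Println("GCS block scans:  ", gcsBlockScans)   // 1000
}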

Purely relying on the availability of BIP37 seems fragile to me, and starting to explore other ways is very reasonable.

Sending just the output addresses of each transaction would use about 1 kilobit/s of data. Sending the entire transactions would use ~14 kbit/sec. These don't seem like an unsustainably tremendous amount of data to use while an application is running.
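
Those figures are easy to sanity-check. The output count and per-output digest size below are assumptions chosen to reproduce the order of magnitude, not necessarily the poster's actual inputs:

package main

import "fmt"

func main() {
	const (
		blockBytes   = 1000000.0 // assumed full 1 MB blocks
		blockSecs    = 600.0     // target inter-block interval
		outsPerBlock = 5000.0    // assumed outputs per block
		bytesPerOut  = 15.0      // assumed per-output address digest
	)

	// Streaming every transaction in real time:
	fmt.Printf("full transactions: ~%.1f kbit/s\n", blockBytes*8/blockSecs/1000) // ~13.3

	// Streaming only a short digest per output:
	fmt.Printf("output digests:    ~%.1f kbit/s\n", outsPerBlock*bytesPerOut*8/blockSecs/1000) // ~1.0
}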

Doubly so for SPV wallets, which are highly vulnerable to unconfirmed transactions; many of them, as of the last testing reports I heard, became pretty severely corrupted once given a fake transaction.

Can someone make a case why saving no more than those figures would justify the near-total loss of privacy that filtering gives?

"Because they already do it" isn't a good argument when talking abouta new protocol feature; things which already do BIP37 will presumablycontinue to already do BIP37.

First, your figures are wrong and also fall out of the sky with no justification. Can't debunk something that is pure garbage.

Second, stating that a bloom filter is a "total loss of privacy" is equally baseless and doesn't need debunking.

Post by Gregory Maxwell via bitcoin-dev
"Because they already do it" isn't a good argument when talking about a new protocol feature; things which already do BIP37 will presumably continue to already do BIP37.

I think you just made the case for completely rejecting this proposal, based on the fact that nobody will use it: BIP37 already exists.

Not sure if I agree with that; improvements are always useful, and we should be able to come up with replacements. But arguing against a feature you don't like, especially one used by millions every day, is a sad way to stifle innovation, Greg.

"On the Privacy Provisions of Bloom Filters in Lightweight BitcoinClients"

Post by Tom Zander via bitcoin-dev
We show analytically and empirically that the reliance on Bloom filters within existing SPV clients leaks considerable information about the addresses of Bitcoin users. Our results show that an SPV client who uses a modest number of Bitcoin addresses (e.g., < 20) risks revealing almost all of his addresses.
