tag:blogger.com,1999:blog-62454133463752181882018-03-19T12:07:06.842-07:00Nerd Ralphscience and technology stuffRalph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.comBlogger114125tag:blogger.com,1999:blog-6245413346375218188.post-641398896641453092018-03-03T14:47:00.000-08:002018-03-03T14:47:22.175-08:00Fast small prime checker in golangAnyone who does any crypto coding knows that the ability to generate and test prime numbers is important.&nbsp; A search through the <a href="http://golang.org/pkg/crypto/">golang crypto packages</a>&nbsp;didn't turn up any function to check if a number is prime.&nbsp; The "math/big" package has a <a href="https://golang.org/pkg/math/big/#Int.ProbablyPrime">ProbablyPrime</a> function, but the documentation is unclear on what value of n to use so that it is "100% accurate for inputs less than 2⁶⁴".&nbsp; For the Ethereum miner I am writing, I need a function to check numbers of less than 26 bits, so I decided to write my own.<br /><br />Since int32 is large enough for the biggest number I'll be checking, and 32-bit integer division is usually faster than 64-bit, even on 64-bit platforms, I wrote my prime checking function to take a uint32.&nbsp; A basic prime checking function will usually test odd divisors up to the square root of N, skipping all even numbers (multiples of two).&nbsp; My prime checker is slightly more optimized, skipping all multiples of 3 as well.&nbsp; Here's the code:<br /><b><span style="font-family: Courier New, Courier, monospace;">func i32Prime(n uint32) bool {</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; //&nbsp; &nbsp; if (n==2)||(n==3) {return true;}</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; if n%2 == 0 { return false }</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; if n%3 == 0 { return false }</span></b><br /><b><span style="font-family: Courier 
New, Courier, monospace;">&nbsp; &nbsp; sqrt := uint32(math.Sqrt(float64(n)))</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; for i := uint32(5); i &lt;= sqrt; i += 6 {</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; if n%i == 0 { return false }</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; if n%(i+2) == 0 { return false }</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; }</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; return true</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">}</span></b><br /><div><br /></div><div>My code will never call i32Prime with small numbers, so I have the first line that checks for two or three commented out.&nbsp; In order to test and benchmark the function, I wrote <a href="https://github.com/nerdralph/goyard/blob/master/prime_test.go">prime_test.go</a>.&nbsp; Run the tests with "go test prime_test.go -bench=.".&nbsp; For numbers up to 22 bits, i32Prime is one to two orders of magnitude faster than ProbablyPrime(0).&nbsp; In absolute terms, on a Celeron G1840 using a single core,&nbsp;BenchmarkPrime reports 998 ns/op.&nbsp; I considered further optimizing the code to skip multiples of 5, but I don't think the ~20% speed improvement is worth the extra code complexity.</div><div><br /></div><div><br /></div><div><br /></div>Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com0tag:blogger.com,1999:blog-6245413346375218188.post-20729161585694566552018-02-24T12:56:00.000-08:002018-02-24T12:56:01.939-08:00Let's get going!<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-CeVQebJZlJo/WpG84tPSQ6I/AAAAAAAAm2E/pfOjXJNqRxIwx4uTJ5YjGq0OOydjALbbgCLcBGAs/s1600/gopher.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="164" data-original-width="307" src="https://1.bp.blogspot.com/-CeVQebJZlJo/WpG84tPSQ6I/AAAAAAAAm2E/pfOjXJNqRxIwx4uTJ5YjGq0OOydjALbbgCLcBGAs/s1600/gopher.png" /></a></div><br />You might be asking if this is just one more of the many blog posts about go that can be found all over the internet.&nbsp; I don't want to duplicate what other people have written, so I'll mostly be writing about crypto functions like sha3/keccak in go.<br /><br />Despite <a href="http://nerdralph.blogspot.ca/2016/07/mining-sia-coin-on-ubuntu.html">a brief experiment with go</a> almost two years ago, I had not done any serious coding in go.&nbsp; That all changed early this year, when I decided to write an ethereum miner from scratch.&nbsp; After maintaining and improving&nbsp;<a href="https://github.com/nerdralph/ethminer-nr">https://github.com/nerdralph/ethminer-nr</a>, I decided I would like to try something other than C++.&nbsp; My first attempt was with <a href="http://dlang.org/">D</a>, and while it fixes some of the things I dislike about C++, 3rd-party library support is 
minimal.&nbsp; After working with it for about a week, I decided to move on.&nbsp; After some prototyping with python/<a href="http://cython.org/">cython</a>, I settled on <a href="http://golang.org/">go</a>.<br /><br />After <a href="https://blog.golang.org/8years">eight years of development</a>, go is quite mature.&nbsp; As I'll explain later in this blog post, my concerns about code performance were proven to be unwarranted.&nbsp; Still, I've found the language is new enough that there is room for improvement in go libraries.<br /><br />Since I'm writing an ethereum miner, I need code that can perform <a href="https://keccak.team/">keccak</a> hashing.&nbsp; Keccak is the same as the official sha-3 standard with a different pad (aka domain separation) byte.&nbsp; The <a href="https://github.com/golang/crypto/tree/master/sha3">crypto/sha3</a> package internally supports the ability to use arbitrary domain separation bytes, but the functionality is not exported.&nbsp; Therefore I <a href="https://github.com/nerdralph/crypto">forked the repository</a> and added <a href="https://github.com/nerdralph/crypto/blob/master/sha3/keccak.go">functions for keccak-256 and keccak-512</a>.&nbsp; A common operation in crypto is XOR, and the sha3 package includes <a href="https://github.com/nerdralph/crypto/blob/master/sha3/xor_unaligned.go">an optimized XOR implementation</a>.&nbsp; This function is not exported either, so I added <a href="https://github.com/nerdralph/crypto/blob/master/sha3/fast_xor_words.go">a fast XOR function</a> as well.<br /><br />Ethereum's proof-of-work uses a DAG of about 2GB that is generated from a 32MB cache.&nbsp; This cache and the DAG change and grow slightly every 30,000 blocks (about 5 days).&nbsp; Using my modified sha3 library and based on the description from the <a href="https://github.com/ethereum/wiki/wiki/Ethash">ethereum wiki</a>, I wrote <a href="https://github.com/nerdralph/goyard/blob/master/stratum.go">a 
test program</a> that connects to a mining pool, gets the current seed hash, and generates the DAG cache.&nbsp; The final hex string printed out is the last 32 bytes of the cache.&nbsp; I created an internal debug build of <a href="https://github.com/nerdralph/ethminer-nr">ethminer-nr</a>&nbsp;that also outputs the last 32 bytes of the cache in order to verify that my code works correctly.<br /><br />When it comes to performance, I had read some <a href="https://gist.github.com/mmstick/f3d758af73c63e98de31">old benchmarks</a> that show gcc-go generating much faster code than the stock go compiler (gc).&nbsp; Things have obviously changed; in my tests the standard go compiler was much faster.&nbsp; My ETH cache generation test program takes about 3 seconds to run when using the standard go compiler versus 8 seconds with gcc-go using -O3 -march=native.&nbsp; This is on an Intel G1840 comparing&nbsp;go version go1.9.2 linux/amd64 with&nbsp;go1.6.1 gccgo.&nbsp; The versions chosen were the latest pre-packaged versions for Ubuntu 16 (golang-1.9 and&nbsp;gccgo-6).&nbsp; At least for compute-heavy crypto functions, I don't see any point in using gcc-go.<br /><br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com0tag:blogger.com,1999:blog-6245413346375218188.post-7916730183264421352018-02-04T12:21:00.000-08:002018-02-04T12:21:16.300-08:00Ethereum mining pool comparisons<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-PfZfreoitjQ/VxKAKbEMP3I/AAAAAAAAi7c/6oz9QHx7MjQ4fRjh-AaR365qkb7avfOHACPcBGAYYCw/s1600/ETHEREUM-LOGO_PORTRAIT_Black_small.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="394" data-original-width="500" height="252" src="https://3.bp.blogspot.com/-PfZfreoitjQ/VxKAKbEMP3I/AAAAAAAAi7c/6oz9QHx7MjQ4fRjh-AaR365qkb7avfOHACPcBGAYYCw/s320/ETHEREUM-LOGO_PORTRAIT_Black_small.png" width="320" /></a></div><br />Since I 
started mining ethereum, the focus of my optimizations has been on mining software and hardware tuning.&nbsp; While overclocking and software mining tweaks are the major factors in maximizing earnings, choosing the best mining pool can make a measurable difference as well.<br /><br />I tested the top three pools with North American servers: <a href="http://ethermine.org/">Ethermine</a>, <a href="http://miningpoolhub.com/">Mining Pool Hub</a>, and <a href="http://eth.nanopool.org/">Nanopool</a>.&nbsp; I tested mining on each pool, and wrote a <a href="https://github.com/nerdralph/ethminer-nr/blob/master/poolmon.py">small program</a>&nbsp;to monitor pools.&nbsp; Nanopool came out on the bottom, with Ethermine and Mining Pool Hub both performing well.<br /><br />I think the biggest difference between pool earnings has to do with latency.&nbsp; For someone in North America, using a pool in Asia with a network round-trip latency of 200-300ms will result in lower earnings than a North American pool with a network latency of 30-50ms.&nbsp; The reason is that higher latency causes a higher stale share rate.&nbsp; If it takes 150ms for a share submission to reach the pool, with Ethereum's average block time of 15 seconds, the latency will add 1% to your stale share rate.&nbsp; How badly that affects your earnings depends on how the pool rewards stale shares, something that is unfortunately not clearly documented on any of the three pools.<br /><br />When I first started mining I would do simple latency tests using ping.&nbsp; Following Ethermine's recent migration of their servers to AWS, they no longer respond to ping.&nbsp; What really matters is not ping response time, but how quickly the pool forwards new jobs and processes submitted shares.&nbsp; What further complicates an evaluation of different pools is that they often have multiple servers behind one host name.&nbsp; For example, here are the IP addresses for&nbsp;us-east1.ethereum.miningpoolhub.com from dig:<br /><span 
style="font-family: Courier New, Courier, monospace;"><b>us-east1.ethereum.miningpoolhub.com. 32 IN A&nbsp; &nbsp;192.81.129.199</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>us-east1.ethereum.miningpoolhub.com. 32 IN A&nbsp; &nbsp;45.56.112.78</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>us-east1.ethereum.miningpoolhub.com. 32 IN A&nbsp; &nbsp;45.33.104.156</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>us-east1.ethereum.miningpoolhub.com. 32 IN A&nbsp; &nbsp;45.56.113.50</b></span><br /><div><br /></div><div>Even though 45.56.113.50 has a ping time about 40ms lower than 192.81.129.199, the 192.81.129.199 server usually sent new jobs faster than 45.56.113.50.&nbsp; The difference between the first and last server to send a job was usually 200-300ms.&nbsp; With nanopool, the difference was much more significant, with the slowest server often sending a new job 2 seconds (2000ms) after the fastest.&nbsp; Recent updates posted on nanopool's site, such as raising their static difficulty from 5 billion to 10 billion, suggest their servers have been overloaded.&nbsp; Even with miners submitting shares at half the rate, it seems they are still having issues with server loads.</div><div><br /></div><div>Less than a week ago, us1.ethermine.org resolved to a few different IPs, and now it resolves to a single AWS IP:&nbsp;18.219.59.155.&nbsp; I suspect there are at least two different servers using load balancing to respond to requests for the single IP.&nbsp; By making multiple simultaneous stratum requests and timing the new jobs received, I was able to measure variations of more than 100ms between some jobs.&nbsp; That seems to confirm my suspicion that there are multiple servers with slight variations in their performance.</div><div><br /></div><div>In order to determine if the timing performance of the pools was actually having an impact on pool earnings, I looked 
at stats for blocks and uncles mined from&nbsp;<a href="http://etherscan.io/">etherscan.io</a>.</div><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-Z6llXA4SP8Q/Wndiwr2wOkI/AAAAAAAAmv4/zT5zXjCOGl8GgCD7TVhTCX-ri6OmMAiBwCLcBGAs/s1600/Uncles.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="428" data-original-width="740" height="370" src="https://4.bp.blogspot.com/-Z6llXA4SP8Q/Wndiwr2wOkI/AAAAAAAAmv4/zT5zXjCOGl8GgCD7TVhTCX-ri6OmMAiBwCLcBGAs/s640/Uncles.png" width="640" /></a></div><div>Those stats show that although Nanopool produces about half as many blocks as Ethermine, it produces more uncles.&nbsp; Since uncles receive a reward of at most 2.625 ETH vs 3 ETH for a regular block, miners should receive higher payouts on Ethermine than on Nanopool.&nbsp; Based solely on uncle rate, payouts on Ethermine should be slightly higher than on MPH.&nbsp; Eun, the operator of MPH, has been accessible and responsive to questions and suggestions about the pool, while the Ethermine pool operator is not accessible.&nbsp; As an example of that accessibility, three days ago I emailed MPH about 100% rejects from one of their pool servers.&nbsp; Thirty-five minutes later I received a response asking me to verify that the issue was resolved after they rebooted the server.</div><div><br /></div><div>In conclusion, either Ethermine or MPH would be a reasonable choice for someone mining in North America.&nbsp; This pool comparison has also opened my eyes to optimization opportunities in how mining software chooses pools.&nbsp; Until now mining software has done little more than switch pools when a connection is lost or no new jobs are received for a long period of time.&nbsp; My intention is to have my mining software dynamically switch to mining jobs from the most responsive server instead of requiring an outright failure.</div><div><br /></div>Ralph 
Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com0tag:blogger.com,1999:blog-6245413346375218188.post-45257370368001570492017-12-14T10:21:00.000-08:002017-12-14T10:21:22.798-08:00Mining with AMDGPU-PRO 17.40 on Linux<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-Ck_UjhSLYzg/WjKsDkUOv1I/AAAAAAAAmik/GsFKIogpLLk6VgLMdW6nS9iqcJ2txjI1gCLcBGAs/s1600/sgminer-l1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="390" data-original-width="644" src="https://3.bp.blogspot.com/-Ck_UjhSLYzg/WjKsDkUOv1I/AAAAAAAAmik/GsFKIogpLLk6VgLMdW6nS9iqcJ2txjI1gCLcBGAs/s1600/sgminer-l1.png" /></a></div><br /><a href="http://support.amd.com/en-us/kb-articles/Pages/AMDGPU-Pro-Beta-Mining-Driver-for-Linux-Release-Notes.aspx">A 17.40 beta</a> was released on October 16, with <a href="http://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Driver-for-Linux-Release-Notes.aspx">a final release</a> following on October 30th.&nbsp; There have been <a href="https://community.amd.com/message/2832721">some issues with corrupt versions</a> of the final release, but I think they are resolved now.&nbsp; I encountered lots of problems with this release, which was much of the motivation for making this post.<br /><br />Until earlier this year, the AMDGPU-PRO drivers were targeted at the new Polaris cards, and support for even relatively recent Tonga was lacking.&nbsp; Because of this, I was using the fglrx drivers for Tonga and Pitcairn cards.&nbsp; The primary reason for upgrading now is for <a href="https://bitcointalk.org/index.php?topic=2361268">large page support</a>, which improves performance on algorithms that use a large amount (2GB or more) of memory.&nbsp; With the promise of better performance, and since fglrx is no longer being maintained, I decided to upgrade.<br /><br />I've been using <a 
href="http://nerdralph.blogspot.ca/2017/03/amdgpu-pro-1660-on-ubuntu-kernel-4105.html">AMDGPU-PRO with kernel 4.10.5</a> for my Rx 470 cards, so I decided to use the same kernel.&nbsp; I can't say whether there are any problems with a newer kernel like 4.10.17 or even 4.14.5; they might work just as well.&nbsp; I left the on-board video enabled (i915), so I would not have to connect and disconnect video cables when testing the GPUs.&nbsp; After installing Ubuntu 16.04.3, I updated the kernel and rebooted.&nbsp; For installing the AMDGPU-PRO drivers, I used the px option (amdgpu-pro-install --px), as <a href="https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/amd-linux/927510-amdgpu-pro-16-60-released/page12">it is supposed to support mixed iGPU/dGPU use</a>.<br /><br />My normal procedure for bringing up a multi-GPU machine is to start with a single GPU in the 16x motherboard slot, as this avoids potential issues with flaky risers.&nbsp; Even with just one R9 380 card in the 16x slot, I was having problems with powerplay.&nbsp; When it is working, pp_dpm_sclk will show the current clock rate with an asterisk, but this was not happening.&nbsp; After two days of troubleshooting, I concluded <a href="https://community.amd.com/thread/223406">there is a bug with powerplay and some motherboards</a>&nbsp;when using the 16x slot.&nbsp; When using only the 1x slots, powerplay works fine.<br /><br />Since I wasn't able to use the 16x motherboard slot, testing card and riser combinations was more difficult.&nbsp; Normally when I have a problem with a card and riser, I'll move the card to the 16x slot.&nbsp; If the problems go away, I'll mark the riser as likely defective.&nbsp; Mining algorithms like ethash use little bandwidth between the CPU and GPU, so there is no performance loss from using 1x risers.&nbsp; Even the slowest PCIe 1.1 transfer rate is sufficient for mining.&nbsp; Using "lspci -vv", I could see the link speed was 5.0GT/s 
(LnkSta:), which is PCIe gen2 speed.&nbsp; Reducing the speed to gen1 would mean lower quality risers could be used without encountering errors.<br /><br />My first thought was to try to set the PCIe speed in the motherboard BIOS.&nbsp; Setting gen1 in the chipset options made no difference, so perhaps it is only the speed used during boot-up before the OS takes over control of the PCIe bus.&nbsp; Next, using "modinfo amdgpu", I noticed some module options related to PCIe.&nbsp; Adding "amdgpu.pcie_gen2=0" had no effect.&nbsp; Apparently <a href="https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/968433-amdgpu-change-pci-2-to-3">the module no longer supports that option</a>.&nbsp; I could not find any documentation for the "pcie_gen_cap" option, but luckily the open-source amdgpu module supports the same module parameter.&nbsp; By looking at <a href="http://elixir.free-electrons.com/linux/v4.9/source/drivers/gpu/drm/amd/include/amd_pcie.h#L27">amd_pcie.h in the kernel source code</a>, I determined that "0x10001" will limit the link to gen1.&nbsp; I added "pcie_gen_cap=0x10001" to /etc/default/grub, ran update-grub, and rebooted.&nbsp; With lspci I was able to see that all the GPUs were running at 2.5GT/s.<br /><br />For clock control and monitoring, I've previously written about <a href="https://github.com/RadeonOpenCompute/ROC-smi">ROC-smi</a>.<br /><span style="font-family: Courier New, Courier, monospace; font-size: x-small;">====================&nbsp; &nbsp; ROCm System Management Interface&nbsp; &nbsp; ====================</span><br /><span style="font-family: Courier New, Courier, monospace; font-size: x-small;">================================================================================</span><br /><span style="font-family: Courier New, Courier, monospace; font-size: x-small;">&nbsp;GPU&nbsp; DID&nbsp; &nbsp; Temp&nbsp; &nbsp; &nbsp;AvgPwr&nbsp; &nbsp;SCLK&nbsp; &nbsp; &nbsp;MCLK&nbsp; &nbsp; &nbsp;Fan&nbsp; &nbsp; &nbsp; 
Perf&nbsp; &nbsp; OverDrive&nbsp; ECC</span><br /><span style="font-family: Courier New, Courier, monospace; font-size: x-small;">&nbsp; 3&nbsp; &nbsp;6938&nbsp; &nbsp;66.0c&nbsp; &nbsp; 100.172W 858Mhz&nbsp; &nbsp;1550Mhz&nbsp; 44.71%&nbsp; &nbsp;manual&nbsp; &nbsp; 0%&nbsp; &nbsp; &nbsp; &nbsp;N/A</span><br /><span style="font-family: Courier New, Courier, monospace; font-size: x-small;">&nbsp; 1&nbsp; &nbsp;6939&nbsp; &nbsp;64.0c&nbsp; &nbsp; 112.21W&nbsp; 846Mhz&nbsp; &nbsp;1550Mhz&nbsp; 42.75%&nbsp; &nbsp;manual&nbsp; &nbsp; 0%&nbsp; &nbsp; &nbsp; &nbsp;N/A</span><br /><span style="font-family: Courier New, Courier, monospace; font-size: x-small;">&nbsp; 4&nbsp; &nbsp;6939&nbsp; &nbsp;62.0c&nbsp; &nbsp; 118.135W 839Mhz&nbsp; &nbsp;1500Mhz&nbsp; 47.84%&nbsp; &nbsp;manual&nbsp; &nbsp; 0%&nbsp; &nbsp; &nbsp; &nbsp;N/A</span><br /><span style="font-family: Courier New, Courier, monospace; font-size: x-small;">&nbsp; 2&nbsp; &nbsp;6939&nbsp; &nbsp;77.0c&nbsp; &nbsp; 123.78W&nbsp; 839Mhz&nbsp; &nbsp;1550Mhz&nbsp; 64.71%&nbsp; &nbsp;manual&nbsp; &nbsp; 0%&nbsp; &nbsp; &nbsp; &nbsp;N/A</span><br /><span style="font-family: Courier New, Courier, monospace; font-size: x-small;">GPU[0]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : PowerPlay not enabled - Cannot get supported clocks</span><br /><span style="font-family: Courier New, Courier, monospace; font-size: x-small;">GPU[0]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : PowerPlay not enabled - Cannot get supported clocks</span><br /><span style="font-family: Courier New, Courier, monospace; font-size: x-small;">&nbsp; 0&nbsp; &nbsp;0402&nbsp; &nbsp;N/A&nbsp; &nbsp; &nbsp; N/A&nbsp; &nbsp; &nbsp; N/A&nbsp; &nbsp; &nbsp; N/A&nbsp; &nbsp; &nbsp; None%&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; N/A&nbsp; &nbsp; &nbsp; N/A</span><br /><span style="font-family: Courier New, Courier, monospace; font-size: x-small;">================================================================================</span><br /><span style="font-family: 
Courier New, Courier, monospace; font-size: x-small;">====================&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;End of ROCm SMI Log&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ====================</span><br /><br />I also use <a href="https://github.com/OhGodACompany/OhGodATool">Kristy's utility</a> to set specific clock rates:<br /><span style="font-family: Courier New, Courier, monospace;"><b>ohgodatool -i 1 --mem-state 3 --mem-clock 1550</b></span><br /><br /><span style="font-family: inherit;">Unfortunately <a href="https://github.com/nerdralph/ethminer-nr/tree/110">ethminer-nr</a>&nbsp;doesn't work with this setup.&nbsp; I suspect the new driver doesn't support some old OpenCL option, so the fix should be relatively simple, once I make the time to debug it.</span><br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com2tag:blogger.com,1999:blog-6245413346375218188.post-69257858570727378712017-12-06T18:44:00.001-08:002017-12-06T18:44:07.385-08:00Powering GPU mining rigs<div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-1h5bgLxxmr8/Wih0GH9RWnI/AAAAAAAAmf4/AOxu-5ZSphke9hpiG2-mokENu91Tous0gCLcBGAs/s1600/rig.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="587" data-original-width="1222" height="305" src="https://2.bp.blogspot.com/-1h5bgLxxmr8/Wih0GH9RWnI/AAAAAAAAmf4/AOxu-5ZSphke9hpiG2-mokENu91Tous0gCLcBGAs/s640/rig.jpg" width="640" /></a></div><br />Since I started mining ethereum almost two years ago, I have found that power distribution is important not just for equipment safety, but also for system stability.&nbsp; When I started mining I thought my rigs should be fine as long as I used a robust server PSU to power the GPUs, with heavy 16 or 18AWG cables.&nbsp; After frying one motherboard and more than a couple ATX PSUs, I've learned a lot of careful design and testing is required.<br /><br />Using Dell, IBM, or HP server 
power supplies for mining rigs is not a new idea, so I won't go into too much detail about them.&nbsp; I do recommend making <a href="http://nerdralph.blogspot.ca/2017/06/server-psu-interlock.html">an interlock connector</a>&nbsp;so the server PSU turns on at the same time as the motherboard.&nbsp; I also recommend only connecting the server PSU to power the GPU PCIe power connectors, as they are isolated from the 12V supply for the motherboard.&nbsp; If you try to power ribbon risers, the 12V from the ATX and server PSUs will be interconnected and can lead to feedback problems.&nbsp; Server PSUs are very robust and unlikely to be harmed, but I have killed a cheap 450W ATX PSU this way.&nbsp; If you use USB risers, they are isolated from the motherboard's 12V supply, and therefore can be safely powered from the server PSU.<br /><br />In the photo above, you might notice the grounding wire connecting all the cards, which then connects to a server PSU.&nbsp; I recently added this to the rig after measuring higher current flowing through two of the ground wires connected to the 6-pin PCIe power plugs.&nbsp; As I mentioned in <a href="http://nerdralph.blogspot.ca/2016/03/hacking-gpu-pcie-power-connections.html">my post about GPU PCIe power connections</a>, there are only two ground pins, with the third ground wire being connected to the sense pin.&nbsp; With two ground pins and three power pins, the ground wires carry 50% more current than the 12V wires.&nbsp; Although the ground wires weren't heating up from the extra current, the connector was.&nbsp; Adding the ground bypass wire reduced the connector temperature to a reasonable level.<br /><br />For ATX PSUs, I've used a few of the EVGA 500B, and do not recommend them.&nbsp; While even my cheap old 300W power supplies use 18AWG wire for the hard drive power connectors, the SATA and molex power cables on the 500B are only 20AWG.&nbsp; Powering more than one or two risers with a 20AWG cable is a recipe for 
trouble.&nbsp; I burned the 12V hard drive power wire on two 500B supplies before I realized this.&nbsp; I recently purchased a Rosewill 500W 80plus gold PSU that was on sale at Newegg, and it is much better than the EVGA 500B.&nbsp; The Rosewill uses 18AWG wire in the hard drive cables, and it also has a 12V sense wire in the ATX power connector.&nbsp; This allows it to compensate for the voltage drop in the cable from the PSU to the motherboard.&nbsp; The sense wire is the thinner yellow wire in the photo below.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-mTNoNM4I_gw/WiiBRRe5I4I/AAAAAAAAmgY/5N2crZSEzoYocHnBEkwfyxmrNfuAJQrYQCLcBGAs/s1600/SenseWire.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="972" data-original-width="1296" height="300" src="https://4.bp.blogspot.com/-mTNoNM4I_gw/WiiBRRe5I4I/AAAAAAAAmgY/5N2crZSEzoYocHnBEkwfyxmrNfuAJQrYQCLcBGAs/s400/SenseWire.jpg" width="400" /></a></div><br />Speaking of voltage drop, I recommend checking the voltage at the PCIe power connector to ensure it is close to 12V.&nbsp; Most of my cards do not have a back plate, so I can use a multi-meter to measure at the 12V pins of the power connector where they are soldered to the GPU PCB.&nbsp; I also recommend checking the temperature of power connectors since good quality low-resistance connectors are just as important as heavy gauge wires.&nbsp; Warm connectors are OK, but if they are so hot that you can't hold your fingers to them, that's a problem.<br /><br />My last recommendation is for people in North America (and some other places) where 120V AC power is the norm.&nbsp; Wire up the outlets for your mining rigs for 240V instead of 120V.&nbsp; Power supplies are slightly more efficient at 240V, and will draw half as much current compared to 120V.&nbsp; Lower current draw means less line loss going to the power supply and therefore less heat generated in power 
cords and plugs.&nbsp; Properly designed AC power cables and plugs should never overheat below 10-15A; however, I have seen melted and burned connectors at barely over 10A of steady current draw.<br /><br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com3tag:blogger.com,1999:blog-6245413346375218188.post-43892695857066713002017-06-23T19:02:00.001-07:002017-06-23T19:02:33.508-07:00Server PSU interlock<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-FCpIe14wdnk/WU3CwYiWKUI/AAAAAAAAlUI/_-VNiP9FQ4AKSViFQCS0Ur0tvsjAt-XPwCLcBGAs/s1600/InterlockPSU.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="540" data-original-width="775" height="278" src="https://3.bp.blogspot.com/-FCpIe14wdnk/WU3CwYiWKUI/AAAAAAAAlUI/_-VNiP9FQ4AKSViFQCS0Ur0tvsjAt-XPwCLcBGAs/s400/InterlockPSU.png" width="400" /></a></div><br />On my multi-GPU rigs, I use server PSUs like the Dell N750P to provide the 12V power to the PCI-E connectors. &nbsp;These PSUs do not have power switches, so initially I would just pull the power cord out when I wanted to power them down. &nbsp;After experimenting with the PSU control pins, I realized they have an active-low "power on" pin. &nbsp;Instead of using a jumper to connect it to ground, I decided to use an electronic switch to power the server PSU when the motherboard powers up.<br /><br />The switch I used is a common, cheap model 817 optocoupler (<a href="http://www.vishay.com/docs/83522/k817p.pdf">pdf datasheet</a>). &nbsp;When current flows from pin 1 to 2, the optocoupler is turned on, creating a short from pin 4 to pin 3. &nbsp;For my small circuit shown above, pin 4 is connected to the PS_ON signal, and pin 3 is connected to ground on the server PSU. &nbsp;Pin 1 is connected to 12V (from the 4-pin 3.5" floppy drive power connector), and pin 2 is connected to ground. 
&nbsp;On the back of the board is a 1K current-limiting resistor in series with the red LED, which is a power-on indicator.<br /><br />I also made an even simpler interlock using only an optocoupler with the pins straightened and 0.1" header pins:<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-_TkqyAL29kw/WU3HMaTSKqI/AAAAAAAAlUU/38b_qJ95OVoH6IlsuLDepZVZ2eluYBH_ACLcBGAs/s1600/Interlock2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="308" data-original-width="867" height="141" src="https://2.bp.blogspot.com/-_TkqyAL29kw/WU3HMaTSKqI/AAAAAAAAlUU/38b_qJ95OVoH6IlsuLDepZVZ2eluYBH_ACLcBGAs/s400/Interlock2.png" width="400" /></a></div>I connect pins 1 and 2 to the motherboard's power LED pins, which would normally light up an LED when the motherboard powers up. &nbsp;The motherboard already has a current-limiting resistor for the power LED, which typically limits the current to around 10mA.<br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com0tag:blogger.com,1999:blog-6245413346375218188.post-34655851676604139792017-05-12T18:06:00.002-07:002017-05-12T18:06:35.816-07:00Dummy plugs for headless GPU rigs<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-xPQ03owBhsM/WRZWPsAmrOI/AAAAAAAAlAs/cirJzpQeFL8YOMT4RYjv4mx-Uape6JH0ACLcB/s1600/HDMI-VGA.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="272" src="https://1.bp.blogspot.com/-xPQ03owBhsM/WRZWPsAmrOI/AAAAAAAAlAs/cirJzpQeFL8YOMT4RYjv4mx-Uape6JH0ACLcB/s400/HDMI-VGA.jpg" width="400" /></a></div><br />I've read about people claiming they needed to plug a monitor (or dummy plug) into one GPU card or else they couldn't use the card. &nbsp;I had never encountered any problems with either fglrx or AMDGPU-Pro drivers until recently. 
&nbsp;I moved a 4GB R9 380 card from an Ubuntu 14.04/fglrx rig to a Ubuntu 16.04/AMDGPU-Pro rig. &nbsp;The remaining cards are 2GB R7 370 cards, and I started getting memory allocation errors for the primary card. &nbsp;After checking with "<span style="font-family: Courier New, Courier, monospace;">ethminer --list-devices</span>", I noticed the first card had about half the maximum memory allocation limit of the others:<br /><b><span style="font-family: Courier New, Courier, monospace;">Genoil's ethminer 0.9.41-genoil-1.2.0nr</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">=====================================================================</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">Forked from github.com/ethereum/cpp-ethereum</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">CUDA kernel ported from Tim Hughes' OpenCL kernel</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">With contributions from nicehash, nerdralph, RoBiK and sp_</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;"><br /></span></b><b><span style="font-family: Courier New, Courier, monospace;">Please consider a donation to:</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">ETH: 0xeb9310b185455f863f526dab3d245809f6854b4d</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;"><br /></span></b><b><span style="font-family: Courier New, Courier, monospace;">[OPENCL]:</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">Listing OpenCL devices.</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">FORMAT: [deviceID] deviceName</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">[0] Pitcairn</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_TYPE: 
GPU</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_GLOBAL_MEM_SIZE: 1920991232</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_MAX_MEM_ALLOC_SIZE: 970981376</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_MAX_WORK_GROUP_SIZE: 256</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">[1] Pitcairn</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_TYPE: GPU</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_GLOBAL_MEM_SIZE: 2095054848</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1868562432</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_MAX_WORK_GROUP_SIZE: 256</span></b><br /><div><br /></div>I have an old VGA LCD monitor that I connected using a HDMI-VGA adapter. 
&nbsp;After connecting the monitor, nearly the full amount became available:<br /><span style="font-family: Courier New, Courier, monospace;"><b>Genoil's ethminer 0.9.41-genoil-1.2.0nr</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>=====================================================================</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>Forked from github.com/ethereum/cpp-ethereum</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>CUDA kernel ported from Tim Hughes' OpenCL kernel</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>With contributions from nicehash, nerdralph, RoBiK and sp_</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b><br /></b></span><span style="font-family: Courier New, Courier, monospace;"><b>Please consider a donation to:</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>ETH: 0xeb9310b185455f863f526dab3d245809f6854b4d</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b><br /></b></span><span style="font-family: Courier New, Courier, monospace;"><b>[OPENCL]:</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>Listing OpenCL devices.</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>FORMAT: [deviceID] deviceName</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>[0] Pitcairn</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_TYPE: GPU</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_GLOBAL_MEM_SIZE: 1969225728</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1750073344</b></span><br /><span style="font-family: Courier New, Courier, 
monospace;"><b>&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_MAX_WORK_GROUP_SIZE: 256</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>[1] Pitcairn</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_TYPE: GPU</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_GLOBAL_MEM_SIZE: 1968177152</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1750073344</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>&nbsp; &nbsp; &nbsp; &nbsp; CL_DEVICE_MAX_WORK_GROUP_SIZE: 256</b></span><br /><div><br /></div><div>I also found the monitor doesn't have to be plugged in, just the HDMI-VGA adapter. &nbsp;While there might be a way to configure fglrx so that the full memory is available without the adapter, I'm more interested in learning more about <a href="http://nerdralph.blogspot.ca/2017/03/amdgpu-pro-1660-on-ubuntu-kernel-4105.html">AMDGPU-Pro</a>.</div>Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com2tag:blogger.com,1999:blog-6245413346375218188.post-34521278083101673372017-05-10T15:18:00.000-07:002017-05-10T15:18:46.283-07:00GDDR5 memory timing details<div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-XDcf4U0QG5w/WRHAt8-K6yI/AAAAAAAAk_Y/5eobHZYjTdgZU4Hv8rwS1ef3xwY1AIGCACLcB/s1600/GDDR5ActivateCycle.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="244" src="https://2.bp.blogspot.com/-XDcf4U0QG5w/WRHAt8-K6yI/AAAAAAAAk_Y/5eobHZYjTdgZU4Hv8rwS1ef3xwY1AIGCACLcB/s640/GDDR5ActivateCycle.png" width="640" /></a></div><br /><br />In my&nbsp;<a href="http://nerdralph.blogspot.ca/2016/09/advanced-tonga-bios-editing.html">Advanced Tonga BIOS editing</a>&nbsp;post, I discussed some basic memory timing 
information, but did not get into the details. &nbsp;GDDR5 memory is much more complex than the asynchronous DRAM of 20 years ago. &nbsp;There are many sources of information on SDRAM, while GDDR information is harder to come by. &nbsp;Although a thorough description of GDDR5 can be found in the spec published by <a href="https://www.jedec.org/">JEDEC</a>, neither nVIDIA nor AMD shares information on how their memory controllers are programmed with memory timing information. &nbsp;By analyzing the <a href="https://github.com/torvalds/linux/tree/5924bbecd0267d87c24110cbe2041b5075173a25/drivers/gpu/drm/amd">AMD video driver source</a>, and with help from people contributing to <a href="https://bitcointalk.org/index.php?topic=1758267.0">a discussion on bitcointalk</a>, I have come to understand most of the workings of AMD BIOS timing straps.<br /><br />When a modern (R9 series and Rx series) AMD GPU card boots up, memory timing information (straps) is copied from the BIOS to registers in the memory controller. &nbsp;Some timing information, such as refresh frequency, is not dependent on the memory speed and therefore is not contained in the memory strap table, but much of the important timing information is. &nbsp;The memory controller registers are 32 bits wide, and so the 48-byte memory straps map to 12 different memory controller registers. &nbsp;The <a href="https://github.com/torvalds/linux/blob/5924bbecd0267d87c24110cbe2041b5075173a25/drivers/gpu/drm/amd/include/asic_reg/gmc/gmc_8_1_sh_mask.h">shift masks</a>&nbsp;in the Linux driver source are therefore non-functional, and can only be taken as hints as to the meaning of the individual bits. 
&nbsp;Due to an apparently bureaucratic process for releasing open-source code, AMD engineers are generally reluctant to update such code.<br /><br />Jumping right to the code, here's a C structure definition for the Rx memory straps:<br /><span style="font-family: Courier New, Courier, monospace;"><b>SEQ_WR_CTL_D1_FORMAT SEQ_WR_CTL_D1;</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>SEQ_WR_CTL_2_FORMAT SEQ_WR_CTL_2;</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>SEQ_PMG_TIMING_FORMAT SEQ_PMG_TIMING;</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>SEQ_RAS_TIMING_FORMAT SEQ_RAS_TIMING;</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>SEQ_CAS_TIMING_FORMAT SEQ_CAS_TIMING;</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>SEQ_MISC_TIMING_FORMAT SEQ_MISC_TIMING;</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>SEQ_MISC_TIMING2_FORMAT SEQ_MISC_TIMING2;</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>uint32_t SEQ_MISC1;</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>uint32_t SEQ_MISC3;</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>uint32_t SEQ_MISC8;</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>ARB_DRAM_TIMING_FORMAT ARB_DRAM_TIMING;</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>ARB_DRAM_TIMING2_FORMAT ARB_DRAM_TIMING2;</b></span><br /><br />Looking at the RAS timing, it consists of 6 fields: RCDW, RCDWA, RCDR, RCDRA, RRD, and RC. &nbsp;The full field definitions can be found in <a href="https://github.com/nerdralph/OhGodADecode/blob/master/ohgodadecode.h">my fork</a> of&nbsp;Kristy-Leigh's code. &nbsp;Many of the "pad" fields are likely the high bits of the preceding field that are not currently used. 
&nbsp;<a href="https://bitcointalk.org/index.php?topic=1758267.msg18640269#msg18640269">I tested a couple of pad fields already</a>&nbsp;(MISC RP_RDA &amp; RP), confirming that the pad bits were actually the high bits of the fields.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-sNJzpAF9Fv0/WROHC6rSfhI/AAAAAAAAk_0/OddE3Sbq5EoqWNcI8hpWgAmGAC9vfUexACLcB/s1600/GDDR5table.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-sNJzpAF9Fv0/WROHC6rSfhI/AAAAAAAAk_0/OddE3Sbq5EoqWNcI8hpWgAmGAC9vfUexACLcB/s1600/GDDR5table.png" /></a></div><br />For GDDR5, some timing values have both Long and Short versions that apply for access within a bank group or to different bank groups. &nbsp;The RRD field of RAS timing is likely RRDL, because the values typically seen for this field are 5 and 6. &nbsp;If RRDS were 5, this would mean at most one page could be opened every five cycles, limiting 32-byte random read performance to 2/5 or 40% of the maximum interface speed. &nbsp;From my work with <a href="http://nerdralph.blogspot.ca/2016/08/ethereum-mining-on-ubuntu-linux.html">Ethereum mining</a>, I know that RRDS can be no more than 4. &nbsp;In addition, performance tests with RRD timing reduced from 6 to 5 are consistent with it being RRDL. &nbsp;The actual value of RRDS used by the memory controller does not seem to be contained in the timing strap. &nbsp;The default 1750Mhz strap for Samsung K4G4 memory has a value of 10 for FAW, which can be no more than 4 * RRDS. &nbsp;Therefore RRDS is most likely less than 4, and possibly as low as 2.<br /><br />To simplify the process of modifying memory straps for improved performance, I wrote&nbsp;<a href="https://github.com/nerdralph/strapread/blob/master/strapmod.py">strapmod</a>.
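<br /><br />As a concrete illustration of the 48-byte strap to 12-register mapping, here is a minimal Python sketch in the spirit of strapmod. The sample value is the default 1750Mhz Samsung K4G4 strap; the word split and bit-field helper are generic, but any tie between a word index and a register name follows the structure order above, and specific field offsets would have to come from ohgodadecode.h, so treat those as assumptions:

```python
# Minimal sketch of strap parsing, in the spirit of strapmod.
# Field offsets are NOT encoded here; get_bits() is just the generic
# extraction primitive, with the real layouts in ohgodadecode.h.

def strap_words(strap_hex):
    """Split a 48-byte strap into twelve little-endian 32-bit words."""
    raw = bytes.fromhex(strap_hex)
    assert len(raw) == 48, "Rx straps are 48 bytes (12 registers)"
    return [int.from_bytes(raw[i:i + 4], "little") for i in range(0, 48, 4)]

def get_bits(word, shift, width):
    """Extract a bit-field from a 32-bit register value."""
    return (word >> shift) & ((1 << width) - 1)

# Default 1750Mhz strap for Samsung K4G4 memory
strap = ("777000000000000022CC1C0010626C49D0571016B50BD509"
         "004AE700140514207A8900A003000000191131399D2C3617")
words = strap_words(strap)
print(len(words))     # 12 register values
print(hex(words[0]))  # first word (SEQ_WR_CTL_D1, per the struct order above)
```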
&nbsp;I also wrote a cgi wrapper for the program, which you can <a href="http://45.62.227.192/cgi-bin/strapmod?777000000000000022CC1C0010626C49D0571016B50BD509004AE700140514207A8900A003000000191131399D2C3617">run from my server</a>&nbsp;http://45.62.227.192/cgi-bin/strapmod. &nbsp;For example, this is the output with the 1750Mhz strap for Samsung K4G4 memory:<br /><span style="font-family: Courier New, Courier, monospace;"><b>Rx strap detected</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>Old, new RRD: 6 , 5</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>Old, new FAW: A , 0</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>Old, new 32AW: 7 , 0</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>Old, new ACTRD: 19 , 0x10</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>777000000000000022CC1C0010626C49D0571016B50BD509004AE700140514207A8900A003000000191131399D2C3617</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>777000000000000022CC1C0010625C49D0571016B50BD50900400700140514207A8900A003000000101131399D2C3617</b></span><br /><div><br /></div>Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com32tag:blogger.com,1999:blog-6245413346375218188.post-65864910713768380522017-03-25T12:17:00.001-07:002017-04-08T16:06:59.324-07:00AMDGPU-Pro 16.60 on Ubuntu kernel 4.10.5 with ROCM-smi<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-2TfvZvMJJaY/WNa73CorZdI/AAAAAAAAkuI/5NwvGZzh7wcbTd653qL1XuugSN8p01avgCLcB/s1600/ROC-smi.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-2TfvZvMJJaY/WNa73CorZdI/AAAAAAAAkuI/5NwvGZzh7wcbTd653qL1XuugSN8p01avgCLcB/s1600/ROC-smi.png" /></a></div><br />Although <a 
href="http://nerdralph.blogspot.ca/2017/03/amdgpu-pro-on-ubuntu.html">AMDGPU-Pro 16.40 with kernel 4.8</a> has been working fine for me, I decided to try 16.60 with kernel 4.10. &nbsp;After my problems with 16.60 on 4.8, I read a few reports claiming it works well with kernel 4.10.<br /><br />I started with a fresh Ubuntu desktop 16.04.2 install, and then installed <a href="http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.5/">4.10.5 from the Ubuntu ppa</a>. &nbsp;Although the process is not very complicated, I wrote <a href="https://gist.github.com/nerdralph/d061545b9c0ec10cd61866cd882c6a08">a small script</a> which downloads the files and installs them. &nbsp;After rebooting, I downloaded and installed the AMDGPU-Pro 16.60 drivers according to <a href="http://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Install.aspx">the instructions</a>. &nbsp;Finally, I installed <a href="https://github.com/RadeonOpenCompute/ROC-smi">ROC-smi</a>, a utility which simplifies clock control using the sysfs interface. &nbsp;To test the install, run "rocm-smi -a" which will show all info for any amdgpu cards installed.<br /><br />Unfortunately, the new drivers no longer work with <a href="https://github.com/nerdralph/ethminer-nr/tree/110">my ethminer fork</a>, but <a href="https://github.com/genesismining/sgminer-gm">sgminer-gm 5.5.5</a> works as well as it did with 4.8/16.40. &nbsp;On GCN3 and newer cards like Tonga and Polaris, the optimal core clock for mining ETH is often between 55% and 56% of the memory clock. &nbsp;On my Sapphire Rx470 I have the memory overclocked to 2100Mhz, so dpm 6 at 1169Mhz is a perfect fit:<br /><b><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">./rocm-smi -d 0 --setsclk 6</span></b><br /><br />Once sgminer was running for a couple minutes, the speed settled at about 29.1Mh/s. &nbsp;Note that the clock setting is only temporary for the next OpenCL program to run. 
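<br /><br />As a quick sanity check on the 55-56% rule of thumb: 1169Mhz is about 55.7% of 2100Mhz. A small Python sketch for picking the closest dpm state; note the state table below is illustrative (only the dpm 6 / 1169Mhz entry comes from my card, the rest are placeholder values), and a real tool would read the table from pp_dpm_sclk:

```python
# Pick the dpm sclk state closest to ~55.5% of the memory clock,
# per the rule of thumb for ETH mining on GCN3 and newer cards.

def best_dpm_state(mem_mhz, sclk_states, ratio=0.555):
    """sclk_states maps dpm index -> core clock in MHz."""
    target = mem_mhz * ratio
    return min(sclk_states, key=lambda s: abs(sclk_states[s] - target))

# Hand-typed example table; only dpm 6 = 1169MHz is from the post.
sclk = {4: 985, 5: 1073, 6: 1169, 7: 1244}
print(best_dpm_state(2100, sclk))  # 6 (1169MHz is ~55.7% of 2100MHz)
```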
&nbsp;Just run the rocm-smi command each time.<br /><h4>Update 2017-04-08</h4><div>&nbsp;<a href="http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.9/">4.10.9 was uploaded to the Ubuntu ppa</a>&nbsp;today, so I would recommend it instead of 4.10.5.</div><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com18tag:blogger.com,1999:blog-6245413346375218188.post-30945617650683932232017-03-14T06:52:00.000-07:002017-03-15T15:02:53.698-07:00Riser Recycling<div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-kHhMmpsm8-w/WMfsp3zxauI/AAAAAAAAkq8/e4vlOTfwEhoDKkjQhT2l0qEqCxvFqh50gCLcB/s1600/Riser1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="261" src="https://2.bp.blogspot.com/-kHhMmpsm8-w/WMfsp3zxauI/AAAAAAAAkq8/e4vlOTfwEhoDKkjQhT2l0qEqCxvFqh50gCLcB/s640/Riser1.jpg" width="640" /></a></div><br />If you build multi-GPU servers, you'll likely encounter flaky or bad risers. &nbsp;I've had a bad riser where I could see a burned trace on the PCB, and I've had flaky risers that appeared to be caused by poor soldering of the ribbon cable. &nbsp;While the problem risers may not work with a GPU, chances are the power connectors are still good. &nbsp;The riser shown above has a 6-pin PCI-e and a 4-pin molex connector, both of which I tested for continuity with a multi-meter. &nbsp;With some fresh flux I was able to desolder the ribbon cable, so I could re-use the riser as a PCI-e to molex power adapter. 
&nbsp;If you are wondering what I would use it for, look at the photo below.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-YQ8K6XqwQIo/WMfwoZ6PrsI/AAAAAAAAkrI/PGRTb8Jf-jwx3WutEVi4BjavpNoIbGjAQCLcB/s1600/Molex4pin.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="297" src="https://1.bp.blogspot.com/-YQ8K6XqwQIo/WMfwoZ6PrsI/AAAAAAAAkrI/PGRTb8Jf-jwx3WutEVi4BjavpNoIbGjAQCLcB/s400/Molex4pin.jpg" width="400" /></a></div><br />Heat has caused the yellow 12V line to turn brown. &nbsp;The cable was plugged into the motherboard's supplemental PCI-e power, which is used when more than two GPUs are plugged in. &nbsp;Each GPU will usually draw between 50 and 75 watts over the PCI-e bus, which is pushing the 18AWG (or even 20AWG on some power supplies) cable well beyond its recommended rating. &nbsp;By plugging the next molex connector in the chain into the riser, and by providing power to the 6-pin connector on the same riser, current will flow into the motherboard molex connector from both directions.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-vhGo2qad0JU/WMf0YyxP15I/AAAAAAAAkrU/XWD0imFCJdEqm0dCs0snYHjzE_HhzFd3ACLcB/s1600/Riser2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="218" src="https://1.bp.blogspot.com/-vhGo2qad0JU/WMf0YyxP15I/AAAAAAAAkrU/XWD0imFCJdEqm0dCs0snYHjzE_HhzFd3ACLcB/s400/Riser2.jpg" width="400" /></a></div><br />With the current through the brown wire cut in half, the power dissipated (and therefore the heat generated) is reduced by 75%, since P = I^2 * R.<br /><br /><h4>Supplemental mod</h4><div>Bitcointalk user BChydro questioned the current-carrying ability of the riser PCB, which turns out to be rather poor for the 12V trace. 
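<br /><br />As an aside, the 75% figure from P = I^2 * R above is easy to verify numerically. The current and wire resistance in this sketch are illustrative assumptions (roughly 6A on the 12V line, 18AWG copper at about 0.021 ohm/m over a half-metre run), not measurements:

```python
# Halving the current through a conductor quarters its I^2*R heating.
# R is an assumed wire resistance: ~0.021 ohm/m of 18AWG copper, 0.5m run.
R = 0.021 * 0.5

for I in (6.0, 3.0):  # full current vs. current shared from both directions
    print(f"{I}A -> {I**2 * R:.2f}W dissipated in the wire")

# The ratio is independent of R: (I/2)^2 / I^2 = 1/4, i.e. 75% less heat.
```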
&nbsp;The solder mask over the 12V trace was starting to turn brown after only a couple days of use, and a thermal image shows the trace getting hot.</div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-7U5-PWNoqXE/WMm4b9AVZUI/AAAAAAAAksE/zaz_0SisncwN5PFHxw3sIVA-5Tn9qUDdwCLcB/s1600/img_thermal_1489611499287.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://1.bp.blogspot.com/-7U5-PWNoqXE/WMm4b9AVZUI/AAAAAAAAksE/zaz_0SisncwN5PFHxw3sIVA-5Tn9qUDdwCLcB/s400/img_thermal_1489611499287.jpg" width="400" /></a></div><div><br /></div><div>To solve the problem I added a 18AWG jumper wire between the 12V pins:</div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-SqHNFtrGMyE/WMm54ynMZzI/AAAAAAAAksQ/GeSKXwBQU2szwHGpFLJ4U8yjAru3oxCTwCLcB/s1600/PCBjumper.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="223" src="https://1.bp.blogspot.com/-SqHNFtrGMyE/WMm54ynMZzI/AAAAAAAAksQ/GeSKXwBQU2szwHGpFLJ4U8yjAru3oxCTwCLcB/s400/PCBjumper.jpg" width="400" /></a></div><div><br /></div>Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com0tag:blogger.com,1999:blog-6245413346375218188.post-80323823932019053192017-03-05T17:20:00.004-08:002017-03-05T17:20:58.017-08:00AMDGPU-Pro on UbuntuIt's been almost a year since the first AMDGPU-Pro driver release. &nbsp;There are now two main release versions; <a href="http://support.amd.com/en-us/kb-articles/Pages/AMD-Radeon-GPU-PRO-Linux-Beta-Driver%E2%80%93Release-Notes.aspx">16.40</a> and <a href="http://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Driver-for-Linux-Release-Notes.aspx">16.60</a>. 
&nbsp;Although both versions supposedly support Ubuntu 16.04, version 16.40 with <a href="http://releases.ubuntu.com/16.04/">Ubuntu Desktop 16.04.2</a>&nbsp;is the only combination that works without a kernel update.<br /><br />Ubuntu 16.04.2 is the first 16.04 release to use kernel version 4.8 instead of version 4.4. &nbsp;Using 16.40 with kernel version 4.4 would sometimes lead to problems such as kernel message log floods or powerplay problems. &nbsp;The typical powerplay problem was that the card would not switch to the full system and memory clock when running OpenCL programs.<br /><br />Before a fresh Ubuntu install, I suggest disabling Secure Boot, since the AMDGPU-Pro drivers are not signed and therefore do not work with Secure Boot. &nbsp;If Secure Boot is already set up on your system, the driver install script will prompt you to disable it. &nbsp;Unlike the fglrx drivers, I have found the AMDGPU-Pro drivers will work alongside the Intel i915 drivers. &nbsp;In a multi-GPU system, I like to leave a monitor connected to the on-board video for a system console. &nbsp;GPUs can easily be swapped in and out without having to move the monitor connection.<br /><br />Before installing the driver, make sure your card is detected by running "lspci | grep VGA". &nbsp;The <a href="http://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Install.aspx">installation instructions</a> are straightforward, and don't forget to update the video group as mentioned at the end of the instructions. &nbsp;Otherwise OpenCL programs will not detect the GPU. &nbsp;Note that there is a bug in clinfo (/opt/amdgpu-pro/bin/clinfo) that causes it to display 14 for "Max compute units" instead of the actual number of GPU compute units. &nbsp;This bug is fixed in 16.60, which requires kernel 4.10 to work properly.<br /><br />To test your GPU and the driver, you could try <a href="https://github.com/nerdralph/ethminer-nr/raw/110/releases/ethminer-1.2.0nr-OCL.tgz">my ethminer fork</a>.
&nbsp;Although I built and tested it on Ubuntu 14.04/fglrx, it works perfectly on Ubuntu 16.04.2 with AMDGPU-Pro 16.40. &nbsp;Once you've started ethminer (or any other OpenCL program), you can check the core and memory clocks with the following commands:<br /><span style="font-family: Courier New, Courier, monospace;"><b>&nbsp;cat /sys/class/drm/card0/device/pp_dpm_sclk</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b>&nbsp;cat /sys/class/drm/card0/device/pp_dpm_mclk</b></span><br /><span style="font-family: Courier New, Courier, monospace;"><b><br /></b></span>The driver does not come with a tool like aticonfig for custom clock control. &nbsp;The driver does expose ways of controlling the clocks and voltage, and some developers have written custom programs using information from the kernel headers. &nbsp;Although nobody seems to have released a utility, <a href="https://github.com/genesismining/sgminer-gm/blob/sysfs-test/sysfs-gpu-controls.c">the sgminer-gm sysfs code</a> could likely be used as a template to create a stand-alone utility.<br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com2tag:blogger.com,1999:blog-6245413346375218188.post-20073470107943900572017-02-20T08:55:00.001-08:002017-12-09T14:01:27.159-08:00Inside AMD GCN code execution<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-LhoNfda9GQ0/WKsKDWaRymI/AAAAAAAAkfc/xp_vsajZl4o27S9D0cXJuBjUMzwSX3jCwCLcB/s1600/GCN.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="344" src="https://3.bp.blogspot.com/-LhoNfda9GQ0/WKsKDWaRymI/AAAAAAAAkfc/xp_vsajZl4o27S9D0cXJuBjUMzwSX3jCwCLcB/s640/GCN.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">AMD's <a href="http://www.amd.com/en-us/innovations/software-technologies/gcn">Graphics Core Next 
architecture</a>&nbsp;was introduced over five years ago. &nbsp;Although there have been many documents written to help developers understand the architecture, and thereby write better code, I have yet to find one that is clear and concise. &nbsp;AMD's <a href="https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf">best GCN documentation</a> is often cluttered with unnecessary details on the old VLIW architecture, when the GCN architecture is already complicated enough on its own. &nbsp;I intend to summarize my research on GCN, and what that means for OpenCL and GCN assembler kernel developers.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">As shown in the top diagram (GCN Compute Unit), the GPU consists of groups of four compute units. &nbsp;Each CU has four SIMD units, each of which can perform 16 simultaneous 32-bit operations. &nbsp;Each of these 16 SIMD "lanes" is also called a shading unit, so the R9 380 with 28 CUs has 28 * 4 * 16 = <a href="https://www.techpowerup.com/gpudb/2734/radeon-r9-380">1792 shading units</a>.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">AMD's documentation makes frequent reference to "wavefronts". &nbsp;A wavefront is a group of 64 operations that executes on a single SIMD. &nbsp;The SIMD operations take a minimum of four clock cycles to complete; however, SIMD pipelines allow a new operation to be started every clock. &nbsp;"The compute unit selects a single SIMD to decode and issue each cycle, using round-robin arbitration." (<a href="https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf">AMD GCN whitepaper</a>&nbsp;pg 5, para 3). 
&nbsp;So four cycles after SIMD0 has been issued an instruction, the CU is ready to issue it another.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">In OpenCL, when the local work size is 64, the 64 work-items will be executed on a single SIMD. &nbsp;Since a maximum of four SIMD units can access the same local memory (LDS), AMD GCN devices support a maximum local work size of 256. &nbsp;When the local work size is 64, the OpenCL compiler can leave out barrier instructions, so performance will often (but not always) be better than using a local work size of 128, 192, or 256.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">The SIMD units only perform vector operations such as multiply, add, xor, etc. &nbsp;Branching for loops or function calls is performed by the scalar unit, which is shared by all four SIMD units. &nbsp;This means that when a kernel executes a branch instruction, it is executed by the scalar unit, leaving a SIMD unit available to perform a vector operation. &nbsp;The two operations (scalar and vector) must come from different waves, so to ensure the SIMD units are fully utilized, the kernel must allow for two simultaneous wavefronts to execute. &nbsp;For information on how resource usage such as registers and LDS impacts the number of simultaneous wavefronts that can execute, I suggest reading <a href="http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/">AMD's OpenCL Optimization Guide</a>. &nbsp;Note that some sources state that full SIMD occupancy requires four waves, when it is technically possible with just one wave using only vector instructions. 
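<br /><br />The unit counts above reduce to simple arithmetic, which is worth jotting down in one place:

```python
# GCN unit arithmetic from the discussion above, using R9 380 numbers.
CUS = 28            # compute units on an R9 380
SIMDS_PER_CU = 4
LANES_PER_SIMD = 16

shading_units = CUS * SIMDS_PER_CU * LANES_PER_SIMD
print(shading_units)  # 1792, matching the TechPowerUp figure

# A wavefront is 64 work-items: 16 lanes x 4 clock cycles per op.
# This is why the CU can round-robin a new wavefront instruction to
# each of its four SIMDs once every four cycles.
wavefront = LANES_PER_SIMD * 4
print(wavefront)      # 64
```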
&nbsp;Most kernels will require some scalar instructions, so two waves is the practical minimum.</div><div class="separator" style="clear: both; text-align: left;"><br /></div>Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com2tag:blogger.com,1999:blog-6245413346375218188.post-41370614090823766572017-01-09T15:47:00.000-08:002017-01-09T15:47:37.957-08:00Hot Video Cards<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-TzuFyWpVEvw/WHQZw5lLyRI/AAAAAAAAkRg/eMf6NknQTe8T93AfJexsgs1fGKLnAzRogCLcB/s1600/R9380IR.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="480" src="https://1.bp.blogspot.com/-TzuFyWpVEvw/WHQZw5lLyRI/AAAAAAAAkRg/eMf6NknQTe8T93AfJexsgs1fGKLnAzRogCLcB/s640/R9380IR.png" width="640" /></a></div><br />When I read discussions about video card temperatures, the vast majority are about the GPU core temperature. &nbsp;With older GPUs like the R9 290, temperature-based throttling when the GPU core temperature hits 94C can be a problem. &nbsp;With newer GPUs like the R9 380 and especially with the Rx series cards, there are rarely issues with GPU core temperatures, even with low-end cooling systems. &nbsp;While the GPU core is always cooled with a heatsink and fans, often the RAM is not. &nbsp;The infrared image above shows how much of a difference that can make in RAM temperatures.<br /><br />The image was taken of a 4GB MSI R9 380 card with the memory clocked at 1600Mhz while running <a href="https://github.com/nerdralph/ethminer-nr/tree/110">ethminer-nr</a>. &nbsp;The memory chips above the GPU are connected to the heatsink through a thermal pad, but the chips to the left of the GPU are not. 
&nbsp;Using an infrared thermometer I measured temperatures between 95 and 100C on the back side of the PCB from the RAM, so the RAM die temperatures are likely well in excess of 100C.<br /><br />Keeping RAM cool can make a material difference in the clock speeds that can be achieved. &nbsp;Instead of 1600Mhz, I have found that 1500Mhz-rated GDDR5 can reach stable speeds of 1700Mhz when connected to a basic heat spreader. &nbsp;The brand of the memory, Elpida, Hynix, or Samsung, makes little difference in performance when compared to cooling.<br /><br />While manufacturers will rarely provide enough detail in their specifications or product images to determine if the RAM is cooled, card tear-down reviews will often show the connection between the heatsink and RAM. &nbsp;Of the cards I have used, only an MSI R9 380 Gaming card had all the RAM cooled. &nbsp;Neither MSI Armor2X cards nor Gigabyte Windforce cards have all the RAM chips cooled with a heatsink or heat spreader. &nbsp;I also own an Asus Rx 470 Strix card, and that also lacks active cooling for some of the memory chips.<br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com0tag:blogger.com,1999:blog-6245413346375218188.post-43049144681051330992016-10-29T11:59:00.001-07:002016-10-30T08:12:59.838-07:00zcash mining<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-bEguDsauiyk/WBThiz5gBZI/AAAAAAAAjdQ/AZDYAF73wQcrBiuOQL572PmD_xi1PsWQQCLcB/s1600/zcash.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-bEguDsauiyk/WBThiz5gBZI/AAAAAAAAjdQ/AZDYAF73wQcrBiuOQL572PmD_xi1PsWQQCLcB/s1600/zcash.png" /></a></div><br /><a href="http://z.cash/">Zcash</a>&nbsp;is the hottest coin this month, after going live on October 28th, following several months of testing. 
&nbsp;Zcash promises private transactions, so that they cannot be viewed on the public blockchain like bitcoin or ethereum.<br /><br />I did not expect zcash mining to be immediately profitable, since mining rewards are being ramped up over the first month. &nbsp;However the first hour of trading on <a href="http://poloniex.com/">Poloniex</a>&nbsp;saw zcash (ZEC) trading at insane values of over 1000 bitcoin per ZEC. &nbsp;Even after 24 hours, 1 ZEC is trading for about 6 BTC, or US$4300. &nbsp;Despite the low mining reward rate, mining pool problems, and buggy mining software, I was able to earn 0.005 ZEC in one day with a couple rigs.<br /><br />Zcash has both private addresses starting with "z", and public or transparent addresses starting with "t". &nbsp;A bug in the zcash network software has meant problems with private transfers, so it is recommended for miners to use only transparent wallet addresses until the bug is fixed. &nbsp;Miners using the "z" address have apparently had problems receiving their zcash payouts from mining pools.<br /><br />I have been using <a href="https://github.com/eXtremal-ik7/xpmclient/tree/version/zcash">eXtremal's miner</a>&nbsp;version 0.2.2, which uses OpenCL kernels from the <a href="https://zcashminers.org/submissions">zcash open-source miner competition</a>. &nbsp;Windows and Linux binaries can be downloaded <a href="http://coinsforall.io/distr/zcashclient-0.2.2.zip">from coinsforall.io</a>, the pool the software is designed for. &nbsp;I get the best performance with the silentarmy kernel, but only with one instance, as running 2 instances results in a crash. &nbsp;On Windows running driver version 16.10.1 I get about 26 solutions/s with an Rx 470. &nbsp;Under Ubuntu with fglrx drivers I get about 11 solutions/s for both R7 370 and R9 380 cards.<br /><br />I experimented with the worksize and threads values in config.txt, but was unable to improve performance compared to the default 256/8192. 
&nbsp;Increasing the core clock on the R9 380 cards from 900Mhz to 1Ghz increased the performance by 3-4%.<br /><br />Genoil has <a href="https://github.com/Genoil/ZECMiner">released a miner</a>, but only Windows binaries with tromp's kernel at this time. &nbsp;A version including silentarmy's kernel is in the works.<br /><br />I was unable to find any zcash mining calculators, so I wrote <a href="https://gist.github.com/nerdralph/dc4b6ad4f0f8f58098187be2f6debd5f">a short python calculator</a>. &nbsp;Here's an example based on the network hashrate (in thousands) at block 1072, for a rig mining 140 solutions/s:<br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;"><b>./zec.py 1072 1840 140</b></span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;"><b>Daily ramped mining reward in blocks: 308</b></span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;"><b>Your estimated earnings: 0.0234347826087</b></span><br /><div><br /></div><div>At the current price of 6BTC/ZEC, the earnings work out to about US$100. &nbsp;Even if the price drops to 3BTC/ZEC, the daily earnings are still more than double what the same hardware could make mining ethereum. &nbsp;Apparently many other ethereum miners have realized this, since the ethereum network hashrate has dropped by about 25% in less than 30 hours. &nbsp;I expect this trend to continue in the coming days, and eventually reach an equilibrium as the ZEC price continues to drop until it is below parity with BTC.<br /><br /><h4>2016-10-30 update</h4></div><div>Coinsforall is still having stability problems, and now 1 ZEC is worth about 1.2 BTC. &nbsp;Therefore I've switched back to eth mining for all my cards except one Rx 470. &nbsp;With <a href="https://github.com/Genoil/ZECMiner">Genoil's ZECminer</a>&nbsp;I'm getting about 26 sol/s. 
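The calculation the zec.py script performs is just a proportional share of the daily network-wide reward; a minimal sketch of the same estimate (the function name and the fixed daily-reward argument are mine, not the actual zec.py code):

```python
def zec_daily_earnings(daily_reward_zec, network_sols, my_sols):
    """Estimate daily ZEC earned as a proportional share of the
    network-wide daily block reward (ignores pool fees and variance)."""
    return daily_reward_zec * my_sols / network_sols

# Numbers from the example above: ~308 ZEC/day total ramped reward,
# 1840 kSol/s network rate, and a rig mining 140 Sol/s.
print(zec_daily_earnings(308, 1840e3, 140))  # ~0.0234 ZEC/day
```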
&nbsp;I started using zcash.miningpoolhub.com, and after an hour of mining the pool has been stable. &nbsp;Reported hashrate on the pool is about 12H/s, or half the solution rate as expected.</div><div><br /></div>Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com3tag:blogger.com,1999:blog-6245413346375218188.post-76422325112554578972016-09-18T13:23:00.002-07:002016-09-18T13:23:58.458-07:00Advanced Tonga BIOS editing<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-z8q4JHkzS7s/V9s7UfikwTI/AAAAAAAAjZY/Rwzq3yAeZNs8SxjGUTEzHs9bc4mF4tjMwCLcB/s1600/TongaBiosReader.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="560" src="https://1.bp.blogspot.com/-z8q4JHkzS7s/V9s7UfikwTI/AAAAAAAAjZY/Rwzq3yAeZNs8SxjGUTEzHs9bc4mF4tjMwCLcB/s640/TongaBiosReader.png" width="640" /></a></div><br />I recently decided to spend some time to figure out some of the low-level details of how the BIOS works on my R9 380 cards. &nbsp;A few months ago I had found <a href="https://github.com/Hedzin/TongaBiosReader">Tonga Bios Editor</a>, but hadn't done anything more than modify the memory frequency table so the card would default to 1500Mhz instead of 1375. &nbsp;My goal was to modify the memory timing and to reduce power usage.<br /><br />The card I decided to test the memory timing mods on was a Club3D 4GB R9 380 with Elpida W4032BABG-60-F RAM. &nbsp;Although the RAM is rated for 6Gbps/1.5Ghz, the default memory clock is 1475Mhz. &nbsp;In my previous testing I found that the card was stable with the memory overclocked well above 1.5Ghz, but the mining performance was actually slower at 1.6Ghz compared to 1.5Ghz. 
&nbsp;Unfortunately Tonga Bios Reader does not provide a way to edit the memory timings aka straps, so I'd have to use a hex editor.<br /><div class="separator" style="clear: both; text-align: center;"></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-_4h3qpgY5-Y/V9tVMisspbI/AAAAAAAAjZo/USXf3lG50R8PzYZQFTQtcoAou_xEvn-zACLcB/s1600/1500MhzStrap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-_4h3qpgY5-Y/V9tVMisspbI/AAAAAAAAjZo/USXf3lG50R8PzYZQFTQtcoAou_xEvn-zACLcB/s1600/1500MhzStrap.png" /></a></div><br />I've highlighted the 1500Mhz memory timing in the screen shot above. &nbsp;I found it by searching for the string F0 49 02, which you first have to convert from little-endian to get 249F0, and then from hex to get 150,000, which is expressed in increments of .01Mhz. &nbsp;The timing for up to 1625Mhz (C4 7A 02) comes after it, and then 1750Mhz (98 AB 02). &nbsp;The Club3D BIOS actually has 2 sets of timings, one for memory type 01 (the number after F0 49 02), and one for memory type 02 (not shown). &nbsp;This is so the same BIOS can be used on a card that can be made with different memory. &nbsp;Obviously one type of memory the BIOS supports is Elpida, and from comparing BIOS images from other cards, I determined that memory type 02 is for Hynix.<br /><br />To reduce the chance of bricking my card, the first time I modified only the 1625Mhz memory timing. &nbsp;Since the default memory timing is 1475Mhz, my modified timing would only be used when overclocking the memory over 1500Mhz. &nbsp;So if the card crashed on the 1625Mhz timing, it would be back to the safe 1500Mhz timing after a reboot. &nbsp;To actually make the change I copied the 1500Mhz timing (starting with 77 71) to the 1625Mhz timing. 
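The little-endian frequency conversion described above is easy to script when hunting for straps in a hex editor; a small sketch (the helper names are mine) of the 3-byte, 0.01 MHz encoding:

```python
def decode_strap_freq(b):
    """Decode a 3-byte little-endian strap frequency field
    (stored in units of 0.01 MHz) into MHz."""
    return int.from_bytes(bytes(b), 'little') / 100.0

def encode_strap_freq(mhz):
    """Encode a frequency in MHz back into the 3-byte field."""
    return int(round(mhz * 100)).to_bytes(3, 'little')

print(decode_strap_freq([0xF0, 0x49, 0x02]))  # 1500.0 (the 1500 MHz strap)
print(encode_strap_freq(1625).hex())          # c47a02
print(encode_strap_freq(1750).hex())          # 98ab02
```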
&nbsp;After the change, the BIOS checksum is invalid, so I simply loaded the BIOS in Tonga Bios Reader and re-saved it in order to update the checksum.<br /><br />I used <a href="https://www.techpowerup.com/downloads/2531/atiflash-2-71">Atiflash 2.71</a>&nbsp;to flash the BIOS since I have found no DOS or Linux flash utilities for Tonga GPUs. &nbsp;After flashing the updated BIOS, I overclocked the RAM to 1625Mhz, and my eth mining speed went from just under 21Mh to about 22.5Mh. &nbsp;To get even faster timings, I copied the 1375Mhz timings from an MSI R9 380 with Elpida RAM to the Club3d 1625Mhz memory timing. &nbsp;That boosted my mining speed at 1625Mhz to slightly over 23Mh.<br /><br />I then tried a number of ways to improve the timing beyond 1625Mhz, but I found nothing that was both stable and faster at 1700Mhz. &nbsp;Different cards may overclock better, depending on both the GPU asic and the memory. &nbsp;Hynix memory seems to overclock a bit better than Elpida, while Samsung memory, which seems rather rare on R9 380 cards, tends to overclock the best. &nbsp;The memory controller on the GPU also needs to be able to overclock from 1475Mhz. &nbsp;Unlike the simple&nbsp;<a href="http://nerdralph.blogspot.ca/2016/09/hawaii-bios-voltage-modding.html">voltage modding of the Hawaii BIOS</a>, there is no easy way to modify the memory controller voltage (VDDCI) on Tonga. &nbsp;The ability to over-volt the memory controller would make it easier to overclock the memory speed beyond 1625Mhz.<br /><br />Since the Club3D BIOS supports both Elpida and Hynix memory, I improved the timing for both memory types. &nbsp;This allows me to use a single BIOS image for cards that have either Elpida or Hynix memory. &nbsp;It's also dependent on the card having a NCP81022 voltage controller, but all my R9 380 cards have the same voltage controller. 
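Re-saving in Tonga Bios Reader is the easy way to fix the checksum, but it can also be patched directly. A sketch under the assumption that the image follows the common ATI layout (image size in 512-byte units at offset 0x02, checksum byte at offset 0x21) — verify those offsets against your own BIOS before flashing:

```python
def fix_vbios_checksum(image, csum_off=0x21):
    """Patch the 8-bit checksum so the ROM image bytes sum to 0 mod 256.
    Assumes the usual ATI layout: size in 512-byte units at offset 0x02,
    checksum byte at csum_off (0x21 here -- check your own image)."""
    rom = bytearray(image)
    size = rom[0x02] * 512          # image length covered by the checksum
    rom[csum_off] = 0               # zero it before summing
    rom[csum_off] = (-sum(rom[:size])) & 0xFF
    return bytes(rom)

# Demo on a dummy 512-byte image with a few non-zero bytes.
rom = bytearray(512)
rom[0x02] = 1                       # 1 * 512 bytes
rom[0x30:0x40] = b'\xaa' * 16
fixed = fix_vbios_checksum(rom)
print(sum(fixed[:512]) % 256 == 0)  # True
```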
&nbsp;I've <a href="https://drive.google.com/drive/folders/0BwLnDyLLT3WkRUQ4VU5kVm5qM0k">shared it on my google drive</a> as 380NR.ROM if you want to try it (at the possible risk of bricking your card). &nbsp;Atiflash checks the subsystem ID of the target card against the BIOS to be flashed, so it is necessary to use the command-line version of atiflash with the "-fs" option:<br /><b><span style="font-family: Courier New, Courier, monospace;">atiflash -p 0 380NR.ROM -fs</span></b><br /><br />In addition to improving memory speeds, I wanted to reduce power usage of my 380 cards. &nbsp;On Windows it is possible to use a tool like MSI Afterburner to reduce the core voltage (VDDC), but on Linux there is no similar tool. &nbsp;To reduce the voltage in the BIOS, modify value0 in Voltage Table2 for the different DPM states. &nbsp;After a lot of experimenting, I made two different BIOSes with different voltage levels since some cards under-volt better than others. &nbsp;The first one has 975, 1050, and 1100 mV for dpm 5, 6, &amp; 7, while the other has 1025, 1100, &amp; 1150 mV. &nbsp;These are also <a href="https://drive.google.com/drive/folders/0BwLnDyLLT3WkRUQ4VU5kVm5qM0k">shared on my google drive</a>&nbsp;as 380NR1100.ROM and 380NR1150.ROM.<br /><br />With the faster RAM timing and voltage modifications I've improved my eth mining hashrates by about 10%, without any material change in power use. &nbsp;I've tried my custom ROM on four different cards. &nbsp;Although two of them seem to be OK with 900/1650Mhz clocks, I'm playing it safe and running all four at 885/1625Mhz. &nbsp;If you are lucky and have a card that is stable at 925/1700Mhz, you can mine eth at almost 25Mh/s. 
&nbsp;With most cards you can expect to get between 23 and 24Mh/s.<br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com56tag:blogger.com,1999:blog-6245413346375218188.post-8453309425811977932016-09-11T10:46:00.000-07:002016-09-11T10:46:10.359-07:00Hawaii BIOS voltage modding<div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-DnsyKIe91t0/V9IOeNjHizI/AAAAAAAAjX0/M2oabHU5KtgDDNTOrPlu26KRvk5U0ouFQCLcB/s1600/HawaiiPowerPlay.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-DnsyKIe91t0/V9IOeNjHizI/AAAAAAAAjX0/M2oabHU5KtgDDNTOrPlu26KRvk5U0ouFQCLcB/s1600/HawaiiPowerPlay.png" /></a></div><br />When using Hawaii GPUs such as the R9 290 on Linux, aticonfig does not provide the ability to modify voltages. &nbsp;Even under Windows, utilities such as MSI Afterburner usually have limits on how much the GPU voltage can be increased or decreased. &nbsp;In order to reduce power consumption I decided to create a custom BIOS with lower voltages for my MSI R9 290X.<br /><br />The best tool I have found for Hawaii BIOS mods is&nbsp;<a href="https://github.com/OneB1t/HawaiiBiosReader">Hawaii Bios Reader</a>. &nbsp;For reading and writing the BIOS to Hawaii cards, I use <a href="https://www.techpowerup.com/downloads/2531/atiflash-2-71">ATIFlash 2.71</a>. &nbsp;It works from DOS, so I can use the FreeDOS image included in <a href="https://www.system-rescue-cd.org/SystemRescueCd_Homepage">SystemRescueCD</a>.<br /><br />In the screen shot above, I've circled two voltages. &nbsp;The first, VDDCI, is the memory controller voltage. &nbsp;Reducing it to 950mV gives a slight power reduction.<br /><br />The second voltage is the DPM0 GPU core voltage. &nbsp;DPM0 is the lowest power state, when the GPU is clocked at 300Mhz, and powered at approximately 968mV. 
&nbsp;I say approximately because the actual voltage seems to be close to the DPM0 value, but not always exact. &nbsp;This may be related to the precision of the voltage regulator on the card, or the BIOS may be using more than just the DPM0 voltage table to control the voltage. &nbsp;The rest of the DPM values are not voltages, but indexes into a table that has a formula for the BIOS to calculate the increase in voltage based on the leakage characteristics of the GPU. &nbsp;I do not change them.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-cbo-K3TsSSM/V9WWW0JaAWI/AAAAAAAAjY4/qYqhGoc5SjsTJ-rTADzhBMRZOb0B1B9KACLcB/s1600/HawaiiLimitTables.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-cbo-K3TsSSM/V9WWW0JaAWI/AAAAAAAAjY4/qYqhGoc5SjsTJ-rTADzhBMRZOb0B1B9KACLcB/s1600/HawaiiLimitTables.png" /></a></div><br />For reasons I have not yet figured out, the DPM0 voltage in each of the limit tables has to match the PowerPlay table. &nbsp;After modifying the four limit tables, the BIOS can be saved and flashed to the card.<br /><br />I've created modified BIOS files for an MSI R9 290X 4GB card with DPM0 voltages of 868, 825, and 775 mV. 
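As a rough sanity check on what undervolting can save, dynamic power at a fixed clock scales approximately with the square of the core voltage; a back-of-the-envelope estimate (a simple model, not a measurement):

```python
def core_power_ratio(v_new_mv, v_old_mv):
    """Approximate dynamic power ratio at a fixed clock: P ~ V^2."""
    return (v_new_mv / v_old_mv) ** 2

# Dropping the DPM0 core voltage from ~968 mV to 775 mV:
print(core_power_ratio(775, 968))  # ~0.64, i.e. roughly a third less core power
```

Whole-card savings will be smaller than the core-only ratio suggests, since memory and other board components don't scale with VDDC.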
&nbsp;With the 775mV BIOS I was able to reduce power consumption by over 20% compared to 968mV.<br /><br /><a href="https://drive.google.com/drive/folders/0BwLnDyLLT3WkRUQ4VU5kVm5qM0k">https://drive.google.com/drive/folders/0BwLnDyLLT3WkRUQ4VU5kVm5qM0k</a><br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com0tag:blogger.com,1999:blog-6245413346375218188.post-34848736528731778542016-09-07T06:26:00.000-07:002016-09-07T06:35:20.540-07:00Monero mining on Linux<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-_tGTPDHt3BM/V83vbSLST8I/AAAAAAAAjXQ/brdBeDeAHiAMd9MHtGsn__Y7y2pHRQFQgCLcB/s1600/monero.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="84" src="https://4.bp.blogspot.com/-_tGTPDHt3BM/V83vbSLST8I/AAAAAAAAjXQ/brdBeDeAHiAMd9MHtGsn__Y7y2pHRQFQgCLcB/s320/monero.png" width="320" /></a></div><br />With Monero's recent jump in price to over $10, it's the new hot coin for GPU mining. &nbsp;Monero has been around for a couple years now, so there are a couple options for mining. &nbsp;There's a closed-source miner from Claymore, and <a href="https://github.com/wolf9466/wolf-xmr-miner">the open-source miner from Wolf</a> that I used.<br /><br />I used the same <a href="http://nerdralph.blogspot.ca/2016/08/ethereum-mining-on-ubuntu-linux.html">Ubuntu/AMD rig that I set up for eth mining</a>. &nbsp;Building the miner took a couple updates compared to building ethminer. &nbsp;First, since stdatomic.h is missing from gcc 4.8.4, <a href="https://github.com/wolf9466/wolf-xmr-miner/issues/13">you need to use gcc 5 or 6</a>. &nbsp;Second, <a href="http://www.digip.org/jansson/">jansson</a> needs to be installed. &nbsp;On Ubuntu the required package is libjansson-dev. &nbsp;The default makefile uses a debug build with no optimization, so I modified the makefile to use O3 and LTO "OPT = -O3 -flto". 
&nbsp;I've shared the compiled binary <a href="https://drive.google.com/file/d/0BwLnDyLLT3Wka2Z4RW5lWUJIaFk/view?usp=sharing">on my google drive</a>.<br /><br />To mine with all the GPUs on your system, you'll have to edit the xmr.conf file and add to the "devices" list. &nbsp;The "index" is the card number from the output of "aticonfig --lsa". &nbsp;Although the miner supports setting GPU clock rates and fan speeds, I prefer to use my aticonfig scripts instead. &nbsp;It is also necessary to modify &nbsp;"rawintensity" and "worksize" for optimal performance. &nbsp;The xmr.conf included in the tgz file has the settings that I found work best for a R9 380 card clocked at 1050/1500. &nbsp;For a R7 370 card, I found a rawintensity setting of 640 worked best, giving about 400 hashes per second.<br /><br />Although Monero was more profitable to mine than ethereum for a few days, the difficulty increase associated with more miners has evened it out. &nbsp;Dwarfpool has a <a href="http://dwarfpool.com/xmr/calc">XMR calculator</a>&nbsp;that seems accurate. 
&nbsp;The pool I used was <a href="http://monerohash.com/">monerohash.com</a>, and instead of running the monero client, I created an account online using mymonero.com.<br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com0tag:blogger.com,1999:blog-6245413346375218188.post-87455537175215127322016-08-21T11:05:00.001-07:002016-08-21T11:05:11.460-07:00Ethereum mining on Ubuntu Linux<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-G9H9dnnERKs/V7njN94LmlI/AAAAAAAAjUg/c84dFdLojy0aLPdCIfi2EwHLdcv9lvEHwCLcB/s1600/ethminer-nr.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="402" src="https://3.bp.blogspot.com/-G9H9dnnERKs/V7njN94LmlI/AAAAAAAAjUg/c84dFdLojy0aLPdCIfi2EwHLdcv9lvEHwCLcB/s640/ethminer-nr.png" width="640" /></a></div><br />For a couple months, I've been intending to do a blog post on mining with Ubuntu. &nbsp;Now that <a href="http://nerdralph.blogspot.ca/2016/07/improving-genoils-ethminer.html">I've been able to make a static build of Genoil's ethminer</a>, that process has become much easier. &nbsp;Since I have no Nvidia GPUs, this post will only cover how to mine with AMD GPUs like the R7 and R9 series.<br /><br />The first step is to download a <a href="http://releases.ubuntu.com/14.04/">64-bit Ubuntu 14.04 desktop release</a>. &nbsp;I use the desktop distribution since it includes X11, although it is possible to use Ubuntu server and then install the X11 packages separately. &nbsp;I recommend installing Ubuntu without any GPU cards installed (use your motherboard's iGPU), in order to confirm the base system is working OK. &nbsp;Follow the <a href="http://www.ubuntu.com/download/desktop/install-ubuntu-desktop">installation instructions</a>, and at step 7, choose "Log in automatically". 
&nbsp;This will make it easier to have your rig start mining automatically after reboot.<br /><br />After the initial reboot, I recommend installing ssh server. &nbsp;It can be installed from the shell (terminal) with: "sudo apt-get install openssh-server -y". &nbsp;Ubuntu uses <a href="https://github.com/lathiat/avahi">mDNS</a>, so if you chose 'rig1' as the computer name during the installation, you can ssh to 'rig1.local' from other computers on your LAN.<br /><br />Shutdown the computer and install the first GPU card, and plug your monitor into the GPU card instead of the iGPU video port. &nbsp;Most motherboards will default to using the GPU card when it is installed, and if not, there should be a BIOS setup option to choose between them. If you do not even see a boot screen, try plugging the card directly into the motherboard instead of using a riser. &nbsp;Also double-check your card's <a href="http://nerdralph.blogspot.ca/2016/03/hacking-gpu-pcie-power-connections.html">PCI-e power connections</a>.<br /><br />Once you are successfully booting into the Ubuntu desktop,&nbsp;<a href="https://ubuntuforums.org/showthread.php?t=2220552&amp;p=13009774#post13009774">edit /etc/init/gpu-manager.conf</a>&nbsp;to keep gpu manager from modifying /etc/X11/xorg.conf. &nbsp;Then install the AMD fglrx video drivers: "sudo apt-get install fglrx -y". &nbsp;If the fglrx drivers installed successfully, running "sudo aticonfig --lsa" will show your installed card. &nbsp;Next, to set up your xorg.conf file, run "sudo rm /etc/X11/xorg.conf" and "sudo aticonfig --initial --adapter=all".<br /><br />After rebooting, if the computer does not boot into the X11 desktop, ssh into the computer and verify that&nbsp;/<a href="http://etc/modprobe.d/fglrx-core.conf">etc/modprobe.d/fglrx-core.conf was created</a> when the fglrx driver was installed. &nbsp;This keeps Ubuntu from loading the open-source radeon drivers, which will conflict with the proprietary fglrx drivers. 
&nbsp;For additional debugging, look at the /var/log/Xorg.0.log file.<br /><br />Continue with installing the rest of your cards one at a time. &nbsp;Re-initialize your xorg.conf each time by executing "sudo rm /etc/X11/xorg.conf" and "sudo aticonfig --initial --adapter=all". &nbsp;Reboot one more time, and then execute "aticonfig --odgc --adapter=all". &nbsp;This will display all the cards and their core/memory clocks. &nbsp;If you are connecting remotely via ssh, you need to run "export DISPLAY=:0" or you will get the "X needs to be running..." error. &nbsp;You can use aticonfig to change the clock speeds on your card. &nbsp;For example, "aticonfig --od-enable --adapter=2 --odsc 820,1500" will set card #2 to 820Mhz core and 1500Mhz memory (a good speed for most R9 380 cards). &nbsp;To simplify setting clock speeds on different cards, <a href="https://gist.github.com/nerdralph/fbb03aa403b36b9d1d44e7263febd290">I created a script</a>&nbsp;which reads a list of card types and clock rates from a clocks.txt file.<br /><br />Once your cards are installed and configured, you can use <a href="https://github.com/nerdralph/ethminer-nr/tree/110/releases">my ethminer build</a>:<br /><b><span style="font-family: Courier New, Courier, monospace;">wget github.com/nerdralph/ethminer-nr/raw/110/releases/ethminer-1.1.9nr-OCL.tgz</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">tar xzf ethminer-1.1.9nr-OCL.tgz</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">cd ethminer-nr</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">./mine.sh</span></b><br /><br />Once you've confirmed that ethminer is working, you can edit the mine.sh script to use your own mining pool account. 
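My script is a simple shell script, but the idea is easy to sketch: map card types to clock pairs and emit one aticonfig command per adapter. This sketch is illustrative; the clocks.txt format shown is my own invention, not necessarily my script's exact format:

```python
def clock_commands(clocks_txt, adapters):
    """Build aticonfig overdrive commands from 'type core mem' lines
    and a list of (adapter_index, card_type) pairs."""
    clocks = {}
    for line in clocks_txt.splitlines():
        parts = line.split()
        if len(parts) == 3:                 # skip blank/malformed lines
            card, core, mem = parts
            clocks[card] = (core, mem)
    cmds = []
    for idx, card in adapters:
        core, mem = clocks[card]
        cmds.append("aticonfig --od-enable --adapter=%d --odsc %s,%s"
                    % (idx, core, mem))
    return cmds

cfg = "R9380 820 1500\nR7370 925 1400\n"
for c in clock_commands(cfg, [(0, "R9380"), (1, "R7370")]):
    print(c)
```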
&nbsp;If you want your rig to start mining automatically on boot-up, edit your .bashrc and add "cd ethminer-nr" and "./mine.sh" to the end of the file.Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com12tag:blogger.com,1999:blog-6245413346375218188.post-35822668122225065282016-07-31T11:07:00.001-07:002016-07-31T11:07:21.550-07:00Improving Genoil's ethminer<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-hlW_zOSJjbI/V54ur2N6HwI/AAAAAAAAjSU/Hn7yCMyt8xQa5DcMNMYRTxFwq6937y96wCLcB/s1600/Genoil.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://4.bp.blogspot.com/-hlW_zOSJjbI/V54ur2N6HwI/AAAAAAAAjSU/Hn7yCMyt8xQa5DcMNMYRTxFwq6937y96wCLcB/s320/Genoil.png" width="320" /></a></div><br />In my <a href="http://nerdralph.blogspot.ca/2016/04/more-about-mining.html">last post about mining ethereum</a>, I explained why I preferred <a href="https://github.com/Genoil/cpp-ethereum">Genoil's fork</a> of the <a href="https://github.com/ethereum/webthree-umbrella/releases">Ethereum Foundation's ethminer</a>. &nbsp;After that post, I started having stability problems with one of the newer releases of Genoil's miner. &nbsp;I suspected the problem was likely deadlocks with <a href="https://github.com/Genoil/cpp-ethereum/blob/master/libstratum/EthStratumClient.cpp#L179">mutexes that had been added to the code</a>. &nbsp;They had been added to reduce the chance of the miner submitting stale or invalid shares, but in this case the solution was worse than the problem, since there is no harm in submitting a small number of invalid shares to a pool. &nbsp;After taking some time to review the code and discuss my ideas with the author, I decided to make some improvements. 
&nbsp;The result is <a href="https://github.com/nerdralph/ethminer-nr/tree/110">ethminer-nr</a>.<br /><br />A description of some of the changes can be found on <a href="https://github.com/Genoil/cpp-ethereum/issues">the issues tracker for Genoil's miner</a>, since I expect most of my changes to be merged upstream. &nbsp;The first thing I did was remove the mutexes. &nbsp;This does open the possibility of a rare race condition that could cause an invalid share submit when one thread processes a share from a GPU while another thread processes a new job from the pool. &nbsp;On Linux the threads can be serialized using the <a href="http://linux.die.net/man/1/taskset">taskset</a> command to pin the process to a single CPU. &nbsp;On a multi-CPU system, use "<span style="font-family: Courier New, Courier, monospace;">taskset 1 ./ethminer ...</span>" &nbsp;to pin the process to the first CPU.<br /><br />As <a href="https://github.com/Genoil/cpp-ethereum/issues/83">described in the issues tracker</a>, I added per-GPU reporting of hash rate. &nbsp;I also reduced the stats output to accepted (A) and rejected (R), including stales, since I have never seen a pool submit fail, and only some pools will report a rejected share. &nbsp;The more compact output helps the stats still fit on a single line, even with hashrate reporting from multiple GPUs:<br /><span style="font-family: Courier New, Courier, monospace;">&nbsp; m &nbsp;13:28:46|ethminer &nbsp;15099 24326 15099 =54525Khs A807+6:R0+0</span><br /><div><br /></div><div>To help detect when a pool connection has failed, instead of trying to manage timeouts in the code, I decided to rely on the TCP stack. &nbsp;The first thing I did was enable <a href="http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html">TCP keepalives</a> on the stratum connection to the pool. &nbsp;If the pool server is alive but just didn't have any new jobs for a while, the socket connection will remain open. 
&nbsp;If the network connection to the pool fails, there will be no keepalive response and the socket will be closed. &nbsp;Since the default timeouts are rather long, I reduced them to make network failure detection faster:</div><div><div><span style="font-family: Courier New, Courier, monospace;"><b>sudo sysctl -w net.ipv4.tcp_keepalive_time=30</b></span></div><div><span style="font-family: Courier New, Courier, monospace;"><b>sudo sysctl -w net.ipv4.tcp_keepalive_intvl=5</b></span></div><div><span style="font-family: Courier New, Courier, monospace;"><b>sudo sysctl -w net.ipv4.tcp_keepalive_probes=3</b></span></div></div><div><br /></div><div>I wasn't certain if packets sent to the server will reset the keepalive timer, even if there is no response (even an ACK) from the server. &nbsp;Therefore I also reduced the default TCP retransmission count to 5, so the pool connection will close after a packet is sent (i.e. share submit) 5 times without an acknowledgement.</div><div><div><b><span style="font-family: Courier New, Courier, monospace;">sudo sysctl -w net.ipv4.tcp_retries2=5</span></b></div></div><div><br /></div><div>I was also able to make <a href="https://github.com/nerdralph/ethminer-nr/blob/110/releases/ethminer-genoil-1.1.8nr.tgz">a stand-alone linux binary</a>. &nbsp;Until now the Linux builds had made extensive use of shared libraries, so the binary could not be used without first installing several shared library dependencies like boost and json. &nbsp;I had to do some of the build manually, so to make your own static linked binary you'll have to wait a few days for some updates to the <a href="http://cmake.org/">cmake</a> build scripts. 
&nbsp;If you want to try it now anyway, you can add "-DETH_STATIC=1" to the cmake command line.</div><div><br /></div><div>As for future improvements, since I've <a href="http://nerdralph.blogspot.ca/2016/07/diving-into-opencl-deep-end.html">started learning OpenCL</a>, I'm hoping to optimize the ethminer OpenCL kernel to improve hashing performance. &nbsp;Look for something in late August or early September.</div>Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com3tag:blogger.com,1999:blog-6245413346375218188.post-54949521483196068982016-07-17T08:38:00.002-07:002016-07-17T08:38:52.170-07:00Diving into the OpenCL deep end<div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/--1LpKmsk59k/V4fc8hv2C8I/AAAAAAAAjRQ/ZgPoMLM4rhYrB-zOMy8DKxKDF8zF0pItwCLcB/s1600/OpenCL_Logo.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/--1LpKmsk59k/V4fc8hv2C8I/AAAAAAAAjRQ/ZgPoMLM4rhYrB-zOMy8DKxKDF8zF0pItwCLcB/s1600/OpenCL_Logo.png" /></a></div><br />Programs for mining on GPUs are usually written in <a href="https://www.khronos.org/opencl/">OpenCL</a>. &nbsp;It's based on C, which I know well, so a few weeks ago I decided to try to improve some mining OpenCL code. &nbsp;My intention was to both learn OpenCL and better understand mining algorithms.<br /><br />I started with <a href="https://github.com/Genoil/cpp-ethereum/issues/68">simple changes</a> to the <a href="https://github.com/Genoil/cpp-ethereum/blob/110/libethash-cl/ethash_cl_miner_kernel.cl">OpenCL code for Genoil's ethminer</a>. &nbsp;I then spent a lot of time reading <a href="http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf">GCN architecture</a>&nbsp;and <a href="http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf">instruction set</a> documents to understand how AMD GPUs run OpenCL code. 
&nbsp;Since I recently started <a href="http://nerdralph.blogspot.ca/2016/07/mining-sia-coin-on-ubuntu.html">mining Sia</a>, I took a look at the <a href="https://github.com/nerdralph/gominer-nr/blob/master/kernel.go">gominer kernel code</a>, and thought I might be able to optimize the performance. &nbsp;I tested with the AMD fglrx drivers under Ubuntu 14.04 (OpenGL version string: 4.5.13399) with a r9 290 card.<br /><br />The first thing I tried was replacing the rotate code in the ror64 function to use <a href="https://www.khronos.org/registry/cl/extensions/amd/cl_amd_media_ops.txt">amd_bitalign</a>. &nbsp;The bitalign instruction (v_alignbit_b32) can do a 32-bit rotate in a single cycle, much like the <a href="http://www.davespace.co.uk/arm/introduction-to-arm/barrel-shifter.html">ARM barrel shifter</a>. &nbsp;I was surprised that the speed did not improve, which suggests the AMD OpenCL drivers are optimized to use the alignbit instruction. &nbsp;What was worse was that the kernel would calculate incorrect hash values. &nbsp;After double and triple-checking my code, I found <a href="http://www.openwall.com/lists/john-dev/2015/10/15/3">a post indicating a bug with amd_bitalign</a> when using values divisible by 8. &nbsp;I then tried amd_bytealign, and that didn't work either. &nbsp;I was able to confirm the bug when I found that a bitalign of 21 followed by 3 worked (albeit slower), while a single bitalign of 24 did not.<br /><br />It would seem there is no reason to use the amd_bitalign any more. &nbsp;Relying on the driver to optimize the code makes it portable to other platforms. 
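The reason a 64-bit rotate can compile to a pair of v_alignbit_b32 instructions is that each 32-bit half of the result is a funnel shift of the two input halves. A Python model of that decomposition (to illustrate the instruction-level idea; this is not OpenCL code):

```python
M = 0xFFFFFFFF

def alignbit(hi, lo, s):
    """Model of GCN v_alignbit_b32: the low 32 bits of the 64-bit
    value (hi:lo) shifted right by s (0 <= s <= 31)."""
    return ((hi << 32 | lo) >> s) & M

def ror64(x, n):
    """64-bit rotate right built from two 32-bit alignbit ops (0 < n < 32)."""
    lo, hi = x & M, x >> 32
    return alignbit(lo, hi, n) << 32 | alignbit(hi, lo, n)

# Check against a plain 64-bit rotate.
x = 0x0123456789ABCDEF
assert ror64(x, 8) == ((x >> 8) | (x << 56)) & (2**64 - 1)
```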
&nbsp;I couldn't find any documentation from AMD saying the bitalign and other media ops are deprecated, but I did verify that the pragmas make no difference in the kernel:<br /><b><span style="font-family: Courier New, Courier, monospace;">#pragma OPENCL EXTENSION cl_amd_media_ops : enable</span></b><br /><div><b><span style="font-family: Courier New, Courier, monospace;">#pragma OPENCL EXTENSION cl_amd_media_ops : disable</span></b></div><div><br /></div><br />After finding <a href="https://community.amd.com/thread/158497">a post stating the rotate() function is optimized to use alignbit</a>, I tried changing the "ror64(x, y)" calls to "rotate(x, 64-y)". &nbsp;The code functioned properly but was actually slower. &nbsp;By using&nbsp;<a href="http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/">AMD_OCL_BUILD_OPTIONS_APPEND=-save-temps</a>, I was able to view the assembler .isa files, and could tell that the calls to rotate with 64-bit values were using v_lshlrev_b32, v_lshrrev_b64, and v_or_b32 instead of a pair of v_alignbit_b32 instructions. &nbsp;Besides using 1 additional instruction, the 64-bit shift instructions apparently <a href="https://github.com/CLRX/CLRX-mirror/wiki/GcnTimings">take 2 or even 4 times longer to execute on some platforms</a>.<br /><br />In the end, I wasn't able to improve the kernel speed. 
&nbsp;I think re-writing the kernel in <a href="http://gpuopen.com/amdgcn-assembly/">GCN assembler</a> is probably the best way to get the maximum hashing performance.Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com0tag:blogger.com,1999:blog-6245413346375218188.post-52702975783655624682016-07-11T16:27:00.001-07:002016-07-11T16:27:09.982-07:00Mining Sia coin on Ubuntu<div class="separator" style="clear: both; text-align: center;"><img border="0" height="316" src="https://3.bp.blogspot.com/-sSXlkYdNues/V4QbOKKOkII/AAAAAAAAjQk/ZsJD8ESJ7E8qV31kfw31UxZu6WyXU0JAACLcB/s320/SiaLogo.png" width="320" /></div><br /><a href="http://sia.tech/">Sia</a> is a hot crypto-currency for miners. &nbsp;Just a week ago, <a href="http://explore.sia.tech/block.html?height=58600">the sia network hashrate was 6.5 Th/s</a>, and the only way to mine was solo as there were no public pools. &nbsp;In the last three days, <a href="http://sia.nanopool.org/">sia.nanopool.org</a> and <a href="http://siamining.com/">siamining.com</a>&nbsp;started up and the network hashrate grew to 14.7 Th/s, with the two pools making up 80% of the total network hashrate.<br /><br />Mining on Windows is relatively easy, with nanopool posting a binary build of <a href="https://github.com/SiaMining/gominer/tree/poolmod3">siamining's gominer fork</a>. &nbsp;For Ubuntu, you need to build it from source. &nbsp;For that, you'll need to install go first. &nbsp;If you type 'go' in Ubuntu 14.04, you'll get the following message:<br /><b><span style="font-family: Courier New, Courier, monospace;">The program 'go' is currently not installed. You can install it by typing:</span></b><br /><b><span style="font-family: Courier New, Courier, monospace;">apt-get install gccgo-go</span></b><br /><div><b><span style="font-family: Courier New, Courier, monospace;"><br /></span></b></div>I tried the similar package 'gccgo', which turned out to be a rabbit hole.
&nbsp;The version 1.4.2 referred to in the gominer readme is a version of the package 'golang'. &nbsp;Neither gccgo-go nor gccgo has the recent libraries needed by gominer. &nbsp;And the most recent version of golang in the standard Ubuntu repositories is 1.3.3. &nbsp;However, the Ethereum foundation publishes a 1.5.1 build of golang in <a href="https://launchpad.net/~ethereum">their ppa</a>.<br /><br />Even with golang 1.5.1, building gominer wasn't as simple as "go get github.com/SiaMining/gominer". &nbsp;The reason is that the gominer modifications to support pooled mining are in the "poolmod3" branch, and there is no option to install directly from a branch. &nbsp;So I <a href="https://github.com/nerdralph/gominer-nr">made my own fork</a>&nbsp;of the poolmod3 branch, and added detailed install instructions for Ubuntu:<br /><pre style="background-color: #f7f7f7; border-radius: 3px; box-sizing: border-box; color: #333333; font-family: Consolas, &quot;Liberation Mono&quot;, Menlo, Courier, monospace; font-size: 13.6px; font-stretch: normal; line-height: 1.45; margin-bottom: 16px; overflow: auto; padding: 16px; word-wrap: normal;"><code style="background: transparent; border-radius: 3px; border: 0px; box-sizing: border-box; display: inline; font-family: Consolas, &quot;Liberation Mono&quot;, Menlo, Courier, monospace; font-size: 13.6px; line-height: inherit; margin: 0px; overflow: visible; padding: 0px; word-break: normal; word-wrap: normal;">sudo add-apt-repository -y ppa:ethereum/ethereum<br />sudo apt-get update<br />sudo apt-get install -y git ocl-icd-libopencl1 opencl-headers golang<br />go get github.com/nerdralph/gominer-nr</code></pre>Once I got it running on a single GPU, I wanted to find out if it was worthwhile to switch my eth mining rigs to sia. &nbsp;I couldn't find a good sia mining calculator, so I pieced together some information about mining rewards and used the <a href="http://siapulse.com/page/tools">Sia Pulse calculator</a>.
&nbsp;I wanted to compare a single R9 290 clocked at 1050/1125, which gets about 29Mh/s mining eth, <a href="http://karldiab.com/EthereumMiningCalculator/">earning $2.17/day</a>. &nbsp;For Sia, the R9 290 gets about 1100Mh/s; plugging that into the Sia Pulse calculator along with the current difficulty of 4740Th gives daily earnings of 6015 SC/day. &nbsp;Multiplying by the 62c/1000SC shown on sia.nanopool.org gives a total of $3.73/day, but that figure is wrong. &nbsp;The Sia Pulse calculator defaults to a block reward of 300,000, but the reward goes down by 1 for each block. &nbsp;So at block 59,900 the block reward is 240,100, and the actual earnings would be $2.99/day.<br /><br />Since the earnings are almost 40% better than eth, I decided to switch my mining rigs from eth to sia. &nbsp;I had to adjust the overclocking settings, as sia is a compute-intensive algorithm instead of a memory-intensive algorithm like ethereum. &nbsp;After reducing the core clock of a couple cards from 1050 to 1025, the rigs were stable. &nbsp;When trying out nanopool, I was getting a lot of "ERROR fetching work;" and "Error submitting solution - Share rejected" messages. &nbsp;I think their servers may have been getting overloaded, as it worked fine when I switched to siamining.com. &nbsp;I also find siamining.com has more detailed stats, in particular the % of rejected shares (all below 0.5% for me).<br /><br />I may end up switching back to eth in the near future, since a doubling in network hashrate for sia will eventually mean a doubling of the difficulty, cutting the amount of sia mined in half.
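For anyone who wants to redo the reward math, here's a small Go sketch of the adjustment (the function name is mine; the 30,000 SC reward floor is part of sia's reward schedule, and the other figures are the ones from this post, which will be stale by the time you read it):

```go
package main

import "fmt"

// adjustedSC corrects a calculator estimate that assumes the initial
// 300,000 SC block reward. Sia's reward drops by 1 SC every block until
// it reaches a floor of 30,000 SC.
func adjustedSC(calcSCPerDay float64, blockHeight int) float64 {
	reward := 300000 - blockHeight
	if reward < 30000 {
		reward = 30000
	}
	return calcSCPerDay * float64(reward) / 300000
}

func main() {
	sc := adjustedSC(6015, 59900) // the post's figures
	usd := sc * 0.62 / 1000       // at 62c per 1000 SC
	fmt.Printf("%.0f SC/day = $%.2f/day\n", sc, usd)
}
```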
&nbsp;In the process I'll at least have learned a bit about golang, and I can easily switch between eth and sia when one is more profitable than the other.<br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com1tag:blogger.com,1999:blog-6245413346375218188.post-91305496210289643262016-06-03T18:14:00.002-07:002016-06-03T18:14:46.099-07:00When does 18 = 26? When buying cheap cables.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-6Htyhi0WwOY/V1IdBfd-ifI/AAAAAAAAi-Q/my354uzHKt4lInJtAoEzix1QJuJpk6-DACLcB/s1600/fake18AWG.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://1.bp.blogspot.com/-6Htyhi0WwOY/V1IdBfd-ifI/AAAAAAAAi-Q/my354uzHKt4lInJtAoEzix1QJuJpk6-DACLcB/s400/fake18AWG.jpg" width="400" /></a></div><br />I recently bought some cheap molex to PCI-e power adapters from <a href="http://www.aliexpress.com/store/1495338">a seller on AliExpress</a>. &nbsp;Although there are deals for quality goods on AliExpress, I was a bit suspicious when I ordered these given just how cheap they were. &nbsp;<a href="http://nerdralph.blogspot.ca/2016/03/hacking-gpu-pcie-power-connections.html">PCI-e power connectors</a> are supposed to be rated for 75W of power carried over 2 conductors at 12V, which means 3.1A per conductor. 
&nbsp;In order to avoid a large voltage drop the wires used are usually 18AWG, although 20AWG wires (with 1.6x the resistance) would be reasonably safe.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/--vGMn8tdA54/V1Ih6FPoqSI/AAAAAAAAi-g/PxKUz3YsoxU_Shk4apC6P4jG524W9B7kgCLcB/s1600/PCI-eMolex.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="173" src="https://1.bp.blogspot.com/--vGMn8tdA54/V1Ih6FPoqSI/AAAAAAAAi-g/PxKUz3YsoxU_Shk4apC6P4jG524W9B7kgCLcB/s400/PCI-eMolex.jpg" width="400" /></a></div><br />When the package arrived, I inspected the adapter cables, which were labeled 18AWG. &nbsp;Despite the label, they didn't feel like 18AWG wires, which have a conductor diameter of 1mm. &nbsp;I decided to do a destructive test on one of the adapters by cutting and stripping one of the wires. &nbsp;The conductor measured only 0.4mm in diameter, which is actually 26AWG. &nbsp;The first photo above shows a real 18AWG wire taken from an old ATX PSU next to the fake 18AWG wire from the adapter cables.<br /><br />When I opened a dispute through AliExpress, things got more amusing. &nbsp;I provided the photo, as well as an explanation that real 18AWG wire should be 1mm in diameter. &nbsp;The seller claimed "we never heard of this before", and after exchanging a couple more messages said, "you can't say it is fake just because it is thin". &nbsp;At that point I realized I was dealing with one of those "you can't fix stupid" situations.<br /><br />So what would happen if I actually tried to use the adapter cables on a video card that pulls 75W on the PCI-e power connector? &nbsp;Well you can find posts on overclocking sites about cables that melted and burst into flames. &nbsp;If you have a cheap PSU without short-circuit protection, when the insulation melts and the wires short, your power supply could be destroyed. 
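That 1.6x figure for 20AWG comes from the AWG rule of thumb that resistance roughly doubles every 3 gauge numbers, i.e. about a 1.26x increase per gauge. A quick Go sketch of the ratio:

```go
package main

import (
	"fmt"
	"math"
)

// resistanceRatio uses the AWG rule of thumb: resistance doubles every
// 3 gauge numbers, so each gauge step is a factor of about 1.26.
func resistanceRatio(fromAWG, toAWG int) float64 {
	return math.Pow(1.26, float64(toAWG-fromAWG))
}

func main() {
	fmt.Printf("20AWG vs 18AWG: %.2fx the resistance\n", resistanceRatio(18, 20))
	fmt.Printf("26AWG vs 18AWG: %.2fx the resistance\n", resistanceRatio(18, 26))
}
```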
&nbsp;And if that happened, I'm sure the AliExpress seller is not going to replace your power supply. &nbsp;How much hotter the cables would get compared to genuine 18AWG cables is a function of the resistance. &nbsp;Each gauge step multiplies resistance by about 1.26, so 20AWG has 1.26^2 = 1.59 times the resistance of 18AWG. &nbsp;The 26AWG wire used in these cheap adapter cables would have 1.26^8 or just over 6 times the resistance of 18AWG wire, and would have a temperature increase 6 times greater than 18AWG for a given level of current.<br /><br />It could make for a fun future project: create a resistive load of 75W, take an old ATX PSU, hook up the adapter cables, and see what happens. &nbsp;People do seem to like pictures and videos of things bursting into flames posted on the internet...<br /><br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com0tag:blogger.com,1999:blog-6245413346375218188.post-66170589391820836172016-05-26T12:18:00.001-07:002017-01-11T13:53:05.854-08:00Installing Python 3.5.1 on Linux<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-rv97BqSvIIc/V0dErL9LadI/AAAAAAAAi90/W7aLAXC-LxAMUohuoAT6iEtnWS5Sy-DDACLcB/s1600/PythonLogo65K.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-rv97BqSvIIc/V0dErL9LadI/AAAAAAAAi90/W7aLAXC-LxAMUohuoAT6iEtnWS5Sy-DDACLcB/s1600/PythonLogo65K.png" /></a></div><br />Perl has been my go-to interpreted language for over 20 years now, but in the last few years I've been learning (and liking) python. &nbsp;Python 2.7 is a standard part of Linux distributions, and while many recent distributions include Python 3.4, <a href="https://www.python.org/downloads/release/python-351/">Python 3.5.1</a> is not so common.
&nbsp;I'm working on some code that will use the new&nbsp;<a href="https://www.python.org/dev/peps/pep-0492/">async and await primitives</a>, which are new in Python 3.5. &nbsp;I've searched&nbsp;<a href="https://fedoraproject.org/wiki/EPEL">Extra Packages for Enterprise Linux</a>&nbsp;and other repositories for Python 3.5 binaries, but the latest I can find is 3.4. &nbsp;That means I have to build it from <a href="https://www.python.org/ftp/python/3.5.1/Python-3.5.1.tar.xz">source</a>.<br /><br />While the installation process isn't very complicated, it does require installing gcc and associated build tools first. &nbsp;Since I'm installing it on a couple servers (devel and prod), I wrote a short (10-line) <a href="https://gist.github.com/nerdralph/b4a795faa4e14379e37a03cad879ad8c">install script</a> for rpm-based Linux distributions. &nbsp;Download <a href="https://gist.github.com/nerdralph/b4a795faa4e14379e37a03cad879ad8c">the script</a>, then run "sh py35.sh". &nbsp;The python3.5 binary will be installed in /usr/local/bin/.<br /><br />When installing pip packages for python3, use "pip3", while "pip" will install python2 packages. 
&nbsp;And speaking of pip, you may want to update it to the latest version:<br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;"><b>sudo /usr/local/bin/pip3 install --upgrade pip</b></span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;"><b><br /></b></span>Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com2tag:blogger.com,1999:blog-6245413346375218188.post-52413403300465205822016-04-22T12:48:00.004-07:002016-04-22T12:48:43.945-07:00More about mining<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-PfZfreoitjQ/VxKAKbEMP3I/AAAAAAAAi7c/nkE_l4Uo1d4FYU1_t7g-ovNWk675IzjygCKgB/s1600/ETHEREUM-LOGO_PORTRAIT_Black_small.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="252" src="https://4.bp.blogspot.com/-PfZfreoitjQ/VxKAKbEMP3I/AAAAAAAAi7c/nkE_l4Uo1d4FYU1_t7g-ovNWk675IzjygCKgB/s320/ETHEREUM-LOGO_PORTRAIT_Black_small.png" width="320" /></a></div><br />In <a href="http://nerdralph.blogspot.ca/2016/04/digging-into-ethereum-mining.html">my last post</a>, I gave a basic introduction to ethereum mining. &nbsp;Since there is not much information available about eth mining compared to bitcoin mining, and some of the information I have found is even wrong, I decided to go into more detail on eth mining.<br /><br />Comparing the bitcoin protocol to ethereum, one of the significant differences is the concept of <a href="https://blog.ethereum.org/2015/09/25/more-uncle-statistics/">uncle blocks</a>. &nbsp;When two miners find a block at almost the same time, only one of them can be the next block in the chain, and the other will be an uncle. &nbsp;They are equivalent to stale blocks in bitcoin, but unlike bitcoin where the stale blocks go unrewarded, uncle blocks are rewarded based on how "fresh" they are, with the highest reward being 4.375 eth. 
&nbsp;An example of this can be found in <a href="http://etherscan.io/block/1378035">block 1,378,035</a>. Each additional generation that passes (i.e. each increment of the block count) before an uncle block gets included reduces the reward by 0.625 eth. &nbsp;An example of an uncle that was 2 generations late getting included in the blockchain can be found in <a href="http://etherscan.io/block/1378048">block 1,378,048</a>. &nbsp;The miner including the uncle in their block gets a bonus of 0.15625 eth on top of the normal 5 eth block reward.<br /><br />Based on <a href="https://stats.etherchain.org/dashboard/db/uncles?theme=light">the current trend</a>, I expect the uncle rate to be in the 6-7% range over the next few months. &nbsp;With the average uncle reward being around 3.5 eth (most uncles are more than one generation old), uncles provide a bonus income to miners of about 4%. &nbsp;Since uncles do not factor into ethereum's difficulty formula, when more uncles are mined the difficulty does not increase. &nbsp;The mining calculators I've looked at don't factor in uncle rewards, so real-world returns from mining in an optimal setup should be slightly higher than the estimates of the mining calculators.<br /><br />Another thing the calculators do not factor in is the 0.15625 eth uncle inclusion reward, but this is rather insignificant, and most pools do not share the uncle inclusion reward. &nbsp;Assuming a 6% uncle rate, the uncle inclusion reward increases mining returns by less than 0.2%. &nbsp;If your pool is down or otherwise unavailable for 3 minutes of the day, that would be a 0.21% loss in mining rewards. &nbsp;So a stable pool with good network connections is more important than a pool that shares the uncle inclusion reward.
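The uncle reward schedule works out to (8 - depth)/8 of the 5 eth block reward, where depth is how many generations late the uncle gets included. A short Go sketch of the schedule as I understand it (function and variable names are mine):

```go
package main

import "fmt"

// uncleReward sketches the homestead-era uncle reward schedule: an uncle
// included depth generations late earns (8-depth)/8 of the 5 eth block
// reward, and nothing after 7 generations.
func uncleReward(uncleHeight, includedAt uint64) float64 {
	depth := includedAt - uncleHeight
	if depth < 1 || depth > 7 {
		return 0
	}
	return float64(8-depth) / 8 * 5
}

func main() {
	fmt.Println(uncleReward(1378034, 1378035)) // freshest uncle: 4.375 eth
	fmt.Println(uncleReward(1378046, 1378048)) // 2 generations late: 3.75 eth
	fmt.Println(5.0 / 32.0)                    // fixed inclusion bonus: 0.15625 eth
}
```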
&nbsp;Transaction fees are another source of mining revenue, but most pools do not share them, and they amount to even less than the uncle inclusion reward in any case.<br /><br />Finding a good pool for ethereum mining has been much more difficult than for bitcoin, where it is pretty hard to beat <a href="http://antpool.com/">Antpool</a>. &nbsp;For optimal mining returns, you need to use stratum mode, and there are two main variations of the stratum protocol for eth mining: dwarf's and coinotron's. &nbsp;Coinotron's stratum protocol is directly supported by <a href="https://github.com/Genoil/cpp-ethereum">Genoil's ethminer</a>, which avoids the need to run <a href="https://github.com/Atrides/eth-proxy">eth-proxy</a> in addition to the miner. &nbsp;Coinmine.pl and miningpoolhub.com support coinotron's stratum protocol, while nanopool, f2pool, and miningpoolhub support dwarf's protocol. &nbsp;Miningpoolhub is able to support both on the same port since the <a href="http://www.json.org/">json</a> connection string is different.<br /><br />Coinmine.pl and coinotron only have servers in Europe, and half the time I've tried to go to coinotron's web site it doesn't even load after 15 seconds. &nbsp;Miningpoolhub has servers in the US, Europe, and Asia, and has had reasonable uptimes. &nbsp;As well, the admin responds adequately to issues, and speaks functional English. &nbsp;They have a status page that shows enough information to be able to confirm that your mining connection to the pool is working properly. &nbsp;I have a concern over how the pool reports rejected shares, but the impact on mining returns does not appear to be material. &nbsp;Rejected shares happen on other pools too, and since I am still investigating what is happening with rejected shares, there is not much useful information I can provide about it.<br /><br />So for now my recommended pool is ethereum.miningpoolhub.com.
&nbsp; My recommended mining program is v1.0.7 of&nbsp;<a href="https://github.com/Genoil/cpp-ethereum">Genoil's ethminer</a>, which added support for stratum connection failover, allowing it to connect to a secondary pool server if the first goes down. &nbsp;The Ethereum Foundation is <a href="https://blog.ethereum.org/2016/03/29/an-open-source-mining-pool-bounty/">supporting the development of open-source mining pool software,</a>&nbsp;so we may see an ideal eth mining pool in the near future, and maybe even improvements to <a href="https://github.com/ethereum/webthree-umbrella/releases">the official ethminer</a>&nbsp;supporting stratum protocol.<br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com0tag:blogger.com,1999:blog-6245413346375218188.post-8691516821207723012016-04-16T19:47:00.002-07:002016-04-18T14:35:35.851-07:00Digging into ethereum mining<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-PfZfreoitjQ/VxKAKbEMP3I/AAAAAAAAi7U/kwCfmlgYD4gXHyGnkKGYBXo8Y-e5H7NsACLcB/s1600/ETHEREUM-LOGO_PORTRAIT_Black_small.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="252" src="https://3.bp.blogspot.com/-PfZfreoitjQ/VxKAKbEMP3I/AAAAAAAAi7U/kwCfmlgYD4gXHyGnkKGYBXo8Y-e5H7NsACLcB/s320/ETHEREUM-LOGO_PORTRAIT_Black_small.png" width="320" /></a></div>After bitcoin, <a href="http://ethereum.org/">ethereum</a>&nbsp;(eth) has the highest market capitalization of any cryptocurrency. &nbsp;Unlike bitcoin, there are no&nbsp;<a href="http://nerdralph.blogspot.ca/2015/12/bitcoin-mining-hardware-for-christmas.html">plug-and-play</a>&nbsp;mining options for ethereum.
&nbsp;As was done in the early days of bitcoin, ethereum mining is done with GPUs (primarily AMD) that are typically used for video gaming.<br /><br />The first ethereum mining I did was with an AMD R9 280x card using the ethereum foundation's <a href="https://github.com/ethereum/webthree-umbrella/releases/tag/v1.2.3">ethminer program</a>&nbsp;under Windows 7 64-bit. &nbsp;The installer advised that I should use a <a href="http://support.amd.com/en-us/download/desktop/previous?os=Windows%207%20-%2064">previous version</a> of AMD's Catalyst drivers, specifically 15.7.1. &nbsp;Although the AMD catalyst utilities show some information about the installed graphics card, I like <a href="https://www.techpowerup.com/downloads/SysInfo/GPU-Z/">GPU-z</a>&nbsp;as it provides more details. &nbsp;After setting up the software and drivers, I started mining using <a href="http://dwarfpool.com/">dwarfpool</a>&nbsp;since it was the largest ethereum mining pool.<br /><br />As an "open" pool, dwarf does not require setting up an account in advance. &nbsp;One potential problem with that is the eth wallet address used for mining does not get validated. &nbsp;I found this out because I had accidentally used a bitcoin wallet address, and dwarfpool accepted it. &nbsp;After fixing it, I emailed the admin and had the account balance transferred to my eth wallet.<br /><br />Dwarf recommends the use of their <a href="https://github.com/Atrides/eth-proxy">eth-proxy program</a>, which proxies between the get-work protocol used by ethminer, and the more efficient <a href="https://slushpool.com/help/#!/manual/stratum-protocol">stratum protocol</a> which is also supported by dwarfpool. &nbsp;Even using eth-proxy, I wasn't earning as much ethereum as I expected.<br /><br />The ethereum network is running the <a href="https://blog.ethereum.org/2016/02/29/homestead-release/">homestead release</a> as of 2016/03/14, which replaced the beta release called frontier.
&nbsp;The biggest change in homestead was the reduction in the average block time from 17 seconds to 14.5 seconds, moving halfway to the ultimate target of <a href="https://blog.ethereum.org/2014/07/11/toward-a-12-second-block-time/">a 12-second block time</a>. &nbsp;I wasn't sure if the difference in the results I was getting from mining was due to the calculators not having been updated from frontier or some other reason. &nbsp;After reading a comment in the <a href="http://forum.ethereum.org/categories/mining">ethereum mining forum</a>, I realized returns can be calculated with a bit of basic math.<br /><br />The block reward in ethereum is 5 eth, and with an average block generation time of 14.5 seconds, there are 86400/14.5 * 5 = 29793 eth mined per day. &nbsp;Ethereum blockchain statistics sites like <a href="http://etherscan.io/">etherscan.io</a>&nbsp;report the network hash rate which is currently around 2,000 gigahashes per second. &nbsp;An R9 280x card does about 20 megahashes per second, or 1/100,000th of the network hashrate, and therefore should earn about 29,793/100,000 or 0.298 eth per day. &nbsp;The manual calculations are in line with <a href="http://karldiab.com/EthereumMiningCalculator/">my favorite eth mining calculator</a> (although it can be a bit slow loading at times). &nbsp;Due to the probabilistic nature of mining, returns will vary by 5-10% up or down each day, but in less than a week you can tell if your mining is working optimally.<br /><br />Using the regular ethminer, or even using eth-proxy, I was unable to get pool returns in line with the calculations. &nbsp;However, using <a href="https://github.com/Genoil/cpp-ethereum">Genoil's ethminer</a>, which natively supports the stratum protocol, I have been able to get the expected earnings from <a href="http://ethereum.miningpoolhub.com/">ethereum.miningpoolhub.com</a>. &nbsp;Dwarf uses an unsupported variation of the stratum protocol, so I could not use Genoil's ethminer with it.
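The back-of-the-envelope calculation is easy to wrap in a few lines of Go (the function name is mine, and the hashrate figures are the ones current as of this post):

```go
package main

import "fmt"

// ethPerDay does the post's back-of-the-envelope estimate: the network
// mints (86400 / blockTime) * 5 eth per day, and a miner earns their
// fraction of the total network hashrate.
func ethPerDay(minerMHs, networkGHs float64) float64 {
	const blockTime = 14.5 // average seconds per block under homestead
	const blockReward = 5.0
	daily := 86400 / blockTime * blockReward
	return daily * (minerMHs * 1e6) / (networkGHs * 1e9)
}

func main() {
	// 20 Mh/s against a 2,000 Gh/s network, as in the post.
	fmt.Printf("%.3f eth/day\n", ethPerDay(20, 2000))
}
```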
&nbsp;I briefly tried nanopool, but had periods where the pool stopped sending work for several minutes, even though the connection to the pool was still live.<br /><br />Both the official ethminer and Genoil's version were built using MS Visual C++, so if your system doesn't already have it installed, you'll need <a href="https://www.microsoft.com/en-us/download/details.aspx?id=48145">MS Visual Studio redistributable files</a>. &nbsp;Getting the right version of the AMD Windows catalyst drivers for ethminer to work and work well can be problematic. &nbsp;Version 15.12 works at almost the same speed as 15.7.1, however the crimson version 16 drivers perform about 20% slower.<br /><br />For me, as a Linux user for over 20 years, the easiest setup for eth mining was with Linux/Ubuntu. &nbsp;I plan to do another post about mining on Ubuntu.<br /><br />Ralph Doncasterhttp://www.blogger.com/profile/00037504544742962130noreply@blogger.com2