The Doctor Who episode titled "42" introduced its viewers to the concept of Happy Numbers. Briefly, you take a positive integer and separate and square each of its digits, then sum the squares to get a new number. Call this operation a Happy Sum. Repeat the happy sum until you either reach 1 or reach the repeating sequence 4, 16, 37, 58, 89, 145, 42, 20, 4. If a number reaches 1, it's a "Happy Number." If it doesn't, it's not a happy number.

An open question is what percentage of all numbers are happy numbers. Thus far, percentages have been calculated and published up to 10^122. That's a phenomenally large number. How was the number of happy numbers from 1 to 10^122 calculated? By realizing that the order of the digits in a number doesn't matter. If 129 is a happy number, so it 192, 219, 291, 912, and 921. Likewise, so is any variation that only inserts zeros, such as 1029, 90210, etc. With 122 digits, there are only 23,640,810,986,000 unique selections of digits. Each of these combinations was tested to determine if it's a happy number. If so, the number of ways to arrange those digits into a number was added to the count of happy numbers with 122 digits. Computing the number of happy numbers at N digits this way is O(Log(N)^9).

But, there is a much faster way to count happy numbers. It relies on two realizations:

1) If 86 is a happy number, then every number whose happy sum is 86 is a happy number. 129 is a happy number it's happy sum 86 and 86 is a happy number. Likewise, 911111, 555311, and 66321 are happy numbers, because their happy sums are all 86.

2) Appending a single digit to an number increases its happy sum by the square of that digit.

With these, we can now write a formula for the number of numbers with N digits which have a happy sum of X:

For N digits, we only need to calculate H(N,X) up to X = 81N. Each computation of H(N,X) takes only 10 operations - collecting and summing the 10 values for H(N-1,X-i^2). With the counts known, the final operation is to add all the counts where X is a happy number.

With this approach, the number of operations to compute the number of 122-digit happy numbers is reduced from 23,640,810,986,000 to 6,077,430. Generally, this approach counts the number of happy numbers at N digits in O(Log(N)^2), instead of O(Log(N)^9).

I have used this approach to calculate the number (and percentage) of happy numbers up to 6,900 digits (that's just where my PC ran out of memory). I expect that others who have more time to work on this problem, and access to more computing power, will take this approach and extend the calculations much, much farther.

The interesting thing is that the percentage of happy numbers doesn't seem to converge. It oscillates between the extremes of 11.38% and 18.58%. The rate of oscillation seems to decrease as the digits increase, but with an infinite number of digits to add, the percentage can oscillate ever more slowly without ever converging.

The data is too large to include in this post, but here are a few sample values.

Number of happy numbers at the given number of digits:N,Number of happy numbers122,15544165655506391433684532303507685962217819565438704589001871991655459268216971175017952179595699881844518065420199858912123,154485360845377664560973323837187366361428123300901954765448516303896594048403580319330850652889429925442411460554271489982124,1534847004045147685904927341344183409044530364026064204411943494566662876130413227072620036419933782270174313123324336832815125,15244593711682632352497273136169026311730421692813607912467354763254777757599100545721691881341183226700203445461184893053338126,151374495542428661767707601614894539344499845653628322250474866334934846694486023804299925710916428070638010332077829610892465127,1502752508393542025226048312991041934169287778778327571436271037047289517543589927010953832993410475450328077812076578706895739128,14915285568001444315457587881830205823412383736697062777993396206355786330352327469265675663937406269551831517509536418065836791129,148012072305605304157614902430948585259906458022371749912672972431288364833015956791011857076086142539333135797466519836183925798130,1468580782316250337097725668112987220223784374713025106971398378487880498734546788259857859168494040144154444049065807068676864909131,14569671860699052981557279925595393102599321310862050393639883547348874619528564167878799007539152969578465503205800690426113641476132,144534684261826321239137350787262313306862079110647699168620243006088946027279051575588905818163761791765900039382791843732221218102133,1433796014323529559268775155632772971373796122766235785276639719712597897783191701911675173017160388045798163980092693035013979682841134,14224002641950045836974539646971579126868586450173722737778296290358352290662069548763690533185070032197451841899323923036643125922965135,141125345482053742668626248505255315302105312713800325930819206905813704490018245885794088855898951456701964831557621819140891798203735136,1400455866481458809270529396034094412472845640352080047932836612627695718141460347471225051377434407589772710973226035303774903364137463137,13901115553303493150794782523701198143468240912504737084347126306894801460562206603412464619833751403076095231053435368480375845391177310138,138032511480024972721340769531170892635334354516522682562981589991367517023542584138672051925966724428168970188800873521432766362565286198139,1371198033093892896205129640484161632375581004512112567725099620321813098843721749620344121861844961470462663308430382578703704619628239660140,13628248567244149265788686217497402831643146805152576812114204233973776959018793420100552113121495694589308835344933408582266132782342590189141,135528937536641071964109787117144605026447788491359433699160727225955326092572352467011518353187028474566763845239038048169511933447465000977142,1348661037876474618617414502654775482756497710663505944417272144567897085830957827297954417755577580155248816988453596796593311811464934922678143,13429917946185159928323213612288610761800534156351041679052899039090052690368074394961399315955785682518527480907191838790069467229604336229219144,133831353859975545767709060607659446530266204591279523001086825384741866117594880204303954290362064157213038919769426907073072529837172347657077145,1334636288766538416868281373779332289665753030394917891206686582767377496491268344799141394237011685821199870719659782381410298845586456343014232146,13319472244281350240314823692626714628493897884524267058262013387718947212357499688741611929648126091977862094021458113238450455491894788364483772147,133021064141570959195385984627379052988536854188105221055775355159034970883343927052111448159033233664513762978873189162998669399758243317530831833148,1329368859476717026028217919998016166268271659316887885144696698215301019151368050598289725738293855474060611120099628509284759287029846631184465633149,13293440678374106345780938464345329773748484807555176459057623031849245793005164060149371982315807938833666152855601192817046537663648570423332898649150,133004105322798589095229726818914132326479332066439924016567579436794365881550142133009760309643812973232815584618752212033808970768433004588287088993

Percentage of happy numbers at the given number of digits:N,Percentage of happy numbers100,15.21944023%200,15.12995138%300,14.63159060%400,18.56506223%500,16.90527241%600,15.18765928%700,14.42772706%800,17.17536692%900,16.62088861%1000,14.67705935%1100,15.27152393%1200,14.41333187%1300,13.55160038%1400,14.24285858%1500,16.54517766%1600,16.67110306%1700,13.46856922%1800,15.54233525%1900,15.91286476%2000,14.24458227%2100,11.76708033%2200,13.27517060%2300,12.24818225%2400,11.72111717%2500,15.18004135%2600,15.01885103%2700,13.15618170%2800,14.25773404%2900,14.65954304%3000,12.52142474%3100,12.03006897%3200,12.62685213%3300,12.17304447%3400,12.56740858%3500,15.62492461%3600,17.22054088%3700,15.54455857%3800,15.38215233%3900,16.29814112%4000,15.64229625%4100,14.01423502%4200,15.20124847%4300,16.43395210%4400,15.58002249%4500,15.43396975%4600,16.48597547%4700,15.82383788%4800,14.87475039%4900,14.95818778%5000,14.53188325%5100,14.22129580%5200,13.99971056%5300,14.11526813%5400,13.74381805%5500,13.54871710%5600,14.12562143%5700,15.29334245%5800,15.05585464%5900,14.46388302%6000,14.11902622%6100,14.34864916%6200,14.81341784%6300,15.10462711%6400,14.49529915%6500,14.17316293%6600,14.48525284%6700,13.99989288%6800,13.50063539%6900,13.88878029%

I assume you ran out of memory at N~6900 because you had to tabulate O(N) numbers of up to O(N) digits, to get your exact counts H(N,X). You can get substantial memory (and time) savings, if all you want is to see roughly how the fraction of happy numbers behaves with N, by using floating-point numbers instead. (To avoid machine-precision floating-point overflow, the values you save would then instead be these fractional counts 10-NH(N,X).) I'm pretty sure this doesn't cause significant roundoff errors until at least N ~ 1/epsilon.

With this approximation I've gotten estimates up to about N=200000. Here's a graph of this happy fraction h(N):

Fraction of happy numbers up to 10^N, vs. N.

linear.png (10.38 KiB) Viewed 1844 times

(I can upload code or raw data if there's interest.)

In order to count the happy N-digit numbers, your algorithm requires that you explicitly compute the happy sums H(n,X) for each n=1,...,N. It's maybe worth noting that there's an algorithm that doesn't do this. The idea is that computation of H(n+1,X) given H(n,X) is just a convolution (represented by *),H(n+1) = H(n) * K,writing H(n) = [H(n,0), H(n,1), H(n,2), ...] and where the kernel K consists of 10 unit spikes, at the squared digits,K(n) = 1, n = 0, 1, 4, 9, ..., 81 ;K(n) = 0, otherwise.Convolutions turn into multiplications in the Fourier domain, soFFT[H(n)] = FFT[H(0)] FFT[K]nand since H(0) is just a delta function, its FFT is 1, andH(N) = IFFT[ FFT[K]N ] .For large N I think this should be faster than the straightforward implementation of the convolution, requring two O(N log(N)) FFTs and O(N) Nth powers instead of O(N2) additions. I'm not sure how the accuracy compares with the straightforward approach, though.

--

I have a vague heuristic explanation for the size and frequency of the oscillations. The first thing to notice is that almost all of the contribution to the count of happy numbers at N digits comes from a set of O(sqrt(N)) happy sums, which is basically just the central limit theorem. From the multinomial distribution of digit counts you can work out the mean (obviously m=28.5N; 28.5=(1+4+...+81)/10) and variance (v=721.05N) of the happy sum for a number randomly sampled from all numbers less than 10N. So for the N-digit numbers, most of the contribution to the count of happy numbers comes from the happy numbers within about sqrt(v)=sqrt(721.05N) of m=28.5N.

So the happy fraction of N-digit numbers is found, roughly, by basically counting the fraction of a set of ~sqrt(v) numbers (centered around m) which are happy. Now if we pretend that a number is happy at random with some probability p, mostly independent of the happiness of its neighbors, then this count is roughly binomial, with probability p and ~sqrt(v) trials; so it has variance ~p(1-p)sqrt(v) = O(sqrt(N)), relative variance ~(1-p)/(p*sqrt(v)) = O(N-1/2) and so relative standard deviation O(N-1/4). So the amplitude of the fluctuations in the happy fraction should be at most roughly O(N-1/4), a pretty slow convergence. (I think it's even slower; for extremely large N you can repeat the same argument on the distribution of the happy sums to find an even narrower distribution. There are also a couple of questionable assumptions here, so I'm not too sure about this part.)

Because the happy fraction for the (N+1)-digit numbers, for large N, is sampling mostly the same ~sqrt(v) happy sums as for the N-digit numbers, there is a high correlation between these two happy fractions. You have to move to N+O(sqrt(N)) to make the overlap between the peaks of the happy-sum distribution small; so the period of these fluctuations should go as roughly O(sqrt(N)). Here's the same graph as before, but now scaling the horizontal axis as sqrt(N) (I still labeled the ticks with N though); now the horizontal scale of the fluctuations is approximately constant across the graph:

In this graph it looks like the amplitude of the fluctuations is decreasing with N. Subtracting something close to the middle and rescaling by N1/4 as argued above (i.e., the vertical axis is now (h(N)-0.14)*N1/4) gives this:

Now if we pretend that a number is happy at random with some probability p, mostly independent of the happiness of its neighbors

As you already noted, this is just an approximation, and p itself is fluctuating. p for N-digit numbers influences p' for numbers with [imath]\frac{10^N}{285}[/imath] to [imath]\frac{10^N}{28.5}[/imath] digits - as those p(N) are correlated, the fluctuations are even more spread out. No way to see them in existing data. 2*10^5-digit-numbers have relevant happy sums of ~28.5*2*10^5 or ~6 million, and the happy fraction is quite constant for numbers up to 10^3, 10^4, 10^5, 10^6 and 10^7.While the short-term-oscillations go down with [imath]O(N^{-1/4})[/imath], I don't know if this is true for those long-term oscillations which are induced by short-term oscillations.