collatz fusion

❗ ⭐ ❤ 😎 😀 hi all. it was a very lively week last week it would seem "all cylinders were firing". the mini adventure crosscut other blogs, reddit/ social media, multiple stackexchange sites and chat rooms, mainstream media, top world mathematicians eg longtime favorite heroes eg Tao/ Gowers, and a world class award/ prize venue ie Heidelberg Laureate forum— (eg what (personal?) connection the skeptic naysayers ask?— yes! eg did get substantial hits from their blog referrals!). it is a relatively rare event for even several of those “different spheres” to intersect.

(looking on the bright side!) the Atiyah announcement ties in with some key interests/ themes of this blog covered in the past such as attacking top open problems, (cyber-) collaboration/ peer review, scientific psychology/ culture, tracking the award/ prize-winning leaders of the field, tracking current leading edge research/ controversy which sometimes also leads to real breakthroughs, empirical/ computational/ coding approaches to math theory research, physics/ math research connections/ overlap, number theory, etc., did have some real fun! it seemed to have paid off to jump in the fray with a big analysis that got many hits and a lot of comment feedback, the former uncommon and the latter a rarity for this blog. emotions ran high! the story was not the happy/ breakthru/ historic ending hoped for but it was nevertheless quite the story, and definitely still historic!

as stated in chat, do think instead of a “dud”, alas, a “Big Kahuna™ (Hooked)” somewhat on the level of Perelman or Wiles proof is on the intermediate horizon and will be seen in our lifetime. P vs NP, Riemann, Collatz, not sure which, they seem to come along about once per generation; anyway from this episode looks like all the troops are lined up and ready to pounce so to speak. (anticipating objections…)

some might laugh at putting Collatz on that list, but it really is that important to me at least, and suspect it could rise in wider community attention/ importance with an ingenious solution, esp if its a potentially broad new technique that could seemingly have applications to other problems, say algorithmic/ computational/ deep learning/ auto thm proving aspects— ie exactly what is long sought/ advocated/ promoted here, and have believed for decades something like that is lurking in the deep so to speak. for those students of history, there is no guarantee that a problem thought significant will stay that way/ aka “stand the test of time” but likewise, sometimes seemingly inconsequential problems rise in importance. theres an element of fashion to science/ even math, but in a deeper sense… maybe more like uncovering hidden gold veins, possibly bulging deeper…

meantime have also been banging on some interesting angles with the collatz research, and there was some cross pollination as commenters reacted to my referencing Tao-Polymaths new “number crunching” Riemann attack. from long observation it seems some (minority of) mathematicians are “triggered” by CS-oriented approaches to math problems, and they have my full sympathies, but some are also not very conversant with or tend to push back against long history of world class breakthrus in the area, covered in some other blogs here/ many writings elsewhere. everyone is entitled to their own opinions, but not their own facts™…

as promised/ foreshadowed/ alluded last month, did want to share/ unveil/ unleash my shiny new analogy/ “elevator pitch”, now established as a nearly monthly tradition around here. have been tracking physics research/ breakthrus in this blog since its inception. one lively area is fusion research. its very broad to try to look into, but basically there are new directions/ developments being pursued worldwide in this area and one hears of some advances. in particular “stellarators” are gaining some viability. some scattered refs in the blog, am collecting links for a new one.

in fusion research the fiendishly difficult problem is trying to control a levitating extremely-high-energy plasma using magnetic fields. in this way there is some similarity to atom trapping technology that advanced rather extraordinarily in our lifetimes, in the last few decades. however atom trapping, as scifi/ hitech as it is, seems to be relatively “easy” compared to fusion control.

in fusion research in contrast the plasma is highly unstable, much more than an atom is, the atom despite its internal energy fluctuations is nearly stable in comparison. its looking like extremely sophisticated control technology is going to be required in stellarator fusion. it looks like it will require extremely fast computations to devise a control system that reacts and counteracts the very unstable/ dynamic “modulations” in the plasma. one is reminded of the classic Lorenz attractor problem which again is highly stable in comparison. lately there are new insights and maybe something like the worlds most sophisticated/ fast responding AI algorithms responding to complex measurements within milliseconds, and some very advanced theory may be invoked. another emerging beautiful combination/ fusion of physics + TCS… stay tuned more later…

anyway my new analogy with collatz is as follows: collatz trajectories are sort of wild, unstable, and even “fractal” like in the sense of a plasma. to solve the problem, one has to somehow “trap the plasma”. the plasma escapes different control methods, this is the sense in which some parameters measuring it become unbounded. others somehow bound it even as the plasma gets larger/ more energetic. the plasma is chaotic yet somehow there may be an (extraordinary, to say the least) orderly control method that converts disorder into order (down-regulates the entropy so to speak). these ideas were recently carried out in the sense of the nonmono run lengths of the right side of glides apparently looking bounded compared to left sides (despite “push-up” optimization applied on both, ie cf bitwise8 experiment) and there are many other experiments with this theme eg graphs where some lines/ measurements trend up with increasing trajectory lengths as others flatline/ plateau.

awhile back was talking to my construction-worker neighbor and told him about my experience in the dotcom era and trying to find a company that would lift off like a rocket and take my own stock option equity with it. he said that sounds like trying to capture lightning in a bottle.[x] exactly! wow, that old expression which maybe predates both, not coincidentally a noted favorite of americans!, also nearly perfectly captures/ summarizes what fusion research is aimed at, and also in a sense this collatz research…

a new finding: it looks like the lsb 1-runs of the ‘w’ scheme are an artifact of that generation process. it looks like long disordered trajectories exist with small lsb 1-runs. it now looks like the ‘w’ scheme is tending to maximize the climb slope and the lsb 1-runs seem to be connected with that. the lsb 1-runs were one of the few solid features of the ‘w’ climbs and throwing those away (so to speak) is going to be a bit painful, but one does not attack the hardest open problems in mathematics expecting a smooth ride (unless young and foolish, lol!). now on the hunt for some other exploitable feature. the disordered trajectories just got even a bit )( more disorderly.
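(fwiw heres a minimal ruby sketch of the lsb 1-run measurement itself — assuming the plain collatz map n → 3n+1 odd/ n÷2 even; the 'w' generation scheme is not reproduced here, just the metric:)

```ruby
# sketch: measure lsb 1-runs (trailing 1-bits) along a collatz glide.
# standard uncompressed map assumed; the blog's 'w' generator is not
# reproduced, only the measurement itself.

# length of the run of 1-bits at the least significant end
def lsb_1run(n)
  c = 0
  while n & 1 == 1
    c += 1
    n >>= 1
  end
  c
end

def collatz_step(n)
  n.odd? ? 3 * n + 1 : n / 2
end

# lsb 1-run lengths over a trajectory until it drops below the start
def glide_lsb_runs(n)
  runs = []
  m = n
  while m >= n
    runs << lsb_1run(m)
    m = collatz_step(m)
    break if m == 1
  end
  runs
end
```

eg `glide_lsb_runs(3)` walks 3 → 10 → 5 → 16 → 8 → 4 and returns `[2, 0, 1, 0, 0, 0]` — small lsb 1-runs with a long disordered glide is exactly the kind of case the finding is about.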

as mentioned last blog post, this blog now has some intermittent “collatz/ number crunching” anonymous trolls and its not clear to me how many there are (“they” eschew even pseudonymity at this point), they seem to have been picked up off of reddit (similar comments made there long ago…), and were rather heavily triggered by the Atiyah post (ok, there might have been some outright baiting going on there lol). have been doing my best to ignore them for nearly a year or more, they repeat the same ½-informed/ superficial/ fatuous objections/ near-cliches (as if not-rigorous/ casual graph labelling or choice of programming language will surely utterly invalidate my approach and doom it to abysmal failure lol!), but am now taking a little bit different tack of some direct confrontation, its an(other) experiment in progress. any anti-troll hero volunteers out there? could sure use some help! and hey trolls (playing with fire™! be careful what you wish for™), now dare you to p*** on the stellarator fusion control project! cant at moment imagine how you could pull that one off but you always seem to find a way in your sheer retrograde anti-ingenuity! 😛

my open invitation/ proposition/ offer to them is now written up/ on the table and can cite it handily in response, so far unsurprisingly no takers, (as english vernacular goes “put up or shut up™!”), but around here even wrt negative human psychology the general motto/ philosophy is never say never™. (again always trying to say something constructive, energetic blog trolls are probably even harder to “control” or “bottle” than stellarator plasma or collatz/ Riemann proofs, lol!) testing the truth/ viability of the saying keep your friends close and your enemies closer™…

⭐ ⭐ ⭐

😎 😮 so, now firing up the collatz stellarator control experiment… this builds on the prior code to look at minimizing ‘a01l, a01r’, the average bit run lengths climb/ descent, and ‘mx01l, mx01r’ the max bit run lengths (right side scale), using a 2-pass run technique called with ARGV[0] 0 / 1. in the 1st graph the algorithm effectively pushes/ holds down the very flatlined averages (and somewhat counterintuitively the “right” descent max lengths green were slightly larger than the “left” climbs red, the opposite of the recent guiding conjecture relating to lsb 1-runs), in the 2nd it cant push/ hold down the maximums, which stay at about equal levels. in both cases the trajectory length metrics increased.
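(for concreteness, a rough ruby reconstruction of the measurements only — standard collatz map assumed; the names a01l/ a01r/ mx01l/ mx01r follow the blog convention but this is a stand-in, not the actual 2-pass optimizer:)

```ruby
# reconstruction of the 0/1 bit-run metrics, split over climb ("left")
# vs descent ("right") of a glide. standard collatz map assumed; only
# the measurements are sketched, not the optimization.

def collatz_step(n)
  n.odd? ? 3 * n + 1 : n / 2
end

# lengths of maximal runs of identical bits in the binary expansion
def bit_runs(n)
  n.to_s(2).scan(/0+|1+/).map(&:size)
end

# iterates from n up to (not including) the first iterate below n
def glide(n)
  seq = [n]
  seq << collatz_step(seq.last) while seq.last >= n && seq.last != 1
  seq[0..-2]
end

# a01l/ a01r average and mx01l/ mx01r max bit-run lengths, climb vs descent
def run_metrics(n)
  g = glide(n)
  peak = g.index(g.max)
  left  = g[0..peak].flat_map { |m| bit_runs(m) }
  right = g[(peak + 1)..-1].flat_map { |m| bit_runs(m) }
  { a01l: left.sum.to_f / left.size, mx01l: left.max,
    a01r: right.empty? ? 0.0 : right.sum.to_f / right.size,
    mx01r: right.empty? ? 0 : right.max }
end
```

eg `bit_runs(27)` on binary 11011 gives `[2, 1, 2]`, and `run_metrics(3)` splits the glide 3 → 10 → 5 → 16 → 8 → 4 at the peak 16.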

these results align/ mesh with earlier ideas showing that limiting bit run lengths also seems to limit trajectory lengths. this has a bunch of prior commented code for stuff like lsb runs and nonmono run length not cleaned/ ripped out, in that way its a sort of toolkit/ API for multiple experimentation. theres a slight adjustment to allow an optimization parameter to be specified with a trailing ‘-‘ to indicate minimization instead of maximization, replacing the prior negation hack.

does this mean anything? maybe just some squiggles on the graph, or maybe it seems to relate to the scale invariant fractal properties of the problem, and could be “exploitable”.

this is some quick code to re/ analyze the output of bitwise8 from last month, the 10th run that found a very low climb lsb 1 run around 5 while optimizing to push it down and push up the descent lsb 1 runs. maximums magenta, blue are on right side scale. it finds basically identical statistics of climbing 0/1-runs maximums and averages; something is definitely going on here… is it just that larger iterates in the trajectories have the bias relative to their bit size no matter where they are in trajectories or is there some deeper property at play? although it does look like it might be plateauing in the 2nd half of the graph.

(10/6) more curse of undifferentiability looking/ poking at these runs. but do have some ideas. the prior runs seem to show that max bit runs must increase (very gradually!) for longer glides/ trajectories. the other quite striking aspect is the nearly constant bit run average lengths ~2, have encountered that before and heres another remarkable context.

have been musing about the regular language/ transducer constructions that go way back, sometimes alluded to in this blog. they had some very powerful aspects. it is possible to formulate (infinite) languages like “all integers with bit runs less than c” where ‘c’ is a constant. then one can analyze their properties wrt multiple collatz transducer compositions. for example if those languages all lead to smaller iterates after some constant # of collatz transducer compositions, think this should be provable. but am suspecting that maybe there isnt a finite # of collatz transducer compositions that can lead to that as iterate sizes get indefinitely larger, the increasing max bit runs (apparently) required for larger trajectories seems to point to that.
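(sketching the language idea concretely in ruby — brute force enumeration standing in for the actual regular language/ transducer machinery, which is not reproduced here:)

```ruby
# the language L_c = "all integers whose binary bit runs are all < c",
# plus a brute force probe of the composition question: do k collatz
# steps send every member below a cutoff to a smaller iterate? (the real
# version would phrase this with transducer compositions, not enumeration.)

def max_bit_run(n)
  n.to_s(2).scan(/0+|1+/).map(&:size).max
end

def in_language?(n, c)
  max_bit_run(n) < c
end

def collatz_step(n)
  n.odd? ? 3 * n + 1 : n / 2
end

def all_shrink_after?(c, k, limit)
  (2..limit).all? do |n|
    next true unless in_language?(n, c)
    m = n
    k.times { m = collatz_step(m) }
    m < n
  end
end
```

note the trivial 1 → 4 → 2 → 1 cycle means the smallest members never “shrink” under every fixed k, so any real formulation has to treat the bottom of the range specially.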

the 1-lsb-run x-y trap idea didnt pan out as recently pointed out, but am now thinking about something like 1-run x-y traps; they are nearly already demonstrated at this point with some prior experiments. sketch of dynamic: all iterates must converge toward iterates with “smaller” bit runs (possibly also the average lengths converging to ~2). once bounded by some smaller bit run maximum ‘x’, those trajectories cannot continue to climb beyond some limit of iterations ‘y’. most of these ideas are testable empirically using the prior constructed tools. but need to figure out the effective ways of formulating the experiments.
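(one cheap empirical formulation in ruby, a toy probe at tiny scale — among small starting iterates whose entire climbs keep max 1-runs ≤ x, measure the longest climb ‘y’ actually seen:)

```ruby
# probe of the "1-run x-y trap": among starting iterates up to a cutoff
# whose entire climb keeps max 1-run lengths <= x, find the longest climb
# 'y' observed. the conjecture would be that y stays bounded as the cutoff
# grows; this only samples one small scale.

def collatz_step(n)
  n.odd? ? 3 * n + 1 : n / 2
end

def max_1run(n)
  n.to_s(2).scan(/1+/).map(&:size).max || 0
end

# glide iterates: n up to (not including) the first iterate below n
def glide(n)
  seq = [n]
  seq << collatz_step(seq.last) while seq.last >= n && seq.last != 1
  seq[0..-2]
end

def trap_bound(x, limit)
  best = 0
  (3..limit).each do |n|
    g = glide(n)
    climb = g[0..g.index(g.max)]
    next if climb.any? { |m| max_1run(m) > x }
    cm = climb.size - 1                 # steps to the glide max
    best = cm if cm > best
  end
  best
end
```

eg with x = 1 (no adjacent 1-bits allowed anywhere in the climb) and cutoff 20, the longest climb found is just 1 step (5 → 16).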

(10/6) starting with the theme of analyzing the longest bit runs and their relation to the problem/ climb/ glide/ trajectory lengths, this effect was noticed earlier, the longest 1-bit runs in the climb tend to be nearer to glide maxes. this somewhat quick-riff code finds the longest bit runs and determines their fractional location in the climb, and plots it vertical axis in scatterpoint diagram with color the max length. it also does a 20-bin histogram and plots this on the x2, y2 axes overplot, top/ right scales, green line. while noisy, there seems to be some right-heavier/ higher trend apparent esp in 1st. 2nd has large lower-region spike. 3rd has a lot of clustering in the points apparent, guessing maybe due to a larger tendency to reuse/ extend prefixes (points are plotted in order of generation), and a large downturn at the right end. other trends might be apparent by sorting the glides by other natural metrics like glide length etc. analysis is over the last 3 datasets named in the title. as long seen, the gnuplot formatting misinterprets the underscores in filenames as subscript formatting, need to figure out how to fix that sometime.
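(the gist of the location computation in ruby, minus the histogram/ gnuplot part — standard map assumed, a reconstruction rather than the actual quick-riff code:)

```ruby
# find the longest 1-bit run over the climb of a glide and report its
# fractional position (0.0 = starting iterate, 1.0 = glide max). the
# original code also binned these into a 20-bin histogram for gnuplot;
# that part is omitted here.

def collatz_step(n)
  n.odd? ? 3 * n + 1 : n / 2
end

def max_1run(n)
  n.to_s(2).scan(/1+/).map(&:size).max || 0
end

def longest_run_location(n)
  g = [n]
  g << collatz_step(g.last) while g.last >= n && g.last != 1
  g = g[0..-2]                              # glide iterates
  climb = g[0..g.index(g.max)]              # up to the glide max
  runs = climb.map { |m| max_1run(m) }
  return 0.0 if climb.size == 1
  runs.index(runs.max).to_f / (climb.size - 1)
end
```

eg n=7 (binary 111, the longest run sits in the starting iterate) gives 0.0 while n=9 (where the longest run 111 appears in 28 at the peak) gives 1.0 — the two extremes of the location statistic.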

fired up the new bright+shiny HD projector and family watched jurassic world with big fat sound system & hi watt woofer. boom!

then hooked up laptop and goofed around late into the nite, burning a little midnight oil. during 1 interlude showed the resident STEM curious teenager some excellent high-view youtube fusion videos off the internet by linustech + seeker & he got a kick out of em! cool stuff! (esp laughed at linus spitting out the seawater scene, lol!) as stated getting urge to blog on this topic soon.

reddit was “on/ off” about Mochizuki again last nite further overstimulating my neurons/ nearly last nerve, more later.

woke up early next morning as writing this. honestly, a bit overstimulated at the moment (lately speaking of mental health, aka manic side!), semi related to following… 😮

a commenter had some news about a trajectory calculation & responded to him within 10 minutes below & we had some further exchange.

topping it off, at some pt, dont remember when, accidentally set phone morning alarm for early workday schedule on weekend! in a trance! @#%&

holy cow! the commenter exchange stands out. thats an exceedingly rare moment. a few summers ago managed to get some outside engagement with this topic on codegolf stackexchange but realized early on that that whole direction was like pushing boulders up a hill (this from someone who has worked on collatz for decades). yikes.

the commenter seems to have (hints of) a split personality in the jekyll hyde sense, sometimes “KT kind troll” and sometimes just “CT collatz troll”. fervently hope/ wish/ pray that K/CT is not the same as any older very acid-hostile commenters who have a tendency to write very bitter harshly stinging ad hominem sentiments mixed in with some real technical observations. lets see if the currently temporary/ fragile CT → KT trap holds any further/ longer! “(no) hard feelings?” anyway the motto around here is to err is human, to forgive is divine… but hope-wish-pray the new affinity/ trust is not betrayed with cyber backstabbing… 😛 😡 👿 😈 😥

ok, did a quick skim of his python code, dont have big analysis yet, but building somewhat off KTs brief comments, rephrasing in language devised here it looks like a bitwise strategy to find a long glide but maybe related to ‘w’ or ‘r’ metrics staying low even at the same time as extending the glide. it looks like KT maybe constructed the example by something like a “hand-bitwise” method (C/KT remains coy/ “cagey” so far here alas, teetering on the C←→KT boundary one might say, reminds me of Blue the star velociraptor of Jurassic World…). 😛

ofc have been aware of such approaches/ methods/ techniques for a few years now and they are long documented in older blogs, have many variants. would like to explain all this to KT at length as offered but KT is still somewhat shy/ evasive and avoiding SE chat. dont know if KT read “the chat offer/ proposition” yet, should have asked but was distracted by all the other angles.

⭐ ⭐ ⭐

anyway 2 possible misunderstandings to clarify: (1) KT posted an example (lol still cant totally imagine am writing those words now) with a long 0-bit run and am lately looking more at 1-bit runs. (2) also looking closer at his code/ comments suspect KT doesnt yet realize that my recent conjecture is talking about bit runs over the entire climb not just in the initial iterate. this is obvious to anyone following the dialogue but ofc drop-in guests here dont have that luxury. (avoiding mochizuki-style arrogance over that scenario, lol, have recently posted on/ am continuing to track that remarkable scandal in neighboring number theory research…)

but hey, am not done with KT yet, this is an easy exercise around here, graphing his example using the grid code. so for 356531718818319033628912255003, lets look at glide climb/ descent and analyze it “together”… voila!
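(the grid code itself isnt repeated here, but a bare-bones ruby stand-in computing the glide stats for that iterate, standard uncompressed map — note the step-counting convention is an assumption, eg compressed (3n+1)/2 steps would give different counts:)

```ruby
# glide stats for a single iterate: 'b' glide length (steps until the
# first iterate below the start) and 'cm1' climb length (step index of
# the glide maximum). standard collatz map; the step convention here is
# an assumption and may not match the blog's counts exactly.

def collatz_step(n)
  n.odd? ? 3 * n + 1 : n / 2
end

def glide_stats(n)
  seq = [n]
  seq << collatz_step(seq.last) while seq.last >= n && seq.last != 1
  g = seq[0..-2]
  { b: g.size, cm1: g.index(g.max) }
end

p glide_stats(356531718818319033628912255003)
```

sanity checks on tiny cases: `glide_stats(3)` is `{b: 6, cm1: 3}` (glide 3 → 10 → 5 → 16 → 8 → 4 → 2, peak 16 at step 3) and `glide_stats(7)` is `{b: 11, cm1: 5}`.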

comment: KT has an example with a large 0-triangle and 0-runs. but the example almost trivially adheres to the current conjecture: as long observed here the 0/1-runs “dissipate,” ie decrease as general trend, in this case very close to the top of the glide, at which point there is almost no remaining glide. so anyway lets review the current conjecture and spell it out more. notice the similarity to the x-y trap idea. it also ties in with old established ideas about density, more recent ones about entropy, observations on max 0/1 runs (incl vs length bounds), etc, ie in 1 word a (grand?) synthesis.

all glides dissipate or “crunch” the 0/1-runs during climbs. this eventually leads to a “crunch region”. also similar to “mixing”

the “crunch region” can be “large/ long” depending on iterate bit widths but it is bounded. no further climb is possible after “some span” of crunch region.

in retrospect, some of this is obscured by recent prior approaches. they find larger and larger crunch regions (larger iterates and longer trajectory sequences) but thats a sort of “expanding cross section” that doesnt nec interfere with the basic hypothesis, maybe not easy to picture but it apparently still meshes with it.

⭐ ⭐ ⭐

⭐ 💡 ❗ ❤ 😀 😎 😮 more good news to report, to quote andy grammer (speaking of, as 1st cfd by resident teenager around here), good time to be alive right now™…

⭐ ⭐ ⭐ THIS STUFF LOOKS BREAKTHRU…! ⭐ ⭐ ⭐

as promised, here is the awesome code worked on yesterday, ran a bit with sterling/ desired/ hypothesis-validating-confirming results, and built some nice/ pretty analysis today. this took the old workhorse mix26 code and made some helpful adjustments.

the basic modification is to bound ‘mx1’ the 1-bit-run max length over climbs by a constant here 10 while attempting to increase ‘cm’ the climb width.
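(to show the shape of the modification — emphatically not the mix26 code, just a toy greedy “bitwise” frontier search in ruby under the same kind of constraint, all parameters made up:)

```ruby
# toy bounded optimization: grow seeds bit by bit, discard any whose climb
# contains a 1-run longer than MX1_LIMIT, keep a frontier ranked by climb
# length 'cm'. the real optimizer is far more elaborate; this is only the
# skeleton of "bound mx1 while pushing up cm".

MX1_LIMIT = 10
FRONTIER  = 50

def collatz_step(n)
  n.odd? ? 3 * n + 1 : n / 2
end

def max_1run(n)
  n.to_s(2).scan(/1+/).map(&:size).max || 0
end

# climb iterates: from n up to the glide maximum
def climb(n)
  seq = [n]
  seq << collatz_step(seq.last) while seq.last >= n && seq.last != 1
  g = seq[0..-2]
  g[0..g.index(g.max)]
end

def search(iters)
  frontier = [[climb(3).size - 1, 3]]      # [cm, seed] pairs
  iters.times do
    _, n = frontier.sample
    m = (n << 1) | rand(2)                 # extend the seed at the lsb end
    c = climb(m)
    next if c.any? { |x| max_1run(x) > MX1_LIMIT }   # enforce the mx1 bound
    frontier << [c.size - 1, m]
    frontier = frontier.sort.reverse.first(FRONTIER)
  end
  frontier.first                           # best [cm, seed] found
end
```

every surviving frontier member has its whole climb under the mx1 bound by construction, so any growth in the best ‘cm’ happens strictly inside the trap.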

the output section is modified to detect/ exclude duplicate seeds, a check long previously part of the analysis code.

previously the frontier size was typically constant but this code has a slight modification to increase frontier size if the frontier set seems to be getting low in size, here less than 50 points. ie the frontier increases dynamically as the optimization hardness increases.

wanted to do this also for a while, natural but not trivial improvement: output multiple run tracking graphs for the individual runs instead of repeatedly overwriting a single one.

theres followup analysis code built into the single file.

here are 3 out of the 10 graphs. out of 3k iterations with 10 runs there are 60k points explored and about ~½ are unique at 29358 total. these are consolidated/ sorted by iterate bit width/ sampled at 5k and graphed in the 4th graph.

in all the 10 runs length related measurements ‘cm’ and ‘cg’ hit limits/ walls. there were some notable “coming-out-of-nowhere” spikes near the end for ‘cm’ (local) climb length red and the bounding is most apparent there, but ‘cg’ glide length blue shows clear bounding although more spiky/ noisy. the ‘cm’ plateauing/ flatlining precedes the ‘cg’ stabilizing, ie former tending to signal/ reinforce/ confirm the latter. ‘w’ magenta the widthdiff between max glide bit width and starting bit width is bounded. also quite significantly the ‘r’ measurement lightblue (right scale) tends to decrease slightly as bit widths increase, this is exactly what is expected/ required (aka sufficient + necessary) for a constant bound on eg ‘cm’ climb width. in the 4th graph the cumulative/ consolidated results are even more striking. ‘nl’ starting bit width red increases but length-measuring statistics flatline esp ‘cm’. ‘r’ in yellow declines again exactly as needed. (‘t’ red dotted line is the iteration the iterate was found on scaled down by factor of 10 to fit on the graph. also it would be better to use same graph colors for same variables but thats a minor hassle right now.)

😳 “what a long strange trip its been™”… 20/20 hindsight all this meshes with prior ideas and could have possibly been found “ages ago,” and prior angles come very close to this finding (esp earlier/ recent ideas of looking at max trajectory lengths findable for fixed iterate and 1-run max sizes), but trying to attack the problem in exactly the right angle takes a lot of analysis. again the needle in haystack aspects of the problem predominate, here with attack angles. as just sketched out, these ideas look entirely provable with the regular language construction techniques, ie for each constant ‘c’ bit run length maximums over all (infinite!) integers. something like a core “induction axis” or “invariant” has finally been identified. it looks very solid, like the shining gold needle has finally been found. the stellarator trap is holding steady.

other way of looking at it from prior/ physics contexts: apparently the 1-run lengths are a hidden statistical-oriented counter and also a rough measure of the “potential energy” of the climb.

ie 2nd part of guiding hypothesis looks confirmed, 1st half is already largely secured also. future directions: it makes sense to throw the more powerful (?) hybrid optimizer at the problem from same angle & see what happens but am very confident right now it will only uphold/ further solidify the results… think multiple runs of the bitwise optimizer probably behave very similarly to the hybrid optimizer. but always anticipating glitches, there is some minor risk/ danger that limited prefixes/ (un)diversity (something of a major weakness verging on an achilles heel of the bitwise approach) could be leading to some kind of bias. ofc the scariest bias of all that can rear its head— even be amplified— with highly trained/ professional scientists… confirmation bias… 😮

(later) KT shows up again with another iterate to test/ explain and continues the game of “break the hypothesis” now involving more than 1 participant and all that extra energy that entails. so how is he getting these tricky near-counterexamples? seems like it would have to be some kind of search algorithm but hes not telling for now. Mr Laconic Mystery. the example does seem to challenge the current ideas & was presumably customized for that.

looking at last graph #4 it is not easy to explain his counterexample. it doesnt seem to fit the diagram because it indeed does have a long glide and seems to be fully a “crunch region”. he notes the near ~½ density which btw is another strong indicator of a crunch region, but not necessary (recall earlier discussion of possible divergence between entropy/ density, it appears ~½ entropy is more reliably indicative of the crunch region). quick analysis shows global climb length ‘cm1’ 128 and glide length ‘b’ 178 of the example. how does that fit into the graph? the glide length ‘cg’ magenta in the graph never exceeds ~100. but looking at the example closely, theres a “large(r/ish)” 1-run near the end of the climb at 13 bit width.

now heres where more of the story comes into play. some of the earlier analysis did reveal this but it wasnt graphed. basically very small uptrend in the ‘mx1’ max 1-run length leads to much larger trajectory (climb/ glide) sizes. this approximate factor could be quantified/ calculated but for now there are graphs. recall in graph #4 ‘mx1’ max is 10. just increasing it to 13 causes a somewhat “dramatic”/ substantial shift in trajectory sizes.

the earlier data for bitwise9 is still available/ saved and can regraph it to reveal more of this story, following 2 graphs which are nearly the same even though the 1st optimized for ‘a01’ and the 2nd ‘mx01’, apparently these two variables tend to be fairly tightly correlated. ‘mx01l’ left/ climb 0/1 run maxes is in blue right side scale and ‘b’ red glide length, ‘cm1’ global climb length, ‘nw’ magenta starting iterate bit width left side scale. there is noticeably more noise in ‘mx01l’ in the 1st graph but both trends are basically overlapping incl in the other variables.

eyeballing the graphs, starting iterate bit width nw=100 leads to max trajectory length ‘b’ in the range ~150-200 and glide length ~100. actual is b=178 and cm1=128 and also mx01l≈13, not bad, not too far off! KT did seem to find an iterate on the borderline/ edge region that is being explored by my analysis. but it fits! however, full disclosure/ caveat, rerunning bitwise10 with max 13 is not (yet) finding the ranges implied by the potential counterexample… currently twiddling the optimization knobs, have to “crunch” some more on it… part of the issue is that (so far) also as in graph #4 the ‘cm’ trend found is only ~½ the starting bit width… it is a simpler 1-variable optimization scheme, seems like maybe thats most of the explanation…

apparently there is something special about the multivariate optimization that is leading to the cm≈nl trend (with spikes on top)… think have seen that in other optimizations, need to maybe hunt thru old results to find it… another way of looking at this is that it seems counterintuitively maximizing ‘cm’ does not lead to a close or high cm-to-nl ratio that can be obtained otherwise, by multivariate optimizing… apparently a failure of the greedy approach… more Mysteries needing attn… on the other hand presumably not every (numerous) mystery needs to be understood to get a proof… ❓

(10/8) ❗ am trying to consolidate many of the prior ideas and dont have a clear path at the moment but feel like something is building up. the story is not as simple as one would hope because (trying to make not-so-short story short) the bitwise code seems to be missing something and acting differently at higher maximum 1-bit run lengths (limit).

came up with the idea to just use it to look for long right-descent 1-runs, and then accidentally coded it to look for long right-descent 0-runs instead, and then it found this rather bizarre/ extraordinary case. (once again “never a dull moment around here™”…) the climb seems to move from disorder to order (opposite to what has been generally conjectured), the descent has an enormous “bubble”, and the post-glide descent does indeed “crunch” it as has been observed. actually thinking about it, starting at the maximum its a case of the form mentioned/ considered earlier where there is a “lower” number ‘n’ and a “higher” number ‘m’ separated by a long 0-run gap.

heres another odd example! they seem to break prior ideas/ generalizations (seems collatz endlessly resists all generalizations in a sort of ultimate anti-induction embodiment/ case study). fixed up the code to look for long 1-runs in the right/ descent and it finds this one. both of these examples have a disordered region but its post-glide in each case. some of these would seem to be rejected with some kind of “minimum (constant?) climb length/ mixing” criteria but is it enough?

later thoughts: the trajectories are “weird” but in a way, seem to fit into the general spirit of the conjecture. my bias was to think of the disordered crunch region as always somewhere in the (glide) descent, but these “glides” “definitely” (in the sense of all glides being part of a trajectory) all have a disordered crunch region, its just in the post-glide (aka “tail”) area, and the glides are very short. sketched out only in words, the conjecture did not actually insist exactly the crunch region be in the glide region, it only insists (somewhat rephrased) that all trajectories eventually hit crunch regions, and those crunch regions cannot contain further climbs… ofc these words are not precisely defined/ quantified exactly… yet!

(10/9) its all about slicing/ dicing/ attacking from exactly right angle and have feeling its somewhere in sight/ nearby, and sometimes staring at individual binary iterate patterns can be hypnotic/ disorienting/ brainmelting and hard to understand how they fit into more general trends (aka “bright shiny objects lying by the side of the road™”). it seems very natural at this point to look at max 1-runs in a current iterate vs remaining “distance” either (global) climb ‘cm1’ or glide length ‘cg’. as part of this, another idea is to “focus/ care less” about where the maximums occur ie climb, descent, tail and/ or identifying the “crunch” region. this code works with the last 3 distributions, takes 100 trajectory samples, breaks them into subglides, and computes ‘cm1’ and ‘cg’ for each point in the subglide, and then computes a (log) color-based histogram, x axis is ‘mx’ max bit length in the iterate (simply negated for ‘cm1’ to get a sort of multiplot) and y axis is either ‘cm1’ or ‘cg’.

it looks almost like a gaussian distribution. the 2nd, 3rd multivariate optimizations are stronger than the 1st single variate in pushing the boundaries upward about 50-100 iterations. the curves already vary a little between distributions and the big question is whether these curves hold for higher iterates, but it all looks very solid right now. ie the idea of a limit/ bound is generally confirmed with the analysis. next step seems like maybe developing a search optimization to try to directly push up on these apparent limits for higher iterates.

there are some other fairly straightfwd/ almost simple ways to push further on the hypothesis… eg just generating random iterates with a distribution of 1-runs and showing that the 1-run sizes typically decrease over a single iteration… have feeling maybe did something like that long ago, not long after discovering density phenomenon, but cant recall doing exactly that… need to go wade thru old blogs again…
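(that experiment is easy enough to dash off in ruby — random iterates with a planted long 1-run, one step applied, run lengths compared; purely a sketch with arbitrary parameters, no claim it matches whatever was done back then:)

```ruby
# plant a long 1-run in otherwise-random bits, apply one collatz step to
# the (odd) iterate, and tally how often the max 1-run shrinks. sketch
# only; the width/ run-length parameters here are arbitrary.

def max_1run(n)
  n.to_s(2).scan(/1+/).map(&:size).max || 0
end

def collatz_step(n)
  n.odd? ? 3 * n + 1 : n / 2
end

# random odd (w+1)-bit number containing a 1-run of length at least r
def plant_run(w, r)
  n = rand(2**w) | 1 | (1 << w)
  pos = 1 + rand(w - r - 1)                # keep the run interior
  n | (((1 << r) - 1) << pos)
end

trials = 200
shrunk = trials.times.count do
  n = plant_run(64, 12)
  max_1run(collatz_step(n)) < max_1run(n)
end
puts "#{shrunk}/#{trials} trials: max 1-run shrank after one step"
```

(multiplying a 1-run by 3 tends to break it up — eg 7 × 3 = 21 is 10101 in binary — which is the intuition the tally is probing, but the carry interactions are exactly the non-trivial part.)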

(minutes later) its easy to adapt the code to handle arbitrary specified inputs. took KTs 5 latest numbers/ trajectories and added them as a subroutine. the different distributions are called with ARGV[0] 0..3. 4th graph is his own trajectories “deconstructed”/ plotted with exactly same code. his mystery-optimization-algorithm code finds outer-limit climbs on left side distribution around ~250 distance (similar to graph #1 for ‘cm’ side although my multivariate distribution code in #2, #3 comes in at ~300) but not outer-limit glides in right side distribution maxing out around ~300. in short it looks like his method tends to create steep/ fast/ short descents (if glide lengths are around the trajectory lengths, although thats not exactly identifiable just by looking at these distributions instead of individual cases). including numbers here

another simple/ quick riff: accumulate the statistics over all the 1-runs in the trajectory and not just the max in each iterate. it puts sharp edges on the graphs for the smaller runs but doesnt alter the higher ones much. kind of just dashed this off wondering what it would look like and have to think a little more what it really means, how to interpret it, what the whole story is. these graphs now maybe explain some of the difficulty of earlier bitwise optimizations behaving so differently for mx=13 vs mx=10. its right on a transition point from long to short climb/ glide lengths. ofc it makes sense to look at trajectory lengths also, forgot about that.

later/ more thought on meaning, some rough interpretation: max 1-runs in a single iterate vs smaller 1-runs have a (prime) implicate-like dynamics. the graph of the iterate max 1-runs “eclipses” or “hides” all the smaller 1-run statistics in each iterate. also as the max 1-runs get larger, all the smaller 1-runs in those iterates seem to have smaller cm1/ cg values with most of the reoccurrence/ overplotting in the graphs, therefore not affecting the higher parts of the graphs.

(10/10) this should be apparent to anyone following closely (admittedly a highly hypothetical scenario), but to further sketch/ spell it out,

the middle/ center parts of those graphs are “crunch region”.

the conjecture is that no matter how large the iterates, the climb/ glide lengths are finite/ bounded/ limited within the crunch regions, and (addressing KTs recent objection in comments) hopefully “explorable” with various types of optimization code, ie maybe above graphs are close to theoretical optimums in some sense.

farther away from the center, the lengths increase indefinitely, but it seems still finite, although a proof could possibly still work even if trajectory lengths are infinite (unbounded) outside the center crunch region (finding that scenario hard to picture right now, and maybe it is not consistent with prior observations about apparently bounded expansion ratio eg widthratio ‘r’ also closely related to width difference ‘w’).

all iterates that start outside the center trend toward/ eventually “fall” in the center “trap”.

the general idea could work even without worrying too much about where trajectory peaks start or glides end (a very strong focus of a lot of prior analysis; its still of interest but maybe not nec/ critical for a proof). maybe some trajectories have crunch regions starting “deep” into the descents, maybe even post-glide tail region.

ie this is all the basic induction structure/ framework conjectured: the crunch region is like the base case and the induction is on 1-run sizes, not exactly sequentially (ie not holding strictly over adjacent iterates, as long explored previously), but wrt trends.

the core/ noncore regions appear to involve some type of key “phase transition” long speculated. some prior analysis is already substantially connected to this such as density/ entropy, other iterate statistics etc.

this is another quick riff on prior code framework. it determines the max 1-run over the remaining climb/ glide for all the subglides and plots vs ‘cm1’ climb, ‘cg’ glide.

(later) ❗ 💡 😳 had some time to look at KTs claims/ algorithm in more detail! ok! this is substantial! admit its a crucial property of collatz that havent stumbled across exactly on my own sometimes-substantially-limited/ nearby meanderings and yes thats clearly a gaping oversight! poking at it/ running it (on repl.it, still havent installed python around here), have some idea now.

restated, the basic idea is that for the (“semicompressed”) collatz function any initial glide pattern is possible for large enough seeds, didnt understand what KT was getting at/ alluding to/ doing earlier, this is definitely a very important/ striking property of the problem. it relates to what is known as the “parity sequence” of a trajectory in the literature and have inquired into that to some degree. is it described in the literature somewhere? probably…!? have read Lagarias survey and another (brief) ref & dont think it is mentioned. the code is close to a proof of his claims, but not exactly… need to dissect further/ understand it better…
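restated in code: for the standard semicompressed map T(n) = n/2 (n even), (3n+1)/2 (n odd), the first k parities of a trajectory depend only on n mod 2^k, which is exactly why any up/ down pattern of length k is realized by arbitrarily large seeds. a minimal sketch (my own naming) demonstrating the invariance:

```ruby
# semicompressed collatz map
def t(n)
  n.even? ? n / 2 : (3 * n + 1) / 2
end

# parity ("up/down") sequence of the first k semicompressed steps
def parity(n, k)
  (0...k).map { |_| b = n & 1; n = t(n); b }
end

# first k parities depend only on n mod 2^k:
# shifting the seed by any multiple of 2^k leaves the pattern unchanged
100.times do
  n, k = 1 + rand(1 << 16), 16
  raise unless parity(n, k) == parity(n + (1 + rand(100)) * (1 << k), k)
end
puts "parity invariance mod 2^16 checked on 100 random seeds"
```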

so KT pulls some rabbit out of the hat™, a card up his sleeve™, impressive, while downplaying it as “high school level math,” (lol yeah right) but KT doesnt respond to many other basic points. has KT read anything on the subject other than Roosendaals web page? he evinces no sign. the code is nontrivial but its not (yet) clear if KT knows much about statistics, data science, datamining, machine learning, optimization techniques, complexity theory, current analysis, earlier blogs, scientific literature on subject, undergraduate math/ computer science, history of computer-based proofs, etc (all covered across many blogs on this site)… as usual mysteries abound everywhere around here… maybe some random musing will trigger him again into response and decrease it all slightly )(…

an excellent idea/ suggestion/ exercise from his latest comment: try to obtain the record glides (or something “close”) from Roosendaals pg using a non brute force search/ greedy algorithm!

rewrote the code in ruby for now + get identical results.

concede KT is partly )( right, this remarkable, even extraordinary property does invalidate some of the general ideas/ angles/ approaches in this blog, although was coming to similar conclusions lately anyway as written/ recorded. the property noticed/ cited by me in response (thought it was what KT was pointing to in his characteristically terse words) is similar/ related but not the same: for any trajectory pattern found (or initial subtrajectory), a larger seed exists with the same glide pattern, aka “prefixes”. code working from that idea dates to some of the earliest blogs here. however, depending on the “structure” of the larger seeds, other proof ideas being pursued are not ruled out.

(10/11) dont have any big exact ideas to pursue at moment. feel big ideas swimming around but they arent exact yet. quick scan: yep did analyze 0-run adjacent iterate statistics way back in 4/2015 and found the convergence-to-~2-lengths. have to revisit this some. feel a bit of “@#%&” should have paid much better attn to it long ago… the ~2-length crunch region is displayed right in the graph, couldnt be much more glaring! knew it was a big deal at the time but there were so many other shiny objects lying by the side of the road™…

KT wrote up a beginning of a proof on github apparently using latex & replied with comment. need to play with his code more, didnt even try other trajectories other than the one he hardcoded. KT responds to so little written about/ directed at him, was recently discussing this with cohort, honestly he seems almost autistic sometimes. if 10 questions are directed at him he might reply indirectly to 1 of them. might sound like a harsh thing to say but it probably pales in comparison to the things C/KT has written on the blog in the past. who knows if he will even read this or ever return… sometimes the truth hurts/ stings™…

thinking more about it, KT has a very good point with the long 0-run example. the recent prior code histograms are focused/ based on 1-runs and the example shows a basic pattern where trajectory lengths must be unbounded with long 0-runs, ie “padding” the iterates can lead to arbitrarily large trajectories/ long “(pseudo)crunch” regions or a sort of “false positive crunch”. was familiar with that structure and should have caught this basic counterexample myself. it also seems to prove that the workhorse bitwise greedy algorithm is not finding these samples ie is not/ cannot be “representative” at least wrt them (a key need of this line of empirical research as mentioned in the comments). was thinking that maybe some of the collatz “compression” methods that compress multiple div 2 runs into a single run might avoid the gaps and allow the crunch region to be characterized wrt 1-runs only. sometimes statistical approaches while having counterexamples have a “grain of truth” to them/ observed statistics.

(10/12) 😳 ❗ 💡 😡 👿 😈 #@%& looks like the jokes on both of us™/ join the club™ so to speak. looking closer thru old refs. from a reddit thread earlier this year the infamous sleeps_with_crazy (fond PTSD memories of the dark mistress of pain, now at 100K reddit comment karma, also vociferously objecting to my computational/ empirical/ algorithmic etc oriented approaches on apparently similar grounds, also somewhat similarly piercingly painfully, aka popping my bubble(s)™, making other trolls look like mere sapless n00bz) mentioned Terras results from late 1970s in a different context. in 20/20 hindsight should have looked at these closer but honestly find purely math oriented formulations hard to parse/ grasp. they seem to contain KTs ideas/ proof; have skimmed these in the past but did not understand their significance, and it would have been helpful if they had been stated in the language of random walks as KT and this blogs approach somewhat use. KT seems to have rediscovered thm B p7 of the Lagarias ref/ intro/ overview “the 3x + 1 problem and its generalizations,” Jan 1985, which cites [64] Terras 1976, apparently the same as Lemma 2 given by Roosendaal on his web pg as the Terras thm. the Lagarias ref is ~3 decades old and Terras ~4, holy @#%& cow, staggering… 😮

(10/13) 💡 🙂 still a bit dazed and confused/ reeling… maybe now recovering some )( from what at times feels like kick in the gut™… as for paradigm shifts, maybe not fully an earthquake but definitely a major tremor… studied the Roosendaal restatement/ lemmas/ thms/ proof(s) of Terras today on a brilliant clear/ sunny/ cooler day out on the lanai (saw that word in a Hawaii magazine yesterday and this reminded me of my grandfathers use of the term…) with lawn chair, incense, a firepit, and 2 new dragon protector statues (ask me for a pic and might respond! … like a dream! 40° temp drop + snow forecast tomorrow!) and find it nearly all more comprehensible now. still cant quite follow KTs code yet, need to analyze it some more. (speaking of him, KT a no show for a few days, yeah, maybe now also in 20/20 hindsight a paper tiger…)

cant follow the Lagarias specialized syntax yet but its gotta be the same ideas in significantly different form, was amused/ surprised/ experienced weird/ eerie sense of synchronicity/ deja vu/ etc to run across this after typing "math sup" into google, is google nearly psychic or metaphysical? (gosh darn it where are those Collatz experts/ math translators when you need them? nearly zilch peeps on the stack exchange so-called expert chat lines although did hear from longtime Riemann hacker Mats Granvik on his pov on the Atiyah attack, blog). the Roosendaal proof uses positive integers only in the construction but KT seems to have some negative outputs. am thinking of maybe coding it up myself. new ideas! all is not lost™…

the 1-1 correspondence/ permutation between parity sequences and collatz sequence numbers reminds me of the idea of encryption and decryption, have mentioned that vague idea of correlations of the problem to cryptography before and this makes it much more direct, it seems like a simple cipher system, almost a kind of “transposition”.

have long noticed and also recently that glide max indexes ‘cm’ often tend to correlate with bit length of initial iterate under some optimization schemes, now much more understandable. this new knowledge means that one has to focus on the post-bit-length dynamics of the glide, that this boundary is a very natural/ intrinsic part of the problem, can see some hint/ awareness of this in prior analysis. much of the optimization ideas still seem to apply/ have validity post-bit-length.

post paradigm shift, a natural angle to study: one can now construct glide dynamics of any desired structure. how do different custom/ specially constructed glides affect the “later-glide” dynamics? what kinds of trends/ patterns emerge? its just the same questions with slightly different adjustments.

need new names/ concepts for some of this. already talked about the “covered vs uncovered” region to KT in the comments. there is a pre-(initial seed)-bit-length glide region and a post-bit-length glide region, might come up with some kind of nice definition for that.

from the theorem, the pre-bit-length region is totally “unconstrained” whereas before this was not understood. actually “glide” word itself is more problematic terminology now because the “sequence-correspondence property” is key and doesnt map neatly onto the concept of a glide any more, it could be either intra- or extra-glide.

yes— exactly as KT insisted over some terse comments— stated in another (less devastating) way, a lot of the prior dynamics analyzed must now be chalked up to optimization algorithm “footprints”/ artifacts/ bias, but not all of it…

some other searching: math stackexchange has 3 semirelevant posts with 1 roughly on-target but 2, 3 clearly containing some misinformation, mathoverflow search on terras collatz turns up zilch (but maybe 23 other refs/ things hes worked on…?). so maybe shouldnt feel so bad about its obscurity… math is truly a massive universe… even the known parts… which reminds me, just ran across this neat announcement in a chat room, big congratulations stackexchange

Today Mathematics Stack Exchange becomes the first Stack Exchange site to have 1 million questions.
These question… twitter.com/i/web/status/1…

(10/14) ok this Terras idea is really not rocket science. have a better grasp. dont understand KTs code yet, its very efficient, but did have this basic idea as an exercise in understanding/ implementing Terras lemma; its not immediately obvious from the Roosendaal proof how to create the efficient implementation. heres a naive/ “proof of concept” implementation. the lemma shows a (crucial!) 1-1 correspondence between parity sequences and initial iterate binary values. as remarked this reminds me of cryptography. in 2 words, plaintext ←→ cyphertext. its a mapping, a permutation, like a substitution cipher. this is some simple code that recomputes the sequence each time from (current) initial value, ie is inefficient, the more efficient version doesnt have to redo the computation and can reuse prior calculations (maybe not too hard to do next but it helps to start with something basic/ solid/ provably correct). it chooses a random trajectory of 20 up/ downs and determines the corresponding starting iterate leading to that trajectory and validates that it found it at the end. trajectory is left-right order and iterate is output in “reverse” lsb to msb order.
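the construction can be sketched minimally like so (to be clear this is my own naive ruby rendering of the idea, not the actual posted code): build the seed lsb-first, flipping bit i exactly when the step-i parity disagrees with the target pattern, which works because bit i of the seed flips the parity at step i but none of the earlier ones:

```ruby
# semicompressed collatz map
def t(n)
  n.even? ? n / 2 : (3 * n + 1) / 2
end

# parity sequence of the first k semicompressed steps
def parity(n, k)
  (0...k).map { |_| b = n & 1; n = t(n); b }
end

# naive O(k^2) Terras construction: recompute the trajectory for each bit
def seed_for(pv)
  x = 0
  pv.each_with_index do |b, i|
    x |= 1 << i if parity(x, i + 1)[i] != b
  end
  x
end

pv = Array.new(20) { rand(2) }       # random 20-step up/down pattern
x = seed_for(pv)
raise unless parity(x, 20) == pv     # validate the found seed
# any other representative of the class mod 2^20 has the same pattern,
# ie arbitrarily large seeds with identical initial glide behavior
raise unless parity(x + (1 << 20), 20) == pv
puts "seed #{x} realizes pattern #{pv.join}"
```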

another way of looking at the code: its actually the long-familiar bitwise construction strategy in a different form! it chooses new bits in the starting seed that generate exactly the specified trajectory. the existence and method are guaranteed by the Terras lemma(s). looked at the Terras thms long ago and thought the main point was to show that “most seeds in an interval terminate”. but it turns out for me the lemma(s) are almost more important than the theorem. example run

(10/15) 😳 💡 ❗ ❓ lol that code “sort-of” works. it does have all the machinery to validate a starting seed but apparently for even numbers it is utilizing the 1-4-2-1 loop at the end to construct a matching trajectory seed, not as intended. rewrote the code today and think have it corrected ok. am still hunting for the efficient version that doesnt require recalculation, it seems maybe embedded in the Terras proof but cant figure it out yet, and want to post it when ready, stared at it + Roosendaal Terras lemma statement a long time but couldnt quite formulate it yet, feel right on verge…

another big idea: the code/ fix rewrite revealed/ uncovered a remarkable/ eyepopping structure… suspect that may have some old code that is very close to the Terras proof and didnt realize what it was doing! have to go dig that up. after coding this am really getting the feeling of deja vu!

ps am typing this after finally managing to boot my ipad mini after trying to boot many times, its very strange, its fully charged but displays the battery low indicator, and then booted up at 100%. am having that sinking feeling that fixing it might cost me money because its just a few months out of 1yr warranty… doncha just luv CS… also popped it out of the nice/ beautiful griffin survivor case in anticipation of taking it in, not simple to re-assemble, have to re-clean, now not sure what to do next… quick google search turns up others complaining about similar problems, some triggered by OS updates… argh, sigh!

(later) 😮 ❗ 😎 😀 ⭐ ❤ lol holy @#%& cow, here it is! scanning thru lots of old blogs, ~2½ yr ago early 2016 was wondering about “inductive structure” of creating “high siblings” from “lower ones” and realized that changing the msbs of a seed from “1” to “11” and “10” leads from the lower to higher ones, and wrote the inductive code to do it. this is basically a rediscovery of Terras ideas/ results from a slightly different angle, in retrospect a very nice/ succinct/ natural way of demonstrating it, and the 10, 11 msbs correspond exactly to M1, M2 in Terras-Roosendaal lemma 2, always meant to revisit it, didnt fully understand/ realize its implications… the missing analysis is to understand the trajectory relationship(s) to the seed… each inductive step extends the trajectory by exactly 1 step and the seed by exactly 1 bit… (but wrt the semicompressed mapping only)! clearly it seemed very meaningful at the time but forgot to follow up, or maybe didnt…! another bright shiny object lying by side of road™… no math is an island… but alas collatz research can sometimes feel like living on one… 😥

😮 ❗ 👿 talk about a near miss! the length6 code written at the time used the high-compressed evaluation function f1 and not the semicompressed function f2 to compare lengths, and also compared glide length not total trajectory length, and would have discovered the extraordinary “full” “Terras 1-1” property otherwise, although it was also rather implicit in the construction, just didnt think it thru fully at the time…!* also maybe 1 of the pitfalls of not always testing all the different compression functions/ metrics against all code…

(10/16) this is some simple code with a kind of “(random) test harness” to try to understand some relationships in the proof better, mainly between M1, M2, decided to write it as exercise/ proof of concept after having trouble wrapping brain around the concepts, and this is at the heart of how to make the Terras correspondence algorithm more efficient. there are different ways to motivate this and the proof supplies one, but here is another that ties in with lots of prior research. in the bitwise technique there is “very similar” recalculation of much of the initial part of trajectories after an msb is added, and have wondered some about a more efficient approach.

the lemma supplies a relationship and says that if the final value/ iterate ‘X’ of a trajectory of length ‘k’ is known where ‘k’ is also the bit size of the starting iterate, then adding an msb at position ‘k’ means the final iterate ‘Y’ is known by Y = X + 3^n where ‘n’ is the # of odd iterates in the semicompressed sequence of length ‘k’. the proof is actually general for all possible msbs ‘a’ above the 1 msb added. this code recasts the proof relationship into the language of binary strings and ‘a’ can also be empty. there are 3 tests of 3 instances each: the ’10’, ’11’ case mentioned in the proof, the case ‘0’, ‘1’ where ‘a’ string is empty and also exactly the “bitwise” case of adding a single msb, and an arbitrary ‘a’ string of 5 bits, 9 total tests.
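a quick standalone check of the single-msb case of that relationship, Y = X + 3^n (variable names here are mine, not from the proof or prior code):

```ruby
# semicompressed collatz map
def t(n)
  n.even? ? n / 2 : (3 * n + 1) / 2
end

# iterate k steps, counting odd iterates seen along the way
def run(n, k)
  odd = 0
  k.times { odd += n & 1; n = t(n) }
  [n, odd]
end

k = 10
seed = rand(1 << (k - 1)) | (1 << (k - 1))   # random k-bit seed
x, n_odd = run(seed, k)                      # X and n after k steps
y, _ = run(seed + (1 << k), k)               # same seed with msb added at position k
raise unless y == x + 3**n_odd               # the lemma: Y = X + 3^n
puts "X=#{x}, n=#{n_odd}, Y=#{y} checks out"
```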

now, how to utilize this exhibited relationship to handle the case of upper msb changed from ‘1’ to either ’10’ or ’11’ as in the Terras 1-1 correspondence generation? simple, right? 😮 😐 ❓

* further thought/ rethink on the length6 construction: it seems that none of the glide or trajectory metrics generally used so far/ to date would have directly “revealed” the Terras 1-1 correspondence property. however another potential/ nearby metric that could have but probably hasnt been studied much wrt my code (although maybe some others come close) is the trajectory state after exactly ‘n’ semicompressed iterations where ‘n’ is the bit width of the initial iterate. ❓

other news: ipad 12.0.1 update sent down/ installed. seems like not long since prior one. brief issues description mentioned some exotic devices not charging correctly. not the same as my issue but lets see if its software related, feel it would be a bit miraculous if its fixed just with software update…. ❓

(10/18) 😮 ❗ wow this stuff is sure hard sometimes and frying/ melting my brain over many hours of labor intensive work/ thought over about ~2 wks now, recall exactly similar feelings/ process in early 2016 on my initial discovery/ construction in the area. harder than expected™… no wonder brilliant mathematicians (and knowledgeable commenters on this blog…?) are so rare. there is some way to prove/ derive all this with algebra but am working from a sort of hacker approach by building working code that passes tests, working with individual cases to derive the general rule(s), aka test-driven development (applied to math!); and KT seems to have succeeded in exactly this challenge already weeks ago but hes now nowhere to be seen or heard, disappeared without trace.

the basic idea is to deconstruct the iterative code and use the Terras relationships to come up with a version that doesnt require re-iteration. also was wondering about a case where (re Roosendaal proof) N = a · 2^k + b for N = 1 which implies a = 0 but it seems to be legitimate in the proof because it says N can be any odd number. strictly speaking this is all not exactly necessary but its an important/ worthwhile exercise that builds significantly on Terras original ideas. (yes even after all this feel KT still has nearly/ basically “publishable” results with the right presentation/ motivation, eg citing Terras etc, all the difficulty/ challenge/ trouble to reproduce it/ re-derive it further supports that… still dont (fully) understand his version!) 😮

here is the “naive” inefficient (re)iterative version (which still took quite a bit of thinking/ adjustments to code). it constructs random 20 step trajectories of specified “shape”. it outputs a variable ‘X’ that is the final calculation of the iterative function. another way to look at the efficiency goal is to find an efficient/ iterative way to compute ‘X’. simple, right? sometimes a simple way to make an algorithm more efficient is memoization aka method input/ output caching, thought about implementing that here. its not quite as efficient as an iterative recasting/ refactoring because of the memory use, but its not hard to implement.

pondering all this there seems to be a sort of basic logic/ calculus involved, need to come up with a nice notation for that which would help reasoning about the algorithm. another way to look at it is that the collatz algorithm generates a graph and siblings are related to this graph by connections. the edges of the graph are arithmetic operations and function calculations/ inverses. it would be helpful to draw out this graph. right now the hacker approach is to name a lot of variables with the calculations which represent nodes in the graph (or attributes associated with nodes) and try to build it piecewise.

(later) ❗ ❤ 🙂 voila here it is. (or in an infamous decades old pop song whoop there it is!™.) this took a lot of debugging and very careful analysis of the relationships, and there are a rather large # of variables to keep track of. the code calculates using the inefficient and the efficient methods and has “assert” statements to demonstrate equality (raises in case of inequality). notice the similarity in spirit to eg junit tests. more of the curry-howard correspondence (as if anyone could doubt these general approaches, curry-howard discovered them decades ago, but its rarely taught, alas!). the output is basically identical to prior output. the inefficient iterative code calculates s, n2 the parity vector and final iterate value and the efficient inductive code calculates the corresponding s2, x3 and corresponding variables are asserted/ (“proven”) to be equal.

(later) 💡 ❤ ⭐ 😎 😀 ❗ for the win™! (name of a local excellent new downtown video game arcade.) this is a major careful refactor with the help of the “test harness/ unit tests” that removes a bunch of temporary variables, namely nb, n2b, n3b, mb, x2, b now seen as mere scaffolding, and rearranges the code quite a bit. all the (rather complex) modification logic of 4 key variables m, n, x3, s2 is factored into a separate subroutine adj, from a dataflow pov its a sort of black box cpu for the function/ calculation/ computation. the (inefficient) iterative function is called with variables that are all updated by the efficient code. this all changes the algorithm runtime from O(n^2) to O(n), disregarding the inefficient code at this point only used for purposes of comparison/ equivalence proof. again same output. the coding strategy was interesting and was very much like manipulating algebraic equations for equality, except making small iterative changes 1 at a time and rerunning the code to check/ test for equivalence.
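for flavor, here is a much smaller sketch of the same O(n) idea in my own variables (nothing like the m, n, x3, s2 scheme above, just the core incremental update): track the current k-th iterate and the running power of 3, so flipping in bit i only costs one addition instead of re-iterating:

```ruby
# semicompressed collatz map
def t(n)
  n.even? ? n / 2 : (3 * n + 1) / 2
end

# parity sequence of the first k semicompressed steps
def parity(n, k)
  (0...k).map { |_| b = n & 1; n = t(n); b }
end

# O(k) Terras construction: no re-iteration, maintain T^i(s) and 3^(odd count)
def seed_for_fast(pv)
  s, x, pow3 = 0, 0, 1
  pv.each_with_index do |b, i|
    if (x & 1) != b
      s |= 1 << i
      x += pow3            # T^i(s + 2^i) = T^i(s) + 3^(# odd steps so far)
    end
    x = t(x)               # advance one step with the now-correct parity
    pow3 *= 3 if b == 1
  end
  [s, x]                   # the seed and its final iterate T^k(s)
end

pv = Array.new(20) { rand(2) }
s, x = seed_for_fast(pv)
raise unless parity(s, 20) == pv      # seed realizes the requested pattern
n = s
20.times { n = t(n) }
raise unless x == n                   # tracked final iterate matches direct run
puts "seed #{s} ok, final iterate #{x}"
```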

test driven development/ unit/ integration testing meets advanced theoretical mathematics…! this is now all a solid/ highly debugged subroutine to be used for later analysis and have some ideas building on prior directions already. have some real gratitude for KT and his “disruptive technology” lol! life with collatz was utterly incomplete without some strong/ focused/ decisive attack/ taming of all this. agreed/ concede this is obviously crucial/ core theory but few refs say much about it and who has even implemented the code before? rather strikingly the literature doesnt seem to contain any indication/ instance of that (afaik/ct)! staring at the code, it took quite awhile to duplicate KTs feat; the code doesnt really look anything like KTs, but its apparently nearly equivalent. proof left as exercise for reader™ lol!

(10/20) when you have a hammer everything starts to look like a nail™. (there is some background creepy real life halloween news last few wks about Khashoggi RIP… the nail that sticks out gets hammered down™…) that saying has been something of a theme of this blog. it would seem even KT can attest to that, he had some great ideas/ analysis but they seem all tied to the Terras analysis, which honestly am now thinking while extremely insightful is somewhat limited esp wrt ultimate proof directions/ techniques/ properties (because it “only” applies to early trajectory dynamics whereas later regions are “where all the action is”™), and KT in his absence is increasingly looking like a flash in the pan™/ 1 trick pony™. some frustrated/ exasperated sentiments about trolls leaving me alone were alluded to/ expressed earlier. be careful what you wish for, you might get it™. easy come easy go™. after all the excitement back to the long familiar ECHO Echo e c h o off the walls… sheesh cant even manage to bait a troll around here 😛

as mentioned had some ideas about how to look at the Terras 1-1 properties. this is a fairly quick/ straightfwd riff/ exercise on that. a simple way to think of the proof is that it describes a mapping between trajectory parity sequences and iterates (in binary). what are basic properties of this mapping? it makes sense to analyze it wrt the major properties found to be associated with the problem so far. around here by now this is almost a routine exercise to crank out. and having done so much work on the Terras code, do want to utilize/ demonstrate it somehow at least once! also, from the pov/ modus operandi of this blogs attack style and tribute to the father of analytic geometry— graphed, therefore it exists.

so this code looks at the two key standbys density and entropy. the entropy variant was only discovered roughly earlier this year. it generates 1000 50-bit width iterates with smoothly varying density and entropy starting from min to max (0 to 1) and calling the Terras construction as a subroutine. the neat/ powerful entropy code ideas from about ½ yr ago on 4/2018 to generate arbitrary/ smoothly varying entropy with fixed ½ density were utilized. then it looks at the function output density/ entropy metrics. these are all labelled dd, de, ed, ee. the display was easier to understand without overplotting so theyre plotted side by side instead.

these are helpful to understand/ picture some general dynamics/ behavior of the function. some of this analysis is already hinted at and analyzed “in the opposite direction” in a lot of earlier analysis which varies density/ entropy on input seeds and then looks at trajectory dynamics. however here the (input/ domain) density/ entropy are controlling the trajectory parity sequence. higher density means more ‘1’s which indicates more 3n+1 ops ie higher trajectory slope.
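that density↔slope link is sharpest at the extreme: an all-1s parity pattern (density 1) corresponds to the seed 2^k - 1, which climbs with k consecutive 3n+1 “up” steps. a quick sanity check (my own helper names):

```ruby
# semicompressed collatz map
def t(n)
  n.even? ? n / 2 : (3 * n + 1) / 2
end

# parity sequence of the first k semicompressed steps
def parity(n, k)
  (0...k).map { |_| b = n & 1; n = t(n); b }
end

# density-1 seed 2^k - 1 is odd at every one of its first k steps,
# ie climbs for its full bit width (T^i(2^k - 1) = 3^i * 2^(k-i) - 1)
k = 20
raise unless parity(2**k - 1, k) == [1] * k
puts "2^#{k} - 1 climbs for #{k} straight steps"
```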

(later) its a simple exercise to look at basic trajectory statistics used previously such as glide length ‘cg’, max index ‘cm’, total length ‘c’ for these 2 density ‘d’ and entropy ‘e’ seed construction schemes. the scales turned out to be different so the 1st 3 d_cg, d_c, e_c are on left side scale and the 2nd 3 d_cm1, e_cg, e_cm1 are on right side scale. there are correlations (d_cg, d_c, d_cm1), anticorrelations (e_cg, e_cm1), and random noncorrelations (blue ‘e_c’).

💡 the Terras 1-1 mapping ideas lead to some basic insights. it looks like it makes sense to rule out all iterates with (semicompressed) glides of size less than the initial bit width, because all such glide patterns are possible to construct. or more specifically, if the trajectory glide region is over in iteration count before the # of initial seed bits, then all such patterns are possible. the unresolved part of the problem is “what remains”, ie what happens on the (“semi-”) glides that “fall short”, both individually and over all of them collectively. another way to look at it: after the Terras construction code custom-builds an iterate, there is an “end iterate” that somehow encodes the later, glide-terminating part of the trajectory, somewhat reminiscent of a “call stack” for a program, reminding me also of PDAs, push down automata, have long wondered about that connection. have to try to convert these new musings/ intuitions into terminology and more rigorous formulations…

⭐ ⭐ ⭐

(10/21) 💡 ❗ coming back to a particular idea, building on/ in line with recent new observations. there was an analysis of the disordered region last spring 5/2018. the histogram for review30 shows a very even/ noiseless outline. this is the 0/1/01-run statistics for the disordered region. the disordered region looks remarkably a lot like CRT (cathode ray tube) video static! well familiar to us “old timers,” yet a phenomenon that this new LCD digital age hasnt seen in decades. at the time was looking for features in this region and then was a bit )( disappointed/ crestfallen about the apparent indistinguishability. however, the static appears disordered but the histogram is not. so… how to make lemonade out of lemons™? it would seem paradoxically on 2nd thought the featurelessness is not a bug, but a feature™! my new paradigm-shift thinking is that the (so-called) disordered region is itself apparently some kind of feature!

in retrospect, maybe some anthropomorphic bias here, because somewhat counterintuitively/ from other pov/ thinking outside the box™ there is very substantial order in the disordered region as seen in the very “sharp” (not jagged) histogram and other misc statistics (max, average, std dev of 0/1/01 runs etc). it also exhibits striking scale invariance at higher iterate sizes. my long insistence/ bias/ now-misconception/ mistake may have been trying to find a feature that seemingly predicts glide descent, at least fairly “quickly”. it looks like the disordered region is actually some kind of precursor/ predictor of descent, subject to some kind of (sometimes extended) delay. in the x-y trap sense: at least some amount of disordered region precedes the descent. also it looks like the disordered region is actually a sign of sufficient “mixing” long conjectured all the way back to 3-decades old Lagarias commentary.

a lot of analysis has already gone in this direction but it looks like the key to unlocking the problem may be recognizing/ characterizing the disordered region effect on/ somehow associated with “triggering” descent. long-studied density and entropy properties are closely associated with the disordered region, also eg “1st density crossover” metric studied, but none of these so far seem to fully imply/ determine it, although combined 0/1 bit runs may come very close. now its just a matter of devising even better/ more exact/ precise analysis from this particular angle. ie how can all this be expressed in code/ optimization searches (beyond those already substantially formulated)? from Lagarias ibid p6
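for reference, the kind of 0/1-run statistics meant here can be sketched in a few lines of ruby (a simplified stand-in, not the actual library code):

```ruby
# 0/1-run statistics of the binary expansion of n: max and average
# lengths of the consecutive 0-blocks and 1-blocks separately
def run_stats(n)
  runs = n.to_s(2).scan(/0+|1+/)               # consecutive same-bit blocks
  zs = runs.select { |r| r.start_with?('0') }.map(&:length)
  os = runs.select { |r| r.start_with?('1') }.map(&:length)
  { mx0: zs.max || 0, mx1: os.max || 0,
    avg0: zs.empty? ? 0.0 : zs.sum.to_f / zs.size,
    avg1: os.empty? ? 0.0 : os.sum.to_f / os.size }
end
```

histogramming stats like these over the disordered-region iterates is what produces the sharp/ noiseless outlines described above.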

sec 2.1. A heuristic argument. … From the viewpoint of this heuristic argument, the central difficulty of the 3x + 1 problem lies in understanding in detail the “mixing” properties of iterates of the function T(n) (mod 2^k) for all powers of 2. The function T(n) does indeed have some “mixing” properties given by Theorems B (Terras) and K below; these are much weaker than what one needs to settle the 3x + 1 Conjecture.

⭐ 💡 ❗ 😎 😀 ❤ footnote: was digging around trying to find a link seen earlier that pointed to the ML aspects of fusion systems eg stellarator(s) and/ or tokamaks. finally found this one (about 2mo old)! it has a great video from ESA (Euro Space Agency) on the new ITER prj. it will be one of the starting projects for the worlds largest new supercomputer Aurora in a few years. more analogies: the video mentions a diverter component of the system that cycles off energetic plasma from the suspended microsecond-dynamically-controlled “unstable equilibrium”, and ofc theres an energy injection component (lasers + plasma mass etc). the collatz problem can be seen to have 2 fundamental components/ operations, an “energy/ mass”-increasing 3n + 1 op and the energy-decreasing div2 op. the div2 op acts like a “diverter” of mass off the chaotic/ fractal contained plasma (the current iterate). phrased in this language the collatz problem is a question about the inherently decreasing mass/ energy of the system. a cycle found would disprove the conjecture and be like a kind of equilibrium state. an infinitely increasing trajectory is analogous to increasing/ spiralling mass/ energy.

the introduction of ML elements to fusion control is relatively new at only about ~2yrs. did a quick google search and have some links to share. also google has worked on 1 of the prjs! its the pivotal birth of a new critical field in computer science/ control systems. exciting stuff! it looks like computer scientists are going to play a starring role in this upcoming potentially world-shifting/ altering saga.

(10/22) a long several-week detour there. want to get back to optimization/ search methods that seem to be closer to building a structure for a proof. the last search code was bitwise10 that was looking for long glides (or rather large peak indexes ‘cm’, also the natural measure of “remaining increase/ expansion”) with small 1-runs (10 or less). it occurred to me (ok, full disclosure, this is actually building on KTs idea!) that one can optimize in “trajectory space” instead of “iterate space”, using the same bitwise optimizer. in other words the bitwise optimizer builds arbitrary trajectories instead of iterates and converts the trajectories to iterates using the Terras correspondence. it also filters this time by both 0/1 runs at 10 max.

this is a fairly fast adaptation of code. noticed the last review code/ subroutine for bitwise10 didnt sort the final results by the optimizing variable, possibly making it less representative. this code fixes that by sorting by ‘cm1’ at the end. the hope is that it optimizes better. it does seem to find slightly higher ‘cm1’ at ~80 compared to ~50 although its not clear yet if thats due to the trajectory space optimization or merely the switch to the final ‘cm1’ sort. it runs 10 times and the optimizations all “paint into corner” ie eventually cannot further extend any partial trajectories in the search space. this final plot shows an upcurve in ‘cm1’ blue near the end but am thinking at moment its misleading and more an artifact of the sort. the graph is a bit busy with lots of trends but am going to skip reformatting it further for now.

💡 have some cool new ideas, am going to go in some new directions. the bitwise optimizers are very powerful but have a general problem of local optimization even when attempting to optimize over multiple trajectory variables. they adjust later moves/ bits of the glide/ iterate after adjusting earlier parts, even this new Terras-integrated scheme has the same approach/ (likely?) limitation. a remarkable new idea how to build a different less-local optimizer recently occurred to me… its not going to be hard to implement, yet seems promising…

(10/23) ❗ 💡 😮 🙄 😳 ⭐ 😀 feel am finally starting to see the light! but again wish had known then what is known now™. research is built fwd but understood bkwd™. relooking at the ~3½ yr old 4/2015 gap code results mentioned earlier, that code had a defect as mentioned in the blog (the regular expressions to cut the bit string were wrong and didnt handle repetition correctly). therefore that rather basic exercise needs to be redone, & think it never was revisited/ carried out! the simple idea is to generate different density seeds and then look at the max 0/1 bit runs (pre/ post, mx1, mx2). this new code uses one iteration of the collatz function instead of only the 3n + 1 op, with similar results. it generates 1000 seeds of 1000 width of varying density and looks at pre/ post 0/1 bit runs averaging over 10 runs.

results are really a bit )( extraordinary. the bit run sizes almost always decrease except in the core density region (graph 3). this suggests the core density region identified otherwise based on the “trap” phenomenon is also possibly/ actually (almost? or…) exactly associated with the max bit runs.
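a minimal sketch of this redone exercise (parameter choices/ names are mine, not the blog code): build random seeds at a given 1-density, apply one semicompressed collatz iteration, and compare max bit runs pre/ post:

```ruby
# max run length over both 0-runs and 1-runs of n's binary expansion
def max_run(n)
  n.to_s(2).scan(/0+|1+/).map(&:length).max
end

# random seed of given bit width with roughly the given density of 1s
# (msb forced to 1 to pin the width)
def rand_seed(width, density)
  ('1' + (1...width).map { rand < density ? '1' : '0' }.join).to_i(2)
end

# avg change in max bit run across one semicompressed collatz iteration
def run_delta(width, density, trials = 100)
  deltas = trials.times.map do
    n = rand_seed(width, density)
    m = n.odd? ? (3 * n + 1) / 2 : n / 2
    max_run(m) - max_run(n)
  end
  deltas.sum.to_f / trials
end
```

sweeping density over (0, 1) with run_delta is the shape of graph 3: outside the core density region the deltas come out negative (long runs tend to get broken up), while near the ½-density core they stay close to 0.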

another striking aspect: looking at the graph really reminded me of another graph made around the time the density property was discovered, almost ~4yrs ago now, in the now clearly seminal 11/2014 blog: the map6b graph. that was one of the earliest experiments with iteration count to density crossover and it found nearly the same shape as the max 0/1 runs just now found. so am computing them both in this code and displaying them in the graphs. in 1st graph the post mapping 0/1 max run is in blue and iteration count to density crossover is in magenta (versus input density x axis). they are highly aligned, not even at different scale. output density is red/ right side scale. in graph 2 these are plotted in logarithmic scale which reveals some key difference hidden in graph #1 near the origin, some discrepancy in location/ difference in inflection.

the bigger picture is starting to emerge. this graph reveals almost the entire inductive proof structure overview. the earlier idea about a 2-region/ phase trap concept is basically fully outlined here. in short the 0/1 runs are measuring mixing—the devil is now in the details™. in retrospect, maybe was somewhat avoiding the max bit runs as highly pivotal because didnt totally trust them as otherwise typically noisy extrema…

known possible exception/ outlier/ edge case at this point: the significantly studied ‘w’ generation scheme had sequences with 1-lsb triangles interspersed with “crunched” iterates (small 0/1 runs) but the crunched iterates/ intervals seemed to be “brief/ scattered”. so formulating the exact nature of the crunch region is not so simple. again the x-y trap idea comes into play where it seems to require some minimal length ‘y’ region.

(with this last entry this epic post is now at ~12K words and is probably a record for the blog/ collatz study here.)

(10/24) it occurred to me to look into entropy response of a single semicompressed mapping instead of density as in prior code. there is some study of this already but not as thoroughly. this has 3 generation methods that vary entropy: entropy which was previously seen for creating ½ density seeds with smoothly varying entropy, initr which came from some genetic algorithm seed generation and has random 0/1 length runs, and secs which flips bits at a smoothly varying # of specified sections at random locations. the 3 responses are seen below with x axis the input entropy. ‘e21’ and ‘mx21’ are differences between post/ pre values.

this code also computes ‘cm1’ black and ‘cg’ red, the glide max index and glide length, which are seen to highly correlate with entropy in most cases. so there is some remarkable/ strong signal here. there are strong trends in the max bit runs. the crosshatching is due to odd/ even block count switching by the mapping operation. despite much different code schemes the 1st/ 3rd entropy/ secs schemes cause very similar response and am now thinking they are nearly equivalent. on further thought, ‘secs’ in contrast to ‘entropy’ doesnt guarantee ½ density but apparently approaches it closely on average. this all seems to show that maybe entropy will be very useful somehow inside the core density. but also suspect maybe highly customized seeds are not fitting in these distributions. feel like have done some entropy analysis within the core region but cant remember many particular experiments along those lines. maybe some gaping oversight? ❓

(10/25) 💡 looking back, did not see too much entropy analysis of glides. the 5th graph of gridc on 3/2018 had ‘c0s’ and ‘c1s’ the ‘0’ and ‘1’ block counts that are essentially the entropy metric over 0s and 1s separately (|c1s – c0s| * s ≤ 1 where s is seed width and entropy e = c0s + c1s) apparently for a ‘w’ glide (with noticeable lsb 1-triangles on the climb). heres some code that isnt esp dramatic in its findings but has some nice analysis. it uses the last 3 “hard/ disordered” dbs and finds a long trajectory from each and does an analysis over the full trajectory. density ‘d’ red, entropy ‘e’ green, max scaled bit runs ‘mx01s’ blue all right side scale, ‘mx01’ * 10 magenta and ‘nw’ bit width lightblue left side scale. the 2nd and 3rd plots use some nice modular averaging code (should have written this stuff a long time ago, didnt find anything close), 50 running average or 0.99 exponential moving average.
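heres the block-count/ entropy metric just described sketched in ruby from the scaled definitions above (a stand-in for the library versions):

```ruby
# scaled 0-block and 1-block counts of n's binary expansion; per the
# definitions above, entropy e = c0s + c1s, with counts scaled by seed width s.
# the raw block counts always differ by at most 1, ie |c1s - c0s| * s <= 1
def block_counts(n)
  bits = n.to_s(2)
  s = bits.length.to_f
  runs = bits.scan(/0+|1+/)
  c0s = runs.count { |r| r.start_with?('0') } / s
  c1s = runs.count { |r| r.start_with?('1') } / s
  [c0s, c1s, c0s + c1s]
end
```

eg a maximally alternating iterate like 1010 gives e = 1, a solid run like 1111 gives e near 0: more blocks, more “mixing”.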

something about the fractal nature of the problem means that even long 50 count averages smooth substantially yet still remain rather bumpy whereas exponential moving averages are smoother; maybe this is saying something about the random walk properties (eg mean returning tendency). near the end the density spread increases and the entropy tends to increase, not clear from looking at the unaveraged data. there is something funky about the way the max statistics trend somewhat but not directly in line with the bit widths. inverted it seems to be nearly the monotonic decreasing invariant long sought! this seems to be further supportive evidence for the 2-phase crunch idea… ❓
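the 2 smoothing schemes used in these plots, sketched (simple stand-ins for the modular averaging code mentioned above):

```ruby
# trailing running average over a window of up to k samples
def running_avg(xs, k)
  xs.each_index.map do |i|
    w = xs[[i - k + 1, 0].max..i]
    w.sum.to_f / w.size
  end
end

# exponential moving average with decay a (eg a = 0.99 as used above);
# smoother than a windowed average because old samples fade out gradually
# instead of dropping off a window edge
def ema(xs, a)
  m = xs.first.to_f
  xs.map { |x| m = a * m + (1 - a) * x }
end
```

the window-edge dropoff is exactly why the 50-count averages stay bumpy on fractal data while the 0.99 ema comes out smoother.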

❗ 😮 😎 😀 ❤ 💡 holy #$%& who was that masked man? someone else beside me wrote some collatz search code? (lol someone else is on this long-deserted island?) ASTONISHING. WE MUST TALK! please contact me ASAP on stackexchange! what language are you using? what is the search algorithm? could you post the code somewhere? at least plz choose some pseudonym for future correspondence here. fyi am looking at both '1' runs and '0' runs and there is some difference in behavior in the two. all the code is posted/ open source if you feel like trying it out.

still reeling in utter shock… a 1-in-years breakthru has occurred… this easily makes my month!

😛 lol you waited a whole 5 days to share your breakthru with me? fyi your link with .js suffix seems to be incorrect and leads to a html dump. this seems to be the correct link. will take a look sometime however if you dont want to discuss/ describe it, its not going to be one of my top priorities. btw it likely doesnt disprove any conjectures in this blog which are more statistical in flavor, but do appreciate it. https://gist.github.com/collatz-troll/8f77e40acdd1b35a7b637351e164f123

(later/ edit) 😳 oops! the comment window did not autorender the javascript code, but it does display on the page correctly. wow! you figured out gist embed before me on my own blog lol!

Basically, for any sequence of ups and downs of length k, there is an infinite number of starting seeds of the form 2^k*r-c, whose Collatz trajectory starts with this sequence. All 2^k sequences are possible, so it’s just false to say that a glide can’t be such-and-such. This code gives you all seeds for a desired sequence (as well as its ending points).
The reason you observe a correlation between 1-bit runs and ups is due to the fact that numbers of form 2^k-1 produce k ups.
To generate glides barely floating above the starting seed, keep the ups-to-downs ratio close to log(2)/log(1.5) (but above it).
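KTs 2^k − 1 observation and the floating-glide ratio are both easy to sanity check (quick ruby sketch, my own helper names):

```ruby
# semicompressed collatz step: (3n+1)/2 on odd ("up"), n/2 on even ("down")
def t_step(n)
  n.odd? ? (3 * n + 1) / 2 : n / 2
end

# parity string of the 1st k steps: '1' = up, '0' = down
def parities(n, k)
  k.times.map { b = n % 2; n = t_step(n); b }.join
end

# 2^k - 1 gives k straight ups: T^j(2^k - 1) = 3^j * 2^(k-j) - 1, odd for j < k.
# and a trajectory with u ups and d downs scales the seed by roughly
# (3/2)^u * (1/2)^d, which stays above 1 exactly when u/d > log(2)/log(1.5)
```

log(2)/log(1.5) ≈ 1.7095, ie a barely floating glide needs roughly 1.71 ups per down.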

the blog talks about this “prefix” property in several places, even recently, identified it years ago, it does affect some of the conjectures/ ideas. yes youre also talking about what are called lsb 1-runs in this blog, theres lots of analysis/ some recent. yes have been writing code to generate “long glides” for years using various strategies. etc

The previous example was to address specifically your statement that without long 1-bit runs the glide is very limited. Now it looks like you still claim that 1-bit runs are the “potential energy”, but require both 0 and 1-bit runs to be “crunched” to indicate the glide is ending.
Anyway, here is another 100-bit example with glide length 176:
1058071015331242680377450331899

Its base2 is pretty close to a random 0-1 string, where each bit is generated independently with a fair coin toss.

ok, didnt quite understand your idea for a counterexample at 1st but now get it. yes your 1st instance does show that both 0-runs and 1-runs, and not merely 1-runs only, need to be taken into account to find/ identify/ “detect” the crunch region. wrote up some new analysis of your 2nd example incl in main blog. it does seem to test/ push at the edges of the conjecture/ current computational analysis/ searches. it would be nice if you describe whether youre running any optimization algorithms or just doing random searches etc.

This code will give you similar seeds with pretty high probability (I got the example I gave you on 2nd try). Needs my code above as a module. If you don’t have Python 3.x, I recommend installing Anaconda.

On a separate note: of course, optimizing bits in a greedy algorithm fails to get record glides. A good analogy would be a jigsaw puzzle/tiling with irregular shapes. Finding seeds with long glides is like tiling pretty well. Finding a counterexample to the conjecture (if one exists) would be like a perfect tiling over infinity. Local search doesn’t solve tiling well.

(KT) ok admittedly like the idea/ image/ metaphor of a jigsaw puzzle or space-tiling analogies, they are closer than some others proposed by “CTs” around here. they also have deep connections with fractals although maybe few have pointed this out. various tiling problems ofc also have deep connections to undecidability/ Turing completeness. but as Korzybski famously said, “the map is not the territory”…

now, greedy algorithms have limitations but maybe not what you think, and you dont really seem to have much grasp on the algorithms being utilized, and likewise havent figured yours out yet either, although youre not providing much help/ detail on that. greedy search algorithms have found “extraordinary” trajectories/ patterns and greedy is not always exactly the same as “local”. there are so many different ways to code greedy algorithms that dissect the problem from different angles. the real question is how “representative” the greedy-searched samples are and theres a constant/ superattentive analysis/ running commentary on that around here. limitations on greedy algorithms can be adjusted/ avoided/ evaded with various parameter tuning/ tricks etc. greedy algorithms recently beat the worlds best human Go player after such a feat was thought nearly impossible. the collatz algorithms are getting more sophisticated, as are the patterns and conjectures, and think The End Is Not Far™ 🙂

“of course, optimizing bits in a greedy algorithm fails to get record glides.” this is just not nec true of my own code or greedy algorithms in general. have been generating apparently “record” glides from greedy algorithms for several years now. one can tune them to have a parameter that adds “more brute force” ie “delocalizes” the search on some continuum (similar to stochastic gradient descent/ simulated annealing with different cooling approaches/ “schedules”). some greedy algorithms can provably get optimal solutions in special cases. yes, greedy algorithms are subject to local extrema traps but sometimes the local extrema are very close to each other in optimization parameter. anyway take a look at the new histogram deconstruction analysis incl over your 5 instances/ samples.

I’ll wait until you make some falsifiable claim before giving more examples or responding to what you said about greedy algorithms. Meanwhile, if you want to look at established record glides, here they are, computed by enumeration: http://www.ericr.nl/wondrous/glidrecs.html
Try to reproduce them with any optimization, greedy or not…

Once again, I didn’t do any optimization to get the examples I gave you, and I posted here all “search” code that I used. The first piece of code gives you all seeds that produce a given (partial) trajectory: seeds_by_trajectory(traj). traj is a string, and ‘1’ means up and ‘0’ means down. Any (finite) up-down sequence is realizable by infinite # of seeds. The second piece of code just generates a random trajectory that doesn’t fall below the initial seed and calls seeds_by_trajectory. After that I check whether the smallest seed’s base2 looks more or less like fair coin flips, I check for its full glide, and if it’s long, I pick it.

have cited E Roosendaal pg years ago in this blog + more than once, its some great empirical work dating back many years, and he is also cited by world expert Lagarias, but basically brute force without much further analysis. there is a new distributed attack that has maybe found new records past those in the list, have to blog on it sometime, ran across it awhile back. it is not hard to find longer glides/ trajectories with greedy algorithms with far more bits although apparently almost nobody else has published on this except me. those listed trajectories do seem to be notable because of the very high “compression ratio” ie length of glide vs initial bits. have studied this compression ratio at some length, have to analyze all this further to see if they are anomalous wrt compression ratios achievable with the library greedy code. even if they are they dont nec invalidate other (current) hypotheses about glide limits etc… thx for the descr of your code, may play with it at some pt. it seems the algorithm is indeed a kind of optimization but maybe not very iterative. yes, building precisely falsifiable statements is the name of the game (science™), but the blog is chockful for years of falsifiable “fragments”. the jigsaw puzzle work continues…

😎 lol “high school math”… or not? nice effort/ start so far KT! that comment + maybe some of your possible others on blog + your sparkly/ shining talent now remind me of a remarkable individual met yrs ago on stackexchange chat… so you have latex + python skills! just occurred to me: suggest asking on mathoverflow whether this property is known to anyone. think its easily a mathoverflow level question but if thats too scary, math stackexchange could work too. if you dont ask maybe will ask myself. if not, youve found a new substantial/ remarkable/ publishable result & encourage you to submit it to a journal or at least arxiv. have to think about it more to understand how to prove it myself. am going to update code right now re latest blog addition/ entry (was just keystrokes away from linking it last nite but then had to run to wild cafe tacuba concert). it would be easier for me if you were willing to work in ruby, its not a lot different than python, but “to each his own”…. (edit) here it is axplusb.rb https://gist.github.com/vznvzn/4a8b7fc1b2ed323abd9462358ba90576

I meant to ask about it and a few other related questions on mathoverflow after I finish the note and possibly add a more speculative part to it :-). But it may take a week, or two, or more. It’s not a high priority for me. So, if you are anxious, go ahead and ask anything you want.
The result itself is indeed too obvious and doesn’t require any college-level math. No linear or abstract algebra, no calculus, no number theory, just the parity that anyone can understand. And can be rewritten without scary terms like “univariate linear forms”. I assume it was just too obvious to write it down for math people who previously stumbled upon it. Also, it brings no surprises: no one would believe “proofs” based on impossibility of certain forms of glides anyway.

as stated am not extremely familiar with collatz literature but havent seen this remarkable property described anywhere in basic refs. researchers tend to document even (supposedly) basic properties somewhere (maybe decades ago, somewhat buried). collatz is long known to be a “semi-” random walk but (apparently) youve shown/ proven in a key sense its “basically” an unrestricted one wrt the “initial” part (aka “prefix”). it is defn college level math because it deals with congruence relations and the proof (still waiting on that lol) will require induction, + number theory is not taught at high school level almost at all (and theres a good reason for that).

as for “unbelievable proofs based on impossibility of certain forms of glides”… your construction doesnt show what will happen after the initial glide (or “initial walk”) one constructs, and think elements of my work relate/ even focus on that (“uncovered”) region. your proof apparently shows the “uncovered” region can be long after the glide ends. all those words in your claim require technical definitions, but yes, you have basically (nearly) proven a variant of a theorem with those claims.

its whats known generally as a “no-go” thm and rules out a large class of possible approaches. have mentioned similar no-go formulations/ constructions but none quite so general. btw ofc dont forget to incl it in your paper, it will be utterly incomplete without it. sometimes very significant findings can be expressed in short proofs, youve apparently uncovered case of that. did wonder about similar ideas in the past but didnt pursue them directly as you have.

it is somewhat “unproblematic” for much of my work/ techniques to evade the (significant!) restrictions imposed by your thm, but yes, its a very key result to guide future aims/ directions/ angles, and yes quite a bit of the hypotheses/ proof ideas in this blog are directly refuted by the amazingly general/ far-reaching theorem. yes, in retrospect with your revelation/ “paradigm shift™” there has been some “chasing of phantoms” in some directions in my research… aka false leads/ bright shiny objs lying by side of road™… have long remarked on/ experienced its hall of mirrors™ disorienting surprises…!

now needs a major rethink… will write more on that… this is actually good news, not bad news, and all scientific (“mini”)breakthrus require reappraisal/ reevaluation of prior results… that is the nature of scientific progress aka hypothesize, test, refute/ discard, (re)generalize, repeat. “lather, rinse, repeat.” two steps fwd one step back… like (collatz) random walk? was musing on some ways to directly utilize this construction in further research… this also shows the potentially dynamic/ powerful strength of collaborative approaches to the research long sought here… so as you can see many mixed emotions on all this! 😳 😥 ❓ 💡 ❗ ⭐ 😎 😀

😐 ❗ 💡 ⭐ ok fyi KT had a feeling + (re)looked around closer at Lagarias great intro/ overview and yes it looks like the thm you found was proved/ published by Terras in 1976 also mentioned on Roosendaals site. suggest you look at it closely & see what you think before asking on mathoverflow. https://vzn1.wordpress.com/2018/10/03/collatz-fusion/#1012