1/24/2012

One of the stranger things that's happened in the past 20 years or so is that the religion of
free-marketism has become widely accepted dogma. I say "the religion" because these people believe
in it beyond rationality. For example, the widely known ways in which the mathematical abstraction
of ideal free markets does not apply to the real world (such as rational actors, perfect information,
shared goods, etc.) are waved off as insignificant.

Part of this weird mass delusion is the belief that "protectionism failed" or "the welfare state failed";
you will frequently see these assertions in supposedly factual news articles. Papers like the NYT, which
will parrot ridiculous government press releases without question, will also say things like "free markets have
clearly proven themselves" or "nobody wants a return to a welfare state, that model has clearly failed".

Huh, really? I find that interpretation highly dubious; certainly not something that can be dropped as an
aside as if it's incontrovertible. During the golden age of America (roughly 1950-70)
we were highly protectionist and we had our most generous welfare state. So, where was the failure?
I'm sure the popular memory is somewhat fixated on the bad times of the 70's-80's that were blamed on
corrupt unions and inefficient protected businesses and supposedly saved by Reagan/Thatcher free marketism.
But there are numerous complicating factors that make that interpretation questionable at best.

The only indisputable success of the free market era was the dot com boom, but that would probably have
happened anyway in a more controlled market system (and we may have avoided some major crashes due to
excessive speculation and capital liquidity, such as the Southeast Asian crash and the South American crash).

In the meantime, highly protectionist / socialist states have been doing very well (eg. most of the Scandinavian
and German-speaking states). I don't mean to imply that they are a good model for us to copy, but if you're
just looking at world evidence for certain economic/political schemes, it seems there are more pro-socialist
examples than there are good "free market" examples.

I believe that part of the problem is the mathematical appeal of free markets in theory. You can set up these
toy problems with idealized conditions, and talk about what creates the "global optimum" for all.

There's nothing wrong with playing with toy models, except that people then think that it applies to the real
world. A serious economist would be able to list the various ways that the idealized models don't match
reality, but then you start drawing yield curves and running simulations and cooking up formulas, and it all looks
so nice that by the time you get to the end you forget that it doesn't actually connect to reality.

1/18/2012

I'm impressed with the breadth of support for this cause, and it is a good one. But at the same time I
find it a bit disturbing that people can get mobilized for this, but can't get mobilized for things that
are actually much worse, like the NDAA, the Patriot Act, campaign finance reform, lobbying reform, the
crushing of consumer protection or Glass-Steagall or media ownership rules, etc. etc.

In the end, the reason that SOPA exists is because corporate interest groups are literally writing our
laws. If you stop SOPA it's like cutting off the top of a dandelion - as long as the roots are in
the ground it will just come back.

Lobbyists are persistent and clever; they will slip the laws they want in as a rider on a budget bill,
or they'll just keep trying until the opposition movement peters out.

We have to attack the cause.

A very similarly evil act, the
Research Works Act (RWA), is moving forward.
In brief, the RWA makes it illegal for the government to require that researchers who receive government funding
make their results available for free (eg. on the internet). The RWA was written by the AAP (Association of
American Publishers), which is made up of the IEEE and ACM and all the usual suspects, who want to have
exclusive rights to research paper copyrights.

It should be noted that even before the RWA has passed, we currently have pathetically weak open-access requirements for research.
Only research that receives direct grant funding from something like the NIH is required to supply open access.
But almost all research is actually government funded, because the researchers are paid by colleges, and the colleges get
government funding (and of course all research is based on past research, which is a public good, etc).
But this type of funding does not require the documents to be open access. So, first of all :

Colleges : require all of your professors' publications to be open, NOW!

(colleges also have no right to own patents or create public/private partnerships for business development of
professors' work. That research is owned by all of us. But that's another topic...)

The IEEE's and ACM's standard publication terms give them exclusive ownership of the copyright, which really
they have no business getting, since their contribution to the work is near zero (eg. peer review is done
by unpaid volunteers, they provide almost zero paid editing, etc.; they aren't like a book publisher that
actually does something).

Authors : stop publishing your papers in works that take your copyright! Make all your works "open access"
only!

Some of the blog posts I've read about the RWA have been
very naive in
my opinion; they seem to think that scholarly organizations like the ACM or IEEE are better than the RWA
and aren't specifically the people behind it. I think it should be absolutely clear by now that the
RWA is exactly in line with the sleazy character of these usurious organizations.

Scholars : quit the ACM and IEEE right now.

It's very simple, these publishers are bad for science, and they serve no purpose at all in a world where
the internet is the most important form of publication.

But more generally it's just another case of corporations writing the laws and pushing an agenda that's bad
for America. We may stop SOPA and RWA, but it will just happen over and over again until we fix the problem.
Stop corporate personhood. Make lobbying illegal. Make it illegal for corporations to write laws.

Chain Parse : first do a full parse of the file using some other parser; this provides a baseline
cost-to-end from each point. Now do a forward parse. At each position, consider all match and literal
options. For each option, step ahead by that option and consider all the options at the next position.
Add up the cost of each coding step. After N steps (for Chain N) add on the cost-to-end from the first
baseline parse. Go back to the original position and finalize the choice with the lowest cost. Basically
it's a full graph walk for N steps, then an estimate of the cost to the end from the final nodes of
that sub-graph.

To make Chain parsing viable you have to reduce the number of match options to a maximum of 8 or so.
Still, Chain N has a complexity of about 8^N, so it becomes slow very quickly as N grows.
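The procedure described above can be sketched roughly as follows. This is a minimal illustration, not the real coder: `find_matches`, `step_cost`, and the baseline array are hypothetical stand-ins, and the adaptive coder state that makes the forward parse attractive in the first place is ignored (costs here are stateless).

```python
def chain_parse(n_bytes, find_matches, step_cost, baseline_cost_to_end, N, max_matches=8):
    """Chain-N forward parse sketch.

    find_matches(pos) -> list of match lengths available at pos
    step_cost(pos, adv) -> cost of coding a literal (adv == 1) or a match of length adv
    baseline_cost_to_end[pos] -> cost-to-end estimate from some other (baseline) parse
    """
    def lookahead(pos, depth):
        # best cost from pos to end: explore `depth` more exact steps,
        # then fall back to the baseline cost-to-end estimate
        if pos >= n_bytes:
            return 0
        if depth == 0:
            return baseline_cost_to_end[pos]
        options = [1] + find_matches(pos)[:max_matches]
        return min(step_cost(pos, adv) + lookahead(pos + adv, depth - 1)
                   for adv in options)

    choices, pos, total = [], 0, 0
    while pos < n_bytes:
        # full graph walk for N steps from pos; finalize only the
        # first step of the cheapest path, then re-examine from there
        options = [1] + find_matches(pos)[:max_matches]
        best_adv = min(options,
                       key=lambda adv: step_cost(pos, adv) + lookahead(pos + adv, N - 1))
        choices.append(best_adv)
        total += step_cost(pos, best_adv)
        pos += best_adv
    return choices, total

# tiny smoke test on a hypothetical 4-byte "file": a length-3 match at
# position 0, a length-2 match at position 1, flat made-up costs
def _find_matches(p):
    return [3] if p == 0 else ([2] if p == 1 else [])

def _step_cost(p, adv):
    return 2 if adv == 1 else 3     # a literal costs 2, any match costs 3

_baseline = [8, 6, 4, 2]            # all-literals cost-to-end
choices, total = chain_parse(4, _find_matches, _step_cost, _baseline, N=2)
# picks the length-3 match, then one literal
```

Each position explores up to about (max_matches+1)^N nodes of the sub-graph, which is why the candidate matches have to be capped at 8 or so.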

Chain forward parse is significantly better than an LZSS-style backwards optimal parse for LZ coders that
have important adaptive state.
The baseline parse I use for Chain is actually a backwards LZSS optimal parse, so you can see how it does by
looking at the "Chain 0" results.

First, the overall results. Chain 6 is the most steps I can run in reasonable time, and "AStar 2048"
means the quantum length for dividing up the file for AStar was 2048.

             raw       Fast     Normal    Chain 6 AStar 2048

lzt00      16914       5179       5016       4923       4920
lzt01     200000     198313     198321     198312     198312
lzt02     755121     181109     177792     173220     173315
lzt03    3471552    1746443    1713023    1698949    1690655
lzt04      48649      13088      12412      10407      10249
lzt05     927796     368346     367598     355804     354230
lzt06     563160     352827     351051     344721     343173
lzt07     500000     226533     215996     209133     208566
lzt08     355400     250503     249987     230541     230220
lzt09     786488     302927     287479     268544     265525
lzt10     154624      11508      10958      10307      10291
lzt11      58524      20553      19628      19139      19087
lzt12     164423      29001      26488      23966      23622
lzt13    1041576     935484     931415     924510     922745
lzt14     102400      47690      47298      46417      46350
lzt15      34664      10832      10688      10269      10260
lzt16      21504      10110      10055       9952       9927
lzt17      53161      19526      18514      17971      17970
lzt18     102400      64280      63251      59772      59635
lzt19     768771     322951     288872     269132     269162
lzt20    1179702     888881     872315     856369     855588
lzt21     679936      91677      88011      83529      83184
lzt22     400000     287715     284378     279674     279459
lzt23    1048576     807253     804048     798369     798334
lzt24    3471552    1418076    1411387    1399197    1388105
lzt25    1029744     113085     107882      97320     100175
lzt26     262144     212445     210836     207701     207552
lzt27     857241     237253     235137     222023     220837
lzt28    1591760     332660     308940     260547     252808
lzt29    3953035    1193914    1180823    1147160    1135603
lzt30     100000     100001     100001     100001     100001

total              10800163   10609600   10337879   10289860

Now the effect of the number of chain steps for the chain parser (O0 - O6 means Chain 0 through Chain 6) :

             raw     Normal         O0         O1         O2         O3         O4         O5         O6

lzt00      16914       5016       5024       4922       4922       4922       4922       4923       4923
lzt01     200000     198321     198321     198312     198312     198312     198312     198312     198312
lzt02     755121     177792     177877     175905     174835     174073     173759     173509     173220
lzt03    3471552    1713023    1712337    1704417    1703873    1702651    1701635    1700282    1698949
lzt04      48649      12412      11315      10516      10481      10457      10427      10416      10407
lzt05     927796     367598     368729     365743     364332     360630     356403     355968     355804
lzt06     563160     351051     350995     346856     345500     344778     344739     344702     344721
lzt07     500000     215996     215644     211336     209481     209259     209244     209138     209133
lzt08     355400     249987     249372     239375     237320     231554     231435     233324     230541
lzt09     786488     287479     284875     280683     275679     270721     269754     269107     268544
lzt10     154624      10958      10792      10367      10335      10330      10311      10301      10307
lzt11      58524      19628      19604      19247      19175      19225      19162      19159      19139
lzt12     164423      26488      25644      24217      24177      24094      24108      24011      23966
lzt13    1041576     931415     931415     929713     927841     926162     924515     924513     924510
lzt14     102400      47298      47300      46518      46483      46461      46437      46429      46417
lzt15      34664      10688      10656      10317      10301      10275      10278      10267      10269
lzt16      21504      10055      10053       9960       9966       9959       9952       9948       9952
lzt17      53161      18514      18549      17971      17970      17974      17971      17973      17971
lzt18     102400      63251      63248      59863      59850      59799      59790      59764      59772
lzt19     768771     288872     281959     277661     273316     269157     269141     269133     269132
lzt20    1179702     872315     872022     868088     865376     863236     859727     856408     856369
lzt21     679936      88011      88068      84848      83851      83733      83674      83599      83529
lzt22     400000     284378     284297     281902     279711     279685     279689     279696     279674
lzt23    1048576     804048     804064     802742     801324     799891     798367     798368     798369
lzt24    3471552    1411387    1410226    1404736    1403314    1402345    1401064    1400193    1399197
lzt25    1029744     107882     107414      99839     100154      99710      98552      98132      97320
lzt26     262144     210836     210855     207775     207763     207738     207725     207706     207701
lzt27     857241     235137     236568     233524     228073     223123     222884     222540     222023
lzt28    1591760     308940     295072     286018     276905     273520     269611     264726     260547
lzt29    3953035    1180823    1183407    1180733    1177854    1170944    1162310    1152482    1147160
lzt30     100000     100001     100001     100001     100001     100001     100001     100001     100001

total              10609600   10585703   10494105   10448475   10404719   10375899   10355030   10337879

Some notes : up to 6 steps (the most I can run), more chain steps is better for the sum, but not for all files.
In some cases, more steps is worse, which should never really happen; it's
an issue of approximate optimal parsers that I'll discuss later. (*)

On most files, going past 4 chain steps helps very little, but on some files it seems to monotonically
keep improving. For example lzt29 stands out. Those files are ones that get helped the most by AStar.

Now the effect of quantum size on AStar. In all cases I only output codes from the first 3/4 of each quantum.

             raw        256        512       1024       2048       4096       8192      16384

lzt00      16914       4923       4923       4920       4920       4920       4921       4921
lzt01     200000     198312     198312     198312     198312     198312     198314     198314
lzt02     755121     175242     173355     173368     173315     173331     173454     173479
lzt03    3471552    1699795    1691530    1690878    1690655    1690594    1690603    1690617
lzt04      48649      10243      10245      10234      10249      10248      10241      10241
lzt05     927796     357166     354629     354235     354230     354233     354242     354257
lzt06     563160     346663     343202     343139     343173     343194     343263     343238
lzt07     500000     209934     208669     208584     208566     208556     208553     208562
lzt08     355400     228389     229447     229975     230220     230300     230374     230408
lzt09     786488     266571     265564     265487     265525     265559     265542     265527
lzt10     154624      10701      10468      10330      10291      10273      10273      10272
lzt11      58524      19139      19123      19096      19087      19085      19084      19084
lzt12     164423      23712      23654      23616      23622      23628      23630      23627
lzt13    1041576     923258     922853     922747     922745     922753     922751     922753
lzt14     102400      46397      46364      46351      46350      46350      46348      46350
lzt15      34664      10376      10272      10260      10260      10251      10258      10254
lzt16      21504       9944       9931       9926       9927       9927       9927       9927
lzt17      53161      17937      17970      17968      17970      17969      17969      17969
lzt18     102400      59703      59613      59632      59635      59637      59640      59640
lzt19     768771     269213     269151     269128     269162     269193     269218     269229
lzt20    1179702     855992     855580     855478     855588     855671     855685     855707
lzt21     679936      83882      83291      83215      83184      83172      83171      83169
lzt22     400000     279803     279368     279414     279459     279605     279630     279647
lzt23    1048576     798325     798319     798321     798334     798354     798357     798358
lzt24    3471552    1393742    1388636    1388031    1388105    1388317    1388628    1388671
lzt25    1029744      97910     101246     101302     100175     100484     100272     100149
lzt26     262144     207779     207563     207541     207552     207559     207577     207576
lzt27     857241     222229     220832     220770     220837     220773     220756     220757
lzt28    1591760     256404     253257     252933     252808     252737     252735     252699
lzt29    3953035    1136193    1135442    1135543    1135603    1135710    1135689    1135713
lzt30     100000     100001     100001     100001     100001     100001     100001     100001

total              10319878   10292810   10290735   10289860   10290696   10291106   10291116

The best sum is at 2048, but 1024 is a lot faster and almost the same.

Again, as in the previous note at (*), we should really see only improvement with larger quantum sizes, but
past 2048 it starts going backwards in some cases.

Lastly a look at where the AStar parse is spending its time. This is for a 1024 quantum.

The x axis here is the log2 of the number of nodes visited to parse a quantum; log2=20 means a million nodes
were needed to parse that quantum. So for speed purposes, a cell one to the right is twice as bad. The values
in the cells are the percentage of quanta in the file that needed that many nodes.

(note : log2=20 means one million nodes were visited to output 768 bytes worth of codes, so it's quite a lot)

log2        8      9     10     11     12     13     14     15     16     17     18     19     20     21     22

lzt00       0      0      0  18.18  59.09  18.18   4.55
lzt01    3.75   0.75   41.2  34.08  13.86   5.62   0.75
lzt02    1.81   1.36  25.37  34.09  13.59  13.02   8.15   1.93   0.23   0.23
lzt03    1.46   1.18  17.51  18.46  14.16  13.17   6.95   4.81   3.66   4.81   9.54   2.79   0.96   0.11   0.03
lzt04    1.67      0      0   1.67      0  21.67      5  18.33   3.33     10  16.67  16.67      5
lzt05    0.59   0.25   4.41  10.77   9.92  18.32  13.23  10.09   9.67   6.02  12.47   3.22   0.51   0.08   0.08
lzt06     0.8   0.93   6.81  23.77  14.69  16.96  21.09  11.48   2.67    0.8
lzt07    0.46   0.46   8.66   7.88    6.8   15.3     17  14.53   5.56   9.58   8.19   4.79   0.31   0.31
lzt08       0      0      0      0      0      0   1.68   1.68   1.47  27.67  53.88  11.95   1.68
lzt09    0.29   0.48   0.76   0.86   0.95    3.9  28.07  47.76  16.18   0.38
lzt10       0   0.56  10.17  12.99   9.04   9.04  10.17  41.24   4.52   0.56   1.13
lzt11       0      0   7.89  10.53  14.47  17.11   6.58   9.21  21.05  10.53   2.63
lzt12       0      0      0      0      0   4.27  28.91  59.24   7.58
lzt13       0      0   0.07   0.14   0.57   1.72   3.36   5.72  39.24  42.03   7.08   0.07
lzt14       0   0.83      0    2.5   8.33  34.17   42.5      5    2.5   1.67   0.83      0   0.83
lzt15       0   2.27   4.55  15.91  13.64  15.91  13.64   6.82  11.36  11.36   4.55
lzt16       0      0   3.57      0  14.29  42.86  32.14   3.57
lzt17    1.39   1.39   2.78   1.39   4.17     75  13.89
lzt18       0      0      0      0      0   0.72      0   2.17    2.9  11.59  56.52  23.19    2.9
lzt19       0      0   1.26   2.81   0.39   7.56  87.11   0.87
lzt20       0   0.13   2.08   2.02   4.29  67.07  24.29   0.06
lzt21     0.2   0.78   6.07   6.07   5.28  19.77  35.62   22.9   1.96    0.2    0.2
lzt22       0   0.56   2.98   5.59  26.82  62.94   1.12
lzt23       0      0      0      0      0   0.07   1.35   2.63   0.92  70.88  23.15   0.14   0.36    0.5
lzt24    0.44   0.61   4.14  37.41   7.62  12.68  12.72   8.52   6.11   5.19   3.11   0.94   0.31   0.04
lzt25    0.22   0.43   1.52   1.74   2.68   6.44  15.69  27.19  30.22  13.09   0.72
lzt26       0      0      0   1.15   3.15   2.58  77.65  14.61   0.57
lzt27    0.61    0.1   7.55   6.53   1.22   4.39      5   4.08   7.76   44.8  16.43   1.43
lzt28    0.25    0.1   3.71   0.94   0.74   6.77  15.56  10.08  10.97  14.82  18.68  11.41   4.05   1.24    0.1
lzt29     0.3   0.73   1.61  22.37   5.28   6.16  26.34   2.97   0.48   0.85  19.63  12.47   0.73
lzt30     3.7   0.74  47.41  34.07  12.59   0.74

Well, there's no easy answer; the characters of the files are all very different.

In many cases the AStar parse is reasonably fast (comparable to Chain 3 or so). But in some cases it's
quite slow, eg. lzt04, lzt08, lzt28.

Okay, I think that's all the data. We have one point to discuss :

(*) = in all these types of endeavors, we see these anomalies where, as we give the optimizer more space to
make decisions, it gets better for a while, then starts getting worse. I saw the same thing, but more
extreme, with video coding.

Basically what causes this is that you aren't optimizing for your real final goal. If you were optimizing
for the total output size, then giving the optimizer more freedom should never hurt. But you aren't. With Chain N
or with AStar, in both cases you are optimizing just some local portion, and it turns out that if you let
the optimizer make really aggressive decisions trying to optimize the local bit, that can hurt overall.

A similar issue happens with a Huffman optimal parse, because you are using the Huffman code lengths
from the previous parse to do the current parse. That's fine as long as your parse is reasonably similar,
but if you let the optimal parser really go nuts, it can get pretty far off those statistics,
which makes them wrong, so more optimizing actually gives worse results.
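A toy numeric illustration of that effect (the distributions here are made up; this is the information-theoretic core of the problem, not the actual parser): if choices are priced with code lengths fit to the previous parse's symbol frequencies q, but the parse actually produced has frequencies p, the real cost per symbol is the cross-entropy H(p, q), which is always at least the entropy H(p) you would get with matched statistics. The further the new parse drifts from q, the bigger the penalty.

```python
import math

def entropy(p):
    # bits/symbol with code lengths matched to the true frequencies p
    return -sum(x * math.log2(x) for x in p if x > 0)

def cross_entropy(p, q):
    # bits/symbol when symbols occur with frequencies p but are coded
    # with lengths -log2(q[i]) fit to the old frequencies q
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

q = [0.5, 0.25, 0.25]   # statistics from the previous parse (made up)
p = [0.2, 0.3, 0.5]     # what an aggressive new parse actually emits (made up)

penalty = cross_entropy(p, q) - entropy(p)   # extra bits/symbol, always >= 0
```

With these numbers the parse pays about 1.8 bits/symbol instead of the ~1.49 bits/symbol it could have had, so an "optimization" that drifts this far from the assumed statistics has made things worse.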

With video coding the main issue I had was that the optimization was generally local (eg. just on one
macroblock at a time or some such), but it of course affects the future as a source for motion compensation
(and in other ways), and it turns out that really aggressive optimization of the local decisions
can wind up hurting overall.

A similar thing can happen in image and video coding if you let optimization proceed very aggressively,
because you have to use some simple analytic criterion (such as RMSE - though even if you use a fancier
metric the same problems arise). The issue is that the coder can wind up finding strange states that
are a good trade-off for RMSE, but wind up looking just horrible visually.

Obviously the correct solution is to optimize with the true final goal in mind. But that's not always
possible, either computationally, or because the final goal is subjective.

Generally the solution is to moderate the optimization in some way. You have some heuristic idea of
what kinds of solutions will be good globally (for example, in image/video coding,
you might require that the bit rate allocation not create too big a difference between adjacent blocks).
So you sort of want to guide your optimization to start around where you suspect the answer to be, and then
you tune it so that you don't allow it to be too aggressive in making whatever decision it thinks is
locally optimal.

Nobody likes the idea of protectionism, because it conjures ideas of jingoism, as well as corrupt,
inefficient businesses locking in their market through political deals.
But in fact basically every country in the world still engages in heavy protectionism, in the form
of subsidizing local business in one way or another.

Subsidies can take many forms. Of course there are direct subsidies (Airbus, Boeing, etc.) and these
are generally disliked. There are tax break incentives, which almost every big business in America gets.
Obviously there are tons of small business incentives and employment incentives and so on.

But there are also more subtle and indirect forms of subsidy.

China (and other Asian countries) are big fans of the "business development zone"; these are areas they
construct for certain industries, providing tax breaks and building up infrastructure for power, transport, etc.

Government paid health care is perhaps the biggest subsidy any country offers. It's a general subsidy for
employment (notably, not for business). In the American model, employers pay for health care - only for
employed people. With government paid health care, the employers are still paying for health care (through
taxes) - but they are paying whether they employ the people or not.

To make that more clear - imagine a system where everyone was paid $100k by the government, and then
corporations had to pay taxes to cover that - whether you were an employee or not. Then there's no such thing
as "saving money by laying people off" ; the cost per employee is the same whether you hire them or not,
so you may as well hire them. Obviously it's too much of a market distortion for the government to just
pay all of everyone's expenses whether they are employed or not, but paying some amount regardless of employment
amounts to a subsidy for employment. The more of a person's basic social welfare is paid by the government from the
general tax fund, the more incentive there is to hire people.
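The thought experiment above is really a claim about marginal hiring cost; a trivial sketch with made-up numbers (the wage and health-care figures are purely hypothetical):

```python
# Under employer-paid health care, hiring a person costs wage + benefit.
# Under tax-funded health care, the equivalent tax is owed whether or not
# the person is hired, so it drops out of the *marginal* cost of hiring.

WAGE = 50_000     # hypothetical annual wage
HEALTH = 15_000   # hypothetical annual per-person health care cost

def marginal_cost_employer_paid():
    # hiring adds the wage and the health benefit; not hiring costs nothing
    return (WAGE + HEALTH) - 0

def marginal_cost_tax_funded():
    # the health cost is paid (as tax) either way, so only the wage is
    # the extra cost of hiring
    return (WAGE + HEALTH) - HEALTH

# the tax-funded system makes hiring cheaper at the margin, which is
# the sense in which it acts as a subsidy for employment
```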

So social welfare (health care being the biggest one) is actually a subsidy for local employment.
There are other things that reduce the cost of an employee, such as government child care and good public transit
(which allows you to pay less because employees don't need cars), and so on.

Another big one is education. Of course, subsidized education is a form of
development of a local business resource (people). Some countries have well-developed industry-education partnerships
to provide students with the skills needed.

etc.

I believe that one of the problems in America is that we are in fact engaging in heavy protectionism to this
day, but we are not doing it in a very smart way. Defense contractors probably get the biggest subsidy
(actually I take that back, finance gets the biggest subsidy by a huge huge margin; finance has been getting
around a trillion dollars in direct subsidy while defense only gets a few hundred billion), but
so do aviation, mineral/mining, lumber, and agriculture. These are generally in the form of direct subsidies
(eg. have some free money), which is a very bad way to do it. Direct subsidies tend to go straight into
rich people's pockets; they don't promote employment. Subsidies can be crafted in more clever ways, and the
best way is not to favor one particular industry over another, but rather to favor employment over outsourcing
and let the market decide what type of employment is best.

Of course the US government does heavily subsidize certain behaviors, so the idea of subsidy to affect the market
is not at all exotic. The subsidies for real estate investment are massive; even ignoring TARP and the FM's and
such, the mortgage interest deduction and capital gains exclusion are huge behavior modifiers for very questionable
benefit. The lower capital gains tax (vs. income tax) and dividend tax rate are huge subsidies for investors.
And of course the tax code in general is a huge subsidy for corporations (vs. individuals) and particularly for
multi-national corporations. Why are we massively subsidizing all those things, and not employment and local
business?

1/07/2012

There are some basic economics that I just don't understand, and a lot of the time the
accepted "right answer" conflicts with common sense.

One example is it seems to me that buying something locally made is better for the local area
than buying something made far away. (for "locality" you may substitute "state" or "nation" or whatever
region you want to divide things by).

For me this conjures bad memories of the anti-"Jap" "USA USA" crowd of the '80's that had "buy American"
bumper stickers and such; something in my moral
fiber says you should buy the best-quality, cheapest product. But I don't think that's true.

Consider for the moment the case that there are two products, identical in price and quality. One is locally
made, one is foreign made. I contend it is better for the local area (and usually better for you personally)
to buy the locally made product. When you do that, the money goes to someone who lives nearby, who spends that
money again, and that person spends it, again, etc. This makes the local area prosperous.

(in a purely selfish sense, whether or not making the local area prosperous is good for you or not depends on
the details of your situation; if you are a merchant or an altruist, it is good for you; but if your business
is international and you would prefer local property values to be low, it might be bad for you; we will
assume for the moment that you want the local area to benefit).

So there is some value to buying local and keeping money and industry circulating locally.

So, even if the local product is somewhat more expensive, it still might be better overall if
you bought it instead of the foreign product. You have to weigh both benefits: the region gains some
utility from access to cheaper foreign products, but that is traded off against not circulating that money
around the local economy. That is, there's some break-even point (in terms of overall utility); maybe if the local
product costs 20% more, that's the actual break-even point.
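To make the "circulating dollars" intuition concrete, here is a toy multiplier calculation with entirely hypothetical numbers (the 30% re-spend fraction is made up, and real-world multipliers are far more complicated):

```python
# If a fraction m of each dollar received locally is re-spent locally (and
# a fraction m of that, and so on), one dollar of local purchasing
# generates a geometric series of local activity: 1 + m + m^2 + ... = 1/(1-m).

def local_multiplier(m):
    # total local activity per dollar spent locally; m is the (assumed)
    # fraction of each dollar that gets re-spent locally, 0 <= m < 1
    return 1.0 / (1.0 - m)

# eg. with a hypothetical 30% re-spend fraction, a dollar spent on a local
# product generates about $1.43 of local activity, while a dollar spent on
# a foreign product circulates roughly once and leaves; how that extra
# local circulation trades off against a 20% price premium is exactly the
# break-even question in the text.
extra_local_activity = local_multiplier(0.3) - 1.0
```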

Of course consumers should not have to make that decision themselves; they should just be able to buy the cheapest product.
The correct way to fix that is with government - one of the valuable things that government can do is to make apparent
price equal to actual price (eg. to make price proportional to utility, or to move long-term costs forward, etc.),
or to use laws to bias pricing so that logical purchasing decisions lead to the greatest overall utility (eg. putting
penalties on products that help you but hurt others).

The obvious way to make the prices match utility here is either with tariffs on imports, or subsidies for local production. Calling this "protectionism" is a way of
attacking it, but it seems to me it's just a way of getting the benefit of circulating those dollars locally.

I'm a little disturbed by my conclusion because it's awfully close to the anti-globalization crackpots who claim that modern government
financial policy benefits "wall street not main street" (and other slogans).

Granted, in reality, it's too late to go back to pre-1990's protectionism. The cat is out of the bag. And
of course in reality protectionism degrades into political gifts for corrupt corporations. But we can ignore
those issues for the theoretical discussion.

Also, if you are an extremely altruistic chap you might question the whole goal of maximizing the benefit to
your locality (nation/state/fiefdom/whatever). You might say the goal of policies should be to maximize the
good for the world. But for the moment let's ignore that and assume that the government of a nation should
act to maximize benefit for that nation.