jpauclair (http://jpauclair.net)
Ninjaneering!
“Adobe Gaming Summit” Post-mortem
http://jpauclair.net/2012/08/07/adobe-gaming-summit-post-mortem/
Tue, 07 Aug 2012 13:00:21 +0000

I was a speaker at this summit, both as the owner of TheMiner and as an R&D Architect at Frima Studio, and I wanted to tell you a bit about the event and what I, as a developer, learned there.

But first, you should read Thibault’s blog post to get details about what Adobe learned from the summit (Read This).

This summit was a really good mix between:
Speakers: giving their short/mid/long-term feature requests, filing bug reports, and commenting on better pipeline solutions
Adobe: revealing and explaining lots of “What’s next” in the form of a mini “Adobe MAX Sneak Peeks”

So let’s just follow Thibault’s post structure.
And I know it’s going to be so disappointing for all you guys… but here it is!

1. The workflow

What if you could gain time… lots of it…

What if you could integrate in better ways into your “full pipeline”…

What if your artists could stop complaining? (lol)

2. Performance

What if your compile time dropped drastically…

I’m sure you know how many blogs talk about little AS3 optimizations to make code run faster.

What if they all became… obsolete…

What if the only performance boost left came from better algorithms?

What if you could completely trust who’s making the update?

3. Perception

What if you could really think “Flash is far from dead”

What if you expected to present to 3 Adobe senior engineers, but got 100 engineers in an auditorium, plus a whole lot more online, in an internal event streamed to all the Adobe employees who can make what’s coming next?

What if my own skepticism was yelling: “Fuck, they are really serious about this…”

4. Project “Monocle” (As the owner of TheMiner)

What if the tools they are making could compete with the best of other technologies?

What if you could “trust” the data being output?

What if the next version of the VM were built with a lot of telemetry from scratch?

5. Post-Mortem

I guess the most important thing here is commitment.

I have worked on Flash projects and R&D for a very long time now. And all of a sudden, they just want to raise the bar to a level you would not expect.

This is a very exciting time for me as, suddenly, I feel like I’m playing a key role in what Flash gaming “has been” in the past, and certainly in what Flash “will be” in the future.

I won’t talk about other technologies here, but I am now 100% confident that this blog is not going to die anytime soon.

And if clients come up and ask “what technology should we use”, I’ll be happier than ever to answer.

6. Frima Icefield 3D – World Editor Trailer

This is a video made from a couple of things we showed there. (it’s available in 720p)
*please, do not blink, and crank up your speakers!*

Thanks for reading, and I hope this all makes you really happy.

I just want to give a special thanks to Thibault and the other Adobe folks who made this all possible.

Most of the time, Flash performance issues come from bad memory management.
Object instantiation takes a lot of time, and since the Flash Player works with a garbage collector, having tons of objects can make it go nuts.
Good practice is to remove all instantiation from loops, keep a minimum of objects in memory, and use serialization and ByteArrays to keep some data.
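The “remove all instantiation from loops” advice is usually implemented with an object pool. Here is a minimal sketch in JavaScript (the original context is AS3, where the same pattern applies; the `Pool` class and its API are mine, not from the post):

```javascript
// Minimal object pool: reuse instances instead of allocating in a loop,
// so the garbage collector has far less work to do.
class Pool {
  constructor(factory) {
    this.factory = factory; // creates a fresh object when the pool is empty
    this.free = [];         // released instances waiting for reuse
  }
  acquire() {
    return this.free.length > 0 ? this.free.pop() : this.factory();
  }
  release(obj) {
    this.free.push(obj);    // hand the instance back instead of dropping it
  }
}

const points = new Pool(() => ({ x: 0, y: 0 }));
const p = points.acquire(); // allocates, since the pool starts empty
p.x = 10;
points.release(p);
const q = points.acquire(); // reuses p: no new allocation, no GC pressure
```

The hot loop then only calls `acquire()`/`release()`, and the allocation count stays flat no matter how many iterations run.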

This profiler shows you what the VM reports as objects being allocated and removed (garbage collected).
Here is what the profiler looks like:

1.Search filter

Since there are a lot of different object types managed by the VM,
just listing them all would make it hard to find specific data.
This input box lets you enter any text to filter the visible objects by their QName.

When the search filter is ON (there is text inside the box), the box appears green,
and only class names containing the text appear in the list.

2.Class QName

The QName is the ClassName with its full ClassPath; the format is ClassPath::ClassName.

3.Avg new / frame

When profiling memory, a high number of instantiations can be hard to interpret.
An average instantiation count per frame lets the developer understand the exact amount of allocation in a single frame.

4.Sample snapshot

TheMiner cannot guess all the post-analysis you want to do.
But it does offer an easy way to save data in a grid format that can be pasted directly into Excel.
If you need to save the current state, or build graphs from the data in the profiler,
you can use the snapshot button to copy all classes, with all their numbers, to the clipboard.
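Clipboard grids that paste cleanly into Excel are typically tab-separated text. A sketch of the idea in JavaScript (the column names and sample values here are made up for illustration, not TheMiner’s actual grid):

```javascript
// Build a tab-separated snapshot of per-class counters; each row pastes
// into Excel as one spreadsheet row, each \t as a column break.
const samples = [
  { qname: 'flash.display::Sprite', current: 12, newPerSec: 3, deletedPerSec: 1 },
  { qname: 'Box2D::b2Mat22', current: 240, newPerSec: 60, deletedPerSec: 58 },
];

const header = ['QName', 'Current', 'New/s', 'Deleted/s'].join('\t');
const rows = samples.map(s =>
  [s.qname, s.current, s.newPerSec, s.deletedPerSec].join('\t'));
const snapshot = [header].concat(rows).join('\n'); // ready for the clipboard
```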

5.Clear all samples

Resets the current instance count and cumulative instance count for all classes.

7.New per second (default refresh-rate)

Number of classes instantiated in one second. The default refresh rate can be changed in the configuration tab.

8.Deleted per second (default refresh-rate)

Number of classes collected by the GC in one second. The default refresh rate can be changed in the configuration tab.

9.Sort by Current Instance Count

Sorts the list by current instance count.

10.Current Instance Count

The current instance count is calculated by subtracting collected samples from new samples.
This number is significant only when the memory profiler’s continuous profiling is turned on in the configuration tab,
because all sample reports must be processed in order to get an accurate count.

Released: TheMiner 1.4.00

I’ll skip the usual “we fixed bugs” part, and go straight to what’s new and interesting!

Here is a quick overview of the biggest additions:

UserEvents

A UserEvent is an event created by the developer in the application; it communicates with TheMiner and shows up in a dedicated tab (the UserEvent tab).
The event can be pushed to TheMiner in two ways:

Better Filtered content

You can now toggle a Permanent Search Filters option in the configuration tab.

By doing so, any text you enter during a session inside one of the search boxes (MemoryProfiler, PerformanceProfiler, LoaderProfiler, UserEvents) will remain active after you close the application, and will be reloaded when you start a new session.

This way, when you do iterative work on optimizing your application, you don’t have to lose time remembering and re-entering the same info over and over again.

The Permanent Search Filters are also used with Raw Data Dumps, which makes it a lot easier to track down precise information in those huge lists of info.

Average per frame

Have you ever tried to find out precisely how long your functions run?

The Performance profiler now offers a way to compute the ratio of time taken by a function over a period, which gives you a figure that can be expressed in microseconds!

What do we have here….

Each frame, around 23 microseconds are spent in b2Mat22/Solve function calls,

and 0.1 milliseconds in b2Mat22/Set.

Do you have a Performance budget? This is the tool you need!
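The trick behind those microsecond figures is simple averaging: even if a per-frame timer only resolves whole milliseconds, accumulating over many frames and dividing recovers sub-millisecond averages. A JavaScript sketch of the idea (the function names are mine, not the profiler’s):

```javascript
// Accumulate total time spent in a piece of work across many frames,
// then divide by the frame count to get a microsecond-scale average.
let totalMs = 0;
let frames = 0;

function profileFrame(work) {
  const t0 = Date.now();
  work();
  totalMs += Date.now() - t0; // whole-millisecond resolution per frame
  frames += 1;
}

// Simulate 100 frames of a cheap call (think of the b2Mat22/Solve case above).
for (let i = 0; i < 100; i++) {
  profileFrame(() => Math.sqrt(2));
}

const avgMicrosPerFrame = (totalMs / frames) * 1000; // ms -> microseconds
```

A single frame of a 23 µs function reads as 0 ms, but over thousands of frames the accumulated total divided by the frame count converges on the true per-frame cost.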

Last but not least: The Resources Grabber

Do you link directly to JPGs and PNGs?

Do you load SWFs at runtime? Encrypted ones?

The Resource Grabber feature (in the Loader profiler) can expose, and redirect you directly to, files being downloaded and SWFs being loaded by the VM.

If it’s a simple media file, clicking an icon will pop up a window and show you the file in your browser.

If it’s a SWF, it will ask you to save the SWF bytes to a file somewhere.

The file is the unpacked, unencrypted version of any SWF.

This means you can pass it directly to a decompiler to check its content.

TheMiner

For those who already use TheMiner, you know that these are only small additions compared to what’s already in it.

I suggest you give it all a try to see if you like it! (you will)

fastSort, faster is better!
http://jpauclair.net/2012/03/12/fastsort-faster-is-better/
Tue, 13 Mar 2012 02:00:49 +0000

Today, Jackson Dunstan posted about how to use a profiler to get better performance in Flash.
For this post he decided to show how to use TheMiner… awesome!

At first I was impressed by Skyboy’s result. Then I realized two things.
First, the Flash native sort REALLY doesn’t like Number.POSITIVE_INFINITY and NEGATIVE_INFINITY.
So when sorting a Vector with these values inside, it gets a LOT slower.
Where a standard vector could take 100 ms to sort, one with infinity values in it can take up to 2000 ms!!
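One way to sidestep that slowdown, independent of fastSort’s internals, is to partition the infinities out before sorting and stitch them back afterwards. A sketch of that workaround in JavaScript (this is my own approach, not the code from the post):

```javascript
// Sort a numeric array while keeping +/-Infinity out of the comparison
// work: count them, sort only the finite values, then reassemble.
function sortWithInfinities(values) {
  const finite = [];
  let negInf = 0;
  let posInf = 0;
  for (const v of values) {
    if (v === Infinity) posInf += 1;
    else if (v === -Infinity) negInf += 1;
    else finite.push(v);
  }
  finite.sort((a, b) => a - b); // numeric sort sees only finite values
  return new Array(negInf).fill(-Infinity)
    .concat(finite, new Array(posInf).fill(Infinity));
}
```

For example, `sortWithInfinities([Infinity, 3, -Infinity, 1])` returns `[-Infinity, 1, 3, Infinity]`; the infinities cost one pass instead of pathological comparator behaviour.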

The other thing is that fastSort uses untyped (*) values everywhere.
So if we go back a few days to my previous (epic) article, we know that this is a really bad thing when casting from * to Number.
It allocates a LOT of memory (5 MB per sort on 50K elements).

So I decided to add a few hundred lines to this already-big class (fastSort).
I added specific sorting methods for int, uint and Number that handle only typed values.

Finally, just before giving you the code, I want to invite you to a new little forum that focuses on performance, optimization, debugging and other hardcore Flash subjects.
Many of the most hardcore devs and bloggers I know are already in, or are going to join soon, so please be part of it and enjoy posting refreshing, brain-teasing content!
The Hardcore Flash forum

Epic Flash memory leak track down
http://jpauclair.net/2012/02/25/epic-memory-track-down/
Sun, 26 Feb 2012 03:31:15 +0000

Have you ever had a memory instantiation problem that was impossible to track down? Here is a post that might help you with that kind of thing!

Beware! This post is very long… but VERY instructive! If you want to learn some of Flash’s internal mechanisms, I strongly suggest you read it from top to bottom without skipping parts! :)

After finding the cause, we can conclude that this is not a real leak but an irritating behaviour of the Flash VM. Still, you should all be aware of it!

The Context

Yesterday I was helping Luca (creator of the Nape physics engine, made with haXe) to find what seemed to be a big memory leak. There were a couple of frameworks in use, so it could have come directly from Nape… or from Starling… or from a debug tool running… or from a haXe porting error…

So I thought it was going to be simple.

I got TheMiner running and started profiling. It was pretty easy to find a LOT of memory allocation coming from Starling. The available SWC version is over 4 months old, and back then there were a lot of useless instantiations coming from TouchProcessor.as.
If you use Starling, I suggest you build from source, as these instantiations are gone now.
There are a couple left, but no big deal.

We also saw that Nape used a LOT of anonymous function calls and some try/catch blocks. These create what are called activation objects, and there were a lot of them. So by removing the anonymous calls and try/catch we were able to remove a lot of instantiation.

And after removing the debugging tools, it was clear the problem was coming from haXe or Nape.

But even when recording all samples allocated by the VM, I could not find the damn allocation. (We are talking more than 1 MB per second.)

Then I wrote a Scala script to try to identify samples that could have been missed, but that didn’t help much, since it was only an override of the construct and constructprop AVM opcodes.

Simplifying the context

So I decided to look a bit further into the bytecode and found something very interesting.

After simplifying the problem multiple times, I got this very simple code to demonstrate the bug.
Because Nape has a very nice abstraction layer, it can support both the standard DisplayObject and the Starling 3D DisplayObject.

The leak happens when Nape tries to set DisplayObject properties like .x, .y, .rotation, etc.
Let’s create a fake DisplayObject class and name it DO:

public class DO
{
    public var x:Number;
    public var y:Number;
    public function DO() { }
}

And now, a simple loop that sets the .x and .y values on that DisplayObject, both directly and through an untyped object representing the abstraction layer.

So basically we have the same DO object in memory, and we access it via a typed or an untyped variable.
When accessing it with the typed one, nothing happens; memory is not impacted at all.
But when accessing it through the anonymous object: BAM, 1 MB/sec. WTF?
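The loop itself was shown as an image in the original post. Here is a hypothetical reconstruction, rendered in JavaScript for readability (the real code is AS3; `mDO`, `mAnonDO` and `number1` are the names the post uses later, and the allocation comments describe the AVM2 behaviour being discussed, not JavaScript’s):

```javascript
class DO {
  constructor() {
    this.x = 0; // AS3: public var x:Number
    this.y = 0; // AS3: public var y:Number
  }
}

const mDO = new DO(); // AS3: strictly typed reference (var mDO:DO)
const mAnonDO = mDO;  // AS3: untyped alias (var mAnonDO:*), SAME instance

function update() {
  const number1 = 0.5;
  for (let i = 0; i < 10000; i++) {
    mDO.x = number1;     // typed write: no allocation in the AVM2 JIT
    mAnonDO.y = number1; // untyped write: the AVM2 boxes the Number (atom allocation)
  }
}
update();
```

Both references point at the exact same object; the only difference the AVM2 sees is the declared type of the variable the write goes through.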

The ByteCode:

So let’s take a look at the bytecode generated for this loop in the update() function:

I don’t know if you see any difference between the anonymous version and the strictly typed one, but to me they look pretty much the same: getlocal0, getProp, getlocal2, setProp.

Again, when we comment out “this.mDO.x = number1;”, there is a lot of memory allocation, and when we comment out the other one, “this.mAnonDO.y = number1;”, there is zero allocation.

A strange behaviour

So if it’s not in the opcodes… it’s deeper! For those who have read the AVM2 architecture documentation, you might remember that after the opcodes there is still a big phase to go through: the JIT, including intermediate representation and machine code assembly.

Intermediate representation (MIR)

As shown in the graph, the intermediate representation is part of the JIT.
And it looks pretty much like this:
When the abc is ready to be processed, the JIT compiles it into the MIR, and then the result is processed for your specific machine (Win32, Mac, Linux, etc.) into machine-dependent code.

It’s the same setproperty opcode, but with a LOT of things underneath…

The difference is really that one variable is strictly typed, and the other is not.

We can see it on the Stack:

stack: DO?@86 Number@87

vs

stack: *@95 Number@96

No need to say that just by looking at the number of instructions, you KNOW it will be a lot slower.
But not only that… you can see type validation, memory allocation (@109 alloc 16),
and even access to the VTable: @118 cmop Toplevel::toVTable (@116, @117)

So I thought *bingo*, there is our allocation!

*sigh*… It’s not over yet!

I knew that the memory growth was not a problem when using int instead of Number.
So I tried to output the same MIR code for the int test:

Conclusion

At runtime, when you use an anonymous object, the JIT has no idea what type it is; when you need to set a property on one of these objects, it needs to validate the object and instantiate a new atom to copy the value.
That’s not the case when you use a strictly typed object: when it’s ready to set the value, it just does!

Solutions?

I didn’t find any great way to solve this problem other than using strictly typed objects.

Going back to Nape, it means that it cannot use the abstraction layer as it was designed.
The only way is to expose generic management for the classic DisplayObject, plus overridable functions that give an external framework (like Starling) the possibility to update properties in a strictly typed way.
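That design can be sketched as the engine calling a user-supplied, strictly typed updater instead of poking properties through an untyped reference. A JavaScript illustration (all names here are hypothetical, not Nape’s actual API):

```javascript
// The engine never touches the display object directly; the framework
// integration registers a callback that does the strictly typed writes.
class Body {
  constructor() {
    this.posX = 0;
    this.posY = 0;
    this.rotation = 0;
    this.updater = null; // set by the rendering framework integration
  }
  step(dx, dy) {
    this.posX += dx; // toy integration step
    this.posY += dy;
    if (this.updater) {
      this.updater(this.posX, this.posY, this.rotation);
    }
  }
}

// Starling-side glue: the closure only ever sees a concretely typed sprite,
// so in AS3 these property writes would never go through an untyped (*) path.
const sprite = { x: 0, y: 0, rotation: 0 };
const body = new Body();
body.updater = (x, y, r) => {
  sprite.x = x;
  sprite.y = y;
  sprite.rotation = r;
};
body.step(3, 4);
```

The engine stays framework-agnostic, and each backend keeps full static typing on its own display object.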

Updated the optimized Base64 library
http://jpauclair.net/2012/01/12/updated-the-optimized-bas64-library/
Fri, 13 Jan 2012 02:35:06 +0000

More than two years ago, I made a blog post about how to optimize the existing Base64 libraries.

This library was widely linked and used in multiple projects, as it’s 100% free (MIT license).

A few days ago I decided to take another look at it, just for fun, to see if I could get it to be a bit faster.

Here are some changes I made to make it even faster:
One of the biggest changes was to go from bytearray.writeInt() to direct byte access: bytearray[]
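To illustrate the direct-indexing idea, here is a JavaScript sketch of a Base64 encoder that writes each output character straight into an array by index instead of going through a stream-style writer (this is an analog of the approach, not the AS3 library’s actual code):

```javascript
const CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode an array of bytes (0..255) to Base64 using direct indexed writes.
function base64Encode(bytes) {
  const out = [];
  let o = 0;
  const full = bytes.length - (bytes.length % 3);
  for (let i = 0; i < full; i += 3) {
    // Pack three bytes into 24 bits, then emit four 6-bit characters.
    const n = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
    out[o++] = CHARS[(n >> 18) & 63]; // direct write, no stream overhead
    out[o++] = CHARS[(n >> 12) & 63];
    out[o++] = CHARS[(n >> 6) & 63];
    out[o++] = CHARS[n & 63];
  }
  const rem = bytes.length % 3;
  if (rem === 1) {
    const n = bytes[full] << 16;
    out[o++] = CHARS[(n >> 18) & 63];
    out[o++] = CHARS[(n >> 12) & 63];
    out[o++] = '==';
  } else if (rem === 2) {
    const n = (bytes[full] << 16) | (bytes[full + 1] << 8);
    out[o++] = CHARS[(n >> 18) & 63];
    out[o++] = CHARS[(n >> 12) & 63];
    out[o++] = CHARS[(n >> 6) & 63];
    out[o++] = '=';
  }
  return out.join('');
}
```

For example, `base64Encode([77, 97, 110])` (the bytes of "Man") returns `"TWFu"`. In the AS3 version the same idea applies to a ByteArray: indexed byte stores avoid the per-call overhead of `writeInt()`.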

Be an affiliate Miner!
http://jpauclair.net/2012/01/12/be-an-affiliate-miner/
Fri, 13 Jan 2012 00:47:58 +0000

I just wanted to let you know that TheMiner remains 100% free for all non-commercial use, but if you like the product and want to help promote TheMiner, you can become a sales affiliate very easily!
It’s as simple as:
-Put TheMiner icon on your website… or blog… or forum… car painting… cat clothes…
-Earn money if people click and buy the PRO version of this awesome software.

The king is dead. Long live the king!
http://jpauclair.net/2011/12/12/the-king-is-dead-long-live-the-king/
Mon, 12 Dec 2011 23:42:17 +0000

The king is dead:

Yes… today I have some bad news for you. One of the best Flash performance analysis tools, FlashPreloadProfiler, is officially dead. After a LOT of time developing this profiler for Flash, I decided to kill the thing.

Long live the king!

YES! After this much effort, it would be too sad to drop everything, right?
Launching today, TheMiner is the new Flash profiling solution that takes the place of FlashPreloadProfiler, but with a lot of optimizations, innovations and new features!
TheMiner is dedicated to every single hard-working developer who makes the most out of what they have by analysing and improving Flash applications. Every hardcore developer should be proud to be a miner, and as such, we decided to make it look like a hero worker.