The Apple of Eden
http://appleofeden.de-doc.com
Make retro-gaming great again.
Last updated: Sun, 05 May 2019 12:15:50 +0000

What’s coming next for Classic REbirth
http://appleofeden.de-doc.com/index.php/2019/03/31/whats-coming-next-for-classic-rebirth/
Sun, 31 Mar 2019 14:49:10 +0000

As most of you following on Twitter already know, I’m updating the RE1 version of Classic REbirth with all those features that were supposed to kick in. Hit on that jump button to read the whole thing!

Like the news intro sort of implies, Hardware Mode is finally functional and Software Mode is nothing more than a relic of the past (it doesn’t even work properly anymore, it’s just hidden under the carpet). If you’re not familiar with Hardware Mode, this is what it does: the game is rendered at a higher resolution (640×480), all the fancy hardware acceleration features like texture filtering are supported, and texture warping artifacts are fixed. I took it from there and improved it further by adding the same kind of treatment that Resident Evil 2 SourceNext got: even higher resolutions, stability fixes, modern Windows support, and optional improvements for major issues.

In order to work smoothly on this new revision of the DLL I had to make one very important choice: what version to support. This choice fell on the Japanese MediaKite print, which happens to be the latest release of the game, but it’s also completely in Japanese. How do we fix this situation for those who can’t read Japanese and want to fully experience the game? Well, the Classic REbirth DLL does some magic behind the curtain and translates the whole game for you, just like the RE2 version does. So, in other words, the European / USA version with the “NEWEUR” fix becomes useless and isn’t supported anymore; remember that before you try the new patch and be sure to already have a copy of the MediaKite print around. I’ll probably rework the project page and drop useless stuff to make it more straightforward.

You can always find a list of upcoming features in either version of Classic REbirth in their respective change log pages (quick links: CR1 and CR2) and if you are a backer on Patreon you can get pre-release or work in progress builds.

At this point Classic REbirth has become quite a huge project of mine and it takes large amounts of time to develop and maintain. With my recent switch to a new job, working on this, or even branching into other classic games (e.g. Resident Evil 3, Dino Crisis 1&2), becomes harder to fit in the schedule. Basically this project needs your help to keep up at a good pace (i.e. I spend less time on my job to pay bills, food, and whatnot), otherwise it’s likely going to slow down like a slug at some point due to no more spare time / little energy to work on it.

ARE YOU STILL GOING TO MAKE PUBLIC RELEASES?

Absolutely, don’t even think about me locking this behind a paycheck. The project’s free, it was from day one and always will be – you will never have to pay a dime to enjoy classic games on modern Windows. Patrons do get a few benefits if they pledge (like early access to test builds, and a nice archive with previous versions), but everyone else gets a slice of the polished builds, as usual.

IS THIS EVEN LEGAL?

Technically it is as legal as any improvement project out there, like the RE4HD texture packs (which are also backed via Patreon). No Classic REbirth patch messes with DRM, nor does any of them include any piece of the original game. All they do is provide a quick and painless solution to make the games run with a simple drag & drop of one file that does all the hard work for the users.

IF I WERE TO BACK YOU UP, WOULD MY PRIVACY BE PRESERVED?

The short answer is: yes. The longer answer is: consider Patreon as a proxy server that shields you from spreading any personal information. You can make any of your pledges completely private; all that external people would see is just a number, not even the backers’ nicknames.

SO WHERE CAN I BACK YOU UP?

This is where my Patreon page is for Classic REbirth: https://www.patreon.com/classicrebirth
Don’t feel obligated to pledge a penny if you don’t want to, but if you can help the project, it will be immensely appreciated.

And here we go with an xmas release for Resident Evil 2. I have been meaning to write an entry about Classic REbirth 2 for the longest time, but I guess most people found their way to the download in the menu bar.

Not really much to say about this release, the change log should make it clear how far along this got, but something that has to be mentioned is the global censorship disable switch inside the configuration. I was asked to implement this quite a few times, as the game already has an uncensored mode in the form of Arrange Mode. Still, this setting will make happy those who find it stupid that Capcom censored Original Mode death cutscenes.

Without further ado, click on the big blue download icon to visit the download page for Classic REbirth 2.

Resident Evil 1 PC, a fix to the classic version
http://appleofeden.de-doc.com/index.php/2018/07/04/resident-evil-1-pc-a-fix-to-the-classic-version/
Wed, 04 Jul 2018 14:49:06 +0000

If you’re looking for a fix to this ancient port, click on the jump button and get ready for fun times!

So, where to start? There’s a lot to say about this version of Resident Evil.

Released around 1997, this was yet another version of the game that received many prints: Japan, USA, Asia (an almost carbon copy of USA), and various European countries (UK, Germany, and probably France too). Some of these received reprints, update patches, or even separate releases altogether like the legendary PowerVR edition. With each version you get different quirks and hardware supported. For example, the original USA version only explicitly supports accelerated cards (despite having two hidden software “debug” rendering engines), PowerVR would only work with a card of the same name, and Japan received the most updated patch (1.01, Europe being the second one with 1.00c) that consisted in… well, no idea to be honest what it does, but speedrunners love that one.

Now, this port of the game is ancient, like Win95 stuff, which means there will be little chance for it to run on modern Windows, unless whoever ported this programmed a miracle into it… aaaand unfortunately they didn’t. In most cases you can try and boot the game with Win95/98 compatibility mode, but that raises many issues and still leaves a ton more to be solved. For example, the game runs at insane speeds (despite having 100% CPU0 spikes), sometimes even causing cutscene bugs that lead to the game soft-hanging (i.e. it still works but characters are stuck doing nothing). Other issues are incompatibilities with hardware rendering support, tho in some cases you can fix that with a few DLLs (dgVoodoo seems to be the best at this, from my personal tests). And of course, no support for modern input devices, despite XInput controllers being partially compatible (no shoulder triggers or D-Pad support for XBox 360/One controllers tho).

Given all these issues, I decided to come up with my personal solution to all of this, by trying to program the One DLL to Rule them All. It is a replacement for DDRAW.DLL, which is Microsoft’s driver for DirectDraw, now a mere compatibility workaround that most of the time just doesn’t cut it. So, what does my DLL do exactly?

It lets you run the game without resorting to Win95/98 compatibility mode (i.e. it runs fine on Win7 and Win10 — still hasn’t been tested on Win8, but it should work there as well). This accidentally fixes the CPU0 100% usage spikes, too.

It fixes the framerate for good, even when cutscene audio is being played. The game is intended to run at either 30 or 60 frames per second, depending on which section of the game is currently active (main game runs at 30, menus and door animations at 60), and this patch finally delivers that.

If the application forced the whole Desktop to go into 16 bit mode (i.e. disable Aero on Win7), it doesn’t anymore due to the integration of DXGL as a DirectDraw wrapper. This makes the game effectively run with OpenGL 2.0 used as a wrapper for all DirectDraw calls. Beware tho, if you’re using OBS to record footage of this you might have trouble. It also seems to have issues when you move the application window to a secondary monitor, effectively freezing the video in place.

XInput controllers become fully compatible. You can use the D-Pad or Left Stick to move around, and even shoulder triggers work as intended.

No more abuse of the Windows Registry to store configuration data. It has all been moved to a nice INI file that can be manually edited, effectively making the game portable without having to install it.

Video files actually load from your installation drive, rather than from CD. I’m not sure if this was intended to be some sort of anti-copy measure so that you would have the game disk inserted at all times, but it doesn’t really matter since it’s fixed.

Now, with all these features you might be wondering what’s left to do. Well, there are a few things that I need to implement in order for this to be more or less an actual definitive fix to the game. Here’s what I’d like to have in future releases:

An entire replacement of the software renderer, which is currently the only one to work out of the box. I’m currently looking into locating and rewriting the necessary code, so this is probably just a matter of time.

Support for higher resolutions and a nice configuration dialog that opens up when you first boot the game with the DLL.

A real plug & play solution, since you still need to edit CONF.INI that comes with the DLL to reflect the current path where the game is being executed from.

Support for more if not most versions of the game. Currently it only supports 1.00c European, also known as NEWEUR.EXE (shouldn’t be too hard to find, it’s usually labeled as an “x64 fix”).

Release the full source, tho this isn’t an actual feature and only programmers will be interested.

I guess that’s all for this update. You can find the fix download at the following page.

Intro coding and CD-rom usage
http://appleofeden.de-doc.com/index.php/2016/06/21/intro-coding-and-cd-rom-usage/
Tue, 21 Jun 2016 16:22:59 +0000

Let’s go back to the old style and use this article’s topic to wrap some useful code. Start developing after a leap of faith.

This article is going to take the approach of famous 90s code samples, generally known as cracktros (crack+intro/s). These introductions are nothing more than a boot loader showing a few fancy effects on screen, credits, and usually an annoying tune to haunt your brain cells. We will keep those fancy effects and drop the rest, especially the freaking music, but we’ll also add some way to load data from disk rather than embed everything into the exe like in the previous demos.

Since we already discussed graphics more or less in detail in the previous articles, we should focus more on how the CD-rom unit works on PlayStation. Technical data on hand, the CD-rom works with two speed modes, which roughly translate to 150KB/s for single speed and 300KB/s for double speed. Most games rely entirely on double speed for loading data and streamed playback, but you might also want to use single speed in some cases. We will skip single speed for now as it’s not required for this very demo.
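
To put those figures into perspective, here is a rough host-side sketch of the arithmetic. It assumes 2048 bytes of user data per sector and 75 sectors per second at single speed (which is where 150KB/s comes from), and it ignores seek time entirely; the function names are made up for illustration.

```c
#include <stdint.h>

#define SECTOR_SIZE     2048u  /* user data bytes per sector */
#define SECTORS_PER_SEC 75u    /* sectors delivered per second at single speed */

/* Number of sectors needed to hold `bytes` bytes, rounded up. */
uint32_t sectors_for(uint32_t bytes)
{
    return (bytes + SECTOR_SIZE - 1u) / SECTOR_SIZE;
}

/* Rough transfer time in milliseconds, seek time not included.
   `speed` is 1 for single speed and 2 for double speed. */
uint32_t transfer_ms(uint32_t bytes, uint32_t speed)
{
    return (sectors_for(bytes) * 1000u) / (SECTORS_PER_SEC * speed);
}
```

With these assumptions a 150KB file takes about a second at single speed and half of that at double speed, before any seeking is taken into account.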

Getting to code, PSY-Q comes with two libraries for handling CD reads, one with the obvious name LibCD while the extended lib is called LibDS (no idea why DS, but whatevs). The main difference between these two libraries is that LibCD provides the most basic/low-level access to the hardware. On the other hand, LibDS is an abstraction layer built on top of LibCD, which provides a number of handy functions for parallel request processing. This demo will explain how to operate LibCD, just to keep it simple enough; we don’t really need that much complexity for a simple cracktro, right?

Let’s jump into some actual code with a procedure to initialize the system with LibCD taken into account:

There is a precise place for LibCD initializers as they tend to reset the SPU volume environment, which is why I left in there some SPU code even if the demo isn’t going to use it at all.

Before we move to some more code about loading data from disk, there is a little something to discuss first about LibCD and its limitations: beware if you are thinking of using the CD table of contents to locate all your files. This is because LibCD has a bug where a directory can’t be fully cached, and that causes the functions relying on name lookup (i.e. CdSearchFile()) to miserably fail on entries that are too far away to reach. To explain this technically, the libs only cache one sector of directory data, then they stop and ignore whatever is stored in the following sectors. On top of this, the functions that look for names on disk can be a bit slow, especially if you use complex paths and try jumping around the disk a lot to grab some files (storing file positions in advance may work, but it’s still quite slow).

How do we solve limitations and performance issues at the root? We create a Virtual File System! Most of the time a VFS consists of a big file with all your data stored inside it, possibly with a header large enough to store all information on how to find your data on disk. There are other cases where the VFS is nothing more than LBA+size values stored somewhere in a hidden location of the disk, which the game knows exactly how to access and caches at boot to have instant access at any time. In either case, performance is preserved and you can address as many files as you need, even pull a few tricks if you can store separate tables to be loaded on demand to save some memory.

We will code the approach with a big file with allocation tables at the top and all user data following. We can try the faster approach, where we read files by calling them as enumerated entries, or with an alternative but slightly slower approach by names using hash tables. For the sake of keeping it simple, let’s see how the former method works. This would be a structure useful for the task:

typedef struct tagPackHeader
{
u32 count, // how many files we have in the package
sectors; // number of sectors used by the header
} PACK_HEADER;
typedef struct tagPackFile
{
u32 lba, // relative position on disk
size; // size of this entry
} PACK_FILE;

Now to decide on how addressing and reserved space should work: by implementing these structures, a package could store a number of files, even if typically the best solution is to store it all into a sector (i.e. 2048 bytes), which gives us exactly 255 files for a package. Alternatively we could actually use the sectors member of PACK_HEADER to read more sectors from the header and quickly bump up that max count to whatever we want. As for the actual handling of this data, what we want to do for starters is to search the file on disk with CdSearchFile() and keep its logical position. From there we cache one sector of the header and check how many other sectors are left, then we repeat the operation until all sectors are cached. With this out of the way, comes another decision: we can leave the allocation data as is or transform lba addresses into absolute values; in the latter case we simply add the package lba to all the entries in the header, otherwise we handle local lba with a reference value from the package. I would suggest the processing approach, since it avoids further remangling of data at run-time and tends to be less bloated.
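
A minimal host-side sketch of the two calculations discussed above, using the same structures. The helper names pack_max_files() and pack_make_absolute() are hypothetical and not taken from the demo source; the actual disc access is left out on purpose.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

#define SECTOR_SIZE 2048u

typedef struct tagPackHeader
{
    u32 count,   /* how many files we have in the package */
        sectors; /* number of sectors used by the header */
} PACK_HEADER;

typedef struct tagPackFile
{
    u32 lba,     /* relative position on disk */
        size;    /* size of this entry */
} PACK_FILE;

/* How many PACK_FILE entries fit in a header spanning `sectors` sectors. */
u32 pack_max_files(u32 sectors)
{
    return (u32)((sectors * SECTOR_SIZE - sizeof(PACK_HEADER)) / sizeof(PACK_FILE));
}

/* Turn package-relative lba values into absolute ones, once, right after
   the header has been cached. pack_lba is the position of the package itself. */
void pack_make_absolute(PACK_FILE *files, u32 count, u32 pack_lba)
{
    u32 i;
    for (i = 0; i < count; i++)
        files[i].lba += pack_lba;
}
```

One sector leaves room for (2048 - 8) / 8 = 255 entries, which matches the figure above; two sectors would give 511.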

Cd_read() works in blocking mode, which means the console can’t execute any operation in the background while it’s loading from disk. Of course it is possible to use non-blocking code to achieve parallelism, but I’ll keep that for a more complex article; all this sample needs is just some basic load code with byte-precise reads (instead of reads padded to sectors).
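
The byte-precise part can be sketched on a host machine as follows. This is not the demo’s Cd_read(): the drive is replaced by an in-memory array, and read_sectors() is a hypothetical stand-in for whatever blocking sector read you build on top of the CD library.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 2048u

/* Stand-in for the drive: a small in-memory "disc" (host only, hypothetical). */
uint8_t fake_disc[8 * SECTOR_SIZE];

/* Hypothetical blocking sector read; on hardware this would wrap the CD library. */
void read_sectors(uint32_t lba, uint32_t count, void *dst)
{
    memcpy(dst, &fake_disc[lba * SECTOR_SIZE], count * SECTOR_SIZE);
}

/* Read exactly `size` bytes starting at sector `lba`, never writing past dst + size. */
void read_bytes(uint32_t lba, uint32_t size, void *dst)
{
    uint8_t  scratch[SECTOR_SIZE];        /* holds the last, partial sector */
    uint32_t whole = size / SECTOR_SIZE;  /* sectors that fit entirely */
    uint32_t tail  = size % SECTOR_SIZE;  /* leftover bytes in the last sector */

    if (whole)
        read_sectors(lba, whole, dst);
    if (tail) {
        read_sectors(lba + whole, 1, scratch);
        memcpy((uint8_t *)dst + whole * SECTOR_SIZE, scratch, tail);
    }
}
```

Whole sectors go straight to the destination, while the last partial sector passes through a scratch buffer so nothing after the requested size gets overwritten. On the console you would probably make the scratch buffer static rather than keep 2KB on the stack.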

I think this is all you need to start using the CD-rom in a basic manner. Just in case, I added to the source code a packer that works in dual mode: one to create BIGFILE.PAK with the structure above, the other to merge multiple files into one entry of BIGFILE.PAK. The merger mode is useful if you need to load several small files in a row without unnecessarily killing the CD with a million seek to position requests. As for creating the ISO with all your files, you can use Pixel’s CD-tool or PSx CD-Gen.

Download source

3D graphics and controller handling
http://appleofeden.de-doc.com/index.php/2016/06/14/3d-graphics-and-controller-handling/
Tue, 14 Jun 2016 20:52:53 +0000

The title of this should be extremely self explanatory, but still more info after the jump.

This time I’m taking a slightly different approach than in the other articles as this one is written around the code, rather than just wrapping together some functions from the news. What we will be discussing is a mixed tackle on 3D graphics and controller input, where input influences a 3D model.

First things first, I took some time to model a 3D controller of the XBox One with separate joints in order to make some of its parts move freely from its rigid root node. This is because the demo allows you to enter an input test mode where you can see on screen what’s happening more or less in your hands. Bear in mind I’m no modeler so the final result is more like a practical demonstration rather than a modeling tutorial (i.e. expect glaring errors on my geometry).

Controller model with wireframe visible

Let’s start by analyzing some 3D concepts. The source code introduces a custom format I use on Squeeze Bomb, which is based on CAPCOM’s TM2, a simplified TMD revision without all the bloat and natively handled with inline GTE code. In order to export the model I used a COLLADA parser that translates the mesh into TM2 data; when you deal with 3D graphics it’s important that you can count on a reliable format to store all your information, especially one that keeps hierarchy and quadrilaterals (quads are EXTREMELY important for performance and memory usage). Something like Wavefront OBJ does keep a model intact but only stores geometry and material data at best, so better avoid similar limitations.

So how does TM2 work exactly? These are a few structures used to handle header and subheaders:

It’s very similar to what TMD has to offer, but tries to keep the code as simple as possible and ties functionality to your programming choices, rather than fetching a million variations of the same primitive type with slight variations (i.e. it’s faster and smaller).

Let’s associate that with fast rendering tricks. The rule of thumb is to preallocate as many primitives as possible for the task. My TM2 handler does that and uses an increasing pointer to pack all the necessary primitives for double buffering; it also copies into the primitives some data from the model itself, like uv coordinates, clut, and tpage. The reason for this is to keep the code minimal while the console is transforming 3D data, so that the only things that need to be written to primitives are coordinates and RGB values for lighting. You can see how that works in detail inside gte\render.c.
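
As a rough illustration of the increasing-pointer idea (this is not the code in gte\render.c, and all names are hypothetical), a pool can hand out one set of primitives per display buffer from a single preallocated block:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *next; /* increasing pointer: first free byte */
    uint8_t *end;  /* one past the last usable byte */
} PRIM_POOL;

/* Attach the pool to a block of memory reserved up front. */
void pool_init(PRIM_POOL *p, void *mem, size_t bytes)
{
    p->next = (uint8_t *)mem;
    p->end  = p->next + bytes;
}

/* Hand out `bytes` bytes, rounded up to a whole word, or NULL when full. */
void *pool_take(PRIM_POOL *p, size_t bytes)
{
    uint8_t *out = p->next;
    bytes = (bytes + 3u) & ~(size_t)3u;
    if ((size_t)(p->end - out) < bytes)
        return NULL;
    p->next = out + bytes;
    return out;
}

/* Reserve `count` primitives of `size` bytes twice, one set per draw buffer.
   Returns 1 on success; on failure the pool is left untouched and 0 is returned. */
int pool_take_pair(PRIM_POOL *p, size_t count, size_t size, void *out[2])
{
    uint8_t *mark = p->next;
    out[0] = pool_take(p, count * size);
    out[1] = pool_take(p, count * size);
    if (!out[0] || !out[1]) {
        p->next = mark;
        return 0;
    }
    return 1;
}
```

The static parts of each primitive (uv, clut, tpage) would be written once right after pool_take_pair(), leaving only coordinates and colors for the per-frame loop.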

One explanation about the rendering code itself, since the rest comes with comments, is about the chrome effect applied to plastic sections of the controller: it works by using normal vectors to obtain distorted UV coordinates. What I do from there is drop a few bits in the transformed result and apply them to the UV map with a 64 ± 64 range. The result is a texture that stretches through the whole model depending on the way it’s facing the camera. You can experiment a bit to try and use different textures and ranges for various chrome effects.
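
The mapping itself can be sketched in a few lines. The assumption here is that a transformed normal component comes out in 4.12 fixed point (so it spans -4096 to 4096) and that six bits are dropped; the exact number of bits used by the demo may differ.

```c
#include <stdint.h>

#define ONE 4096 /* 1.0 in 4.12 fixed point */

/* Map one transformed normal component (-ONE..ONE) to a texture coordinate
   in the 64 +/- 64 range by discarding the six lowest bits. */
uint8_t chrome_coord(int32_t n)
{
    if (n >  ONE) n =  ONE;
    if (n < -ONE) n = -ONE;
    /* A division is used instead of a shift so negative values behave
       the same on every compiler. */
    return (uint8_t)(64 + n / 64);
}
```

Calling it once for the X component and once for the Y component of each vertex normal gives the U and V of the distorted map; a different divisor and center produce a tighter or wider reflection.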

The demo also includes some controller code for handling input (also comes with a couple handy headers from Sony). What you need to know about controllers, other than the code to make them recognized by the console, is that some peripherals tend to miss reads when you test a controller too frequently. What you want to do to avoid that is to place a call to a pad reader at the top of your logic loop so that it gathers input once every frame. Something like this will do:

What the demo does with controllers is quite basic: it manages input in the main interface (an XBox One dashboard clone, yay) and it’s also used to simulate a fancy pad test, where the 3D controller responds to your input and moves or highlights part of the controller itself. The Read_pad() function also contains some useful code on how you can handle repeated pressure of the same button in a row: pad_raw is a straight read from controller buffers, while pad_raw_t removes the bloat until a button is released and pressed again. There should be a few more functions about vibration, but let’s keep that for some other day.
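
The difference between the two values boils down to edge detection. A minimal sketch, assuming the button mask has already been converted so that a set bit means “pressed” (the structure and function names are hypothetical, not the demo’s Read_pad()):

```c
#include <stdint.h>

typedef struct {
    uint16_t raw;   /* buttons held during this frame */
    uint16_t raw_t; /* buttons that went down on this frame only */
} PAD_STATE;

/* Call once per frame, at the top of the logic loop, with the fresh button mask. */
void pad_update(PAD_STATE *pad, uint16_t now)
{
    pad->raw_t = (uint16_t)(now & ~pad->raw); /* pressed now, not pressed last frame */
    pad->raw   = now;
}
```

Menus would typically test raw_t so a held button triggers only once, while movement code tests raw.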

In the source you can also find an updated font handler and the dashboard icon displayer, both containing some useful tricks with alpha blending and sprites. Basically what it does is: draw two sprites in a row, one in subtractive mode and the other in additive mode. This will cause the first sprite to black out the background, while the second one will brighten it up. The effect is a nice antialiasing simulation that can be applied for many purposes on interfaces.
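
To see why the two passes give soft edges, here is the per-channel math as a host-side sketch. It assumes 5-bit channels as in a 15-bit frame buffer and plain clamped subtraction and addition; the function names are made up.

```c
#include <stdint.h>

/* Clamp to the 5-bit range of a 15-bit frame buffer channel. */
static int clamp5(int v)
{
    return v < 0 ? 0 : (v > 31 ? 31 : v);
}

/* Background minus sprite, as in subtractive blending. */
int blend_sub(int back, int fore) { return clamp5(back - fore); }

/* Background plus sprite, as in additive blending. */
int blend_add(int back, int fore) { return clamp5(back + fore); }

/* Two passes over one channel: darken with the mask, then add the glyph color. */
int blend_two_pass(int back, int mask, int color)
{
    return blend_add(blend_sub(back, mask), color);
}
```

A fully covered pixel (mask 31) ends up with the glyph color alone, while an edge pixel with a partial mask keeps part of the background and receives only part of the color, which is what reads as antialiasing.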

Download source

Working around 2D primitives
http://appleofeden.de-doc.com/index.php/2016/06/12/working-around-2d-primitives/
Sun, 12 Jun 2016 15:30:48 +0000

This article is about 2D primitives and provides some code to interact with them. More after the jump.

We’re back to more tricks on how you avoid LibGS like the plague. Let’s start with some background information on sprites.

The main reason why most homebrews are made with LibGS is that it provides a few handy functions to make the PlayStation draw stuff on screen with little to no effort. So what’s the hurdle with direct low level access? There may be quite a few reasons for this, the main one being sprites not storing VRAM page indices in the usual structure, while a second guess would be the scaling and rotation effects provided by some of the calls. Let’s go in order to fill a few gaps.

DR_TPAGE primitives

typedef struct tagDrTpage
{
u32 tag;
u32 code;
} DR_TPAGE;

These primitives function as texture switches in a linked list. In other words, they set up the GPU to move around VRAM pages and can also assign a few effects like blending, dithering, etc. Here’s a list of what the bits inside code do, including some undocumented features like horizontal/vertical flip.

The first 5 bits contain the number of the VRAM page to use. All the bits above and the VRAM index can be ORed together to form any desired effect, then simply register the primitive in a linked list and you’re good to go. For example:

This code will set any following sprite primitive to use VRAM page 5 in 16 color mode with a forced dither effect applied.
If you need several changes of VRAM page in a row, there is a trick you can pull that chains a DR_TPAGE primitive and sprite together, which also saves you one useless word in the process. We’ll see later how that works in detail.
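
A small sketch of how such a code word could be assembled. The bit positions below follow commonly circulated GPU documentation for the draw mode command (E1h) and are an assumption on my part, not a copy of the list above; the flip bits are left out on purpose, and all macro names are made up.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Assumed bit layout of the draw mode word; check it against your own reference. */
#define TP_CMD      0xE1000000u              /* command byte for a draw mode change */
#define TP_PAGE(n)  ((u32)(n) & 0x1Fu)       /* VRAM page index, first 5 bits */
#define TP_BLEND(n) (((u32)(n) & 3u) << 5)   /* blending mode */
#define TP_4BIT     (0u << 7)                /* 16 color textures */
#define TP_8BIT     (1u << 7)                /* 256 color textures */
#define TP_15BIT    (2u << 7)                /* direct color textures */
#define TP_DITHER   (1u << 9)                /* forced dither */

/* Build the code word of a DR_TPAGE primitive. */
u32 tpage_code(u32 page, u32 depth, u32 flags)
{
    return TP_CMD | TP_PAGE(page) | depth | flags;
}
```

With this layout, page 5 in 16 color mode with dithering would be tpage_code(5, TP_4BIT, TP_DITHER).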

Sprites in all sizes

The PlayStation can draw sprites of varying size with a texture cap of 256×256 pixels (i.e. you can still wrap a texture and repeat it on the same sprite). There are three types of sprite primitives: SPRT, SPRT8, and SPRT16. They all act the same, except SPRT8/16 have no width/height attribute (i.e. they are a word smaller), so they will always draw as 8×8 or 16×16 images. This is what a sprite structure looks like:

As you can see, sprites have no TPAGE attribute anywhere to be found, which can discourage some from using the structure for drawing anything simple (a few official games actually do this, whoops). So how do we overcome this limitation with a smart solution? We use chained primitives.

Merged sprites

The PlayStation GPU is quite versatile when it comes to primitive handling, as you can chain them together up to 255 words. Let’s see how that works in terms of data representation with two examples of sprites embedding a DR_TPAGE primitive:

Both these structures represent variable and fixed sprites with an extra word called mode, which stores our DR_TPAGE code from the structure above. We can populate them exactly the same, giving sprites the ability to store VRAM attributes per primitive. This trick also saves us one word which would be otherwise wasted on an extra tag from DR_TPAGE (i.e. when you chain primitives only one tag attribute is required, since it only stores an address to the next primitive in list for the lowest 24 bits, while the upper 8 bits are the length of a primitive in words). Let’s see how we can populate a variable merged sprite:

This code populates the primitive with just word writes, providing fast RAM access as well. One important note about sprites: they have a few limitations, specifically the U coordinate in UV maps. When you operate them in 4 bit mode, the U value mustn’t be an odd number or it will produce distortion on the console. Just make sure to pad all your textures to even values and you will be good.
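
For reference, a merged variable-size sprite could be filled roughly like this. It is a sketch under assumptions: the structure and helper names are hypothetical, command byte 64h is taken to be the variable-size textured sprite, the texture is assumed to be 4 bit (hence the even U check), and the link part of the tag is left empty until the primitive is added to a list.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Variable-size sprite with an embedded draw mode word (hypothetical name). */
typedef struct {
    u32 tag;    /* upper 8 bits: length in words, lower 24 bits: next primitive */
    u32 mode;   /* DR_TPAGE code */
    u32 rgbc;   /* color in the low 24 bits, GPU command in the top byte */
    u32 xy;     /* y << 16 | x */
    u32 uvclut; /* clut << 16 | v << 8 | u */
    u32 wh;     /* h << 16 | w */
} SPRT_MERGED;

/* Plain description of what to draw, passed by pointer to keep arguments few. */
typedef struct {
    int16_t  x, y; /* screen position */
    uint8_t  u, v; /* texture origin inside the page */
    uint16_t clut; /* CLUT id */
    uint16_t w, h; /* size in pixels */
} SPRITE_DESC;

/* Fill every word of the primitive with one write each. */
void fill_sprt_merged(SPRT_MERGED *s, const SPRITE_DESC *d, u32 mode)
{
    assert((d->u & 1) == 0);      /* U must be even for 4 bit textures */
    s->tag    = 5u << 24;         /* five words follow the tag */
    s->mode   = mode;
    s->rgbc   = 0x64808080u;      /* assumed sprite command + neutral color */
    s->xy     = ((u32)(uint16_t)d->y << 16) | (uint16_t)d->x;
    s->uvclut = ((u32)d->clut << 16) | ((u32)d->v << 8) | d->u;
    s->wh     = ((u32)d->h << 16) | d->w;
}
```

The length field is 5 rather than 4 because the mode word travels with the sprite, which is exactly the word saved compared to a separate DR_TPAGE with its own tag.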

In conclusion, what you see happening with these special sprites can be applied to many other primitives as well. For example, you can use a TILE primitive with DR_MODE attributes (similar to DR_TPAGE, just without the VRAM index) or a POLY_G4 with blending effects applied to transparencies. You can even chain more primitives of the same kind to simulate strips, just as long as you keep the total length below 255 words.

Here is some sample code summing up what we got so far. It’s a Visual Studio 2010 project (works fine in 2013/2015) configured as Makefile, so you’re going to need some manual work to make it compile without VS via batches or command line.

Download source

Light sources: how they work and what you can do with them
http://appleofeden.de-doc.com/index.php/2016/06/11/light-sources-how-they-work-and-what-you-can-do-with-them/
Sat, 11 Jun 2016 21:00:45 +0000

Part 3 of the articles about PlayStation development. Get ready for the next battle after the jump!

In short, these are 3×4 matrices, where m represents rotation and t is a translation vector. They apply mostly the same for all transformations, but lights are a separate case where only m is really used to represent operating data. This means we use matrices for light sources as well, two to be exact, one for light and another for color, which can hold a maximum of 3 sources.

This is how a local light matrix is represented:

X/Y/Z 0/1/2 store normalized coordinates of each light source. That means that if you have light sources placed anywhere on a field, they will have to be processed in order to apply correctly to a mesh. We will see later how this works.

Now, this is a local color matrix:

If you look carefully, a local light matrix stores data in rows, while here it’s columns. These RGB values work as 12 bit values, which means ONE (=4096) makes a light fully lit for an RGB channel and you can go beyond to make the effect more prominent. In most cases you can simply take 0-255 RGB values from a structure of your choice and multiply them by 16 to obtain a correct scale to 0-ONE values.
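
The two layouts can be sketched as follows. MATRIX3 is a stand-in with only the 3×3 part of a matrix, and both helper names are hypothetical; directions are assumed to be already normalized to 4.12 fixed point.

```c
#include <stdint.h>

/* Only the 3x3 part of a matrix, 4.12 fixed point (hypothetical stand-in type). */
typedef struct { int16_t m[3][3]; } MATRIX3;

/* Store the color of light `index` (0..2) as a column of the local color matrix. */
void set_light_color(MATRIX3 *lc, int index, const uint8_t rgb[3])
{
    int ch;
    for (ch = 0; ch < 3; ch++)
        lc->m[ch][index] = (int16_t)(rgb[ch] * 16); /* 0..255 becomes 0..4080 */
}

/* Store the direction of light `index` (0..2) as a row of the local light matrix. */
void set_light_dir(MATRIX3 *ll, int index, const int16_t dir[3])
{
    int i;
    for (i = 0; i < 3; i++)
        ll->m[index][i] = dir[i];
}
```

Multiplying 255 by 16 gives 4080, just short of ONE; writing a larger value directly is how you push a light beyond fully lit.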

Now, let’s see how light/color matrices work on LibGS with some custom code that clones its internals:

In this function M_ll is the local light matrix, while M_lc is the local color matrix. You can see a Mag variable there also being taken into calculation, which simply represents the intensity of a light.
So how do you apply those values to a mesh? There are a number of strategies you can apply, depending on what your game is going to support. Typically you need a helper function that calculates the light position for each mesh, so you can’t simply apply a global set of matrices and get away with that; you effectively need to recalculate the local light matrix for every new mesh that passes through a rendering procedure, while local color can stay the same across multiple meshes. Let’s see that in detail with some sample code from my engine:

// set local color matrix for an entity
Set_light((VECTOR*)&em->Pos_x,&G.pRoom->pLight[G.Cut_no]);
gte_SetColorMatrix(&M_lc);
// --------------------
// set local light matrix for a mesh
gte_MulMatrix0(&M_ll,&p->Workm,&m_light);
gte_SetLightMatrix(&m_light);

The first slice of code calls Set_light, which is a handler that performs calculations to generate full light and color matrices, while the second one multiplies the local light matrix produced by the helper by the work matrix of the mesh to generate the proper light matrix for that mesh. Let’s take a look at how light vectors are calculated inside Set_light():

LM_FALL2 and LM_OMNI2 generate different effects, depending on the complexity you need. LM_FALL2 simulates a local light with fall off, while LM_OMNI2 is an infinite light, providing the same amount of light everywhere.
As for the actual rendering, you need to take into account two more aspects of the GTE: ambient light and color diffuse. Ambient light is the neutral color of a mesh when there’s no light applied to it, while diffuse is the color of the mesh itself. For example, you could have a very dark room with almost no light, which requires a low ambient RGB (say 0x303030). As for color diffuse, it applies the same rules as RGB values on primitives, so it influences hue in the same exact way (0x808080 = neutral color, 0x000000 = totally black, 0xF0F0F0 = overly bright). Ambient can be assigned via GTE instructions gte_SetBackColor as 0-255 ranges, while color diffuse needs a call to gte_ldrgb and a CVECTOR as its input (again 0-255 color values).
Finally, let’s talk about the actual rendering code. Now that you have your light/color matrices, ambient, and diffuse set up, what you need are the correct GTE commands to apply the effect on primitives. Typically you want to apply lights on entities (say animated player and enemies) with a smooth effect, while the environment can get flat lights which are not as intensive to calculate. In either case, you need to load into the GTE one or a set of normal vectors, call gte_ncXY() commands, then retrieve the results and apply them to rgb structures of a primitive of your choice. As for gte_ncXY, X can be nothing, ‘d’ (fog) or ‘c’ (no fog), while Y is ‘s’ (single) or ‘t’ (triplet). ‘t’ and ‘s’ versions can be used together or separately, depending on your primitive and lighting effect. A sample of quad rendering with smooth lighting applied:

As you can see, I’m using both ncct and nccs together to fill all four rgb channels. Triangles would be similar, but you only need to call ncct. You working with POLY_FT3/4 rendering? Then you only need nccs to apply one flat light to all three/four points of the primitive.

]]>http://appleofeden.de-doc.com/index.php/2016/06/11/light-sources-how-they-work-and-what-you-can-do-with-them/feed/0Writing a good replacement for LibGShttp://appleofeden.de-doc.com/index.php/2016/06/11/writing-a-good-replacement-for-libgs/
http://appleofeden.de-doc.com/index.php/2016/06/11/writing-a-good-replacement-for-libgs/#commentsSat, 11 Jun 2016 11:42:22 +0000http://appleofeden.de-doc.com/?p=209Read more Writing a good replacement for LibGS]]>Second part of the article on how to write fast and reliable code for the PlayStation. More after the jump.

Let’s start by setting the general environment to initialize the console; a structure will be useful for the scope:

This function does basically what GS functions do to initialize the frame buffer, with 2-3 calls merged into just one. Also notice how I’m not using a million parameters to set up frame buffer mode. That is because we don’t wanna use more than 4 parameters most of the time; remember that older versions of the compiler tend to push anything past parameter 4 onto the stack, which we don’t want since it kills performance and produces messy binaries. Limit yourself as much as possible when you create a function prototype or it’s going to look ugly and perform worse.

Let’s move to packet allocators, which correspond to Gfx_alloc in the big structure above. You can fill them depending on your need of primitives, but always remember to make them as big as possible for the task (example: sprites for menu interfaces). Go for malloc3 or even a global variable in your program, it doesn’t matter in the end as you probably won’t ever need to resize them at any point of the program’s life. Some example code of how you would populate the rest of the structure:

If you’re asking why I have two allocators instead of just one, the reason is pretty simple: double buffering. The PlayStation expects you to provide two memory locations to store packet data because the GPU takes a while to send them all on screen. It’s not an operation that takes place immediately, so you need a back buffer to store new primitives while the old ones are getting through the DMA.
So, the first set of functions is what makes packet allocators work and provides an environment for frame buffer swaps, while the second slice is how you would retrieve pointers in order to actually draw and seek forward. Most of those static inline functions aren’t actual calls but code that gets copied as-is into the caller, so you get no overhead from real calls while keeping your code slim.
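To make the double buffering idea concrete, here is a minimal sketch in plain C. It is not the code from the original sample: PacketAlloc, packet_init, packet_get, packet_swap and the pool size are all hypothetical names and values picked for illustration, and the actual GPU/DMA calls are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKET_POOL_SIZE 4096   /* hypothetical size, tune it per project */

/* Hypothetical layout: two pools, one being filled while the other is drawn. */
typedef struct {
    uint8_t  pool[2][PACKET_POOL_SIZE];
    uint8_t *next;     /* next free byte in the pool being filled */
    int      active;   /* index of the pool being filled: 0 or 1 */
} PacketAlloc;

static inline void packet_init(PacketAlloc *a)
{
    a->active = 0;
    a->next   = a->pool[0];
}

/* Reserve room for one primitive and seek forward: no search, no free. */
static inline void *packet_get(PacketAlloc *a, size_t bytes)
{
    void *p = a->next;
    assert((size_t)(a->next - a->pool[a->active]) + bytes <= PACKET_POOL_SIZE);
    a->next += bytes;
    return p;
}

/* Call once per frame after the draw has been kicked off: the pool that was
   just filled is left untouched while it is being sent, and new primitives
   start from the beginning of the other pool. */
static inline void packet_swap(PacketAlloc *a)
{
    a->active ^= 1;
    a->next    = a->pool[a->active];
}
```

In a frame loop you would call packet_get() for every primitive, submit the list, then call packet_swap(). Keep every requested size a multiple of 4 so the next primitive stays word aligned.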

To see the code above used in a real development case, let’s look at how it gets pieced together in another sample:

That’s literally all the code you need to replace all LibGS calls that usually take care of setting the environment.

]]>http://appleofeden.de-doc.com/index.php/2016/06/11/writing-a-good-replacement-for-libgs/feed/1The do’s and don’ts of PlayStation programminghttp://appleofeden.de-doc.com/index.php/2016/06/10/the-dos-and-donts-of-playstation-programming/
http://appleofeden.de-doc.com/index.php/2016/06/10/the-dos-and-donts-of-playstation-programming/#respondFri, 10 Jun 2016 23:42:02 +0000http://appleofeden.de-doc.com/?p=196Read more The do’s and don’ts of PlayStation programming]]>This is a little article I decided to write while reading some good and bad code from various homebrew PlayStation communities. Full news after the jump.

Do’s

The less you write/read, the faster it works

On the PlayStation, reading from and writing to RAM is an expensive operation, which is terribly noticeable in critical code, especially loops dedicated to populating and drawing primitives. When you read, try to cache data as necessary: for example, keep recurring data, like a structure pointer, in a local variable; this will keep your code from becoming a mess of reloads of the same value over and over. As for writes, there are at least a couple of important cases to take into consideration:

Repopulation: if you have code that populates a primitive entirely, unless it’s for small data (say 4 words) avoid rewriting it fully. Instead preallocate the necessary primitives, fill data that is usually static (like UV coordinates), and update only changing bits (say XY coordinates).

Bulk writes: Avoid populating data by structure boundaries, instead go for writes that cover multiple values. This is important because building a couple of variables from registers takes way less time than a number of bytes, shorts, etc. For example, if you need to populate rgb values in a primitive use a cast that initializes all three values in one write (i.e. *(u32*)&p->rgb0 = 0xFF00FF); this can be used in combination with repopulation for maximum optimization.

In all cases, beware of memory alignment. If you read or write from a non-multiple of the operation size (i.e. read 2 bytes -> must be performed on an address multiple of 2, read 4 bytes -> memory address multiple of 4), the console will fail the operation and throw an exception, which translates to a hard crash if you don’t have an exception handler to catch the case.
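The bulk write and the alignment rule can be sketched together in plain C. PrimColor is a hypothetical stand-in for the first color word of a primitive (three color bytes plus the primitive code byte sharing one word), and set_rgb_bytes, set_color_word and is_aligned are made-up helper names. The sketch uses a union rather than a pointer cast and assumes a little-endian target, which is the case for the console.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a primitive's color word. */
typedef union {
    struct { uint8_t r, g, b, code; } c;
    uint32_t word;    /* the same four bytes seen as one 32-bit value */
} PrimColor;

/* Slow path: three separate byte stores. */
static inline void set_rgb_bytes(PrimColor *p, uint8_t r, uint8_t g, uint8_t b)
{
    p->c.r = r;
    p->c.g = g;
    p->c.b = b;
}

/* Fast path: the word is built in registers and stored once.
   rgb is laid out as 0x00BBGGRR (little-endian assumption); the code byte
   goes in the top 8 bits so the store does not wipe it. */
static inline void set_color_word(PrimColor *p, uint32_t rgb, uint8_t code)
{
    p->word = ((uint32_t)code << 24) | (rgb & 0x00FFFFFFu);
}

/* True when addr is a multiple of size (size must be a power of two),
   i.e. when a load or store of that size is safe at that address. */
static inline int is_aligned(const void *addr, uintptr_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return ((uintptr_t)addr & (size - 1)) == 0;
}
```

Note that a bare write of 0xFF00FF to the whole word also sets the top byte to zero, so if that byte carries the primitive code you either pass it along, as above, or set it again afterwards.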

The scratchpad is your friend, ABUSE IT!

The scratchpad is a little buffer (1KB) of fast memory that can be operated by the programmer for quite a few optimizations. You can use it for decompression code as a fast-access buffer, or similarly for a sort algorithm. Typically the best usage is with 3D operations, where you need to write a lot of variables and structures while performing operations in loops. Say, if you have a MATRIX it will be allocated onto the stack, which is part of RAM. Ok, we know RAM is slow and bad, right? So instead change the stack to point to the scratchpad, execute your slow code, and expect it to have quite a few speed ups. Of course, remember to restore the stack when you’re done.
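Moving the stack pointer is toolchain specific, so here is a sketch of the simpler variant instead: treating the scratchpad as a fixed work area for the structures a hot loop touches most. The scratchpad is mapped at 0x1F800000 on the console; everything else here (the PSX_TARGET flag, WorkMatrix, ScratchLayout, scratch(), matrix_identity()) is hypothetical, and the host build swaps in a plain array so the code can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

#define SCRATCH_SIZE 1024u    /* 1KB of fast memory */

#ifdef PSX_TARGET             /* hypothetical build flag for the real console */
#define SCRATCH_BASE ((void *)0x1F800000)
#else
static uint32_t host_scratch[SCRATCH_SIZE / 4];   /* host stand-in for testing */
#define SCRATCH_BASE ((void *)host_scratch)
#endif

/* Hypothetical 3x3 rotation plus translation, 4096 == 1.0. */
typedef struct { int16_t m[3][3]; int32_t t[3]; } WorkMatrix;

/* Everything the hot loop needs, laid out by hand inside the scratchpad. */
typedef struct {
    WorkMatrix local;
    WorkMatrix world;
    int32_t    tmp[16];
} ScratchLayout;

static inline ScratchLayout *scratch(void)
{
    assert(sizeof(ScratchLayout) <= SCRATCH_SIZE);   /* must fit in 1KB */
    return (ScratchLayout *)SCRATCH_BASE;
}

static inline void matrix_identity(WorkMatrix *m)
{
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++)
            m->m[i][j] = (i == j) ? 4096 : 0;
        m->t[i] = 0;
    }
}
```

The trade-off against the stack swap is that you pick by hand what lives in fast memory, but nothing has to be restored afterwards.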

Organize your data in packages

If your game or program needs to load textures or other small data from disk, don’t fall into the trap of loading a million files just because they are tiny. Always remember the PlayStation CD unit isn’t exactly the fastest, even if it can read 300KB/s in double speed mode. Seeking to a position on the CD is a slow operation that breaks the flow, which means you’d better pack your data and possibly read it all in one pass to take advantage of those 300KB/s, otherwise the laser will jump back and forth like a spicy mole on a synthesizer. Once all the data is in memory you can process it as necessary and discard whatever isn’t needed anymore.
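One possible shape for such a package, sketched in plain C: a tiny table of offsets and sizes in front of the payload, so a single read brings everything into RAM and lookups are just pointer math. The format and the names (PackHeader, PackEntry, pack_get) are invented for this example and don’t correspond to any existing archive format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: [count][entry 0 .. entry count-1][payload ...]
   Offsets are in bytes from the start of the package and should be
   multiples of 4 so the payload stays word aligned. */
typedef struct { uint32_t offset; uint32_t size; } PackEntry;
typedef struct { uint32_t count; PackEntry entry[]; } PackHeader;

/* Return a pointer to file number `index` inside a package that is already
   fully loaded in RAM; its size is stored in *size when size is not null. */
static inline const void *pack_get(const void *pack, uint32_t index, uint32_t *size)
{
    const PackHeader *h = (const PackHeader *)pack;
    assert(index < h->count);
    if (size)
        *size = h->entry[index].size;
    return (const uint8_t *)pack + h->entry[index].offset;
}
```

With this you load the whole package in one pass, upload or copy what you need through pack_get(), and then throw the buffer away.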

Running out of space? Overlays to the rescue

If you are not familiar with the term overlay, it’s more or less like a DLL, just without the dynamic part. In other words, overlays are parts of your exe that live as separate binary files and get loaded into a specific region of memory (which can be dynamically or statically set, you decide). The advantage of using overlays comes when you’re running low on resources and need a stable region of RAM to hold some extra code to be cached on demand. For example, you can code a main menu and a configuration screen as overlays, which can both share the same address in memory.

Don’ts

LibGS is EVIL and you can do better

This is the most common error I’ve found while looking at homebrew code. People tend to overdo LibGS usage because it’s usually simple to operate and provides a nice abstraction layer. What these people ignore is how utterly slow and big this library really is.

Let’s talk about TMD as a case to examine: the whole thing is programmed so that you have instant access to 3D functionality with very little effort. What samples don’t tell you is that TMD is an extremely limited format which provides almost no flexibility at all, not to mention the code to display a model is freaking huge and usually comes with a million cases you don’t really need. Sure, you could fall back to HMD, a more advanced container with extended functionality, but that is extremely slow and usually put together from poorly optimized code.

Another flaw of this library lies with sprites. Don’t bother using its internal sprite handler; again, it’s slow as hell and doesn’t give you much of a real treat in the end. You wanna scale and rotate? Write optimized code for that and use POLY_FT4 directly; it’s not hard to call sin functions to produce rotations, and even rotation matrices do that internally, so you can take advantage of them to produce the rotated vectors for the effect. Don’t want rotations? Use straight sprite primitives then, SPRT/SPRT8/SPRT16 already do everything and are extremely fast even to repopulate (3-4 writes at worst). Don’t know how to address a VRAM page? Check how DR_TPAGE primitives work, you can even merge them with any sprite to draw sequentially from different VRAM pages! Plus you can sort all types of primitives, not just sprites or whatever else LibGS offers (which is extremely limiting and dull to be honest).
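As a rough idea of how little code a rotated sprite needs, here is a sketch in plain C that produces the four corners of a quad from a center, a size, and a precomputed cosine/sine pair in 4.12 fixed point. Point2 and sprite_rotate are hypothetical names; on the console you would copy the results into the vertex fields of a POLY_FT4 and take cos/sin from whatever table or library call you already use.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_SHIFT 12            /* 4096 == 1.0 in 4.12 fixed point */

typedef struct { int16_t x, y; } Point2;

/* cx,cy: center on screen; w,h: sprite size; c,s: cosine and sine of the
   angle in 4.12 fixed point. Output order is top-left, top-right,
   bottom-left, bottom-right. Right shifts of negative values are assumed
   to be arithmetic, which holds for the usual compilers. */
static inline void sprite_rotate(Point2 out[4], int cx, int cy,
                                 int w, int h, int c, int s)
{
    int hw = w / 2, hh = h / 2;
    /* Rotate the two half extents once, then reuse them for every corner. */
    int xc = (hw * c) >> ONE_SHIFT, xs = (hw * s) >> ONE_SHIFT;
    int yc = (hh * c) >> ONE_SHIFT, ys = (hh * s) >> ONE_SHIFT;

    assert(out != 0);
    out[0].x = (int16_t)(cx - xc + ys);  out[0].y = (int16_t)(cy - xs - yc);
    out[1].x = (int16_t)(cx + xc + ys);  out[1].y = (int16_t)(cy + xs - yc);
    out[2].x = (int16_t)(cx - xc - ys);  out[2].y = (int16_t)(cy - xs + yc);
    out[3].x = (int16_t)(cx + xc - ys);  out[3].y = (int16_t)(cy + xs + yc);
}
```

Scaling comes for free: multiply w and h before the call, or scale c and s.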

In other words, just write your code around LibGPU. That library provides all the low level access you will ever need, and you can come up with faster replacements for what LibGS has to offer.

Allocation is important, but don’t overdo it

Another weird habit I’ve seen is the abuse of InitHeap3/malloc3/free3, which apparently stems from some of my code on Opera of the Red Moon (thanks to the guy who recycled it in the first place by stripping all my comments, now the code looks like random gibberish).

So why would you actually use dynamic allocation? Because you need to keep all that data automatically managed without the need to dig some room manually. Ok, scrap that thought because malloc comes with a price, which is a best fit algorithm. Why avoid it? Because you can use a temp pool to store your data sequentially (i.e. kinda like a stack, but in reverse) and fit new data there, rather than polluting RAM with search requests. The good thing is you can even discard some volatile data (textures and sound) without any reallocation required. Just drop volatiles from the last allocated pointer and you’re good to push more data with no problem. You can also use this to cooperate around overlays so that you don’t accidentally allocate where code may be stored at some point.

On a side note, you can use stack allocation to also allocate primitives, as an efficient counterpart to LibGS’ packet allocator. Both would work more or less the same really, but with your own code at least you know what the heck is going on.
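A temp pool like the one described above fits in a handful of lines. This is only a sketch with made-up names (Pool, pool_init, pool_alloc, pool_mark, pool_release), and it assumes you hand it a word-aligned block of memory up front:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *base;   /* start of the pool */
    uint8_t *top;    /* next free byte */
    uint8_t *end;    /* one past the last byte */
} Pool;

static inline void pool_init(Pool *p, void *mem, size_t size)
{
    p->base = p->top = (uint8_t *)mem;
    p->end  = p->base + size;
}

/* Sequential allocation: no search, no headers. Sizes are rounded up to
   4 bytes so every block stays word aligned. Returns NULL when full. */
static inline void *pool_alloc(Pool *p, size_t size)
{
    void *r = p->top;
    size = (size + 3u) & ~(size_t)3u;
    if ((size_t)(p->end - p->top) < size)
        return NULL;
    p->top += size;
    return r;
}

/* Remember the current top before loading volatile data... */
static inline uint8_t *pool_mark(const Pool *p)
{
    return p->top;
}

/* ...and drop everything allocated after that point in one go. */
static inline void pool_release(Pool *p, uint8_t *mark)
{
    assert(mark >= p->base && mark <= p->top);
    p->top = mark;
}
```

Typical use: allocate permanent data first, take a mark, load textures and sound for the current room, then release back to the mark when the room changes.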

Float is not what you wanna use

Float variables aren’t any good for the R3000, because they are internally handled as “emulated” data to build actual floats (i.e. there is no FPU to operate them natively). This is going to kill your performance, A LOT. Wanna use something that keeps the CPU fresh and running? Try fixed point math; that’s what the console uses for vectors, degrees, and matrices anyways.
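For reference, a minimal fixed point sketch in plain C using a 12-bit fraction, where 4096 stands for 1.0. The type and helper names (fixed, fix_mul and so on) are hypothetical, and the 64-bit intermediate is a choice of this sketch to keep large products from overflowing; with small operands a plain 32-bit multiply followed by a shift does the same job.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed;                  /* 20.12 fixed point */
#define FIX_SHIFT 12
#define FIX_ONE   (1 << FIX_SHIFT)      /* 4096 == 1.0 */

static inline fixed fix_from_int(int v)
{
    return (fixed)(v * FIX_ONE);
}

/* Drops the fraction, rounding toward negative infinity
   (arithmetic right shift assumed). */
static inline int fix_to_int(fixed v)
{
    return v >> FIX_SHIFT;
}

/* The product of two 20.12 values carries 24 fraction bits, so shift 12 out. */
static inline fixed fix_mul(fixed a, fixed b)
{
    return (fixed)(((int64_t)a * b) >> FIX_SHIFT);
}

/* Pre-scale the dividend so the quotient keeps its 12 fraction bits. */
static inline fixed fix_div(fixed a, fixed b)
{
    assert(b != 0);
    return (fixed)(((int64_t)a * FIX_ONE) / b);
}
```

So 1.5 is stored as 6144, 0.25 as 1024, and everything stays in integer registers.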