Finally some 'real' progress:
- command/cpu requests take the earliest free VRAM slot
- CPU requests take priority over command requests
These are not very surprising results; I actually already hinted
at them in my previous mail. But the hard part was in actually
verifying this model against the measured data. And if you look
at the details, 'taking the earliest free slot' also doesn't hold
100% of the time. But see the attachment.
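To make the model concrete, here is a small sketch of the arbitration rule (the slot cycles and request times below are invented for illustration, not measured values):

```cpp
// Sketch of the proposed model: a pending CPU or command-engine request
// is served in the earliest free VRAM slot at or after its arrival, and
// when both kinds are pending, the CPU wins.
#include <vector>

struct Request { int arrival; bool isCpu; };

// For each free slot (sorted by cycle), return the index of the request
// it serves, or -1 if no request is pending at that slot. 'reqs' is
// assumed sorted by arrival time.
std::vector<int> assignSlots(const std::vector<int>& slots,
                             const std::vector<Request>& reqs)
{
    std::vector<bool> served(reqs.size(), false);
    std::vector<int> result;
    for (int slot : slots) {
        int pick = -1;
        for (int i = 0; i < (int)reqs.size(); ++i) {
            if (served[i] || reqs[i].arrival > slot) continue;
            // earliest pending request wins, but CPU beats command
            if (pick < 0 || (reqs[i].isCpu && !reqs[pick].isCpu)) pick = i;
        }
        if (pick >= 0) served[pick] = true;
        result.push_back(pick);
    }
    return result;
}
```

E.g. with a command request arriving at cycle 5 and a CPU request at cycle 8, a slot at cycle 10 goes to the CPU and the command has to wait for the next one.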
Alex, maybe it's a bit late, but it would be nice if you could
still read this text so that we can discuss the results tomorrow
IRL.
Wouter

See attachment. I made a little progress on the cpu and
command engine access slots.
Wouter

Thanks for the feedback Alex.
I've made another improvement to the data files. Now all accesses
are annotated with their purpose. As before, each access has 3
characters in front:
<read/write><burst><type>
where:
<read/write>: R->read W->write
<burst>: 'b' -> burst access (takes 4 cycles), '.' -> no burst (takes 6 cycles)
(with a few exceptions)
<type>: this field has changed compared to earlier files
'v' -> (still) indicates where the VDP 'VDS' pin was active,
we now know this means bitmap data fetch
'r' -> refresh read
's' -> sprite data fetch (I didn't distinguish the different sprite
reads further)
'p' -> pre/post-amble reads (see slots.txt for details)
'c' -> cpu read/writes
'e' -> command engine read/writes
'.' -> still unknown
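For reference, a minimal decoder for this 3-character prefix could look like this (just an illustration of the encoding above, not a tool from the data set):

```cpp
// Decode the 3-character access prefix described above:
// <read/write><burst><type>, e.g. "Rbv" = burst read, bitmap data fetch.
#include <string>

struct Access {
    bool isRead;      // 'R' = read, 'W' = write
    bool isBurst;     // 'b' = burst (4 cycles), '.' = no burst (6 cycles)
    std::string type; // decoded <type> field
};

Access decodePrefix(const std::string& s)
{
    Access a;
    a.isRead  = (s[0] == 'R');
    a.isBurst = (s[1] == 'b');
    switch (s[2]) {
        case 'v': a.type = "bitmap data fetch";         break;
        case 'r': a.type = "refresh read";              break;
        case 's': a.type = "sprite data fetch";         break;
        case 'p': a.type = "pre/post-amble read";       break;
        case 'c': a.type = "cpu read/write";            break;
        case 'e': a.type = "command engine read/write"; break;
        default:  a.type = "unknown";                   break;
    }
    return a;
}
```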
I only annotated the bitmap screen mode files.
All write accesses are accounted for (CPU write or command engine
write). But there are still a few read accesses that I can't explain. Most
of them are again reads from address 0x1ffff, though there still are a few
others as well (I checked the VCD files, and they don't appear to be
glitches). Maybe someone else can figure out their purpose? (Just grep
for 'R..').
I think all the 'mechanical' work on the data files is done. So now the more
'interesting' part of the analysis can start. Or in other words, now would
be a good time to step in if you want to help ;-)
Wouter

Hi,
I have read the slots.txt file. It's a very good analysis. Well done.
The findings are very interesting, especially the finding about the
sprites and about the pre-amble and post-amble reads to address 0x1ffff.
I had not expected that the sprite plane reads would happen interleaved
with the bitmap reads. I always assumed that all sprite reads would
happen during the horizontal retrace. Also interesting to see that the
sprite reads are not optimized at all from a clock-cycle-usage perspective.
Regarding the following point about the sprites: "- Note that the
y-coordinate is fetched again, it was already fetched to
figure out which sprites are visible."
The most likely explanation is that the VDP contains an 8-byte memory
buffer that can be used to remember *which* sprite numbers are visible
(this buffer will be filled while reading the 32 y-coordinates), but the
VDP does not have a memory buffer to remember the y-coordinates of those
sprites. That could explain why the VDP has to read the y-coordinate
again while rendering the sprites. It implies that the designers of the
VDP chose to spend extra clock cycles in order to save on the number of
transistors, which is a design choice they apparently made in many
places of the sprite engine. They clearly favored a simplified hardware
design over fewer clock cycles during rendering, despite the impact on
the performance of the other parts of the VDP, like the command engine.
It's probably a heritage from the MSX1 VDP, where those memory reads
were 'free' because they were not interfering with anything else anyway.
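The hypothesized two-pass behavior can be sketched as follows (the visibility test and the VRAM layout are simplified placeholders, not the real VDP semantics; the point is only that y gets fetched twice per visible sprite):

```cpp
// Sketch of the hypothesis: pass 1 scans the 32 y-coordinates and stores
// only the *numbers* of (up to 8) visible sprites; pass 2 must re-read
// each y-coordinate from VRAM because no buffer holds the coordinates
// themselves.
#include <cstdint>
#include <vector>

struct SpriteCheck {
    std::vector<uint8_t> yCoords; // y-coordinate per sprite, 32 entries
    int vramReads = 0;            // counts simulated VRAM accesses

    uint8_t readY(int sprite) { ++vramReads; return yCoords[sprite]; }

    // Pass 1: remember which sprite numbers are visible on 'line'
    // (simplified test, ignoring the real y+1 offset and 208/216 rules).
    std::vector<int> findVisible(int line, int size) {
        std::vector<int> visible;
        for (int n = 0; n < 32 && visible.size() < 8; ++n) {
            uint8_t y = readY(n);
            if (line >= y && line < y + size) visible.push_back(n);
        }
        return visible;
    }

    // Pass 2: render; y is fetched *again* for each visible sprite.
    void render(const std::vector<int>& visible) {
        for (int n : visible) (void)readY(n); // the duplicate y fetch
    }
};
```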
Cheers,
Alex
>
>
> On Thu, Jan 24, 2013 at 8:28 PM, Wouter Vermaelen
> <vermaelen.wouter@...> wrote:
>> Made some progress yesterday. The data files are slowly
>> getting into a format that is easy to analyze.
>>
>> These new files convert time from ns to VDP clock cycles
>> within one display line, so a value between [0 .. 1368). They
>> also put the data for different display lines (within the same
>> capture) in columns next to each other. So time runs from
>> top to bottom and continues from the top in the next column
>> (from left to right). This way it's easy to see which accesses
>> occur each display line at the same relative position.
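As an aside, the ns-to-cycle conversion described above amounts to something like the following, assuming the V9938's 21.477270 MHz master clock and the 1368 cycles per display line mentioned in the mail (the choice of the cycle-0 origin is the tool's own, as explained below):

```cpp
// Convert a capture timestamp (in ns) to a VDP clock cycle within one
// display line. Assumes a 21.477270 MHz master clock (~46.56 ns per
// cycle) and 1368 cycles per line; 'originNs' is the chosen cycle-0
// reference point in the capture.
#include <cmath>
#include <cstdint>

constexpr double NS_PER_CYCLE = 1e9 / 21477270.0; // ~46.56 ns per VDP cycle

int cycleInLine(std::int64_t timestampNs, std::int64_t originNs)
{
    long long cycle = std::llround((timestampNs - originNs) / NS_PER_CYCLE);
    return static_cast<int>(((cycle % 1368) + 1368) % 1368); // wrap into [0..1368)
}
```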
>>
>> The starting point (VDP cycle 0) within a display line is at the
>> moment a bit arbitrary. Once we understand the data better,
>> we'll probably shift this point. I took the point where the
>> HSYNC signal in the VCD files goes from high to low.
>> Unfortunately this signal was _very_ noisy in our captures, so
>> this high->low edge is very ill defined. To still get a somewhat
>> stable position I took my best estimate for the transition, then
>> took the closest falling edge of the RAS signal. I did this for
>> all (3 or 4) transitions in the capture and also made sure that
>> the duration between the transitions is always the same.
>>
>> So if you look at the new derived data, the cycles within one
>> file should be fairly accurate from one display line to the next.
>> Though if you compare the cycle numbers from different files,
>> then they can still be a bit off (+/- 4 cycles at first sight).
>> Fixing this is next on my TODO list (e.g. the 'refresh' reads
>> seem to be present in all captures at the same relative positions.
>> I'll shift the data, so that in all the files, these refreshes also
>> happen at the same cycle number).
>>
>> Wouter
>>
>>
>> BTW, I just found out that the Linux 'tar' tool can handle the 'xz'
>> compression format by using the '-J' option.
>>
>>
>>
>> On Wed, Jan 23, 2013 at 10:12 AM, Wouter Vermaelen
>> <vermaelen.wouter@...> wrote:
>>> The mail below I sent yesterday was rejected because the
>>> attachment was too big (over 200kB). I've slightly reformatted
>>> the text files to be more compact and compressed the file
>>> using a different compression tool (on Linux use 'unxz' to
>>> extract; on Windows I guess you can use 7zip). Hopefully
>>> now this mail will arrive.
>>>
>>> Wouter
>>>
>>> On Tue, Jan 22, 2013 at 7:39 PM, Wouter Vermaelen
>>> <vermaelen.wouter@...> wrote:
>>>> I've created some tools to extract vram read/write
>>>> accesses from the VCD files. I've attached the results.
>>>>
>>>> The resulting text files contain one line per read/write.
>>>> It specifies:
>>>> - timestamp (timescale = 10ns, resolution = 20ns)
>>>> - access type (read or write)
>>>> - the vram address (for screen 8, already translated to
>>>> logical addresses)
>>>> - whether it was a part of a burst access (so without
>>>> changing the row address) (*)
>>>> - whether the VDS pin was active or not
>>>>
>>>> Note that this does omit some of the details that
>>>> can be found in the VCD file. For example:
>>>> - HSYNC/CSYNC info is not present
>>>> - RAS-without-CAS accesses are no longer present
>>>> (technically these are refreshes, though it *seems*
>>>> that the VDP uses regular read accesses for the
>>>> actual vram refresh). The RAS-without-CAS stuff
>>>> *seems* to be present because the VDP doesn't
>>>> bother turning off the RAS toggles when it doesn't
>>>> actually need to read/write anything (very preliminary,
>>>> this guess might be wrong).
>>>> - Exact duration of the access is no longer visible,
>>>> though at first sight the timing always seems to follow
>>>> the same pattern (*)
>>>>
>>>> BTW I *think* the conversion tool is correct. I also
>>>> manually verified some of the resulting files. Though
>>>> if you see something strange in the files, it might
>>>> not be a bad idea to also check the original VCD file.
>>>>
>>>> I didn't yet try to interpret the resulting data (so figure
>>>> out the VDP-VRAM time slots). That's next on my
>>>> TODO list. Though I thought sharing these intermediate
>>>> results may make it more likely to receive some help ;-)
>>>>
>>>> Wouter
>>>>
>>>>
>>>> (*) A single VRAM access (read or write) takes 6 VDP
>>>> clock cycles. If it is part of a burst of N accesses, it takes
>>>> 2+4N cycles.
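In other words, the access cost can be written as a tiny helper, with N = 1 reproducing the 6-cycle single-access case:

```cpp
// VDP clock cycles for a burst of n VRAM accesses, per the rule above:
// 2 setup cycles plus 4 cycles per access. n == 1 is the single-access
// case and indeed yields the quoted 6 cycles.
int vramBurstCycles(int n) { return 2 + 4 * n; }
```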
>>>>
>>>>
>>>>
>>>> On Sun, Jan 20, 2013 at 11:24 AM, Wouter Vermaelen
>>>> <vermaelen.wouter@...> wrote:
>>>>> Thanks.
>>>>>
>>>>> Also thanks a lot to anyone who helped with the measurements
>>>>> yesterday. I think it was a lot of fun and we may even learn
>>>>> something from it. We didn't yet analyze the data, but at first sight
>>>>> the raw data we obtained yesterday looks promising.
>>>>>
>>>>> I'll briefly describe the tests and the raw data files for anyone who
>>>>> is curious, or even wants to help analyze the data.
>>>>>
>>>>> The test program configured the VDP:
>>>>> - in a certain screen mode
>>>>> - screen/sprites enabled/disabled
>>>>> - 8x8 or 16x16 sprites, or set a sprite with Y-coord=208/216
>>>>>
>>>>> The base pointer registers were set so that (union for all screen modes)
>>>>> 0x00000-0x0D3FF: name table
>>>>> 0x0D400-0x0D7FF: sprite attribute table
>>>>> 0x0D800-0x0DFFF: sprite pattern table
>>>>> 0x0E000-0x0FFFF: pattern table
>>>>> 0x10000-0x11FFF: color table
>>>>> The _intention_ was to also use the following regions:
>>>>> 0x12000-0x13FFF: not used
>>>>> 0x14000-0x15FFF: CPU read
>>>>> 0x16000-0x17FFF: CPU write
>>>>> 0x18000-0x1BFFF: command engine source
>>>>> 0x1C000-0x1FFFF: command engine destination
>>>>> Though I'm sure that not all of our measurements actually
>>>>> conform to these last 4 regions. E.g. we sometimes executed
>>>>> commands that used rectangles that were too 'high', so that they
>>>>> did read/write outside the intended VRAM region. Or we read/wrote
>>>>> too many VRAM bytes via the Z80.
>>>>>
>>>>> After the setup, we optionally executed a VDP command and
>>>>> optionally read/write VRAM via the Z80 (both in a loop). And then
>>>>> we captured the VDP-VRAM bus while this test was running.
>>>>> The logic analyzer we used had an internal buffer that allowed us to
>>>>> capture for a duration of 3-4 VDP display lines (in tests with
>>>>> screen enabled, we tried to make sure we actually captured
>>>>> during the display area, though it's not impossible we sometimes
>>>>> made a mistake)
>>>>>
>>>>> We only had a limited amount of time, so we couldn't capture all
>>>>> possible combinations of the above configuration parameters.
>>>>> (that would also have been very boring ;-). However, I do think
>>>>> we already have some interesting data. Maybe if, after analyzing
>>>>> the data, we find that some details are still unclear, we could do
>>>>> additional measurements sometime in the future.
>>>>>
>>>>> We tried to encode the parameters of the testrun in the filename
>>>>> of the saved VCD file (Value Change Dump). We more or less
>>>>> followed the following (ad-hoc) convention:
>>>>>
>>>>> <screenmode><screen/spritestatus><command><cpu-access>
>>>>>
>>>>> <screenmode>
>>>>> We mostly concentrated on screen 5 and screen 8 (due to lack
>>>>> of time). The *assumption* is that screen 5 and 6 and screen
>>>>> 7 and 8 have very similar timing behavior anyway. The other
>>>>> screen modes can't execute VDP commands (on a V9938
>>>>> anyway), so they are less interesting. One very important
>>>>> difference between screen 5 and 8 is that the latter accesses the
>>>>> VRAM in an 'interleaved' way. Keep in mind that these VCD
>>>>> captures show the actual 'physical' bus address, not the 'logical'
>>>>> address as used by a typical screen8 program.
>>>>>
>>>>> <screen/spritestatus>
>>>>> could be:
>>>>> screen off
>>>>> screen on, sprites off
>>>>> screen on, sprites on 8x8
>>>>> screen on, sprites on 16x16
>>>>> screen on, sprites on 16x16, but with y-coord of sprite16 set to 216
>>>>> (so that sprites 16-31
>>>>> are not shown)
>>>>>
>>>>> <command>
>>>>> We most often tested with the HMMV command because this is the
>>>>> fastest VDP command (highest chance of seeing where the VRAM
>>>>> access slots are located in time). Again due to lack of time we only
>>>>> did a limited amount of tests with other commands.
>>>>> We encoded the command as
>>>>> <name>: command over the full screen width (e.g. NX=256)
>>>>> <name><number>: 'rectangular' command with a smaller width, note
>>>>> that the number is what is programmed in the NX VDP register,
>>>>> so pixels, not bytes
>>>>> <name><number><logop>: for 'L' commands when logop is not 'IMP'
>>>>> LINE<number>: a major-X line with NX=256 and NY=number
>>>>>
>>>>> <cpu-access>
>>>>> One of 'nocpu', 'cpuwrite' or 'cpuread'.
>>>>> Initially we tested both read and write but later limited ourselves to
>>>>> only writes.
>>>>>
>>>>>
>>>>> Some preliminary results:
>>>>> As already said, we didn't analyze the captured data yet, but we
>>>>> could already see some interesting stuff just by briefly looking
>>>>> at the captured waveforms. Most of these things are completely
>>>>> as expected (which is good), but later we should be able to derive
>>>>> some numbers from these captures. It was also very pleasant to
>>>>> actually see these phenomena in the waveforms.
>>>>>
>>>>> * We could see the VDP commands speed-up and slow-down
>>>>> while they were executing. E.g. during one VDP-display line, the
>>>>> command speeds up during the horizontal border.
>>>>> * 'Rectangular' commands slow down when they go from one row
>>>>> to the next. E.g. when executing a command on a narrow rectangle
>>>>> of 3 pixels/bytes, you can see 3 VRAM writes approximately evenly
>>>>> spaced apart in time, then a slight pause and again 3 evenly
>>>>> spaced writes.
>>>>>
>>>>>
>>>>> Wouter
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> openMSX-devel mailing list
>>>>> openMSX-devel@...
>>>>> https://lists.sourceforge.net/lists/listinfo/openmsx-devel

Another status update:
I started comparing the different capture files with each
other. First I synchronized the files: to do this I made a
'rotate' tool (source code included). The relative timing
of all the (non-text mode) capture files should now be the
same.
I also included a tool called 'merge' to make it easier to
compare a set of different capture files. Compile (in C++11
mode) and run like: merge <file1> <file2> .. <fileN>
Or 'merge screen*.txt > all.dat' to get all data in one big
file. Though the result is easier to read if you e.g. exclude
the text modes or only include files with/without sprites
or with/without screen enabled.
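Conceptually the column-wise merge amounts to something like this (an illustration only; the real tool is the one included with the data files):

```cpp
// Sketch of what 'merge' does: given the lines of several capture
// files, produce rows where column i holds file i's line, padded (or
// truncated) to a fixed column width, so that the same relative cycle
// position from each capture lands on the same row.
#include <algorithm>
#include <string>
#include <vector>

std::vector<std::string> mergeColumns(
        const std::vector<std::vector<std::string>>& files, size_t width)
{
    size_t rows = 0;
    for (const auto& f : files) rows = std::max(rows, f.size());
    std::vector<std::string> out(rows);
    for (const auto& f : files) {
        for (size_t r = 0; r < rows; ++r) {
            std::string cell = (r < f.size()) ? f[r] : "";
            cell.resize(width, ' '); // pad/truncate to the column width
            out[r] += cell;
        }
    }
    return out;
}
```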
Next I started some actual analysis of the data. So far
I've only looked at the refresh, bitmap and sprite VRAM
accesses. Not yet at the CPU reads/writes or command
engine reads/writes. That's next on my TODO list. I've
included a text file 'slots.txt' with my findings so far.
Now that I'm starting to understand the data a bit better
I also found (and fixed) some mistakes we made while
capturing. E.g. some of our captures apparently were
done in the vertical border. So even though the test was
run with screen and/or sprites enabled, the data actually
looks like a screen-disabled capture (we tried to avoid this
situation while capturing, but apparently some tests
slipped through). I also found a labeled-as-screen-5
capture that actually was a text mode (probably the test
program was not yet started / had already stopped when we
captured the data).
Wouter
>>>>
>>>> We tried to encode the parameters of the testrun in the filename
>>>> of the saved VCD file (Value Change Dump). We more or less
>>>> followed the following (ad-hoc) convention:
>>>>
>>>> <screenmode><screen/spritestatus><command><cpu-access>
>>>>
>>>> <screenmode>
>>>> We mostly concentrated on screen 5 and screen 8 (due to lack
>>>> of time). The *assumption* is that screen 5 and 6 and screen
>>>> 7 and 8 have very similar timing behavior anyway. The other
>>>> screen modes can't execute VDP commands (on a V9938
>>>> anyway), so they are less interesting. One very important
>>>> difference between screen 5 and 8 is that the latter accesses the
>>>> VRAM in an 'interleaved' way. Keep in mind that these VCD
>>>> captures show the actual 'physical' bus address, not the 'logical'
>>>> address as used by a typical screen8 program.
>>>>
>>>> <screen/spritesstatus>
>>>> could be:
>>>> screen off
>>>> screen on, sprites off
>>>> screen on, sprites on 8x8
>>>> screen on, sprites on 16x16
>>>> screen on, sprites on 16x16, but with y-coord of sprite16 set to 216
>>>> (so that sprites 16-31
>>>> are not shown)
>>>>
>>>> <command>
>>>> We most often tested with the HMMV command because this is the
>>>> fastest VDP command (highest chance of seeing where the VRAM
>>>> access slots are located in time). Again due to lack of time we only
>>>> did a limited amount of tests with other commands.
>>>> We encoded the command as
>>>> <name>: command over the full screen width (e.g. NX=256)
>>>> <name><number>: 'rectangular' command with a smaller width, note
>>>> that the number is what is programmed in the NX VDP register,
>>>> so pixels, not bytes
>>>> <name><number><logop>: for 'L' commands when logop is not 'IMP'
>>>> LINE<numer>: a major-X line with NX=256 and NY=number
>>>>
>>>> <cpu-access>
>>>> One of 'nocpu', 'cpuwrite' or 'cpuread'.
>>>> Initially we tested both read and write but later limited ourselves to
>>>> only writes.
>>>>
>>>>
>>>> Some preliminary results:
>>>> As already said, we didn't analyze the captured data yet, but we
>>>> could already see some interesting stuff just by briefly looking
>>>> at the captured waveforms. Most of these things are completely
>>>> as expected (which is good), but later we should be able to derive
>>>> some numbers from these captures. It was also very pleasant to
>>>> actually see these phenomena in the waveforms.
>>>>
>>>> * We could see the VDP commands speed-up and slow-down
>>>> while they were executing. E.g. during one VDP-display line, the
>>>> commands speeds up during the horizontal border.
>>>> * 'Rectangular' commands slow down when they go from one row
>>>> to the next. E.g when executing a command on narrow rectangle
>>>> of 3 pixels/bytes, you can see 3 VRAM writes approx evenly
>>>> spaced apart in time, then a slight pause and again 3 evenly
>>>> spaced writes.
>>>>
>>>>
>>>> Wouter

Made some progress yesterday. The data files are slowly
getting into a format that is easy to analyze.
These new files convert time from ns to VDP clock cycles
within one display line, i.e. a value in the range [0 .. 1368). They
also put the data for different display lines (within the same
capture) in columns next to each other. So time runs from
top to bottom and continues from the top in the next column
(from left to right). This way it's easy to see which accesses
occur each display line at the same relative position.
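The ns-to-cycle conversion described above can be sketched like this (a minimal sketch under my own assumptions, not the author's actual tool: a 21.477270 MHz VDP master clock and an arbitrary cycle-0 origin):

```python
# Sketch: map capture timestamps (10 ns units, per the earlier mails)
# to VDP clock cycles within a 1368-cycle display line.
# Assumptions (mine): VDP master clock = 21.477270 MHz, origin at tick 0.
VDP_CLOCK_HZ = 21_477_270
CYCLES_PER_LINE = 1368
TICK_NS = 10

def to_cycle_in_line(timestamp_ticks: int, origin_ticks: int = 0):
    """Return (display line index, cycle within that line in [0, 1368))."""
    ns = (timestamp_ticks - origin_ticks) * TICK_NS
    cycle = round(ns * VDP_CLOCK_HZ / 1e9)
    return divmod(cycle, CYCLES_PER_LINE)
```

Laying the per-line results out in columns, as the files do, then amounts to grouping records by the first element of this pair.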
The starting point (VDP cycle 0) within a display line is at the
moment a bit arbitrary. Once we understand the data better,
we'll probably shift this point. I took the point where the
HSYNC signal in the VCD files goes from high to low.
Unfortunately this signal was _very_ noisy in our captures, so
this high->low edge is very ill-defined. To still get a somewhat
stable position I took my best estimate for the transition, then
took the closest falling edge of the RAS signal. I did this for
all (3 or 4) transitions in the capture and also made sure that
the duration between the transitions is always the same.
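The "take the closest falling RAS edge" step could look roughly like this (names and data shapes are mine, not the actual tool's):

```python
# Sketch: given a rough estimate of the HSYNC high->low transition and
# the falling-edge times of the RAS signal (all in the same time units),
# pick the RAS edge closest to the estimate as the stable reference point.
def snap_to_ras(hsync_estimate, ras_falling_edges):
    return min(ras_falling_edges, key=lambda t: abs(t - hsync_estimate))
```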
So if you look at the new derived data, the cycles within one
file should be fairly accurate from one display line to the next.
Though if you compare the cycle numbers from different files,
then they can still be a bit off (+/- 4 cycles at first sight).
Fixing this is next on my TODO list (e.g. the 'refresh' reads
seem to be present in all captures at the same relative positions.
I'll shift the data so that in all the files these refreshes also
happen at the same cycle number).
Wouter
BTW, I just found out that the Linux 'tar' tool can handle the 'xz'
compression format via the '-J' option.
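For reference, assuming GNU tar (my example, not from the original mail):

```shell
echo "example data" > somefile.txt      # something to archive
tar -cJf data.tar.xz somefile.txt       # -J selects xz compression
tar -xJf data.tar.xz                    # extract; 'unxz -c data.tar.xz | tar -x' also works
```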
>>> On Sun, Jan 20, 2013 at 7:48 AM, Joost Yervante Damad <joost@...> wrote:
>>>> Hey,
>>>>
>>>> as promised the full set of VDP timing measurements Wouter and I acquired
>>>> at the Nijmegen 2013 MSX fair.
>>>>
>>>> We managed to get 82 measurements, not bad eh? :)
>>>>
>>>> They're a bit big to fit in an email (2M total), but you can download
>>>> them at:
>>>>
>>>> http://damad.be/joost/nijmegen2013_VDP_timing_measurements.zip
>>>>
>>>> For viewing the vcd files, Wouter recommends using gtkwave.
>>>>
>>>> Joost
>>>>
>>>>
>>>> _______________________________________________
>>>> openMSX-devel mailing list
>>>> openMSX-devel@...
>>>> https://lists.sourceforge.net/lists/listinfo/openmsx-devel

The mail below, which I sent yesterday, was rejected because the
attachment was too big (over 200 kB). I've slightly reformatted
the text files to be more compact and compressed the file
with a different compression tool (on Linux use 'unxz' to
extract; on Windows I guess you can use 7-Zip). Hopefully
this mail will arrive now.
Wouter

I've created some tools to extract vram read/write
accesses from the VCD files. I've attached the results.
The resulting text files contain one line per read/write.
It specifies:
- timestamp (timescale = 10ns, resolution = 20ns)
- access type (read or write)
- the vram address (for screen 8, already translated to
logical addresses)
- whether it was a part of a burst access (so without
changing the row address) (*)
- whether the VDS pin was active or not
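The fields above could be modeled like this (a sketch: the field names are mine and the actual on-disk format may differ; the screen-8 address translation assumes the commonly documented V9938 interleaving where logical bit 0 selects the 64 kB half, i.e. physical = (logical >> 1) | ((logical & 1) << 16) -- the mail itself doesn't spell out the mapping):

```python
from dataclasses import dataclass

# Hypothetical record type for one line of the converted text files.
@dataclass
class VramAccess:
    timestamp: int     # in 10 ns units (20 ns resolution)
    is_write: bool     # access type: True = write, False = read
    address: int       # VRAM address (for screen 8: the logical address)
    burst: bool        # part of a burst access (row address unchanged)
    vds_active: bool   # whether the VDS pin was active

# Assumed screen-8 physical->logical translation (see lead-in above).
def physical_to_logical(phys: int) -> int:
    return ((phys & 0xFFFF) << 1) | (phys >> 16)
```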
Note that this does omit some of the details that
can be found in the VCD file. For example:
- HSYNC/CSYNC info is not present
- RAS-without-CAS accesses are no longer present
(technically these are refreshes, though it *seems*
that the VDP uses regular read accesses for the
actual vram refresh). The RAS-without-CAS stuff
*seems* to be present because the VDP doesn't
bother turning off the RAS toggles when it doesn't
actually need to read/write anything (very preliminary,
this guess might be wrong).
- Exact duration of the access is no longer visible,
though at first sight the timing always seems to follow
the same pattern (*)
BTW I *think* the conversion tool is correct. I also
manually verified some of the resulting files. Though
if you see something strange in the files, it might
not be a bad idea to also check the original VCD file.
I didn't yet try to interpret the resulting data (so figure
out the VDP-VRAM time slots). That's next on my
TODO list. Though I thought sharing these intermediate
results may make it more likely to receive some help ;-)
Wouter
(*) A single VRAM access (read or write) takes 6 VDP
clock cycles. If it is part of a burst of N accesses, it takes
2+4N cycles.
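Written as a formula (directly from the (*) note above): a burst of N accesses takes 2 + 4N VDP cycles, and N = 1 reproduces the 6-cycle single access as a special case of the same rule:

```python
def access_duration(n_accesses: int) -> int:
    """VDP clock cycles for a burst of n_accesses VRAM reads/writes."""
    if n_accesses < 1:
        raise ValueError("need at least one access")
    return 2 + 4 * n_accesses   # N = 1 -> 6 cycles, a single access
```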

Thanks.
Also thanks a lot to anyone who helped with the measurements
yesterday. I think it was a lot of fun and we may even learn
something from it. We didn't yet analyze the data, but at first sight
the raw data we obtained yesterday looks promising.
I'll briefly describe the tests and the raw data files for anyone who
is curious, or even wants to help analyze the data.
The test program configured the VDP:
- in a certain screen mode
- screen/sprites enabled/disabled
- 8x8 or 16x16 sprites, or set a sprite with Y-coord=208/216
The base pointer registers were set so that (union for all screen modes)
0x00000-0x0D3FF: name table
0x0D400-0x0D7FF: sprite attribute table
0x0D800-0x0DFFF: sprite pattern table
0x0E000-0x0FFFF: pattern table
0x10000-0x11FFF: color table
The _intention_ was to also use the following regions:
0x12000-0x13FFF: not used
0x14000-0x15FFF: CPU read
0x16000-0x17FFF: CPU write
0x18000-0x1BFFF: command engine source
0x1C000-0x1FFFF: command engine destination
Though I'm sure that not all of our measurements actually
conform to these last 4 regions. E.g. we sometimes executed
commands that used too 'high' rectangles, so that they
read/wrote outside the intended VRAM region. Or we read/wrote
too many VRAM bytes via the Z80.
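As a sketch, the intended region map above can be expressed as a small lookup helper. The region names and boundaries are taken directly from the list above; the helper itself is hypothetical, not code we actually used:

```python
# Intended VRAM region map from the test setup (bounds inclusive).
REGIONS = [
    (0x00000, 0x0D3FF, "name table"),
    (0x0D400, 0x0D7FF, "sprite attribute table"),
    (0x0D800, 0x0DFFF, "sprite pattern table"),
    (0x0E000, 0x0FFFF, "pattern table"),
    (0x10000, 0x11FFF, "color table"),
    (0x12000, 0x13FFF, "not used"),
    (0x14000, 0x15FFF, "CPU read"),
    (0x16000, 0x17FFF, "CPU write"),
    (0x18000, 0x1BFFF, "command engine source"),
    (0x1C000, 0x1FFFF, "command engine destination"),
]

def classify(addr):
    # Return the intended purpose of a physical VRAM address.
    for lo, hi, purpose in REGIONS:
        if lo <= addr <= hi:
            return purpose
    return "out of range"
```

A helper like this could annotate each access in the captured traces with its intended purpose (modulo the caveats above about commands straying outside their region).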
After the setup, we optionally executed a VDP command and
optionally read/wrote VRAM via the Z80 (both in a loop). And then
we captured the VDP-VRAM bus while this test was running.
The logic analyzer we used had an internal buffer that allowed us
to capture for a duration of 3-4 VDP display lines. (In tests with
the screen enabled, we tried to make sure we actually captured
during the display area, though it's not impossible we sometimes
made a mistake.)
We only had a limited amount of time, so we couldn't capture all
possible combinations of the above configuration parameters
(that would also have been very boring ;-)). However, I do think
we already have some interesting data. If, after analyzing the
data, we find that some details are still unclear, we could do
additional measurements sometime in the future.
We tried to encode the parameters of the test run in the filename
of the saved VCD file (Value Change Dump). We more or less
followed this (ad-hoc) convention:
<screenmode><screen/spritestatus><command><cpu-access>
<screenmode>
We mostly concentrated on screen 5 and screen 8 (due to lack
of time). The *assumption* is that screen 5 and 6 and screen
7 and 8 have very similar timing behavior anyway. The other
screen modes can't execute VDP commands (on a V9938
anyway), so they are less interesting. One very important
difference between screen 5 and 8 is that the latter accesses the
VRAM in an 'interleaved' way. Keep in mind that these VCD
captures show the actual 'physical' bus address, not the 'logical'
address as used by a typical screen8 program.
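For reference, here is a minimal sketch of the commonly documented V9938 interleaving scheme, where bit 0 of the logical address selects the 64 kB half, i.e. it becomes bit 16 of the physical address. The function name is mine, and treat the scheme itself as an assumption to verify against the captures:

```python
def logical_to_physical(addr):
    # Interleaved modes (screen 7/8): logical bit 0 becomes physical
    # bit 16; the remaining bits shift down by one position.
    return ((addr & 1) << 16) | ((addr >> 1) & 0xFFFF)

# Consecutive logical bytes thus alternate between the two 64 kB halves,
# which is why the 'physical' addresses in the VCD captures look so
# different from the 'logical' addresses a screen8 program uses.
```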
<screen/spritestatus>
could be:
screen off
screen on, sprites off
screen on, sprites on 8x8
screen on, sprites on 16x16
screen on, sprites on 16x16, but with the y-coordinate of sprite 16
set to 216 (so that sprites 16-31 are not shown)
<command>
We most often tested with the HMMV command because it is the
fastest VDP command (highest chance of seeing where the VRAM
access slots are located in time). Again, due to lack of time, we
only did a limited number of tests with other commands.
We encoded the command as
<name>: command over the full screen width (e.g. NX=256)
<name><number>: 'rectangular' command with a smaller width; note
that the number is what is programmed in the NX VDP register,
so pixels, not bytes
<name><number><logop>: for 'L' commands when the logop is not 'IMP'
LINE<number>: a major-X line with NX=256 and NY=number
<cpu-access>
One of 'nocpu', 'cpuwrite' or 'cpuread'.
Initially we tested both read and write but later limited ourselves to
only writes.
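The <command> part of the convention can be sketched as a toy encoder. This is purely illustrative (not code we used), and the exact abbreviations in the real filenames may differ:

```python
def command_field(name, nx=None, logop=None, ny=None):
    # Encode the <command> filename part:
    #   <name>                  full-width command (NX=256)
    #   <name><number>          narrower rectangle; number = NX register
    #                           value, so pixels, not bytes
    #   <name><number><logop>   'L' command with a non-IMP logical op
    #   LINE<number>            major-X line with NX=256 and NY=number
    if name == "LINE":
        return f"LINE{ny}"
    s = name
    if nx is not None and nx != 256:
        s += str(nx)
    if logop is not None and logop != "IMP":
        s += logop
    return s
```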
Some preliminary results:
As already said, we haven't analyzed the captured data yet, but we
could already see some interesting things just by briefly looking
at the captured waveforms. Most of these are completely
as expected (which is good), but later we should be able to derive
some numbers from these captures. It was also very pleasant to
actually see these phenomena in the waveforms.
* We could see the VDP commands speed up and slow down
while they were executing. E.g. during one VDP display line, a
command speeds up during the horizontal border.
* 'Rectangular' commands slow down when they go from one row
to the next. E.g. when executing a command on a narrow rectangle
of 3 pixels/bytes, you can see 3 VRAM writes approximately evenly
spaced in time, then a slight pause, and again 3 evenly
spaced writes.
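That per-row pattern can be illustrated with a toy timing model. The 'slot' and 'row_gap' cycle counts below are made up for illustration, not measured values:

```python
def write_times(nx, ny, slot=8, row_gap=20):
    # Toy model of the observed pattern: nx evenly spaced writes per
    # row, plus an extra pause when stepping to the next row.
    t, out = 0, []
    for _ in range(ny):
        for _ in range(nx):
            out.append(t)
            t += slot       # evenly spaced writes within a row
        t += row_gap        # slight pause at the row boundary
    return out
```

E.g. a 3-wide, 2-high rectangle gives two groups of 3 timestamps with a visible gap between the groups, matching what we saw in the waveforms.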
Wouter
On Sun, Jan 20, 2013 at 7:48 AM, Joost Yervante Damad <joost@...> wrote:
> Hey,
>
> as promised the full set of VDP timing measurements Wouter and I acquired
> at the Nijmegen 2013 MSX fair.
>
> We managed to get 82 measurements, not bad eh? :)
>
> They're a bit big to fit in an email (2M total), but you can download
> them at:
>
> http://damad.be/joost/nijmegen2013_VDP_timing_measurements.zip
>
> For viewing the vcd files, Wouter recommends using gtkwave.
>
> Joost
>
>
> _______________________________________________
> openMSX-devel mailing list
> openMSX-devel@...
> https://lists.sourceforge.net/lists/listinfo/openmsx-devel

Hey,
as promised the full set of VDP timing measurements Wouter and I acquired
at the Nijmegen 2013 MSX fair.
We managed to get 82 measurements, not bad eh? :)
They're a bit big to fit in an email (2M total), but you can download
them at:
http://damad.be/joost/nijmegen2013_VDP_timing_measurements.zip
For viewing the vcd files, Wouter recommends using gtkwave.
Joost

Feature Requests item #2964248, was opened at 2010-03-05 06:46
Message generated for change (Comment added) made by manuelbi
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=421864&aid=2964248&group_id=38274
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General/misc
Group: Next Release
>Status: Closed
Priority: 5
Private: No
Submitted By: seanyoung (seanyoung)
>Assigned to: Wouter Vermaelen (m9710797)
Summary: Implement bash completion function
Initial Comment:
Implement a bash shell function that can do command line completion for all the options of openMSX, extensions and machines. This probably requires openMSX to be able to execute Tcl from its command line.
----------------------------------------------------------------------
>Comment By: Manuel Bilderbeek (manuelbi)
Date: 2013-01-05 07:37
Message:
Wouter implemented this in revisions 12896 and 12897.
See Contrib/openmsx-complete.bash for instructions on how to activate
this.
----------------------------------------------------------------------
Comment By: Maarten ter Huurne (mthuurne)
Date: 2010-03-05 06:58
Message:
Why would this require Tcl execution? As far as I know, all command line
parsing and path completion code is implemented in C++. The path completion is
called from Tcl, but does not require Tcl.
It would require other refactoring, I think. The current command line
parsing code does both the parsing itself and setup of the initial machine.
Instead, it should parse and build a data structure or Tcl script first and
setup the initial machine in a second step, only if openMSX was actually
called to start emulation. This refactoring was already on my wish list
because it would simplify the command line parsing code.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=421864&aid=2964248&group_id=38274