mahen wrote:Something is really funny in Thunder Force IV: in-game music stalls until the player's ship fires. Then the music seems to play at the shooting pace. (I have no idea if what I say is understandable.)

I have just tried the game. OMG it's the weirdest thing! If I was trying to make that on purpose, I would fail!

I'll look into it this week. I have also heard some clues in other games that may be pointing to crosstalk between operators: a control signal from one operator gets mixed in the pipeline with that of a different operator, because physically there is only one sound operator but virtually there are 24 of them. So I need to cycle the signals of each virtual operator through the real one, and a signal from one operator may be getting mixed with the signal of another.
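
To illustrate the kind of crosstalk I mean, here is a minimal Python sketch (not JT12 code). It assumes a hypothetical 3-stage pipeline shared by 24 slots and a per-operator gain as the control signal; `PIPE_DEPTH`, `run_pipeline` and `align_control` are names made up for the example. If the control signal is not delayed together with the data, each operator ends up using the control of the slot that is entering the pipeline at that moment.

```python
NUM_SLOTS = 24   # virtual operators sharing one physical operator
PIPE_DEPTH = 3   # hypothetical pipeline latency, in clock cycles


def run_pipeline(gains, inputs, align_control, cycles=NUM_SLOTS * 2):
    """Cycle every slot through the shared pipeline; return {slot: output}."""
    pipe = [None] * PIPE_DEPTH          # each entry is (slot, value) or None
    out = {}
    for cycle in range(cycles):
        entering = cycle % NUM_SLOTS    # slot entering the pipeline now
        leaving = pipe[-1]              # entry that went in PIPE_DEPTH cycles ago
        if leaving is not None:
            slot, value = leaving
            # Correct: use the control of the slot that is leaving.
            # Buggy: use the control of whatever slot is entering right now.
            ctrl_slot = slot if align_control else entering
            out[slot] = value * gains[ctrl_slot]
        pipe = [(entering, inputs[entering])] + pipe[:-1]
    return out


gains = [i + 1 for i in range(NUM_SLOTS)]
inputs = [1] * NUM_SLOTS
good = run_pipeline(gains, inputs, align_control=True)
bad = run_pipeline(gains, inputs, align_control=False)
print(good[0], bad[0])  # operator 0 with its own gain vs. another operator's gain
```

In the misaligned case every operator picks up the gain of the operator `PIPE_DEPTH` slots ahead of it, which is the sort of mixing that would be audible only in games that set very different parameters on neighbouring operators.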

I have worked a lot over the last two weeks to improve sound quality. I have added FIR filters and a better sigma-delta in order to have a proper signal processing chain, at least on the digital side. I have not seen FIR filters for interpolation before the sigma-delta DAC in any other retro FPGA core, so this was really cool to me. I worked on an improved PSG sound core (JT89) too. The whole thing was very exciting.
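
For anyone not familiar with sigma-delta DACs, here is a minimal Python sketch of the idea. It is a plain first-order modulator, assumed only for illustration; it is not the modulator used in the core, and the FIR interpolation stage that would feed it is left out. The function name is made up for the example.

```python
def sigma_delta_1st_order(samples):
    """Turn samples in [-1.0, 1.0] into a +1/-1 bitstream (first-order loop)."""
    integrator = 0.0
    feedback = 0.0          # previous output bit, fed back into the loop
    bits = []
    for x in samples:
        # Accumulate the error between the input and the last output bit.
        integrator += x - feedback
        # 1-bit quantizer: the sign of the integrator is the output.
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits


bits = sigma_delta_1st_order([0.5] * 4000)
print(sum(bits) / len(bits))  # the bitstream average tracks the input level
```

The output pin only ever sees two levels, but the running average of the bitstream follows the input, so an analog low-pass filter recovers the audio. Interpolating first gives the modulator more samples per audio period to shape the quantization noise with.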

However, we have a complex implementation of the Genesis core inherited from a different project. I do not quite understand why but the problem is that we are having many, many difficulties when trying to put the blocks together. Timing issues everywhere. This is a similar problem to what I found when I tried to add JT51 to the Atari ST (MiST) core. If the base core is not robust in terms of timing, as more stuff is added to the FPGA the whole thing just breaks down. I know that JT12 itself is fine because I can test it independently. But putting it together with the rest of the system has shown to be impossible for the last week.

I am going to take some time off now. I have been spending many hours every day on this for the last two months. I think I have delivered a good JT12 module. It is only missing the LFO at the moment, but that is something minor that I will eventually fix. The other issue, the sounds in Thunder Force IV, is probably related to the timing issues we have at FPGAgen system level and not to JT12 itself. Other sound artifacts are related to the signal chain used, and I have also fixed that, as I said in the first paragraph.

I will go back to mundane things like dusting off the shelves of my house, which I have not done in the last two months. I hope the timing of FPGAgen gets fixed by the rest of the team. robinsonb5 and phoboz are putting a lot of effort into this too. Or maybe someone new comes to the project and fixes timing... Eventually, with timing fixed, I will come back to connect the latest JT12 incarnation to FPGAgen and we will have sound completed.

jotego wrote:However, we have a complex implementation of the Genesis core inherited from a different project. I do not quite understand why but the problem is that we are having many, many difficulties when trying to put the blocks together. Timing issues everywhere. This is a similar problem to what I found when I tried to add JT51 to the Atari ST (MiST) core. If the base core is not robust in terms of timing, as more stuff is added to the FPGA the whole thing just breaks down. I know that JT12 itself is fine because I can test it independently. But putting it together with the rest of the system has shown to be impossible for the last week.

Make sure no asynchronous design is used in any part of the project. Check whether there are warnings like "Found signal xxx used as a clock without declaration" or something like that. If the project is really large, then adding more info to the .sdc file is unavoidable. I hope I will understand this file someday...

alexh wrote:I think it has something to do with VDP DMA VRAM FILL? Thunder Force IV makes extensive use of this. Software emulators had to improve their VRAM FILL accuracy for this game.

Perhaps if it is not emulated/recreated correctly the audio core gets no DMA access for a while and then lots of data very quickly? Where in the real hardware it is paced correctly?

A quick search of old (pre-2011) forum posts can find the technical details

I don't know about the VDP. I have checked what is special about TFIV in terms of YM2612, and it is the only game I have seen that uses the two internal timers of YM2612 (actually the rest of the Thunder Force series does it too). This has to do with how the Z80 determines the rhythm of the music. There are three options:

1. Some internal Z80 counter (most games, e.g. Sonic)
2. Use timer A or timer B from YM2612 to advance music when the timer flag is set (e.g. Ghouls'n Ghosts uses timer B)
3. Use both timers (Thunder Force series)

So I focused on the timers but I have not found issues in my implementation. That's why I think it has something to do with timing violations in the FPGA in the signal path from flag A/B to the CPU.
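
To show why a broken flag path would stall the music, here is a minimal Python sketch of a driver that advances only when it sees a timer flag. The timer is a deliberately simplified up-counter that reloads and raises a flag on overflow; the class, the load value and the `flag_reaches_cpu` switch are assumptions for the example, not the JT12 implementation or the game's driver.

```python
class OverflowTimer:
    """Simplified up-counter: reloads and raises a flag when it overflows."""

    def __init__(self, load, bits):
        self.load = load
        self.limit = 1 << bits
        self.count = load
        self.flag = False

    def step(self):
        self.count += 1
        if self.count >= self.limit:
            self.count = self.load   # reload and start counting again
            self.flag = True         # flag stays set until the CPU clears it


def play(timer, timer_steps, flag_reaches_cpu=True):
    """Return how many music ticks a flag-polling driver advances."""
    music_ticks = 0
    for _ in range(timer_steps):
        timer.step()
        if timer.flag and flag_reaches_cpu:
            music_ticks += 1         # driver advances the song by one tick
            timer.flag = False       # and resets the flag
    return music_ticks


print(play(OverflowTimer(load=1000, bits=10), 240))
print(play(OverflowTimer(load=1000, bits=10), 240, flag_reaches_cpu=False))
```

With the flag visible, the song advances at the overflow rate. If the flag never makes it to the CPU, the timer keeps running correctly and the music still does not move, which would look exactly like a timer bug from the outside.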

Some cores I've seen have problems with one clock being derived (one clock divided to produce a slower clock) and then having hold timing issues. You should be able to see the failing paths on the timing report. These cases are usually not too difficult to fix. Ideally remove the clock division and replace with a clock enable, or if possible use a PLL to perform the division. Otherwise some additional delay on the problematic paths might be needed. Quartus sometimes can do that automatically. But if not, it can be done manually.

If the problem is that the whole design is too close to the original implementation, it might have so many async issues that it would need some kind of major redesign.

ijor wrote:Some cores I've seen have problems with one clock being derived (one clock divided to produce a slower clock) and then having hold timing issues. You should be able to see the failing paths on the timing report. These cases are usually not too difficult to fix. Ideally remove the clock division and replace with a clock enable, or if possible use a PLL to perform the division. Otherwise some additional delay on the problematic paths might be needed. Quartus sometimes can do that automatically. But if not, it can be done manually.

This is called "asynchronous design", where any random signal is used as a clock. That's what I wrote above. I think jotego should be experienced enough to know about this. But sometimes the original design comes from other authors and can include some flaws. I know Quartus does not always report signals that are used as a clock without being declared as one. I think it depends on the sdc file content. I remember I cleaned up the sdc file (I left only the necessary records) which I used for almost all my cores, and Quartus started to report such problems in designs which had compiled before without warnings.

Sorgelig wrote:This is called "asynchronous design", where any random signal is used as a clock. That's what I wrote above.

I think ijor is talking about when you have two (or more) clocks in your design which have different frequencies but which you can make (rising) edge synchronous. In that situation you can define the fastest clock and use a clock enable for the slower clocks (instead of defining two clocks).

I know nothing about Altera but the Xilinx toolset really doesn't like multiple clocks. Anything you can do to make them one clock with an enable will really help the tool. Not only will timing improve but run times will go down too as the tool doesn't try to do hold fixing.

Yes, the issue comes from having multiple clocks. The original author of FPGAgen used a sort of clock enable strategy. That actually works fine for small designs. The problem is that we are almost completely using the FPGA resources and then things that are normally not a problem become a problem.

The design is made using a counter on a main 54MHz clock. With that counter there are other signals generated that are sometimes used as clocks directly (which is not a recommended practice) or have a clock enable signal (which is not great either). Again, the two approaches work well with small designs in comparison with the FPGA size.

The FPGA has signal routing and clock routing. The clock routing is optimized to work as a clock across the whole device. The FPGA in MiST has 20 of these global clock trees. Signal routing is optimized for short connections. There seems to be an exception: there is support for reset signals so they go to many more cells. But I do not know how reset routing is done: is there only one? Are there many?

When clocks are generated without FPGA clock pins or FPGA PLL outputs, the global clock tree is not used but the regular signal routing. Probably there is some clever instantiation of cells that can be done to force a global clock tree to be used. But we have not figured that out yet. So we have regular routing for clocks. And when regular routing gets full because of true signals then the whole thing starts to break apart.

Up to JT12 v0.3 we were normally getting clean synthesis. Then, as I made the sound system larger, or simply different, the fitter started to fail and I have not seen a clean synthesis since. Note that when a design is marginal, it can easily become impractical for the fitter if a single gate is changed in the design.

I do not have experience with making a system as large as the FPGAgen yet. So I do not want to start making big changes to it. For the time being I prefer to wait for robinsonb5 to see if he can fix it. It may actually need a major rewrite of the glue logic that links the two CPUs, the video system and the audio system.

This is also related to how people release open source cores. They usually do not come with implementation details. Ideally we would like to know things like:

- Clock edge used
- Internal clocking scheme and requirements
- SDC constraints
- Timing diagrams to interact with memories and other elements external to the block
- Warning-free HDL code
- Code that follows FPGA design recommendations and not just simulation/ASIC code

With JT51 I could only do the warning-free part, basically because I was not aware of the importance of the other elements. With JT12 I have also followed all the recommendations from Altera.

By the way, there is a nice book chapter available for free about clocks here. I ended up buying the whole book after reading that chapter...

jotego wrote:The design is made using a counter on a main 54MHz clock. With that counter there are other signals generated that are sometimes used as clocks directly (which is not a recommended practice) or have a clock enable signal (which is not great either). Again, the two approaches work well with small designs in comparison with the FPGA size.

Using a clock enable is actually the recommended way to do it. Mixing clocks and clock enable is not a good idea.

The problem with clock enable is that it consumes more resources, but not nearly as much as when used on ASIC designs. Contrary to ASICs, the FPGA registers have dedicated clock enable logic built in. This doesn't mean that using them is completely free. On this Cyclone family there are two clock enable signals per LAB, so you can't have a different clock enable for each register in a LAB. Also, you might already be using them as part of a conditional update of the registers. In such cases Quartus will add some further combinatorial logic to implement the desired behaviour. Say, the physical clock enable would become the AND of your logical clock enable and the conditional value. In some cases this might mean an additional LE (logic element) has to be used. But because the FPGA implements combinatorial logic using 3- or 4-input LUTs, in many cases no extra LE is needed.

There is an additional problem when replacing a clock with a clock enable. The phasing changes. From the point of view of the master clock, a signal used directly as a clock updates the target register in THIS cycle. A clock enable would do it on the NEXT cycle. This must be considered.
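
A minimal Python sketch of that phase shift, under idealized assumptions (zero routing delay, a divide-by-4 pulse generated from a registered counter; the function and variable names are made up for the example). It records in which master clock cycle the target register updates when the pulse is used directly as a clock versus when it is used as a clock enable.

```python
def update_cycles(master_cycles=12, divide_by=4):
    """Return (cycles updated via derived clock, cycles updated via clock enable)."""
    counter = 0
    pulse = 0                 # registered divider output, high 1 cycle in N
    as_clock, as_enable = [], []
    for cycle in range(master_cycles):
        pulse_before = pulse  # value visible just before this master edge
        # Master clock edge: every master-clocked register updates together.
        new_pulse = 1 if counter == divide_by - 1 else 0
        counter = (counter + 1) % divide_by
        # Clock-enable style: the target is clocked by the master clock and
        # samples the OLD value of the pulse at this edge.
        if pulse_before == 1:
            as_enable.append(cycle)
        pulse = new_pulse
        # Derived-clock style: the rising edge of the pulse itself clocks the
        # target, right after the master edge that produced it.
        if pulse_before == 0 and pulse == 1:
            as_clock.append(cycle)
    return as_clock, as_enable


as_clock, as_enable = update_cycles()
print(as_clock)   # cycles in which the derived-clock register updates
print(as_enable)  # cycles in which the clock-enable register updates
```

Every clock-enable update lands one master cycle after the corresponding derived-clock update, so any logic that depended on the old phase (bus handshakes, memory strobes) has to be rechecked after the conversion.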

The FPGA has signal routing and clock routing.... There seems to be an exception: there is support for reset signals so they go to many more cells. But I do not know how reset routing is done: is there only one? Are there many? ... When clocks are generated without FPGA clock pins or FPGA PLL outputs, the global clock tree is not used but the regular signal routing.

The clock network can be used for any signal, clock, async reset or whatever. They are normally dedicated for low skew and high fanout signals. And it is not true that it is not used for internally generated clocks. What is happening here, I guess, is that Quartus is trying to help you.

The clock network is slower than the regular interconnect. It is designed for low skew, not for speed. If you have a derived clock, you already have clock skew issues as a direct consequence of producing the derived clock. Taking the signal up to the clock network, plus its relatively slow speed, would make the skew even worse.

Probably there is some clever instantiation of cells that can be done to force a global clock tree to be used. But we have not figured that out yet. So we have regular routing for clocks.

There are multiple ways to do that and it depends on the exact Quartus version. But check the "global_signal" option setting.

And when regular routing gets full because of true signals then the whole thing starts to break apart.

Are you sure that is the problem? Modern FPGAs normally have lots of interconnect resources. Normally you run out of logic elements way before you run out of routing resources. Can you post a compilation report, or at least the summary?

Back again to the clock skew issues. The most important thing here seems to be checking the Quartus TimeQuest reports. Depending on what they show, you might try to fix the problem with different strategies.

This is a summary of a build on the Next branch with all the sound elements commented out, to ease synthesis. Nonetheless it still contains many violations. If I include the sound subsystem things just get worse.

TNS stands for Total Negative Slack. It is the sum of the negative slack of all the failing paths on that clock. It gives you an idea of how serious the problem is. In the quoted line above it means that you have at least (it might be many more) about 30 failing paths on that clock. Note that clock skew usually produces hold timing violations, not setup ones, but ...
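
A minimal Python sketch of how those numbers relate, using made-up slack values (not taken from the posted report). Since no single path can be worse than the worst slack, TNS divided by the worst slack gives a lower bound on the number of failing paths; the function names are only for the example.

```python
import math


def timing_summary(slacks_ns):
    """Summarize per-path slacks (ns) for one clock; the list must not be empty."""
    failing = [s for s in slacks_ns if s < 0.0]
    return {
        "wns": min(slacks_ns),          # worst slack on this clock
        "tns": sum(failing),            # total negative slack
        "failing_paths": len(failing),
    }


def min_failing_paths(tns, worst_slack):
    """Lower bound on failing paths: each one contributes at most |worst_slack|."""
    if worst_slack >= 0.0:
        return 0
    return math.ceil(tns / worst_slack)


summary = timing_summary([0.5, -0.25, -1.0, 2.0, -0.75])
print(summary)
print(min_failing_paths(summary["tns"], summary["wns"]))
```

In this toy case the bound says at least 2 paths fail while 3 actually do, which is why the estimate from the report can only be read as "at least".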

You have to constrain the clocks, otherwise the report is not really meaningful. Launch standalone TimeQuest. It will tell you all the clocks in the design, and it will help you build the right constraints. Clocks produced by the PLLs can be constrained automatically. Derived clocks might need a manual declaration, which in turn might require some code analysis and/or simulation. Let me know if you need further help with constraining the clocks.