The Iyonix profile results are a lot harder to read – lots of jumping around between MManager, EtherK and Internet.

After hastily adding a histogram option to profanal, it looks like 43% of the time is spent in MManager, 29% in Internet, 14% in EtherK, and 5% in the SCL – but it’s hard to say exactly which functions are taking up all the time, since I don’t have a convenient way of getting the addresses of C ‘static’ functions. And of course MManager is closed source (but if the BB results were anything to go by, I can guess that most of the time in there is being spent copying from the NIC receive buffer).

I’m not sure if it is still the case, but it is also worth remembering that the Pi isn’t the fastest VFP implementation. Certainly the numbers from the BASICVFP thread were significantly lower than those of the (same-ballpark clock speed) i.MX6 system I tested on (basically the Pi was roughly half the Titanium, and the i.MX6 was closer to the Titanium than to the Pi).

It has generally been a “rule of thumb” (pardon the bad pun) not to use float with RISC OS, due to historical deficiencies and ARM CPUs being stronger at integer maths. Intel CPUs, by contrast, have always been floating point kings (it tended to be an area of strength over their rivals at AMD, even when the wind wasn’t blowing in their direction in other areas). As such, you’re probably looking at the weakest part of the Pi, and comparing it to the strongest aspect of the Xeon.

we could save around 75K of ROM space by reprocessing the Wimp iconsprite set to all use RISC OS 3.5 mode words

saving about 300K of RAM for the Sprites11 set

all the Sprites11 sprites are using RISC OS 3.5 sprite mode words

I’m probably missing something (very little sleep last night), but don’t these statements contradict each other? If Sprites11 is already using 3.5 mode words, then how can we save 300K by changing them to 3.5 mode? Unless that “or newer” in your first post means something special…

I’m fairly certain my original testing was done with packets which were smaller than the MTU.

Regular use case but hardly likely to trigger errors
When testing always take things beyond normal limits if at all possible. Things that behave nicely when stressed tend to work even better when they aren’t. Standard production test rigs push the various design parameter limits for a good reason.

While investigating something else, I came across the following anomalies associated with the handling of Writable Icons…

1. The 1992 RISC OS 3 PRM says on page 3-104, when describing the actions for the K validation command, that the action taken for Kr is:

- If the icon is not the last icon in the window, pressing Return in the icon will move the caret to the beginning of the next writable icon in the window.
- If the icon is the last writable icon in the window then Return (code 13) will be passed to the application.

The current online ROOL PRM is in agreement, although it is more cryptic, as it just says:
- Return moves caret to next writable icon

From my observations, the current Wimp reacts as the PRM states when Return is pressed in a Writable icon which includes Ktar validation, except that it puts the caret at the end. No key is passed back to the application unless it is in the last writable icon. If the validation includes Ktarn, then all keys are passed back.

2. The new Style Guide says on page 109 for Writable icons in Dialogue boxes that they should have Validation strings of “Ktar;Pptr_write”, and that pressing Return after giving input to a writable field should activate the default action button, not move the caret to the next field. (A sketch of what such an icon definition looks like is at the end of this post.)

The Style Guide also says on page 53 that when the Return key is pressed, an application should implement the current settings and remove the dialogue from the screen.

3. The PRM also claims that pressing Tab/Shift-Tab will move to the beginning of the next/previous writable icon. However, the Style Guide says that it should move to the end of the text – which agrees with what happens in practice, and is identical to what happens with the Ka action and the down/up keys.

Note that the terms ‘next’, ‘previous’, ‘first’ and ‘last’ refer to the sequence of icon numbers, and not their appearance in the dialogue, although I have found nowhere that this is defined.

4. In August 2014 RISC_OSLib v5.83 was changed from using a default of Kar to Ktar, which makes dialogues without any K validation behave exactly as defined in the PRM for Ktar.

Thus to me the Style Guide and PRMs seem to be contradicting each other, but the PRMs agree with what happens currently in the native Wimp (except for the caret position), and using RISC_OSLib. Is there a ‘bug’ in the Style Guide or elsewhere?
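For concreteness, here’s roughly what creating such an icon looks like (a sketch only – the window handle ‘win’ is assumed to come from elsewhere, and the flag values should be checked against the PRM before relying on them):

    /* writable.c - create an indirected writable icon carrying the
     * Style Guide validation string, via Wimp_CreateIcon (&400C2). */
    #include "kernel.h"
    #include "swis.h"

    #define Wimp_CreateIcon 0x400C2

    static char buffer[32] = "";                /* icon text buffer  */
    static char valid[]    = "Ktar;Pptr_write"; /* validation string */

    int create_writable(int win)
    {
        int icon;
        int block[9];

        block[0] = win;           /* window handle                  */
        block[1] = 16;            /* bounding box: min x            */
        block[2] = -48;           /*               min y            */
        block[3] = 216;           /*               max x            */
        block[4] = -8;            /*               max y            */
        block[5] = (1 << 0)       /* contains text                  */
                 | (1 << 2)       /* has a border                   */
                 | (1 << 4)       /* vertically centred             */
                 | (1 << 5)       /* filled background              */
                 | (1 << 8)       /* indirected                     */
                 | (15 << 12)     /* button type 15: writable       */
                 | (7 << 24);     /* black text on white            */
        block[6] = (int)buffer;   /* indirected text buffer         */
        block[7] = (int)valid;    /* validation string              */
        block[8] = sizeof(buffer);
        _swix(Wimp_CreateIcon, _INR(0,1) | _OUT(0), 0, block, &icon);
        return icon;
    }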

VFP is not very fast; -mfpu=neon improves the speed by a factor of 3.5, but that is still far, far away from Intel float performance.
NEON should be better – more registers and faster operations – but strangely -funsafe-math-optimizations doesn’t change anything!

CC only knows how to use the old FPA floating point instruction set, which means all floating point instructions will actually be emulated via the FPEmulator module (very slow).

GCC supports FPA, softfloat, and VFP. IIRC FPA is the only option if you’re linking against the shared C library. UnixLib defaults to softfloat (which is faster than FPEmulator). VFP must be specifically enabled, since it’s only supported by the newer CPUs:

-mfpu=vfp will select VFPv2 and automatically switch the CPU to ARM11 – producing code suitable for the Raspberry Pi 1 and later

-mfpu=neon will select VFPv3 + NEON and automatically switch the CPU to Cortex-A8. The code won’t be compatible with the Pi 1, but it’ll work on all the other modern machines that have NEON.

-mfpu=neon -funsafe-math-optimizations (and maybe also an optimisation flag like -O3) must be used if you want floating point operations to be automatically vectorised into NEON code – NEON isn’t fully IEEE compliant so you need to tell the compiler that you’re OK with the non-compliance.
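For example (a made-up test file, not anything from a real project), a simple loop like this is the sort of thing GCC will turn into NEON code once those flags are given:

    /* vecadd.c - compile with something like:
     *   gcc -O3 -mfpu=neon -funsafe-math-optimizations -S vecadd.c
     * and look for NEON instructions (e.g. vadd.f32 on q/d
     * registers) in the generated assembly.                       */
    void vecadd(float *restrict dst, const float *restrict a,
                const float *restrict b, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }

Without -funsafe-math-optimizations the same compile should fall back to scalar code, which makes it easy to see whether the flag is taking effect.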

My installation of Otter Browser is working fine and that file is to be found in the directory Resources.!SharedLibs.lib.abi-2/0.vfp, in case that helps. AFAICR you can use PackMan to uninstall/reinstall OB, if the file is missing.

I’ve benchmarked CC and GCC in order to see the quality of the produced code. I ran the same test on a BSD server with a 3.3 GHz Xeon (only to check the difference).
Most of the time GCC is a little bit better than CC.

For the integer test, CC and GCC produce almost the same code quality, and the Raspberry Pi is 2 or 3 times slower than the Xeon (which is quite normal).
But for floating point operations it’s not the same: GCC is 2 times faster than CC, but GCC on the Intel box running BSD is up to 100 times faster.

My conclusion therefore is that GCC and CC do not use hardware floating point operations under RISC OS!
I’ve had a look at the compiler options but I did not find anything!
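A quick way to check what a compiler is actually generating is to compile a trivial file to assembly and look at the mnemonics (a made-up test file; the exact spellings vary with compiler version):

    /* fptest.c - compile to assembly and inspect the output:
     *   gcc -O2 -S fptest.c
     *   gcc -O2 -S -mfpu=vfp fptest.c
     * FPA code uses mnemonics like ADF/MUF (emulated by
     * FPEmulator at run time); softfloat calls library helper
     * routines instead; VFP code uses things like fmuld/faddd
     * (or vmul.f64/vadd.f64) on real hardware registers.        */
    double scale(double x)
    {
        return x * 2.5 + 1.0;
    }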

The note by Sprow suggests that the problem is buffer related – pinging with a packet size bigger than the MTU will involve the packet being split and re-assembled at the destination, and the re-assembly is either failing or causing the buffer to fill while the re-assembly delay is active.

Interesting from my viewpoint as I regularly test for underlying problems by increasing the packet size for the ping test1 where duplex mis-match issues and other bandwidth affecting problems tend to show up. In the declared scenario I wonder what happens if you set -f on the ping sent from an interface with a larger MTU. If it’s a buffer issue the problem shouldn’t arise as the interface won’t (well shouldn’t) accept the larger packet.

1 On a PC, ping 8.8.8.8 -l 2048 gives a packet size of 2048 bytes, for those interested.

This returns address &1B valid for read, as expected; no address is valid for write.

I’ve tried various values in data% and for the number of bytes there, and get no difference.

I’m wondering whether there is a fault in the hardware – it’s an interface board from Sheepwalk Electronics. They only supply software for Linux, and setting that up is more than I want to attempt at the moment.

After many years of hiding from it in fear, I’ve finally taken another look at ticket #324. I haven’t observed any crashes yet, but I am seeing some behaviour that indicates the system is close to crashing (e.g. excessive CPU usage), and some undesirable side-effects which are perhaps related to that (e.g. excessive packet loss).

Profiling on my BB-xM showed that when subjected to a packet flood, about 30% of the CPU time was spent in BufferManager, copying the received USB data into the DeviceFS buffer. Apart from the slow performance of the copy routine, the copy was also being performed with interrupts disabled, and seemed to be taking long enough to be breaking the age-old RISC OS 3 PRM rule of not spending more than 100us with interrupts disabled (although the overhead of the profiling may have contributed to that). Then after that the data would be copied out of the DeviceFS buffer and into mbufs (again with IRQs disabled, albeit much quicker this time due to being in cacheable memory) for processing by the Internet module (which, thankfully, is performed with IRQs enabled).

Presumably the high packet loss when the system is running at a lower clock speed is a symptom of the interrupt-driven USB → DeviceFS copy routines taking far too much time compared to the callback-driven Internet code which is processing the packets (and very little time, if any, being left for foreground programs to run and actually process those packets).

This suggests we have the following areas for improvement:

Make BufferManager’s copy routines faster (i.e. resurrect the better memcpy() project – I think that stalled because I was struggling to fit a strcpy / strcmp / str-something test into my framework, so if we ignore everything except memcpy/memmove then it should be possible to make some useful progress)

Rewrite BufferManager so that the buffer insert/remove routines perform the copy operation with IRQs enabled. Potentially rewrite the system to be threading-friendly as well, so that you can have multiple concurrent read and write requests to the same buffer. However the difficulty with this is dealing with requests completing out-of-order (which could happen both due to re-entrancy and due to threading). Instead of using a simple circular buffer system containing a “used” part and a “free” part, BufferManager will have to keep track of multiple “used” and “free” parts, along with “locked” parts which are in the process of being filled / emptied (there’s a rough sketch of this region tracking after this list).

If this is going to result in a significant change to the way BufferManager handles buffers, maybe now would be a good time to introduce a zero-copy option, especially if there’s a way to unify it with mbufs so data can be zero-copied between the two

EHCIDriver (and almost certainly the other USB drivers) will need modifying as well, to make sure that they only perform buffer insert/removal with IRQs enabled

We could also consider allowing EHCIDriver (and other drivers) to tell the hardware to read/write from the DeviceFS buffers directly – there is/was a #define related to this in the source code, but I’m not sure of the history of why it’s not in use (plus there are hardware bugs to worry about – so maybe the inverse approach, of allowing DeviceFS/BufferManager to use USB-allocated buffers, would be a safer place to start)
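To make the BufferManager rewrite idea above a bit more concrete, here’s a very rough sketch of the kind of region tracking that would be needed (all of the names are invented – none of this is real BufferManager code):

    /* Instead of one "used" and one "free" part, describe the
     * buffer as a set of regions; a region being filled is marked
     * LOCKED so the copy itself can run with IRQs enabled.        */
    #include <stddef.h>
    #include <stdio.h>

    enum state { FREE, USED, LOCKED };

    struct region { size_t start, length; enum state state; };

    #define MAX_REGIONS 8

    struct buffer {
        size_t size;
        struct region r[MAX_REGIONS];
        int count;
    };

    /* Claim a prefix of the first big-enough FREE region for an
     * in-progress insert. In the real module this bookkeeping
     * would run with IRQs off, but the copy into the claimed
     * bytes would then run with IRQs on.                          */
    static int claim(struct buffer *b, size_t len)
    {
        for (int i = 0; i < b->count; i++) {
            if (b->r[i].state == FREE && b->r[i].length >= len) {
                if (b->r[i].length > len && b->count < MAX_REGIONS) {
                    b->r[b->count] = (struct region){
                        b->r[i].start + len, b->r[i].length - len, FREE };
                    b->count++;
                }
                b->r[i].length = len;
                b->r[i].state = LOCKED;
                return i;
            }
        }
        return -1;  /* no space */
    }

    /* Copy finished: LOCKED -> USED. Completions can arrive in any
     * order, so earlier LOCKED regions are simply left alone.     */
    static void commit(struct buffer *b, int i)
    {
        b->r[i].state = USED;
    }

    int main(void)
    {
        struct buffer b = { 1024, { { 0, 1024, FREE } }, 1 };
        int x = claim(&b, 100), y = claim(&b, 200);
        commit(&b, y);  /* completes out of order */
        commit(&b, x);
        for (int i = 0; i < b.count; i++)
            printf("region %d: start=%zu len=%zu state=%d\n",
                   i, b.r[i].start, b.r[i].length, b.r[i].state);
        return 0;
    }

The awkward part (merging adjacent FREE regions, and deciding when a LOCKED hole blocks the reader) is exactly where the out-of-order completion difficulty shows up.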

I’m yet to do any profiling on my Iyonix (hopefully tonight) – it’ll be interesting to see what the bottleneck is there, since there’s no USB/BufferManager involved. Perhaps it’ll be a similar problem, i.e. EtherK might be copying data into mbufs from within its IRQ handler.

I am trying to use the IIC bus on a Raspberry Pi and getting confused.
The device concerned is an IIC to 1-wire interface chip, Maxim DS2482-100. It has an address of 18 hex. I’m programming in BASIC.
I tried using OS_IICOp and got highly confused about which part of the data structure was for the interface and which part was to be passed through, so I tried the simpler IIC_Control SWI. In both cases I hit the same problem.
If I use address &30 (i.e. &18 shifted left one bit, and read/write bit set to write) I always get “No response from IIC device”.
If I use address &31 (i.e. read) then it returns &1A. The default is to read the status register on the interface device. This indicates that the last thing done was a reset (at power on) and there was a 1-wire device present. With no device present at power-up, I get &18 instead, as expected.
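For reference, the read that works corresponds to something like this in C (a hypothetical translation of my BASIC call – the register usage is as I understand it from the documentation, so treat it as a sketch):

    /* ds2482read.c - read the DS2482 status register via
     * IIC_Control (SWI &240). As I understand it: R0 = address
     * byte (7-bit address shifted left, bit 0 set for a read),
     * R1 = data buffer, R2 = byte count.                        */
    #include <stdio.h>
    #include "kernel.h"
    #include "swis.h"

    #define IIC_Control 0x240

    int main(void)
    {
        unsigned char status;
        _kernel_oserror *e;

        /* &31 = (&18 << 1) | 1, i.e. the DS2482 at &18, reading */
        e = _swix(IIC_Control, _INR(0,2), 0x31, &status, 1);
        if (e)
            printf("IIC error: %s\n", e->errmess);
        else
            printf("Status: &%02X\n", status);  /* I see &1A */
        return 0;
    }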

I expected I should use the write version of the address to send commands to the interface, or to send data to be transferred to the 1-wire bus. Have I got this wrong, and if so, why?

Looks like I’ve talked myself into a project, and a series of articles to follow. Looks like the serial UART method is a great way to get 1-wire data with RISC OS, and I guess it means other RISC OS machines can get access to 1-wire too.

The start will be a thermostat, a relay and a Pi, and then programming up a simple program to control the heating and display the temperature. This will then be added to for things like remote control. Next will be getting it to bring in things like time, outside temperature, wind speed and direction, and seeing if I can’t use the info to get it to learn and adjust its behaviour smartly (maybe also including in the data the time from the heating coming on to the temperature rising, and the time after it goes off for the drop to start).

The last bit may end up pie in the sky, but fingers crossed. There are 6 GHz RF devices for the Pi, and Z-Wave smart TRVs use the same 6 GHz RF. So if I can get at the info sent to and from the thermostats, I can pick up individual room temperatures and even create a zoned system where, if one room gets too far below temperature, the heating can come on just for that room.

Anyway, I won a Pi in a Raspberry Pi competition, so I have a dedicated RISC OS one for the project, and am going to order the temperature sensor and other bits. You can even get 1-wire light sensors and other stuff, so I could maybe bring in smart lighting and even some sockets (you can buy 6 GHz RF controlled socket things as well).

It may be possible to build this into a really reliable full automation system for the DIY electronics and programming sorts.

And I’ll bet I can produce a much better learning system than any expensive Honeywell etc. learning system.

One thing I’ve realised is that there doesn’t seem to be any mention of TRIM support in the filesystem bounty plans. Probably a good thing to slot in somewhere, now that SSDs are the norm for modern machines.