So far, I've been able to work around this problem without re-installing my core. Can you be more specific about what this does? I don't want to just experiment by the seat of my pants on my working system. I don't want to attempt anything that may not allow my LMCE Core to at least boot up if this doesn't correct the orbiter problem.

Guys, hold off on the bashing, because I'm still fairly new to Linux and LinuxMCE, but I don't have a file named core.sh in my /etc/init.d directory. I have a text file named core, without the .sh extension. Here is the code:

Okay, the double orbiter launch on the core has been really bothering me and I've been trying to isolate why/how it is happening.

It appears I have resolved the issue on my system, but I think what I found is just a pointer toward a deeper issue.

After booting up and having the second orbiter open with the 'this orbiter will be closed' message, I was hunting around and noticed that the orbiter (21) was not listed in my /usr/pluto/locks/pluto_spawned_local_devices.txt file. Actually, a lot of devices that should have been in this file were missing. I did some more digging into the startup procedures, and it took me to the file /usr/pluto/Spawn_DCERouter.sh. This script contains the following lines:

# hack: cleaning lockfile on router start to allow
# local devices to start
# TODO: remove this when correct locking will be implemented
rm -f /usr/pluto/locks/pluto_spawned_local_devices.txt

So I commented out the 'rm -f ...' line. My system now seems to boot normally with the /usr/pluto/locks/pluto_spawned_local_devices.txt file showing all of the spawned devices correctly. I do not have a second orbiter spawning.

It seems to me that this is an indication that /usr/pluto/Spawn_DCERouter.sh is being called more than once, which would delete the pluto_spawned_local_devices.txt file partway through startup. DCERouter would then attempt to reload any devices that had already been loaded, hence the second orbiter.

Am I on the right track here? This is a very complex system, and I'm not sure where to go now to figure out where and why Spawn_DCERouter.sh may be called a second time.

It looks like Device 21 is spawned first, and then Start_DCERouter.sh/Spawn_DCERouter.sh is called, which erases the lock file. When Device 22 is spawned, the orbiter (Device 21) is not listed in the 'Already Running list: ' because the lock file was erased. Then, when Device 26 is spawned, #22 is shown in the 'Already Running list: 22m,'.

Then the system tries to start Device 21 again a short time later:

So my fix above (commenting out the 'rm -f ...' line) is probably not correct; it only works because the lock file is no longer deleted by Start_DCERouter.sh. Instead, I believe the initial call to 'Spawn_Device.sh 21' shouldn't be happening until later.

It seems the next step is to figure out why and where 'Spawn_Device.sh 21' is executing before Start_DCERouter.sh. Can anyone with more knowledge than me confirm that this should not be happening in this order?

I've been digging further into this and am starting to get a good grasp of the lmce boot process.

I have done a near-forensic audit of my logs from the incorrect boot process and have narrowed the issue down to LMCE_Launch_Manager. My C++ is pretty sketchy, so I decided to double-check every script I could find in the boot process. I couldn't find anything in the scripts that would cause orbiter to load prematurely, so I went back to LMCE_Launch_Manager.

By cross-referencing with pluto.log, it seems the premature execution of orbiter is occurring between "Spawning new children devices - media <0xb77616c0>" and "Finished spawning new children devices - media <0xb77616c0>". But these sections of code shouldn't execute if the Core is not running yet (at least from my understanding of the code)...

So I forced StartCoreServices.sh to execute BEFORE LMCE_Launch_Manager by adding it to Startup_Core-Hybrid.sh right before the call to lmce_launch_manager.sh. The system boots PERFECTLY (or so it seems to me... I'm still learning). The following is my complete LaunchManager.log from the new boot:

LMCE_Launch_Manager understands that the core is already started and doesn't re-execute StartCoreServices.sh. Now a "Spawning new children devices - core" executes (because the core is now started), and the "Spawning new children devices - media" section doesn't seem to be executed (though I think it should be).

Now, I am no C++ expert, but I'm wondering about an expression in LM.cpp that determines whether the spawn devices-media section will execute:

Should there not be a space between the && and m_bCoreRunning for the expression to evaluate properly? If LMCE_Launch_Manager knew the core was not running it wouldn't attempt to execute the spawn devices-media section at this time. Likewise it *should* execute this section if the core is running (but didn't).

I'm sorry if I'm way off here but the only C++ experience I have is a college course nearly 10 years ago. I'm willing to keep beating my head into the screen trying to sort this through but this is my best guess right now.

Thanks so much for digging in! I had to bow out last night due to muscle pain (which is still present today).

No, there does not have to be a space between the && and the other parts of the expression, although it would make it look prettier.

The respawn new children code IS being called more than once, which is very puzzling; it shouldn't be.

It should only be triggered from the DCE part of the LMCE Launch Manager. The Launch Manager is a bit different from a typical DCE device because:

(1) It does not live within the device tree; it attaches itself to a computer device node, such as the core or the media director, and can therefore receive messages that children have jumped on or off the system. This code is actually implemented in DCE::Command_Impl as CreateNewChildren.
(2) The UI part is implemented as a separate thread, launched from main, which connects itself back to the DCE thread so that it can call methods. This allows the DCE device to call LM::respawnNewChildren() to do all the needed housecleaning.

This is most puzzling, but it has narrowed it down to something really small.

You're welcome, Thom. This is an extremely complex system, and I am very grateful for all the work everyone has put in. I'm glad I can contribute, even if it is just a little.

I've spent much of my day wrapping my head around LMCE_Launch_Manager (LM) and decided to try a few things out. Because I don't have a dev environment set up, I can't really do much with LM except stare at the source incessantly. I noticed that there is a lot of logging in the source that never ends up in a log, and that *some* extra logging shows up when you run LM from a console. So I commented out LM in the boot scripts, logged in on tty1, and ran LM there while capturing the output... Still not all the logging shows up, but some extra does. The first portion of the log is here:

=>initialize_Connections()
Opening DB connection..Successful
Checking if core is up...
Running UpdateAvailableSerialPorts.sh
Requested respawning of new children devices
Spawning new children devices - media
Starting device 21 OnScreen Orbiter
Error when querying device status
Finished spawning new children devices - media
Finished respawning of new children devices
>>Performing autostart if configured..
>>Autostarting core....
Running UpdateAvailableSerialPorts.sh
Starting process /usr/pluto/bin/StartCoreServices.sh
... continues...

The lines from "Running UpdateAvailableSerialPorts.sh" through "Finished respawning of new children devices" show that the functions LM::updateScripts and LM::respawnNewChildren are called. This happens before LM::doAutoStart is called, and it is the premature execution of orbiter.

There are two places in the code where this looks like it could happen:
1) a call to LMdeviceKeepAlive() <-- this seems rather unlikely, given that DCERouter isn't running yet and so couldn't have sent this message
2) either m_bCoreRunning OR m_bMediaRunning is true in the middle of LM::Initialize

Because LM::respawnNewChildren logs that it's calling LM::startMediaDevices, it stands to reason that m_bMediaRunning is TRUE in LM::Initialize (which supports #2 above). Also, because the logs do not show LM::startCoreDevices being called, it seems reasonable to assume that m_bCoreRunning is FALSE (especially since the autostart core routines seem to run after LM::respawnNewChildren). **edit: it's also the only way that m_bMediaRunning wouldn't have an opportunity to be set false...

So... my hypothesis at the moment is that m_bMediaRunning is erroneously being set TRUE during one of the LM::initialize_* functions. I'm not sure what else may or may not be happening in between, because what is and isn't logged seems kinda random to me.

Wait... what about this: m_bMediaRunning is never assigned an initial value, and it might never be assigned FALSE during the LM::initialize_* functions. m_bMediaRunning can only be set FALSE if m_bCoreRunning is TRUE during LM::initialize_Start (when checkMedia() is called). So, sketchy again on the C++, but... won't m_bMediaRunning just contain whatever was in that memory location beforehand? And if that happens to be nonzero and m_bMediaRunning *is not* set FALSE somewhere, then it could be TRUE? If this is the case, it would just be a matter of setting m_bMediaRunning = false at the top of LM::initialize_Start() (or somewhere else)?

This could also explain why some people experience the problem and some don't.

I might be completely crazy here... I've been beating my head into this monitor for a while...