OK, so let's say I have a Nagios setup that monitors different services using the so-called nagios-plugins.

What would be the best practice for my Nagios plugin (probably written in Python) to determine whether a given service is running OK?

The particular service in question is a Python socket server that listens on some port. I want Nagios to check that service frequently, and if it stops responding or dies, I should restart it. What should I do to know whether the socket server is alive? And beyond that, how would I check that it is actually responding?

I have control over the service, so I can change the way it works if that would help me determine its health state.

Since you can modify your service, you can add a simple health probe: have the check send something like "Are you OK?" and look for "I'm OK" in the reply. How far you go depends on how involved you want to get with checking that the service is up and running.
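
A minimal sketch of the client side of such a probe, assuming the service has been changed to answer the literal line "Are you OK?" with "I'm OK". The function name `probe_health`, the exact wire format, and the default timeout are illustrative assumptions, not part of any existing plugin:

```python
import socket


def probe_health(host, port, timeout=5.0):
    """Return True if the service answers the (hypothetical) health probe.

    Assumes the server replies "I'm OK" to the line "Are you OK?".
    Any connection error or timeout is treated as "not healthy".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"Are you OK?\n")
            # The expected reply is short, so a single recv is assumed to be enough.
            reply = sock.recv(64)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
    return reply.strip() == b"I'm OK"
```

Treating every socket error as unhealthy keeps the check simple; a stricter check could distinguish a timeout from a refused connection if that difference matters for your restart logic.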

You can also use check_procs to see if the process for the service is there. This might be in conjunction with a check_tcp check, or as an alternative. Again, it depends on what you're doing, and how much you actually want to do. If you want to get very involved, you can write a custom Nagios check that will do all sorts of things to verify the functionality of the service and return custom state messages to the Nagios server.
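
If you do write a custom check, Nagios reads the plugin's exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and shows the first line of its output as the status message. A rough sketch of that reporting step; the `SOCKETSRV` label and the helper names are made up for illustration:

```python
import sys

# Exit codes that Nagios interprets as service states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def format_status(code, message):
    """Build the one-line status text shown in the Nagios UI.

    "SOCKETSRV" is a hypothetical label for the service being checked.
    """
    return "SOCKETSRV {}: {}".format(LABELS[code], message)


def report(code, message):
    """Print the status line and exit with the matching Nagios code."""
    print(format_status(code, message))
    sys.exit(code)
```

A check script would then end with something like `report(CRITICAL, "no reply on port 9000")`, where the port number is only an example.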

You can start by checking whether the process name exists in ps -ef output.

You can check the listening port in the output of netstat -lnp | grep your_port.

You can try to connect to the port from Python, for example with the standard socket module.
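
As a sketch, a bare connection test with the standard `socket` module could look like this. The name `port_is_open` and the default timeout are assumptions chosen for the example:

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Note that this only proves something is accepting connections on the port; it says nothing about whether the server behind it is still processing requests.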

After that, you can send a real request and check the returned output. What to request depends on the service: for an HTTP service, for example, you can request a page that is known to exist. This also lets you measure the response time.
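
To illustrate the response-time part, here is a small sketch that times an arbitrary request callable and maps the elapsed time to a Nagios state. The thresholds (1 second for WARNING, 3 seconds for CRITICAL) and the function names are arbitrary assumptions; the performance-data string follows the usual `label=value;warn;crit` plugin convention:

```python
import time


def timed_call(request):
    """Run request() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = request()
    return result, time.perf_counter() - start


def classify(elapsed, warn=1.0, crit=3.0):
    """Map a response time to a Nagios exit code (0 OK, 1 WARNING, 2 CRITICAL)."""
    if elapsed >= crit:
        return 2
    if elapsed >= warn:
        return 1
    return 0


def perfdata(elapsed, warn=1.0, crit=3.0):
    """Format the response time as Nagios performance data."""
    return "time={:.3f}s;{};{}".format(elapsed, warn, crit)
```

The `request` argument would be whatever sends one real request to your service and returns its reply; appending `" | " + perfdata(elapsed)` to the status line lets Nagios graph the response time.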