Seeking some advice on a new project

Hello all, hope this message finds you well.

I am new to programming and I have begun planning a new project. The project is simple: I will send diagnostic info from a Linux machine to a server, and the server will then display the data on a webpage. I have many Linux machines that will transmit data on a daily basis; therefore, I want to create a one-stop shop, so to speak, so I can simply go to one URL and see the same data I would otherwise have to log into each individual machine to view.

The problem I am running into is figuring out the best way to get the data to the server, and in what format (e.g., txt, csv, or a TCP/UDP datagram).

I am not asking anyone to solve this problem for me because I do not believe I will learn that way. I am simply asking if someone would be willing to steer me in the right direction and allow me to pick their brain for a bit. I would greatly appreciate any feedback.

I also posted my question in the HTML forum. I am sorry for the redundancy, however I was not sure which forum to post in.

There are utilities for this sort of thing, but if you want a DIY experience to force yourself to learn a few tricks, I would recommend focusing on skills that are useful in as many settings as possible.

If I were to design something like this I'd ask a few questions first:

What sort of data do we need to see?

What sort of data do we want to see?

What is the smallest feature set which, when completed, defines project success?

Does the data need to be retrieved when the page is looked at, or can it be cached and updated on a schedule?

Does the data need to be historical, or is the current/latest update all we care about?

Why are we using the web? Might we ever want to use another interface (e.g., an Android application)?

Might we ever want to add new tracking criteria after the initial deployment?

etc.

If I were to build a prototype I'd start with how to get the data first, and then where to keep it. A scripting language is an ideal way to query system data (unless you want to write your own top-style program -- then C is pretty straightforward), and languages like Python and Perl have networking, database, GUI and Unix shell bindings that are easy to use.

To keep things simple I'd write a scheduled reporter based on a system cron job (receiving commands for dynamic output requires either writing a daemon or maintaining an open connection -- way too complex for this stage).
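To make the "scheduled reporter" idea concrete, here's a sketch of what the crontab entry might look like -- the script path and log location are placeholders, not anything prescribed above:

```
# hypothetical crontab entry: run the reporter once a day at 02:15
15 2 * * * /usr/local/bin/sysreport.sh >> /var/log/sysreport.log 2>&1
```

You'd install it with `crontab -e` as whatever user should own the reporting job.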

I'd write a script that, say, dumps the output of "df" to a text file somewhere. Then on the server I'd get Postgres (an awesome, free RDBMS) running and create a very simple schema to keep track of the sending system, its IP, the arrival timestamp, the reported timestamp, the command string, and a name for the output type.
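The client-side dump script really can be this small -- a minimal sketch, with the output directory as a placeholder:

```shell
#!/bin/sh
# Sketch: dump `df` output to a timestamped text file.
# OUTDIR is a placeholder location -- adjust to taste.
OUTDIR="${OUTDIR:-/tmp/sysreports}"
mkdir -p "$OUTDIR"

STAMP=$(date +%Y%m%d-%H%M%S)
OUTFILE="$OUTDIR/df-$STAMP.txt"

# capture the data; the redirect is the whole job
df > "$OUTFILE"
echo "wrote $OUTFILE"
```

That's the entire "data puller" for one command; everything else in this plan is plumbing around files like this one.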

Now I've got something to send and some place to receive it. Then I'd try to manually send the content via psql (a text Postgres client) from the client to the server. Then I'd write another script that either connects directly (straight from Python using the psycopg2 library is easy) or via psql (most straightforward option if the script is in Bash or some language that lacks Postgres bindings) to the server and see if I can get the insert to the server DB to work.
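One way to script that insert step is to have the client build the SQL statement itself and pipe it to psql. A sketch, with table and column names that are my assumptions (match them to whatever schema you actually create), using a canned sample file so the example is self-contained:

```shell
#!/bin/sh
# Sketch: build an INSERT statement for a hypothetical "reports" table.
# In real use you'd pipe the result to psql, e.g.:
#   build_insert "$file" | psql -h myserver -U reporter sysdata
build_insert() {
    file="$1"
    host=$(hostname)
    stamp=$(date +'%Y-%m-%d %H:%M:%S')
    # escape single quotes in the file content so the SQL stays valid
    payload=$(sed "s/'/''/g" "$file")
    printf "INSERT INTO reports (system, reported_at, command, output) VALUES ('%s', '%s', 'df', '%s');\n" \
        "$host" "$stamp" "$payload"
}

# canned stand-in for a real df dump
printf 'Filesystem Used\n/dev/sda1 42%%\n' > /tmp/df-sample.txt
build_insert /tmp/df-sample.txt
```

Keeping the statement-building separate from the sending makes it easy to eyeball the SQL before a server is even involved.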

If it does, then I'd write another script that calls both of the ones just written, in order -- dump the data to a text file, then call the Postgres server and insert it as table data. If that works, I'd write a cron job to call the script just written every X minutes/hours/whatever and check to make sure it's actually doing its job without problems.
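The glue script is just the two steps in sequence with a bail-out on failure. A sketch using stub functions in place of the real dump/send scripts (which is also a handy way to test the glue before the pieces exist):

```shell
#!/bin/sh
# Sketch: top-level job chaining the dump and send steps.
# The function bodies are stand-ins for the real scripts.
set -eu

dump_df() {
    # real version: df > timestamped file, echo the file name
    df > /tmp/report.txt && echo /tmp/report.txt
}

send_report() {
    # real version: build the INSERT and pipe it to psql
    echo "would send $1 to the server"
}

FILE=$(dump_df)      # step 1: capture the data
send_report "$FILE"  # step 2: ship it
echo "report complete"
```

With `set -eu`, a failed dump stops the job before a bogus send is attempted.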

If all that works then I'd rework the set of three scripts a bit. The final script that controls everything would grow a "check settings" type function that reads through a list of scripts to run, or a set of commands to run (whatever). When called it would check its settings and run through the list of data pullers without me needing to write a new job for it every time. Then I'd think of a way to check whether a file we've got on the client side has been sent yet or not (this is networking -- things will go down), think about how the program should behave when it can't connect to the server, and, if I were really interested in it, I'd probably write a way for it to become aware of how long it's been since it was last run so I can report gap times to the server.
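The "check settings" idea can stay dead simple: a text file with one command per line, and a loop that runs each and dumps its output. A sketch, with all file locations as placeholders:

```shell
#!/bin/sh
# Sketch: settings-driven runner. Each line of the settings file is a
# command whose output lands in its own timestamped file.
SETTINGS="${SETTINGS:-/tmp/reporter.conf}"
OUTDIR="${OUTDIR:-/tmp/reports}"
mkdir -p "$OUTDIR"

# example settings file: one command per line
cat > "$SETTINGS" <<'EOF'
df
uptime
EOF

while IFS= read -r cmd; do
    [ -z "$cmd" ] && continue
    name=$(echo "$cmd" | awk '{print $1}')
    out="$OUTDIR/$name-$(date +%Y%m%d%H%M%S).txt"
    # run the command; warn (but keep going) if it fails
    $cmd > "$out" 2>&1 || echo "WARN: '$cmd' failed" >&2
done < "$SETTINGS"

ls "$OUTDIR"
```

Adding a new data puller is now a one-line edit to the settings file instead of a new script.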

Then it's back to the DB to expand the schema to account for the new data. Also, our toy schema is no longer good enough and we need real authentication now -- so that means creating roles for each system that's sending and a way for those systems to authenticate to the DB, so you're not just letting anyone write to or read from your server (using Postgres' built-in user/role model massively simplifies this, btw).
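In Postgres terms that can be as little as a role per machine with insert-only rights -- the names below are made up for illustration, and a real deployment would also tighten pg_hba.conf:

```sql
-- Sketch (assumed names): one login role per reporting machine,
-- allowed to insert rows but not read them back out.
CREATE ROLE machine_web01 LOGIN PASSWORD 'change-me';
GRANT INSERT ON reports TO machine_web01;
```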

There remain a million tweaks on the client side I'd write into a TODO/wishlist and forget about for the time being. The point, up to here, would be to see whether you can create a system data reporting infrastructure based on easy-to-handle, broadly useful, readily available components that don't cost you anything (so far that's *nix, cron, Python/Bash/Perl/whatever, and Postgres). Nothing up to now is very hard, but all of it is very useful to know.

The next step would be to get the output somewhere useful. Since we're just doing static data dumps I'd probably just write a script that builds a static HTML page from the data in the DB whenever asked, for now. You can get all crazy with web frameworks and things later -- frameworks like Django are so easy this should really be an afterthought. Focus first on the low-order task of checking to make sure your webserver can actually serve pages and that you understand where old-fashioned HTML document files get stored. Then script the construction of them.
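The page builder is just a loop wrapped in table tags. A sketch where a plain text file stands in for the DB query result (in real use you'd feed it something like `psql --csv -c "SELECT ..."`):

```shell
#!/bin/sh
# Sketch: build a static HTML status page from report data.
# /tmp paths are placeholders; a text file stands in for the DB.
DATA=/tmp/report-data.txt
PAGE=/tmp/status.html

# sample rows: hostname, percent of memory used
printf 'web01 72\ndb01 41\n' > "$DATA"

{
    echo "<html><body><table>"
    echo "<tr><th>Host</th><th>Mem used %</th></tr>"
    while read -r host mem; do
        echo "<tr><td>$host</td><td>$mem</td></tr>"
    done < "$DATA"
    echo "</table></body></html>"
} > "$PAGE"
```

Drop the resulting file wherever your webserver's document root lives and the "web app" is done, for static-dump purposes.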

Once I was satisfied I'd probably do the "build a page on request from the DB" thing. I'm intimately familiar with Django, so I'd use that, but pick your poison. Be warned: all frameworks suck for non-trivial data. So the key here is to keep yours trivial and stupid, just like everything else on the web.

At this point I might check my TODO/wishlist file, or not. Depends on how much time I had available and how interested I still was in the project.

The very last thing I'd do would be to tackle the task of making a live request from a web page generate refresh responses from the client computers. A lot more goes into that, so it needs to be last. That's also where you will stand a very high chance of opening gigantic security holes in your network without realizing it -- so once again, last.

If you look back over my enormous, un-edited brain dump you'll see that while the details are focused on the task you specified, the underlying process of "break this bite-sized chunk off and explore it, then this one, etc." as well as the idea of focusing on using broadly useful, available, accessible things up front (languages, kernels, DB servers, etc.) and then niche-use stuff last (a web framework -- none of which may be popular next year) is the way quite a few one-man FOSS projects go. Well, if the guy has built a thing or two already, that is.

Also, as for "where to post this?" -- imo it should have gone in the Software Design forum, not Linux or HTML. But there are a bajillion forums to choose from here, so it's sort of daunting to pick one sometimes, not to mention scroll through them all.

WOW and thank you for such a detailed and explanatory response. It means a great deal that you would go to such great lengths...again, thank you.

I have busted this project up into many small bite-sized chunks...haha, "divide and conquer," right!?

Currently I am using AWK within a shell script to extract the output of the free command. I then take that output and, with a little arithmetic, compute the percentage of memory used; that result is stored in a variable and passed to a PHP script. The PHP script then remotely connects to MySQL, which is on the server, and forwards the data. All of the above will be done via a cron job on a daily basis.
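For what it's worth, the free-plus-arithmetic step can collapse into a single AWK expression. A sketch using canned `free` output so the example is self-contained -- in real use you'd pipe `free` itself into the awk:

```shell
#!/bin/sh
# Sketch: percent of memory used, from free's "Mem:" line
# (column 2 = total, column 3 = used). Canned output stands
# in for running the real `free` command.
free_output='              total        used        free
Mem:            8000        2000        6000
Swap:           2048           0        2048'

PCT=$(printf '%s\n' "$free_output" | awk '/^Mem:/ { printf "%.1f", $3/$2*100 }')
echo "memory used: ${PCT}%"
# -> memory used: 25.0%
```

Doing the division inside AWK avoids a separate shell-arithmetic step (and shell arithmetic is integer-only anyway).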

Now, with the exception of the shell script, the above is still a work in progress. I do intend to add more diagnostic info, however I am keeping it simple by only using memory at this point.

Once I get the info into the DB on the server....uhh...wait a min...I have not got that far. The theory is, when the webpage is loaded a nicely designed table so to speak will display the data next to an icon resembling a computer.

I am trying to keep this as simple as I can; aesthetics can be added at a later time. My goal is to get it operational on a minimal level.

Yes, I am quite sure there are tools readily available that do the sort of thing I am trying to do; however, I am learning as I do this and to be honest, it is quite fun.

I would probably wrap the AWK in a very simple bash script and have it dump the AWK result to a text file named something like "free-[timestamp]" or "[timestamp]-free" so that sorting them the old-fashioned way is easy/natural. All the bash script would do is build the file name and redirect to it. As for timestamps that make for friendly file names, [read this].
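That wrapper might look like the sketch below -- the output directory is a placeholder, and the echoed number stands in for the real AWK percentage. The key detail is the stamp format: year-first with fixed-width fields, so `ls` order equals time order:

```shell
#!/bin/sh
# Sketch: dump a result into a sortable, timestamped file name,
# e.g. free-20240131T120000.txt
OUTDIR="${OUTDIR:-/tmp/free-reports}"
mkdir -p "$OUTDIR"

STAMP=$(date +%Y%m%dT%H%M%S)
OUTFILE="$OUTDIR/free-$STAMP.txt"

echo "42.0" > "$OUTFILE"   # stand-in for the real AWK output
echo "$OUTFILE"
```

With names like that, `ls /tmp/free-reports` doubles as a chronological history with no extra tooling.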

I would also ditch PHP unless we're actually dealing with web pages themselves on the server. PHP is a niche language and will not be found on client systems (or shouldn't be, anyway). That means you introduce a server-side dependency on your client systems -- that's a bad thing. Develop client-side with things that are common on client systems, like Python or Perl. The point is to learn something new, right? Then learn something more generally useful than a webserver language. The essential functionality of your application has nothing to do with the web -- it is merely incidental that you want to display output via web pages. When you consider things that way, PHP is an awkward choice for client-side code.

I would personally recommend switching from MySQL to Postgres. Not because it's necessary for this small use case, but because anything beyond the triviality of a website backend needs what Postgres has to offer. Regardless, you can change backends later if you get curious. The main points are:

Dump the output of your AWK to a text file somewhere sane.

Either make the AWK script grow the ability to generate sane timestamps, or wrap it/call it from a Bash script that can.