I'm pretty new to Java and really enjoying learning it. I made a program that works well, but it takes quite a while once I add more data for it to process. I made it multithreaded, which sped it up a lot, but now I'm looking to speed it up further (obviously, the more data it has to process, the longer it takes). Just an FYI: my program does not share any data between threads; each thread takes one item from a list, does some math, and uploads the result to a database. Ideally, a few work computers would each take a few items from the list, do their work, and then fetch more until everything is done.

I did a bit of research and found queues, but I'm not sure whether that's what I need or whether there's something else out there (I was also thinking that maintaining integrity and monitoring the workers might be too much for me to write as a newbie). I have four computers at home (a mix of Linux, Mac and Windows, but I can install Linux VMs on the non-Linux systems if a solution is OS-specific) and want to get them working on this task as well. I thought about creating Java queues that the other clients pull pieces of work from, but I also saw libraries like RabbitMQ, and I briefly looked at grid computing.

Is this the way to go, or is there a better way? I don't need any code; I just want to know what the options for distributing this work are and what factors to consider when evaluating them.

4 Answers

Just to wrap up: you have already scaled up; now you want to scale out. Off the top of my head:

Terracotta: you can create a Java queue that is automatically distributed across the whole cluster. Basically, you run the same application on every machine, with a few threads reading data from the queue; Terracotta transparently distributes the queue so it feels like a local object.
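To make the pattern concrete, here is a minimal single-JVM sketch of that worker loop, using a plain `BlockingQueue` from the JDK. In a Terracotta-clustered setup this queue would be the shared, cluster-wide object and each machine would run the same loop; `doMath` is a placeholder for your real computation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueWorker {
    // In the clustered case this queue would be the distributed object;
    // here it lives in one JVM.
    static final BlockingQueue<Integer> jobs = new LinkedBlockingQueue<>();

    // Stand-in for the real per-item computation.
    static int doMath(int item) {
        return item * item;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 1; i <= 5; i++) jobs.put(i);
        // Each worker (a thread here, a machine in the clustered case) runs this loop:
        Integer item;
        while ((item = jobs.poll()) != null) {
            int result = doMath(item);
            System.out.println(item + " -> " + result); // real code would upload to the DB
        }
    }
}
```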

You can use JMS or Hazelcast (e.g. its distributed ExecutorService) to distribute work between machines.

What I would do first is look at improving your algorithms. You might find you can go 2-4x faster using 4 machines, but you can often get a 10-1000x improvement through profiling, refactoring and tuning, usually with less added complexity.
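As a toy illustration of why algorithmic work can beat adding machines, the sketch below times two ways of computing the same sum: an O(n) loop and an O(1) closed form. The numbers and methods are made up for demonstration; a real profiler (such as the JDK's Java Flight Recorder) would tell you where your actual program spends its time.

```java
public class Timing {
    // O(n): add the numbers one at a time.
    static long sumNaive(int n) {
        long s = 0;
        for (int i = 1; i <= n; i++) s += i;
        return s;
    }

    // O(1): same answer via the closed-form formula n(n+1)/2.
    static long sumClosedForm(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long a = sumNaive(50_000_000);
        long t1 = System.nanoTime();
        long b = sumClosedForm(50_000_000);
        long t2 = System.nanoTime();
        System.out.println("naive: " + (t1 - t0) / 1_000_000 + " ms, "
                + "closed-form: " + (t2 - t1) / 1_000_000 + " ms, equal: " + (a == b));
    }
}
```

No amount of extra hardware makes the loop version catch up to the formula; that is the kind of win profiling can uncover before you distribute anything.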

In general, using a queue (like RabbitMQ) to load in "jobs" and having workers pull jobs off the queue for processing is the most scalable pattern, and it doesn't take much work to get up and running.

Once that is in place, you can spin up whatever workers you need, spread across whatever machines you have / need.
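An in-process sketch of that jobs-plus-workers shape, using the JDK's `ExecutorService`: jobs are submitted, a fixed pool of workers pulls and processes them. With RabbitMQ the queue would live in the broker and the workers could sit on any machine; `process` here is an invented stand-in for the real per-item work.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.ArrayList;

public class WorkerPool {
    // Stand-in for one "job": compute a result for a single list item.
    static long process(long item) {
        return item * item;
    }

    public static void main(String[] args) throws Exception {
        List<Long> items = List.of(1L, 2L, 3L, 4L);
        ExecutorService pool = Executors.newFixedThreadPool(4); // the "workers"
        List<Future<Long>> futures = new ArrayList<>();
        for (long item : items) {
            futures.add(pool.submit(() -> process(item))); // enqueue a job
        }
        for (Future<Long> f : futures) {
            System.out.println(f.get()); // real code would write each result to the database
        }
        pool.shutdown();
    }
}
```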

After the general "message passing" architecture is in place, the next step is always to find out what is causing the process to be slow. Not all problems can be solved by simply throwing up more threads on a box or more boxes in a cluster (many can, however).

For example, if the jobs are CPU-bound, there's no point running more threads on a single box than you have cores to run them on (less one core that is used to manage the threads).
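Sizing the pool that way is a one-liner with the JDK; this small sketch reserves one core for coordination, as the point above suggests:

```java
public class PoolSizing {
    // For CPU-bound work: one thread per core, minus one core for
    // coordination, but never fewer than one thread.
    static int cpuBoundThreads() {
        return Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
    }

    public static void main(String[] args) {
        System.out.println("Using " + cpuBoundThreads() + " worker threads");
    }
}
```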

If the operations are disk- or network-bound, however, the jobs can be built asynchronously internally, so that other threads can kick in while the first one waits for the disk or the network to return what it asked for.
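One way to express that in Java is `CompletableFuture`: start the slow calls without blocking, then collect the results. The sleep below simulates a slow disk or network call; `fetch` and its key/value names are invented for the sketch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class AsyncJob {
    // Pretend this is a slow disk/network call; while it waits,
    // other tasks can run on the shared pool.
    static CompletableFuture<String> fetch(String key) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value-for-" + key;
        });
    }

    public static void main(String[] args) {
        CompletableFuture<String> a = fetch("a");
        CompletableFuture<String> b = fetch("b"); // started without waiting for 'a'
        System.out.println(a.join() + ", " + b.join()); // both waits overlap
    }
}
```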

Ultimately, the message-passing architecture is the most important piece; after that, it's all about optimizing the jobs and using your resources efficiently, which requires intimate knowledge of the domain.

Once you get through the bulk of the job optimization, you can start looking at inter-process caching using a fast key-value store like Redis, so that you are not re-computing data you need over and over again.
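The idea behind that cache can be sketched in-JVM with a `ConcurrentHashMap` (Redis would play the same role across processes and machines); `expensive` and the counter are invented for illustration, and `computeIfAbsent` only runs the expensive function on a cache miss.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ResultCache {
    // In-JVM analogue of the Redis idea: cache results keyed by input
    // so repeated items are never recomputed.
    static final Map<Integer, Long> cache = new ConcurrentHashMap<>();
    static int computations = 0; // counts actual (non-cached) computations

    static long expensive(int n) {
        computations++;
        return (long) n * n; // stand-in for the real math
    }

    static long cached(int n) {
        return cache.computeIfAbsent(n, ResultCache::expensive);
    }

    public static void main(String[] args) {
        cached(7);
        cached(7);
        cached(7);
        System.out.println("computed " + computations + " time(s)"); // only the first call computes
    }
}
```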