upgrading advice...


Hi,

I'm looking to upgrade my current rig. Let me tell you a little bit about the work I use it for.

I use it for computational linguistics processing. I have a multithreaded program which calculates vector distances in very sparse matrices extracted from language data. I code in Java. Since it's language data, you can easily have several thousand dimensions in the vector space and several thousand vectors in memory simultaneously (hence the sparse matrices).
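To give a concrete picture of what I mean, here's a minimal sketch of the kind of structure involved (the class and method names are just illustrative, and I'm using a hash map per vector and plain Euclidean distance as the example):

```java
import java.util.HashMap;
import java.util.Map;

// A sparse vector: only non-zero dimensions are stored.
public class SparseVector {
    private final Map<Integer, Double> entries = new HashMap<>();

    public void set(int dim, double value) {
        if (value != 0.0) entries.put(dim, value);
    }

    public double get(int dim) {
        return entries.getOrDefault(dim, 0.0);
    }

    // Euclidean distance; iterates only over stored (non-zero) entries,
    // which is what makes the sparse representation pay off.
    public static double distance(SparseVector a, SparseVector b) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : a.entries.entrySet()) {
            double diff = e.getValue() - b.get(e.getKey());
            sum += diff * diff;
        }
        for (Map.Entry<Integer, Double> e : b.entries.entrySet()) {
            if (!a.entries.containsKey(e.getKey())) {
                sum += e.getValue() * e.getValue();
            }
        }
        return Math.sqrt(sum);
    }
}
```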

Right now I'm running a 2.5 GHz Phenom 9850 with 8 GB of DDR2-800.

I am considering two upgrades:

1.) a quad-core Intel 9xx CPU

2.) a 3.4 GHz Phenom II

I see that the quad-core 9xx chips support hyperthreading. Is this going to help my throughput?

Without knowing more about your processing... I would suggest you profile the Java program. You may find that the bottleneck is RAM speed; if so, changing the processor won't really do much.

Have you tried compiling to native code (e.g. with GCJ)? You'll probably find that it increases your throughput a tad, and it should cost you nothing.

As for hyperthreading, it depends: how does your application use threads?

I see that the quad-core 9xx chips support hyperthreading. Is this going to help my throughput?

Presumably, yes. But the benefit you get from hyperthreading grows with the portion of your total processing that runs multi-threaded, as per Amdahl's law; looking that up should help you get the picture. If the multi-threaded sections are sparse and take only a small portion of your total processing time, you are probably better off with the Phenom. Unfortunately, benchmarking is the only way to know for sure.
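For reference, Amdahl's law says that with a fraction p of the work parallelizable and n processors, the best speedup you can get is 1 / ((1 - p) + p / n). A quick illustrative calculation (the 0.90 and 0.50 fractions are made-up numbers, just to show the shape of the curve):

```java
// Amdahl's law: upper bound on speedup for parallel fraction p on n processors.
public class Amdahl {
    public static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // If 90% of the work is parallel, 8 logical cores give at most:
        System.out.printf("p=0.90, n=8: %.2fx%n", speedup(0.90, 8)); // ~4.71x
        // With only 50% parallel, even 8 cores help much less:
        System.out.printf("p=0.50, n=8: %.2fx%n", speedup(0.50, 8)); // ~1.78x
    }
}
```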

There's also the fundamental issue of the operating system. In the Windows world, Windows 7 seems to be the first version to really take advantage of logical cores in its scheduling. Linux supports it fully too; to make sure it's active, just check /proc/cpuinfo: the "siblings" field lists the number of logical CPUs, and with hyperthreading on it should be higher than the number of physical cores shown under "cpu cores".
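You can also sanity-check this from Java itself: the JVM reports the number of logical processors the OS exposes, which on an HT-enabled quad core should come back as 8, not 4 (a tiny sketch, nothing more):

```java
public class CpuCount {
    public static void main(String[] args) {
        // Counts logical processors as seen by the JVM; with hyperthreading
        // active this is logical cores, not physical ones.
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors: " + logical);
    }
}
```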

My CPU utilization is 100% across all four cores most of the time. I parallelized all of the intensive stuff. For example, about 1 GB of tagged corpus has to be read in, information has to be extracted from each sentence, and a vector space has to be constructed. This is all highly parallelizable, so I did it. The distance calculations were also highly parallelizable, so I parallelized those too. My application uses threads during these intensive parts, with thread pools managed through Executors.
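Roughly, the setup looks like this (a minimal sketch, not my actual code; a simple absolute difference stands in for the real sparse-vector distance, and the pool is sized to the logical core count):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDistances {
    // Submit one distance task per vector; each task runs on a pool thread.
    public static double[] compute(final double[] data, final double query,
                                   ExecutorService pool) throws Exception {
        List<Future<Double>> futures = new ArrayList<>();
        for (final double v : data) {
            futures.add(pool.submit(new Callable<Double>() {
                public Double call() {
                    return Math.abs(v - query); // placeholder for the real distance
                }
            }));
        }
        double[] results = new double[data.length];
        for (int i = 0; i < results.length; i++) {
            results[i] = futures.get(i).get(); // blocks until each task finishes
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // One worker per logical core, as the JVM reports them.
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        double[] out = compute(new double[]{1.0, 4.0, 6.0}, 5.0, pool);
        pool.shutdown();
        System.out.println(java.util.Arrays.toString(out)); // [4.0, 1.0, 1.0]
    }
}
```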

I'm compiling at the command line with javac.exe and executing with java.exe. This produces .class files, but I guess that isn't native executable code? What kind of speed-up can I expect?

Right, my program is highly parallelized, and I think it can take advantage of some extra juice. And right, I'm hoping that four cores + HT is better than four cores without HT. I wish I had a friend with the hardware so I could check it out!

Are you planning on updating your current computer, or replacing it?

You won't be able to carry much over, as AM3 and LGA 1156/1366 all use DDR3 memory. Do you have a budget in mind? Budget usually seems to be the biggest deciding factor.

No budget restriction; this is for research, so I can get funding as I need it. And yes, I am going for a complete overhaul. I used to think that a roughly 1.5x increase in speed would make an upgrade worthwhile. But since I've been using AMD for so long (since the original Athlons), I have no experience with the new Intel technology.

I use it for computational linguistics processing. I have a multithreaded program which calculates vector distances in very sparse matrices extracted from language data. I code in Java. Since it's language data, you can easily have several thousand dimensions in the vector space and several thousand vectors in memory simultaneously (hence the sparse matrices).

If you want to do all of that, I suggest what I would call a language upgrade to C or C++, since they will be far better at crunching the numbers. For this particular task your language of choice is obviously not the best, unless of course you have other tasks it must perform that necessitate Java. If you are absolutely stuck with Java, I would second the notion of compiling the code down to native.

Upgrading to a better CPU probably won't give you as much of a boost as you are hoping, at least on XP, primarily because of what Mario stated. XP just isn't built for multi-core processors and is poor at scheduling and managing cores. Most multi-threaded apps on my Phenom quad core barely touch the other cores and yet nearly max out the first one. Sort of defeats the purpose, really.

There's another option if C or C++ somehow doesn't appeal to you, or if you feel the cost of porting the code is too steep. It's not as advantageous in the short term as simply compiling to native code, but I feel it's a long-term investment you (and your team?) won't regret.

Use Erlang for your whole MT logic and interface it with the rest of your Java code.

Erlang was written specifically for concurrent multitasking. Personally, I'm enamored with the language, not so much because of its syntax, but because of its semantics and its whole approach to MT, which is vastly different from the thread-based model in languages like Java, C, and C++. It's gotten to the point where I actually loathe having to write thread-based MT in C++. For multitasking purposes, I feel Erlang's actor-model approach is a lot easier to code and manage and less prone to generating hard-to-track bugs. In my opinion it's a highly elegant approach.
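To illustrate the contrast in Java terms: an actor owns its state, receives messages in a mailbox, and processes them strictly one at a time, so the state needs no locks. A toy sketch of the idea (this is not Erlang or any real actor library, just a BlockingQueue standing in as the mailbox):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A toy "actor": private state, a mailbox, and one thread draining it.
// Because messages are handled one at a time, the counter needs no locks.
public class CounterActor {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count = 0; // touched only by the actor's own thread
    private final Thread worker;

    public CounterActor() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String msg = mailbox.take(); // wait for the next message
                    if (msg.equals("stop")) break;
                    if (msg.equals("increment")) count++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
    }

    // Other threads communicate only by sending messages, never by
    // touching the actor's state directly.
    public void send(String msg) { mailbox.add(msg); }

    // Wait for the actor to stop, then read its final state.
    public int join() throws InterruptedException {
        worker.join();
        return count;
    }
}
```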