Ideas in Wiring: The End of the Free Lunch

Tuesday, June 14, 2005

The End of the Free Lunch

OK, so there's No Such Thing as a Free Lunch, but in the computing world processor vendors like Intel were buying lunch for everyone. I headed up the LabVIEW Performance team for about two years and had a pretty easy job. All our group had to do to make LabVIEW faster was wait 18 months and let Intel double the speed of its processors. It almost made cutting 20% off the run time of some array manipulation feel rather worthless.

As you've probably noticed, that 4 GHz Pentium you were expecting never quite happened. We've been stuck at 3.2ish GHz for quite some time now. So what's going on? CPU vendors have discovered that they can't keep increasing the clock rate the way they used to. In fact, their entire paradigm for extracting more speed has fallen down. So they've all changed course and gone dual core. Now, instead of doubling the speed, they are going to give you two processors next year for the same price as one processor this year. But does your software still get twice as fast?

Well... maybe. Software written in C has to be written specially to take advantage of the second processor. Windows XP, Mac OS X, and Linux all support Symmetric Multiprocessing (SMP). This allows chunks of code to run on either processor without the software noticing which processor it is running on. The smallest "chunk" that can run on a different processor is known as a "thread". In C, you split your code into different sequential paths and call some extra libraries that schedule those paths on separate threads. If you do this, you call your application "multithreaded". LabVIEW became a multithreaded application in version 5.0 and ran on SMP computers. Over the past six years, multiprocessor machines have been rather rare. They were typically server machines that were four times more expensive than typical desktop machines. That may have been rather fortunate, because it takes a while to get the kinks out of your software. We found all sorts of bizarre bugs (both in hardware and in software) on multiprocessor machines. Once Intel started putting Hyper-Threading in its desktop processors, which makes a single processor appear to the operating system as two, we saw average users with machines that behave like SMP systems.
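To make the "split your code into sequential paths" idea concrete, here is a minimal sketch. It is written in Python rather than C purely for brevity, and the function name `sum_range`, the problem size `N`, and the two-way split are all made up for illustration. One caveat: in standard CPython the global interpreter lock keeps these two threads from actually running CPU-bound work on two processors at once, so this shows only the structure of a multithreaded program, not the speedup. A C version would use a thread library such as POSIX threads.

```python
import threading


def sum_range(start, stop, results, slot):
    """One sequential path: add up the integers in [start, stop)."""
    total = 0
    for value in range(start, stop):
        total += value
    results[slot] = total  # each thread writes only to its own slot


N = 1_000_000     # illustrative problem size (an assumption, not from the post)
results = [0, 0]  # one result slot per thread, so no slot is shared

# Split the work into two independent sequential paths.
first_half = threading.Thread(target=sum_range, args=(0, N // 2, results, 0))
second_half = threading.Thread(target=sum_range, args=(N // 2, N, results, 1))

first_half.start()
second_half.start()

# Wait for both paths to finish before combining their results.
first_half.join()
second_half.join()

total = results[0] + results[1]
print(total)
```

Notice how much of this is bookkeeping: deciding where to split, creating the threads, and waiting for them before using the answer. None of that is free in a text-based language.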

Now, I mentioned that C code needs to be specially written to take advantage of multiple processors. What about LabVIEW code? Getting code to run correctly and more quickly on a multiprocessor machine requires scheduling it for both correctness and maximum efficiency. To run correctly, the system must schedule a chunk of code only when its input data is available. In short, it needs to know the data flow of the program. That's what LabVIEW does. It schedules your code using data flow rules. Thus, on a multiprocessor machine, data flow code will execute correctly. (Note: you've learned all along that globals in LabVIEW can be bad. This is yet another place where they become bad.) Using data flow rules, LabVIEW will automatically figure out which chunks of code can run independently and start them running at the same time. Remember my example about parallel while loops? On an SMP system, these loops will run completely in parallel on separate processors. There's no need to share the processor, so there's no need to ping-pong back and forth.
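For readers who think in text-based languages, the data flow rule can be sketched roughly like this. This is a toy illustration in Python, not how LabVIEW's scheduler is actually implemented: the `graph` dictionary, the node names, and `run_dataflow` are all hypothetical. Each node lists the nodes whose outputs it consumes, and a node is started only once all of those outputs exist. Nodes that become ready together are handed to a thread pool at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical diagram: node name -> (function, names of the nodes it reads from).
graph = {
    "a": (lambda: 2, []),
    "b": (lambda: 3, []),
    "sum": (lambda x, y: x + y, ["a", "b"]),
    "square": (lambda s: s * s, ["sum"]),
}


def run_dataflow(graph):
    """Run every node, starting each one only after all of its inputs exist."""
    results = {}           # outputs of nodes that have finished
    pending = dict(graph)  # nodes that have not run yet
    waves = []             # which nodes were started together, for inspection
    with ThreadPoolExecutor() as pool:
        while pending:
            # A node is ready when every input it needs is already available.
            ready = [name for name, (_, deps) in pending.items()
                     if all(dep in results for dep in deps)]
            if not ready:
                raise ValueError("cycle or missing input in the graph")
            # Independent ready nodes are submitted together.
            futures = {
                name: pool.submit(pending[name][0],
                                  *(results[dep] for dep in pending[name][1]))
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
            waves.append(sorted(ready))
    return results, waves


results, waves = run_dataflow(graph)
print(results["square"], waves)
```

Here "a" and "b" have no inputs, so they are started together, while "sum" and "square" each wait for their data. This sketch runs in waves, which is a simplification; a real scheduler could start a node the moment its last input arrives.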

Thus, LabVIEW programs will automatically spread themselves across multiple CPUs. You don't need to do anything. There are some NI customers who see the speed of their applications double when adding a second processor (or triple or quadruple when adding a third or fourth). So, does Intel in combination with NI give you a free lunch? Maybe. It turns out that not all programs get that nice speed boost when adding an additional processor. I'll give you some tricks in the next article.

3 Comments:

At 2:15 PM, Joel said...: Typically, we recommend against globals when you can avoid them. You can't always, but in many cases you can. The biggest risk is this one:

Why Does Using Local or Global Variables to Pass Data Between Parallel Loops Cause Unexpected Behavior in LabVIEW?

LabVIEW will schedule code using data flow rules. If you use global variables, the data flow engine doesn't realize that there might be a timing dependency between accesses to the global variable, and you may get behavior you don't expect, especially if you ever run the code on a multiprocessor machine.

This will be compounded if your global variable is something like a cluster rather than a simple scalar. Then life gets really messy.
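The timing dependency is easier to see in a text-based sketch. The Python below does not use real threads; it replays two possible orderings of the same reads and writes by hand so the result is repeatable. The `shared` dictionary and the `read_global` and `write_global` helpers are hypothetical stand-ins for a global variable and its read and write accesses.

```python
# Hypothetical stand-in for a global variable shared by two parallel loops.
shared = {"count": 0}


def read_global():
    return shared["count"]


def write_global(value):
    shared["count"] = value


# Ordering 1: loop A finishes its read-modify-write before loop B starts.
shared["count"] = 0
a = read_global()
write_global(a + 1)
b = read_global()
write_global(b + 1)
sequential_result = shared["count"]

# Ordering 2: both loops read before either one writes,
# so loop B's write overwrites loop A's update.
shared["count"] = 0
a = read_global()
b = read_global()
write_global(a + 1)
write_global(b + 1)
interleaved_result = shared["count"]

print(sequential_result, interleaved_result)
```

Both orderings perform exactly the same four accesses, yet one increment is lost in the second. Nothing in the code says which ordering you will get, and that is the problem: no wire connects the two loops, so the scheduler has no ordering to enforce.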

There are some minor memory allocation issues to consider also:

LabVIEW Performance and Memory Management

"A note in the Memory Usage section of this application note cautions against overusing local and global variables. When you read from a global variable, a copy of the data of the global variable is generated. Thus, a complete copy of the data of the array is being generated each time you access an element. The next method shows an even more efficient method that avoids this overhead."

And: Does Reading a Local and Global Variable Create a Copy of the Data in Memory?

One other technique we recommend is something called "Functional Globals" (sometimes known as LabVIEW 2 style globals). They provide better protection, especially when you have a global cluster. You can see an example of them here
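As a rough textual analog (an assumption on my part, not a translation of any NI example), a functional global behaves like a single routine that owns the data and lets only one caller in at a time, so a whole read-modify-write happens as one step. In this Python sketch the names `make_functional_global`, `call`, and the action strings are hypothetical.

```python
import threading


def make_functional_global(initial):
    """Return one access function that owns the state and serializes callers."""
    state = {"value": initial}  # plays the role of the stored value
    lock = threading.Lock()     # only one caller may be inside at a time

    def call(action, argument=None):
        with lock:
            if action == "set":
                state["value"] = argument
            elif action == "increment":
                state["value"] += argument  # read-modify-write as a single step
            elif action != "get":
                raise ValueError("unknown action: " + str(action))
            return state["value"]

    return call


counter = make_functional_global(0)


def worker():
    for _ in range(10_000):
        counter("increment", 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

total = counter("get")
print(total)
```

Because every increment goes through the one locked routine, no update is lost no matter how the four workers interleave.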
At 9:18 AM, Anonymous said...: Hey Joel...thanks for the writeup!
At 5:31 PM, Anonymous said...: Need more articles on architecture, frameworks, design patterns. Keep up the good work...
