Wednesday, June 29, 2005

Writing Parallel Programs in LabVIEW - Part 1

In my post about the CPU business going to multi-core processors, I talked about how LabVIEW can help. Here's the first in an occasional series of tips for getting LabVIEW applications to run in parallel on multi-CPU machines.

1) Read the paper about Hyperthreading and LabVIEW. That shows you all about the LabVIEW threading model, clumps, and how to write some basic applications that do run in parallel.

2) Learn about re-entrant SubVIs. Normally, a VI that is called from two places in parallel will not actually execute in parallel. It can't: each VI has a single dataspace, so the data from one call would stomp on the data from the other. To allow the calls to overlap, you need to mark the VI as re-entrant.

There's a hard way and an easy way to do this. The easy way is to go to the VI properties, mark the VI as re-entrant, and then drop the VI in all of the places you want to use it. This was shown in the Hyperthreading and LabVIEW paper.

But what if you don't know how much parallelism you want when you write the program? What if you don't know how many channels you are going to run in parallel?

One application note on this is Multi-Tachometer Processing with the Order Analysis Toolkit. Order analysis is a long computation (an FFT) per channel, so if you have multiple channels, you can compute them in parallel on a multiprocessor machine: add more processors and you process more channels in the same amount of time. The app note shows you how to dynamically add more parallelism based on the number of channels you are processing. (A rough C analogy of the idea appears after this list.)

3) Make sure you don't get serialized in Call Library Nodes either. If a Call Library Node isn't marked as threadsafe, LabVIEW will serialize every call into that DLL by running it in the user interface thread. However, unlike with a VI, simply switching the Call Library Node to reentrant is NOT a good idea unless you are positive the code really is threadsafe. If it is, then marking the node reentrant is always worthwhile: there's no downside, and there can be a big performance boost. (A sketch of what "threadsafe" means in practice also follows the list.)

4) Be judicious in your use of re-entrant SubVIs. If the VI you want called in parallel itself calls other VIs, you may need to make those SubVIs re-entrant as well. However, marking absolutely everything re-entrant is a bad move: it causes extra copies of every single VI's dataspace to be created, and the memory footprint can balloon without you realizing it (a SubVI carrying a few megabytes of arrays in its dataspace, cloned at dozens of call sites, quickly adds up to hundreds of megabytes). Therefore, be selective. Use the profiler to help determine which VIs should be marked re-entrant. The VIs that execute for a long time (or have children that do) should be marked re-entrant if there is any chance they will be called in parallel. The short, quick VIs can be left as normal; they typically won't be called simultaneously, and even if they are, they won't block for very long.
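LabVIEW block diagrams don't translate directly into text, but here is a rough C analogy of the idea in item 2, assuming POSIX threads: the number of parallel workers is decided at run time from the channel count, and each worker owns its own private state, which is roughly what marking a SubVI re-entrant buys each call site. The channel count, the process_channel routine, and its busy-work are all invented for illustration; the real app note does this with re-entrant VI clones, not hand-managed threads.

    /* Hypothetical sketch (not LabVIEW-generated code): one worker per
       channel, each with private state. Build with: gcc -std=c99 -pthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        int    channel;    /* which channel this worker handles           */
        double result;     /* private state -- no sharing, so no stomping */
    } ChannelWork;

    /* Stand-in for the long per-channel computation (an FFT in the app note). */
    static void *process_channel(void *arg)
    {
        ChannelWork *w = (ChannelWork *)arg;
        w->result = 0.0;
        for (long i = 1; i <= 1000000; i++)
            w->result += 1.0 / (double)(i + w->channel);
        return NULL;
    }

    int main(void)
    {
        int num_channels = 8;   /* in real code, discovered at run time */
        pthread_t   *threads = malloc(num_channels * sizeof *threads);
        ChannelWork *work    = malloc(num_channels * sizeof *work);

        /* Spawn exactly as much parallelism as there are channels. */
        for (int c = 0; c < num_channels; c++) {
            work[c].channel = c;
            pthread_create(&threads[c], NULL, process_channel, &work[c]);
        }
        /* Wait for every channel to finish and report its result. */
        for (int c = 0; c < num_channels; c++) {
            pthread_join(threads[c], NULL);
            printf("channel %d -> %f\n", c, work[c].result);
        }
        free(threads);
        free(work);
        return 0;
    }

On an SMP machine the operating system spreads those workers across the available processors, so more processors really do mean more channels in the same amount of time.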
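And here is a hypothetical sketch, again in plain C, of what "threadsafe" means for the code behind a Call Library Node in item 3. The function names and formats are made up; the point is where the state lives.

    #include <stdio.h>

    /* NOT threadsafe: every caller shares this one static buffer, so two
       simultaneous calls overwrite each other's result. LabVIEW's default
       serialization is protecting code like this. */
    static char shared_buffer[64];

    const char *format_reading_unsafe(double value)
    {
        sprintf(shared_buffer, "reading = %0.3f", value);
        return shared_buffer;
    }

    /* Threadsafe: all state lives in caller-supplied storage, so any number
       of calls can run at once. This is the kind of code you can safely mark
       as reentrant in the Call Library Node configuration. */
    void format_reading_safe(double value, char *out, size_t out_len)
    {
        snprintf(out, out_len, "reading = %0.3f", value);
    }

If the DLL isn't yours, check its documentation before assuming it looks like the second function.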

There are some other tricks but I'll save those for another time.

Sunday, June 19, 2005

Lessons from other industries

You can learn a lot just by watching the mistakes of others. Here's one I hope we never make: forgetting the reason for your company's existence.

I'm a car nut. I've spent my life poring over the pages of auto magazines. I asked my dad to buy a Ferrari when I was 10 (he didn't go for it). I've been to the Indy 500, drag races, NASCAR at Watkins Glen, and a host of other events. Today, I watched what happens when a group of folks forget why they're around. At the US Grand Prix Formula One race, one of the tire manufacturers (Michelin) screwed up. They brought tires that were unsafe for the track and couldn't fix the problem in time for the race. The various power players butted heads, couldn't reach an agreement, and so the Michelin teams didn't race. The "Grand Prix" consisted of a grand total of 6 cars starting and finishing the "race", if you could call it that. The fans were livid. They had rearranged their schedules and paid anywhere from a few hundred to a few thousand dollars to be there.

If there had been zero fans in the grandstands and zero viewers on TV, the decisions made by everyone involved would probably have been correct. Neither Formula One nor the other tire manufacturer (Bridgestone) had any reason to accommodate Michelin. Michelin screwed up, and the teams that relied on Michelin were wrong to have no fallback position. Motor racing at its purest is Darwinian: those that are unprepared should not succeed.

But it's obvious that everyone involved (Formula One, Michelin, Bridgestone, the heads of the various organizations, the lead team owners, etc.) has forgotten why they pour billions of dollars into the sport: it's for the fans, stupid. Teams won't spend $400 million a year if there's no audience. If they had cared about the fans, they would have forged a real compromise, something that let the fans go home happy. Not because it was the proper racing thing to do, but because it was the proper way to treat the fans. Instead, they put their competitiveness above all reason, on display for the whole world. Today I think they killed the golden goose. They forgot who pays their bills.

Tuesday, June 14, 2005

The End of the Free Lunch

OK, so there's No Such Thing as a Free Lunch, but in the computing world the processor vendors like Intel were buying lunch for everyone. I headed up the LabVIEW Performance team for about 2 years and had a pretty easy job. All our group had to do to make LabVIEW faster was to wait 18 months and let Intel double the speed of their processors. It almost made cutting 20% off of some array manipulation feel rather worthless. As you've probably noticed, that 4 GHz Pentium you were expecting never quite happened. We've been stuck at 3.2-ish GHz for quite some time now. So what's going on? CPU vendors have discovered that they can't keep increasing the clock rate the way they used to. In fact, their entire paradigm for extracting more speed has broken down. Thus, they've all changed course and gone dual core. Now, instead of doubling the speed, they are going to give you two processors next year for the same price as one processor this year. But does your software still get twice as fast?

Well... maybe. Software written in C needs to be specially written in order to take advantage of the second processor. Windows XP, Mac OS X, and Linux all support Symmetric Multiprocessing (SMP), which allows chunks of code to run on either processor without the software noticing which processor it is running on. The smallest "chunk" that can run on a different processor is known as a "thread". In C, you split your code into different sequential paths and call a threading library to schedule each path on its own thread. If you do this, you call your application "multithreaded". LabVIEW became a multithreaded application in version 5.0 and ran on SMP computers. For the past 6 years, though, multiprocessor machines have been rather rare; they were typically server machines that cost four times more than typical desktop machines. That may have been fortunate, because it takes a while to get the kinks out of your software, and we found all sorts of bizarre bugs (both in hardware and in software) on multiprocessor machines. Once Intel started putting Hyperthreading in their desktop processors, we started seeing average users with SMP machines.
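For the curious, here is a minimal sketch of what "specially written" looks like in C, using POSIX threads as one example of such a library; the two worker functions and their busy-work are placeholders. The programmer has to split the work into independent paths by hand and give each path its own thread, and on an SMP machine the operating system is then free to run them on separate processors.

    /* Minimal multithreaded C sketch. Build with: gcc -std=c99 -pthread */
    #include <pthread.h>
    #include <stdio.h>

    /* Path 1: one independent chunk of work. */
    static void *sum_low(void *arg)
    {
        double *out = (double *)arg;
        for (long i = 0; i < 50000000; i++)
            *out += 1.0;
        return NULL;
    }

    /* Path 2: another chunk with no data dependency on path 1. */
    static void *sum_high(void *arg)
    {
        double *out = (double *)arg;
        for (long i = 0; i < 50000000; i++)
            *out += 2.0;
        return NULL;
    }

    int main(void)
    {
        double a = 0.0, b = 0.0;
        pthread_t t1, t2;

        pthread_create(&t1, NULL, sum_low,  &a);   /* schedule path 1 */
        pthread_create(&t2, NULL, sum_high, &b);   /* schedule path 2 */
        pthread_join(t1, NULL);                    /* wait for both   */
        pthread_join(t2, NULL);

        printf("total = %f\n", a + b);             /* combine results */
        return 0;
    }

In LabVIEW, as the next paragraph explains, the diagram itself tells the scheduler where those independent paths are.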

Now, I mentioned that C code needs to be specially written to take advantage of multiple processors. What about LabVIEW code? Getting code to run both correctly and quickly on a multiprocessor machine is a scheduling problem. To be correct, the system must only schedule a chunk of code to run once its data is available; in other words, it needs to know the data flow of the program. That's exactly what LabVIEW does: it schedules your code using dataflow rules. Thus, on a multiprocessor machine, dataflow code will execute correctly. (Note: you've learned all along that globals in LabVIEW can be bad. This is yet another place where they bite you, because a global carries data outside the data flow.) Using those same rules, LabVIEW automatically figures out which chunks of code are independent and starts them running at the same time. Remember my example about parallel while loops? On an SMP system, those loops run completely in parallel on separate processors. There's no need to share a processor, so there's no need to ping-pong back and forth.

Thus, LabVIEW programs will automatically spread themselves across multiple CPUs. You don't need to do anything.  There are some NI customers who see the speed of their applications double when adding a second processor (or triple or quadruple when adding a third or fourth).  So, does Intel in combination with NI give you a free lunch?  Maybe.  It turns out that not all programs get that nice speed boost when adding an additional processor.  I'll give you some tricks in the next article.

Thursday, June 09, 2005

Abstracting the hardware

Tuesday was a very big day for my group and the timing is rather interesting. 

Monday, Apple announced they were switching from PowerPC to Intel x86 processors. Mac software developers everywhere were both overjoyed that they were getting a faster processor and dismayed that they now have to do some work just to keep their software working. Software is very sensitive to its operating environment and needs to be modified or recompiled when that environment changes in unanticipated ways. Fortunately, changing processors is actually less of a problem these days than changing operating systems. An application written in C on Windows CE basically runs on MIPS, x86, or ARM systems; it needs a recompile, and it needs to be tested, and tested, and tested (write once, test everywhere). Software written in interpreted languages like Java, JavaScript, or PHP doesn't even need to be recompiled; it just needs to be tested on the different platforms. It's the testing, though, that can eat up a lot of time and expense. Why all the testing? There are subtle differences between each implementation of the environment. JavaScript apps run differently in Internet Explorer than in Mozilla. Java apps see different environments on Sun versus Windows versus Mac.

LabVIEW (the code that makes up the development and execution environments) has, over its history, shipped on Macintosh (68k and PowerPC, for MacOS "Classic" and MacOS X), Windows 3.1 through Windows XP (x86 CPUs), Sun Solaris (SPARC), SCO Unix, Linux, HP-UX (PA-RISC), Concurrent PowerMAX, PalmOS, PocketPC (ARM), Windows CE, RTX, and PharLap. In fact, it has outlived most if not all of the compilers used to create it. Every time we ship LabVIEW, the testing needs to be duplicated across each platform, at least for the places that differ between them. Over time, we've dropped some platforms when the sales from a platform no longer covered the cost of testing and support.

Sometimes, we do ports of LabVIEW internally to satisfy some curiosity or as a one-off experiment. We did a version of LabVIEW with the Jet Propulsion Laboratory to see whether it made sense to write software for space flight applications in G.  The experiment is still going on but the results look promising.

We have talented developers but at the end of the day, you can only support a limited number of configurations.  We commonly get requests for LabVIEW to run on some hardware that we don't currently run on, be it a military VME controller running VxWorks, an automotive telematics system running QNX, or an engine controller for a train running a custom operating system. Building a new version of LabVIEW to run on these platforms just hasn't been an option. Before today, we always had to say no.  But today we announced the LabVIEW Embedded Development Module and we don't have to say "no" any more.

Over Christmas break, when no one was around, I got a week to play. I took a development version of the software and found the cheapest embedded target I could: a Linksys WRT54G router, $69 from Fry's. Linksys runs Linux on the box, and some enterprising hackers had figured out how to write applications that would run on the unit. It took a little doing, but in about a week I had adapted LabVIEW to run on the unit using the Embedded Development Module. When I hit the run button on my VI, it would take the VI, convert it to C, script the compiler for the platform (GCC) to build the application, call some scripts to download the code to the router, start the program running, open a TCP/IP connection back to the host, and let me interact with the application just as if I were talking to a LabVIEW RT target. I had a new toy and grinned from ear to ear. LabVIEW running on a router, of all things. Even better, it ran as a multitasked application, so my Linksys box kept routing packets while my application was running. The Linksys box has no real I/O, but it does have 5 Ethernet ports and a WiFi antenna; I'm sure someone could do something with those. The best I could do was write a program to control an ENET-GPIB box connected to an instrument. $69 instrument control. How fun.

Linksys also makes a device with an Ethernet connection and a USB master connection.  Wonder what I can do with that USB port....


