It's been a week since I got back from the ParaPLoP ’10 workshop – it was great to meet a group of people doing so much to bring parallel programming into the mainstream, and I learned a great deal.

Ade Miller presented the Task Graph pattern from Microsoft TPL. As a TBB user, I find strong parallels between them; TPL is elegant, and I am keen to learn more. Arguably, some of it is syntactic sugar, but sugar is sweet! I have resisted Microsoft for many years, but it is time to concede and assimilate.

Just as interesting as the workshops were the side conversations. It was also good to get a sense of the future and direction of TBB from its architect, Arch Robison. TBB 3.0 will be out soon, and I look forward to it.

Dick Gabriel’s talk on the works of Christopher Alexander was fascinating. The architect’s work has been an inspiration for the design patterns community.

And Ralph Johnson’s view of programs as transformations struck a chord – it is indeed true that most of us revisit the same applications and algorithms over and over again, so making a program parallel is itself a transformation. Documenting patterns is about sharing best practices, to help the rest of us with that transformation process.

Most interesting for me personally was hearing from Tim Mattson about the consequences for software developers of what’s cooking in the silicon furnaces at Intel. I have a feeling things are going to get interesting.

Processors cannot be clocked much faster any more – power consumption and heat dissipation have seen to that. But the consequences go beyond clock speed limits. Current generations of processors have deep pipelines and out-of-order instruction scheduling, to hide memory access and other internal latency.

Instructions are not executed in the order they are laid out – if an instruction would stall because its operands are not yet available, other instructions are executed instead. This optimized instruction scheduling is done by the hardware at run time. Quite likely, newer processors will ditch some of this complexity to accommodate more cores.

That leaves a hard optimization job for compilers – some of these optimizations cannot be done statically, because they depend on things known only at run time, such as whether a load hits the cache.
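To make the scheduling point a little more concrete, here is a minimal C++ sketch of my own (the function names are hypothetical, not something from the talk). The first loop is one long dependency chain, where every addition waits on the previous one. The second keeps four independent accumulators, so there is always other work available while one addition is in flight. With out-of-order hardware, much of this kind of overlap is found for you at run time; on simpler cores, more of it has to be spelled out by the compiler or the programmer. Whether the second version is actually faster depends entirely on the compiler and the processor, so treat it as an illustration rather than a recipe.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: each addition depends on the result of the previous
// one, so the loop body forms a single serial dependency chain.
std::uint64_t sum_single_chain(const std::vector<std::uint64_t>& data) {
    std::uint64_t total = 0;
    for (std::uint64_t x : data) {
        total += x;
    }
    return total;
}

// Four accumulators: the four chains do not depend on each other, so
// their additions can overlap. Unsigned addition wraps and is
// associative, so the result is identical to the single-chain version.
std::uint64_t sum_four_chains(const std::vector<std::uint64_t>& data) {
    std::uint64_t a = 0, b = 0, c = 0, d = 0;
    const std::size_t n = data.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += data[i];
        b += data[i + 1];
        c += data[i + 2];
        d += data[i + 3];
    }
    // Handle the leftover elements when n is not a multiple of four.
    for (; i < n; ++i) {
        a += data[i];
    }
    return a + b + c + d;
}
```

I used integers on purpose: with floating point, regrouping the additions can change the rounding, which is one reason a compiler generally will not make this transformation on its own unless told it may.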

I take two lessons from this:

1. Where possible, use vendor libraries: MKL or NAG for math, for example. Let the vendors deliver processor-optimized versions. I’ve come across a lot of code derived from Numerical Recipes; hand-maintained code like that will keep application developers on an optimization treadmill.

2. Parallel programming is no longer just for speed junkies – to maintain current performance, parallelism is going to be needed.
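As a small illustration of the second point, and of Ralph Johnson’s "transformation" view, here is a serial sum next to a parallel version of the same computation. This is only a sketch under my own assumptions: it uses the standard library’s `std::async` so that it is self-contained, and the names `serial_sum` and `parallel_sum` are hypothetical. In real code I would reach for a library facility such as TBB’s `parallel_reduce` rather than chunking by hand.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Serial baseline: one thread walks the whole array.
std::uint64_t serial_sum(const std::vector<std::uint64_t>& data) {
    return std::accumulate(data.begin(), data.end(), std::uint64_t{0});
}

// Parallel transformation of the same computation: split the input into
// contiguous chunks, sum each chunk in its own task, then combine the
// partial sums on the calling thread.
std::uint64_t parallel_sum(const std::vector<std::uint64_t>& data,
                           unsigned workers = std::thread::hardware_concurrency()) {
    if (workers == 0) workers = 1;  // hardware_concurrency() may return 0
    const std::size_t n = data.size();
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division

    std::vector<std::future<std::uint64_t>> parts;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        const std::uint64_t* first = data.data() + begin;
        const std::uint64_t* last = data.data() + end;
        parts.push_back(std::async(std::launch::async, [first, last] {
            return std::accumulate(first, last, std::uint64_t{0});
        }));
    }

    std::uint64_t total = 0;
    for (auto& part : parts) {
        total += part.get();  // blocks until that chunk's task has finished
    }
    return total;
}
```

Calling `parallel_sum(values)` gives the same result as `serial_sum(values)`; the algorithm has not changed, only the way the work is divided. On some toolchains (GCC on Linux, for instance) you may need to build with `-pthread`.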

What do you think?