eigen.systems

because shift happens

welcome to our blog!

explore parallel worlds

11
July

eigenomics: june jobs data

Posted by Rakesh | In: Uncategorized

the robot economist visualizations are updated for june – the main changes:

history of avg. hourly wages can now be viewed as nominal or inflation adjusted values

sector revisions are new: the adjustments to provisional values for the prior two months are shown; upward and downward revisions are separated and listed by decreasing magnitude

regime plots now show regime boundaries as shaded areas; labeled buttons can be used to highlight the boundaries of a specific regime

  • 0 Comments
3
June

eigenomics!

Posted by Sarah Richards | In: announcements

eigenomics combines machine learning (borrowed from our HFT toolset) and data visualization to deliver nuanced analyses of economic, market and other financial data. The beta site shows some illustrative (but very real) examples of US non-farm payroll data and CPI.

the emphasis is on information and no-frills simplicity.

interactive component plots show current and prior month values following market norms. clicking on an individual bar updates the lower time plot with a 30-year history (or all available data, if less) for the selected sector.

the left-to-right arrangement of bars places winners on the left, losers on the right. views pertinent to the specific data set can be chosen (e.g. sector payroll as % of total non-farm, total value or monthly change).

regimes analysis (payroll data only; others to follow) uses a simple classifier to identify historical months as belonging to a growing, stable or recessionary economic regime; traffic light colors are used to label regimes in shades of recession-red, meh-amber and growing-green.

a parallel coordinate plot shows regime averages. the left-to-right arrangement of coordinates is significant – sectors to the far right are the least sensitive to the economic situation, whereas on the left are the sectors that see the largest swings.

the heading color on the table to the left of the regime plot corresponds to the regime that payroll data belongs to in the current month. the values show the improvement or decline in the current month’s levels relative to the average value for the regime.

so, in may’11, job growth switched from the green zone to amber; retail and leisure/hospitality gains from apr’11 were given back, and then some; the losses are materially worse than the amber-regime averages; only services (professional/business and the education/health sector) improved in may, and the decline in local govt. jobs accelerated further.

stay tuned as more data sets and analytic models are added in the coming weeks and months!

  • 0 Comments
  • Tags: eigenomics, Model.Bricks
23
March

the uprising begins…

Posted by Rakesh | In: Uncategorized, news

So far, it’s been a year for change like never before… and in the year of parallel programming, change now comes to a university near you!

Carnegie Mellon is eliminating object-oriented programming entirely from its introductory curriculum. Why?

“Object-oriented programming is eliminated entirely from the introductory curriculum, because it is both anti-modular and anti-parallel by its very nature, and hence unsuitable for a modern CS curriculum. A proposed new course on object-oriented design methodology will be offered at the sophomore level for those students who wish to study this topic.”

More in Robert Harper’s post. The post references the report Introductory Computer Science Education at Carnegie Mellon University: A Deans’ Perspective, which outlines the reasons that motivate so radical a change.

Certainly, this is going to shake more than a few trees – for my part, I am glad that parallel programming is finally making its way to a CS foundation curriculum.

I only wish the anti-modular remark followed the ever-so-green Dijkstra meme – as in Go To Statement Considered Harmful.

I must try to find inspiration to offer a solution – in the spirit of COME FROM.

  • 3 Comments
17
January

2011 – a year for || programming

Posted by Rakesh | In: Model.Bricks, patterns

It’s been a busy 6 months readying model.bricks for beta. Now the blog awakens from ‘ibernation!

2011 is many things – a prime number, the sum of 11 consecutive primes. And (naturally!) the visual cue of || leads us to designate it a year for parallel programming.
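Both number facts are easy to check by brute force. The sketch below is ours, for illustration only; the helper names `isPrime` and `firstOfConsecutivePrimeRun` are made up for this post.

```cpp
#include <cstddef>
#include <vector>

// Trial-division primality test; adequate for numbers this small.
bool isPrime( int n ) {
    if( n < 2 ) return false;
    for( int d = 2; d * d <= n; d++ ) {
        if( n % d == 0 ) return false;
    }
    return true;
}

// Returns the first prime of a run of `count` consecutive primes that sums
// to `target`, or -1 if no such run exists.
int firstOfConsecutivePrimeRun( int target, int count ) {
    std::vector<int> primes;
    for( int n = 2; n <= target; n++ ) {
        if( isPrime( n ) ) primes.push_back( n );
    }
    const std::size_t len = static_cast<std::size_t>( count );
    for( std::size_t i = 0; i + len <= primes.size(); i++ ) {
        long sum = 0;
        for( std::size_t j = 0; j < len; j++ ) sum += primes[ i + j ];
        if( sum == target ) return primes[ i ];
        if( sum > target ) break;  // window sums only grow as the window slides right
    }
    return -1;
}
```

Calling `firstOfConsecutivePrimeRun( 2011, 11 )` finds the run that starts at 157 and ends at 211.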

Strange Loop have just posted Guy Steele’s keynote talk – How to think about Parallel Programming: Not!. Few have expressed the core idioms of parallel program design with such clarity. Yes! We need a new mindset – sequential won’t do!

Separable state is the first key to parallelism – for a derivative security payoff, valuation state might hold coupon schedules, year-fractions and partial results. Risk results are aggregates of (or computed using) this state. Once state is extracted from the payoff object, independence of elements within the aggregate allows multiple valuations to progress in parallel.

A typedef declares the type of state associated with a payoff. As more payoffs are added, so are state declarations; the templates for parallel risk measure calculation expect and use this pattern.


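template<typename Payoff>
class Instrument {
    ....
    // value the payoff under a calibrated model; calculation state is
    // supplied by the caller, not held inside the payoff
    template<typename Model>
    double value(
        const Model & calibration,
        typename Payoff::CalcInfo & calcInfo,
        const Interval<double> & e = Interval<double>::INF
    ) {
        return _payoff.template value<Model>( calibration, _schedule, calcInfo, e );
    }
    ....
};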
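To make the separable-state idea concrete, here is a minimal sketch of the pattern. It is not model.bricks code: `FixedCoupons`, the fields of its `CalcInfo`, and `valueAll` are hypothetical names invented for illustration, and OpenMP is simply our choice of threading mechanism here.

```cpp
#include <cmath>
#include <vector>

// Hypothetical payoff, for illustration only: a strip of equal fixed coupons.
struct FixedCoupons {
    // Valuation state lives outside the payoff object, one instance per valuation.
    struct CalcInfo {
        std::vector<double> yearFractions;  // payment times, in years
        double partial = 0.0;               // partial result: running present value
    };

    double coupon = 0.0;  // cash amount paid at each payment time

    // Reads and writes only the CalcInfo passed in; the payoff itself stays const.
    double value( double rate, CalcInfo & info ) const {
        info.partial = 0.0;
        for( double t : info.yearFractions ) {
            info.partial += coupon * std::pow( 1.0 + rate, -t );
        }
        return info.partial;
    }
};

// Values every payoff against its own state. Iterations touch disjoint
// elements of `states` and `results`, so the loop may run in parallel.
// Without OpenMP enabled the pragma is ignored and the loop runs sequentially.
template<typename Payoff>
std::vector<double> valueAll( const std::vector<Payoff> & payoffs,
                              std::vector<typename Payoff::CalcInfo> & states,
                              double rate ) {
    std::vector<double> results( payoffs.size() );
    const int n = static_cast<int>( payoffs.size() );
    #pragma omp parallel for
    for( int i = 0; i < n; i++ ) {
        results[ i ] = payoffs[ i ].value( rate, states[ i ] );
    }
    return results;
}
```

Because each valuation owns its `CalcInfo`, no locking is needed; an aggregate risk measure would then be computed from `results` (or from the states) once the loop completes.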

Another big idea from the keynote is: Whether to be sequential or parallel is a separable question – calling most model.bricks methods requires explicit choice of sequential or parallel execution, like so:


/*
* fast discrete convolution (FDC):
* s: stencil, u: signal, v: convolution
*/
FDC<IMPL::SEQ, float>::convolve( s, u, v );
FDC<IMPL::PAR, float>::convolve( s, u, v );
FDC<IMPL::CUDA, float>::convolve( s, u, v );

Our motivations differ from the keynote’s ideas, though. The guiding factors in choosing an implementation are problem size and the overall context of the computation. Size-based choice can be automatic (a model.bricks auto-tuner is on the cards), but only the caller knows if this is one large convolution, or one of many – and the grain where parallel overheads are best amortized.
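A rough sketch of how such an explicit choice can be expressed with a template tag, plus a size-based default. Everything here is an assumption made for illustration: `Impl`, `DirectConv` and `convolveBySize` are hypothetical names, the routine is a plain direct convolution rather than the fast algorithm, and the threshold of 4096 is a placeholder, not a tuned value.

```cpp
#include <cstddef>
#include <vector>

enum class Impl { Seq, Par };

// Hypothetical stand-in for a library routine: direct convolution of signal u
// with stencil s; the output v has u.size() + s.size() - 1 entries.
template<Impl I, typename T>
struct DirectConv {
    static void convolve( const std::vector<T> & s, const std::vector<T> & u,
                          std::vector<T> & v ) {
        if( s.empty() || u.empty() ) { v.clear(); return; }
        const int ns = static_cast<int>( s.size() );
        const int nu = static_cast<int>( u.size() );
        const int n = ns + nu - 1;
        v.assign( static_cast<std::size_t>( n ), T( 0 ) );
        // Each output element is written by exactly one iteration, so the
        // outer loop is split across threads only when Par is requested.
        #pragma omp parallel for if( I == Impl::Par )
        for( int i = 0; i < n; i++ ) {
            T sum = T( 0 );
            for( int k = 0; k < ns; k++ ) {
                const int j = i - k;
                if( j >= 0 && j < nu ) sum += s[ k ] * u[ j ];
            }
            v[ i ] = sum;
        }
    }
};

// Size-based default. A caller running many small convolutions side by side
// can still ask for Impl::Seq explicitly and parallelize at a coarser grain.
template<typename T>
void convolveBySize( const std::vector<T> & s, const std::vector<T> & u,
                     std::vector<T> & v, std::size_t parallelThreshold = 4096 ) {
    if( u.size() >= parallelThreshold ) {
        DirectConv<Impl::Par, T>::convolve( s, u, v );
    } else {
        DirectConv<Impl::Seq, T>::convolve( s, u, v );
    }
}
```

The point of the tag is that both variants share one call signature, so switching between them is a one-token change at the call site.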

Side note… the keynote opening was very personal for me. I learnt to program on the IBM 1130 – and in the last year or so I’ve learned that I have great company, Dick Gabriel and Guy Steele included.

  • 0 Comments
  • Tags: Model.Bricks, parallel computing
8
August

the only planet with chocolate

Posted by Sarah Richards | In: green computing

In Helsinki, surplus heat from hundreds of computer servers located in a data center below Uspenski Cathedral will be used to heat hundreds of homes. Full Article

Near Berlin’s Tegel airport, rising heat from a data center is reported to cause turbulence for passing aircraft.
Full Article

The IT industry already has a carbon footprint matching that of the airlines, and it is not going to get better anytime soon. The same goes for the energy cost of operating and cooling these giant information factories.

In a world where ‘green’ has become king and recycling is more than just the ‘right thing to do’, people are looking under every rock, tree and CPU for ways to ‘save the earth’.

Who would have guessed that IT + Data Centers = high carbon emissions?

Data centers are a necessary evil driving business costs into the stratosphere. The process of real-time information gathering demands energy, causing the current server construct to create an environmental and financial hazard, with seemingly no relief in sight.

Datacenters – emissions in Mt (Mt = millions of metric tons)

US datacenters   170 Mt
Argentina        142 Mt
Netherlands      146 Mt
Malaysia         178 Mt

A focus on green technology is becoming a main objective for the modern business.

Better, more effective use of many-core and GPU computing is one path to reducing operating cost and environmental impact.

Adoption of GPU acceleration is slowly but surely on the increase, with better operating system support beginning with Snow Leopard and Windows 7. There is some technology flux, and some debate about which technology is faster, or easier to program – but the migration to parallel programming is inevitable.

At eigen.systems we are doing our part to deliver efficiency and performance. The eigen.spaces library makes it easier to modify existing applications to effectively use many-core processors. Model.Bricks provides a suite of framework components that allow selection of either many-core CPU or GPU algorithms for most functions, allowing fine control, best performance and the most efficient use of available hardware.

By providing tools that let developers focus on their requirements while leveraging close-to-the-metal parallel programming, eigen.systems is helping to promote green computing, generating cost savings as well as delivering better processing performance.

Save the earth, it’s the only planet with chocolate!

  • 0 Comments
  • Tags: carbon footprint, Eigen.Spaces, green computing, Model.Bricks
19
May

The need for speed: the world of High Frequency Trading

Posted by Sarah Richards | In: HFT

Since the unexpected and significant drop in the DOW on May 6th, articles have permeated the net, speculating about the cause of the event.

Working at eigen.systems, my eyes are always peeled for articles that talk about High Frequency Trading (HFT), especially when the significance of computing in this industry is under discussion.

Overall, regardless of whether people blame or praise the role of computers in today’s age of HFT, one thing is for sure: computers are here to stay. The key question seems to be: how do we make them more accurate, faster and more efficient?

There is a great article in today’s New Zealand Herald, which I think both novices and industry professionals would do well to read. It explains quite simply what HFT is and how the world of trading has changed over the years with the introduction of faster computation, and it offers predictions for the future. It’s a great read, so check it out!

AP photo

  • 0 Comments
  • Tags: high frequency trading, parallel computing
26
April

mind the cache

Posted by Rakesh | In: patterns

In a previous post I described iteration space partitioning as one way of improving cache residency of data. How much of a speedup does it deliver, really?

Matrix multiply is a good vehicle to illustrate the memory wall effect – the plots below show performance with increasing matrix dimension / storage layout combinations, for both the familiar multiplication loop, and a locality optimizing block multiply algorithm.

Being both computation and bandwidth intensive, matrix multiply has performance characteristics similar to many large problems – VaR and large portfolio simulations in particular. Of course, if it is matrix algebra that you need, an auto (ATLAS) or vendor tuned (MKL et al.) library is best.

Locality improvement (both clever arrangement of data in memory and loop twiddling) can and does yield significant speedup.

The measurements use four storage layouts, for matrix dimensions ranging from 32×32 to 2048×2048. The code and stats used are here.

      A-matrix       B-matrix
cc    column major   column major
cr    column major   row major
rc    row major      column major
rr    row major      row major
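For readers less familiar with the two layouts, the difference comes down to how an element (i, j) maps to an offset in the flat buffer. The sketch below is illustrative only; `Layout` and `Matrix` are hypothetical names and not the classes used for the measurements.

```cpp
#include <cstddef>
#include <vector>

enum class Layout { RowMajor, ColMajor };

// Minimal dense matrix with a compile-time storage layout (illustrative only).
template<typename T, Layout L>
class Matrix {
public:
    Matrix( int rows, int cols )
        : _rows( rows ), _cols( cols ),
          _data( static_cast<std::size_t>( rows ) * static_cast<std::size_t>( cols ) ) {}

    int nRows() const { return _rows; }
    int nCols() const { return _cols; }

    // Offset of element (i, j) in the flat buffer.
    std::size_t offset( int i, int j ) const {
        const std::size_t r = static_cast<std::size_t>( i );
        const std::size_t c = static_cast<std::size_t>( j );
        return L == Layout::RowMajor
            ? r * static_cast<std::size_t>( _cols ) + c   // neighbours along a row are adjacent
            : c * static_cast<std::size_t>( _rows ) + r;  // neighbours down a column are adjacent
    }

    T & operator()( int i, int j ) { return _data[ offset( i, j ) ]; }
    const T & operator()( int i, int j ) const { return _data[ offset( i, j ) ]; }

private:
    int _rows, _cols;
    std::vector<T> _data;
};
```

In the inner loop of a matrix product, A is walked along a row and B down a column. With A row-major and B column-major (the rc case) both walks are unit stride; in every other combination at least one operand is read with a stride of a full row or column.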

Two algorithms are compared; the familiar, simple three deep loop:


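// C = A * B, with A of size N x K and B of size K x M;
// X( r, c ) is the element at row r, column c; T is the element type
for( int i = 0; i < N; i++ ) {
    for( int j = 0; j < M; j++ ) {
        T sum = 0;
        for( int k = 0; k < K; k++ ) {
            sum += A( i, k ) * B( k, j );
        }
        C( i, j ) = sum;
    }
}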

and the block multiply algorithm below:


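// the three outer loops walk over blocks; the three inner loops multiply within a block
// C must be zero-initialized; X( r, c ) is the element at row r, column c
for( int k = 0; k < A.nCols(); k += K_BK ) {
    for( int i = 0; i < C.nRows(); i += I_BK ) {
        for( int j = 0; j < C.nCols(); j += J_BK ) {
            for( int i1 = i; i1 < min( i + I_BK, C.nRows() ); i1++ ) {
                for( int j1 = j; j1 < min( j + J_BK, C.nCols() ); j1++ ) {
                    // accumulate this k-block's contribution into C( i1, j1 )
                    T sum = C( i1, j1 );
                    for( int k1 = k; k1 < min( k + K_BK, A.nCols() ); k1++ ) {
                        sum += A( i1, k1 ) * B( k1, j1 );
                    }
                    C( i1, j1 ) = sum;
                }
            }
        }
    }
}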

On Xeon 5460 cores (both single and 8-way parallel), simple multiply hits a brick wall once matrix dimension reaches 2048×2048.

Removing just the extreme (2048×2048) measurement, the behavior is still far from the cubic curve you might expect; speed is very sensitive to memory layout (when one or both matrices are accessed with a large stride, the memory controller’s ability to deliver multiple words per bus cycle is wasted).

With the exception of the case when A is row-major and B is column-major, performance with large matrices is unstable. By contrast, the block algorithm is predictable. The next plot compares the best performing case (rc) of the simple algorithm to block matrix performance – runtimes for the block algorithm get increasingly better as matrix size grows. At 2K x 2K, better locality delivers a 2.25x speedup.

The block algorithm is relatively insensitive to storage layout. On the flip side, block sizes need tuning for best performance on a given machine.
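For anyone who wants to experiment with block sizes, here is a small self-contained version of both algorithms that can be compiled as is. It is a sketch under simplifying assumptions, not the benchmarked code: matrices are square, stored flat in row-major order, and one block size `bk` is used in all three dimensions. The names `Mat`, `at`, `multiplySimple` and `multiplyBlocked` are ours.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Square n x n matrix stored flat, row-major.
using Mat = std::vector<double>;

inline std::size_t at( int i, int j, int n ) {
    return static_cast<std::size_t>( i ) * static_cast<std::size_t>( n )
         + static_cast<std::size_t>( j );
}

// The familiar three-deep loop.
Mat multiplySimple( const Mat & a, const Mat & b, int n ) {
    Mat c( at( n, 0, n ), 0.0 );  // n * n zeros
    for( int i = 0; i < n; i++ ) {
        for( int j = 0; j < n; j++ ) {
            double sum = 0.0;
            for( int k = 0; k < n; k++ ) {
                sum += a[ at( i, k, n ) ] * b[ at( k, j, n ) ];
            }
            c[ at( i, j, n ) ] = sum;
        }
    }
    return c;
}

// Block multiply: work on bk x bk tiles so the operands of the inner loops
// stay cache resident. The min() calls handle n not divisible by bk.
Mat multiplyBlocked( const Mat & a, const Mat & b, int n, int bk ) {
    Mat c( at( n, 0, n ), 0.0 );
    for( int k = 0; k < n; k += bk ) {
        for( int i = 0; i < n; i += bk ) {
            for( int j = 0; j < n; j += bk ) {
                for( int i1 = i; i1 < std::min( i + bk, n ); i1++ ) {
                    for( int j1 = j; j1 < std::min( j + bk, n ); j1++ ) {
                        double sum = c[ at( i1, j1, n ) ];
                        for( int k1 = k; k1 < std::min( k + bk, n ); k1++ ) {
                            sum += a[ at( i1, k1, n ) ] * b[ at( k1, j1, n ) ];
                        }
                        c[ at( i1, j1, n ) ] = sum;
                    }
                }
            }
        }
    }
    return c;
}
```

Timing `multiplyBlocked` over a range of `bk` values on the target machine is the simplest way to find a good block size; the best value depends on cache sizes and element type.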

Finally, the parallel comparisons (the examples use OpenMP), which plot MFLOP rates against matrix size, confirm the normal intuitions. Knowing the inflection points helps in making the best choices – a few loop tweaks and a rearrangement of memory can do wonders for speedup!

  • Below a certain size, it is best to use the simplest algorithm, as the parallel overheads are overwhelming.
  • Where possible, arranging the storage in a cache and memory controller friendly layout pushes the performance envelope of every algorithm, up to a point.
  • Simply parallelizing the simple algorithm is best for large data sets.
  • As simple parallel performance begins to decay, a locality enhancing algorithm will outperform it.
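These rules of thumb can be captured in a small selector. The sketch below is an illustration only: `Strategy`, `Thresholds` and `chooseStrategy` are hypothetical names, and the default inflection points (128 and 1024) are placeholders that would have to be measured on the target machine. The multiply shows OpenMP's `if` clause, which keeps small problems sequential.

```cpp
#include <cstddef>
#include <vector>

enum class Strategy { SimpleSequential, SimpleParallel, BlockedParallel };

// Hypothetical inflection points; placeholders, not measured values.
struct Thresholds {
    int parallelFrom = 128;   // below this, thread overheads dominate
    int blockedFrom  = 1024;  // beyond this, the simple parallel loop starts to decay
};

Strategy chooseStrategy( int n, const Thresholds & t = Thresholds() ) {
    if( n < t.parallelFrom ) return Strategy::SimpleSequential;
    if( n < t.blockedFrom ) return Strategy::SimpleParallel;
    return Strategy::BlockedParallel;
}

// Simple multiply on row-major n x n data. The `if` clause makes the loop run
// on one thread for small n; without OpenMP the pragma is simply ignored.
void multiplySimpleOmp( const std::vector<double> & a, const std::vector<double> & b,
                        std::vector<double> & c, int n, int parallelFrom ) {
    const std::size_t dim = static_cast<std::size_t>( n );
    c.assign( dim * dim, 0.0 );
    #pragma omp parallel for if( n >= parallelFrom )
    for( int i = 0; i < n; i++ ) {
        for( int j = 0; j < n; j++ ) {
            double sum = 0.0;
            for( int k = 0; k < n; k++ ) {
                sum += a[ i * dim + k ] * b[ k * dim + j ];
            }
            c[ i * dim + j ] = sum;
        }
    }
}
```

The rows of C are independent, so parallelizing the outer loop needs no synchronization.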
  • 0 Comments
  • Tags: block algorithms, locality, OpenMP
9
April

ParaPLoP ‘10

Posted by Rakesh | In: announcements

It’s been a week since I got back from the ParaPLoP ‘10 workshop – it was great to meet a group of people doing so much to bring parallel programming into the mainstream, and it was a great learning experience.

Ade Miller presented the Task Graph pattern from Microsoft TPL. As a TBB user, I find strong parallels between them; TPL is elegant, and I am keen to learn more. Arguably, some of it is syntactic sugar, but sugar is sweet! I have resisted Microsoft for many years, but it is time to concede and assimilate.

Just as interesting as the workshops were the side conversations – it was also good to get a sense of the future (and direction of TBB) from its architect, Arch Robison. TBB 3.0 will be out soon, and I look forward to it.

Dick Gabriel’s talk on the works of Christopher Alexander was fascinating. The architect’s work has been an inspiration for the design patterns community.

And Ralph Johnson’s view of programs as transformations struck a chord – it is indeed true that most of us revisit the same applications / algorithms over and over again, so making programs parallel is indeed a transformation. Documenting patterns is about sharing best practices, to help the rest of us with that transformation process.

Very interesting for me personally was hearing from Tim Mattson about the consequences for software developers of what’s cooking in the silicon furnaces at Intel. I have a feeling things are going to get interesting.

Processors cannot be clocked much faster any more – power consumption and heat dissipation have seen to that. But the consequences go beyond clock speed limits. Current generations of processors have deep pipelines and out-of-order instruction scheduling, to hide memory access and other internal latency.

Instructions are not executed in the order they are laid out – if the execution of an instruction is going to be stalled because its operands are not yet available, other instructions are executed instead. Optimized instruction scheduling is done by the hardware at run time.  Quite likely, newer processors will ditch some of this complexity to accommodate more cores.

That leaves a hard optimization job for compilers – some of these optimizations cannot be statically done.

I take two lessons from this:

1. Where possible, use vendor libraries: MKL or NAG for math, for example. Let the vendors deliver processor-optimized versions. I’ve come across a lot of code built on variants of Numerical Recipes routines; maintaining such code keeps application developers on an optimization treadmill.

2. Parallel programming is no longer just for speed junkies – to maintain current performance, parallelism is going to be needed.

What do you think?

  • 0 Comments
9
April

“Clothes make the man,” so the saying goes. Developing UI

Posted by Esterline | In: design

A good user interface makes (or breaks) the software, application, or web site. It’s the difference between the user’s happiness and how far they’ll end up tossing the product out their figurative (or, in some cases, actual) window.

The challenge is to create a user interface that meets the needs of both the business and the user. At times this is easier said than done. Developers and designers have the unenviable task of trying to understand one another well enough that the end result is a usable, workable UI.

A developer may have experience constructing the front-end and back-end of an application but is not versed in design. The reverse is true for designers. Two different languages and thought processes trying to culminate in a cohesive product everyone can love.

To this end, information and communication are key. Understanding your user’s mindset matters just as much.

For instance: in today’s plug-n-play mindset, users are rarely inclined to search the ‘help’ section of their application for answers to a specific question (much to the developer’s dismay). Therefore an easy-to-understand interface, one that teaches your user how to use the application, must be constructed. Integrating helpful hints, tips and instruction into the application as the user works with the product can be a good way to overcome possible obstacles. Helpful hints that the user can disable later are also a good way to go.

Pitfalls can occur when the development team has become caught up in the latest trends, colors, and bells and whistles. This is where the motto – ‘Less is More’ – is a mantra many would do well to tattoo across their computer screen.

Identifying your users will be invaluable. Methods of observation, as well as interviews, can help determine the user’s knowledge of systems and computers in general. This also helps to factor in the user’s background and how this will affect the way they use your product.  What are their jobs?

What tasks does the user frequently conduct and how can your product enhance their workflow? Analyzing these questions takes effort, but it can have a profound effect on your application and is well worth it.

  • 0 Comments
16
March

Eigensystems to present at ParaPLoP 2010!

Posted by Sarah Richards | In: announcements

From March 30-31, our very own Rakesh Joshi, co-founder of eigen.systems, will be presenting at the 2010 ParaPLoP workshop in Carefree, AZ.

ParaPLoP is an interactive conference where pattern authors present their case studies and share expertise in the field of parallel programming patterns. It is a great arena for pattern professionals and enthusiasts alike to analyze previously published patterns, learn about using patterns to develop parallel software, and discover how to mine patterns from significant parallel code implementations.

Rakesh’s paper, CONCURRENT EVALUATION OF A DIRECTED ACYCLIC GRAPH, was chosen for presentation (exact day and time TBD). The paper discusses latent parallelism and the benefit and application of concurrency across a variety of platforms.

So if you are in the AZ area, sign up to attend the ParaPLoP 2010 conference! For more information about ParaPLoP 2010, please visit the official conference site here.

  • 0 Comments
  • Tags: Eigen.Spaces, parallel computing, paraplop
© 2012 eigen.systems