Adam Kraut

High-performance Data Appliances (Netezza)

02 April, 2008 - 2 min read

This afternoon I sat through a presentation from a few guys at Netezza. They were here to discuss their system for high-performance data analytics. What they’ve effectively done is build a large database machine with some special hardware to accelerate database queries via parallel processing nodes. These are some notes I jotted down:

Architecture:

- SMP host
- 100+ specialized processing units per cabinet (they named them SPUs, for “snippet processing units”)
- SPUs have their own PPC CPU, commodity disk, memory, and an FPGA
- GigE networks between SPUs
- SMP host partitions queries and brokers activity to the processing nodes
- Hardware fault-tolerant (SPUs can be hot-swapped)
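To make the host/SPU split a little more concrete, here is a rough scatter-gather sketch in Python. It is only my mental model of the architecture notes above, not Netezza's actual implementation: the function names (`partition`, `run_snippet`, `host_query`), the round-robin data layout, and the use of threads as stand-ins for hardware processing units are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_units):
    """Host side (hypothetical): deal rows round-robin, one slice per unit."""
    return [rows[i::n_units] for i in range(n_units)]


def run_snippet(slice_rows, predicate):
    """Unit side (hypothetical): scan the local slice, filter it, and
    return a partial aggregate as (sum of amounts, matching row count)."""
    matched = [r["amount"] for r in slice_rows if predicate(r)]
    return sum(matched), len(matched)


def host_query(rows, predicate, n_units=4):
    """Host side (hypothetical): send the same snippet to every unit,
    then merge the partial aggregates into one answer."""
    slices = partition(rows, n_units)
    # Threads here only imitate independent processing nodes.
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        partials = list(pool.map(lambda s: run_snippet(s, predicate), slices))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


# Toy table: 100 rows alternating between two regions.
rows = [{"region": "east" if i % 2 else "west", "amount": i} for i in range(100)]
total, count = host_query(rows, lambda r: r["region"] == "east")
print(total, count)
```

The point of the sketch is that each unit only ever touches its own slice of the data and sends back a small partial result, so the host does very little work beyond splitting the query and merging the answers.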

I’ll admit my skepticism tends to mount against any speaker who spends a lot of time at the outset on a marketing pitch when the audience is full of scientists. Do scientists need to be reminded that data sizes are growing? Or that enterprises X, Y, and Z are already using your product? Just show me how it works.

I did a quick search across my feeds to see if anyone has written about Netezza and (not surprisingly) there is a post over at Computing at Scale. It appears there are similar efforts from Teradata, Greenplum, and DATAllegro in this space. I can imagine how a system like Netezza’s might complement more traditional supercomputing. There’s certainly a big effort to commercialize the “new era of HPC”, but the technologies that come out of it are business-driven and not science-driven.