News & Events

When VLSI meets DBMS: The Story behind the World’s First SQL Chip

August 21st, 2008
Posted by Raj Cherabuddi

In April this year, Kickfire announced the first high-performance appliance for MySQL. As part of the announcement, the company released data warehouse benchmark results that broke prior records in terms of price/performance and performance in a non-clustered environment. While the creation of a new appliance built exclusively for MySQL along with the benchmark records was noteworthy, perhaps the bigger story lies in what we believe to be the beginning of a paradigm shift in the database world - one marked by the advent of the first SQL chip.

To give some context to this story I have included a graph below which depicts the evolution of VLSI (Very-Large-Scale Integration) semiconductor technology and its growing impact on a broadening range of industries.

[Figure: VLSI Impact]

Specifically, this diagram shows that as VLSI density (the number of transistors per square millimeter) has increased over time per Moore’s Law, it has been possible to transition an increasing number of applications from a “software/CPU” model to a “custom chip” model. Starting with Digital Signal Processing in the 1970s through to SQL Processing today, there has been a long history of industries that have witnessed this transition and seen a major upheaval in the status quo.

Take graphics processing as an example. Initially, the graphics market was led by companies such as Silicon Graphics with their high-end terminals built on a combination of proprietary software and general-purpose CPUs. This all changed with the arrival of the graphics chip. Designed by companies like ATI and Nvidia, the graphics chip delivered a much higher price/performance ratio, which opened up high-end graphics processing to a much broader audience (e.g. gamers) and transformed the industry. Silicon Graphics, now called SGI, is worth $73M today. Nvidia is worth 100 times more at $7.3B.

The question you might be asking yourself is why these particular applications? What is it about these applications that made them suitable for such a transition? In a word, Dataflow.

The common characteristic underlying these application domains is that they all deal with the need to process large volumes of data at high speed. Now, general-purpose CPUs are based on the von Neumann architecture, which was conceived in the 1940s at a time when data volumes were far smaller than today. This architecture is instruction-centric, or control-flow based: it is good at processing large numbers of instructions quickly but not well suited to processing large data sets, due to the so-called von Neumann bottleneck.

What the pioneers in each of the application domains we mentioned discovered is that a Dataflow architecture is much better suited to solving the problem of high-volume data processing because it eliminates the von Neumann bottleneck. In a dataflow architecture the data, as opposed to instructions, flows directly through the execution engine. There are no wasted clock cycles spent waiting for data to arrive in the registers, as there are in the von Neumann architecture. The difference is significant. As an example, a single SQL chip from Kickfire provides better performance than tens of CPU cores, as demonstrated in the data warehouse benchmark results we published.

In my next post I’ll discuss this topic a little more, explaining why the transition from general-purpose CPU to custom chip is only happening now in the database world and why we believe this will be an irreversible trend.

Why $20 million for Kickfire?

July 31st, 2008
Posted by Karl Van den Bergh

As Matt Asay recently mentioned in his post about Kickfire, the company just closed a Series B for $20 million. In today’s credit-scarce market where VC funding is flat or declining, $20 million is a lot of money, especially for a company whose product is still in beta. What’s more, there seems to be an investment bubble in the broader data warehousing space in which Kickfire participates (at last count, there were over two dozen vendors, the majority of which are relatively new entrants), and that bubble looks like it is starting to burst, as witnessed by Microsoft’s recent acquisition of DATAllegro. So, are the Kickfire investors misguided or is there something more here than just another data warehousing play? If the successful track records of the top-tier firms (Accel, Greylock, and Mayfield) that have invested in Kickfire are anything to go by, a betting man would probably assume the latter.

There are many reasons I could give for why this $20 million bet makes sense but I’ll just give the two most important ones here - ones that set Kickfire apart from every other player in this space.

1) The SQL chip. This is Kickfire’s core technology differentiator. It has become clear from the TPC-H benchmark records that Kickfire announced at launch and subsequently that this technology delivers results. What may not be as apparent is the macro-level implication. We believe that this is the start of a trend that has been seen before in many other industries such as graphics and network routing. Specifically, as VLSI technology has improved, more computing workloads have moved from software running on general-purpose CPUs to custom chips specifically designed for these workloads. The resulting chips have far outperformed their general-purpose counterparts at a fraction of the cost, leading to tectonic shifts in the industry landscape. Such shakeups have played out numerous times before, leading in many cases to the creation of new markets and the birth of industry behemoths (think Nvidia/ATI in graphics and Cisco/Juniper in network routing). VLSI technology has now gotten to the point that SQL-like operations can be run natively in silicon. Much as has happened in other industries, we believe this shift will also happen in the database world. Whether or not Kickfire will ultimately be a player in this transformation, the transformation WILL happen, and Kickfire has started the ball rolling.

2) MySQL. This is Kickfire’s second not-so-secret “secret weapon.” The two dozen data warehousing vendors I mentioned can be broadly grouped into two buckets. First, the traditional database vendors (Oracle, Microsoft, IBM). Second, the pure-play data warehousing vendors. Up until now, customers have had to choose between these two imperfect options: a) incumbents, which deliver the benefits of being standard (e.g. broad third-party tool and app support) but fall down from a performance perspective, and b) pure-plays, which deliver the performance benefit but fail the “standard” metric as their databases are proprietary. With Kickfire this changes. The Kickfire appliance looks and feels just like MySQL running on a Linux server, except it is 10-1000X faster for data warehousing. It therefore delivers the benefit of running a standard database while delivering performance at the same time. And just in case there is any dispute that MySQL is now a standard, here are some stats. MySQL now has 11 million active installations, growing at an estimated 30% a year. According to Gartner, MySQL is now the third most deployed database on the planet, ahead of DB2. According to IDC, MySQL is now the third most used database for data warehousing.

On a final note, some skeptics might still say that targeting an open source market is no guarantee of success. Very few companies, with a couple of notable exceptions, have made it big thus far. After all, isn’t one of the most appealing aspects of open source its low cost (read “free” for the majority of users)? All true if you think about this from a software perspective. But from a hardware perspective, the open source world looks very different. Today, billions of dollars are spent on the servers running MySQL. Kickfire is a systems company. We build appliances and we’re targeting those billions of dollars.

600X MySQL Performance Improvement with Kickfire

July 17th, 2008
Posted by Karl Van den Bergh

As promised, in this post I will update on the performance improvements another Kickfire beta customer is seeing relative to its query response times.

The customer in question is a successful mid-sized company in the network management space. As part of their network management offering, they provide network monitoring and analysis capabilities. They are currently using MySQL as their backend database. The trouble they are having is that they can’t scale beyond about 50GB of data without impacting their monitoring and analysis performance. What this translates to is that their customers can’t use their solution to monitor more than 30 days’ worth of network traffic. While this is OK for some, others are clamoring for the ability to track and analyze up to three years of traffic and are willing to pay significantly more to do so. Today, if they try to accommodate these customers, the queries end up taking hours to run, which is unacceptably long.

To test the Kickfire appliance, the customer ran their 12 hardest queries on about half a terabyte of data. The customer schema has 125 tables and about half a billion rows in the fact table. As I received a request for more detail in my last post, I’ve pasted below one of the queries (obfuscated for privacy) that the customer is trying to run as an example of what they are trying to do.

SELECT (CEILING(TIME_END/900)*900) + -21600 AS TIMEBIN,  -- 15-minute bins, shifted by 6 hours

-- Averages per bin; IFNULL supplies 0.0 when the ratio is NULL
IFNULL(SUM(RTT_SUM)/SUM(RTT_COUNT), 0.0) AS RTT,
IFNULL(SUM(RETRANS_SUM)/SUM(RETRANS_COUNT), 0.0) AS RETRANS,
IFNULL(SUM(APP_SUM)/SUM(APP_COUNT), 0.0) AS DTT,
IFNULL(SUM(SERVER_SUM)/SUM(SERVER_COUNT), 0.0) AS SRT,
-- Sample counts per bin
IFNULL(SUM(RTT_COUNT),0) AS RTTCOUNT,
IFNULL(SUM(RETRANS_COUNT),0) AS RETRANSCOUNT,
IFNULL(SUM(APP_COUNT),0) AS DTTCOUNT,
IFNULL(SUM(SERVER_COUNT),0) AS SRTCOUNT,
IFNULL(SUM(RTT_SUM+RETRANS_SUM)/SUM(RTT_COUNT), 0.0) AS EFFECTIVERTT

FROM RUNS1

WHERE (TIME_END > 1197472400 - (86400*30) AND TIME_END <= 1197472400)
AND MAINTENANCE = 0

GROUP BY TIMEBIN
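For readers who want to check the arithmetic in the query above, here is a minimal Python sketch of its three building blocks: the 15-minute time bin, the 30-day window in the WHERE clause, and the IFNULL-guarded ratio. The function and constant names are hypothetical and are not part of the customer's application; the sketch also assumes MySQL's default behavior of returning NULL on division by zero, which is what the IFNULL calls appear to guard against.

```python
# Illustrative mirror of the query's arithmetic; every name here is hypothetical.
BIN_SECONDS = 900            # CEILING(TIME_END/900)*900 gives 15-minute bins
OFFSET_SECONDS = -21600      # the "+ -21600" term, i.e. minus 6 hours
WINDOW_END = 1197472400      # upper bound in the WHERE clause
WINDOW_SECONDS = 86400 * 30  # 30 days of traffic


def time_bin(time_end: int) -> int:
    """Round time_end up to the next 15-minute boundary, then apply the offset."""
    bins = -(-time_end // BIN_SECONDS)  # integer ceiling division
    return bins * BIN_SECONDS + OFFSET_SECONDS


def in_window(time_end: int) -> bool:
    """Same range as the WHERE clause: later than (end - 30 days), up to end."""
    return WINDOW_END - WINDOW_SECONDS < time_end <= WINDOW_END


def safe_ratio(total: float, count: float) -> float:
    """Mimic IFNULL(SUM(x)/SUM(n), 0.0): a zero denominator yields 0.0."""
    return total / count if count else 0.0


print(time_bin(1197472400))
print(in_window(1197472400 - WINDOW_SECONDS))
print(safe_ratio(5, 0))
```

Every row whose TIME_END falls in the same 15-minute interval maps to the same TIMEBIN value, so the GROUP BY produces one output row per interval over the 30-day window.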

Prior to using Kickfire, the customer had taken these 12 queries and set up a lab environment to try and achieve higher performance. By re-architecting the application, leveraging things like partitioning, they were able to achieve a 60X improvement. The problem with this approach is that this re-architecture would have introduced significant development and testing efforts and forced their customers to make a major and costly upgrade to their installations.

The customer then did the query performance comparison with Kickfire by simply moving the data and schema as is to the Kickfire appliance. The appliance in question was the Kickfire 2300 which comes with 64GB memory.

Without any re-architecture, the customer was able to achieve an average 600X improvement out of the box. What this means is the customer can now support its larger customers, and generate more revenue for the company, without any system re-architecture or associated cost.

A New Hardware-Based Approach to Data Warehousing

June 27th, 2008
Posted by Ravi Krishnamurthy

My name is Ravi Krishnamurthy - I am the Chief Software Architect here at Kickfire. I’ll be blogging about our thoughts on database technologies for data warehousing. More specifically, I’ll be talking about current challenges, directions going forward, simplifications for wider market deployment, and other ideas.

Data Warehouse (DW) queries are known to be more complex, more demanding, and longer running than OLTP queries. Some of the distinctive features of these DW queries that produce these characteristics are:

1) Table scan: Most OLTP queries are point queries that update or insert a few rows of transactional data. Most DW queries, on the other hand, are reporting or business intelligence (BI) queries which typically touch large numbers of rows, often via sequential table scans over large data sets.

2) Many/complex joins: Multiple tables with many joins in a query pose a number of challenges. For example, any sizeable table (except the first/outermost) being joined using a table scan would cause performance to degrade significantly in most cases. Obviously, if all joins are foreign-key (FK) to primary-key (PK) joins and can use the index on the PK then there’s no problem. However, for many reasons (e.g., use of functions, PK-to-FK joins, etc.) using indexed-join methods may not be possible, thereby making joins very expensive.

3) Lots of GROUP BY aggregations: Typical reporting and BI queries leverage many grouped aggregations with multiple GROUP BY keys over a large number of rows, which can be very slow. Use of the DISTINCT clause compounds this problem further. Clearly, using indexes on GROUP BY keys helps improve the performance of this type of query. However, the presence of multiple GROUP BY keys or the use of indexing in the join operation may preclude the use of indexing for the grouping operation, which impacts performance.

4) ORDER BY limit: A request for the top ten rows, especially based on a GROUP BY aggregated value, is typically done by sorting all the computed data and then returning the top ten rows. If this grouping (for aggregated values) were done say by department of even a large enterprise you end up with a sort over thousands of rows which is not that bad. However, if you try to group by say visitor ID from a clickstream data set or by product ID from a point-of-sale data set to get the top ten rows, you end up with a sort over potentially millions of rows.

5) Complex filters: LIKE predicates (over a large number of rows) and string functions, combined into complex filters involving AND/OR conditions, as well as CASE expressions, are typical in DW queries. These tend to create execution flows that significantly reduce the ability to use indexes and also incur computational overhead.

6) Correlated sub-queries: Creating queries that use other queries as building blocks is a common practice in BI. If correlated variables are used in these sub-queries then it becomes difficult to process those queries efficiently which becomes a challenge for the optimizer. Any failure to correctly optimize these types of queries can quickly degrade performance.
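To make item 4 concrete, here is a small Python sketch contrasting the two ways of getting the top rows after a grouped aggregation: sorting every group, versus keeping only a bounded heap of the best candidates. It illustrates the general cost argument only; it is not a description of how MySQL or the Kickfire appliance executes ORDER BY ... LIMIT, and the visitor IDs and function names are made up for the example.

```python
import heapq
from collections import Counter

# Hypothetical clickstream: one visitor ID per page view.
clicks = ["v1", "v2", "v1", "v3", "v1", "v2", "v4"]

# GROUP BY visitor_id with COUNT(*): one entry per distinct visitor.
views_per_visitor = Counter(clicks)


def top_n_by_full_sort(counts, n):
    """Sort every group, then keep the first n (work grows with all groups)."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


def top_n_by_heap(counts, n):
    """Keep only the n best groups seen so far in a small heap."""
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])


print(top_n_by_full_sort(views_per_visitor, 2))
print(top_n_by_heap(views_per_visitor, 2))
```

With thousands of groups (departments) the difference hardly matters; with millions of groups (visitor IDs, product IDs) the full sort does far more work than the ten requested rows justify.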

The above are examples of the types of problems that database administrators (DBAs) are facing today when trying to deliver performance for reporting and data warehousing applications. To get around these problems, DBAs currently spend a lot of time and effort tuning the system, rewriting queries, configuring the I/O subsystem, increasing memory/disk resources and so on.

What if a hardware-based approach could mitigate these performance issues?

In the next few blog posts I will dig into this question in more detail. I’ll talk about some of the issues mentioned above, discuss how DBAs try to work around them today, the pitfalls inherent in these workarounds, and how a hardware-based approach could significantly simplify things.

Kickfire: Early MySQL customer success

June 12th, 2008
Posted by Karl Van den Bergh

I’m happy to say that the market response to our launch continues to be positive. So far we have had nearly 30 postings on the leading blogs in the MySQL world as well as close to 20 articles published in traditional media. Our press releases were picked up and published on over 30 sites. We had about 400 people stop by our booth at the MySQL conference and we continue to get a significant number of prospective customers and partners contacting us every week who want to know more about the company and our product.

Though the response has been very enthusiastic there has also been some healthy skepticism about how well the product would perform in real customer environments. In this post I’d like to briefly describe the results we are seeing at one of our beta customers.

The customer in question is a publicly traded company that manages the online forums for large media and web businesses. They use MySQL extensively today to store the forum data and have racks of servers doing this. They have also used MySQL to build their data warehouse. They use the data warehouse to provide their clients with reports on forum traffic, user activity, hot topics etc. For example, one of their customers, a leading cable channel, uses the data from the reports to understand how viewers are reacting to new shows being introduced.

The data set currently for one of their clients has 50 million rows or about 45GB of data. The data set contains clickstream data with all the associated dimension tables. In order to validate the performance of the Kickfire Database Appliance, the customer gave us six of their poorest performing queries to run. Even though these queries had already been highly optimized, they still took too long to run - in one case taking over an hour. These queries run slowly because they contain multiple GROUP BY aggregations with the DISTINCT clause, whose non-linear processing causes the performance degradation. The fact that this would be a performance challenge for MySQL will not come as a surprise to MySQL DBAs, but the fact that the impact is significant even on a relatively small data set makes the observation notable.

After loading the data into the Kickfire Database Appliance the customer saw an average 35X speedup in the performance of the previously tuned queries. The query that had taken over an hour to run now runs in a fraction of a minute, yielding a 150X speedup. A positive outcome of this speedup is that the customer no longer has to resort to using and maintaining custom stored procedures to make sure the queries are processed in an acceptable time window.

Given these significant performance improvements, the customer has conceived of a new revenue-generating service that the Kickfire appliance will allow them to implement. The service consists of enabling the company’s community managers to create ad hoc queries for their clients on a fee basis. Clients have been asking for additional information beyond the canned reports they have been getting but it has been too difficult, too time consuming and ultimately too expensive to provide custom data access for all but the very largest clients. Now, with Kickfire, the customer plans to give its community managers the ability to create these ad hoc queries directly themselves. By combining a user-friendly BI tool with the raw power of the Kickfire appliance, the customer believes even its non-technical community managers will be able to quickly generate these ad hoc queries and consequently create a new revenue stream for the company.

In my next post I’ll write about a customer in the network monitoring space.

MySQL and Kickfire Break Records (Again)

May 20th, 2008
Posted by Karl Van den Bergh

Following on from the announcement at the MySQL conference where Sun and Kickfire jointly announced data warehousing benchmark records, we have just announced new TPC-H benchmark records. Specifically, the Kickfire Database Appliance 2400 is the highest price/performance offering at 300GB, again breaking the $1 barrier for the first time, coming in at 89 cents per QphH (Queries per hour on the TPC-H benchmark). The 2400 is also the highest performance (non-clustered) offering at 300GB.

I’m not going to dwell further on the numbers in this post, other than to point out another aspect of this achievement that Justin noted in his blog: the energy savings the Kickfire appliance delivers in addition to performance and price/performance. What I want to address is why we decided to do these benchmarks in the first place and what we believe their relevance to be, because as we continue publishing them we occasionally get questioned about their importance (or lack thereof). Here’s my take.

First of all, they’re benchmarks and so, by definition, are limited. Let’s get this one out of the way. No benchmark, no matter how thorough, is going to cover every possible real-world scenario. But just because benchmarks have limitations is not a reason to discard them, particularly if they are thoughtfully conceived and rigorously applied as is the case for the TPC-H benchmarks.

Some vendors have been pushing the idea that only POCs count. Not surprisingly these are the vendors who haven’t published their results (and the reason for this should be obvious). Whereas I would agree that POCs are clearly a critical part of an evaluation process, much as test driving is when buying a car, it doesn’t mean you should discard the objective comparison of pertinent metrics. Going back to our car buying analogy, it would be like throwing away the information sheet you find on the car’s passenger window at the dealership. It would seem to me that prospective buyers would want to know about things such as the MPG rating or horsepower and how these compare to other cars they are considering.

To speak more specifically about the TPC-H benchmarks, it is not immediately obvious (unless you have been through the process) how extensive and rigorous they are. First, the 22 queries test a broad spectrum of SQL complexity spanning everything from simple reporting-type queries to deep analytic-type queries with multi-table joins, correlated sub-queries and the like. Second, the system performance is measured on a single query stream (the Power Run) but also on concurrent queries (the Throughput Run). Third, the load performance (important for data warehousing) is measured. Finally, full ACID compliance is tested for. You can check out the full details of the benchmark specifications here.

The benchmark specification also places extensive restrictions on what is allowed to prepare the test system. As an example, anything that would circumvent a true test of the system’s performance such as pre-built aggregates is disallowed.

The audit is also a rigorous process which is carried out by independent, TPC-sanctioned auditors who certify the benchmark and must sign off on the disclosure report in order for the results to be approved and published.

The point I’m making here is that these are not your homegrown benchmarks too often seen in vendors’ marketing material. The fact that there is an independent body, the Transaction Processing Performance Council, which has been around for 20 years and whose sole purpose is to define and monitor these benchmarks, should be a clear indicator that these benchmarks mean business.

Finally, and to make the point that these are serious, I have to be diligent here in how I talk about our performance numbers as there is a fair use clause when speaking about results that must be carefully adhered to in order to avoid penalties. To that end and to wrap up this post, here are some additional details I must disclose in order to be fully in compliance:

The Kickfire Database Appliance Series 2400 delivers a performance of 54,895 QphH@300GB (Queries per hour on the TPC-H benchmark) on the 300GB TPC-H benchmark. The Kickfire Database Appliance has a price/performance of $0.89/QphH@300GB USD on the 300GB benchmark. Kickfire delivers this performance with a 3 year total system cost of $48,790 USD. The Kickfire Database Appliance is in Beta and will be available October 14, 2008.
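As a quick check on how the figures in the disclosure fit together, the sketch below divides the quoted 3 year total system cost by the quoted QphH@300GB result, which is how the TPC-H price/performance metric is formed. The function and variable names are illustrative only.

```python
def price_per_qphh(total_system_cost_usd: float, qphh: float) -> float:
    """TPC-H price/performance: total system cost divided by QphH at that scale."""
    return total_system_cost_usd / qphh


# Figures quoted in the disclosure above.
cost_usd = 48_790
qphh_300gb = 54_895

ratio = price_per_qphh(cost_usd, qphh_300gb)
print(f"${ratio:.2f}/QphH@300GB")
```

The unrounded value is a little under 89 cents, which rounds to the $0.89/QphH@300GB quoted above.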

TPC-H, QphH and $/QphH are trademarks of the TPC. For additional information on the TPC-H benchmark, please visit the Transaction Processing Performance Council’s Web site at http://www.tpc.org/.

Wrapping up our Launch at the MySQL Conference

April 23rd, 2008
Posted by Karl Van den Bergh

My name is Karl Van den Bergh — I do Business Development here at Kickfire. I’ll be joining Raj on our corporate blog adding my comments to what is happening at our company and in our marketplace.

What a great conference (my first) and what a great venue it was to have launched our company and beta product.

Now that I have switched from the dark side of commercial software to the open source world, my eyes have been opened to the power of the community. Specifically, the success of our launch can, to a large degree, be attributed to the community.

Over the last couple of weeks I have heard comments in the blogosphere to the effect that Kickfire has a great marketing machine. One blog noted that Kickfire had “brought Web 2.0 Marketing to the Database World.” Whereas our marketing team will certainly take pride in these comments (and should for all the hard work that went into this launch), the reality is that our interaction with the community over the last year was driven by our keen desire to build a product that satisfied the needs of users trying to leverage MySQL for data warehousing, querying, and reporting. Through our interactions with many of the bright minds in the MySQL world we were able to tune our roadmap and add or remove features in order to deliver a product that would be the most relevant to users now.

During the course of these conversations, our technology generated a certain degree of interest. That interest led to a flurry of posts (over twenty in the last two weeks) and articles (about 10) which helped drive traffic to our booth and conference sessions (we had nearly 400 visitors who came to the booth — of course the Wii’s and Jawbones we were giving away didn’t hurt either) — all of which helped put Kickfire on the map in a very short time. Two weeks ago you wouldn’t have found us if you did a Google search (of course, the fact that we hadn’t launched our site didn’t help) whereas now 7 of the first 10 Google hits are related to our company and product.

The point is that working with a community that has a real thirst for technology and innovation and that is so connected, as is the MySQL community, brought us some unexpected (and very welcome) benefits. We realize that with this engagement also comes a responsibility. Namely, there is an expectation of transparency and openness from vendors who enter into this world as these are values which are fundamental to the Open Source movement. In the last year we have tried to guide our work and our interactions based on these principles and will continue to try to do so as we move forward from here. Needless to say, if we don’t, the community will be the first to let us know.

To wrap up, this has been one heck of a launch. We are very excited to be in the MySQL world. We are passionate about our technology and we believe there is a phenomenal opportunity ahead. Thanks again to all our friends in the community. We look forward to continuing the work together.  

Kickfire Launch

April 14th, 2008
Posted by Raj Cherabuddi

Today, we officially launched Kickfire. As part of our announcement we published, together with Sun Microsystems, record-breaking TPC-H benchmark numbers (data warehousing industry benchmarks) as well as a series of significant partnerships in the Open Source world.

There has been a lot of work here over the last two years to get us to this point and I am very proud of the team for getting us to where we are today. Two years ago we just had a vision; today that vision became reality – one substantiated by independent industry benchmarks.

For those of you unfamiliar with these benchmarks let me give you a brief overview to explain why we are so excited about the results.

TPC-H is a broad data warehousing industry benchmark created by the Transaction Processing Performance Council. This benchmark includes a schema, data types, and queries that represent broad reference workloads within the data warehousing world. For example, the benchmark includes ad hoc queries and concurrent data modifications. The benchmarks measure response times on single stream queries as well as query throughput with concurrent users. In order to pass the benchmark the database engine needs to be fully ACID compliant and must go through a stringent audit process.

So, what were our results? At the 100GB data size, our first benchmark, we made MySQL #1 in price/performance and #1 in performance in the non-clustered (or single node) category. Essentially, what this means is that for a given budget, Kickfire delivers the highest performance of any data warehousing solution on the market today. And our starting price points are such that they are within the budget range of the mass market. For example, the cost of the second-best vendor in the price/performance category is $261,623. Kickfire’s offering in this category is $34,435, an order of magnitude less.

Along with the highest performance per dollar, we have demonstrated the highest performance per RU (rack unit = 1.75 inches) and the highest performance per Watt. Also, the total storage relative to user data size is 5.44, which is extremely low compared to other high performance systems. For example, for Microsoft’s highest performance offering the ratio is 64.8, i.e. Microsoft needs storage nearly 65 times the size of the user data in order to get performance.
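For readers who like to see the comparisons worked through, here is a short Python sketch using only the numbers quoted in the two paragraphs above. The variable and function names are illustrative, and the 100GB figure is simply the benchmark's data size applied to the quoted storage ratios.

```python
# System costs quoted above for the 100GB price/performance category.
second_best_cost_usd = 261_623
kickfire_cost_usd = 34_435

# Total storage relative to user data size, as quoted above.
kickfire_storage_ratio = 5.44
microsoft_storage_ratio = 64.8


def storage_needed_gb(user_data_gb: float, storage_ratio: float) -> float:
    """Total storage implied by a storage-to-user-data ratio."""
    return user_data_gb * storage_ratio


cost_factor = second_best_cost_usd / kickfire_cost_usd
storage_factor = microsoft_storage_ratio / kickfire_storage_ratio

print(f"cost factor: {cost_factor:.1f}x")
print(f"storage factor: {storage_factor:.1f}x")
print(f"storage for 100GB of user data: "
      f"{storage_needed_gb(100, kickfire_storage_ratio):.0f}GB vs "
      f"{storage_needed_gb(100, microsoft_storage_ratio):.0f}GB")
```

The cost gap works out to roughly 7.6 times, which the post rounds to an order of magnitude, and the storage gap to roughly 12 times.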

So, what is the net net? As some of you may hear in my keynote at the MySQL conference on Wednesday this week where we are launching the company, we are all about delivering data warehousing for MySQL that is faster, cheaper, greener than any other solution on the market today. Because our appliances consume significantly less hardware, storage and power than competing offerings, their operating costs are an order of magnitude less as is their impact on the environment.

What we hope to bring to the MySQL world (those using MySQL today or who would like to use it), is an appliance that delivers high performance out of the box (i.e. without the need for tedious tuning) for reporting, ad hoc queries, and data warehousing and is affordable to the broadest range of organizations possible.

For those of you who are interested in learning more and are able to come to the MySQL conference, I look forward to seeing you there. Those who cannot come can of course find resources such as a white paper on our site or refer to some of the excellent posts by our friends in the community (for example Baron Schwartz here, Keith Murphy here).

Welcome to Kickfire!

April 9th, 2008
Posted by Raj Cherabuddi

I’d like to welcome you all to Kickfire. My name is Raj Cherabuddi. I am the co-founder and CEO. Joe Chamdani, my co-founder, and I founded Kickfire back in 2006. Since then we, along with our amazing team, have been working extremely hard to bring a revolutionary new technology to market which we believe will change the way people think of data warehousing.