Archive

Archive for the ‘Applications’ Category

Supercomputing 2010 – Fast Image Compression Demo

December 7th, 2010 No comments

Check out our demo from Supercomputing 2010. We compare our fast qsvd function vs. Matlab’s svd for image compression.

Speedups range from 46x to 3,000x on high-res images as large as 117 megapixels!

http://massiveanalytics.com/blog/wp-content/uploads/2010/11/qla_sc10_demo.mp4

Weapons-Grade SVD: Fast & Scalable

November 3rd, 2010 No comments

In this post we’ll show how QLA can accelerate your SVD computations by orders of magnitude.

QLA provides the qsvd function, which computes a fast SVD with very small error tolerance. You can use it wherever you would use Matlab’s built-in svd.

To benchmark performance we measure runtimes on 30 real-world data matrices ranging in size from 43K to 154M entries. See the data table for descriptions and download links.

We compare qsvd against Matlab’s svd with the ‘econ’ option (‘econ’ skips the extra computation of a null space basis). Timings are obtained as follows:


>> load example.mat
>> tic; [u,s,v] = qsvd(mat); toc
Elapsed time is 0.1234 seconds.
>> tic; [u,s,v] = svd(mat, 'econ'); toc
Elapsed time is 6.4312 seconds.

This comparison is conveniently wrapped in compare_qsvd.m, which you can use like this:


>> compare_qsvd(mat)
% compare_qsvd(mat, epsilon) if using a tolerance other than the default 0.01

We ran the comparison at several different qsvd error tolerances (.01, .025, .05) to capture a range of speedups that might be seen in different applications. Results are summarized in this plot:

These results speak for themselves. We see that qsvd accelerates most cases by 10x – 1,000x (note the log scale). A few cases are accelerated only at higher error tolerances, but the majority have significant speedup at all levels.

Why does this matter to you? The SVD is a slow bottleneck inside many applications – optimizations, model fitting, image and signal processing, search engines, data analytics, etc. Drop in QLA’s qsvd, and suddenly you can run 20x faster or handle 100x larger problems. That’s nuclear impact. Why not grab the free version and try it?


Matrices used in qsvd experiments, in descending order of speedup
dataset size
galaxy 40,000 x 3,839
mars 1,783 x 2,000
declaration 4,656 x 3,923
bcsstk24 3,562 x 3,562
frey 560 x 1,965
madelon 4,400 x 500
dor 800 x 6,063
sonar_scatter 208 x 208
secom 1,567 x 590
pumsb 49,046 x 74
connect 67,577 x 43
regression 124,202 x 37
pumsb_star 49,046 x 63
PIE_32x32 11,554 x 1,025
umist 575 x 10,304
ozone 2,536 x 73
timit 138,839 x 40
olivetti 4,096 x 400
moon 256 x 384
arrhythmia 452 x 280
arcene 800 x 10,000
ailerons 13,750 x 41
gisette 12,500 x 5,000
tic 9,822 x 86
elevators 16,599 x 19
corel 61,634 x 16
isolet 7,797 x 618
usps 11,000 x 256
poker 1,025,010 x 11
mnist_reg500 70,000 x 784
Categories: Applications Tags: , , , ,

Linear Systems: Fast & Robust

September 14th, 2010 No comments

Our next expanded demo focuses on qlsqr, our least-squares linear system solver. We compare against Matlab’s built-in solver, the “\” operator. Matlab’s solver is reasonably fast, but is easily thrown off by ill-conditioned matrices. We’ll show that qlsqr can run even faster while being much more robust to ill conditioning.

splitting a matrix into 80% solve and 20% test sets

Converting a data matrix into a linear system with solve/test sections.

Each test matrix consists of real data – sensor measurements, survey data, etc. We convert each matrix into a linear system by treating the last column as b in Ax = b, then solve for x. You can think of it as a regression of the last column against the other columns; the point is to set up a least-squares problem with real data.

We ran speed and quality comparisons using the code in compare_qlsqr.m. It splits each matrix using the function split_data (included), which a) randomly splits rows into 80%/20% sets, and b) separates out the last column of each row-set for use as a target. We then time the solvers on the 80% set, and evaluate their accuracy (RMSE) on the 20% test set.

Each of the data matrices is listed with a download link, its size, and the error tolerance at which we ran qlsqr in the dataset table. You can replicate our results by running:


>> load example.mat
>> compare_qlsqr(mat)
% compare_qlsqr(mat, epsilon) if using a tolerance other than the default 0.01

First the speedup results. We chose the error tolerances to maximize speed without losing robustness. This plot shows the distribution of speedups over Matlab:

qlsqr speedup over Matlab

In almost every case qlsqr is at least 2x faster than Matlab, and in 50% of cases it’s faster by 10x or more. So the speedup is substantial. What about solution quality?

We measure solution quality and robustness by RMSE performance on the test set. While the quality of the QLA solution relative to the input matrix is guaranteed (see the user doc for details), more often we are concerned about performance on future data. The test-set RMSE simulates this, and is a key robustness indicator showing how well solutions will perform on novel inputs, whether for a control system, a prediction problem, or some other setting.

The following plot shows the ratio of Matlab vs. qlsqr test-set RMSE for each matrix – values greater than 1 indicate qlsqr has superior (lower) test-set RMSE:

qlsqr robustness vs. Matlab

The results speak for themselves, but it’s worth pointing out by what a large margin QLA outperforms Matlab on ill conditioned matrices – ~10-90x in the most extreme 20% of cases.

QLA’s tolerance for a small amount of error allows it to ignore process noise that Matlab’s solver tries to fit. The gain in speed and robustness is a big win in many applications, though if you absolutely must fit to the last decimal place it may not be what you need (keep in mind the error tolerance is adjustable).

We really encourage folks to try this solver – not only is it extremely fast, its solutions tend to perform extremely well on new inputs, and this can make a big difference in applications. That’s some low hanging fruit.


Datasets used in qlsqr experiments, in descending order of speedup
dataset size epsilon
galaxy 40,000 x 3,839 .01
declaration 4,656 x 3,923 .025
mars 1,783 x 2,000 .1
arcene 800 x 10,000 .25
umist 575 x 10,304 .25
madelon 4,400 x 500 .01
gisette 12,500 x 5,000 .45
bcsstk24 3,562 x 3,562 .25
dor 800 x 6,063 .01
secom 1,567 x 590 .25
pumsb_star 49,046 x 63 .1
sonar_scatter 208 x 208 .01
frey 560 x 1,965 .01
regression 124,202 x 37 .01
moon 256 x 384 .01
arrhythmia 452 x 280 .25
tic 9,822 x 86 .25
ozone 2,536 x 73 .01
PIE_32x32 11,554 x 1,025 .01
olivetti 4,096 x 400 .01
timit 138,839 x 40 .01
pumsb 49,046 x 74 .0000001
poker 1,025,010 x 11 .01
ailerons 13,750 x 41 .0005
isolet 7,797 x 618 .05
usps 11,000 x 256 .1
connect 67,577 x 43 0.00001
elevators 16,599 x 19 .01
corel 61,634 x 16 .01

Fast PCA on Large Datasets

August 30th, 2010 No comments

We’ll kick things off with a series of posts showing expanded versions of the demo applications included with QLA. In this first post we’ll cover qpca, our fast PCA function.

PCA is a standard data analysis technique that computes “important” directions (principal components) that capture most of the data variance. Decorrelation, dimension reduction, and a variety of other useful operations can be accomplished with the principal components, making it a standard part of any data analysis toolkit.

One drawback, however, is that PCA is expensive on large matrices: order [latex]O(mn^2)[/latex], with [latex]n[/latex] being the smaller of the two matrix dimensions. With QLA we can do much better: qpca computes an accurate PCA approximation in a fraction of the time.

To demonstrate, we’ll compare the speed and quality of qpca vs. an efficient Matlab PCA for a variety of real-world datasets. Dataset download links are provided in the results table below. You can replicate our results using the following trio of Matlab scripts (all included in the QLA package): pca.m, compare_qpca.m, and suptitle.m

First the speed comparisons. The comparison scripts is invoked like this:

>> load mars.mat
>> compare_qpca(mat)
 

Here are the results for our suite of datasets:

dataset size PCA qpca speedup
galaxy 40,000 x 3,840 33.90 min 3.99 sec 510.4
mars 1,783 x 2,000 96.55 sec .58 sec 166.0
dor 800 x 6,063 6.81 sec .25 sec 27.6
declaration 4,656 x 3,923 17.97 min 48.47 sec 22.3
pie_32 11,554 x 1,024 27.6 sec 1.34 sec 20.6
phoneme 138,839 x 40 1.01 sec .13 sec 7.6
arrhythmia 452 x 280 .11 sec .018 sec 6.1
corel 61,634 x 16 .096 sec .040 sec 2.4

Not bad, especially for the larger matrices. Now let’s examine result quality. QLA obtains speedup by making fast approximations with tight quality control. You can find mathematical guarantees in the user doc, but for PCA one easy check is to project the matrix rows onto pairs of principal components generated by qpca. If our approximation is high quality as guaranteed, these should closely match the corresponding projections from the Matlab PCA.

The compare_qpca script generates these plots automatically. Here are some of the more interesting-looking results:

mars

dor

declaration

As you can see, they match up quite well.

Now consider this: what possibilities are enabled by a 20x or 500x speedup? Realtime analytics on an avalanche of streaming data, fast PCA on massive simulation output, decorrelation and reduction of extreme high-dimensional data…

All on a commodity workstation. That’s supercomputing speed on any computer.

Categories: Applications Tags: , , , ,

Massive Analytics Blog is Digg proof thanks to caching by WP Super Cache