<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Massive Analytics Blog</title>
	<atom:link href="http://massiveanalytics.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://massiveanalytics.com/blog</link>
	<description>Supercomputing speed on any computer. QLA for MATLAB and more.</description>
	<lastBuildDate>Sun, 14 Aug 2011 05:18:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>Supercomputing 2010 &#8211; Fast Image Compression Demo</title>
		<link>http://massiveanalytics.com/blog/2010/12/supercomputing-2010-fast-image-compression-demo/</link>
		<comments>http://massiveanalytics.com/blog/2010/12/supercomputing-2010-fast-image-compression-demo/#comments</comments>
		<pubDate>Tue, 07 Dec 2010 07:12:36 +0000</pubDate>
		<dc:creator>mph</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[fast]]></category>
		<category><![CDATA[image compression]]></category>
		<category><![CDATA[qsvd]]></category>
		<category><![CDATA[speedup]]></category>
		<category><![CDATA[SVD]]></category>

		<guid isPermaLink="false">http://massiveanalytics.com/blog/?p=363</guid>
		<description><![CDATA[Check out our demo from Supercomputing 2010. We compare our fast qsvd function vs. Matlab&#8217;s svd for image compression. Speedups range from 46x to 3,000x on high-res images as large as 117 megapixels!]]></description>
			<content:encoded><![CDATA[<p>Check out our demo from <a href="http://sc10.supercomputing.org/">Supercomputing 2010</a>. We compare our fast <strong><code>qsvd</code></strong> function vs. Matlab&#8217;s <strong><code>svd</code></strong> for image compression.</p>
<p>Speedups range from <strong>46x to 3,000x</strong> on high-res images as large as 117 megapixels!</p>
<p><a href="http://massiveanalytics.com/blog/2010/12/supercomputing-2010-fast-image-compression-demo/"><em>Click here to view the embedded video.</em></a></p>
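<p>For readers who want to see the idea behind the demo, here is a minimal sketch of SVD-based image compression using only Matlab&#8217;s built-in <code>svd</code>. It is an illustration, not the demo code: the synthetic <code>img</code> matrix, the rank <code>k</code>, and the variable names are assumptions chosen for the example, and <code>qsvd</code> is not called.</p>
<div style="margin-top:0em; margin-bottom:1.5em;">
<code><br />
<span style="color:green">% Synthetic 200 x 300 grayscale "image": smooth pattern plus a little noise (assumed data)</span><br />
[x, y] = meshgrid(linspace(0, 1, 300), linspace(0, 1, 200));<br />
img = sin(6*x) .* cos(4*y) + 0.5*x + 0.01*randn(200, 300);<br />
<span style="color:green">% Economy-size SVD, then keep only the k largest singular triplets</span><br />
[u, s, v] = svd(img, 'econ');<br />
k = 10;<br />
img_k = u(:, 1:k) * s(1:k, 1:k) * v(:, 1:k)';<br />
<span style="color:green">% Relative error of the rank-k image, and its storage relative to the full image</span><br />
rel_err = norm(img - img_k, 'fro') / norm(img, 'fro');<br />
storage_ratio = k * (200 + 300 + 1) / (200 * 300);<br />
fprintf('rank %d: relative error %.4f, storage ratio %.3f\n', k, rel_err, storage_ratio);<br />
</code>
</div>
<p>Storing the <code>k</code> leading singular triplets of an m x n image takes k(m + n + 1) numbers instead of mn, which is where the compression comes from. The SVD itself is the expensive step in this pipeline.</p>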
]]></content:encoded>
			<wfw:commentRss>http://massiveanalytics.com/blog/2010/12/supercomputing-2010-fast-image-compression-demo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://massiveanalytics.com/blog/wp-content/uploads/2010/11/qla_sc10_demo.mp4" length="1135336" type="video/mp4" />
		</item>
		<item>
		<title>Weapons-Grade SVD: Fast &amp; Scalable</title>
		<link>http://massiveanalytics.com/blog/2010/11/weapons-grade-svd-fast-scalable/</link>
		<comments>http://massiveanalytics.com/blog/2010/11/weapons-grade-svd-fast-scalable/#comments</comments>
		<pubDate>Wed, 03 Nov 2010 06:39:32 +0000</pubDate>
		<dc:creator>mph</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[fast]]></category>
		<category><![CDATA[qsvd]]></category>
		<category><![CDATA[speedup]]></category>
		<category><![CDATA[SVD]]></category>

		<guid isPermaLink="false">http://massiveanalytics.com/blog/?p=261</guid>
		<description><![CDATA[In this post we&#8217;ll show how QLA can accelerate your SVD computations by orders of magnitude. QLA provides the qsvd function, which computes a fast SVD with very small error tolerance. You can use it wherever you would use Matlab&#8217;s built-in svd. To benchmark performance we measure runtimes on 30 real-world data matrices ranging in [...]]]></description>
			<content:encoded><![CDATA[<p>In this post we&#8217;ll show how QLA can accelerate your SVD computations by orders of magnitude.</p>
<div style="margin-top:1em; margin-bottom:1.5em;">
<img src="http://massiveanalytics.com/blog/wp-content/uploads/2010/10/nuke-300x199.jpg" alt="" title="weapons-grade" width="300" height="199" class="aligncenter size-medium wp-image-284" />
</div>
<p>QLA provides the <code>qsvd</code> function, which computes a fast SVD with very small error tolerance. You can use it wherever you would use Matlab&#8217;s built-in <code>svd</code>.</p>
<p>To benchmark performance we measure runtimes on 30 real-world data matrices ranging in size from 43K to 154M entries. See the <a href="#qsvd_data_table">data table</a> for descriptions and download links.</p>
<p>We compare <code>qsvd</code> against Matlab&#8217;s <code>svd</code> with the &#8216;econ&#8217; option (&#8216;econ&#8217; skips the extra computation of a null space basis). Timings are obtained as follows:</p>
<div style="margin-top:0em; margin-bottom:1.5em;">
<code><br />
>> load example.mat<br />
>> tic; [u,s,v] = qsvd(mat); toc<br />
Elapsed time is 0.1234 seconds.<br />
>> tic; [u,s,v] = svd(mat, 'econ'); toc<br />
Elapsed time is 6.4312 seconds.<br />
</code>
</div>
<p>This comparison is conveniently wrapped in <a href="http://massiveanalytics.com/mfiles/compare_qsvd.m">compare_qsvd.m</a>, which you can use like this:</p>
<div style="margin-top:0em; margin-bottom:1.5em">
<code><br />
>> compare_qsvd(mat)<br />
<span style="color:green">% compare_qsvd(mat, epsilon) if using a tolerance other than the default 0.01</span><br />
</code>
</div>
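<p>As a side note on the &#8216;econ&#8217; option used in the timings above, the following sketch shows what it changes, using only the built-in <code>svd</code>. The random matrix <code>A</code> is a hypothetical stand-in for a tall data matrix, not one of the benchmark datasets.</p>
<div style="margin-top:0em; margin-bottom:1.5em;">
<code><br />
<span style="color:green">% Hypothetical tall matrix</span><br />
A = randn(500, 40);<br />
<span style="color:green">% Full SVD: U is 500 x 500, including a basis for the null space of A'</span><br />
[U, S, V] = svd(A);<br />
<span style="color:green">% Economy SVD: Ue is 500 x 40, the extra 460 columns are never computed</span><br />
[Ue, Se, Ve] = svd(A, 'econ');<br />
disp(size(U));<br />
disp(size(Ue));<br />
<span style="color:green">% Both factorizations reconstruct A to rounding error</span><br />
err_full = norm(A - U*S*V', 'fro') / norm(A, 'fro');<br />
err_econ = norm(A - Ue*Se*Ve', 'fro') / norm(A, 'fro');<br />
</code>
</div>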
<p>We ran the comparison at several different <code>qsvd</code> error tolerances (.01, .025, .05) to capture a range of speedups that might be seen in different applications. Results are summarized in this plot:</p>
<p><img src="http://massiveanalytics.com/blog/wp-content/uploads/2010/11/qsvd_speedup.png" alt="" title="qsvd vs. Matlab svd: speedup" width="655" height="368" class="aligncenter size-full wp-image-304" /></p>
<p><code>qsvd</code> accelerates most cases by 10x to 1,000x (note the log scale). A few cases are accelerated only at higher error tolerances, but the majority show significant speedup at all three tolerance levels.</p>
<p>Why does this matter to you? The SVD is a slow bottleneck inside many applications &#8211; optimizations, model fitting, image and signal processing, search engines, data analytics, etc. Drop in QLA&#8217;s <code>qsvd</code>, and suddenly you can run 20x faster or handle 100x larger problems. That&#8217;s nuclear impact. Why not <b><a href="http://massiveanalytics.com/download.html">grab the free version</a></b> and try it?</p>
<hr/>
<table id="qsvd_data_table" style="margin-top:1em; margin-bottom: 2em; margin-left:auto; margin-right:auto;">
<caption>Matrices used in <b><code>qsvd</code></b> experiments, in descending order of speedup</caption>
<tr style="font-weight:bold">
<td>dataset</td>
<td>size</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/galaxy.mat">galaxy</a></td>
<td>40,000 x 3,839</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/mars.mat">mars</a></td>
<td>1,783 x 2,000</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/declaration.mat">declaration</a></td>
<td>4,656 x 3,923</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/bcsstk24.mat">bcsstk24</a></td>
<td>3,562 x 3,562</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/frey.mat">frey</a></td>
<td>560 x 1,965</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/madelon.mat">madelon</a></td>
<td>4,400 x 500</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/dor.mat">dor</a></td>
<td>800 x 6,063</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/sonar_scatter.mat">sonar_scatter</a></td>
<td>208 x 208</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/secom.mat">secom</a></td>
<td>1,567 x 590</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/pumsb.mat">pumsb</a></td>
<td>49,046 x 74</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/connect.mat">connect</a></td>
<td>67,577 x 43</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/regression.mat">regression</a></td>
<td>124,202 x 37</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/pumsb_star.mat">pumsb_star</a></td>
<td>49,046 x 63</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/PIE_32x32.mat">PIE_32x32</a></td>
<td>11,554 x 1,025</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/umist.mat">umist</a></td>
<td>575 x 10,304</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/ozone.mat">ozone</a></td>
<td>2,536 x 73</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/timit.mat">timit</a></td>
<td>138,839 x 40</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/olivetti.mat">olivetti</a></td>
<td>4,096 x 400</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/moon.mat">moon</a></td>
<td>256 x 384</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/arrhythmia.mat">arrhythmia</a></td>
<td>452 x 280</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/arcene.mat">arcene</a></td>
<td>800 x 10,000</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/ailerons.mat">ailerons</a></td>
<td>13,750 x 41</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/gisette.mat">gisette</a></td>
<td>12,500 x 5,000</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/tic.mat">tic</a></td>
<td>9,822 x 86</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/elevators.mat">elevators</a></td>
<td>16,599 x 19</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/corel.mat">corel</a></td>
<td>61,634 x 16</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/isolet.mat">isolet</a></td>
<td>7,797 x 618</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/usps.mat">usps</a></td>
<td>11,000 x 256</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/poker.mat">poker</a></td>
<td>1,025,010 x 11</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/mnist_reg500.mat">mnist_reg500</a></td>
<td>70,000 x 784</td>
</tr>
</table>
]]></content:encoded>
			<wfw:commentRss>http://massiveanalytics.com/blog/2010/11/weapons-grade-svd-fast-scalable/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linear Systems: Fast &amp; Robust</title>
		<link>http://massiveanalytics.com/blog/2010/09/linear-systems-fast-robust/</link>
		<comments>http://massiveanalytics.com/blog/2010/09/linear-systems-fast-robust/#comments</comments>
		<pubDate>Tue, 14 Sep 2010 05:14:29 +0000</pubDate>
		<dc:creator>mph</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[fast]]></category>
		<category><![CDATA[least-squares]]></category>
		<category><![CDATA[linear solver]]></category>
		<category><![CDATA[qlsqr]]></category>
		<category><![CDATA[robust]]></category>
		<category><![CDATA[speedup]]></category>

		<guid isPermaLink="false">http://massiveanalytics.com/blog/?p=136</guid>
		<description><![CDATA[Our next expanded demo focuses on qlsqr, our least-squares linear system solver. We compare against Matlab&#8217;s built-in solver, the &#8220;\&#8221; operator. Matlab&#8217;s solver is reasonably fast, but is easily thrown off by ill-conditioned matrices. We&#8217;ll show that qlsqr can run even faster while being much more robust to ill conditioning. Each test matrix consists of [...]]]></description>
			<content:encoded><![CDATA[<p>Our next expanded demo focuses on <code>qlsqr</code>, our least-squares linear system solver. We compare against Matlab&#8217;s built-in solver, the &#8220;<code>\</code>&#8221; operator. Matlab&#8217;s solver is reasonably fast, but is easily thrown off by ill-conditioned matrices. We&#8217;ll show that <code>qlsqr</code> can run even faster while being much more robust to ill conditioning.</p>
<div style="margin-top:1.5em; margin-bottom:1.5em">
<div id="attachment_204" class="wp-caption aligncenter" style="width: 460px"><img src="http://massiveanalytics.com/blog/wp-content/uploads/2010/09/solve_test_split1.png" alt="splitting a matrix into 80% solve and 20% test sets" title="solve_test_split" width="318" height="178" class="size-full wp-image-204" /><p class="wp-caption-text">Converting a data matrix into a linear system with solve/test sections.</p></div>
</div>
<p>Each test matrix consists of real data &#8211; sensor measurements, survey data, etc. We convert each matrix into a linear system by treating the last column as <code>b</code> in <code>Ax = b</code>, then solve for <code>x</code>. You can think of it as a regression of the last column against the other columns; the point is to set up a least-squares problem with real data.</p>
<p>We ran speed and quality comparisons using the code in <a href="http://massiveanalytics.com/mfiles/compare_qlsqr.m">compare_qlsqr.m</a>. It splits each matrix using the function <code>split_data</code> (included), which a) randomly splits rows into 80%/20% sets, and b) separates out the last column of each row-set for use as a target. We then time the solvers on the 80% set, and evaluate their accuracy (RMSE) on the 20% test set.</p>
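<p>The split-and-evaluate procedure can be sketched in a few lines. This is not the source of <code>split_data</code> or <code>compare_qlsqr.m</code>; it is an illustration under assumptions: <code>mat</code> is a synthetic matrix, the variable names are made up for the example, and only the built-in &#8220;<code>\</code>&#8221; solver is timed.</p>
<div style="margin-top:0em; margin-bottom:2em">
<code><br />
<span style="color:green">% Synthetic data: 5 predictor columns, last column is a noisy linear target</span><br />
mat = randn(1000, 6);<br />
mat(:, end) = mat(:, 1:5) * [1; -2; 0.5; 3; 0] + 0.1*randn(1000, 1);<br />
<span style="color:green">% a) random 80%/20% split of the rows</span><br />
m = size(mat, 1);<br />
idx = randperm(m);<br />
n_solve = round(0.8 * m);<br />
solve_rows = idx(1:n_solve);<br />
test_rows = idx(n_solve+1:end);<br />
<span style="color:green">% b) the last column of each row-set is the target b in Ax = b</span><br />
A_solve = mat(solve_rows, 1:end-1);<br />
b_solve = mat(solve_rows, end);<br />
A_test = mat(test_rows, 1:end-1);<br />
b_test = mat(test_rows, end);<br />
<span style="color:green">% Time the solver on the 80% set, then score it (RMSE) on the 20% set</span><br />
tic; x = A_solve \ b_solve; t_solve = toc;<br />
rmse_test = sqrt(mean((A_test*x - b_test).^2));<br />
fprintf('solve time %.4f sec, test RMSE %.4f\n', t_solve, rmse_test);<br />
</code>
</div>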
<p>The <a href="#qlsqr_data_table">dataset table</a> lists each data matrix with a download link, its size, and the error tolerance at which we ran <code>qlsqr</code>. You can replicate our results by running:</p>
<div style="margin-top:0em; margin-bottom:2em">
<code><br />
>> load example.mat<br />
>> compare_qlsqr(mat)<br />
<span style="color:green">% compare_qlsqr(mat, epsilon) if using a tolerance other than the default 0.01</span><br />
</code>
</div>
<p>First the speedup results. We chose the error tolerances to maximize speed without losing robustness. This plot shows the distribution of speedups over Matlab:</p>
<p><img src="http://massiveanalytics.com/blog/wp-content/uploads/2010/09/qlsqr_speedup.png" alt="qlsqr speedup over Matlab" title="qlsqr speedup" width="464" height="368" class="aligncenter size-full wp-image-183" /></p>
<p>In almost every case <code>qlsqr</code> is at least 2x faster than Matlab, and in 50% of cases it&#8217;s faster by 10x or more. So the speedup is substantial. What about solution quality?</p>
<p>We measure solution quality and robustness by RMSE performance on the test set. While the quality of the QLA solution relative to the input matrix is guaranteed (see the <a href="http://massiveanalytics.com/files/qla_doc.pdf">user doc</a> for details), more often we are concerned about performance on future data. The test-set RMSE simulates this, and is a key robustness indicator showing how well solutions will perform on novel inputs, whether for a control system, a prediction problem, or some other setting.</p>
<p>The following plot shows the ratio of Matlab vs. <code>qlsqr</code> test-set RMSE for each matrix &#8211; values greater than 1 indicate <code>qlsqr</code> has superior (lower) test-set RMSE:</p>
<p><img src="http://massiveanalytics.com/blog/wp-content/uploads/2010/09/qlsqr_robustness.png" alt="qlsqr robustness vs. Matlab" title="qlsqr robustness" width="464" height="368" class="aligncenter size-full wp-image-182" /></p>
<p>It&#8217;s worth pointing out how large the margin is on ill-conditioned matrices: in the most extreme 20% of cases, QLA&#8217;s test-set RMSE is roughly 10x to 90x lower than Matlab&#8217;s.</p>
<p>QLA&#8217;s tolerance for a small amount of error allows it to ignore process noise that Matlab&#8217;s solver tries to fit. The gain in speed and robustness is a big win in many applications, though if you absolutely must fit to the last decimal place it may not be what you need (keep in mind the error tolerance is adjustable).</p>
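<p>To make the intuition concrete, here is a sketch of the same effect using a truncated SVD solve. This is a stand-in technique chosen for illustration, not a description of how <code>qlsqr</code> works; the synthetic matrix, the noise level, and the 0.01 cutoff are all assumptions.</p>
<div style="margin-top:0em; margin-bottom:2em">
<code><br />
<span style="color:green">% Build an ill-conditioned 200 x 20 matrix with singular values from 1 down to 1e-10</span><br />
[Q1, ~] = qr(randn(200, 20), 0);<br />
[Q2, ~] = qr(randn(20));<br />
sv = logspace(0, -10, 20)';<br />
A = Q1 * diag(sv) * Q2';<br />
<span style="color:green">% The true solution lies along the three dominant directions; b carries small noise</span><br />
x_true = Q2(:, 1:3) * ones(3, 1);<br />
b = A * x_true + 1e-3 * randn(200, 1);<br />
<span style="color:green">% Exact least-squares solve: noise along the tiny singular values is amplified</span><br />
x_exact = A \ b;<br />
<span style="color:green">% Truncated solve: drop directions whose singular value is below 1% of the largest</span><br />
[U, S, V] = svd(A, 'econ');<br />
s = diag(S);<br />
keep = s &gt; 0.01 * s(1);<br />
x_trunc = V(:, keep) * ((U(:, keep)' * b) ./ s(keep));<br />
fprintf('error in x: exact %.3g, truncated %.3g\n', norm(x_exact - x_true), norm(x_trunc - x_true));<br />
</code>
</div>
<p>The truncated solve gives up an exact fit to <code>b</code> in exchange for a much more stable <code>x</code>, which is the kind of trade-off described above.</p>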
<p>We really encourage folks to try this solver &#8211; not only is it extremely fast, but its solutions tend to perform extremely well on new inputs, and this can make a big difference in applications. That&#8217;s some low-hanging fruit.</p>
<hr/>
<table id="qlsqr_data_table" style="margin-top:1em; margin-bottom: 2em; margin-left:auto; margin-right:auto;">
<caption>Datasets used in <code>qlsqr</code> experiments, in descending order of speedup</caption>
<tr style="font-weight:bold">
<td>dataset</td>
<td>size</td>
<td>epsilon</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/galaxy.mat">galaxy</a></td>
<td>40,000 x 3,839</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/declaration.mat">declaration</a></td>
<td>4,656 x 3,923</td>
<td>.025</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/mars.mat">mars</a></td>
<td>1,783 x 2,000</td>
<td>.1</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/arcene.mat">arcene</a></td>
<td>800 x 10,000</td>
<td>.25</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/umist.mat">umist</a></td>
<td>575 x 10,304</td>
<td>.25</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/madelon.mat">madelon</a></td>
<td>4,400 x 500</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/gisette.mat">gisette</a></td>
<td>12,500 x 5,000</td>
<td>.45</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/bcsstk24.mat">bcsstk24</a></td>
<td>3,562 x 3,562</td>
<td>.25</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/dor.mat">dor</a></td>
<td>800 x 6,063</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/secom.mat">secom</a></td>
<td>1,567 x 590</td>
<td>.25</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/pumsb_star.mat">pumsb_star</a></td>
<td>49,046 x 63</td>
<td>.1</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/sonar_scatter.mat">sonar_scatter</a></td>
<td>208 x 208</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/frey.mat">frey</a></td>
<td>560 x 1,965</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/regression.mat">regression</a></td>
<td>124,202 x 37</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/moon.mat">moon</a></td>
<td>256 x 384</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/arrhythmia.mat">arrhythmia</a></td>
<td>452 x 280</td>
<td>.25</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/tic.mat">tic</a></td>
<td>9,822 x 86</td>
<td>.25</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/ozone.mat">ozone</a></td>
<td>2,536 x 73</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/PIE_32x32.mat">PIE_32x32</a></td>
<td>11,554 x 1,025</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/olivetti.mat">olivetti</a></td>
<td>4,096 x 400</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/timit.mat">timit</a></td>
<td>138,839 x 40</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/pumsb.mat">pumsb</a></td>
<td>49,046 x 74</td>
<td>.0000001</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/poker.mat">poker</a></td>
<td>1,025,010 x 11</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/ailerons.mat">ailerons</a></td>
<td>13,750 x 41</td>
<td>.0005</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/isolet.mat">isolet</a></td>
<td>7,797 x 618</td>
<td>.05</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/usps.mat">usps</a></td>
<td>11,000 x 256</td>
<td>.1</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/connect.mat">connect</a></td>
<td>67,577 x 43</td>
<td>.00001</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/elevators.mat">elevators</a></td>
<td>16,599 x 19</td>
<td>.01</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/corel.mat">corel</a></td>
<td>61,634 x 16</td>
<td>.01</td>
</tr>
</table>
]]></content:encoded>
			<wfw:commentRss>http://massiveanalytics.com/blog/2010/09/linear-systems-fast-robust/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fast PCA on Large Datasets</title>
		<link>http://massiveanalytics.com/blog/2010/08/fast-pca-on-large-datasets/</link>
		<comments>http://massiveanalytics.com/blog/2010/08/fast-pca-on-large-datasets/#comments</comments>
		<pubDate>Mon, 30 Aug 2010 14:28:10 +0000</pubDate>
		<dc:creator>mph</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[fast]]></category>
		<category><![CDATA[PCA]]></category>
		<category><![CDATA[qpca]]></category>
		<category><![CDATA[speedup]]></category>

		<guid isPermaLink="false">http://massiveanalytics.com/blog/?p=30</guid>
		<description><![CDATA[We&#8217;ll kick things off with a series of posts showing expanded versions of the demo applications included with QLA. In this first post we&#8217;ll cover qpca, our fast PCA function. PCA is a standard data analysis technique that computes &#8220;important&#8221; directions (principal components) that capture most of the data variance. Decorrelation, dimension reduction, and a [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ll kick things off with a series of posts showing expanded versions of the demo applications included with QLA. In this first post we&#8217;ll cover <code>qpca</code>, our fast PCA function.</p>
<p><img src="http://massiveanalytics.com/blog/wp-content/uploads/2010/08/dor-teaser.png" alt="" title="dor-teaser" width="417" height="179" class="aligncenter size-full wp-image-83" /></p>
<p>PCA is a <a href="http://en.wikipedia.org/wiki/Principal_component_analysis">standard data analysis technique</a> that computes &#8220;important&#8221; directions (principal components) that capture most of the data variance. Decorrelation, dimension reduction, and a variety of other useful operations can be accomplished with the principal components, making it a standard part of any data analysis toolkit.</p>
<p>One drawback, however, is that PCA is expensive on large matrices: the cost is [latex]O(mn^2)[/latex] for an [latex]m \times n[/latex] matrix, with [latex]n[/latex] being the smaller of the two matrix dimensions. With QLA we can do much better: <code>qpca</code> computes an accurate PCA approximation in a fraction of the time.</p>
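<p>For reference, a standard way to compute an exact PCA is through the economy SVD of the mean-centered data, and that SVD is where the cost comes from. The sketch below is a generic version, not the contents of the <code>pca.m</code> script that ships with QLA; the random data matrix <code>X</code> and all variable names are assumptions for illustration.</p>
<div style="margin-top:0em; margin-bottom:1.5em;">
<code><br />
<span style="color:green">% Rows are observations, columns are variables (assumed layout)</span><br />
X = randn(2000, 50) * randn(50, 50);<br />
<span style="color:green">% Center each column</span><br />
Xc = X - repmat(mean(X, 1), size(X, 1), 1);<br />
<span style="color:green">% Economy SVD of the centered data: this is the expensive step</span><br />
[U, S, V] = svd(Xc, 'econ');<br />
<span style="color:green">% Columns of V are the principal components, in descending order of variance</span><br />
pcs = V;<br />
var_explained = diag(S).^2 / sum(diag(S).^2);<br />
<span style="color:green">% Project the rows onto the first two components</span><br />
scores = Xc * pcs(:, 1:2);<br />
</code>
</div>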
<p>To demonstrate, we&#8217;ll compare the speed and quality of <code>qpca</code> vs. an efficient Matlab PCA for a variety of real-world datasets. Dataset download links are provided in the results table below. You can replicate our results using the following trio of Matlab scripts (all included in the QLA package): <a href="http://massiveanalytics.com/mfiles/pca.m">pca.m</a>, <a href="http://massiveanalytics.com/mfiles/compare_qpca.m">compare_qpca.m</a>, and <a href="http://massiveanalytics.com/mfiles/suptitle.m">suptitle.m</a></p>
<p>First the speed comparisons. The comparison script is invoked like this:<br />
<code><br />
>> load mars.mat<br />
>> compare_qpca(mat)<br />
&nbsp;<br />
</code></p>
<p>Here are the results for our suite of datasets:</p>
<table style="margin-bottom: 2em">
<tr style="font-weight:bold">
<td>dataset</td>
<td>size</td>
<td>PCA</td>
<td>qpca</td>
<td>speedup</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/galaxy.mat">galaxy</a></td>
<td>40,000 x 3,840</td>
<td>33.90 min</td>
<td>3.99 sec</td>
<td>510.4</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/mars.mat">mars</a></td>
<td>1,783 x 2,000</td>
<td>96.55 sec</td>
<td>.58 sec</td>
<td>166.0</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/dor.mat">dor</a></td>
<td>800 x 6,063</td>
<td>6.81 sec</td>
<td>.25 sec</td>
<td>27.6</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/declaration.mat">declaration</a></td>
<td>4,656 x 3,923</td>
<td>17.97 min</td>
<td>48.47 sec</td>
<td>22.3</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/PIE_32x32.mat">pie_32</a></td>
<td>11,554 x 1,024</td>
<td>27.6 sec</td>
<td>1.34 sec</td>
<td>20.6</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/phoneme.mat">phoneme</a></td>
<td>138,839 x 40</td>
<td>1.01 sec</td>
<td>.13 sec</td>
<td>7.6</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/arrhythmia.mat">arrhythmia</a></td>
<td>452 x 280</td>
<td>.11 sec</td>
<td>.018 sec</td>
<td>6.1</td>
</tr>
<tr>
<td><a href="http://massiveanalytics.com/data/corel.mat">corel</a></td>
<td>61,634 x 16</td>
<td>.096 sec</td>
<td>.040 sec</td>
<td>2.4</td>
</tr>
</table>
<p>Not bad, especially for the larger matrices. Now let&#8217;s examine result quality. QLA obtains speedup by making fast approximations with tight quality control. You can find mathematical guarantees in the <a href="http://massiveanalytics.com/files/qla_doc.pdf">user doc</a>, but for PCA one easy check is to project the matrix rows onto pairs of principal components generated by <code>qpca</code>. If our approximation is high quality as guaranteed, these should closely match the corresponding projections from the Matlab PCA.</p>
<p>The compare_qpca script generates these plots automatically. Here are some of the more interesting-looking results:</p>
<p><b>mars</b><br />
<img src="http://massiveanalytics.com/blog/wp-content/uploads/2010/08/mars.png" alt="" title="mars" width="450" height="354" class="alignnone size-full wp-image-85" /></p>
<p><b>dor</b><br />
<img src="http://massiveanalytics.com/blog/wp-content/uploads/2010/08/dor.png" alt="" title="dor" width="450" height="354" class="alignnone size-full wp-image-82" /></p>
<p><b>declaration</b><br />
<img src="http://massiveanalytics.com/blog/wp-content/uploads/2010/08/declaration1.png" alt="" title="declaration" width="450" height="354" class="alignnone size-full wp-image-100" /></p>
<p>As you can see, they match up quite well.</p>
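<p>The same check can be done numerically. The sketch below compares projections from two exact PCA computations (SVD of the centered data vs. eigenvectors of the covariance matrix) as a stand-in for comparing <code>qpca</code> against a reference PCA; it does not call <code>qpca</code>, and the data and variable names are assumptions. Principal components are only defined up to sign, so the signs are aligned before comparing.</p>
<div style="margin-top:0em; margin-bottom:1.5em;">
<code><br />
<span style="color:green">% Assumed data with clearly separated variances per direction</span><br />
X = randn(500, 8) * diag([8 7 6 5 4 3 2 1]);<br />
Xc = X - repmat(mean(X, 1), size(X, 1), 1);<br />
<span style="color:green">% Reference PCA: SVD of the centered data</span><br />
[U, S, V1] = svd(Xc, 'econ');<br />
<span style="color:green">% Second PCA: eigenvectors of the covariance matrix, sorted by variance</span><br />
C = (Xc' * Xc) / (size(Xc, 1) - 1);<br />
C = (C + C') / 2;<br />
[V2, D] = eig(C);<br />
[d_sorted, order] = sort(diag(D), 'descend');<br />
V2 = V2(:, order);<br />
<span style="color:green">% Align signs, then compare projections onto the first two components</span><br />
signs = sign(sum(V1(:, 1:2) .* V2(:, 1:2), 1));<br />
P1 = Xc * V1(:, 1:2);<br />
P2 = Xc * (V2(:, 1:2) .* repmat(signs, size(V2, 1), 1));<br />
rel_diff = norm(P1 - P2, 'fro') / norm(P1, 'fro');<br />
fprintf('relative difference between projections: %.3g\n', rel_diff);<br />
</code>
</div>
<p>For an approximate method the relative difference will not be at rounding level, but a small value says the same thing as the matching scatter plots.</p>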
<p>Now consider this: what possibilities are enabled by a 20x or 500x speedup? Realtime analytics on an avalanche of streaming data, fast PCA on massive simulation output, decorrelation and reduction of extreme high-dimensional data&#8230; </p>
<p>All on a commodity workstation. That&#8217;s supercomputing speed on any computer.</p>
]]></content:encoded>
			<wfw:commentRss>http://massiveanalytics.com/blog/2010/08/fast-pca-on-large-datasets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ma_blog: init</title>
		<link>http://massiveanalytics.com/blog/2010/08/ma_blog-init/</link>
		<comments>http://massiveanalytics.com/blog/2010/08/ma_blog-init/#comments</comments>
		<pubDate>Mon, 30 Aug 2010 14:24:46 +0000</pubDate>
		<dc:creator>mph</dc:creator>
				<category><![CDATA[Announcements]]></category>

		<guid isPermaLink="false">http://massiveanalytics.com/blog/?p=8</guid>
		<description><![CDATA[We&#8217;re starting this blog to showcase QLA: The Quick Linear Algebra Library, our accelerated linear algebra package for Matlab, to demonstrate what it can do and how to do it. Our main site and documentation convey the basics, but here in the blog we&#8217;ll delve deeper. We plan to demo a wide variety of applications, [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re starting this blog to showcase <a href="http://massiveanalytics.com">QLA: The Quick Linear Algebra Library</a>, our accelerated linear algebra package for Matlab, to demonstrate what it can do and how to do it.</p>
<p>Our <a href="http://massiveanalytics.com">main site</a> and <a href="http://massiveanalytics.com/files/qla_doc.pdf">documentation</a> convey the basics, but here in the blog we&#8217;ll delve deeper. We plan to demo a wide variety of applications, along with offering various tips and tricks on how to get the most out of QLA.</p>
<p>Our goal is to show you how to use the supercomputing speed of QLA to boost your applications to the next level. To this end, <strong>our demo posts will always have code and data available</strong> so you can replicate and build on our examples. We&#8217;ll be exploring the unknown, finding new application areas, and hopefully setting new speed records along the way.</p>
<p>We definitely want an ongoing conversation on these topics and will be very responsive to comments and suggestions. In particular, let us know if there are applications you&#8217;d like us to try, especially if you have data to share.</p>
<p>Our aim is to bring supercomputing speeds to regular computers; follow along and you&#8217;ll see what we mean.</p>
]]></content:encoded>
			<wfw:commentRss>http://massiveanalytics.com/blog/2010/08/ma_blog-init/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
