<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Trahald.com</title>
	<atom:link href="http://www.trahald.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.trahald.com</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Fri, 24 Jul 2009 11:03:36 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Bug in model oddly causes improvement</title>
		<link>http://www.trahald.com/bug-in-model-oddly-causes-improvement/</link>
		<comments>http://www.trahald.com/bug-in-model-oddly-causes-improvement/#comments</comments>
		<pubDate>Fri, 24 Jul 2009 11:02:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.trahald.com/?p=55</guid>
		<description><![CDATA[I created a pseudo-SVD++(3) model (mf_time.cpp in my framework) - one without the &#124;N(u)&#124;^(-.5)*Yi feedback term, and also without the alpha_u*dev_u_hat term (which I found to make the RMSE worse). However the one I initially wrote had a bug in the training and use of the alpha_u_k*dev_u_hat term (the term that is added to the [...]]]></description>
			<content:encoded><![CDATA[<p>I created a pseudo-SVD++(3) model (mf_time.cpp in my <a href="http://www.trahald.com/kadri-framework/">framework</a>) - one without the <b>|N(u)|^(-.5)*Yi</b> feedback term, and also without the <b>alpha_u*dev_u_hat</b> term (which I found to make the RMSE worse). However the one I initially wrote had a bug in the training and use of the <b>alpha_u_k*dev_u_hat</b> term (the term that is added to the user features - not to be confused with the <b>alpha_u*dev_u_hat</b> term).</p>
<p>Surprisingly, and confusingly, this bug actually improved the probe RMSE significantly over the correct code.</p>
<p>The code within the training loop was:</p>
<pre class="cpp">movie_features<span style="color: #000000;">&#91;</span>movieid<span style="color: #0000dd;">-1</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span> += LRATE2 * <span style="color: #000000;">&#40;</span>err*<span style="color: #000000;">&#40;</span>uf_old+alpha_uk_old*dev_u_hat<span style="color: #000000;">&#41;</span> - LAMBDA2*mf_old<span style="color: #000000;">&#41;</span>;
alpha_u_k<span style="color: #000000;">&#91;</span>i<span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span> = LRATE2 * <span style="color: #000000;">&#40;</span>err*dev_u_hat*mf_old - LAMBDA2*alpha_uk_old<span style="color: #000000;">&#41;</span>;</pre>
<p>And within the prediction function:</p>
<pre class="cpp">sum += movie_features<span style="color: #000000;">&#91;</span>movieid<span style="color: #0000dd;">-1</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span> * <span style="color: #000000;">&#40;</span>user_features<span style="color: #000000;">&#91;</span>u<span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span> + alpha_u_k<span style="color: #000000;">&#91;</span>u<span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span><span style="color: #000000;">&#41;</span>;</pre>
<p>In the training step, a "+" was missing: it should have been "alpha_u_k[i][f] +=" rather than just "=", so each update overwrote the value instead of accumulating into it. And in the prediction function, the "*dev_u_hat" factor was missing. It should have been:</p>
<pre class="cpp">sum += movie_features<span style="color: #000000;">&#91;</span>movieid<span style="color: #0000dd;">-1</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span> * <span style="color: #000000;">&#40;</span>user_features<span style="color: #000000;">&#91;</span>u<span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span> + alpha_u_k<span style="color: #000000;">&#91;</span>u<span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span>*dev_u_hat<span style="color: #000000;">&#41;</span>;</pre>
<p>Yet despite these errors, I achieved a probe RMSE of 0.901871 with 100 features. When I actually corrected the code, the RMSE surprisingly increased to 0.907681 (also with 100 features).</p>
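<p>To make the two fixes concrete, here is a minimal sketch of the buggy and corrected versions for a single feature. The Factors struct, the function names, and the values of LRATE2 and LAMBDA2 are assumptions made for illustration; they are not taken from mf_time.cpp, and the user-feature update is left out because it is not shown above.</p>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical constants; the real learning rate and regularization are not given in the post.
const double LRATE2 = 0.01;
const double LAMBDA2 = 0.02;

// Hypothetical container for the values of one feature f of one (user, movie) pair.
struct Factors {
    double uf;        // user_features[u][f]
    double mf;        // movie_features[movieid-1][f]
    double alpha_uk;  // alpha_u_k[u][f]
};

// Corrected contribution of one feature to the prediction:
// alpha_u_k is scaled by dev_u_hat.
double feature_term(const Factors& p, double dev_u_hat) {
    return p.mf * (p.uf + p.alpha_uk * dev_u_hat);
}

// Buggy contribution: the "*dev_u_hat" factor is missing.
double feature_term_buggy(const Factors& p) {
    return p.mf * (p.uf + p.alpha_uk);
}

// Corrected training step: alpha_u_k accumulates its gradient step with "+=".
void train_step(Factors& p, double err, double dev_u_hat) {
    const double uf_old = p.uf;
    const double mf_old = p.mf;
    const double alpha_uk_old = p.alpha_uk;
    p.mf += LRATE2 * (err * (uf_old + alpha_uk_old * dev_u_hat) - LAMBDA2 * mf_old);
    p.alpha_uk += LRATE2 * (err * dev_u_hat * mf_old - LAMBDA2 * alpha_uk_old);
}

// Buggy training step: "=" overwrites alpha_u_k with the latest step alone.
void train_step_buggy(Factors& p, double err, double dev_u_hat) {
    const double uf_old = p.uf;
    const double mf_old = p.mf;
    const double alpha_uk_old = p.alpha_uk;
    p.mf += LRATE2 * (err * (uf_old + alpha_uk_old * dev_u_hat) - LAMBDA2 * mf_old);
    p.alpha_uk = LRATE2 * (err * dev_u_hat * mf_old - LAMBDA2 * alpha_uk_old);
}
```

<p>In the buggy step, alpha_u_k only ever holds the size of its most recent update, so it stays small and depends almost entirely on the last rating processed for that user.</p>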
<p>I also tried a model where the training step matched the incorrect prediction function. That is, I changed the code to:</p>
<pre class="cpp">movie_features<span style="color: #000000;">&#91;</span>movieid<span style="color: #0000dd;">-1</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span> += LRATE2 * <span style="color: #000000;">&#40;</span>err*<span style="color: #000000;">&#40;</span>uf_old+alpha_uk_old<span style="color: #000000;">&#41;</span> - LAMBDA2*mf_old<span style="color: #000000;">&#41;</span>;
alpha_u_k<span style="color: #000000;">&#91;</span>i<span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span> += LRATE2 * <span style="color: #000000;">&#40;</span>err*mf_old - LAMBDA2*alpha_uk_old<span style="color: #000000;">&#41;</span>;
...
<span style="color: #00eeff;">sum</span> += movie_features<span style="color: #000000;">&#91;</span>movieid<span style="color: #0000dd;">-1</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span> * <span style="color: #000000;">&#40;</span>user_features<span style="color: #000000;">&#91;</span>u<span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span> + alpha_u_k<span style="color: #000000;">&#91;</span>u<span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span>f<span style="color: #000000;">&#93;</span><span style="color: #000000;">&#41;</span>;</pre>
<p>I tested with 10 features. The RMSEs I got:</p>
<p><code>Correct model	0.921176<br />
Bad model and bad training	0.913509<br />
Bad model, proper training	0.917931</code></p>
<p>So the bad model with a training step that did not match the prediction formula actually outperformed the version where the training step matched the incorrect formula (with dev_u_hat absent).</p>
<p>I find this very befuddling: I can think of no reason why the wrong formula, with a training step that doesn't even match it, would outperform not only the correct model but also the one where the training step was fixed.</p>
<p>The incorrect training step makes the contributions of the term seem pretty random to me - yet if they were random, they shouldn't improve the RMSE. If there is some pattern there, then I can't think of it.</p>
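<p>For reference, the RMSE figures quoted in this post are root mean squared errors between predicted and actual ratings on the probe set. A minimal sketch of that calculation follows; the function name and the use of std::vector are my own choices for illustration, not the framework's runProbe() code.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Root mean squared error between predicted and actual ratings.
// Hypothetical helper for illustration only.
double rmse(const std::vector<double>& predicted, const std::vector<double>& actual) {
    assert(predicted.size() == actual.size());
    assert(!predicted.empty());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double err = actual[i] - predicted[i];
        sum_sq += err * err;  // accumulate squared error
    }
    return std::sqrt(sum_sq / static_cast<double>(predicted.size()));
}
```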
]]></content:encoded>
			<wfw:commentRss>http://www.trahald.com/bug-in-model-oddly-causes-improvement/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kadri Framework</title>
		<link>http://www.trahald.com/kadri-framework/</link>
		<comments>http://www.trahald.com/kadri-framework/#comments</comments>
		<pubDate>Wed, 15 Jul 2009 04:44:40 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.trahald.com/?p=46</guid>
		<description><![CDATA[The Kadri Framework is meant for the analysis of the Netflix data set. It is based on Icefox's Netflix Recommender Framework. It uses Icefox's framework to build the core data files, and expands on it to allow pre-processing, the use of dates, and blending.

Requirements:

Qt
LAPACK
64-bit OS (or 32-bit OS with increased RAM limits for processes) and [...]]]></description>
			<content:encoded><![CDATA[<p>The Kadri Framework is meant for the analysis of the Netflix data set. It is based on Icefox's <a href="http://github.com/icefox/netflixrecommenderframework/tree/master" target="_blank">Netflix Recommender Framework</a>. It uses Icefox's framework to build the core data files, and expands on it to allow pre-processing, the use of dates, and blending.<br />
<br />
Requirements:</p>
<ul>
<li>Qt</li>
<li>LAPACK</li>
<li>64-bit OS (or 32-bit OS with increased RAM limits for processes) and 64-bit compiler</li>
</ul>
<p>The download includes Average, Globals, KNN, Matrix Factorization, and blending classes. All of these extend the Algorithm class, which contains the Algorithm::predict(movieid, userid, votedate) and Algorithm::runProbe() functions. To create a new algorithm, make a new class (see the included classes for guidance) centered around the predict() function. The setMovie() and determine() functions are just empty functions added for compatibility with the existing code. You can then use that algorithm to predict the data sets via Algorithm::runProbe() and Algorithm::runQualifying() (see algorithm.cpp for details). You can even run an algorithm on the training set using Algorithm::runTraining() if you wish.<br />
<br />
Use the Algorithm::buildPreProcessor("fileprefix") function to build pre-processing files, containing residuals that can be loaded and used to generate models of their own. Note that these files are another ~800MB that must be mapped into memory, so you need lots of RAM to do this.<br />
<br />
The blending classes are algorithms of their own, and can be used just like any other model. The ::setup(...) functions take in the names of the preprocessor files and perform the blending calculations based on the probe set; the first argument must be the number of models being blended. The code below is an example that will generate residuals for the Average and Matrix Factorization algorithms, then blend them and run the result on the probe set.</p>
<hr />
<pre class="cpp">Average avg<span style="color: #000000;">&#40;</span>&amp;db<span style="color: #000000;">&#41;</span>;
avg.<span style="color: #00eeff;">buildPreProcessor</span><span style="color: #000000;">&#40;</span><span style="color: #666666;">&quot;somefolder/averagefilename&quot;</span><span style="color: #000000;">&#41;</span>;
&nbsp;
Matrix_Factorization mf<span style="color: #000000;">&#40;</span>&amp;db<span style="color: #000000;">&#41;</span>;
mf.<span style="color: #00eeff;">calculate_incr</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>;
mf.<span style="color: #00eeff;">buildPreProcessor</span><span style="color: #000000;">&#40;</span><span style="color: #666666;">&quot;somefolder/matrixfilename&quot;</span><span style="color: #000000;">&#41;</span>;
&nbsp;
Blend blend<span style="color: #000000;">&#40;</span>&amp;db<span style="color: #000000;">&#41;</span>;
blend.<span style="color: #00eeff;">setUp</span><span style="color: #000000;">&#40;</span><span style="color: #0000dd;">2</span>, <span style="color: #666666;">&quot;somefolder/averagefilename&quot;</span>, <span style="color: #666666;">&quot;somefolder/matrixfilename&quot;</span><span style="color: #000000;">&#41;</span>;
blend.<span style="color: #00eeff;">runProbe</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>;</pre>
<hr />
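<p>As for writing a new algorithm around predict(), the sketch below shows the general shape. AlgorithmSketch is a hypothetical stand-in for the framework's Algorithm class, written only so the example is self-contained; the argument types, the return type, and the ConstantPredictor class are assumptions, so check algorithm.h in the download for the real declarations.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for the framework's Algorithm base class.
// The real class and its signatures may differ.
class AlgorithmSketch {
public:
    virtual ~AlgorithmSketch() {}
    // Empty functions kept only for compatibility, as described in the post.
    virtual void setMovie(int /*movieid*/) {}
    virtual void determine(int /*userid*/) {}
    // The function a new algorithm is centered around.
    virtual double predict(int movieid, int userid, int votedate) = 0;
};

// Hypothetical example algorithm: predicts the same rating for every
// (movie, user, date), clamped to the 1..5 range of Netflix ratings.
class ConstantPredictor : public AlgorithmSketch {
public:
    explicit ConstantPredictor(double value) : value_(value) {}
    double predict(int /*movieid*/, int /*userid*/, int /*votedate*/) override {
        if (value_ < 1.0) return 1.0;
        if (value_ > 5.0) return 5.0;
        return value_;
    }
private:
    double value_;
};
```

<p>A real algorithm would replace the body of predict() with its model and, in the actual framework, inherit the probe and qualifying runs from the base class.</p>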
<p>You can download the framework here:<br />
Note: there is a file embedded within this post; please visit the post to download it.<br />
See README.txt for setup instructions.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.trahald.com/kadri-framework/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>kNN Source Code</title>
		<link>http://www.trahald.com/knn-source-sode/</link>
		<comments>http://www.trahald.com/knn-source-sode/#comments</comments>
		<pubDate>Sat, 28 Jun 2008 21:53:35 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Netflix competition]]></category>

		<guid isPermaLink="false">http://www.trahald.com/?p=45</guid>
		<description><![CDATA[Source code implementing the k-NN (k-nearest neighbors) algorithm can be found at this post.
It implements Icefox's framework, details of which are in this thread.
]]></description>
			<content:encoded><![CDATA[<p>Source code implementing the k-NN (k-nearest neighbors) algorithm can be found at <a href="http://www.netflixprize.com/community/viewtopic.php?pid=6822#p6822">this post</a>.</p>
<p>It implements Icefox's framework, details of which are in <a href="http://www.netflixprize.com/community/viewtopic.php?id=352">this thread</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.trahald.com/knn-source-sode/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
