function v = fastrln(v0, x, lambda, nActs)
%FASTRLN Generates reinforcement learning expectations minimizing loops
%written by Ryan Jessup, 20090216
%v = FASTRLN(v0,x,lambda,nActs) generates reinforcement learning trial by
%trial expectations v, which is a nTrial by nActs matrix, based on the
%initial expectation vector v0, a nTrial by 2 matrix x with the observed
%action in column 1 and the observed outcome in column 2, the scalar
%learning rate lambda, and the number of actions nActs.
%This function loops through nActs as opposed to nTrials through the use of
%a digital filter. If there is only one action, the function fastrl should
%be used.
%v0       1 by nActs initial expectation vector for v
%x        nTrial by 2 matrix with the action in column 1 and the observed
%         outcomes in column 2. The actions should be numbered
%         consecutively 1 through nActs.
%lambda   scalar learning parameter. It must be nonzero because v0 is
%         divided by lambda below.
%nActs    scalar number of actions used to produce the expectations
%v        nTrial by nActs matrix of expectations. When an action is used to
%         produce an outcome, the expectation for that action is updated
%         for the next trial; otherwise, an unused action retains the same
%         expectation.

%force v0 into a row vector so it can be stacked on top of v at the end
v0 = v0(:).';

%set the weights on preceding information in a form the filter uses, so
%that y(n) = lambda*input(n) + (1-lambda)*y(n-1)
a = [1 lambda-1];

%modify the v0; this must be done because otherwise the filter does not
%correctly apply the initial expectation. The first filter output is
%lambda*modv0(i), which equals v0(i).
modv0 = v0./lambda;

%preallocate the output
v = zeros(size(x,1), nActs);

for i = 1:nActs %assumes that options are labeled consecutively 1 thru nActs
    %Filter the outcome data for the ith action. The modified v0 has
    %been prepended to the outcome data. This will result in a vector
    %whose length is one more than the number of times the action was
    %used to produce an outcome; its first element is v0(i).
    vTemp = filter(lambda, a, [modv0(i); x((x(:,1)==i),2)]);

    %Obtain a cumulative sum for the number of times the action was used
    %and use it as an index to obtain the appropriate cell from vTemp in
    %which to look. 1 is added to all values because a value of 0 from the
    %cumsum is found in vTemp(1). This will result in a nTrial length
    %vector which is then placed in the ith column.
    v(:,i) = vTemp(cumsum(x(:,1)==i)+1);
end

%shift down one trial so that row t holds the expectations held before the
%outcome of trial t is observed; row 1 is therefore v0
v = [v0; v(1:end-1,:)];
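
%--------------------------------------------------------------------------
%The helper below is NOT part of the original file. It is a hypothetical
%reference, named fastrln_loopref here purely for illustration, that spells
%out the update the filter is meant to reproduce:
%   v(t+1,k) = v(t,k) + lambda*(outcome(t) - v(t,k))   for the chosen action k
%with every other action keeping its previous expectation. It is a minimal
%sketch assuming v0 is a 1 by nActs row vector and x(:,1) holds integers
%1 through nActs. As a local function it is only visible inside this file;
%copy it into its own file fastrln_loopref.m to call it directly.
%
%Illustrative check (values chosen only for this example):
%   x = [1 1; 2 0; 1 0; 2 1];
%   vRef = fastrln_loopref([0.5 0.5], x, 0.1, 2);
%   %vRef is [0.5 0.5; 0.55 0.5; 0.55 0.45; 0.495 0.45]
%   %fastrln([0.5 0.5], x, 0.1, 2) should agree up to rounding error
function v = fastrln_loopref(v0, x, lambda, nActs)
nTrials = size(x,1);
v = zeros(nTrials, nActs);
vCur = v0; %expectations held before the first trial
for t = 1:nTrials
    v(t,:) = vCur; %record expectations before the outcome of trial t
    k = x(t,1);    %action used on trial t
    %update only the action that produced the outcome
    vCur(k) = vCur(k) + lambda*(x(t,2) - vCur(k));
end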