tag:blogger.com,1999:blog-23486403706962388622024-02-20T17:58:51.506-08:00Spreadsheet Modelling Craft at SUU - A Living TextbookDr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.comBlogger73125tag:blogger.com,1999:blog-2348640370696238862.post-33166549601549929462011-05-26T16:26:00.001-07:002011-05-26T16:26:40.379-07:00TransposeUnder "Paste Special" is an option to change columns to rows and rows to columns when you copy and paste.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-8567202906879373662011-05-26T16:25:00.001-07:002011-05-26T16:25:35.642-07:00Read Malcolm GladwellDr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-44960397106379080462011-05-26T16:24:00.001-07:002011-05-26T16:24:56.070-07:00Risk OptimizerThis ships with @Risk.<br />
<br />
It is a program to simultaneously do Monte Carlo and an optimization routine (like Solver).Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-56082270489638315432011-05-25T16:37:00.000-07:002011-05-25T16:37:06.480-07:00In Regression, More RHS Variables Make Each Estimate Less SharpThe technical term is multicollinearity.<br />
<br />
It's a good thing to start a regression big, but you have to remember to prune it down.<br />
<br />
If you don't, what you get is larger standard errors for everything which makes them all look insignificant.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-74980401014349585172011-05-25T16:34:00.000-07:002011-05-25T16:34:27.739-07:00Power Curves Are CommonThese are related to Pareto distributions ...<br />
<br />
Power curves describe a lot of real world phenomenon like the distribution of talent, and tipping points, and the effects of advertising.<br />
<br />
N.B. Go read Malcolm Gladwell's <em>The Tipping Point</em>.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-54514567965274087472011-05-25T16:26:00.000-07:002011-05-25T16:26:04.564-07:00Deterministic vs. Stochastic ModelsMost Excel models are determinimstic: the inputs don't fluctuate, so the outputs don't either. <br />
<br />
But in the real world they both do.<br />
<br />
Old School: using scenario manager to vary inputs, or doing it by hand.<br />
<br />
New School: use Monte Carlo add-ons like @Risk or Crystal Ball to automate and keep track of simulations that generate probability distributions of outcomes.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-89554786908132702712011-05-25T16:23:00.001-07:002011-05-25T16:23:05.045-07:00@Risk Might be Worth BuyingDr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-76023534448597022612011-05-25T16:22:00.000-07:002011-05-25T16:22:52.082-07:00Monte Carlo SimulationUse these to delineate what outcomes are possible and which are not possible.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-19363456589792330722011-05-25T16:20:00.000-07:002011-05-25T16:20:17.875-07:00=Or() StatementsUse these to yield "True" only if every argument is satisfied.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-11516591837117432202011-05-24T16:28:00.000-07:002011-05-24T16:28:49.724-07:00Standardize the Form of Your Models for SolverSolver doesn't require you to set up your model in any particular way ... but a user is going to have to input target/objective, changing cells/decisions, and constraints, and should be able to find them easily.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-67875231244492153602011-05-24T16:27:00.000-07:002011-05-24T16:27:19.704-07:00How to Evaluate Your Regression OverallLook at "significance F". This is a p-value for the whole regression, and should be below 0.05.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-59882087275106858692011-05-24T16:25:00.000-07:002011-05-24T16:25:09.166-07:00Regression Modelling StrategyOld School: start with one X, and add them one by one until happy with the regression. Con: the smaller models that you're building on are probably biased, and may lead you in the wrong direction.<br />
<br />
New School: start with all (your available) X's, and delete them one by one until happy with the regression. Con: when you start out your model looks like junk, and may continue to look like junk for a long time as you prune it.<br />
<br />
Dr. Tufte prefers the latter method. The risks of omitting variables tend to be worse than those of including irrelevant ones ... so I lean towards using the extra computing power required by the New School method.<br />
<br />
Neither method solves the problem of what variable to add/cut first, or whether to do them in groups or one by one.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-41198796271561263152011-05-24T16:21:00.000-07:002011-05-24T16:21:04.105-07:00Better Understanding of Regression OutputLuke learned more about interpreting t-ratios, p-values, r-squared, and adjusted R-squared.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-2569599760304280112011-05-24T16:19:00.001-07:002011-05-24T16:19:11.328-07:00Regression Tool1) It has most of what you need.<br />
2) It includes stuff you may not need, or may use inappropriately.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-60454069661571305162011-05-24T16:18:00.000-07:002011-05-24T16:18:26.830-07:00Solver was Pretty Cool TooGoal seek is great if you need to find one unknown (as in an algebra problem).<br />
<br />
Use Solver when you need to 1) find more than one unknown, or 2) optimize (as in calculus).Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-10952289795242591542011-05-24T16:17:00.000-07:002011-05-24T16:17:34.766-07:00Sumproduct Is Pretty CoolUse this to take multiple products that you want summed up.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-71284533113869266132011-05-23T16:34:00.001-07:002011-05-23T16:34:47.331-07:00The Mean Is Related to the Sum of Squared Deviations (Sum of Squares)We can calculate deviations (of data) from any value. Then we can square those, and sum them.<br />
<br />
The mean is the value that minimizes the sum of squared deviations.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-69902299544979766202011-05-23T16:32:00.001-07:002011-05-23T16:32:42.230-07:00Solver Is an Add-On that Can Solve a Problem for Many UnknownsSolver is like Goal Seek on steroids.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-76899570816914250992011-05-23T16:31:00.000-07:002011-05-23T16:31:45.327-07:00Statistical TestsThere are 4 types of tests (all easy to construct in Excel).<br />
<br />
We do two: Wald tests, and likelihood ratio tests.<br />
<br />
All Wald tests are constructed by taking the difference between what we observe and what we hypothesize, and dividing that by a "ruler": the standard error. If the result is "big", we know we've found something.<br />
<br />
Maximum likelihood tests are usually known at this level as F or chi-square tests.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-13349485961379657452011-05-23T16:29:00.000-07:002011-05-23T16:29:26.381-07:00Relating Variables with Best Fit LinesNo relationship from X to Y corresponds to a line with zero slope.<br />
<br />
Some relationship from X to Y corresponds to a line with non-zero slope.<br />
<br />
N.B. When we do regressions, we assume causality goes from X to Y.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-73033957957805837062011-05-23T16:27:00.000-07:002011-05-23T16:27:47.959-07:00Data Has MomentsMoments are powers of the data.<br />
<br />
They can be non-central or centralized (around the mean).<br />
<br />
The reason for calculating moments, is that they are the unifying calculation for mean, variance, skewness, kurtosis, sum of squares, and standard deviations.<br />
<br />
These are easy to do in a spreadsheet.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-42712040158075769012011-05-23T16:25:00.000-07:002011-05-23T16:25:18.608-07:00Stocks Have Fat TailsWe commonly say stock returns have "fat" tails. <br />
<br />
This isn't really true. The distribution has long tails, and broad shoulders, but not really fat tails.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-67496679263799176292011-05-23T16:23:00.001-07:002011-05-23T16:23:56.368-07:00KurtosisEasier to calculate than you may think.<br />
<br />
Leptokurtosis (excess) is common in financial and economic data. It means that there are too many observations that are not in the center of the distribution, without being way out in the tails either.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-50172523624214782092011-05-19T18:23:00.000-07:002011-05-19T18:23:16.577-07:00Choose Confidence Intervals or Hypothesis Tests, but Not BothConfidence intervals and hypothesis tests use the same pieces of information in different ways.<br />
<br />
Doing both might make you think you're getting twice as many results, but you're really getting the same result stated two different ways.<br />
<br />
This is scientism.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0tag:blogger.com,1999:blog-2348640370696238862.post-13251060030857839082011-05-19T18:21:00.000-07:002011-05-19T18:21:04.259-07:00You're Not Alone: Everyone's Statistics Understanding Falls Apart Half Way Through the First SemesterThe reason is that you spend the first half learning about different distributions, and you miss that the central limit theorem implies that you only need a small handful when you start working with (some) summary-statistics-about-the-sample instead of the whole sample.Dr. Tuftehttp://www.blogger.com/profile/17397586052171706438noreply@blogger.com0