Spreadsheet Modelling Craft at SUU - A Living Textbook

Transpose

2011-05-26T16:26:00.001-07:00

Under "Paste Special" is an option to change columns to rows and rows to columns when you copy and paste.
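For anyone who also works outside Excel, the same row/column swap can be sketched in a few lines of Python. The little grid below is made up for illustration.

```python
# A small made-up grid: each inner list is one spreadsheet row.
grid = [
    ["Quarter", "Q1", "Q2"],
    ["Sales", 100, 120],
]

# zip(*grid) groups the i-th entry of every row, so rows become columns.
transposed = [list(column) for column in zip(*grid)]

for row in transposed:
    print(row)
```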

Read Malcolm Gladwell

2011-05-26T16:25:00.001-07:00

Risk Optimizer

2011-05-26T16:24:00.001-07:00

This ships with @Risk.

It runs a Monte Carlo simulation and an optimization routine (like Solver) at the same time.

In Regression, More RHS Variables Make Each Estimate Less Sharp

2011-05-25T16:37:00.000-07:00

The technical term is multicollinearity.

It's a good thing to start a regression big, but you have to remember to prune it down.

If you don't, you get larger standard errors for everything, which makes every estimate look insignificant.
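To get a feel for the size of the effect, here is a rough Python sketch. It assumes the simplest case, a regression with exactly two X's and the same residual variance throughout, where the standard error of a slope is inflated by the factor 1/sqrt(1 - r^2), with r the correlation between the two X's. The function name se_inflation is made up for this example.

```python
import math

def se_inflation(r):
    """Factor by which a slope's standard error grows when its X has
    correlation r with the one other X in the model (two-X case only)."""
    return math.sqrt(1.0 / (1.0 - r ** 2))

# The more the two X's move together, the fuzzier each estimate gets.
for r in (0.0, 0.5, 0.9, 0.99):
    print(f"correlation {r:4.2f} -> standard error x {se_inflation(r):.2f}")
```

Uncorrelated X's cost nothing in this setup; the penalty only becomes severe when the X's are close to duplicates of each other.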

Power Curves Are Common

2011-05-25T16:34:00.000-07:00

These are related to Pareto distributions ...

Power curves describe a lot of real-world phenomena, like the distribution of talent, tipping points, and the effects of advertising.

N.B. Go read Malcolm Gladwell's The Tipping Point.

Deterministic vs. Stochastic Models

2011-05-25T16:26:00.000-07:00

Most Excel models are deterministic: the inputs don't fluctuate, so the outputs don't either.

But in the real world they both do.

Old School: using scenario manager to vary inputs, or doing it by hand.

New School: use Monte Carlo add-ons like @Risk or Crystal Ball to automate and keep track of simulations that generate probability distributions of outcomes.
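The difference can be sketched outside of any add-on. The Python below is only an illustration: the profit model, its numbers, and the choice of a normal distribution for units sold are all invented, and it uses the standard library rather than @Risk or Crystal Ball.

```python
import random
import statistics

def profit(units, price=10.0, unit_cost=6.0, fixed_cost=300.0):
    """A toy model: margin per unit times units sold, less fixed cost."""
    return units * (price - unit_cost) - fixed_cost

# Deterministic: one fixed input gives one output.
deterministic = profit(100)

# Stochastic: draw the input many times and keep every output.
rng = random.Random(42)  # fixed seed so the run can be repeated
draws = [profit(rng.gauss(100, 20)) for _ in range(10_000)]

mean_profit = statistics.mean(draws)
share_losing = sum(d < 0 for d in draws) / len(draws)

print("deterministic profit:", deterministic)
print("simulated mean profit:", round(mean_profit, 1))
print("share of runs that lose money:", round(share_losing, 3))
```

The deterministic model reports a single profit figure; the simulation also shows how often the same model loses money.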

@Risk Might be Worth Buying

2011-05-25T16:23:00.001-07:00

Monte Carlo Simulation

2011-05-25T16:22:00.000-07:00

Use these to delineate which outcomes are possible and which are not.

=Or() Statements

2011-05-25T16:20:00.000-07:00

Use these to yield "True" if at least one argument is satisfied. (Use =And() when you want "True" only if every argument is satisfied.)
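The OR-versus-AND distinction is easy to check in Python, where any() plays the role of =Or() and all() plays the role of =And(). The list of conditions is made up.

```python
conditions = [False, True, False]

or_result = any(conditions)   # like =OR(FALSE, TRUE, FALSE)
and_result = all(conditions)  # like =AND(FALSE, TRUE, FALSE)

print("OR :", or_result)
print("AND:", and_result)
```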

Standardize the Form of Your Models for Solver

2011-05-24T16:28:00.000-07:00

Solver doesn't require you to set up your model in any particular way ... but a user is going to have to input target/objective, changing cells/decisions, and constraints, and should be able to find them easily.

How to Evaluate Your Regression Overall

2011-05-24T16:27:00.000-07:00

Look at "significance F". This is a p-value for the whole regression, and should be below 0.05.

Regression Modelling Strategy

2011-05-24T16:25:00.000-07:00

Old School: start with one X, and add them one by one until happy with the regression. Con: the smaller models that you're building on are probably biased, and may lead you in the wrong direction.

New School: start with all (your available) X's, and delete them one by one until happy with the regression. Con: when you start out your model looks like junk, and may continue to look like junk for a long time as you prune it.

Dr. Tufte prefers the latter method. The risks of omitting variables tend to be worse than those of including irrelevant ones ... so I lean towards using the extra computing power required by the New School method.

Neither method solves the problem of what variable to add/cut first, or whether to do them in groups or one by one.

Better Understanding of Regression Output

2011-05-24T16:21:00.000-07:00

Luke learned more about interpreting t-ratios, p-values, R-squared, and adjusted R-squared.

Regression Tool

2011-05-24T16:19:00.001-07:00

1) It has most of what you need.
2) It includes stuff you may not need, or may use inappropriately.

Solver was Pretty Cool Too

2011-05-24T16:18:00.000-07:00

Goal Seek is great if you need to find one unknown (as in an algebra problem).

Use Solver when you need to 1) find more than one unknown, or 2) optimize (as in calculus).
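The one-unknown case can be sketched in Python with plain bisection. This is a stand-in chosen for simplicity, not a description of how Excel's Goal Seek works internally; the function name goal_seek and the example equation are made up.

```python
def goal_seek(f, target, lo, hi, tol=1e-9):
    """Find x in [lo, hi] with f(x) = target by bisection.

    Assumes f is continuous and that f(lo) - target and f(hi) - target
    have opposite signs (or one of them is zero).
    """
    g_lo = f(lo) - target
    if g_lo * (f(hi) - target) > 0:
        raise ValueError("target is not bracketed by lo and hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = f(mid) - target
        if g_lo * g_mid <= 0:
            hi = mid              # the answer lies in the lower half
        else:
            lo, g_lo = mid, g_mid  # the answer lies in the upper half
    return (lo + hi) / 2

# One unknown: which x makes x^3 + x equal 10?
x = goal_seek(lambda x: x ** 3 + x, target=10, lo=0, hi=5)
print(round(x, 6))
```

With more than one unknown, or with something to maximize or minimize, halving an interval is no longer enough, and that is where Solver comes in.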

Sumproduct Is Pretty Cool

2011-05-24T16:17:00.000-07:00

Use this to multiply the corresponding entries of two or more ranges and sum the products in one step.
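The calculation itself is tiny. A Python sketch with made-up quantities and prices:

```python
quantities = [3, 5, 2]
prices = [4.0, 1.5, 10.0]

# Multiply pairwise, then add up: the same idea as =SUMPRODUCT(range1, range2).
total = sum(q * p for q, p in zip(quantities, prices))

print(total)  # 3*4.0 + 5*1.5 + 2*10.0 = 39.5
```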

The Mean Is Related to the Sum of Squared Deviations (Sum of Squares)

2011-05-23T16:34:00.001-07:00

We can calculate deviations (of data) from any value. Then we can square those, and sum them.

The mean is the value that minimizes the sum of squared deviations.
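A brute-force check of that claim, sketched in Python with made-up data: try a grid of candidate values and see which one gives the smallest sum of squares.

```python
data = [2, 4, 4, 5, 10]

def sum_of_squares(center):
    """Sum of squared deviations of the data from any chosen value."""
    return sum((x - center) ** 2 for x in data)

mean = sum(data) / len(data)

# Candidate centers 0.0, 0.1, ..., 12.0
candidates = [c / 10 for c in range(0, 121)]
best = min(candidates, key=sum_of_squares)

print("mean:", mean)
print("candidate with the smallest sum of squares:", best)
```

In a spreadsheet the same experiment can be run by pointing Solver at the sum-of-squares cell and letting it change the candidate value.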

Solver Is an Add-On that Can Solve a Problem for Many Unknowns

2011-05-23T16:32:00.001-07:00

Solver is like Goal Seek on steroids.

Statistical Tests

2011-05-23T16:31:00.000-07:00

There are 4 types of tests (all easy to construct in Excel).

We do two: Wald tests, and likelihood ratio tests.

All Wald tests are constructed by taking the difference between what we observe and what we hypothesize, and dividing that by a "ruler": the standard error. If the result is "big", we know we've found something.
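Here is that recipe as a Python sketch. The sample and the hypothesized mean are invented, and calling a ratio of about 2 or more in absolute value "big" is only the usual rule of thumb.

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.5]
hypothesized_mean = 5.0

observed_mean = statistics.mean(sample)

# The "ruler": the standard error of the sample mean.
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Difference between observed and hypothesized, measured in rulers.
wald_ratio = (observed_mean - hypothesized_mean) / standard_error

print("observed mean:", round(observed_mean, 3))
print("standard error:", round(standard_error, 3))
print("ratio:", round(wald_ratio, 2))
```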

Likelihood ratio tests are usually known at this level as F or chi-square tests.

Relating Variables with Best Fit Lines

2011-05-23T16:29:00.000-07:00

No relationship from X to Y corresponds to a line with zero slope.

Some relationship from X to Y corresponds to a line with non-zero slope.

N.B. When we do regressions, we assume causality goes from X to Y.
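The slope of a best fit line is a short calculation. A Python sketch using the usual least-squares formula, with two made-up Y series, one unrelated to X and one that rises with it:

```python
def slope(xs, ys):
    """Least-squares slope: co-movement of X and Y over the spread of X."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]
flat = [3, 3, 3, 3, 3]     # Y ignores X
rising = [2, 4, 6, 8, 10]  # Y goes up 2 for every 1 in X

print("no relationship:", slope(xs, flat))
print("some relationship:", slope(xs, rising))
```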

Data Has Moments

2011-05-23T16:27:00.000-07:00

Moments are powers of the data.

They can be non-central or centralized (around the mean).

The reason for calculating moments is that they are the unifying calculation behind the mean, variance, skewness, kurtosis, sum of squares, and standard deviation.

These are easy to do in a spreadsheet.
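The same calculations sketched in Python, with made-up data. These are the plain "divide by n" versions; built-in spreadsheet functions may apply sample-size corrections, so their numbers can differ somewhat. The function names are made up for this example.

```python
def raw_moment(data, k):
    """Non-central moment: the average of the data raised to the k-th power."""
    return sum(x ** k for x in data) / len(data)

def central_moment(data, k):
    """Centralized moment: the average k-th power of deviations from the mean."""
    mean = raw_moment(data, 1)
    return sum((x - mean) ** k for x in data) / len(data)

data = [2, 4, 4, 5, 10]

mean = raw_moment(data, 1)
variance = central_moment(data, 2)
std_dev = variance ** 0.5
sum_of_squares = variance * len(data)
skewness = central_moment(data, 3) / std_dev ** 3
kurtosis = central_moment(data, 4) / variance ** 2

for name, value in [("mean", mean), ("variance", variance),
                    ("standard deviation", std_dev),
                    ("sum of squares", sum_of_squares),
                    ("skewness", skewness), ("kurtosis", kurtosis)]:
    print(f"{name}: {value:.4f}")
```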

Stocks Have Fat Tails

2011-05-23T16:25:00.000-07:00

We commonly say stock returns have "fat" tails.

This isn't really true. The distribution has long tails, and broad shoulders, but not really fat tails.

Kurtosis

2011-05-23T16:23:00.001-07:00

Easier to calculate than you may think.

Leptokurtosis (excess) is common in financial and economic data. It means that there are too many observations that are not in the center of the distribution, without being way out in the tails either.

Choose Confidence Intervals or Hypothesis Tests, but Not Both

2011-05-19T18:23:00.000-07:00

Confidence intervals and hypothesis tests use the same pieces of information in different ways.

Doing both might make you think you're getting twice as many results, but you're really getting the same result stated two different ways.

This is scientism.
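A small Python sketch of why the two always agree. It assumes a two-sided test built on the same estimate, the same standard error, and the same critical value (1.96 here) as the interval; the numbers and the function name summarize are made up.

```python
def summarize(estimate, std_error, hypothesized, critical=1.96):
    """Return (test rejects?, hypothesized value outside the interval?)."""
    ratio = (estimate - hypothesized) / std_error
    lower = estimate - critical * std_error
    upper = estimate + critical * std_error
    test_rejects = abs(ratio) > critical
    outside_interval = not (lower <= hypothesized <= upper)
    return test_rejects, outside_interval

# Same estimate and hypothesis, two different standard errors.
print(summarize(5.5, 0.2, 5.0))  # sharp estimate
print(summarize(5.5, 0.4, 5.0))  # fuzzy estimate
```

Both answers in each pair come from the same three ingredients, so they move together.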

You're Not Alone: Everyone's Statistics Understanding Falls Apart Half Way Through the First Semester

2011-05-19T18:21:00.000-07:00

The reason is that you spend the first half learning about different distributions, and then miss the point of the central limit theorem: once you start working with (some) summary statistics of the sample instead of the whole sample, you only need a small handful of those distributions.
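A simulation makes the point concrete. This Python sketch draws from a lopsided exponential distribution (an arbitrary choice for illustration, as are the sample size and seed) and looks only at the sample means.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run can be repeated
sample_size = 50
repetitions = 5_000

# Each repetition: draw a whole sample, keep only its mean.
sample_means = [
    statistics.mean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(repetitions)
]

center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)

# For this distribution the population mean and standard deviation are both 1,
# so theory says the means center on 1 with spread 1 / sqrt(sample_size).
print("center of the sample means:", round(center, 3))
print("spread of the sample means:", round(spread, 3))
print("1 / sqrt(n):", round(1 / math.sqrt(sample_size), 3))
```

The raw data are heavily skewed, but the sample means pile up in a roughly bell-shaped heap around the population mean.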