Thursday, May 26, 2011
Transpose
Under "Paste Special" is an option to change columns to rows and rows to columns when you copy and paste.
Risk Optimizer
This ships with @Risk.
It is a program to simultaneously do Monte Carlo and an optimization routine (like Solver).
It is a program to simultaneously do Monte Carlo and an optimization routine (like Solver).
Wednesday, May 25, 2011
In Regression, More RHS Variables Make Each Estimate Less Sharp
The technical term is multicollinearity.
It's a good thing to start a regression big, but you have to remember to prune it down.
If you don't, what you get is larger standard errors for everything which makes them all look insignificant.
It's a good thing to start a regression big, but you have to remember to prune it down.
If you don't, what you get is larger standard errors for everything which makes them all look insignificant.
Power Curves Are Common
These are related to Pareto distributions ...
Power curves describe a lot of real world phenomenon like the distribution of talent, and tipping points, and the effects of advertising.
N.B. Go read Malcolm Gladwell's The Tipping Point.
Power curves describe a lot of real world phenomenon like the distribution of talent, and tipping points, and the effects of advertising.
N.B. Go read Malcolm Gladwell's The Tipping Point.
Deterministic vs. Stochastic Models
Most Excel models are determinimstic: the inputs don't fluctuate, so the outputs don't either.
But in the real world they both do.
Old School: using scenario manager to vary inputs, or doing it by hand.
New School: use Monte Carlo add-ons like @Risk or Crystal Ball to automate and keep track of simulations that generate probability distributions of outcomes.
But in the real world they both do.
Old School: using scenario manager to vary inputs, or doing it by hand.
New School: use Monte Carlo add-ons like @Risk or Crystal Ball to automate and keep track of simulations that generate probability distributions of outcomes.
Monte Carlo Simulation
Use these to delineate what outcomes are possible and which are not possible.
Tuesday, May 24, 2011
Standardize the Form of Your Models for Solver
Solver doesn't require you to set up your model in any particular way ... but a user is going to have to input target/objective, changing cells/decisions, and constraints, and should be able to find them easily.
How to Evaluate Your Regression Overall
Look at "significance F". This is a p-value for the whole regression, and should be below 0.05.
Regression Modelling Strategy
Old School: start with one X, and add them one by one until happy with the regression. Con: the smaller models that you're building on are probably biased, and may lead you in the wrong direction.
New School: start with all (your available) X's, and delete them one by one until happy with the regression. Con: when you start out your model looks like junk, and may continue to look like junk for a long time as you prune it.
Dr. Tufte prefers the latter method. The risks of omitting variables tend to be worse than those of including irrelevant ones ... so I lean towards using the extra computing power required by the New School method.
Neither method solves the problem of what variable to add/cut first, or whether to do them in groups or one by one.
New School: start with all (your available) X's, and delete them one by one until happy with the regression. Con: when you start out your model looks like junk, and may continue to look like junk for a long time as you prune it.
Dr. Tufte prefers the latter method. The risks of omitting variables tend to be worse than those of including irrelevant ones ... so I lean towards using the extra computing power required by the New School method.
Neither method solves the problem of what variable to add/cut first, or whether to do them in groups or one by one.
Better Understanding of Regression Output
Luke learned more about interpreting t-ratios, p-values, r-squared, and adjusted R-squared.
Regression Tool
1) It has most of what you need.
2) It includes stuff you may not need, or may use inappropriately.
2) It includes stuff you may not need, or may use inappropriately.
Solver was Pretty Cool Too
Goal seek is great if you need to find one unknown (as in an algebra problem).
Use Solver when you need to 1) find more than one unknown, or 2) optimize (as in calculus).
Use Solver when you need to 1) find more than one unknown, or 2) optimize (as in calculus).
Monday, May 23, 2011
The Mean Is Related to the Sum of Squared Deviations (Sum of Squares)
We can calculate deviations (of data) from any value. Then we can square those, and sum them.
The mean is the value that minimizes the sum of squared deviations.
The mean is the value that minimizes the sum of squared deviations.
Solver Is an Add-On that Can Solve a Problem for Many Unknowns
Solver is like Goal Seek on steroids.
Statistical Tests
There are 4 types of tests (all easy to construct in Excel).
We do two: Wald tests, and likelihood ratio tests.
All Wald tests are constructed by taking the difference between what we observe and what we hypothesize, and dividing that by a "ruler": the standard error. If the result is "big", we know we've found something.
Maximum likelihood tests are usually known at this level as F or chi-square tests.
We do two: Wald tests, and likelihood ratio tests.
All Wald tests are constructed by taking the difference between what we observe and what we hypothesize, and dividing that by a "ruler": the standard error. If the result is "big", we know we've found something.
Maximum likelihood tests are usually known at this level as F or chi-square tests.
Relating Variables with Best Fit Lines
No relationship from X to Y corresponds to a line with zero slope.
Some relationship from X to Y corresponds to a line with non-zero slope.
N.B. When we do regressions, we assume causality goes from X to Y.
Some relationship from X to Y corresponds to a line with non-zero slope.
N.B. When we do regressions, we assume causality goes from X to Y.
Data Has Moments
Moments are powers of the data.
They can be non-central or centralized (around the mean).
The reason for calculating moments, is that they are the unifying calculation for mean, variance, skewness, kurtosis, sum of squares, and standard deviations.
These are easy to do in a spreadsheet.
They can be non-central or centralized (around the mean).
The reason for calculating moments, is that they are the unifying calculation for mean, variance, skewness, kurtosis, sum of squares, and standard deviations.
These are easy to do in a spreadsheet.
Stocks Have Fat Tails
We commonly say stock returns have "fat" tails.
This isn't really true. The distribution has long tails, and broad shoulders, but not really fat tails.
This isn't really true. The distribution has long tails, and broad shoulders, but not really fat tails.
Kurtosis
Easier to calculate than you may think.
Leptokurtosis (excess) is common in financial and economic data. It means that there are too many observations that are not in the center of the distribution, without being way out in the tails either.
Leptokurtosis (excess) is common in financial and economic data. It means that there are too many observations that are not in the center of the distribution, without being way out in the tails either.
Thursday, May 19, 2011
Choose Confidence Intervals or Hypothesis Tests, but Not Both
Confidence intervals and hypothesis tests use the same pieces of information in different ways.
Doing both might make you think you're getting twice as many results, but you're really getting the same result stated two different ways.
This is scientism.
Doing both might make you think you're getting twice as many results, but you're really getting the same result stated two different ways.
This is scientism.
You're Not Alone: Everyone's Statistics Understanding Falls Apart Half Way Through the First Semester
The reason is that you spend the first half learning about different distributions, and you miss that the central limit theorem implies that you only need a small handful when you start working with (some) summary-statistics-about-the-sample instead of the whole sample.
The Distribution of Averages Is Different (and Better) than the Distribution of Samples
This is the central limit theorem.
Advanced Filtering with "Ors"
Excel can't do "ors" for filtering unless you do advanced filtering with a criteria.
The Sound of Popcorn Popping Is the Central Limit Theorem
Microwave popcorn works (and many other things work) because their behavior is explained by the Central Limit Theorem.
Wednesday, May 18, 2011
Excel Can Operate as a Primitive Database
Many times we keep data in Excel that really ought to be in a database program.
Excel has tools for dealing with this: sorting, filtering, pivot tables, and so on.
Excel has tools for dealing with this: sorting, filtering, pivot tables, and so on.
Goal Seek
Goal seek allows you to solve for an unknown value in your spreadsheet such that it causes some other cell to take on a specific value. We introduced this for break-even analysis, and then used it to calculate internal rate of return.
Scenario Manager
You can keep track of sets of inputs as a group, that you can swap in and out with other sets of inputs.
Scientism
I'm doing scientism!
Scientism is using a surface veneer of numbers, jargon, and scientific sounding claims to do something that isn't really science ... basically, you're hoping to shut down discussion by making people think you have serious/hard science backing you up.
Funny chart.
Scientism is using a surface veneer of numbers, jargon, and scientific sounding claims to do something that isn't really science ... basically, you're hoping to shut down discussion by making people think you have serious/hard science backing you up.
Funny chart.
Tuesday, May 17, 2011
You Can Track Changes In Your Spreadsheets
Dr. Tufte used "track changes" to keep track of the edits that each of us made to each others' exams during rotations.
Use the Formula Evaluation Tool
The formula evaluation tool will break down your (more complex) formula into single steps, so that you can find the one that is the problem.
You Can Name Ranges as Opposed to Cells
If all information in a column (or row) comes from the same data source, you can name the entire column (or row). This makes doing averages, sums, and so on, easier.
Table of Contents for Workbooks
Why not make one sheet the Table of Contents for a multi-sheet workbook? You can use hyperlinks to make this more functional.
Hyperlinks
You can use hyperlinks inside a spreadsheet to navigate from one sheet or cell to another.
Trace Precedents and Trace Descendents
For checking formulas, on the Formula tab of the ribbon, are 2 tools to trace how a formula is related to other cells.
Don't Always Protect Your Sheet
It's very common to protect more cells than is appropriate, and interfere with the ability of users to use your spreadsheet at all.
Protect Your Sheet
If downstream users will be using your spreadsheet, protect them with a password, and lock or hide cells as appropriate.
P.S. Every cell is locked by default, so as soon as you set a password, that lock becomes effective.
P.S. Every cell is locked by default, so as soon as you set a password, that lock becomes effective.
Monday, May 16, 2011
Sometimes Less Is More
You don't have to use every tool you know in every spreadsheet to get a decent working answer.
Subtotal Is Useful
=Subtotal() does subtotalling like you'd do by hand, unlike =sum(), in that it doesn't double-count earlier subtotals.
It also has a lot of extra functionality besides summing.
It also has a lot of extra functionality besides summing.
R1C1 Addressing
Conventionally, we label cells in Excel by their column letter and row number. This works great for most situations.
But, it works poorly if we have formulas in different cells that are repetitive; especially if they mix absolute and relative addresses.
Excel has an alternative addressing scheme for this, that you can turn on through the Formulas tab on the Option menu under "File" on the ribbon. It's called R1C1. R1C1 addresses state all formulas in terms of rows or columns from the current position. This is extremely useful for repetitive formulas becuase oftentimes you are referencing cells that are the same number of rows and columns from the cell where the formula is: so that all the formulas look just about the same.
But, it works poorly if we have formulas in different cells that are repetitive; especially if they mix absolute and relative addresses.
Excel has an alternative addressing scheme for this, that you can turn on through the Formulas tab on the Option menu under "File" on the ribbon. It's called R1C1. R1C1 addresses state all formulas in terms of rows or columns from the current position. This is extremely useful for repetitive formulas becuase oftentimes you are referencing cells that are the same number of rows and columns from the cell where the formula is: so that all the formulas look just about the same.
Cells Can Have (Plain English) Names
Formulas can be hard to understand if they are written in terms of cell addresses.
You can give any cell a more natural name using the Name Manager under the Formulas tab of the ribbon.
You can also name ranges. A set of contiguous cells is called a range in Excel.
I didn't cover this in class. But, an example of where I use this is in my gradebooks. I might put all the exam scores for a large class in AC11.AC65. If I highlight that range and call it Exam, then when I want a class average I can type average(exam) instead of average(AC11.AC65). It's less cumbersome, and there's less room for error.
You can give any cell a more natural name using the Name Manager under the Formulas tab of the ribbon.
You can also name ranges. A set of contiguous cells is called a range in Excel.
I didn't cover this in class. But, an example of where I use this is in my gradebooks. I might put all the exam scores for a large class in AC11.AC65. If I highlight that range and call it Exam, then when I want a class average I can type average(exam) instead of average(AC11.AC65). It's less cumbersome, and there's less room for error.
Save Macros Into Templates
Any macro you write (to save time) can be included in a template you use to create new files so that it is there when you need it.
Make and Use Templates
You can, and should, create a template for everyday use (and for any other repetitive type of file you create).
A template is a "base" file that contains whatever you think you'll need when you create a new files. Some ideas are 1) a font choice, 2) an appropriate number of blank sheets, 3) any macros you may need, and 4) the buttons on the Quick Launch Toolbar that you like.
A template is a "base" file that contains whatever you think you'll need when you create a new files. Some ideas are 1) a font choice, 2) an appropriate number of blank sheets, 3) any macros you may need, and 4) the buttons on the Quick Launch Toolbar that you like.
Comments that Aren't Comments
Regular comments can be viewed by anyone: both the developer and the user of the spreadsheet.
You can make a comment that is much harder for someone else to read by putting
You can make a comment that is much harder for someone else to read by putting
+n("Your comment here.")after the formula in that cell. This comment won't show at all unless someone edits your formula.
Wednesday, May 11, 2011
Spreadsheets Have a Longer Lifetime Than We Expect When We Write Them
Ed's spreasheet was written 6 years ago, in Alberta, for a job he no longer has, and is still being used.
We Have a High Opinion of Our Own Spreadsheets
We're all above average drivers, and authors of spreadsheets. This is a problem.
We All Use Models
Many people don't admit that they have a mental model, or that it could be improved by writing it down.
Tuesday, May 10, 2011
Group Inputs and Outputs Together
Inputs certainly at the top.
Outputs either at the bottom, or both at the bottom and at the top.
Outputs either at the bottom, or both at the bottom and at the top.
Break Formulas Into Smaller Pieces (Modularization)
Try to make every formula result simple enough so that you can check it by eye or by hand.
No Numbers In Formulas
Do not bury a number inside a cell with a formula (a constant might be OK).
Instead, put the number in its own cell, and reference it in the formula with a cell address.
Instead, put the number in its own cell, and reference it in the formula with a cell address.
Choose Your Font Like You Care
Consolas ships with Windows Vista and 7 and is designed for spreadsheets.
Subscribe to:
Posts (Atom)