Channel: Tutorials – Win-Vector Blog
Browsing all 302 articles


Minimal Version Control Lesson: Use It

There is no excuse for a digital creative person to not use some sort of version control or source control. In the past disk space was too dear, version control systems were too expensive and software...


What does a generalized linear model do?

What does a generalized linear model do? R supplies a modeling function called glm() that fits generalized linear models (abbreviated as GLMs). A natural question is what does it do and what problem is...


How robust is logistic regression?

Logistic Regression is a popular and effective technique for modeling categorical outcomes as a function of both continuous and categorical variables. The question is: how robust is it? Or: how robust...


Level fit summaries can be tricky in R

Model level fit summaries can be tricky in R. A quick read of model fit summary data for factor levels can be misleading. We describe the issue and demonstrate techniques for dealing with it. When...


Rudie can’t fail (if majorized)

We have been writing for a while about the convergence of Newton steps applied to a logistic regression (See: What does a generalized linear model do?, How robust is logistic regression? and...


Error Handling in R

It’s often the case that I want to write an R script that loops over multiple datasets, or different subsets of a large dataset, running the same procedure over them: generating plots, or fitting a...


Win-Vector’s Nina Zumel: “I Write, Therefore I Think”

Check out: I Write, Therefore I Think


More on ROC/AUC

A bit more on the ROC/AUC. The receiver operating characteristic curve (or ROC) is one of the standard methods to evaluate a scoring system. Nina Zumel has described its application, but we would like...


Revisiting Cleveland’s The Elements of Graphing Data in ggplot2

I was flipping through my copy of William Cleveland’s The Elements of Graphing Data the other day; it’s a book worth revisiting. I’ve always liked Cleveland’s approach to visualization as statistical...


Don’t use correlation to track prediction performance

Using correlation to track model performance is “a mistake that nobody would ever make” combined with a vague “what would be wrong if I did do that” feeling. I hope after reading this you feel at least a...


A bit more on sample size

In our article What is a large enough random sample? we pointed out that if you wanted to measure a proportion to an accuracy “a” with chance of being wrong of “d” then an idea was to guarantee you had...


Worry about correctness and repeatability, not p-values

In data science work you often run into cryptic sentences like the following: Age adjusted death rates per 10,000 person years across incremental thirds of muscular strength were 38.9, 25.9, and 26.6...


Bayesian and Frequentist Approaches: Ask the Right Question

It occurred to us recently that we don’t have any articles about Bayesian approaches to statistics here. I’m not going to get into the “Bayesian versus Frequentist” war; in my opinion, which style of...


Estimating rates from a single occurrence of a rare event

Elon Musk’s writing about a Tesla battery fire reminded me of some of the math related to trying to estimate the rate of a rare event from a single occurrence of the event (plus many non-event...


Resolving git “pseudo conflicts”

I strongly advise using version control, and usually recommend using git as your version control system. Usually I feel a bit guilty about this advice as git is so general that it is more of a toolkit...


Sample size and power for rare events

We have written a bit on sample size for common events, we have written about rare events, and we have written about frequentist significance testing. We would like to specialize our sample size...


Unit tests as penance

It recently hit me that I see unit tests as a form of penance (in addition to being a great tool for specification and test driven development). If you fix a bug and don’t add a unit test I suspect you...


Generalized linear models for predicting rates

I often need to build a predictive model that estimates rates. The example of our age is: ad click-through rates (how often a viewer clicks on an ad estimated as a function of the features of the ad...


Unspeakable bets: take small steps

I was watching my cousins play Unspeakable Words over Christmas break and got interested in the end game. The game starts out as a “spell a word from cards and then bet some points” game, but in the end...


The Extra Step: Graphs for Communication versus Exploration

Visualization is a useful tool for data exploration and statistical analysis, and it’s an important method for communicating your discoveries to others. While those two uses of visualization are...




