Channel: Tutorials – Win-Vector Blog
Browsing latest articles
Browse All 302 View Live

Cross-Methods are a Leak/Variance Trade-Off

We have a new Win Vector data science article to share: “Cross-Methods are a Leak/Variance Trade-Off” by John Mount (Win Vector LLC) and Nina Zumel (Win Vector LLC), March 10, 2020. We work some exciting...





Keep Calm and Use vtreat (in R and in Python)

A big thank you to Dmytro Perepolkin for sharing a “Keep Calm and Use vtreat” poster! Also, we have translated the Python vtreat steps from our recent “Cross-Methods are a Leak/Variance Trade-Off”...


Use the Same Cross-Plan Between Steps

Students have asked me if it is better to use the same cross-validation plan in each step of an analysis or to use different ones. Our answer is: unless you are coordinating the many plans in some way...


Free Coupon for our R Video Course: Introduction to Data Science

For all our remote learners, we are sharing a free coupon code for our R video course Introduction to Data Science. The code is ITDS2020, and can be used at this URL...



Version Control is a Time Machine That Translates Common Hindsight Into...

For data science projects I recommend using source control or version control, and committing changes at a very fine level of granularity. This means checking in possibly broken code, and the possibly...



Re-Share: vtreat Data Preparation Documentation and Video

I would like to re-share the vtreat (R version, Python version) data preparation documentation for machine learning tasks. vtreat is a system for preparing messy real world data for predictive modeling...


wrapr 2.0.0 up on CRAN

wrapr 2.0.0 is now up on CRAN. This means the := variant of unpack[] is now easy to install. Please give it a try!


R Tip: How To Look Up Matrix Values Quickly

R is a powerful data science language because, like MATLAB, NumPy, and pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely...




Y-Conditionally Regularized Neural Nets

Win Vector LLC’s Dr. Nina Zumel has had great success applying y-aware methods to machine learning problems, and working out the detailed cross-validation methods needed to make y-aware procedures...




Imputing Out of Mixtures, or Un-Stirring Spicy Soup

Here is a fun combinatorial puzzle. I’ve probably seen this used to teach before, but let’s try to define or work this one from memory. I would love to hear more solutions/analyses of this problem....




