Cross-Methods are a Leak/Variance Trade-Off
We have a new Win Vector data science article to share: "Cross-Methods are a Leak/Variance Trade-Off," by John Mount (Win Vector LLC) and Nina Zumel (Win Vector LLC), March 10, 2020. We work some exciting...
Keep Calm and Use vtreat (in R and in Python)
A big thank you to Dmytro Perepolkin for sharing a “Keep Calm and Use vtreat” poster! Also, we have translated the Python vtreat steps from our recent “Cross-Methods are a Leak/Variance Trade-Off”...
Use the Same Cross-Plan Between Steps
Students have asked me if it is better to use the same cross-validation plan in each step of an analysis or to use different ones. Our answer is: unless you are coordinating the many plans in some way...
Free Coupon for our R Video Course: Introduction to Data Science
For all our remote learners, we are sharing a free coupon code for our R video course Introduction to Data Science. The code is ITDS2020, and can be used at this URL...
Version Control is a Time Machine That Translates Common Hindsight Into...
For data science projects I recommend using source control or version control, and committing changes at a very fine level of granularity. This means checking in possibly broken code, and the possibly...
Re-Share: vtreat Data Preparation Documentation and Video
I would like to re-share the vtreat (R version, Python version) data preparation documentation for machine learning tasks. vtreat is a system for preparing messy real world data for predictive modeling...
wrapr 2.0.0 up on CRAN
wrapr 2.0.0 is now up on CRAN. This means the := variant of unpack[] is now easy to install. Please give it a try!
R Tip: How To Look Up Matrix Values Quickly
R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely...
Y-Conditionally Regularized Neural Nets
Win Vector LLC’s Dr. Nina Zumel has had great success applying y-aware methods to machine learning problems, and working out the detailed cross-validation methods needed to make y-aware procedures...
Imputing Out of Mixtures, or Un-Stirring Spicy Soup
Here is a fun combinatorial puzzle. I’ve probably seen this used to teach before, but let’s try to define or work this one from memory. I would love to hear more solutions/analyses of this problem....