Namaste Canada Musings — Chapter 1

**Introduction**

My wife and I recently moved to Canada from the USA amidst the government-imposed lockdown and border closure restrictions due to COVID-19. To avoid the complications associated with air travel, we decided to drive from our nest in Ohio to our new home in Alberta, Canada. We were familiar with the landscape in the USA but were setting foot on Canadian soil for the first time, so naturally we were ambivalent. Today, we have spent six months in Canada after crossing the Pembina (USA)/Emerson (Canada) border, and in this post…

Hypotheses are our assumptions about the data, which may or may not be true. In this post we’ll discuss the statistical process of evaluating whether a hypothesis is true; this process is known as hypothesis testing.

Most statistical analysis has its genesis in comparing two types of distributions: the population distribution and the sample distribution. Let’s understand these terms through an example. Suppose we want to statistically test our hypothesis that *on average, the performance of students in a standard aptitude test has improved in the last decade*. We’re given a dataset containing the marks (maximum marks…

In this post we’ll build an intuitive understanding of descriptive statistics, including the mean and standard deviation, and of inferential statistics, including the standard error of the mean and confidence intervals. We’ll also develop an understanding of the central limit theorem in the process. The R code used for generating examples in this post is available here.
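As a rough illustration of the quantities named above, here is a minimal Python sketch (not the post’s R code). The function name `summarize`, the simulated marks, and the use of the normal-approximation multiplier 1.96 for a 95% interval are all assumptions made for this example.

```python
import math
import random
import statistics


def summarize(sample, z=1.96):
    """Return descriptive and inferential summaries of a numeric sample.

    z=1.96 gives an approximate 95% confidence interval under a
    normal approximation (an assumption of this sketch).
    """
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)            # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(n)                  # standard error of the mean
    ci = (mean - z * sem, mean + z * sem)    # approximate confidence interval for the mean
    return {"n": n, "mean": mean, "sd": sd, "sem": sem, "ci": ci}


# Hypothetical population of marks out of 100 (simulated, not real data).
rng = random.Random(42)
population = [min(100.0, max(0.0, rng.gauss(60, 15))) for _ in range(10_000)]

# Draw one random sample of 100 marks and summarize it.
sample = rng.sample(population, 100)
print(summarize(sample))
```

Repeating the sampling step many times and plotting the resulting sample means is one way to see the central limit theorem at work: the means cluster around the population mean with a spread close to the standard error.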

Let’s begin our journey by assuming that we have N = 10,000 students in a country who studied Physics in 10th grade. We noted the marks they obtained, out of 100, in their final exam, and the histogram (with intervals of 10) of these marks is…

How to select the best model from the available options?

Statistical performance measures are often used for model selection in machine learning and statistical inference. From multiple models trained with different sets of hyper-parameters and parameters, the one that gives the best performance on a selected performance criterion is finally adopted. Let’s understand the commonly used performance metrics through an example so that we can choose one of them for our model selection.
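To make the idea of a performance criterion concrete, here is a small Python sketch that computes a few widely used metrics for a binary classifier. The choice of accuracy, precision, recall, and F1 is an assumption for illustration (the post may cover a different set), and the labels below are made up.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for 0/1 labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Computing the same dictionary for each candidate model and comparing the metric that matters most for the application is one simple way to pick among them.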

Let’s assume that we have a pregnancy test that gives us binary results: positive (or 1) if the tested individual is pregnant, and negative (or…

Applied machine learning researcher | Writer | Fitness Enthusiast | Book lover