# Normalization and Standardization

The nice thing about being delayed for 3 hours at the airport is that it gives you time to catch up on reading and to write blog posts. And the airline feeds you snacks. And you can share the snacks with the little bird who is flying around inside the terminal. My new sparrow friend, I call him Peanut, liked the crust of my PBJ and also seems to enjoy Cheetos (the crunchy ones, not the puffs).

So, before I give tiny Peanut (or not so tiny Me) diabetes, I think this is a good time to put away the little bag of cookies and discuss two things we can do to our variables prior to analysis, normalization and standardization.

Why would we do this to our data? Because sometimes we want to compare data measured in different units, to more accurately or easily see the differences. We normalize or standardize to take away the units of measure and just look at the magnitude of the measurements on one uniform scale. Normalization and standardization allow us to take apples and oranges and compare them like they are both the same fruit.

Mathematically, normalization and standardization are needed when measurements are being compared via Euclidean distance. I’ll let you research the math stuff on your own.

An example for using normalization or standardization would be comparing test scores on two different tests, say, an English test that has a range of scores from 50 to 250 and a math test that has a range of scores from 200 to 400. If we were to leave the scores as is and compare students’ scores on each test to see which test they performed better on, then of course almost everyone would do better at math.

And we know not everyone does better at math! So, we need to scale the scores in a way that will allow us to compare the scores on an even playing field.

Normalization

You will see the word normalization used for many different approaches to data transformation. In this post, I am describing the normalization technique of “feature scaling” which is used to make all of the raw data values fit into a range between 0 and 1.

And since statisticians have at least three names for everything, it is also called “min-max” normalization and “unity” normalization.

To normalize a variable to a range between 0 and 1 you need the lowest value and the highest value of the measurements on the variable and then use a simple formula to use on each measurement:

In words:

Normalized Measurement = (original measurement – minimum measurement value) divided by (maximum measurement value – minimum measurement value)

There numerous techniques for normalizing variables. A few are normalizing within a range of (-1 to 1), and mean normalization. And you might want to check out the coefficient of variation too.

Standardization

Standardization is a type of feature scaling and you may even hear it referred to as normalization. The formula below will transform your data so that the mean will be 0 and the standard deviation will be 1 (called a unit standard deviation).

In words:

Standardized Measurement = (original measurement – mean of the variable) divided by (standard deviation of the variable)

When to Normalize? When to Standardize?

In many cases you can use normalization or standardization to scale your variables. However, if you have many outliers, normalization will not show outliers as well, because all data is scaled between two numbers (0,1).

On the other hand, standardization does not have any constraint on the resulting range of numbers. So, although in a normal distribution we would like the range of numbers to be between -3 and 3, they don’t have to be…so you will see outliers (such as values of say, -5 or 3.4, etc.) more easily.

Also, standardization makes it easy to see if a particular measurement is above or below the mean because negative numbers will be below the mean of 0 and positive numbers will be above the mean of 0.

And retaining the spread using standardization allows one to assign probabilities to measurements, and percentiles, while taking into account the spread of the data

References:

https://www.statisticshowto.datasciencecentral.com/normalized/

Standardization vs. normalization

# Propensity Scores vs. Regression Adjustment for Observational Studies

Randomized controlled trials (RCTs) are considered the gold standard approach for estimating treatment effects. However, not all clinical research involves randomization of subjects into treatment and control groups. These studies are commonly referred to as non-experimental or observational studies. Some examples of non-experimental (observational) studies include:

• Comparing a treatment group with a group of historical controls
• Subjects pick the treatment they desire, hence, they self-select into a particular group
• Subjects are compared on a variable that cannot be randomized, such as gender, race/ethnicity, drug use (yes/no)
• Subjects are retrospectively pulled from a large dataset for review.
• In observational studies, the treatment selection is influenced by the characteristics of subjects. Therefore, any differences between groups are not randomized out, and the baseline characteristics of the subjects could differ between treatment and control groups. In essence, the baseline characteristics systematically differ and we must find a way to account for these systematic baseline differences.

Researchers often use regression adjustments to account for baseline differences between groups. A regression adjustment is made by using one or more (usually more) variables obtained at baseline as predictors, and one dependent variable of the treatment outcome. Using a regression adjustment to investigate baseline differences in observational studies has the following issues:

• It is difficult to determine whether the model specification in a regression is the correct one to use. A researcher cannot reliably measure whether the variables he or she chooses are indeed the correct ones to use to control for the systematic baseline differences between groups. Model diagnostics such as the model R-squared of a multiple linear regression gives an indication of how well the predictors “predict” the outcome, but knowing how well a model fits as it relates to an outcome doesn’t tell us whether the model chosen actually included the predictors related to systematic baseline differences.
• Using a regression model with the treatment as an outcome introduces researcher bias. This is because it can be very tempting for researchers to try different model specifications to get the model they desire. For instance, a researcher might, in good faith, start with a model that includes baseline variables that he or she believes are different between groups. Then, when the findings of the regression model indicate no significant effect on the outcome, or the model R-squared is too low, the researcher will change or add predictors to enhance the model. Not a good idea. And as noted above, it is very tempting to do.
• Propensity scores, and matching subjects from each of the study groups using propensity scores, are constructed without taking the treatment outcome into consideration. The use of propensity scores keeps the researcher’s attention on baseline characteristics only. However, once the subjects are scored and matched (defined as balanced), a regression model can be analyzed to further adjust for any residual imbalance between the groups. So regression models still have their use! But they are used after the propensity scoring and matching.

Propensity scores have the added benefit of allowing a researcher to see the actual amount of overlap, or lack of it, between treatment groups. After propensity scores are assigned to each individual in each group, then the researcher uses the scores to “match” pairs of subjects in the treatment with subjects in the control group. The easiest way to match is with a one-to-one match: one treatment subject to one control subject. But some, more-advanced matching techniques can match one-to-many.

After matching, the researcher can see if there are many unmatched individuals left over (indicative of large differences in baseline characteristics between the groups). A large difference between groups might not just indicate that the treatment and control groups differed at baseline, but that the differences between groups might be too large to assess any meaning on the outcome of the intervention. After all, the reason for propensity score matching is to derive groups that simulate equal baseline characteristics. If they can’t be matched, they were just not similar. Hence, treatment efficacy cannot be derived or established.

Here is a very simple example of the use of propensity scores and matching for a non-experimental study:

A study is performed to assess the treatment effects of two analgesic drugs given to patients presenting to an emergency room with severe cluster headaches. The type of drug given, A or B, is decided upon various factors such as the time span of the current headache, frequency of headaches, age of the patient, and various comorbidities. Thus, the patients were not randomized into the two drug treaments.

This non-randomization is also called “selection bias”. The clinician selected the treatment to give to each patient. If more than one clinician was involved in the decision making of treatment, we should control for this also! Perhaps some clinicians like one treatment over the other.

The outcome is time to pain management. And looking at the data without any adjustment, Drug A appears to relive pain significantly faster than Drug B. But, maybe this is not the case. Something else could be at work here, or maybe there isn’t a difference between the drugs at all. Or maybe the patients in Group A (patients who took drug A) are much too different from the patients in Group B (patients who took Drug B) to make any assessment of efficacy.

So, we will develop a propensity score for each patient based on the covariates we believe (or better, know from our knowledge and the literature). The propensity score, let’s call it Z, predicts how likely a person is to get Drug A.

• We assume that the likelihood of a person receiving Drug A is very similar for all people with the same propensity score Z.
• We then group people with similar propensity scores between the two groups of patients, such that patients with, say, Z = .30 in Group A are matched with patients with Z = .30 in Group B.

• Then we can run tests on matched groups of patients to test treatment efficacy. With propensity score matching, we’ve removed some of the effects of baseline differences, and now we have something close to tiny RCT’s.

Nothing is as good as the RCT, but I hope I’ve opened up your thinking a bit to the use of propensity scores in observational studies. There are many ways to score and match and analyze observational study groups. A good reference to start with is the article by Austin (2011) listed below. Rosenbaum and Rubin (1983) wrote the seminal work on propensity scores, but even I think it is a bit too theory heavy. But it is also listed in the references below if you are so inclined.

Not everyone likes propensity scores for matching cases. The article by King and Nielsen (2016, also referenced below) presents some limitations in propensity score matching and some remedies for when many individual cases remain after the matching attempt.
Stata has a function for tseffects for obtaining propensity scores, and the function of psmatch for propensity score matching. You can also run post-estimation regression with the functions.

For R fans, here is a nice tutorial on propensity score matching. Not as nice as the Stata code, but hey, it’s free! http://pareonline.net/pdf/v19n18.pdf

References

Austin, Peter. (2011). An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate behavioral research. 46. 399-424. 10.1080/00273171.2011.568786.

R. Rosenbaum, Paul & B Rubin, David. (1983). The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika. 70. 41-55. 10.2307/2335942.

Not everyone likes propensity scores for matching:
Gary King and Richard Nielsen. Working Paper. “Why Propensity Scores Should Not Be Used for Matching”. Copy at http://j.mp/2ovYGsW