## STATISTICS Course

# Converting a variable in R

Sometimes, your variables are qualitative (because you know it), but R didn't detect it as qualitative, so you would have to convert it.

## Changing the type

You should remember that R is proposing functions like `as.type(x)`

to convert a variable $x$ to the type $type$ (as.integer(), as.Date() etc.).

## Weighted data

If you ever have two vectors, one for the values $v$ and another one for the probabilities $p$ then you may use

```
library('questionr')
wtd.mean(v, p)
wtd.var(v, p)
# base package
weighted.mean(v,p)
```

## Qualitative to Quantitative

Let's say "qual" is a qualitative variable of $v$, then $as.integer(v$qual)$ created a quantitative variable (but $qual is unchanged, we are usually keeping the old variable and create a new one).

```
data(Puromycin)
v <- Puromycin
str(v)
# 'data.frame': 23 obs. of 3 variables:
# ...
# $ state: Factor w/ 2 levels "treated","untreated": 1 1 1 1 1 1 1 1 1 1 ...
v$state_quant <- as.integer(v$state)
```

## Quantitative to Qualitative (simple case)

Simply use the factor function `factor(ech$quant)`

, with `levels(ech$qual)`

/`table(ech$qual)`

to see the different kinds of values of a qualitative variable.

```
# converting back v$state_quant
v$state_qual <- factor(v$state_quant, levels = c(1,2), labels = c("treated", "untreated"))
```

## Quantitative to Qualitative (Unsupervised discretization)

If you have too many levels (different values), then you may want something else than the simple factor. For instance, let's say you got a variable "Age" and you want to make a qualitative variable to make some groups by age.

```
library('arules')
# create groups of n values
discretize(qual, method = "frequency", breaks = n)
# split in n interval having the same size
discretize(qual, method = "interval", breaks = n)
# put the values in a group with the ones near them
discretize(qual, method = "cluster", breaks = n)
```

## Quantitative to Qualitative (Supervised discretization)

This is another alternative in which we are grouping the values according to a qualitative criterion.

- chiM

```
library('discretization')
# if the values are near (epsilon=0.05), then they are in the
# same group
chiM(ech$quant, alpha = 0.05)
```

- cut

```
# cut, v is a vector like
# (1,3,5) will split the distribution
# in [min,1] U ]1,3] U ]3,5]
cut(x, breaks = v, include.lowest = TRUE)
```

- make.groups

```
library('lattice')
# check the examples/documentation
g <- make.groups(name=v, ...)
# levels
g$which
# values
g$data
# the result is something like this
g
# data which
# uniform1 0.2988667 uniform
# uniform2 0.5579879 uniform
# exponential1 2.1288421 exponential
# exponential2 0.7936762 exponential
# lognormal1 0.6568099 lognormal
# lognormal2 1.8459960 lognormal
```