Things not to do in R

Mar 1, 2026

I have written a few posts over the years about how to do various things in R. When talking about how to do things, you often provide examples of what to avoid, but just as often the things to avoid end up being implicit or left to the reader to infer. For example, when you describe how to save data in an object, you do not necessarily say how you should not save data in an object.

In this post I will share a list of things not to do in R. However, as usual, a few caveats. First, a lot of these ideas are related to reproducibility and might not apply if you just want to do something quick and dirty in R. In other words, do whatever you need to do to get things up and running (and do not let the perfect code stand in the way of productivity).

Second, the list is to a large extent opinionated and I encourage people to disagree (it is usually better to disagree for a reason than to agree for no reason). The list is driven by what I have seen and (alas) done over the years, my favourite linters, etc., and it is by no means exhaustive. To get familiar with good coding practices in R, you can also check out a package like {lintr}. However, the most recent version of the package is bad (especially in my VS Code setup), so I also encourage people to check out the promising new tool, jarl (it works nicely in my R setup in Neovim).

Third, I will only be talking about things to do within R, and not make points about, say, "Use X instead of R for Y" where X could be Python and Y could be ML in production. There are a lot of things that apply to most languages and not only R, such as not setting your random seed to 42, but here I focus on things that are primarily relevant for R.

The idea is that - if you already know R - one way to get better is to get rid of bad habits. I could most likely have categorised some of these things into general guidelines such as "Do not make assumptions", but my assumption is that it will be easier to learn a thing or two if I go through specific examples rather than a few abstract ideas.

1. Do not use rm(list = ls())

I often encounter scripts using rm(list = ls()). There is no good reason to do this. As Jenny Bryan wrote in a blog post many years ago: "The problem is that rm(list = ls()) does NOT, in fact, create a fresh R process. All it does is delete user-created objects from the global workspace."

Accordingly, if you rely on rm(list = ls()), there are most likely bigger problems with your workflow and you should look into material on how to set up a reproducible workflow in R. In most cases, and unless you have very good reasons, you will not need to use rm() at all.
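
To make the quote concrete, here is a minimal sketch of what survives rm(list = ls()). The object name, the changed option, and the use of {stats4} (a package that ships with R, chosen only as a stand-in for any attached package) are assumptions for illustration:

```r
# Simulate a "dirty" session: an object, a changed option, an attached package.
scratch <- 1:3
options(digits = 4)
library("stats4")  # ships with R; stand-in for any attached package

rm(list = ls())

# The user-created object is gone ...
object_survived <- exists("scratch", inherits = FALSE)

# ... but the changed option and the attached package are still there.
digits_after <- getOption("digits")
package_still_attached <- "package:stats4" %in% search()
```

Only the first of the three changes is undone, which is why restarting R is the more reliable way to get a clean session.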

2. Do not mix pipes

Since R 4.1.0 you have been able to use the base pipe, |>, in your code. However, it is likely that you - if you have used R for many years - rely on the magrittr pipe, %>%, in some scripts. Accordingly, you might copy some old code using the magrittr pipe into your script where most of the code is using the base pipe. Another possibility is that you will copy code from ChatGPT that relies on the magrittr pipe (due to the fact that it has been trained on a lot of older R code).

This is not good practice. Make sure to rely on one pipe in your script (most likely the base pipe), and only introduce other pipes if you need features beyond what the base pipe can offer. It should be clear to the reader of your code that you only rely on different pipes if absolutely necessary for the code to work. For information on the differences between the pipes, check out this post by Hadley Wickham. If you use the {lintr} package, you can use the following linter to help you out: pipe_consistency_linter(pipe = "|>").
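
As a rough sketch of how far the base pipe gets you on its own, the example below uses only base R. It assumes R 4.2.0 or later for the `_` placeholder, and the object names are made up for illustration:

```r
# Simple chains look the same with either pipe.
n_four_cyl <- mtcars |>
    subset(cyl == 4) |>
    nrow()

# Where magrittr code would pass the data to a later argument with `.`,
# the base pipe uses `_` (R >= 4.2.0), which must be given to a named argument.
fit <- mtcars |>
    lm(mpg ~ wt, data = _)
```

If a script needs nothing beyond this, there is little reason to bring in a second pipe.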

3. Do not use = for assignment

If you are primarily familiar with Python, chances are that you are using = as an assignment operator in R as well. In R, however, we use <- as an assignment operator, not =.

# Bad
x = 2

# Good
x <- 2

The reasons are primarily for consistency and conventions, but the best argument for using <- is the explicit distinction between assignments and arguments.
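
A small sketch of that distinction, with hypothetical names (n_items, n_total) chosen only for illustration: inside a function call, = names an argument while <- creates an object.

```r
# `=` inside a call names an argument; nothing is created in the workspace.
total_named <- sum(n_items = 1:3)
created_by_equals <- exists("n_items", inherits = FALSE)

# `<-` inside a call is an assignment; `n_total` now exists in the workspace.
total_assigned <- sum(n_total <- 1:3)
created_by_arrow <- exists("n_total", inherits = FALSE)
```

Both calls return 6, but only the second leaves a new object behind. Keeping = for arguments and <- for assignment makes that difference visible at a glance.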

4. Do not use file.choose()

There are some functions in R I would happily get rid of. file.choose() is one of them. The function gives you the opportunity to interactively choose a file, e.g., a CSV file you want to load with read.csv(). There are obvious issues if you use this function, most importantly that it is not clear to the people reading your script which file you ended up choosing.

It is much better to get used to finding the absolute path of your file, or even better, using the relative path within your workflow. In any case, do not rely on file.choose().
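
A minimal sketch of the alternative, where the path is written in the script itself. Here tempdir() stands in for a project folder and "cars_subset.csv" is a hypothetical file name, used only so the example runs anywhere; in a real project the path would be something like file.path("data", "cars_subset.csv"):

```r
# tempdir() stands in for a project folder and "cars_subset.csv" is a
# hypothetical file name, used only so the sketch runs anywhere.
data_path <- file.path(tempdir(), "cars_subset.csv")
write.csv(head(mtcars, 3), data_path, row.names = FALSE)

# The path is part of the script, so a reader can see which file was loaded.
cars_subset <- read.csv(data_path)
```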

5. Do not use two spaces for indentation

This goes against the tidyverse style guide which says "The contents should be indented by two spaces." However, I fully believe your code is much easier to read when it uses four spaces. I was convinced by this blog post that four spaces is better than two. One reason that is not mentioned in the post is that it will also make your code consistent across Python and R.

# Bad
mtcars |>
  head()

# Good
mtcars |>
    head()

6. Do not load the tidyverse package

If you have to quickly load and visualise some data, there is nothing wrong with loading the {tidyverse} package - instead of, say, loading {dplyr} and {ggplot2} - and getting work done. However, if this is for a project you plan to work on for more than a few minutes with additional packages, or if it is for something that should be shared with others, do not load the {tidyverse} package.

# Bad
library("tidyverse")

# Good
library("dplyr")
library("tidyr")

Only load the packages that are strictly required to make your script run as intended. Once you are done with your analysis, it is also good to confirm that you actually do rely on all the packages you load at the beginning of the script. I often find that specific packages at the beginning of a script are not actually being used but are primarily there because they were needed for a previous iteration of the code to work.

7. Do not use attach()

Yet another function I would like to remove from R. Do not get too attached to your objects. This is the easiest way to encounter namespace issues. You will basically be putting the columns of a data frame directly into the search path. You might not be strong at coding (e.g., you have a preference for Stata), but using attach() is not a good way to go about life in R.
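
Here is a small sketch of how this can go wrong. The global object wt is a made-up example of an unrelated object that happens to share its name with a column:

```r
wt <- 100  # an unrelated object that happens to share a column name

attach(mtcars)           # R notes that `wt` is masked by the global object
masked_mean <- mean(wt)  # uses the global `wt`, not the column
detach(mtcars)

# Being explicit about the data frame avoids the ambiguity.
explicit_mean <- mean(mtcars$wt)
```

masked_mean ends up as 100 rather than the mean car weight (about 3.22), and nothing in the line mean(wt) tells the reader which wt was meant.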

8. Do not install packages in your script

This is one I see way too often compared to how problematic it is. There is simply no reason to install the same package again and again whenever you run a script. Instead, assume that people have installed the packages your script loads.

# Bad
install.packages("dplyr")
library("dplyr")

# Good
library("dplyr")

If the package is not on CRAN, you can add a comment with a URL to the repository on the same line as you load the package (but again, do not make the code run the installation of the package).

9. Do not use View() in your script

I use View() a lot, but you will never see me share a script where I use View(). The idea of View() is to explore data, not to force the person running your script to explore the data at a certain point in your script in a new window/tab.

The best way to avoid putting View() in your script is to never type out View() in your script. Here is the keybinding I rely on in VS Code, but I am sure your editor of choice will have similar shortcuts available:

{
    "command": "r.view",
    "key": "ctrl+v",
    "when": "editorLangId == r || editorLangId == rmd && editorTextFocus"
}

10. Do not use setwd()

Do not include setwd() in your script to set an absolute path that is only working for you. Instead, use relative paths or project-based workflows (e.g., with RStudio projects or the {here} package). In general, I prefer to not specify any path at all in the script and let the workflow be within the project folder.

11. Do not use 'for loops'

Loops do not look good in R and, when possible, make sure to use vectorized functions, the apply family (e.g., lapply() and vapply()), or tidyverse alternatives such as purrr::map(). You will often encounter people also saying that 'for loops' are slower, but even if there is no difference in performance, you should still go with the vectorized or apply-style version. In other words, if you have any loops in your code, try to refactor the code and use vectorized or apply functions instead.
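
As a minimal sketch of such a refactoring (the object names are made up for illustration), here is the same computation written as a loop, as vectorized arithmetic, and with an apply-style function:

```r
# Loop version: fill a vector element by element.
squares_loop <- numeric(5)
for (i in 1:5) {
    squares_loop[i] <- i^2
}

# Vectorized version: the arithmetic already works on the whole vector.
squares_vectorized <- (1:5)^2

# Apply-style version for column-wise work; vapply() also checks that each
# result is a single number.
column_means <- vapply(mtcars[c("mpg", "wt")], mean, numeric(1))
```

The loop and the vectorized line produce identical vectors, but the second states the intent in one expression.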

12. Do not save your workspace with .RData

You can save your whole workspace and load it into R, but that you can does not mean that you should. Again, we are dealing with an issue of reproducibility. A lot of things can go wrong, and it is simply much better to save the script that is required to get to a certain workspace than to save the workspace itself.

13. Do not use T and F as shortcuts for TRUE and FALSE

In R you can use T and F as shortcuts for TRUE and FALSE, respectively. Here is an example:

T
#> [1] TRUE

While this can be a useful shortcut, it is much better to simply use TRUE and FALSE. First, you only save a few letters, and it is easier to read and understand the logic of the code if it says FALSE instead of only F. On my keyboard F is dangerously close to T, so to avoid making any typos with drastic implications, write it all out.

Second, TRUE will always be TRUE and can never be assigned another value. The same is not the case with T or F. So you could in theory end up in a situation where T is the opposite of what you expect:

T <- FALSE
T
#> [1] FALSE

But if you try to assign FALSE to TRUE, or vice versa, you will get an error (as you should).
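
A short sketch of both cases. The error is wrapped in tryCatch() only so the example runs to the end, and the object names are chosen for illustration:

```r
# Attempting to overwrite TRUE fails; the error message is captured here.
assignment_error <- tryCatch(
    TRUE <- FALSE,
    error = function(e) conditionMessage(e)
)

# T is an ordinary object, so it can be masked and later removed again.
T <- FALSE
masked_value <- T
rm(T)
restored_value <- T
```

After rm(T), T falls back to the value R ships with, but any code that ran while it was masked would have used FALSE.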

14. Do not use right assignment

It is possible to use -> to assign data to an object in R, but I cannot think of any good reason why you would ever do that. The two examples below do the same thing, but stick to the latter one.

# Bad
2 -> x

# Good
x <- 2

15. Do not rely on position for arguments in functions

Functions often come with multiple parameters and you can use the position for arguments instead of specifying the relevant parameter. This is not a good idea for two reasons. First, it makes it more difficult to read the code and see whether it is set to work as intended. Second, if the function changes in the future, and in particular the order of its parameters, the call may no longer work (or even worse, still work but give an incorrect output).

# Bad
mean(c(1:10, NA), 0, TRUE)

# Good
mean(c(1:10, NA), na.rm = TRUE)
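
To illustrate how fragile the positional version is, here is a small sketch. The object names are made up, and the slip is wrapped in tryCatch() so the example runs regardless of how R reacts to it:

```r
values <- c(1:10, NA)

# Both calls give the same result today, but only the second says why.
by_position <- mean(values, 0, TRUE)
by_name <- mean(values, na.rm = TRUE)

# Dropping one positional argument moves TRUE into the second slot
# (`trim` for the default method), so the call no longer does what
# was intended.
slipped <- tryCatch(mean(values, TRUE), error = function(e) "error")
```

The named call keeps working no matter how many positional arguments come and go around it.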

16. Do not use merge() without by

When you merge data frames in R, R will automatically join the data frames based upon the columns the two data frames have in common. Accordingly, the below example works just fine as R is able to identify column A in both the first and the second data frame.

merge(data.frame(A = 1:3, B = LETTERS[1:3]),
      data.frame(A = 1:3, C = LETTERS[4:6]))
#>   A B C
#> 1 1 A D
#> 2 2 B E
#> 3 3 C F

This is not a good approach. The first problem is that it is not clear to the person reading your code what you are trying to merge on (making it difficult to interpret whether the code is working as intended). The second problem is that there is no guarantee that the code will work in the future if you are to update the input data in any shape or form.

For that reason, it is much better to be explicit about what column you are joining on. In addition, you should also be explicit about whether you want all rows from the first or the second data frame (or both). Here is the better code:

merge(data.frame(A = 1:3, B = LETTERS[1:3]),
      data.frame(A = 1:3, C = LETTERS[4:6]),
      by = "A",
      all = TRUE)
#>   A B C
#> 1 1 A D
#> 2 2 B E
#> 3 3 C F
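
To see why the implicit version is fragile, consider a sketch where the second data frame later gains a column that shares a name with one in the first. The data frames left and right are made up for illustration:

```r
left <- data.frame(A = 1:3, B = c("x", "y", "z"))
right <- data.frame(A = 1:3, B = c("x", "y", "q"), C = 4:6)

# Without `by`, merge() joins on every shared column (here A and B),
# so the row where B differs is dropped.
implicit <- merge(left, right)

# With `by`, the join key stays the same even though the data changed.
explicit <- merge(left, right, by = "A")
```

The implicit merge returns two rows, while the explicit one keeps all three and exposes the clash as B.x and B.y.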

Finally, if you use merge() within {data.table}, make sure to use merge.data.table() to avoid any confusion for the reader of your code about what exact function you are using (yet another reason I am starting to appreciate Python's approach to classes over R's generic functions).

17. Do not name an object x or df

When I do stuff in R I tend to save data in objects with names such as x, df, test, y, etc. This can be fine if I am not sure about the direction of where I am going or what I am doing, but it is a sign of exactly that: no direction or meaning. Instead, make sure that your object names give some idea about what is being stored in the object.

18. Do not use dplyr::group_by()

In most cases when you group your data, it is with the goal of running an operation on this grouping. For this purpose, it is generally much better to do per-operation grouping with .by in your function (which has been possible since {dplyr} 1.1.0) instead of using group_by(). The good thing about this is that you get cleaner code (as you do not need to ungroup() your data), and you can easily see what operation you want to aggregate by (e.g., in summarise()):

# Bad
mtcars |>
    dplyr::group_by(am) |>
    dplyr::summarise(wt = mean(wt))

# Good
mtcars |>
    dplyr::summarise(wt = mean(wt), .by = am)

19. Do not put library() in the middle of your script

Begin all your scripts with loading all the packages you need to execute the script. I sometimes see scripts loading a package in the middle of a script, e.g., when somebody found out in the middle of the work that a function from another package was now needed in order to do what was needed. So make sure to put all packages you load in a script at the top of the script. If you are concerned that the time (or order) at which you load a package will matter for the reproducibility of your script, make sure to also use {conflicted}.

20. Do not use := in data.table

I know that most LLMs still prefer to write code with := when using {data.table}, but let() is much easier to read and use, especially if you want to work on multiple columns within the same function call. Accordingly, use let() instead of :=. Here is an example:

library("data.table")

mtcarsDT <- as.data.table(mtcars)

# Bad
mtcarsDT[, `:=`(wt_squared = wt^2,
                vs_squared = vs^2)]

# Good
mtcarsDT[, let(wt_squared = wt^2,
               vs_squared = vs^2)]

Again, a lot of the advice presented here is opinionated, so do feel free to disagree, but do at least think about why you disagree. If you believe I have missed something important, please do reach out.

Erik Gahner Larsen