Reflections of a Data Scientist: (R) Establishing Working Directory & Importing Data

This is the first of what will probably be many articles on R. I am assuming that you are familiar with R and have it installed, along with the RStudio IDE.

Once you have the R Console open, you will first want to set your working directory.

This can be achieved with the command:

setwd("<pathway of working directory>")

For example, you could create a dedicated folder on your Windows Desktop and make that folder your working directory. The code would resemble:

setwd("C:/Users/Name/Desktop/RWorkDirectory")

It is important to note that paths copied from Windows use "\", which you will have to change to "/" (or double to "\\"), because R treats a single backslash inside a string as the start of an escape sequence.
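As a small illustration of the slash rule, the sketch below writes the example path three ways and checks that they describe the same location. The variable names are only for illustration; file.path() and gsub() are base R functions.

```r
# Three ways to write the same Windows path as an R string.
forward <- "C:/Users/Name/Desktop/RWorkDirectory"
escaped <- "C:\\Users\\Name\\Desktop\\RWorkDirectory"  # each "\\" is one literal backslash
built   <- file.path("C:", "Users", "Name", "Desktop", "RWorkDirectory")

# Convert the escaped form to forward slashes so the strings can be compared.
converted <- gsub("\\", "/", escaped, fixed = TRUE)

print(built == forward)
print(converted == forward)
```

file.path() joins its arguments with "/", so it is a convenient way to build paths without typing any slashes at all.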

The advantage of establishing a working directory is the convenience it provides when importing, exporting, and saving data.

For example, if you were importing data without establishing a working directory, the code would resemble:

(Assuming that the file is a .csv)

DataFrameA <- read.table("C:/Users/Name/Desktop/RWorkDirectory/Filename.csv", fill = TRUE, header = TRUE, sep = "," )

or

(Assuming that the file is tab-delimited)

DataFrameB <- read.table("C:/Users/Name/Desktop/RWorkDirectory/Filename.txt", fill = TRUE, header = TRUE, sep = "\t" )

If you had established the working directory, the code statement would be much shorter:

DataFrameA <- read.table("Filename.csv", fill = TRUE, header=TRUE, sep="," )

or

DataFrameB <- read.table("Filename.txt", fill = TRUE, header = TRUE, sep = "\t" )
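To try this without touching your own files, here is a minimal sketch that builds a throwaway folder inside R's temporary directory, writes a tiny CSV into it, and reads the file back by name alone. The folder, the sample data, and the names work_dir, old_dir, and DataFrameA2 are made up for the example.

```r
# Hypothetical scratch folder standing in for the working directory.
work_dir <- file.path(tempdir(), "RWorkDirectory")
dir.create(work_dir, showWarnings = FALSE)

# Write a small comma-separated file to read back (sample data is invented).
write.csv(data.frame(id = 1:2, score = c(90, 85)),
          file.path(work_dir, "Filename.csv"), row.names = FALSE)

old_dir <- setwd(work_dir)  # setwd() invisibly returns the previous directory

# With the working directory set, the file name alone is enough.
DataFrameA  <- read.table("Filename.csv", fill = TRUE, header = TRUE, sep = ",")
DataFrameA2 <- read.csv("Filename.csv")  # read.csv() already defaults to these options

setwd(old_dir)              # put the working directory back
print(DataFrameA)
```

For plain comma-separated files, read.csv() is a shorthand worth knowing, since it presets header = TRUE, sep = ",", and fill = TRUE.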

Import Options

fill, header, and sep are optional arguments (written in lowercase, as R is case-sensitive), but in practice you will usually need to include them. Here is what each option does:

fill - This option tells R that rows may be of unequal length, meaning that some records are missing values. When it is enabled, short rows are padded with blank fields, which appear as NA in numeric and logical columns.

header - This indicates to R that the first row of the file contains column names.

sep - This indicates the delimiter that separates the values in each row. "," indicates a comma-separated file, and "\t" indicates a tab-delimited file. If the values are separated by some other character (e.g. #, @, or |), you can specify it after sep =, for example sep = "|".
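The three options can be seen working together in the short sketch below. It reads a made-up, pipe-delimited snippet through read.table()'s text argument instead of a file, and the second record is deliberately missing its last value. The data and the names pipe_text, filled, and strict are assumptions for the example.

```r
# Invented pipe-delimited data; the last record has no score.
pipe_text <- "id|name|score\n1|Ann|90\n2|Bob"

# header = TRUE takes the column names from the first line,
# sep = "|" splits on the pipe, and fill = TRUE pads the short row.
filled <- read.table(text = pipe_text, sep = "|", header = TRUE, fill = TRUE)
print(filled)

# Without fill, the short row stops the import with an error.
strict <- tryCatch(
  read.table(text = pipe_text, sep = "|", header = TRUE, fill = FALSE),
  error = function(e) conditionMessage(e)
)
print(strict)
```

In the filled version, the missing score for the second record shows up as NA, while the strict version returns only the error message.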

Get Working Directory

If you ever forget where your working directory is located, you can print it to the console with the command:

getwd()

In our example case, running the above command should output:

[1] "C:/Users/Name/Desktop/RWorkDirectory"
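Because getwd() returns the path as an ordinary character string, you can also store it and use it to switch back later. The sketch below assumes nothing beyond base R; the names start_dir and previous are illustrative, and tempdir() merely stands in for some other folder.

```r
start_dir <- getwd()          # current working directory as a character string
print(start_dir)

previous <- setwd(tempdir())  # move to a scratch folder; setwd() returns the old directory
print(getwd())

setwd(previous)               # return to where we started
```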

In the next article, I will discuss how to check the integrity of newly imported data.

Monday, July 3, 2017