R in Action study notes: introduction to R and creating datasets

程序员文章站 2022-03-05 12:17:23


Date: 2018-08-12
Tutorial: R in Action
Material covered: Chapters 1 and 2

R in Action

Chapter 1: Introduction to R

> getwd()   # show the current working directory
[1] "D:/Documents"
> setwd("E:\\Learning_R\\practice")
Error in setwd("E:\\Learning_R\\practice") : 
  cannot change working directory
> dir.create("E:\\Learning_R\\practice")   # create the new directory
> setwd("E:\\Learning_R\\practice")   # change the current working directory
> getwd()
[1] "E:/Learning_R/practice"
> age <- c(1,3,5,2,11,9,3,9,12,3)
> weight <- c(4.4, 5.3, 7.2, 5.2, 8.5, 7.3, 6.0, 10.4, 10.2, 6.1)
> ls()   # list the objects in the workspace
[1] "age"    "weight"
> rm(age)   # remove the object age
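The setwd error in the transcript above occurs because the target directory does not exist yet. A minimal sketch of a more defensive version follows; it assumes a throwaway folder under tempdir(), and the name practice_dir is illustrative rather than taken from the original notes.

```r
# Hypothetical location: a "practice" folder inside R's temporary directory
practice_dir <- file.path(tempdir(), "practice")

# Create the directory only if it is missing, so setwd() cannot fail on it
if (!dir.exists(practice_dir)) {
  dir.create(practice_dir, recursive = TRUE)
}

# setwd() invisibly returns the previous working directory
old_wd <- setwd(practice_dir)
current_name <- basename(getwd())
print(current_name)

# Restore the original working directory afterwards
setwd(old_wd)
```

Using file.path also avoids the doubled backslashes that Windows-style paths need inside R strings.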

> history()


> savehistory("s")   # save the command history to the file s

Clear the command history, then reload it:

> loadhistory("s")   # load the command history from the file s


> rm(list = ls())   # remove all objects from the workspace


> load("ss")


> age <- c(1,3,5,2,11,9,3,9,12,3)
> save(age, file = "sss")   # save the object age to the file sss (no extension is added automatically)
> rm(list = ls())


> load("sss")
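The save and load steps above can be checked as a round trip. The sketch below is only an illustration: it writes to a temporary file (the name rdata_file is an assumption) instead of the working directory, and relies on load returning the names of the objects it restored.

```r
age <- c(1, 3, 5, 2, 11, 9, 3, 9, 12, 3)

# Hypothetical target file; tempfile() keeps the working directory clean
rdata_file <- tempfile(fileext = ".RData")

# Write the object, then remove it from the workspace
save(age, file = rdata_file)
rm(age)

# load() recreates the object and returns the names of what it loaded
restored <- load(rdata_file)
print(restored)
print(age)
```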


Write a script file named 1.R and save it in the working directory:

> source("1.R")   # run the commands in 1.R
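The contents of 1.R are not preserved in these notes, so the following is a sketch with a made-up two-line script. It writes the script to the temporary directory and runs it in a separate environment (script_env is an illustrative name) so that the variables it creates do not overwrite anything in the workspace.

```r
# Hypothetical script contents; the real 1.R from the notes is unknown
script_path <- file.path(tempdir(), "1.R")
writeLines(c("x <- 1:10", "x_mean <- mean(x)"), script_path)

# Evaluate the script in its own environment instead of the global one
script_env <- new.env()
source(script_path, local = script_env)

# Objects created by the script live in script_env
print(script_env$x_mean)
```

Calling source(script_path) without the local argument would instead create x and x_mean directly in the global workspace, which matches what the transcript above does.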

R ships with a set of default packages: base, datasets, utils, grDevices, graphics, stats, and methods.

> library()   # list all packages available in the library
> search()   # list the packages that are attached and ready to use
 [1] ".GlobalEnv"        "tools:rstudio"    
 [3] "package:stats"     "package:graphics" 
 [5] "package:grDevices" "package:utils"    
 [7] "package:datasets"  "package:methods"  
 [9] "Autoloads"         "package:base"

When a function or package is unfamiliar, look it up with the help function:

> help("RODBC")   # incorrect usage
No documentation for ‘RODBC’ in specified packages and libraries:
you could try ‘??RODBC’
> help(package = "RODBC")   # show the package's help index

By default help only searches attached packages, so to look up a function from a package, load the package first:

> help("read_xls")
No documentation for ‘read_xls’ in specified packages and libraries:
you could try ‘??read_xls’
> library("readxl")
> help("read_xls")
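As an alternative to loading the package, help also accepts a package argument. The sketch below uses tools::file_ext purely as a stand-in, because the tools package ships with R but is not attached by default; for the example above the analogous call would be help("read_xls", package = "readxl"), assuming readxl is installed.

```r
# "tools" is installed with R but does not appear in the default search() path
h <- help("file_ext", package = "tools")

# help() returns the matching help file paths; a non-empty result means
# the topic was found even though the package was never attached
found <- length(h) > 0
print(found)

# In an interactive session either of these opens the page directly:
# help("file_ext", package = "tools")
# ?tools::file_ext
```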

Next, practice the methods above with an example from R in Action.
Code:

help.start()
install.packages("vcd")
help(package = "vcd")
library(vcd)
help("Arthritis")
Arthritis
example(Arthritis)
q()
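Running install.packages on every execution is slow and needs network access. A possible guard is sketched below; is_installed is a hypothetical helper written for this note, not a base R function.

```r
# Hypothetical helper: TRUE when the package can be loaded, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("vcd")) {
  # Load only the Arthritis data set without attaching the whole package
  data("Arthritis", package = "vcd")
  print(head(Arthritis))
} else {
  message("vcd is not installed; run install.packages(\"vcd\") first")
}
```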

Chapter 2: Creating a Dataset

1. Matrices

Create a matrix with the matrix function, using the following format:
matrix(vector, nrow = number_of_rows, ncol = number_of_columns, byrow = FALSE, dimnames = list(char_vector_rownames, char_vector_colnames))
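To make the byrow and dimnames arguments concrete, here is a small sketch with made-up values; cells, rnames and cnames are illustrative names.

```r
cells  <- c(1, 26, 24, 68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")

# byrow = TRUE fills the matrix one row at a time
by_row <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
                 dimnames = list(rnames, cnames))

# byrow = FALSE (the default) fills it one column at a time
by_col <- matrix(cells, nrow = 2, ncol = 2, byrow = FALSE,
                 dimnames = list(rnames, cnames))

print(by_row)
print(by_col)

# The same position holds different values depending on the fill order
print(by_row["R1", "C2"])
print(by_col["R1", "C2"])
```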

2. Arrays

Create an array with the array function, using the following format:
array(vector, dimensions, dimnames)

eg:

array(1:24, c(2,3,4), dimnames = list(dim1,dim2,dim3))
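The example above leaves dim1, dim2 and dim3 undefined. A runnable sketch, assuming simple label vectors for the three dimensions (the label values are illustrative):

```r
# Assumed labels: one character vector per dimension
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")

# A 2 x 3 x 4 array filled column by column with 1..24
z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

print(dim(z))

# Elements can be addressed by label or by position
print(z["A2", "B3", "C4"])
print(z[1, 2, 3])
```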

3. Data frames

Create a data frame with the data.frame function, using the following format:
data.frame(col1, col2, col3, …)

eg1:

> num <- 12:14
> names <- c("A", "B", "C")
> age <- c(17, 19, 21)
> major <- c("Math", "English", "Chinese")
> mydata <- data.frame(names, age, major, row.names = num)
> mydata
   names age   major
12     A  17    Math
13     B  19 English
14     C  21 Chinese

eg2:

> mydata2 <- data.frame(names = c("A", "B", "C"), 
+                       age = c(17, 19, 21), 
+                       major = c("Math", "English", "Chinese"), 
+                       row.names = 12:14)
> mydata2
   names age   major
12     A  17    Math
13     B  19 English
14     C  21 Chinese
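Whether character columns are stored as text or converted to factors depends on the R version: before R 4.0.0 data.frame converted strings to factors by default, and from 4.0.0 on it keeps them as character. A sketch that makes the choice explicit (students is an illustrative name):

```r
students <- data.frame(
  names = c("A", "B", "C"),
  age   = c(17, 19, 21),
  major = c("Math", "English", "Chinese"),
  row.names = 12:14,
  stringsAsFactors = FALSE   # keep text columns as character in every R version
)

str(students)

# Row names are stored as strings, so rows can be selected by label
print(students["13", "major"])
```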

Extracting content from a data frame:

> mydata[c("age","names")]
   age names
12  17     A
13  19     B
14  21     C
> mydata[1:2]
   names age
12     A  17
13     B  19
14     C  21
> mydata$age
[1] 17 19 21
> attach(mydata)   # add the data frame mydata to the search path
The following objects are masked _by_ .GlobalEnv:

    age, major, names

> age
[1] 17 19 21
> names
[1] "A" "B" "C"
> detach(mydata)   # remove the data frame mydata from the search path
> rm(list = c("mydata2","age", "names", "major"))
> age <- 12
> attach(mydata)
The following objects are masked _by_ .GlobalEnv:

    age

> age
[1] 12
> names
[1] "A" "B" "C"
> detach(mydata)

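If the environment already contains an object with the same name as one of the data frame's columns before attach adds the data frame to the search path, the existing object takes precedence. R prints a message listing the masked objects when this happens.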

The with function:

> with(mydata, {
+   age
+   summary(names)
+   a <- summary(age)
+   a
+   b <<- summary(names)
+ })
> a
Error: object 'a' not found
> b
A B C 
1 1 1
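In the transcript above, a disappears because assignments inside with are local to the block, while <<- writes b outside it. Since with returns the value of its last expression, a common alternative is to assign the result of the call itself. A sketch with illustrative names (students, age_summary, stats):

```r
students <- data.frame(names = c("A", "B", "C"), age = c(17, 19, 21))

# Single result: assign the value returned by with()
age_summary <- with(students, summary(age))

# Several results: return them together in a list
stats <- with(students, {
  n <- length(names)            # n exists only inside this block
  list(n = n, mean_age = mean(age))
})

print(age_summary)
print(stats)
```

This keeps the results in named objects without relying on <<-, which can silently overwrite an existing variable.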

4. Factors

Create a factor with the factor function.

eg:

# study
> data <- c("Poor", "Improved", "Excellent", "Poor")
> status1 <- factor(data)
> status1
[1] Poor      Improved  Excellent Poor     
Levels: Excellent Improved Poor
> status2 <- factor(data, ordered = TRUE)
> status2
[1] Poor      Improved  Excellent Poor     
Levels: Excellent < Improved < Poor
> status3 <- factor(data, ordered = TRUE, 
+                   levels = c("Poor", "Improved", "Excellent"))
> status3
[1] Poor      Improved  Excellent Poor     
Levels: Poor < Improved < Excellent
> status4 <- factor(c(3,2,1,3), ordered = TRUE, 
+                   levels = c(3,2,1),
+                   labels = c("Poor", "Improved", "Excellent"))
> status4
[1] Poor      Improved  Excellent Poor     
Levels: Poor < Improved < Excellent

# practice
> num <- 12:14
> names <- c("A", "B", "C")
> age <- c(17, 19, 21)
> major <- factor(c("Math", "English", "Math"))
> grade <- factor(c("First", "Second", "Third"), ordered = TRUE, 
+                 levels = c("First", "Second", "Third"))
> mydata <- data.frame(names, age, major, grade, row.names = num)
> str(mydata)
'data.frame':   3 obs. of  4 variables:
 $ names: Factor w/ 3 levels "A","B","C": 1 2 3
 $ age  : num  17 19 21
 $ major: Factor w/ 2 levels "English","Math": 2 1 2
 $ grade: Ord.factor w/ 3 levels "First"<"Second"<..: 1 2 3
> summary(mydata)
 names      age         major      grade  
 A:1   Min.   :17   English:1   First :1  
 B:1   1st Qu.:18   Math   :2   Second:1  
 C:1   Median :19               Third :1  
       Mean   :19                         
       3rd Qu.:20                         
       Max.   :21
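The practical effect of an ordered factor is that its levels carry integer codes in the stated order and can be compared. A short sketch, reusing the status values from the study example (the variable names are illustrative):

```r
status <- factor(c("Poor", "Improved", "Excellent", "Poor"),
                 ordered = TRUE,
                 levels = c("Poor", "Improved", "Excellent"))

# Underlying integer codes follow the order given in levels
codes <- as.integer(status)

# Ordered factors support comparisons against a level
better <- status > "Poor"

# Frequency of each level, and the highest level present
counts <- table(status)
best <- as.character(max(status))

print(codes)
print(better)
print(counts)
print(best)
```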

5. Lists

Create a list with the list function.
eg:

> a <- "My"
> b <- matrix(c(1,5,2,7), nrow = 2, dimnames = list(c("A","B")))
> c <- matrix(1:4, nrow = 2, dimnames = list(c("A","B"), c("C","D")))
> d <- factor(c("one", "two", "two"), order = TRUE, 
+             levels = c("one", "two", "three"))
> e <- data.frame(w = c("I", "am", "XXX"), m = c(1,2,3))
> mylist <- list(title = a, numb = b, c, d, e)
> mylist
$`title`
[1] "My"

$numb
  [,1] [,2]
A    1    2
B    5    7

[[3]]
  C D
A 1 3
B 2 4

[[4]]
[1] one two two
Levels: one < two < three

[[5]]
    w m
1   I 1
2  am 2
3 XXX 3

Extracting data from a list:

> mylist[4]
[[1]]
[1] one two two
Levels: one < two < three

> mylist[[4]]
[1] one two two
Levels: one < two < three
> mylist["numb"]
$`numb`
  [,1] [,2]
A    1    2
B    5    7

> mylist[["numb"]]
  [,1] [,2]
A    1    2
B    5    7
> class(mylist[4])
[1] "list"
> class(mylist[[4]])
[1] "ordered" "factor" 
> mylist[4][2]
$<NA>
NULL

> mylist[[4]][c(1,3)]
[1] one two
Levels: one < two < three

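listname[n]: extracts the nth element as a sub-list; the returned object is a list.
listname[[n]]: returns the content of the nth element; the returned object has that element's own type.
In mylist[4][2], mylist[4] is a sub-list of mylist that contains a single element, so it has no second element and indexing it with [2] returns a missing entry (name <NA>, value NULL).
In mylist[[4]][c(1,3)], mylist[[4]] is the element's content, a factor with three elements, so its 1st and 3rd elements exist and are returned.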
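The difference between single and double brackets can be verified on a smaller list. The sketch below uses a made-up list named demo:

```r
demo <- list(title = "My", scores = c(90, 85, 77))

# Single brackets keep the list wrapper
sub_list <- demo["scores"]
print(class(sub_list))     # a list of length 1
print(length(sub_list))

# Double brackets (or $) return the element itself
content <- demo[["scores"]]
print(class(content))
second_score <- demo$scores[2]
print(second_score)

# Indexing past the end of the one-element sub-list yields a NULL entry
out_of_range <- sub_list[2]
print(is.null(out_of_range[[1]]))
```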

6. Entering data

Keyboard input:
eg:

> mydata <- data.frame(age = numeric(0), gender = character(0))
> mydata <- edit(mydata)

An editor window pops up:

> mydata <- data.frame(age = numeric(0), gender = character(0))
> fix(mydata)

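numeric(0) creates an empty numeric vector.
The result of edit must be assigned back to the object, because edit works on a copy of the original object and the data is entered into that copy. If the result is not assigned, the data is lost when the data editor closes.
The fix function edits the object's contents in place.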
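Both edit and fix need an interactive session. For scripts, one non-interactive way to key in a small data set is to embed it as text and parse it with read.table. This is a sketch with made-up values; input_text and typed are illustrative names.

```r
# Made-up records typed directly into the script
input_text <- "
age gender weight
25 m 166
30 f 115
18 f 120
"

# text = parses the string as if it were a file; the first non-blank line is the header
typed <- read.table(header = TRUE, text = input_text, stringsAsFactors = FALSE)

str(typed)
```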

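Keyboard input suits small datasets. To import data from other files into R, use the appropriate packages and functions (see R in Action, pp. 32-37 for details).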
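As one illustration of file import, a delimited text file can be read with read.csv from base R. The sketch below first writes a small made-up CSV to a temporary path so that it is self-contained; csv_path and imported are illustrative names, and other formats such as Excel or databases need their own packages.

```r
# Create a small sample file to import (made-up data)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(age = c(25, 30), gender = c("male", "female")),
          csv_path, row.names = FALSE)

# Read it back; stringsAsFactors = FALSE keeps text columns as character
imported <- read.csv(csv_path, stringsAsFactors = FALSE)

str(imported)
```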
