R Data Frames: Key Concepts Explained

程序员文章站 2023-01-24 20:27:14
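A data frame is a table, or two-dimensional array-like structure, in which each column holds the values of one variable and each row holds one set of values, one from each column.
A data frame has the following characteristics.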

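  • Column names should be non-empty.
  • Row names should be unique.
  • Data stored in a data frame can be of numeric, factor, or character type.
  • Each column should contain the same number of data items.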
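
The last two rules above can be checked directly. The following is a minimal sketch, not part of the original tutorial; the column names id and score are made up for illustration. It relies on base R rejecting columns whose lengths cannot be recycled to a common length, and rejecting duplicate row names.

```r
# Columns of unequal, non-recyclable length (3 vs. 2) are rejected.
res <- tryCatch(
   data.frame(id = 1:3, score = c(10, 20)),
   error = function(e) conditionMessage(e)
)
print(res)

# Row names must be unique; assigning duplicates raises an error.
df <- data.frame(id = 1:3, score = c(10, 20, 30))
dup <- tryCatch(
   {
      row.names(df) <- c("a", "a", "b")
      "no error"
   },
   error = function(e) conditionMessage(e)
)
print(dup)
```

Note that base R does recycle a shorter column when its length divides the longer one exactly, so "same number of items" is best read as "same number after recycling".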

Creating a Data Frame

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),

   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)

Running the code above produces the following result:

  emp_id emp_name salary start_date
1      1     rick 623.30 2012-01-01
2      2      dan 515.20 2013-09-23
3      3 michelle 611.00 2014-11-15
4      4     ryan 729.00 2014-05-11
5      5     gary 843.25 2015-03-27

Getting the Structure of a Data Frame

The structure of a data frame can be inspected with the str() function.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),

   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Get the structure of the data frame.
str(emp.data)

Running the code above produces the following result:

'data.frame':   5 obs. of  4 variables:
 $ emp_id    : int  1 2 3 4 5
 $ emp_name  : chr  "rick" "dan" "michelle" "ryan" ...
 $ salary    : num  623 515 611 729 843
 $ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...
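
When only part of what str() reports is needed, the individual pieces can be queried separately. The sketch below is an addition to the tutorial and uses only base R functions (dim(), names(), sapply(), class()) on the same example data.

```r
# Recreate the example data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Number of rows and columns.
print(dim(emp.data))

# Column names.
print(names(emp.data))

# Class of each column, returned as a named character vector.
col_classes <- sapply(emp.data, class)
print(col_classes)
```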

Summarizing Data in a Data Frame

The summary() function returns summary statistics and basic properties of the data in each column.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),

   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the summary.
print(summary(emp.data))

Running the code above produces the following result:

     emp_id    emp_name             salary        start_date        
 Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01  
 1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23  
 Median :3   Mode  :character   Median :623.3   Median :2014-05-11  
 Mean   :3                      Mean   :664.4   Mean   :2014-01-14  
 3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15  
 Max.   :5                      Max.   :843.2   Max.   :2015-03-27 
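
The figures in the salary column of the summary can be reproduced one at a time. This is a small sketch added for illustration, using base R functions on the salary values from the example; it is one way to cross-check the table, not something the original tutorial shows.

```r
# Salary values from the example data frame.
salary <- c(623.3, 515.2, 611.0, 729.0, 843.25)

# summary() also works on a single numeric vector.
print(summary(salary))

# The individual statistics behind the summary table.
salary_mean <- mean(salary)
salary_median <- median(salary)
salary_range <- range(salary)   # minimum and maximum
print(salary_mean)
print(salary_median)
print(salary_range)
```

summary() rounds its display to a few significant digits, which is why the table shows a mean of 664.4 and a maximum of 843.2 rather than the full values.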

Extracting Data from a Data Frame

Extract specific columns from a data frame by using the column names.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),

   start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Extract specific columns.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)

Running the code above produces the following result:

  emp.data.emp_name emp.data.salary
1              rick          623.30
2               dan          515.20
3          michelle          611.00
4              ryan          729.00
5              gary          843.25
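
Building a new data frame from emp.data$emp_name and emp.data$salary produces the prefixed column names seen above. As an alternative sketch (not from the original tutorial), indexing with a character vector of column names keeps the original names.

```r
# Recreate the example data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Select columns by name; the empty first index means "all rows".
result <- emp.data[, c("emp_name", "salary")]
print(names(result))
print(result)
```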

Extract the first two rows with all columns.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),

   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Extract the first two rows.
result <- emp.data[1:2,]
print(result)

Running the code above produces the following result:

  emp_id emp_name salary start_date
1      1     rick  623.3 2012-01-01
2      2      dan  515.2 2013-09-23

Extract the 3rd and 5th rows with the 2nd and 4th columns.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),

   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Extract the 3rd and 5th rows with the 2nd and 4th columns.
result <- emp.data[c(3,5),c(2,4)]
print(result)

Running the code above produces the following result:

  emp_name start_date
3 michelle 2014-11-15
5     gary 2015-03-27
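
The row index does not have to be a list of positions. As a further sketch, added here and not part of the original tutorial, a logical condition can select the rows; the salary threshold of 620 is an arbitrary value chosen for illustration.

```r
# Recreate the example data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Keep rows where salary exceeds 620 (arbitrary threshold),
# and only the emp_name and salary columns.
high <- emp.data[emp.data$salary > 620, c("emp_name", "salary")]
print(high)
```

The original row numbers (1, 4, and 5) are kept as row names in the result, just as rows 3 and 5 were in the example above.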

Expanding a Data Frame

A data frame can be expanded by adding columns and rows.

Adding a Column

To add a column, assign a vector to a new column name.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),

   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Add the "dept" column.
emp.data$dept <- c("it","operations","it","hr","finance")
v <- emp.data
print(v)

Running the code above produces the following result:

  emp_id emp_name salary start_date       dept
1      1     rick 623.30 2012-01-01         it
2      2      dan 515.20 2013-09-23 operations
3      3 michelle 611.00 2014-11-15         it
4      4     ryan 729.00 2014-05-11         hr
5      5     gary 843.25 2015-03-27    finance
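
A new column must fit the number of rows already in the data frame. The sketch below is an illustrative addition: it first adds a column derived from an existing one (salary_rounded is a hypothetical name), then shows that a vector of the wrong length is rejected.

```r
# Recreate a reduced version of the example data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   stringsAsFactors = FALSE
)

# A column computed from an existing column always has the right length.
# "salary_rounded" is a hypothetical column name used for illustration.
emp.data$salary_rounded <- round(emp.data$salary)

# A vector of 3 values cannot fill 5 rows, so the assignment fails.
bad <- tryCatch(
   {
      emp.data$dept <- c("it", "hr", "finance")
      "no error"
   },
   error = function(e) conditionMessage(e)
)
print(bad)
print(names(emp.data))
```

A single value, by contrast, is recycled to every row, so something like emp.data$active <- TRUE would succeed.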

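Adding Rows

To permanently add more rows to an existing data frame, the new rows must be supplied in a data frame with the same structure as the existing one and then combined with it using the rbind() function.
In the example below, we create a data frame containing the new rows and merge it with the existing data frame to produce the final data frame.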

# Create the first data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("rick","dan","michelle","ryan","gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),

   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   dept = c("it","operations","it","hr","finance"),
   stringsAsFactors = FALSE
)

# Create the second data frame.
emp.newdata <- data.frame(
   emp_id = 6:8,
   emp_name = c("rasmi","pranab","tusar"),
   salary = c(578.0,722.5,632.8),
   start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
   dept = c("it","operations","finance"),
   stringsAsFactors = FALSE
)

# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)

Running the code above produces the following result:

  emp_id emp_name salary start_date       dept
1      1     rick 623.30 2012-01-01         it
2      2      dan 515.20 2013-09-23 operations
3      3 michelle 611.00 2014-11-15         it
4      4     ryan 729.00 2014-05-11         hr
5      5     gary 843.25 2015-03-27    finance
6      6    rasmi 578.00 2013-05-21         it
7      7   pranab 722.50 2013-07-30 operations
8      8    tusar 632.80 2014-06-17    finance
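
What "same structure" means for rbind() can be seen with two small data frames. In this sketch, which is an addition to the tutorial with made-up data, rbind() on data frames matches columns by name, so a different column order is accepted, while a different column name is rejected.

```r
# Two small data frames with the same columns in a different order.
a <- data.frame(emp_id = 1:2, dept = c("it", "hr"), stringsAsFactors = FALSE)
b <- data.frame(dept = "finance", emp_id = 3L, stringsAsFactors = FALSE)

# Columns are matched by name, so the order in b does not matter.
combined <- rbind(a, b)
print(combined)

# A data frame with a different column name ("department") cannot be bound.
c_bad <- data.frame(emp_id = 4L, department = "it", stringsAsFactors = FALSE)
err <- tryCatch(
   rbind(a, c_bad),
   error = function(e) conditionMessage(e)
)
print(err)
```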
