Find annotations in a data frame — find

find_annotations() takes a data frame, identifies possible annotations contained within it, and returns them as a named list. guess_annotations() is a low-level helper that extracts annotations and returns them as a tibble of cell values with their row and column positions.

Usage

find_annotations(
  df,
  type = c("sheet", "cells"),
  title_first = TRUE,
  guess_source = TRUE,
  .row_var = row,
  .col_var = col,
  .value_var = value
)

guess_annotations(
  df,
  type = c("sheet", "cells"),
  .row_var = row,
  .col_var = col,
  .value_var = value
)

Arguments

df: A data frame object
type: Whether the data frame is in "sheet" format or "cells" format
title_first: Whether the first annotation should be treated as the table title
guess_source: Whether to guess a source note from the annotations
.row_var: When using type = "cells", the name of the variable with row positions
.col_var: When using type = "cells", the name of the variable with column positions
.value_var: When using type = "cells", the name of the variable with cell values

Details

The type argument declares the layout of the data frame, which must be either "sheet" format (the default) or "cells" format. "sheet" format is a standard two-dimensional data frame, such as those read in by base::read.csv() or readxl::read_excel(). "cells" format is for data frames where each row represents a single cell from a spreadsheet, with one variable for the cell's value and separate variables for its row and column positions.
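To make the difference between the two formats concrete, the following is a minimal sketch in base R that reshapes a small "sheet" format data frame into a "cells" style layout. The data and the object names (sheet_df, sheet_mat, cells_df) are made up for illustration, and keeping empty cells as NA rows is an assumption rather than a documented requirement of the package.

```r
# A small sheet-format data frame (hypothetical example data)
sheet_df <- data.frame(
  col1 = c("Table 1", "species", "Adelie"),
  col2 = c(NA, "bill_length_mm", "38.791"),
  stringsAsFactors = FALSE
)

# Reshape to a cells layout: one row per cell, with its row and column position
sheet_mat <- as.matrix(sheet_df)
cells_df <- data.frame(
  row = as.vector(row(sheet_mat)),
  col = as.vector(col(sheet_mat)),
  value = as.vector(sheet_mat),
  stringsAsFactors = FALSE
)

# Order the cells row by row, as they would be read across a spreadsheet
cells_df <- cells_df[order(cells_df$row, cells_df$col), ]
rownames(cells_df) <- NULL

cells_df
```

A data frame shaped like cells_df could then be passed on with type = "cells". Because its columns are already named row, col and value, the defaults for .row_var, .col_var and .value_var shown under Usage would apply.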

By default, find_annotations() also interprets the annotations found by guess_annotations(). With title_first = TRUE, the first annotation found in a data frame is assumed to provide a title or label for the table contained in the data frame. With guess_source = TRUE, the annotations are searched for one starting with "Source:", "Data source:" or "Source data:", which is then reported as the source note.
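The title and source rules described above can be sketched in base R as follows. This is an illustration of the described behaviour under simple assumptions (case-sensitive prefix matching, the remaining annotations treated as notes), not the package's actual implementation; the object names are hypothetical.

```r
# Annotations as they might be extracted from a sheet (illustrative values)
annotations <- c(
  "Table 1",
  "An example sheet",
  "This table is based on data in the palmerpenguins R package",
  "Source: {palmerpenguins} R package"
)

# Prefixes that mark a source note, as listed in the text above
source_prefixes <- c("Source:", "Data source:", "Source data:")

# TRUE for any annotation that starts with one of the prefixes
# (assumption: matching is case-sensitive)
is_source <- Reduce(
  `|`,
  lapply(source_prefixes, function(p) startsWith(annotations, p))
)

# title_first = TRUE: the first annotation is treated as the title
title <- annotations[1]

# guess_source = TRUE: an annotation matching a prefix is the source note
source_note <- annotations[is_source]

# Everything that is neither title nor source is kept as a general note
notes <- annotations[-1][!is_source[-1]]
```

With these values, title holds "Table 1", source_note holds the "Source:" line, and notes holds the two remaining annotations.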

When using type = "cells", the variables identifying the row positions, column positions and cell values are specified by .row_var, .col_var and .value_var respectively. These default to row, col and value.

Examples

example_df <- tibble::tibble(
  col1 = c(
    "Table 1", "An example sheet", "species", "Adelie", "Gentoo", "Chinstrap",
    "This table is based on data in the palmerpenguins R package",
    "Source: {palmerpenguins} R package"
  ),
  col2 = c(NA_character_, NA_character_, "bill_length_mm", "38.791",
           "47.505", "48.834", NA_character_, NA_character_),
  col3 = c(NA_character_, NA_character_, "bill_depth_mm", "18.346",
           "14.982", "18.421", NA_character_, NA_character_)
)

example_df
#> # A tibble: 8 × 3
#>   col1                                                        col2         col3 
#>   <chr>                                                       <chr>        <chr>
#> 1 Table 1                                                     NA           NA   
#> 2 An example sheet                                            NA           NA   
#> 3 species                                                     bill_length… bill…
#> 4 Adelie                                                      38.791       18.3…
#> 5 Gentoo                                                      47.505       14.9…
#> 6 Chinstrap                                                   48.834       18.4…
#> 7 This table is based on data in the palmerpenguins R package NA           NA   
#> 8 Source: {palmerpenguins} R package                          NA           NA   

find_annotations(example_df)
#> ── Notes found in `example_df` ─────────────────────────────────────────────────
#> Title: Table 1
#> Source: Source: {palmerpenguins} R package
#> Notes:
#> • An example sheet
#> • This table is based on data in the palmerpenguins R package

guess_annotations(example_df)
#> # A tibble: 4 × 3
#>     row   col annotation                                                 
#>   <int> <int> <chr>                                                      
#> 1     1     1 Table 1                                                    
#> 2     2     1 An example sheet                                           
#> 3     7     1 This table is based on data in the palmerpenguins R package
#> 4     8     1 Source: {palmerpenguins} R package