Read cells from an ODS sheet — read_ods

Import cells from an OpenDocument Spreadsheet (ODS) file. In the resulting dataset each cell is its own row with columns providing information about the cell's position, value types and containing formulas.

Usage

read_ods_cells(path, sheet = 1, quick = FALSE, quiet = FALSE)

Arguments

path: The ODS file
sheet: The sheet name or index number, set to NA for all sheets
quick: Whether to use the quick reading process
quiet: Whether to silence console messages (recommended for bulk processing)

Value

A tibble (data.frame) of cells from the ODS sheet(s).

Details

The aim of read_ods_cells() is to extract the constituent value(s) and other information about cells stored in an ODS file and to present that in a "tidy" format that allows further programmatic manipulation. It is modelled after the functionality of tidyxl::tidy_xlsx() which performs a similar role for Microsoft Excel spreadsheets.

There are between two to four presentations of a cell's value in the resulting tibble:

all cells have a base_value, a character vector providing the "most raw" version of a cell's value;
all cells also have a cell_content, a character vector providing the version of a cell's value as seen by the user of a spreadsheet application (i.e. having applied number formatting rules);
for float, currency and percentage value types, cells will have a numeric_value with the raw value of the cell as a base R numeric vector, currency value types also have a currency_symbol providing the 3-character ISO currency symbol;
for date value types, cells will have a date_value, a character vector with the date or date-time in ISO 8601 format;
for time value types, cells will have a time_value, a character vector with the time duration in ISO 8601 format.

Processing the ODS XML is a memory intensive process, you can achieve significant speed enhancements by setting the quick argument to FALSE. This process will extract only a minimum of information about the cells, namely: sheet, row, col, value_type and a base_value. The base_value when using the quick argument will combine the raw value stored for float and percentage value types with the cell_content (i.e. the formatted character string) for all other value types.

More details on the types of information extracted by read_ods_cells(), including examples of how the different value types are stored in the ODS file format and extracted by read_ods_cells() can be found in the vignette, vignette("read_cells", package = "tidyods").

Examples

example <- system.file("extdata", "basic_example.ods", package = "tidyods")
example_cells <- read_ods_cells(example, 1)
dplyr::glimpse(example_cells)
#> Rows: 28
#> Columns: 28
#> $ sheet                  <chr> "penguins", "penguins", "penguins", "penguins",…
#> $ address                <chr> "A1", "B1", "C1", "D1", "A2", "B2", "C2", "D2",…
#> $ row                    <dbl> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4,…
#> $ col                    <dbl> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4,…
#> $ cell_type              <chr> "cell", "cell", "cell", "cell", "cell", "cell",…
#> $ is_empty               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ value_type             <chr> "string", "string", "string", "string", "string…
#> $ cell_content           <chr> "species", "female", "bill_length_mm", "body_ma…
#> $ base_value             <chr> "species", "female", "bill_length_mm", "body_ma…
#> $ numeric_value          <dbl> NA, NA, NA, NA, NA, NA, 40.39041, 4043.49315, N…
#> $ currency_symbol        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ boolean_value          <lgl> NA, NA, NA, NA, NA, FALSE, NA, NA, NA, TRUE, NA…
#> $ date_value             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ time_value             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ has_formula            <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ formula                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ has_error              <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ error_type             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ has_annotation         <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ annotation             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ is_merged              <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ merge_colspan          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ merge_rowspan          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ merge_shape            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ cell_style             <chr> NA, "Default", "Default", "Default", NA, NA, NA…
#> $ row_style              <chr> "ro1", "ro1", "ro1", "ro1", "ro1", "ro1", "ro1"…
#> $ col_style              <chr> "co1", "co2", "co3", "co4", "co1", "co2", "co3"…
#> $ col_default_cell_style <chr> "Default", "ce56", "ce57", "ce58", "Default", "…