Skip to contents

Import cells from an OpenDocument Spreadsheet (ODS) file. In the resulting dataaset each cell is its own row with columns providing information about the cell's postion, value types and containing formulas.

Usage

read_ods_cells(path, sheet = 1, quick = FALSE, whitespace = FALSE)

Arguments

path

The ODS file

sheet

The sheet within the ODS file

quick

Whether to use the quick reading process

whitespace

Whether to process multiple whitespaces

Value

A tibble with the following information:

  • sheet: the sheet of the cell

  • row: the row number of the cell

  • col: the column number of the cell

  • cell_type: the type of cell, either "cell", "empty" or "merged"

  • value_type: the ODS value type (see details)

  • is_empty: if the cell has no content (can apply to cell and merged types)

  • cell_content: the value of the cell as represented to the user of a spreadsheet application (i.e. with text formatting applied)

  • base_value: the underlying value of a cell (see details)

  • numeric_value: the base_value as a numeric vector for numeric types

  • logical_value: the base_value as a logical vector for boolean types

  • currency_symbol: the currency symbol for cells with a value_type of currency

  • has_formula: whether the cell has a formula

  • cell_formula: the ODS formula specification

  • has_error: whether the cell has an error

  • error_type: the error type

If using the quick argument then only row, col and cell_content are returned.

Details

Cells are assigned a cell_type of "merged" if they are covered by a cell merge, "blank" if they have no value type or cell content, otherwise they are a "cell". Use is_empty to determine if the cell is blank, as the ODS specification allows for the retention of content in cells hidden by a merge.

The value_type of a cell is based on the "value-type" attribute defined by the OpenDocument Format specification. There are 7 value types:

  • boolean: equivalent to R's base::logical() data type

  • currency: a numeric value, optionally with a currency symbol

  • date: a date or date-time stored in ISO format

  • float: a numeric value, the ODS format does not distinguish between different numeric types

  • percentage: a numeric value, formatted for display as a percentage

  • string: equivalent to R's base::character()

  • time: a duration, stored in ISO format

For currency value types optionally a currency symbol (e.g. GBP, EUR, USD) can be set, this is provided separately from the base_value as currency_symbol.

read_ods_cells provides two representations of a cell's value. The cell_content column provides the cell's value as presented to the user of a traditional spreadsheet application (such as Microsoft Excel, Google Sheets or LibreOffice Calc), i.e. after applying cell formatting rules. The base_value column provides the cell's underlying value, i.e. without formatting.

Processing the ODS XML is a memory intensive process, you can achieve significant speed enhancements by setting the quick argument to FALSE, this will extract only the text content of the cell. Note this also ignores repeated white space elements (i.e. "A B C" will be returned as "A B C").

The values of error_type are equivalent to those produced by Microsoft Excel and LibreOffice.

Examples

example <- system.file("extdata", "basic_example.ods", package = "tidyods")
example_cells <- read_ods_cells(example, 1)
#>  Extracting XML
#>  Extracting XML [26ms]
#> 
#>  Getting sheet
#>  Getting sheet [30ms]
#> 
#>  Extracting cell and row info
#>  Extracting cell and row info [168ms]
#> 
#>  Generating output table
#>  Generating output table [79ms]
#> 
dplyr::glimpse(example_cells)
#> Rows: 28
#> Columns: 15
#> $ sheet           <chr> "penguins", "penguins", "penguins", "penguins", "pengu…
#> $ row             <int> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, …
#> $ col             <int> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, …
#> $ cell_type       <chr> "cell", "cell", "cell", "cell", "cell", "cell", "cell"…
#> $ value_type      <chr> "string", "string", "string", "string", "string", "boo…
#> $ is_empty        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ cell_content    <chr> "species", "female", "bill_length_mm", "body_mass_g", …
#> $ base_value      <chr> "species", "female", "bill_length_mm", "body_mass_g", …
#> $ numeric_value   <dbl> NA, NA, NA, NA, NA, NA, 40.39041, 4043.49315, NA, NA, …
#> $ logical_value   <lgl> NA, NA, NA, NA, NA, FALSE, NA, NA, NA, TRUE, NA, NA, N…
#> $ currency_symbol <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ has_formula     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ cell_formula    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ has_error       <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ error_type      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…