
Data is often published with shorthand and symbols, and these tags are frequently found in the same container (e.g. a spreadsheet or table cell) as the numeric value. The aim of shrthnd is to process character vectors of numerical data that also contain non-numeric shorthand and symbols, so that both pieces of information can be retained and worked with easily.

Installation

shrthnd is not yet on CRAN, but binary versions can be installed from R-universe:

install.packages(
  "shrthnd",
  repos = c("https://mattkerlogue.r-universe.dev", "https://cran.r-project.org")
)

Alternatively, you can install the development version of shrthnd from GitHub:

# install.packages("remotes")
remotes::install_github("mattkerlogue/shrthnd")

Usage

Use shrthnd_num() to convert a character vector to a shrthnd_num vector. In effect, a shrthnd_num vector is a pair of vectors: a numeric vector holding the values and a character vector storing the non-numeric components of the input. By default a shrthnd_num vector will try to behave as a numeric vector, and it can be explicitly coerced into one with as.numeric(). You can use shrthnd_tags(), amongst other functions, to interact with the non-numeric (“tag”) component of the input vector. shrthnd also provides for the annotation of data frames, specifically of the tibble::tibble() flavour.

Full usage details are available on the shrthnd documentation website.

library(shrthnd)

x <- c("12", "34.567", "[c]", "NA", "56.78 [e]", "78.9", "90.123[e]", 
       "321.09*", "987.564 \u2021", ".", "..")

sh_x <- shrthnd_num(x)

sh_x
#> <shrthnd_num[11]>
#>  [1]  12.00      34.57         NA [c]     NA      56.78 [e]  78.90    
#>  [7]  90.12 [e] 321.09 *   987.56 ‡       NA .       NA ..

shrthnd_list(sh_x)
#> <shrthnd_list[6]>
#> [c] (1 location): 3 
#> [e] (2 locations): 5, 7 
#> * (1 location): 8 
#> ‡ (1 location): 9 
#> . (1 location): 10 
#> .. (1 location): 11

tbl <- tibble::tibble(
  x = x,
  sh_x = sh_x,
  as_num = as.numeric(sh_x), 
  as_char = as.character(sh_x),
  tag = shrthnd_tags(sh_x), 
  as_shrthnd = as_shrthnd(sh_x), 
  as_shrthnd2 = as_shrthnd(sh_x, digits = 3)
)

tbl
#> # A tibble: 11 × 7
#>    x               sh_x as_num as_char tag   as_shrthnd as_shrthnd2
#>    <chr>       <sh_dbl>  <dbl> <chr>   <chr> <chr>      <chr>      
#>  1 12         12.00       12   12      <NA>  12.00      12.000     
#>  2 34.567     34.57       34.6 34.567  <NA>  34.57      34.567     
#>  3 [c]           NA [c]   NA   <NA>    [c]   NA [c]     NA [c]     
#>  4 NA            NA       NA   <NA>    <NA>  NA         NA         
#>  5 56.78 [e]  56.78 [e]   56.8 56.78   [e]   56.78 [e]  56.780 [e] 
#>  6 78.9       78.90       78.9 78.9    <NA>  78.90      78.900     
#>  7 90.123[e]  90.12 [e]   90.1 90.123  [e]   90.12 [e]  90.123 [e] 
#>  8 321.09*   321.09 *    321.  321.09  *     321.09 *   321.090 *  
#>  9 987.564 ‡ 987.56 ‡    988.  987.564 ‡     987.56 ‡   987.564 ‡  
#> 10 .             NA .     NA   <NA>    .     NA .       NA .       
#> 11 ..            NA ..    NA   <NA>    ..    NA ..      NA ..

sh_tbl <- shrthnd_tbl(
  tbl,
  title = "Example table",
  notes = c("Note 1", "Note 2"),
  source_note = "Shrthnd documentation, 2023"
)

sh_tbl
#> # Title:    Example table
#> # A tibble: 11 × 7
#>    x               sh_x as_num as_char tag   as_shrthnd as_shrthnd2
#>    <chr>       <sh_dbl>  <dbl> <chr>   <chr> <chr>      <chr>      
#>  1 12         12.00       12   12      <NA>  12.00      12.000     
#>  2 34.567     34.57       34.6 34.567  <NA>  34.57      34.567     
#>  3 [c]           NA [c]   NA   <NA>    [c]   NA [c]     NA [c]     
#>  4 NA            NA       NA   <NA>    <NA>  NA         NA         
#>  5 56.78 [e]  56.78 [e]   56.8 56.78   [e]   56.78 [e]  56.780 [e] 
#>  6 78.9       78.90       78.9 78.9    <NA>  78.90      78.900     
#>  7 90.123[e]  90.12 [e]   90.1 90.123  [e]   90.12 [e]  90.123 [e] 
#>  8 321.09*   321.09 *    321.  321.09  *     321.09 *   321.090 *  
#>  9 987.564 ‡ 987.56 ‡    988.  987.564 ‡     987.56 ‡   987.564 ‡  
#> 10 .             NA .     NA   <NA>    .     NA .       NA .       
#> 11 ..            NA ..    NA   <NA>    ..    NA ..      NA ..      
#> # ☰ Source: Shrthnd documentation, 2023
#> # ☰ There are 2 notes, use `annotations(x)` to view

annotations(sh_tbl)
#> ── Notes for `sh_tbl` ──────────────────────────────────────────────────────────
#> Title: Example table
#> Source: Shrthnd documentation, 2023
#> Notes:
#> • Note 1
#> • Note 2

Philosophy

Datasets, especially statistical data published by governments, international institutions and academia, often come with symbols and markers that provide further details about the values: that a value is estimated, the reason why a value is missing, or that a value has a given statistical significance level.

The most common approach to processing data that contains both numeric and non-numeric components is to scrub the non-numeric content, so that the input can be coerced into a numeric vector. However, this non-numeric content (“tags”) often conveys information that is worth retaining. If you later want to access it, you may need to re-import your dataset or change your processing. This creates opportunities for error and, critically, for the numeric and non-numeric components to become de-linked. The shrthnd_num() data type builds on vctrs::new_rcrd() to separate these numeric and non-numeric components of a vector while keeping them linked.
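To illustrate the idea of "separate, but linked", here is a minimal sketch of a record-style vector built with vctrs::new_rcrd(). It is not shrthnd's actual implementation: the class name tagged_num, the constructor new_tagged_num() and the field names num and tag are hypothetical, chosen only for this example, and the values are assumed to be already split into their numeric and tag parts.

```r
library(vctrs)

# Hypothetical constructor (not part of shrthnd): stores the numeric values
# and their tags as two equal-length fields of a single record vector.
new_tagged_num <- function(num = double(), tag = character()) {
  new_rcrd(list(num = num, tag = tag), class = "tagged_num")
}

# Record vectors need a format() method so they can be printed.
format.tagged_num <- function(x, ...) {
  num <- field(x, "num")
  tag <- field(x, "tag")
  ifelse(is.na(tag), format(num), paste(format(num), tag))
}

# Values and tags already separated, e.g. from "56.78 [e]", "78.9", "[c]"
x <- new_tagged_num(
  num = c(56.78, 78.9, NA),
  tag = c("[e]", NA, "[c]")
)

# Subsetting the record subsets both fields together, so each value
# keeps its own tag.
sub <- x[c(1, 3)]
field(sub, "num")  # 56.78 NA
field(sub, "tag")  # "[e]" "[c]"
```

Because the two fields live in one object, operations such as subsetting or reordering cannot move a value away from its tag, which is the failure mode that scrubbing and re-importing risks.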

The shrthnd package logo is a combination of the word “shorthand” written in Pitman shorthand alongside an asterisk. The image was drawn by hand as a set of plot points, which were then adjusted for plotting in ggplot2. The “shorthand” shape is based on the representation in Arthur Reynolds’s Pitman’s English and Shorthand Dictionary, retrieved from the Internet Archive on 2023-05-11.