Description
Some taxa are of conservation importance but are not taxonomically recognised. For example, if we look up the Victorian conservation list:
library(galah)
library(dplyr)
show_all(lists) |>
filter(isAuthoritative == TRUE,
region == "Victoria")
# A tibble: 1 × 22
species_list_uid listName description listType dateCreated lastUpdated lastUploaded
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 dr655 Victoria : Cons… "" CONSERV… 2015-04-04… 2025-07-08… 2025-07-08T…
# ℹ 15 more variables: lastMatched <chr>, username <chr>, itemCount <int>, region <chr>,
# isAuthoritative <lgl>, isInvasive <lgl>, isThreatened <lgl>, isBIE <lgl>, isSDS <lgl>,
# wkt <chr>, category <chr>, generalisation <chr>, authority <chr>, sdsType <chr>,
# looseSearch <lgl>
Then look up which species are on that list, and filter to those whose matched scientificName is a single word:
species_list <- request_metadata() |>
filter(list == "dr655") |>
unnest() |>
collect()
species_list |>
filter(grepl("^[[:alpha:]]+$", scientificName))
# A tibble: 3 × 6
id name commonName scientificName lsid dataResourceUid
<int> <chr> <chr> <chr> <chr> <chr>
1 6793854 Chiastocaulon biseriale NA Chiastocaulon NZOR-6-7… dr655
2 6794205 Eucalyptus X oxypoma Studley Park Gum Eucalyptus https://… dr655
3 6795458 Eucalyptus X studleyensis Studley Park Gum Eucalyptus https://… dr655
Each of these entries is supplied at species level, but the matched name is a genus. We can confirm this by running the same name through search_taxa(), for example:
search_taxa("Eucalyptus X studleyensis")
# A tibble: 1 × 14
search_term scientific_name scientific_name_auth…¹ taxon_concept_id rank match_type
<chr> <chr> <chr> <chr> <chr> <chr>
1 Eucalyptus X stud… Eucalyptus L'Hér. https://id.biod… genus exactMatch
# ℹ abbreviated name: ¹scientific_name_authorship
# ℹ 8 more variables: kingdom <chr>, phylum <chr>, class <chr>, order <chr>, family <chr>,
# genus <chr>, vernacular_name <chr>, issues <chr>
Again, this links the taxon concept to "Eucalyptus", and reports match_type as exactMatch, so we wouldn't normally flag it as an error. The problem, therefore, is that passing this name to identify() in a piped query returns all Eucalyptus observations, which is almost certainly not what the user wants:
galah_call() |>
identify("Eucalyptus X studleyensis") |>
group_by(scientificName) |>
count() |>
collect()
# A tibble: 1,193 × 2
scientificName count
<chr> <int>
1 Eucalyptus 44131
2 Eucalyptus obliqua 43001
3 Eucalyptus camaldulensis 41418
4 Eucalyptus sieberi 25192
5 Eucalyptus melliodora 24164
6 Eucalyptus crebra 22957
7 Eucalyptus globoidea 22700
8 Eucalyptus macrorhyncha 21812
9 Eucalyptus tereticornis 21447
10 Eucalyptus muelleriana 18830
# ℹ 1,183 more rows
# ℹ Use `print(n = ...)` to see more rows
In summary, the ALA sometimes returns a match that is technically correct but too broad to be useful, and none of the flags we would usually check (such as match_type) indicates that anything has gone wrong.
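To make the missing flag concrete, here is a rough sketch of the kind of check that match_type does not currently give us. It is an illustration only, not part of galah: flag_rank_mismatch() is a hypothetical helper, the "two or more words" heuristic is an assumption, and the data frame is a hand-built mock of the search_term and rank columns from search_taxa(), so no request is sent to the ALA.

```r
# Hypothetical helper (not a galah function): flag search terms that look
# like species-level names (two or more words) but were matched at genus
# rank or higher.
flag_rank_mismatch <- function(taxa) {
  higher_ranks <- c("kingdom", "phylum", "class", "order", "family", "genus")
  # Count whitespace-separated words in each search term
  n_words <- lengths(strsplit(trimws(taxa$search_term), "\\s+"))
  taxa$rank_mismatch <- n_words >= 2 & taxa$rank %in% higher_ranks
  taxa
}

# Mock of the relevant search_taxa() columns; values are illustrative
taxa <- data.frame(
  search_term = c("Eucalyptus X studleyensis", "Eucalyptus obliqua", "Eucalyptus"),
  rank = c("genus", "species", "genus"),
  stringsAsFactors = FALSE
)

flagged <- flag_rank_mismatch(taxa)
# Show only the rows where the supplied name and the returned rank disagree
flagged[flagged$rank_mismatch, c("search_term", "rank")]
```

Under these assumptions only the hybrid name is flagged, because a bare "Eucalyptus" query is legitimately a genus-level search.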
One solution might be to show the user what taxon rank is being returned by search_taxa(), for example by grouping by the rank column:
search_taxa("Eucalyptus X studleyensis") |>
group_by(rank) |>
summarize(count = n())
# A tibble: 1 × 2
rank count
<chr> <int>
1 genus 1
This wouldn't help much in piped queries, but for taxon queries it might highlight unexpected behaviour. It would probably need to be controlled by the verbose argument of galah_config().
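As a rough sketch of how such a message might be gated, the function below summarises the ranks in a search_taxa()-style result and only prints when asked to. report_ranks() is a hypothetical name, and the plain verbose argument stands in for whatever value would be read from galah_config(); how that would be wired inside galah is not shown here and is an assumption. The input is again a mocked data frame rather than a live query.

```r
# Hypothetical helper (not a galah function): summarise the ranks in a
# search_taxa()-style result, and emit a message only when `verbose` is TRUE.
report_ranks <- function(taxa, verbose = TRUE) {
  rank_counts <- table(taxa$rank)
  if (verbose) {
    summary_text <- paste0(
      names(rank_counts), " (", as.integer(rank_counts), ")",
      collapse = ", "
    )
    message("Matched ranks: ", summary_text)
  }
  # Return the input unchanged and invisibly, so the call can sit in a pipe
  invisible(taxa)
}

# Mock of the relevant search_taxa() columns; values are illustrative
taxa <- data.frame(
  search_term = c("Eucalyptus X studleyensis", "Eucalyptus obliqua"),
  rank = c("genus", "species"),
  stringsAsFactors = FALSE
)

result <- report_ranks(taxa)                   # prints a one-line rank summary
quiet <- report_ranks(taxa, verbose = FALSE)   # prints nothing
```

Using message() rather than print() means the summary goes to stderr and can be silenced with suppressMessages(), which is the usual convention for optional diagnostics in R packages.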