Minor release to fix failing tests on CRAN.
Zachary Foster is now the maintainer of taxize.
tnrs()
and tnrs_sources()
functions are
defunct. The service has been unreliable for years now, and AFAICT is
down for good. Associated changes have been made throughout the package,
e.g., resolve()
no longer has an option for tnrs, etc. (#841)
(#842)
tol_resolve()
test following new version
of rotl
package on CRAN (#816)
class2tree()
function documentation regarding
how the function works in more detail (#849) (#851)
worms_downstream()
, children(..., db="worms")
and downstream(..., db="worms")
: now paginate automatically
for the user to get all results, and allow parameter
marine_only
to be passed through the high level functions
children()
/downstream()
down to
worrms::wm_children()
where it toggles whether marine only
results are returned (#848) thanks @oharac!
ncbi_downstream()
(which cascades up to
downstream(..., db="ncbi")
): an unneeded line of code was
removed that was also throwing an error in some cases (#850)
worms_downstream()
, children(..., db="worms")
and downstream(..., db="worms")
: added ranks
epifamily
and infraphylum
. In addition, when a
rank is missing in data returned from WORMS, we’ll change the missing
rank to “no rank” (#847)
worms_downstream()
docs: make it clear that
users can use parameters passed down to
worrms::wm_children()
(#831)
get_pow_()
docs: add section on rate limits,
what are rate limits for KEW POW and a user facing resolution
(#836)
rank_ref
) in the package: biotype, forma specialis,
isolate, pathogroup, series, serogroup, serotype, and strain - queries
from downstream()
and other functions that rely on relative
rank information should not fail anymore when they contain these 8 rank
names (#830)
rank_ref_zoo
reference data.frame specifically for
zoological rank types - right now only used for WORMS. main difference
is section/subsection in rank_ref_zoo
are nested between
the order and family, whereas in rank_ref
(used for all
other data sources) section/subsection are on the genus rank level
(#833)
class2tree()
. Problem sorted out now (#835)
(#838) (#839) (#840)
sci
will always only accept a scientific name;
com
accepts only a common name; id
accepts a
taxonomic identifier; sci_com
accepts a scientific or
common name; sci_id
accepts a scientific name or taxonomic
identifier. In most cases we have retained the old parameter name and
you can still use it but you get a warning with information. In a future
package version the replaced parameters will be removed completely. See
https://github.com/ropensci/taxize/issues/723 for tables covering the
functions affected and their old and new parameter names (#723) (#829)
apg_families
and
apg_order
) to v14 (from July 2017) (#827)
worms_downstream()
: three rank names were not
accounted for in our internal set of ranks (supertribe, subterclass,
parvorder) (#824)
classification.gbifid
was returning a duplicate last
taxon, i.e., the last two rows in the output data.frame were the same.
Fixed. (#825)
lowest_common()
due to problem in
classification.uid()
when a taxon UID was merged into
another taxon (#828)
ELEMENT_GLOBAL.2.
part
is redundant for every identifier (#823)
rankagg()
and tax_agg()
fixes:
rankagg()
examples now conditional on availability of
vegan
as it should be, and now real abundance data are used
in the example. tax_agg()
fixes species name ordering in
dune
data (#822) work by @jarioksa
class2tree()
(#818) (#820) thx to @adriangeerre for the
report & the fix by @trvinh
worms_downstream()
: user encountered a rank name
(“phylum (division)”) we hadn’t dealt with yet for worms (#821) thx
@msweetlove for
the report
bold_children()
,
bold_downstream()
and new S3 methods for
boldid
: children.boldid
and
downstream.boldid
. Beware that these new methods are built
on top of a function that scrapes BOLD’s website - their API doesn’t
provide access to taxonomic children (only parents) - so we’ve taken the
liberty of trying to liberate that data and make it easy to access
(#817)
tol_resolve()
test - upstream package
rotl had the bug; told maintainer about it and he’ll submit a new
version soon; affected commented out for now (#814)
synonyms()
gains a method for Plants of the World
Online (synonyms.pow
); and new associated helper function
pow_synonyms()
used within synonyms.pow
(#812)
iucn_summary()
to allow
get_iucn()
failures and the function to still proceed - to
make a better experience when passing in more than 1 name (#810)
species_plantarum_binomials
dataset
classification()
for data source GBIF wasn’t working
when the queried taxon rank was below species (e.g., subspecies or
variety); GBIF didn’t return the same fields for ranks below species, so
we tack on that information with a bit of extra code (#809)
classification()
with data
source GBIF; at some point introduced bug in how results were sorted
(#811)
use_eol()
is now defunct; EOL no longer requires an API
key (#749) (#803) thanks @padpadpadpad
vascan_search()
, taxize_cite()
, all
*_ping()
functions, get_wormsid()
,
get_pow()
, get_eolid()
,
get_gbifid()
, get_boldid()
,
gbif_name_usage()
; and in various places in documentation
(#799)
classification.uid()
now does batch HTTP requests. NCBI
Entrez web service allows requests with up to 50 identifiers; @zachary-foster did
the work to make this method now use batch queries so it’s much faster
(#678) (#798)
class2tree()
improvement in taxonomy rank indexing
(#805) work by @trvinh
taxon_state_messages
parameter in
the taxize_options
help file (#806)
ncbi_children()
now accepts numeric and character class
ids (#800)
classification.gbifid()
, was failing because GBIF
changed the order of results (#802)
class2tree()
fix: problem was due ultimately to a bug
in classification.gbifid()
(see line above) (#801)
tax_rank()
fix - for db="ncbi"
was not
giving correct ranks for queried names - was due to a change in
classification.uid
(#804)
get_eolid()
when filtering by data source
led to no results (#808)
ncbi_downstream
(and thereby fix for
downstream()
with db="ncbi"
): for some taxa a
query to NCBI resulted in children as well as the queried name itself, and
the next query would give the same results, leading to an endless while
loop - now we remove the taxon itself that was queried to prevent this
(#807)
COL introduced rate limiting in 2019, which has made the API essentially unusable. CoL+ is coming soon and we’ll incorporate it here when it’s stable. See https://github.com/ropensci/colpluz for the in-development R client (#796)
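The ncbi_downstream() loop fix noted above (#807) can be illustrated with a small sketch. This is not the package's internal code: toy_children() is a hypothetical stand-in for the remote NCBI lookup, and collect_descendants() is an illustrative helper showing why dropping the queried taxon from each response lets the while loop terminate.

```r
# Hypothetical stand-in for a remote "children" lookup. For id "1" the
# response echoes the queried taxon back alongside its real children, which
# is the situation that used to cause the endless loop.
toy_children <- function(id) {
  tree <- list(
    "1" = c("1", "2", "3"),  # note: "1" lists itself
    "2" = c("4"),
    "3" = character(0),
    "4" = character(0)
  )
  tree[[id]]
}

# Collect all descendants of `id`, removing the queried taxon from each
# response before queueing the next round of lookups.
collect_descendants <- function(id, children_fun = toy_children) {
  queue <- id
  found <- character(0)
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    kids <- children_fun(current)
    kids <- setdiff(kids, current)  # drop the taxon that was queried
    kids <- setdiff(kids, found)    # extra guard against revisiting taxa
    found <- c(found, kids)
    queue <- c(queue, kids)
  }
  found
}

collect_descendants("1")
```

Without the first setdiff() call, id "1" would be re-queued on every pass and the loop would never finish.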
gn_parse()
to access the Global
Names scientific name parser. it’s a super fast parser. see the section
on name parsers
(https://docs.ropensci.org/taxize/reference/index.html#section-name-parsers)
for the 3 functions that do name parsing (#794)
get_wormsid()
gains two new parameters:
fuzzy
and marine_only
; both are passed through
to
worrms::wm_records_name()
/worrms::wm_records_names()
(#790)
worrms_ranks
to apply rank names in cases
where WORMS fails to return rank names in their data
get_tpsid()
example that passes in names as
factors; get_*
functions no longer accept factors
classification.tpsid()
: change to an internal
fxn changed its output; fix for that (#797)
get_boldid()
: when filtering (e.g., w/
rank
, division
, parent
) returned
no match, get_boldid
was failing on downstream parsing;
return NA now
get_wormsid_()
: was missing
marine_only
and fuzzy
parameters
pow_search()
: an if statement was leading to length
> 1 booleans
synonyms()
: an if statement in internal fxn
process_syn_ids
was leading to length > 1 booleans
classification.gbifid
: select columns only if they
exist instead of failing on plucking non-existent columns
get_ids()
gains a new parameter suppress
(default: FALSE
) to toggle package cli
messages
stating which database is being worked on (#719)
taxize::downstream()
:
rank_ref
, theplantlist
,
apg_families
, apg_orders
(#777) (#781)
get_*
functions have S3 methods that dispatch on those
get_*
output classes. however, you can still pass in a
db
parameter, which is IGNORED when dispatching on the
input class. the db
parameter is used (not ignored) when
passing in a taxon id as character/numeric/etc. now these functions
(children, classification, comm2sci, sci2comm, downstream, id2name,
synonyms, upstream) warn when the user passes a db
value
which will be ignored (#780)
http_version=2L
across all Entrez requests (#783)
col_search()
: COL now does rate limiting (if
you make too many requests within a time period they will stop allowing
requests from your IP address/your computer); documented rate limiting,
what I know at least; changed checklist
parameter behavior:
years 2014 and back don’t provide JSON, so we return
xml_document
objects now for those years that the user can
parse themselves (#786)
tax_rank
somehow (my bad) had two .default
methods. previous behavior is the same as current behavior (this
version) (#784)
ncbi_children()
: fixed regex that was supposed to
flag ambiguous taxa only, it was supposed to flag sp.
and
spp.
, but was including subsp.
, which we
didn’t want included (#777) (#781)
ncbi_children()
: when ID is passed
rather than a name, we need to then set id=NULL
after
switching to the equivalent taxonomic name internally to avoid getting
duplicate data back (#777) (#781)
eubon_search()
gains new params limit
and
page
; other eubon functions have no pagination (#766)
ipni_search()
from http to https,
via (#773)
synonyms()
to always return NA
for
name not found, and always return a zero row data.frame when name found
BUT no synonyms found; updated docs to indicate better what’s returned
(#763) (#765)
xml2
package, so we have to remove
them using regex; we throw a message when we’re doing this so the user
knows (#768)
classification()
docs with a new
EOL
section discussing that EOL does not have good failure
behavior, and what to expect from them (#775)
taxize::downstream()
:
rank_ref
, theplantlist
,
apg_families
, apg_orders
(#777)
sci2comm()
and comm2sci()
improvements:
for db="ncbi"
we no longer stop with error when
there’s no results for a query; instead we return
character(0)
. In addition, all data source options for
both functions now return character(0)
when there’s no
results for a query (#778)
id2name.uid()
now actually passes on ...
internally for curl options
get_nbnid()
: was returning non-taxon entities; had
to add idxtype:TAXON
to the fq
query
(#761)
as.eolid()
and as.colid()
-
don’t run through helper function that was raising error on HTTP
404/etc., don’t want to fail (#762)
class2tree()
: set root node name to NA if it
does not exist, ITIS does not set a root node (#767) (#769) work by
@gpli
ipni_search()
: IPNI changed parameter names,
fixes for that; and now returning tibble’s instead of data.frame’s
(#773) thanks @joelnitta!
ncbi_children()
: fixed regex that was supposed to
flag ambiguous taxa only, it was supposed to flag sp.
and
spp.
, but was including subsp.
, which we
didn’t want included (#777)
ncbi_children()
: when ID is passed
rather than a name, we need to then set id=NULL
after
switching to the equivalent taxonomic name internally to avoid getting
duplicate data back (#777)
get_*
functions gain some new features (associated
new fxns are taxon_last
and taxon_clear
): a)
nicer messages printed to the console when iterating through taxa, and a
summary at the end of what was done; and b) state is now saved when
running get_*
functions. That is, in an object external to
the get_*
function call we keep track of what happened, so
that if an error is encountered, you can easily restart where you left
off; this is especially useful when dealing with a large number of
inputs to a get_*
function. To utilize, pass the output of
taxon_last()
to a get_*
function call.
Associated with these changes are new package imports: R6, crayon and
cli (#736) (#757)
taxize_options()
to set options
when using taxize. the first reason for the function is to set two
options for the above item for get_*
functions:
taxon_state_messages
to allow taxon state tracking messages
in get_*
functions or not, and quiet=TRUE
quiets output from the taxize_options()
function
itself
id2name()
and worms_downstream()
use
worrms::wm_record
instead of
worrms::wm_record_
for newest version of
worrms
(#760)
get_*
functions and col_downstream()
parameter verbose
changed to messages
to not
conflict with a verbose
curl options parameter passed in to
crul
gbif_downstream()
- GBIF in some cases returns a
rank of “unranked”, which we hadn’t accounted for in internal rank
processing code (#758) thanks @ocstringham
class2tree()
gains node labels when present (#644)
(#748) thanks @gpli
get_pow()
, get_pow_()
, as.pow()
,
classification.pow()
, pow_search()
, and
pow_lookup()
(#598) (#739)
taxize
. the
string will look something like
r-curl/3.3 crul/0.7.0 rOpenSci(taxize/0.9.6)
, including the
versions of the curl
R pkg, the crul
package,
and the taxize
package (#662)
get_colid
functionality: we weren’t
paginating for the user when there were more than 50 results for a
query; we now paginate for the user using async HTTP requests; this
means that some requests will take longer than they did before if they
have more than 50 results; this is a good change given that you get all
the results for your query now (#743)
get_*
functions: in some of the
get_*
functions we tried for a direct match (e.g.,
"Poa" == "Poa"
) and if one was found, then we were done and
returned that record. however, we didn’t deploy the same logic across
all get_*
functions. Now all get_*
functions
check for a direct match. Of course if there is a direct match with more
than 1 result, you still get the prompt asking you which name you want.
(#631) (#734)
taxize-authentication
manual file
covering authentication information across the package (#681)
gnr_resolve()
docs about age of datasets
used in the Global Names Resolver, and how to access age of datasets
(#737)
get_eolid()
fixes: gains new attribute
pageid
; uri
’s given are updated to EOL’s new
URL format; rank
and datasource
parameters
were not documented, now are; we no longer use short names for data
sources within EOL, but instead use their full names (#702) (#742)
col_search()
now returns attributes on the output
data.frame’s with number of results found and returned, and other
metadata about the search
gnr_datasources()
loses the todf
parameter; now always returns a data.frame and the data.frame has all
the columns, whereas the default call returned a limited set of columns
in previous versions
get_wormsid()
, was failing when there was a
direct match found with more than 1 result (#740)
get_*
functions: linting of the input to
the rows
parameter was failing with a vector of values in
some cases (#741)
iucn_summary()
; we weren’t passing on the API
key internally correctly (#735) thanks @PrincessPi314 for the report
iucn_summary_id()
is defunct, use
iucn_summary()
instead
col_downstream()
gains parameter
extant_only
(logical) to optionally keep extant taxa only
(#714) thanks @ArielGreiner for the inquiry
downstream()
gains another db
option:
Worms. You can now set db="worms"
to use Worms to get taxa
downstream from a target taxon. In addition, taxize
gains
new function worms_downstream()
, which is used under the
hood in downstream(..., db="worms")
(#713) (#715)
id2name()
with db
options for tol, itis, ncbi, worms, gbif, col, and bold. the function
converts taxonomic IDs to names. It’s sort of the inverse of the
get_*()
family of functions. (#712) (#716)
tax_rank()
gains new parameter rows
so
that one can pass rows
down to get_*()
functions
synonyms()
warning from an internal
cbind()
call now fixed (#704) (#705) thanks @vijaybarve
taxize
function calls thrown when notifying
users about API keys (e.g., taxize::use_tropicos()
) to make
it very clear where the functions live (to avoid confusion with
usethis
) (#724) (#725) thanks @maelle
iucn_summary()
to output the same structure
when no match is found as when a match is found so that when output is
passed to iucn_status()
behavior is the same (#708) thanks
@Rekyt
tax_name()
tests on CRAN (#728)
httr
replaced by crul
throughout
(#590)
vcr
, making tests much faster and not prone to errors from
remote services being down (#729)
eol_dataobjects()
gains new
parameter language
. eol_pages()
loses
iucn
, images
, videos
,
sounds
, maps
, and text
parameters, and gains images_per_page
,
videos_per_page
, sounds_per_page
,
maps_per_page
, texts_per_page
, and
texts_page
. Please do let us know if you find any problems
with any EOL functions (#717) (#718)
db
value for
comm2sci()
and sci2comm()
is now
ncbi
instead of eol
get_*()
functions changed parameter
verbose
to messages
to not conflict with
verbose
passed down to crul::HttpClient
ncbi_ping()
reworked to allow use of
your API key as a parameter or pulled from your environment;
eol_ping()
using https instead of http, and parsing JSON
instead of XML.
get_eolid()
was erroring when no results found for a
query due to not assigning an internal variable (#701) (#709) thanks for
the fix @taddallas
get_tolid()
was erroring when values were
NULL
- now replacing all NULL
with
NA_character_
to make data.table::rbindlist()
happy (#710) (#711) thanks @gpli for the fix
rank_ref
data.frame of
taxonomic ranks: species subgroup, forma, varietas, clade, megacohort,
supercohort, cohort, subcohort, infracohort. When there’s no matched
rank, errors can result in many of the downstream functions. The
data.frame now has 43 rows. (#720) (#727)
downstream()
and
ncbi_get_taxon_summary()
: change in
ncbi_get_taxon_summary
to break up queries into smaller
chunks to avoid HTTP 414 errors (“URI too long”) (#727) (#730) thanks
for reporting @fischhoff and @benjaminschwetz
use_entrez()
, use_eol()
,
use_iucn()
(which uses internally
rredlist::rl_use_iucn()
), and use_tropicos()
(#682) (#691) (#693) by @maelle
tropicos_ping()
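The ncbi_get_taxon_summary() change noted above (#727) (#730) splits a long vector of ids into smaller requests so that no single URL grows long enough to trigger HTTP 414. A minimal sketch of that idea follows. The helper names chunk_ids(), query_in_chunks(), and fake_fetch() are hypothetical, and the chunk size of 50 is an assumption for illustration, not necessarily the value taxize uses.

```r
# Split a vector of ids into a list of chunks with at most `size` ids each.
chunk_ids <- function(ids, size = 50) {
  stopifnot(size >= 1)
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Run `fetch_fun` once per chunk and row-bind the per-chunk data.frames.
query_in_chunks <- function(ids, fetch_fun, size = 50) {
  results <- lapply(chunk_ids(ids, size), fetch_fun)
  do.call(rbind, results)
}

# Hypothetical stand-in for one remote request; a real client would send a
# single HTTP request for the ids in this chunk.
fake_fetch <- function(ids) {
  data.frame(uid = ids, n_chars = nchar(ids), stringsAsFactors = FALSE)
}

out <- query_in_chunks(as.character(1:120), fake_fetch, size = 50)
nrow(out)
```

Here 120 ids are sent as three requests (50, 50, and 20 ids) and the results are combined in the original order.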
downstream()
and gbif_downstream()
:
some of the results don’t have a canonicalName
, so now
safely try to get that field (#673)
as.uid()
, was erroring when passing in a taxon ID
(#674) (#675) by @zachary-foster
get_boldid()
(and by extension
classification(..., db = "bold")
): was failing when no
parent taxon found, just fill in with NA now (#680)
synonyms()
: was failing for some TSNs for
db="itis"
(#685)
tax_name()
: rows
arg wasn’t being
passed on internally (#686)
gnr_resolve()
and
gnr_datasources()
: problems were caused by http scheme,
switched to use https instead of http (#687)
class2tree()
: organisms with unique rank lower
than non-unique ranks will give extra wrong rows (#689) (#690) thanks
@gpli
ncbi_get_taxon_summary()
: changes in the NCBI
API most likely led to HTTP 414 (URI Too Long) errors. We now loop
internally for the user. By extension this helps problems upstream in
downstream()
/ncbi_downstream()
/ncbi_children()
(#698)
class2tree()
: was erroring when name strings
contained pound signs (e.g., #
) (#699) (#700) thanks @gpli
Sys.sleep
for NCBI
requests if the user has an API key (#667)
?taxize-authentication
verbose
to messages
across the package so that suppressing calls to message()
do
not conflict with curl options passed in
genbank2uid()
and
ncbi_get_taxon_summary()
to use crul
instead
of httr
for HTTP requests
get_tolid()
: it was missing assignment of the
att
attribute internally, causing failures in some cases
(#663) (#672)
ncbi_children()
(and thus
children()
when requesting NCBI data) to not fail when
there is an empty result from the internal call to
classification()
(#664) thanks @arendsee
class2tree()
gets a major overhaul thanks to @gedankenstuecke
and @trvinh (!!). The
function now takes unnamed ranks into account when clustering, which
fixes problem where trees were unresolved for many splits as the named
taxonomy levels were shared between them. Now it makes full use of the
NCBI Taxonomy string, including the unnamed ranks, leading to higher
resolution trees that have fewer multifurcations (#611) (#634)
?taxize-authentication
for help.
Importantly, note that API key names (both R options and environment
variables) have changed. They are now the same for R options and env
vars: TROPICOS_KEY, EOL_KEY, PLANTMINER_KEY, ENTREZ_KEY. You no longer
need an API key for Plantminer. (#640) (#646)
crul
and
zoo
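Because the API key names above are now identical for R options and environment variables, a single lookup by name can cover both. The sketch below is illustrative only: find_key() is a hypothetical helper, not a taxize function, and the lookup order (explicit argument, then R option, then environment variable) is an assumption.

```r
# Hypothetical helper: resolve an API key by name, preferring an explicitly
# supplied value, then an R option, then an environment variable.
find_key <- function(name, supplied = NULL) {
  if (!is.null(supplied)) return(supplied)
  key <- getOption(name, default = "")
  if (!nzchar(key)) key <- Sys.getenv(name, unset = "")
  if (!nzchar(key)) {
    stop("no API key found; set the option or environment variable ", name,
         call. = FALSE)
  }
  key
}

# Placeholder value for illustration only, not a real key.
Sys.setenv(ENTREZ_KEY = "dummy-key-for-illustration")
find_key("ENTREZ_KEY")
```

In practice the environment variable would be set once in .Renviron rather than with Sys.setenv() in a script.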
downstream()
we now pass on limit
and
start
parameters to gbif_downstream()
; we
weren’t doing that before; the two parameters control pagination
(#638)
genbank2uid()
now returns the correct ID when there are
multiple possibilities and invalid IDs no longer make whole batches fail
(#642) thanks @zachary-foster
children()
outputs made more consistent for certain
cases when no results found for searches (#648) (#649) thanks @arendsee
downstream()
by passing ...
(additional parameters) down to ncbi_children()
used
internally. For example, the ambiguous
parameter in
ncbi_children()
allows you to remove ambiguously named nodes
(#653) (#654) thanks @arendsee
httr
for crul
in EOL
and Tropicos functions - note that this won’t affect you unless you’re
passing curl options. see package crul
for help on curl
options. Along with this change, the parameter verbose
has
changed to messages
(for toggling printing of information
messages)
CONTRIBUTING.md
file for
how to contribute to the test suite (#635)
genbank2uid
now returns the correct ID when there are
multiple possibilities and invalid IDs no longer make whole batches
fail.
downstream()
: passing numeric taxon ids to the
function while using db="ncbi"
wasn’t working (#641) thanks
@arendsee
children()
: passing numeric taxon ids to the
function while using db="worms"
wasn’t working (#650)
(#651) thanks @arendsee
synonyms_df()
- that attempts to combine many outputs
from the synonyms()
function - now removes NA/NULL/empty
outputs before attempting the combination (#636)
gnr_resolve()
: before if
preferred_data_sources
was used, you would get the
preferred data but only a few columns of the response. We now return all
fields; however, we only return the preferred data part when that
parameter is used (#656)
children()
. It was returning unexpected
results for ambiguous taxonomic names (e.g., there are some insects that
are returned when searching within Bacteria). It was also failing when
one tried to get the children of a root taxon (e.g., the children of the
NCBI id 131567). (#639) (#647) fixed via PR (#659) thanks @arendsee and @zachary-foster
get_*()
functions
get_*()
functions had NaN
as default
rows
parameter value. Those all changed to
NA
rows
parameter value given
get_*()
functions
get_*()
functions to behave
the same when ask = FALSE, rows = 1
and
ask = TRUE, rows = 1
as these should result in the same
outcome. (#627) thanks @zachary-foster!
NA
with no indication that there were
multiple matches.
comm2sci()
to S3 setup with methods for
character
, uid
, and tsn
(#621)
iucn_status()
now has S3 setup with a single method
that only handles output from the iucn_summary()
function.
key
parameter to fxn
iucn_id()
(#633)
sci2comm()
: to indicate how to get
non-simplified output (which includes what language the common name is
from) vs. getting simplified output (#623) thanks @glaroc!
sci2comm()
to not be case sensitive when looking
for matches (#625) thanks @glaroc!
eol_search()
:
link
and content
eol_search()
to describe returned
data.frame
bold_ping()
to use new base URL for their API
rank_ref
, see
?rank_ref
downstream()
via fix to rank_ref
dataset to include “infraspecies” and make “unspecified” and “no rank”
equivalent. Fix to col_downstream()
to properly remove
ranks lower than allowed. (#620) thanks @cdeterman!
iucn_summary
: changed to using rredlist
package internally. sciname
param changed to
x
. iucn_summary_id()
now is deprecated in
favor of iucn_summary()
. iucn_summary()
now
has a S3 setup, with methods for character
and
iucn
(#622)
rank_ref
dataset as that rank
is sometimes used at NCBI (from bug reported in
ncbi_downstream()
) (#626)
sci2comm()
, add tryCatch()
to
internals to catch failed requests for specific pageid’s (#624) thanks
@glaroc!
get_nbnid()
(#632)
ape::neworder_phylo
object, which is not used
anymore in taxize
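The sci2comm() change noted above (#624) wraps each per-pageid request in tryCatch() so that one failed request does not abort the whole call. A minimal sketch of that pattern follows. fetch_names() is a hypothetical stand-in for the remote request and safe_fetch_all() is an illustrative helper, not the package's internal code.

```r
# Hypothetical stand-in for a remote request that fails for one pageid.
fetch_names <- function(pageid) {
  if (pageid == "bad") stop("HTTP 404")
  paste0("name-", pageid)
}

# Call `fetch_fun` for every pageid; on error, warn and return NA for that
# pageid instead of stopping the whole loop.
safe_fetch_all <- function(pageids, fetch_fun) {
  out <- lapply(pageids, function(id) {
    tryCatch(
      fetch_fun(id),
      error = function(e) {
        warning("request failed for pageid ", id, ": ", conditionMessage(e),
                call. = FALSE)
        NA_character_
      }
    )
  })
  stats::setNames(out, pageids)
}

res <- suppressWarnings(safe_fetch_all(c("1", "bad", "3"), fetch_names))
res
```

The failed pageid comes back as NA while the other two results are still returned.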
ncbi_downstream()
and now NCBI is an
option in the function downstream()
(#583) thanks for the
push @andzandz11
wikitaxa
, with contributions from @ezwelty (#317)
scrapenames()
gains a parameter
return_content
, a boolean, to optionally return the OCR
content as a text string with the results. (#614) thanks @fgabriel1891
get_iucn()
- to get IUCN Red List ids for
taxa. In addition, new S3 methods synonyms.iucn
and
sci2comm.iucn
- no other methods could be made to work with
IUCN Red List ids as they do not share their taxonomic classification
data (#578) thanks @diogoprov
bold
now an option in classification()
function (#588)
genbank2uid()
can give back more than 1 taxon matched
to a given Genbank accession number. Now the function can return more
than one match for each query, e.g., try
genbank2uid(id = "AM420293")
(#602) thanks @sariya
cbind()
usage to include
...
for method consistency (#612)
tax_rank()
used to be able to do only ncbi and itis.
Can now do a lot more data sources: ncbi, itis, eol, col, tropicos,
gbif, nbn, worms, natserv, bold (#587)
classification()
docs in a section
Lots of results
a note about how to deal with results when
there are A LOT of them. (#596) thanks @ahhurlbert for raising the issue
tnrs()
now returns the resulting data.frame in the order
of the names passed in by the user (#613) thanks @wpetry
gnr_resolve()
to now strip out taxonomic
names submitted by user that are NA, or zero length strings, or are not
of class character (#606)
gnr_resolve()
(#610) thanks @kamapu
tnrs()
docs that the service doesn’t
provide any information about homonyms. (#610) thanks @kamapu
parvorder
to the taxize
rank_ref
dataset - used by NCBI - if a taxon was returned with that
rank, some functions in taxize
were failing due to that
rank missing in our reference dataset rank_ref
(#615)
get_colid()
via problem in parsing within
col_search()
(#585)
gbif_downstream
(and thus fix in
downstream()
): there were two rows with form in our
rank_ref
reference dataset of rank names, causing > 1
result in some cases, then causing vapply
to fail as it’s
expecting length 1 result (#599) thanks @andzandz11
genbank2uid()
: was failing when getting more than 1
result back, works now (#603) and fails better now, giving back
warnings/error messages that are more informative (see also #602) thanks
@sariya
synonyms.tsn()
: in some cases a TSN has > 1
accepted name. We get accepted names first from the TSN, then look for
synonyms, and hadn’t accounted for > 1 accepted name. Fixed now
(#607) thanks @tdjames
sci2comm()
- was not dealing internally
with passing the simplify
parameter (#616)
worrms
package on
CRAN. Adds functions as.wormsid()
,
get_wormsid()
, get_wormsid_()
,
children.wormsid()
, classification.wormsid()
,
sci2comm.wormsid()
, comm2sci.wormsid()
, and
synonyms.wormsid()
(#574) (#579)
as.natservid
, get_natservid
,
get_natservid_
, and classification.natservid
(#126)
rankagg()
with respect to vegan
package to work with older and new version of vegan
- thanks
@jarioksa (#580)
(#581)
get_tolid()
, get_tolid_()
, and
as.tolid()
(#517)
classification()
gains new method for
TOL data
lowest_common()
gains new method for
TOL data
ritis
package, an external dependency for
ITIS taxonomy data. Note that a large number of ITIS functions were
removed, and are now available via the package ritis
.
However, there are still many high level functions for working with ITIS
data (see functions prefixed with itis_
), and
get_tsn()
, classification.tsn()
, and similar
high level functions remain unchanged. (#525)
eubon()
fxn is now
eubon_search()
, although either still works - though
eubon()
will be made defunct in the next version of this
package. Additional new functions were added:
eubon_capabilities()
, eubon_children()
, and
eubon_hierarchy()
(#567)
lowest_common()
function gains two new data source
options: COL (Catalogue of Life) and TOL (Tree of Life) (#505)
synonyms_df()
as a slim wrapper
around data.table::rbindlist()
to make it easy to combine
many outputs from synonyms()
for a single data source -
there is a lot of heterogeneity among data sources in how they report
synonyms data, so we don’t attempt to combine data across sources
(#533)
https
from http
(#571)
tax_name()
in which when an invalid taxon
was searched for then classification()
returned no data and
caused an error. Fixed now. (#560) thanks @ljvillanueva for reporting it!
gnr_resolve()
in which order of input
names to the function was not retained. Fixed now. (#561) thanks @bomeara for reporting
it!
gbif_parse()
- data format changed coming
back from GBIF - needed to replace NULL
with
NA
(#568) thanks @ChrKoenig for reporting it!
get_*()
functions now have new attributes to further
help the user: multiple_matches
(logical) indicating
whether there were multiple matches or not, and
pattern_match
(logical) indicating whether a pattern match
was made, or not. (#550) from (#547) discussion, thanks @ahhurlbert ! see also
(#551)
xml2::xml_find_one()
to
xml2::xml_find_first()
for new xml2
version
(#546)
gnr_resolve()
now retains user supplied taxa that had
no matches - this could affect your code, make sure to check your
existing code (#558)
gnr_resolve()
- stop sorting output data.frame, so
order of rows in output data.frame now same as user input vector/list
(#559)
sub_rows()
inside of most
get_*()
functions to not fail when the data.frame rows were
less than that requested by the user in rows
parameter
(#556)
get_gbifid()
, as sometimes calls failed because
we now return numeric IDs but used to return character IDs (#555)
get_*()
functions to call the internal
sub_rows()
function later in the function flow so as not to
interfere with taxonomic based filtering (e.g., user filtering by a
taxonomic rank) (#555)
gnr_resolve()
, to not fail on parsing when no
data returned when a preferred data source specified (#557)
iucn_summary()
(#543) thanks @mcsiple
ncbi_get_taxon_summary()
suggesting to break up the ids
into chunks (#541) thanks @daattali
itis_acceptname()
to accept multiple names
(#534) and now gives back same output regardless of whether match found
or not (#531)
tax_name()
for some queries that return no
classification data via internal call to classification()
(#542) thanks @daattali
tax_name()
(#530) thanks @ibartomeus
rankagg()
function, use
requireNamespace()
in examples to make sure user has
vegan
installed (#529)
eol_invasive()
and
gisd_invasive()
to point to new location in the originr package.
Also, cleaned out code in those functions as not avail. anymore
(#494)
get_gbifid()
to use new internal code to provide
two ways to search GBIF taxonomy API, either via
/species/match
or via /species/search
, instead
of /species/suggest
, which we used previously. The suggest
route was too coarse. get_gbifid()
also gains a parameter
method
to toggle whether you search for names using
/species/match
or /species/search
(#528)
col_search()
to handle when COL can return a
value of misapplied name
, which a switch()
statement didn’t handle yet (#511) thanks @JoStaerk!
get_colid()
and col_search()
(#523) thanks @zachary-foster!
bold
, which fixes
taxize::bold_search()
, so no actual changes in
taxize
for this, but take note (#521)
gnr_resolve()
where we indexed to data
incorrectly. And added tests to account for this problem. Thanks @raredd! (#519) (#520)
iucn_summary()
introduced in last version.
iucn_summary()
now uses the package rredlist
,
which requires an API key, and I didn’t document how to use the key.
Function now allows user to pass the key in as a parameter, and
documents how to get a key and save it in either .Renviron
or in .Rprofile
(#522)
lowest_common()
for obtaining the lowest
common taxon and rank for a given taxon name or ID. Methods so far for
ITIS, NCBI, and GBIF (#505)
rredlist
iucn_summary_id()
- same as
iucn_summary()
, except takes IUCN IDs as input instead of
taxonomic names (#493)
iucn_summary()
fixes, long story short: a number of bug
fixes, and uses the new IUCN API via the newish package
rredlist
when IDs are given as input, but uses the old IUCN
API when taxonomic names given. Also: gains new parameter
distr_details
(#174) (#472) (#487) (#488)XML
with xml2
for XML parsing
(#499)httr::content
to explicitly
state encoding="UTF-8"
(#498)gnr_resolve()
now outputs a column
(user_supplied_name
) for the exact input taxon name -
facilitates merging data back to original data inputs (#486) thanks
@Alectoriaeol_dataobjects()
gains new parameter
taxonomy
to toggle whether to return any taxonomy details
from different data providers (#497)classification()
was giving back rank values in mixed
case from different data providers (e.g., class
vs. Class
). All rank values are now all lowercase
(#504)get_gbifid
to 50 from 20. Gives back more results, so more
likely to get the thing searched for (#513)gni_search()
to make all output columns
character
classiucn_id()
, tpl_families()
, and
tpl_get()
all gain a new parameter ...
to pass
on curl options to httr::GET()
get_eolid()
: URI returned now always has the
pageid, and goes to the right place; API key if passed in now actually
used, woopsy (#484)get_uid()
: when a taxon not found, the “match”
attribute was saying found sometimes anyway - that is now fixed;
additionally, fixed docs to correctly state that we give back
'NA due to ask=FALSE'
when ask = FALSE
(#489)
Additionally, made this doc fix in other get_*()
function
docsapgOrders()
function (#490)tp_search()
which fixes
get_tpsid()
: Tropicos doesn’t allow periods
(.
) in query strings, so those are URL encoded now;
Tropicos doesn’t like sub-specific rank names in name query strings, so
we warn when those are found, but don’t alter user inputs; and improved
docs to be more clear about how the function fails (#491) thanks @scelmendorf !classification(db = "itis")
to fail better when
no taxa found (#495) thanks @ashenkin !eol_pages()
fixes: the EOL API route for this method
gained a new parameter taxonomy
, this function gains that
parameter. That change caused this fxn to fail. Now fixed. Also,
parameter subject
changed to subjects
(#500)col_search()
due to when
misapplied name
comes back as a data slot. There was
previously no parser for that type. Now there is, and it works
(#512)R >= 3.2.1
. Good idea to update your R
installation anyway (#476)ion()
for obtaining data from Index of
Organism Names (#345)eubon()
for obtaining data from EU
(European Union) BON taxonomy (#466). Note that you may only get partial
results for some requests as paging isn’t implemented yet in the EU BON
API (#481)fg_*()
for
obtaining data from Index Fungorum. More work has to be done yet on this
data source, but these initial functions allow some Index Fungorum data
access (#471)gbif_downstream()
for obtaining downstream
names from GBIF’s backbone taxonomy. Also available in
downstream()
, where you can request downstream names from
GBIF, along with other data sources (#414)db
parameters to warn users
that if they provide the wrong db
value for the given taxon
ID, they can get data back, but it would be wrong. That is, all
taxonomic data sources available in taxize
use their own
unique IDs, so a single ID value can be in multiple data sources, even
though the ID refers to different taxa in each data source. There is no
way we can think of to prevent this from happening, so be cautious.
(#465)gnr_resolve()
to by default capitalize first
name of a name string passed to the function. GNR is case sensitive, so
case matters (#469)phylomatic_tree()
and phylomatic_format()
are defunct. They were deprecated in recent versions, but are now gone.
See the new package brranching
for Phylomatic data
(#479)stripauthority
argument in gnr_resolve()
has been renamed to canonical
to better match what it
actually does (#451)gnr_resolve()
now returns a single data.frame in
output, or NULL
when no data found. The input taxa that
have no match at all are returned in an attribute with name
not_known
(#448)vascan_search()
changed callopts
parameter to ...
to pass in curl options to the
request.ipni_search()
changed callopts
parameter to ...
to pass in curl options to the request. In
addition, better http error handling, and added a test suite for this
function. (#458)stringsAsFactors=FALSE
now used for
gbif_parse()
(https://github.com/ropensci/taxize/commit/c0c4175d3a0b24d403f18c057258b67d3fbf17f0)get_uid()
to make more clear how to use the various parameters to get the desired
result, and how to avoid certain pitfalls (#436)asdf
from the function
eol_dataobjects()
- now returning data.frame’s only.get_eolid()
via
tryCatch()
to fail better when names not found.openssl
as a package dependency. Not needed
anymore because uBio dropped.gnr_resolve()
failed when no canonical form was
found.gnr_resolve()
when no results found when
best_match_only=TRUE
(#432)itisdf()
to give back an
empty data.frame when no results found, often with subspecific taxa.
Helps solve errors reported in use of downstream()
,
itis_downstream()
, and
gethierarchydownfromtsn()
(#459)gnr_resolve()
gains new parameter
with_canonical_ranks
(logical) to choose whether
infraspecific ranks are returned or not.iucn_id()
to get the IUCN ID for a taxon
from its name. (#431)ubio_classification()
,
ubio_classification_search()
, ubio_id()
,
ubio_search()
, ubio_synonyms()
,
get_ubioid()
, ubio_ping()
. In addition, ubio
has been removed as an option in the synonyms()
function,
and references for uBio have been removed from the
taxize_cite()
utility function. (#449)rankagg()
doesn’t depend on data.table
anymore (fixes issue with CRAN checks)RCurl::base64Decode()
with
openssl::base64_decode()
, needed for ubio_*()
functions (#447)importFrom
) used across
all imports now (#446). In addition, importFrom
for all
non-base R pkgs, including graphics
, methods
,
stats
and utils
packages (#441)query
parameter in GET()
,
but can pass NULL
(#445)gni_*()
functions, including code
tidying, some DRYing out, and ability to pass in curl options
(#444)taxize_cite()
classification()
where numeric IDs as
input got converted to itis ids just because they were numeric. Fixed
now. (#434)synonyms
function to get name synonyms. (#430)apgFamilies
and apgOrders
.
(#418)col_search()
gains parameters response
to
get a terse or full response, and ...
to pass in curl
options.eol_dataobjects()
gains parameter ...
to
pass in curl options, and parameter returntype
renamed to
asdf
(for “as data.frame”).ncbi_get_taxon_summary()
gains parameter
...
to pass in curl options.children()
function gains the rows
parameter passed on to get_*()
functions, supported for
data sources ITIS and Catalogue of Life, but not for NCBI.upstream()
function gains the rows
parameter passed on to get_*()
functions, supported for
both data sources ITIS and Catalogue of Life.classification()
function gains the
rows
parameter passed on to get_*()
functions,
for all sources used in the function.downstream()
function gains the rows
parameter passed on to get_*()
functions, for all sources
used in the function.get_*()
) gain new parameters to help filter results (e.g.,
division
, phylum
, class
,
family
, parent
, rank
, etc.).
These parameters allow direct matching or regex filters (e.g.,
.a
to match any character followed by an a
).
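To illustrate the difference between direct matching and a regex filter such as `.a`, here is a minimal base R sketch on a made-up candidate table. The data frame, its column names, and its values are hypothetical, and the code does not call any `get_*()` function; it only shows how the two kinds of filter behave.

```r
# Hypothetical candidate rows, standing in for what an interactive
# get_*() prompt might list (values are invented for illustration).
candidates <- data.frame(
  name = c("Taxon one", "Taxon two", "Taxon three"),
  rank = c("species", "variety", "subspecies"),
  stringsAsFactors = FALSE
)

# Direct match: keep rows whose rank equals the filter value exactly.
exact_hits <- candidates[candidates$rank == "species", ]

# Regex filter: ".a" matches any character followed by an "a",
# so only "variety" (which contains "va") is kept here.
regex_hits <- candidates[grepl(".a", candidates$rank), ]

print(exact_hits)
print(regex_hits)
```

Because the filter is treated as a regular expression, a short pattern can match more rows than intended; anchoring it (for example `^species$`) gives behaviour closer to a direct match.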
(#410) (#385)get_*()
) now give back more information (mostly higher
taxonomic data) to help in the interactive decision process. (#327)synonyms()
function: Catalogue
of Life. (#430)vegan
package, used in class2tree()
function, moved from Imports to Suggests. (#392)taxize_cite()
a lot - get URLs and sometimes
citation information for data sources available in taxize. (#270)apg_lookup()
function. (#422)apg_families()
function.
(#418)callopts
parameter in eol_pages()
,
eol_search()
, gnr_resolve()
,
tp_accnames()
, tp_dist()
,
tp_search()
, tp_summary()
,
tp_synonyms()
, ubio_search()
changed to
...
accepted
parameter in get_tsn()
changed to
FALSE
by default. (#425)db
parameter in resolve()
changed to gnr
as tnrs
is often quite
slow.tpl_families()
and
tpl_get()
. (#424)ncbi_getbyname()
,
ncbi_getbyid()
, ncbi_search()
,
eol_invasive()
, gisd_isinvasive()
. These
functions are available in the traits
package. (#382)phylomatic_tree()
is deprecated, and will be defunct in
an upcoming version.taxize
. E.g., itis_ping()
pings ITIS and
returns a logical, indicating if the ITIS API is working or not. You can
also do a very basic test to see whether content returned matches what’s
expected. (#394)status_codes()
to get vector of HTTP
status codes. (#394)itis_ping()
, and all
*_ping()
functions.\donttest
into
\dontrun
.genbank2uid()
to get an NCBI taxonomic id
(i.e., a uid) from either a GenBank accession number or a GI
number. (#375)get_nbnid()
to get a UK National
Biodiversity Network taxonomic id (i.e., a nbnid). (#332)nbn_classification()
to get a taxonomic
classification for a UK National Biodiversity Network taxonomic id.
Using this new function, generic method classification()
gains method for nbnid
. (#332)nbn_synonyms()
to get taxonomic synonyms
for a UK National Biodiversity Network taxonomic id. Using this new
function, generic method synonyms()
gains method for
nbnid
. (#332)nbn_search()
to search for taxa in the UK
National Biodiversity Network. (#332)ncbi_children()
to get direct taxonomic
children for an NCBI taxonomic id. Using this new function, generic
method children()
gains method for ncbi
.
(#348) (#351) (#354)upstream()
to get taxa upstream of a
taxon. E.g., getting families upstream from a genus gets all families
within the taxon one rank level above family. (#343)as.*()
to coerce
numeric/alphanumeric codes to taxonomic identifiers for various
databases. There are methods on this function for each of itis, ncbi,
tropicos, gbif, nbn, bold, col, eol, and ubio. By default
as.*()
functions make a quick check that the identifier is a
real one by making a GET request against the identifier URI - this can
be toggled off by setting check=FALSE
. There are methods for
returning itself, character, numeric, list, and data.frame. In addition,
if the as.*.data.frame()
function is used, a generic method
exists to coerce the data.frame
back to an identifier
object. (#362)get_tsn_()
(the underscore is the only difference from the previous function name).
These functions don’t do the normal interactive process of prompts that
e.g., get_tsn()
does, but instead return a list of all ids,
or a subset via the rows
parameter. (#237)ncbi_get_taxon_summary()
to get taxonomic
name and rank for 1 or more NCBI uid’s. (#348)assertthat
removed from package imports, replaced with
stopifnot()
, to reduce dependency load. (#387)eol_hierarchy()
now defunct (no longer available)
(#228) (#381)tp_classification()
now defunct (no longer available)
(#228) (#381)col_classification()
now defunct (no longer available)
(#228) (#381)?fxn-name
.get_*()
functions gain a new parameter
rows
to allow selection of particular rows. For example,
rows=1
to select the first row, or rows=1:3
to
select rows 1 through 3. (#347)classification()
now by default returns taxonomic
identifiers for each of the names. This can be toggled off with
return_id=FALSE
. (#359) (#360)switch()
on the db
parameter, which helps give
better error message when a db
value is not possible or
spelled incorrectly. (#379)children()
, which is a single interface to
various data sources to get immediate children from a given taxonomic
name. (#304)bold_search()
that searches for taxa in the BOLD database of barcode data;
get_boldid()
to search for a BOLD taxon identifier.
(#301)get_ubioid()
to get a uBio taxon
identifier. (#318)taxize
:
taxize_cite()
. (#270)jsonlite
instead of RJSONIO
throughout the taxize
.get_ids()
gains new option to search for a uBio ID, in
addition to the others, itis, ncbi, eol, col, tropicos, and gbif.stripauthority
parameter
gnr_resolve()
. (#325)iplant_resolve()
now outputs data.frame structure
instead of a list. (#306)seqrange
in
ncbi_getbyname()
and ncbi_search()
(#328)synonyms()
gains new data source, can now get synonyms
from uBio data source (#319)vascan_search()
giving back more useful results
now.tnrs()
function, including more
meaningful error messages on failures (#323) (#331)getpublicationsfromtsn()
that caused
function to fail on data.frame’s with no data on name assignment
(#297)sci2comm()
that caused fxn to fail when
using db=itis
sometimes (#293)scrapenames()
. Sending a text blob via the
text
parameter now works.resolve()
so that function now works for all 3
data sources. (#337)iplant_resolve()
to do name resolution
using the iPlant name resolution service. Note, this is different from
http://taxosaurus.org/ that is wrapped in the tnrs()
function.ipni_search()
to search for names in the
International Plant Names Index (IPNI).resolve()
that unifies name resolution
services from iPlant’s name resolution service (via
iplant_resolve()
), Taxosaurus’ TNRS (via
tnrs()
), and GNR’s name resolution service (via
gnr_resolve()
).get_*()
functions now return a new uri
attribute that is a link to the taxon on the web. If NA is given back
(e.g. nothing found), the uri attribute is blank. You can go directly to
the uri in your default browser by doing, for example:
browseURL(attr(result, "uri"))
.get_eolid()
now returns an attribute provider
because EOL collates taxonomic data from a lot of sources, then gives
back IDs that are internal EOL ids, not those matching the id of the
source they pull from. This should help with provenance, and should help
if there is confusion about why the id given back by this function does
not match that from the original source.get_tsn()
function, now using the function
itis_terms()
, which gives back the accepted status of the
taxa. This allows a new parameter in the function
(accepted
, logical) that lets the user return only
names with accepted status (accepted=TRUE
), or to give back all
names (accepted=FALSE
).gnr_resolve()
gains new parameters
best_match_only
(logical, to return best match only) and
preferred_data_sources
(to return preferred data sources)
and callopts
to pass in curl options.tnrs()
, tp_accnames()
,
tp_refs()
, tp_summary()
, and
tp_synonyms()
gain new parameter callopts
to
pass in curl options.class2tree()
can now handle NA in classification
objects.classification.eolid()
and
classification.colid()
now return the submitted name along
with the classification.plyr
functions, see #275.verbose
parameter to many more functions to
allow suppression of help messages.httr
, now manually parsing
JSON to a list then to another data format instead of allowing internal
httr
parsing - in addition added checks on content type and
encoding in many functions.match.arg
internally in get_ids()
for
the db
parameter so that a) unique short abbreviations of
possible values are possible, and b) gives a meaningful warning if
unsupported values are given.getexpertsfromtsn
,
getgeographicdivisionsfromtsn
) gain parameter
curlopts
to pass in curl options.stringsAsFactors=FALSE
to all
data.frame
creations to eliminate factor variables.classification.gbifid()
did not return the correct
result when taxon not found.classification()
used to fail when it was passed a
subset of a vector of ids, in which case the class information was
stripped off. Now works (#284)
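
The class-stripping problem described in the last item is a general property of subsetting in R: the default `[` drops most attributes, so a classed vector can lose its class when subset. The sketch below shows one common way to avoid that, using a hypothetical S3 class `toy_ids` and a hypothetical `db` attribute; it is an illustration of the idea, not the code taxize uses.

```r
# Hypothetical constructor for a classed vector of taxon IDs.
# "toy_ids" and the "db" attribute are made up for this example.
new_toy_ids <- function(x, db) {
  structure(x, class = "toy_ids", db = db)
}

# A `[` method that subsets the underlying vector and then restores
# the class and the "db" attribute that plain subsetting would drop.
`[.toy_ids` <- function(x, i) {
  new_toy_ids(unclass(x)[i], db = attr(x, "db"))
}

ids <- new_toy_ids(c("315567", "3339", "9696"), db = "ncbi")
sub <- ids[2:3]

class(sub)       # "toy_ids": the class survives subsetting
attr(sub, "db")  # "ncbi": so does the data source label
```

With a method like this in place, a function that dispatches on the class keeps working when it is handed `ids[2:3]` rather than the full vector.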