Introduction
A track expression retrieves numerical data recorded in tracks. Track expressions are used by many functions (emr_screen, emr_extract, emr_dist, ...).
A track expression is a character string that closely resembles a valid R/Python expression. Like any other R/Python expression, it may include conditions, function calls and previously defined variables. "1 > 2", "mean(1:10)" and "myvar < 17" are all valid track expressions. Unlike regular R/Python expressions, a track expression may also contain track names and/or virtual track names.
To understand how a track expression gives access to tracks, we must first explain how it is evaluated.
Every track expression is accompanied by an iterator that produces a set of points of (id, time, ref) type. The track expression is evaluated for each iterator point. The value of the track expression "mean(1:10)" is constant regardless of the iterator point. However, a track expression may contain a track name such as mytrack, for example "mytrack * 3". Naryn then recognizes that mytrack is not a regular R/Python variable but a track name, and adds a new run-time track variable named mytrack to the R environment (or to the Python module's local dictionary). For each iterator point this variable is assigned the track value that matches (id, time, ref), or NaN if the track has no matching value. Once mytrack has been assigned, the track expression is evaluated in R/Python.
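The evaluation loop described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Naryn's implementation: the dictionary-based track, the `evaluate` helper and the sample data are all assumptions made for the example, and the sketch evaluates one point at a time, ignoring the vector buffering discussed in the next section.

```python
import math

# Hypothetical in-memory track: (id, time, ref) -> value.
mytrack_data = {(1, 100, 0): 5.0, (2, 250, 0): 7.0}

# Iterator points of (id, time, ref) type; the last one has no match in the track.
iterator_points = [(1, 100, 0), (2, 250, 0), (3, 300, 0)]


def evaluate(expr, points, track_name, track_data):
    """Evaluate expr once per iterator point with the track variable bound."""
    results = []
    for point in points:
        # Assign the matching track value, or NaN when nothing matches.
        env = {track_name: track_data.get(point, math.nan)}
        results.append(eval(expr, {"__builtins__": {}}, env))
    return results


values = evaluate("mytrack * 3", iterator_points, "mytrack", mytrack_data)
print(values)
```

The third iterator point has no matching value, so the expression is evaluated with NaN and its result is NaN as well.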
Run-time Track Variable is a Vector
To speed up the evaluation of track expressions, run-time track variables are defined in R as vectors rather than scalars. The result of the evaluation is expected to be a vector of the same size. Always keep the vector notation in mind and write track expressions accordingly.
For example, at first glance the track expression "min(mytrack, 10)" seems perfectly fine. However, evaluating this expression always produces a scalar, i.e. a single number, even if mytrack is a vector. To make this track expression work on vectors, use the pmin function instead of min.
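The difference between a reducing and an element-wise minimum can be shown outside Naryn. The sketch below uses a plain Python list as a stand-in for the run-time track variable; the sample values are invented, and the two expressions are only analogues of R's min and pmin, not calls into either.

```python
# Stand-in for a run-time track variable holding several iterator points at once.
mytrack = [4.0, 12.0, 7.5]

# Analogue of R's min(mytrack, 10): everything is reduced to a single number.
reduced = min(min(mytrack), 10)

# Analogue of R's pmin(mytrack, 10): one result per element.
elementwise = [min(value, 10) for value in mytrack]

print(reduced)
print(elementwise)
```

Only the element-wise form returns one value per iterator point, which is what the evaluation of a track expression expects.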
Python
As in R, a track variable in Python is not a scalar but an instance of numpy.ndarray. The evaluation of a track expression must therefore produce a numpy.ndarray as well. Many operations on numpy arrays work the same way as on scalars; logical operations, however, require a different syntax. For instance:
screen("mytrack1 > 1 and mytrack2 < 2", iterator = "mytrack1")
will produce an error because mytrack1 and mytrack2 are numpy arrays. The correct way to write the expression is:
screen("(mytrack1 > 1) & (mytrack2 < 2)", iterator="mytrack1")
One may force the track variable to behave like a scalar by setting the emr_eval.buf.size option to 1 (see the Appendix for more details). Beware, though, that this may take a heavy toll on run time.
Matching Reference in the Track Expression
If the track expression contains a track (or virtual track) name, the values from the track are fetched one by one into the identically named R variable, based on the id, time and ref of the iterator point. If, however, the ref of the iterator point equals -1, it is treated as a "wildcard": matching is then required only for id and time.
A "wildcard" reference in the iterator may create a new issue: more than one track value might then match a single iterator point. In this case the value placed in the track variable (e.g. mytrack) depends on the type of the track. If the track is categorical, the track variable is set to -1; otherwise it is set to the average of all matching values.
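The matching rules can be summarized in a small Python sketch. It is an illustration under assumptions rather than Naryn's code: the record layout, the `track_value` helper and the sample data are hypothetical, and the sketch only covers the cases described above (no match, a single match, several matches under a wildcard reference).

```python
import math
from statistics import mean

WILDCARD = -1  # ref value of the iterator point that matches any track reference


def track_value(records, point, categorical):
    """records: list of (id, time, ref, value); point: (id, time, ref)."""
    pid, time, ref = point
    matches = [
        value
        for (rid, rtime, rref, value) in records
        if rid == pid and rtime == time and (ref == WILDCARD or rref == ref)
    ]
    if not matches:
        return math.nan  # no matching value in the track
    if len(matches) == 1:
        return matches[0]
    # Several values match a single iterator point.
    return -1 if categorical else mean(matches)


records = [(1, 100, 0, 4.0), (1, 100, 1, 6.0), (1, 200, 0, 9.0)]
exact = track_value(records, (1, 100, 1), categorical=False)
averaged = track_value(records, (1, 100, WILDCARD), categorical=False)
ambiguous = track_value(records, (1, 100, WILDCARD), categorical=True)
print(exact, averaged, ambiguous)
```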
Virtual Tracks
So far we have shown that in some situations mytrack
variable can be set to the average of the matching track values. But what if we do not want to average the values but rather pick up the maximal, minimal or median value? What if we want to use the percentile of a track value rather than the value itself? And maybe we even want to alter the time of the iterator point: shift it or expand to a time window and by that look at the different set of track values? For instance: given an iterator point we might want to know what was the maximal level of glucose during the last year that preceded the time of the point.
This is where virtual tracks come in.
A virtual track is a named set of rules that describe how the track should be processed and how the time of the iterator point should be modified. Virtual tracks are created by the emr_vtrack.create function:
emr_vtrack.create("annual_glucose",
src = "glucose_track", func = "quantile",
param = 0.5, time.shift = c(-year(), 0)
)
This call creates a new virtual track named annual_glucose based on the underlying physical source track glucose_track. For each iterator point with time T we look at the values of glucose_track in the time window [T-365*24, T], i.e. the year prior to T (the window length is expressed in hours). We then calculate the median of these values (func="quantile", param=0.5).
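The windowed computation can be mimicked in plain Python. The sketch below is an assumption-laden illustration, not the behaviour of emr_vtrack.create itself: the record layout, the `annual_median` helper and the data are hypothetical, `statistics.median` stands in for the 0.5 quantile, and returning NaN for an empty window is a choice made for the example.

```python
import math
from statistics import median

HOURS_PER_YEAR = 365 * 24  # window length used in the text


def annual_median(records, pid, t):
    """Median of one patient's values in the window [t - 365*24, t]."""
    window = [
        value
        for (rid, rtime, value) in records
        if rid == pid and t - HOURS_PER_YEAR <= rtime <= t
    ]
    # NaN for an empty window is an assumption of this sketch.
    return median(window) if window else math.nan


# Hypothetical glucose readings: (id, time in hours, value).
glucose = [(1, 1000, 90.0), (1, 5000, 110.0), (1, 9000, 100.0), (1, 20000, 150.0)]

result = annual_median(glucose, 1, 9500)
print(result)
```

For T = 9500 the window is [740, 9500], so the reading at time 20000 is ignored and the median is taken over the remaining three values.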
Besides "quantile", a rich set of functions can be applied to the track values. Some of these functions can be used only with categorical tracks, others only with quantitative tracks, and some with both types of track. Please refer to the documentation of emr_vtrack.create.
Once a virtual track is created it can be used in a track expression:
emr_extract("annual_glucose", iterator = list(year(), "patients.dob"))
This would give us the median of the annual glucose level in year-steps starting from the patient's birthday. (This example makes use of an Extended Beat Iterator, which is explained later.)
Let's expand our example further and exclude from the calculation the glucose readings that were made within a week after steroids had been prescribed. We can use the additional filter parameter to do that.
emr_filter.create("steroids_filter", "steroids_track", time.shift=c(-week(), 0))
emr_vtrack.create("annual_glucose",
src = "glucose_track", func = "quantile",
param = 0.5, time.shift = c(-year(), 0), filter = "!steroids_filter"
)
emr_extract("annual_glucose", iterator = list(year(), "date_of_birth_track"))
The filter is applied to the ID-Time points of the source track (glucose_track in our example). The virtual track function (quantile, ...) is then applied only to the points that pass the filter. The concept of filters is explained extensively in a separate chapter.
Virtual tracks also allow remapping patient ids. This is done via the id.map parameter, which accepts a data frame that defines the id mapping. Remapping ids can be useful when family ties are explored: for example, instead of the patient's own glucose level, we may want to check the glucose level of one of their family members.
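The idea of id remapping can be sketched as a lookup that replaces the patient id before the track is read. This is a conceptual illustration only: the dictionary stands in for the mapping data frame, the `remapped_value` helper and the data are hypothetical, and returning NaN for ids that have no mapping is an assumption of the sketch rather than documented behaviour.

```python
import math

# Hypothetical mapping: patient id -> id of a family member.
id_map = {1: 7, 2: 8}

# Hypothetical glucose track: (id, time) -> value.
glucose = {(7, 100): 95.0, (2, 100): 120.0}


def remapped_value(track, mapping, pid, time):
    """Read the track value of the id that pid is mapped to."""
    mapped = mapping.get(pid)
    if mapped is None:
        return math.nan  # assumption: unmapped ids yield no value
    return track.get((mapped, time), math.nan)


relative_glucose = remapped_value(glucose, id_map, 1, 100)
print(relative_glucose)
```

Patient 1 has no glucose reading of their own at time 100, yet the lookup returns the value recorded for the mapped id 7.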