rolap
PackageOnce we developed a star database in R using the rolap
package, in addition to exporting it to exploit it with other tools, we
can perform multidimensional queries from R: The rolap
package offers the possibility of formulating and running simple queries
on a multidimensional schema.
The main objective of this document is to show the multidimensional query formulation and execution functionality offered by this package. First, the data model is briefly discussed: the possibility of defining stars and constellations. Next, the functions defined to support multidimensional queries are presented. Finally, finish with the conclusions.
Strictly speaking, a star is composed of a fact table and
several associated dimension tables. A constellation is made up
of several stars that can share dimensions. In the rolap
package they are treated in a unified way under the
star_database
class: It is used both to define stars and
constellations.
The variable mrs_db
, obtained in the vignette titled
Obtaining and transforming flat tables,
vignette("v05-flat-table-op")
, contains an object of class
star_database
that we will use in the example.
We can see a representation of the tables it contains using the
draw_tables()
function, as shown below.
We can see that it is a constellation because it contains more than one fact table.
A query is defined on a star_database
object and the
result of executing it is another star_database
object.
This section presents the functions available to define queries.
star_query()
From a star_database
object, an empty
star_query
object is created where we can select fact
measures, dimension attributes and filter dimension rows.
Example:
At least one fact table with one dimension must be included in each query.
select_fact()
To define the fact table to be consulted, its name is indicated, optionally, a vector of names of selected measures and another of aggregation functions are also indicated. If the name of any of the measures is not indicated, the measure corresponding to the number of rows added will be included, which is always included. If no aggregation function is included, those defined for the measures are considered.
Examples:
The measure is considered with the indicated aggregation function. In addition, the measure corresponding to the number of grouped records that make up the result is automatically included.
The measure is considered with the aggregation function defined in the multidimensional scheme.
Only the measure corresponding to the number of grouped records is included.
sq_4 <- sq |>
select_fact(name = "mrs_age",
measures = "all_deaths") |>
select_fact(name = "mrs_cause")
In a query we can select several fact tables, at least we have to select one.
select_dimension()
To include a dimension in a star_query
object, we have
to define its name and a subset of the dimension attributes. If only the
name of the dimension is indicated, it is considered that all its
attributes should be added.
Example:
Only the indicated attributes of the dimension will be included.
All attributes of the dimension will be included.
filter_dimension()
Allows us to define selection conditions for dimension rows.
Conditions can be defined on any attribute of the dimension, not only on
attributes selected in the query for the dimension. They can also be
defined on unselected dimensions. Filtering is done using the function
dplyr::filter()
. Conditions are defined in exactly the same
way as in that function.
Example:
run_query()
Once we have selected the facts, dimensions and defined the conditions on the instances of dimensions, we can execute the query to obtain the result.
The query can be executed on any star_database
object
that has in its structure the elements that appear in it.
Example:
sq <- star_query(mrs_db) |>
select_dimension(name = "where",
attributes = c("region", "state")) |>
select_dimension(name = "when",
attributes = "year") |>
select_fact(name = "mrs_age",
measures = "all_deaths") |>
select_fact(name = "mrs_cause",
measures = "all_deaths") |>
filter_dimension(name = "when", week <= " 3" & year >= "2010")
mrs_db_2 <- mrs_db |>
run_query(sq)
class(mrs_db_2)
#> [1] "star_database"
The result of running a query is an object of the
star_database
class that meets the conditions defined in
the query: Other queries can continue to be defined on this object.
We can see a representation of the tables of the result, as shown below.
This section shows an example of how to exploit the result of the multidimensional query.
The first thing we do is transform it into flat tables.
Below are the rows of from one of the result tables.
year | region | state | all_deaths | nrow_agg |
---|---|---|---|---|
2010 | 2 | NJ | 1 | 1 |
2010 | 5 | VA | 50 | 5 |
2011 | 3 | IN | 65 | 4 |
2011 | 4 | MO | 115 | 5 |
2011 | 6 | AL | 165 | 5 |
2011 | 9 | CA | 44 | 5 |
2012 | 8 | NM | 122 | 5 |
2013 | 2 | NY | 23 | 5 |
2013 | 9 | WA | 72 | 5 |
2014 | 3 | OH | 327 | 5 |
2014 | 6 | TN | 70 | 5 |
2015 | 5 | FL | 175 | 5 |
2016 | 1 | MA | 20 | 5 |
2016 | 2 | NY | 74 | 10 |
2016 | 6 | KY | 98 | 5 |
2016 | 8 | CO | 74 | 10 |
2016 | 9 | CA | 14 | 5 |
From the results in the form of flat tables, pivottabler
package can be used to present it in the form of pivot tables.
This document presents some of the querying possibilities that offers
the rolap
package. The queries are formulated on an object
of class star_database
and the result is another object of
the same class on which additional queries can be made.
Queries can be formulated about a star or set of stars or constellation.