These are convenience functions to handle interaction with SQLite or mySQL/mariaDB. They open one and only one database, and handle most of the interaction therewith for you.
You will probably first use apop_text_to_db to pull data into the database, then apop_query to clean the data in the database, and finally apop_query_to_data to pull some subset of the data out for analysis.
Queries may be given in printf form. See the Database moments (plus pow()!) section below for not-SQL-standard math functions that you can use when sending queries from Apophenia, such as pow, stddev, or sqrt.
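A printf-form query is an ordinary format string plus arguments, so the text that would reach the database can be previewed with snprintf. The sketch below shows only that expansion in plain C; it does not call Apophenia, and the helper build_query, the table people, and the column age are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand a printf-style query template into buf.
   build_query, the table name, and the "age" column are hypothetical;
   none of them is part of Apophenia.
   Returns 0 on success, -1 if the expanded query does not fit in buf. */
int build_query(char *buf, size_t len, const char *table, int min_age)
{
    assert(buf != NULL && table != NULL);
    int needed = snprintf(buf, len,
                          "select * from %s where age > %d", table, min_age);
    /* snprintf reports the length it wanted to write; a negative value
       or one that is >= len means the result was not written in full. */
    return (needed >= 0 && (size_t)needed < len) ? 0 : -1;
}
```

Called with "people" and 30, the buffer holds `select * from people where age > 30`. With a printf-form query function, the same template and arguments would be passed directly instead of being assembled by hand.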
With the argument .output_type='d', a print function writes your apop_data set to the database.
Apophenia may create temporary tables in the open database, with names beginning with apop_, so the reader is advised to not generate tables with such names, and is free to ignore or delete any such tables that turn up.
To work with more than one database, use the SQL command attach database. By default with SQLite, Apophenia opens an in-memory database handle. It is a sensible workflow to use the faster in-memory database as the primary database, and then attach an on-disk database to read in data and write final output tables.
See the print functions at Legible output.
A few functions have proven to be useful enough to be worth breaking out into their own programs, for use in scripts or other data analysis from the command line:
The apop_text_to_db command line utility is a wrapper for the apop_text_to_db function.
The apop_db_to_crosstab command line utility is a wrapper for the apop_db_to_crosstab function.
Database moments (plus pow()!)
SQLite lets users define new functions for use in queries, and Apophenia uses this facility to define a few common functions.
select ran() from table will produce a new random number between zero and one for every row of the input table, using gsl_rng_uniform.
SQL provides the count(x) and avg(x) aggregators, but statisticians are usually interested in higher moments as well, at least the variance. Therefore, SQL queries using the Apophenia library may include any of the moment functions named here.
var and variance; kurt and kurtosis do the same thing; choose the one that sounds better to you. Kurtosis is the fourth central moment by itself, not adjusted by subtracting three or dividing by variance squared. var, var_samp, stddev and stddev_samp give sample variance/standard deviation; variance, var_pop, std and stddev_pop give population variance/standard deviation. The plethora of variants is for mySQL compatibility.
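The sample/population distinction and the unadjusted kurtosis described above can be made concrete with a few lines of plain C. This is a minimal sketch, not Apophenia's implementation: the function names are hypothetical, the fourth central moment uses a plain 1/n divisor, and any sample-size correction the library applies to its kurtosis is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of x[0..n-1]. */
double mean_of(const double *x, size_t n)
{
    assert(x != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += x[i];
    return sum / (double)n;
}

/* k-th central moment with divisor n: (1/n) * sum_i (x_i - mean)^k. */
double central_moment(const double *x, size_t n, unsigned k)
{
    double mu = mean_of(x, n), sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double term = 1.0;
        for (unsigned j = 0; j < k; j++) term *= x[i] - mu;
        sum += term;
    }
    return sum / (double)n;
}

/* Population variance: divisor n. */
double variance_pop(const double *x, size_t n)
{
    return central_moment(x, n, 2);
}

/* Sample variance: divisor n-1, i.e. population variance times n/(n-1). */
double variance_samp(const double *x, size_t n)
{
    assert(n > 1);
    return central_moment(x, n, 2) * (double)n / (double)(n - 1);
}

/* Fourth central moment by itself: no division by variance squared and
   no subtraction of three. */
double fourth_central_moment(const double *x, size_t n)
{
    return central_moment(x, n, 4);
}
```

For the data {1, 2, 3, 4}, the population variance is 1.25, the sample variance is 5/3, and the fourth central moment is 2.5625. Dividing that last figure by the squared population variance and subtracting three would give the normalized excess kurtosis, which is the adjustment the text says is not applied.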
Apophenia also defines sqrt(x), pow(x,y), exp(x), log(x), and trig functions. They call the standard math library function of the same name to calculate the result.
The ran() function calls gsl_rng_uniform to produce a uniform draw between zero and one. It uses the stock of RNGs from apop_rng_get_thread.
Here is a test script using many of the above.