How I gather data

perdalum, May 7, 2020

I have been collecting data for many decades, just because it is fun. Well, I also collected data during my years as a master's student of chemistry and NMR, and that was not meant to be fun, even though it actually was.

I have used paper and pencil, Excel sheets, Google Sheets, CSV files, e-mails, IFTTT, Wolfram Data Drop, and so on. And a Varian 400 spectrometer.

Excel got ditched a long time ago.

I have used the canned solutions from Google and Wolfram Inc, especially the Wolfram Data Drop: Universal Data Accumulator, but they kind of either cost money or something worse.

Therefore, I have been contemplating how to make a flexible, scalable and easy-to-maintain system.

This text describes how I do it now.

Data Storage

Having realized that a very simple data structure could actually be very flexible, I decided to brush up on my PHP skills. The plan was to emulate the idea behind the Wolfram Data Drop, i.e. to have a web API that can receive data, store data, and give access to the data.

I ended up with this simple API:

http://<server>/drop.php?token=<token>&dt=<observation date>&v=<observation>

The token has two functions: 1) to ensure that the request comes from an approved application and 2) to store the data in the correct data store.

dt is the date of the observation, and is stored as written. No interpretation is performed. The value can therefore be anything.

v is the actual value, which is also stored as is, and no interpretation is performed either.
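A client for this API only needs to build the URL and issue a GET request. The following is a minimal sketch in Python, not the method used in this post (the actual client is an iOS Shortcut, described later). The function names `build_drop_url` and `send_observation` and the host `example.org` are hypothetical. Percent-encoding the values matters because PHP decodes an unencoded `+` in a query string as a space.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_drop_url(server: str, token: str, dt: str, v: str) -> str:
    """Build the drop.php request URL with percent-encoded parameters.

    The parameter names (token, dt, v) come from the API above;
    everything else here is an illustrative assumption.
    """
    query = urlencode({"token": token, "dt": dt, "v": v})
    return f"http://{server}/drop.php?{query}"


def send_observation(server: str, token: str, dt: str, v: str) -> int:
    """Send one observation and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    e.g. when the token is not accepted by the server.
    """
    with urlopen(build_drop_url(server, token, dt, v)) as response:
        return response.status
```

For example, `build_drop_url("example.org", "weight", "2020-05-07T06:30:00 +0200", "77,5")` returns `http://example.org/drop.php?token=weight&dt=2020-05-07T06%3A30%3A00+%2B0200&v=77%2C5`.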

The complete script looks like this:

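<?php
// tokens.php defines $tokens (the valid tokens) and $datapath (the data directory).
include 'tokens.php';

// Note: empty() also treats the string "0" as missing, so an observation of 0 is rejected.
if (empty($_GET['token']) || empty($_GET['dt']) || empty($_GET['v'])) {
	http_response_code(400);
	echo "Oops. Missing some values.";
} else {
	$now = date('c'); // ISO 8601 timestamp of when the request was received
	$token = $_GET['token'];

	if (!in_array($token, $tokens, true)) {
		http_response_code(400);
		echo "Invalid token!";
	} else {
		http_response_code(200);
		// Strings are concatenated with ".", not "+".
		$file_name = $datapath . "/data_" . $token . ".csv";
		// Append one line: "<time received>"; "<observation date>"; <observation>
		$fp = fopen($file_name, 'a');
		fwrite($fp, '"' . $now . '"; "' . $_GET['dt'] . '"; ' . $_GET['v'] . "\n");
		fclose($fp);
		$response = array("status" => "200");
		echo json_encode($response);
	}
}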

Besides the given date and observation, the time at which the server received the request is also stored in the data file.

The tokens.php file defines the valid tokens and the path to the data store. The file could look like

<?php
$tokens = array("weight", "runs", "laptopbattery");
$datapath = "datastore";

Those two files are actually all it takes to mimic enough of the Wolfram Data Drop functionality to serve my purpose. It is actually better, as I now have complete control over the data, and the data store can grow almost indefinitely, both in size and time.

Still, I am sure that the security could be better than relying on a simple token and obfuscation. But as I don't store sensitive data, this security level is adequate for now.

Reading the data is as easy as curl http://<server>/<datastore>/data_<token>.csv. I have disabled directory listing of http://<server>/datastore/ so the data cannot be browsed or discovered.
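The stored file can also be read programmatically. Below is a minimal sketch in Python, assuming each line has the form written by drop.php: the quoted time received, the quoted observation date and the unquoted observation, separated by "; ". The function names `parse_drop_csv` and `fetch_observations` are hypothetical, and all values are kept as strings because the server stores them uninterpreted.

```python
import csv
import io
from urllib.request import urlopen


def parse_drop_csv(text: str) -> list[tuple[str, str, str]]:
    """Parse lines of the form "<received>"; "<dt>"; <v> into string tuples.

    skipinitialspace=True drops the space that follows each semicolon.
    Lines that do not split into exactly three fields (for example blank
    lines, or an observation containing a semicolon) are skipped.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";", skipinitialspace=True)
    return [(row[0], row[1], row[2]) for row in reader if len(row) == 3]


def fetch_observations(url: str) -> list[tuple[str, str, str]]:
    """Download a data file (URL as in the curl example) and parse it."""
    with urlopen(url) as response:
        return parse_drop_csv(response.read().decode("utf-8"))
```

Converting the observation to a number is left to the caller, since a value such as 77,5 uses a decimal comma and other data drops may hold non-numeric values.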

Getting the data to the data store

For data collection, I now use iOS Shortcuts. I have a Shortcut for each data collection type, e.g. morning weight.

All Shortcuts are alike, differing only by the token and the observation type. The morning weight Shortcut looks like this


So, every morning I step on the scale, swipe right on my phone, tap the Shortcut and enter the observation. As an added bonus, the Shortcut above has one more step, as the observation is also sent on to the iOS Health app. The next step would be to teach Siri to understand this Shortcut, so I could just say out loud "Hey Siri, record my weight at 77,5 kg". Or even just raise my wrist and utter "I now weigh 77,5 kg".

Visualizing the data

As mentioned, I have used both R and the Wolfram Language/Mathematica for my data visualizations. At the moment, I use R, or actually the Tidyverse.

A cron job runs this fairly simple visualization each morning:

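#!/usr/bin/env Rscript

library(ggplot2, quietly = TRUE)
library(dplyr, quietly = TRUE)
library(readr, quietly = TRUE)
library(ggrepel, quietly = TRUE)
library(lubridate, quietly = TRUE)
library(stringr, quietly = TRUE)

# Read the semicolon-separated data file. The file location is missing from
# this listing, so a placeholder URL is used here.
data <- read_csv2(
  "http://<server>/<datastore>/data_<token>.csv",
  col_types = cols(
    col_datetime(format = ""),
    col_datetime(format = "%Y-%m-%dT%H:%M:%S %z"),
    col_double()  # assumed type of the observation column
  ),
  col_names = c("datetime", "obs_time", "obs")
)

# The most recent observation, highlighted in the plot
last_obs <- filter(data, row_number() == n())

plot <- ggplot(
  data %>%
    filter(datetime > today() - months(3))
) +
  # The geom calls are missing from this listing; geom_point() and
  # geom_smooth() are assumptions.
  geom_point(aes(
    x = datetime,
    y = obs,
    alpha = 0.2
  ), show.legend = FALSE) +
  geom_smooth(aes(
    x = datetime,
    y = obs
  )) +
  # Make the last point red
  geom_point(
    data = last_obs,
    aes(
      x = datetime,
      y = obs
    ),
    colour = "red"
  ) +
  theme(axis.title.x = element_blank()) +
  ylab("vægt [kg]") +  # Danish: "weight [kg]"
  ylim(76, 81) +
  ggtitle(
    "Min vægt",  # Danish: "My weight"
    str_c("Diagram beregnet ", now())  # Danish: "Chart computed <timestamp>"
  )

# The output file name is missing from this listing; this one is a placeholder.
ggsave(
  "<output file>.png",
  plot,
  width = 16,
  height = 12,
  units = "cm",
  bg = "transparent"
)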

This produces a plot with the rather standard ggplot theme.


What do I measure

Apart from my weight, I also keep tabs on how often and by what amount I need to adjust an old mechanical watch, and on the evening battery level of my Apple Watch.

To add a new data drop, I just need to log onto my webserver and add a token to the tokens array in the tokens.php file. This is so easy that I can even do that on my phone while running for the bus 😎