Motivation behind this example
I was diagnosed with sleep apnea last year, and have to use a continuous positive airway pressure (CPAP) machine to sleep well enough to feel alert during the day. The machine uploads data over a cellular connection to a website that shows my results for the last two weeks. This data includes both usage (time of usage, air leakage, and the number of times the mask was put on or taken off) and results (the apnea-hypopnea index, or AHI, which is the average number of times per hour that shallow breathing or no breathing occurred for at least 10 seconds). Because the website only displays the last two weeks, I have to keep the data myself if I eventually want to do a long-term analysis. I’d also like to have things displayed my own way, because, well, I’m like that.
I could enter this information in a spreadsheet, and for import into R or other statistical software that might be the sensible thing to do. However, by keeping this data alongside my other diary entries and the text surrounding it, I get to see it in the context of other things going on in my life. This information does not exist in a vacuum, and it is important context for other things. For instance, if I’m dealing with a particularly stressful situation, it would be nice to go back and see how I dealt with it in relation to how my sleep was going (and vice versa: does the apnea get better or worse during that time?). I’m also dealing with migraines, and I’d like to know something about their frequency and severity in relation to sleep.
Methodology for data collection
This personal data collection exercise uses The Journal, an excellent piece of software made specifically for journaling. I’ve been using The Journal since 2007 to record events and simply to jog my memory of what has been going on in my life. The software has a few nifty features that dovetail nicely with data collection.
Daily entries
The Journal splits writing into categories. A category can be either loose-leaf (entries can be organized hierarchically any way you want) or daily (entries are organized by the date of entry). If you set it up a certain way, The Journal will lock entries on every day except the one you are working on, and it can also automatically create an entry for that day. This is very handy for daily journaling in general.
Topics
Topics are tags for specific pieces of text or entries. If you select a piece of text and tag it with a topic (say, CPAP), you can extract that piece of text later. Couple this with the Search by Topic command, and you can collect all text tagged with a given topic and save it as a single document. So, for example, I tag all my CPAP writing with the CPAP topic, and later save a text file with everything I have written about CPAP therapy (in this case, the data I collected).
Templates
The Journal has a sophisticated template system that can insert not only the same text over and over, but tag it automatically with a certain topic and even fill in certain data such as the current date and time. I use the template feature to create some structured text (a data entry form of sorts) and tag the whole piece of inserted text with the CPAP topic. That way, I don’t have to bother with selecting and tagging manually. I can simply insert the text and fill in the numbers when I read the website.
The template looks like this:
Sleep numbers for <ENTRYDATE format="mm/dd/yyyy"/>
* Usage:
* Leakage: L/min
* AHI: events/h
* Mask on/off:
* MyAir score:
* Comments:
Because the text follows the same structure for all such entries, it is easy to write R code to pull out the data and make a data.frame.
What you don’t see (and what is hard to show here) is that in the template itself I selected all of the text and tagged it with the CPAP topic. That way, my CPAP entries are always tagged, and I can easily extract them later.
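A minimal sketch of such R code follows. The function parse_sleep_entries is hypothetical; it assumes each entry starts with the “Sleep numbers for mm/dd/yyyy” header produced by the template and that each field sits on its own “* Label: value” line. It keeps only the leading number of each numeric value, so unit text such as L/min is ignored, and it leaves Usage and Comments as text because the template does not fix their format.

```r
# Hypothetical helper: parse CPAP entries written with the template into a data.frame.
# Assumes each entry begins with "Sleep numbers for mm/dd/yyyy" and that every
# field sits on its own "* Label: value" line, as in the template.
parse_sleep_entries <- function(lines) {
  lines <- trimws(lines)
  # Header lines mark the start of each entry; an entry runs up to the next header
  starts <- grep("^Sleep numbers for ", lines)
  ends <- c(starts[-1] - 1, length(lines))

  # Keep the leading number of a value such as "12.5 L/min"; NA if none was entered
  num <- function(x) suppressWarnings(as.numeric(sub("^([0-9.]+).*$", "\\1", x)))

  rows <- lapply(seq_along(starts), function(i) {
    block <- lines[starts[i]:ends[i]]
    # Text after "* Label:" on the first matching line, or NA if the line is missing
    field <- function(label) {
      prefix <- paste0("* ", label, ":")
      hit <- block[startsWith(block, prefix)]
      if (length(hit) == 0) return(NA_character_)
      trimws(substring(hit[1], nchar(prefix) + 1))
    }
    data.frame(
      date        = as.Date(sub("^Sleep numbers for ", "", block[1]), format = "%m/%d/%Y"),
      usage       = field("Usage"),  # kept as text: the template does not fix its format
      leak_l_min  = num(field("Leakage")),
      ahi         = num(field("AHI")),
      mask_on_off = num(field("Mask on/off")),
      myair_score = num(field("MyAir score")),
      comments    = field("Comments"),
      stringsAsFactors = FALSE
    )
  })
  # One row per entry (NULL if no headers were found)
  do.call(rbind, rows)
}
```

Given a character vector of lines from an exported file, parse_sleep_entries(lines) returns one row per entry, with NA for any numeric field that was left blank.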
Methodology for analysis
The first part of data extraction happens in The Journal. I run a saved search from the Search Entries by Topic function, then click View All Result Entries to see the text I entered. The result is a screen showing the last 100 pieces of text I tagged CPAP (which may include other text if I felt the need to write on the topic); an option lets me change that number. Clicking Save to File lets me save the result as a Journal file, an RTF file, or a TXT file. I save it as a TXT file so that I can easily read it in R. The text file contains the data I entered for the CPAP machine, plus any other text I tagged CPAP (which is fairly uncommon).
Data import
This is where I pay the price for putting the data in a diary rather than in a tabular format. I use readLines.
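A minimal sketch of that first import step follows. The helper read_cpap_export is hypothetical, as is the file name in the usage line. It reads the exported TXT file with readLines, drops blank lines, and groups the remaining lines into one character vector per entry, using the template’s “Sleep numbers for” header to find where each entry starts. Lines before the first header are treated as other tagged text and discarded.

```r
# Hypothetical helper: read a TXT export from The Journal and split it into entries.
# Assumes each CPAP entry starts with the template header "Sleep numbers for".
read_cpap_export <- function(path) {
  # warn = FALSE silences the warning R gives when the last line lacks a newline
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[lines != ""]  # drop blank lines between entries
  # Running count of headers: each line gets the number of the entry it belongs to
  entry_id <- cumsum(grepl("^Sleep numbers for ", lines))
  # Lines before the first header (entry_id 0) are other tagged text; drop them
  keep <- entry_id > 0
  unname(split(lines[keep], entry_id[keep]))
}

# Usage (the file name is hypothetical):
# entries <- read_cpap_export("cpap.txt")
```

Any non-template text tagged CPAP that comes after an entry stays in that entry’s group, so a parser that only looks for the “* Label:” lines will simply ignore it. If the export turns out to use an encoding R does not pick up by default, opening it with file(path, encoding = ...) before reading is one option.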