S2.2: The Quantified Self

Learning Outcomes

Understand

What self-tracking and self-quantification are depends a lot on the social context in which they occur.
Tracking and monitoring is a classic case of an activity that has anyficiaries – those who benefit and those who will be damaged.
All data needs to be checked for reliability and validity.

Remember

quantified self
differential privacy
cultural probe
ecological momentary assessment
reliability
validity
representativeness

Apply

Use the Ablative Framework and Cockton’s A_tefacts framework to think through real-world design problems.

The Quantified Self

Trackers focused on their health want to ensure that their medical practitioners don’t miss the particulars of their condition; trackers who record their mental states are often trying to find their own way to personal fulfillment amid the seductions of marketing and the errors of common opinion; fitness trackers are trying to tune their training regimes to their own body types and competitive goals, but they are also looking to understand their strengths and weaknesses, to uncover potential they didn’t know they had. Self-tracking, in this way, is not really a tool of optimization but of discovery, and if tracking regimes that we would once have thought bizarre are becoming normal, one of the most interesting effects may be to make us re-evaluate what “normal” means.

(Gary Wolf, “The Data Driven Life”)

The Bietz et al. paper emphasises a concept from Chapter 1 of Neff/Nafus: people appropriate trackers to gather evidence of what works for them. This is also referred to as life hacking in the technology community. Generic recommendations, such as walking 10,000 steps per day, are derived from large, population-based studies, and tracking food and exercise is also often recommended. However, for some people, these recommendations are unsuitable, or even harmful. The authors cite the examples of the woman who’d had knee surgery, and the woman recovering from anorexia who campaigned to have the Health app removed from her iPhone.

Social Aspects of Tracking

Deborah Lupton usefully distinguishes between communal tracking, where people donate their own data for the greater good, pushed tracking, where people are given incentives to track their own health (these can be social or economic), and imposed tracking, where activity tracking becomes a prerequisite for a service.

Activity: What are the ethical concerns around each of these types of tracking? Who are the anyficiaries?

Complexities of Tracking

People are routinely asked to track things in medicine (for example, key physiological parameters such as blood pressure, or food intake), but in practice, they often don’t track reliably or consistently. Food diaries (or diaries in general) are a particular problem.

Activity: Try searching PubMed for papers about the reliability or consistency of food diaries / food intake self report. What papers do you find? What do they say?

For doctors, data from activity tracking are a double-edged sword. While they potentially provide more information about how a patient is actually doing, doctors need to:

have time to engage with data
have skills and infrastructure to engage with data
understand the information
know how to use it for decision making
ensure that they store any such patient data
engage with the information patients receive from their apps
calm down patients who are unduly alarmed by their app
educate patients who are unduly reassured by their app

Having access to your data is not a magic solution to all your problems. More often than not, interpreting these data requires skill, training, and experience, and the data you can collect are not necessarily the data you need. People need to be supported through diagnoses and setbacks, encouraged, and coached to ensure they can take the required actions. Neff/Nafus (iBooks 398) present this not as a problem but as an opportunity: doctor and patient can work together to reach a deeper understanding of what is going on with the patient, using their data and judicious self-experimentation.

Activity: The full Student Life data set is accessible through a dedicated web site, which also shows you some of the analyses that have been completed. Group the data obtained through Student Life according to invasiveness, privacy concerns, ease of obtaining the data, and amount of work involved in entering the data.

Activity: What can we design from, with, and by Student Life data? What is the purpose, who would be the anyficiaries, and how can we evaluate our solution?

Reliability and Validity

Reliability

Reliability has many definitions across disciplines. Reliable data are trustworthy, authentic, and internally consistent. A reliable method can be easily replicated by different groups.

For surveys, reliability means that the same questionnaire completed by the same person under similar circumstances should yield almost the same results, if the survey is meant to measure an aspect of what the person does, thinks, believes, or feels. People’s responses can vary a little from day to day, but they should not vary a lot, unless the circumstances are radically different. Also, questions that aim to uncover similar information should yield similar results.
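One way to make "almost the same results" concrete is to correlate people's scores from two sittings of the same questionnaire (often called test-retest reliability). The sketch below is only an illustration: the scores are made up, and the function name `pearson_r` is our own choice, not part of any survey toolkit.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: this divides by zero if everyone gave identical scores.
    return cov / sqrt(var_x * var_y)

# Hypothetical total scores for five respondents on two occasions.
first_sitting = [12, 18, 25, 31, 40]
second_sitting = [14, 17, 27, 30, 41]

test_retest_r = pearson_r(first_sitting, second_sitting)
print(f"test-retest r = {test_retest_r:.2f}")
```

A value close to 1 suggests that people who scored high the first time also scored high the second time. What counts as "close enough" depends on the measure and on how much the circumstances changed between sittings.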

Example: The Big Five.

One of the main theories of personality claims that there are five main components to every person’s personality, which are called the Big Five: Openness to Experience, Conscientiousness, Agreeableness, Extraversion, and Neuroticism. When you take one of the well-validated surveys a couple of times, you should get similar ratings on each of the five traits every time you take it.

Example: The PANAS (Positive and Negative Affect Schedule)

The PANAS is a useful scale for assessing how somebody is feeling at the moment, or how they have been feeling over the past days, weeks, or months. The PANAS consists of a list of positive and negative adjectives. People’s responses to the positive adjectives are strongly correlated with each other, and their responses to the negative adjectives are strongly correlated as well. This means that we can use the responses for positive adjectives to derive a single score for positive affect, and likewise for the negative adjectives.
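The scoring idea can be sketched in a few lines. The adjective lists below are a small illustrative subset, not the full PANAS, and the ratings are invented. The notes do not name a statistic for "strongly correlated with each other"; Cronbach's alpha is used here as one common choice, which is our assumption.

```python
from statistics import variance

# Illustrative subset of adjectives, NOT the full scale.
POSITIVE = ["interested", "excited", "inspired"]
NEGATIVE = ["upset", "nervous", "afraid"]

def affect_scores(response):
    """Sum the ratings for positive and negative adjectives separately."""
    positive = sum(response[adj] for adj in POSITIVE)
    negative = sum(response[adj] for adj in NEGATIVE)
    return positive, negative

def cronbach_alpha(responses, items):
    """Cronbach's alpha for the given items across all respondents."""
    k = len(items)
    item_variances = [variance([r[item] for r in responses]) for item in items]
    total_variance = variance([sum(r[item] for item in items) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings (1 = not at all, 5 = extremely) from four people.
responses = [
    {"interested": 5, "excited": 4, "inspired": 5, "upset": 1, "nervous": 2, "afraid": 1},
    {"interested": 2, "excited": 2, "inspired": 1, "upset": 4, "nervous": 4, "afraid": 5},
    {"interested": 3, "excited": 3, "inspired": 4, "upset": 2, "nervous": 3, "afraid": 2},
    {"interested": 4, "excited": 5, "inspired": 4, "upset": 1, "nervous": 1, "afraid": 2},
]

for response in responses:
    print(affect_scores(response))
print(f"alpha (positive items) = {cronbach_alpha(responses, POSITIVE):.2f}")
```

Summing items into one score only makes sense when the items move together, which is what a high alpha indicates. Four respondents is far too few for a real analysis; the numbers are there only to make the calculation visible.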

Example: The speed of EUCLID

When assessing the efficiency of a system, it is tempting to just ask users how fast or slow they think a system is (maybe even on a scale from 1 to 5). However, this does not tell us anything about the actual speed of a system. It only tells us about the perceived speed in the context in which users access EUCLID. Thus, it is not a valid measure of actual system speed.

For usability measures, reliability means that if the same person performs the same tests under the same circumstances, the results should be broadly similar. This means that we need to be very clear about the methods we used for our measurements, and about the circumstances under which we obtained them.

Example: Speed of EUCLID

The speed with which people can use EUCLID depends on a person’s experience with EUCLID, the computer they are using, the speed of the network they are on, and the speed of the back end – the database of students, courses, and rules. If they are inexperienced, they will spend time figuring out which courses are listed in which section. If they are on a slow Internet connection, the web pages will take longer to load. Some things can be measured directly (current connection speed), some can only be inferred (speed of the various back end components).
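A minimal sketch of what "being clear about the circumstances" could look like in practice: repeat the measurement several times and store the timings together with a description of the context. Everything here is hypothetical, including `load_course_page`, which merely stands in for one EUCLID interaction, and the context fields.

```python
import time
from statistics import median

def time_task(task, repetitions=5):
    """Run task() several times and return each duration in seconds."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations

def summarise(durations, context):
    """Pair the timing summary with the circumstances of measurement."""
    return {
        "median_s": median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
        "context": context,
    }

def load_course_page():
    # Hypothetical stand-in for one EUCLID interaction.
    time.sleep(0.01)

# Hypothetical description of the measurement circumstances.
context = {"network": "campus wifi", "user_experience": "novice"}
report = summarise(time_task(load_course_page, repetitions=3), context)
print(report)
```

Reporting the spread (minimum and maximum) alongside the median shows how much the results vary between runs, and the context record lets someone else try to reproduce the measurement under similar conditions.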

Validity

Validity means that the data are relevant for the research questions that you are interested in, and the concepts that you are working with.

Example 1: The Big Five

The main criticisms of the Big Five are that the traits are nothing but sets of statements that are generally answered in the same way, that they lack a solid theoretical basis, that they don’t cover all aspects of human personality, and that the way in which traits affect human behaviour varies greatly between cultures.

Example 2: The PANAS

The adjectives in the PANAS work best for native speakers of American English, because this is the population on which the PANAS was originally tested. To make sure that responses by people who are not native speakers of English are reliable, the measure has been translated into different languages. An intercultural version of the English PANAS has also been created.

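Example 3: The speed of EUCLID

As noted in the Reliability section, asking users how fast or slow they think EUCLID is only tells us about perceived speed in their context of use. It is therefore not a valid measure of actual system speed.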

Representativeness

Representativeness means that the situations, users, products, or behaviours you study are typical of the entire group. Many of the studies that you see in the literature are not representative. In fact, none of the surveys or questionnaires you will do in your time at University will be representative, and if you claim that your study is representative, you will be expected to explain how you established that it is representative. The typical way of doing this is to define a target user group, because it’s relatively simple to establish their demographics, and then investigate a representative sample of users to establish common behaviours. There is also plenty of market research that will allow you to find representative products.

Example: Activity Tracker wrist band

Representative of the situations = covers all or most of the situations in which the wrist band will be used by its intended users.

Representative of products = covers at least the main wrist bands (maybe specified in terms of market share) available to the target population

Representative of users = covers a sample with the same demographics and the same jobs and living situations as the target users

Representative of behaviours = covers a sample of people who exhibit all or most of the usage patterns seen for the wrist band
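To explain how you established representativeness of users, one simple check is to compare the make-up of your sample with what is known about the target group. The sketch below does this for a single demographic variable. The age bands, target shares, sample, and the 10-percentage-point tolerance are all made-up assumptions for illustration.

```python
from collections import Counter

def proportions(values):
    """Share of each category in a list of observed values."""
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}

def representativeness_gaps(sample_values, target_shares, tolerance=0.10):
    """Categories whose sample share differs from the target by more than tolerance."""
    sample_shares = proportions(sample_values)
    gaps = {}
    for category in set(sample_shares) | set(target_shares):
        diff = sample_shares.get(category, 0.0) - target_shares.get(category, 0.0)
        if abs(diff) > tolerance:
            gaps[category] = round(diff, 2)  # positive = over-represented
    return gaps

# Hypothetical target population for an activity tracker wrist band.
target_age_shares = {"18-29": 0.30, "30-49": 0.40, "50+": 0.30}

# Hypothetical sample of 20 participants, mostly students.
sample_ages = ["18-29"] * 14 + ["30-49"] * 5 + ["50+"] * 1

gaps = representativeness_gaps(sample_ages, target_age_shares)
print(gaps)
```

A non-empty result means the sample over- or under-represents some groups. Matching on one variable does not make a sample representative overall; the same comparison would need to be repeated for jobs, living situations, and usage patterns.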

Key concepts

Differential privacy is fundamental to sharing your data. Informally, it means that the results published from a data set should be almost the same whether or not your data are included, so that very little can be learned about you specifically. Differential privacy can be modelled formally / mathematically.
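In the standard formal definition, a randomised mechanism M is ε-differentially private if, for any two data sets D and D′ that differ in one person's records and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. A smaller ε means stronger privacy. The sketch below shows one well-known way of achieving this for a simple count, the Laplace mechanism. The step counts, the threshold, and the function names are hypothetical.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng=random):
    """Count matching records, then add noise scaled to sensitivity / epsilon."""
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical daily step counts donated by six people.
daily_steps = [4200, 11050, 9800, 12500, 7600, 10400]

rng = random.Random(42)  # seeded only so the illustration is repeatable
noisy = private_count(daily_steps, lambda steps: steps >= 10000, epsilon=0.5, rng=rng)
print(f"noisy count of people reaching 10,000 steps: {noisy:.1f}")
```

The published number is deliberately imprecise: the noise hides whether any single person is in the data, at the cost of some accuracy. Real deployments also have to track how much ε is spent across repeated queries, which this sketch ignores.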

Ecological Momentary Assessment is a form of monitoring that is used to study what people feel, think, and experience, and how they behave, in real-world contexts. It is used to bridge the gap between self-reported data and more neutral observations by others, and the gap between how people behave in laboratory studies versus the real world.

This is a case study of how EMA works:

Burke, L. E., Shiffman, S., Music, E., Styn, M. A., Kriska, A., Smailagic, A., … Rathbun, S. L. (2017). Ecological Momentary Assessment in Behavioral Research: Addressing Technological and Human Participant Challenges. Journal of Medical Internet Research, 19(3), e77. http://doi.org/10.2196/jmir.7138

Cultural Probes are far less formal than Ecological Momentary Assessment. They are a design technique for learning more about how people think, behave, and live. Participants receive a kit that they can use to document their experiences. This can range from a simple diary to materials for creating art, from cameras to sound recorders.

Introduction from Designresearchtechniques.com