Summary: Every day, people interact with large amounts of digital data: files, photos, texts, apps, and so on. Storage space, whether on hard drives or in the cloud, is cheap. But how do people decide what data to keep or discard over the long term? To find out, I interviewed 23 individuals and asked them about their data. I identified a spectrum of behaviors with two extremes: hoarding (keeping most data) and minimalism (getting rid of as much data as possible).

Methods: Interviews, thematic analysis.

My role: This study was the starting point of my PhD research. It lasted 4 months and gave me a chance to focus on qualitative data analysis. I was the lead investigator: I designed the study and chose all the methods. I recruited participants and conducted all the interviews. I analyzed the data with the help of my co-authors (Izabelle Janzen & Joanna McGrenere). I wrote the final paper based on the project, which won a best paper award at CHI 2018.

When & where: UBC, 2017.

Publication: Francesco Vitale, Izabelle Janzen, and Joanna McGrenere. 2018. Hoarding and Minimalism: Tendencies in Digital Data Preservation. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18).

· · ·

It all started with backups

When I studied how people prepare for an operating system upgrade, I was surprised to see that many didn’t do a backup beforehand. So I decided to investigate how people take care of their digital stuff over time and make sure not to lose it. I wanted to understand whether backups are still a thing in today’s world. I am always reminded of an old episode of Sex and the City in which Carrie loses her data because she never backed up (season 4, episode 8, “My Motherboard, My Self,” which aired in 2001). It might be easy to blame her, but who hasn’t lost some data by accident?

Still from an episode of Sex and the City
Carrie Bradshaw learns about backups in an old episode of Sex and the City. I started the project wanting to investigate backups and data loss.

Exploring data practices

I decided to set up an exploratory interview study. I would ask people to bring their devices and talk about their data. What they consider their data to be. How they manage their digital things. What they consider important. What they would miss if they lost their devices, and so on. My initial focus was on backups, but the interviews were semi-structured. I had a set of questions and topics to cover, but I followed the lead of participants. I probed into their answers and we ended up talking about related but unexpected topics. In one section, I asked participants to sketch a chronology of their data to see how it moved from device to device.

I also decided to analyze the interviews as soon as possible, instead of waiting until the end. After the first batch of interviews, data collection and analysis went on in parallel. This approach was key to identifying the main contribution of the study. (A note on method: in the paper, we refer to the Braun and Clarke approach to thematic analysis, even though the idea of having data collection and analysis in parallel is closer to grounded theory. This might seem confusing. But as Braun and Clarke explain, their approach to thematic analysis shares some aspects with constructivist grounded theory procedures. Of course, grounded theory is a full methodology and thematic analysis is a method. But the two have some clear overlaps.)

Handwritten sketches of data histories from participants
I asked participants to sketch the history of their data over the years to see what they had kept or discarded.

Identifying minimalism (and hoarding)

In the middle of the study, my focus shifted from backups to a broader concept. I met a participant with a specific approach to deciding what data to keep or discard. On the participant’s laptop, there was one main folder, called “Life.” No cloud, no smartphone, only one main folder. Everything that mattered was in there, neatly organized in sub-folders. The participant summarised the approach by saying:

“It’s very minimal. I try to delete everything that I don’t need as fast as I can.”

What struck me was the way this approach differed from what I had seen up until that point. For example, in the interview before, another participant explained an opposite approach:

“I’m a bit of a hoarder. I just keep all the stuff and nothing ever goes away.”

So I kept interviewing people, now alert to the idea of hoarding and minimalism. And it started to become clearer that these two labels can represent two extremes. I never mentioned them in my questions. But in the following interviews, more participants used them to describe their behaviors.

Illustration of a computer folder called "Life"
One participant had a minimal approach to deciding what data to keep. A single main folder called “Life” contained everything important.

Going beyond the binary

Another important insight was the nuanced nature of these behaviors. At first, I thought they were two opposite categories, a binary of sorts. But by iterating on the analysis, I realized it was more complex than that. Participants had a general approach, but they also discussed interesting exceptions. This made it hard to identify them as either “hoarders” or “minimalists.” For example, one participant who self-described as a strong minimalist showed me a large collection of New Yorker articles. The participant downloaded every new issue of the magazine and saved the best articles for the future.

A collection is not the best fit under the label “hoarding.” (To clarify, we do not talk of hoarding as a disorder. Instead, we see it as an everyday tendency. In this sense, we re-define the word for a different, non-medical context.) But saving a large number of articles (available online) seemed to contradict the otherwise minimalist approach. Once again, having an open-ended approach in the interviews proved an advantage. Participants could talk about what they considered to be their own data. They were able to discuss what mattered to them most and highlight their own idiosyncrasies. (I also want to acknowledge and thank the peer reviewers of the paper. They pushed us to be explicit in framing these behaviors as a spectrum.)

Examples of New Yorker magazine covers
An exception to an otherwise minimalist approach: a collection of “New Yorker” articles.

Where do we go from here

The key implication of the study is that technology should accommodate these tendencies. To do that, we need data management tools that take a long-term perspective. There have been some efforts in this direction recently. In 2016, macOS introduced a storage panel to explore unused files. At the end of 2017, Google released “Files Go,” an Android application that recommends what data to delete to free up space.

These examples are good first steps. But it’s not clear how they fit people’s needs and practices, or how popular they are. At the end of the day, deciding what digital data to keep or discard is a difficult and complex task. Technology can help us decide, but we need better tools.

Screenshot of macOS detailed storage panel
macOS recently introduced a feature to “reduce clutter” by reviewing large files.

Reflections and learnings

I started this project after taking a graduate course in qualitative research methods. That course changed my approach to research. I learned about epistemology, ontology, and how they inform the research process. Here I was able to use many of those concepts to better frame my work and guide my decisions. Throughout the project, I learned to be flexible and open. To reflect on, discuss, and evolve my understanding after each new insight. To position myself as an active part of the research process. All this made me realize that qualitative (or better, interpretive) research is what I lean toward. And that I’ll always find learning about people’s habits fascinating.

· · ·

In the CHI 2018 paper, you will find more details about the methodology, the role of these tendencies for identity construction, and the relation to previous research. If you wanna know even more about how people manage their digital data, I recommend “Data Narratives: uncovering tensions in personal data management” by Janet Vertesi and colleagues, from CSCW 2016.