
Fish Belong in the Sea: Differentiating Data in the Digital Biome

In biological ecosystems, organisms flourish in favorable environments and languish in unfavorable ones. Coral is suited for life in tropical waters, not the desert. Different kinds of data also require different conditions to thrive.

As the data-driven economy matures, the forces shaping it are differentiating data (and by extension content) into more refined morphological branches, where particular species occupy the habitats (or market niches) with which they have co-evolved.

Intrinsic to this evolution is the systematic classification of the data being collected, manipulated, analyzed, bought and sold: health data (clinical, genetic, pharmacological), open data, personal data, scientific data, student data, financial data, systems data, metadata, audit data, business-critical data, sensor data, polling data.

In spite of this rich lexicon, too often we refer to data in generalized terms, as a vaguely homogeneous mass. This is a mistake.

Data is anything but uniform.

Fish Need Water

Derived from a variety of sources, used in innumerable applications, data requires disparate protection and maintenance routines to maximize its utility, protect its integrity, and honor its value.

Information system experts recognize that data is heterogeneous; they spend their days classifying it—doing so is critical to its preservation and security.

Recent polls demonstrate that the non-expert public also recognizes differences among data types in the digital biome, choosing, for example, to avoid interactions online that require disclosing personal information.

This is not the same, however, as fully grasping the logical next step.

Data that is not uniform—in terms of format, lifespan, value, vulnerability, etc.—should not be treated uniformly.

This is especially true where sensitive, private, personal data is concerned.

Plant, Animal, or Mineral?

Just as different species inhabit different ecosystem niches suited to their particular physical and behavioral traits, different data types will thrive or languish under different conditions.

Data itself takes multiple forms: words, numbers, pixels, coded content organized in a table or grid. Some data is highly complex; some is simple. It’s critical to be aware of all facets of a given body of data and to understand what constitutes favorable or unfavorable conditions.

This requires unraveling data attributes and evaluating them.

Metadata

Consider, for example, a digital image whose geo-location is embedded in its metadata. Though seemingly innocuous, when digital images are compiled and overlaid on a map, they can reveal more than you might think. Publicly available images posted on social media have disclosed the precise addresses of pet owners and helped track down criminals.
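To see just how accessible that metadata is, here is a minimal sketch (assuming a recent version of the Pillow imaging library and a JPEG whose EXIF data includes GPS tags; the file name is hypothetical):

```python
from PIL import Image          # Pillow imaging library
from PIL.ExifTags import GPSTAGS

def gps_from_photo(path):
    """Return (latitude, longitude) embedded in a photo's EXIF, or None."""
    exif = Image.open(path).getexif()
    gps_raw = exif.get_ifd(0x8825)  # 0x8825 = the EXIF GPSInfo IFD
    if not gps_raw:
        return None
    gps = {GPSTAGS.get(tag, tag): value for tag, value in gps_raw.items()}

    def to_degrees(dms, ref):
        # EXIF stores degrees/minutes/seconds as three rational numbers
        deg = float(dms[0]) + float(dms[1]) / 60 + float(dms[2]) / 3600
        return -deg if ref in ("S", "W") else deg

    return (to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"]),
            to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"]))

# print(gps_from_photo("beach_photo.jpg"))  # e.g. (40.7484, -73.9857)
```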

Lifespan

When delivery to a recipient is all that’s needed or when time-sensitive data expires, archiving it doesn’t make sense. The distinction between permanent and impermanent content is of paramount importance given the volume of data we generate. Storing it might be cheap (in terms of dollars if not environmental impact), but managing and securing it all is not. Data that does need to be retained faces the challenges of digital preservation (such as long-term readability and bit rot).
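As a minimal sketch of that distinction (the names are illustrative, not drawn from any particular system), lifespan can be treated as a first-class attribute so that expired content is purged rather than archived:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

@dataclass
class ContentItem:
    payload: bytes
    expires_at: Optional[datetime]  # None = permanent: archive and preserve

def purge_expired(items: List[ContentItem]) -> List[ContentItem]:
    """Keep permanent items and items still within their lifespan."""
    now = datetime.now(timezone.utc)
    return [i for i in items if i.expires_at is None or i.expires_at > now]

# A coupon that stops mattering in a week vs. a record worth preserving:
coupon = ContentItem(b"20% off", datetime.now(timezone.utc) + timedelta(days=7))
deed = ContentItem(b"property deed scan", None)
print(len(purge_expired([coupon, deed])))  # 2 today; 1 once the coupon expires
```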

Format

Incompatible files (“not supported on this device”) are frustrating. Some file types are so specialized that the data they contain is accessible only to those with an exclusive key. Computational biologists, for example, need proprietary bioinformatics tools to visualize in 3-D the interactions of drugs and their genetic targets.
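Format can also be unraveled programmatically. As a minimal sketch, a file can often be identified by its leading “magic bytes” rather than by trusting its extension (the signature table here is a tiny, illustrative sample):

```python
# Identify a file by its leading "magic bytes" instead of trusting its
# extension. This table is a tiny, illustrative sample; real tools
# (e.g., the Unix `file` command) ship databases of thousands of signatures.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff":      "JPEG image",
    b"%PDF":              "PDF document",
    b"PK\x03\x04":        "ZIP container (also DOCX, XLSX, JAR, ...)",
}

def sniff_format(path: str) -> str:
    with open(path, "rb") as f:
        head = f.read(8)
    for signature, name in SIGNATURES.items():
        if head.startswith(signature):
            return name
    return "unknown (possibly a specialized or proprietary format)"
```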

Source and access

As with the shift from Linnaean classification to phylogenetic nomenclature, our classification of data types will be fluid, adapting to the outside forces of science, business, and culture. General categories today outline both the origins of and access to different types of data, including open data, restricted data (e.g., data subject to government regulations like COPPA, HIPAA, and Privacy Shield), proprietary data, and personal/sensitive/confidential/private data (e.g., PII, ePHI, genetic data, student data).

Co-Evolution, Symbiosis, and Habitats in the Information Ecosystem

Data itself is but one element of the information ecosystem. The expertise needed to do something useful with it embodies one of the forces driving the co-evolution of data and its market, prompting the wild growth of analytics and data visualization, influencing our behavior online, shifting the fulcrum of the adtech-adblocking seesaw, and tightening our focus on data protection.

Consider for example how these factors interact within the information ecosystem:

  • The proliferation of niche markets for analytics, defined, in part, by the questions being asked (SMEs, consumer products, biotech, medicine, travel, etc.)
  • The rising use of AI and algorithms to curate and filter
    • The relationship of technology and people in the information ecosystem can be a symbiotic one. Software algorithms have not eclipsed the need for human cognition in filtering, pattern recognition, and analytics. Intelligence analysts still outperform machines when it comes to intuition and inference and are instrumental in improving machine learning.
    • And while algorithms make information more discoverable in an environment of overload and can help separate signal from noise, they necessarily reflect the bias of their embedded assumptions.
  • The increased nuance in notions of privacy and the rise of data privacy expertise
    • As more and more sensitive data finds its way into the information ecosystem, and as regulations increasingly govern the use of that data, the demand for data privacy officers will reach 28,000 at a minimum in the coming years, according to the IAPP.
    • Distinctions between legitimate monitoring and surveillance have become subtler.
    • With consumers becoming less trusting and more reluctant to share their personal, browsing, and financial data, researchers continue to look for a balanced solution to the give-and-take of personalization and control, of public interest and privacy rights.
  • The concomitant differentiation of security risks
    • As with any ecosystem, parasites find ways to exploit available resources and feed off poorly protected data. Although hackers are often motivated by criminal intent, human error can also poke holes in data defenses.
    • Experts’ approach to data protection is based on a variety of factors: whether data is likely to be captured via the network or an endpoint; by an insider or outsider; whether an attack is designed to steal the data or to introduce ransomware; and whether there is an actual breach of cyber-defenses or a manipulation of vulnerabilities (e.g., breaches at the IRS where data was accessed “not through a forcible compromise of the computer systems, but by hackers who correctly answered security questions that should have only been answerable by the actual individual.”).

What’s in a Name?

To more fully understand the next life stage of this evolution, we need a taxonomic language and generally accepted classification criteria.

Big data is patently imprecise. “Big” describes a certain volume, but it implies nothing about wide-ranging sources and uses.

  • Software-analytics company SAS does refer to big data’s variability, velocity, variety, and complexity, and to its applicability to business decision-making, as does IBM.
  • Many claim that big data explains the “why” while small data explains the “what” (though not always, as philosopher Michael Lynch points out in The Internet of Us).

Small data is an even more egregious misnomer: according to most definitions, there are vast quantities of it, yet there is no consensus on what it is.

  • Some say it is generated by the connected devices in the Internet of Things (i.e., specific attributes derived from sensors detecting current states like those in so-called smart cities), or that it derives from the digital breadcrumbs of our online lives used for both customer segmentation and personalized medicine, or the “lean data” processed from data streams to eliminate all but the relevant elements.
  • Others, like author Martin Lindstrom, say it is the subtle, detailed observations related to human behavior (such as what people around the world are eating) collected by people not machines.

If we imagine Data electronicum as the order, below this might be the families of big data and small data, which can themselves be divided into personal data and non-personal data, which in turn include myriad species each with numerous facets that define them.
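Purely as an illustration of that hypothetical hierarchy (none of these categories or facets are standardized), it might be modeled like this:

```python
# A purely illustrative rendering of the hypothetical taxonomy: order at the
# top, families and sub-families below, species with defining facets as leaves.
taxonomy = {
    "Data electronicum": {                                    # order
        "big data": {                                         # family
            "personal": {"clickstream": {"lifespan": "short",
                                         "sensitivity": "high"}},
            "non-personal": {"sensor telemetry": {"lifespan": "short",
                                                  "sensitivity": "low"}},
        },
        "small data": {
            "personal": {"health record": {"lifespan": "long",
                                           "sensitivity": "very high"}},
            "non-personal": {"retail coupon": {"lifespan": "days",
                                               "sensitivity": "none"}},
        },
    },
}
```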

Mapping out data types by classifying them might be construed merely as an exercise in content management. One might argue that nomenclature is nothing more than an abstraction. So why does data differentiation matter?

Bycatch or Targeted Breed?

It matters because we need to know what we are fishing for. Knowing what we are after affects the tools we use, the areas of the ocean we trawl, the season and time of day we cast our nets.

Protected Waters

Imagine concentric rings expanding outward from the singular point of our identities.

Data furthest from us might be historical data or data over which we have little or no control, like roadway images or government records.

Closer in are things like health information and records of financial transactions, student records and HR files: these are held by fewer institutions but still beyond our reach.

Closest are those things that we can or might wish to control, like personal photos, email, and text messages.

All of this data is nominally “ours.” It is, after all, about us. And it is often generated by us. But much of it is collected, shared, or stored unnecessarily, without our consent, or without a fair exchange.

Have you ever wondered why a weather app needs to check your location every 10 minutes? Or whether a photo-sharing app really needs access to the physical addresses, birthdays, and other notes recorded in your digital rolodex?

Unwanted Species

All of the bycatch in data collectors’ nets crowds out legitimate uses, exposes latent data to misuse, and expends resources unnecessarily.

Defining data with greater precision by citing attributes relevant to a specific use will help us control more and waste less.

Issues of controlling and protecting data and using it efficiently pivot on the value we assign to it. When we make decisions about collecting, retaining, and securing data, we must first appraise it and consider its utility, usability, and ROI.

Some correlations are completely spurious. A lot of data is unstructured and difficult to decipher or inaccessible because of regulatory restrictions. And in some cases, its shelf life is too short to bother with.

Sometimes what we catch isn’t worth the bait.

No Fish in a Tree

You won’t find a tropical plant in the Arctic. Fish belong in water.

A retail coupon serves no purpose once it has expired. Genetic data doesn’t belong in the digital vault of an online bank.

We need to stop treating data as uniform. By looking at all of the characteristics that define a given type of data, we can develop and implement tools and policies that act on those distinctions appropriately.

Differentiating data can be a win-win: fewer resources, greater security, better ROI.

Image: junko | Pixabay


Digital Dead Wood: Why Do We Keep So Many Photos?

A picture might be worth a thousand words, but a thousand pictures are worth little if we retain them because of inertia, indecision, or failure to filter.

Digital imaging has vastly simplified photography but complicated the processing of its prolific output.

As pixel resolution increases, our resolution to select, manage, and preserve our image collections weakens. The result: an inverse relationship between the volume of digital images we create and our ability to control them.

Who enjoys sorting through the 60 or 100 photos taken over a weekend, culling out the duplicates, the blurry ones, the backlit and poorly composed, selecting only those worth keeping? Who among us performs this chore routinely?

Dead wood in the tree of knowledge

If digital data were a living thing, it would constitute the roots of information-economy flora. Although nothing can grow without these roots, they alone will not generate a tree of knowledge, let alone a blossom. Often they create an impenetrable tangle.

Big data, small data, the Internet of Things (IoT), the thousands of digital images that we amass every year…much of it is dead wood and should be pruned judiciously.

Although we recognize that excessive, indiscriminate collecting gives rise to all kinds of dysfunction, we don’t wield the shears to cut it down to size.

Data economics don’t make sense

The deleterious effects of data hoarding and information overload are widespread and longstanding. Futurist Alvin Toffler wrote about these in 1970, well before the first digital camera appeared on the market in 1991.

As these phenomena progressed, the terms describing them evolved as well—information glut, data glut, data smog, digital noise—and gave rise to new terms for related issues—filter failure, time famine, information fatigue syndrome, and data exhaust.

No matter what we call it, too much data and poorly targeted data are problematic.

Note the numbers

First, consider the rate at which data flow and connectivity will increase; it is staggering.

  • The amount of digital data produced worldwide is expected to reach 180 zettabytes by 2025, via more than 80 billion connected devices
  • In 2016, 5.5 million new things will connect to the Internet of Things (IoT) every day
  • Global Internet traffic (web, email, IM) is forecast to reach 193,104 PB by 2019

By the way, 1,000 GB ≈ 1 TB (Terabyte); 1,000 TB ≈ 1 PB (Petabyte); 1,000 PB ≈ 1 EB (Exabyte); 1,000 EB ≈ 1 ZB (Zettabyte)

Personal data represents a remarkable proportion of these totals (and companies are taking notice, collecting personal data and combining it with big data from other sources in order to improve analytics and by extension the personalized offers presented to consumers).

The digital-image share of this personal data is also astounding: an estimated 1 trillion photos captured in 2015.

Next, consider the costs of storage.

  • Data centers leave huge environmental footprints. Some calculate that their annual electricity usage in kilowatt-hours rivals that of a country of about 17 million. Uploading photos to the cloud using “free” services comes at a cost, even if we do not incur it directly.
  • A refrigerator-sized device can store 16 PB of data. With so much real-time data streaming in from embedded sensors, it shouldn’t be difficult to fill such devices to capacity. However, with vendors charging up to $20 per GB, data storage costs are anything but free.

One storage vendor notes: “No one can look at all their data anymore; they need algorithms just to decide what to look at.” Indeed only 0.5% of data gathered is even analyzed. If it’s good data, it might reveal something useful; if it’s bad data, it can be misleading.

Obviously, it all comes down to making choices, evaluating what we capture, collect, and store, controlling the impulse to hold on to it all, just in case.

The logic of selecting and deleting

If we address the behavior that drives data overload, we can nip it in the bud rather than pruning the full-grown results.

Every time we snap a photo, post it, share it, or send it to the cloud, we need to ask: Do I really need to save this? Every digital artifact we create should either be destined for deletion or properly prepared for preservation.

History is constructed from artifacts that survive. Physical preservation is one factor in their survival. Another is the very act of selection.

Shakespeare famously left behind little for posterity to examine. Many visual artists destroy their own work (e.g., Monet and Picasso). Deletion is a highly effective strategy for protecting one’s reputation and legacy.

Implicit in this tradition is the acknowledgement that

  • inferior, intermediate, or temporary output is not worth saving
  • destroying personal documents ensures privacy and control
  • what we release into the world reflects us post facto

So what of the impulse today to chronicle every inane element of our lives? What compels this micro-documentation?

These are questions for sociologists, anthropologists, and psychologists, but the behavior itself concerns technologists, data scientists, and security experts. (Because it isn’t just photos. Sensitive content of all kinds is left to dangle indefinitely in cyberspace at significant risk.)

Excessive volume

Some materials do acquire value over time. The personal correspondence of literary greats and other famous people satisfies our nostalgia, voyeurism, and celebrity worship, and it provides a sightline into the creative lives of artistic geniuses.

But the unit of measure has changed. Digital output dwarfs most paper archives. (Compare the presidential papers of 100 years ago to more recent ones.)

The fact is, in the digital age, the issue of volume is more acute.

Taking a photo used to be a deliberate act; it required taking notice of the angle of the sun, ensuring proper focus, framing the composition, waiting for Cartier-Bresson’s “decisive moment.”

Now we think nothing of taking 20 photos to get a single winner. Unfortunately, we rarely look at the other 19.

When we postpone the acts of filtering and eliminating (telling ourselves that we’ll do it later), the volume of data we generate quickly becomes overwhelming! (Concierge photo organizers will do this for those who can justify the expense. There’s also software for automatic album creation.)

Indiscriminate value

At least as important as the issue of quantity is the question of quality, or more precisely, value.

Our frenetic digital sharing lends new meaning to the term “quantified self.” How (and why) would a biographer sort through and make sense of tens of thousands of Kardashian Tweets?

“Self-showing…can be…a sort of charming ritual of daily inventory,” writes cultural critic Adam Gopnik.

An occasional dose of charm is understandably appealing. But rituals exist in defined time and space, not on a continuum. Once performed, a ritual’s purpose is exhausted. So why preserve it?

This is not a criticism of the impulse to capture memorable moments, but a consideration of the confounding consequences:

  • the climate implications of our digital carbon footprint
  • the information security risks of retaining personal data in the cloud
  • the issues of visibility, irretrievability, and digital preservation of the unstructured data in emails, messages, images, etc.

To say nothing of what relying on digital records does to the psychological construction of our personal histories.

What kind of emotional imprint can we create in our minds when we experience things through the lens of a smartphone screen? Our memories are so much richer when encoded through the input of all five senses (smelling the salt air, hearing laughter, feeling the warmth of the sun).

Neverlasting is natural

A very short-lived species of mayfly takes its name from the Greek word ephemera, meaning “lasting one day.” In their abbreviated life span, these insects serve their purpose and then expire.

We might do well to take notice of this cycle as we consider the volume and value of the data we generate, consume, and store.

Let’s do ourselves a favor by sharing in the moment, relieving ourselves of the “selection” burden, and letting go of the impulse to save every digital communication we create.

Differentiate content. Favor quality over quantity. Define value. Eliminate dead wood.

You can start by using bitpuf. It’s designed for sharing impermanent content.

 

Make it Neverlasting!

Image: hotblack | morgueFile


Are You Harboring Dark Data?

Dark Matters

We’ve all heard the refrain about poor data: garbage in, garbage out. A less well-recognized issue concerns data that is collected and stored but not used.

Many companies draw on only a fraction of the data they possess and often fail to derive anything useful from it. The explosion in data analytics will help redress this gap, enabling organizations to identify patterns, make predictions, and personalize products and services. The advanced analytics market is projected to grow to nearly $30B by 2019.

But data analytics rely on seeing the data that is being analyzed and some shades of big data are difficult to discern. Most organizations retain vast quantities of this darker stuff. Some estimate that as much as 90% of big data is so-called “dark data.” Though not always shady, it is not always a valuable resource either.

Hidden in the cloud or in the dark matter of cyberspace, the dark data you harbor can be an unseen force—for better or worse.

What is Dark Data and Why Does It Matter?

Dark data refers to data that is collected and stored, then neglected. It may include data of minimal value or great potential.

  1. Sometimes the term refers to data that is undetected and therefore unusable. Often this is simply a matter of unstructured information contained in text-heavy documents or files that are not tagged or annotated in any systematic way (imagine an encyclopedia without an index).

Email is a prime example. Though often archived as a matter of policy, it is unlikely to be cataloged in a content-management system. Because there are privacy laws specific to email, knowing where it resides and what it contains is paramount.

Undetected dark data might also include personal files, like music and video that employees store on company machines, or worse on unsanctioned cloud apps on third-party servers. The storage costs accumulate quickly.

According to a study of companies in the UK, “A typical midsize company with 500 terabytes of data wastes nearly a million pounds [$1.5 million] each year maintaining trivial files, including … personal photos stored by 57 percent of employees, personal ID and legal documents by 53 percent, as well as music, games and videos, stored by 45 percent, 43 percent and 29 percent respectively.”

  2. Other times “dark” implies dangerous, meaning that it exposes information systems to significant risks. This includes data that is redundant, obsolete, or trivial, also known as ROT (an apt acronym). When retained beyond its useful life, it remains vulnerable to misuse.
  3. On a less sinister note, dark data can also refer to data that is simply inaccessible. In some cases it holds promise but requires transformation first, either from an outdated digital format or from a non-digital one.

There is a treasure trove of information locked up in libraries, museums, and research collections: e.g., objects, photographs, even metadata in card catalogs. These are unequivocally worth preserving in digital form, contributing as they do to innovation and scholarship.

Got ROT? Deal with Your Databerg in Four Steps

Whether perceived as a business risk or a potential asset, caring for all of this data is a Herculean task.

“Databergs” threaten to rip a hole in information systems. ROT alone is projected to cost organizations $891B by 2020 in storage, migration, and security.

The intangible costs of data protection are equally significant. Trust is considered the “cornerstone of the digital economy,” yet the reputational and financial risks of data breaches are too often recognized after a hacking incident not before.

Minimizing these risks is essential. How?

  • First and foremost, prevent unauthorized access.

Though no one wants to talk about it, most data breaches are the result of accidental or deliberate unauthorized access by employees. What to do? Train staff in data ethics, and implement and enforce clear information-governance policies.

  • Second, address digital decay and excise the rotten bits.

Data is delicate, with a relatively short shelf life: it must be periodically accessed and migrated to ensure its integrity.

Storage media can be unstable and prone to corruption or defect, yet they must remain readable in the future. File formats, especially proprietary formats, quickly become outdated, as the applications needed to view them become incompatible with current operating systems and devices.
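One common safeguard, sketched minimally here with Python’s standard library (the manifest file name is hypothetical), is to record a checksum for each file and verify it periodically, catching silent corruption while a clean copy still exists:

```python
import hashlib
import json
import pathlib

def checksum(path: pathlib.Path) -> str:
    """SHA-256 fixity value for one file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest_file: str) -> list:
    """Compare stored checksums to current files; return paths that changed."""
    manifest = json.loads(pathlib.Path(manifest_file).read_text())
    return [path for path, digest in manifest.items()
            if checksum(pathlib.Path(path)) != digest]

# damaged = verify("archive_manifest.json")  # schedule this to run periodically
```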

  • Third, eliminate data that isn’t needed rather than storing it indefinitely, which is wasteful and risky if no one is monitoring it.

One person’s ROT might be another’s loot, but it should be periodically purged. (Obsolescence and triviality make a strong case for temporary content, or neverlasting content, as we like to call it!)

Notwithstanding the costs and risks of keeping data that holds no value, determining what to retain also has legal and cultural implications.

Culturally, we must ask: what will we commit to the digital record? Legally, we must comply with data protection and privacy regulations.

  • Finally, keep assets secure.

Cybersecurity today must protect not only the data itself, but the data used to authenticate access to it (biometrics both physical and behavioral hold promise in some applications but can also be stolen for nefarious use).

Dark Data Checklist

In the simplest terms, any approach to caring for dark data will involve:

  • Identifying it—locating it, classifying it, etc.
  • Ensuring appropriate access, now and in the future, in terms of both authorization and integrity
  • Evaluating it and eliminating what isn’t needed, e.g., ROT, unrecoverable, and sensitive data (sketched below)
  • Protecting what is kept

And this will mean addressing some rotten habits:

  • Hoarding data
  • Misusing the corporate cloud and third-party storage apps
  • Failing to differentiate between valuable data and ROT
  • Failing to annotate data when it is captured or created
  • Racking up storage costs and leaving a huge environmental footprint
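
To make the checklist concrete, here is a minimal sketch of the identifying and evaluating steps (the age threshold and “trivial” file types are illustrative assumptions, not a policy recommendation):

```python
import os
import time

STALE_AFTER = 3 * 365 * 24 * 3600            # illustrative: ~3 years untouched
TRIVIAL = {".mp3", ".mp4", ".avi", ".tmp"}   # illustrative "trivial" file types

def flag_rot_candidates(root: str):
    """Walk a directory tree; yield (path, reason) for files worth reviewing."""
    now = time.time()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()
            if ext in TRIVIAL:
                yield path, "trivial file type"
            elif now - os.path.getmtime(path) > STALE_AFTER:
                yield path, "not modified in years"

# for path, reason in flag_rot_candidates("/shared/drive"):
#     print(reason, "->", path)
```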

Keeping Private Data in the Dark

With nearly every move we make online generating a steady stream of digital bits, dark data touches all of us.

We can support the use of data in research where it contributes to the common good while also holding data brokers accountable (especially when they sell health data without our specific consent).

And we can support the use of personal data to customize an offer when it benefits all parties involved.

But we must insist that possessing and using sensitive data conveys a big responsibility. To help ensure that it is well protected and treated ethically, organizations should focus on obtaining transparent consent, collecting only what is needed, selecting what’s valuable, and eliminating the rest.

We at bitpuf have chosen not to collect your data, because we believe in keeping your personal information in the dark, off the record, in a word, private.

If you believe privacy matters…

Sign up for bitpuf!

 

Image source: Pixabay


Getting Ahead of Digital Data: the cart’s before the horse and it’s rolling away!

Human history is punctuated with examples of new science and technologies gaining powerful momentum before society considered the repercussions of their applications and established guidelines for their uses.

American drivers were bumping along in Model T Fords for several years before they were required to obtain licenses, and moviegoers had been buying tickets for decades when motion picture and television rating systems were introduced.

Our uniquely human drive to discover, invent, and improve is a wondrous thing, but we can get ahead of ourselves by adopting advances before considering the potential for undesirable consequences or taking measures to avoid them. (Nobel laureate Alexander Fleming, who discovered penicillin, predicted antibiotic resistance as a result of misuse, but his warnings went unheeded for generations.)

How did personal computing become personalized ads?

Progress empowers; it enables and enriches. It also introduces new challenges as we see now with the rise of big data and the “personal information economy.” Our ability to capture and crunch data has leapfrogged ahead of a framework to guide its responsible use.

Early triumphs of the digital age arose from computing power—the ability to grind through calculations at an unprecedented rate. The advent of personal computing saw word processing replace the typewriter and the introduction of desktop publishing.

Then came the Internet and email, web search and browsing, e-commerce, and eventually social networks. Each of these developments contributed to the next one and each incrementally encroached upon our online privacy.

Today data, much of it personally identifiable information (PII), drives a significant portion of the global economy and contributes inestimably to our daily activities and interactions.

Retail transactions, traffic apps, fitness trackers, private communications, even media consumption involve surrendering various fragments of data that can easily be combined to create rich profiles and to identify and locate users with great specificity.

We cede this personal data in exchange for convenience, or so the argument goes. Yet in the absence of a universal, or at least widely adopted, ethical framework to guide the responsible use of data, we expose ourselves to questionable manipulation and outright abuse. It’s very difficult to know where to draw the line.

To move forward, we must first step back

Perhaps it will become clearer if we step back and take the long view. It’s fair to say that we are collectively realizing the need to reexamine the very concept of privacy, to redefine it in light of changes wrought by information technology, just as we had to redefine labor in the industrial age to address child welfare, public health, and urbanization.

The modern factory became emblematic of the industrial age, embodying both its promise and peril. The Internet represents the multifarious face of today’s technology: globalization, speed, connectivity, convenience, and scale, but also unintended exposure, inconsistent regulations, and every imaginable scam.

Automation, as a driver of the industrial age, transformed both manufacturing and labor. Initially, in a rush to reap its benefits, we failed to account for the human factor and treated workers as machines.

Appalling conditions, occupational hazards, inhumane hours, and child labor gave rise to a spate of new legislation that, in effect, stepped backward to identify the best way forward.

Debating ethical use, responsibility, and regulations

We find ourselves at a similar juncture today as we debate how to use data responsibly. Having rushed headlong into our current state, we must now retreat and reconsider the physical and tacit boundaries that once demarcated private spaces.

We must unravel each strand of a complicated topic to evaluate issues of access, encryption and surveillance, data privacy, the protection of student data, data ownership, and security.

Fierce competition among data brokers and the potential for antitrust actions against those with the most valuable troves indicate just how high the stakes have become.

So what next?

We all hold the reins

Having acknowledged that, left unbridled, our digital world is becoming increasingly vulnerable (data breaches and the hacking of Wi-Fi enabled toys starkly illustrate its darker side), we can direct our forward momentum toward agreement that regulation is needed.

It’s not an all-or-nothing issue. We’ve traveled too far down the path of progress to roll back the conveniences we’ve come to enjoy.

So let’s balance economic benefits with individual rights by agreeing to basic ground rules: broadly speaking in the form of data ethics, and more narrowly in the form of specific implementations, e.g., architectures, privacy policies, and business models.

It seems neither practical nor desirable to eliminate completely the capture and use of personal data. Our expectations and habits have changed. But it is entirely within our power to demand and create a code of ethics, to pull back on the reins a bit and return things to a workable order.

 

Privacy matters!


5 Privacy Trends for 2016: a battle for big data, bandwidth, and ad blocking

As we look ahead to the coming year, our eyes are inevitably drawn to the digital landscape and the billions of personal data points that map its contours.

Nearly every what-to-watch-in-2016 list refers to data privacy. And nearly every one points to a significant shift in the balance of control over personal data: tipping away from AdTech and toward consumers.

To relinquish or control, that is the question

People are bristling at the unbridled collection and use of data about their behavior online, their every move through physical space, and literally thousands of facets of their “persona” (up to 4,000 data points on a single user—one journalist asks whether he could come up with that many data points on his spouse!).

And we consumers are footing the bill: the frenetic pop-ups and “vexing videos” that plague our mobile screens have voracious appetites for bandwidth, sometimes consuming more than the content itself.

Cross-device tracking using digital fingerprinting represents a particularly egregious invasion of personal space.

Yet consumer opinion remains divided, largely along generational and cultural lines, about the risks and benefits of permitting data collection.

Some, especially “digital natives,” are accustomed to letting their private lives spill out in full view of the online public. (Though cybersecurity specialists predict that Millennials will take a closer look at privacy.) Many others shrink from the spotlight, wondering what really lies behind the glow.

Of course, there is no immutable law of information technology declaring that we must relinquish our personal data and privacy in order to participate as digital citizens. We can demand control.

Blocking, faking, refusing

Has data-driven personalization reached its limit? It certainly has met its match in ad-blocking technology and consumers’ evasive strategies.

  • Symantec’s State of Privacy 2015 finds that 33% of consumers in the UK provide fake data and 53% avoid posting personal data online.
  • A Pew Research Center study shows that 24% of American Internet users provide inaccurate information about themselves and 57% have refused to provide information irrelevant to the transaction at hand.
  • The dizzying rise of ad-blocking software (198 million active ad blockers globally including 34% of 16-24 year olds using the Internet) illustrates our collective frustration with increasingly intrusive advertising strategies.

Big data is bittersweet

Big data has many worthwhile and legitimate uses but the anonymization of personal data is notoriously difficult and the data collected often far exceeds what is needed for a given service or transaction, for example:

  • identifying individuals based on retail transactions (as few as 4 data points provided 90% accuracy!)
  • seeking excessive permissions (one app sought up to 235 permissions; the average Android app seeks 5)

Forks in the road: what can we do?

We can choose more palatable paths through the digital world. Consider these alternatives:

  • Matching content to channel: Differentiate content types and select communication channels that are aligned with their attributes: e.g., broadcast public content; choose user-to-user or authenticated access for private content; delete temporary content when it becomes irrelevant; archive permanent content for posterity.
  • Managing our own personal data: Ask users to define the privacy parameters of their online presence based on the context of what is being served (e.g., search results, e-retail, social content, branded content, academic research, professional content, etc.). Researchers in EdTech have already taken steps down this path, granting students greater control over what personal data is displayed on a given page. They call it “sovereign source identity.”
  • Re-defining regulatory frameworks: Support national and international laws that promote more transparent terms of service, explicit opt-in, the right-to-be-forgotten, and what law professor Lawrence Lessig calls systems that draw on personal data for “single-use purposes.”
  • Favoring private-by-design: Appeal to consumers by offering inherently private, secure devices like ReVault’s wearable data storage, Purism’s laptop, or the Blackphone 2.
  • Data minimization: Do not collect sensitive information if it isn’t needed for a given service and delete it once it is no longer relevant. Store personal data locally rather than in the cloud. (Data minimization will be critical for the Internet of Things; a minimal sketch follows this list.)
  • Permission-based advertising: Encourage permission marketing rather than interruption marketing. The former is not a new idea but it may enjoy a renaissance. Rather than pushing intrusive ads to consumers, marketers and advertisers may offer them something in exchange for their attention or action.
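
As promised in the data-minimization item above, a minimal sketch (the field names are illustrative): keep exactly the fields a service needs and discard everything else at the point of collection.

```python
# Keep only the fields the service actually needs; discard the rest on arrival.
REQUIRED_FIELDS = {"email"}  # illustrative: all a bare sign-up form needs

def minimize(submitted: dict) -> dict:
    return {k: v for k, v in submitted.items() if k in REQUIRED_FIELDS}

profile = minimize({
    "email": "user@example.com",
    "birthday": "1990-01-01",     # not needed for the service: never stored
    "location": "40.75,-73.99",   # not needed for the service: never stored
})
print(profile)  # {'email': 'user@example.com'}
```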

All of these options implicitly treat personal data as a monetizable asset. Given that we are the source of this in-demand resource, shouldn’t we exercise our right to determine its value and the conditions of its exchange?

Shouldn’t we demand more than the simple convenience that data controllers point to as the current trade-off? (An Annenberg School for Communication survey reveals that most Americans don’t buy this “tradeoff fallacy” anyway.)

More bandits, more breaches

As we explore these options, cybercriminals will continue to test our systems’ vulnerabilities relentlessly and will penetrate inadequate defenses. The incidence of data breaches continues to increase (780 in the U.S. in 2015), as does the sophistication of the attacks. Those seeking unauthorized access to personal data are devising increasingly subtle ploys. Social-engineering fraud preys on our gullibility and turns our socially-shared information against us.

Leaky connections

The profusion of connected devices spawned by the Internet of Things (IoT) will expose still more of our data to additional “controllers” and attacks. The Gartner Group estimates that the number of connected things will reach 25 billion by 2020.

And the range of entities seeking to use information about our behavior and demographic data keeps expanding: note the granularity of voter profiling in the current U.S. presidential race. Psychographic, behavioral microtargeting is providing candidates’ campaigns with detailed information gleaned from voters’ “Like” patterns on social media.

As with any data stored in the cloud, these records can be leaked. A researcher was able to access 191 million voting records from one database a few weeks ago and an additional 56 million records from another.

New rules of the road

We can all expect to be rated on our data-ethics performance and our reliability vis-à-vis privacy and security. Driven by both consumer pressure and the risks of cybercrime, businesses will continue to adapt by creating new roles (chief privacy officer), adopting new regulations, developing new privacy-enhancing technologies (so-called PETs), and implementing new policies and training, all addressing data security and data ethics. The economic and reputational risks of failing to do so could be crippling.

So who says privacy is dead? 93% of Americans feel that it is important to control who can get information about them and 90% feel it’s important to control what information is collected. Those numbers unequivocally refute any claim to privacy’s demise. To ignore them is akin to junk food marketers asserting that healthy eating is dead.

Privacy will be dead when we digital citizens give it up. Nothing indicates that moment is near.


Read additional perspectives on what to expect in the privacy space in 2016:

  • Mary Meehan writing about consumer culture in Forbes
  • Christos K. Dimitriadis writing about cyber-risk trends in TechInsider
  • Victor Pineiro writing about social-media marketing trends in AdAge
  • Global design firm Fjord predicts: “big data will get some manners.” Let’s hope they are correct.

Photo credit: Russell Johnson

Try bitpuf!


Privacy is not what you think

Privacy is not about hiding. Secrecy is.

Privacy is about setting boundaries and making distinctions; it’s about asserting the right to control what is personal.

Privacy is:

  • trusting the postal service to deliver mail, unopened
  • having a conversation, unrecorded
  • shredding bank statements
  • asking permission, not presuming it

Unfortunately, digital privacy usually has it the other way around.

Recent research, reported by Gavin O’Malley of MediaPost, “tested 110 of the most popular Android and iOS apps on the market to see which ones shared personal, behavioral, and location data with third parties…a whopping 73% of Android apps shared personal information, such as email address with third parties, while 47% of iOS apps shared geo-coordinates and other location data,” very often without notifying users or asking for their permission.

Given the fierce debates surrounding data protection and consumer rights, it’s clear that balancing the risks and benefits of monitoring behavior, compiling dossiers, and storing data will not be easy.

bitpuf was designed to protect users’ privacy, to minimize these risks, and to create value around trust.

bitpuf does not use location or other tracking, cookies, photo tagging, or facial recognition. We ask for users’ email at sign up but nothing else. All content is fully encrypted, both in transit and “at rest.” And nothing is archived. We see privacy as an inherent right not an opting out of intrusive data collection.

Our policy is transparency because we believe privacy matters. We know that many of you share this view.

Wondering what privacy looks like on bitpuf? See for yourself!

Photo credit: nd3000 / Fotolia
 

Get started!
 


Introducing… Neverlasting content

Somewhere along our path through the digital age, we stopped distinguishing between content that we’d like to record, save, and archive and content that should disappear. We stopped recognizing the benefits of impermanence: privacy, security, and simplicity. There are many reasons for making digital content ephemeral—perhaps it is casual or short-lived, sensitive or confidential.

In the real world, we can hand off a document to our attorney, accountant, or doctor; we can have conversations, share photos, or exchange information selectively and quietly, in the moment and not beyond.

We’ve largely lost this kind of interaction in the online world, but not because it isn’t still valuable. With so much of the digital economy fueled by the highly targeted ads aimed at us, personal data has become the resource sine qua non.

Nearly every interaction we engage in online becomes part of a record. But is everything we do online worth capturing and analyzing? Surely it is not. Are we aware that we’re providing this raw material? Often we are not.

bitpuf was founded with a vision to recreate these ephemeral exchanges, to provide a secure channel for delivery, and to respect the privacy of personal data. We’re all for the convenience and pleasure of sharing online. We just believe in giving you the right to control your privacy and to choose whether your content will last forever.

bitpuf it!

Photo credit: Silroby / Fotolia