Are You Harboring Dark Data?

Dark Matters

We’ve all heard the refrain about poor data: garbage in, garbage out. A less well-recognized issue concerns data that is collected and stored but not used.

Many companies draw on only a fraction of the data they possess and often fail to derive anything useful from it. The explosion in data analytics will help redress this gap, enabling organizations to identify patterns, make predictions, and personalize products and services. The advanced analytics market is projected to grow to nearly $30B by 2019.

But data analytics relies on seeing the data being analyzed, and some shades of big data are difficult to discern. Most organizations retain vast quantities of this darker stuff. Some estimate that as much as 90% of big data is so-called “dark data.” Though not always shady, it is not always a valuable resource either.

Whether it is hidden in the cloud or in the dark matter of cyberspace, the dark data you harbor can be an unseen force, for better or worse.

What is Dark Data and Why Does It Matter?

Dark data refers to data that is collected and stored, then neglected. It may include data of minimal value or great potential.

  1. Sometimes the term refers to data that is undetected and therefore unusable. Often this is simply a matter of unstructured information contained in text-heavy documents or files that are not tagged or annotated in any systematic way (imagine an encyclopedia without an index).

Email is a prime example. Though often archived as a matter of policy, it is unlikely to be cataloged in a content-management system. Because there are privacy laws specific to email, knowing where it resides and what it contains is paramount.

Undetected dark data might also include personal files, like music and video, that employees store on company machines or, worse, on unsanctioned cloud apps on third-party servers. The storage costs accumulate quickly.

According to a study of companies in the UK, “A typical midsize company with 500 terabytes of data wastes nearly a million pounds [$1.5 million] each year maintaining trivial files, including … personal photos stored by 57 percent of employees, personal ID and legal documents by 53 percent, as well as music, games and videos, stored by 45 percent, 43 percent and 29 percent respectively.”

  2. Other times “dark” implies dangerous, meaning that the data exposes information systems to significant risks. This includes data that is redundant, obsolete, or trivial, also known as ROT (an apt acronym). When retained beyond its useful life, it remains vulnerable to misuse.
  3. On a less sinister note, dark data can also refer to data that is simply inaccessible. In some cases it holds promise but requires transformation first, either from an outdated digital format or from a non-digital one.

There is a treasure trove of information locked up in libraries, museums, and research collections: e.g., objects, photographs, even metadata in card catalogs. These are unequivocally worth preserving in digital form, contributing as they do to innovation and scholarship.

Got ROT? Deal with Your Databerg in Four Steps

Whether it is perceived as a business risk or a potential asset, caring for all of this data is a Herculean task.

“Databergs” threaten to rip a hole in information systems. ROT alone is projected to cost organizations $891B by 2020 in storage, migration, and security.

The intangible costs of data protection are equally significant. Trust is considered the “cornerstone of the digital economy,” yet the reputational and financial risks of data breaches are too often recognized after a hacking incident, not before.

Minimizing these risks is essential. How?

  • First and foremost, prevent unauthorized access.

Though no one wants to talk about it, most data breaches are the result of accidental or deliberate unauthorized access by employees. What to do? Train employees in data ethics, and implement and enforce clear information governance policies.

  • Second, address digital decay and excise the rotten bits.

Data is delicate with a relatively short shelf-life: it must be periodically accessed and migrated to ensure its integrity.

Storage media can be unstable and prone to corruption or defect, but they must be readable in the future. File formats, especially proprietary formats, quickly become outdated, as the applications needed to view them become incompatible with current operating systems and devices.
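
One way to make "periodically accessed" concrete is to record a checksum for every file and compare the records at each review. The sketch below is only an illustration under assumptions: the function names (sha256_of, build_manifest, find_changes) are hypothetical, and SHA-256 is chosen here simply because it ships with Python's standard library. It detects files that have gone missing or whose bytes have changed; it does not repair them or tell you whether a format is still readable.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to its current digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_changes(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files from the old manifest that are now missing or altered."""
    return {
        "missing": sorted(name for name in old if name not in new),
        "altered": sorted(
            name for name in old if name in new and old[name] != new[name]
        ),
    }
```

In practice the manifest from one review would be saved (for example as JSON) and compared with a fresh one at the next review; anything listed as missing or altered is a candidate for restoring from a backup copy.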

  • Third, eliminate data that isn’t needed rather than storing it indefinitely, which is wasteful and risky if no one is monitoring it.

One person’s ROT might be another’s loot, so it should be periodically purged. (Obsolescence and triviality make a strong case for temporary content, or neverlasting content, as we like to call it!)

Notwithstanding the costs and risks of keeping data that holds no value, determining what to retain also has legal and cultural implications.

Culturally, we must ask: what will we commit to the digital record? Legally, we must comply with data protection and privacy regulations.

  • Finally, keep assets secure.

Cybersecurity today must protect not only the data itself, but the data used to authenticate access to it (biometrics both physical and behavioral hold promise in some applications but can also be stolen for nefarious use).

Dark Data Checklist

In the simplest terms, any approach to caring for dark data will involve:

  • Identifying it—locating it, classifying it, etc.
  • Ensuring appropriate access, now and in the future, in terms of both authorization and integrity
  • Evaluating it and eliminating what isn’t needed, e.g., ROT, unrecoverable, and sensitive data
  • Protecting what is kept

And this will mean addressing some rotten habits:

  • Hoarding data
  • Misusing the corporate cloud and third-party storage apps
  • Failing to differentiate between valuable data and ROT
  • Failing to annotate data when it is captured or created
  • Racking up storage costs and leaving a huge environmental footprint
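
The identifying and evaluating steps in the checklist can be partly automated. The following is a minimal sketch, not a governance tool: the function name classify_rot, the three-year cutoff, and the list of "trivial" file extensions are all assumptions made for illustration and would need to reflect local retention policy. It only reports candidates for review and deletes nothing, since deciding what to keep has legal implications.

```python
import hashlib
import time
from pathlib import Path
from typing import Dict, List, Optional

# Assumptions for illustration only; adjust both to local policy.
TRIVIAL_SUFFIXES = {".mp3", ".mp4", ".mov", ".jpg", ".png"}
OBSOLETE_AFTER_DAYS = 365 * 3


def classify_rot(root: Path, now: Optional[float] = None) -> Dict[str, List[str]]:
    """Group files under root into redundant, obsolete, and trivial candidates."""
    now = time.time() if now is None else now
    cutoff = now - OBSOLETE_AFTER_DAYS * 86400
    seen: Dict[str, str] = {}
    report: Dict[str, List[str]] = {"redundant": [], "obsolete": [], "trivial": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        name = str(path.relative_to(root))
        # Reads the whole file into memory; fine for a sketch, not for huge files.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            report["redundant"].append(name)  # same bytes as an earlier file
        else:
            seen[digest] = name
        if path.stat().st_mtime < cutoff:
            report["obsolete"].append(name)  # not modified since the cutoff
        if path.suffix.lower() in TRIVIAL_SUFFIXES:
            report["trivial"].append(name)  # personal-media style file type
    return report
```

A file can land in more than one group, and last-modified time is only a rough proxy for obsolescence, so the report is best treated as a starting point for human review.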

Keeping Private Data in the Dark

With nearly every move we make online generating a steady stream of digital bits, dark data touches all of us.

We can support the use of data in research where it contributes to the common good while also holding data brokers accountable (especially when they sell health data without our specific consent).

And we can support the use of personal data to customize an offer when it benefits all parties involved.

But we must insist that possessing and using sensitive data carries a big responsibility. To help ensure that it is well protected and treated ethically, organizations should focus on obtaining transparent consent, collecting only what is needed, selecting what’s valuable, and eliminating the rest.

We at bitpuf have chosen not to collect your data, because we believe in keeping your personal information in the dark, off the record: in a word, private.

If you believe privacy matters…

Sign up for bitpuf!


Image source: Pixabay

Getting Ahead of Digital Data: the cart’s before the horse and it’s rolling away!

Human history is punctuated with examples of new science and technologies gaining powerful momentum before society considered the repercussions of their applications and established guidelines for their uses.

American drivers were bumping along in Model T Fords for several years before they were required to obtain licenses, and moviegoers had been buying tickets for decades when motion picture and television rating systems were introduced.

Our uniquely human drive to discover, invent, and improve is a wondrous thing, but we can get ahead of ourselves by adopting advances before considering the potential for undesirable consequences or taking measures to avoid them (Nobel laureate Alexander Fleming, who discovered penicillin, predicted antibiotic resistance as a result of misuse but his warnings went unheeded for generations).

How did personal computing become personalized ads?

Progress empowers; it enables and enriches. It also introduces new challenges as we see now with the rise of big data and the “personal information economy.” Our ability to capture and crunch data has leapfrogged ahead of a framework to guide its responsible use.

Early triumphs of the digital age arose from computing power—the ability to grind through calculations at an unprecedented rate. The advent of personal computing saw word processing replace the typewriter and the introduction of desktop publishing.

Then came the Internet and email, web search and browsing, e-commerce, and eventually social networks. Each of these developments contributed to the next one and each incrementally encroached upon our online privacy.

Today data, much of it personally identifiable information (PII), drives a significant portion of the global economy and contributes inestimably to our daily activities and interactions.

Retail transactions, traffic apps, fitness trackers, private communications, even media consumption involve surrendering various fragments of data that can easily be combined to create rich profiles and to identify and locate users with great specificity.

We cede this personal data in exchange for convenience, or so the argument goes. Yet in the absence of a universal, or at least widely adopted, ethical framework to guide the responsible use of data, we expose ourselves to questionable manipulation and outright abuse. It’s very difficult to know where to draw the line.

To move forward, we must first step back

Perhaps it will become clearer if we step back and take the long view. It’s fair to say that we are collectively realizing the need to reexamine the very concept of privacy, to redefine it in light of changes wrought by information technology, just as we had to redefine labor in the industrial age to address child welfare, public health, and urbanization.

The modern factory became emblematic of the industrial age, embodying both its promise and peril. The Internet represents the multifarious face of today’s technology: globalization, speed, connectivity, convenience, and scale, but also unintended exposure, inconsistent regulations, and every imaginable scam.

Automation, as a driver of the industrial age, transformed both manufacturing and labor. Initially, in a rush to reap its benefits, we failed to account for the human factor and treated workers as machines.

Appalling conditions, occupational hazards, inhumane hours, and child labor gave rise to a spate of new legislation that, in effect, stepped backward to identify the best way forward.

Debating ethical use, responsibility, and regulations

We find ourselves at a similar juncture today as we debate how to use data responsibly. Having rushed headlong into our current state, we must now retreat and reconsider the physical and tacit boundaries that once demarcated private spaces.

We must unravel each strand of a complicated topic to evaluate issues of access, encryption and surveillance, data privacy, the protection of student data, data ownership, and security.

Fierce competition among data brokers and the potential for anti-trust actions against those with the most valuable troves indicate just how high the stakes have become.

So what next?

We all hold the reins

Having acknowledged that, left unbridled, our digital world is becoming increasingly vulnerable (data breaches and the hacking of Wi-Fi enabled toys starkly illustrate its darker side), we can direct our forward momentum toward recognizing that regulation is needed.

It’s not an all-or-nothing issue. We’ve traveled too far down the path of progress to roll back the conveniences we’ve come to enjoy.

So let’s balance economic benefits with individual rights by agreeing to basic ground rules: broadly speaking in the form of data ethics, and more narrowly in the form of specific implementations, e.g., architectures, privacy policies, and business models.

It seems neither practical nor desirable to eliminate completely the capture and use of personal data. Our expectations and habits have changed. But it is entirely within our power to demand and create a code of ethics, to pull back on the reins a bit and return things to a workable order.


Privacy matters!

5 Types of Content that Shouldn’t Last Forever

Message in a bottle

Journalist John Markoff recently reported on a potential breakthrough in storage technology that could make it possible to store all the world’s data on synthetic DNA. It would fit in 12 wine bottles.

The Library of Congress holds more than 160 million items in 470 languages stored on approximately 838 miles of bookshelves: works of literature, scientific data, presidential papers, sheet music, rare manuscripts and maps…there are so many things worth preserving and archiving.

But there are even more that are not: grocery lists, casual emails and text messages, social media posts, telephone conversations, memos, homework assignments, doodles…Think about what it takes to store all the content produced by a single person in one year!

Digital is different

While we must file away important documents in a safe place—a signed original, for example—we do not want multiple digital copies stored indefinitely elsewhere. Yet that is precisely what happens when we share content using email and many file-sharing services.

Often, we cannot take the steps we’d like to protect sensitive information: deleting a file after it’s delivered digitally isn’t always the same as shredding its physical equivalent. In many cases, once we’ve sent it, its fate is out of our control.

And then there’s all of the not-worth-saving stuff that clutters our digital lives.

We’ve become digital hoarders

In transitioning to the digital world, we have failed to apply the same selectivity that drives our behavior in sorting through our physical materials. When was the last time you culled outdated files from your computer or old messages from your inbox? If we were to retain every piece of paper that passes through our lives, we’d be buried in it.

Here are 5 distinctly un-storage-worthy kinds of digital content.

1. Every photo ever taken

Today’s mobile phones produce fantastic, high-quality images and it’s very tempting to tap the screen. So much easier than working with manual focus cameras, cellulose film, the developing process, and leather-bound albums.

There’s a powerful impulse to capture a scene, a moment, a memorable event, to create a visual reminder, to save ourselves the time and effort of writing something down. Few of us enjoy the task of selecting, deleting, transferring, and uploading these digital images. But just because we can keep all of this stuff, doesn’t mean we should.

2. Casual communications

R u home yet? We need more milk. Are we still on for 1?

Once uttered, these exchanges lose all of their value. Digital advertisers might want to know that you have run out of milk or where you are at 1:00. But what further value do the messages bring you?

3. Time-sensitive content

A coupon expires after a certain date, a reminder is pointless after the fact, and an invitation is no longer needed once the event has passed. Keeping these around wastes energy and space.

4. Confidential material

The convenience of electronic delivery is one of the great universal upsides of information technology. Just don’t forget the downsides: data breaches, identity fraud, lack of privacy control. When you email a copy of your 1099 to your accountant, or SMS a password to your spouse, you usually lose the ability to “shred” these digitally.

5. Private information

Who hasn’t expressed a private thought in an email to a friend? Maybe we’ve disclosed things that we’d prefer to be forgotten. The spoken word is ephemeral and more easily left behind. Our digital trifles follow us around.

We need to be more discerning with our digital content, to distinguish what we share by its intended lifespan and its vulnerability once it leaves our fingertips.

For neverlasting content, there’s always bitpuf.

Photo credit: dolphfyn / Fotolia

Try bitpuf!

Privacy is not what you think

Privacy is not about hiding. Secrecy is.

Privacy is about setting boundaries and making distinctions; it’s about asserting the right to control what is personal.

Privacy is:

  • trusting the postal service to deliver mail, unopened
  • having a conversation, unrecorded
  • shredding bank statements
  • asking permission, not presuming it

Unfortunately, digital services usually have it the other way around: they presume permission rather than ask for it.

Recent research, reported by Gavin O’Malley of MediaPost, “tested 110 of the most popular Android and iOS apps on the market to see which ones shared personal, behavioral, and location data with third parties…a whopping 73% of Android apps shared personal information, such as email address with third parties, while 47% of iOS apps shared geo-coordinates and other location data,” very often without notifying users or asking for their permission.

Given the fierce debates surrounding data protection and consumer rights, it’s clear that balancing the risks and benefits of monitoring behavior, compiling dossiers, and storing data will not be easy.

bitpuf was designed to protect users’ privacy, to minimize these risks, and to create value around trust.

bitpuf does not use location or other tracking, cookies, photo tagging, or facial recognition. We ask for users’ email at sign-up but nothing else. All content is fully encrypted, both in transit and “at rest.” And nothing is archived. We see privacy as an inherent right, not as something earned by opting out of intrusive data collection.

Our policy is transparency because we believe privacy matters. We know that many of you share this view.

Wondering what privacy looks like on bitpuf? See for yourself!

Photo credit: nd3000 / Fotolia

Get started!