White House Big Data and Privacy Review Reports

The following reports were released by the Executive Office of the President and the President’s Council of Advisors on Science and Technology on May 1, 2014.

Big Data: Seizing Opportunities, Preserving Values (85 pages)
Report to the President on Big Data and Privacy: A Technological Perspective (76 pages)

Since the first censuses were taken and crop yields recorded in ancient times, data collection and analysis have been essential to improving the functioning of society. Foundational work in calculus, probability theory, and statistics in the 17th and 18th centuries provided an array of new tools used by scientists to more precisely predict the movements of the sun and stars and determine population-wide rates of crime, marriage, and suicide. These tools often led to stunning advances. In the 1800s, Dr. John Snow used early modern data science to map cholera “clusters” in London. By tracing to a contaminated public well a disease that was widely thought to be caused by “miasmatic” air, Snow helped lay the foundation for the germ theory of disease.

Gleaning insights from data to boost economic activity also took hold in American industry. Frederick Winslow Taylor’s use of a stopwatch and a clipboard to analyze productivity at Midvale Steel Works in Pennsylvania increased output on the shop floor and fueled his belief that data science could revolutionize every aspect of life. In 1911, Taylor wrote The Principles of Scientific Management to answer President Theodore Roosevelt’s call for increasing “national efficiency”:

[T]he fundamental principles of scientific management are applicable to all kinds of human activities, from our simplest individual acts to the work of our great corporations…. [W]henever these principles are correctly applied, results must follow which are truly astounding.

Today, data is more deeply woven into the fabric of our lives than ever before. We aspire to use data to solve problems, improve well-being, and generate economic prosperity. The collection, storage, and analysis of data is on an upward and seemingly unbounded trajectory, fueled by increases in processing power, the cratering costs of computation and storage, and the growing number of sensor technologies embedded in devices of all kinds. In 2011, some estimated the amount of information created and replicated would surpass 1.8 zettabytes. In 2013, estimates reached 4 zettabytes of data generated worldwide.

More than 500 million photos are uploaded and shared every day, along with more than 200 hours of video every minute. But the volume of information that people create themselves—the full range of communications from voice calls, emails and texts to uploaded pictures, video, and music—pales in comparison to the amount of digital information created about them each day.

These trends will continue. We are only in the very nascent stage of the so-called “Internet of Things,” when our appliances, our vehicles and a growing set of “wearable” technologies will be able to communicate with each other. Technological advances have driven down the cost of creating, capturing, managing, and storing information to one-sixth of what it was in 2005. And since 2005, business investment in hardware, software, talent, and services has increased as much as 50 percent, to $4 trillion.

There are many definitions of “big data” that may differ depending on whether you are a computer scientist, a financial analyst, or an entrepreneur pitching an idea to a venture capitalist. Most definitions reflect the growing technological ability to capture, aggregate, and process an ever-greater volume, velocity, and variety of data. In other words, “data is now available faster, has greater coverage and scope, and includes new types of observations and measurements that previously were not available.” More precisely, big datasets are “large, diverse, complex, longitudinal, and/or distributed datasets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.”

What really matters about big data is what it does. Aside from how we define big data as a technological phenomenon, the wide variety of potential uses for big data analytics raises crucial questions about whether our legal, ethical, and social norms are sufficient to protect privacy and other values in a big data world. Unprecedented computational power and sophistication make possible unexpected discoveries, innovations, and advancements in our quality of life. But these capabilities, most of which are not visible or available to the average consumer, also create an asymmetry of power between those who hold the data and those who intentionally or inadvertently supply it.

Part of the challenge, too, lies in understanding the many different contexts in which big data comes into play. Big data may be viewed as property, as a public resource, or as an expression of individual identity. Big data applications may be the driver of America’s economic future or a threat to cherished liberties. Big data may be all of these things. For the purposes of this 90-day study, the review group does not purport to have all the answers to big data. Both the technology of big data and the industries that support it are constantly innovating and changing. Instead, the study focuses on asking the most important questions about the relationship between individuals and those who collect and use data about them.

…

2.3 Challenges to the home’s special status

The home has special significance as a sanctuary of individual privacy. The Fourth Amendment’s list, “persons, houses, papers, and effects,” puts only the physical body in the rhetorically more prominent position; and a house is often the physical container for the other three, a boundary inside of which enhanced privacy rights apply.

Existing interpretations of the Fourth Amendment are inadequate for the present world, however. We, along with the “papers and effects” contemplated by the Fourth Amendment, live increasingly in cyberspace, where the physical boundary of the home has little relevance. In 1980, a family’s financial records were paper documents, located perhaps in a desk drawer inside the house. By 2000, they were migrating to the hard drive of the home computer – but still within the house. By 2020, it is likely that most such records will be in the cloud, not just outside the house, but likely replicated in multiple legal jurisdictions – because cloud storage typically uses location diversity to achieve reliability. The picture is the same if one substitutes for financial records something like “political books we purchase,” or “love letters that we receive,” or “erotic videos that we watch.” Absent different policy, legislative, and judicial approaches, the physical sanctity of the home’s papers and effects is rapidly becoming an empty legal vessel.

The home is also the central locus of Brandeis’ “right to be let alone.” This right, too, is increasingly fragile. People increasingly bring sensors into their homes whose immediate purpose is to provide convenience, safety, and security. Smoke and carbon monoxide alarms are common, and often required by safety codes. Radon detectors are common in some parts of the country. Integrated air monitors that can detect and identify many different kinds of pollutants and allergens are readily foreseeable. Refrigerators may soon be able to “sniff” for gases released from spoiled food, or, as another possible path, may be able to “read” food expiration dates from radio‐frequency identification (RFID) tags in the food’s packaging. Rather than today’s annoying cacophony of beeps, tomorrow’s sensors (as some already do today) will interface to a family through integrated apps on mobile devices or display screens. The data will have been processed and interpreted, and that processing will most likely occur in the cloud. So, to deliver services the consumer wants, much data will need to have left the home.

Environmental sensors that enable new food and air safety may also be able to detect and characterize tobacco or marijuana smoke. Health care or health insurance providers may want assurance that self‐declared nonsmokers are telling the truth. Might they, as a condition of lower premiums, require the homeowner’s consent for tapping into the environmental monitors’ data? If the monitor detects heroin smoking, is an insurance company obligated to report this to the police? Can the insurer cancel the homeowner’s property insurance? To some, it seems farfetched that the typical home will foreseeably acquire cameras and microphones in every room, but that appears to be a likely trend. What can your cell phone (already equipped with front and back cameras) hear or see when it is on the nightstand next to your bed? Tablets, laptops, and many desktop computers have cameras and microphones. Motion detector technology for home intrusion alarms will likely move from ultrasound and infrared to imaging cameras – with the benefit of fewer false alarms and the ability to distinguish pets from people. Facial‐recognition technology will allow further security and convenience. For the safety of the elderly, cameras and microphones will be able to detect falls or collapses, or calls for help, and be networked to summon aid.

People naturally communicate by voice and gesture. It is inevitable that people will communicate with their electronic servants in both such modes (necessitating that they have access to cameras and microphones).

Companies such as PrimeSense, an Israeli firm recently bought by Apple, are developing sophisticated computer‐vision software for gesture reading, already a key feature in the consumer computer game console market (e.g., Microsoft Kinect). Consumer televisions are among the first “appliances” to respond to gesture, and devices such as the Nest smoke detector already do so as well. The consumer who taps his temple to signal a spoken command to Google Glass may want to use the same gesture for the television, or for that matter for the thermostat or light switch, in any room at home. This implies omnipresent audio and video collection within the home.

All of these audio, video, and sensor data will be generated within the supposed sanctuary of the home. But they are no more likely to stay in the home than the “papers and effects” already discussed. Electronic devices in the home already invisibly communicate to the outside world via multiple separate infrastructures: The cable industry’s hardwired connection to the home provides multiple types of two‐way communication, including broadband Internet. Wireline phone is still used by some home‐intrusion alarms and satellite TV receivers, and as the physical layer for DSL broadband subscribers. Some home devices use the cell‐phone wireless infrastructure. Many others piggyback on the home Wi‐Fi network that is increasingly a necessity of modern life. Today’s smart home‐entertainment system knows what a person records on a DVR, what she actually watches, and when she watches it. Like personal financial records in 2000, this information today is in part localized inside the home, on the hard drive inside the DVR. As with financial information today, however, it is on track to move into the cloud. Today, Netflix or Amazon can offer entertainment suggestions based on customers’ past key‐click streams and viewing history on their platforms. Tomorrow, even better suggestions may be enabled by interpreting their minute‐by‐minute facial expressions as seen by the gesture‐reading camera in the television.

…

2.4 Tradeoffs among privacy, security, and convenience

Notions of privacy change generationally. One sees today marked differences between the younger generation of “digital natives” and their parents or grandparents. In turn, the children of today’s digital natives will likely have still different attitudes about the flow of their personal information. Raised in a world with digital assistants who know everything about them, and (one may hope) with wise policies in force to govern use of the data, future generations may see little threat in scenarios that individuals today would find threatening, if not Orwellian. PCAST’s final scenario, perhaps at the outer limit of its ability to prognosticate, is constructed to illustrate this point.

Taylor Rodriguez prepares for a short business trip. She packed a bag the night before and put it outside the front door of her home for pickup. No worries that it will be stolen: the camera on the streetlight was watching it, and, in any case, almost every item in it has a tiny RFID tag. Any would‐be thief would be tracked and arrested within minutes. Nor is there any need to give explicit instructions to the delivery company, because the cloud knows Taylor’s itinerary and plans; the bag is picked up overnight and will be in Taylor’s destination hotel room by the time of her arrival.

Taylor finishes breakfast and steps out the front door. Knowing the schedule, the cloud has provided a self-driving car, waiting at the curb. At the airport, Taylor walks directly to the gate, with no need to go through any security. Nor are there any formalities at the gate: a twenty‐minute “open door” interval is provided for passengers to stroll onto the plane and take their seats (which each sees individually highlighted in his or her wearable optical device). There are no boarding passes and no organized lines. Why bother, when Taylor’s identity (as for everyone else who enters the airport) has been tracked and is known absolutely? When her information emanations (phone, RFID tags in clothes, facial recognition, gait, emotional state) are known to the cloud, vetted, and essentially unforgeable? When, in the unlikely event that Taylor has become deranged and dangerous, many detectable signs would already have been tracked, detected, and acted on?

Indeed, everything that Taylor carries has been screened far more effectively than any rushed airport search today. Friendly cameras in every LED lighting fixture in Taylor’s house have watched her dress and pack, as they do every day. Normally these data would be used only by Taylor’s personal digital assistants, perhaps to offer reminders or fashion advice. As a condition of using the airport transit system, however, Taylor has authorized the use of the data for ensuring airport security and public safety.

Taylor’s world seems creepy to us. Taylor has accepted a different balance among the public goods of convenience, privacy, and security than would most people today. Taylor acts in the unconscious belief (whether justified or not, depending on the nature and effectiveness of policies in force) that the cloud and its robotic servants are trustworthy in matters of personal privacy. In such a world, major improvements in the convenience and security of everyday life become possible.

White House


May 12, 2014
