The term “Big Data” (or massive data) refers to data sets so large that they are difficult to handle with conventional tools. The data often come from multiple sources and are recorded so that they can be exploited and analyzed without a predetermined purpose and without a time limit.
Two developments made the emergence of Big Data possible. First, the growth of the Internet and the rising number of connected objects contribute to the creation of large volumes of data. Second, the growth of storage and computing capacity allows these data to be processed at ever lower cost.
In principle, Big Data is defined by four characteristics: volume, velocity, variety, and value.
- Volume: Big Data represents large amounts of data. It is often said that 90% of the data available today were created in the last two years.
- Velocity: data are generated, captured, and shared at high speed, and analysis times keep getting shorter. Data are often processed in real or near real time.
- Variety: the data analyzed are not necessarily structured. They may come from different combined sources and in different formats (text, images, multimedia content, digital traces, etc.). Data stored in an internal customer database can be combined with external data from social networks, search engines, or open data portals managed by public authorities.
- Value: the last characteristic is the added value that the data represent and the uses that their analysis makes possible.
Some examples
The uses vary widely. Examples include the analysis of crowd movements using cell phone data to facilitate the delivery of aid after the 2010 earthquake in Haiti, the adaptation of President Obama’s speeches during the 2012 campaign based on reactions posted on Twitter, and the identification of the areas and hours in a city where crimes are most likely to be committed, in order to better allocate resources.
Another famous example is the US company Target, which can identify customers who are expecting a child in order to offer them products for infants. To do this, the company analyzed millions of loyalty card records belonging to women who had opened a baby shower gift registry. For example, it observed that these women began to buy unscented lotions at about three months of pregnancy, and certain dietary supplements at another stage of pregnancy. By applying these criteria (combined with others) to all its customers, Target can identify pregnant women with considerable accuracy.
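To make the idea of "applying criteria to all customers" concrete, here is a minimal sketch of a rule-based score. It is not Target's actual method, which is not described here: the product names, the weights, the threshold, and the functions `score` and `flag_customers` are all hypothetical and chosen only for illustration.

```python
# Hypothetical signals and weights, invented for illustration only.
SIGNAL_WEIGHTS = {
    "skin cream": 2,
    "dietary supplement": 2,
}


def score(purchases):
    """Sum the weights of the distinct signal products found in a basket."""
    return sum(SIGNAL_WEIGHTS.get(item, 0) for item in set(purchases))


def flag_customers(baskets, threshold):
    """Return the sorted ids of customers whose score reaches the threshold."""
    return sorted(cid for cid, items in baskets.items() if score(items) >= threshold)


# Invented loyalty card data: customer id -> purchased products.
baskets = {
    "c1": ["skin cream", "dietary supplement", "bread"],
    "c2": ["bread", "skin cream"],
    "c3": ["milk"],
}

flagged = flag_customers(baskets, threshold=3)
print(flagged)
```

Only the customer whose basket combines several signals is flagged. A single purchase says little on its own, which is why the criteria are described as being combined with others.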
Where is data protection in all of this?
Big Data poses a real challenge for data protection because many basic principles are put at risk. Data protection requirements apply only to the processing of personal data, that is, data relating to an identified or identifiable person. Anonymous data are therefore excluded. The problem is that when anonymous data are combined with other data, the persons concerned can quickly become identifiable.
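The following sketch illustrates how combining data sets can undo anonymity. It assumes a record set stripped of names but still carrying a few attributes (postal code, birth year, sex), and a second, named data set sharing those attributes. All records, field names, and the function `reidentify` are invented for this example.

```python
# Hypothetical, invented records for illustration only.
anonymous_records = [
    {"zip": "1003", "birth_year": 1984, "sex": "F", "purchase": "prenatal vitamins"},
    {"zip": "1003", "birth_year": 1990, "sex": "M", "purchase": "running shoes"},
    {"zip": "1200", "birth_year": 1984, "sex": "F", "purchase": "garden tools"},
]

public_directory = [
    {"name": "Alice Example", "zip": "1003", "birth_year": 1984, "sex": "F"},
    {"name": "Bob Example", "zip": "1003", "birth_year": 1990, "sex": "M"},
    {"name": "Carol Example", "zip": "1200", "birth_year": 1984, "sex": "F"},
    {"name": "Dana Example", "zip": "1200", "birth_year": 1984, "sex": "F"},
]

# Attributes that are not names but can single out a person when combined.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Build the combination of quasi-identifiers for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymous, directory):
    """Link each anonymous record to a name when exactly one person matches."""
    names_by_key = {}
    for person in directory:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for record in anonymous:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match means the record is identified
            matches.append((candidates[0], record["purchase"]))
    return matches


for name, purchase in reidentify(anonymous_records, public_directory):
    print(name, "->", purchase)
```

In this invented data, two of the three "anonymous" records match exactly one named person and are therefore re-identified. The third matches two people and stays ambiguous, which shows that anonymity depends on how many individuals share the same combination of attributes.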
Data may be processed only for the purpose for which they were collected, and they must be destroyed once that purpose is achieved. Big Data relies instead on the reuse of data for other purposes, and even on retaining data for possible future use (for a purpose that is not yet determined).
The person concerned must be informed of the processing of their data, including any transmission to third parties, which implies clear and accurate information on the terms and purposes of the processing. These requirements are difficult to meet when processing Big Data. Correcting the data and guaranteeing a right of access can also be problematic.
This does not mean that data protection standards are not applicable or should be changed. Anyone who collects and analyzes Big Data must simply act in good faith and with transparency. They must also take the necessary measures to keep the data as anonymous as possible and to ensure their security.