The 91st Installment
Data Analysis, Old and New
by Miyuki Nakano,
Professor, Master Program of Information Systems Architecture
More than six years have passed since the term "big data" became popular in the ICT field. In the meantime, the number of ICT companies quickly combining data analysis and services (Google, for example) has grown significantly. Perhaps never before has data from so many different parts of society received such attention from so many different people. At the same time, never before has so much miscellaneous data been provided by so many different people and devices.
When we look back through human history, the importance of data and records is nothing new. History itself is made up of the records people have left behind. Events from prehistoric times are uncertain because they survive only in murals painted tens of thousands of years ago, yet signs of record-keeping can be seen even there. Animals assumed to be important to society at that time are clearly depicted, and we can infer from the murals that the surrounding environment was grassland rather than an arid area, and that large herbivores were present. In addition, since the creation of writing, more and more records have been left behind as society developed. They include social norms such as the Code of Hammurabi and religious documents such as the Dead Sea Scrolls, which were created to spread their contents throughout society or to pass them on to the next generation, as well as records supporting the social activities of the time (clay tablets marked with cuneiform characters containing records about taxation and court cases, presumably used by government officials).
In short, humans habitually record and use data in their social activities, and the amount and content of that data has changed with each innovation in data recording technology. Before the written word, social activities were carried out and developed on the basis of personal memory, and afterward on the basis of written documents. Even after the invention of writing, the invention of paper and the advent of the printing press increased the amount of data transmitted to people, and non-written records such as photographs, audio and video became possible around the 20th century. And now, in the 21st century, the amount of data generated has grown dramatically compared to the time before the advent of computer technology, especially thanks to the development of database and internet technology.
Given that records have been kept in various forms throughout history, why is the term "big data" now used so much more than before to describe the importance of data analysis using computers? One reason is that, as we all realize, the amount of data now being generated far surpasses the amount that can be "seen" in real time. In 2014, 72 hours (!!) of video were uploaded to YouTube per minute. If you do the simple math, it would take more than a lifetime to watch all of the videos uploaded in a single month. When data from tireless IoT devices is also included, it is clear that analyses such as simply graphing the data manually cannot keep up.
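The "simple math" behind the YouTube estimate can be sketched as follows. This is only a rough illustration: the rate of 72 hours per minute is the figure quoted in the text, while the 30-day month and the assumption of nonstop viewing (24 hours a day, 365 days a year) are simplifications introduced here, and the function names are hypothetical.

```python
# Figure quoted in the text: hours of video uploaded per minute.
HOURS_UPLOADED_PER_MINUTE = 72


def hours_uploaded(days: int, rate: int = HOURS_UPLOADED_PER_MINUTE) -> int:
    """Total hours of video uploaded over `days` days at `rate` hours per minute."""
    minutes = days * 24 * 60
    return rate * minutes


def years_to_watch(hours: float) -> float:
    """Years needed to watch `hours` of video without ever stopping (assumption)."""
    return hours / (24 * 365)


# Assumption: a "month" is taken to be 30 days.
month_hours = hours_uploaded(30)
print(month_hours)                            # 3110400
print(round(years_to_watch(month_hours), 1))  # 355.1
```

Under these assumptions, one month of uploads comes to about 3.1 million hours, or roughly 355 years of continuous viewing, which is several lifetimes even before allowing for sleep.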
In addition, there is a desire in modern society to be able to use the results of data analysis as quickly as possible. Before leaving home, many of us check traffic conditions and public transportation operations without a second thought, just as we check the weather forecast. The mass media follow typhoons from the moment they first form. Meanwhile, many people get more detailed information through crowdsourcing and social media, and if there is any danger or an accident, they can send out that information themselves. Further losses can be prevented by quickly providing accurate information about inclement weather and major accidents.
The need for data analysis has also been boosted by the fact that results have become widely accepted as a part of services and are being offered in various forms, whether as social or business services. Users of services also use mobile phones and other mobile devices to such an extent that it would not be an exaggeration to say that everyone has one. During my commute, I almost never see anyone spreading out newspapers on the train in the morning anymore, but there are people standing around with cell phones in one hand.
Moreover, information has always been a driving force in society. The trade routes between ancient Rome and China most likely served not only to move goods; the information available in each location was itself a kind of commodity. Even in Japan, the missions to the Sui and Tang dynasties in China were carried out by the government just to obtain information. At that time, information took years to acquire, but now it is possible to communicate with the whole world in a few milliseconds. In order to obtain new information or produce useful information, modern society needs to make full use of various devices such as IoT devices and process the obtained data in a timely and appropriate manner.
So, what must be done for data analysis today? Computers and data analysis tools (a variety of tools, from statistical analysis to machine learning and "artificial intelligence," have been developed) can handle huge amounts of data, but the ultimate purpose of data analysis always depends on us. Not so long ago, data collection was itself very difficult and was conducted for specific purposes, so the purpose was clear at the time of collection. However, in today's age of seemingly overflowing data, it is often the case that we are so focused on the data that we do not know what it is being collected for. Nevertheless, the minimum requirement for data analysis is to clarify "what is the purpose of analyzing this data" when faced with certain data. Data analysis cannot be performed without a purpose. If you do not know how it will be useful but you want to analyze data because it is there, you should give priority to the analysis of other data that you think will be useful for "A" or to understand "B." On the other hand, since data is subject to change, it is also desirable to view obtained results as the optimal solution at a certain time (or a local solution for a snapshot) and as a clue to new data analysis, without being bound by fixed stereotypes.
We are currently in what should be called a "content distribution revolution" that allows various people to analyze data for various purposes from various places around the world. Everyone involved is very much looking forward to seeing, in the decades to come, how our contributions have shaped data analysis.