Data Analytics: Change is Constant
By Tim Chartier, Davidson College, illustrations by Ansley Earle
From your phone to your watch to possibly even your thermostat, data is being produced, and analyzing that data is a large and rapidly growing field. For educators teaching data science and students studying it, there is one big constant: change. The rate at which data is produced changes. The tech that collects the data changes. Even the types of data that can be collected change.
With change come many uncertainties. Even so, there is one certainty for the foreseeable future: data will influence our lives. As such, embracing the velocity at which the data landscape can change is an important part of data science.
Changes in data science come with innovations in tech, which can evolve at staggering rates. Consider photography. It's been estimated that around 1.5 trillion pictures were taken from the advent of the camera in the 1800s to the year 2000; industry research estimates that 1.8 trillion were taken in 2023 alone.
Take a moment to think about this radical change in photography. At one time, even taking a picture was new technology. Now, more pictures were taken last year than in all the years prior to 2000. In tech, we essentially never know when the next big thing will change the landscape. In the 1980s, taking one's film to a pharmacy for development seemed normal, something we'd be doing for years. Now? Rather than going on vacation and rationing two or three rolls of 24 pictures each, we grab our phone and potentially take over a hundred pictures in a day.
According to a 2018 IBM study, 90% of the internet data that existed that year had been created since 2016, and the growth has only accelerated since. An estimated 362 billion emails are sent daily. If we assume an average length of 200 words, then the equivalent of approximately 130 million copies of War and Peace is emailed every day! My copy of War and Peace is about 2.25 inches thick. If you stacked 130 million copies of the book one on top of another, they'd reach 293 million inches, or about 4624 miles. Space starts 62 miles above the Earth's surface. The International Space Station is only about 250 miles above the Earth. If we put the books side by side, as on a shelf, 130 million copies would reach roughly from New York City to Los Angeles and back again!
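The email arithmetic above can be checked with a short sketch. The email count, words per email, and book thickness come from the text; the word count of War and Peace is an assumption (roughly 560,000 words, which varies by translation), and the function names are purely illustrative. Under these assumptions the total comes to about 130 million copies per day.

```python
# Figures taken from the article.
EMAILS_PER_DAY = 362_000_000_000
WORDS_PER_EMAIL = 200
BOOK_THICKNESS_IN = 2.25

# Assumption: approximate English word count; differs by translation.
WORDS_IN_WAR_AND_PEACE = 560_000

INCHES_PER_MILE = 63_360
MINUTES_PER_DAY = 24 * 60


def copies_per_day(emails=EMAILS_PER_DAY,
                   words_per_email=WORDS_PER_EMAIL,
                   words_per_book=WORDS_IN_WAR_AND_PEACE):
    """Number of book-length equivalents emailed in one day."""
    return emails * words_per_email / words_per_book


def stack_height_miles(copies, thickness_in=BOOK_THICKNESS_IN):
    """Height in miles of a stack of `copies` books."""
    return copies * thickness_in / INCHES_PER_MILE


copies = copies_per_day()
print(f"Copies per day:    {copies / 1e6:.0f} million")
print(f"Copies per minute: {copies / MINUTES_PER_DAY:,.0f}")
print(f"Stack of 130 million copies: {stack_height_miles(130e6):,.0f} miles")
```

Changing the assumed word count shifts the result somewhat, but the order of magnitude, over a hundred million copies a day, stays the same.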
Large amounts of data are also created via the “Internet of Things,” which includes smart home devices, wearable devices, and devices used in fields such as healthcare, agriculture, and infrastructure. Data influences our lives as we swim in an ever-growing ocean of bytes. Today’s unsolvable problems may yield new insight in the future. How can an unsolvable problem of today offer insight tomorrow?
First, changes in computing are inevitable. They lead to new ways to analyze large amounts of data. Consider a historic computer, the Apollo Guidance Computer (AGC), which was on board Apollo 11. The AGC landed humanity on the moon just 55 years ago. Still, a modern phone has more than one million times the memory that the AGC had in RAM, and over 100,000 times the processing power of the AGC.
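As a rough check on the memory comparison, here is a minimal sketch. It uses the AGC's erasable memory of 2,048 sixteen-bit words (15 data bits plus a parity bit, about 4 KB) and assumes a phone with 4 GiB of RAM; the phone figure is an assumption for illustration, not a number from the article, and many phones have more.

```python
# AGC erasable (RAM) memory: 2,048 words of 16 bits each
# (15 data bits plus 1 parity bit), roughly 4 KB in total.
AGC_RAM_WORDS = 2048
AGC_BITS_PER_WORD = 16

# Assumption: a phone with 4 GiB of RAM.
PHONE_RAM_BYTES = 4 * 2**30


def agc_ram_bytes(words=AGC_RAM_WORDS, bits_per_word=AGC_BITS_PER_WORD):
    """Size of the AGC's RAM in bytes."""
    return words * bits_per_word // 8


def memory_ratio(phone_bytes=PHONE_RAM_BYTES):
    """How many times more RAM the phone has than the AGC."""
    return phone_bytes / agc_ram_bytes()


print(f"AGC RAM: {agc_ram_bytes():,} bytes")
print(f"Phone-to-AGC RAM ratio: {memory_ratio():,.0f}")
```

Even with this conservative phone figure, the ratio exceeds one million.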
Now, maybe we aren't making a fair comparison. While they bounce around in a backpack, purse or pants pocket, phones are specialized computing devices. So, let’s compare the AGC to a TI-84, released in 2004. This simple calculator is used by many high school students and has been around for almost 20 years. The TI-84 calculator runs 350 times faster than Apollo 11's computer. A problem that takes a month of computational time today will take much less time in the future. This allows for more work on a problem that has so far been unsolvable. The ability to compute more quickly can lead to more insight and innovation.
With so much data, we must also have adequate storage. At one time, data was stored on square 3.5-inch disks, each with a capacity of 1.44 MB. My personal phone stores 128 GB of data. If you stacked enough of those square disks to store 128 GB, they would stretch over the length of three football fields.
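The floppy disk comparison can be reproduced with a small sketch. The 1.44 MB and 128 GB figures come from the text; the disk thickness (3.3 mm) and the field length (100 yards, goal line to goal line) are assumptions, and the calculation treats 1 GB as 1000 MB for simplicity.

```python
import math

# Figures taken from the article.
FLOPPY_CAPACITY_MB = 1.44
PHONE_STORAGE_GB = 128

# Assumptions: nominal thickness of a 3.5-inch disk, and a
# 100-yard field measured goal line to goal line.
FLOPPY_THICKNESS_MM = 3.3
FIELD_LENGTH_M = 91.44


def disks_needed(storage_gb=PHONE_STORAGE_GB, disk_mb=FLOPPY_CAPACITY_MB):
    """Whole disks required to hold storage_gb (1 GB taken as 1000 MB)."""
    return math.ceil(storage_gb * 1000 / disk_mb)


def stack_length_m(disks, thickness_mm=FLOPPY_THICKNESS_MM):
    """Length in meters of a stack of `disks` floppy disks."""
    return disks * thickness_mm / 1000


disks = disks_needed()
length = stack_length_m(disks)
print(f"Disks needed: {disks:,}")
print(f"Stack length: {length:.0f} m")
print(f"Football fields: {length / FIELD_LENGTH_M:.1f}")
```

Under these assumptions the stack comes out a little longer than three fields, in line with the claim above.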
Finally, mathematical advances will yield new insight from data. Keep in mind that data on its own doesn’t create insight. We need tools to find the insight within the mass of data. Research advances come from both academia and business, through the work of students, professors, and people in industry. For example, Netflix and Amazon ratings influence what people watch and buy. Nate Silver combines multiple political polls into one, more accurate poll. And that’s not even mentioning generative artificial intelligence!
Studying the ever-changing world of data involves acknowledging a world of data that largely doesn’t exist yet. That unknown digital world will be built with data that we don't yet know will be collected. Data will influence our world. An exciting part of today is that we collectively play a role in uncovering what that influence will be.
Tim Chartier is the Joseph R. Morton Professor of Mathematics and Computer Science at Davidson College.