‘A bird in hand is worth two in the bush’ – J. Capgrave
‘A sparrow in thy hand is better than a thousand sparrows flying’ – 6th century BC Proverbs of Ahiqar
A little while ago, I was advised to set professional goals, with the freedom to choose between two paths that suit my experience: Data Architecture or Data Science.
Now, as an engineer who has, like a million others, passed through the mundane humdrum of engineering, an IT job, and some coding and analysis, I was confused! The first question, which I refrained from asking (Maurice Switzer’s words ringing in my ears: ‘Better to remain silent and be thought a fool than to speak and to remove all doubt.’), was ‘Aren’t they the same?’
Since then, through countless hours of research, I have realized that we have been staring at the answer (even unknowingly) since the inception of data itself. Now, there are pundits and gurus in the field who would frown at my attempt to oversimplify these two disciplines. Still, I believe that the simplest and most profound way of expressing the difference is with just two words: ‘Have’ and ‘Can’. We will come back to this in a bit.
Data – A Beast Ever-growing
As need increases, technology progresses. There was a time when designing a traditional data warehouse was relatively methodical (almost mundane), owing to structured data and a handful of tools to manage the not-so-bestial volume.
With the internet and smart devices more deeply rooted than ever, information is no longer a tool only for engineers and organizations; it has seeped into the very fiber of individual lives. That means that with every action we take, we generate data. So much so that 2.5 quintillion bytes (that is 25 followed by 17 zeroes!) of data are created every day (credit: Ben Walker – Vouchercloud). As a result, technology is racing to catch up, so that this volume can be consumed and maintained for business needs.
Here is where data architects step in. Armed with knowledge of the most cutting-edge tools and the experience of living with ever-growing information, they design the building blocks that let an organization methodically consume and process data for reporting. Their designs are instrumental in the technological advancement of any organization through thorough thought and throughput.
However, isn’t this the same as data science? As it turns out, no.
Science of Data
The word ‘Science’ is defined as ‘the intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment.’
The glowing words in this definition are ‘observation’ and ‘experiment’. There is the absolute, and then there is the possibility.
This is where the science of data steps in. Let us take a very practical example. We buy groceries, and the store records our purchases to report its earnings. Recording every customer’s purchases, every day for a year, would produce an amount of data that is very difficult to store and manage. Hence the need for an architect to build a solution for exactly this problem.
However, these are the transactions that stores ‘have’, and they are absolute. Now, with the same example, what if the teller at the grocery store could tell you:
- You have purchased all the ingredients for making a cake; however, most people who make similar purchases also pick up vinegar, because it gives the cake a better lift.
- You typically spend more than 70% of your grocery bill on fruits every week; here is a discount just for you: 10% off your next purchase of fruits.
- You purchased flowers and chocolate for your anniversary last year. For the past few months, you have made healthy choices for your diet. Now that your anniversary is coming up, consider dark chocolate instead.
Right off the bat, this store would earn referrals and repeat business, because it went the extra step of giving personalized advice. This is the essence of Data Science.
It is about the business you ‘can’ earn, the value that you ‘can’ provide, with the tools that you may or may not ‘have’ at hand. It is about the possibilities, the trends, the information and the insights that you ‘can’ have the data sing to you, which may not necessarily be absolute.
In that regard, data science starts with data architecture and takes it several steps further.
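To make the ‘have’ and ‘can’ split a little more tangible, here is a minimal Python sketch of the grocery example. Everything in it is made up for illustration: the `baskets` records, the `CATEGORY` lookup and the two helper functions are hypothetical, and a real store would use far more data and far better models. The stored baskets stand for what the store ‘has’; the two functions hint at what it ‘can’ say on top of them.

```python
from collections import Counter

# What the store 'has': recorded purchases (hypothetical sample data).
# Each record is (customer, {item: amount spent}).
baskets = [
    ("ann", {"flour": 3.0, "eggs": 2.5, "sugar": 2.0, "vinegar": 1.5}),
    ("bob", {"flour": 3.0, "eggs": 2.5, "sugar": 2.0, "vinegar": 1.5}),
    ("cat", {"flour": 3.0, "eggs": 2.5, "sugar": 2.0}),
    ("dan", {"apples": 8.0, "bananas": 4.0, "milk": 3.0}),
]

# Assumed item-to-category lookup; items not listed have no category.
CATEGORY = {"apples": "fruit", "bananas": "fruit"}


def also_bought(current_items, history, min_overlap=2):
    """Count extra items found in past baskets that resemble the current one.

    A past basket 'resembles' the current one if the two share at least
    min_overlap items (an arbitrary threshold chosen for this sketch).
    """
    current = set(current_items)
    counts = Counter()
    for _, basket in history:
        items = set(basket)
        if len(items & current) >= min_overlap:
            counts.update(items - current)
    return counts.most_common()


def category_share(basket, category):
    """Return the fraction of a basket's total spend that falls in one category."""
    total = sum(basket.values())
    spent = sum(
        amount for item, amount in basket.items()
        if CATEGORY.get(item) == category
    )
    return spent / total if total else 0.0


# What the store 'can' say: a suggestion and a spending pattern.
suggestions = also_bought(["flour", "eggs", "sugar"], baskets)
fruit_share = category_share(baskets[3][1], "fruit")
print(suggestions, fruit_share)
```

The first helper is plain co-occurrence counting, which is only the simplest stand-in for the ‘people who bought this also bought vinegar’ idea; the second is a basic aggregation behind the ‘70% of your bill on fruits’ idea. Neither says anything about how to store a year of such records, which is the architect's side of the story.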
Typical processes and roles of a data scientist include, but are not limited to:
- Data cleansing and processing
- Predictive modeling
- Machine learning
- Identifying questions and running queries
- Statistical analysis
- Correlating disparate data
- Storytelling and visualization
The roles of a data architect, by contrast, include:
- Data warehousing solutions
- Extraction, Transformation and Load (ETL)
- Data architecture development
- Data modeling
(credit: Elizabeth Mazenko – BetterBuys.com)
It is very evident, in the grand scheme of things, that both data architecture and data science are essential for information and business. Both are essential pegs in the clockwork of information management. While one defines a structure, the other explores the possibilities. It is, however, the possibility of the other two birds in the bush, or the thousand that are flying, that makes all the difference.