Innovation and technological advancement drive today's world. One might be good at either, but unless the two blend properly, the product will not stand out.
With the rapid growth of digitalization, approximately 2.5 exabytes of data are generated every day (see the picture below to understand how big this value is). It is important to keep in mind that storing this data does not come for free. Organizations want to use this data to plan for the future, based on insights drawn from what they already hold. To plan, they turn to data analysis techniques, but technique is only one aspect of the job. The more important part is selecting the right tool.
R, Python or SAS? For many, this is not even the right question, considering that all three tools do an excellent job at what they set out to do. It is like debating Mac vs Windows vs Linux, where we now know that there is a place for all three. However, for budding professionals who want to build a career in data science with organizations that are developing advanced analytics capabilities, it is an important question. Here we discuss only key aspects such as data management capability, advanced modelling, graphical capabilities and big data applications.
Graphical Visualization Utilities
When it comes to the data visualization and graphical capabilities required to understand the data better, R leads the pack, with packages like ggplot2, lattice, ggvis, RGIS etc. Python also has good graphical capabilities with packages like Matplotlib and VisPy, but compared with R's they are more intricate and have a steeper learning curve.
Of late, SAS has worked on improving its graphical capabilities. However, the options available still do not match those in R and Python.
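To give a feel for what a basic chart involves on the Python side, here is a minimal Matplotlib sketch. The data values and variable names are made up purely for illustration, and the non-interactive "Agg" backend is an assumption so that the script runs on a machine without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run, so use a non-interactive backend
import matplotlib.pyplot as plt

# Made-up sample data, purely for illustration
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]

# One figure with a single set of axes
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, sales, marker="o")
ax.set_title("Monthly sales (illustrative data)")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")

# Render to an in-memory PNG; pass a filename instead to write to disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```

Even this small chart needs a figure, an axes object and explicit labelling calls, which hints at why the Matplotlib API can feel heavier than a one-line plotting call.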
Advanced Modelling
While SAS, R and Python are on the same footing when it comes to standard statistical and modelling capabilities, for advanced algorithms such as machine learning and more nuanced options, R and Python outpace SAS hands down.
Considering that R was designed to make a statistician's life easy, it has field-specific advantages, with more than 7,500 contributed packages and a list that keeps growing. These are distributed through CRAN, the Comprehensive R Archive Network. Owing to this ever-active and enterprising community of R users, a number of state-of-the-art techniques and experimental programs are available in R but not in SAS.
With respect to Python, an incredible number of libraries such as NumPy, pandas, SciPy, scikit-learn and Matplotlib are available. However, there is no strong reason to choose it over R.
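As a rough illustration of standard modelling with the Python libraries named above, the sketch below fits a straight line by ordinary least squares using NumPy alone. The data is synthetic (a known slope of 2 and intercept of 1 plus a little noise), and all variable names are chosen for the example only.

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus small Gaussian noise (seeded for repeatability)
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Design matrix with a column of ones so the model includes an intercept
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: find coefficients minimising ||X @ coef - y||^2
coef, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The estimated slope and intercept should land close to the values used to generate the data. For anything beyond a toy fit, a dedicated library such as scikit-learn would be the more usual choice.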
Big Data Applications
When it comes to big data, most organizations look for end-to-end applications rather than ad hoc or standalone analysis tools. This is where Python steals the show: alongside Scala and Java, it is among the handful of languages with first-class support on Hadoop-Spark clusters. R, like Python, also integrates well with Hadoop and offers strong parallelization and large-scale machine learning capabilities for analytics.
In recent years, SAS has come up with options to run analytics inside Hadoop (in-memory) without moving the data out of the cluster, but given the flexibility of open-source platforms, R and Python remain the first preference for data science professionals.
Cost, Upgrades and Support
Cost is one area where R and Python have the upper hand, because they are open source. This has been a key factor in the phenomenal rise in their usage. SAS, on the other hand, is expensive, licensed software that comes with an excellent support system. Especially in critical areas and delicate scenarios where there is no room for experimentation, SAS has proven itself indispensable. However, given the cost, SAS is often out of reach for small organizations, and especially for start-ups.
Though most versions of R and Python have no formal support system and come with absolutely no warranty, the silver lining is their vibrant user communities. These rapidly evolving communities comprise people from various walks of life, such as academics, students, programmers and analysts. With every brain contributing and troubleshooting, improvements keep arriving in R and Python through packages and libraries. Given the considerable cost advantage, small-to-mid-sized companies prefer to go with R or Python.
R is preferred by companies that are primarily focused on advanced analytics, and it has pretty much become a lingua franca for data science. Python, on the other hand, is preferred by tech companies that need end-to-end integration and develop analytics-based applications built on analytics-friendly libraries.
It is difficult to make a conclusive argument about the three platforms, as the choice among them depends on parameters such as the nature of the industry, budget, strength of the user community, and flexibility in terms of usage and integration. Of late, SAS has been introducing various APIs, along with the SAS/STAT API, to integrate with R and Python and so enable use cases that were previously unavailable. Hence there is a future in which SAS, R and Python co-exist.