Data Scientist vs Data Engineer (Cont.)


Responsibilities (Cont.)
The data scientist, on the other hand, is someone who cleans, massages, and organizes (big) data. Data scientists usually receive data that has already passed a first round of cleaning and manipulation. They feed it to sophisticated analytics programs, machine learning, and statistical methods to prepare it for predictive and prescriptive modeling.

Languages, Tools, and Software
You will often see data engineers working with tools such as SAP, Oracle, Cassandra, MySQL, Redis, Riak, PostgreSQL, MongoDB, Neo4j, Hive, and Sqoop. Data scientists make use of languages and statistical packages such as SPSS, R, Python, SAS, Stata, and Julia to build models.

The most popular of these are Python and R. When you work with them for data science, you will most often reach for packages such as ggplot2, to make data visualizations in R, or pandas, the Python data manipulation library.
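To give a feel for what data manipulation with pandas looks like, here is a minimal sketch. The table, its column names, and the `tidy_sales` helper are all made up for illustration; it only assumes pandas is installed.

```python
import pandas as pd

# Hypothetical records, as they might arrive after a first round of cleaning.
raw = pd.DataFrame(
    {
        "city": ["Ghent", "ghent ", "Leuven", "Leuven", None],
        "sales": [120.0, 80.0, None, 150.0, 50.0],
    }
)


def tidy_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, normalize city names, and total sales per city."""
    # Keep only rows where both the city and the sales figure are present.
    cleaned = df.dropna(subset=["city", "sales"]).copy()
    # Strip stray whitespace and unify capitalization ("ghent " becomes "Ghent").
    cleaned["city"] = cleaned["city"].str.strip().str.title()
    # One row per city with the summed sales.
    return cleaned.groupby("city", as_index=False)["sales"].sum()


summary = tidy_sales(raw)
print(summary)
```

A table like `summary` is the kind of tidy input that could then be passed on to a plotting or modeling step.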




      After losing his queen, the chess player
      threw in the towel (gave up) and resigned.