Data Science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insight from structured and unstructured data. Its central purpose is the discovery of insight hidden in data.
Data scientists must become detectives when searching for patterns: they investigate leads and try to understand the characteristics of their data sets, which requires a significant amount of analytical creativity.
Data Science is an approach that unifies statistics, data analysis, and machine learning in order to understand and analyze the actual narrative found within the data.
Data Science employs techniques and theories drawn from many fields, including mathematics, statistics, information science, and computer science. It is often used interchangeably with earlier concepts such as business analytics, business intelligence, predictive modeling, and statistics. Like data mining and big data analytics, data science relies on powerful hardware, capable programming systems, and efficient algorithms to solve problems.
Data science is all about diving in at a granular level to mine and understand complex behaviors, trends, and inferences, uncovering the insight found within the data. It’s about discovering hidden wisdom that can help companies make smarter business decisions. For example:
- Netflix mines movie-viewing patterns to understand what drives user interest, and uses this information to decide which Netflix original series to produce.
- Target identifies customer segments within its customer base, along with the unique shopping behaviors of each segment; this helps guide messaging to each specific market audience.
- Procter & Gamble uses time series models to better understand future demand, which helps P&G plan its production levels optimally.
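The P&G example mentions time series models without naming one. As a minimal sketch of what such a model does, here is simple exponential smoothing, one of the simplest forecasting techniques; it is chosen purely for illustration and is not claimed to be what P&G actually uses. The function name `exponential_smoothing_forecast`, the `alpha` value, and the weekly sales figures are all hypothetical.

```python
def exponential_smoothing_forecast(demand, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    Each new observation updates a running "level":
        level_t = alpha * demand_t + (1 - alpha) * level_{t-1}
    The final level serves as the forecast for the next period.
    """
    if not demand:
        raise ValueError("demand history must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = demand[0]  # initialize the level with the first observation
    for observed in demand[1:]:
        # Blend the newest observation with the previous level.
        level = alpha * observed + (1 - alpha) * level
    return level


# Hypothetical weekly unit sales for one product (illustrative, not real data).
weekly_units = [120, 130, 125, 140, 150]
print(exponential_smoothing_forecast(weekly_units, alpha=0.5))  # → 141.25
```

A larger `alpha` makes the forecast react faster to recent weeks; a smaller one smooths out noise. Real demand planning would typically also model trend and seasonality.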
Data Science and the Development of the “Data Product”
A “data product” is a technical asset that: (1) utilizes data as input, and (2) processes that data to return algorithmically-generated results.
The classic example of a data product is a recommendation engine, which ingests user data and makes personalized recommendations based on that data.
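To make the "data in, algorithmic results out" idea concrete, here is a minimal sketch of a recommendation engine based on item co-occurrence: it suggests items bought by users whose purchases overlap with the target user's. This is one simple technique chosen for illustration, not how any particular company's engine works; the function `recommend` and the purchase data are hypothetical.

```python
from collections import Counter


def recommend(purchases, user, top_n=2):
    """Recommend items the user has not bought, ranked by co-occurrence.

    purchases maps each user to the set of items they bought. Every other
    user who shares at least one item with the target user "votes" for the
    items the target user lacks, weighted by how many items they share.
    """
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(owned & items)
        if overlap == 0:
            continue  # no shared-taste signal from this user
        for item in items - owned:
            scores[item] += overlap
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories (user data is the input).
purchases = {
    "ana": {"tent", "stove"},
    "ben": {"tent", "stove", "lantern"},
    "cho": {"tent", "sleeping bag"},
    "dev": {"novel", "bookmark"},
}
print(recommend(purchases, "ana"))  # → ['lantern', 'sleeping bag']
```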
Here are some examples of data products:
- Amazon’s recommendation engines suggest items for users to buy, based on the output of their algorithms.
- Netflix recommends movies to users based on its algorithms.
- Spotify recommends music to users based on its algorithms.
- Gmail’s spam filter is a data product: an algorithm behind the scenes processes incoming mail and determines whether a message is junk.
- Computer vision for self-driving cars is also a data product: machine learning algorithms recognize traffic lights, pedestrians, other cars on the road, and so on.
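Gmail's actual spam filter is not described here. As a minimal sketch of how a spam filter can turn incoming text into a junk/not-junk decision, the code below implements a multinomial naive Bayes classifier with add-one (Laplace) smoothing, a classic textbook technique chosen for illustration. The functions `train` and `classify` and the tiny training set are hypothetical, and the sketch assumes both labels appear in the training data.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, label) pairs, label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        # Log prior: fraction of training messages with this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the product.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages.
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting notes for tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
model = train(training)
print(classify(model, "free prize"))  # → spam
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.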
Data scientists play a central role in developing data products. They serve as the technical developers who build, test, refine, and deploy the algorithms.
Data Science Requires a Blend of Skills, Including Mathematics
At the heart of mining data for insight and building data products is the ability to view data through a quantitative lens. There are textures, dimensions, and correlations in data that can be expressed mathematically.
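As one example of expressing a correlation mathematically, Pearson's correlation coefficient divides the covariance of two variables by the product of their standard deviations, giving a value between -1 and 1. The sketch below computes it from scratch; the function name and the sample figures are hypothetical, and the function assumes neither series is constant (which would make the denominator zero).

```python
import math


def pearson_correlation(xs, ys):
    """Pearson's r: covariance of x and y over the product of their std devs.

    The 1/n factors in covariance and variance cancel, so raw sums suffice.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical figures: ad spend vs. units sold over five weeks.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [2, 4, 6, 8, 10]
print(pearson_correlation(ad_spend, units_sold))  # → 1.0
```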
Using data to find solutions becomes a brain teaser in applying quantitative techniques. Solutions to many business problems involve building analytic models grounded in hard math, and understanding the underlying mechanics of those models is key to building them successfully.
A popular misconception is that data science is all about statistics. While statistics is important, it is not the only type of math used. Statistics itself has two main schools of thought: classical (frequentist) statistics and Bayesian statistics. When most people refer to statistics, they generally mean classical statistics, but knowledge of both is helpful.
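A small worked example shows how the two schools can answer the same question differently. Estimating a conversion rate from hypothetical data (3 conversions out of 4 visitors), the classical maximum-likelihood estimate is simply the observed proportion, while a Bayesian estimate combines the data with a prior. The sketch assumes a uniform Beta(1, 1) prior, which is conjugate to the binomial likelihood, so the posterior is Beta(a + successes, b + failures). The function names are hypothetical.

```python
def frequentist_estimate(successes, trials):
    """Maximum-likelihood estimate of a rate: the observed proportion."""
    return successes / trials


def bayesian_posterior_mean(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a rate under a Beta(prior_a, prior_b) prior.

    With a binomial likelihood, the posterior is
    Beta(prior_a + successes, prior_b + trials - successes).
    """
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)
    return post_a / (post_a + post_b)


# Hypothetical data: 3 conversions out of 4 visitors.
print(frequentist_estimate(3, 4))                 # → 0.75
print(round(bayesian_posterior_mean(3, 4), 3))    # → 0.667, i.e. (1+3)/(1+3+1+1)
```

With so little data the prior pulls the Bayesian estimate toward 0.5; as the number of trials grows, the two estimates converge.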
Furthermore, many inferential techniques and machine learning algorithms rely on linear algebra. For example, a popular method for discovering hidden characteristics in a data set is singular value decomposition (SVD), which is grounded in matrix math and has much less to do with classical statistics. It is helpful for data scientists to have both breadth and depth in their knowledge of mathematics.
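The following sketch shows SVD surfacing hidden characteristics in a small, hypothetical user-by-movie rating matrix using NumPy's `numpy.linalg.svd`. Keeping only the strongest singular values (here `k = 2`, an illustrative choice) yields a low-rank approximation whose factors can be read as latent "tastes"; this is one common use of SVD, not the only one.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies.
ratings = np.array([
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

# Thin SVD: ratings == U @ diag(s) @ Vt, with s sorted in descending order.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

# Keep only the k strongest hidden factors.
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(s, 2))       # strength of each hidden factor
print(np.round(approx, 1))  # rank-2 reconstruction of the ratings
```

Truncating to the top `k` singular values gives the best rank-`k` approximation in the least-squares (Frobenius) sense, which is why small trailing singular values can be discarded as noise.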
Data Scientists Must Possess Knowledge of Technology and Hacking
When we refer to hacking, we do not mean breaking into computers. In programmer subculture, hacking means using creativity, ingenuity, and technical skill to build things and find clever solutions to problems.
Data scientists need to be able to code, prototype quick solutions, and integrate with complex data systems. Core languages associated with data science include SQL, Python, R, and SAS, and, to a lesser degree, Java, Scala, Julia, and others. A hacker must know more than language fundamentals; a hacker must be a technical ninja, able to navigate technical challenges creatively in order to make their code work.
A data science hacker must be a solid algorithmic thinker, able to break down messy problems and recompose them in ways that are solvable. This skill is critical because data scientists operate amid a great deal of algorithmic complexity. They need a strong mental model of high-dimensional data in order to see clearly how all the pieces come together to form a cohesive solution.
Data Scientists Must Possess Strong Business Acumen
It is important for a data scientist to be a tactical business consultant.
Because they work so closely with data, data scientists are positioned to learn from it in ways no one else can. This creates the responsibility to translate their observations into shared knowledge and to contribute to strategy on how to solve core business problems.
Strong business acumen is just as important as acumen for technology and algorithms. There needs to be clear alignment between data science projects and business goals. Ultimately, value doesn’t come from data, math, and tech themselves; the real value comes from using data insights as supporting pillars that guide the formation of robust business strategies.
A common personality trait of Data Scientists is that they are deep thinkers with intense intellectual curiosity.
Data science is all about being inquisitive: asking questions, making new discoveries, and learning new things.
Ask data scientists who are obsessed with their work what drives them, and they won’t say “money”. The real motivation is being able to use their creativity and ingenuity to solve hard problems and constantly indulge their curiosity. Deriving complex reads from data goes beyond making an observation; it is about uncovering “truth” that lies hidden beneath the surface. Problem solving is not a task but an intellectually stimulating journey to a solution.