What is data science?


Welcome to Data Society’s Introduction to
R and Visualization course. In this course, we’ll teach you what data science is, and
how to get started with one of the most powerful weapons of a data scientist, R.

My name is Dr. Harlan Harris and I will be
your instructor for this course. I have an academic background in computer science, machine
learning, and cognitive science; I co-founded Data Science DC and Data Community DC, which
are professional groups for data scientists and others in the DC area. I wrote a short
book that you can download for free about data scientists, called “Analyzing the Analyzers,”
published by O’Reilly; and I currently work in the education sector, building large-scale
predictive analytics systems.

This course will first answer the question
“What is data science?” – we’ll go over how it is used in the world today, what
tools you have available, and what types of jobs are out there. Then, we’ll teach you
how to use R, one of the most powerful tools available to a data scientist. Finally,
we’ll teach you how to create compelling visualizations in R including 2D and 3D charts,
and how to overlay spatial data on top of Google Maps.

Before we get started, let’s set expectations
for this course. Data science takes a lot of practice and dedication, and it requires
work to become competent and to be able to use tools such as R fluently. The first step
is to take this course, so congratulations! You can check that item off the list. Then
practice – when we say ‘practice’, we mean to type out the code we provide, play
around with it on your own, perhaps apply these techniques to data that you know and care
about. When you come across an error, where something doesn’t work as you expect, figure
out how to fix it. This is the best way to ensure that you understand what you’re building,
and what’s going on behind the scenes! Along those lines, to get the most out of this class,
make sure you review class material on your own and practice some more. Complete exercises
outside of class and practice again. It’s also important to stay current with the latest
industry trends and technology developments, as this can help you hone your skills and stay involved
in the community.

Here is a more detailed outline for this course.
As we transition to different topics, you will see this slide periodically to prepare
you for what comes next. So let’s dive in.

What is data science? What’s going on with data today? Here are
a few relevant quotes about the prevalence of data. It’s important to keep in mind
that data itself does very little. Just because we have a lot of it doesn’t mean that
we’re being effective or efficient. It’s as though you have money under your mattress;
it only becomes useful when you start using it to buy things or invest in the future. In our courses,
you’ll learn how to take that data and ‘invest’ it so that you will get good returns.

Here are some examples of how companies and
individuals are using data today. Data science is relevant in every industry and its uses
are expanding daily. The more people become data-literate, the more breakthroughs and applications
you will see over time.

Here are a few more real-world applications.
From marketing and healthcare to finance and politics, data science is used to answer complex
questions. Experts in all these fields are trying to predict and guide the future. How
will people behave? What will they do? What will cause a disease or an outbreak? What
are the forces that determine pricing of housing and other items? What will happen to the economy?
What makes data science so fascinating is how its applications can uncover hidden relationships
about the world, predict people’s behavior, and lead to better decision-making.

Data science goes hand in hand with big data.
So what is big data? Big data means large volumes of information. It is about moving,
storing, manipulating, and accessing lots of data, in particular the eclectic, novel, fine-grained
data often generated by individual people. The term is often treated as synonymous with
data analysis, but it is not. Having a lot of data is not helpful without analysis or
insights. That’s why you’re in this course!

This diagram is a variant of one originally created by Drew Conway, about 5 years ago,
and it’s still highly relevant. Data science is at the intersection of three domains:
First, we have industry knowledge. In order to understand the problem or issue that you’re
tackling, you should have a good idea of what the limitations are, what resources you have
at your disposal, and what you are trying to solve for. Second on this list is math
and statistics – while you don’t have to be a card-carrying mathematician to be successful,
you do need to understand how we can quantitatively define relationships and categories in order
to determine how they interact, and you have to understand how randomness works, and how
to think about imprecision. Last but not least, we have programming – once you figure out
what problem you want to solve and how you want to approach it, you need to tell the
computer what to do.

So who is a data scientist? A data scientist can have many responsibilities – they may
be asked to pose the right questions about data that has been gathered, or to format the data
correctly and analyze it to draw conclusions about the data and what generated it. In many cases,
data scientists build data products, which are applications that provide predictions, recommendations,
and insights back to people who can make the best use of them. While everything on this slide can come under the purview of
a data scientist, not all of it will be needed for a particular task.

It’s important to recognize that not everyone who works with data is a data
scientist. There are actually several levels and types of jobs that are available in this
domain. The first level is that of a data analyst – they are more likely to be responsible
for cleaning and managing the data, as well as creating some basic visualizations for
others to analyze. Historically, a lot of work like this has been done by business analysts,
using spreadsheets, but using the tools of a data scientist, you can be much more productive,
and generate even more compelling work. Next, we have the data modeler who will build basic
statistical models to answer specific questions – while they have a good understanding of
the data, they are not expected to know the tools as broadly as a data scientist, and
they are more likely to use off-the-shelf software to do analyses. A lot of traditional
statisticians fall into this category. But someone in the newer role of data scientist has a command of
a broader range of tools and methods, including computer programming and the technologies
used in the big data world. They are able to ask the right questions to guide research, and
they do so while thinking scientifically about how the world works.

This chart illustrates how companies work
with varying levels of data and what skills they might look for in their employees. Different
organizations are more or less mature in how they use data, and different sets of skills
are required to make a big impact. The first column here outlines a company that is not very data-driven,
but does work with some data using tools such as spreadsheets, typically generating insights with bar charts and other related types of visualization. This company would require fundamental skills with basic tools, but might not require as much knowledge of software engineering or data processing as would a business that
is starting to move towards data collection, shown in the second column. Organizations that produce
data (such as Twitter or NASDAQ or a government agency) would be looking for data skills across
the broad spectrum. There is a critically important role at organizations that have matured
to the point where they use advanced analytical methods to make decisions and drive the bottom
line.

Keep this in mind as you progress through this course – what type of work
do you want to be doing, how do you want to fit in, and what sort of impact do you want to
be able to make?

Almost every large company these days hires
data scientists and others with similar skills, as do many smaller organizations. Here are
just a few companies that hire data scientists, but there are many, many more out there and they
span just about every industry. Oftentimes, you can actually make the most impact by helping an
old company in an old industry move beyond the basics in how they think about and work
with data. You don’t have to work with petabytes of data at a Silicon Valley startup. That’s an option, but there are lots of other ways to make a big impact.

You might be wondering how much data scientists
make and why it’s worth it to become more proficient with data. The skills with software
and statistics that are key to being a data scientist are incredibly valuable for almost
anyone in today’s economy. According to a recent survey, data scientists can earn a median salary up to 40% higher than that of predictive analytics professionals, who are roughly the data modelers I mentioned previously, at the same level of seniority. Even if you don’t want to invest the time to become a full-fledged data scientist, having some of their skills, and understanding what’s possible, can be incredibly valuable in whatever job market or career path you pursue.
