From the course: Learning Data Science
Using statistics and software
- Because data science is still defined by practice, there's an extra emphasis on using common tools and software. Try not to get too focused on learning all the tools. The tools in themselves will not make you a data scientist. It's the scientific method, and not the tools, that makes someone a data scientist. The tools basically fall into three categories: storing, scrubbing, and analyzing. To store the data, you can use spreadsheets, databases, and key-value stores. Some popular databases are MongoDB, Cassandra, and PostgreSQL. Scrubbing is a common practice to make the data easier to work with. You'll use text editors, scripting tools, and programming languages, like Python and Scala. Finally, there are the statistical packages to help you analyze the data. The most popular are R, SPSS, and Python's data libraries. R and the Python libraries are open source, while SPSS is a commercial product. When you use these tools, you can also visualize the data and create nice charts and graphs. Let's first look at the tools you need to know to hold…