From the course: Utilizing Excel, Python, and Copilot as a Citizen Data Scientist

What is a citizen data scientist?

- In this first module, you'll learn the fundamentals of what it means to be a citizen data scientist and how that role differs from a traditional data scientist. So what's the difference? Within the data science discipline, some practitioners are less technical and others are more technical. The less technical practitioner is known as a citizen data scientist: someone who leverages predictive or prescriptive analytics but whose primary job function is outside the field of statistics and analytics. The more technical practitioner is the data scientist, who generally uses algorithmic techniques to analyze data through applied statistics.

Breaking this down further, there are a few key areas where the two differ. First, the citizen data scientist focuses more on descriptive statistics. They track KPIs that answer questions such as: "What is the trend in our month-on-month revenue?" "What is the month-on-month variance in our expenses?" and "What is the growth in the number of daily active users?" In other words, descriptive measures of how the business is performing. They also use codeless visualization and analytics tools such as Power BI, Tableau, or Alteryx, which let them create key insights without a technical coding background. Because they don't have that coding background, they generally can't clean messy data themselves, so the data is usually provided by a data engineer in clean, pre-processed data marts they can access. Finally, they focus heavily on business indicators.

The data scientist, on the other hand, is also business-facing, but works much more on the inferential statistics side. Rather than looking at a number that can be calculated directly, they focus on inferential questions, such as: "What is the predicted growth in the number of users for our product over the next 12 months?"
"When we release a new feature, is there a statistically significant uplift in conversion rates between feature A and feature B?" Or, "Can we use an algorithm to identify abnormal pricing periods through anomaly detection?" That's a very different skill set from the citizen data scientist's. Data scientists also tend to use many more algorithms to help analyze, automate, and synthesize their data. On top of this, they can process messy and unstructured data and combine it into one overall dataset: given a mix of CSV, TSV, JSON, and XML files, they can use Python, R, or Scala to pull it all together and prepare it for analysis. And lastly, they focus on applied statistics, with an emphasis on supervised or unsupervised learning techniques to identify distinct patterns in the underlying dataset.

So there we have it: we've contrasted the less technical and more technical disciplines. Citizen data scientists focus on simple descriptive measures to analyze data, whereas more technical data scientists use algorithmic methods to identify patterns within datasets.
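To make the descriptive-versus-inferential contrast concrete, here is a minimal Python sketch. It is illustrative only, not the course's own material: the revenue figures and A/B counts are hypothetical, and the helper names `month_on_month_change` and `two_proportion_z_test` are made up for this example. The first function is a direct KPI calculation of the kind a citizen data scientist might build in Excel or Power BI. The second answers the feature A versus feature B question with a two-proportion z-test. That test is one common choice for comparing conversion rates, assumed here rather than taken from the source.

```python
import math

# Hypothetical monthly revenue figures, for illustration only.
monthly_revenue = [120_000, 126_000, 131_000, 128_000]


def month_on_month_change(values):
    """Descriptive KPI: percentage change from each month to the next."""
    return [
        (curr - prev) / prev * 100
        for prev, curr in zip(values, values[1:])
    ]


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Inferential test: does B's conversion rate differ from A's?

    Uses the pooled normal approximation and returns the z statistic
    and a two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    changes = month_on_month_change(monthly_revenue)
    print("Month-on-month revenue change (%):", [round(c, 1) for c in changes])

    # Hypothetical A/B test: 4,000 visitors saw each feature variant.
    z, p = two_proportion_z_test(conv_a=200, n_a=4000, conv_b=260, n_b=4000)
    print(f"z = {z:.2f}, two-sided p-value = {p:.4f}")
```

The month-on-month figure is simply read off the data. The z-test instead uses a sample to infer whether the observed uplift is likely to be real, and its normal approximation assumes reasonably large sample sizes in both groups.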
