🏸 Quick question...
# thinking-together
a
🏸 Quick question: Do data scientists use inferential statistics such as t-tests, ANOVAs (and related stats like p-values and confidence intervals)? If so, is this done to complement ML approaches? I’d love to hear some examples. A social scientist is usually concerned with explanation rather than prediction. The opposite seems to be true for data scientists. I’m curious how much overlap there might be from a data scientist’s perspective.
l
Do data scientists use inferential statistics such as t-tests, ANOVAs (and related stats like p-values and confidence intervals)?
Yes
If so, is this done to complement ML approaches?
Yes, but also a lot of people with that title are essentially just analysts, so they use them the normal way. So data scientists themselves are split between the ‘explanation’ and (ML-heavy) ‘prediction’ camps. I think “data scientist” is a pretty roughly defined term and it’s best not to take it too seriously.
👍 1
s
Hey Allan, I’m knee-deep in the data world so I can provide some color here!
1. The higher the ‘stakes’, the greater the need for statistics and deeper model understanding. Industries like insurance DO use things like machine learning, but statistics actually helps them stay compliant (e.g. you REALLY need to make sure you aren’t discriminating on things like race and sex). But hey, if you’re just predicting which vacation rental someone is more likely to rent, the stakes are lower (and revenue, A/B testing, etc. can be used as a ‘process’ to improve the system through space and time)
2. Statistics is especially used in controlled experiments, things like A/B tests. ML is often used when you can’t do that type of testing or the stakes aren’t that high
3. Definitely agree with Luke that “data scientist” is a very broad / poor name for a job. I’m of 2 minds about this. On the one hand, better specialization and job titles make roles clearer. On the other hand, we’ve seen what happens when people think about data as TOO narrow a function (‘just throw ML at it! YOLO!’), and there’s something to be said for full stack data people (even if they’re specialized in 1 or 2 things, they should deeply understand the rest of the disciplines and the system, I feel). But besides companies that are doing title inflation (https://medium.com/@chamandy/whats-in-a-name-ce42f419d16c#:~:text=At%20Lyft%2C%20we're%20rebranding,Science! for the sake of attracting candidates), usually Analysts are focused on understanding ‘the past’ / the present, while ML Engineers, Data Scientists, etc. are focused on trying to predict the future, provide more holistic / strategic advice, etc.
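Since the original question asked for examples: here is a rough sketch of the kind of inferential stats that show up in A/B tests (point 2 above). It runs a two-sided two-proportion z-test on conversion rates and reports a p-value plus a confidence interval for the difference. The function name `two_proportion_ztest` and the conversion counts are made up for illustration, and it relies on the normal approximation, so treat it as one plausible way to do this rather than how any particular team does it.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for a difference in conversion rates, plus a CI.

    Hypothetical helper for illustration; assumes samples are large enough
    for the normal approximation to be reasonable.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return z, p_value, ci


# Made-up numbers: 200 of 5000 users converted on A, 250 of 5000 on B.
z, p_value, ci = two_proportion_ztest(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}, CI for lift = ({ci[0]:.4f}, {ci[1]:.4f})")
```

The ‘explanation’ angle is the confidence interval: it says how big the lift plausibly is, not just whether B beats A. An ML model predicting who converts would answer a different question.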