
data subtldr week 42 year 2024

r/MachineLearning, r/dataengineering, r/sql

Unraveling the Mysteries of 'Fake Jobs', Deep Dive into Data Tools - Snowflake, Databricks, Redshift, Building Advanced SQL Skills, and the Productivity Secrets of US PhD Students in AI

Week 42, 2024
Posted in r/MachineLearning by u/SilenceForLife, 10/19/2024
795

[D] Why do PhD Students in the US seem like overpowered final bosses

Discussion
The thread reflects awe and curiosity about the high productivity of US PhD students in AI/ML/CV. Top comments attribute it to an intense work culture, with students often working 10+ hours a day, 7 days a week. Commenters also note that top US institutions attract the best global talent and give students access to substantial resources, including expensive GPU clusters that speed up research. Some are cynical about how much an institution's reputation influences publication acceptance. Others point out that the pressure, while it drives high output, has serious implications for mental health.
245 comments
Posted in r/dataengineering by u/bjogc42069, 10/14/2024
326

Is your job fake?

Discussion
The thread generated a lively discussion about the perceived value (or lack thereof) of certain roles within mid-size to large companies. The idea of 'fake jobs' resonated with many users: roles that exist primarily for empire building, diffusing responsibility, or box checking. Several commenters described performing low-impact tasks on projects that were later abandoned or rarely used. They also noted that data is often used to confirm pre-existing beliefs rather than to drive strategy. The overall sentiment was one of frustration, with some users advising others to find value in their paycheck or to seek more fulfilling work at smaller companies.
107 comments
Posted in r/MachineLearning by u/parlancex, 10/17/2024
290

[D] PyTorch 2.5.0 released!

Discussion
The PyTorch 2.5.0 release is well received in the thread, with users enthusiastic about its new features and improvements. Highlights include support for `torch.istft` in `torch.compile`, which improves execution speed, and a new cuDNN backend for SDPA (scaled dot product attention). Sentiment towards `torch.compile` itself is mixed: some users report speedups while others report persistent issues. The release's contribution to fast vocoders is also appreciated. One user jokes that the release's 4095 commits fall just short of 'greatness', being one short of 4096. Overall, the community is eager for further improvements, particularly for AMD users and for FFT functions.
27 comments
Posted in r/dataengineering by u/mdchefff, 10/15/2024
245

What are Snowflake, Databricks and Redshift actually?

Help
The thread discusses what Snowflake, Databricks, and Redshift actually do. Users explained that all three are columnar data stores designed for OLAP-style analytical workloads over large datasets. They are optimized for query speed rather than for write speed or for create/update/delete transactions. Redshift is likened to a regular database in which compute and storage are combined; commenters describe it as functional but slow and in need of maintenance. Databricks and Snowflake separate storage from compute. Databricks is described as a shared SaaS model in which customers maintain the compute and storage, while Snowflake is a full SaaS that handles everything from software to security with minimal maintenance. PySpark in Databricks was compared to pandas for larger data. Sentiment in the thread is positive, with users sharing detailed comparisons of the tools and their features.
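The columnar versus row-oriented distinction raised in the thread can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how Snowflake, Databricks, or Redshift are implemented: the sample orders and the helper names (`to_columnar`, `total_row_store`, `total_column_store`, `append_columnar`) are hypothetical and do not come from the thread.

```python
# Toy sketch only: real warehouses add compression, partitioning, and
# distributed execution on top of this basic layout idea.

# Hypothetical sample data (not from the thread).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columnar(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(records, column):
    """Row layout: every full record is visited to read a single field."""
    return sum(record[column] for record in records)


def total_column_store(columns, column):
    """Columnar layout: only the requested column is scanned."""
    return sum(columns[column])


def append_columnar(columns, record):
    """A single-row write has to touch every column list."""
    for name, values in columns.items():
        values.append(record[name])


columns = to_columnar(rows)
print(total_row_store(rows, "amount"))
print(total_column_store(columns, "amount"))
```

Both totals agree, but the columnar version reads one list instead of walking every record, which is the intuition behind favoring this layout for analytical scans. The `append_columnar` helper shows the flip side: a one-row write touches every column, which is one reason such stores are a poor fit for frequent create/update/delete transactions.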
68 comments