
data subtldr week 4 year 2025

r/MachineLearning, r/dataengineering, r/sql

Self-Taught Data Engineering: Reality or Myth?, SQL Struggles: A Universal Experience, Oracle SQL Developer Alternatives: DBeaver vs DataGrip, AI Efficiency: Are We Reaching a 90% Benchmark?

Week 4, 2025
Posted in r/MachineLearning by u/yogimankk, 1/22/2025
378

[D]: A 3blue1brown Video that Explains Attention Mechanism in Detail

Discussion
The Reddit thread discusses a 3blue1brown video that provides a detailed explanation of the attention mechanism in machine learning. The comments reflect a high regard for the video's ability to visually explain complex concepts. Users appreciate the step-by-step method of introducing the problem and gradually revealing the attention mechanism as the solution. One user also clarifies a point at 11:22 in the video, explaining it as a method of measuring the model's prediction accuracy. Additional resources are suggested in the comments, including talks and tutorials related to the topic. The overall sentiment is positive, acknowledging the value of clear explanations in understanding challenging subjects.
13 comments
Posted in r/dataengineering by u/Over-Reply6185, 1/21/2025
255

How many of you are self-taught Data Engineer?

Career
The Reddit thread 'How many of you are self-taught Data Engineer?', started by Over-Reply6185, asked whether it is possible to become a data engineer through self-study. Despite the original poster's skepticism, a significant number of commenters reported being successful self-taught data engineers. Users like 'winsletts' and 'neoneo112' emphasized that many data engineers are self-taught because relevant degrees did not exist in the past and because learning resources are now broadly available through a Google search. Others described moving into data engineering roles from backgrounds such as business and teaching. The discussion underscored that a degree is beneficial but not a strict necessity, and that experience and self-learning are equally valuable.
158 comments
Posted in r/MachineLearning by u/katxwoods, 1/24/2025
249

Anthropic CEO says at the beginning of 2024, models scored ~3% at SWE-bench. Ten months later, we were at 50%. He thinks in another year we’ll probably be at 90% [N]

News
The thread discusses the Anthropic CEO's prediction that models will score about 90% on SWE-bench within another year, after rising from roughly 3% to 50% over ten months of 2024. The top comments are skeptical. Many criticize linear extrapolation of AI progress, pointing out that gains often become harder as systems approach the top of a benchmark. Others poke fun at the CEO's optimism with humorous exaggerations, and several suggest that his position gives him a motive to be optimistic. The overall sentiment is doubt and cynicism toward the prediction and toward similarly optimistic forecasts in the AI field.
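The objection to naive extrapolation can be illustrated with a little arithmetic. The sketch below is an illustration only, not something posted in the thread: it takes the two scores quoted in the title (about 3% and 50%, ten months apart) and projects twelve months ahead in two ways, once as a straight line in the score itself and once as a straight line in log-odds, which keeps the result below 100%. The function names and the choice of a log-odds model are assumptions made for this example.

```python
import math


def linear_projection(s0, s1, months_elapsed, months_ahead):
    """Extend a straight line through two scores (fractions in [0, 1])."""
    slope = (s1 - s0) / months_elapsed
    return s1 + slope * months_ahead


def logit(p):
    """Log-odds of a fraction strictly between 0 and 1."""
    return math.log(p / (1 - p))


def log_odds_projection(s0, s1, months_elapsed, months_ahead):
    """Extend a straight line in log-odds space, then map back to a fraction.

    Assumption for illustration: progress is linear in log-odds, which
    keeps the projected score strictly below 1.0.
    """
    slope = (logit(s1) - logit(s0)) / months_elapsed
    z = logit(s1) + slope * months_ahead
    return 1 / (1 + math.exp(-z))


# Figures quoted in the post title: ~3% rising to 50% over ten months.
linear = linear_projection(0.03, 0.50, 10, 12)
bounded = log_odds_projection(0.03, 0.50, 10, 12)

print(f"linear projection:   {linear:.1%}")
print(f"log-odds projection: {bounded:.1%}")
```

The straight-line projection lands above 100%, which is impossible for a benchmark score, while the log-odds projection stays bounded at roughly 98%. Two data points fit both curves equally well, so neither number is a forecast; the gap between them shows how much the answer depends on the assumed shape of the curve.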
109 comments
Posted in r/MachineLearning by u/Happysedits, 1/25/2025
242

[R] Replicating DeepSeek-R3-Zero RL recipe on 3B LLM for <30$, the model develops self-verification and search abilities all on its own

Research
The Reddit thread discusses a replication of the DeepSeek-R1-Zero RL recipe (written as 'R3-Zero' in the post title) on a 3B-parameter LLM for less than $30. Users were impressed that reinforcement learning (RL) on language models worked and could be reproduced at such a small scale. Some suggested trying more combinations to rule out data leakage and called for an efficient implementation of RL on LLMs for vertical-domain reasoning. One user criticized the model's iterative revision ability, pointing out that it repeatedly tried the same solution. Others asked whether the model was trained on the full DeepSeek-R1 dataset or only on the countdown dataset. A direct link to the GitHub repository was provided for safer access to the project.
11 comments
Posted in r/dataengineering by u/irrationalindian, 1/22/2025
205

Looking for a Data Engineer Buddy to Grow Together 🚀

Career
The Reddit thread 'Looking for a Data Engineer Buddy to Grow Together 🚀' is about a user seeking a peer with whom to improve their data engineering skills. Reactions were mixed: some users expressed confusion and skepticism about the post's tone and about pairing up in a field they perceive as introverted. The post also spurred collaboration, with a Discord group set up so that interested participants could communicate and learn together. Despite the doubts, users at all skill levels took part, including beginners willing to learn. Overall, the thread served as a venue for networking and skill-building in data engineering.
94 comments
Posted in r/dataengineering by u/ephemeral404, 1/20/2025
168

This chapter from the book Homo Deus

Discussion
The Reddit thread on the 'Dataism' chapter of the book 'Homo Deus' drew mixed reactions. Many participants dismiss the author's concepts as 'pop science' and are skeptical of his portrayal of biological organisms as biochemical algorithms. Users also highlight the limits of data and simulations in reflecting reality, arguing that life is not quantifiable and that data is only a projection of the world, not an accurate reflection of it. They question the author's credibility and regard his ideas as a simplification of complex scientific concepts. One positive comment, however, appreciates his insight into how pervasive and manipulative algorithms influence our identities. Overall, the thread leans toward skepticism about the book's content and the author's approach.
31 comments