data subTLDR week 11 year 2025
r/MachineLearning · r/dataengineering · r/SQL
Elon Musk's Data Expert's Overheating Hard Drive Mocked, 600+ Data Engineering Interview Questions Compiled, DuckDB's New Local UI Applauded, Dynamic Tanh Challenges Normalization in Transformers, Gemma 3 Outperforms Deepseek v3 with Less Resources
Week 11, 2025
Posted in r/dataengineering by u/ChipsAhoy21 • 3/15/2025
3753
Elon Musk’s Data Engineering expert’s “hard drive overheats” after processing 60k rows
Meme
The Reddit thread discusses a claim from one of Elon Musk's Data Engineering experts about a hard drive overheating after processing 60k rows. The majority of the comments mock this claim, comparing the situation to a poorly written TV character or joking about running the data on a calculator. They criticize the inefficiency and absurdity of such a statement, pointing out that in their own experiences, they process significantly larger amounts of data daily without such issues. There is also mention of Elon Musk promoting this information, leading to further incredulity and humor. Essentially, the sentiment in the thread is largely skeptical and amused at the claim's lack of technical credibility.
Posted in r/dataengineering by u/Dubinko • 3/12/2025
475
Parsed 600+ Data Engineering Questions from top Companies
Career
The Reddit thread discusses a compilation of 600+ data engineering interview questions from top companies, freely accessible within certain limits. It generated humor, with users joking about their study habits and the perceived simplicity of some questions. Some referenced the disparity between interview questions and real-world job challenges, while others appreciated the resource and found it valuable for interview preparation. A few comments also touched on the difficulty of managing stakeholder expectations in data engineering roles. Overall, the sentiment was positive, with users expressing gratitude for the work put into creating the resource.
Posted in r/dataengineering by u/TransportationOk2403 • 3/12/2025
348
DuckDB released a local UI
Blog
The Reddit thread revolves around the release of DuckDB's new local UI. The overall sentiment is overwhelmingly positive, with users expressing excitement and admiration for the new user interface. Several comments highlight the coolness of the new feature and its potential to improve their workflow. However, there are also questions raised about the client code and a comparison with Azure Data Studio. Despite the apparent enthusiasm, a few users warn about taking these praises at face value, suspecting they may be paid comments. There's also a clarification that the UI will not be open sourced, contrary to some expectations.
Posted in r/MachineLearning by u/Nunki08 • 3/15/2025
229
[R] Transformers without Normalization (FAIR Meta, New York University, MIT, Princeton University)
Research
The Reddit thread discusses Dynamic Tanh (DyT), a new technique developed by researchers from FAIR, NYU, MIT, and Princeton that challenges the need for normalization layers in Transformers. DyT can potentially match or exceed the performance of normalized counterparts across diverse settings. Comments express fascination, skepticism, and interest in the technique. Some users highlighted potential benefits, such as fewer parameters and a cleaner separation of feature transformation from aggregation, while others questioned the lack of thorough analysis of DyT's practical convenience and its similarity to existing normalization techniques. One user shared their experience with DyT, noting slightly slower training but comparable quality. The overall sentiment mixed curiosity, critique, and some doubt about the novelty and effectiveness of DyT.
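As summarized above, DyT replaces a normalization layer with an elementwise tanh under a learnable scaling. A minimal plain-Python sketch of that idea (the parameter names and the default alpha here are illustrative, not taken from the paper's code):

```python
import math

def dyt(x, alpha=0.5, gamma=None, beta=None):
    """Dynamic Tanh sketch: tanh(alpha * x) elementwise, followed by a
    learnable per-feature scale (gamma) and shift (beta). Unlike LayerNorm,
    no mean or variance statistics are computed over the input."""
    n = len(x)
    gamma = gamma if gamma is not None else [1.0] * n  # scale, learned in training
    beta = beta if beta is not None else [0.0] * n     # shift, learned in training
    return [g * math.tanh(alpha * xi) + b for xi, g, b in zip(x, gamma, beta)]

# Extreme activations saturate toward +/-1 before scaling, which bounds
# outliers much as a normalization layer would.
out = dyt([-100.0, 0.0, 100.0])
```

The appeal discussed in the thread is that this keeps the squashing effect of normalization while avoiding the cross-feature reduction, at the cost of the extra learnable alpha.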
Posted in r/MachineLearning by u/we_are_mammals • 3/12/2025
136
Gemma 3 released: beats Deepseek v3 in the Arena, while using 1 GPU instead of 32 [N]
News
The Reddit thread discusses the release of Gemma 3, a chatbot model lauded for beating Deepseek v3 while using only 1 GPU. The sentiment in the comments is mixed. Some are impressed by the advancements in smaller models, speculating about image understanding and reasoning capabilities in future versions. Others express skepticism about the model's effectiveness, and there is criticism of the lmarena ranking system. Doubts are also raised about whether the larger models can really run on a single GPU. Google's offer of $10,000 in cloud credits to promote Gemma 3-based research is viewed by some commenters as a bribe to encourage adoption. A few users also report satisfactory comparisons with other models.
Posted in r/SQL by u/LaneKerman • 3/12/2025
108
Ticketed by query police
PostgreSQL
The Reddit thread discusses a user's difficulty with a query scanning 200 million records where dates were stored as varchars, slowing the runtime significantly. The user eventually resolved the issue by filtering on a properly formatted date field, reducing the query time to 20 seconds. The top comments highlight the inefficiency of storing dates as varchars, suggesting possible workarounds such as using a supporting calendar table with actual date formats or generating a sequence of dates as strings. Some also recommended asking database administrators for a better solution or restructuring the data for future efficiency. The overall sentiment was frustration at the poor data structure, emphasizing the importance of proper date formats for efficient queries.
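The fix described above, filtering on a properly formatted date field instead of transforming a varchar per row, can be sketched with SQLite's query planner (a stand-in here for the thread's PostgreSQL; the table, column, and index names are made up). Keeping the indexed column bare in the predicate lets the range filter use the index rather than forcing a full scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, event_date TEXT)")  # dates stored as text
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, f"2025-03-{(i % 28) + 1:02d}") for i in range(1000)],
)

# With an index on the column, a plain range predicate over ISO-8601
# strings is index-friendly; wrapping the column in a cast or reformat
# function would defeat the index and scan every row.
conn.execute("CREATE INDEX idx_date ON events(event_date)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM events "
    "WHERE event_date >= '2025-03-10' AND event_date < '2025-03-11'"
).fetchall()
uses_index = any("idx_date" in row[-1] for row in plan)  # plan detail is the last column

count = conn.execute(
    "SELECT COUNT(*) FROM events "
    "WHERE event_date >= '2025-03-10' AND event_date < '2025-03-11'"
).fetchone()[0]
```

The same principle underlies the thread's resolution: a sargable predicate on a well-typed (or at least consistently formatted and indexed) date column turns a multi-minute scan into a quick index range lookup.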
Posted in r/MachineLearning by u/ripototo • 3/11/2025
104
[D] Math in ML Papers
Discussion
A Reddit user in the machine learning field questioned the need for extensive math in ML papers, suggesting that verbal explanations might suffice. The user found that math serves more as a symbolic representation, rather than having a direct role in implementation. However, the top comments mostly disagreed, arguing that math brings rigor, clarity, and a common language to these papers. Many believe that math proves the functionality of algorithms and is fundamental to the field. A few comments indicated that the effectiveness of math might depend on the author's skill in technical writing and the reader's background in math. Some comments also hinted at the presence of unnecessary math to please reviewers or to appear more scholarly.
Posted in r/SQL by u/developing_fowl • 3/15/2025
95
How to understand queries that are 600+ lines long?
Discussion
The Reddit thread, titled "How to understand queries that are 600+ lines long?", centers on a SQL developer intern struggling to understand complex queries and feeling unsupported by their team. The user seeks advice on making sense of large SQL queries and improving their skills. Among the 94 comments, sentiments range from empathy to practical advice: many suggest breaking queries down into smaller parts, practicing more, seeking external learning resources, and improving communication with teammates. The overall tone is supportive, with a focus on self-improvement and perseverance; despite the intern's initial insecurity, commenters encourage them to keep pushing forward and not be afraid to ask questions.
Posted in r/SQL by u/113862421 • 3/13/2025
93
Circular Dependencies?
PostgreSQL
The Reddit thread discusses a PostgreSQL database design for a music academy. Commenters offered various suggestions to optimize the database, such as using star schema and Ralph Kimball's modelling techniques. Some pointed out potential redundancy in the current design, such as unnecessary tables for teacher-student relationships and recital-songs. Others provided insights into how to better represent relationships, like using keys in the recital table to connect to other entities, or rethinking the many-to-many relationship between teachers and instruments. The sentiment was constructive, with users helping to improve the database design and clear up confusion.
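The junction-table pattern commenters suggest for the many-to-many teacher-instrument relationship can be sketched as follows (toy schema with made-up names, SQLite standing in for the thread's PostgreSQL). A single link table replaces any circular references between the two entity tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teachers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE instruments (id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: one row per (teacher, instrument) pair, so neither
    -- entity table needs a foreign key pointing at the other.
    CREATE TABLE teacher_instruments (
        teacher_id INTEGER REFERENCES teachers(id),
        instrument_id INTEGER REFERENCES instruments(id),
        PRIMARY KEY (teacher_id, instrument_id)
    );
    INSERT INTO teachers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO instruments VALUES (1, 'piano'), (2, 'violin');
    INSERT INTO teacher_instruments VALUES (1, 1), (1, 2), (2, 1);
""")

# Which teachers can teach piano? Join through the junction table.
piano_teachers = [row[0] for row in conn.execute("""
    SELECT t.name FROM teachers t
    JOIN teacher_instruments ti ON ti.teacher_id = t.id
    JOIN instruments i ON i.id = ti.instrument_id
    WHERE i.name = 'piano' ORDER BY t.name
""")]
```

The recital-song and teacher-student relationships discussed in the thread follow the same shape: each many-to-many pair gets its own junction table keyed on the two foreign keys.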