data subTLDR week 11 year 2025
r/MachineLearning · r/dataengineering · r/SQL
Elon Musk's Data Expert's Overheating Hard Drive Mocked, 600+ Data Engineering Interview Questions Compiled, DuckDB's New Local UI Applauded, Dynamic Tanh Challenges Normalization in Transformers, Gemma 3 Outperforms Deepseek v3 with Less Resources
Week 11, 2025
Posted in r/dataengineering by u/ChipsAhoy21 • 3/15/2025
3753
Elon Musk’s Data Engineering expert’s “hard drive overheats” after processing 60k rows
Meme
The Reddit thread discusses a claim from one of Elon Musk's Data Engineering experts about a hard drive overheating after processing 60k rows. The majority of the comments mock this claim, comparing the situation to a poorly written TV character or joking about running the data on a calculator. They criticize the inefficiency and absurdity of such a statement, pointing out that in their own experiences, they process significantly larger amounts of data daily without such issues. There is also mention of Elon Musk promoting this information, leading to further incredulity and humor. Essentially, the sentiment in the thread is largely skeptical and amused at the claim's lack of technical credibility.
Posted in r/dataengineering by u/Dubinko • 3/12/2025
475
Parsed 600+ Data Engineering Questions from top Companies
Career
The Reddit thread discusses a compilation of 600+ data engineering interview questions from top companies, freely accessible within certain limits. It generated humor, with users joking about their study habits and the perceived simplicity of some questions. Some referenced the disparity between interview questions and real-world job challenges, while others appreciated the resource and found it valuable for interview preparation. A few comments also touched on the difficulty of managing stakeholder expectations in data engineering roles. Overall, the sentiment was positive, with users expressing gratitude for the work put into creating the resource.
Posted in r/dataengineering by u/TransportationOk2403 • 3/12/2025
348
DuckDB released a local UI
Blog
The Reddit thread revolves around the release of DuckDB's new local UI. The overall sentiment is overwhelmingly positive, with users expressing excitement and admiration for the new user interface. Several comments highlight the coolness of the new feature and its potential to improve their workflow. However, there are also questions raised about the client code and a comparison with Azure Data Studio. Despite the apparent enthusiasm, a few users warn about taking these praises at face value, suspecting they may be paid comments. There's also a clarification that the UI will not be open sourced, contrary to some expectations.
Posted in r/MachineLearning by u/Nunki08 • 3/15/2025
229
[R] Transformers without Normalization (FAIR Meta, New York University, MIT, Princeton University)
Research
The Reddit thread discusses Dynamic Tanh (DyT), a new technique developed by researchers from FAIR, NYU, MIT, and Princeton that challenges the need for normalization layers in Transformers. DyT can potentially match or exceed the performance of normalized counterparts across diverse settings. Comments express fascination, skepticism, and interest in the technique. Some users highlighted potential benefits, such as fewer parameters and a cleaner separation of feature transformation from aggregation, while others questioned the lack of thorough analysis of DyT's practical convenience and its similarity to existing normalization techniques. One user shared their experience with DyT, noting slightly slower training but comparable quality. The overall sentiment mixed curiosity, critique, and some doubt about the novelty and effectiveness of DyT.
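As summarized above, DyT replaces a normalization layer with an elementwise tanh under a learnable scaling. A minimal plain-Python sketch of that idea (the parameter names and the default alpha here are illustrative, not taken from the paper's code):

```python
import math

def dyt(x, alpha=0.5, gamma=None, beta=None):
    """Dynamic Tanh sketch: tanh(alpha * x) elementwise, followed by a
    learnable per-feature scale (gamma) and shift (beta). Unlike LayerNorm,
    no mean or variance statistics are computed over the input."""
    n = len(x)
    gamma = gamma if gamma is not None else [1.0] * n  # scale, learned in training
    beta = beta if beta is not None else [0.0] * n     # shift, learned in training
    return [g * math.tanh(alpha * xi) + b for xi, g, b in zip(x, gamma, beta)]

# Extreme activations saturate toward +/-1 before scaling, which bounds
# outliers much as a normalization layer would.
out = dyt([-100.0, 0.0, 100.0])
```

The appeal discussed in the thread is that this keeps the squashing effect of normalization while avoiding the cross-feature reduction, at the cost of the extra learnable alpha.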
Posted in r/MachineLearning by u/we_are_mammals • 3/12/2025
136
Gemma 3 released: beats Deepseek v3 in the Arena, while using 1 GPU instead of 32 [N]
News
The Reddit thread discusses the release of Gemma 3, a chatbot model lauded for beating Deepseek v3 while using only 1 GPU. The sentiment in the comments is mixed. Some are impressed by the advancements in smaller models, speculating about image understanding and reasoning capabilities in future versions. Others express skepticism about the model's effectiveness, and there is criticism of the lmarena ranking system. Doubts are also raised about whether the larger models can really run on a single GPU. Google's offer of $10,000 in cloud credits to promote Gemma 3-based research is viewed by some commenters as a bribe to encourage adoption. A few users also report satisfactory comparisons with other models.
Posted in r/SQL by u/LaneKerman • 3/12/2025
108
Ticketed by query police
PostgreSQL
The Reddit thread discusses a user's difficulty with a query scanning 200 million records where dates were stored as varchars, slowing the runtime significantly. The user eventually resolved the issue by filtering on a properly formatted date field, reducing the query time to 20 seconds. The top comments highlight the inefficiency of storing dates as varchars, suggesting possible workarounds such as using a supporting calendar table with actual date formats or generating a sequence of dates as strings. Some also recommended asking database administrators for a better solution or restructuring the data for future efficiency. The overall sentiment was frustration at the poor data structure, emphasizing the importance of proper date formats for efficient queries.
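The fix described above, filtering on a properly formatted date field instead of transforming a varchar per row, can be sketched with SQLite's query planner (a stand-in here for the thread's PostgreSQL; the table, column, and index names are made up). Keeping the indexed column bare in the predicate lets the range filter use the index rather than forcing a full scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, event_date TEXT)")  # dates stored as text
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, f"2025-03-{(i % 28) + 1:02d}") for i in range(1000)],
)

# With an index on the column, a plain range predicate over ISO-8601
# strings is index-friendly; wrapping the column in a cast or reformat
# function would defeat the index and scan every row.
conn.execute("CREATE INDEX idx_date ON events(event_date)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM events "
    "WHERE event_date >= '2025-03-10' AND event_date < '2025-03-11'"
).fetchall()
uses_index = any("idx_date" in row[-1] for row in plan)  # plan detail is the last column

count = conn.execute(
    "SELECT COUNT(*) FROM events "
    "WHERE event_date >= '2025-03-10' AND event_date < '2025-03-11'"
).fetchone()[0]
```

The same principle underlies the thread's resolution: a sargable predicate on a well-typed (or at least consistently formatted and indexed) date column turns a multi-minute scan into a quick index range lookup.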
Posted in r/MachineLearning by u/ripototo • 3/11/2025
104
[D] Math in ML Papers
Discussion
A Reddit user in the machine learning field questioned the need for extensive math in ML papers, suggesting that verbal explanations might suffice. The user found that math serves more as a symbolic representation, rather than having a direct role in implementation. However, the top comments mostly disagreed, arguing that math brings rigor, clarity, and a common language to these papers. Many believe that math proves the functionality of algorithms and is fundamental to the field. A few comments indicated that the effectiveness of math might depend on the author's skill in technical writing and the reader's background in math. Some comments also hinted at the presence of unnecessary math to please reviewers or to appear more scholarly.
Posted in r/SQL by u/developing_fowl • 3/15/2025
95
How to understand queries that are 600+ lines long?
Discussion
The Reddit thread, titled "How to understand queries that are 600+ lines long?", centers on a SQL developer intern struggling to understand complex queries and feeling unsupported by their team. The user seeks advice on making sense of large SQL queries and improving their skills. Among the 94 comments, sentiments range from empathy to practical advice: many suggest breaking queries down into smaller parts, practicing more, seeking external learning resources, and improving communication with teammates. The overall tone is supportive, with a focus on self-improvement and perseverance; despite the intern's initial insecurity, commenters encourage them to keep pushing forward and not be afraid to ask questions.
Posted in r/SQL by u/113862421 • 3/13/2025
93
Circular Dependencies?
PostgreSQL
The Reddit thread discusses a PostgreSQL database design for a music academy. Commenters offered various suggestions to optimize the database, such as using star schema and Ralph Kimball's modelling techniques. Some pointed out potential redundancy in the current design, such as unnecessary tables for teacher-student relationships and recital-songs. Others provided insights into how to better represent relationships, like using keys in the recital table to connect to other entities, or rethinking the many-to-many relationship between teachers and instruments. The sentiment was constructive, with users helping to improve the database design and clear up confusion.
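The junction-table pattern commenters suggest for the many-to-many teacher-instrument relationship can be sketched as follows (toy schema with made-up names, SQLite standing in for the thread's PostgreSQL). A single link table replaces any circular references between the two entity tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teachers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE instruments (id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: one row per (teacher, instrument) pair, so neither
    -- entity table needs a foreign key pointing at the other.
    CREATE TABLE teacher_instruments (
        teacher_id INTEGER REFERENCES teachers(id),
        instrument_id INTEGER REFERENCES instruments(id),
        PRIMARY KEY (teacher_id, instrument_id)
    );
    INSERT INTO teachers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO instruments VALUES (1, 'piano'), (2, 'violin');
    INSERT INTO teacher_instruments VALUES (1, 1), (1, 2), (2, 1);
""")

# Which teachers can teach piano? Join through the junction table.
piano_teachers = [row[0] for row in conn.execute("""
    SELECT t.name FROM teachers t
    JOIN teacher_instruments ti ON ti.teacher_id = t.id
    JOIN instruments i ON i.id = ti.instrument_id
    WHERE i.name = 'piano' ORDER BY t.name
""")]
```

The recital-song and teacher-student relationships discussed in the thread follow the same shape: each many-to-many pair gets its own junction table keyed on the two foreign keys.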