data subTLDR week 15 year 2025

r/MachineLearning • r/dataengineering • r/SQL

Unraveling SQL Mysteries with Noir Game, Navigating SQL Interview Questions, Tackling SQL Learning Challenges, Costly Missteps with Microsoft Fabric, Insights from a Data Engineer's Job Hunt

April 13, 2025 • Week 15, 2025

Posted in r/dataengineering by u/Embarrassed_War3366 • 4/10/2025

643

Tried to roll out Microsoft Fabric… ended up rolling straight into a $20K/month wall

Blog

A flawed Microsoft Fabric rollout drained the tenant's entire capacity, locking the tenant and leaving the team facing a potential upgrade to a $20K/month Enterprise tier. The mishap illustrates the pitfalls of rushing into AI-powered pipelines without proper version control and testing, both of which were initially brushed off for the sake of speed. Commenters suggest contacting Microsoft directly for a resolution, though some doubt the company's willingness to help. Others raised concerns about the absence of hard daily cost limits and called for better-informed management and protection against future overages. The sentiment is predominantly negative and underscores the need for thorough planning and understanding when implementing complex systems.

152 comments

Posted in r/SQL by u/chrisBhappy • 4/7/2025

520

SQL Noir – 2 new SQL cases added to the open-source crime-solving game

SQLite

The open-source game SQL Noir, which teaches SQL through detective-style cases, has added two new cases, bringing the total to six. Commenters appreciate its engaging approach to gamifying SQL queries and recommend it to anyone looking to improve their SQL skills with a fun challenge. Users look forward to solving the new cases and suggest incorporating real-life unsolved mysteries. Some found certain cases difficult, but the overall sentiment remains positive, with users expressing gratitude for the free educational tool.

40 comments

Posted in r/dataengineering by u/deal_damage • 4/11/2025

459

My 2025 Job Search

Career

Job seekers with experience have an advantage in the tech industry, as seen in a recent discussion of a data engineer's job search. The engineer, who secured a new role after submitting 30 applications, recommended focusing on companies where the conversation showed good rapport and avoiding lengthy 4-hour interviews. Other professionals chimed in, noting that although they had less technical knowledge than recent graduates, their experience often gave them an edge. Some participants expressed frustration with the application process, citing high numbers of applications and intense competition, especially in high-cost-of-living areas. Overall, the sentiment was a mix of optimism, frustration, and resolve.

66 comments

Posted in r/dataengineering by u/fauxmosexual • 4/7/2025

363

So are there any actual data engineers here anymore?

Discussion

The data engineering subreddit is experiencing a shift from technical discussions and advice towards startup-related content and market research, according to top comments. This trend is not unique to this subreddit, as software-related platforms are seeing a similar pattern. Despite this, data engineers continue to use traditional tools in their daily work, demonstrating a disconnect between industry practice and subreddit content. There's a concern that the increase in 'noise' could lead to a decrease in participation from experienced professionals. Suggestions to tackle this issue include stricter content tagging and better moderation. The overall sentiment is mixed, with frustration expressed about the current state of the subreddit.

122 comments

Posted in r/MachineLearning by u/hiskuu • 4/10/2025

321

[D] Yann LeCun Auto-Regressive LLMs are Doomed

Discussion

Yann LeCun's criticism of auto-regressive large language models (LLMs) has stirred up mixed opinions. Some agree with LeCun, citing the need for an overhaul of architecture and efficiency. Others argue that diffusion-based LLMs show promise and that errors don't necessarily grow exponentially with sequence length. Commenters also note that models can self-correct after producing an incorrect token, and that the success of auto-regressive LLMs may simply reflect the absence of superior alternatives. The debate also raises the question of how to train new models effectively, with multimodal training and games mentioned as possibilities. Overall, while there's skepticism towards auto-regressive LLMs, there's no clear consensus on the best way forward.
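The disputed "exponential error" claim is usually stated in a simplified form: if every token independently has some probability of being wrong and a wrong token can never be repaired, the chance that a whole sequence stays correct shrinks geometrically with its length. The sketch below only illustrates that simplified argument; the function name and the error rate are illustrative assumptions, not figures from the thread, and the self-correction point raised by commenters is precisely a challenge to the independence assumption.

```python
def p_sequence_correct(per_token_error: float, length: int) -> float:
    """Probability that all `length` tokens are correct.

    Assumes (as the simplified argument does) that errors are independent
    and cannot be corrected by later tokens.
    """
    if not 0.0 <= per_token_error <= 1.0:
        raise ValueError("per_token_error must be in [0, 1]")
    if length < 0:
        raise ValueError("length must be non-negative")
    return (1.0 - per_token_error) ** length


# Illustrative error rate only (an assumption, not a measured value).
for n in (10, 100, 1000):
    print(n, p_sequence_correct(0.01, n))
```

If a model can recover from a bad token, the per-token events are no longer independent and this formula stops applying, which is the objection summarized above.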

135 comments

Posted in r/MachineLearning by u/fumeisama • 4/11/2025

137

[P] A lightweight open-source model for generating manga

Project

The creator of an open-source model for generating manga shared their approach and results on Reddit. They fine-tuned PixArt-Sigma on 20 million manga images and addressed character consistency by using embeddings from a pre-trained manga character encoder. The model runs smoothly on consumer GPUs and can generate detailed black-and-white manga art, but it struggles with clothing consistency, hand rendering, and scene consistency. The response was overwhelmingly positive, with Reddit users praising the ability to control image composition and the impressive results given the model's size. Some users expressed curiosity about future developments, including capturing scenery and viewpoint as embeddings.

27 comments

Posted in r/SQL by u/jbnpoc • 4/8/2025

Got stumped on this interview question

Discussion

The thread discussed a SQL interview question about restructuring a dataset. The most upvoted responses recommended using the LEAD or LAG window functions to mark the first and last rows of each range, and then summarizing outside of a Common Table Expression (CTE). One user provided code and identified the task as a gaps-and-islands problem. Another suggested a helper column that assigns a ChangeID to each row, incrementing each time the values change. A humorous comment expressed disapproval of the date format used. Overall, the sentiment was constructive, with users offering helpful advice and solutions.
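The LAG-plus-ChangeID idea can be sketched as follows. This is a minimal illustration under assumptions, not the code posted in the thread: the `readings` table, its `day` and `status` columns, and the sample rows are all hypothetical, and the query needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data (not from the thread): one status value per day.
rows = [
    ("2025-01-01", "A"),
    ("2025-01-02", "A"),
    ("2025-01-03", "B"),
    ("2025-01-04", "B"),
    ("2025-01-05", "A"),
]

QUERY = """
WITH flagged AS (
    -- 1 when the status differs from the previous row (or there is none)
    SELECT day, status,
           CASE WHEN status = LAG(status) OVER (ORDER BY day)
                THEN 0 ELSE 1 END AS is_change
    FROM readings
),
grouped AS (
    -- running total of change flags acts as the ChangeID helper column
    SELECT day, status,
           SUM(is_change) OVER (ORDER BY day) AS change_id
    FROM flagged
)
-- summarize each island outside the CTEs
SELECT status, MIN(day) AS start_day, MAX(day) AS end_day
FROM grouped
GROUP BY change_id, status
ORDER BY start_day
"""


def collapse_ranges(data):
    """Collapse consecutive rows with the same status into (status, start, end)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (day TEXT, status TEXT)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", data)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


ranges = collapse_ranges(rows)
for status, start_day, end_day in ranges:
    print(status, start_day, end_day)
```

Because the running sum only increases when the value changes, the second run of "A" gets its own ChangeID instead of being merged with the first, which is what a plain GROUP BY on the value would get wrong.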

58 comments

Posted in r/MachineLearning by u/igorsusmelj • 4/10/2025

[P] B200 vs H100 Benchmarks: Early Tests Show Up to 57% Faster Training Throughput & Self-Hosting Cost Analysis

Project

Independent benchmarks by Lightly AI show Nvidia B200 GPUs delivering up to 57% higher training throughput than H100s on computer vision training workloads. On cost, self-hosted B200s could potentially be 6x-30x cheaper than typical cloud H100 instances, though this depends heavily on utilization, energy costs, and amortization. Some users are skeptical of the benchmarks, citing potential errors and unoptimized testing parameters. Even so, the community appreciates the insights and is interested in further enterprise-grade hardware comparisons and batched inference results for the B200. Overall, sentiment is mixed, with both excitement and skepticism present.
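To see why the self-hosting figure swings so widely with utilization, energy costs, and amortization, a back-of-the-envelope model helps. The sketch below is not the method used in the benchmark post; the function, its parameters, and every number passed to it are hypothetical placeholders chosen for easy arithmetic, and it ignores hosting, networking, and staffing costs.

```python
def self_hosted_cost_per_gpu_hour(
    purchase_price: float,        # hardware cost per GPU (placeholder units)
    amortization_years: float,    # period over which the purchase is written off
    power_kw: float,              # average power draw per GPU in kilowatts
    energy_price_per_kwh: float,  # electricity price per kWh
    utilization: float,           # fraction of hours the GPU does useful work
) -> float:
    """Rough cost of one *useful* GPU hour: amortized hardware plus energy."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if amortization_years <= 0:
        raise ValueError("amortization_years must be positive")
    useful_hours = amortization_years * 365 * 24 * utilization
    hardware_per_hour = purchase_price / useful_hours
    energy_per_hour = power_kw * energy_price_per_kwh
    return hardware_per_hour + energy_per_hour


# Placeholder inputs only: same hardware, two different utilization levels.
busy = self_hosted_cost_per_gpu_hour(87_600, 1, 1.0, 0.10, utilization=1.0)
idle_half = self_hosted_cost_per_gpu_hour(87_600, 1, 1.0, 0.10, utilization=0.5)
print(busy, idle_half)
```

In this simplified model the hardware share of the hourly cost scales inversely with utilization, so a cluster that sits idle half the time roughly doubles that share, which is one reason a range as wide as 6x-30x is plausible.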

5 comments

Posted in r/SQL by u/PalindromicPalindrom • 4/9/2025

Why am I struggling with SQL?

PostgreSQL

Struggling with SQL is common among beginners. Many users suggested breaking complex problems into smaller logical steps before writing any code, an approach that matters in programming generally, not just in SQL. Commenters compared the learning process to mastering a skateboard trick or a musical instrument and emphasized that practice is key. Some highlighted the importance of understanding the real-world context behind practice questions and suggested that changing learning methods or visualizing the data can help. Finally, it was recommended to stop thinking procedurally and start thinking declaratively: describe the output you want and let the SQL query produce it. The sentiment is encouraging, reminding beginners that the struggle is a normal part of the learning process.
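The procedural-versus-declarative distinction is easier to see side by side. The sketch below computes the same per-customer totals twice: once with an explicit loop that spells out how to accumulate, and once with a SQL query that only states what result is wanted. The `orders` table and its sample rows are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical sample data: (customer, amount)
orders = [("alice", 30), ("bob", 20), ("alice", 15)]


def totals_procedural(rows):
    """Procedural style: walk the rows and update a running total by hand."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def totals_declarative(rows):
    """Declarative style: describe the desired output and let SQL produce it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = dict(
        conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
    )
    conn.close()
    return result


print(totals_procedural(orders))
print(totals_declarative(orders))
```

The query never mentions loops or intermediate variables; it names the grouping and the aggregate, and the database decides how to compute them.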

50 comments
