
data subTLDR: Week 15, 2025

r/MachineLearning, r/dataengineering, r/SQL

Unraveling SQL Mysteries with Noir Game, Navigating SQL Interview Questions, Tackling SQL Learning Challenges, Costly Missteps with Microsoft Fabric, Insights from a Data Engineer's Job Hunt

Posted in r/dataengineering by u/Embarrassed_War3366 on 4/10/2025
643

Tried to roll out Microsoft Fabric… ended up rolling straight into a $20K/month wall

Blog
A flawed implementation of Microsoft Fabric drained the tenant's capacity completely, locking the tenant and raising the prospect of an upgrade to a $20K/month Enterprise tier. The mishap demonstrates the pitfalls of rushing into AI-powered pipelines without proper version control and testing, steps that were initially brushed off for the sake of speed. Commenters suggest contacting Microsoft directly for a resolution, though some express skepticism about the company's willingness to help. Concerns were raised about the absence of hard daily cost limits, and there were calls for more informed management and protection against future overages. The sentiment is predominantly negative, highlighting the need for thorough planning and understanding when implementing complex systems.
152 comments
Posted in r/SQL by u/chrisBhappy on 4/7/2025
520

SQL Noir – 2 new SQL cases added to the open-source crime-solving game

SQLite
The open-source game SQL Noir, which teaches SQL through detective-style cases, has added two new cases, bringing the total to six. Commenters appreciate its unique and engaging approach to gamifying SQL queries and recommend it to anyone looking to improve their SQL skills through a fun challenge. Users look forward to solving new cases and suggest incorporating real-life unsolved mysteries. Some users found certain cases challenging, but the overall sentiment remains positive, with users expressing gratitude for the free educational tool.
40 comments
Posted in r/dataengineering by u/deal_damage on 4/11/2025
459

My 2025 Job Search

Career
Experience remains an advantage in the tech job market, as a recent discussion of a data engineer's job search shows. The engineer, who secured a new role after submitting 30 applications, advised others to focus on companies where they felt good conversational rapport and to steer clear of lengthy 4-hour interviews. Other professionals chimed in, noting that although they had less technical knowledge than recent graduates, their experience often gave them an edge. Some participants expressed frustration with the application process, citing high numbers of applications and intense competition, especially in high-cost-of-living areas. Overall, the sentiment was a mix of optimism, frustration, and resolve.
66 comments
Posted in r/dataengineering by u/fauxmosexual on 4/7/2025
363

So are there any actual data engineers here anymore?

Discussion
The data engineering subreddit is experiencing a shift from technical discussions and advice towards startup-related content and market research, according to top comments. This trend is not unique to this subreddit, as software-related platforms are seeing a similar pattern. Despite this, data engineers continue to use traditional tools in their daily work, demonstrating a disconnect between industry practice and subreddit content. There's a concern that the increase in 'noise' could lead to a decrease in participation from experienced professionals. Suggestions to tackle this issue include stricter content tagging and better moderation. The overall sentiment is mixed, with frustration expressed about the current state of the subreddit.
122 comments
Posted in r/MachineLearning by u/hiskuu on 4/10/2025
321

[D] Yann LeCun Auto-Regressive LLMs are Doomed

Discussion
Yann LeCun's criticism of auto-regressive large language models (LLMs) has stirred up mixed opinions. Some agree with LeCun, citing the need for an overhaul of architecture and efficiency. Others argue that diffusion-based LLMs show promise and posit that errors don't necessarily grow exponentially with sequence length. Commenters also note that models can self-correct after producing an incorrect token, and that the success of auto-regressive LLMs may be due to the absence of superior alternatives. The debate also raises the question of how to train new models effectively, including the possibility of multimodal training or using games. Overall, while there's skepticism towards auto-regressive LLMs, there's no clear consensus on the best way forward.
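For readers unfamiliar with the exponential-error argument being debated, it is usually stated like this: if each generated token independently has some fixed chance of being wrong, the chance that an n-token sequence contains no error shrinks as (1 - e)^n. The sketch below only illustrates that arithmetic. The function name, the independence assumption, and the 1% error rate are illustrative choices, not figures from the thread, and the independence assumption is exactly what commenters dispute when they point to self-correction.

```python
def prob_all_correct(per_token_error: float, length: int) -> float:
    """Probability that all `length` tokens are correct, ASSUMING each token
    fails independently with the same rate (a simplifying assumption)."""
    return (1.0 - per_token_error) ** length


# Illustrative 1% per-token error rate (hypothetical, not a measured value).
for n in (10, 100, 1000):
    print(n, prob_all_correct(0.01, n))
```

If a model can recover after a bad token, errors are no longer independent and this decay does not apply directly, which is the counterargument raised in the discussion.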
135 comments
Posted in r/MachineLearning by u/fumeisama on 4/11/2025
137

[P] A lightweight open-source model for generating manga

Project
The creator of an open-source model for generating manga shared their approach and results on Reddit. They fine-tuned PixArt-Sigma on 20 million manga images and resolved character consistency issues by using embeddings from a pre-trained manga character encoder. While the model runs smoothly on consumer GPUs and can generate detailed black-and-white manga art, it struggles with clothing consistency, hand rendering, and scene consistency. The response to the model was overwhelmingly positive, with Reddit users praising the ability to control image composition and the impressive results given the model’s size. Some users expressed curiosity about future developments and potential improvements in capturing scenery and viewpoint as embeddings.
27 comments
Posted in r/SQL by u/jbnpoc on 4/8/2025
93

Got stumped on this interview question

Discussion
The thread discussed a SQL-based interview question about modifying a dataset's structure. The most upvoted responses recommended leveraging LEAD or LAG window functions to mark the first and last rows of each range, and then summarizing outside of a Common Table Expression (CTE). One user provided code, identifying the issue as a gaps and islands problem. A helper column was suggested to assign a ChangeID to each row where the ChangeID would increment each time there's a change in values. A humorous comment noted disapproval of the date format used. Overall, the sentiment was constructive with users offering helpful advice and solutions.
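The ChangeID approach described above can be sketched roughly as follows. This is a minimal illustration, not the code posted in the thread: the `readings` table, its `day` and `status` columns, and the sample rows are all hypothetical, and the query runs through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer).

```python
import sqlite3

# Hypothetical data: one row per day with a status value (not the thread's table).
rows = [
    ("2025-04-01", "A"),
    ("2025-04-02", "A"),
    ("2025-04-03", "B"),
    ("2025-04-04", "B"),
    ("2025-04-05", "A"),
]

QUERY = """
WITH flagged AS (
    SELECT day, status,
           -- 1 when the value differs from the previous row (or no previous row exists)
           CASE WHEN status = LAG(status) OVER (ORDER BY day) THEN 0 ELSE 1 END AS is_change
    FROM readings
),
grouped AS (
    SELECT day, status,
           -- running total of change flags: increments each time the value changes
           SUM(is_change) OVER (ORDER BY day) AS change_id
    FROM flagged
)
SELECT status, MIN(day) AS start_day, MAX(day) AS end_day
FROM grouped
GROUP BY change_id, status
ORDER BY start_day
"""


def find_islands(data):
    """Load the rows into an in-memory table and collapse consecutive runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (day TEXT, status TEXT)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", data)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


islands = find_islands(rows)
```

Each consecutive run of the same status collapses to one row with its first and last day, so the second run of "A" stays separate from the first instead of being merged by a plain GROUP BY on status.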
58 comments
Posted in r/MachineLearning by u/igorsusmelj on 4/10/2025
65

[P] B200 vs H100 Benchmarks: Early Tests Show Up to 57% Faster Training Throughput & Self-Hosting Cost Analysis

Project
Independent benchmarks by Lightly AI reveal Nvidia B200 GPUs provide up to 57% higher training throughput than H100s in computer vision model training workloads. From a cost angle, self-hosted B200s could potentially be 6x-30x cheaper than typical cloud H100 instances, though this heavily relies on utilization, energy costs, and amortization. Some users express skepticism about the benchmarks, citing potential errors and unoptimized testing parameters. Despite this, the community appreciates the insights, with interest in exploring the advanced capabilities of the B200, especially in enterprise-grade hardware comparison and batched inference. Overall, sentiment is mixed with both excitement and skepticism present.
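To see why the self-hosting estimate swings so widely with utilization, energy costs, and amortization, a rough cost-per-GPU-hour model helps. The sketch below is an assumption-laden illustration, not the post's methodology: the function name, the formula, and every input value are hypothetical placeholders, and it ignores hosting, networking, and staffing costs.

```python
def self_hosted_cost_per_gpu_hour(purchase_price, amortization_years,
                                  utilization, power_kw, energy_price_per_kwh):
    """Rough effective cost of one *busy* GPU hour on owned hardware."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    hours_owned = amortization_years * 365 * 24
    # Hardware cost is spread over the hours the GPU is actually busy.
    hardware = purchase_price / (hours_owned * utilization)
    # Simplification: power draw is only counted while the GPU is busy.
    energy = power_kw * energy_price_per_kwh
    return hardware + energy


# Placeholder inputs only; none of these figures come from the benchmark post.
for utilization in (0.25, 0.5, 1.0):
    cost = self_hosted_cost_per_gpu_hour(
        purchase_price=40_000, amortization_years=4,
        utilization=utilization, power_kw=1.0, energy_price_per_kwh=0.15,
    )
    print(utilization, cost)
```

Under this model, halving utilization doubles the hardware share of each busy hour, which is one reason a single "Nx cheaper" figure depends so heavily on how busy the machines are kept.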
5 comments
Posted in r/SQL by u/PalindromicPalindrom on 4/9/2025
58

Why am I struggling with SQL?

PostgreSQL
Struggling with SQL is a common issue among beginners. Many users suggested breaking complex problems into smaller logical steps before writing any code, an approach that matters in programming generally, not just SQL. Commenters compared the learning process to mastering a skateboard trick or playing a musical instrument and emphasized that practice is key. Some users highlighted the importance of understanding real-world context in practice questions, suggesting that changing learning methods or visualizing data can help. Lastly, it was recommended to stop thinking procedurally and start thinking declaratively: describe the output you want and let the SQL query produce it. The sentiment is encouraging, reminding beginners that the struggle is a normal part of the learning process.
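The procedural-versus-declarative advice is easier to see side by side. The sketch below is a minimal illustration using Python's built-in `sqlite3` module. The `orders` table and its rows are hypothetical and do not come from the thread.

```python
import sqlite3

# Hypothetical orders table, used only for illustration.
orders = [("alice", 30), ("bob", 20), ("alice", 15), ("carol", 5), ("bob", 10)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)

# Procedural habit: fetch every row and spell out HOW to accumulate the totals.
procedural = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    procedural[customer] = procedural.get(customer, 0) + amount

# Declarative: state WHAT result is wanted and let the engine work out the steps.
declarative = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
))
conn.close()
```

Both produce the same totals per customer, but the second version says nothing about loops or intermediate state, which is the shift in thinking the commenters recommend.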
50 comments
