
data subTLDR week 13 year 2025

r/MachineLearning, r/dataengineering, r/SQL

Unwrapping the Perils of Minor Schema Changes, Decoding Databases Beyond Metadata, Discovering Live Index Features, and Navigating the Maze of Vendor Benchmarks

Posted in r/dataengineering by u/Adela_freedom • 3/27/2025
877

It's just a small schema change

Meme
The discussion revolved around the idea of simplifying database schemas by using a table with an ID and a JSON field, effectively reducing the need for schema changes. This concept, while seen as valid for handling raw data in specific scenarios, also triggered criticisms, such as the inconvenience of working with uppercase column names in SQL statements. The comparison to MongoDB sparked a debate, with some suggesting that PostgreSQL handles JSON better. The conversation also highlighted the repercussions of renaming a column in certain workplaces, indicating an overall mixed sentiment towards the concept.
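As a rough illustration of the ID-plus-JSON idea described above, the sketch below uses Python's built-in `sqlite3` and `json` modules. The `events` table, its `payload` column, and the sample records are hypothetical and were not taken from the thread.

```python
import json
import sqlite3

# Hypothetical table: one ID column and one JSON payload stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")

# Two records with different shapes; neither needs an ALTER TABLE.
records = [
    {"user": "ada", "amount": 12.5},
    {"user": "bob", "amount": 3.0, "coupon": "SPRING"},  # new field, no migration
]
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [(json.dumps(r),) for r in records],
)

# The trade-off: the database no longer enforces structure,
# so every reader has to handle missing keys itself.
rows = [
    json.loads(payload)
    for (payload,) in conn.execute("SELECT payload FROM events ORDER BY id")
]
coupons = [row.get("coupon") for row in rows]
print(coupons)
```

The schema never changes, but the burden of knowing which fields exist moves from the database to every query and consumer, which is roughly where the criticism in the thread comes from.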
30 comments
Posted in r/SQL by u/Adela_freedom • 3/27/2025
681

It's just a small schema change

Discussion
The consensus among respondents is that seemingly minor schema changes, such as replacing null values with zeros, can have significant unintended consequences, especially when made without a thorough understanding of the dataset and the calculations that depend on it. Commenters noted that aggregates such as averages come out lower once nulls become zeros, and that calculated columns and conditional statements relying on null values can break. The timing of such changes was humorously critiqued, with suggestions to make them right before weekends or vacations, underscoring the potential for havoc. The overall message is that such changes must be approached with caution.
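The effect on averages can be seen in a small sketch using Python's built-in `sqlite3` module. The `readings` table and its values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import sqlite3

# Hypothetical table and values, used only to illustrate the effect.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany(
    "INSERT INTO readings (score) VALUES (?)",
    [(10.0,), (20.0,), (None,), (None,)],
)

# AVG skips NULLs, so only the two known scores are averaged: (10 + 20) / 2.
avg_with_nulls = conn.execute("SELECT AVG(score) FROM readings").fetchone()[0]

# The "small" change: replace NULL with 0.
conn.execute("UPDATE readings SET score = 0 WHERE score IS NULL")

# Now all four rows count: (10 + 20 + 0 + 0) / 4.
avg_with_zeros = conn.execute("SELECT AVG(score) FROM readings").fetchone()[0]

# Conditions that relied on NULL meaning "missing" no longer match anything.
missing_rows = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE score IS NULL"
).fetchone()[0]

print(avg_with_nulls, avg_with_zeros, missing_rows)  # 15.0 7.5 0
```

The same data yields an average of 15.0 before the change and 7.5 after it, and any `IS NULL` check silently stops finding the rows it was written for.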
22 comments
Posted in r/dataengineering by u/rmoff • 3/27/2025
544

Yet another vendor with their benchmark blog…

Meme
The thread discusses skepticism towards vendor benchmark blogs, with the highest upvoted comment criticizing unrealistic and potentially misleading benchmarks used by companies. Many comments suggest these benchmarks may be designed more for sales pitches than for informed decision-makers. However, one commenter, likely a vendor, concedes that while benchmarks may not be perfect, they didn't optimize, cook or cherry-pick their data. The sentiment reflects a mix of skepticism and understanding that these benchmarks are part of the tech industry's game. There's also a call for transparency, such as showing both out-of-the-box and optimized performance data.
9 comments
Posted in r/MachineLearning by u/l_veera • 3/24/2025
150

[D] ICML 2025 review discussion

Discussion
Many comments expressed frustration with the review process for the International Conference on Machine Learning (ICML) 2025, citing reviewers who did not thoroughly read submissions, the pressure to get into A* conferences, and the handling of resubmissions with improvements. However, there was also a sense of resilience and hope, with some contributors vowing to continue despite setbacks and others wishing everyone the best. A few users raised specific concerns about reviews generated by large language models (LLMs) and the ambiguity of the term AoE (Anywhere on Earth).
271 comments
Posted in r/MachineLearning by u/hiskuu • 3/29/2025
139

[R] Anthropic: On the Biology of a Large Language Model

Research
The discussion focuses on Anthropic's research on Claude 3.5 Haiku, a language model that exhibits complex capabilities such as multi-step reasoning and planning in poems. The model's use of language-independent circuits and ability to generalize addition in different contexts were noted. It also identifies potential medical diagnoses based on symptoms. However, issues like 'circuit misfires' causing hallucinations, adherence to harmful instructions, and concealment of secret goals raised concerns. Overall, participants expressed enthusiasm for the step towards model interpretability, while acknowledging potential risks. The sentiment was predominantly positive, with a keen interest in further breakthroughs.
36 comments
Posted in r/MachineLearning by u/jacobfa • 3/28/2025
102

[D] How Do You Make Your Published Plots Look So Good?

Discussion
The consensus among professionals in data visualization is that creating high-quality plots requires time, effort, and practice, with customization being key. There's a view that defaults in plotting libraries can be perceived as low-quality, despite being based on significant research. Suggestions include using libraries like matplotlib, Penrose, TikZ, Plotly, Seaborn, and even PowerPoint for creating visuals. Some also highlight the role of graphic designers at larger companies. Furthermore, interactive plots, such as those from Plotly, are seen as more useful. Attention to details like figure size, fonts, and spacing can significantly affect final appearance. The sentiment is generally positive, emphasizing learning and practice.
33 comments
Posted in r/SQL by u/Interesting_Rip_223 • 3/26/2025
66

Am I Stupid? Why does everyone think metadata is the answer for understanding a database?

SQL Server
Many Redditors empathize with the frustration of understanding a database through metadata alone, especially when documentation is subpar. While metadata is crucial for pinpointing specific columns, it falls short in explaining how tables interconnect, leading to inefficiencies like excessive row scanning. Many attribute this to early-stage design decisions that become more difficult to refactor over time and the under-prioritization of comprehensive documentation due to perceived low ROI. Some proposed solutions include personally driving change, learning from SQL clauses in important queries, and adopting the perspective that if something isn't logical, it's political. The overall sentiment is a mix of frustration and acceptance.
38 comments