data subtldr week 6 year 2025

r/MachineLearning r/dataengineering r/sql

Frustrations over Fabric migration, Documenting hundreds of databases effectively, Favorite VSCode extensions, Learning SQL and Python for career progression, Essential value of SQL in work, Software tools for creating neural net architectures, and Importance of RNNs in Machine Learning

Week 6, 2025
Posted in r/dataengineering by u/Ok_Decision_5878 2/4/2025
502

Considering resigning because of Fabric

Help
In the Reddit thread 'Considering resigning because of Fabric', user Ok_Decision_5878 describes their frustration with their company's decision to replace Databricks, Snowflake, and Collibra with Fabric. The decision was made against expert advice and has increased both cost and time, with the migration still unfinished after a year. Commenters sympathize with the original poster and advise them to look for a new job. They criticize corporate decision-making and Microsoft's strategy, and they express doubts about Fabric's capabilities. Some commenters confirm issues with Fabric, calling it unsuitable for serious development and unfit for companies with mature platforms. A general sentiment of dissatisfaction and frustration prevails.
129 comments
Posted in r/MachineLearning by u/jacobfa 2/7/2025
353

[R] It Turns Out We Really Did Need RNNs

Research
The Reddit thread '[R] It Turns Out We Really Did Need RNNs' discusses a paper on the importance of recurrent neural networks (RNNs) in machine learning for efficient approximation and rapid convergence. Reddit users generally found the study intriguing, though some pointed out that RNNs are not the only solution for iterative refinement. The discussion also explored the similarity between autoregressive methods and the properties of RNNs. The paper's author acknowledged feedback, including a suggestion to consider the computational cost per iterative step for a clearer comparison of different methods. Some comments veered toward humor, but overall the thread was a substantive discussion of the role of RNNs in machine learning.
21 comments
Posted in r/MachineLearning by u/danielhanchen 2/7/2025
263

[P] GRPO fits in 8GB VRAM - DeepSeek R1's Zero's recipe

Project
The Reddit thread discusses user danielhanchen's successful implementation of GRPO in under 8GB of VRAM. The work, which uses Unsloth and LoRA/QLoRA, has caused excitement within the machine learning community, with users commending it and expressing interest in contributing to or interning with the project. The discussion also clarifies how GRPO represents a paradigm shift from previous fine-tuning or QLoRA techniques, and how GRPO can be optimized using QLoRA and LoRA without the need for more memory-intensive full fine-tuning. Some users shared plans to try the setup with different models, while others asked about specific features, indicating active community engagement.
38 comments
Posted in r/dataengineering by u/tiny-violin- 2/7/2025
145

How do companies with hundreds of databases document them effectively?

Discussion
The Reddit thread discusses how companies with hundreds of databases document them effectively. Some commenters joke that many companies don't document their databases effectively at all. A serious suggestion is to use a data catalog, adapted to the company's specific needs and continually improved, with each entity's owner responsible for maintaining its entry. The same suggestion notes that, despite the challenge, 20% of the engineering department uses the platform monthly. A few users recommend OpenMetadata as a solution, praising its simplicity, native data quality features, and supportive open-source community. Other comments stress that managing a data catalog successfully requires proper data governance, which is difficult because of data silos and the disconnect between IT and business.
83 comments
Posted in r/dataengineering by u/bottlecapsvgc 2/6/2025
134

What are your favorite VSCode extensions?

Discussion
The Reddit thread 'What are your favorite VSCode extensions?' gathered a range of responses from the community. Top suggestions included Markdown Preview, Create Terminal Here, SQL Beautify, Remote Explorer, TODOs, and Notes. dbt Power User, Spell Checker, Git Graph, and Rainbow CSV were also mentioned frequently. A few users praised Jupyter for prototyping work, GitLens for code completions, and Cody from Sourcegraph. Some unique mentions included Draw.io, VS Code Pets, and Data Wrangler. Overall, the sentiment was highly positive, with users sharing their favorite tools for optimizing their work in VSCode.
71 comments