
data subtldr week 3 year 2025

r/MachineLearning, r/dataengineering, r/sql

Life as a Data Engineer: Expectations vs Reality, Humor in Data and Dating Engineering, Debunking Python Techniques, Pros and Cons of the SELECT * Command in SQL, Resource Sharing for SQL Practice, The Limitations and Defense of Cosine Similarity, Fixes in Microsoft's Phi-4 Model, and the Love-Hate Relationship with Softmax in Machine Learning

Week 3, 2025
Posted in r/dataengineering by u/dan_the_lion, 1/18/2025
699

Life of a Data Engineer

Meme
The comments on 'Life of a Data Engineer' reflect both the challenges and the humor of the profession. Top comments highlight the gap between job expectations and reality: the promised coding and data visualization often give way to mundane tasks such as copying numbers into a Word document. Some commenters joke sarcastically about finding scary numbers in spreadsheets and having feelings about certain records. Another remark points to the disconnect between actual data sources and business expectations. Overall, participants express a mix of frustration, humor, and resignation.
31 comments
Posted in r/MachineLearning by u/skeltzyboiii, 1/13/2025
437

[R] Cosine Similarity Isn't the Silver Bullet We Thought It Was

Research
The thread discusses the limitations of cosine similarity in machine learning models, prompted by a study from Netflix and Cornell University. Users noted that cosine similarity is meaningful when paired with specific loss functions but is not universally applicable. Some criticized the thread's title as sensationalist and stressed that the similarity metric should be chosen to suit the embedding space. Others suggested alternatives such as Euclidean distance, dot products, or normalization techniques. A few users defended cosine similarity as a reasonable default that works well under certain conditions. Sentiment was mixed overall, with a shared call for task-specific evaluation to ensure robustness.
50 comments
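The point that different similarity metrics can rank the same embeddings differently is easy to see on toy vectors. The following is a minimal sketch in plain Python, not code from the thread or the study; the vectors `a`, `b`, `c` and the helper names are hypothetical and chosen only for illustration.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Hypothetical 2-d embeddings (assumed values, for illustration only).
a = [1.0, 2.0]
b = [10.0, 20.0]  # same direction as a, ten times longer
c = [2.0, 1.0]    # same length as a, different direction

# Cosine similarity ignores length, so b looks like a perfect match for a.
cos_ab = cosine_similarity(a, b)  # about 1.0
cos_ac = cosine_similarity(a, c)  # about 0.8

# Euclidean distance is sensitive to length, so c is the nearer neighbor.
dist_ab = euclidean_distance(a, b)
dist_ac = euclidean_distance(a, c)

# The raw dot product rewards length and favors b even more strongly.
dot_ab = dot(a, b)
dot_ac = dot(a, c)

print(f"cosine:    a~b={cos_ab:.3f}  a~c={cos_ac:.3f}")
print(f"euclidean: a~b={dist_ab:.3f}  a~c={dist_ac:.3f}")
print(f"dot:       a~b={dot_ab:.1f}  a~c={dot_ac:.1f}")
```

Here cosine similarity and the dot product both pick `b` as the closest match to `a`, while Euclidean distance picks `c`. Which answer is right depends on whether vector length carries meaning in the embedding space, which is why commenters asked for task-specific evaluation.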
Posted in r/MachineLearning by u/danielhanchen, 1/15/2025
299

[P] How I found & fixed 4 bugs in Microsoft's Phi-4 model

Project
The thread discusses how user 'danielhanchen' found and fixed four bugs in Microsoft's Phi-4 model. Users thanked 'danielhanchen' and 'Unsloth', an open-source project he maintains with his brother, for improving the model's output, and asked for a more detailed walkthrough of the debugging process. Some asked whether the fixes apply to 'ollama' and whether 'Unsloth' might publish a 128k release. Sentiment was highly positive, with praise for his contributions to open-source AI and excitement about future developments.
27 comments
Posted in r/dataengineering by u/eternviking, 1/17/2025
266

data engineering? try dating engineering...

Meme
The thread 'data engineering? try dating engineering...' is lighthearted, and its top comments mix humor with personal experience. User 'PuffcornSucks' quips that any site can become a dating site with the right mindset. 'Bjogc42069' jokes that his wife doesn't understand his job, which leads to a Sahara-like situation whenever he tries to explain it, though he considers this a healthy dynamic. 'Foodwithfloyd' tells the story of meeting his wife while fixing her dashboard. A comment from 'Ekkaia153' about good documentation attracting his fiancée struck a chord with readers. Overall, the thread combines humor with candid discussion of dating and professional life.
20 comments
Posted in r/MachineLearning by u/Sad-Razzmatazz-5188, 1/18/2025
214

[D] I hate softmax

Discussion
The thread '[D] I hate softmax' contains a mix of sentiments, but most comments defend softmax. 'BinaryOperation' mentions a recent paper on numerical stability issues with softmax. 'Ulfgardleo' gives a detailed defense of its properties, describing it as a generalization of the sigmoid function. 'Matthyze' shares a learning experience involving cross-entropy loss and softmax. 'mccl30d' suggests looking at entmax, which produces sparse outputs. Others, including 'SmolLM' and 'XYcritic', argue that the identified 'problems' are intentional mathematical properties that are needed for numerical stability in deep learning.
82 comments
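Two of the properties raised in the comments, the link to the sigmoid function and numerical stability, can be checked in a few lines. The following is a minimal sketch in plain Python rather than code from the thread; the function names and input values are assumptions made for illustration, and the max-subtraction step is the standard stabilization trick, not something attributed to a specific commenter.

```python
import math

def softmax(logits):
    """Softmax with the maximum subtracted first to avoid overflow in exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical logits (assumed values, for illustration only).
probs = softmax([2.0, 1.0, 0.1])

# With two classes and the second logit fixed at 0, softmax reduces to sigmoid.
two_class = softmax([1.5, 0.0])[0]

# Adding a constant to every logit leaves the output unchanged. Without the
# max subtraction, math.exp(1002.0) would raise OverflowError.
shifted = softmax([1002.0, 1001.0, 1000.1])

print("probabilities:", [round(p, 4) for p in probs])
print("two-class softmax vs sigmoid:", two_class, sigmoid(1.5))
print("shifted logits:", [round(p, 4) for p in shifted])
```

The shift invariance shown in the last step is what makes the max-subtraction trick safe: the stabilized version computes the same probabilities while keeping every exponent at or below zero.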
Posted in r/dataengineering by u/analyticsvector_, 1/14/2025
184

Python for Data Engineers: Key topics & techniques 👇

Blog
The thread 'Python for Data Engineers: Key topics & techniques' drew critical responses. Commenters generally agreed that the post was full of jargon and buzzwords, which made it unhelpful for beginners. Some objected to how the post represented a data engineer's use of Python, and a few used sarcasm to point out how confusing it was. One user with over 15 years of experience advised beginners to ignore the post because it was misleading. Others alleged that it was posted for promotion and karma farming. Overall, sentiment towards the content was negative.
41 comments