data subtldr week 8 year 2025

r/MachineLearning • r/dataengineering • r/sql

Debating Elon Musk's Data Competence, Fairness in Data Engineering Salaries, Empathy for Laid-Off Data Engineers, AI in SQL Generation, SQL's Role in Debunking Social Security Myths, Skepticism Towards Large Language Models

February 23, 2025 • Week 8, 2025

Posted in r/dataengineering by u/madredditscientist • 2/17/2025

2263

Welcome to data engineering, Elon!

Meme

The thread "Welcome to data engineering, Elon!" features lively debate over Elon Musk's interpretation of data. Several top comments are skeptical of his understanding of the Social Security database. They question whether a single isDead field even exists and doubt his claims in the absence of more transparency. Others criticize Musk's lack of accountability and what they see as attempts to erode trust. One comment notes that of the 19 million people over 100 without a verified death record, only 44,000 were actually drawing Social Security payments, which suggests Musk's reading of the data could be misleading. Overall, the thread is largely critical of Musk's data engineering competence.
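The commenter's point depends on the difference between "has no death record" and "is currently being paid." Counting the first group alone overstates the second. The sketch below illustrates this under stated assumptions. It uses a hypothetical two-table schema (`people`, `payments`) and toy rows in an in-memory SQLite database. This is not the real SSA schema, which the thread notes is not public. The two counts come from a filter alone versus a join against active payments.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT the actual SSA database layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (
    ssn        TEXT PRIMARY KEY,
    birth_date TEXT NOT NULL,   -- ISO 8601 date string
    death_date TEXT             -- NULL when no death has been recorded
);
CREATE TABLE payments (
    ssn    TEXT REFERENCES people(ssn),
    active INTEGER NOT NULL     -- 1 if benefits are currently being paid
);
""")

# Toy data: no one has a recorded death date.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("001", "1890-05-01", None),
        ("002", "1901-07-12", None),
        ("003", "1910-01-30", None),
        ("004", "1915-03-03", None),
        ("005", "1950-08-20", None),  # under 100, excluded by the age filter
    ],
)
conn.executemany("INSERT INTO payments VALUES (?, ?)", [("004", 1), ("005", 1)])

# Born on or before this date => at least 100 years old on 2025-02-23.
# ISO date strings compare correctly as plain text.
CUTOFF = "1925-02-23"

# Naive count: centenarians with no death record.
no_death_record = conn.execute(
    "SELECT COUNT(*) FROM people WHERE death_date IS NULL AND birth_date <= ?",
    (CUTOFF,),
).fetchone()[0]

# Joined count: only those centenarians who are actually receiving payments.
currently_paid = conn.execute(
    """
    SELECT COUNT(DISTINCT p.ssn)
    FROM people AS p
    JOIN payments AS pay ON pay.ssn = p.ssn AND pay.active = 1
    WHERE p.death_date IS NULL AND p.birth_date <= ?
    """,
    (CUTOFF,),
).fetchone()[0]

print(no_death_record, currently_paid)  # 4 1
```

The gap between the two numbers is the same kind of gap the comment describes. A missing death record is a data-quality issue, not evidence of payments.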

278 comments

Posted in r/dataengineering by u/Turbulent_Web_8278 • 2/19/2025

948

Startup wants all these skills for $120k

Discussion

The thread "Startup wants all these skills for $120k" sparked a discussion about whether the stated salary is fair for the required skill set. Most commenters agreed that $120k is reasonable for the listed skills, which they considered typical for early- to mid-career data engineers. Several users pointed out that companies often don't expect candidates to have every listed skill. One shared that their team hired a candidate who had only one of five required skills, valuing their honesty and potential for growth. Another compared the offer with Spain, where salaries are much lower. Overall, sentiment toward the salary offer was largely positive.

346 comments

Posted in r/dataengineering by u/EccentricTiger • 2/18/2025

661

I've got a solid LATAM DE about to get laid off

Help

In this thread, u/EccentricTiger (the OP) expresses concern for a skilled Latin American data engineer (DE) on their team. The DE is about to be laid off because the company is unprofitable. The DE is proficient in Python, Spark, Airflow, Databricks, Elasticsearch, and Node.js. Many users praised the OP's effort to help the DE find a new job, and some offered job-search advice and potential opportunities. One user recalled their father doing something similar for laid-off employees, and another joked that they might be the DE in question. Overall, the thread conveys empathy, admiration for the DE's skills, and appreciation for the OP's efforts.

43 comments

Posted in r/MachineLearning by u/Successful-Western27 • 2/18/2025

188

[R] Evaluating LLMs on Real-World Software Engineering Tasks: A $1M Benchmark Study

Research

The thread discusses a new benchmark for evaluating large language models (LLMs) on real-world software engineering tasks. The benchmark draws on Upwork freelance jobs and includes both coding tasks and management decisions. Commenters are largely skeptical of LLMs' current capabilities. One notes that LLMs have not improved significantly on problems that require larger context. Another points out that, given the models' error rates, they would have failed to complete tasks worth $300k to $400k in a real setting. Some users thank the OP for sharing the study, while others caution against taking the summary at face value without reading the full paper.

28 comments

Posted in r/MachineLearning by u/jsonathan • 2/20/2025

128

[D] What is the future of retrieval augmented generation?

Discussion

The thread discusses the future of retrieval-augmented generation (RAG). Sentiment leans skeptical, and commenters describe current RAG pipelines as inelegant and possibly headed for obsolescence. Several alternatives are proposed, and cache-augmented generation (CAG) is mentioned as a more efficient approach. Some comments question whether the "lost in the middle" effect still matters for larger models. This is the tendency of models to underuse information placed in the middle of a long context. Others discuss the practical difficulty of applying RAG to document collections with complex hierarchies and permissions. Overall, users seem to favor strategies that optimize retrieval at inference time. They still recognize RAG's benefits for managing large document sets. The thread is marked by lively debate about how language models will evolve.
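The thread names document permissions as one practical pain point for RAG. The sketch below is a minimal illustration of one common mitigation: filter by access rights before ranking, so restricted text never reaches the prompt. It rests on several assumptions. Each document carries a hypothetical group-based ACL (`allowed_groups`), and a toy word-overlap score stands in for embedding or BM25 similarity. All names (`Doc`, `retrieve`, `build_prompt`) are made up for illustration and do not belong to any particular framework.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # hypothetical ACL: groups permitted to read this doc


def score(query: str, text: str) -> int:
    # Toy relevance score: number of shared lowercase terms.
    # A real system would more likely use embeddings or BM25.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, docs: list[Doc], user_groups: set[str], k: int = 2) -> list[Doc]:
    # Enforce permissions BEFORE ranking so restricted text can never be selected.
    visible = [d for d in docs if d.allowed_groups & user_groups]
    ranked = sorted(visible, key=lambda d: score(query, d.text), reverse=True)
    # Keep the top-k documents that share at least one term with the query.
    return [d for d in ranked[:k] if score(query, d.text) > 0]


def build_prompt(query: str, hits: list[Doc]) -> str:
    # Concatenate retrieved passages, tagged with their ids, ahead of the question.
    context = "\n\n".join(f"[{d.doc_id}] {d.text}" for d in hits)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"


docs = [
    Doc("hr-1", "salary bands for data engineers", frozenset({"hr"})),
    Doc("eng-1", "airflow dag retry policy for data pipelines", frozenset({"eng"})),
    Doc("eng-2", "spark tuning guide for data engineers", frozenset({"eng", "hr"})),
]

hits = retrieve("data engineers salary", docs, user_groups={"eng"})
print([d.doc_id for d in hits])  # ['eng-2', 'eng-1']
```

For an "eng" user, `hr-1` would be the best lexical match, but it is excluded before ranking. Filtering after ranking would waste top-k slots and risk leaking restricted content into the prompt.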

20 comments

Posted in r/MachineLearning by u/Excellent_Delay_3701 • 2/20/2025

114

[P] Sakana AI released CUDA AI Engineer.

Project

The thread covers a new tool from Sakana AI, the AI CUDA Engineer, which translates PyTorch code into CUDA kernels. Several comments dispute its results. u/Flaky-Ambition5900 claims that the reported speedups come from faulty kernels that compute incomplete results. Responding to questions about performance comparisons, they add that the comparisons against PyTorch are unreliable because they don't verify correctness. Another user, u/iMiragee, criticizes the tool for not benchmarking against state-of-the-art libraries and suggests it is just marketing. Other comments mock the claimed speedups. The overall tone is skeptical and humorous, with no clear support for the tool.
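The core criticism is that a speedup means nothing unless the fast version first produces the same result as the reference. The sketch below shows that ordering. It is written in plain Python rather than CUDA so that it stays self-contained. `reference_sum` stands in for the PyTorch baseline, and `faulty_fast_sum` is a hypothetical buggy "optimized" kernel that processes only half its input. Neither is Sakana's code or benchmark harness. The harness rejects a candidate whose output does not match before it reports any timing ratio.

```python
import math
import time


def reference_sum(xs: list[float]) -> float:
    # Trusted baseline (stands in for the reference PyTorch operation).
    return math.fsum(xs)


def faulty_fast_sum(xs: list[float]) -> float:
    # Hypothetical "optimized" kernel with a bug: it skips half the input,
    # so it looks faster while returning an incomplete result.
    return math.fsum(xs[: len(xs) // 2])


def best_time(fn, xs: list[float], repeats: int = 5) -> float:
    # Best-of-N wall-clock time, to reduce noise from other processes.
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(xs)
        times.append(time.perf_counter() - start)
    return min(times)


def benchmark(candidate, reference, xs: list[float], rel_tol: float = 1e-9):
    # Verify correctness first; return None if the outputs disagree.
    if not math.isclose(candidate(xs), reference(xs), rel_tol=rel_tol):
        return None
    # Only a correct candidate gets a speedup ratio (>1 means faster).
    return best_time(reference, xs) / best_time(candidate, xs)


xs = [float(i) for i in range(100_000)]
print(benchmark(faulty_fast_sum, reference_sum, xs))  # None
print(benchmark(reference_sum, reference_sum, xs) is not None)  # True
```

Measured alone, the faulty version would likely appear about twice as fast. The correctness gate is what exposes it.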

20 comments
