data subtldr week 7 year 2025
r/MachineLearning • r/dataengineering • r/sql
Musk's Misunderstanding of Databases, Debating Work vs. GitHub Profile, Engaging SQL learning game, Machine Learning Research Insights, Potential Transformer Successors
Week 7, 2025
Posted in r/dataengineering by u/Pretend-Algae1445 • 2/11/2025
2851 upvotes
LOL...Elon "Super Genius" Musk doesn't know how Relational Databases work...but will that stop him from running his mouth about how Relational Databases work?
Meme
The thread is broadly skeptical of Elon Musk's understanding of relational databases. Commenters agree that both state and federal governments use SQL extensively; one notes that even legacy systems like the Social Security Number database almost certainly run SQL or a dialect of it. Criticism targets Musk's apparent ignorance of this, with one user suggesting he is under pressure to find fraud where there is none. There is also a quip about deletions in a government database, and a side discussion of the pitfalls of using the SSN as a unique key in large databases (illustrated in the sketch below).
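To make the unique-key point concrete, here is a minimal sketch using Python's built-in sqlite3 module; the schema and names are hypothetical, not taken from the thread. The usual fix is a surrogate primary key, with the SSN stored as a plain, nullable, indexed attribute:

```python
import sqlite3

# Hypothetical schema: SSNs can be missing, mistyped, or reused, so they
# make a poor primary key. A surrogate key stays unique no matter what.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,  -- surrogate key: always unique
        ssn       TEXT,                 -- nullable: not everyone has one on file
        name      TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_person_ssn ON person (ssn)")

# Duplicate or missing SSNs no longer break inserts.
conn.executemany(
    "INSERT INTO person (ssn, name) VALUES (?, ?)",
    [("078-05-1120", "Alice Example"),  # a famously reused test SSN
     ("078-05-1120", "Bob Example"),    # duplicate SSN still inserts
     (None, "Carol Example")],          # missing SSN still inserts
)
for row in conn.execute("SELECT person_id, ssn, name FROM person"):
    print(row)
```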
Posted in r/dataengineering by u/EarthGoddessDude • 2/15/2025
1226 upvotes
Work vs Public GitHub Profile
Meme
The Reddit thread 'Work vs Public GitHub Profile' in r/dataengineering drew several recurring responses. Many commenters prefer paid work to contributing to public GitHub projects, with some flatly noting they don't work for free. There is also disillusionment with corporate jobs over politics and poor management. The importance of a GitHub portfolio is questioned, with several agreeing its significance has diminished, and many prefer to spend personal time away from screens or code. Others, however, treat coding as a hobby, comparable to painting. The common theme is a defense of work-life balance and a call to stop idolizing public contributions.
Posted in r/dataengineering by u/Stochastic_berserker • 2/12/2025
1029 upvotes
Message by message, holding up the world
Meme
The Reddit thread 'Message by message, holding up the world' in r/dataengineering sparked a discussion about which data-handling systems actually carry the load. Many users pointed to the practicality of Excel, calling it highly extensible, portable, and accessible; others suggested SQL and Apache Spark as alternatives. Several mentioned never having touched Kafka despite long careers in tech, and there was a sentiment that big-data streaming is over-represented because of its appeal, not because it is usually necessary. Python came up as well, though the cost of skilled users was noted. Overall, the thread emphasized cost-effective, user-friendly data management tools.
Posted in r/MachineLearning by u/AntelopeWilling2928 • 2/13/2025
262 upvotes
[D] How you do ML research from scratch?
Discussion
The Reddit thread '[D] How you do ML research from scratch?' offers insights into the process of machine learning research. Users suggest it takes time, commitment, and deep domain knowledge: understand the current state of your area of interest, replicate published code and ideas, and keep abreast of the latest papers. A recommended practice is to start from a known working architecture and make small, controlled changes (sketched below). Some users caution against over-indexing on programming skill, urging a focus on fundamental problems instead; independently replicating top papers is deemed valuable, and one user shares a helpful resource, a video by Prof. Kilian Weinberger. The sentiment encourages patience, constant learning, and original thinking.
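As a concrete illustration of the "small, controlled changes" advice, here is a hedged PyTorch sketch: a toy baseline and a variant that differ in exactly one component (the activation function, a hypothetical choice), trained from the same seed on the same stand-in data so any difference is attributable to the change:

```python
import torch
import torch.nn as nn

def make_model(activation: nn.Module) -> nn.Sequential:
    # Known-good baseline shape; only the activation varies between runs.
    return nn.Sequential(nn.Linear(32, 64), activation, nn.Linear(64, 10))

def run_variant(activation: nn.Module) -> float:
    torch.manual_seed(0)                 # identical init and data per variant
    model = make_model(activation)
    x = torch.randn(512, 32)             # stand-in dataset
    y = torch.randint(0, 10, (512,))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(100):                 # short, identical training budget
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

baseline = run_variant(nn.ReLU())        # known working configuration
variant = run_variant(nn.GELU())         # one controlled change
print(f"ReLU baseline: {baseline:.4f}   GELU variant: {variant:.4f}")
```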
Posted in r/MachineLearning by u/jsonathan • 2/15/2025
159 upvotes
[D] What's the most promising successor to the Transformer?
Discussion
The Reddit thread discusses possible successors to the Transformer in machine learning. Mamba and Titans are mentioned as candidate alternatives (along with 'Decepticon', presumably tongue-in-cheek). The JEPA architecture is considered promising, though not a direct successor. Some believe modified transformers, such as those integrating RNN-like components, could be the next generation; others argue future models will likely always include transformer-like components because of their efficiency. The thread also stresses that learning methods matter as much as architecture, citing meta-RL as promising, and points to Meta's Megalodon and Large Concept Models as state of the art on the efficiency front. Sentiment is split between new architectures and improving existing ones.
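For intuition on why models like Mamba come up as successors, here is a toy NumPy sketch of the linear recurrence at the heart of state-space models; the matrices are arbitrary placeholders, and real Mamba adds input-dependent ("selective") parameters and a hardware-aware scan:

```python
import numpy as np

def ssm_scan(x: np.ndarray, A: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """x: (T, d_in) input sequence -> (T, d_out) outputs.

    Each step updates a hidden state h with a linear rule instead of
    attending over all previous tokens, so inference cost is O(1) per
    token rather than O(T) as in attention.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = A @ h + B @ x[t]   # recurrent state update
        ys.append(C @ h)       # readout at each step
    return np.stack(ys)

rng = np.random.default_rng(0)
d_state, d_in, d_out, T = 16, 8, 4, 32
A = 0.9 * np.eye(d_state)               # stable, decaying state (hypothetical choice)
B = rng.normal(size=(d_state, d_in)) * 0.1
C = rng.normal(size=(d_out, d_state)) * 0.1
y = ssm_scan(rng.normal(size=(T, d_in)), A, B, C)
print(y.shape)  # (32, 4)
```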
Posted in r/MachineLearning by u/Vivid-Entertainer752 • 2/11/2025
155 upvotes
[D] Fine-tuning is making big money—how?
Discussion
The Reddit thread '[D] Fine-tuning is making big money—how?' discusses fine-tuning's role in AI companies' revenues. Key points: fine-tuning is needed for tasks like customer service and other consumer-facing projects, where it steers a model toward a specific tone or format, and it matters most for domain-specific tasks and language-specific data. It requires high-quality training data, however, and can cost more than simply calling an API in the short term. Larger companies are likelier to fine-tune, given their mature AI implementations and available talent. Whether fine-tuned models can outperform much larger general models is still debated.
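As a rough illustration of the workflow the thread describes, below is a minimal fine-tuning sketch using Hugging Face transformers with a LoRA adapter via peft; the base model, data, and hyperparameters are all placeholders, not recommendations from the thread:

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Stand-in small model; real tone/format tunes use whatever base fits the budget.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token   # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA freezes the base weights and trains small adapter matrices,
# one way smaller teams keep fine-tuning costs down.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Tiny stand-in for "high quality, domain-specific training data".
examples = [
    "Customer: Where is my order?\nAgent: Happy to help! Let me check that.",
    "Customer: How do I reset my password?\nAgent: Sure thing! Here are the steps.",
]
dataset = Dataset.from_dict({"text": examples}).map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=128))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tone-tune", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```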