data subtldr week 33 year 2024

r/MachineLearning, r/dataengineering, r/sql

Debating the Value of Acquisitions vs Revenue, The Struggle with Job Monotony in the Mature Stage, Diverse Opinions on Advanced SQL and CTE Use, Updates on OpenCL Backend for PyTorch, HuggingFace Transformers Design Controversy, and Classifying Blueberries Using Machine Learning

Week 33, 2024
Posted in r/dataengineering by u/turboline-ai, 8/15/2024
263

I was shocked when I read this. Is the rev vs. acquisitions price true?

Discussion
The Reddit thread discusses the acquisition price of a company with a revenue of only $1M. Users speculate that the purchase was driven by Databricks wanting to control the data lake format wars and make a splash in the market against Snowflake. They also suggest that it was a strategic move to acquire the team behind the Iceberg project, an alternative to Delta Lake. Some comments mention that the acquisition wasn't about revenue but intellectual property and branding. Others argue that the purchase might have been overvalued given the community-driven success of the Iceberg project. The sentiment is mixed, reflecting a range of views on the acquisition's valuation and strategic significance.
53 comments
Posted in r/dataengineering by u/ntdoyfanboy, 8/15/2024
234

I get bored once we reach the "mature" stage. Help.

Career
The Reddit thread discusses the boredom experienced by a data engineer after reaching the mature stage of their job. Key suggestions from commenters include changing jobs every 3-4 years to maximize salary growth and job satisfaction, and going into consulting. Many agree that building systems from scratch is more interesting than maintenance. The author also highlights the importance of choosing companies with guaranteed cash sign-on grants instead of private options. The thread reflects a sentiment of dissatisfaction with the monotony of maintenance and a desire for constant innovation and excitement. Some users suggest a shift towards more leadership roles or consulting as a potential solution.
58 comments
Posted in r/MachineLearning by u/artyombeilis, 8/17/2024
135

[P] Updates on OpenCL backend for Pytorch

Project
The Reddit thread discusses updates to the OpenCL backend for PyTorch by author 'artyombeilis'. The updates include support for PyTorch 2.4, easy installation with prebuilt packages for Linux and Windows, and several other improvements. Performance is described as reasonable, usually around 60-70% for training and 70-80% for inference, apparently relative to native NVIDIA performance. The author explains that the convolution and matrix multiplication kernels aren't as efficient as ones written by NVIDIA developers. OpenCL was chosen over Vulkan for cross-platform GPU computing, easier conversion of existing CUDA kernels, and the author's familiarity with OpenCL. The thread has a supportive sentiment, with users praising the work and expressing interest in contributing.
29 comments
Posted in r/MachineLearning by u/duffano, 8/16/2024
130

[D] HuggingFace transformers - Bad Design?

Discussion
The Reddit thread discusses concerns about the HuggingFace transformers library: while it is convenient for loading and sharing models, its design is seen as problematic. Users reported difficulties due to unclear documentation and an inconsistent API. Some argued the library has become bloated and ill-suited to complex use cases, though efficient for simpler ones. Others observed that the library seems to carry significant technical debt from rapid growth, with many arguments added repetitively. Some suggested using PyTorch for model development and others recommended alternatives like torchtune, llama, and vLLM. A few users defended HuggingFace, acknowledging its contributions to the ML field and its evolution. Overall, sentiment was mixed, with appreciation for its simplicity but criticism of its lack of robustness for complex tasks.
50 comments
Posted in r/MachineLearning by u/whiterosephoenix, 8/13/2024
121

[R] Trying to classify Blueberries as "Crunchy", "Juicy" or "Soft" using Acoustic Signal Processing and Machine Learning

Research
The Reddit post discusses a researcher's attempt to classify blueberries as crunchy, juicy, or soft using acoustic signal processing and machine learning. The researcher struggles to classify juicy berries with the current method. The top comments suggest several solutions such as using statistical features of the spectrogram, unsupervised clustering algorithms, or manually labelling a subset of the data and applying semi-supervised learning. Some commenters also humorously express concern about the ethical implications of squishing berries, suggesting recording bouncing sounds instead. The overall sentiment of the thread is positive, with users showing enthusiasm for the novelty and potential applications of the study.
35 comments
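
Two of the suggestions in the blueberry thread, statistical features of the spectrogram and unsupervised clustering, can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption for illustration, not the researcher's pipeline: the clips are synthetic tones standing in for recordings, the only features are the mean and standard deviation of the spectral centroid, and the helper names (`make_clip`, `spectral_features`, `kmeans`) are hypothetical. A real attempt would more likely use NumPy/SciPy and richer features.

```python
import cmath
import math
import random


def make_clip(freq_hz, rng, sample_rate=8000, n_samples=512):
    """Synthetic stand-in for a recording: a noisy tone (assumption, not real berry audio)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) + rng.gauss(0, 0.02)
            for n in range(n_samples)]


def spectrogram(signal, frame=64, hop=32):
    """Magnitude spectrogram from a naive DFT over Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame - 1)) for n in range(frame)]
    frames = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame)]
        frames.append([
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame)
                    for n in range(frame)))
            for k in range(frame // 2 + 1)
        ])
    return frames


def spectral_features(signal, sample_rate=8000):
    """Return (mean, std) of the per-frame spectral centroid, in Hz."""
    spec = spectrogram(signal)
    n_bins = len(spec[0])
    centroids = []
    for mags in spec:
        total = sum(mags) or 1e-12  # guard against an all-zero frame
        centroid_bin = sum(k * m for k, m in enumerate(mags)) / total
        centroids.append(centroid_bin * sample_rate / (2 * (n_bins - 1)))
    mean = sum(centroids) / len(centroids)
    var = sum((c - mean) ** 2 for c in centroids) / len(centroids)
    return (mean, math.sqrt(var))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


rng = random.Random(0)
# Three hypothetical texture classes, four clips each, with different dominant frequencies.
clips = [make_clip(freq + rng.uniform(-50, 50), rng)
         for freq in (500, 1500, 3000) for _ in range(4)]
features = [spectral_features(clip) for clip in clips]
labels = kmeans(features, k=3)
print(labels)
```

The cluster indices are arbitrary, so mapping them to "crunchy", "juicy", or "soft" would still need a few hand-labelled clips, which is where the thread's semi-supervised suggestion comes in.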