data subtldr week 13 year 2024
r/MachineLearning, r/dataengineering, r/sql
Debunking Job Listings Myths, Data Engineering Tools Debate, SQL Learning Resources, Salary Determination in AI, the Downfall of Stability AI
Week 13, 2024
Posted in r/dataengineering by u/WadieXkiller • 3/30/2024
558
Is this chart accurate?
Discussion
The r/dataengineering thread questions the accuracy of a chart of Python packages. The top comments say the list is generally accurate but incomplete. Some users criticize the 'database operations' category, arguing that it is misunderstood and poorly named. Others emphasize that some tools, such as Polars and Pandas, are redundant. Commenters suggest that proficiency in every package isn't required; learning to read documentation and adapt to new tools matters more. Some comments also point out that set-based languages like SQL are necessary. There is also skepticism that a single person could master such a wide array of tools. The overall sentiment is mixed, with some humor and skepticism.
Posted in r/MachineLearning by u/pg860 • 3/25/2024
550
[D] Your salary is determined mainly by geography, not your skill level (conclusions from the salary model built with 24k samples and 300 questions)
Discussion
The Reddit thread discusses a model that predicts data scientist salaries from geography and skill level. Many commenters criticized the model, suggesting that it confused correlation with causation and overlooked factors such as cost of living and the concentration of skilled workers in high-paying areas. Some saw the findings as obvious, pointing out that people often migrate for better pay. One comment suggested outsourcing as a way to capitalize on the wage disparity. The overall sentiment was skeptical of the model's conclusions, emphasizing the complexity of salary determination and the need for rigorous causal inference rather than merely identifying correlations.
Posted in r/MachineLearning by u/we_are_mammals • 3/31/2024
418
WSJ: The AI industry spent 17x more on Nvidia chips than it brought in in revenue [N]
News
The Reddit thread discusses a Wall Street Journal report that the AI industry spent 17 times more on Nvidia chips than it generated in revenue. Users debated the significance of this figure, suggesting that the high initial capital expenditure (CapEx) could be offset by recurring revenue and by exploring new business ventures such as AI. Some users expressed concern about the rapid depreciation of GPUs and the high operating costs of their power consumption. Others pointed to the industry's growth potential, comparing the spending to a factory investment that pays off over time. Despite the challenges and high costs, the sentiment was generally optimistic about the long-term financial soundness of such investments.
Posted in r/MachineLearning by u/milaworld • 3/30/2024
375
[N] How Stability AI’s Founder Tanked His Billion-Dollar Startup
The Reddit thread on the downfall of Stability AI, an early AI juggernaut, centers on the poor business judgment and overspending of its founder, Emad Mostaque, which led to the company's financial woes. Commenters appreciate Stability AI's contributions to the open-source and research communities, particularly its flagship text-to-image generator, Stable Diffusion. However, they criticize the company's lack of a sustainable business model and its imprudent spending, especially on Amazon Web Services for compute. Some suggest the company should have charged companies for its services or invested in its own GPUs. There is also skepticism about Mostaque's vision of AI as our collective intelligence and about the company's reliance on rented AWS GPUs. Commenters also note Mostaque's response to the criticism, in which he admitted his shortcomings as CEO.
Posted in r/dataengineering by u/JoeyWeinaFingas • 3/27/2024
235
Airflow homies be like...
Meme
The r/dataengineering thread titled "Airflow homies be like..." is humorous in tone but also reflects mixed sentiment about Airflow. Some comments express dissatisfaction, citing issues such as pipeline failures and even blaming Airflow for job losses. Other users defend it, saying it is easy to understand and effective when used correctly. A few suggest alternatives like Prefect, Dagster, and Argo. Some users appear to be considering a move away from Airflow, while others are just beginning to learn it. Overall, the thread mixes humor, frustration, and informative discussion about data engineering tools.
Posted in r/dataengineering by u/Irachar • 3/26/2024
141
Finding a new job, ridiculous
Career
The Reddit thread discusses the overwhelming requirements in job descriptions, especially in the tech sector. Commenters agree that these listings often demand knowledge of an unrealistically wide range of technologies. Many suggest that candidates who know around 50% of what is requested should apply regardless. Some emphasize that understanding the basic function and implementation of these technologies is often sufficient, and that deep expertise in all of them is rarely needed. The comments are supportive, encouraging people to apply even if they don't meet every requirement, since learning often continues on the job. There is a consensus that these inflated requirements may be used by companies to negotiate lower salaries, or are written by HR without a proper understanding of the role.