data subtldr week 1 year 2025
r/MachineLearning, r/dataengineering, r/sql
Dissecting Oracle's Success and Controversies, Learning SQL the Practical Way, The Advent and Reception of SQL 2024, and The Rise and Fall of MAMBA in Machine Learning
Week 1, 2025
Posted in r/MachineLearning by u/TwoSunnySideUp on 12/30/2024
Score: 242
[D] - Why MAMBA did not catch on?
Discussion
The Reddit thread discusses why MAMBA did not replace transformers in machine learning despite its initial hype. Commenters report that MAMBA models' performance and inference speed are often on par with, or worse than, those of transformers. Transformers' scalability and mature software and hardware support also keep them dominant. The cost of retraining models and the performance trade-offs make MAMBA less appealing to some. Commenters also pointed out practical limitations of MAMBA, such as its fixed-size state memory. Some users mentioned alternatives to both MAMBA and transformers, such as the Hyena model. The complexity of MAMBA and the investment already made in transformers were further factors in its limited adoption. Overall, the sentiment leans towards continued use of transformers until a significantly better alternative emerges.
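The "fixed state memory" point can be made concrete with a toy comparison. The sketch below is a simplified, non-selective diagonal linear recurrence, not Mamba's actual selective state-space layer or its hardware-aware scan; `ssm_scan`, `kv_cache_size`, the decay values, and `d_model=64` are hypothetical choices for illustration, and the cache formula assumes standard multi-head attention without key/value sharing. It shows why recurrent state is cheap at inference time (its size does not depend on sequence length) and why it is lossy (older inputs fade geometrically), whereas an attention layer keeps every past key and value.

```python
# Toy comparison: fixed-size recurrent state vs. a growing attention KV cache.
# NOTE: this is a simplified, non-selective diagonal linear recurrence with
# B = C = 1, not Mamba's selective state-space layer. Names are hypothetical.
from typing import List, Tuple


def ssm_scan(xs: List[float], decays: List[float]) -> Tuple[List[float], int]:
    """Run h_t[i] = decays[i] * h_{t-1}[i] + x_t and y_t = sum_i h_t[i].

    Returns the outputs and the number of floats kept as state at the end.
    """
    h = [0.0] * len(decays)  # all memory of the past lives in these floats
    ys: List[float] = []
    for x in xs:
        # Each channel forgets geometrically and adds the new input.
        h = [a * hi + x for a, hi in zip(decays, h)]
        ys.append(sum(h))
    return ys, len(h)  # state size is independent of len(xs)


def kv_cache_size(num_tokens: int, d_model: int) -> int:
    """Floats one standard multi-head attention layer caches: a key and a value per token."""
    return 2 * num_tokens * d_model


if __name__ == "__main__":
    # An impulse at t = 0 fades as 0.5 ** t: cheap to store, but lossy.
    print(ssm_scan([1.0, 0.0, 0.0, 0.0], decays=[0.5])[0])  # [1.0, 0.5, 0.25, 0.125]
    for n in (10, 1_000, 100_000):
        _, state = ssm_scan([1.0] * n, decays=[0.9, 0.5])
        print(n, state, kv_cache_size(n, d_model=64))
```

With `d_model=64`, the cache for one attention layer grows from 1,280 floats at 10 tokens to 12,800,000 at 100,000 tokens, while the recurrent state stays at two floats. The impulse example shows the flip side: whatever the state cannot hold is gradually forgotten, which is the trade-off commenters raised.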
Posted in r/dataengineering by u/Signal-Indication859 on 1/4/2025
Score: 228
hot take: most analytics projects fail bc they start w/ solutions not problems
Discussion
The Reddit thread argues that most analytics projects fail because they start with a solution rather than a clearly identified problem. The original poster, 'Signal-Indication859', suggests that businesses first identify specific problems and then build minimal solutions around them. The top comments agree with this perspective, though some note that at their companies managers choose the data stack with little consideration of requirements. Another user, 'garathk', balances the argument by stressing the importance of scalability and future planning in data solutions. 'Dysfu', meanwhile, prioritizes career growth and exposure to new skills over the most efficient solution. Overall, commenters recognize the problem but give the proposed remedy a mixed response.
Posted in r/dataengineering by u/prlaur782 on 1/1/2025
Score: 225
Databases in 2024: A Year in Review
Blog
The post 'Databases in 2024: A Year in Review', shared by prlaur782, was well received, with a score of 225. Top comments praised its quality and informative content, and users found its overview of the sheer number of databases in existence insightful. One highlight was a clarification by user '2minutestreaming' about Kafka's licensing: Kafka has always been an Apache project, and it was Confluent that changed the licenses of its auxiliary systems. Overall, the comments were appreciative, reflecting a good reception of the post.
Posted in r/dataengineering by u/bancaletto on 12/30/2024
Score: 223
How Did Larry Ellison Become So Rich?
Discussion
The Reddit thread discusses how Larry Ellison, co-founder of Oracle, amassed his wealth. Commenters point first to his large stake in Oracle (approximately 42%). Oracle's strategy of buying out competing products used by large organizations also played a crucial role. Its high-margin products, such as its database, lock customers in: they are difficult to migrate away from, even for giants like Amazon, and that lock-in is reinforced by aggressive licensing practices and tightly integrated systems. Although IT professionals voice dissatisfaction, Oracle's products keep key stakeholders such as finance departments, CIOs, and business users satisfied. Overall, the thread takes a critical view of Oracle's business practices.
Posted in r/MachineLearning by u/Training_Bet_7905 on 12/31/2024
Score: 115
[R] Is it acceptable to exclude non-reproducible state-of-the-art methods when benchmarking for publication?
Research
The Reddit thread asks whether it is acceptable to exclude non-reproducible state-of-the-art methods when benchmarking for a research publication. Top comments suggest that exclusion is acceptable if the reasons are justified in the paper and at least two or three other well-chosen approaches are included for comparison, adding that some reviewers may accept this. Others suggest running the proposed algorithm on the same benchmarks the non-reproducible methods used and comparing the results with those reported. Some users express frustration with researchers who fail to share key settings or ignore code requests, pointing to a lack of transparency in some research practices. Overall, the sentiment leans towards accepting exclusion when it is properly explained.