
Data SubTLDR: Week 24, 2024

r/MachineLearning · r/dataengineering · r/sql

Databricks' Open Source Unity Catalog Stirs Debate, Senior Data Engineers Share Essential Advice, CTE Controversy on r/SQL, François Chollet's ARC Prize Challenge Questions AI Generalization

Week 24, 2024
Posted in r/dataengineering by u/Jimbob4454 on 6/12/2024
181

Databricks Open Sources Unity Catalog, Creating the Industry’s Only Universal Catalog for Data and AI

Open Source
The Reddit thread discusses Databricks' open sourcing of Unity Catalog. Users considered it a significant move, with some viewing it as a strategy to position Databricks as the central platform across table formats. However, there was skepticism about Databricks' open-source track record, with some questioning whether this would be another Databricks-only open source project. Others defended the effort, arguing that open source means the code is publicly available under a permissive license. There was also discussion of the retirement of the Hive metastore and of Databricks' position relative to competitors like Snowflake. Overall, sentiment was mixed, combining optimism, skepticism, and defense of Databricks' actions.
80 comments
View on Reddit →
Posted in r/dataengineering by u/perfektenschlagggg on 6/14/2024
140

Advice from senior DEs to junior DEs

Career
The Reddit thread titled "Advice from senior DEs to junior DEs" collects advice for junior data engineers. Top commenters emphasized focusing on delivering results rather than getting overly concerned with performance and scale: understand the business need, create value, and treat technical skills as tools to that end. Keeping solutions simple was commonly advised to avoid unnecessary maintenance, alongside caution against chasing every new tool. Collaboration with other teams was recommended for more efficient data handling, and some commenters suggested mastering a cloud platform like AWS or GCP while prioritizing concepts over specific tools. Despite a few divergent views, the overall tone was supportive and pragmatic.
103 comments
View on Reddit →
Posted in r/MachineLearning by u/BriefAd4761 on 6/14/2024
133

[D] Discussing Apple's Deployment of a 3 Billion Parameter AI Model on the iPhone 15 Pro - How Do They Do It?

Discussion
The Reddit thread discusses Apple's deployment of a 3-billion-parameter AI model on the iPhone 15 Pro. Users shared insights into the mechanisms involved, including optimized attention mechanisms, shared vocabulary embeddings, and quantization techniques. A key point was the use of 4-bit quantization, which reportedly reduces the RAM requirement from 14.4 GB to 1.8 GB, along with the fine-tuning of LoRA adapters on a common base model for task-specific optimization. Efficient memory management and real-time power and latency analysis tools were also highlighted. The discussion reflected optimism about the potential of these techniques for future mobile applications, while acknowledging the need for further optimization and a better understanding of these strategies.
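As a rough illustration of the quantization arithmetic discussed in the thread (Apple's exact parameter count and runtime overheads are not given, so the numbers below are back-of-the-envelope assumptions): weight memory scales linearly with bits per parameter, so going from 32-bit floats to 4-bit weights is an 8x reduction, which matches the quoted 14.4 GB to 1.8 GB figure.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone (ignores KV cache, activations)."""
    return n_params * bits_per_param / 8 / 1e9

# A model whose fp32 weights occupy 14.4 GB has about 3.6e9 parameters:
n_params = 14.4e9 * 8 / 32               # bytes -> parameters at 32 bits each

print(weight_memory_gb(n_params, 32))    # 14.4 (GB at fp32)
print(weight_memory_gb(n_params, 4))     # 1.8  (GB at 4-bit)
```

Note that the 8x factor only holds for the weights themselves; activations and the KV cache are typically kept at higher precision, so real on-device memory use is somewhat larger.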
30 comments
View on Reddit →
Posted in r/dataengineering by u/supersaiyanngod on 6/10/2024
126

Why did you (as a data analyst) switch to DE?

Career
The Reddit thread titled Why did you (as a data analyst) switch to DE? discusses various reasons for transitioning from a Data Analyst to a Data Engineer role. The main reasons expressed include a preference for more technical work and software engineering-like tasks, interest in ETL (Extract, Transform, Load) processes and pipelines, and dissatisfaction with the business-oriented aspects of data analysis such as reporting and stakeholder meetings. Several comments also mention a desire to escape the 'business language' and storytelling required in data analysis. Some comments mention that Data Engineering offers a higher skill ceiling and opportunities for growth, and that the field is rapidly expanding due to the rise of AI and Machine Learning.
82 comments
View on Reddit →
Posted in r/MachineLearning by u/we_are_mammals on 6/14/2024
102

[R] Lamini.AI introduces Memory Tuning: 95% LLM Accuracy, 10x Fewer Hallucinations

Research
The Reddit thread discusses Lamini.AI's introduction of Memory Tuning, a new method that enhances factual accuracy and reduces hallucinations in LLMs. The method has reportedly increased accuracy to 95% and decreased hallucinations from 50% to 5% for a Fortune 500 client. The approach involves tuning millions of expert adapters with precise facts. Users found the method interesting but expressed concerns about its scalability and the accuracy claims. There were comparisons to similar methods by Apple and questions about the use of a vector database. Some users appreciated the approach of training on facts until the loss becomes zero, and the concept of overfitting an adapter was discussed. Overall, the sentiment was inquisitive and cautiously optimistic about the potential of this new method.
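The idea commenters found notable, training an adapter on specific facts until the loss reaches (near) zero, can be sketched in miniature. The toy below is an invented stand-in, not Lamini's method: a frozen linear "base model" plus a single trainable dense adapter, deliberately overfit by gradient descent until a handful of fact pairs are reproduced almost exactly (Lamini reportedly uses millions of small LoRA-style expert adapters rather than one dense matrix).

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_facts = 16, 4
W_base = rng.normal(size=(d, d))     # frozen "base model" weights
X = rng.normal(size=(n_facts, d))    # fact keys
Y = rng.normal(size=(n_facts, d))    # fact values the base model gets wrong

delta = np.zeros((d, d))             # trainable adapter; W_base stays frozen
lr = 0.5

for step in range(5000):
    err = X @ (W_base + delta).T - Y
    loss = (err ** 2).mean()
    if loss < 1e-9:                  # "train on the facts until loss is zero"
        break
    delta -= lr * 2 * (err.T @ X) / err.size   # MSE gradient w.r.t. delta

print(f"memorized {n_facts} facts, final loss {loss:.2e}")
```

Deliberate overfitting like this is normally a failure mode; the discussion's framing is that for rote factual recall it becomes the goal, while the frozen base weights preserve general ability.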
28 comments
View on Reddit →
Posted in r/MachineLearning by u/HairyIndianDude on 6/12/2024
87

[D] François Chollet Announces New ARC Prize Challenge – Is It the Ultimate Test for AI Generalization?

Discussion
Reddit users discussed François Chollet's ARC Prize Challenge on the MachineLearning subreddit. The ARC benchmark aims to measure an AI's generalization skills, simulating human-like learning. Users pondered whether ARC is a good measure, the current state of AI generalization, its potential impact, and potential strategies for success. Some argued that ARC is a valid measure as it simulates thousands of tasks with few examples each, requiring AI to learn efficiently. Other insights included the need for AI to possess adaptive computations, continual learning at test time, and a preference for volatile memory. Some skepticism was expressed about ARC's effectiveness as a definitive test, with users cautioning against prematurely labeling it as an ultimate test and noting the emphasis on spatial reasoning in tasks.
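To make the "thousands of tasks with few examples each" format concrete, here is a toy task in the spirit of ARC. The specific task below is invented, not from the actual benchmark (real ARC grids use colors 0-9 and vary in size): a couple of input/output grid pairs demonstrate a rule, and the solver must infer it and apply it to a held-out input.

```python
# Invented ARC-style task: a few demonstration pairs, one test input.
train_pairs = [
    ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),   # hidden rule: mirror each row
    ([[0, 2], [3, 0]], [[2, 0], [0, 3]]),
]
test_input = [[5, 0], [0, 7]]

def mirror_rows(grid):
    """A hypothesized rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]

# Check that the hypothesized rule explains every demonstration pair...
assert all(mirror_rows(x) == y for x, y in train_pairs)
# ...then apply it to the held-out input.
print(mirror_rows(test_input))  # [[0, 5], [7, 0]]
```

The difficulty the thread debates is that each task uses a novel rule, so a solver cannot memorize rules in advance; it has to form and verify hypotheses from just a few examples, which is the generalization skill ARC aims to measure.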
56 comments
View on Reddit →