
data subtldr week 32 year 2023

r/MachineLearning, r/dataengineering, r/SQL

Unproductive Tasks in Data Engineering, Transition from Data Science to Data Engineering, Popularity of dbt for Transformation, CTEs vs Subqueries in SQL, High Schooler's First SQL Project, AI Solution for Voice Loss, Transformers Dominating Machine Learning, Making AMD GPUs Competitive for LLM Inference

Week 32, 2023
Posted in r/MachineLearning by u/NWMoney101 on 8/8/2023
210

[D] I’m losing my voice due to illness, and I’m looking for ML/AI solution

Discussion
The Reddit thread revolves around a user seeking an AI solution for voice loss due to Parkinson's disease. Top comments recommend focusing on gathering and annotating voice data, as this is key for training AI models; the technology will continue to improve over time. Services like ElevenLabs and Descript are suggested for voice cloning, but they operate on a subscription basis. Users also suggest utilizing free GitHub models like TorToise TTS, espnet, or coquiTTS. Annotation tools are also recommended for transcribing speech to text, with tags for emotions, for better data. The original poster thanks the community for their insights and commits to researching the suggested models.
71 comments
Posted in r/MachineLearning by u/Active-Confidence926 on 8/10/2023
201

[D] Is everything just transformers now?

Discussion
The Reddit thread discusses whether the transformer architecture is replacing other architectures across machine learning tasks. Top comments note that while transformers are predominantly used, other architectures such as CNNs and LSTMs are still necessary. For instance, CNNs are preferred for audio data and LSTMs for decoding sequence data, and MLPs are themselves integral components of transformers. Some users also argue that transformers don't outperform CNNs on tasks like image recognition, citing the 'ResNet strikes back' paper. CNNs are also often more lightweight and require less training data, which keeps them relevant for edge devices. However, commenters note that transformers can match many purpose-built architectures across various modalities, which adds to their appeal.
101 comments
Posted in r/dataengineering by u/vinayak_singh_k on 8/8/2023
150

What is the most unproductive task you have to do as a data engineer?

Discussion
The Reddit thread asked data engineers about their most unproductive tasks. The top responses highlighted several issues: attending meetings, context switching, handling semi-structured data from APIs, manual data cleaning, and deciphering vague issue tickets. One user shared an open source project they've created to streamline data loading from APIs. The overall sentiment reflected frustration with inefficient processes and communication.
131 comments
Posted in r/dataengineering by u/Impressive_Fact_6561 on 8/10/2023
136

Got my first Data Engineer job

Career
The Reddit thread is about a user, Impressive_Fact_6561, who thanks the community for helping him transition from Data Science to Data Engineering. The top comments contain advice for the new job, with emphasis on having strong SQL fundamentals. Another user suggests that the new Data Engineer should relax and enjoy learning on the job. There are also questions about the interview process. One user provides a brief history of their career journey and advises the newbie to enjoy their work and not let vendors take over the process. Another comment suggests resting up before starting the new role and consulting the future manager or job description for preparation. The mood is generally supportive and encouraging.
49 comments
Posted in r/MachineLearning by u/crowwork on 8/9/2023
133

[Project] Making AMD GPUs competitive for LLM inference

Project
The Reddit thread discusses a project aimed at making AMD GPUs competitive for LLM inference. Key concerns among Redditors revolve around the cost-effectiveness of building an AMD cluster, potential for further optimization of the 7900XTX, and power draw implications. Suggestions for future metrics include tokens/sec/dollar, tokens/kWh, and tokens/gram of CO2 emitted. Other comments point out that LLMs are primarily about memory, and AMD's offering is on par with NVIDIA. One user mentions the potential for Intel onboard GPUs as a cost-effective alternative. Multi-GPU support is also confirmed to be on the project's roadmap.
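The efficiency metrics suggested in the thread are simple ratios, and a minimal sketch of how they could be computed follows. The function and parameter names are hypothetical, and any inputs you pass in (throughput, hardware cost, power draw, grid carbon intensity) would need to come from your own measurements; none are taken from the thread or the project.

```python
# Hypothetical helpers for the metrics proposed in the thread.
# All names here are illustrative assumptions, not part of the project.

def tokens_per_sec_per_dollar(tokens_per_sec: float, hardware_cost_usd: float) -> float:
    """Throughput normalized by purchase price of the hardware."""
    return tokens_per_sec / hardware_cost_usd


def tokens_per_kwh(tokens_per_sec: float, power_draw_watts: float) -> float:
    """Tokens generated per kilowatt-hour of energy consumed."""
    tokens_per_hour = tokens_per_sec * 3600      # seconds in one hour
    kwh_per_hour = power_draw_watts / 1000       # watts -> kilowatts, over one hour
    return tokens_per_hour / kwh_per_hour


def tokens_per_gram_co2(tokens_per_kwh_value: float, grid_grams_co2_per_kwh: float) -> float:
    """Tokens per gram of CO2, given the carbon intensity of the local grid."""
    return tokens_per_kwh_value / grid_grams_co2_per_kwh
```

Normalizing by cost and energy rather than reporting raw tokens/sec makes cards with very different prices and power envelopes comparable on the same axis, which is the concern the commenters raised.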
34 comments
Posted in r/dataengineering by u/Majestic-Weakness239 on 8/11/2023
106

Why is dbt popular for the transformation step?

Discussion
The Reddit thread discusses the popularity of dbt for data transformation. The top comments suggest that dbt is appreciated for its ability to generate SQL code and to build collections of SQL transformations effectively, though it can also produce problematic outputs. Some users noted that dbt can be a good fit for teams that are comfortable working with SQL and do not have low-latency, strict data-quality, or complex transformation requirements. Others admitted confusion about how dbt differs from traditional SQL. A few mentioned specific benefits, such as lineage graphs, documentation, and simpler data curation. The sentiment was mixed, with some appreciating dbt's features and others expressing confusion or criticism.
78 comments
Posted in r/SQL by u/FatLeeAdama2 on 8/8/2023
38

Why are CTEs easier to read than subqueries in the from clause?

Discussion
The Reddit thread discusses why Common Table Expressions (CTEs) are preferred over subqueries in the FROM clause. Top comments highlight that CTEs are more modular, easier to test, and easier for other developers to understand when given meaningful names. They make the query more readable by reducing bloat after the FROM and JOIN keywords, allowing the query to be read almost like a story that progresses logically. Furthermore, CTEs can easily be converted into temp tables for performance tuning, especially when data is located on a different server. Despite some commenters joking about naming conventions, the overall sentiment leans towards the benefits of CTEs for readability and maintainability.
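To make the readability argument concrete, here is a small sketch that runs the same aggregation both ways against an in-memory SQLite database. The `orders` table, its columns, and the sample rows are invented for illustration and do not come from the thread.

```python
import sqlite3

# In-memory database with a made-up table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('ann', 120.0), ('ann', 80.0), ('bob', 40.0), ('cat', 300.0);
""")

# Subquery in the FROM clause: the reader has to parse the inner query
# before the outer SELECT makes sense.
subquery_sql = """
    SELECT t.customer, t.total
    FROM (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    ) AS t
    WHERE t.total > 100
    ORDER BY t.customer
"""

# Same logic as a CTE: the named step comes first, and the final SELECT
# reads top to bottom.
cte_sql = """
    WITH customer_totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total
    FROM customer_totals
    WHERE total > 100
    ORDER BY customer
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
cte_rows = conn.execute(cte_sql).fetchall()
print(subquery_rows == cte_rows)  # both forms return the same rows
```

Both queries return identical rows; the difference is only in how the steps are laid out. With several intermediate steps, each CTE can also be selected from on its own while debugging, which is the testability point raised in the comments.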
66 comments
Posted in r/SQL by u/onurbaltaci on 8/10/2023
35

I recorded a SQL Interview Exercise (Query Questions & Solutions) video and uploaded it on YouTube

MySQL
User onurbaltaci uploaded a SQL interview exercise video to YouTube containing medium- and hard-level interview questions. The post was well received, with an upvote ratio of 0.91. Users appreciated the content, with one commenting on a query that finds the department with the lowest average salary. They admired the clever use of a subquery in a HAVING clause and suggested a simpler alternative. The author responded positively to the feedback. The overall sentiment of the thread is encouraging and constructive.
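The summary does not reproduce the actual query from the video or the commenter's alternative, so the following is only a guess at what the two approaches might look like. The `employees` table and its rows are invented, and the example runs on SQLite rather than the MySQL used in the video.

```python
import sqlite3

# Made-up schema and data; the video's real tables are not given in the summary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        ('a', 'sales', 50000), ('b', 'sales', 70000),
        ('c', 'ops',   40000), ('d', 'ops',   50000),
        ('e', 'eng',   90000);
""")

# Approach 1 (assumed): keep only the department whose average equals the
# minimum of all department averages, using a subquery inside HAVING.
having_sql = """
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    HAVING AVG(salary) = (
        SELECT MIN(dept_avg)
        FROM (
            SELECT AVG(salary) AS dept_avg
            FROM employees
            GROUP BY department
        ) AS dept_avgs
    )
"""

# Approach 2 (assumed simpler alternative): sort by the average and take one row.
limit_sql = """
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    ORDER BY avg_salary ASC
    LIMIT 1
"""

having_rows = conn.execute(having_sql).fetchall()
limit_rows = conn.execute(limit_sql).fetchall()
print(having_rows, limit_rows)
```

One trade-off worth noting: if two departments tie for the lowest average, the HAVING version returns both, while the ORDER BY ... LIMIT 1 version returns only one of them.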
4 comments
Posted in r/SQL by u/TwistLow1558 on 8/9/2023
32

First SQL project complete! (high schooler)

Discussion
A high schooler, TwistLow1558, completed their first SQL project and shared it on Reddit, generating positive feedback and praise from the community. The project involved analyzing Coursera courses, and despite being a beginner's attempt, it was deemed better than some professional analyses. The top comment by 'toadkiller' provided constructive feedback, highlighting potential confusion in the conclusion and advising care with assumptions. Encouragement was also given, suggesting a bright future in analytics for the author. Other comments were complimentary, praising the effort and quality of the project. The author expressed gratitude for the overwhelming support received. Overall, the sentiment in the thread was highly positive and supportive.
8 comments