data subtldr week 27 year 2024
r/MachineLearning · r/dataengineering · r/sql
Unveiling Endgames in Data Engineering and AI Labs, Decoding Advanced SQL for Job Interviews, Uncommon Skills for Exceptional ML Engineers, and Personal Uses of SQL in Everyday Life
Week 27, 2024
Posted in r/MachineLearning by u/bendee983 • 7/1/2024
236
[D] What's the endgame for AI labs that are spending billions on training generative models?
Discussion
The Reddit thread asks what the endgame is for AI labs that invest billions in training generative models. Some users suggested these labs could end up acquired by larger companies, while others believe they are betting on large language model (LLM) inference becoming a standard feature of future technology. A few were skeptical, warning that the current AI hype could lead to an AI winter, a period of reduced interest and funding. Others disagreed, arguing that the current state of AI research and its applicability across industries could prevent a complete winter. Another common sentiment was that although training new models remains expensive, future advances could reduce both costs and latency.
Posted in r/dataengineering by u/pipeline_wizard • 7/5/2024
190
Self-Taught Data Engineers! What's been the biggest 💡moment for you?
Career
The Reddit thread collects self-taught data engineers' biggest insights. A common highlight is how much data initiatives depend on company and management support. Many said over-engineering is a frequent problem and that simple solutions handle most data problems. Some find that speaking in business terms (ROI, revenue) helps avoid unnecessary interference from non-technical staff. One user was surprised by how much software engineering data engineering involves. Several noted that, however complex the work, end users often care only about straightforward outputs such as pie charts and CSV files. Finally, posters stressed the value of working on a team that respects software engineering principles. Overall, the sentiment is mixed, with a focus on practicality, simplicity, and the need for management support.
Posted in r/MachineLearning by u/mrstealyoursoulll • 7/3/2024
146
[D] What are issues in AI/ML that no one seems to talk about?
Discussion
The Reddit thread on rarely discussed AI/ML issues raised several key points. Users were concerned about applying AI/ML to problems that simpler methods could solve, and about the lack of attention paid to the energy and data that machine learning requires. Acquiring data from large organizations was another significant issue, since their reluctance to share data hinders progress. Reproducibility was flagged as well: many AI/ML projects omit the information others would need to reproduce their results. Finally, users highlighted poor communication between different parts of the AI community and called for better education and a better understanding of policy. The overall sentiment was a desire for a more efficient, accessible, and transparent AI/ML field.
Posted in r/MachineLearning by u/Avistian • 7/4/2024
140
[D] Rare skills of exceptional ML Engineers
Discussion
The Reddit thread discusses the rare skills of exceptional ML Engineers. Key skills highlighted include clear and concise communication, the ability to write minimal, readable code, and a deep understanding of core machine learning principles. Users also valued the ability to translate research into practical applications. One user mentioned the rare ability to 'hear the music', meaning a deep, intuitive grasp of the math behind ML. Patience, the ability to explain complex problems to non-technical stakeholders, and an understanding of user needs were also noted as important. Finally, several users stressed being kind, since the field can be intimidating and a welcoming attitude can increase productivity.
Posted in r/dataengineering by u/Thinker_Assignment • 7/2/2024
132
What does data engineering career endgame look like?
Career
The Reddit thread "What does data engineering career endgame look like?" discusses the paths data engineers can take in their careers, and the top comments offered a range of perspectives. A consultant with 28 years in the field described his journey from Visual Basic and MS Access to becoming a Data Architect, emphasising constant learning and the value of logical modeling. Another user said the endgame is financial freedom, viewing the job as a means to earn and invest. A third, with 11 years of experience, expressed satisfaction with the work-life balance and financial stability his job provides. Suggested career paths included senior data engineer, data architect, enterprise architect, and Chief Data Officer. Overall, the sentiment leans toward self-improvement, financial stability, and adaptability in a changing technology landscape.
Posted in r/dataengineering by u/kira2697 • 7/3/2024
111
Wasted 4-5 hours to install pyspark locally. Pain.
Help
A Reddit user vented about spending several hours trying, without success, to install PySpark locally. The thread drew many helpful responses. A popular recommendation was a ready-to-go Docker setup bundling Spark, Delta, MinIO, Jupyter, and Postgres. Others suggested containers more generally, since they make the environment easy to move between machines or onto cloud platforms. Another user pointed out that pip install pyspark bundles Spark itself, so no separate Spark download is needed. Others shared troubleshooting tips, such as checking the SPARK_HOME and HADOOP_HOME environment variables and, on Windows, using winutils built from the Hadoop source code. Overall, the sentiment was helpful and solutions-oriented.
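As a rough illustration of the troubleshooting advice in this thread, the sketch below gathers a few of those checks into one Python function. The function name check_local_spark_setup and its messages are hypothetical, not from any library or from the thread. The Java check is an added assumption (Spark runs on the JVM, so PySpark needs a Java runtime). The function only inspects the environment and does not launch Spark, so it can miss problems that appear only at startup.

```python
import importlib.util
import os
import shutil
import sys


def check_local_spark_setup(env=None):
    """Return a list of likely problems with a local PySpark setup.

    Hypothetical helper: it covers the usual suspects from the thread
    (pyspark importable, SPARK_HOME/HADOOP_HOME valid, winutils on Windows)
    plus an assumed Java check. An empty list means nothing obvious was found.
    """
    env = os.environ if env is None else env
    problems = []

    # `pip install pyspark` bundles Spark, so the package should be importable.
    if importlib.util.find_spec("pyspark") is None:
        problems.append("pyspark is not importable; try `pip install pyspark`")

    # Assumption: Spark needs a JVM, found via JAVA_HOME or a `java` on PATH.
    java_home = env.get("JAVA_HOME")
    if java_home:
        if not os.path.isdir(java_home):
            problems.append(f"JAVA_HOME is set to {java_home!r}, which is not a directory")
    elif shutil.which("java") is None:
        problems.append("no Java found: set JAVA_HOME or put java on PATH")

    # If SPARK_HOME / HADOOP_HOME are set, they must point at real directories.
    for var in ("SPARK_HOME", "HADOOP_HOME"):
        value = env.get(var)
        if value and not os.path.isdir(value):
            problems.append(f"{var} is set to {value!r}, which is not a directory")

    # On Windows, Hadoop's native helper winutils.exe is expected in HADOOP_HOME\bin.
    if sys.platform.startswith("win"):
        hadoop_home = env.get("HADOOP_HOME")
        if not hadoop_home:
            problems.append("HADOOP_HOME is not set (needed for winutils.exe on Windows)")
        elif not os.path.isfile(os.path.join(hadoop_home, "bin", "winutils.exe")):
            problems.append("winutils.exe not found in HADOOP_HOME\\bin")

    return problems


if __name__ == "__main__":
    issues = check_local_spark_setup()
    for issue in issues:
        print("-", issue)
    if not issues:
        print("No obvious problems found.")
```

Passing an explicit env dict instead of reading os.environ directly makes the checks easy to try against a hypothetical configuration before changing real environment variables.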