data subtldr week 29 year 2024
r/MachineLearning, r/dataengineering, r/sql
Re-learning Data Engineering Basics, Job Success with Advanced SQL, Yoshua Bengio on AI Safety, and Viral Mimicry Exposed by Protein Language Models
Week 29, 2024
Posted in r/dataengineering by u/rebecca-1313 • 7/19/2024
381
What I would do if had to re-learn Data Engineering Basics:
Career
In the Reddit thread What I would do if had to re-learn Data Engineering Basics, users shared their views on the essential skills for data engineers. Notable suggestions include learning database partitioning and indexing and understanding their impact on performance. Some users argued that pandas is unnecessary for data manipulation because SQL can accomplish the same tasks. One user emphasized the value of learning Terraform for infrastructure provisioning. A few users stressed hands-on experience and recommended practicing on large datasets. The overall sentiment is positive, with users appreciating the shared advice and contributing their own experiences. Some pointed out errors in the original post, such as misidentifying AWS Athena as a data warehouse.
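To make two of the thread's suggestions concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table `events`, its columns, and the index name `idx_events_user` are hypothetical and chosen only for illustration; nothing here comes from the thread itself. It shows how adding an index changes the query plan for a lookup, and how a pandas-style group-by total can be written in plain SQL.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO events (user_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)


def query_plan(sql):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT * FROM events WHERE user_id = 42"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(lookup)

# With an index on user_id, the same query becomes an index search.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = query_plan(lookup)

print("before:", plan_before)
print("after: ", plan_after)

# A group-by total, the kind of task often done with pandas, in plain SQL.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM events "
    "GROUP BY user_id ORDER BY user_id LIMIT 3"
).fetchall()
print(totals)
```

The exact wording of the plan text varies between SQLite versions, so treat the printed plans as indicative only. Other database engines expose the same idea through their own EXPLAIN commands.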
Posted in r/dataengineering by u/the-driving-crooner- • 7/16/2024
207
Explaining my db schema
Meme
The Reddit thread Explaining my db schema in the data engineering subreddit was well received, with a score of 207 and an upvote ratio of 0.99. Commenters such as 'deal_damage' and 'custardgod' found the post funny and relatable. When 'picklesTommyPickles' eagerly asked for the source, 'pooppuffin' and 'toxic_acro' shared a video link identifying it as a Tim Robinson sketch comedy show on Netflix. Overall, the thread resonated with its audience as both amusing and relevant.
Posted in r/dataengineering by u/EarthGoddessDude • 7/19/2024
164
Is this one of them Iceberg tables everyone keeps talking about?
Meme
The Reddit post Is this one of them Iceberg tables everyone keeps talking about? by EarthGoddessDude sparked a light-hearted discussion in the data engineering subreddit and drew an upvote ratio of 0.97. Most top comments were jokes on the theme. 'kaumaron' suggested the "Iceberg table" was stored on the AWS S3 Glacier tier, 'smeyn' remarked that it was partitioned, 'seekingwisdom1991' wanted to see the full iceberg, and 'EmploymentMammoth659' suggested it was hosted on Snowflake. 'SnappyData' offered a more technical reading, saying the visible part was just the metadata layer. The overall sentiment was playful and jovial.
Posted in r/MachineLearning by u/qtangs • 7/15/2024
92
[N] Yoshua Bengio's latest letter addressing arguments against taking AI safety seriously
News
The Reddit thread discusses Yoshua Bengio's letter addressing arguments against AI safety. Some users express skepticism, noting that immediate economic and political disruptions from current AI are more concerning than potential AGI and ASI risks. Others question if safety experts have practical solutions apart from limiting AI access to big corporations or regulating GPU usage. Some users argue that Bengio and others are not necessarily providing solutions, but advocating for resource allocation towards developing solutions. They also note that the focus on long-term risks doesn't negate the importance of current issues. The thread highlights differing views on AI safety, emphasizing the need for a comprehensive approach to address both immediate and long-term risks. Overall, the sentiment is mixed, with users debating the validity of concerns about AGI and ASI.
Posted in r/MachineLearning by u/ddofer • 7/16/2024
92
[R] Protein language models expose viral mimicry and immune escape
Research
A Reddit thread discusses a research paper that uses protein language models (PLMs) to expose how viruses mimic host proteins to evade the immune system. The model achieves a ROC AUC of 99.7% and an accuracy of 97%. Some users were initially skeptical of the high accuracy, but the author clarified the figures. One notable point in the discussion was the model's potential use in testing theories such as the viral origin of the placenta. The thread also highlighted interesting errors made by the PLMs, suggesting that these could provide insights for developing more effective vaccines and treatments. Overall, the sentiment was largely positive and curious.