
data subtldr week 30 year 2023

r/MachineLearning, r/dataengineering, r/SQL

Data Engineering Challenges and Humor, SQL Queries in Job Interviews, Connecting Okta with MySQL, Importance of Code Reviews, EU AI Legislation Impact on Open Source, Empirical Rules of Machine Learning, Unlocking the Power of TabR Model

Week 30, 2023
Posted in r/dataengineering by u/shed_antlers on 7/26/2023
792

The data engineer came to me... tears in his eyes

Meme
The thread titled "The data engineer came to me... tears in his eyes" in r/dataengineering revolves around the challenges and intricacies of data engineering. Commenter 'Ein_Bear' humorously contrasts the seriousness of mastering data engineering skills such as indexing and load balancing with casual activities. Another user, 'hisglasses66', explains the complexity of an inner join with multiple conditions. Commenter 'RuprectGern' highlights the importance of 'LastModified' columns, while 'Jg-mz' emphasizes the need for careful data cleaning. The overall sentiment mixes humor with an acknowledgment of the complexities involved in data engineering.
66 comments
Posted in r/dataengineering by u/morpho4444 on 7/29/2023
353

This explains A LOT

Meme
The Reddit thread titled "This explains A LOT" in r/dataengineering contained humorous and suggestive comments. The top comment, by 'ratczar', advised engineers to focus less on their pipelines and more on personal connections, and received the highest engagement. Another top-rated comment, by 'mojitz', questioned an unexpected overlap between data engineering and a subreddit about banning pit bulls. 'Foodwithfloyd' found the thread amusing, while the original author, 'morpho4444', reiterated the thread's title. Overall, the thread was well received, with a high upvote ratio and a positive sentiment.
79 comments
Posted in r/MachineLearning by u/EmbarrassedHelp on 7/28/2023
235

[D] Hugging Face, GitHub and more unite to defend open source in EU AI legislation

Discussion
The Reddit thread discusses Hugging Face, GitHub and others uniting to defend open source in the proposed EU AI legislation. Commenters expressed varying opinions. Some, like 'RageA333', support increased regulation for consumer rights, while others, like 'DisjointedHuntsville', question specific provisions of the EU AI Act and criticize the potential burden on open source projects. 'HateRedditCantQuitit' suggests a possible resolution: new licenses for data, akin to the General Public License (GPL), which they believe could help balance regulatory requirements with open source principles. The overall sentiment leans towards apprehension about the impact of the proposed legislation.
64 comments
Posted in r/SQL by u/tits_mcgee_92 on 7/27/2023
166

I have interviewed for 6 Data Analyst/Scientist roles. Here are a few of the technical SQL questions.

Discussion
In this Reddit post, a user shares the technical SQL questions they encountered while interviewing for Data Analyst/Scientist roles. The questions range from basic to intermediate SQL topics, along with some data science-related questions. The thread generated useful discussion, including a highly engaged comment on how the NOT IN operator interacts with NULL markers, explained via De Morgan's laws. Another user asked for resources with both SQL questions and answers, and the post's author suggested StrataScratch and DataLemur as learning resources. Some users found the SQL questions relatively easy, implying a high level of expertise within the community. Overall, the sentiment is positive, with users appreciating the shared information and engaging in constructive discussion.
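The NOT IN and NULL point can be illustrated with a small sketch. The tables and names below (`customers`, `orders`) are hypothetical and not taken from the thread; the example uses Python's built-in `sqlite3` module and assumes SQLite's standard three-valued logic. By De Morgan's laws, `x NOT IN (1, NULL)` is equivalent to `x <> 1 AND x <> NULL`, and a comparison with NULL is never true, so no row can pass the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, NULL);
""")

# The subquery yields (1, NULL). For every customer the NOT IN test is
# either false (id = 1) or unknown (any other id), so nothing is returned.
not_in_rows = conn.execute(
    "SELECT name FROM customers "
    "WHERE id NOT IN (SELECT customer_id FROM orders) "
    "ORDER BY id"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL
# customer_id never matches and does not poison the result.
not_exists_rows = conn.execute(
    "SELECT name FROM customers c "
    "WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id) "
    "ORDER BY c.id"
).fetchall()

print(not_in_rows)
print(not_exists_rows)
```

A common way around the issue, shown in the second query, is to use NOT EXISTS, or to filter NULLs out of the subquery before applying NOT IN.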
29 comments
Posted in r/MachineLearning by u/Mulcyber on 7/24/2023
123

[D] Empirical rules of ML

Discussion
The Reddit thread "[D] Empirical rules of ML" features a discussion of best practices for designing neural networks and choosing hyperparameters. Key highlights include a recommendation to use Kaiming (He) initialization rather than arbitrarily scaled random initialization, to prevent activations from saturating quickly. Batchnorm, layernorm, and skip connections are emphasized for building large neural networks. One user suggests using torch.inference_mode for inference, as it enables extra optimizations. Classification is described as faster and more stable than regression, and M-estimators are preferred over MLE for practical tasks. A discussion also emerged on the applicability of the Chinchilla scaling law, with caution advised for non-researchers. The thread also links to a thorough resource on network tuning.
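As a rough illustration of why the initialization scale matters, the following standard-library sketch pushes a random vector through a stack of bias-free ReLU layers. It is not code from the thread: the width, depth, and the `0.01` baseline are arbitrary assumptions, and the helper names are hypothetical. With the He rule `std = sqrt(2 / fan_in)`, the mean squared activation stays on the order of its input value, while the small fixed scale makes the signal collapse towards zero.

```python
import math
import random


def relu_forward(x, std, depth, rng):
    """Push x through `depth` bias-free ReLU layers with N(0, std^2) weights."""
    width = len(x)
    for _layer in range(depth):
        # Each output unit draws a fresh row of weights and applies ReLU.
        x = [
            max(0.0, sum(rng.gauss(0.0, std) * xi for xi in x))
            for _unit in range(width)
        ]
    return x


def mean_square(x):
    """Average squared activation, used as a crude measure of signal scale."""
    return sum(v * v for v in x) / len(x)


rng = random.Random(0)
width, depth = 64, 10  # arbitrary sizes chosen for the illustration
x0 = [rng.gauss(0.0, 1.0) for _ in range(width)]

he_std = math.sqrt(2.0 / width)  # He rule: std = sqrt(2 / fan_in)
small_std = 0.01                 # hypothetical "too small" baseline

he_out = mean_square(relu_forward(x0, he_std, depth, rng))
small_out = mean_square(relu_forward(x0, small_std, depth, rng))

print(f"He init:    {he_out:.3e}")
print(f"small init: {small_out:.3e}")
```

In PyTorch, the same rule is available as `torch.nn.init.kaiming_normal_`, so in practice there is no need to hand-roll it.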
68 comments
Posted in r/MachineLearning by u/Yura52 on 7/27/2023
116

[R] New Tabular DL model: "TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning"

Research
The Reddit thread discusses a tabular deep learning model called TabR, introduced by user Yura52. The conversation in the comments primarily focuses on the politics of scientific work, with user emfisabitch defending the merit of the work regardless of the authors' country of origin. There is also a consensus among users that one should understand open-source ML code before using it. A user named Trucker2827 humorously relates the discussion to the recent release of Oppenheimer. User pm_me_your_smth criticizes the mixing of unrelated global issues with the subject of the thread. Overall, the sentiment leans towards separating scientific work from political bias.
18 comments
Posted in r/SQL by u/No_Excitement_1082 on 7/24/2023
78

Connecting Okta with MySQL - Seeking Advice

MySQL
The Reddit thread is a discussion on connecting Okta with MySQL for user management. Users suggest using Apono for setting up automated contextual access flows for managing MySQL permissions with Okta identities. Apono uses APIs and doesn't require changes to database server infrastructure. Okta's SCIM server is also recommended for handling provisioning. A link to MySQL's enterprise security product was shared. A service like Aiven was suggested for creating MySQL instances tied to Okta. The thread's sentiment is largely informational and constructive. The original poster thanked the contributors and planned to research the suggestions.
6 comments
Posted in r/SQL by u/icysandstone on 7/24/2023
38

Does anyone use the clause WHERE 1=1?

Discussion
The Reddit thread on SQL's 'WHERE 1=1' clause reveals that while some consider it pointless in production, it is actually quite useful for exploratory data analysis and testing. It allows users to quickly comment out parts of a clause, saving keystrokes in the process. The clause is also used to copy the structure of a current table without its constraints. Some users prefer using 'WHERE TRUE' instead of 'WHERE 1=1'. Overall, despite initial doubts, the thread revealed the clause to be a handy tool for specific use cases in SQL coding.
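The commenting-out convenience can be sketched as follows, using Python's built-in `sqlite3` module and a hypothetical `events` table that is not from the thread. Because `WHERE 1=1` is always true, every real condition starts with `AND`, so any one of them (including the first) can be disabled with a `--` comment without breaking the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, amount REAL);
    INSERT INTO events VALUES
        (1, 'click', 0.0), (2, 'purchase', 19.5), (3, 'purchase', 5.0);
""")

# Both filters active.
all_filters = """
SELECT id FROM events
WHERE 1=1
  AND kind = 'purchase'
  AND amount > 10
ORDER BY id
"""

# Second filter toggled off while exploring; the query stays valid.
one_filter = """
SELECT id FROM events
WHERE 1=1
  AND kind = 'purchase'
  -- AND amount > 10
ORDER BY id
"""

# First filter toggled off; without WHERE 1=1 this would leave "WHERE AND".
first_off = """
SELECT id FROM events
WHERE 1=1
  -- AND kind = 'purchase'
  AND amount > 10
ORDER BY id
"""

ids_all = [row[0] for row in conn.execute(all_filters)]
ids_one = [row[0] for row in conn.execute(one_filter)]
ids_first_off = [row[0] for row in conn.execute(first_off)]

print(ids_all, ids_one, ids_first_off)
```

The always-true predicate does not change which rows are returned; it only makes the conditions uniform to edit.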
79 comments