data subtldr week 23 year 2023
r/MachineLearning, r/dataengineering, r/SQL
SQL mastery, Excel limitations, r/dataengineering blackout, 3D-printed Eigenfaces, Otter's multi-modal proficiency
•Week 23, 2023
Posted in r/MachineLearning by u/BeatLeJuce • 6/6/2023
2602
Should r/MachineLearning join the reddit blackout to protest changes to their API?
Discussion
The r/MachineLearning subreddit is considering joining a Reddit blackout in protest of the platform's API changes, which could significantly impact third-party apps like Apollo and Reddit is Fun. These changes will make it more expensive for developers to run these apps, potentially leading to their demise or the introduction of monthly user fees. The community's top comments support participating in the protest, with some suggesting an indefinite shutdown until Reddit changes course. Users emphasize the importance of reminding Reddit that they use the site based on its functionality and that they could move to other platforms if necessary. The protest is planned for June 12th and may last 24-48 hours or longer.
Posted in r/dataengineering by u/OverratedDataScience • 6/8/2023
985
"We have great datasets"
Meme
In the Reddit thread titled "We have great datasets", users discuss the challenges of working with inconsistent data. The top comment humorously suggests running DROP TABLE, implying the data is so bad it's not worth saving. Another user expresses discomfort with the image provided and requests its deletion. A third comment jokes about vendors slapping a GUI over a list of mapping values and calling it a solution. One user describes difficulties signing up on a website because of inconsistent location data, while another mentions the Levenshtein distance as a similarity measure for dealing with such issues. The overall sentiment is negative and humorous, reflecting the frustrations of dealing with poor data quality.
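The Levenshtein distance mentioned in that thread counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal sketch of how it might be used to map a misspelled location onto a known value. The function names and the sample city list are hypothetical and are not taken from the thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def closest_match(value: str, candidates: list[str]) -> str:
    """Pick the candidate with the smallest edit distance (hypothetical helper)."""
    normalized = value.strip().lower()
    return min(candidates, key=lambda c: levenshtein(normalized, c.lower()))


# Hypothetical reference list of clean location names.
known_cities = ["New York", "Newark", "Norwalk"]
best = closest_match("New Yrok", known_cities)
```

In practice a distance threshold is usually added so that values far from every candidate are flagged for review instead of being silently mapped.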
Posted in r/dataengineering by u/Straight_House8628 • 6/6/2023
555
I’ve had the definition wrong this entire time…
Meme
The Reddit thread discusses the prevalence of Excel in data engineering. Users shared their experiences with Excel, highlighting its limitations and continued usage despite the availability of more advanced tools. One user mentioned using SharePoint and having over 100 people editing a sheet with colors and formulas. Another shared a story about a VP asking for a 1GB CSV file to do reporting herself, only to give up when faced with the data size. Some users found similarities with exporting data to Excel in the Power BI subreddit. One user suggested connecting Excel to Parquet files in a data lake for cost savings. Overall, the sentiment is that Excel remains widely used, despite its limitations and the availability of more advanced tools.
Posted in r/MachineLearning by u/hardmaru • 6/10/2023
478
Otter is a multi-modal model developed on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on a dataset of multi-modal instruction-response pairs. Otter demonstrates remarkable proficiency in multi-modal perception, reasoning, and in-context learning.
Project
The Reddit thread discusses Otter, a multi-modal model developed on OpenFlamingo, which demonstrates impressive proficiency in multi-modal perception, reasoning, and in-context learning. The model is trained on a dataset called MIMIC-IT, consisting of 2.8 million multimodal instruction-response pairs. Users appreciate the project but express concerns about high GPU requirements and the need for independent verification of the creators' claims. They also debate the real-time capabilities of the model and the challenges of parallelizing ML tasks, mentioning issues such as data transfer rates and bottlenecks. Overall, the sentiment is a mix of excitement for the advancements in the field and skepticism regarding certain aspects of the project.
Posted in r/dataengineering by u/AutoModerator • 6/10/2023
278
r/dataengineering will be joining the blackout from June 12-14 to protest the proposed API changes which will end 3rd party apps.
Meta
The r/dataengineering subreddit is participating in a blackout from June 12-14 to protest Reddit's proposed API changes, which threaten to end third-party apps like Apollo, Reddit is Fun, Narwhal, and BaconReader. The policy change would make many quality-of-life features permanently inaccessible to users and impact subreddit moderators who rely on these tools. The community plans to go dark on June 12th, with some subreddits returning after 48 hours and others remaining offline unless the issue is addressed. Users are encouraged to complain, spread the word, boycott, and support alternative platforms during the blackout. Top comments on the thread include humorous remarks about becoming a data engineer and the new Blackout technology in comparison to SQL. Overall, the sentiment is in favor of the blackout protest.
Posted in r/SQL by u/inner_attorney • 6/7/2023
85
Very proud of myself! Taught myself multiple joins for the very first time
PostgreSQL
In the Reddit thread, the original poster (OP) shared their excitement about successfully learning multiple joins in SQL for the first time. The top comments offered encouragement, tips, and shared experiences. One user mentioned that complex queries with multiple table joins can be overwhelming but become manageable with a left-to-right approach. Another suggested using table aliases for better readability. A third user recommended drawing out complex joins for better understanding, while another shared their recent "aha" moment with Common Table Expressions (CTEs). Lastly, a user recognized the course the OP was taking as the Udemy SQL Bootcamp and said they had enjoyed it. Overall, the sentiment in the comments was positive and supportive.
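The tips from that thread (multiple joins, table aliases, and CTEs) can be combined in one short query. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server. The tables, columns, and rows are hypothetical and are not from the OP's course.

```python
import sqlite3

# In-memory database with three small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, title TEXT, price INTEGER);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER,
                            product_id INTEGER, quantity INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO products  VALUES (1, 'keyboard', 50), (2, 'mouse', 20);
    INSERT INTO orders    VALUES (1, 1, 1, 2), (2, 1, 2, 1), (3, 2, 2, 3);
""")

# The CTE joins orders to products; the outer query joins customers to the CTE.
# Short aliases (c, o, p, t) keep each column reference unambiguous.
query = """
WITH order_totals AS (
    SELECT o.customer_id, SUM(o.quantity * p.price) AS total
    FROM orders AS o
    JOIN products AS p ON p.id = o.product_id
    GROUP BY o.customer_id
)
SELECT c.name, COALESCE(t.total, 0) AS total
FROM customers AS c
LEFT JOIN order_totals AS t ON t.customer_id = c.id
ORDER BY c.name
"""
rows = conn.execute(query).fetchall()
```

The LEFT JOIN keeps customers with no orders in the result, which is why the hypothetical customer Linus appears with a total of 0 instead of disappearing.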
Posted in r/SQL by u/Casdom33 • 6/11/2023
44
SQL 😎😎😎
Discussion
The Reddit thread discusses whether SQL is a programming language. Most commenters agree that it is, albeit one with specific use cases rather than a replacement for general-purpose languages like Python, Java, or C++. Some users argue that SQL is a functional language that resembles other programming languages in certain contexts, such as PL/SQL or Microsoft SQL Server stored procedures. Another user highlights the complexity of SQL, especially when dealing with databases. There is also a humorous tone in the thread, with users poking fun at the article's choice of a .py file image and at the many SQL dialects in existence. Overall, the sentiment is positive and informative, with a touch of humor.
Posted in r/SQL by u/roxy_coder • 6/11/2023
36
SQL Interview Questions: A Comprehensive Guide to Commonly Asked Problems and Solutions
Oracle
In a Reddit thread discussing a comprehensive guide to SQL interview questions, users provided feedback on the content and pointed out some errors. One user noted that the example given for deleting duplicate entries incorrectly identified two employees as duplicates, while another user pointed out an issue with the code chunk in Question 7. The author, roxy_coder, acknowledged the errors and thanked the users for pointing them out, promising to correct the mistakes. Additionally, one user humorously appreciated the throwback 1990s phrase used by the author to express gratitude for reading their blog. Overall, the thread's sentiment is constructive with users helping to improve the content of the guide.
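The guide's own duplicate-deletion example is not reproduced in this summary, so the following is only a sketch of one common approach and of how distinct rows can be kept out of it. It assumes a hypothetical employees table and treats rows as duplicates only when both name and email match. It uses Python's built-in sqlite3 module in place of Oracle so that it runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO employees (id, name, email) VALUES
        (1, 'Sam Lee',  'sam.lee@example.com'),
        (2, 'Sam Lee',  'sam.lee@example.com'),  -- exact duplicate of row 1
        (3, 'Sam Lee',  'sam.l@example.com'),    -- same name, different person
        (4, 'Ana Ruiz', 'ana.ruiz@example.com');
""")

# Keep the lowest id in each (name, email) group and delete the rest.
# Grouping by name alone would wrongly delete row 3 as well.
conn.execute("""
    DELETE FROM employees
    WHERE id NOT IN (
        SELECT MIN(id) FROM employees GROUP BY name, email
    )
""")
remaining = [row[0] for row in conn.execute("SELECT id FROM employees ORDER BY id")]
```

Which columns define a duplicate is a modeling decision, so it is worth running the grouping as a plain SELECT first and reviewing the groups before deleting anything.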