
data subtldr week 7 year 2024

r/MachineLearning, r/dataengineering, r/sql

Diversity in Tech Hiring Practices: Reddit Users Weigh In, Lessons from Using Airflow on Kubernetes, Decoding the Data Lake Mystery, The Trials of SQL Project Management and Interview Pressure, Starting Points for SQL Beginners, OpenAI's Impressive Text-to-Video Model Sora, Google's Gemini 1.5 Learning New Languages, Making Bookshelves Clickable with Computer Vision

Week 7, 2024
Posted in r/dataengineering by u/HiroKifa, 2/16/2024
769

Had an onsite interview with one of FAANG, all 6 interviewers were Indian

Interview
The Reddit thread discusses perceived bias in tech hiring after a user interviewed at a FAANG company with an all-Indian panel. Some users suggest that panels may prefer candidates of the same nationality or culture, citing their own experiences of not receiving offers after interviews with all-Indian panels. Others point to discrimination in hiring more broadly, with one mentioning lawsuits accusing major tech companies of enforcing caste systems. Some commenters push back, describing positive experiences with diverse hiring panels. Overall, the thread underlines the need for further discussion of diversity and inclusivity in tech hiring practices.
71 comments
Posted in r/MachineLearning by u/htrp, 2/15/2024
372

[D] OpenAI Sora Video Gen -- How??

Discussion
The Reddit thread discusses OpenAI's Sora, a text-to-video model that can generate videos up to a minute long. Users were impressed by the model's consistency across frames and speculated that it was achieved through constraints or by generating entire videos at once rather than frame by frame. Commenters also discussed the resources required, raising concerns about high computational cost given the model's complexity. Several expressed frustration with the limited resources available at their own workplaces compared to OpenAI. Users also wondered how OpenAI collected and labeled the training data for Sora, given the quality of the output. The thread reflects a mix of admiration and intrigue about OpenAI's capabilities.
194 comments
Posted in r/MachineLearning by u/Electronic-Author-65, 2/15/2024
283

[N] Gemini 1.5, MoE with 1M tokens of context-length

News
The Reddit thread discusses Google's next-generation model, Gemini 1.5, which supports a context length of 1 million tokens. Users expressed amazement at its ability to learn new languages, such as Kalamang, from minimal instructional materials. There was speculation about its performance, with some saying it would be a significant achievement if it performs similarly to the previous model. There were also jokes about the model's interpretive capabilities. The discussion highlighted Google's computational advantages over other companies such as OpenAI, and some commenters raised concerns about the shift towards more secretive practices in AI research. Users also discussed the model's technical aspects, including its ability to handle large amounts of data efficiently. The sentiment is mostly positive, with users impressed by the model's capabilities.
67 comments
Posted in r/dataengineering by u/DuckDatum, 2/14/2024
173

What the Hell is a Data Lake?

Discussion
The Reddit thread titled 'What the Hell is a Data Lake?' sparked a lively discussion about the nature and purpose of data lakes. The top comments offer several perspectives: one user described a data lake as a project where data is collected without a clear purpose, another simplified it to storage for raw data files, and a third sarcastically called it a means to 'finally do AI'. Some users likened it to a 'bigass Google drive', while others blamed cloud hosting companies for hyping the concept. The sentiment leaned towards skepticism and humor, with some frustration about how companies over-complicate and misuse the term.
119 comments
Posted in r/MachineLearning by u/zerojames_, 2/14/2024
128

[P] Making my bookshelves clickable with computer vision

Project
The Reddit thread discusses a project that uses computer vision to make bookshelves clickable. The author used SAM, OpenCV, GPT-4, the Google Books API, and HTML to build it. The comments focus mainly on the use of GPT-4 for OCR, with many calling it overkill and recommending cheaper, more efficient alternatives such as Tesseract, Amazon Textract, Azure OCR, easyocr, and Google's OCR. The author is open to suggestions for improving the book detection rate. Overall, the project was well received, though the community favors more cost-effective options for the OCR step.
23 comments
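To make the lookup step of the bookshelf project above more concrete, here is a minimal sketch of how text read from a book spine might be turned into a Google Books search URL. It is not the author's code: the thread summary does not say how the API was queried, so the function name `build_book_lookup_url`, the whitespace cleanup, and the choice of `maxResults=1` are all assumptions made for illustration. Only the public volumes search endpoint and its `q` and `maxResults` parameters come from the Google Books API itself.

```python
from urllib.parse import urlencode

# Public Google Books volumes search endpoint.
GOOGLE_BOOKS_VOLUMES_URL = "https://www.googleapis.com/books/v1/volumes"


def build_book_lookup_url(spine_text: str, max_results: int = 1) -> str:
    """Build a volumes search URL from OCR'd spine text.

    Hypothetical helper: the original project may structure this differently.
    """
    # OCR output often contains line breaks and repeated spaces;
    # collapse them into single spaces before querying.
    query = " ".join(spine_text.split())
    if not query:
        raise ValueError("spine_text is empty after whitespace cleanup")
    # urlencode percent-encodes the values and encodes spaces as '+'.
    params = {"q": query, "maxResults": max_results}
    return f"{GOOGLE_BOOKS_VOLUMES_URL}?{urlencode(params)}"
```

Calling `build_book_lookup_url("  Dune \n Frank  Herbert ")` yields a URL that can be fetched with any HTTP client. Because the function only needs plain text as input, the OCR engine feeding it could be any of the alternatives suggested in the comments.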