data subTLDR week 12 year 2025
r/MachineLearning • r/dataengineering • r/SQL
Unlocking the Secret to DE Jobs: Likability & Interpersonal Skills, Mastering SQL, Python, Spark for Data Engineering, Corporate Inefficiency: Overabundance of Managers, AI's Irreversible Memory, and a 47% Leap in Code Completion with Qwen 2.5 Coder.
Posted in r/dataengineering by u/pawtherhood89 • 3/18/2025
571
Why you aren't getting a DE job
Career
The main insight from the discussion is the importance of likability and interpersonal skills in securing a Data Engineering (DE) job. Most candidates who pass HR screening are deemed qualified, making personality fit a key hiring factor. Examples shared indicate that building relationships, even outside formal work settings, can safeguard against layoffs. However, concerns were raised about being stuck due to outdated tech stacks, which was countered by a hiring manager emphasizing problem-solving and transferable skills over matching current tech stacks. The difficulty of landing the first DE job and the contrasting experiences in small and large organizations were also discussed. Overall, the sentiment was mixed but largely constructive.
Posted in r/dataengineering by u/ChoicePound5745 • 3/17/2025
524
Which one to choose?
Career
The majority of Reddit users recommend mastering SQL and Python, and learning Spark/PySpark for a solid foundation in modern data engineering. Docker is also suggested due to its cloud-agnostic nature. However, some users express frustration at the multitude of tools and trends in the field, likening it to a popularity contest. They advise choosing tools based on the specific goal, budget, and use case, rather than what's currently in vogue. The use of containers, such as Docker and Kubernetes, for managing multiple services is also highlighted. Despite some tongue-in-cheek suggestions for using simple tools like Excel, the overall sentiment leans toward mastering versatile, industry-standard technologies.
Posted in r/dataengineering by u/HMZ_PBI • 3/21/2025
392
Corps are crazy!
Discussion
There's strong criticism of perceived corporate inefficiency, particularly an overabundance of managers whose roles seem superfluous. Many feel that managers slow work down through excessive meetings while contributing little practical work themselves. However, commenters noted that when redundancies happen, roles focused on practical work, like engineering, are typically preserved over managerial roles. Some highlight the value of a good manager who can shield engineers from corporate distractions. The sentiment is largely negative, reflecting widespread frustration with corporate bureaucracy. There's a call for more practical roles, like Data Engineers, and fewer managerial positions.
Posted in r/MachineLearning by u/No_Release_3665 • 3/22/2025
194
[Research]Can AI remember irreversibly, like a brain does? I built a model that tries — and it works surprisingly well.
Research
The discussion revolves around whether an AI can remember irreversibly, as the human brain does. Many participants are impressed with the model that was built, indicating broad support for the idea. Some expressed concerns about potential misuse and ethical implications, so the sentiment is slightly mixed. Others underscore the importance of continuous learning and refinement in AI technology. Still, there's a consensus that the technology is promising and could revolutionize fields like neurology and AI research. Overall, the sentiment is positive: participants are excited about the technology's potential but cautious about its ethical use.
Posted in r/MachineLearning by u/CountlessFlies • 3/17/2025
173
[P] I fine-tuned Qwen 2.5 Coder on a single repo and got a 47% improvement in code completion accuracy
Project
Fine-tuning Qwen 2.5 Coder on a single repo resulted in a 47% improvement in code completion accuracy. This strategy mirrors that of ninetyfive.gg. The process used to determine prefix/middle/suffix splits for training is basic but has proven effective, and it leaves room for improvement. Overfitting is a potential risk, although it could be mitigated by training a different LoRA for each codebase. Even so, an overfit fine-tuned model could cause problems as codebases evolve or when novel functionality is implemented. The training log is not publicly accessible because public sharing is a premium WandB feature. Overall, the sentiment is positive.
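The prefix/middle/suffix idea can be sketched in a few lines of Python. This is a minimal illustration, not the author's pipeline: the random cut-point strategy, the function names `split_for_fim` and `to_fim_example`, and the sentinel token strings are all assumptions, and the exact tokens should be checked against the tokenizer of the model being trained.

```python
import random

# Assumed fill-in-the-middle sentinel tokens; verify against the model's tokenizer.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def split_for_fim(source: str, rng: random.Random) -> tuple[str, str, str]:
    """Pick two random cut points and return (prefix, middle, suffix)."""
    if not source:
        return "", "", ""
    # Two distinct offsets in [0, len(source)], sorted so start < end.
    start, end = sorted(rng.sample(range(len(source) + 1), 2))
    return source[:start], source[start:end], source[end:]


def to_fim_example(prefix: str, middle: str, suffix: str) -> str:
    """Lay out one training string: the model sees prefix and suffix, then predicts the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


rng = random.Random(0)  # Seeded so the split is reproducible.
snippet = "def add(a, b):\n    return a + b\n"
prefix, middle, suffix = split_for_fim(snippet, rng)
example = to_fim_example(prefix, middle, suffix)
```

Cutting at random character offsets is the simplest possible choice; splitting on syntactic boundaries such as whole statements is one direction the "room for improvement" remark could point to.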
Posted in r/dataengineering by u/DuckDatum • 3/23/2025
159
Where is the Data Engineering industry headed?
Discussion
The future of Data Engineering is seen to be increasingly intertwined with Software Engineering, with a shift towards declarative processes and further incorporation of dev, staging, and prod branches. However, the industry's direction is debated. Some believe we are returning to traditional SQL databases and single machine processing due to advancements in CPU technology, while others foresee continued specialization and the emergence of dominant services. The offshoring of data engineering work is another key trend, although its effectiveness is contested. Concerns about current practices include unpredictable costs of cloud data warehouse solutions, which are causing some to revert to technologies with more manageable cost structures.
Posted in r/SQL by u/Captain_Strudels • 3/19/2025
134
I've worked with SQL for years and have no clue what GO does
SQL Server
In a discussion about what 'GO' does in SQL Server scripts, the consensus is that 'GO' acts as a virtual 'end of file', marking the boundary between sections of a script. It is not a Transact-SQL statement itself; it signals the end of a batch of Transact-SQL statements to client tools such as SSMS and sqlcmd, which send each batch to the server separately. If an error occurs in a later batch, earlier batches have still completed successfully. It is also useful in testing, because 'GO' followed by a count runs the preceding batch that many times. 'GO' matters for variable scope as well: local variables declared in one batch are not in scope in other batches, whereas global variables (the built-in '@@' values) remain available.
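Because 'GO' is handled by the client rather than the server, its effect can be imitated with a small script splitter. The Python sketch below is a simplified stand-in for what tools like sqlcmd do, not their actual implementation: the function name `split_batches` is hypothetical, and it deliberately ignores edge cases such as a 'GO' line inside a multi-line comment or string literal.

```python
import re

# A line containing only GO (any case), optionally followed by a repeat count.
GO_LINE = re.compile(r"^\s*GO(?:\s+(\d+))?\s*$", re.IGNORECASE)


def split_batches(script: str) -> list[str]:
    """Split a script into the batches a client would send to the server one by one."""
    batches: list[str] = []
    current: list[str] = []
    for line in script.splitlines():
        match = GO_LINE.match(line)
        if match is None:
            current.append(line)
            continue
        batch = "\n".join(current).strip()
        repeat = int(match.group(1) or 1)  # "GO 3" sends the batch three times.
        if batch:
            batches.extend([batch] * repeat)
        current = []
    tail = "\n".join(current).strip()
    if tail:
        batches.append(tail)  # Text after the last GO is a final batch.
    return batches


script = """\
DECLARE @n INT = 1;
SELECT @n;
GO
SELECT @n;  -- on a real server this fails: @n belongs to the previous batch
GO 2
"""
batches = split_batches(script)
```

Here the first batch declares and reads `@n`, while the second batch is queued twice and would fail on a real server each time, since the variable does not survive the batch boundary.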
Posted in r/MachineLearning by u/faintlystranger • 3/23/2025
94
[D] "Topological" Deep Learning - Promising or Hype?
Discussion
The emerging field of Topological Deep Learning (TDL) is sparking debate. While some users see TDL's potential for incorporating higher-order structural relationships in representations or architectures, others question its practicality due to the computational expense of modeling higher-order interactions. A few highlight its potential relevance in niche fields like biochemistry and material sciences. An author of a TDL position paper admits that current topological neural networks have limitations but insists on ongoing research to overcome these. Overall, the sentiment is mixed, with users recognizing TDL's theoretical appeal but expressing reservations about its current applicability and effectiveness.
Posted in r/MachineLearning by u/skeltzyboiii • 3/18/2025
89
[R] Jagged Flash Attention Optimization
Research
The Jagged Flash Attention Optimization could have significant practical implications, with experiments showing a 10% improvement in Queries Per Second (QPS) and an 18% reduction in memory usage. However, commenters cautioned that the reported speedup of up to 9x doesn't necessarily mean 9x faster inference across all applications. The efficiency of local versus cloud-based inference for larger models was also discussed, highlighting how even small latency improvements can be substantial for real-time applications. Many are eagerly awaiting the implementation, and there's curiosity about the specific model this optimization will be deployed in.
Posted in r/SQL by u/Mafioso14c • 3/18/2025
50
Interview struggle
Discussion
In a discussion about data integrity and validation during the interview process, users highlighted the importance of using appropriate data types, avoiding nulls where possible, setting indexes and unique constraints, and establishing foreign keys. Utilizing database tools to ensure data quality was emphasized. Dynamic SQL query construction using user-selected filter values was discussed, with parameterized queries being recommended to handle user input. Data profiling was suggested as a vital part of data validation, with checks for valid dates, numeric values, outliers, and unique primary keys. The role of stakeholder expectations and domain expertise in making technical decisions was underscored, with an emphasis on understanding data requirements and definitions.
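The advice on building dynamic queries from user-selected filters can be sketched as follows. This is a minimal example under stated assumptions, not code from the thread: it uses Python's built-in sqlite3 module, and the `orders` table, its columns, the `ALLOWED_FILTERS` whitelist, and the function name `build_query` are all hypothetical.

```python
import sqlite3

# Hypothetical whitelist mapping user-facing filter names to real column names.
# Column names cannot be bound as parameters, so they must come from a fixed list.
ALLOWED_FILTERS = {"country": "country", "status": "status"}


def build_query(filters: dict[str, str]) -> tuple[str, list[str]]:
    """Build a SELECT with one placeholder per filter; values never enter the SQL text."""
    clauses: list[str] = []
    params: list[str] = []
    for name, value in filters.items():
        column = ALLOWED_FILTERS.get(name)
        if column is None:
            raise ValueError(f"unsupported filter: {name}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT id, country, status FROM orders"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params


conn = sqlite3.connect(":memory:")
# NOT NULL columns and a primary key reflect the thread's data-integrity advice.
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, country TEXT NOT NULL, status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (country, status) VALUES (?, ?)",
    [("DE", "open"), ("DE", "closed"), ("FR", "open")],
)
sql, params = build_query({"country": "DE", "status": "open"})
rows = conn.execute(sql, params).fetchall()
```

Because the filter values travel separately from the SQL text, an input such as `DE' OR '1'='1` is treated as a literal string to compare against rather than as part of the query.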
Posted in r/SQL by u/brandi_Iove • 3/23/2025
49
A cool feature i just came across
SQL Server
The discovery of a live index feature in SQL Server and SSMS, which shows the count of rows being processed during execution, sparked a discussion on efficiency in database operations. Several contributors emphasized that set-based operations are more efficient than the row-by-row updates that the live index feature needs in order to function. Questions were raised about the feasibility of executing row-by-row updates on millions of rows. Suggestions included using partition swapping for greater efficiency and adjusting practices to batch set-wise operations. The sentiment was mixed, with appreciation for the feature tempered by considerations of operational efficiency.
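The suggestion to batch set-wise operations can be sketched as follows. This is a minimal illustration rather than code from the thread: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the `items` table, its columns, and the function name `double_prices_in_batches` are hypothetical. It also assumes positive, roughly contiguous integer ids.

```python
import sqlite3


def double_prices_in_batches(conn: sqlite3.Connection, batch_size: int) -> int:
    """Update the hypothetical items table one id range at a time; return the batch count."""
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM items").fetchone()
    batches = 0
    for low in range(1, max_id + 1, batch_size):
        high = low + batch_size - 1
        # One set-based statement touches every row in the id range at once.
        conn.execute(
            "UPDATE items SET price = price * 2 WHERE id BETWEEN ? AND ?",
            (low, high),
        )
        conn.commit()  # Commit per batch to keep each transaction short.
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO items (id, price) VALUES (?, ?)",
    [(i, i) for i in range(1, 11)],
)
conn.commit()
batch_count = double_prices_in_batches(conn, batch_size=4)
total = conn.execute("SELECT SUM(price) FROM items").fetchone()[0]
```

Ten rows with a batch size of four take three statements instead of ten single-row updates; the batch count also gives a coarse progress signal without falling back to row-by-row processing.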
Posted in r/SQL by u/_mr_villain_ • 3/18/2025
38
What is wrong here.
MySQL
The Reddit discussion revolved around a SQL query problem. The issue arose from attempting to use a function name as a column alias in MySQL. Although the user thought that adding 'DESC' fixed it, the problem was actually resolved by changing the alias. Commenters noted that in standard SQL, 'ORDER BY' defaults to 'ASC' when no direction is given. It was also clarified that 'PARTITION BY' is optional, and that the MySQL version can affect whether certain errors are thrown. Several users shared solutions and workarounds, including different queries for MySQL versions 8.0+ and 5.7 or older. The sentiment was constructive and solution-focused.