- Identifies hardware and software failures, adjusts timeouts, and prevents false positives in mirroring.
- Fixes patterns that degrade performance: N+1, WHERE functions, and missing indexes.
- Avoid SQL errors (syntax, order, aliases) and bad design practices (PKs, normalization).
- Implement a KEDB to resolve recurring incidents quickly and transparently.
The goal is for you to leave with a clear map: why they occur, how to detect them, and what measures to take. You'll find detailed guidelines on Hardware and software errors in SQL Server (mirroring), performance-killing patterns, typical query writing mistakes, classic modeling/development failures, and an ITIL/ITSM approach to documenting and resolving recurring issues with a well-structured KEDB.
Hardware errors: signs, causes, and reaction times
Physical failures usually "sing" quickly because other system components notify the database engine. When this happens, the server receives a hardware error reported immediately, although sometimes there are latencies due to network or I/O timers that delay notification.
Common causes include: broken connection or a damaged cable, a failing network card, changes to the router or firewall, endpoint reconfiguration, loss of the drive hosting the transaction log, or process/OS errors. These are problems that, if they affect the log disk or the network, can cause disconnections or serious interruptions in database replication or mirroring.
Be aware that some network components and certain I/O subsystems apply their own internal waiting timesThese timeouts are independent of the database and can delay detection, increasing the time between the actual failure and the engine becoming aware of it.
To better understand what is happening “on the wire”, it is a good idea to ask the network team what messages arrive at the port when typical events such as DNS down, cables disconnected, ports blocked by firewall, an application listening to the port crash, a server name change, or a reboot. This inventory of symptoms speeds up diagnosis when service is suddenly interrupted.
Software bugs and timeouts: when to fix them and how to avoid false positives
Software failures do not communicate themselves: the server could be down waiting indefinitely If there were no monitoring mechanism in place. Therefore, in scenarios such as database mirroring, instances are periodically pinged, and if no signal arrives within the agreed time, a problem is considered present.
Among the conditions that trigger these waiting times are: Network errors (TCP timeouts, corrupted, lost, or incorrectly ordered packets), an unresponsive operating system/server/database, Windows-level timeouts, and resource shortages: disk or CPU overload, transaction log at 100%, insufficient memory or threads.
If you find yourself in that situation, you can choose to extend the timeout, reduce the load or improve the hardware to absorb demand. Setting the wait time too low causes false positives; setting it too high delays the response to real failures.
The ping/timeout mechanism in SQL Server mirroring
To keep each connection alive, each instance sends pings at a fixed interval. If a ping is received within the waiting time window (plus shipping time), it's assumed the communication is still active, and the counter is reset. If no ping is received within that interval, the timeout is declared and the connection is closed, handling the event according to the role and operating mode.
Even if the other server is fine, a timeout is taken as a failureIf the configured value is too short for the normal latency of the environment, "phantom" errors will appear. Therefore, it is recommended not to go below 10 seconds.
In high performance mode the wait time is always 10 s; this is usually sufficient to avoid false positives. In high-security mode, the default value is also 10 s, but it's configurable; adjust it in that mode to 10 s or more if the network is "lazy."
If you need to change it, remember that this modification is specific to sessions in high securityYou can view and modify it from the engine administration or with T-SQL depending on your version and policies.
How the server responds when there is an error
In the event of any type of error, the instance acts according to its role (primary/witness/secondary), mode of operation and connection statusWhen a partner loses, behavior differs depending on whether we're using high-performance or high-security with a witness, so it's key to document the operating mode of each session to anticipate interruption times and switchovers.
Performance-killing patterns in SQL (and how to fix them)
There are four very common "sins" that unnecessarily worsen latency. They're easy to spot, and if you avoid them, You save CPU, I/O, and trips to the database since the first day.
Queries inside loops: launching a query for each iteration (the classic N+1) multiplies trips and latencies. Bring the data at once with a union or an IN statement, or use batch queries. Process the logic in your code with structures already loaded in memory.
Loading too much data: Bringing in columns and rows you're not going to use is like killing flies with a cannonball. Filter in the database, select only what is necessary, pagination if applicable and avoid SELECT * unless you have a clear reason.
Functions in the WHERE clause: Applying LOWER(), DATE(), or other functions to columns often prevents the use of indexes. It's better to compare without transforming the column: preprocesses the data before or transform the literal. For example, filter by date range with date/time columns without wrapping them in functions.
Missing indexes: Forgetting indexes on columns that filter or join is asking for a full scan. Periodically review where your application filters/joins and create the appropriate indexes (composites when appropriate). Balance: too many indexes penalize writes.
Typical errors when writing SQL: syntax, order, and ambiguities
Most rookie (and not so rookie) mistakes are syntax y SQL injectionThe database doesn't understand what you're asking and complains. A highlighting editor helps, but knowing the typical pitfalls speeds up the fix.
Misspelled words: Mistakes with FROM, WHERE, or table/column names are common. Messages usually indicate where the parser failsUse an editor with highlighting and auto-completion; if a keyword isn't highlighted, be suspicious.
Parentheses and quotation marks: Missing a parentheses or quotation mark creates a hole that is difficult to see. Keep in mind the operator precedence (AND/OR) and group with parentheses. In text literals, escape internal quotes or alternate single/double quotes to avoid breaking the string (e.g., O'Reilly).
Invalid order in SELECT: the correct order is SELECT → FROM → WHERE → GROUP BY → HAVING → ORDER BY. Changing the ORDER BY or HAVING position causes errors. Memorize it or have a cheat sheet handy.
Ignore table aliases: In a self join or when there are columns with the same name in two tables, you will get “ambiguous column”. Use short and clear aliases and reference columns with alias.column. Also, the SQL is more readable.
Case-sensitive or special names: If you insist on case-sensitive names or names with spaces, you will need to quote them with double quotes According to the engine. It's best to avoid those names; if not, be consistent when citing them.
Common mistakes in database development
Beyond writing queries, there are design decisions that make a difference in the medium and long term. Here are five common development mistakes, along with what you should do instead. protect integrity and maintainability.
Overusing stored procedures: They're useful, but with ORMs and modern access layers, you no longer need to put all your logic there. SPs have a cost of maintenance and versioning; create SPs for data access when justified, not for application business logic.
Do not use primary keys: Delegating uniqueness to views, SPs, or the application increases complexity and errors. Define Real PKs on all tables and use unique identifiers where applicable; you'll avoid post-process deduplication and fragile queries.
Hard delete instead of soft delete: Physical deletion complicates audits and recoveries from "oops, I messed up." For many use cases, it adds a check mark. active/inactive (soft delete) and exclude it from your queries. Leave the physical deletion for controlled purges.
Known Error Database (KEDB): What it is and why it's a good idea for you
In operations, not everything can be solved instantly. Workarounds are inevitable. limited resources, complexity or continuity needsThe KEDB is the repository where you document every known error, its cause (if any), and its temporary or permanent solution.
It is part of the ITIL framework and intersects with Problem Management and Knowledge Management. When a recurring incident arises, the team consults the KEDB, applies the tested resolution and reduce downtime instead of starting from scratch.
Benefits for users: faster resolution, fewer interruptions and results more predictableFor IT: efficiency (no reinventing the wheel), knowledge retention despite turnover, and data for continuous improvement. For stakeholders: transparency, informed capacity/risk decisions, and cost savings.
How to implement an effective KEDB step by step
1) Define scope and objectives: decide which failures are included (software, hardware, network or specific areas), how they are classified and what goals you pursue (reduce MTTR, improve satisfaction, etc.). Prioritize critical systems and services and align scope with business objectives and SLAs.
Guiding questions: Does it cover everything or do we start with the most impactful? Are we primarily looking to shorten resolution times or also spot patterns for prevention?
2) Collect and document: Work with Incident/Problem Management and teams to capture recurring errors and effective workarounds that aren't yet written. Use a simple template with fields like description, root cause (if known), temporary/final solution, impact, dates, and notes.
Tips: clear language, actionable steps, helpful ratings, linking related incidents and update entries when there is news.
3) Choose the tool: you need good search, categorization/tagging, links between elements, scalability, reporting and ability to integrate with your ITSM platform. Prioritize a user-friendly interface to encourage adoption; consider AI-powered search and customizable workflows.
4) Train teams: teach them how to document well, search effectively, and maintain quality. Includes practices with real cases, quick guides, videos, mentoring, and periodic refreshers. It encourages feedback to improve the process.
5) Maintain and improve: Define responsible parties, KPIs, and a review cycle (monthly for critical items, quarterly for the rest). Establish peer review of new entries. continuous documentation after each problem and a feedback channel from users and support.
6) Promote its use: carry out an internal campaign, recognize those who contribute the most and promote it. collaboration between areas (network, software, hardware). Integrate KEDB into your service culture so it's the first place to look.
KEDB vs. Knowledge Base (KDB): Key Differences
The KEDB focuses on known failures and their resolution (temporary or permanent), closely aligned with Problem and Incident Management. It is usually integrated with ITSM and its main audience is the technical team that resolves incidents.
The KDB covers much more: best practices, procedures, configuration data, help articles, etc. Its maintenance is more extensive, with reviews of procedures and good practices In addition to technical articles. In short: KEDB is the subset that specializes in "when this fails, do that."
As you can see, the stability and performance of a database depends as much on understanding the physical and logical failures (and their detection times) and writing good SQL, modeling wisely, and organizing operational knowledge. If you address the four performance patterns, avoid typical syntax errors, design with strong PKs and sensible normalization, and also build a live KEDB, you'll have a faster, more predictable, and easier-to-operate platform even when things go wrong.
Table of Contents
- Hardware errors: signs, causes, and reaction times
- Software bugs and timeouts: when to fix them and how to avoid false positives
- The ping/timeout mechanism in SQL Server mirroring
- How the server responds when there is an error
- Performance-killing patterns in SQL (and how to fix them)
- Typical errors when writing SQL: syntax, order, and ambiguities
- Common mistakes in database development
- Known Error Database (KEDB): What it is and why it's a good idea for you
- How to implement an effective KEDB step by step
- KEDB vs. Knowledge Base (KDB): Key Differences