Spoiling the Mystery and Suspense of Troubleshooting

As in the mystery and suspense movies of old, IT has always required a lot of detective work, and a lot of detectives. Troubleshooting, with all its suspense and iterations, has played a central role for end-user computing (EUC) teams for decades.

Times are changing, however, and so are budgets (and budget allocations)! Today’s organizations still need detectives to solve the daily mysteries and riddles, but it is time for effective technology to start replacing the more expensive troubleshooting agents. The focus needs to shift toward the tools a detective can use, rather than old-fashioned intuition and narrow expertise. The goal should be to get ahead of issues before they occur: think Minority Report rather than Sherlock Holmes.

I vividly remember one of my first meetings, eight years ago, with a lead architect at a global service provider. A colleague and I had spent the week in Bengaluru trying to understand how Nexthink could help him deliver a superior service while keeping his operating costs low.

On Friday evening, before we headed to the airport, he pulled us into a small room. Face-to-face, in that hushed meeting, he explained his cost structure, including support Levels 1, 2, and 3. “We don’t need a sophisticated log aggregator to equip our most expensive team (L3),” he told us, “just so they spend their time digging inside logs. They’re too expensive and every minute they spend troubleshooting is a costly failure for us. Our only way is to make Level 3 redundant by solving issues in Level 1 and below. What we need is a solution that can automatically detect, diagnose, and propose the most probable fix. It must be understood and applied by our least expensive resources.”

No need for suspense: A good DEX solution should be a spoiler

L3 teams are knowledgeable and passionate about their work. They love to solve the mysteries of outages and glitches in the system. And like Sherlock Holmes, they’ll happily sift through mountains of data, searching for clues to land on a suspect and reveal the killer problem.

All the evidence, however, points to the fact that the L3 team is no match for a fit-for-purpose digital employee experience (DEX) solution.

A true DEX solution should be capable of solving and preventing incidents at the lowest possible level, in the following order of priority:

1) Automatic fixes, with no intervention from the Service Desk;
2) Automatically proposed fixes for Level 1, so they can resolve issues quickly;
3) Automatic data processing and correlation analysis, presented simply to Level 1 so they can complete the diagnosis and apply the fix.

If a DEX solution doesn’t deliver the above 99% of the time, but instead hands a ton of logs to Level 3, then it simply isn’t fit for purpose: it’s not powerful enough. It is essentially repackaging, in a different format, the same logs Level 3 teams can already access via scripts. It’s not a game changer. It doesn’t make EUC teams more effective.

The key to workplace management is not to hand L3 millions of logs, or to retrieve crash dump files from devices so that a highly sophisticated (and horrifically expensive) team can analyze them. The aim should be to understand automatically what led to the crash. The power of a DEX technology lies in its ability to correlate events: to put all events on a timeline and determine automatically which event led to an issue, which configuration difference made some users successful and others not, and which combination of systems, configurations, connectivity, OS, applications, and other criteria caused the problem. This is the core capability an exceptional DEX solution needs to achieve its objectives. Removing the mystery and suspense, the long wait before the suspect is apprehended, is essential.
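To make the idea of a “configuration difference” concrete, here is a minimal Python sketch. It assumes a hypothetical fleet snapshot (the `DEVICES` list and its `os`, `vpn_client`, and `failed` fields are invented for illustration) and uses a plain failure-rate comparison, not Nexthink’s or any other vendor’s actual correlation engine.

```python
from collections import defaultdict

# Hypothetical fleet snapshot: each device's characteristics plus whether its
# user hit the issue (e.g. an application crash). Values are illustrative only.
DEVICES = [
    {"os": "Windows 11", "vpn_client": "2.1", "failed": True},
    {"os": "Windows 10", "vpn_client": "2.1", "failed": True},
    {"os": "Windows 11", "vpn_client": "2.1", "failed": True},
    {"os": "Windows 11", "vpn_client": "2.0", "failed": False},
    {"os": "Windows 10", "vpn_client": "2.0", "failed": False},
    {"os": "Windows 10", "vpn_client": "2.0", "failed": False},
]


def rank_suspects(devices, outcome="failed"):
    """Rank (attribute, value) pairs by failure-rate difference.

    For each configuration value, compare the failure rate of devices that
    have it with the failure rate of devices that do not. The largest
    positive gap is the most likely culprit.
    """
    total = len(devices)
    total_failed = sum(int(d[outcome]) for d in devices)
    counts = defaultdict(lambda: [0, 0])  # (attr, value) -> [devices, failures]
    for d in devices:
        for attr, value in d.items():
            if attr != outcome:
                counts[(attr, value)][0] += 1
                counts[(attr, value)][1] += int(d[outcome])

    ranked = []
    for (attr, value), (n, failed) in counts.items():
        rest_n = total - n
        rate_with = failed / n
        rate_without = (total_failed - failed) / rest_n if rest_n else 0.0
        ranked.append((rate_with - rate_without, attr, value))
    # Sort only on the gap so ties keep their original order.
    ranked.sort(key=lambda r: r[0], reverse=True)
    return ranked


if __name__ == "__main__":
    for gap, attr, value in rank_suspects(DEVICES):
        print(f"{attr}={value}: {gap:+.2f}")
```

On the sample data, `vpn_client=2.1` comes out on top with a gap of +1.00, while the OS difference barely registers (+0.33). A production system would likely also need to account for sample sizes and combinations of attributes; this sketch only compares one value at a time.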

How much data, what data and how is it processed?

What my friend from the managed service provider explained to me eight years ago is now common knowledge across the industry. So it comes as a surprise that some vendors entering the DEX space still push their troubleshooting capabilities, fully aware of the horror the associated L3 costs induce in the accounting department. Perhaps they do so because they know their own solutions lack the basics needed to automate detection and diagnosis.

In the real world, there is an equation of sorts for solving an IT “crime”.

It has three components:

Events: Events are specific incidents or occurrences. A detective needs to understand the sequence of events leading up to a crime. By reconstructing the timeline, they can identify key moments and potential suspects, establishing the context surrounding each event.

Characteristics: These are the distinctive features of the people, objects, or locations involved. A detective needs to know behaviors, habits, and patterns so they can connect evidence and establish links between the different elements.

Relations: Relations are the connections or associations between individuals, groups, or objects. Analyzing them provides insight into potential suspects, helps establish a timeline, and uncovers conflicts.
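The three components can be combined in a short Python sketch. Everything here (the `Event` type, the `CHARACTERISTICS` and `RELATIONS` tables, the device names, and the 30-minute look-back window) is a hypothetical illustration of the idea, not the data model of any real DEX product: relations lead from the user to their device, events on that device are put on a timeline, and characteristics add context.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """Events: something that happened on a digital object at a point in time."""
    time: datetime
    device: str
    kind: str


# Characteristics: distinctive features of each object (hypothetical values).
CHARACTERISTICS = {
    "LAPTOP-01": {"os": "Windows 11", "ram_gb": 8},
    "LAPTOP-02": {"os": "Windows 10", "ram_gb": 16},
}

# Relations: which user works on which device (hypothetical values).
RELATIONS = {"alice": "LAPTOP-01", "bob": "LAPTOP-02"}


def reconstruct_case(events, characteristics, relations, user, incident_time,
                     window=timedelta(minutes=30)):
    """Combine events, characteristics, and relations into one case file."""
    device = relations[user]  # follow the relation from user to device
    # Keep only this device's events inside the look-back window, in time order.
    timeline = sorted(
        (e for e in events
         if e.device == device and incident_time - window <= e.time <= incident_time),
        key=lambda e: e.time,
    )
    return {
        "user": user,
        "device": device,
        "characteristics": characteristics[device],
        "timeline": [(e.time, e.kind) for e in timeline],
    }


T0 = datetime(2024, 1, 1, 9, 0)
EVENTS = [
    Event(T0 + timedelta(minutes=25), "LAPTOP-01", "app_crash"),
    Event(T0 + timedelta(minutes=5), "LAPTOP-01", "driver_update"),
    Event(T0 + timedelta(minutes=20), "LAPTOP-02", "login"),  # other device
    Event(T0 - timedelta(hours=2), "LAPTOP-01", "boot"),      # outside window
]

if __name__ == "__main__":
    case = reconstruct_case(EVENTS, CHARACTERISTICS, RELATIONS, "alice",
                            incident_time=T0 + timedelta(minutes=25))
    print(case["device"], case["characteristics"])
    for when, kind in case["timeline"]:
        print(when.isoformat(), kind)
```

For Alice’s crash, the case file places the `driver_update` 20 minutes before the `app_crash` on her 8 GB laptop, while ignoring Bob’s device and the morning boot. A tool that collects only characteristics would have the laptop’s specs but no timeline to put them against.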

The problem with many solutions born as monitoring technologies (but not as DEX solutions) is that their tech stacks operate like a clueless detective.

Instead of quickly finding clues and resolving cases, they push manual work onto the IT department. These technologies certainly pull a huge amount of data, but mostly it is just one of the three types of data the truly smart, modern detective needs. Often, they pull “characteristics”: a huge amount of detail about digital objects. But they cannot see the two other major components: events and relations. Without such data, and the ability to put everything on a timeline, the mystery is not solved automatically. Such solutions are little more than paper pushers, handing tons of data to L3 teams so that they can keep doing the same manual detective work! If a DEX solution cannot fully understand a company’s digital objects, along with their events, characteristics, and relations in a computing context, it won’t be able to fix problems automatically.

Some vendors highlight troubleshooting capabilities because they are branding a monitoring solution or a device management solution as a DEX solution. DEX is about observability: seeing all events, relations, and characteristics, and putting them into context. It’s fine to facilitate troubleshooting for the 1% of cases that can’t be fixed automatically, but focusing on the worst-case scenario is misleading, and IT departments should be cautious.

The mystery and suspense of the daily slog of fixing IT aren’t necessary anymore. The utility of Sherlock Holmes will live on forever, but L3’s time has passed.
