

In the next few blog posts I am going to talk about discovery tools and how they are evolving to support the goal of an up-to-date and accurate CMDB.
Discovery is often sold as a way to “automagically” populate key data into a CMDB, removing the need for manual processes. If you are starting your CMDB from scratch, the idea of quickly filling your database with up-to-date data is compelling, although as with anything in life, if it seems too good to be true… it probably is.
For this blog I am going to look at one of the key challenges of discovery – data retrieval and normalisation. I’ll examine the changing nature of discovery tools to see how these changes will eventually impact the approach and outcome of configuration management.
The key thing to remember is that the data provided by the discovery tool is only as good as the data retrievable from the object being discovered.
In many cases data retrieved from similar sources can differ for all sorts of reasons: a slightly different operating system or software version, BIOS or patch levels, language selections, and so on. This usually means that work is required to normalise and transform the inbound data into a format and data set that is useful, consistent and can be acted on programmatically.
In an enterprise environment where new products and software are being introduced and updated all the time, normalisation can represent quite an investment in time and people. Having a normalised data set is essential when a company is undertaking an initiative like Software License Management, as it is necessary to uniquely identify software products, sometimes down to the number of cores in a CPU, so they can be matched to licenses.
To help with this challenge, there are products that manage normalisation by providing a constantly updated catalogue mapping discoverable items to vendor products, which can then be lined up against contracts.
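To make the normalisation step concrete, here is a minimal sketch in Python. The raw records, field names and alias table are invented for the example and are not the output of any particular discovery tool; commercial catalogues essentially provide a much larger, constantly maintained version of the alias table.

```python
# Minimal sketch of discovery data normalisation for license matching.
# The raw records, field names and mapping table are illustrative
# assumptions, not the output of any particular discovery tool.

# Raw records as they might arrive from different sources: the same product,
# named inconsistently.
raw_records = [
    {"publisher": "Microsoft Corp.", "product": "SQL Svr 2019 Ent", "cpu_cores": 16},
    {"publisher": "Microsoft Corporation", "product": "SQL Server 2019 Enterprise", "cpu_cores": 16},
]

# Hand-maintained mapping of observed names to canonical vendor products.
PRODUCT_ALIASES = {
    "sql svr 2019 ent": ("Microsoft", "SQL Server 2019 Enterprise"),
    "sql server 2019 enterprise": ("Microsoft", "SQL Server 2019 Enterprise"),
}

def normalise(record: dict) -> dict:
    """Map one raw discovery record onto a canonical product identity."""
    key = record["product"].strip().lower()
    vendor, product = PRODUCT_ALIASES.get(key, (record["publisher"], record["product"]))
    return {
        "vendor": vendor,
        "product": product,
        "cpu_cores": record["cpu_cores"],  # needed for core-based licensing
    }

if __name__ == "__main__":
    for rec in raw_records:
        print(normalise(rec))
    # Both raw records resolve to the same canonical product, so the
    # installations can now be counted against a single license entitlement.
```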
Discovery through the ages
Not all discovery tools are equal, and the nature of discovery has changed substantially over the last 20 years. It’s worth looking at these changes to give some context.
First generation discovery
The first generation of discovery tools used agents. These were software applications designed to run full and incremental file scans, generally during quiet periods, so as not to impact the performance of the machine. As well as the file scans, these agent-based tools were also able to gather real-time information about running software and its network and external application connections.
Agents typically collected a lot of information and needed supporting infrastructure to move and store this data. They needed to be installed and maintained, and used valuable compute/memory resources.
This technology was the most expensive to implement and run and collected large amounts of in-depth data that had to be transformed and normalised.
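To give a flavour of what an agent can see locally, here is a minimal sketch that uses the open-source psutil library as a stand-in for an agent’s collectors. A real agent would gather far more, run the file scans as well, and ship the results back to a central store.

```python
# Minimal sketch of the kind of real-time data an agent can gather locally.
# psutil is used purely as an illustration; a commercial agent has its own
# collectors plus a transport and store for the data it gathers.
import psutil

def snapshot_running_software():
    """Return (pid, process name) for everything currently running."""
    return [(p.info["pid"], p.info["name"])
            for p in psutil.process_iter(attrs=["pid", "name"])]

def snapshot_connections():
    """Return active network connections (may require elevated privileges)."""
    return [(c.laddr, c.raddr, c.status)
            for c in psutil.net_connections(kind="inet") if c.raddr]

if __name__ == "__main__":
    print(f"{len(snapshot_running_software())} processes running")
    print(f"{len(snapshot_connections())} active connections")
```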
Second generation discovery
The second generation of tools was agentless. Agentless tools do not have access to the same breadth and depth of information that agent-based tools can provide. They generally collect data on a scheduled basis, using scripts to gather more targeted data than an agent would. Information that can only be gathered in real time usually cannot be captured. Most of the useful information requires authentication, which is a consideration when deploying these technologies.
What agentless discovery tools do provide is a cheaper, more convenient way of gathering a “good enough” data set that is important to most enterprises, without the overhead of installing and managing an agent-based infrastructure.
Once again, transformation and normalisation of the data are required.
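To make the scheduled, script-based approach concrete, here is a minimal sketch of an agentless collection run over SSH. The host name, service account, key path and command list are placeholder assumptions, and paramiko is just one possible transport; a real tool would schedule this across many targets.

```python
# Minimal sketch of an agentless, credentialed collection run over SSH.
# Host, username and key path are placeholder assumptions.
import paramiko

TARGET_HOST = "server01.example.com"   # placeholder
SSH_USER = "discovery"                 # placeholder service account
SSH_KEY = "/etc/discovery/id_rsa"      # placeholder credential

# Targeted commands instead of a full file scan: OS identity, installed
# packages, listening services.
COMMANDS = ["uname -a", "dpkg-query -W", "ss -tln"]

def collect(host: str) -> dict:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=SSH_USER, key_filename=SSH_KEY)
    try:
        results = {}
        for cmd in COMMANDS:
            _stdin, stdout, _stderr = client.exec_command(cmd)
            results[cmd] = stdout.read().decode()
        return results
    finally:
        client.close()

if __name__ == "__main__":
    data = collect(TARGET_HOST)
    for cmd, output in data.items():
        print(cmd, "->", len(output), "bytes collected")
```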
Third generation discovery
With the emergence of cloud technologies, the need to scan all systems and components is diminishing. The third generation of discovery isn’t really discovery at all. With AWS and Azure, you tell the service provider what you require by creating a configuration or template that you apply to their environment. The configuration contains details of compute, storage, resilience and application requirements, and is the basis on which the service is created and run. You are abstracted from the nuts and bolts of what is being delivered, as you are buying a service.
In this scenario “discovery” is effectively reading the configuration and asking the service console what is being consumed. The cloud consoles manage capacity and configuration in real time, so you need to decide how you wish to use this data within your CMDB.
It does, however, simplify the original problem of discovering the details and merging them into a useful data set. Instead of normalising software, you are paying for a service where the particular version is determined entirely by the contract; there is no room for ambiguity.
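As an illustration of asking the console rather than scanning machines, here is a minimal sketch using boto3 against AWS. The region is an assumption, credentials are taken to be configured already, and equivalent APIs exist for Azure and other providers.

```python
# Minimal sketch of "discovery" against a cloud console: rather than scanning
# machines, we ask the provider's API what is running. Region and credentials
# are assumed to be configured already.
import boto3

def list_running_instances(region: str = "eu-west-1"):
    ec2 = boto3.client("ec2", region_name=region)
    reservations = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]
    instances = []
    for r in reservations:
        for i in r["Instances"]:
            tags = {t["Key"]: t["Value"] for t in i.get("Tags", [])}
            instances.append({
                "instance_id": i["InstanceId"],
                "instance_type": i["InstanceType"],  # the compute you asked for
                "tags": tags,  # where application/service identifiers live
            })
    return instances

if __name__ == "__main__":
    for item in list_running_instances():
        print(item)
```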
Third-generation discovery doesn’t, however, completely solve the application and business service discovery requirements. There is still a need to “tag” configurations with organisation-specific identifiers for applications and business services, just as with previous discovery implementations.
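A minimal sketch of what that tagging might look like on AWS is below. The tag keys, values and instance ID are placeholders; in practice the tags would normally be applied by the deployment template or pipeline rather than by hand.

```python
# Minimal sketch of tagging a cloud resource with organisation-specific
# application and business service identifiers. Tag keys, values and the
# instance ID are placeholder assumptions.
import boto3

APP_TAGS = [
    {"Key": "ApplicationId", "Value": "APP-0042"},         # placeholder identifier
    {"Key": "BusinessService", "Value": "Online Payments"}  # placeholder identifier
]

def tag_instance(instance_id: str, region: str = "eu-west-1") -> None:
    ec2 = boto3.client("ec2", region_name=region)
    ec2.create_tags(Resources=[instance_id], Tags=APP_TAGS)

if __name__ == "__main__":
    tag_instance("i-0123456789abcdef0")  # placeholder instance ID
```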
Conclusion
There is no doubt that discovery tools have improved and become cheaper to implement and run. But that creates the risk that people implement the tooling without considering the end-to-end effort required to make the data valuable, i.e. complete, accurate and up to date. There is always effort required to normalise and validate the data, and no magic tool has been created to remove that. It’s like buying a few pallets of bricks and expecting to be able to build a house: until those bricks are organised, aligned and related to each other, you won’t succeed.
While cloud-based “discovery” tools potentially remove the need for much of the normalisation associated with the older generations of tools, they do not allow you to capture applications or business services without additional work to package or tag this information as the software is deployed and maintained. This remains true across all three types of discovery.
In the next post in the series, I will look specifically at the discovery of applications and business services, and the relationships to and from them.
Other posts in this series: