How important is operational data within an information environment? What is the difference between local and central data? Why do decentralization and standardization matter? These are the key questions addressed in this article.
Every company inevitably accumulates a certain amount of data, the nature of which differs from department to department. To a certain extent, then, the diversity of the data depends on the number of departments. The data finds its way into the database through various channels: usually via manual input in screen applications, in other cases through batch procedures or by loading external files. Where the information is physically stored matters less at this stage of acquisition, as long as it remains accessible. Extracting data after input causes a data flow in the opposite direction. We can therefore divide data traffic into two types: incoming traffic, or input, and outgoing traffic, or output. Both terms should be interpreted from the server's point of view, not the user's.

Processing data in a production environment is called Online Transaction Processing (OLTP): the live processing of data, covering both the input and the output direction of the data flow. The literature offers many definitions of OLTP, but the characteristic they all share is the transactional aspect, an aspect that belongs exclusively to an operational environment. An OLTP transaction will therefore always focus on processing so-called production data. In most cases such a system is built on a relational database managed by a DBMS (Database Management System), a standardized structure that guides transactional OLTP processes. It handles all incoming and outgoing communication and is wrapped in an isolated security layer that must safeguard the integrity of the data. A modern DBMS, such as Oracle or SQL Server, also provides various methods for efficient data management and performance tuning. We can therefore say that the DBMS forms a solid layer between the physical data and the sources that led to its creation.
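The transactional aspect mentioned above can be made concrete with a small sketch. The example below uses Python's built-in SQLite support as a stand-in for any relational DBMS; the table and the transfer scenario are invented for illustration. The point is only that the DBMS guarantees atomicity: either all statements of the transaction take effect, or none do.

```python
import sqlite3

# Illustrative OLTP-style transaction against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move an amount between accounts as one atomic transaction."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.commit()      # both updates become visible together
    except sqlite3.Error:
        conn.rollback()    # on failure, the database stays consistent
        raise

transfer(conn, 1, 2, 30)
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# → [(70,), (80,)]
```

A real OLTP system would add concurrency control and an isolated security layer on top of this, but the commit-or-rollback pattern is the core of every transaction.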
What is often forgotten is that every company also holds a large amount of data that is not centrally available. It sits on the hard drives of desktop computers or, at best, on a shared drive somewhere in the network file system. The local nature of such data does not diminish its importance, especially from the perspective of the person controlling the information. There are several reasons for storing data decentrally, but the main one is that the business often evolves much faster than the ICT development within the company. Managing data locally in a sense circumvents the need for custom programming, a process that is often accompanied by significant delays. While there is no doubt that central data management is in theory the only efficient method, there will always be a need to maintain local data. The business side of the company simply takes priority over the technical side.
Not all data originates within the company. It is not inconceivable that external files are supplied by other parties. For example, a purchasing department can upload the item data and prices of a major supplier into the central ERP application. Another example can be found in the CRM environment, where very large address files are often purchased for mailing purposes. Nowadays one can also download all kinds of information from the Internet. A practical example is the acquisition of geographical data on the countries with which one does business. This information can usually be obtained from the government of the country in question and is made accessible online. Perhaps, when creating a new customer, you would like to check whether the specified street exists in the street map of his country? Or verify the structure of his telephone number against the telephone zones of that country? In such situations, this information is indispensable. The format of external data can deviate from the DBMS format to such an extent that a lot of programming work is required to make loading possible. That is why classic text- or spreadsheet-oriented processing is increasingly being replaced by standard formats. The XML format, which stands for eXtensible Markup Language, has become one of the most widely used formats in recent years for supplying and uploading external data. XML is based on a fixed hierarchical structure of so-called tags and accommodates various standardized ISO formats and character sets. Although the initial development of XML focused primarily on Internet-related uses, its generic nature leaves plenty of room for other applications. This standardization makes an extensive form of multi-platform communication possible and also makes data transfers much easier. Oracle was not blind to this evolution either, and integrated various XML functionalities into its DBMS.
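The hierarchical tag structure of XML is what makes such external feeds easy to load without custom parsing code. As a minimal sketch, the snippet below reads a hypothetical supplier item file with Python's standard library; the tag names and codes are invented, not any real supplier standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical supplier feed: tag names and item codes are illustrative only.
feed = """<items>
  <item code="A-100"><name>Widget</name><price currency="EUR">9.95</price></item>
  <item code="B-200"><name>Gadget</name><price currency="EUR">14.50</price></item>
</items>"""

root = ET.fromstring(feed)
# Walk the tag hierarchy and flatten each <item> into a row ready for loading.
rows = [(item.get("code"),
         item.findtext("name"),
         float(item.findtext("price")))
        for item in root.iter("item")]
print(rows)
# → [('A-100', 'Widget', 9.95), ('B-200', 'Gadget', 14.5)]
```

Compare this with parsing a fixed-width text file: the structure travels with the data, so sender and receiver no longer need to agree on column positions.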
Nowadays it is perfectly possible to store a complete XML document in its entirety within a field of a table, just as has long been possible for graphic files (the so-called LOB principle). This move away from the file system brings many additional options for further processing, archiving and backup. Compared to the rigid structure of classic text files, the flexible nature of XML can provide great added value for information processing. Figure 1 summarizes how incoming data traffic can find its way into the operational environment and how the obtained information can be physically structured.
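The LOB principle can be demonstrated in a few lines. The sketch below again uses SQLite as a generic stand-in for a DBMS such as Oracle; the table and the order document are invented. A whole XML document goes into one column, lives under the database's backup and archiving regime, and comes back out intact for parsing.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Illustrative XML document to be stored whole in a single table field.
doc = "<order id='42'><line sku='A-100' qty='3'/></order>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO documents (body) VALUES (?)", (doc,))

# Retrieval: the document survives the round trip through the DBMS intact
# and can be parsed as if it had just arrived from the file system.
stored = conn.execute("SELECT body FROM documents WHERE id = 1").fetchone()[0]
order = ET.fromstring(stored)
print(order.get("id"))
# → 42
```

A production DBMS would use a dedicated LOB or native XML column type rather than plain text, but the principle of detaching the document from the file system is the same.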
Query & analyse
The physical distribution of the data and the diversity of its nature make it a daunting task to provide reliable information within an operational environment. Data extraction on an OLTP system is therefore best limited to strictly operational purposes. As an example, consider an accountant who receives a daily overview of outstanding invoices. This information can also be important for a financial auditor, for example when investigating the payment behavior of customers. However, it is too limited for a definitive judgment, because important factors, such as the financial condition of the customer or the market situation of the customer's industry, are missing. The auditor must seek this additional information through other sources, which will probably differ structurally from the accountant's list. An operational environment is therefore not at all suitable for analytical reporting; any attempt to simulate it usually results in wasted time and data inconsistency. Even if the data were structured in such a way that some analytical possibilities existed, there would still remain the issue of performance and system load. Input and output should be separated as much as possible. As long as both occur on the same system, there will be delays in certain cases, especially when input and output access the same database tables. Waiting times for output are annoying but not insurmountable. The input, however, should encounter no obstacles, because it is precisely the basis for the existence of the data: without input there can be no extraction, so input takes precedence over output. Data extraction on an operational system should therefore be limited as much as possible to consulting specific information needed for the day-to-day operation of the company.
As much as possible, such extraction should consist of queries that retrieve data based on a unique key, the primary key. Requesting information over longer periods or from various perspectives should be avoided, because the structure of an operational environment is simply not suited to it.
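The contrast between the two kinds of queries can be sketched concretely. In the example below (again SQLite as a generic DBMS, with an invented invoices table), the first query touches exactly one row via the primary key and is harmless on an OLTP system; the second scans an entire period and is the kind of request that belongs in a separate information environment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE invoices (
    invoice_no INTEGER PRIMARY KEY,  -- unique key: direct, cheap access
    customer   TEXT,
    inv_date   TEXT)""")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
                 [(1, "Acme", "2024-01-05"), (2, "Beta", "2024-02-10")])

# Operational query: one row, located through the primary key.
row = conn.execute("SELECT customer FROM invoices WHERE invoice_no = ?",
                   (2,)).fetchone()
print(row[0])
# → Beta

# Analytical-style query: scans a whole period and grows with the table;
# on a live OLTP system this is exactly what should be avoided.
period = conn.execute("SELECT COUNT(*) FROM invoices "
                      "WHERE inv_date BETWEEN '2024-01-01' AND '2024-12-31'"
                      ).fetchone()[0]
```

With two rows both queries are instant, of course; the difference only bites at production volumes, which is precisely why it is invisible during development.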
In practice, there are various options for extracting operational data. The most common and most rudimentary form is to retrieve a limited number of records from the screen applications where the initial input took place. Such extractions will rarely if ever cause performance problems, but they also provide very little global information and therefore serve only a very specific purpose. A more general form of data extraction is reporting, a term that has already come up several times and which we must interpret here in the broadest sense of the word. Any overview of data, in any form, obtained outside the normal screen application can be categorized as a report. Although a report is more global in nature than a screen query, it is still wise to keep the amount of data involved somewhat limited. With operational reporting tools such as Oracle Reports, very complex results can be achieved, results that sometimes come very close to the functions of analytical reporting tools. Functionally, however, the two are worlds apart. For example, it is not appropriate for an operational report to let the user dynamically specify a selection period, say when extracting invoice data by invoice date. That opens the gate to performance problems: the user defines the parameters based on his own needs and can, albeit completely unconsciously, cause significant technical obstacles.
When the need for information exceeds the operational level and the principles mentioned earlier regarding performance and data quality can no longer be guaranteed, the data must be decentralized, in other words disconnected from the operational environment. Decentralizing data is primarily a technical matter and will therefore rely mainly on system-level actions. This physical separation nevertheless remains completely dependent on the intended use and can therefore only really be planned and implemented after the functional needs have been analyzed. Even before those needs are known in detail, one can already say that a separate server will be required: building an information environment on the same machine that hosts the OLTP database is out of the question. This is sometimes questioned when the amount of data is so limited that a shared environment would not cause performance problems. But even then a physical separation is appropriate, since the reporting environment can to a certain extent be regarded as a copy of the OLTP system in terms of data extraction. Naturally, everything depends on the budget available for this operation. Before the data separation itself can be analyzed, both systems must speak the same language; in other words, the data used must be standardized. All too often, such uniformity is only implemented when data is isolated from the transactional environment. It is much wiser, however, to perform this exercise on the OLTP system itself, even before any data separation takes place.
It is no exception for a company's data to have become contaminated over the years through the parallel operation of business processes. Each part of the business sits, as it were, on its own island and has its own specific way of coding, one that is generally known and accepted by the employees of that business process. The cause of this distortion usually lies in the historical growth of the company. As mentioned, functional business expansions always take priority over technical interventions, and when a company starts up, people think primarily about storing the data and worry little about the data extractions that will follow. Depending on how the company evolves, at a certain point the need for a uniform way of coding arises. Business processes become more and more interconnected, even if the departments involved continue to represent separate segments. In such a situation, the solution lies in developing a central application that streamlines the codes used. If one department uses a specific code for a certain product, another department must know exactly which product it concerns. Only then can a global picture be created and the business situation evaluated across the business units. Obviously, things are not that simple for large companies with a considerable history behind them. Incorrect codes carried through the various applications over the years cannot simply be converted into a central codification. Impact analyses must be done at every level of every application, conversions must be planned; in short, such an operation requires careful preparation and time. And that is often the sticking point: sufficient time or budget cannot be allocated for an effort that is all too easily dismissed as maintenance. Not everyone is eager to leave a familiar working environment.
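The core of such a central application is a translation table that maps each departmental code onto one company-wide code. A minimal sketch is shown below; all department names and codes are invented. The design choice worth noting is that an unmapped code fails loudly instead of passing through, so contaminated codes surface immediately instead of silently polluting the information environment.

```python
# Sketch of central code management: every departmental legacy code maps to
# exactly one company-wide product code. All codes here are invented.
central_codes = {
    ("sales",      "WGT-01"):    "P-1001",
    ("warehouse",  "A100"):      "P-1001",
    ("purchasing", "SUP-A-100"): "P-1001",
}

def to_central(department, local_code):
    """Translate a departmental code to the central one, or fail loudly."""
    try:
        return central_codes[(department, local_code)]
    except KeyError:
        raise ValueError(f"unmapped code {local_code!r} in {department}")

# Three departments, three local codes, one central product.
print(to_central("warehouse", "A100"))
# → P-1001
```

In practice this mapping would itself live in a database table maintained by the central application, but the principle, one authoritative translation per legacy code, is the same.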
Users are perfectly happy with the existing situation, find their way through their codes blindly, and are therefore often skeptical about change. This is frequently the case when the coding is redesigned in preparation for building an information environment: the people who enter the data are very rarely the same people who will analyze the extracted figures. The introduction of central code management is therefore preceded by a psychological battle that should not be underestimated. One thing is certain: an information environment can only reach its full potential when it is fed with standardized information.