Automating Search In XML File With VBA in Excel
They wasted THOUSANDS of HOURS, no automation at all…
For years, system analysts in a company I used to work for, did the same grueling search over and over again. They had to write software system requirements documents (SRS documents) in which they had to describe APIs (Application programming interfaces).
Each API had Input parameters, business logic, and output parameters. In many APIs a call to a legacy system was made. This legacy system stored all of the business data in a large data base, and the interface to the legacy system enabled querying for information and also performing various important business activities.
The problem was the names of the parameters of the legacy system’s interface. This interface was built to accept and to output parameters with old names that where programmed into the code when the legacy system was built a long time ago, at the beginning of the business. No one wanted to change the legacy system’s code with new parameter names, because it was too critical and the company depended on it to continue serving its customers. The risks of breaking interfaces and causing bugs in the legacy system’s code and data base, were too high. Another alternative solution had to be found. So, the alternative solution was constructed as follows:
1.) A translation XML file was created with each parameter’s legacy name and the translation to this name in the form of a new name of the parameter as it is used in the input or output parameters of the APIs that are using this parameter.
2.) The XML structure relating to what is described above look like this:
The XML elements meaning is:
<oldName> the XML element that contains the name of the legacy system’s parameter.
<newName> the XML element that contains the name of the corresponding new API’s parameter.
3.) A web application was created for the system analysts to search in the XML described above.
The system analysts used the application to get all of the OLD parameter names of the legacy system interface, and then search for the NEW parameter names. They had to take an OLD parameter name, write it in the search box of the web application, and then click the search button to get the NEW parameter name. They had to do it ONE parameter at a time, ONE by ONE, a grueling search for each and every API. They wasted THOUSANDS of HOURS on this kind of search for YEARS and YEARS…
The developers of the web application did not think about creating a way to search for MULTIPLE NEW names in one search.
The OLD names and the NEW names had to be documented inside the XML file for future use and also it had to be written in the system requirements documents (SRS documents) that the system analysts wrote for the developers. In turn, the developers used the SRS documents to code the new APIs with the NEW parameter names
The system analysts had to describe theOLD names and the corresponding NEW names in their SRS document as follows:
Image 1 – the parameters table in the SRS documents
The automation solution was just around the corner, but nobody saw it (or tried to see it).
Nobody thought about creating a simple and effective solution for this grueling search, although it was just around the corner, in something that was used each and every day, EXCEL’s VBA.
When I joined the company as a system analyst, I already knew how to code in various languages and VBA was also one of them, All I had to figure out is how to get the updated XML file with all of the OLD and NEW names mapped correspondingly as described above. I have found out that the XML file was updated on a daily basis with every addition of new [OLD name] -> [NEW name] mapping.
Enter automation using EXCEL’s VBA.
Excel’s VBA is a powerful tool that is already present whenever you install excel (preferably version 2013 or above).
Click here to see how to enable the “Developer” tab so that you’ll be able to write VBA in your Excel.
My plan was to write VBA code in Excel that will open the dictionary XML file that is described above, and in one swoop translate a big list of OLD names to their corresponding NEW names.
The plan was as follows:
1. Open the XML file.
2. Read a user defined list of OLD names form a dedicated Excel sheet.
3. For each OLD name, use XPATH to find the <oldName> element.
4. Move to the next (sibling) element to get to the <newName> element.
5. Extract the text from the <newName> element, i.e. get the corresponding NEW name.
6. Write the NEW names in the same sheet where the OLD names are found, in a location, i.e. column, that the user will define in a “Settings” sheet (in the same row of the corresponding OLD name).
When I plan such solutions in Excel’s VBA, I try to be as generic as I can, i.e. I use a “Settings” sheet in which the user can define what should be done, how many times, the path of files, etc… The VBA code will extract the various definitions from the “Settings” sheet and act accordingly. The “Settings” sheet looks like this:
Image 2 – the “Settings” sheet
Besides the “Settings” sheet, I also use another sheet to hold the list of the OLD names. This sheet is named “ESB” i.e. Enterprise Service Bus.The ESB is a software implementation that serves as a service mediator between the legacy system and other business applications such as the applications here that implement various APIs to extract data from the legacy system’s data base. In this context, a NEW business application will call a service exposed by the ESB in order to communicate with the legacy system’s code. The ESB will channel and handle the communication between the two applications, transfer the required input parameters to the legacy system’s code, then the business application will get the data output it needs and expose it in its API output to whoever called it, i.e. some other application that will use the NEW API with the NEW Names of the input and output parameters.
How to use the new automation VBA Excel sheet
In order to use this automation sheet, the user must fill the “Settings” sheet as follows:
1. Enter the path to the dictionary XML file (either on a local PC or in a network location)
under the “XML Path” column. In image 1 above it is “C:\Dictionary.xml”.
2. Enter the column letter that contains the OLD names in the “ESB” sheet (I will elaborate more on the structure of the “ESB” sheet below), under the column “OLD Column Letter”. In image 1 above the column letter is “A”.
3. Enter the row number from which the VBA code will need to start the iteration over the list of the OLD names in the “ESB” sheet. This has to be written under the column “OLD Row START”. In image 1 above the row to start is row 10.
4. Enter the row number in which the VBA code will need to end the iteration over the list of the OLD names in the “ESB” sheet. This has to be written under the column “OLD Row END”. In image 1 above the row to end is row 300.
5. Enter the NUMBER of the column in the “ESB” sheet (starting the count from the left column in a 1 based count, not zero based count) that the VBA code should insert the NEW Name that was found in the XML file (according to the corresponding OLD Name). It has to be written under the column “Result Column”.
6. The following two entries require an explanation. It turned out that sometimes the ESB system implemented a service interface to the legacy system by using the NEW parameter Names. Hence, in order to document every OLD name and its corresponding NEW name as depicted in the table at image 1 above, the system analysts needed a way to reverse the search in the XML file, i.e. find the OLD names by searching for the NEW names element and then going to the next element (next sibling instead of previous sibling) to the OLD name element and extracting its text. So, if an “OLD to NEW” search had to be used, the user must enter an UPPER CASE “Y” letter under the “OLD to NEW” column and an UPPER CASE “N” letter under the “NEW to OLD” column, and vice versa. This is a perfect example why planning a GENERIC solution in the form of a “Settings” sheet can be adapted to tackle new and unexpected requirements. A new feature can be easily added and a new entry of setting can be created in the “Settings” sheet.
7. Finally, the user must get a list of the parameter names from the ESB service interface, and copy them into the “ESB” sheet, and click the “Search & Extract” button on the right side of the entries in the “Settings” sheet. Now, the magic happens and the names found in the XML file are copied to the “Result Column” in the “ESB” sheet.
Let’s dive in to the VBA code and see how the automation magic happens.
All the VBA code was written in the “ESB” Sheet1’s code space as you see in image 3 below.
Image 3 – automating search in XML file VBA project structure
First, all of the settings from the “Settings” sheet must be collected into properly named variables as follows:
Set esb_sht = Worksheets(“ESB”)
Dim st_shtAs Worksheet
Set st_sht = Worksheets(“Settings”)
Dim rowidxAs Integer
Dim colidxAs Integer
Dim iAs Integer
Dim counter As Integer
Dim Range AsExcel.Range
Dim file_pathAs String
””OLD names settings variables
Dim old_column_letterAs String
Dim old_row_startAs String
Dim old_row_endAs String
””NEW names settings variable
Dim new_column_indexAs Integer
””For early binding, you need to set a reference by opening the
””Tools menu->References and then click the check box of“Microsot XML, v6.0”
Dim oXMLDomAs New MSXML2.DOMDocument60
””take the file path from the “Settings” sheet
file_path = st_sht.Cells(2, 1)
””’take the OLD names settings from the “Settings” sheet
old_column_letter = st_sht.Cells(2, 2)
old_row_start = st_sht.Cells(2, 3)
old_row_end = st_sht.Cells(2, 4)
””take the NEW names setting from the “Settings” sheet
new_column_index = CInt(st_sht.Cells(2, 5))
””take the settings of “old to new“and the “new to old”
Dim old_to_newAs String
Dim new_to_oldAs String
old_to_new = st_sht.Cells(2, 6)
new_to_old = st_sht.Cells(2, 7)
””convert the old row start to make sure it is an integer
rowidx = CInt(old_row_start)
Note that in order to manipulate XML an early binding method is used above. By setting a reference to the Microsot XML, v6.0 library in the references dialog box you see in image 4 below, VBA’s IDE is able to show all the IntelliSense of the related XML types, variables, constants, etc…
Click here to read more about the differences between early binding and late binding in VBA.
So, as a result from the early binding, when you write the code “Dim oXMLDomAs New” VBA’s editor IntelliSense displays a popup with “MSXML2”, then after you select it and write a dot “.”, a popup with “DOMDocument60” is displayed for you to select it.
Image 4 – The references dialog box (shown after clicking: Tools menu->References)
After getting all of the settings it is time to load the XML file and set the selection language to XPATH as follows:
””Load XML into the XML document varaible
””Change the xml selection language to XPath
oXMLDom.setProperty “SelectionLanguage”, “XPath”
And two XML variables must be defined as follows:
””this variable will hold the node that will be found after the XPATH serach
Dim singleNodeAs MSXML2.IXMLDOMNode
””In case more than one node will be found after the XPATH search, this list will hold the nodes
Dim xmlNodeListAs MSXML2.IXMLDOMNodeList
As a side note, in reference to the xmlNodeList declared above, I have found out that the XML could contain more than one NEW parameter name for ONE corresponding OLD parameter name. I think it is good that you’ll also see how to handle a list of XML nodes. As you’ll see, I output the names from the xmlNodeList as comma separated string for example: newName1,newName2,newName3,newName4, etc…
Now, in order to iterate over the list of OLD names that is found in the “ESB” sheet, a dynamic range is built from the old_column_letter, old_row_start and the old_row_end values that where collected from the settings sheet. The result is a column range in the “ESB” sheet that looks like, for example, “A1:A300”. This is a perfect example why you should consider a “Settings” sheet for your VBA automations in Excel. The user is free to choose what section of the list (or the whole list), should be processed. This generic approach is always much better than hard coded values. The code that does this is as follows:
Dim rngAs Range
Set rng = esb_sht.Range(old_column_letter & old_row_start & “:” & old_column_letter & old_row_end)
After all of the above, comes a for each loop that iterates over each cell in the “ESB” sheet column of the OLD names.
For Each old_field In rng.Cells
In this loop all of the magic happens. First some checks should be made to make sure that if the cell is empty there will be no further processing of anything, so the following code makes sure that the processing jumps to a “continue” label at the end of the loop.
If old_field.text = “” Or IsEmpty(old_field.text) Then GoTo continue
Now the OLD parameter name is taken from the cell’s text into the old_text variable, then a few “text clean” operations are made because I have found out that the text in the column of the OLD names at the “ESB” sheet was usually copies, by the system analysts, with a few “unwanted” characters that must be removed otherwise the XPATH query you’ll see vary soon, will not find any XML node with such text. One “unwanted” character that had to be removed is the Chr(160) which is an invisible unicode character that can’t be seen at all, so I had to check to see what it is by using the following web site:
(of course you can choose another solution for seeing invisible characters, as you see fit).
The following lines of code replace the “unwanted” characters with an empty string i.e. remove them.
old_text = RTrim(LTrim(Replace(old_text, Chr(160), “”)))
old_text = Replace(old_text, “_”, “”)
Now, depending on the desired “direction” of the search in the XML, either “old_to_new“ search or “new_to_old” search, the search in the XML is constructed. In case of the “old_to_new“ search the “old_to_new“ variable will contain “Y” and the “new_to_old” variable will contain “N” and vice versa.
If old_to_new = “Y” Then
new_text = oXMLDom.SelectSingleNode(“/dictionary/dictionaryData/oldName” & _
“[translate(text(),” & strTranslate & “)='” & UCase(old_text) & “‘]”).PreviousSibling.text
Set xmlNodeList = oXMLDom.SelectNodes(“/dictionary/dictionaryData/oldName” & _
“[translate(text(),” & strTranslate & “)='” & UCase(old_text) & “‘]”)
These two XPATH queries are made because you can’t know if the OLD name will have only one (the “SelectSingleNode” function) corresponding NEW name or MORE than one (the “SelectNodes” function) New name, i.e. a list of nodes will be returned. The XPATH expression is built according to the structure of the XML described above i.e. “/dictionary/dictionaryData/oldName” and the part inside the square brackets is for selecting the node with the text that equals the OLD name text: “[text()=” & old_text & “]”.
The caveat here is that you must make the XPATH search, CASE insensitive, because by default it is case sensitive. The only way to make a case insensitive search with the Microsot XML, v6.0 library is to use the XPATH “translate” function that is explained in the following URL:
This function is being used inside the square brackets as follows:
“[translate(text(),” & trTranslate & “)='” & UCase(old_text) & “‘]”
The first argument is the text of the XML node that the XPATH engine examines when it is iterating over the XML node’s tree i.e. “text()”.
The second and third arguments are initialized before the for each loop, in the “strTranslate” variable as follows:
strTranslate = “‘abcdefghijklmnopqrstuvwxyz’,’ABCDEFGHIJKLMNOPQRSTUVWXYZ'”
Second Argument = ‘abcdefghijklmnopqrstuvwxyz’
Third argument = ‘ABCDEFGHIJKLMNOPQRSTUVWXYZ
The Third argument’s purpose, is to instruct HOW to replace the text that is found in the First argument according to the Second argument. So, any occurrence of ‘abcdefghijklmnopqrstuvwxyz’ in the text of the element (text()) , will be replaced with ‘ABCDEFGHIJKLMNOPQRSTUVWXYZ’ (an uppercase letter).
The aim here is to transform the text of the XML node into UPPER CASE, this way the comparison will find the required node with the OLD name, because the searched text is upper cased i.e. UCase(old_text).
So, since each XML node’s text contains alphabetic character ‘abcdefghijklmnopqrstuvwxyz’ , it will be translated into an upper case character according to the third argument ‘ABCDEFGHIJKLMNOPQRSTUVWXYZ’ and then compared to UCase(old_text). This is how you circumvent the case problem by always comparing text with the same case.
If the OLD text was found in some XML element it is stored inside the new_text variable after moving to the PreviousSibling i.e. one element back to get to the newName element. Recall that the structure of the XML is as follows:
OR vice versa in the case of the “NEW to OLD“ serach, i.e. moving one element forward, nextElement, to get to the oldName element.
if MORE than one XML element contains the same searched text than these XML elements will be stored in the “xmlNodeList” variable. Then, if the “xmlNodeList” length is bigger than one, the text will be extracted from each node and a comma separated string will be created inside the “new_text” variable as follows:
If xmlNodeList.Length> 1 Then
new_text = “”
For Each Node In xmlNodeList
””OLD to english
If old_to_new = “Y” Then
new_text = new_text & Node.PreviousSibling.text & “,”
””NEWlish to heb
If new_to_old = “Y” Then
new_text = new_text & Node.NextSibling.text & “,”
”removing the last comma at the end
new_text = Left(new_text, Len(new_text) – 1)
Set xmlNodeList = Nothing
Finally the text that was found, either in the form of one name or a comma separated names is outputted to the proper cell in the “ESB” sheet as follows:
The rowidx is a variable that is incremented with every iteration of the for each loop, starting from the first row that the user wrote in the “Settings” sheet.
In case no text is found a jump to the “continue” label prevent the text from being outputted to the “ESB” sheet. The whole last part of the loop is as follows:
”if there was no translation to continue to the next row
If new_text = “” Or IsEmpty(new_text) Then
esb_sht.Cells(rowidx, new_column_index).Value = new_text
new_text = “”
rowidx = rowidx + 1
pctdone = pb_idx / lastrow
.LabelCaption.Caption = “Processing Row ” &pb_idx& ” of ” &lastrow
.LabelProgress.Width = pctdone * (.FrameProgress.Width)
pb_idx = pb_idx + 1
If pb_idx = lastrow Then
Set oXMLDom = Nothing
You can see a sample result after executing the macro, this is how the “ESB” sheet looks like with the “NEW Name”
values in column C, that were found in the XML for the corresponding “OLD Name” values.
Image 5 – Sample result in the “ESB” sheet after the execution of the macro
The XML for the result shown here is as follows: