Replace string in XML file with Python

By , last updated November 17, 2016

In this example we will show how to read XML and CSV files with Python, and perform search and replace operations. The result will be written into a new file with the same filename, but with “_new” appended to the filename.

First we start a timer as we would like to know how much time it will take to parse the whole file. The original file in our case was huge so it took some time to parse it.

start_time = time.time()

Next step is to open the original file with “read” permissions. We don’t want it to be modified by accident later, so we don’t permit that it can be changed.

open(xml_filename_original, "r")

The new file will get the “write” flag, and this is the file that will contain the result of string replacements.

open(xml_filename_new, "w")

CSV file will contain strings to find and replace, delimited by a comma and Enter for each new line:

Find,Replace
ProductA,ProductB
ProductC,ProductD
...

There cannot be any spaces or other delimiters. Python is very sensitive and treats everything as is, if you place a space after a comma in CSV file, it will search for a name with a space in the beginning.

You may try creating the CSV file in Excel, but when we did it we got the following error when running the script:

line_new = line.replace(row[0], row[1])
IndexError: list index out of range

Next is to make a simple for loop, go through the original file line after line, and replace possible occurrences.
In the end, remember to close all files and print the result.

Read also  C2664: void doProcess(Data_ptr &&) : cannot convert argument 1 from Data_ptr to Data_ptr &&

Here is the whole script code:

import csv, time

start_time = time.time()

xml_filename_original = "original_file.xml"
xml_filename_new = xml_filename_original.replace(".xml", "_new.xml")
find_and_replace_filename = "find_replace.csv"
xml_file_original = open(xml_filename_original, "r")
xml_file_new = open(xml_filename_new, "w")

counter = 0

with open(find_and_replace_filename, "r") as f:
    for line in xml_file_original:
        f.seek(0)
        reader = csv.reader(f)
        next(reader, None)
        for row in reader:
            line_old = line
            line_new = line.replace(row[0], row[1])
            if line_new != line_old:
                line = line_new
                counter += 1
                print("Counter: " + str(counter))
        xml_file_new.write(line)

xml_file_original.close()
xml_file_new.close()

elapsed_time = time.time() - start_time

print("Script ended, replaced " + str(counter) + " strings")
print("Elapsed time: " + str(elapsed_time) + " seconds")

Save it with .py extension and run with python.

python find_replace_script.py

Comments

Be the first to comment.

Leave a Reply


You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*