In this example we will show how to read XML and CSV files with Python, and perform search and replace operations. The result will be written into a new file with the same filename, but with “_new” appended to the filename.
First we start a timer as we would like to know how much time it will take to parse the whole file. The original file in our case was huge so it took some time to parse it.
start_time = time.time()
Next step is to open the original file with “read” permissions. We don’t want it to be modified by accident later, so we don’t permit that it can be changed.
The new file will get the “write” flag, and this is the file that will contain the result of string replacements.
CSV file will contain strings to find and replace, delimited by a comma and Enter for each new line:
Find,Replace ProductA,ProductB ProductC,ProductD ...
There cannot be any spaces or other delimiters. Python is very sensitive and treats everything as is, if you place a space after a comma in CSV file, it will search for a name with a space in the beginning.
You may try creating the CSV file in Excel, but when we did it we got the following error when running the script:
line_new = line.replace(row, row)
IndexError: list index out of range
Next is to make a simple for loop, go through the original file line after line, and replace possible occurrences.
In the end, remember to close all files and print the result.
Here is the whole script code:
import csv, time start_time = time.time() xml_filename_original = "original_file.xml" xml_filename_new = xml_filename_original.replace(".xml", "_new.xml") find_and_replace_filename = "find_replace.csv" xml_file_original = open(xml_filename_original, "r") xml_file_new = open(xml_filename_new, "w") counter = 0 with open(find_and_replace_filename, "r") as f: for line in xml_file_original: f.seek(0) reader = csv.reader(f) next(reader, None) for row in reader: line_old = line line_new = line.replace(row, row) if line_new != line_old: line = line_new counter += 1 print("Counter: " + str(counter)) xml_file_new.write(line) xml_file_original.close() xml_file_new.close() elapsed_time = time.time() - start_time print("Script ended, replaced " + str(counter) + " strings") print("Elapsed time: " + str(elapsed_time) + " seconds")
Save it with .py extension and run with python.