Using a python program to rename all XML files within a linux directory

https://stackoverflow.com/questions/23203990

07-07-2023
|

Question

Currently, my code uses the name of an XML file as a parameter in order to take that file, parse some of its content and use it to rename said file, what I mean to do is actually run my program once and that program will search for every XML file (even if its inside a zip) inside the directory and rename it using the same parameters which is what I am having problems with.

#encoding:utf-8
import os, re
from sys import argv

script, nombre_de_archivo = argv

regexFecha = r'\d{4}-\d{2}-\d{2}'
regexLocalidad = r'localidad=\"[\w\s.,-_]*\"'
regexNombre = r'nombre=\"[\w\s.,-_]*\"'
regexTotal = r'total=\"\d+.?\d+\"'

fechas = []; localidades = []; nombres = []; totales = []

archivo = open(nombre_de_archivo)

for linea in archivo.readlines():
    fechas.append(re.findall(regexFecha, linea))
    localidades.append(re.findall(regexLocalidad, linea))
    nombres.append(re.findall(regexNombre, linea))
    totales.append(re.findall(regexTotal, linea))

fecha = str(fechas[1][0])
localidad = str(localidades[1][0]).strip('localidad=\"')
nombre = str(nombres[1][0]).strip('nombre=\"')
total = str(totales[1][0]).strip('total=\"')

nombre_nuevo_archivo = fecha+"_"+localidad+"_"+nombre+"_"+total+".xml"

os.rename(nombre_de_archivo, nombre_nuevo_archivo)

EDIT: an example of this would be.

directory contains only 3 files as well as the program.

silly.xml amusing.zip feisty.txt

So, you run the program and it ignores feisty as it is a .txt file and it reads silly.xml, ti then parses "fechas, localidad, nombre, total" concatenate or append or whatever and use that as the new file for silly.xml, then the program checks if zip has an xml file, if it does then it does the same thing.

so in the end we would have

20141211_sonora_walmart_2033.xml 20141008_sonora_starbucks_102.xml feisty txt amusing.zip

Solution

Your question is not clear, and the code you posted is too broad.

I can't debug regular expressions with the power of my sight, but there's a number of things you can do to simplify the code. Simple code means less errors, and an easier time debugging.

To locate your target files, use glob.glob:

files = glob.glob('dir/*.xml')

To parse them, ditch regular expressions and use the ElementTree API.

import xml.etree.ElementTree as ET
tree = ET.parse('target.xml')
root = tree.getroot()

There's also modules to navigate XML files with CSS notation and XPATH. Extracting fields form the filename using regex is okay, but check out named groups.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow