PHP: what is the best approach to split these values up?
20-09-2019
Question
I'm having a hard time with this one because I don't think I know all of my options.
I have to parse a free-form text field and map its values to database columns.
Here is some example text. NOTE: not all fields have to be present, the delimiters are not all the same, and not every descriptor is available. I also need to check whether each value is numeric only or alphanumeric.
Example 1
field1: 999-999234-24-2
field2 Description: a short description
field3: 3.222.1
asdfg
field number four: NO
field5:
Example 2
field1: 999-999234-24-2/field2 Description: a short description/field3: 3.222.1 asdfg/field number four: NO/field5:
Example 3
999-999234-24-2
Example 4
field1: 999-999234-24-2 field2 Description: a short description field3: 3.222.1 asdfg field number four: NO field5:
Example 5
field1: 999-999234-24-2 - field2 Description: a short description - field3: 3.222.1 asdfg - field number four: NO - field5:
What I would like is for each field X to be in its own column. NOTE: the example data is all in the same order, but the live data is not.
I don't mind doing this in steps if I need to, but I'm having a hard time just parsing the values into columns. Any suggestions?
I was thinking of some sort of case function with a regex, but no luck so far.
Solution 4
After much thought and trial and error, I'm going to read the entries into an array and parse each line of text. It's long and going to be a mess, but it should get the job done.
OTHER TIPS
Maybe you should standardize on the Java .properties format; it can then be parsed with a few lines of PHP.
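As a rough illustration of that idea, here is a minimal sketch that reads a simplified subset of the .properties format: one `key=value` or `key: value` pair per line, with `#` or `!` comment lines. The function name `parseProperties` is made up for this example, and the sketch deliberately ignores line continuations and escape sequences. Unlike the real format, it also tolerates spaces inside keys.

```php
<?php
// Hypothetical helper: parse a simplified subset of the .properties format.
function parseProperties(string $text): array
{
    $result = [];
    // \R matches any line-break sequence (\n, \r\n, \r).
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines and comment lines ("#" or "!").
        if ($line === '' || $line[0] === '#' || $line[0] === '!') {
            continue;
        }
        // Key is everything up to the first "=" or ":"; the value may be empty.
        if (preg_match('/^([^=:]+?)\s*[=:]\s*(.*)$/', $line, $m)) {
            $result[$m[1]] = $m[2];
        }
    }
    return $result;
}

$fields = parseProperties("field1=999-999234-24-2\n# a comment\nfield3: 3.222.1 asdfg\nfield5=");
print_r($fields);
```

This only helps if the people filling in the field can be persuaded to use the format; it does nothing for the existing free-form data.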
Since it's still stuck in my head: the way I'd go about it is to start handling each of these cases and see whether any tweaks or fallout remain. What makes this tricky is that the only reliable delimiter is 'field', and if anyone uses that word in a description it will break. I'd just have to take the file and start iterating.
Splitting it with this regex would at least be a good starting point for separating the headers from the data. Basically, it matches 'field' plus optional additional text, which covers the possibility of 'Description' and 'number four' being added before the closing colon. Note that the limit has to be at least 13, because '2 Description' is 13 characters long:
field[^:]{0,13}:
After that, you'd at least have to strip the trailing / for case #2, the ' - ' for case #5, and, for case #1, the extra line breaks if you don't want them in the data.
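The split-then-clean approach described above could be sketched roughly as follows. The helper names `splitFields` and `cleanValue` and the `unlabeled` key are invented for this example. The quantifier is set to 13 here because '2 Description' is 13 characters long, so a limit of 12 would miss the `field2 Description:` label. It still breaks if a value contains the word 'field' shortly before a colon.

```php
<?php
// Hypothetical helper: tidy one raw value cut out between two labels.
function cleanValue(string $value): string
{
    // Collapse line breaks and repeated whitespace (Example 1), then
    // trim spaces and the "/" separators of Example 2.
    $value = trim(preg_replace('/\s+/', ' ', $value), ' /');
    // Example 5 leaves a dangling " -" in front of the next label.
    return preg_replace('/ -$/', '', $value);
}

// Hypothetical helper: split free-form text into [label => value] pairs.
function splitFields(string $text): array
{
    // Capturing the delimiter keeps the labels in the result array.
    $parts = preg_split('/(field[^:]{0,13}:)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $fields = [];
    // $parts[0] is whatever precedes the first label (all of Example 3).
    $lead = cleanValue($parts[0]);
    if ($lead !== '') {
        $fields['unlabeled'] = $lead;
    }
    // After that, labels and values alternate.
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        $label = rtrim($parts[$i], ':');
        $fields[$label] = cleanValue($parts[$i + 1]);
    }
    return $fields;
}

$example2 = 'field1: 999-999234-24-2/field2 Description: a short description/'
          . 'field3: 3.222.1 asdfg/field number four: NO/field5:';
print_r(splitFields($example2));
```

Once the values are isolated, built-ins such as `ctype_digit()` and `ctype_alnum()` can answer the numeric-only versus alphanumeric question for each one.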
A regex would be hard to maintain for some edge cases. Try writing a simple finite state machine instead.
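One possible shape for such a state machine is sketched below, with two states: reading a label (from a token starting with 'field' up to a token ending in a colon) and reading the value that follows. The function name `parseWithStateMachine`, the `unlabeled` key, and the three-token limit on labels are all assumptions made for this illustration. It also assumes that whitespace or a separator follows each colon, and it treats every `/` as a separator, so values containing a slash would be split.

```php
<?php
// Hypothetical sketch: a two-state machine over whitespace/slash-separated tokens.
function parseWithStateMachine(string $text): array
{
    $tokens = preg_split('~[\s/]+~', $text, -1, PREG_SPLIT_NO_EMPTY);

    $fields  = [];
    $state   = 'VALUE';
    $label   = 'unlabeled'; // bucket for text seen before any label
    $pending = [];          // label tokens collected so far
    $value   = [];          // value tokens for the current label

    foreach ($tokens as $token) {
        // Transition VALUE -> LABEL when a token starts with "field".
        if ($state === 'VALUE' && strpos($token, 'field') === 0) {
            $state = 'LABEL';
            $pending = [];
        }
        if ($state === 'LABEL') {
            $pending[] = $token;
            if (substr($token, -1) === ':') {
                // Label complete: store the previous field, start a new one.
                $fields[$label] = implode(' ', $value);
                $label = rtrim(implode(' ', $pending), ':');
                $value = [];
                $state = 'VALUE';
            } elseif (count($pending) > 3) {
                // No colon turned up: it was ordinary text after all.
                $value = array_merge($value, $pending);
                $state = 'VALUE';
            }
            continue;
        }
        // Drop the lone "-" separators of Example 5.
        if ($token !== '-') {
            $value[] = $token;
        }
    }
    // Input ended in the middle of a would-be label: keep it as text.
    if ($state === 'LABEL') {
        $value = array_merge($value, $pending);
    }
    $fields[$label] = implode(' ', $value);
    if (($fields['unlabeled'] ?? null) === '') {
        unset($fields['unlabeled']);
    }
    return $fields;
}

print_r(parseWithStateMachine('field1: 999-999234-24-2 - field number four: NO - field5:'));
```

The advantage over a single pattern is that each edge case becomes its own explicit branch, for example the rule that abandons a label when no colon appears.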