Split string with regex not working

Question 1

Sorry, my first answer was wrong :) Try dropping the ?= and just wrapping the pattern in plain parentheses (a capturing group), like this:

allparts2 = re.compile(r'(\w{3}\s\d{2},\s\d{4}\s\d{1,2}\:\d{2}\:\d{2}\s(?:AM|PM))').split(alltext)

Or, without compile:

allparts2 = re.split(r'(\w{3}\s\d{2},\s\d{4}\s\d{1,2}\:\d{2}\:\d{2}\s(?:AM|PM))', alltext)

When using a lookahead, (?=...):

#!/usr/local/bin/python2.7
import re

alltext = "Aug 07, 2014 01:01:01 PM some text Aug 07, 2014 02:02:02 PM another text Aug 07, 2014 03:03:03 AM " 

allparts2 = re.split(r'(?=\w{3}\s\d{2},\s\d{4}\s\d{1,2}\:\d{2}\:\d{2}\s(?:AM|PM))', alltext)
print(allparts2)

Result was:

Executing the program....
$python2.7 main.py
['Aug 07, 2014 01:01:01 PM some text Aug 07, 2014 02:02:02 PM another text Aug 07, 2014 03:03:03 AM ']

When using a non-capturing group, (?:...):

#!/usr/local/bin/python2.7
import re

alltext = "Aug 07, 2014 01:01:01 PM some text Aug 07, 2014 02:02:02 PM another text Aug 07, 2014 03:03:03 AM "


allparts2 = re.split(r'(?:\w{3}\s\d{2},\s\d{4}\s\d{1,2}\:\d{2}\:\d{2}\s(?:AM|PM))', alltext)

print(allparts2)

Result was:

Executing the program....
$python2.7 main.py
['', ' some text ', ' another text ', ' ']

When using a capturing group, (...):

#!/usr/local/bin/python2.7
import re

alltext = "Aug 07, 2014 01:01:01 PM some text Aug 07, 2014 02:02:02 PM another text Aug 07, 2014 03:03:03 AM "


allparts2 = re.split(r'(\w{3}\s\d{2},\s\d{4}\s\d{1,2}\:\d{2}\:\d{2}\s(?:AM|PM))', alltext)

print(allparts2)

Result was:

Executing the program....
$python2.7 main.py
['', 'Aug 07, 2014 01:01:01 PM', ' some text ', 'Aug 07, 2014 02:02:02 PM', ' another text ', 'Aug 07, 2014 03:03:03 AM', ' ']

This is just to compare the different forms.
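If the goal is to keep each timestamp attached to the text that follows it, one alternative that does not depend on how re.split() treats empty matches is to find where each timestamp starts and slice the string at those positions. This is only a sketch: the helper name split_before_stamps is made up for illustration, and the pattern is the one from the examples above with the AM/PM part written as an alternation.

```python
import re

# Timestamp pattern from the examples above; AM/PM written as an alternation.
STAMP = re.compile(r'\w{3}\s\d{2},\s\d{4}\s\d{1,2}:\d{2}:\d{2}\s(?:AM|PM)')


def split_before_stamps(text):
    """Cut text just before every timestamp, keeping the timestamp in its chunk.

    Hypothetical helper, not part of the re module.
    """
    starts = [m.start() for m in STAMP.finditer(text)]
    if not starts:
        return [text]
    chunks = []
    if starts[0] > 0:
        # Keep any text that precedes the first timestamp.
        chunks.append(text[:starts[0]])
    bounds = starts + [len(text)]
    for begin, end in zip(bounds, bounds[1:]):
        chunks.append(text[begin:end])
    return chunks


alltext = ("Aug 07, 2014 01:01:01 PM some text "
           "Aug 07, 2014 02:02:02 PM another text "
           "Aug 07, 2014 03:03:03 AM ")
chunks = split_before_stamps(alltext)
print(chunks)
```

Because the chunks are plain slices, joining them back together reproduces the original string exactly.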

Question 2

Although I am unfamiliar with the Python flavour, Pythex gives me the following results, which I assume are correct:

See the result

Even if they are not, several things in your regex are unnecessary and/or incorrect as far as I know:

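A comma does not need to be escaped.
Alternation is not written with square brackets, [cond1|cond2], but with parentheses: (cond1|cond2). Square brackets define a character class, which matches a single character.
The \s is not strictly needed: it matches any whitespace character (a space, a tab, a carriage return, ...), which is right if you want to allow all of those, but a literal space is enough if the input only ever contains spaces.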

Lastly, the ?= you are adding turns the group into a lookahead, which checks that the pattern follows without consuming any text. ?: makes the group match and consume text as usual, but without capturing it.
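A small sketch of that difference, using a made-up sample string: the lookahead only checks that " PM" follows, while the non-capturing group actually includes it in the match. Neither form creates a capture group.

```python
import re

text = "01:01:01 PM some text"  # sample input, chosen for illustration

# Lookahead: " PM" must follow the two digits, but is not part of the match.
lookahead = re.search(r'\d{2}(?= PM)', text)

# Non-capturing group: " PM" is consumed and becomes part of the match.
noncapturing = re.search(r'\d{2}(?: PM)', text)

print(lookahead.group())
print(noncapturing.group())
print(lookahead.groups())
print(noncapturing.groups())
```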

Try this regex: (?:\w{3} \d{2}, \d{4} [\d:]+ (?:AM|PM))
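To illustrate the point about square brackets, here is a rough sketch comparing [AM|PM]. with (?:AM|PM) on a few made-up two-character samples. The character class version accepts any one of A, M, | or P followed by any character, so it matches "AM" and "PM" more or less by accident.

```python
import re

# One character out of A, M, |, P, then any single character.
char_class = re.compile(r'[AM|PM].$')

# Exactly the string "AM" or the string "PM".
alternation = re.compile(r'(?:AM|PM)$')

samples = ['AM', 'PM', 'MA', '|x', 'P?']  # illustrative inputs only

class_hits = [s for s in samples if char_class.match(s)]
alt_hits = [s for s in samples if alternation.match(s)]

print(class_hits)
print(alt_hits)
```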

Question 3

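It seems that Python's re.split() doesn't split on zero-length matches, which is why the lookahead version returns the string unsplit. (This is the behaviour in Python 2.7; since Python 3.7, re.split() does split on empty matches.)

However, the manual says: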

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

...

If there are capturing groups in the separator and it matches at the start of the string, the result will start with an empty string.

So you can use:

allparts2 = re.compile(r'(\w{3}\s\d{2}\,\s\d{4}\s\d{1,2}\:\d{2}\:\d{2}\s(?:AM|PM))').split(alltext)

Here the whole matching expression is wrapped in a capturing group (also notice the non-capturing group at the end). The result is:

['', 'Aug 07, 2014 01:01:01 PM', ' some text ', 'Aug 07, 2014 02:02:02 PM', ' another text ', 'Aug 07, 2014 03:03:03 AM', ' ']

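You can then create your files by pairing allparts2[1] with allparts2[2], allparts2[3] with allparts2[4], and so on (indices 2n+1 and 2n+2): each odd index holds a timestamp and the even index after it holds the text that follows.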
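The pairing can be sketched with slicing and zip, as below. The names parts and records are just for illustration, and stripping the surrounding whitespace from the text is an assumption about what you want to keep.

```python
import re

pattern = re.compile(r'(\w{3}\s\d{2},\s\d{4}\s\d{1,2}:\d{2}:\d{2}\s(?:AM|PM))')

alltext = ("Aug 07, 2014 01:01:01 PM some text "
           "Aug 07, 2014 02:02:02 PM another text "
           "Aug 07, 2014 03:03:03 AM ")

parts = pattern.split(alltext)

# parts[0] is whatever precedes the first timestamp (empty here).
# Odd indices hold the timestamps, the following even indices hold the text.
records = [(stamp, body.strip())
           for stamp, body in zip(parts[1::2], parts[2::2])]

for stamp, body in records:
    print(repr(stamp) + ' -> ' + repr(body))
```

Each (timestamp, text) pair in records can then be written out however you need, for example one pair per file.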