Wednesday, April 1, 2015

Regular Expression Split on character not in Qualified String (python)

I could use some assistance with something that is probably easy for those familiar with regular expressions. I'm trying to parse some more-or-less home-brewed configuration files into a dictionary/JSON. I have some Python code using string methods or re.split() that works fine for everything I've tested against. However, I know there are corner cases that could break it, so I would like to create generic regular expressions that handle the logic better and are portable to the other languages we use at work (Perl, awk, C, etc.), which would help us stay consistent.


I am looking to either use re.match() or re.split() in Python.


The patterns I'm looking for should do the following:


1) Split a str on the first ? that is not inside a substring qualified (quoted) by single and/or double quotes.



strIn:
'''
foo = 'some',"stuff?",'that "could be?" nested?', ? but still capture this? and "this?"
'''

listOut:
['''foo = 'some',"stuff?",'that "could be?" nested?', ''' , ''' but still capture this? and "this?"''']
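One way requirement 1 might be approached with re.match() is sketched below. This is only a sketch under some assumptions that the question does not state: quotes are balanced, a quote of one type is treated as literal text inside a substring of the other type, and there are no backslash-escaped quotes. The names FIRST_UNQUOTED_QMARK and split_on_first_qmark are made up for illustration.

```python
import re

# Group 1 (the head) is built from three mutually exclusive pieces:
#   [^'"?]    one character that is neither a quote nor '?'
#   '[^']*'   a complete single-quoted substring (may contain " and ?)
#   "[^"]*"   a complete double-quoted substring (may contain ' and ?)
# The first '?' that the head cannot swallow is therefore unquoted.
# Group 2 (the tail) is everything after that '?'.
FIRST_UNQUOTED_QMARK = re.compile(
    r"""^((?:[^'"?]|'[^']*'|"[^"]*")*)\?(.*)$""",
    re.DOTALL,
)


def split_on_first_qmark(text):
    """Split text on the first '?' outside quotes; return [text] if none."""
    m = FIRST_UNQUOTED_QMARK.match(text)
    if m is None:
        # No unquoted '?' (or unbalanced quotes): assumed to mean "no split".
        return [text]
    return [m.group(1), m.group(2)]


print(split_on_first_qmark("a = '?' ? rest"))  # ["a = '?' ", ' rest']
```

The pattern uses only character classes, alternation and grouping, so it should carry over to Perl largely unchanged. Tools limited to POSIX extended regular expressions (awk, C's regcomp) do not support the (?:...) non-capturing group, so a plain group would be needed there and the group numbers would shift.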


2) Split a str on the first # that is not inside a substring qualified by single or double quotes and does not come after the first unqualified ? (as per 1).



strIn:
'''
foo = 'some',"stuff?#, maybe 'nested#' " # #but now this is all a comment to capture ,'that "could be?#" nested#', ? but still capture this?! and "this?! "
'''

listOut:
['''foo = 'some',"stuff?#, maybe 'nested#' " ''', ''' #but now this is all a comment to capture ,'that "could be?#" nested#', ? but still capture this?! and "this?! "''']
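Requirement 2 could be sketched the same way: the part before the split may contain quoted substrings and ordinary characters, but no unquoted ? or #. If an unquoted ? appears before any unquoted #, the match fails and nothing is split. As before, this assumes balanced quotes and no backslash escapes, and the names FIRST_UNQUOTED_HASH and split_on_first_hash are hypothetical.

```python
import re

# The head may consume:
#   [^'"?#]   one character that is not a quote, '?' or '#'
#   '[^']*'   a complete single-quoted substring
#   "[^"]*"   a complete double-quoted substring
# Because the head cannot pass an unquoted '?', a '#' that only
# appears after such a '?' never produces a match.
FIRST_UNQUOTED_HASH = re.compile(
    r"""^((?:[^'"?#]|'[^']*'|"[^"]*")*)#(.*)$""",
    re.DOTALL,
)


def split_on_first_hash(text):
    """Split on the first unquoted '#' that precedes any unquoted '?'."""
    m = FIRST_UNQUOTED_HASH.match(text)
    if m is None:
        # No qualifying '#': assumed to mean "return the input unsplit".
        return [text]
    return [m.group(1), m.group(2)]


print(split_on_first_hash('x = "#" # note'))  # ['x = "#" ', ' note']
print(split_on_first_hash("a ? b # c"))       # ['a ? b # c']
```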
