Question

I have a document which looks something like:

sort=SIZE:NumberDecreasing
FieldText=(((EQUAL{226742}:LocationId)) AND ()) 

FieldText=(((EQUAL{226742}:LocationId)) AND ((EQUAL{1}:LOD AND NOTEQUAL{1}:SCR AND EMPTY{}:RPDCITYID AND NOTEQUAL{1}:Industrial))) 

FieldText=( NOT EQUAL{1}:ISSCHEME AND EQUAL{215629}:LocationId) 

sort=DEALDATE:decreasing

From this I would like to extract, for each match, the word before the colon (stopping before the {} brackets when they are present), the colon itself, and the word after the colon. Ideally these should be the only things left in the file, each on its own line.

Output would then look like:

SIZE:NumberDecreasing
EQUAL:LocationId 
EQUAL:LocationId
EQUAL:LOD
NOTEQUAL:SCR
EMPTY:RPDCITYID
NOTEQUAL:Industrial
EQUAL:ISSCHEME
EQUAL:LocationId    
DEALDATE:decreasing

The closest I have come so far is:

Find: ^.?+ {[0-9]}:([a-zA-Z]+)
Replace with: ...\1:\2...

The intent is to run it several times, later replace ... with \n, and then remove the repeated newlines.

Context: this is for a log analysis I am performing. I have already removed the datestamps and reduced the query down to its sort and FieldText parameters.

I do not have the regular UNIX tools; I am working in a Windows environment.

The original log looks like:

03/11/2011 16:25:44 [9] ACTION=Query&summary=Context&print=none&printFields=DISPLAYNAME%2CRECORDTYPE%2CSTREET%2CTOWN%2CCOUNTY%2CPOSTCODE%2CLATITUDE%2CLONGITUDE&DatabaseMatch=Autocomplete&sort=RECORDTYPE%3Areversealphabetical%2BDRETITLE%3Aincreasing&maxresults=200&FieldText=%28WILD%7Bbournemou%2A%7D%3ADisplayName%20NOT%20MATCH%7BScheme%7D%3ARecordType%29 (10.55.81.151)
03/11/2011 16:25:45 [9] Returning 23 matches
03/11/2011 16:25:45 [9] Query complete
03/11/2011 16:25:46 [8] ACTION=GetQueryTagValues&documentCount=True&databaseMatch=Deal&minScore=70&weighfieldtext=false&FieldName=TotalSizeSizeInSquareMetres%2CAnnualRental%2CDealType%2CYield&start=1&FieldText=%28MATCH%7BBournemouth%7D%3ATown%29 (10.55.81.151)
03/11/2011 16:25:46 [12] ACTION=Query&databaseMatch=Deal&maxResults=50&minScore=70&sort=DEALDATE%3Adecreasing&weighfieldtext=false&totalResults=true&PrintFields=LocationId%2CLatitude%2CLongitude%2CDealId%2CFloorOrUnitNumber%2CAddressAlias%2A%2CEGAddressAliasID%2COriginalBuildingName%2CSubBuilding%2CBuildingName%2CBuildingNumber%2CDependentStreet%2CStreet%2CDependentLocality%2CLocality%2CTown%2CCounty%2CPostcode%2CSchemeName%2CBuildingId%2CFullAddress%2CDealType%2CDealDate%2CSalesPrice%2CYield%2CRent%2CTotalSizeSizeInSquareMetres%2CMappingPropertyUsetype&start=1&FieldText=%28MATCH%7BBournemouth%7D%3ATown%29 (10.55.81.151)
03/11/2011 16:25:46 [8] GetQueryTagValues complete
03/11/2011 16:25:47 [12] Returning 50 matches
03/11/2011 16:25:47 [12] Query complete
03/11/2011 16:25:51 [13] ACTION=Query&print=all&databaseMatch=locationidsearch&sort=RELEVANCE%2BPOSTCODE%3Aincreasing&maxResults=10&start=1&totalResults=true&minscore=70&weighfieldtext=false&FieldText=%28%20NOT%20LESS%7B50%7D%3AOFFICE%5FPERCENT%20AND%20EXISTS%7B%7D%3AOFFICE%5FPERCENT%20NOT%20EQUAL%7B1%7D%3AISSCHEME%29&Text=%28Brazennose%3AFullAddress%2BAND%2BHouse%3AFullAddress%29&synonym=True (10.55.81.151)
03/11/2011 16:25:51 [13] Returning 3 matches
03/11/2011 16:25:51 [13] Query complete

The purpose of the whole exercise is to find out which fields are being queried and sorted on, and how we are querying and sorting on them. To this end, the output could also usefully be distinct, although that is not essential.

Solution

The Perl program below is complete and includes your sample data in its source. It produces exactly the output you describe, including reporting NOT EQUAL{1}:ISSCHEME as EQUAL:ISSCHEME: because of the space between NOT and EQUAL, the \w+ before the braces matches only EQUAL.

use strict;
use warnings;

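# Read the sample data after __DATA__ one line at a time.
while (<DATA>) {
  # Pattern: word, optional {digits} group, colon, word.
  # /x allows whitespace inside the pattern; /g finds every match on the line.
  print "$1:$2\n" while /(\w+)  (?: \{\d*\} )? : (\w+)/xg;
}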

__DATA__
sort=SIZE:NumberDecreasing
FieldText=(((EQUAL{226742}:LocationId)) AND ()) 

FieldText=(((EQUAL{226742}:LocationId)) AND ((EQUAL{1}:LOD AND NOTEQUAL{1}:SCR AND EMPTY{}:RPDCITYID AND NOTEQUAL{1}:Industrial))) 

FieldText=( NOT EQUAL{1}:ISSCHEME AND EQUAL{215629}:LocationId) 

sort=DEALDATE:decreasing

OUTPUT

  SIZE:NumberDecreasing
  EQUAL:LocationId
  EQUAL:LocationId
  EQUAL:LOD
  NOTEQUAL:SCR
  EMPTY:RPDCITYID
  NOTEQUAL:Industrial
  EQUAL:ISSCHEME
  EQUAL:LocationId
  DEALDATE:decreasing
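
If you would rather work from the raw, URL-encoded log and get a distinct list, a variation along the following lines might do it. This is only a sketch under some assumptions: url_decode and extract_pairs are helper names made up for this example, the two sample lines are trimmed versions of lines from the log in the question, and the brace pattern is loosened from \{\d*\} to \{[^}]*\} on the assumption that values such as {Bournemouth} or {bournemou*} should be skipped too.

```perl
use strict;
use warnings;

# Decode %XX escapes in one URL-encoded parameter value.
sub url_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Return every OPERATOR:Field pair found in the sort and FieldText
# parameters of a single raw log line (hypothetical helper).
sub extract_pairs {
    my ($line) = @_;
    my @pairs;
    # Only parameters named exactly sort or FieldText are examined;
    # a name must follow the start of the line, '&' or whitespace.
    while ($line =~ /(?:^|[&\s])(?:sort|FieldText)=([^&\s]*)/gi) {
        my $value = url_decode($1);
        # Assumption: the braces may hold any text, not only digits.
        push @pairs, "$1:$2"
            while $value =~ /(\w+) (?: \{ [^}]* \} )? : (\w+)/xg;
    }
    return @pairs;
}

# Two sample lines, trimmed from the log shown in the question.
my @sample = (
    'ACTION=Query&databaseMatch=Deal&sort=DEALDATE%3Adecreasing&weighfieldtext=false&FieldText=%28MATCH%7BBournemouth%7D%3ATown%29',
    'ACTION=GetQueryTagValues&databaseMatch=Deal&weighfieldtext=false&start=1&FieldText=%28MATCH%7BBournemouth%7D%3ATown%29',
);

# Count each distinct pair, then print the pairs in sorted order.
my %count;
for my $line (@sample) {
    $count{$_}++ for extract_pairs($line);
}
print "$_\t$count{$_}\n" for sort keys %count;
```

With the two sample lines this lists DEALDATE:decreasing once and MATCH:Town twice. To run it over a real log file, the loop over @sample could be replaced with while (my $line = <>) and the file name passed on the command line.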
Licensed under: CC-BY-SA with attribution