Notepad++: keep regex matches (multiple occurrences per line) and line structure, remove other characters

StackOverflow https://stackoverflow.com/questions/16274246

  •  13-04-2022

Question

I have a 130k-line text file with patent information and I just want to keep the dates (regex "[0-9]{4}-[0-9]{2}-[0-9]{2} ") for subsequent work in Excel. For this purpose I need to keep the line structure intact (including blank lines). My main problem is that I can't find a way to identify and keep multiple occurrences of date information in the same line while deleting all other information.

Original file structure:

US20110228428A1 | US |   | 7 | 2010-03-19 | SEAGATE TECHNOLOGY LLC
US20120026629A1 | US |   | 7 | 2010-07-28 | TDK CORP | US20120127612A1 | US |   | EXAMINER | 2010-11-24 |   | US20120147501A1 | US |   | 2 | 2010-12-09 | SAE MAGNETICS HK LTD,HEADWAY TECHNOLOGIES INC

Desired file structure:

2010-03-19 
2010-07-28 2010-11-24 2010-12-09 

Thank you for your help!


Solution

Search for

.*?(?:([0-9]{4}-[0-9]{2}-[0-9]{2})|$)

And replace with

" $1"

Don't type the quotes; they are only there to show that there is a space before the $1. Note that this also puts a space before the first match in each line.

The lazy .*? matches as little as possible until it reaches either a date or the end of the line (the $). If a date is found, it is stored in $1 because of the capturing parentheses around it. If the end of the line is reached first, $1 is empty, so the remaining text is simply dropped. The replacement is therefore a space to separate the dates, followed by the date captured in $1.
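To sanity-check the pattern outside Notepad++, here is a minimal Python sketch that applies the same search expression line by line. The function name keep_dates and the sample text are made up for illustration. Python's re module is not the regex engine Notepad++ uses, so the exact whitespace left at the line edges may differ; the sketch strips it rather than relying on it.

```python
import re

# Same pattern as in the answer: lazily skip text until a date or the end of the line.
PATTERN = re.compile(r".*?(?:([0-9]{4}-[0-9]{2}-[0-9]{2})|$)")


def keep_dates(text: str) -> str:
    """Return text with everything except the dates removed, line by line."""
    out = []
    for line in text.split("\n"):
        # Replace each match with a space plus the captured date (empty if none).
        replaced = PATTERN.sub(lambda m: " " + (m.group(1) or ""), line)
        # strip() drops the leading/trailing spaces the replacement leaves behind.
        out.append(replaced.strip())
    # Joining with "\n" keeps the original line structure, blank lines included.
    return "\n".join(out)


sample = (
    "US20110228428A1 | US |   | 7 | 2010-03-19 | SEAGATE TECHNOLOGY LLC\n"
    "\n"
    "US20120026629A1 | US | 7 | 2010-07-28 | TDK CORP | EXAMINER | 2010-11-24 |"
)
print(keep_dates(sample))
```

Lines without any date, including blank ones, come out as empty lines, so the row numbers still line up with the original file when the result is pasted into Excel.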

Licensed under: CC-BY-SA with attribution