Find and replace next and next and not find the first and last

https://stackoverflow.com/questions/20565771

01-09-2022

Question

Really elementary question, but I can't get this to work. My sample text is at the bottom of the page.

The only rows I want to keep are the ones that look like this: "178-207 30 WVRTRWALLLLFWLGWLGMLAGAVVIIVRA -3,95". I currently use TextWrangler on OS X (the terminal and I are not friends), which provides regex replacements. I am trying to do this in steps, and my first step is to get rid of all the protein sequences.
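The end goal (keep only the segment rows) can also be approached as a line filter rather than a series of replacements. Below is a minimal Python sketch, not something from the original question. It assumes every row worth keeping has the shape start-end, length, residues, score, with a comma as the decimal separator as in the sample. The pattern SEGMENT_ROW, the function keep_segment_rows and the shortened SAMPLE text are all hypothetical.

```python
import re

# Hypothetical, shortened excerpt of the tool output shown in the question.
SAMPLE = """Results summary for protein: sp|P08195|4F2_HUMAN 4F2 GN=SLC3A2 PE=1 SV=3 Translocon TM Analysis Results Partitioning: water to bilayer Window range: 19-30
Number of translocon TM predicted segments: 2
178-207 30 WVRTRWALLLLFWLGWLGMLAGAVVIIVRA -3,95
438-460 23 ARLLTSFLPAQLLRLYQLMLFTL 1,63
Working sequence length = 630):
MELQPPEASIAVVSIPRQLPGShSEAGVQGLSAGDDSELGShCVAQTGLELLASGDPLPS
"""

# Assumed row shape: "start-end length RESIDUES score", score like -3,95 or 1,63.
SEGMENT_ROW = re.compile(r"^\d+-\d+ \d+ [A-Za-z]+ -?\d+,\d+$")


def keep_segment_rows(text: str) -> list[str]:
    """Return only the lines that look like predicted TM segment rows."""
    return [line for line in text.splitlines() if SEGMENT_ROW.match(line)]


print(keep_segment_rows(SAMPLE))
```

The header line is skipped because it does not start with a digit, even though it contains "19-30". The sequence lines are skipped for the same reason.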

In TextWrangler, I search for this:

Working sequence([^;]*)------------------------------------------------------------

and replace it with nothing. However, I end up with an almost empty document: TextWrangler seems to find the first instance of "Working sequence" but the LAST instance of "------------------------------------------------------------". How do I change this into a step-wise process that finds the first instance of both and replaces it with nothing, then the second instance, and so on?
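The behaviour described above comes from the pattern itself, not from TextWrangler. A small Python sketch reproduces it on a hypothetical two-record excerpt. As the question's regex implies, it assumes each record ends with a line of 60 dashes, since those separator lines do not survive in the sample text. The names DASHES, TEXT and GREEDY are illustrative only.

```python
import re

DASHES = "-" * 60  # assumed record separator: a line of 60 dashes

# Hypothetical two-record excerpt (sequences shortened).
TEXT = "\n".join([
    "178-207 30 WVRTRWALLLLFWLGWLGMLAGAVVIIVRA -3,95",
    "Working sequence length = 630):",
    "MELQPPEASIAVVSIPRQLPGS",
    DASHES,
    "19-43 25 RVCTLFIIGFKFTFFVSIMIYWhVV -1,04",
    "Working sequence length = 353):",
    "MSKPPDLLLRLLRGAPRQ",
    DASHES,
]) + "\n"

# The question's pattern: [^;]* also matches newlines and dashes.
GREEDY = r"Working sequence([^;]*)" + DASHES

result = re.sub(GREEDY, "", TEXT)
print(repr(result))
```

The text contains no ';' at all, so [^;]* runs to the end and then backs off only as far as the last line of dashes. The second segment row disappears along with both sequences.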

Thanks and greetings from Sweden

Results summary for protein: sp|P08195|4F2_HUMAN 4F2 GN=SLC3A2 PE=1 SV=3 Translocon TM Analysis Results Partitioning: water to bilayer Window range: 19-30

Number of translocon TM predicted segments: 2

178-207 30 WVRTRWALLLLFWLGWLGMLAGAVVIIVRA -3,95

438-460 23 ARLLTSFLPAQLLRLYQLMLFTL 1,63

Working sequence length = 630):

MELQPPEASIAVVSIPRQLPGShSEAGVQGLSAGDDSELGShCVAQTGLELLASGDPLPS ASQNAEMIETGSDCVTQAGLQLLASSDPPALASKNAEVTGTMSQDTEVDMKEVELNELEP EKQPMNAASGAAMSLAGAEKNGLVKIKVAEDEAEAAAAAKFTGLSKEELLKVAGSPGWVR TRWALLLLFWLGWLGMLAGAVVIIVRAPRCRELPAQKWWhTGALYRIGDLQAFQGhGAGN LAGLKGRLDYLSSLKVKGLVLGPIhKNQKDDVAQTDLLQIDPNFGSKEDFDSLLQSAKKK SIRVILDLTPNYRGENSWFSTQVDTVATKVKDALEFWLQAGVDGFQVRDIENLKDASSFL AEWQNITKGFSEDRLLIAGTNSSDLQQILSLLESNKDLLLTSSYLSDSGSTGEhTKSLVT QYLNATGNRWCSWSLSQARLLTSFLPAQLLRLYQLMLFTLPGTPVFSYGDEIGLDAAALP GQPMEAPVMLWDESSFPDIPGAVSANMTVKGQSEDPGSLLSLFRRLSDQRSKERSLLhGD FhAFSAGPGLFSYIRhWDQNERFLVVLNFGDVGLSAGLQASDLPASASLPAKADLLLSTQ PGREEGSPLELERLKLEPhEGLLLRFPYAA

Results summary for protein: sp|Q9NPC4|A4GAT_HUMAN OS=Homo sapiens GN=A4GALT PE=2 SV=1 Translocon TM Analysis Results Partitioning: water to bilayer Window range: 19-30

Number of translocon TM predicted segments: 1

19-43 25 RVCTLFIIGFKFTFFVSIMIYWhVV -1,04

Working sequence length = 353):

MSKPPDLLLRLLRGAPRQRVCTLFIIGFKFTFFVSIMIYWhVVGEPKEKGQLYNLPAEIP CPTLTPPTPPShGPTPGNIFFLETSDRTNPNFLFMCSVESAARThPEShVLVLMKGLPGG NASLPRhLGISLLSCFPNVQMLPLDLRELFRDTPLADWYAAVQGRWEPYLLPVLSDASRI ALMWKFGGIYLDTDFIVLKNLRNLTNVLGTQSRYVLNGAFLAFERRhEFMALCMRDFVDh YNGWIWGhQGPQLLTRVFKKWCSIRSLAESRACRGVTTLPPEAFYPIPWQDWKKYFEDIN PEELPRLLSATYAVhVWNKKSQGTRFEATSRALLAQLhARYCPTThEAMKMYL

Solution

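You told it to look for "Working sequence" followed by anything that is not a ';'. The first (and second, and third...) lines of '-' characters are not ';' either, so [^;]* runs straight through them. That's why it's matching everything. It stops at the final line of '-' characters because [^;]* is greedy: it grabs as much text as it can and gives back only enough to match the line of dashes you required at the end. Excluding '-' instead of ';' makes each match stop at the first line of dashes after "Working sequence". I think this will work for you: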

Working sequence([^-]*)------------------------------------------------------------
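The same kind of hypothetical excerpt can be used to check the answer's pattern. This Python sketch assumes that Python's re module treats these patterns the way TextWrangler's grep engine does, which should hold for a negated character class. It also shows a lazy quantifier (.*? with the dot-matches-newline flag) as an alternative; that alternative is not part of the original answer. DASHES, TEXT, FIXED and LAZY are illustrative names.

```python
import re

DASHES = "-" * 60  # assumed record separator: a line of 60 dashes

# Hypothetical two-record excerpt (sequences shortened).
TEXT = "\n".join([
    "178-207 30 WVRTRWALLLLFWLGWLGMLAGAVVIIVRA -3,95",
    "Working sequence length = 630):",
    "MELQPPEASIAVVSIPRQLPGS",
    DASHES,
    "19-43 25 RVCTLFIIGFKFTFFVSIMIYWhVV -1,04",
    "Working sequence length = 353):",
    "MSKPPDLLLRLLRGAPRQ",
    DASHES,
]) + "\n"

# The answer's pattern: [^-]* cannot cross a dash, so each match ends at
# the first line of dashes after "Working sequence".
FIXED = r"Working sequence([^-]*)" + DASHES

# Alternative (not from the answer): match as little as possible, with
# DOTALL so that "." also matches newlines.
LAZY = r"Working sequence.*?" + DASHES

fixed = re.sub(FIXED, "", TEXT)
lazy = re.sub(LAZY, "", TEXT, flags=re.DOTALL)

print(fixed)
```

Both segment rows survive. The [^-]* version only works because the sequence blocks never contain a '-'. The lazy version does not rely on that, but it needs the dot-matches-newline flag.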

Licensed under: CC-BY-SA with attribution
