How to replace text inside quotes with the same number of asterisks?

How can I replace text inside quotes with asterisks of the same length in SAS? That is, convert:

"12345"
"hi42"
'with "double" quotes'
there are 'other words' not in quotes

into:

*******
******
**********************
there are ************* not in quotes

Lines 1, 2, 3 and 4 have 7, 6, 22 and 13 asterisks respectively. Yes, the quotation marks themselves are included.

I tried a program like this:

pat=prxparse('/[''"].*?["'']/');
do until(pos=0);
  call prxsubstr(pat,text,pos,len);
  if pos then substr(text,pos,len)=repeat('*',len-1);
end;

It works.
My question is: is there a more efficient way to do this?

Reply from a uj5u.com user:

First, your example fails on the third line, because the pattern doesn't remember which quote opened the match, so "double" ends up unmatched.
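To see the failure concretely, here is a minimal sketch that runs the question's original pattern on the third sample line only (the data set name DEMO_FAIL is just for illustration). The first match runs from the opening single quote to the first double quote, and the second from the closing double quote to the final single quote, so the word double is never covered:

```sas
/* Apply the question's pattern to the third sample line.           */
/* ['"] ... ["'] closes on ANY quote, not on the one that opened it. */
data demo_fail;
  length text $40;
  text = '''with "double" quotes''';   /* 'with "double" quotes' */
  pat  = prxparse('/[''"].*?["'']/');
  do until(pos = 0);
    call prxsubstr(pat, text, pos, len);
    if pos then substr(text, pos, len) = repeat('*', len - 1);
  end;
  put text=;                             /* written to the log */
  drop pat pos len;
run;
```

The log should show text=*******double*********, i.e. two separate 7- and 9-character spans with double left in between.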

You can fix this with a backreference, which SAS's Perl regular expressions support:

data have;
length text $1024;
infile datalines pad;
input @1 text $80.;
datalines;
"12345"
"hi42"
'with "double" quotes'
there are 'other words' not in quotes
;;;;
run;
data want;
    set have;
    pat=prxparse('/([''"]).*?\1/');
    do until(pos=0);
      call prxsubstr(pat,text,pos,len);
      if pos then substr(text,pos,len)=repeat('*',len-1);
    end;
run;

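In terms of efficiency, processing 400k records (these four lines × 100,000) takes about 1.5 seconds on my (reasonably fast, but not exceptional) SAS server. That seems reasonable unless your text is much longer or you have many more rows. Also note that, if your data allows it, this will fail on highly complex nesting: single-double-single and so on, or a single quote inside a double inside a single, won't be recognized, though the result may still suit your purpose.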
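If you want to check the timing on your own server, a sketch along these lines rebuilds a 400,000-row test set (the four sample lines, 100,000 times each) and runs the backreference version on it. BIG and BIG_WANT are illustrative names. The log reports real and CPU time for each step; OPTIONS FULLSTIMER adds memory and other statistics. Timings will depend on your hardware and data, so treat any number you get as your own baseline rather than a comparison with the figure above.

```sas
options fullstimer;   /* more detailed per-step statistics in the log */

/* Hypothetical test set: the four sample lines, 100,000 times each */
data big;
  length text $1024;
  do _rep = 1 to 100000;
    text = '"12345"';                                 output;
    text = '"hi42"';                                  output;
    text = '''with "double" quotes''';                output;
    text = 'there are ''other words'' not in quotes'; output;
  end;
  drop _rep;
run;

/* Same backreference approach as above, run on 400,000 rows */
data big_want;
  set big;
  pat = prxparse('/([''"]).*?\1/');
  do until(pos = 0);
    call prxsubstr(pat, text, pos, len);
    if pos then substr(text, pos, len) = repeat('*', len - 1);
  end;
  drop pat pos len;
run;
```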

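However, if you want maximum efficiency, regular expressions are not the answer; basic text functions are faster. They are harder to get completely right and need more code, though, so I wouldn't recommend this if the regex performance is acceptable. Here is an example anyway. You may need to tweak it; in particular, you need to loop it so that it repeats until there is nothing left to replace, and skip it entirely when there are no quotes. It only shows the basic idea of how to use the text functions.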

data want;
  set have;
  length text_sub $80;
  _start = findc(text,'"''');               *position of the first single or double quote;
  _qchar = char(text,_start);               *Save aside which char we matched on;
  _end   = findc(text,_qchar,_start+1);     *now look for that one again anywhere after the first match;
  to_convert = substr(text,_start,_end-_start+1);   *the quoted span, kept for inspection;
  if _start eq 1 and _end eq length(text) then text_sub = repeat('*',_end-1);
  else if _start eq 1 then text_sub = repeat('*',_end-1)||substr(text,_end+1);
  else if _end eq length(text) then text_sub = substr(text,1,_start-1)||repeat('*',_end-_start);
  else text_sub = cat(substr(text,1,_start-1),repeat('*',_end-_start),substr(text,_end+1));
run;
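Wrapped in the loop described above, the same FINDC/CHAR logic might look like the following sketch. It overwrites each quoted span in place with SUBSTR on the left-hand side, so the concatenation branches are no longer needed, and it stops at an opening quote that has no partner (as in What's). The data set names HAVE_TXT and WANT_TXT and the extra fifth sample row are assumptions for illustration; this is not a tuned or benchmarked implementation.

```sas
/* Hypothetical input: the sample lines plus one with an unpaired apostrophe */
data have_txt;
  infile datalines truncover;
  input text $char80.;
  datalines;
"12345"
"hi42"
'with "double" quotes'
there are 'other words' not in quotes
What's going on?
;

data want_txt;
  set have_txt;
  _from = 1;                                  /* where the next search starts  */
  do while(_from <= length(text));
    _start = findc(text, '"''', _from);       /* next single or double quote   */
    if _start = 0 then leave;                 /* no quotes left: done          */
    _qchar = char(text, _start);              /* which quote opened the span   */
    _end = findc(text, _qchar, _start + 1);   /* the same quote, further right */
    if _end = 0 then leave;                   /* unpaired quote: stop here     */
    /* REPEAT returns n+1 copies, so this writes exactly _end-_start+1 stars */
    substr(text, _start, _end - _start + 1) = repeat('*', _end - _start);
    _from = _end + 1;
  end;
  drop _from _start _end _qchar;
run;
```

Because the closing quote is searched for with the same character that opened the span, this behaves like the backreference regex above, including its limitations with nested quotes.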

Reply from a uj5u.com user:

I'd skip the regular expression and just use CALL SCAN() instead.

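That is, loop to find the position of each successive "word". The 'q' modifier tells CALL SCAN to ignore delimiters inside quoted substrings, so 'other words' counts as a single word. If a word starts and ends with the same quote character, replace the word with *'s.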

data have;
  input string $char80. ;
cards;
"12345"
"hi42"
'with "double" quotes'
there are 'other words' not in quotes

What's going on?
;

data want;
  set have;
  position=1;
  do count=1 by 1 while(position>0);
    call scan(string,count,position,length,' ','q');
    if char(string,position) in ('"',"'")
      and char(string,position)=char(string,position+length-1)
      then substr(string,position,length) = repeat('*',length-1)
    ;
  end;
  drop position count length;
run;

Result:

Obs    string

 1     *******
 2     ******
 3     **********************
 4     there are ************* not in quotes
 5
 6     What's going on?
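Because this approach works word by word, a quoted string is only masked when it forms a whole blank-delimited word. A minimal check with punctuation attached (the data set name EDGE and the sample sentence are made up for illustration; LEN replaces LENGTH as the variable name) leaves the string unchanged, whereas the backreference regex in the other answer would turn "hi" into ****:

```sas
/* Hypothetical sentence: the quoted word has a period attached */
data edge;
  length string $80;
  string = 'He said "hi".';
  position = 1;
  do count = 1 by 1 while(position > 0);
    call scan(string, count, position, len, ' ', 'q');
    /* the third word is "hi". : it starts with a quote but ends with a period */
    if char(string, position) in ('"', "'")
      and char(string, position) = char(string, position + len - 1)
      then substr(string, position, len) = repeat('*', len - 1);
  end;
  put string=;   /* prints the sentence unchanged */
  drop position count len;
run;
```

Whether that matters depends on your data; adding punctuation to the delimiter list is one option to explore, but check how it interacts with apostrophes such as the one in What's.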
