Keep only the path of href and src URLs (root-relative HTML links)

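Hello, I have code that copies HTML from an external URL and echoes it onto my page. Some of that HTML contains links and/or image SRC attributes. I need help truncating them (from absolute URLs to relative URLs inside $data).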

For example, the HTML contains an href:

<a href="https://www.trade-ideas.com/products/score-vs-ibd/" >

or an SRC:

<img src="http://static.trade-ideas.com/Filters/MinDUp1.gif">

I want to keep only the path (the subdirectory part):

/products/score-vs-ibd/

/Filters/MinDUp1.gif

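Maybe preg_replace would work, but I am not familiar with regular expressions.

This is my original code. It works well, but I cannot figure out how to truncate the links.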

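<?php
// Use the first tag of the current post as the ticker symbol.
$tag = '';
$post_tags = get_the_tags();
if ( $post_tags ) {
    $tag = $post_tags[0]->name;
}
$html = file_get_contents('https://www.trade-ideas.com/ticky/ticky.html?symbol=' . urlencode($tag));

// Keep only the part between the first "<div " and the "<!-- /span -->" marker.
$start = strpos($html, '<div ');
$end = strpos($html, '<!-- /span -->', $start);
$data = substr($html, $start, $end - $start);
echo $data;
?>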

A uj5u.com user replied:

Here is the code:

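function getUrlPath($url) {
   // Skip an optional scheme, then the host up to the first "/" or "?",
   // and capture everything after that separator.
   $re = '/(?:https?:\/\/)?(?:[^?\/\s]+[?\/])(.*)/';
   if (preg_match($re, $url, $matches)) {
      return $matches[1];
   }
   return ''; // no path found
}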

Example: getUrlPath("http://myassets.com:80/files/images/image.gif") returns files/images/image.gif (without the leading slash).
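As an alternative to a hand-written pattern, PHP's built-in parse_url() can split a URL into its components. The sketch below is not part of the answer above; the helper name getUrlPathWithQuery is hypothetical, and it assumes the input is an absolute URL. Unlike the regex version, it keeps the leading slash, which is what a root-relative link needs.

```php
<?php
// Hypothetical helper (not from the original answer): returns the path,
// plus the query string if there is one, of an absolute URL.
function getUrlPathWithQuery(string $url): string {
    $parts = parse_url($url);
    if ($parts === false) {
        return $url; // seriously malformed URL: leave it untouched
    }
    $path = $parts['path'] ?? '/'; // a URL without a path maps to the root
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    return $path;
}

echo getUrlPathWithQuery('http://myassets.com:80/files/images/image.gif'), "\n";
// prints /files/images/image.gif
echo getUrlPathWithQuery('https://static.trade-ideas.com/Filters/MinDUp1.gif?param=value'), "\n";
// prints /Filters/MinDUp1.gif?param=value
```

The scheme, host, and port are simply dropped, so no pattern has to account for them.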

A uj5u.com user replied:

You can find all URLs in the HTML string with a regular expression and preg_match_all().
The regular expression:

'/=[\'"](https?:\/\/.*?(\/.*))[\'"]/i'

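For every occurrence of ="http://domain/path" or ='https://domain/path?query' (http or https, single or double quotes, with or without a query string), it captures the whole URL in group 1 and the path plus query string in group 2.
You can then update the HTML string with a plain str_replace().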

<?php
$html = '<a href="https://www.trade-ideas.com/products/score-vs-ibd/" >
<img src="http://static.trade-ideas.com/Filters/MinDUp1.gif">
<img src=\'https://static.trade-ideas.com/Filters/MinDUp1.gif?param=value\'>';

$pattern = '/=[\'"](https?:\/\/.*?(\/.*))[\'"]/i';
$urls = [];
preg_match_all($pattern, $html, $urls);
//var_dump($urls);
foreach($urls[1] as $i => $uri){
    $html = str_replace($uri, $urls[2][$i], $html);
}
echo $html;


Note that this changes every absolute URL that is enclosed in quotes and immediately follows an =.
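If only href and src attributes should be rewritten, the match can be narrowed and the replacement done in a single pass with preg_replace_callback(). The following is a minimal sketch under that assumption, not the answerer's code; the function name stripHostFromLinks is hypothetical. It also stops at the closing quote instead of using a greedy .*, so two absolute URLs on the same line are both rewritten. A regex like this only handles quoted attribute values in reasonably regular markup.

```php
<?php
// Hypothetical helper: rewrites absolute http(s) URLs in href/src
// attributes to root-relative URLs (path plus optional query/fragment).
function stripHostFromLinks(string $html): string {
    // Group 1: attribute name and "=", group 2: opening quote,
    // group 3: everything after the host up to the closing quote.
    // The lookbehind keeps attributes such as data-src untouched.
    $pattern = '/(?<![\w-])((?:href|src)\s*=\s*)([\'"])https?:\/\/[^\/\'"\s?#]+([^\'"]*)\2/i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $rest = $m[3];
        if ($rest === '' || $rest[0] !== '/') {
            $rest = '/' . $rest; // "https://host" or "https://host?x=1"
        }
        return $m[1] . $m[2] . $rest . $m[2];
    }, $html);

    return $result ?? $html; // fall back to the input on a regex error
}

$html = '<a href="https://www.trade-ideas.com/products/score-vs-ibd/" >' . "\n"
      . '<img src=\'https://static.trade-ideas.com/Filters/MinDUp1.gif?param=value\'>' . "\n"
      . '<input value="https://example.com/keep-me">';
echo stripHostFromLinks($html), "\n";
```

In the question's code this could be applied to $data just before the echo. The input element in the sample is left unchanged because value is neither href nor src.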
