Determining the namespace of a MediaWiki page (API question)

https://stackoverflow.com/questions/17478630

02-06-2022

Question

For starters, I must admit my PHP skills are pathetic (I have been doing this for 3 days).

Over the last three days I have heavily modified an extension that parses pages just edited in MediaWiki. Its original purpose was to search the page for text matching the names of other pages in the database and autolink them, but it would only do so in the 'Main' namespace (namespace 0). I have modified it to look for links across namespaces, using a weighted whitelist of namespaces. (Weighted as in first come, first served, based on position in the whitelist: if the whitelist is Development, Rules, then Development trumps Rules.)
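To make the first-come-first-served weighting concrete, here is a minimal standalone sketch. It is not the extension's code: `firstMatchingNamespace()` and the `$pagesByNamespace` lookup table are hypothetical stand-ins for the database query, and the mapping of IDs 5000 and 5002 to Development and Rules is assumed purely for illustration.

```php
<?php
// Hypothetical helper: return the first whitelisted namespace that
// contains a page with the given title, or null if none does.
function firstMatchingNamespace( $title, array $whitelist, array $pagesByNamespace ) {
    foreach ( $whitelist as $ns ) {
        if ( isset( $pagesByNamespace[$ns] )
            && in_array( $title, $pagesByNamespace[$ns], true ) ) {
            return $ns;
        }
    }
    return null;
}

// Assumed IDs for illustration: 5000 = Development, 5002 = Rules.
$pagesByNamespace = array(
    5000 => array( 'Combat' ),
    5002 => array( 'Combat', 'Magic' ),
);

// 'Combat' exists in both namespaces; the earlier whitelist entry wins.
var_dump( firstMatchingNamespace( 'Combat', array( 5000, 5002 ), $pagesByNamespace ) ); // int(5000)
var_dump( firstMatchingNamespace( 'Combat', array( 5002, 5000 ), $pagesByNamespace ) ); // int(5002)
```

Reordering the whitelist is therefore all it takes to change which namespace a title links to.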

I would like to further modify the extension to prioritize the current page's namespace by pushing it onto the front of the whitelist array, and to determine the current user's group affiliations so as to further adjust the priority list.

Determine the user's group affiliations
Determine the current page's namespace (SOLVED: $var = $article->getTitle()->getNamespace();)
Push the value to the front of the array (SOLVED: array_unshift($haystack, $needle);)
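The last step can be sketched in plain PHP. `prioritizeNamespace()` is a hypothetical helper, not part of MediaWiki or the extension. It assumes the current namespace should appear only once in the list, so it removes any existing occurrence before calling `array_unshift()`.

```php
<?php
// Hypothetical helper: return a copy of the whitelist with $currentNs
// moved (or added) to the front.
function prioritizeNamespace( array $whitelist, $currentNs ) {
    // Drop any existing occurrence so the namespace is not processed twice.
    $rest = array_values( array_diff( $whitelist, array( $currentNs ) ) );
    // array_unshift() modifies the array in place and prepends the value.
    array_unshift( $rest, $currentNs );
    return $rest;
}

$whitelist = array( 5000, 5002, 5004, 5006, 0 );
print_r( prioritizeNamespace( $whitelist, 5004 ) );
// The list now starts with 5004, followed by 5000, 5002, 5006, 0.
```

Returning a new array instead of changing the global whitelist keeps the configured order intact for the next page.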

I would greatly appreciate it if someone could point me to a site explaining at least the first two things. (The MediaWiki community has thus far been less than useful.)

If you offer up a code example, please try to keep it stupid-proof; I am that new to this {{shrug}}. (Meaning: keep the PHP code samples simple. It's extensible, so there is always a better way, but I am not familiar with the extra modules.)

MediaWiki version in question is 1.21 (current stable as of 7-2013)

NOTE: For some reason the code isn't displaying right, but no lines were lost.

The array for the whitelist is defined as: $wgLinkTitlesNamespaceWhitelist = array(5000, 5002, 5004, 5006,0);

Main file LinkTitles_5000.body.php

if ( !defined( 'MEDIAWIKI' ) ) {
    die( 'Not an entry point.' );
}

/*
function dump($var) {
    error_log(print_r($var, TRUE) . "\n", 3, 'php://stderr');
};
*/

class LinkTitles_5000 {
    static $safeTitle;



    /// Setup function, hooks the extension's functions to MediaWiki events.
    public static function setup() {
        global $wgLinkTitlesParseOnEdit;
        global $wgLinkTitlesParseOnRender;
        global $wgHooks;
        if ( $wgLinkTitlesParseOnEdit ) {
            $wgHooks['ArticleSave'][] = 'LinkTitles_5000::onArticleSave';
        };
        if ( $wgLinkTitlesParseOnRender ) {
            $wgHooks['ArticleAfterFetchContent'][] = 'LinkTitles_5000::onArticleAfterFetchContent';
        };
        $wgHooks['ParserBeforeTidy'][] = 'LinkTitles_5000::removeMagicWord';
    }

    /// This function is hooked to the ArticleSave event.
    /// It will be called whenever a page is about to be
    /// saved.
    public static function onArticleSave( &$article, &$user, &$text, &$summary,
            $minor, $watchthis, $sectionanchor, &$flags, &$status ) {

        // To prevent time-consuming parsing of the page whenever
        // it is edited and saved, we only parse it if the flag
        // 'minor edits' is not set.
        return $minor or self::parseContent( $article, $text );
    }

    /// Called when an ArticleAfterFetchContent event occurs; this requires the
    /// $wgLinkTitlesParseOnRender option to be set to 'true'
    public static function onArticleAfterFetchContent( &$article, &$content ) {
        // The ArticleAfterFetchContent event is triggered whenever page content
        // is retrieved from the database, i.e. also for editing etc.
        // Therefore we access the global $action variable to only parse the
        // content when the page is viewed.
        global $action;
        if ( in_array( $action, array('view', 'render', 'purge') ) ) {
            self::parseContent( $article, $content );
        };
        return true;
    }

    /// This function performs the actual parsing of the content.
    static function parseContent( &$article, &$text ) {

        // If the page contains the magic word '__NOAUTOLINKS__', do not parse
        // the content.
        $mw = MagicWord::get('MAG_LINKTITLES_TERMINOLOGY_NOAUTOLINKS');
        if ( $mw -> match( $text ) ) {
            return true;
        }

        // Configuration variables need to be defined here as globals.
        global $wgLinkTitlesPreferShortTitles;
        global $wgLinkTitlesMinimumTitleLength;
        global $wgLinkTitlesParseHeadings;
        global $wgLinkTitlesBlackList;
        global $wgLinkTitlesSkipTemplates;
        global $wgLinkTitlesFirstOnly;
        global $wgLinkTitlesWordStartOnly;
        global $wgLinkTitlesWordEndOnly;
        // global $wgLinkTitlesIgnoreCase;
        global $wgLinkTitlesSmartMode;
        global $wgCapitalLinks;
        global $wgLinkTitlesNamespaceWhitelist;
        global $wgExtraNamespaces;

        ( $wgLinkTitlesWordStartOnly ) ? $wordStartDelim = '\b' : $wordStartDelim = '';
        ( $wgLinkTitlesWordEndOnly ) ? $wordEndDelim = '\b' : $wordEndDelim = '';
        // ( $wgLinkTitlesIgnoreCase ) ? $regexModifier = 'i' : $regexModifier = '';

        // To prevent adding self-references, we now
        // extract the current page's title.
        $myTitle = $article->getTitle()->getText();

        ( $wgLinkTitlesPreferShortTitles ) ? $sort_order = 'ASC' : $sort_order = 'DESC';
        ( $wgLinkTitlesFirstOnly ) ? $limit = 1 : $limit = -1;

        if ( $wgLinkTitlesSkipTemplates )
        {
            $templatesDelimiter = '{{.+}}';
        } else {
            $templatesDelimiter = '{{[^|]+?}}|{{.+\|';
        };

        // Build a regular expression that will capture existing wiki links ("[[...]]"),
        // wiki headings ("= ... =", "== ... ==" etc.),
        // urls ("http://example.com", "[http://example.com]", "[http://example.com Description]",
        // and email addresses ("mail@example.com").
        // Since there is a user option to skip headings, we make this part of the expression
        // optional. Note that in order to use preg_split(), it is important to have only one
        // capturing subpattern (which precludes the use of conditional subpatterns).
        ( $wgLinkTitlesParseHeadings ) ? $delimiter = '' : $delimiter = '=+.+?=+|';
        $urlPattern = '[a-z]+?\:\/\/(?:\S+\.)+\S+(?:\/.*)?';
        $delimiter = '/(' . $delimiter . '\[\[.*?\]\]|' . $templatesDelimiter .
            '|\[' . $urlPattern . '\s.+?\]|'. $urlPattern .
            '(?=\s|$)|(?<=\b)\S+\@(?:\S+\.)+\S+(?=\b))/i';

        $black_list = str_replace( '_', ' ',
            '("' . implode( '", "',$wgLinkTitlesBlackList ) . '")' );

        // Depending on the global setting $wgCapitalLinks, we need
        // different callback functions further down.
        if ( $wgCapitalLinks ) {
            $callBack = "LinkTitles_5000::CallBackCaseInsensitive";
        }   else {
            $callBack = "LinkTitles_5000::CallBackCaseSensitive";
        }


        # Added to support $wgLinkTitlesNamespaceWhitelist
        foreach ($wgLinkTitlesNamespaceWhitelist as $LT_namespace){
            # Create the link part reflecting NameSpace:
            # if namespace is main (0) set to empty string
            if ($LT_namespace === 0){
                $LT_namespacePart = "";
            } else {
                $LT_namespacePart = str_replace('_', ' ', $wgExtraNamespaces[(int)$LT_namespace]);
                $LT_namespacePart = $LT_namespacePart . ":";
            }
        # ===

            // Build an SQL query and fetch all page titles in the
            // current whitelisted namespace, ordered by title length
            // (direction set by $wgLinkTitlesPreferShortTitles).
            $dbr = wfGetDB( DB_SLAVE );
            # modified to support $wgLinkTitlesNamespaceWhitelist
            #    'page_namespace = 0'  becomes 'page_namespace = ' . $LT_namespace,
            # ===
            $res = $dbr->select(
                // select() adds the table prefix itself; $wgDBprefix is
                // not declared global in this function.
                'page',
                'page_title, page_namespace',
                array(
                    'page_namespace = ' . strval($LT_namespace),
                    'CHAR_LENGTH(page_title) >= ' . $wgLinkTitlesMinimumTitleLength,
                    'page_title NOT IN ' . $black_list,
                ),
                __METHOD__,
                array( 'ORDER BY' => 'CHAR_LENGTH(page_title) ' . $sort_order )
            );

            // Iterate through the page titles
            foreach( $res as $row ) {
                // Page titles are stored in the database with spaces
                // replaced by underscores. Therefore we now convert
                // the underscores back to spaces.
                $title = str_replace('_', ' ', $row->page_title);

                if ( $title != $myTitle ) {
                    LinkTitles_5000::$safeTitle = str_replace( '/', '\/', $title );
                    # add this to skip the function if more than 1 level of sub pages
                    # Thus if 0 or 1 "\/" is found we continue and process the entry
                    # if two or more are found we go AARRRRRGGGGHHHHH  and skip it!
                    if (substr_count(LinkTitles_5000::$safeTitle, '\/') >1) {
                        continue;
                    }
                    # adding this to allow for sub pages to be broken into their parts
                    $LT5000_pos = strpos(LinkTitles_5000::$safeTitle, "\/");
                    if ($LT5000_pos !== false){
                        $LT5000_front = substr(LinkTitles_5000::$safeTitle, 0, $LT5000_pos);
                        $LT5000_back = substr(LinkTitles_5000::$safeTitle, $LT5000_pos+1);
                        LinkTitles_5000::$safeTitle = substr($title, $LT5000_pos+1);
                    } else {
                        $LT5000_back = '';
                        $LT5000_front = LinkTitles_5000::$safeTitle;
                    }

                    // split the string by [[...]] groups
                    // credits to inhan @ StackOverflow for suggesting preg_split
                    // see http://stackoverflow.com/questions/10672286
                    $arr = preg_split( $delimiter, $text, -1, PREG_SPLIT_DELIM_CAPTURE );

                    // Depending on the global configuration setting $wgCapitalLinks,
                    // the title has to be searched for either in a strictly case-sensitive
                    // way, or in a 'fuzzy' way where the first letter of the title may
                    // be either case.
                    if ( $wgCapitalLinks ) {
                        $searchTerm = '((?i)' . LinkTitles_5000::$safeTitle[0] . '(?-i)' .
                            substr(LinkTitles_5000::$safeTitle, 1) . ')';
                    }   else {
                        $searchTerm = '(' . LinkTitles_5000::$safeTitle . ')';
                    }


                    $LT5000_out = "[[" . $LT_namespacePart . $title . "|";
                    for ( $i = 0; $i < count( $arr ); $i+=2 ) {
                        // even indexes will point to text that is not enclosed by brackets
                        $arr[$i] = preg_replace( '/(?<![\:\.\@\/\?\&])' .
                            $wordStartDelim . $searchTerm . $wordEndDelim . '/',
                            $LT5000_out.'$1]]', $arr[$i], $limit, $count );
                        if (( $limit >= 0 ) && ( $count > 0  )) {
                            break;
                        };
                    };

                    $text = implode( '', $arr );

                    // If smart mode is turned on, the extension will perform a second
                    // pass on the page and add links with aliases where the case does
                    // not match.
                    if ($wgLinkTitlesSmartMode) {
                        // split the string by [[...]] groups
                        // credits to inhan @ StackOverflow for suggesting preg_split
                        // see http://stackoverflow.com/questions/10672286
                        $arr = preg_split( $delimiter, $text, -1, PREG_SPLIT_DELIM_CAPTURE );

                        for ( $i = 0; $i < count( $arr ); $i+=2 ) {
                            // even indexes will point to text that is not enclosed by brackets
                            $arr[$i] = preg_replace_callback( '/(?<![\:\.\@\/\?\&])' .
                                $wordStartDelim . '(' . LinkTitles_5000::$safeTitle . ')' .
                                $wordEndDelim . '/i', $callBack, $arr[$i], $limit, $count );
                            if (( $limit >= 0 ) && ( $count > 0  )) {
                                break;
                            };
                        };
                        $text = implode( '', $arr );
                    }
                }; // if $title != $myTitle
            }; // foreach $res as $row
        }; // foreach $wgLinkTitlesNamespaceWhitelist as $LT_namespace
        return true;
    }

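    // NOTE: Both callbacks below read $LT5000_pos, $LT_namespacePart,
    // $LT5000_front and $LT5000_back. These are local variables of
    // parseContent() and are undefined in this scope, so they would have
    // to be shared another way (for example as static properties, like
    // $safeTitle).
    static function CallBackCaseInsensitive($matches) {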
        if ($LT5000_pos !== false){
            # this if a / was found in the first place
            $LT5000_call_out = $LT_namespacePart . $LT5000_front . '/' . $LT5000_back;
        } else {
            # this if there was no slash
            $LT5000_call_out = $LT_namespacePart . $matches[0];
        }

        if ( strcmp(substr(LinkTitles_5000::$safeTitle, 1), substr($matches[0], 1)) == 0 ) {
            return '[[' . $LT5000_call_out . '|]]';
        } else  {
            return '[[' . $LT5000_call_out . '|' . $matches[0] . ']]';
        }
    }

    static function CallBackCaseSensitive($matches) {
        if ($LT5000_pos !== false){
            # this if a / was found in the first place
            $LT5000_call_out = $LT_namespacePart . $LT5000_front . '/' . $LT5000_back;
        } else {
            # this if there was no slash
            $LT5000_call_out = $LT_namespacePart . $matches[0];
        }

        if ( strcmp(substr(LinkTitles_5000::$safeTitle, 0), substr($matches[0], 0)) == 0 ) {
            return '[['. $LT5000_call_out . '|]]';
        } else  {
            return '[[' . $LT5000_call_out . '|' . $matches[0] . ']]';
        }
    }

    static function removeMagicWord( &$parser, &$text ) {
        $mw = MagicWord::get('MAG_LINKTITLES_TERMINOLOGY_NOAUTOLINKS');
        $mw -> matchAndRemove( $text );
        return true;
    }
}

And the module's loader file, LinkTitles_5000.php:

if ( !defined( 'MEDIAWIKI' ) ) { die( 'Not an entry point.' ); }



/*
    error_reporting(E_ALL);
    ini_set('display_errors', 'On');
    ini_set('error_log', 'php://stderr');
    $wgMainCacheType = CACHE_NONE;
    $wgCacheDirectory = false;
*/

// Configuration variables
$wgLinkTitlesPreferShortTitles = false; 
$wgLinkTitlesMinimumTitleLength = 3;
$wgLinkTitlesParseHeadings = false;
$wgLinkTitlesParseOnEdit = true;
$wgLinkTitlesParseOnRender = false;
$wgLinkTitlesSkipTemplates = false;
$wgLinkTitlesBlackList = array();
$wgLinkTitlesFirstOnly = false;
$wgLinkTitlesWordStartOnly = true;
$wgLinkTitlesWordEndOnly = true;
$wgLinkTitlesSmartMode = true;
$wgLinkTitlesNamespaceWhitelist = array();


$wgExtensionCredits['parserhook'][] = array(
    'path'           => __FILE__,
    'name'           => 'LinkTitles_5000',
    'author'         => '[https://www.mediawiki.org/wiki/User:Bovender Daniel Kraus]', 
    'url'            => 'https://www.mediawiki.org/wiki/Extension:LinkTitles',
    'version'        => '2.2.0',
    'descriptionmsg' => 'linktitles-desc'
    );

$wgExtensionMessagesFiles['LinkTitles_5000'] = dirname( __FILE__ ) . '/LinkTitles_5000.i18n.php';
$wgExtensionMessagesFiles['LinkTitlesMagic_5000'] = dirname( __FILE__ ) . '/LinkTitles_5000.i18n.magic.php';
$wgAutoloadClasses['LinkTitles_5000'] = dirname( __FILE__ ) . '/LinkTitles_5000.body.php';
$wgExtensionFunctions[] = 'LinkTitles_5000::setup';

// vim: ts=2:sw=2:noet

Solution

I forgot about this question from a long time ago, but it turns out MediaWiki has a built-in function for this:

$myNamespace = $article->getTitle()->getNamespace();

Since the article object is passed by default to an extension hooked to the save process (here, the ArticleSave hook), this returns the namespace as a numeric ID.
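The remaining open item, the user's groups, can be approached the same way. What follows is only a sketch under stated assumptions. `getTitle()`, `getNamespace()` and `User::getEffectiveGroups()` are real MediaWiki methods in the 1.21 era (the latter also returns implicit groups such as `*` and `user`). The stub classes, `buildPriorityList()` and the group-to-namespace mapping are hypothetical and exist only so the example runs outside MediaWiki.

```php
<?php
// Stand-ins so the sketch runs outside MediaWiki; inside an extension
// the real Article, Title and User objects would be used instead.
class StubTitle {
    private $ns;
    public function __construct( $ns ) { $this->ns = $ns; }
    public function getNamespace() { return $this->ns; }
}
class StubArticle {
    private $title;
    public function __construct( $title ) { $this->title = $title; }
    public function getTitle() { return $this->title; }
}
class StubUser {
    private $groups;
    public function __construct( array $groups ) { $this->groups = $groups; }
    public function getEffectiveGroups() { return $this->groups; }
}

// Hypothetical helper: put the page's own namespace first, then any
// namespaces mapped to the user's groups, then the rest of the whitelist.
function buildPriorityList( $article, $user, array $whitelist, array $groupNamespaces ) {
    $priority = array( $article->getTitle()->getNamespace() );
    foreach ( $user->getEffectiveGroups() as $group ) {
        if ( isset( $groupNamespaces[$group] ) ) {
            $priority[] = $groupNamespaces[$group];
        }
    }
    // array_unique() keeps the first occurrence, so earlier entries win.
    return array_values( array_unique( array_merge( $priority, $whitelist ) ) );
}

$article = new StubArticle( new StubTitle( 5004 ) );
$user = new StubUser( array( '*', 'user', 'developer' ) );
$result = buildPriorityList(
    $article,
    $user,
    array( 5000, 5002, 5004, 5006, 0 ),
    array( 'developer' => 5000 ) // assumed group-to-namespace mapping
);
print_r( $result );
```

In the extension itself, the `$user` argument of `onArticleSave` would have to be handed on to `parseContent()`, which currently receives only `$article` and `$text`.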

Licensed under: CC-BY-SA with attribution
