Planphoria Elgg SiteIndex Generator v1.0

Release Notes

FEATURES

SiteIndex is an Elgg site index generator. It builds the site index links dynamically at the moment
a search engine bot visits, so the index it serves is always up to date.

SiteIndex pulls data directly from the Elgg database rather than spidering the site, so it is really fast: there are no HTTP requests or transfers involved. It only indexes links
that are publicly accessible on the Internet (you don't have to log in or be part of a group to see them). Users who want their content picked up by the search engines only have to make it public when creating it; otherwise that content will not be indexed.

SiteIndex uses .gz compression when sending XML files to the search engines. You can turn this feature off, but it is recommended that you leave it on, as it minimizes the file size and speeds up the transfer to the search engine. Your version of PHP must provide the gzencode() function for this to work.
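
One way to confirm that a compressed file is valid is to decompress it yourself and check that XML comes out. The following is a minimal Python sketch, not part of SiteIndex: the function name looks_like_gzipped_xml and the sample payload are made up for illustration. It assumes the server sends standard gzip data, which is what gzencode() produces by default.

```python
import gzip
import zlib


def looks_like_gzipped_xml(body: bytes) -> bool:
    """Return True if body is gzip data that decompresses to an XML document."""
    try:
        text = gzip.decompress(body).decode("utf-8")
    except (OSError, EOFError, zlib.error, UnicodeDecodeError):
        # Not gzip, truncated or corrupt gzip, or not UTF-8 text.
        return False
    return text.lstrip().startswith("<?xml")


# Stand-in for the bytes a .xml.gz URL would return (hypothetical payload).
sample = gzip.compress(b'<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>')
print(looks_like_gzipped_xml(sample))
print(looks_like_gzipped_xml(b"<?xml plain, uncompressed text"))
```

In practice you would pass in the raw bytes downloaded from one of your own .xml.gz URLs instead of the sample.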

To cope with the file sizes that a large social network is likely to produce, SiteIndex creates a master list (named siteindex.xml.gz) that points to separate files of index links, following the sitemaps.org protocol.

Indexed link files include:

    sitemain.xml.gz = Root URL, basic public areas and User Identified Loose Pages

    siteprofiles.xml.gz = Public facing User Profile elements and associated RSS

    sitepages.xml.gz = Public facing Elgg pages

    sitegroups.xml.gz = Public facing Elgg Groups

    siteblogs.xml.gz = Public facing Elgg Blogs

    siteevents.xml.gz = Public facing Elgg Events

    sitefiles.xml.gz = Public facing Elgg Files

    sitebookmarks.xml.gz = Public facing Elgg Bookmarks

    sitecategories.xml.gz = The list of possibly searchable Tags and Categories

    siteblogarchive.xml.gz = Public facing personal blogs, linked to by month

    sitegrblogarchive.xml.gz = Public facing group blogs, linked to by month

To see what you are sending, you can view the XML content in Firefox by requesting one of the above files from the Elgg base directory without the .gz extension.

If there are no public facing records to create any one of the files above, SiteIndex will not include that file within the master index.
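
For orientation, a master index in the sitemaps.org format is a sitemapindex element that holds one sitemap/loc entry per child file. The Python sketch below builds such a document for a few of the files listed above and gzips it. It only illustrates the format and is not the code SiteIndex uses; build_sitemap_index and http://www.example.com are placeholders.

```python
import gzip
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, filenames):
    """Return a gzip-compressed sitemap index listing the given child files."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<sitemapindex xmlns="{SITEMAP_NS}">',
    ]
    for name in filenames:
        # Escape &, < and > so the URL is safe inside XML text.
        loc = escape(f"{base_url.rstrip('/')}/{name}")
        lines.append(f"  <sitemap><loc>{loc}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return gzip.compress("\n".join(lines).encode("utf-8"))


index = build_sitemap_index(
    "http://www.example.com",
    ["sitemain.xml.gz", "siteprofiles.xml.gz", "siteblogs.xml.gz"],
)
print(gzip.decompress(index).decode("utf-8"))
```

A file with no public facing records would simply be left out of the filenames list, which matches the behaviour described above.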

RESULTS

On my social network, which is just starting out with 10 users who have not yet begun to add content, SiteIndex identified 340 public facing links.  Imagine how things will look as I scale in users and in content!

DONATE
Ok, here is my plea!

I work for myself.  It took a significant amount of time to write this and to analyze the Elgg database schema to produce the desired results. Based on what I am seeing as I use it, SiteIndex WILL improve your Elgg SEO, drive your user base and drive the content that in return WILL increase your SEO and make your site successful.

If you enjoy this sort of thing, please consider making a donation to the cause by pressing the donate button, to help improve the code and keep it maintained as Elgg shapes, distorts and changes beneath it.


Many thanks!

PREREQUISITES


Be able to run the PHP function gzencode().
Be able to use mod_rewrite. (You should be able to anyway if you are using Elgg).

INSTALLATION

The installation on this could not be easier.

1. Download the SiteIndex_1.0.zip file
2. Unzip the package in the base directory of Elgg

3. Add the following line to the end of your Rewrite Rules in .htaccess. If you don't, SiteIndex won't work.

#Rewrite Rules for Planphoria SiteIndex
RewriteRule ^site([A-Za-z]+).xml(|.gz) SiteIndex_v1.0/siteindex.php?m=$1&g=$2

4. From the base directory type cd "SiteIndex"

5. Edit siteindex.php

6. Look for the section below and make the appropriate edits in siteindex.php, then tweak the last two parameters of the makeURLrec() statements as you see fit.

#####################USER CONFIGURATION###############################

#FILL IN LOOSE PAGES THAT YOU WANT INDEXED
$page = array();
#$page[0] = "http://www.example.com/page1.html";
#$page[1] = "http://www.example.com/page2.html";

#PATH TO YOUR ELGG INSTALL BASE
$path = "<path of your Elgg Install>";

# Do you want compressed sitemap output? (0 = no, 1 = yes)
# Typically you want to say yes here. Say no when debugging.
# Though it will work in Firefox, there is an issue with the way that
# Google (and Lynx for that matter) works with my Apache mod_rewrite.
# Submitting the .gz version of the sitemap seems to fix it.

$gzip = 1;

# You can tweak the last two parameters of the makeURLrec() function for
# better SEO (or use my defaults). The second to last parameter is
# how often the content changes.  It can be one of the following:
#
#    * always
#    * hourly
#    * daily
#    * weekly
#    * monthly
#    * yearly
#    * never
#
#  The last parameter is the priority which can be a value between
#  0.0 and 1.0
#

###################END OF USER CONFIGURATION###########################


7. Submit your new site index to the search engines using:

 http://<your-site-here>/siteindex.xml.gz
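
Before submitting, it can help to check what the rewrite pattern from step 3 captures for a given file name. The sketch below uses Python's re module as a stand-in for Apache's regex engine, which is an approximation. The helper name captures is hypothetical, and the second, end-anchored pattern is an illustrative variant added for comparison, not something these release notes prescribe.

```python
import re

# Pattern exactly as given in step 3 of the installation.
ORIGINAL = r"^site([A-Za-z]+).xml(|.gz)"
# Assumed variant: literal dots escaped and the end of the path anchored.
ANCHORED = r"^site([A-Za-z]+)\.xml(|\.gz)$"


def captures(pattern, path):
    """Return the (m, g) values the rule would pass on, or None if no match."""
    match = re.match(pattern, path)
    return match.groups() if match else None


for path in ("siteindex.xml.gz", "siteprofiles.xml", "robots.txt"):
    print(path, captures(ORIGINAL, path), captures(ANCHORED, path))
```

With ordered alternation, the empty branch of (|.gz) is tried first, so without a trailing $ the second capture comes back empty even for a .xml.gz request. Whether that matters depends on how siteindex.php uses the g parameter, which these notes do not say.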

 

 

  • I have checked my php.ini file and have found:

    zlib.output_compression = Off

    Is this a problem?

    Visiting http://www.smokist.com/siteindex/siteindex.php gets a blank page.

    siteindex.php is set to CHMOD 755

    I am keen to make this work. Thank you for your very helpful replies.

    Phil

  • Ok, the blank page is a good thing.  It means that you can see the script and you have not passed any variables to it.

    The good news is that the script works.

    Try: http://www.smokist.com/siteindex/siteindex.php?m=profiles

    This simulates what the Rewrite Rule would do.  Unfortunately, we can't rightly give this to Google as they want a .xml or .xml.gz file, which is why we do the rewrite.

    So the question is: is the rewrite rule pointing to the correct place?

    I am not sure if the zlib thing is the problem.  My script only requires that you can use gzencode() as a function.  You might want to track that down too.

     

     

     

  • Looks good..got this:

    <?xml version="1.0" encoding="UTF-8" ?>

    - <url>
      <lastmod>2010-01-08</lastmod>
      <changefreq>daily</changefreq>
      <priority>0.8</priority>
      </url>
    and lots more of the same with different urls.
    Have been searching for mention of gzencode(), not found yet. Will keep looking.
    I wonder if I should try and get access to Apache error log?
    Thanks again.
  • Well at least the script works.  If you got there using the redirect, twice as good.

    gzencode is a php function.   You can go to the php website and read up on how it works and write a script to try to gz something.  You might even be able to use their example.  All you need is to see if it is working.

     

     

     

  • Thanks, I'll give it a whirl. I am very grateful for your helpful advice and will defo donate. Cheers. Phil.

  • Most excellent! Have followed php tutorial and used an example of gzencode() function. Have successfully proven that gzencode works on my hosting server! Successfully compressed and expanded a file! Now I'll work on the RewriteRule.

  • Now I think I've got a pretty good understanding of the RewriteRule syntax and can't for the life of me see why the server is complaining. It's just unhappy with the modified .htaccess file and there is no reason why this should happen. Asked hosting support and they don't help with scripting! Very helpful! They do have CGI error logs I can see but there are no logs and they say this means there are no errors! I'll badger them some more.

  • It's one of those things, either Rewrite rules work, or they don't.  Elgg doesn't work unless they do.  So the only thing I would suggest checking is the second part of it where things are pointing to places where things exist.

  • I don't know how this helps but I've narrowed down the cause of the error. If I delete the part (|.gz) the error does not occur. Does that mean that the script would work but would not produce gz compressed files? I am currently searching madly for mention of the | character in Rewrite Rule syntax, but can't see it mentioned yet apart from in the arguments at the end of the statement.

  • Back again! It seems my problem is with the | (pipe) character. So I've got round it with 2 Rewriterules, which works well:

    RewriteRule ^site([A-Za-z]+)\.xml siteindex/siteindex.php?m=$1&g=$2
    RewriteRule ^site([A-Za-z]+)\.xml\.gz siteindex/siteindex.php?m=$1&g=$2

    I asked about the problem on Webmaster World, the reply is at http://www.webmasterworld.com/apache/4059757.htm . If you don't want to go through the hassle of registering, here is the reply:

    If you copied that code (or copied some code for use as a "template") from this or any other forum, check to be sure that the pipe character (vertical bar) is solid, and not broken. If it is, re-type that character from your keyboard while editing the file using a plain-text editor such as Notepad.

    The pipe character must be an ASCII or UTF-8 value of %7c, and not be some special UTF-16 or "high character-code" entity.

    Note that it's possible that you should be using the pattern

    RewriteRule ^site([a-z]+)\.(xml|gz)$ siteindex/siteindex.php?m=$1&g=$2 [NC,L]

    if your intent was to rewrite URL-paths starting with one or more letters and ending with ".xml" or ".gz".

    Note the use of [NC] to make the character-comparison case-insensitive, simplifying the pattern (and doubling the code speed), and the required escaping of the literal period (it is otherwise interpreted as a regex token matching *any* single character).

    Note also the addition of the [L] flag for improved efficiency. Use of [L] on every rule is recommended, unless you have a specific reason not to use it in mind.

    I've added the escape / in front of the .'s as suggested. The answerer seems to want to put the $ at the end of the pattern but I guess your code doesn't require it. What do you think about the suggestions of changing the pattern to (a-z) only and the NC and L parameters?

    Also, can I just check that it is only the one url http://<your-site-here>/siteindex.xml.gz I need to tell search engines?

    Just about to donate. Thanks for your help and excellent mod!

  • Sorry, meant \ not /.

  • Just tested the a-z only pattern and the [NC,L] arguments and it's much faster.

  • Writing to confirm the error Phil received. (I received the same 500 error.) The pipe is definitely the culprit.

    I replaced:

    RewriteRule ^site([A-Za-z]+).xml(|.gz) SiteIndex_v1.0/siteindex.php?m=$1&g=$2

    With:
    RewriteRule ^site([a-z]+)\.(xml|gz)$ siteindex/siteindex.php?m=$1&g=$2 [NC,L]

    And now everything works great. Thanks for posting the solution Phil! @srosenberg, do you think you could update the instructions to reflect this fix?

  • Does this work with Elgg 1.7?

  • Hi, srosenberg

    It's really a great script. Do you have a plan to let your script support the thewire plugin and the izap_videos plugin?

  • Has anyone gotten this to work in 1.7?  If so, how did you do it?

  • I installed this as per the instructions and have gone through several iterations of .htaccess mods.  When I test it I get a blank page (even calling it with ?m=profiles).  I checked my apache error log and noticed an error saying m was undefined.  Without the ?m=profiles there isn't anything showing up in the error log.

  • Mmmm... I have the same issue as @MKos.  This is such a nice tool, it would be great if someone could check the script against the needs of 1.7 onwards.

  • I see it's old and complicated to install. Is there any alternative for that?

  • nobody needs to tell google about his site?

  • I know that this plugin is inactive now. I am using a third-party siteindex generator. But I feel that an ELGG specific siteindex generator will be good which respects the user's privacy.

     

  • Does this work on ELGG 1.8? What other solution are people using to generate sitemaps for their elgg sites?

  • Could someone please update this highly demanded plugin to work with Elgg 1.8?

  • new version possible ?

Stats

  • Category: Uncategorized
  • License: GNU General Public License (GPL) version 2
  • Updated: 2014-11-17
  • Downloads: 1928
  • Recommendations: 1