Planphoria Elgg SiteIndex Generator v1.0

Release Notes

FEATURES

SiteIndex is an Elgg site index generator that builds its links dynamically at the moment a search engine bot
visits, so the site index it serves is always up to date.

SiteIndex pulls data directly from the Elgg database (it doesn't spider Elgg), and because of this it is very fast (no HTTP spidering and transfer mucky muck). It only indexes links
that are publicly accessible on the Internet (you don't have to log in or be part of a group to see them). If users want their content to reach the search engines, they only have to make it public when creating it; otherwise that content will not be indexed.
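The "public only" rule can be illustrated with a small PHP sketch. It assumes the Elgg 1.x convention that an entity whose access_id is 2 (public) can be seen by anyone, while 0 (private), 1 (logged-in users) and group access IDs are hidden from anonymous visitors. The function name, constant name and row layout are hypothetical; this is not SiteIndex's actual query.

```php
<?php
// Hypothetical sketch, not SiteIndex's real code.
// Assumed Elgg 1.x access levels: 0 = private, 1 = logged-in users, 2 = public.
// Group-only content carries a group access ID, so it is excluded as well.
const ACCESS_PUBLIC_ID = 2;

// Keep only rows that anyone on the Internet can see.
// $rows mimics the result of a SELECT on Elgg's entities table, e.g. (assumed):
//   SELECT guid, access_id FROM elgg_entities WHERE enabled = 'yes'
function public_only(array $rows): array
{
    return array_values(array_filter(
        $rows,
        function (array $row): bool {
            return (int) $row['access_id'] === ACCESS_PUBLIC_ID;
        }
    ));
}

$rows = [
    ['guid' => 10, 'access_id' => 2],  // public: indexed
    ['guid' => 11, 'access_id' => 1],  // logged-in users only: skipped
    ['guid' => 12, 'access_id' => 0],  // private: skipped
];

$indexed = public_only($rows);
```

In practice the filter would more likely sit in the SQL WHERE clause than in PHP; the array version just makes the rule easy to see.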

SiteIndex uses .gz compression when sending XML files to the search engines. You can turn this feature off, but it is recommended that you leave it on, because it speeds up delivery to the search engine and minimizes the file size. Your PHP installation must be able to run the gzencode() function (provided by the zlib extension) for this to work.
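A minimal PHP sketch of the compression step, assuming the output is built as one XML string and that a $gzip switch like the one in the configuration decides whether to compress it. The function name is hypothetical; SiteIndex's own code may differ.

```php
<?php
// Minimal sketch (not SiteIndex's actual code): gzip a sitemap XML string
// with gzencode() when compression is enabled, otherwise return it unchanged.
// $gzip mirrors the 0/1 switch in the SiteIndex user configuration.
function encode_sitemap(string $xml, int $gzip): string
{
    if ($gzip === 1 && function_exists('gzencode')) {
        $gz = gzencode($xml, 9); // level 9 = maximum compression
        if ($gz !== false) {
            return $gz;
        }
    }
    return $xml; // plain XML, handy when debugging
}

$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>';

$body = encode_sitemap($xml, 1);

// In a web request you would send a header before echoing the body, e.g.:
// header('Content-Type: application/x-gzip');
// echo $body;
```

The function_exists() guard is one way to fall back to plain XML on hosts without zlib instead of failing outright.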

To handle the file sizes a large social network can be expected to produce, SiteIndex creates a master index (named siteindex.xml.gz) that lists separate files of index links, following the sitemaps.org protocol.

Indexed link files include:

    sitemain.xml.gz = Root URL, basic public areas and user-identified loose pages

    siteprofiles.xml.gz = Public-facing user profile elements and associated RSS

    sitepages.xml.gz = Public-facing Elgg pages

    sitegroups.xml.gz = Public-facing Elgg groups

    siteblogs.xml.gz = Public-facing Elgg blogs

    siteevents.xml.gz = Public-facing Elgg events

    sitefiles.xml.gz = Public-facing Elgg files

    sitebookmarks.xml.gz = Public-facing Elgg bookmarks

    sitecategories.xml.gz = The list of potentially searchable tags and categories

    siteblogarchive.xml.gz = Public-facing personal blogs, linked by month

    sitegrblogarchive.xml.gz = Public-facing group blogs, linked by month

To see exactly what you are sending, you can view the XML content in Firefox by requesting any of the files above from your Elgg base URL without the .gz extension (for example, sitemain.xml instead of sitemain.xml.gz).
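Dropping the .gz works because of the optional .gz group in the RewriteRule from the installation steps: the part of the file name after "site" becomes the m parameter and the optional .gz suffix becomes g. The PHP sketch below reproduces that split with preg_match(), using the rule's pattern written with escaped dots and an end anchor. The function name is hypothetical, and how siteindex.php actually uses m and g is an assumption.

```php
<?php
// Hypothetical illustration of how the SiteIndex RewriteRule pattern splits a
// requested file name into the m and g query parameters for siteindex.php.
function parse_sitemap_request(string $file): ?array
{
    if (!preg_match('/^site([A-Za-z]+)\.xml(\.gz)?$/', $file, $m)) {
        return null; // not a SiteIndex file; Elgg handles it as usual
    }
    return [
        'm' => $m[1],        // section name, e.g. "main", "blogs", "index"
        'g' => $m[2] ?? '',  // ".gz" for compressed output, "" for plain XML
    ];
}

$plain      = parse_sitemap_request('sitemain.xml');
$compressed = parse_sitemap_request('sitemain.xml.gz');
$master     = parse_sitemap_request('siteindex.xml.gz');
```

The `?? ''` is needed because preg_match() leaves a trailing unmatched group out of the match array entirely.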

If there are no public-facing records for one of the files above, SiteIndex leaves that file out of the master index.
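The master index and the "skip empty files" rule can be sketched as a sitemaps.org sitemap index that lists only the child files that actually have records. This is a minimal PHP sketch under stated assumptions: the base URL and per-file record counts are hypothetical example values, and SiteIndex's real generator may build its output differently.

```php
<?php
// Minimal sketch, not SiteIndex's actual code: build a sitemaps.org
// <sitemapindex> that lists only the child sitemap files with records.
// $counts maps each child file name to its (hypothetical) record count.
function build_sitemap_index(string $base, array $counts): string
{
    $out  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $out .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($counts as $file => $count) {
        if ($count === 0) {
            continue; // no public records: leave this file out of the index
        }
        $loc = htmlspecialchars($base . $file, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $out .= "  <sitemap><loc>{$loc}</loc></sitemap>\n";
    }
    return $out . "</sitemapindex>\n";
}

$index = build_sitemap_index('http://www.example.com/', [
    'sitemain.xml.gz'   => 12,
    'siteblogs.xml.gz'  => 5,
    'siteevents.xml.gz' => 0,   // e.g. event_calendar not installed
]);
```

Escaping each <loc> with htmlspecialchars() keeps URLs containing & or quotes valid XML, which the sitemaps.org protocol requires.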

RESULTS

On my social network, which is just starting out with 10 users who had not yet begun adding content, I was able to identify 340 public-facing links. Imagine how that will look as users and content scale up!

DONATE
Ok, here is my plea!

I work for myself. It took a significant amount of time to write this and to analyze the Elgg database schema to produce the desired results. Based on what I am seeing as I use it, SiteIndex WILL improve your Elgg SEO and drive your user base and your content, which in turn WILL increase your SEO and make your site successful.

If you enjoy this sort of thing, please consider making a donation to the cause to help improve the code and keep it maintained as Elgg shapes, distorts and changes beneath it.


Many thanks!

PREREQUISITES


PHP must be able to run the gzencode() function (zlib extension).
Apache must be able to use mod_rewrite (it should already, if you are running Elgg).

INSTALLATION

Installation could not be easier.

1. Download the SiteIndex_1.0.zip file
2. Unzip the package into the Elgg base directory.

3. Add the following lines to the end of the rewrite rules in your .htaccess file. SiteIndex won't work without them.

# Rewrite rules for Planphoria SiteIndex
RewriteRule ^site([A-Za-z]+)\.xml(\.gz)?$ SiteIndex_v1.0/siteindex.php?m=$1&g=$2 [L]

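4. From the Elgg base directory, change into the SiteIndex directory: cd SiteIndex_v1.0 (the package unzips as SiteIndex_v1.0; if you rename it to SiteIndex, change the path in the RewriteRule to match).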

5. Edit siteindex.php

6. Find the USER CONFIGURATION section shown below, fill in your settings, and tweak the last two parameters of the makeURLrec() calls as you see fit.

#####################USER CONFIGURATION###############################

#FILL IN LOOSE PAGES THAT YOU WANT INDEXED
$page = array();
#$page[0] = "http://www.example.com/page1.html";
#$page[1] = "http://www.example.com/page2.html";

#PATH TO YOUR ELGG INSTALL BASE
$path = "<path of your Elgg Install>";

# Do you want compressed sitemap output? (0 = no, 1 = yes)
# Typically you want to say yes here. Say no when debugging.
# Uncompressed output works in Firefox, but Google (and Lynx, for that
# matter) has trouble with it under my Apache mod_rewrite setup.
# Submitting the .gz sitemap seems to fix it.

$gzip = 1;

# You can tweak the last two parameters of the makeURLrec() function for
# better SEO (or use my defaults). The second-to-last parameter is how
# often the content is expected to change. It can be one of the following:
#
#    * always
#    * hourly
#    * daily
#    * weekly
#    * monthly
#    * yearly
#    * never
#
#  The last parameter is the priority, a value between 0.0 and 1.0
#  (the sitemaps.org default is 0.5).
#

###################END OF USER CONFIGURATION###########################


7. Submit your new site index to the search engines using:

 http://<your-site-here>/siteindex.xml.gz
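Step 6 refers to the change frequency and priority values that end each makeURLrec() call. Below is a minimal PHP sketch of what such a record builder might look like, validating both values against the sitemaps.org rules listed in the configuration comments. The function name make_url_record(), its parameter order and the lastmod handling are assumptions for illustration, not SiteIndex's actual makeURLrec().

```php
<?php
// Hypothetical sketch of a sitemaps.org <url> record builder in the spirit of
// makeURLrec(); the name and parameter order are assumptions.
const CHANGEFREQS = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'];

function make_url_record(string $loc, int $lastmodTs, string $changefreq, float $priority): string
{
    if (!in_array($changefreq, CHANGEFREQS, true)) {
        throw new InvalidArgumentException("invalid changefreq: $changefreq");
    }
    if ($priority < 0.0 || $priority > 1.0) {
        throw new InvalidArgumentException('priority must be between 0.0 and 1.0');
    }
    $loc = htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $lastmod = gmdate('Y-m-d', $lastmodTs); // W3C Datetime, date-only form
    // %F (not %f) keeps a "." decimal separator regardless of locale;
    // the priority is rounded to one decimal place.
    return "<url><loc>{$loc}</loc><lastmod>{$lastmod}</lastmod>"
         . "<changefreq>{$changefreq}</changefreq>"
         . sprintf('<priority>%.1F</priority></url>', $priority);
}

$rec = make_url_record('http://www.example.com/page1.html', 0, 'weekly', 0.6);
```

Rejecting unknown changefreq values up front is a design choice: a typo such as "weekley" then fails at generation time instead of being silently ignored by the search engine.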

 

 

  • The INSTALL.txt is slightly inaccurate: it lists the XML redirect files in the format <file>.tar.gz, when they should be <file>.xml.gz. I was typing on autopilot when I wrote the INSTALL. This posting is correct, and I will fix INSTALL.txt in the next revision.

  • Does anyone know if this works in 1.5?

  • Looks like a good little plugin. How do you check how many outgoing links you have? Do you know of any good web service that does that?

    Thanks

  • I could add some sort of reporting in a future revision that gives you stats as of the moment you run the report, but that isn't really going to help you much. Instead, I recommend that you submit siteindex.xml.gz to Google Webmaster Tools, which will give you a complete breakdown of all the links in the XML files within 5 minutes after you submit it. That is more in line with what you need, and as Googlebot picks up newer versions of the site index, you get the benefit of an updated report.

  • What nice work! And nice documentation!

  • OK folks, here's the news on SiteIndex and Elgg v1.5. From what I can tell from my very limited test sandbox, SiteIndex v1.0 IS compatible with Elgg v1.5. Along the way I figured out a few other things.

    From a database SELECT standpoint, it is 100% compatible. The worst that can happen here is that certain kinds of data won't be returned. For instance, the database won't return event_calendar data when the event_calendar mod is not installed. Since event_calendar uses the same tables as core Elgg, the query is still compatible, and when there is no data, no XML record should be created.

    From a hyperlink standpoint, as far as I can tell, SiteIndex creates hyperlinks that are consistent with Elgg's navigation and linking method. Again, some links I tried testing, such as event_calendar, didn't work, but since there was no data to feed those links, they wouldn't be created anyway.

    These are the only two places where an incompatibility could occur. I encourage people to go ahead and try it out on Elgg v1.5; I just ran it with success. The script only performs SELECT statements against the database, so it won't screw anything up if a minor incompatibility exists. The worst thing that could happen is that you submit a sitemap with a couple of broken links that the search engine can't resolve and reports back to you. If this happens (watch Google Webmaster Tools), please let me know so that I can fix it in a future version. Again, thanks for everyone's interest.


    Happy SEO!

  • I get a "Plugin Misconfigured" error.

    Also, there is no start.php file in your zip file...

    Is this the source of my problem?

  • Sandeep, SiteIndex goes into the Elgg base directory. It's not a plugin per se (not yet). You drop it into the base directory, fix the path in siteindex.php, and it's ready to rock and roll. It does not need a start.php because it runs independently of the Elgg system (except that it pulls the database access info from settings.php). It simply performs the right queries, builds proper URLs from that data, and generates the XML. Apart from the initial configuration and tweaking and the submission to the search engines, there are no other moving parts; it just does what it does without needing any further attention.

  • Yeah, it took me ages to figure that out :-) but now it works like a charm :-D

  • It's in the instructions... Did you skip step 2?  :)

  • Just started working with this and I get a 500 error. I put htaccess back the way it was and deleted the SiteIndex dir, and I still have the error. The site went down completely. Any ideas?

  • Well, first, it's .htaccess, not htaccess. Make sure you have that dot there.

    Second, for Elgg to work, .htaccess needs to be there with all the rewrites that Elgg requires at the end of it. This is not optional; Elgg works because of those rewrites.

    Third, for SiteIndex to work, it needs the line mentioned in the documentation above at the end of the Elgg rewrite list.

    Also, you might want to check your Apache config and make sure that you have the directive:

    AllowOverride All

    which enables use of .htaccess in the first place.

  • My (.)htaccess is fine, as is the Apache config. In any case, returning .htaccess to the original should have taken care of the issue, as should removing the SiteIndex dir. I cleared the server cache and browser cache, and still no joy. I uploaded .htaccess before SiteIndex, and the site was still running. Uploaded .htaccess and the site crashed.

  • Two possible issues, and please correct me if I am wrong. First, the package unzips as SiteIndex_v1.0. Should this be renamed SiteIndex or not? Second, would an Elgg install NOT in the root affect this, for example http://x x xxx.c0m/xxx/? When making the edit in siteindex.php, would the trailing slash be required or not?

  • Hmm. 

    You are correct on the SiteIndex_v1.0 unzip.  I have it as SiteIndex on my site, so that's my oversight. 

    You could change the SiteIndex_v1.0 directory to SiteIndex, or change the .htaccess Rewrite to SiteIndex_v1.0.

    I have not tested it with Elgg installed somewhere other than the root. Thank you for your courage here. For that setup, I would imagine that you have already set RewriteBase, so this becomes a rewrite rule under Elgg just like any other Elgg rule, and it should not be affected. Can you send me a message with your .htaccess Rewrite Rule section?

  • Thank you! That is often the problem with this sort of thing: getting the configurations to line up, and then, as you add plugins, those create content as well that may not be included in the site index. The good news is that we now at least have a template to work from, so it is possible to package them in as they are identified.

  • Srosenberg

    What happened to goodbucket's comment? Did it get deleted? I am still having a hard time installing this plugin.

  • It's not a plugin.  It has no start.php... it does not go into the mod directory.  It's a standalone application that taps the database. 

    Is it in the Elgg base directory?

  • Sorry, yes, it is not a plugin. It is in the Elgg base directory.

  • Does the SiteIndex folder have to be in the same directory as my data folder?

  • It's in the base directory, and it does not care where your data folder is.

  • My bad, the wrong post got deleted.

  • goofbucket, can you tell us again what you did? It looks like there are different rules for making it work with that kind of configuration.

  • Test site: Elgg installed in a subdirectory, http: //xxxxxxx. xxx/sub/. On the test site, the root does not contain the .htaccess. When installing this app, place it in the same folder as your Elgg installation, the one containing your .htaccess.

Stats

  • Category: Uncategorized
  • License: GNU General Public License (GPL) version 2
  • Updated: 2014-11-17
  • Downloads: 1928
  • Recommendations: 1