Planphoria Elgg SiteIndex Generator v1.0

Release Notes

FEATURES

This is an Elgg site index generator that builds its site index links dynamically, at the moment a
search engine bot visits, so the site index is always up to date.

SiteIndex pulls data directly from the Elgg database (it doesn't spider Elgg), and because of this, it's really fast: there is no HTTP crawling or transfer overhead. It only indexes links
that are publicly accessible on the Internet (you don't have to log in or be part of a group to see them). If a user wants to get their content into the search engine stream, they only have to make it public when creating it; otherwise that content will not be indexed.

SiteIndex uses gzip (.gz) compression when sending XML files to the search engines. You can turn this feature off, but it is recommended that you leave it on, since it minimizes the file size and speeds up delivery to the search engine. Your version of PHP must be able to run the gzencode() function for this to work.

To support the file sizes that a large social network can be expected to produce, SiteIndex creates a master list (named siteindex.xml.gz) of files, each of which contains index links that follow the sitemaps.org protocol.

Indexed link files include:

    sitemain.xml.gz = Root URL, basic public areas and User Identified Loose Pages

    siteprofiles.xml.gz = Public facing User Profile elements and associated RSS

    sitepages.xml.gz = Public facing Elgg pages

    sitegroups.xml.gz = Public facing Elgg Groups

    siteblogs.xml.gz = Public facing Elgg Blogs

    siteevents.xml.gz = Public facing Elgg Events

    sitefiles.xml.gz = Public facing Elgg Files

    sitebookmarks.xml.gz = Public facing Elgg Bookmarks

    sitecategories.xml.gz = The list of possibly searchable Tags and Categories

    siteblogarchive.xml.gz = Public facing personal blogs, linked to by month

    sitegrblogarchive.xml.gz = Public facing group blogs, linked to by month

You can view the XML content within Firefox (to see what you are sending) by requesting one of the above files from your Elgg base URL without the .gz extension, for example sitemain.xml.

If there are no public facing records to create any one of the files above, SiteIndex will not include that file within the master index.
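
As a rough illustration of the master index described above, the sketch below builds a sitemaps.org sitemap index in PHP and leaves out any file that has no public records. It is not SiteIndex's actual code: the buildSitemapIndex() function, the example.com base URL and the record counts are all assumptions made for the example.

```php
<?php
// Hypothetical helper, not part of SiteIndex: builds a sitemaps.org
// <sitemapindex> document from a map of file name => public record count.
function buildSitemapIndex(string $baseUrl, array $recordCounts): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($recordCounts as $file => $count) {
        if ($count < 1) {
            continue; // no public records: leave the file out of the master index
        }
        $loc  = htmlspecialchars(rtrim($baseUrl, '/') . '/' . $file, ENT_QUOTES | ENT_XML1, 'UTF-8');
        $xml .= "  <sitemap><loc>{$loc}</loc></sitemap>\n";
    }
    return $xml . "</sitemapindex>\n";
}

// The counts below are made up for illustration.
$index = buildSitemapIndex('http://www.example.com', [
    'sitemain.xml.gz'   => 12,
    'siteblogs.xml.gz'  => 0,
    'sitegroups.xml.gz' => 3,
]);
echo $index;
```

In this example siteblogs.xml.gz has no records, so only the other two files are listed.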

RESULTS

On my social network that is just starting out with 10 users who had not yet begun to add content, I was able to identify 340 public facing links.  Imagine how things will look as I scale in users and in content!

DONATE
Ok, here is my plea!

I work for myself.  It took a significant amount of time to analyze the Elgg database schema and write this to produce the desired results.  Based on what I am seeing as I use it, SiteIndex WILL improve your Elgg SEO, drive your user base, and drive the content that in return WILL increase your SEO and make your site successful.

If you enjoy this sort of thing, please consider making a donation to the cause with the donate button, to help improve the code and keep it maintained as Elgg shapes, distorts and changes beneath it.

Many thanks!

PREREQUISITES


Be able to run the PHP function gzencode().
Be able to use mod_rewrite. (You should be able to anyway if you are using Elgg).
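
If you are not sure whether your PHP build has gzencode(), a small check along these lines may help. It is only a sketch: it compresses a short sample string and decompresses it again with gzdecode(), which needs PHP 5.4 or later. The sample XML is a placeholder, not SiteIndex output.

```php
<?php
// Quick prerequisite check; run it from the command line or through the web server.
if (!function_exists('gzencode')) {
    exit("gzencode() is not available: PHP's zlib extension is missing.\n");
}

$sample     = '<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>';
$compressed = gzencode($sample, 9);   // gzip-format output, maximum compression
$restored   = gzdecode($compressed);  // gzdecode() requires PHP 5.4 or later

echo ($restored === $sample) ? "gzencode() works\n" : "round trip failed\n";
```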

INSTALLATION

The installation on this could not be easier.

1. Download the SiteIndex_1.0.zip file
2. Unzip the package in the base directory of Elgg

3. Add the following line to the end of your Rewrite Rules in .htaccess. If you don't, SiteIndex won't work.

#Rewrite Rules for Planphoria SiteIndex
RewriteRule ^site([A-Za-z]+).xml(|.gz) SiteIndex_v1.0/siteindex.php?m=$1&g=$2

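4. From the base directory, cd into the SiteIndex directory you just unzipped (SiteIndex_v1.0 in the rewrite rule above). The directory name must match the one used in the rewrite rule.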

5. Edit siteindex.php

6. Look for the section below in siteindex.php and make the appropriate edits. Tweak the last two parameters of the makeURLrec() statements as you see fit.

#####################USER CONFIGURATION###############################

#FILL IN LOOSE PAGES THAT YOU WANT INDEXED
$page = array();
#$page[0] = "http://www.example.com/page1.html";
#$page[1] = "http://www.example.com/page2.html";

#PATH TO YOUR ELGG INSTALL BASE
#(a filesystem path such as /home/username/public_html/elgg, not a URL)
$path = "<path of your Elgg Install>";

# Do you want compressed sitemap output? (0 = no, 1 = yes)
# Typically you want to say yes here. Say no when debugging.
# Though it will work in Firefox, there is an issue with the way that
# Google (and Lynx for that matter) works with my Apache mod_rewrite,
# going .gz when submitting the sitemap seems to fix it.

$gzip = 1;

# You can tweak the last two parameters of the makeURLrec() function for
# better SEO (or use my defaults). The second-to-last parameter is how
# frequently the content changes.  It can be one of the following:
#
#    * always
#    * hourly
#    * daily
#    * weekly
#    * monthly
#    * yearly
#    * never
#
#  The last parameter is the priority which can be a value between
#  0.0 and 1.0
#

###################END OF USER CONFIGURATION###########################


7. Submit your new site index to the search engines using:

 http://<your-site-here>/siteindex.xml.gz
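
For step 6, the last two makeURLrec() parameters correspond to the changefreq and priority fields of a sitemaps.org URL entry. The sketch below shows what such an entry looks like and how the two values might be validated. exampleUrlEntry() is a hypothetical stand-in written for this illustration; the real makeURLrec() signature and output may differ.

```php
<?php
// Hypothetical stand-in for illustration; the real makeURLrec() may differ.
function exampleUrlEntry(string $loc, string $changefreq, float $priority): string
{
    $allowed = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'];
    if (!in_array($changefreq, $allowed, true)) {
        throw new InvalidArgumentException("Unknown changefreq: $changefreq");
    }
    if ($priority < 0.0 || $priority > 1.0) {
        throw new InvalidArgumentException('Priority must be between 0.0 and 1.0');
    }
    $loc = htmlspecialchars($loc, ENT_QUOTES | ENT_XML1, 'UTF-8');
    return "<url><loc>$loc</loc>"
         . "<changefreq>$changefreq</changefreq>"
         . sprintf('<priority>%.1F</priority>', $priority) // %F ignores the locale
         . "</url>\n";
}

// A loose page that changes about once a week, with middle-of-the-road priority.
echo exampleUrlEntry('http://www.example.com/page1.html', 'weekly', 0.5);
```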

 

 

  • srosenber..

    just wanted to let you know that i have joined the party... thanks for all your help, you put in a lot of time helping me

    now that i am in, i have a few more questions.

    i am using the sitemap plugin, is there any way for your system to index that and spit it out in xml format, so that i can submit it to google sitemap and get it indexed

    Thanks

  • Please give me an easy tutorial on how to install this. I put the siteindex_v1.0 folder in the root and added the .htaccess code, and when I visit the URL I get a 404 error.

    I don't understand the part where you say to edit something. What do I write in

    $path = "<path of your Elgg Install>";

    home/username/public_html, or http://sitename.com?

    Please explain.
  • The instructions are not hard to follow, if you know how to navigate a Linux directory structure, and they do require that kind of minimal knowledge.  They are fairly straightforward and where they were not, it was corrected in the blog, or by the comments.

    SiteIndex goes in your base Elgg folder, the place where you installed Elgg.  Based on where you installed it, it may look something like:

    /home/username/public_html/elgg

    In the .htaccess file within this directory, which is required for Elgg to work, pop in the rewrite rule using the instructions provided.  (Make sure that the name of the SiteIndex directory matches the one in the rewrite rule!)

    $path... Your path is /home/username/public_html/elgg, or something to that effect.

    It can't get any simpler.
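
A quick way to sanity-check a $path value is sketched below. It assumes that the Elgg base directory contains an engine/ folder, which is where the stock .htaccess rules point. looksLikeElggBase() is a hypothetical helper, not part of SiteIndex.

```php
<?php
// Hypothetical check, not part of SiteIndex: $path should be a directory on
// disk (not a URL) that contains Elgg's engine/ folder.
function looksLikeElggBase(string $path): bool
{
    if (preg_match('#^https?://#i', $path)) {
        return false; // a URL was given instead of a filesystem path
    }
    return is_dir(rtrim($path, '/') . '/engine');
}

$path = '/home/username/public_html/elgg'; // example value from the reply above
echo looksLikeElggBase($path) ? "path looks right\n" : "check \$path\n";
```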

     

  • Thank you, now it's done.

    The problem was the $path string. OK, fixed. Thank you for such a great plugin.

  • hi srosenberg:

    I joined the party also..  i added the rewrite rule and changed the path to

    /home/uname/public_index/site/

     

    how long should i wait until your code starts writing to the xml.gz file?

    because at google webmaster, i get a "not found" error..

    thanks in advance

     

  • if i access www.mysite.com/siteindex.xml.gz, i get this

     

    You don't have permission to access /SiteIndex_v1.0/siteindex.php on this server.

  • It looks like a permissions problem to me.  Make sure that the siteindex.php is executable. chmod 755 siteindex.php should do it.

  • hi rosenberg!

     

    nope, it's all ok.. i tried 755ing the siteindex.php, doesn't work..

  • i deleted some lines from .htaccess, ran the link, added again.. works..

     

    thanks :)

  • great script and thanks for the script but

    huh i did it all the ways but no use. please write installation steps clearly.

    my elgg script installed on the root folder ex : under public_html/

    and i installed the script under /public_html/siteindex/

    under the .htaccess i added the following code

    #Rewrite Rules for Planphoria SiteIndex
    RewriteRule ^site([A-Za-z]+).xml(|.gz) siteindex/siteindex.php?m=$1&g=$2

    and under siteindex.php
    i added the root path : example : "/home/username/public_html"

    i tried to execute the script with http://mysitename.com/siteindex/siteindex.php

    nothing happened! no effect! just a blank page. the .gz file was not created in my siteindex folder.

    please help how to install the script.

    jagadish.p
  • by default my siteindex.php file permission is 644, and all the scripts on my server are set to 644 only. i tried changing the permission to 755 and i got the following error

    Internal Server Error

    The server encountered an internal error or misconfiguration and was unable to complete your request.

    Please contact the server administrator, webmaster@netkushi.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.

    More information about this error may be available in the server error log.

    Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

     

    and i tried to change the siteindex directory permission to 777 also but no use. please help me to install the script.

     

    thank you,

    jagadish.p

  • So your path is /home/username/public_html ?  not /home/username/public_html/elgg   or something to that effect?  chmod 755 should work.

     

  • Sweeeeet! Works like a charm.

  • My .htaccess file doesn't like the RewriteRule. It returns a 500 internal server error. I'm with Fatcow. Any suggestions?

  • That doesn't sound right considering that Elgg requires rewrite rules to work.. if Elgg works, this should work.  something is up with your configuration.

  • I don't have a clue how to troubleshoot it. My elgg installation seems to work very reliably.

  • Well the fact that you are on Fatcow doesn't help me.  Knowing the path to where you have placed the SiteIndex directory, and seeing the tail part of your .htaccess file may help me.

    Did you set the path appropriately per the instructions?

  • This is the path to my Elgg install and SiteIndex directory:

    $path = "/hermes/web05b/b683/moo.gasdoc/";

    It is the same as appears in "Site Administration" section of elgg.

    The bottom part of my .htaccess file is:

    <IfModule mod_rewrite.c>

    RewriteEngine on

    # If Elgg is in a subdirectory on your site, you might need to add a RewriteBase line
    # containing the path from your site root to elgg's root. e.g. If your site is
    # http://example.com/ and Elgg is in http://example.com/sites/elgg/, you might need
    #
    #RewriteBase /sites/elgg/
    #
    # here, only without the # in front.
    #
    # If you're not running Elgg in a subdirectory on your site, but still getting lots
    # of 404 errors beyond the front page, you could instead try:
    #
    #RewriteBase /

    RewriteRule ^action\/([A-Za-z0-9\_\-\/]+)$ engine/handlers/action_handler.php?action=$1

    RewriteRule ^export\/([A-Za-z]+)\/([0-9]+)$ services/export/handler.php?view=$1&guid=$2
    RewriteRule ^export\/([A-Za-z]+)\/([0-9]+)\/$ services/export/handler.php?view=$1&guid=$2
    RewriteRule ^export\/([A-Za-z]+)\/([0-9]+)\/([A-Za-z]+)\/([A-Za-z0-9\_]+)\/$ services/export/handler.php?view=$1&guid=$2&type=$3&idname=$4

    RewriteRule ^\_css\/css\.css$ _css/css.php

    RewriteRule ^pg\/([A-Za-z0-9\_\-]+)\/(.*)$ engine/handlers/pagehandler.php?handler=$1&page=$2
    RewriteRule ^pg\/([A-Za-z0-9\_\-]+)$ engine/handlers/pagehandler.php?handler=$1

    RewriteRule xml-rpc.php engine/handlers/xml-rpc_handler.php
    RewriteRule mt/mt-xmlrpc.cgi engine/handlers/xml-rpc_handler.php

    RewriteRule ^tag/(.+)/?$ engine/handlers/pagehandler.php?handler=search&page=$1
    ###########Rewrite Rules for Planphoria SiteIndex#######################
    RewriteRule ^site([A-Za-z]+).xml(|.gz) siteIndex/siteindex.php?m=$1&g=$2
    </IfModule>

  • Hi, I have finished the installation process, however I receive a blank screen when I go to http://path to host/siteblog.xml in Firefox.  I have verified that "Path" is correct (step 6) as it contains the folder: Engine.

    I have tested the rewrite rule using  a simple test page e.g. siteblog.xml does forward to the test page.  However, when I configure htaccess to forward to siteindex_v1.0/siteindex.php, a blank page is displayed.

    Please advise.  Thank you for your time.

  • I have resolved the problem described in my post above.

  • Phil, what is the exact path of where you have SiteIndex installed?

  • The document root of my site is:

    /home/users/web/b683/moo.gasdoc/

    The elgg install is in that directory.

    The path to the SiteIndex directory is:

    /home/users/web/b683/moo.gasdoc/siteindex/

    I tried both root paths in siteindex.php and get the same error, which appears to relate to the .htaccess edit. If I undo the edit, Elgg works again.

    The elgg installer for some reason discovered the document root as:

    /hermes/web05b/b683/moo.gasdoc/

    and it works using that setting in site administration.

  • The issue is a capital I in siteIndex in the rewrite rule.  Change the capital I to a lower case one and you should be in business.

  • Unfortunately, I'd already noticed that, and no change.

  • Ok.. unfortunately I am not seeing everything you are seeing (or are able to see), so let's talk about what the rewrite rule actually does, and maybe that will help.

    Basically, the objective of the rewrite rule is to take a request from Google, Yahoo, Bing, etc. that comes to your server in the form of http://www.yourdomain.com/siteindex.xml or http://www.yourdomain.com/siteindex.xml.gz and rewrite it to go to the siteindex.php script.

    If we look at the rewrite rule:

    RewriteRule ^site([A-Za-z]+).xml(|.gz) siteindex/siteindex.php?m=$1&g=$2

    The first part of it:

    ^site([A-Za-z]+).xml(|.gz)

    It basically says to rewrite any request that begins with "site". It captures the string
    of letters between "site" and .xml and assigns that to variable $1, and the optional .gz
    ending (the | character allows an empty alternative) is meant to be assigned to variable $2.

    So based on the way that siteindex.php is written, this rewrite rule can accept any of the following (with or without the gz extension)
    in addition to siteindex.xml:

        sitemain.xml.gz = Root URL, basic public areas and User Identified Loose Pages
        siteprofiles.xml.gz = Public facing User Profile elements and associated RSS
        sitepages.xml.gz = Public facing Elgg pages
        sitegroups.xml.gz = Public facing Elgg Groups
        siteblogs.xml.gz = Public facing Elgg Blogs
        siteevents.xml.gz = Public facing Elgg Events
        sitefiles.xml.gz = Public facing Elgg Files
        sitebookmarks.xml.gz = Public facing Elgg Bookmarks
        sitecategories.xml.gz = The list of possibly searchable Tags and Categories
        siteblogarchive.xml.gz = Public facing personal blogs, linked to by month
        sitegrblogarchive.xml.gz = Public facing group blogs, linked to by month

    The second part:

    siteindex/siteindex.php?m=$1&g=$2

    This is the location of siteindex.php from the elgg base directory, which should be executable by the web server (chmod 755 if you are not sure).  The stuff after the question mark are parameters for the siteindex.php script that are pulled from the access request (variables $1 and $2 mentioned earlier).

    If you are experiencing a 500 error, it may be because something is set up so that the web server cannot get to the directory where siteindex.php resides.

    You should be able to go to http://www.yourdomain.com/siteindex/siteindex.php and touch the script.  If this does not work, the rewrite isn't going to work either, and that is something that you need to figure out.
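
To see what the rule's pattern captures without touching .htaccess, the same expression can be run through PHP's preg_match(), as in the sketch below. captureRequest() and the "anchored" variant are illustrative assumptions, not part of SiteIndex. One thing the sketch makes visible: in PHP's PCRE engine the empty alternative in (|.gz) is tried first, so with the unanchored pattern the second capture comes back empty even for a .gz request, while adding $ at the end forces the .gz to be captured. Apache 2.x's mod_rewrite is also built on PCRE, so it likely behaves the same way, though that is worth confirming on your own server.

```php
<?php
// The pattern from the RewriteRule, plus an anchored variant added for comparison.
$patterns = [
    'original' => '#^site([A-Za-z]+).xml(|.gz)#',
    'anchored' => '#^site([A-Za-z]+)\.xml(|\.gz)$#',
];

// Returns [m, g] as the rule would pass them to siteindex.php, or null if no match.
function captureRequest(string $pattern, string $request): ?array
{
    return preg_match($pattern, $request, $m) ? [$m[1], $m[2]] : null;
}

foreach ($patterns as $name => $pattern) {
    foreach (['siteindex.xml', 'siteblogs.xml.gz', 'pg/blog'] as $request) {
        $result = captureRequest($pattern, $request);
        echo $name, ': ', $request, ' -> ',
            $result === null ? 'no match' : "m={$result[0]} g={$result[1]}", "\n";
    }
}
```

Requests such as pg/blog do not start with "site", so they fall through to Elgg's own rules.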

Stats

  • Category: Uncategorized
  • License: GNU General Public License (GPL) version 2
  • Updated: 2014-11-17
  • Downloads: 1927
  • Recommendations: 1