FEATURES
SiteIndex is an Elgg site index generator. It creates the site index links dynamically at the moment
a search engine bot visits, so the site index it serves is always up to date.
SiteIndex pulls data directly from the Elgg database (it doesn't spider Elgg), and because of this, it's really fast (none of the http:// spidering and transfer mucky muck). It only indexes links
that are publicly accessible on the Internet (you don't have to log in or be part of a group to see them). If a user wants to get their stuff into the search engine stream, they only have to make it public when creating it; otherwise that content will not be indexed.
SiteIndex uses gzip (.gz) compression when sending XML files to the search engines. You can turn this feature off, but it is recommended that you leave it on, as it speeds up the transfer to the search engine and minimizes the file size. You will need to be able to run the gzencode() function within your version of PHP for this to work.
To support the file sizes that a large social network can be expected to produce, SiteIndex creates a master index (named siteindex.xml.gz) that lists the individual index link files, following the sitemaps.org protocol.
Indexed link files include:
sitemain.xml.gz = Root URL, basic public areas and User Identified Loose Pages
siteprofiles.xml.gz = Public facing User Profile elements and associated RSS
sitepages.xml.gz = Public facing Elgg pages
sitegroups.xml.gz = Public facing Elgg Groups
siteblogs.xml.gz = Public facing Elgg Blogs
siteevents.xml.gz = Public facing Elgg Events
sitefiles.xml.gz = Public facing Elgg Files
sitebookmarks.xml.gz = Public facing Elgg Bookmarks
sitecategories.xml.gz = The list of possibly searchable Tags and Categories
siteblogarchive.xml.gz = Public facing personal blogs, linked to by month
sitegrblogarchive.xml.gz = Public facing group blogs, linked to by month
You can view the XML content within Firefox (to see what you are sending) by requesting one of the above files from the Elgg base directory without the .gz extension, for example siteprofiles.xml.
If there are no public facing records to create any one of the files above, SiteIndex will not include that file within the master index.
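As an illustration of the master index described above, here is a minimal PHP sketch of a sitemaps.org sitemap index that points at a few of the listed files. It is not taken from SiteIndex itself: the function name buildSitemapIndex(), the base URL http://www.example.com and the choice of three files are assumptions made for the example.

```php
<?php
// Sketch only, not SiteIndex's own code: build a sitemaps.org sitemap index.
// buildSitemapIndex(), the base URL and the file list are hypothetical.
function buildSitemapIndex(string $baseUrl, array $files): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($files as $file) {
        // Escape the URL so characters such as & stay valid inside XML.
        $loc = htmlspecialchars(rtrim($baseUrl, '/') . '/' . $file, ENT_QUOTES, 'UTF-8');
        $xml .= "  <sitemap><loc>{$loc}</loc></sitemap>\n";
    }
    $xml .= "</sitemapindex>\n";
    return $xml;
}

$index = buildSitemapIndex('http://www.example.com', [
    'sitemain.xml.gz',
    'siteprofiles.xml.gz',
    'siteblogs.xml.gz',
]);

echo $index;            // Uncompressed form, as you would view it in a browser.
$gz = gzencode($index); // Compressed form, as it would be sent for a .gz request.
```

A file with no public facing records would simply be left out of the array passed to the function, which matches the behaviour described above.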
RESULTS
On my social network that is just starting out with 10 users who had not yet begun to add content, I was able to identify 340 public facing links. Imagine how things will look as I scale in users and in content!
DONATE
Ok, here is my plea!
I work for myself. It took a significant amount of time to analyze the Elgg database schema and write this to produce the desired results. Based on what I am seeing as I use it, SiteIndex WILL improve your Elgg SEO and drive your user base and your content, which in return WILL increase your SEO and make your site successful.
If you enjoy this sort of thing, please consider making a donation to the cause by pressing the button below, to help improve the code and keep it maintained as Elgg shapes, distorts and changes beneath it.
Many thanks!
PREREQUISITES
Be able to run the PHP function gzencode().
Be able to use mod_rewrite. (You should be able to anyway if you are using Elgg).
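If you are unsure whether gzencode() is available on your host, a short standalone script can confirm it. The following is a minimal sketch, not part of SiteIndex; the sample XML string is made up, and gzdecode() requires PHP 5.4 or later.

```php
<?php
// Minimal check that gzencode() exists and round-trips data correctly.
// The sample XML string below is made up for this test.
if (!function_exists('gzencode')) {
    exit("gzencode() is not available; the zlib extension may be missing.\n");
}

$original   = '<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>';
$compressed = gzencode($original, 9); // 9 = maximum compression level

// Decompress again to prove the data survives the round trip.
$restored = gzdecode($compressed);

echo 'Original bytes:   ' . strlen($original) . "\n";
echo 'Compressed bytes: ' . strlen($compressed) . "\n";
echo ($restored === $original) ? "Round trip OK\n" : "Round trip FAILED\n";
```

Upload it to your server, request it in a browser or run it from the command line, and look for the round trip message.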
INSTALLATION
The installation on this could not be easier.
1. Download the SiteIndex_1.0.zip file
2. Unzip the package in the base directory of Elgg
3. Add the following line to the end of your Rewrite Rules in .htaccess. If you don't, SiteIndex won't work.
#Rewrite Rules for Planphoria SiteIndex
RewriteRule ^site([A-Za-z]+).xml(|.gz) SiteIndex_v1.0/siteindex.php?m=$1&g=$2
4. From the base directory type cd "SiteIndex"
5. Edit siteindex.php
6. Look for the section below in siteindex.php and make the appropriate edits. Tweak the last two parameters of the makeURLrec() statements as you see fit.
#####################USER CONFIGURATION###############################
#FILL IN LOOSE PAGES THAT YOU WANT INDEXED
$page = array();
#$page[0] = "http://www.example.com/page1.html";
#$page[1] = "http://www.example.com/page2.html";
#PATH TO YOUR ELGG INSTALL BASE
$path = "<path of your Elgg Install>";
# Do you want compressed sitemap output? (0 = no, 1 = yes)
# Typically you want to say yes here. Say no when debugging.
# Though it will work in Firefox, there is an issue with the way that
# Google (and Lynx for that matter) works with my Apache mod_rewrite,
# going .gz when submitting the sitemap seems to fix it.
$gzip = 1;
# You can tweak the last two parameters of the makeURLrec() function for
# better SEO (or use my defaults) where the second to last parameter is
# how often the content changes. It can be one of the following:
#
# * always
# * hourly
# * daily
# * weekly
# * monthly
# * yearly
# * never
#
# The last parameter is the priority which can be a value between
# 0.0 and 1.0
#
###################END OF USER CONFIGURATION###########################
7. Submit your new site index to the search engines using:
http://<your-site-here>/siteindex.xml.gz
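To see what the rewrite rule in step 3 hands to siteindex.php, the pattern can be exercised outside Apache. The following is a minimal PHP sketch, not part of SiteIndex: the helper name mapSiteIndexUrl() is hypothetical, and preg_match() only approximates how mod_rewrite evaluates the expression on a given server.

```php
<?php
// Hypothetical helper (not part of SiteIndex): apply the step 3 pattern to a
// request path and return the m and g values the rule would pass along.
function mapSiteIndexUrl(string $path): ?array
{
    // Same expression as the RewriteRule in step 3, wrapped in # delimiters.
    $pattern = '#^site([A-Za-z]+).xml(|.gz)#';
    if (preg_match($pattern, $path, $matches) !== 1) {
        return null; // The rule would not fire for this path.
    }
    return ['m' => $matches[1], 'g' => $matches[2]];
}

var_dump(mapSiteIndexUrl('siteindex.xml.gz'));  // m is "index"
var_dump(mapSiteIndexUrl('siteprofiles.xml'));  // m is "profiles"
var_dump(mapSiteIndexUrl('index.php'));         // NULL, no match
```

In this sketch preg_match() tries the empty branch of (|.gz) first, so g comes back as an empty string for both matching requests. Whether mod_rewrite on your server behaves the same way, and how siteindex.php uses g, is not shown on this page.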
I have checked my php.ini file and have found:
zlib.output_compression = Off
Is this a problem?
Visiting http://www.smokist.com/siteindex/siteindex.php gets a blank page.
siteindex.php is set to CHMOD 755
I am keen to make this work. Thank you for your very helpful replies.
Phil
Ok, the blank page is a good thing. It means that you can see the script and you have not passed any variables to it.
The good news is that the script works.
Try: http://www.smokist.com/siteindex/siteindex.php?m=profiles
This simulates what the Rewrite Rule would do. Unfortunately, we can't rightly give this to Google as they want a .xml or .xml.gz file, which is why we do the rewrite.
So the question is... is the rewrite rule pointing to the correct place?
I am not sure if the zlib thing is the problem. My script only requires that you can use gzencode() as a function. You might want to track that down too.
Looks good..got this:
<?xml version="1.0" encoding="UTF-8" ?>
Well at least the script works. If you got there using the redirect, twice as good.
gzencode is a php function. You can go to the php website and read up on how it works and write a script to try to gz something. You might even be able to use their example. All you need is to see if it is working.
Thanks, I'll give it a whirl. I am very grateful for your helpful advice and will defo donate. Cheers. Phil.
Most excellent! Have followed php tutorial and used an example of gzencode() function. Have successfully proven that gzencode works on my hosting server! Successfully compressed and expanded a file! Now I'll work on the RewriteRule.
Now I think I've got a pretty good understanding of the RewriteRule syntax and can't for the life of me see why the server is complaining. It's just unhappy with the modified .htaccess file and there is no reason why this should happen. Asked hosting support and they don't help with scripting! Very helpful! They do have CGI error logs I can see but there are no logs and they say this means there are no errors! I'll badger them some more.
It's one of those things, either Rewrite rules work, or they don't. Elgg doesn't work unless they do. So the only thing I would suggest checking is the second part of it where things are pointing to places where things exist.
I don't know how this helps but I've narrowed down the cause of the error. If I delete the part (|.gz) the error does not occur. Does that mean that the script would work but would not produce gz compressed files? I'm currently searching madly for mention of the | character in Rewrite Rule syntax, but can't see it mentioned yet apart from in the arguments at the end of the statement.
Back again! It seems my problem is with the | (pipe) character. So I've got round it with 2 Rewriterules, which works well:
RewriteRule ^site([A-Za-z]+)\.xml siteindex/siteindex.php?m=$1&g=$2
RewriteRule ^site([A-Za-z]+)\.xml\.gz siteindex/siteindex.php?m=$1&g=$2
I asked about the problem on Webmaster World, the reply is at http://www.webmasterworld.com/apache/4059757.htm . If you don't want to go through the hassle of registering, here is the reply:
I've added the escape / in front of the .'s as suggested. The answerer seems to want to put the $ at the end of the pattern but I guess your code doesn't require it. What do you think about the suggestions of changing the pattern to (a-z) only and the NC and L parameters?
Also, can I just check that it is only the one url http://<your-site-here>/siteindex.xml.gz I need to tell search engines?
Just about to donate. Thanks for your help and excellent mod!
Sorry, meant \ not /.
Just tested the a-z only pattern and the [NC,L] arguments and it's much faster.
Writing to confirm the error Phil received. (I received the same 500 error.) The pipe is definitely the culprit.
I replaced:
RewriteRule ^site([A-Za-z]+).xml(|.gz) SiteIndex_v1.0/siteindex.php?m=$1&g=$2
With:
RewriteRule ^site([a-z]+)\.(xml|gz)$ siteindex/siteindex.php?m=$1&g=$2 [NC,L]
And now everything works great. Thanks for posting the solution Phil! @srosenberg, do you think you could update the instructions to reflect this fix?
Does this work with Elgg 1.7?
Hi, srosenberg
It's really a great script. Do you have plans for your script to support the thewire plugin and the izap_videos plugin?
Has anyone gotten this to work in 1.7? If so, how did you do it?
I installed this as per the instructions and have gone through several iterations of .htaccess mods. When I test it I get a blank page (even calling it with ?m=profiles). I checked my apache error log and noticed an error saying m was undefined. Without the ?m=profiles there isn't anything showing up in the error log.
Mmmm... I have the same issue as @MKos. This is such a nice tool, it would be great if someone could check the script against the needs of 1.7 onwards.
I see it's old and complicated to install. Is there any alternative for it?
Does nobody need to tell Google about their site?
I know that this plugin is inactive now. I am using a third-party siteindex generator. But I feel that an ELGG specific siteindex generator will be good which respects the user's privacy.
Does this work on ELGG 1.8? What other solution are people using to generate sitemaps for their elgg sites?
Could someone please update this highly demanded plugin to work with Elgg 1.8?
Is a new version possible?