
My New Obsession: SEO and Liferay

Company Blogs February 3, 2010 By David Truong Staff

So BChan should never have given me a server. Since making my wedding site, I've been exploring aspects of websites that fall outside the scope of portlet development.

This includes:

  • Setting up a full web server (not just running startup.bat)
  • Business requirements (if you could call what I'm doing business)
  • Advertising (AdSense, etc.)
  • SEO

The last one has become something of an obsession for me. It seems silly to want Google, Yahoo, or Bing to list my website (after all, who is going to search for my wedding website?), but there is something truly satisfying about ranking in the single digits for a search query. I am currently ranked 3rd for "david and winnie" because there's some dude named David Winnie... argh!

Anyway, I've come to share my discoveries (that is, if you care at all about SEO) so you don't make the same mistakes I did.

1. If you can live without URL sessions, disable them immediately.

Here's how: set session.enable.url.with.session.id=false in portal-ext.properties.

Most of us don't need URL sessions, because our sites either require cookies anyway or don't need sessions at all, so I would actually recommend everyone do this. Here's an article that explains further why they suck and also shows you an alternative way to remove them if you aren't lucky enough to be using Liferay.

Why? Because Googlebot and the other crawlers don't support cookies, so every crawl gets a fresh session and the URLs to your site all end up with something like ;jsessionid=12356yourseoisscrewed5950 appended. The same page then shows up under many different URLs, because ;jsessionid=12356yourseoisscrewed5950 != ;jsessionid=6079messedupseo459056.
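To see why this hurts, here is a rough Python sketch (not anything Liferay or the crawlers actually run) that groups a list of crawled URLs by their form with the session ID stripped out. The example.com URLs are made up, and the regex assumes the session ID appears as a ;jsessionid=... path parameter, as in the examples above.

```python
import re
from collections import defaultdict

# Assumption: the session ID shows up as a ";jsessionid=..." path parameter.
# The value runs until the next "/", ";", "?" or "#".
JSESSIONID = re.compile(r";jsessionid=[^/;?#]*", re.IGNORECASE)


def canonical(url: str) -> str:
    """Return the URL with any ;jsessionid=... segment removed."""
    return JSESSIONID.sub("", url)


def find_duplicates(urls):
    """Group URLs by canonical form and keep groups with more than one variant."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical(url)].append(url)
    return {page: variants for page, variants in groups.items() if len(variants) > 1}


# Made-up crawl results: the same page fetched in two different "sessions".
crawled = [
    "http://example.com/web/guest/home;jsessionid=12356yourseoisscrewed5950",
    "http://example.com/web/guest/home;jsessionid=6079messedupseo459056",
    "http://example.com/web/guest/photos",
]

for page, variants in find_duplicates(crawled).items():
    print(f"{page} was crawled under {len(variants)} different URLs")
```

Two "different" URLs, one actual page: that is exactly what a search engine sees as duplicate content.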

2. If you need URL sessions, set up mod_rewrite rules for them immediately

I can't give a full answer for this one since I've never done it, but I did read a lot about it, so here are some suggestions...

Make a RewriteCond that matches Googlebot, MSNBot, and any other crawlers you can think of.

Then a RewriteRule that removes the jsessionid...
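Since this is untested territory, here is only a Python sketch of the logic those two directives would express, so it's easy to poke at: a user-agent check playing the role of the RewriteCond, and a pattern that strips the session ID playing the role of the RewriteRule. The crawler names in BOT_PATTERN are an assumption (extend the list to taste), and this is an emulation, not Apache configuration.

```python
import re

# Assumption: a hand-picked list of crawler user-agent fragments (extend as needed).
BOT_PATTERN = re.compile(r"Googlebot|msnbot|bingbot|Slurp", re.IGNORECASE)

# Group 1: the path before ";jsessionid=", group 2: anything after the session ID value.
SESSION_RULE = re.compile(r"^(/[^;]*);jsessionid=[^/;]*(.*)$", re.IGNORECASE)


def rewrite(path: str, user_agent: str) -> str:
    """Emulate a RewriteCond on the user agent plus a RewriteRule that strips jsessionid.

    `path` is the request path without the query string.
    """
    if not BOT_PATTERN.search(user_agent):
        return path  # condition not met: the rule is skipped
    match = SESSION_RULE.match(path)
    if match is None:
        return path  # no session ID in the path: nothing to rewrite
    return match.group(1) + match.group(2)


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/5.0 (Windows NT 6.1)"

print(rewrite("/web/guest/home;jsessionid=ABC123", googlebot))  # /web/guest/home
print(rewrite("/web/guest/home;jsessionid=ABC123", browser))  # /web/guest/home;jsessionid=ABC123
```

Only crawler requests get rewritten, so human visitors who rely on URL sessions keep their session ID.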

I'm not sure what the full answer is... if you know, leave a comment.

3. If you've messed up and are trying to recover from your mistake (same boat as me)

First, read this article. Then do what it says =). It took me a while to figure out because I didn't know Apache that well and was trying to do everything in httpd.conf. I had to put the rules in my virtual host's conf file before they worked, because my server is also running another site (maybe I can explain another day what I learned from that site). It turns out mod_rewrite rules in the main server config aren't inherited by virtual hosts by default.

You are basically writing a mod_rewrite rule that answers every URL with a jsessionid appended with a permanent (301) redirect, telling the crawlers the page has moved, so they take those URLs off the list for your site. This is good because you won't have duplicate URLs, which make them think you have duplicate content.
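The idea behind that fix, boiled down: any request whose path carries a jsessionid gets a permanent redirect to the same path without it. Below is a hypothetical Python helper (redirect_for is made up for illustration, not part of Apache or Liferay) that makes only the status-code decision. A 301 rather than a 302 is the point, since a permanent redirect is what signals search engines to replace the old URL in their index.

```python
import re
from typing import Optional, Tuple

# Assumption: same ";jsessionid=..." path-parameter form as in the examples above.
JSESSIONID = re.compile(r";jsessionid=[^/;?#]*", re.IGNORECASE)


def redirect_for(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a request path.

    Paths carrying a ;jsessionid=... segment get a 301 (Moved Permanently)
    pointing at the clean path; everything else is served normally (200).
    """
    clean = JSESSIONID.sub("", path)
    if clean != path:
        return 301, clean
    return 200, None


print(redirect_for("/web/guest/home;jsessionid=6079messedupseo459056"))  # (301, '/web/guest/home')
print(redirect_for("/web/guest/home"))  # (200, None)
```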

So there you go...

If someone can tell me how to use robots.txt with virtual hosts, or even just set it up once for the whole server (all my sites will use the same robots.txt), please do. And yes, I did read the wiki... I just need more explanation =)
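One thing that may help with the virtual-host question: crawlers look for robots.txt at the root of each host name separately, so every virtual host has to answer /robots.txt itself, although they can all serve the same file. The Python sketch below uses the standard library's urllib.robotparser to show where a crawler would look for each site and to check one shared file against a couple of paths. The host names and the Disallow: /c/ rule are purely illustrative, not a recommendation.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical shared robots.txt served by every virtual host.
SHARED_ROBOTS = """\
User-agent: *
Disallow: /c/
"""

# Hypothetical virtual hosts on the same server.
SITES = ["http://www.example-wedding.com/", "http://www.example-other.com/"]


def robots_url(site: str) -> str:
    """Crawlers request robots.txt from the root of each host."""
    return urljoin(site, "/robots.txt")


parser = RobotFileParser()
parser.parse(SHARED_ROBOTS.splitlines())

for site in SITES:
    print(robots_url(site))
    for path in ("/web/guest/home", "/c/portal/login"):
        allowed = parser.can_fetch("Googlebot", urljoin(site, path))
        print(f"  {path} allowed: {allowed}")
```

On the Apache side, one option worth trying is a mod_alias Alias /robots.txt line in each virtual host pointing at the same file on disk; this is untested here, and depending on how requests are forwarded to Liferay it may need adjusting.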

Threaded Replies

Mike Robins, February 4, 2010 12:59 PM:
This is the rewrite rule I use for Google News bot that seems to work:

RewriteCond %{HTTP_USER_AGENT} (.*)Googlebot-News(.*)
RewriteRule ^/(.*);!(.*)$ /$1

David Truong, February 5, 2010 8:00 AM, in reply to Mike Robins:
Awesome Mike. Thanks!