HTML body as page content
Only the HTML body of the Drupal page is used as content for the Grails page. In addition, the following meta tags are extracted:
- author
- keywords
- description
- robots
- title
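The extraction step can be sketched roughly as follows. This is a minimal illustration, not the actual Grails implementation: it uses Python's standard `html.parser`, and the names `MetaExtractor`, `WANTED` and `extract_meta` are hypothetical. It also assumes that all five values, including "title", come from `<meta name="...">` tags.

```python
from html.parser import HTMLParser

# Meta tag names listed above (assumed to be matched case-insensitively).
WANTED = {"author", "keywords", "description", "robots", "title"}


class MetaExtractor(HTMLParser):
    """Collects the content of selected <meta name="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in WANTED:
            self.meta[name] = attributes.get("content") or ""


def extract_meta(html):
    """Return a dict of the wanted meta tags found in the page."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.meta
```

Meta tags with other names (for example "viewport") are simply ignored in this sketch.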
link tags
Search the whole Drupal page for "link" tags. If a tag has the attribute "rel=stylesheet" and its "href" is not an absolute URL (it has no scheme component), put the Drupal URL in front.
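As an illustration of the "no scheme component" check, here is a small sketch using `urllib.parse.urlsplit` from the Python standard library. The function names and the example Drupal URL are assumptions made for this example, and the Drupal URL is simply concatenated in front of the href.

```python
from urllib.parse import urlsplit


def is_absolute(url):
    """A URL counts as absolute here when it has a scheme component."""
    return bool(urlsplit(url).scheme)


def rewrite_stylesheet_href(rel, href, drupal_url):
    """Prefix non-absolute stylesheet links with the Drupal URL."""
    if rel == "stylesheet" and not is_absolute(href):
        return drupal_url + href
    return href
```

For example, with the hypothetical Drupal URL `https://drupal.example.org`, the href `/sites/all/style.css` becomes `https://drupal.example.org/sites/all/style.css`, while `https://cdn.example.org/a.css` is left unchanged.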
script tags
If the attribute "src" doesn't start with "//" then put the Drupal URL in front.
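Read literally, this rule only looks at the "//" prefix. A minimal sketch under that reading (the function name and the example URL are hypothetical):

```python
def rewrite_script_src(src, drupal_url):
    """Leave protocol-relative sources alone, prefix everything else."""
    if src.startswith("//"):
        return src
    return drupal_url + src
```

With the hypothetical Drupal URL `https://drupal.example.org`, the src `/misc/jquery.js` becomes `https://drupal.example.org/misc/jquery.js`, and `//cdn.example.org/lib.js` stays as it is.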
a tags
If the content type of the URL contains a "length" parameter, don't modify the link.
- If the attribute "href" starts with the Drupal URL, remove the Drupal URL.
- If the attribute "href" now starts with "/" followed by a 2-letter language code, remove those 3 characters.
- If the attribute "href" is now not an absolute URL (it has no scheme component):
If the href starts with
/item//entity/
/organization/
/about-us/
/searchresults
/advancedsearch
/login
/user/
/journal/daily/
/journal/persons/
then do nothing; otherwise put the Grails context path + "/content" in front.
- If the attribute "href" starts with "?", use browserUrl + href.
- If the attribute "href" starts with "+", don't do anything.
- If there is no "href", don't add one.
- If the link is a PDF and is not an absolute URL, rewrite it as an absolute Drupal link.
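The href rules above could be combined roughly as in the following sketch. It is not the actual Grails code, and it makes several assumptions that the text does not confirm: the order in which the rules are checked, that a PDF is recognised by a ".pdf" suffix, and that the language code must be followed by "/" or the end of the string (so that "/login" is not mistaken for the language "lo"). The content-type check needs an HTTP request and is left out. All function and parameter names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Prefixes that are left untouched, copied verbatim from the list above.
UNTOUCHED_PREFIXES = (
    "/item//entity/", "/organization/", "/about-us/", "/searchresults",
    "/advancedsearch", "/login", "/user/", "/journal/daily/",
    "/journal/persons/",
)


def rewrite_href(href, drupal_url, context_path, browser_url):
    """Apply the a-tag rules to one href value (None means no href)."""
    if href is None or href.startswith("+"):
        return href  # no href, or "+" prefix: leave as is
    if href.startswith("?"):
        return browser_url + href
    if href.startswith(drupal_url):
        href = href[len(drupal_url):]
    # Assumption: "/" + two lowercase letters, then "/" or end of string.
    if re.match(r"/[a-z]{2}(/|$)", href):
        href = href[3:]
    if urlsplit(href).scheme:
        return href  # still absolute: nothing more to do
    if href.lower().endswith(".pdf"):
        return drupal_url + href  # assumption: PDFs detected by suffix
    if href.startswith(UNTOUCHED_PREFIXES):
        return href
    return context_path + "/content" + href
```

For instance, with the hypothetical Drupal URL `https://drupal.example.org` and context path `/app`, the href `https://drupal.example.org/de/help` is rewritten to `/app/content/help`, whereas `/login` stays unchanged.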
img tags
If the attribute "src" is not an absolute URL (it has no scheme component), put the Drupal URL in front.
source tags
If the attribute "src" is not an absolute URL (it has no scheme component), put the Drupal URL in front.
form tags
The "action" attribute is handled like the "href" attribute of a tags.
Only specific forms, selected by id, are handled this way. TODO: find a better way to select them, if one exists.