Example: Redirecting HTTP to HTTPS
You can use the content rewriting feature to send redirects. A common use case is when the requested resource requires a secure connection and the links on your web pages are coded correctly, but the client types an HTTP URL instead of an HTTPS URL in the location bar of the web browser.
For HTTP redirect rules, you can specify the rewritten location as a literal string or regular expression.
Figure 26 shows a redirect rule that matches a literal string and rewrites a literal string. In the match condition table, the rule is set to match traffic that has the Host header domain example.com and the relative URL /resource/index.html in the HTTP request URL. The redirect action sends a secure URL in the Location header: https://example.com/resource/index.html.
Figure 26: Redirecting HTTP to HTTPS (literal string)
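The literal-string rule in Figure 26 can be pictured with a small Python sketch that runs outside the product. The function name literal_redirect, the exact case-sensitive comparison, and the use of None to mean "no redirect" are assumptions made for this illustration; they do not describe how the feature is implemented.

```python
from typing import Optional

# Values taken from the Figure 26 example.
MATCH_HOST = "example.com"
MATCH_PATH = "/resource/index.html"
REDIRECT_LOCATION = "https://example.com/resource/index.html"


def literal_redirect(host: str, path: str) -> Optional[str]:
    """Return the Location header value if both literal conditions match.

    Assumption: a literal condition is an exact, case-sensitive comparison.
    """
    if host == MATCH_HOST and path == MATCH_PATH:
        return REDIRECT_LOCATION
    return None  # no condition matched, so no redirect is sent
```

Because both the match conditions and the rewritten location are fixed strings, this rule covers exactly one URL. Every other URL would need its own rule, which is the limitation that the regular expression form addresses.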
 
Regular expressions are a powerful way of denoting all possible forms of a string. They are very useful when trying to match text that comes in many variations but follows a definite pattern, such as dynamic URLs or web page content.
Figure 27 shows a redirect rule that uses PCRE capture and back reference syntax to create a more general rule than the previous example. This rule sends a redirect for all connections to the same URL but over HTTPS. In the match condition table, the first regular expression is (.*). This expression matches any HTTP Host header and stores it as capture 0. The second regular expression is ^/(.*)$. This expression matches the path in the Request URL (the content after the /) and stores it as capture 1. The regular expression for the redirect action uses the back reference syntax https://$0/$1. The / between the two back references restores the leading slash, which capture 1 does not include.
Figure 27: Redirecting HTTP to HTTPS (regular expression)
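The capture and back-reference behavior of the Figure 27 rule can be approximated in Python. This is a sketch under stated assumptions, not the product's implementation: Python's re module stands in for PCRE, the helper name apply_redirect is hypothetical, and the template places a / between $0 and $1 because capture 1 holds only the content after the leading slash. Captures are numbered from 0 across all conditions, as this section describes, whereas Python itself numbers groups from 1.

```python
import re
from typing import List, Optional


def apply_redirect(conditions: List[str], values: List[str], template: str) -> Optional[str]:
    """Match each condition regex against its value, then expand $0..$9.

    Captures are collected across all conditions in order (top to bottom),
    and within a condition by the position of the opening parenthesis.
    """
    captures: List[str] = []
    for pattern, value in zip(conditions, values):
        match = re.search(pattern, value)
        if match is None:
            return None  # every condition must match for the rule to apply
        # Python numbers groups from 1; store them in one 0-based list.
        captures.extend(group or "" for group in match.groups())

    def expand(token: re.Match) -> str:
        index = int(token.group(1))
        # Assumption: an unset back reference expands to an empty string.
        return captures[index] if index < len(captures) else ""

    return re.sub(r"\$(\d)", expand, template)


location = apply_redirect(
    conditions=[r"(.*)", r"^/(.*)$"],           # Host header, request URL
    values=["example.com", "/resource/index.html"],
    template="https://$0/$1",
)
print(location)  # https://example.com/resource/index.html
```

The same helper reproduces the nested numbering described in Table 10: applying the single condition (a)(b)(c(d))(e) to the text abcde yields a, b, cd, d, and e for $0 through $4.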
Table 10 describes commonly used PCRE syntax elements. Table 11 gives examples of useful and relevant expressions that were originally submitted to the FortiGate Cookbook. For a deeper dive, consult a PCRE reference.
 
 
Regular expressions can involve very computationally intensive evaluations. For best performance, you should only use regular expressions where necessary, and build them with care.
 
Table 10: Common PCRE syntax elements
Pattern
Usage
Example
()
Creates a capture group or sub-pattern for back-reference or to denote order of operations.
Text: /url/app/app/mapp
Regular expression: (/app)*
Matches: /app/app
Text: /url?paramA=valueA&paramB=valueB
Regular expression: (param)A=(value)A&\0B\1B
Matches: paramA=valueA&paramB=valueB
$0, $1, $2, ...
Only $0, $1,..., $9 are supported.
A back-reference is a regular expression token such as $0 or $1 that refers to whatever part of the text was matched by the capture group in that position within the regular expression.
Back-references are used whenever you want the output/interpretation to resemble the original match: they insert a substring of the original matching text. Like other regular expression features, back-references help to ensure that you do not have to maintain a large, cumbersome list of all possible URLs.
To invoke a substring, use $n (0 <= n <= 9), where n is the order of appearance of the capture group in the regular expression, from left to right, from outside to inside, then from top to bottom.
Let’s say the regular expressions in a condition table have the following capture groups:
(a)(b)(c(d))(e)
This syntax results in back-reference variables with the following values:
$0 — a
$1 — b
$2 — cd
$3 — d
$4 — e
 
\
Escape character.
If it is followed by an alphanumeric character, the alphanumeric character is not matched literally as usual. Instead, it is interpreted as a regular expression token. For example, \w matches a word character, as defined by the locale.
If it is followed by a regular expression special character:
*.|^$?+\(){}[]\
the \ escapes interpretation as a regular expression token, and the character is instead treated as a literal character.
For example, \\ matches the \ character.
Text: /url?parameter=value
Regular expression: \?param
Matches: ?param
.
Matches any single character except \r or \n.
Note: If the character is written by combining two Unicode code points, such as à where the core letter is encoded separately from the accent mark, this will not match the entire character: it will only match one of the code points.
Text: My cat catches things.
Regular expression: c.t
Matches: cat cat
+
Repeatedly matches the previous character or capture group, 1 or more times, as many times as possible (also called “greedy” matching) unless followed by a question mark ( ? ), which makes it match as few times as possible (also called “lazy” matching).
Does not match if there is not at least 1 instance.
Text: www.example.com
Regular expression: w+
Matches: www
Would also match “w”, “ww”, “wwww”, or any number of uninterrupted repetitions of the character “w”.
*
Repeatedly matches the previous character or capture group, 0 or more times. Depending on its combination with other special characters, this token could be either:
* — Match as many times as possible (also called “greedy” matching).
*? — Match as few times as possible (also called “lazy” matching).
Text: www.example.com
Regular expression: .*
Matches: www.example.com
All of any text, except line endings (\r and \n).
Text: www.example.com
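Regular expression: (w)*?\.
Matches: www.
Because the quantifier is lazy, it consumes only as many “w” characters as the rest of the expression requires before the literal period. Evaluated on its own, (w)*? would match as few repetitions as possible, which is none. Would also match common typos where the “w” was repeated too few or too many times, such as “w.” in w.example.com or “wwww.” in wwww.example.com. It would still match, however, if no amount of “w” existed.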
?
Makes the preceding character or capture group optional. When it directly follows a quantifier such as * or +, it instead makes that quantifier “lazy”.
This character has a different significance when followed by =.
Text: www.example.com
Regular expression: (www\.)?example.com
Matches: www.example.com
Would also match example.com.
?=
Looks ahead to see if the next character or capture group matches and evaluates the match based upon it, but does not include those next characters in the returned match string (if any).
This can be useful for back-references where you do not want to include permutations of the final few characters, such as matching “cat” when it is part of “cats” but not when it is part of “catch”.
Text: /url?parameter=valuepack
Regular expression: p(?=arameter)
Matches: p, but only in “parameter”, not in “pack”, where the “p” is not followed by “arameter”.
^
Matches either:
the position of the beginning of the string (or, in multiline mode, the beginning of each line), not the first character itself
the inverse of a character, but only if ^ is the first character in a character class, such as [^A]
This is useful if you want to match a word, but only when it occurs at the start of the line, or when you want to match anything that is not a specific character.
Text: /url?parameter=value
Regular expression: ^/url
Matches: /url, but only if it is at the beginning of the path string. It will not match “/url” in subdirectories.
Text: /url?parameter=value
Regular expression: [^u]
Matches: /rl?parameter=vale (every character except “u”, each as a separate match)
$
Matches the position of the end of the string (or, in multiline mode, the end of each line), not the last character itself.
 
[]
Defines a set of characters that are acceptable matches. Any one character from the set satisfies the match.
To define a set via a whole range instead of listing every possible match, separate the first and last character in the range with a hyphen.
Note: Character ranges are matched according to their numerical code point in the encoding. For example, [@-B] matches any UTF-8 code points from 0x40 to 0x42 inclusive: @AB
Text: /url?parameter=value1
Regular expression: [012]
Matches: 1
Would also match 0 or 2.
Text: /url?parameter=valueB
Regular expression: [A-C]
Matches: B
Would also match “A” or “C”. It would not match “b”.
{}
Quantifies the number of times the previous character or capture group may be repeated continuously.
To define a varying number of repetitions, give the minimum and maximum separated by a comma.
Text: 1234567890
Regular expression: \d{3}
Matches: 123
Text: www.example.com
Regular expression: w{1,4}
Matches: www
If the string were a typo such as “ww” or “wwww”, it would also match that.
(?i)
Turns on case-insensitive matching for subsequent evaluation, until it is turned off or the evaluation completes.
Text: /url?Parameter=value
Regular expression: (?i)param
Matches: Param
Would also match pArAM etc.
|
Matches either the character/capture group before or after the pipe ( | ).
Text: Host: www.example.com
Regular expression: (\r\n)|\n|\r
Matches: The line ending, regardless of platform.
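Several of the Table 10 examples can be checked with a short Python script. This is a sketch only: it assumes that Python's re module behaves like PCRE for these particular tokens, which holds for the syntax shown here but is not guaranteed for every PCRE feature. The helper name all_matches is hypothetical.

```python
import re
from typing import List


def all_matches(pattern: str, text: str) -> List[str]:
    """Return every non-overlapping match of pattern in text, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]


host = "www.example.com"

greedy = all_matches(r"w+", host)        # ['www']: one match, as long as possible
lazy = all_matches(r"w+?", host)         # ['w', 'w', 'w']: shortest match each time
bounded = all_matches(r"w{1,4}", host)   # ['www']
optional = all_matches(r"(www\.)?example\.com", "example.com")  # ['example.com']

# Lookahead: "p" matches only where "arameter" follows, and the
# lookahead text is not part of the returned match.
lookahead = all_matches(r"p(?=arameter)", "/url?parameter=valuepack")  # ['p']

# Negated class: every character except "u", each as its own match.
negated = "".join(all_matches(r"[^u]", "/url?parameter=value"))  # '/rl?parameter=vale'

# Inline flag: case-insensitive matching from this point on.
ignore_case = all_matches(r"(?i)param", "/url?Parameter=value")  # ['Param']
```

Comparing greedy with lazy shows why the choice of quantifier matters when the match feeds a back reference: the same text yields one long substring in the first case and several single-character substrings in the second.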
 
Table 11: PCRE examples submitted to the FortiGate Cookbook
Regular Expression
Usage
[a-zA-Z0-9]
Any alphanumeric character. ASCII only; e.g. does not match é or É.
[#\?](.*)
All parameters that follow a question mark or hash mark in the URL.
e.g. #pageView or ?param1=valueA&param2=valueB...;
In this expression, the capture group does not include the question mark or hash mark itself.
\b10\.1\.1\.1\b
A specific IPv4 address.
\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
Any IPv4 address.
(?i)\b.*\.(a(c|d|e(ro)?|f|g|i|m|n|o|q|r|s(ia)?|t|y|w|x|z)|b(a|b|d|e|f|g|h|i(z)?|j|m|n|o|r|s|t|v|w|y|z)|c(a(t)?|c|d|f|g|h|i|k|l|m|n|o((m)?(op)?)|r|s|u|v|x|y|z)|d(e|j|k|m|o|z)|e(c|du|e|g|h|r|s|t|u)|f(i|j|k|m|o|r)|g(a|b|d|e|f|g|h|i|l|m|n|ov|p|q|r|s|t|u|w|y)|h(k|m|n|r|t|u)|i(d|e|l|m|n(fo)?(t)?|o|q|r|s|t)|j(e|m|o(bs)?|p)|k(e|g|h|i|m|n|p|r|w|y|z)|l(a|b|c|i|k|r|s|t|u|vy)|m(a|c|d|e|g|h|il|k|l|m|n|o(bi)?|p|q|r|s|t|u(seum)?|v|w|x|y|z)|n(a(me)?|c|e(t)?|f|g|i|l|o|p|r|u|z)|o(m|rg)|p(a|e|f|g|h|k|l|m|n|r(o)?|s|t|w|y)|qa|r(e|o|s|u|w)|s(a|b|c|d|e|g|h|i|j|k|l|m|n|o|r|s|t|u|v|y|z)|t(c|d|el|f|g|h|j|k|l|m|n|o|p|r(avel)?|t|v|w|z)|u(a|g|k|s|y|z)|v(a|c|e|g|i|n|u)|w(f|s)|xxx|y(e|t|u)|z(a|m|w))\b
Any domain name.
(?i)\bwww\.example\.com\b
A specific domain name.
(?i)\b(.*)\.example\.com\b
Any sub-domain name of example.com.
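Before entering an expression from Table 11 into a rule, it can help to try it against sample input. The following Python sketch does that for three of the expressions. It assumes that Python's re module treats these patterns the same way PCRE does, and the constant names (OCTET, ANY_IPV4, SUBDOMAIN, PARAMS) are invented for this example.

```python
import re

# One octet of the "any IPv4 address" expression, repeated four times below.
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
ANY_IPV4 = re.compile(r"\b" + r"\.".join([OCTET] * 4) + r"\b")

SUBDOMAIN = re.compile(r"(?i)\b(.*)\.example\.com\b")
PARAMS = re.compile(r"[#\?](.*)")

# The IPv4 expression accepts each octet only in the range 0 to 255.
ip_hit = ANY_IPV4.search("client 192.168.1.20 connected")
ip_miss = ANY_IPV4.search("256.1.1.1")

# The capture group holds the sub-domain label; matching ignores case.
sub = SUBDOMAIN.search("MAIL.Example.COM")

# The capture group excludes the leading "?" or "#".
query = PARAMS.search("/index.html?param1=valueA&param2=valueB")
fragment = PARAMS.search("/index.html#pageView")

print(ip_hit.group(0))    # 192.168.1.20
print(ip_miss)            # None
print(sub.group(1))       # MAIL
print(query.group(1))     # param1=valueA&param2=valueB
print(fragment.group(1))  # pageView
```

Note that the sub-domain expression uses a greedy (.*), so it should be applied to the Host header value alone. Applied to a longer string, the capture would include everything from the first word boundary up to .example.com.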