Configuring URL interpreters
When using auto-learning, you must define how to interpret dynamic URLs and URLs that include parameters in non-standard ways, such as with different parameter separators (; or #, for example) or by embedding the parameter within the URL’s path structure.
In the web UI, these interpreter plug-ins are called “URL replacers.”
URL replacers match the URL as it appears in the HTTP header of the client’s request (using the regular expression in URL Path) and interpret it into this standard URL formulation:
New URL?New Param=Param Change
For example, if the URL is:
/application/value
and the URL replacer settings are:
Table 12:
Setting name    Value
Type            Custom-Defined
URL Path        (/application)/([^/]+)
New URL         $0
Param Change    $1
New Param       setting
$0 holds this part of the matched URL:
/application
and $1 holds this part of the matched URL:
value
so then the URL will be understood by auto-learning, and displayed in the report, as:
/application?setting=value
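The mapping above can be reproduced outside FortiWeb with a minimal Python sketch. It is an illustration only, not FortiWeb's implementation: the function names `expand` and `interpret` are hypothetical, the pattern is assumed to be matched unanchored, and $0 is assumed to refer to the first capture group, as in the example above.

```python
import re

def expand(template, match):
    """Expand FortiWeb-style back-references, where $0 is the first capture group."""
    def repl(ref):
        # Python numbers capture groups from 1, so shift the index by one.
        return match.group(int(ref.group(1)) + 1) or ""
    return re.sub(r"\$(\d+)", repl, template)

def interpret(url, url_path, new_url, param_change, new_param):
    """Return 'New URL?New Param=Param Change', or None if URL Path does not match."""
    match = re.search(url_path, url)  # assumption: unanchored matching
    if match is None:
        return None
    return f"{expand(new_url, match)}?{expand(new_param, match)}={expand(param_change, match)}"

result = interpret("/application/value", r"(/application)/([^/]+)", "$0", "$1", "setting")
print(result)  # /application?setting=value
```

A URL that the expression does not match, such as /other, yields None in this sketch, which corresponds to the URL replacer not applying to that request.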
 
Need a refresher on regular expressions? See “Regular expression syntax”, “What are back-references?”, and “Cookbook regular expressions”. You can also use the examples in this section, such as “Example: URL interpreter for WordPress”.
To create a URL interpreter
1. Go to Auto Learn > Application Templates > URL Replacer.
To access this part of the web UI, your administrator’s account access profile must have Read and Write permission to items in the Autolearn Configuration category. For details, see “Permissions”.
2. Click Create New.
3. Configure these settings:
Setting name    Description
Name
    Type a unique name that can be referenced by other parts of the configuration. Do not use spaces or special characters. The maximum length is 35 characters.
Type
    Select either:
    Predefined: Use one of the predefined URL replacers, which you select in Application Type.
    Custom-Defined: Define your own URL replacer by configuring URL Path, New URL, Param Change, and New Param.
4. If you selected Predefined in Type, also configure this setting:
Setting name    Description
Application Type
    Select one of the predefined URL interpreter plug-ins for well-known web applications:
    JSP: Use the URL replacer designed for Java server pages (JSP) web applications, where parameters are often separated by semi-colons ( ; ).
    OWA: Use the URL replacer designed for default URLs in Microsoft Outlook Web App (OWA), where user name and directory parameters are often embedded within the URL:
        (^/exchange/)([^/]+)/*(([^/]+)/(.*))*
        (^/public/)(.*)
5. If you selected Custom-Defined in Type, configure these settings:
Setting name    Description
URL Path
    Type a regular expression, such as (^/[^/]+)/(.*), matching all and only the URLs to which the URL replacer should apply. The maximum length is 255 characters.
    The pattern does not require a slash ( / ). However, it must at least match URLs that begin with a slash as they appear in the HTTP header, such as /index.html. Do not include the domain name, such as www.example.com.
    To test the regular expression against sample text, click the >> (test) icon. This opens the Regular Expression Validator window where you can fine-tune the expression (see “Regular expression syntax”, “What are back-references?”, and “Cookbook regular expressions”).
    Note: If this URL replacer will be used sequentially in its set of URL replacers, instead of being mutually exclusive, this regular expression should match the URL produced by the previous interpreter, not the original URL from the request.
New URL
    Type either a literal URL, such as /index.html, or a regular expression with a back-reference (such as $1) defining how the URL will be interpreted. The maximum length is 255 characters.
    Note: Back-references can only refer to capture groups (parts of the expression surrounded with parentheses) within the same URL replacer. Back-references cannot refer to capture groups in other URL replacers.
Param Change
    Type either the parameter’s literal value, such as user1, or a back-reference (such as $0) defining how the value will be interpreted.
New Param
    Type either the parameter’s literal name, such as username, or a back-reference (such as $2) defining how the parameter’s name will be interpreted in the auto-learning report. The maximum length is 255 characters.
    Note: Back-references can only refer to capture groups (parts of the expression surrounded with parentheses) within the same URL replacer. Back-references cannot refer to capture groups in other URL replacers.
6. Click OK.
7. Group the URL replacers in an application policy (see “Grouping URL interpreters”).
8. Select the application policy in one or more auto-learning profiles (see “Configuring an auto-learning profile”).
9. Select the auto-learning profiles in server policies (see “Configuring a server policy”).
See also
Regular expression syntax
Example: URL interpreter for a JSP application
Example: URL interpreter for Microsoft Outlook Web App 2007
Example: URL interpreter for WordPress
Example: URL interpreter for a JSP application
The HTTP request URL from a client is:
/app/login.jsp;jsessionid=xxx;p1=111;p2=123?p3=5555&p4=66aaaaa
which uses semi-colons ( ; ) as parameter separators in the URL, a behavior typical of JSP applications. You would create a URL replacer that recognizes the JSP application’s parameter separators: the semi-colons.
Table 13: Example: URL replacer for JSP applications
Setting name        Value
Type                Predefined
Application Type    JSP
The predefined JSP interpreter plug-in will interpret the URL as:
/app/login.jsp?p4=66aaaaa&p1=111&p2=123&p3=5555
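To make the interpretation concrete, here is a rough Python sketch that splits a JSP-style URL into a path and a set of parameters. It is not the predefined plug-in's actual logic: the function name `split_jsp_url` is hypothetical, and dropping jsessionid is an assumption based only on the interpreted URL shown above, which omits it. The sketch returns the parameters as a dictionary and makes no claim about the order in which FortiWeb lists them.

```python
def split_jsp_url(url):
    """Split a JSP-style URL into (path, params), treating ';' like '&'."""
    path_part, _, query = url.partition("?")
    # The first ';'-separated piece is the real path; the rest are parameters.
    path, *semicolon_params = path_part.split(";")
    pairs = [p for p in semicolon_params if p] + [p for p in query.split("&") if p]
    params = {}
    for pair in pairs:
        name, _, value = pair.partition("=")
        # Assumption: the session ID is not kept as a learned parameter.
        if name.lower() != "jsessionid":
            params[name] = value
    return path, params

path, params = split_jsp_url(
    "/app/login.jsp;jsessionid=xxx;p1=111;p2=123?p3=5555&p4=66aaaaa"
)
print(path)    # /app/login.jsp
print(params)  # {'p1': '111', 'p2': '123', 'p3': '5555', 'p4': '66aaaaa'}
```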
See also
Regular expression syntax
Example: URL interpreter for Microsoft Outlook Web App 2007
Example: URL interpreter for WordPress
Example: URL interpreter for Microsoft Outlook Web App 2007
When a client sends requests to Microsoft Outlook Web App (OWA), many of its URLs use structures like this:
/exchange/tom/index.html
/exchange/jane.doe/memo.EML
/exchange/qinlu/2012/1.html
These have user name parameters embedded in the URL. In order for auto-learning to recognize the parameters, you must either:
Set Type to Predefined and Application Type to OWA. This predefined auto-learning URL interpreter will match and recognize parameters in all default URLs.
Create your own custom URL interpreters.
A custom URL replacer for those URLs could look like this:
Table 14: Example: URL replacer for Microsoft Outlook Web App, user name structure #1
URL interpreter (Table 15):
    Setting name    Value
    Name            OWAusername1
    Type            Custom-Defined
    URL Path        (/exchange/)([^/]+)/(.*)
    New URL         $0$2
    Param Change    $1
    New Param       username1
Then the URLs would be recognized by auto-learning as if OWA used a more conventional parameter structure like this:
/exchange/index.html?username1=tom
/exchange/memo.EML?username1=jane.doe
/exchange/2012/1.html?username1=qinlu
Notably, OWA can also include other parameters in the URL, such as a mail folder’s name. Also, OWA can include the user name and folder in more than one way. Therefore multiple URL interpreters are required to match all possible URL structures. In addition to the first URL replacer, you would also configure the following URL replacers and group them into a single set (an auto-learning “application policy”) in order to recognize all possible URLs.
Table 16: Example: URL replacer for Microsoft Outlook Web App, folder name structure #1
Sample URL: /exchange/archive-folders/2011
URL interpreter (Table 17):
    Setting name    Value
    Name            OWAfoldername1
    Type            Custom-Defined
    URL Path        (/exchange/)([^/]+/)(.*)
    New URL         $0
    Param Change    $1$2
    New Param       folder1
Results: /exchange/?folder1=archive-folders/2011
Table 18: Example: URL replacer for Microsoft Outlook Web App, user name structure #2
Sample URL: /exchange/jane.doe
URL interpreter (Table 19):
    Setting name    Value
    Name            OWAusername2
    Type            Custom-Defined
    URL Path        (/exchange/)([^/]+\.[^/]+)
    New URL         $0
    Param Change    $1
    New Param       username2
Results: /exchange/?username2=jane.doe
Table 20: Example: URL replacer for Microsoft Outlook Web App, folder name structure #2
Sample URL: /public/imap-share-folders/memos
URL interpreter (Table 21):
    Setting name    Value
    Name            OWAfoldername2
    Type            Custom-Defined
    URL Path        (/public/)([^/]+/)(.*)
    New URL         $0
    Param Change    $1$2
    New Param       folder2
Results: /public/?folder2=imap-share-folders/memos
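The four custom OWA replacers can be checked against their sample URLs with a short Python sketch. This is only an emulation under stated assumptions: `REPLACERS`, `expand`, and `apply_replacer` are hypothetical names, patterns are matched unanchored, and $0 is treated as the first capture group. Each replacer is applied only to its own sample URL, because the sketch does not model how FortiWeb chooses between replacers whose expressions overlap.

```python
import re

# name: (URL Path, New URL, Param Change, New Param), copied from the examples above
REPLACERS = {
    "OWAusername1": (r"(/exchange/)([^/]+)/(.*)", "$0$2", "$1", "username1"),
    "OWAfoldername1": (r"(/exchange/)([^/]+/)(.*)", "$0", "$1$2", "folder1"),
    "OWAusername2": (r"(/exchange/)([^/]+\.[^/]+)", "$0", "$1", "username2"),
    "OWAfoldername2": (r"(/public/)([^/]+/)(.*)", "$0", "$1$2", "folder2"),
}

def expand(template, match):
    """Expand $N back-references, where $0 is the first capture group."""
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1)) + 1) or "", template)

def apply_replacer(name, url):
    """Apply one named replacer; return the interpreted URL or None if no match."""
    url_path, new_url, param_change, new_param = REPLACERS[name]
    match = re.search(url_path, url)  # assumption: unanchored matching
    if match is None:
        return None
    return f"{expand(new_url, match)}?{new_param}={expand(param_change, match)}"

print(apply_replacer("OWAusername1", "/exchange/qinlu/2012/1.html"))
# /exchange/2012/1.html?username1=qinlu
print(apply_replacer("OWAfoldername1", "/exchange/archive-folders/2011"))
# /exchange/?folder1=archive-folders/2011
print(apply_replacer("OWAusername2", "/exchange/jane.doe"))
# /exchange/?username2=jane.doe
print(apply_replacer("OWAfoldername2", "/public/imap-share-folders/memos"))
# /public/?folder2=imap-share-folders/memos
```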
See also
Regular expression syntax
Example: URL interpreter for a JSP application
Example: URL interpreter for WordPress
Example: URL interpreter for WordPress
If the HTTP request URL from a client is a slash-delimited chain of multiple parameters, like either of these:
/wordpress/2012/06/05
/index/province/ontario/city/ottawa/street/moodie
then the format is either of these:
/wordpress/value1/value2/value3
/index/param1/value1/param2/value2/param3/value3
In this URL format, there are 3 parameter values (with or without their names) in the URL:
param1
param2
param3
Because each interpreter can only extract a single parameter, you would create 3 URL interpreters, and group them into a set where they are used sequentially — a chain. Each interpreter would use the interpreted output of the previous one as its input, until all parameters had been extracted, at which point the last interpreter would output both the last parameter and the final interpreted URL. FortiWeb would then append parameters back onto the interpreted URL in the standard structure before storing them in the auto-learning data set.
Figure 26: Analysis of a request URL into its interpretation by a chain of URL interpreters
 
This configuration requires that for every request:
the web application includes parameters in the same sequential order, and
all parameters are always present

If parameter order or existence varies, this URL interpreter will not work. Requests will not match the URL interpreter set if either param2 or param3 comes first, or if any of the parameters is missing. At the other extreme, if the URL interpreter used regular expression capture groups such as (.*) to match anything in any order, for example:

/index/(.*)/(.*)/(.*)/(.*)/(.*)/(.*)/

then the regular expression would be too flexible: auto-learning might mistakenly match and learn some of param3’s possible values for param2, and so on.
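The difference between the strict and the overly flexible expression can be seen in a small Python sketch. The two sample URLs are hypothetical and end with a slash because both expressions require one. The sketch only demonstrates regular expression behavior; it does not model auto-learning itself.

```python
import re

STRICT = r"/index/param1/(.*)/param2/(.*)/param3/(.*)/"
LOOSE = r"/index/(.*)/(.*)/(.*)/(.*)/(.*)/(.*)/"

expected_order = "/index/param1/a/param2/b/param3/c/"
swapped_order = "/index/param1/a/param3/c/param2/b/"  # param2 and param3 swapped

# The strict pattern extracts the three values when the order is as expected...
strict_values = re.search(STRICT, expected_order).groups()
print(strict_values)  # ('a', 'b', 'c')

# ...and rejects the request when the order differs.
strict_on_swapped = re.search(STRICT, swapped_order)
print(strict_on_swapped)  # None

# The loose pattern accepts any order, so the slot that was meant to hold
# param2's value (the fourth capture group) now holds param3's value.
loose_groups = re.search(LOOSE, swapped_order).groups()
value_in_param2_slot = loose_groups[3]
print(loose_groups)          # ('param1', 'a', 'param3', 'c', 'param2', 'b')
print(value_in_param2_slot)  # c
```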
Table 22: Example: URL replacer 1 for slash-separated parameters
Table 23:
Setting name    Value
Name            slash-parameter3
Type            Custom-Defined
URL Path        /index/param1/(.*)/param2/(.*)/param3/(.*)/
New URL         /index/param1/$0/param2/$1/
Param Change    $2
New Param       param3
Table 24: Example: URL replacer 2 for slash-separated parameters
Table 25:
Setting name    Value
Name            slash-parameter2
Type            Custom-Defined
URL Path        /index/param1/(.*)/param2/(.*)/
New URL         /index/param1/$0/
Param Change    $1
New Param       param2
Table 26: Example: URL replacer 3 for slash-separated parameters
Table 27:
Setting name    Value
Name            slash-parameter1
Type            Custom-Defined
URL Path        /index/param1/(.*)/
New URL         /index
Param Change    $0
New Param       param1
Until you add the URL interpreters to a group, FortiWeb doesn’t know the sequential order.
 
These URL interpreters will not function correctly if they are not used in that order, because each interpreter’s input is the output from the previous one. You must therefore set the priorities correctly when referencing each of those interpreters in the set of URL interpreters (see “Grouping URL interpreters”).
Table 28: Example: URL replacer group for slash-separated parameters, entry 1
Setting name    Value
Priority        0
Type            URL REPLACER
Plugin Name     slash-parameter3
Table 29: Example: URL replacer group for slash-separated parameters, entry 2
Setting name    Value
Priority        1
Type            URL REPLACER
Plugin Name     slash-parameter2
Table 30: Example: URL replacer group for slash-separated parameters, entry 3
Setting name    Value
Priority        2
Type            URL REPLACER
Plugin Name     slash-parameter1
Then the URL will be interpreted by auto-learning as if the application used a more conventional and easily understood URL/parameter structure:
/index?param1=value1&param2=value2&param3=value3
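The chain can be emulated with a brief Python sketch that feeds each replacer's output URL into the next one. The names `CHAIN`, `expand`, and `run_chain` are hypothetical, and several behaviors are assumptions: unanchored matching, $0 as the first capture group, skipping replacers that do not match, and sorting parameters by name for display. The sample request ends with a slash because the URL Path expressions above require it.

```python
import re

# (URL Path, New URL, Param Change, New Param) in priority order 0, 1, 2
CHAIN = [
    (r"/index/param1/(.*)/param2/(.*)/param3/(.*)/", "/index/param1/$0/param2/$1/", "$2", "param3"),
    (r"/index/param1/(.*)/param2/(.*)/", "/index/param1/$0/", "$1", "param2"),
    (r"/index/param1/(.*)/", "/index", "$0", "param1"),
]

def expand(template, match):
    """Expand $N back-references, where $0 is the first capture group."""
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1)) + 1) or "", template)

def run_chain(url, chain):
    """Apply each replacer to the previous replacer's output URL."""
    params = {}
    for url_path, new_url, param_change, new_param in chain:
        match = re.search(url_path, url)
        if match is None:
            continue  # assumption: a non-matching replacer leaves the URL unchanged
        params[expand(new_param, match)] = expand(param_change, match)
        url = expand(new_url, match)
    # Assumption: parameters are sorted by name only to get a stable display order.
    query = "&".join(f"{name}={value}" for name, value in sorted(params.items()))
    return f"{url}?{query}" if query else url

request = "/index/param1/value1/param2/value2/param3/value3/"
in_order = run_chain(request, CHAIN)
print(in_order)  # /index?param1=value1&param2=value2&param3=value3

# Reversing the priorities shows why the order matters: the first replacer's
# greedy (.*) swallows the remaining parameters as a single value.
wrong_order = run_chain(request, list(reversed(CHAIN)))
print(wrong_order)  # /index?param1=value1/param2/value2/param3/value3
```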
See also
Grouping URL interpreters
Configuring an auto-learning profile
Regular expression syntax
Example: URL interpreter for a JSP application
Example: URL interpreter for Microsoft Outlook Web App 2007