How to adapt auto-learning to dynamic URLs & unusual parameters

When web applications have dynamic URLs or unusual parameter styles, you must adapt auto-learning to recognize them.

By default, auto-learning assumes that your web applications use the most common URL structure:

Some web applications, however, embed parameters within the path structure of the URL, or use unusual or non-uniform parameter separator characters. If you do not configure URL replacers for such applications, it can cause your FortiWeb appliance to gather auto-learning data incorrectly. This can cause the following symptoms:

For example, with Microsoft Outlook Web App (OWA), the user’s login name could be embedded within the path structure of the URL, such as:

/owa/tom/index.html

/owa/mary/index.html

instead of suffixed as a parameter, such as:

/owa/index.html?username=tom

/owa/index.html?username=mary

Auto-learning would continue to create new URLs as new users are added to OWA. Auto-learning would also expend extra resources learning about URLs and parameters that are actually the same. Additionally, auto-learning may not be able to fully learn the application structure, as each user may not request the same URLs.

To solve this, you would create a URL replacer that recognizes the user name within the OWA URL as if it were a standard, suffixed parameter value so that auto-learning can function properly.

Configuring URL interpreters

When using auto-learning, you must define how to interpret dynamic URLs and URLs that include parameters in non-standard ways, such as with different parameter separators (; or #, for example) or by embedding the parameter within the URL’s path structure.

In the web UI, these interpreter plug-ins are called “URL replacers.”

URL replacers match the URL as it appears in the HTTP header of the client’s request (using the regular expression in URL Path) and interpret it into this standard URL formulation:

New URL?New Param=Param Change

For example, if the URL is:

/application/value

and the URL replacer settings are:

Setting name Value
Type Custom-Defined
URL Path (/application)/([^/]+)
New URL $0
Param Change $1
New Param setting

$0 holds this part of the matched URL:

/application

and $1 holds this part of the matched URL:

value

so then the URL will be understood by auto-learning, and displayed in the report, as:

/application?setting=value
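The interpretation above can be reproduced offline with a short Python sketch. This is only an illustration, not FortiWeb's implementation: the helper name apply_replacer is hypothetical, and the use of an unanchored search is an assumption. The one behavior taken from this section is that $0 refers to the first capture group and $1 to the second.

```python
import re

def apply_replacer(url, url_path, new_url, new_param, param_change):
    """Hypothetical emulation of one URL replacer.

    Returns (interpreted_url, param_name, param_value), or None if the
    URL Path expression does not match.
    """
    match = re.search(url_path, url)  # assumption: unanchored search
    if match is None:
        return None
    groups = match.groups()

    def expand(template):
        # $0 is the FIRST capture group here, as in the example above
        # (unlike Python's own \1 numbering).
        return re.sub(r"\$(\d+)", lambda m: groups[int(m.group(1))] or "", template)

    return expand(new_url), expand(new_param), expand(param_change)

result = apply_replacer(
    "/application/value",
    url_path=r"(/application)/([^/]+)",
    new_url="$0",
    new_param="setting",
    param_change="$1",
)
interpreted = "{}?{}={}".format(*result)
print(interpreted)  # /application?setting=value
```

A sketch like this can help you check a regular expression and its back-references against sample URLs before you enter them in the web UI.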

Need a refresher on regular expressions? See Regular expression syntax, What are back-references?, and Cookbook regular expressions. You can also use the examples in this section, such as Example: URL interpreter for WordPress.

To create a URL interpreter

1.  Go to Auto Learn > Application Templates > URL Replacer.

To access this part of the web UI, your administrator’s account access profile must have Read and Write permission to items in the Autolearn Configuration category. For details, see Permissions.

2.  Click Create New.

3.  Configure these settings:

Name Type a unique name that can be referenced by other parts of the configuration. Do not use spaces or special characters. The maximum length is 35 characters.
Type

Select either Predefined or Custom-Defined.

4.  If you selected Predefined in Type, also configure this setting:

Application Type

Select one of the predefined URL interpreter plug-ins for well-known web applications:

  • JSP — Use the URL replacer designed for Java server pages (JSP) web applications, where parameters are often separated by semi-colons ( ; ).
  • OWA — Use the URL replacer designed for default URLs in Microsoft Outlook Web App (OWA), where user name and directory parameters are often embedded within the URL:

    (^/public/)(.*)

    (^/exchange/)([^/]+)/*(([^/]+)/(.*))*

5.  If you selected Custom-Defined in Type, configure these settings:

URL Path

Type a regular expression, such as (^/[^/]+)/(.*), matching all and only the URLs to which the URL replacer should apply. The maximum length is 255 characters.

The pattern does not require a slash ( / ). However, it must at least match URLs that begin with a slash as they appear in the HTTP header, such as /index.html. Do not include the domain name, such as www.example.com.

For examples, see Example: URL interpreter for WordPress.

To test the regular expression against sample text, click the >> (test) icon. This opens the Regular Expression Validator window, where you can fine-tune the expression (see Regular expression syntax, What are back-references?, and Cookbook regular expressions).

Note: If this URL replacer will be used sequentially in its set of URL replacers, instead of being mutually exclusive, this regular expression should match the URL produced by the previous interpreter, not the original URL from the request.

New URL

Type either a literal URL, such as /index.html, or a regular expression with a back-reference (such as $1) defining how the URL will be interpreted. The maximum length is 255 characters.

Note: Back-references can only refer to capture groups (parts of the expression surrounded with parentheses) within the same URL replacer. Back-references cannot refer to capture groups in other URL replacers.

Param Change Type either the parameter’s literal value, such as user1, or a back-reference (such as $0) defining how the value will be interpreted.
New Param

Type either the parameter’s literal name, such as username, or a back-reference (such as $2) defining how the parameter’s name will be interpreted in the auto-learning report. The maximum length is 255 characters.

Note: Back-references can only refer to capture groups (parts of the expression surrounded with parentheses) within the same URL replacer. Back-references cannot refer to capture groups in other URL replacers.

6.  Click OK.

7.  Group the URL replacers in an application policy (see Grouping URL interpreters).

8.  Select the application policy in one or more auto-learning profiles (see Configuring an auto-learning profile).

9.  Select the auto-learning profiles in server policies (see Configuring a server policy).

Example: URL interpreter for a JSP application

The HTTP request URL from a client is:

/app/login.jsp;jsessionid=xxx;p1=111;p2=123?p3=5555&p4=66aaaaa

which uses semi-colons ( ; ) as parameter separators in the URL, a behavior typical of JSP applications. To have auto-learning recognize those parameters, you would create a URL replacer that treats the semi-colons as parameter separators.

Example: URL replacer for JSP applications
Setting name Value
Type Predefined
Application Type JSP

The predefined JSP interpreter plug-in will interpret the URL as:

/app/login.jsp?p4=66aaaaa&p1=111&p2=123&p3=5555
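To see which parameters such a URL carries, the following Python sketch separates the semi-colon-delimited parameters from the path and merges them with the conventional query-string parameters. It is not the predefined JSP plug-in: the function name interpret_jsp_url is hypothetical, and skipping jsessionid is an assumption based only on its absence from the interpreted URL above. The sketch also does not try to reproduce the parameter order shown above.

```python
from urllib.parse import urlsplit, parse_qsl

def interpret_jsp_url(raw_url, ignored=("jsessionid",)):
    """Hypothetical helper: collect ';' path parameters and '?' query parameters."""
    parts = urlsplit(raw_url)  # urlsplit leaves ';' parameters in the path
    path, *path_params = parts.path.split(";")
    params = {}
    for item in path_params:
        name, _, value = item.partition("=")
        if name not in ignored:  # assumption: session IDs are not learned
            params[name] = value
    params.update(parse_qsl(parts.query))
    return path, params

REQUEST = "/app/login.jsp;jsessionid=xxx;p1=111;p2=123?p3=5555&p4=66aaaaa"
path, params = interpret_jsp_url(REQUEST)
print(path)    # /app/login.jsp
print(params)  # {'p1': '111', 'p2': '123', 'p3': '5555', 'p4': '66aaaaa'}
```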

Example: URL interpreter for Microsoft Outlook Web App 2007

When a client sends requests to Microsoft Outlook Web App (OWA), many of its URLs use structures like this:

/exchange/tom/index.html

/exchange/jane.doe/memo.EML

/exchange/qinlu/2012/1.html

These have user name parameters embedded in the URL. In order for auto-learning to recognize the parameters, you must either:

A custom URL replacer for those URLs could look like this:

Example: URL replacer for Microsoft Outlook Web App — User name structure #1

URL interpreter
Setting name Value
Name OWAusername1
Type Custom-Defined
URL Path (/exchange/)([^/]+)/(.*)
New URL $0$2
Param Change $1
New Param username1

Then the URLs would be recognized by auto-learning as if OWA used a more conventional parameter structure like this:

/exchange/index.html?username1=tom

/exchange/memo.EML?username1=jane.doe

/exchange/2012/1.html?username1=qinlu
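The same result can be checked with a small Python sketch that applies the OWAusername1 settings to the three sample URLs. It is an approximation for testing expressions offline, not FortiWeb's own code: the helper name apply_replacer is hypothetical, and the only behavior taken from this section is that $0, $1, and $2 refer to the first, second, and third capture groups.

```python
import re

def apply_replacer(url, url_path, new_url, new_param, param_change):
    """Hypothetical emulation of one URL replacer; returns None on no match."""
    match = re.search(url_path, url)  # assumption: unanchored search
    if match is None:
        return None
    groups = match.groups()

    def expand(template):
        # $0 is the first capture group, following this section's examples.
        return re.sub(r"\$(\d+)", lambda m: groups[int(m.group(1))] or "", template)

    return "{}?{}={}".format(expand(new_url), expand(new_param), expand(param_change))

SAMPLES = [
    "/exchange/tom/index.html",
    "/exchange/jane.doe/memo.EML",
    "/exchange/qinlu/2012/1.html",
]

# Settings copied from the OWAusername1 example above.
results = [
    apply_replacer(url, r"(/exchange/)([^/]+)/(.*)", "$0$2", "username1", "$1")
    for url in SAMPLES
]
for line in results:
    print(line)
```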

Notably, OWA can also include other parameters in the URL, such as a mail folder’s name. Also, OWA can include the user name and folder in more than one way. Therefore multiple URL interpreters are required to match all possible URL structures. In addition to the first URL replacer, you would also configure the following URL replacers and group them into a single set (an auto-learning “application policy”) in order to recognize all possible URLs.

Example: URL replacer for Microsoft Outlook Web App — Folder name structure #1

Sample URL /exchange/archive-folders/2011
URL interpreter
Setting name Value
Name OWAfoldername1
Type Custom-Defined
URL Path (/exchange/)([^/]+/)(.*)
New URL $0
Param Change $1$2
New Param folder1
Results /exchange/?folder1=archive-folders/2011
Example: URL replacer for Microsoft Outlook Web App — User name structure #2

Sample URL /exchange/jane.doe
URL interpreter
Setting name Value
Name OWAusername2
Type Custom-Defined
URL Path (/exchange/)([^/]+\.[^/]+)
New URL $0
Param Change $1
New Param username2
Results /exchange/?username2=jane.doe
Example: URL replacer for Microsoft Outlook Web App — Folder name structure #2

Sample URL /public/imap-share-folders/memos
URL interpreter
Setting name Value
Name OWAfoldername2
Type Custom-Defined
URL Path (/public/)([^/]+/)(.*)
New URL $0
Param Change $1$2
New Param folder2
Results /public/?folder2=imap-share-folders/memos
Example: URL interpreter for WordPress

If the HTTP request URL from a client is a slash-delimited chain of multiple parameters, like either of these:

/wordpress/2012/06/05

/index/province/ontario/city/ottawa/street/moodie

then the format is either of these:

/wordpress/value1/value2/value3

/index/param1/value1/param2/value2/param3/value3

In this URL format, there are 3 parameter values (with or without their names) in the URL:

Because each interpreter can only extract a single parameter, you would create 3 URL interpreters, and group them into a set where they are used sequentially — a chain. Each interpreter would use the interpreted output of the previous one as its input, until all parameters had been extracted, at which point the last interpreter would output both the last parameter and the final interpreted URL. FortiWeb would then append parameters back onto the interpreted URL in the standard structure before storing them in the auto-learning data set.

Analysis of a request URL into its interpretation by a chain of URL interpreters

This configuration requires that for every request:

  • the web application includes parameters in the same sequential order, and
  • all parameters are always present

If parameter order or existence varies, this URL interpreter will not work. Requests will not match the URL interpreter set if either param2 or param3 comes first, or if any of the parameters are missing. At the opposite extreme, if the URL interpreter used regular expression capture groups such as (.*) to match anything in any order, for example:

/index/(.*)/(.*)/(.*)/(.*)/(.*)/(.*)/

then the regular expression would be too flexible: auto-learning might mistakenly match and learn some of param3’s possible values for param2, and so on.
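The trade-off between a strict and an overly flexible expression can be illustrated with a short Python sketch. The parameter names province, city, and street come from the sample URL earlier in this example. The reordered request, the trailing slash, and the positional mapping of captured values to names are assumptions made for illustration only.

```python
import re

# Strict: parameter names are literal, so order and presence are enforced.
STRICT = r"/index/province/(.*)/city/(.*)/street/(.*)/"
# Loose: every segment is a wildcard, so any six segments match.
LOOSE = r"/index/(.*)/(.*)/(.*)/(.*)/(.*)/(.*)/"

# Hypothetical request where the application sent city before province.
reordered = "/index/city/ottawa/province/ontario/street/moodie/"

strict_match = re.match(STRICT, reordered)
print(strict_match)  # None: the strict pattern rejects the reordered URL

loose_groups = re.match(LOOSE, reordered).groups()
# Assumption: values are assigned by position (2nd, 4th, 6th segment).
mislearned = {
    "province": loose_groups[1],
    "city": loose_groups[3],
    "street": loose_groups[5],
}
print(mislearned)  # {'province': 'ottawa', 'city': 'ontario', 'street': 'moodie'}
```

In this sketch the loose expression matches, but it records a city name as a province value, which is the kind of incorrect learning described above.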

Example: URL replacer 1 for slash-separated parameters

Setting name Value
Name slash-parameter3
Type Custom-Defined
URL Path /index/param1/(.*)/param2/(.*)/param3/(.*)/
New URL /index/param1/$0/param2/$1/
Param Change $2
New Param param3
Example: URL replacer 2 for slash-separated parameters

Setting name Value
Name slash-parameter2
Type Custom-Defined
URL Path /index/param1/(.*)/param2/(.*)/
New URL /index/param1/$0/
Param Change $1
New Param param2
Example: URL replacer 3 for slash-separated parameters

Setting name Value
Name slash-parameter1
Type Custom-Defined
URL Path /index/param1/(.*)/
New URL /index
Param Change $0
New Param param1

Until you add the URL interpreters to a group, FortiWeb doesn’t know the sequential order.

These URL interpreters will not function correctly if they are not used in that order, because each interpreter’s input is the output from the previous one. So you must set the priorities correctly when referencing each of those interpreters in the set of URL interpreters (Grouping URL interpreters).

Example: URL replacer group for slash-separated parameters — entry 1
Setting name Value
Priority 0
Type URL REPLACER
Plugin Name slash-parameter3
Example: URL replacer group for slash-separated parameters — entry 2
Setting name Value
Priority 1
Type URL REPLACER
Plugin Name slash-parameter2
Example: URL replacer group for slash-separated parameters — entry 3
Setting name Value
Priority 2
Type URL REPLACER
Plugin Name slash-parameter1

Then the URL will be interpreted by auto-learning as if the application used a more conventional and easily understood URL/parameter structure:

/index?param1=value1&param2=value2&param3=value3
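The chain can be traced step by step with the following Python sketch. It is a rough model rather than FortiWeb's implementation: the helper names apply_replacer and interpret are hypothetical, the sample request assumes a trailing slash (which the URL Path expressions above require), and sorting the parameter names for display is an assumption made to mirror the result shown above.

```python
import re

def apply_replacer(url, url_path, new_url, new_param, param_change):
    """Hypothetical emulation of one URL replacer; returns None on no match."""
    match = re.search(url_path, url)  # assumption: unanchored search
    if match is None:
        return None
    groups = match.groups()

    def expand(template):
        # $0 is the first capture group, following this section's examples.
        return re.sub(r"\$(\d+)", lambda m: groups[int(m.group(1))] or "", template)

    return expand(new_url), expand(new_param), expand(param_change)

# (priority, URL Path, New URL, New Param, Param Change), from the tables above.
CHAIN = [
    (0, r"/index/param1/(.*)/param2/(.*)/param3/(.*)/", "/index/param1/$0/param2/$1/", "param3", "$2"),
    (1, r"/index/param1/(.*)/param2/(.*)/", "/index/param1/$0/", "param2", "$1"),
    (2, r"/index/param1/(.*)/", "/index", "param1", "$0"),
]

def interpret(url, chain):
    """Apply replacers in priority order, feeding each output to the next."""
    params = {}
    for _, url_path, new_url, new_param, param_change in sorted(chain):
        result = apply_replacer(url, url_path, new_url, new_param, param_change)
        if result is None:
            continue  # this replacer does not apply; try the next one
        url, name, value = result
        params[name] = value
    return url, params

url, params = interpret("/index/param1/value1/param2/value2/param3/value3/", CHAIN)
final = url + "?" + "&".join(f"{k}={params[k]}" for k in sorted(params))
print(final)  # /index?param1=value1&param2=value2&param3=value3
```

If the priorities in this sketch are reversed, the broad param1 expression matches first and captures all remaining path segments as a single value, which illustrates why the order of entries in the group matters.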

Grouping URL interpreters

In order to use URL interpreters with an auto-learning profile, you must group URL replacers into sets.

Sets can be:

To create a custom application policy

1.  Before you create an application policy, first create the URL replacers that it will include (see Configuring URL interpreters).

2.  Go to Auto Learn > Application Templates > Application Policy.

To access this part of the web UI, your administrator’s account access profile must have Read and Write permission to items in the Autolearn Configuration category. For details, see Permissions.

3.  Click Create New.

A dialog appears.

4.  In Name, type a name that can be referenced by other parts of the configuration. Do not use spaces or special characters. The maximum length is 35 characters.

5.  Click OK.

6.  Click Create New.

A dialog appears.


7.  From Plugin Name, select an existing URL replacer from the drop-down list.

Rule order affects URL replacer matching and behavior. FortiWeb appliances evaluate URLs for a matching URL replacer starting with the smallest ID number (greatest priority) rule in the list, and continue towards the largest number in the list.

  • If no rule matches, parameters in the URL will not be interpreted.
  • If multiple rules match, the output (New URL) from earlier URL replacers will be used as the input (URL Path) for the next URL replacer, resulting in a chain of multiple interpreted parameters.

8.  Click OK.

9.  Repeat the previous steps for each URL replacer you want added to the policy.

10.  Select the application policy in an auto-learning profile (see Configuring an auto-learning profile).

11.  Select the auto-learning profiles in server policies (see Configuring a server policy).
