{"id":1059,"date":"2014-08-15T21:39:23","date_gmt":"2014-08-15T16:09:23","guid":{"rendered":"http:\/\/www.allerin.com\/blog\/?p=1059"},"modified":"2016-05-12T14:51:27","modified_gmt":"2016-05-12T09:21:27","slug":"unicode-supported-regular-expressions-validations","status":"publish","type":"post","link":"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/","title":{"rendered":"Unicode Supported Regular Expressions &amp; Validations"},"content":{"rendered":"<p>In any application, validations play an important role in preventing invalid data from being saved to the database.<br \/>\nThere are several ways to validate data before it is saved: native database constraints, client-side validations, controller-level validations, or model-level validations.<br \/>\nHere are the pros and cons of these alternatives:<br \/>\nDatabase constraints &#8211; Good for cases like uniqueness, but difficult to test and maintain.<br \/>\nClient-side validations &#8211; Must be backed by server-side validations, because they can be bypassed.<br \/>\nController-level validations &#8211; Controllers should stay skinny whenever possible, so avoid this.<br \/>\nModel-level validations &#8211; This is the best way to ensure that only valid data is saved into the database. 
They are database agnostic, cannot be bypassed by end users, and are convenient to test and maintain.<\/p>\n<p>Rails provides built-in helpers for common needs, which makes validations easy. It also lets you write your own validation methods.<br \/>\nWhile working on a Ruby on Rails application with internationalization (I18n) support, I ran into a situation where I had to validate the format of data with a regular expression.<br \/>\nThe regular expressions we typically write only match English (ASCII) characters.<\/p>\n<p>For example:<\/p>\n<p>If I want to validate that a person&#8217;s name contains only letters, I would use the following:<\/p>\n<p>&lt;pre&gt;<\/p>\n<p>class Person &lt; ActiveRecord::Base<\/p>\n<p>validates :name, format: { with: \/\\A[a-zA-Z]+\\z\/ }<\/p>\n<p>end<\/p>\n<p>&lt;\/pre&gt;<\/p>\n<p>This validation works well as long as we only handle English data.<\/p>\n<p>The problem appears when we want to store data in other languages, because the pattern only allows a-z and A-Z.<\/p>\n<p>&nbsp;<\/p>\n<p>Here is a custom solution that we can apply in a Ruby on Rails application.<\/p>\n<ol>\n<li>I created a class as follows:<\/li>\n<\/ol>\n<p>&lt;pre&gt;<\/p>\n<p>class AppRegexp<\/p>\n<p>&nbsp;<\/p>\n<p>end<\/p>\n<p>&lt;\/pre&gt;<\/p>\n<ol start=\"2\">\n<li>I collected all the regular expressions used in the application in this class.<\/li>\n<\/ol>\n<p>With this, all the regular expressions now live in one place. 
That makes them easier to maintain.<\/p>\n<p>&lt;pre&gt;<\/p>\n<p>class AppRegexp<\/p>\n<p>class &lt;&lt; self<\/p>\n<p>&nbsp;<\/p>\n<p>regexps = {<\/p>\n<p>:name =&gt; Regexp.new(\/\\A[a-zA-Z]+\\z\/),<\/p>\n<p>:email =&gt; Regexp.new(\/^((\\&quot;[^\\&quot;\\f\\n\\r\\t\\v\\b]+\\&quot;)|([\\w\\!\\#\\$\\%\\&amp;\\'\\*\\+\\-\\~\\\/\\^\\`\\|\\{\\}]+(\\.[\\w\\!\\#\\$\\%\\&amp;\\'\\*\\+\\-\\~\\\/\\^\\`\\|\\{\\}]+)*))@((\\[(((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))\\])|(((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))|((([A-Za-z0-9\\-])+\\.)+[A-Za-z\\-]+))$\/),<\/p>\n<p>:password =&gt; Regexp.new(\/^(\\S(?=.*\\d)+|\\d(?=.*\\d)*)\\S*$\/),<\/p>\n<p>:country_name =&gt; Regexp.new(\/^[a-zA-Z]+([\\.\\-\\s]?[a-zA-Z]+)*$\/),<\/p>\n<p>:locale =&gt; Regexp.new(\/^[a-z]{2}([-]{0}|[-]{1}[A-Z]{2})$\/),<\/p>\n<p>:currency_code =&gt; Regexp.new(\/^[A-Z]+$\/),<\/p>\n<p>:zip =&gt; Regexp.new(\/^[a-zA-Z0-9]+([\\-\\s][a-zA-Z0-9]+)*$\/),<\/p>\n<p>:phone =&gt; Regexp.new(\/^[+#\\(]?([\\s]?[0-9])+[\\)]{0,1}([\\s]?[\\-#xX\\(\\)\\.)]?[0-9]+)*$\/),<\/p>\n<p>:mobile =&gt; Regexp.new(\/^[+#]?([\\s]?[0-9])+([\\s]?[\\-#xX\\(\\)\\.)]?[0-9]+)*$\/)<\/p>\n<p>}<\/p>\n<p>&nbsp;<\/p>\n<p>regexps.each_pair do |attribute, regex|<\/p>\n<p>&nbsp;<\/p>\n<p>define_method attribute do<\/p>\n<p>regex<\/p>\n<p>end<\/p>\n<p>&nbsp;<\/p>\n<p>end<\/p>\n<p>&nbsp;<\/p>\n<p>end<\/p>\n<p>&nbsp;<\/p>\n<p>end<\/p>\n<p>&lt;\/pre&gt;<\/p>\n<ol start=\"3\">\n<li>I then used these regular expressions from AppRegexp like this:<\/li>\n<\/ol>\n<p>&lt;pre&gt;<\/p>\n<p>class Person &lt; ActiveRecord::Base<\/p>\n<p>validates :name, format: { with: AppRegexp.name }<\/p>\n<p>end<\/p>\n<p>&lt;\/pre&gt;<\/p>\n<ol start=\"4\">\n<li>When we work with only 
English characters, we mostly apply an inclusion rule.<\/li>\n<\/ol>\n<p>That is, characters must be in A-Z or a-z. When working with Unicode, we need to apply an exclusion rule instead.<\/p>\n<p>In set terms, there are three sets: letters, digits, and special characters.<\/p>\n<p>When only letters are allowed, we need to exclude digits and special characters.<\/p>\n<p>When letters and digits are allowed, we need the union of the letter and digit sets.<\/p>\n<p>Put another way, we need to exclude special characters from the universal set.<\/p>\n<ol start=\"5\">\n<li>Looking at all the possible cases for regular expressions, we can broadly categorize the data as follows:\n<ul>\n<li>only_characters_allowed: excludes all special characters and digits<\/li>\n<li>characters_and_numbers_allowed: excludes all special characters<\/li>\n<li>only_special_characters_not_allowed: rejects strings made up only of special characters; all other combinations are allowed<\/li>\n<li>custom: any other regular expression for a custom requirement<\/li>\n<\/ul>\n<\/li>\n<li>To construct such regular expressions with Unicode support, I defined some constants to be used in the regular expressions.<\/li>\n<\/ol>\n<p>&lt;pre&gt;<\/p>\n<p>class AppRegexp<\/p>\n<p>&nbsp;<\/p>\n<p># Returns string of Special Characters to be used in Regexp<\/p>\n<p>SPECIAL_CHARACTERS = [<\/p>\n<p>'\\\\~', '\\\\!', '\\\\@', '\\\\#', '\\\\$', '\\\\%', '\\\\^', '\\\\&amp;', '\\\\*', '\\\\(', '\\\\)',<\/p>\n<p>'\\\\_', '\\\\+', '\\\\-', '\\\\=', '\\\\|', '\\\\{', '\\\\}', '\\\\[', '\\\\]', '\\\\:', '\\\\;',<\/p>\n<p>'\\\\&quot;', '\\\\&lt;', '\\\\&gt;', '\\\\.', '\\\\?', 
'\\\\\/',<\/p>\n<p>&quot;\\\\\\\\\\s&quot; # Backslash &amp; space character are purposely kept together in a string<\/p>\n<p>].join<\/p>\n<p>&nbsp;<\/p>\n<p># Returns string of All the Digits of Unicode Character set<\/p>\n<p># Following list is taken from http:\/\/www.fileformat.info\/info\/unicode\/category\/Nd\/list.htm<\/p>\n<p>DIGITS = [<\/p>\n<p>(0x0030..0x0039).to_a,\u00a0\u00a0 # DIGIT ZERO to NINE<\/p>\n<p>(0x0660..0x0669).to_a,\u00a0\u00a0 # ARABIC-INDIC DIGIT ZERO to NINE<\/p>\n<p>(0x06F0..0x06F9).to_a,\u00a0\u00a0 # EXTENDED ARABIC-INDIC DIGIT ZERO to NINE<\/p>\n<p>(0x07C0..0x07C9).to_a,\u00a0\u00a0 # NKO DIGIT ZERO to NINE<\/p>\n<p>(0x0966..0x096F).to_a,\u00a0\u00a0 # DEVANAGARI DIGIT ZERO to NINE<\/p>\n<p>(0x09E6..0x09EF).to_a,\u00a0\u00a0 # BENGALI DIGIT ZERO to NINE<\/p>\n<p>(0x0A66..0x0A6F).to_a,\u00a0\u00a0 # GURMUKHI DIGIT ZERO to NINE<\/p>\n<p>(0x0AE6..0x0AEF).to_a,\u00a0\u00a0 # GUJARATI DIGIT ZERO to NINE<\/p>\n<p>(0x0B66..0x0B6F).to_a,\u00a0\u00a0 # ORIYA DIGIT ZERO to NINE<\/p>\n<p>(0x0BE6..0x0BEF).to_a,\u00a0\u00a0 # TAMIL DIGIT ZERO to NINE<\/p>\n<p>(0x0C66..0x0C6F).to_a,\u00a0\u00a0 # TELUGU DIGIT ZERO to NINE<\/p>\n<p>(0x0CE6..0x0CEF).to_a,\u00a0\u00a0 # KANNADA DIGIT ZERO to NINE<\/p>\n<p>(0x0D66..0x0D6F).to_a,\u00a0\u00a0 # MALAYALAM DIGIT ZERO to NINE<\/p>\n<p>(0x0E50..0x0E59).to_a,\u00a0\u00a0 # THAI DIGIT ZERO to NINE<\/p>\n<p>(0x0ED0..0x0ED9).to_a,\u00a0\u00a0 # LAO DIGIT ZERO to NINE<\/p>\n<p>(0x0F20..0x0F29).to_a,\u00a0\u00a0 # TIBETAN DIGIT ZERO to NINE<\/p>\n<p>(0x1090..0x1099).to_a,\u00a0\u00a0 # MYANMAR SHAN DIGIT ZERO to NINE<\/p>\n<p>(0x17E0..0x17E9).to_a,\u00a0\u00a0 # KHMER DIGIT ZERO to NINE<\/p>\n<p>(0x1810..0x1819).to_a,\u00a0\u00a0 # MONGOLIAN DIGIT ZERO to NINE<\/p>\n<p>(0x1946..0x194F).to_a,\u00a0\u00a0 # LIMBU DIGIT ZERO to NINE<\/p>\n<p>(0x19D0..0x19D9).to_a,\u00a0\u00a0 # NEW TAI LUE DIGIT ZERO to NINE<\/p>\n<p>(0x1A80..0x1A99).to_a,\u00a0\u00a0 # TAI THAM HORA DIGIT ZERO to 
NINE<\/p>\n<p>(0x1B50..0x1B59).to_a,\u00a0\u00a0 # BALINESE DIGIT ZERO to NINE<\/p>\n<p>(0x1BB0..0x1BB9).to_a,\u00a0\u00a0 # SUNDANESE DIGIT ZERO to NINE<\/p>\n<p>(0x1C40..0x1C49).to_a, \u00a0# LEPCHA DIGIT ZERO to NINE<\/p>\n<p>(0x1C50..0x1C59).to_a,\u00a0\u00a0 # OL CHIKI DIGIT ZERO to NINE<\/p>\n<p>(0xA620..0xA629).to_a,\u00a0\u00a0 # VAI DIGIT ZERO to NINE<\/p>\n<p>(0xA8D0..0xA8D9).to_a,\u00a0\u00a0 # SAURASHTRA DIGIT ZERO to NINE<\/p>\n<p>(0xA900..0xA909).to_a,\u00a0\u00a0 # KAYAH LI DIGIT ZERO to NINE<\/p>\n<p>(0xA9D0..0xA9D9).to_a,\u00a0\u00a0 # JAVANESE DIGIT ZERO to NINE<\/p>\n<p>(0xAA50..0xAA59).to_a,\u00a0\u00a0 # CHAM DIGIT ZERO to NINE<\/p>\n<p>(0xABF0..0xABF9).to_a,\u00a0\u00a0 # MEETEI MAYEK DIGIT ZERO to NINE<\/p>\n<p>(0xFF10..0xFF19).to_a,\u00a0\u00a0 # FULLWIDTH DIGIT ZERO to NINE<\/p>\n<p>(0x104A0..0x104A9).to_a, # OSMANYA DIGIT ZERO to NINE<\/p>\n<p>(0x11066..0x1106F).to_a, # BRAHMI DIGIT ZERO to NINE<\/p>\n<p>(0x110F0..0x110F9).to_a, # SORA SOMPENG DIGIT ZERO to NINE<\/p>\n<p>(0x11136..0x1113F).to_a, # CHAKMA DIGIT ZERO to NINE<\/p>\n<p>(0x111D0..0x111D9).to_a, # SHARADA DIGIT ZERO to NINE<\/p>\n<p>(0x116C0..0x116C9).to_a, # TAKRI DIGIT ZERO to NINE<\/p>\n<p>(0x1D7CE..0x1D7D7).to_a, # MATHEMATICAL BOLD DIGIT ZERO to NINE<\/p>\n<p>(0x1D7D8..0x1D7E1).to_a, # MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO to NINE<\/p>\n<p>(0x1D7E2..0x1D7EB).to_a, # MATHEMATICAL SANS-SERIF DIGIT ZERO to NINE<\/p>\n<p>(0x1D7EC..0x1D7F5).to_a, # MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO to NINE<\/p>\n<p>(0x1D7F6..0x1D7FF).to_a # MATHEMATICAL MONOSPACE DIGIT ZERO to NINE<\/p>\n<p>].flatten.map(&amp;:chr).join<\/p>\n<p>&nbsp;<\/p>\n<p>end<\/p>\n<p>&lt;\/pre&gt;<\/p>\n<ol start=\"7\">\n<li>Now I defined following general regular expressions which could be used to validate Unicode data.<\/li>\n<\/ol>\n<p>&lt;pre&gt;<\/p>\n<p>class AppRegexp<\/p>\n<p>class &lt;&lt; self<\/p>\n<p>&nbsp;<\/p>\n<p>regexps = {<\/p>\n<p>:only_characters_allowed =&gt; 
Regexp.new(\/^([^#{SPECIAL_CHARACTERS}#{DIGITS}]+\\s{0,1})+$\/), # Excludes all special characters &amp; digits<\/p>\n<p>:characters_and_numbers_allowed =&gt; Regexp.new(\/^([^#{SPECIAL_CHARACTERS}]+\\s{0,1})+$\/), # Excludes all special characters<\/p>\n<p>:only_special_characters_not_allowed =&gt; Regexp.new(\/^([^#{SPECIAL_CHARACTERS}]+(.)*)+$\/), # Rejects strings made up only of special characters; all other combinations are allowed<\/p>\n<p>}<\/p>\n<p>&nbsp;<\/p>\n<p>regexps.each_pair do |attribute, regex|<\/p>\n<p>&nbsp;<\/p>\n<p>define_method attribute do<\/p>\n<p>regex<\/p>\n<p>end<\/p>\n<p>&nbsp;<\/p>\n<p>end<\/p>\n<p>&nbsp;<\/p>\n<p>end<\/p>\n<p>&nbsp;<\/p>\n<p>end<\/p>\n<p>&lt;\/pre&gt;<\/p>\n<ol start=\"8\">\n<li>Our custom Unicode-aware regular expressions are now ready and can be used to validate Unicode data.<\/li>\n<\/ol>\n<p>For example:<\/p>\n<p>&lt;pre&gt;<\/p>\n<p>class Person &lt; ActiveRecord::Base<\/p>\n<p>validates :name, format: { with: AppRegexp.only_characters_allowed }<\/p>\n<p>end<\/p>\n<p>&lt;\/pre&gt;<\/p>\n<ol start=\"9\">\n<li>The complete code is as follows:<\/li>\n<\/ol>\n<p>&lt;pre&gt;<\/p>\n<p>class AppRegexp<\/p>\n<p>&nbsp;<\/p>\n<p># Returns string of Special Characters to be used in Regexp<\/p>\n<p>SPECIAL_CHARACTERS = [<\/p>\n<p>'\\\\~', '\\\\!', '\\\\@', '\\\\#', '\\\\$', '\\\\%', '\\\\^', '\\\\&amp;', '\\\\*', '\\\\(', '\\\\)',<\/p>\n<p>'\\\\_', '\\\\+', '\\\\-', '\\\\=', '\\\\|', '\\\\{', '\\\\}', '\\\\[', '\\\\]', '\\\\:', '\\\\;',<\/p>\n<p>'\\\\&quot;', '\\\\&lt;', '\\\\&gt;', '\\\\.', 
'\\\\?', '\\\\\/',<\/p>\n<p>&quot;\\\\\\\\\\s&quot; # Backslash &amp; space character are purposely kept together in a string<\/p>\n<p>].join<\/p>\n<p>&nbsp;<\/p>\n<p># Returns string of All the Digits of Unicode Character set<\/p>\n<p># Following list is taken from http:\/\/www.fileformat.info\/info\/unicode\/category\/Nd\/list.htm<\/p>\n<p>DIGITS = [<\/p>\n<p>(0x0030..0x0039).to_a,\u00a0\u00a0 # DIGIT ZERO to NINE<\/p>\n<p>(0x0660..0x0669).to_a,\u00a0\u00a0 # ARABIC-INDIC DIGIT ZERO to NINE<\/p>\n<p>(0x06F0..0x06F9).to_a,\u00a0\u00a0 # EXTENDED ARABIC-INDIC DIGIT ZERO to NINE<\/p>\n<p>(0x07C0..0x07C9).to_a, # NKO DIGIT ZERO to NINE<\/p>\n<p>(0x0966..0x096F).to_a,\u00a0\u00a0 # DEVANAGARI DIGIT ZERO to NINE<\/p>\n<p>(0x09E6..0x09EF).to_a,\u00a0\u00a0 # BENGALI DIGIT ZERO to NINE<\/p>\n<p>(0x0A66..0x0A6F).to_a,\u00a0\u00a0 # GURMUKHI DIGIT ZERO to NINE<\/p>\n<p>(0x0AE6..0x0AEF).to_a,\u00a0\u00a0 # GUJARATI DIGIT ZERO to NINE<\/p>\n<p>(0x0B66..0x0B6F).to_a,\u00a0\u00a0 # ORIYA DIGIT ZERO to NINE<\/p>\n<p>(0x0BE6..0x0BEF).to_a,\u00a0\u00a0 # TAMIL DIGIT ZERO to NINE<\/p>\n<p>(0x0C66..0x0C6F).to_a,\u00a0\u00a0 # TELUGU DIGIT ZERO to NINE<\/p>\n<p>(0x0CE6..0x0CEF).to_a,\u00a0\u00a0 # KANNADA DIGIT ZERO to NINE<\/p>\n<p>(0x0D66..0x0D6F).to_a,\u00a0\u00a0 # MALAYALAM DIGIT ZERO to NINE<\/p>\n<p>(0x0E50..0x0E59).to_a,\u00a0\u00a0 # THAI DIGIT ZERO to NINE<\/p>\n<p>(0x0ED0..0x0ED9).to_a,\u00a0\u00a0 # LAO DIGIT ZERO to NINE<\/p>\n<p>(0x0F20..0x0F29).to_a,\u00a0\u00a0 # TIBETAN DIGIT ZERO to NINE<\/p>\n<p>(0x1090..0x1099).to_a,\u00a0\u00a0 # MYANMAR SHAN DIGIT ZERO to NINE<\/p>\n<p>(0x17E0..0x17E9).to_a,\u00a0\u00a0 # KHMER DIGIT ZERO to NINE<\/p>\n<p>(0x1810..0x1819).to_a,\u00a0\u00a0 # MONGOLIAN DIGIT ZERO to NINE<\/p>\n<p>(0x1946..0x194F).to_a,\u00a0\u00a0 # LIMBU DIGIT ZERO to NINE<\/p>\n<p>(0x19D0..0x19D9).to_a,\u00a0\u00a0 # NEW TAI LUE DIGIT ZERO to NINE<\/p>\n<p>(0x1A80..0x1A99).to_a,\u00a0\u00a0 # TAI THAM HORA 
DIGIT ZERO to NINE<\/p>\n<p>(0x1B50..0x1B59).to_a,\u00a0\u00a0 # BALINESE DIGIT ZERO to NINE<\/p>\n<p>(0x1BB0..0x1BB9).to_a,\u00a0\u00a0 # SUNDANESE DIGIT ZERO to NINE<\/p>\n<p>(0x1C40..0x1C49).to_a,\u00a0\u00a0 # LEPCHA DIGIT ZERO to NINE<\/p>\n<p>(0x1C50..0x1C59).to_a,\u00a0\u00a0 # OL CHIKI DIGIT ZERO to NINE<\/p>\n<p>(0xA620..0xA629).to_a,\u00a0\u00a0 # VAI DIGIT ZERO to NINE<\/p>\n<p>(0xA8D0..0xA8D9).to_a,\u00a0\u00a0 # SAURASHTRA DIGIT ZERO to NINE<\/p>\n<p>(0xA900..0xA909).to_a,\u00a0\u00a0 # KAYAH LI DIGIT ZERO to NINE<\/p>\n<p>(0xA9D0..0xA9D9).to_a,\u00a0\u00a0 # JAVANESE DIGIT ZERO to NINE<\/p>\n<p>(0xAA50..0xAA59).to_a,\u00a0\u00a0 # CHAM DIGIT ZERO to NINE<\/p>\n<p>(0xABF0..0xABF9).to_a,\u00a0\u00a0 # MEETEI MAYEK DIGIT ZERO to NINE<\/p>\n<p>(0xFF10..0xFF19).to_a,\u00a0\u00a0 # FULLWIDTH DIGIT ZERO to NINE<\/p>\n<p>(0x104A0..0x104A9).to_a, # OSMANYA DIGIT ZERO to NINE<\/p>\n<p>(0x11066..0x1106F).to_a, # BRAHMI DIGIT ZERO to NINE<\/p>\n<p>(0x110F0..0x110F9).to_a, # SORA SOMPENG DIGIT ZERO to NINE<\/p>\n<p>(0x11136..0x1113F).to_a, # CHAKMA DIGIT ZERO to NINE<\/p>\n<p>(0x111D0..0x111D9).to_a, # SHARADA DIGIT ZERO to NINE<\/p>\n<p>(0x116C0..0x116C9).to_a, # TAKRI DIGIT ZERO to NINE<\/p>\n<p>(0x1D7CE..0x1D7D7).to_a, # MATHEMATICAL BOLD DIGIT ZERO to NINE<\/p>\n<p>(0x1D7D8..0x1D7E1).to_a, # MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO to NINE<\/p>\n<p>(0x1D7E2..0x1D7EB).to_a, # MATHEMATICAL SANS-SERIF DIGIT ZERO to NINE<\/p>\n<p>(0x1D7EC..0x1D7F5).to_a, # MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO to NINE<\/p>\n<p>(0x1D7F6..0x1D7FF).to_a # MATHEMATICAL MONOSPACE DIGIT ZERO to NINE<\/p>\n<p>].flatten.map(&amp;:chr).join<\/p>\n<p>class &lt;&lt; self<\/p>\n<p>&nbsp;<\/p>\n<p>regexps = {<\/p>\n<p>:only_characters_allowed =&gt; Regexp.new(\/^([^#{SPECIAL_CHARACTERS}#{DIGITS}]+\\s{0,1})+$\/), # Excludes all special characters &amp; digits<\/p>\n<p>:characters_and_numbers_allowed =&gt; Regexp.new(\/^([^#{SPECIAL_CHARACTERS}]+\\s{0,1})+$\/),\u00a0\u00a0 # Excludes all special 
characters<\/p>\n<p>:only_special_characters_not_allowed =&gt; Regexp.new(\/^([^#{SPECIAL_CHARACTERS}]+(.)*)+$\/), # Rejects strings made up only of special characters; all other combinations are allowed<\/p>\n<p># Other regular expressions for custom requirements<\/p>\n<p>:email =&gt; Regexp.new(\/^((\\&quot;[^\\&quot;\\f\\n\\r\\t\\v\\b]+\\&quot;)|([\\w\\!\\#\\$\\%\\&amp;\\'\\*\\+\\-\\~\\\/\\^\\`\\|\\{\\}]+(\\.[\\w\\!\\#\\$\\%\\&amp;\\'\\*\\+\\-\\~\\\/\\^\\`\\|\\{\\}]+)*))@((\\[(((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))\\])|(((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))|((([A-Za-z0-9\\-])+\\.)+[A-Za-z\\-]+))$\/),<\/p>\n<p>:password =&gt; Regexp.new(\/^(\\S(?=.*\\d)+|\\d(?=.*\\d)*)\\S*$\/),<\/p>\n<p>:country_name =&gt; Regexp.new(\/^[a-zA-Z]+([\\.\\-\\s]?[a-zA-Z]+)*$\/),<\/p>\n<p>:locale =&gt; Regexp.new(\/^[a-z]{2}([-]{0}|[-]{1}[A-Z]{2})$\/),<\/p>\n<p>:currency_code =&gt; Regexp.new(\/^[A-Z]+$\/),<\/p>\n<p>:zip =&gt; Regexp.new(\/^[a-zA-Z0-9]+([\\-\\s][a-zA-Z0-9]+)*$\/),<\/p>\n<p>:phone =&gt; Regexp.new(\/^[+#\\(]?([\\s]?[0-9])+[\\)]{0,1}([\\s]?[\\-#xX\\(\\)\\.)]?[0-9]+)*$\/),<\/p>\n<p>:mobile =&gt; Regexp.new(\/^[+#]?([\\s]?[0-9])+([\\s]?[\\-#xX\\(\\)\\.)]?[0-9]+)*$\/)<\/p>\n<p>}<\/p>\n<p>&nbsp;<\/p>\n<p>regexps.each_pair do |attribute, regex|<\/p>\n<p>&nbsp;<\/p>\n<p>define_method attribute do<\/p>\n<p>regex<\/p>\n<p>end<\/p>\n<p>&nbsp;<\/p>\n<p>end<\/p>\n<p>&nbsp;<\/p>\n<p>end<\/p>\n<p>&nbsp;<\/p>\n<p>end<\/p>\n<p>&lt;\/pre&gt;<\/p>\n<p>&nbsp;<\/p>\n<p>The validations above cover all Unicode characters.<\/p>\n<p>To restrict them to the characters of specific languages, change the sets of characters defined in the constants.<\/p>\n<p>These regular 
expressions are written to handle Unicode characters and can be adapted in the same way for digits.<\/p>\n<p>This solution targets Ruby on Rails applications, but the same technique can be used to build a variant for other languages.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In any application, validations play an important role in protecting application from invalid data to be saved into the database. There\u00a0 are several ways of validating data before it is&#8230;<\/p>\n","protected":false},"author":9192191,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[4,3],"tags":[20],"class_list":["post-1059","post","type-post","status-publish","format-standard","hentry","category-my-voice","category-technology","tag-ruby-on-rails"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.5 (Yoast SEO v27.6) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Unicode Supported Regular Expressions &amp; Validations<\/title>\n<meta name=\"description\" content=\"Here I talk about several methods for Unicode Supported Regular Expressions &amp; Validations in Ruby and Ruby on Rails programming framework\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Unicode Supported Regular Expressions &amp; Validations\" \/>\n<meta property=\"og:description\" content=\"In any application, validations play an important role in protecting application from invalid data to be saved into the database. 
There\u00a0 are several ways\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/\" \/>\n<meta property=\"og:site_name\" content=\"Artificial Intelligence, ROBOTICS, AUTOMATION\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/allerintech\" \/>\n<meta property=\"article:published_time\" content=\"2014-08-15T16:09:23+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2016-05-12T09:21:27+00:00\" \/>\n<meta name=\"author\" content=\"Temp User\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Temp User\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"12 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/unicode-supported-regular-expressions-validations\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/unicode-supported-regular-expressions-validations\\\/\"},\"author\":{\"name\":\"Temp User\",\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/#\\\/schema\\\/person\\\/1800305fd7a4b1da6e13bbb42425cf27\"},\"headline\":\"Unicode Supported Regular Expressions &amp; Validations\",\"datePublished\":\"2014-08-15T16:09:23+00:00\",\"dateModified\":\"2016-05-12T09:21:27+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/unicode-supported-regular-expressions-validations\\\/\"},\"wordCount\":2271,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/#organization\"},\"keywords\":[\"Ruby on Rails\"],\"articleSection\":[\"My 
Voice\",\"Technology\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.allerin.com\\\/blog\\\/unicode-supported-regular-expressions-validations\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/unicode-supported-regular-expressions-validations\\\/\",\"url\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/unicode-supported-regular-expressions-validations\\\/\",\"name\":\"Unicode Supported Regular Expressions &amp; Validations\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/#website\"},\"datePublished\":\"2014-08-15T16:09:23+00:00\",\"dateModified\":\"2016-05-12T09:21:27+00:00\",\"description\":\"Here I talk about several methods for Unicode Supported Regular Expressions &amp; Validations in Ruby and Ruby on Rails programming framework\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/unicode-supported-regular-expressions-validations\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.allerin.com\\\/blog\\\/unicode-supported-regular-expressions-validations\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/unicode-supported-regular-expressions-validations\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Unicode Supported Regular Expressions &amp; Validations\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/\",\"name\":\"Artificial Intelligence, ROBOTICS, AUTOMATION\",\"description\":\"Empowering Futures: Innovating with AI and Machine 
Learning\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/#organization\",\"name\":\"Allerin\",\"url\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/wp-content\\\/uploads\\\/2016\\\/06\\\/logo-fire.png\",\"contentUrl\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/wp-content\\\/uploads\\\/2016\\\/06\\\/logo-fire.png\",\"width\":1000,\"height\":1000,\"caption\":\"Allerin\"},\"image\":{\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/allerintech\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/allerintech\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/#\\\/schema\\\/person\\\/1800305fd7a4b1da6e13bbb42425cf27\",\"name\":\"Temp User\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/194a42e22f3078426d730e84e0c45af551304b674cd9cf8b99a0e26d09b974e8?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/194a42e22f3078426d730e84e0c45af551304b674cd9cf8b99a0e26d09b974e8?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/194a42e22f3078426d730e84e0c45af551304b674cd9cf8b99a0e26d09b974e8?s=96&d=mm&r=g\",\"caption\":\"Temp User\"},\"url\":\"https:\\\/\\\/www.allerin.com\\\/blog\\\/author\\\/tempuser\\\/\"}]}<\/script>\n<!-- \/ Yoast 
SEO Premium plugin. -->","yoast_head_json":{"title":"Unicode Supported Regular Expressions &amp; Validations","description":"Here I talk about several methods for Unicode Supported Regular Expressions &amp; Validations in Ruby and Ruby on Rails programming framework","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/","og_locale":"en_US","og_type":"article","og_title":"Unicode Supported Regular Expressions &amp; Validations","og_description":"In any application, validations play an important role in protecting application from invalid data to be saved into the database. There\u00a0 are several ways","og_url":"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/","og_site_name":"Artificial Intelligence, ROBOTICS, AUTOMATION","article_publisher":"https:\/\/www.facebook.com\/allerintech","article_published_time":"2014-08-15T16:09:23+00:00","article_modified_time":"2016-05-12T09:21:27+00:00","author":"Temp User","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Temp User","Est. 
reading time":"12 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/#article","isPartOf":{"@id":"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/"},"author":{"name":"Temp User","@id":"https:\/\/www.allerin.com\/blog\/#\/schema\/person\/1800305fd7a4b1da6e13bbb42425cf27"},"headline":"Unicode Supported Regular Expressions &amp; Validations","datePublished":"2014-08-15T16:09:23+00:00","dateModified":"2016-05-12T09:21:27+00:00","mainEntityOfPage":{"@id":"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/"},"wordCount":2271,"commentCount":1,"publisher":{"@id":"https:\/\/www.allerin.com\/blog\/#organization"},"keywords":["Ruby on Rails"],"articleSection":["My Voice","Technology"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/","url":"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/","name":"Unicode Supported Regular Expressions &amp; Validations","isPartOf":{"@id":"https:\/\/www.allerin.com\/blog\/#website"},"datePublished":"2014-08-15T16:09:23+00:00","dateModified":"2016-05-12T09:21:27+00:00","description":"Here I talk about several methods for Unicode Supported Regular Expressions &amp; Validations in Ruby and Ruby on Rails programming 
framework","breadcrumb":{"@id":"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.allerin.com\/blog\/unicode-supported-regular-expressions-validations\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.allerin.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Unicode Supported Regular Expressions &amp; Validations"}]},{"@type":"WebSite","@id":"https:\/\/www.allerin.com\/blog\/#website","url":"https:\/\/www.allerin.com\/blog\/","name":"Artificial Intelligence, ROBOTICS, AUTOMATION","description":"Empowering Futures: Innovating with AI and Machine Learning","publisher":{"@id":"https:\/\/www.allerin.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.allerin.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.allerin.com\/blog\/#organization","name":"Allerin","url":"https:\/\/www.allerin.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allerin.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.allerin.com\/blog\/wp-content\/uploads\/2016\/06\/logo-fire.png","contentUrl":"https:\/\/www.allerin.com\/blog\/wp-content\/uploads\/2016\/06\/logo-fire.png","width":1000,"height":1000,"caption":"Allerin"},"image":{"@id":"https:\/\/www.allerin.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/allerintech","https:\/\/www.linkedin.com\/company\/allerintech"]},{"@type":"Person","@id":"https:\/\/www.allerin.com\/blog\/#\/schema\/person\/1800305fd7a4b1da6e13bbb42425cf27","n
ame":"Temp User","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/194a42e22f3078426d730e84e0c45af551304b674cd9cf8b99a0e26d09b974e8?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/194a42e22f3078426d730e84e0c45af551304b674cd9cf8b99a0e26d09b974e8?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/194a42e22f3078426d730e84e0c45af551304b674cd9cf8b99a0e26d09b974e8?s=96&d=mm&r=g","caption":"Temp User"},"url":"https:\/\/www.allerin.com\/blog\/author\/tempuser\/"}]}},"_links":{"self":[{"href":"https:\/\/www.allerin.com\/blog\/wp-json\/wp\/v2\/posts\/1059","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.allerin.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.allerin.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.allerin.com\/blog\/wp-json\/wp\/v2\/users\/9192191"}],"replies":[{"embeddable":true,"href":"https:\/\/www.allerin.com\/blog\/wp-json\/wp\/v2\/comments?post=1059"}],"version-history":[{"count":1,"href":"https:\/\/www.allerin.com\/blog\/wp-json\/wp\/v2\/posts\/1059\/revisions"}],"predecessor-version":[{"id":1060,"href":"https:\/\/www.allerin.com\/blog\/wp-json\/wp\/v2\/posts\/1059\/revisions\/1060"}],"wp:attachment":[{"href":"https:\/\/www.allerin.com\/blog\/wp-json\/wp\/v2\/media?parent=1059"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.allerin.com\/blog\/wp-json\/wp\/v2\/categories?post=1059"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.allerin.com\/blog\/wp-json\/wp\/v2\/tags?post=1059"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}