Date: 2010-04-10 Category: Portrait Location: China
I have been hard at work behind the scenes at Wokai laying the groundwork for what will hopefully be a flexible and comprehensive multilingual website. So this week I thought I'd share some of the details that may be helpful to other philanthropic hackers out there (money-grubbing techies, please hide your eyes). I'm writing this blog post while sitting on the tarmac at Beijing's airport … waiting for the "Beijing fog" to lift. This exercise in fantasy may take some time. Eventually I expect to reach the island that is South Korea's Incheon Airport and will roam from one wifi hotspot to the next… so please attribute any disorganization in this article to on-the-job environmental hazards.
General UTF-8 Configuration
First up, a preliminary yet seemingly endless task: configuring an appropriate character encoding for the platform. UTF-8 is the most common choice and can represent every Unicode character, so it is usually the best choice unless your content is overwhelmingly CJK text, which UTF-16 stores in two bytes per character where UTF-8 needs three. As most programmers know, what makes configuring a character set difficult is simply the multitude of places where such configuration exists. Any place in your system where a conversation occurs between two distinct pieces of software deserves inspection and may come with its own charset configuration properties or quirks (that is, bugs that haven't yet been realized). Furthermore, there are wide sections of overlapping configuration where one layer can override another, which makes things confusing. Below are a few of the basics that are nonetheless sometimes overlooked.
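To see why every layer has to agree, here is a minimal sketch (the class and method names are hypothetical, not part of Wokai's code) that writes text with one charset and reads it back with another, as happens when, say, the database connection and the web tier disagree. It also reports how many bytes the same text takes in UTF-8 and in UTF-16.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    /** Encodes text with one charset, then decodes the bytes with another. */
    static String roundTrip(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    /** Number of bytes the text occupies in the given charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the encoding
        // of this source file does not matter.
        String text = "\u4e2d\u6587";

        // Both sides agree on UTF-8: the text survives.
        String same = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 to UTF-8 intact: " + same.equals(text));

        // The reader assumes ISO-8859-1: every byte becomes its own character.
        String garbled = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 to ISO-8859-1 intact: " + garbled.equals(text));
        System.out.println("garbled length: " + garbled.length());

        // Storage cost of the same two characters (UTF_16BE avoids counting a byte-order mark).
        System.out.println("UTF-8 bytes: " + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16 bytes: " + encodedLength(text, StandardCharsets.UTF_16BE));
    }
}
```

The garbled string is six characters long because each of the six UTF-8 bytes is decoded on its own, which is the classic symptom of one misconfigured layer.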
Database Variables -- Define UTF-8 as defaults for the MySQL server, connecting client, and command-line client. Also note the "init_connect" property to automatically apply a default character set on all connections.
[client]
default-character-set=utf8

[mysqld]
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_general_ci

[mysql]
default-character-set=utf8
Database Tables -- Even with the above configuration, you should still check to be sure all existing tables are defined as UTF8.
CREATE TABLE `partnerPostings` (
…
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Database Connection -- Connecting clients such as JDBC can also configure properties directly.
dbUri=jdbc:mysql://127.0.0.1:3306/wokai?useUnicode=true&characterSetResults=UTF-8&characterEncoding=UTF-8
JSP Pages -- Each page should also declare its content type and charset in the page directive.
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
CharsetEncoding Servlet Filter
When browsers connect with the web server, their requests are also encoded in a charset. Often, the browser is unaware of what type of content is being passed in HTML forms or URLs. Crudely forcing the character encoding of all inbound HTTP requests has now become a commonplace measure of protection - and saves having to bite off more complex decision making. In a Java environment, this is easily done by creating a Servlet Filter such as the following:
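import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CharsetEncodingFilter implements Filter
{
    private FilterConfig fc;

    public void doFilter(ServletRequest req, ServletResponse res,
            FilterChain chain) throws IOException, ServletException
    {
        // Must happen before any request parameter is read.
        req.setCharacterEncoding("UTF-8");
        // A default only; anything further down the chain can still override it.
        res.setContentType("text/html; charset=UTF-8");
        chain.doFilter(req, res);
    }

    public void init(FilterConfig filterConfig)
    {
        this.fc = filterConfig;
    }

    public void destroy()
    {
        this.fc = null;
    }
}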
And configuring the filter in your web.xml:
<filter>
    <filter-name>CharsetEncodingFilter</filter-name>
    <filter-class>org.wokai.CharsetEncodingFilter</filter-class>
</filter>
Stripes Custom Locale Picker
Whatever web MVC framework you're using, you'll likely want to configure a custom locale picker to provide some extra assurance that whatever locales are encountered in an HTTP request are narrowed down to the list of locales supported by your web application. Additionally, you'll want to honor a web user's request to override the default locale. I do this by setting a value in the session. With Stripes, creating a custom locale picker is quite simple - and the result is applied to the HttpServletRequest, so any calls to getLocale() will abide. Very nice.
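import java.util.Locale;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;
import net.sourceforge.stripes.localization.DefaultLocalePicker;

public class CustomLocalePicker extends DefaultLocalePicker
{
    public final static Locale CHINESE = new Locale("zh", "CN");
    public final static Locale ENGLISH = new Locale("en", "US");

    @Override
    public Locale pickLocale(HttpServletRequest request)
    {
        // Start from the locale Stripes negotiates for this request.
        Locale locale = super.pickLocale(request);

        // An explicit choice stored in the session takes precedence.
        HttpSession session = request.getSession(false);
        String sessionLocale = null;
        if (session != null)
        {
            sessionLocale = (String) session.getAttribute("locale");
            if (sessionLocale != null)
            {
                if (sessionLocale.equalsIgnoreCase("chinese"))
                {
                    return CHINESE;
                }
                return ENGLISH;
            }
        }

        // Otherwise collapse whatever was negotiated to a supported locale.
        if (locale.getCountry().equals("CN") ||
            locale.getLanguage().equals("zh"))
        {
            return CHINESE;
        }
        return ENGLISH;
    }
}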
And configuring this custom locale picker into Stripes…
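<init-param>
    <param-name>LocalePicker.Locales</param-name>
    <param-value>en_US:UTF-8,zh_CN:UTF-8</param-value>
</init-param>
<init-param>
    <param-name>LocalePicker.Class</param-name>
    <param-value>org.wokai.CustomLocalePicker</param-value>
</init-param>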
Message Resource Bundles
Now that we know what language we want to present, it is time for the translation meat. Tokenizing your entire website (disassembling all the text into succinct phrases that can ideally be reused) is a most tedious process. As most sites don't go bilingual until after they are well endowed with content, this is by far the most time-consuming step. Using Java properties files to record the mappings is very straightforward, but will require some strategy on naming conventions. In some cases reusability is key; in others, quarantining the text to a specific page is necessary. Over-thinking this may also be a hazard. Simplicity is paramount, as you'll be handing these properties files off to non-programming types or third-party services which may very well ignore the hard-thought naming conventions you used. (One such service that provides collaborative translation of message resource bundles is Crowdin.net.)
If you haven't used property files for multilingual text before, you might be surprised to find that the charset is hard-coded to ISO-885920372319. (Okay, I'm exaggerating a little; it is ISO-8859-1.) To be clear, Java's support for multilingual text rests on top of a non-Unicode charset. Stupid, but only because Sun has preferred to leave the embarrassment on display for a decade instead of adding in the easy fix. Nonetheless, here's the hack for getting around this problem: write the properties file in UTF-8, then run it through "native2ascii", which turns every non-ASCII character into a \uXXXX escape that is safe in ISO-8859-1. One can also write a custom resource bundle implementation to migrate away from property files altogether, but I've not found a significant motivator for this yet.
#!/bin/sh
native2ascii -encoding utf8 resOrig.properties > res.properties
native2ascii -encoding utf8 resOrig_zh_CN.properties > res_zh_CN.properties
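For readers curious what that conversion amounts to, here is a rough Java sketch of the same idea. It is an illustration only, not a replacement for native2ascii, and the class and method names are hypothetical. It escapes every character outside 7-bit ASCII and then shows that java.util.Properties decodes the escapes back when loading.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class UnicodeEscaper {
    /** Replaces each char outside 7-bit ASCII with a backslash-u escape of four hex digits. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    /** Loads properties-formatted text and returns the value stored under key. */
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // A hypothetical message key with a Chinese value.
        String line = "welcome=\u6b22\u8fce";
        String escaped = escape(line);
        System.out.println(escaped);
        // Properties turns the escapes back into the original characters.
        System.out.println(loadValue(escaped, "welcome").equals("\u6b22\u8fce"));
    }
}
```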
Applying Localized Text
We can now present localized text on web pages with a common JSP format tag like the following:
<%@ taglib prefix="fmt" uri="http://java.sun.com/jsp/jstl/fmt" %>
...
Custom Locale Include Tag
While the above "fmt:message" tag handles most of the localization work, there are other cases where it may be desirable to create more than one version of a complete web page - or include localized fragments. For this purpose, I created a custom JSP tag that will include JSP page fragments with proper request forwarding.
import java.util.Locale;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletContext;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.jsp.JspException;
import javax.servlet.jsp.JspTagException;
import javax.servlet.jsp.JspWriter;
import javax.servlet.jsp.tagext.BodyTagSupport;
import org.apache.jasper.runtime.ServletResponseWrapperInclude; // assumed: Tomcat's Jasper wrapper
import org.apache.log4j.Logger; // assumed: log4j 1.x

public class LocaleIncludeTag extends BodyTagSupport
{
    private static final long serialVersionUID = 1L;
    protected static Logger logger_ = Logger.getLogger(LocaleIncludeTag.class);
    private String defaultPath_;

    public LocaleIncludeTag()
    {
        super();
    }

    public void setDefaultPath(String defaultPath)
    {
        this.defaultPath_ = defaultPath;
    }

    @Override
    public int doStartTag() throws JspException
    {
        try
        {
            JspWriter out = this.pageContext.getOut();
            ServletContext context = this.pageContext.getServletContext();
            ServletRequest request = this.pageContext.getRequest();
            ServletResponse response = this.pageContext.getResponse();
            Locale locale = request.getLocale();

            // Split the default path into base name and extension (if any).
            String base = this.defaultPath_;
            String type = null;
            String path = this.defaultPath_;
            int idx = this.defaultPath_.lastIndexOf(".");
            if (idx > 0)
            {
                base = this.defaultPath_.substring(0,idx);
                type = this.defaultPath_.substring(idx);
            }

            // Build the locale-specific name, e.g. name_zh_CN.jsp.
            if (type != null)
            {
                path = base + "_" + locale.toString() + type;
            }
            else
            {
                path = base + "_" + locale.toString();
            }

            // REVERT TO DEFAULT IF LOCALE_FILE NOT FOUND
            if (context.getResource(path) == null)
            {
                path = this.defaultPath_;
            }

            RequestDispatcher rd = request.getRequestDispatcher(path);
            rd.include(request, new ServletResponseWrapperInclude(response, out));
        }
        catch (Exception e)
        {
            // Keep the original exception as the root cause.
            throw new JspTagException("exception: " + e.getMessage(), e);
        }
        return SKIP_BODY;
    }

    @Override
    public int doEndTag() throws JspException
    {
        return EVAL_PAGE;
    }
}
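The path logic in the tag is easier to follow, and to test, once it is pulled out of the servlet machinery. The sketch below is only an illustration: the class name, the method names, and the example paths are hypothetical, and the existence check is passed in as a predicate in place of the ServletContext lookup the tag actually performs.

```java
import java.util.Locale;
import java.util.function.Predicate;

class LocalePathResolver {
    /** Inserts "_" plus the locale before the extension, or appends it when there is none. */
    static String localizedPath(String defaultPath, Locale locale) {
        int idx = defaultPath.lastIndexOf('.');
        if (idx > 0) {
            return defaultPath.substring(0, idx) + "_" + locale + defaultPath.substring(idx);
        }
        return defaultPath + "_" + locale;
    }

    /** Returns the localized path when it exists, otherwise the default path. */
    static String resolve(String defaultPath, Locale locale, Predicate<String> exists) {
        String candidate = localizedPath(defaultPath, locale);
        return exists.test(candidate) ? candidate : defaultPath;
    }

    public static void main(String[] args) {
        // Pretend only the Chinese version of this fragment has been written.
        Predicate<String> exists = "/fragments/about_zh_CN.jsp"::equals;
        System.out.println(resolve("/fragments/about.jsp", Locale.SIMPLIFIED_CHINESE, exists));
        System.out.println(resolve("/fragments/about.jsp", Locale.US, exists));
    }
}
```

The first call resolves to the zh_CN fragment and the second falls back to the default, mirroring the tag's revert-to-default branch.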
Additional Steps
So far, I've outlined a fairly standard approach to bilingual website development - with a few customizations for added flexibility and reliability. An implementation for retrieving localized text from the database is also needed. For Wokai, I chose to build a simple JSP helper tag to select object properties based on locale. Wokai isn't at this time planning on supporting unlimited languages, but if that were the goal you would need to externalize localized resources into a normalized table structure and build a service to feed from that.
Next on my to-do list is to add a layer of URL redirection for SEO. Web crawlers run from all parts of the world and, depending on their configuration, may be presented with different versions of the website on each visit… or at a minimum will be unaware of the alternate versions available. To fix this problem, I plan to prefix all URLs with the desired locale. The prefix will be processed, stripped from the URL, and the request forwarded. Of course this also requires every link on the website to be processed so that it includes the required prefix. A good deal of work, but with the right tools it is quite manageable. Perhaps we'll leave this for a blog post on another day.
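As a rough idea of what that prefix handling could look like, here is a small sketch. Nothing in it is Wokai's actual implementation: the class, the two-letter prefixes "en" and "zh", and the example paths are all assumptions made for illustration. A servlet filter would call something like split() on the request path, remember the locale, and forward to the remaining path.

```java
import java.util.Arrays;
import java.util.List;

class LocalePrefix {
    // Hypothetical URL prefixes; a real site would choose its own.
    static final List<String> SUPPORTED = Arrays.asList("en", "zh");

    /**
     * Splits a request path into {prefix, remainingPath}.
     * The prefix is null when the path does not start with a supported one.
     */
    static String[] split(String path) {
        for (String code : SUPPORTED) {
            String prefix = "/" + code;
            if (path.equals(prefix)) {
                return new String[] { code, "/" };
            }
            if (path.startsWith(prefix + "/")) {
                return new String[] { code, path.substring(prefix.length()) };
            }
        }
        return new String[] { null, path };
    }

    /** Rewrites a site-relative link so it carries the given prefix. */
    static String addPrefix(String code, String path) {
        return "/" + code + path;
    }

    public static void main(String[] args) {
        String[] parts = split("/zh/loans/42");
        System.out.println(parts[0] + " -> " + parts[1]);
        System.out.println(addPrefix("en", "/loans/42"));
    }
}
```

Matching on the prefix followed by a slash keeps a path such as /zhongwen from being mistaken for a Chinese-locale URL.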
As the official philanthropic hacker at Wokai, I've been asked to provide regular updates on new product features, behind-the-scenes development, technology related to peer to peer lending, or even Internet engineering in general if I think it is particularly nifty.
Today's topic is a pretty low-level detail, and in fact one usually overlooked as being too small to be worthy of any design consideration at all - though I hope to prove otherwise. The topic: the delivery of automated email messages. In the coming weeks, I'll be adding many more event-based or periodic email reports for Wokai lenders: monthly financial accounting reports with featured micro-loan recipients, notifications of activity on loans (e.g., a borrower has fully repaid the loan), etc. The intent is to have all such email generated from templates that our Marketing Director can manage.
The vast majority of websites send email messages by composing email messages inline (within their code) and handing content off to a function for email delivery. The more formal websites do it by writing code to assemble needed values, formatting these values, then passing a map of the values and a pointer to a specific email template off to a method to perform data merge and delivery. This is fairly clean and straightforward, but requires changes to the code every time someone makes a change to the email template that requires a new (or newly formatted) value.
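For contrast, the conventional map-and-template merge described above can be sketched in a few lines. This is a generic illustration, not code from any particular site; the class name and the ${name} placeholder syntax are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MapTemplateMerger {
    // Matches placeholders of the assumed form ${name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each placeholder with its value; unknown placeholders are left as-is. */
    static String merge(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The calling code must pre-compute and pre-format every value the template uses.
        Map<String, String> values = new HashMap<>();
        values.put("firstName", "Ana");
        System.out.println(merge("Hello ${firstName}, you have ${loanCount} loans.", values));
    }
}
```

Because the template can only see what the code put into the map, the unfilled ${loanCount} placeholder here is exactly the kind of gap that forces a code change whenever the template wants a new value.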
I've assembled a much more powerful solution that leaves the code cleaner and gives the email template editor far more flexibility. Here's what the actual code looks like for sending Wokai's welcome letter after a registration:
emailManager.addContext("user", user);
emailManager.processEvent(EmailEventType.REGISTRATION);
emailManager.deliver();
That's it! All the work is neatly handled by all the right parties. Now let's take a look at how this is achieved.
(1) The first line is very straight-forward. "user" is an object that represents the user who has been registered. We are adding it to the context of the email manager service for later use in parsing templates. If there were other root objects that were related to this event, we would add them all in the same manner.
(2) The next line is where the magic happens. processEvent() retrieves every email template from the database that matches the given event type (in this case a REGISTRATION), then passes the current context along to each template and asks the template to generate an EmailDeliverable (a custom object I've written that represents the basic parts of an email message).
Here's the processEvent() code:
List<EmailTemplate> templates = this.emailTemplateDAO_.findActiveByEvent(event);
Iterator<EmailTemplate> iterator = templates.iterator();
while (iterator.hasNext())
{
    EmailTemplate template = iterator.next();
    // Each template merges itself with the shared context to produce one message.
    EmailDeliverable deliverable = template.generate(this.contextMap_);
    this.deliverables_.add(deliverable);
}
The EmailTemplate.generate() method looks like this:
@Transient
public EmailDeliverable generate(Context context) throws Exception
{
    // context is an org.apache.velocity.context.Context, as passed in by processEvent()
    // this can be done in a startup servlet or wherever you like
    Velocity.init();
    EmailDeliverable e = new EmailDeliverable();
    e.setFromName(this.fromName_);
    e.setFromEmail(this.fromEmail_);
    if (context.getKeys().length > 0)
    {
        StringWriter writer;
        String templateRef = "emailTemplate: " + this.id_;

        // Every field, including the address list, is evaluated as a Velocity template.
        writer = new StringWriter();
        Velocity.evaluate(context, writer, templateRef, this.subject_);
        e.setSubject(writer.toString());

        writer = new StringWriter();
        Velocity.evaluate(context, writer, templateRef, this.html_);
        e.setHtml(writer.toString());

        writer = new StringWriter();
        Velocity.evaluate(context, writer, templateRef, this.text_);
        e.setText(writer.toString());

        writer = new StringWriter();
        Velocity.evaluate(context, writer, templateRef, this.addressList_);
        e.setAddresses(writer.toString());
    }
    else
    {
        // Nothing to merge: use the stored template fields verbatim.
        e.setSubject(this.subject_);
        e.setText(this.text_);
        e.setHtml(this.html_);
        e.setAddresses(this.addressList_);
    }
    return e;
}
Using Velocity, the email template's content can now contain references to context objects at ANY level of depth. We can also harness the power of Velocity directives directly from our email template:
Hello $user.firstName,
Here's a list of your current loans:
#foreach( $loan in $user.loans )
#if( $loan.repaid )
[dynamic email content here]
#end
#end
You might have noticed that we never indicate the sender or recipient of a message. This work is now pushed off on the editor of the email template itself! The recipient of the welcome message is simply $user.email. At this point the power of this system should be evident. The author/editor of an email template now has access to pretty much any business values related to the associated web event and has the freedom to navigate and format these values as desired to compose a highly dynamic email.
(3) Lastly, returning to our original three-line implementation, the third and last line of code delivers all emails that have been generated and queued. You could write any method you like. For immediate return of control, one can write a method to pass the queue to a background delivery agent using a job scheduler like Quartz, and/or deliver mail directly to a mail server using JavaMail (or a wrapper). A number of things should be considered.
1. If using your server's local email service (Sendmail, Postfix, etc), the percentage of emails that get caught in Spam traps will largely depend on the history of your server's IP address. Any junk email sent by this IP address in past decades can have a bearing on its current treatment by ISPs who are trying to help their customers control Spam.
2. Define an SPF DNS record for your domain.
3. Be absolutely sure your server is accepting return/bounce messages, but is NOT accepting relay mail from the outside world.
4. Ensure your delivery process is applying a correct bounce/sender address - separate from the message sender address which can change more freely based on business needs.
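The "immediate return of control" idea from step (3) can be sketched with nothing but the JDK. To be clear about the assumptions: this uses a plain single-thread ExecutorService rather than Quartz, messages are reduced to strings, and the transport is a pluggable callback standing in for JavaMail or a forwarding mail server. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class BackgroundDelivery {
    private final List<String> queue = new ArrayList<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> transport;

    /** transport is whatever actually sends one message. */
    BackgroundDelivery(Consumer<String> transport) {
        this.transport = transport;
    }

    void enqueue(String message) {
        queue.add(message);
    }

    /** Hands a snapshot of the queue to the worker thread and returns at once. */
    void deliver() {
        final List<String> batch = new ArrayList<>(queue);
        queue.clear();
        worker.execute(() -> batch.forEach(transport));
    }

    /** Stops accepting work and waits for batches already handed over. */
    boolean shutdown(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BackgroundDelivery delivery =
            new BackgroundDelivery(m -> System.out.println("sending: " + m));
        delivery.enqueue("welcome letter");
        delivery.deliver();   // returns immediately; sending happens on the worker
        delivery.shutdown(5); // in a web app this would run at application shutdown
    }
}
```

A single worker thread keeps messages in order; it does not add retries, persistence, or bounce handling, which is where a dedicated scheduler or outbound mail service earns its keep.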
As a builder of one of the Internet's largest outbound email systems, I can expound on the intricacies of email reliability and tracking at length. Covering the basics may be good enough for your service, but getting into opt-out management, archiving, routing and processing of return mail starts to get rather time consuming for in-house development. At a minimum, I recommend using a third party outbound email service with a good reputation (such as AuthSMTP) to carry out deliveries. Configuring your local server to forward mail to this mail transfer agent will provide performance improvement to the web application and an added layer of queuing/logging for more flexible control over the outbound mail.