Friday, 27 February 2015

LanguageDetection API Client in Smalltalk

Introduction

Language Detection API is a service that detects the language of a given input text. You need to register for an API key at http://detectlanguage.com to use the service. This client lets you use the service from Pharo Smalltalk. The output is an object containing the language code, a confidence score and an 'is reliable' boolean value.

Installation

Inside Pharo, open the Configuration Browser and select LanguageDetection, then Install. Or evaluate the following expression:
Gofer it 
 smalltalkhubUser: 'hernan' project: 'LanguageDetection';
 configurationOf: 'LanguageDetectionAPI';
 loadStable.

Usage

| ldClient |
ldClient := LDApiClient new.
ldClient 
 query: 'Des perles de pluie venues de pays où il ne pleut pas'; 
 detectedLanguageCode.
ldClient 
 query: 'Een enkele taal is nooit genoeg ';
 detectedLanguageCode.
ldClient 
 query: 'buenos dias señor';
 detectedLanguageCode.
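If you need to classify several texts, the same two messages can be sent inside a collection loop. This is a minimal sketch that only uses the #query: and #detectedLanguageCode messages shown above; it assumes your API key is already configured in the client as the package expects.

```smalltalk
| ldClient texts codes |
ldClient := LDApiClient new.
texts := #(
	'Des perles de pluie venues de pays où il ne pleut pas'
	'Een enkele taal is nooit genoeg'
	'buenos dias señor').
"Send one query per text and keep only the detected language code"
codes := texts collect: [ :text |
	ldClient
		query: text;
		detectedLanguageCode ].
codes
```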
Enjoy

Wednesday, 25 February 2015

StNER: Interface to the Stanford Named Entity Recognizer

Introduction

StNER provides a Pharo Smalltalk interface to the Stanford Named Entity Recognizer (NER). Stanford NER is an implementation of a Named Entity Recognizer, used for tagging raw text, a central task in Information Retrieval and Natural Language Processing. The input is a sequence of words in a text, and the NER classifier, using already trained data, tries to recognize typically three types of "Named Entities" (NEs): PERSON, LOCATION and ORGANIZATION (more classes exist). The output is the tagged text in one of the common token-tagging formats. The recognizer works better on input similar to the labeled data sets it was trained on (MUC-6, MUC-7, CoNLL 2003); however, there are reports of using it with tweets, and you can retrain it to recognize entities for your particular needs.

To recognize entities in text in other languages, for example Chinese, German, or Spanish, a different classifier (in this context a .tgz file) can be used (see the Stanford NLP demo).

Installation

  • Java is required to run the server locally.
  • Download the Stanford NER packages.
  • Inside Pharo, open the Configuration Browser and select StNER, then Install. Or evaluate
    Gofer it
     smalltalkhubUser: 'hernan' project: 'StNER';
     configurationOf: 'StNER';
     loadStable
    

Launch the server

  • From Smalltalk, start the Java server using the StNER Smalltalk server interface. For example, to start the server with default parameters on Windows:
    StSocketNERServer new
        stanfordNERPath: 'c:\stanford-ner-2015-01-30\';
        startServer.
    
  • Query an input text using the StNER Smalltalk client interface.

Server Settings

Providing the path to the Stanford NER installation is mandatory. If no host or port is supplied, the server defaults to:
  • host: localhost (127.0.0.1)
  • port: 8080
  • JVM memory: 1000m
  • output format: inlineXML

You can configure the server with the following taggers:
  • 3 class NER tagger that can label: PERSON, ORGANIZATION, and LOCATION entities. (#setEnglish3ClassTagger)
  • 4 class NER tagger trained on the CoNLL 2003 Shared Task training data that labels for PERSON, ORGANIZATION, LOCATION, and MISC. (#setEnglish4ClassTagger)
  • 7 class NER tagger trained only on MUC data that labels TIME, LOCATION, ORGANIZATION, PERSON, MONEY, PERCENT, and DATE. (#setEnglish7ClassTagger)
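For example, to start the server with the 4-class CoNLL tagger instead of the default one, the tagger selector can presumably be cascaded before #startServer. This is a sketch that assumes #setEnglish4ClassTagger is understood by the StSocketNERServer instance, as the list above suggests, and it reuses the example Windows path from the previous section:

```smalltalk
"Assumption: the tagger selector is sent to the server instance before it starts"
StSocketNERServer new
	stanfordNERPath: 'c:\stanford-ner-2015-01-30\';
	setEnglish4ClassTagger;
	startServer.
```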

Client Usage

To tag text you can use the #tagText: method as follows:
StSocketNERClient new 
  tagText: 'University of California is located in California, United States'
and the output will be:
'University of California is located in California, United States'
Another example including PERSON tagging:
StSocketNERClient new 
 tagText: 'Argentina President Kirchner has been asked to testify in court on the death of Alberto Nisman the crusading prosecutor who had accused her of conspiring to cover up involvement of Iran'
which results in:
'Argentina President Kirchner has been asked to testify in court on the death of Alberto Nisman the crusading prosecutor who had accused her of conspiring to cover up involvement of Iran'
You can also parse the in-line XML output directly into objects with #parseText:
StSocketNERClient new 
  parseText: 'University of California is located in California, United States'
which results in a Dictionary of Bags holding the occurrences of each tagged class.
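To illustrate the kind of structure #parseText: answers, here is a small sketch that turns an inline-XML tagged string into a Dictionary mapping each entity class to a Bag of entity strings. It is not StNER's implementation: the tagged input is a hypothetical string written by hand, and reading it with a ReadStream assumes flat, non-nested tags.

```smalltalk
| tagged result stream |
"Hypothetical inline-XML tagger output used as sample input"
tagged := '<ORGANIZATION>University of California</ORGANIZATION> is located in <LOCATION>California</LOCATION>, <LOCATION>United States</LOCATION>'.
result := Dictionary new.
stream := tagged readStream.
[ stream atEnd ] whileFalse: [
	| tag entity |
	stream upTo: $<.  "skip untagged text up to the next opening tag"
	stream atEnd ifFalse: [
		tag := stream upTo: $>.     "entity class, e.g. 'LOCATION'"
		entity := stream upTo: $<.  "entity text"
		stream upTo: $>.            "skip the closing tag"
		"Count this entity under its class"
		(result at: tag ifAbsentPut: [ Bag new ]) add: entity ] ].
result
```

Using a Bag rather than a Set keeps repeated mentions of the same entity, so #occurrencesOf: tells you how often each one was tagged.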

Tuesday, 24 February 2015

GADM: Access to Global Administrative Areas in Smalltalk

Introduction

GADM is a high-resolution spatial database of the location of the world's administrative areas, for use in GIS and similar software. GADM is freely available for academic and other non-commercial use. The data contained in GADM were collected from spatial databases provided by NGOs and national governments, and/or from maps and lists of names available on the Internet (e.g. Wikipedia).

Administrative areas include countries, provinces, counties, departments, etc., up to five sublevels, which cover most boundaries in the world. For each level GADM provides some attributes, foremost the name and in some cases variant names. GADM can also be used to extract polygon shapes for visualization, for example to build choropleth maps of regions. The GADM package includes the raw data in CSV format, which I parsed to build a browsable GADM world tree. The tree gives off-line, hierarchical access to the GADM database as objects, without the need to perform on-line queries for basic requests. Such a tree can be used, for example, to build a toponym browser.

Installation

From within Pharo 3 or Pharo 4 you can use the Configuration Browser, or evaluate the following expression:
Gofer it
 smalltalkhubUser: 'hernan' project: 'GADM';
 configurationOf: 'GADM';
 loadStable.

Usage Examples

" To access the whole World (as seen by GADM), evaluate "
GADMWorldTree root.

" Access country Lithuania "
GADMWorldTree @ 'Lithuania'.

" To access the partido (the Spanish name for this kind of district) where I live: "
GADMWorldTree @ 'Argentina' @ 'Buenos Aires' @ 'La Plata'.

" To find out which type of region Los Angeles is "
(GADMWorldTree @ 'United States' @ 'California' @ 'Los Angeles') typeName. " 'County' "

" To list all subregions of San Marino "
(GADMWorldTree @ 'San Marino') nodeNames
 " a SortedCollection('Acquaviva' 
'Borgo Maggiore' 
'Chiesanuova' 
'Domagnano' 
'Faetano' 
'Fiorentino' 
'Montegiardino' 
'San Marino' 
'Serravalle') "
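Since the nodes answer #@, #nodeNames and #typeName as in the examples above, you can combine them to describe all subregions at once. A minimal sketch, assuming the San Marino subregions answer #typeName just like the Los Angeles node does:

```smalltalk
| sanMarino types |
sanMarino := GADMWorldTree @ 'San Marino'.
"Map each subregion name to its administrative type name"
types := Dictionary new.
sanMarino nodeNames do: [ :name |
	types at: name put: (sanMarino @ name) typeName ].
types
```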
Enjoy

Saturday, 7 February 2015

Application Security 3: Setting your password rules

This post is about password enforcement rules in the Application Security package, released as Open Source in March 2014 for the Pharo Smalltalk community. The rules you can set up are:
  • Increase the password length, which enlarges the search space of possible combinations.
  • Increase the size of the character set, which also increases the number of possible password combinations.
The default character set in the Application Security package includes uppercase and lowercase letters, numbers and a set of non-letter characters. This forms a 95-character set as recommended by FIPS; with passwords between 5 and 8 characters long, a brute-force attack would have to try between 7.7 billion (5 characters) and 6.6 quadrillion (8 characters) combinations. You can change the password creation rules by creating checkpointed validation settings:
| settings |
settings := ASValidationSettings forCheckPoint: ASDeployCheckPoint new.

" Allow passwords of up to 14 characters "
settings maxPasswordCharacters: 14.

" Set the maximum user name length "
settings maxUsernameCharacters: 14.
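The search-space figures quoted above can be reproduced in a workspace. This sketch assumes the 95 characters are the printable ASCII range (code points 32 to 126, space included), which is one common way to obtain a 95-character set; it does not call the Application Security package.

```smalltalk
| charsetSize combinations |
"Printable ASCII: code points 32 (space) to 126 ($~)"
charsetSize := ((32 to: 126) collect: [ :code | Character value: code ]) size.
"Number of possible passwords for each length from 5 to 8"
combinations := (5 to: 8) collect: [ :length | charsetSize raisedTo: length ].
combinations first.  "7737809375, about 7.7 billion"
combinations last.   "6634204312890625, about 6.6 quadrillion"
```

Each entry is 95 raised to the password length, so every extra character multiplies the search space by 95.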
You can also change the default set of characters allowed in user names. The default is the result of evaluating:
ASValidationSettings defaultUsernameCharactersList 
  evalString gather: [ : c | c ].

" For convenience, you can copy the 
#defaultUsernameCharactersList method and customize it for your purposes:

{ '$0 to: $z' . '$A to: $z' . '$a to: $z' . 
  '($0 to: $9) , ($A to: $Z) , ($a to: $z)' . 
  '($0 to: $9) , ($A to: $Z) , ($a to: $z) , 
  #($_ $- $.)' } "
Continuing with the validation settings example, this is how you do it:
settings allowedUsernameCharacters: {'$A to: $z' . '$a to: $z' }.

" and the same could be achieved for password characters : "

settings allowedPasswordCharacters: ...
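When editing these range strings, keep in mind that Character ranges follow ASCII order. This small check, which does not depend on Application Security, shows that '$A to: $z' also contains the six punctuation characters between $Z and $a ([ \ ] ^ _ and the backtick), while the digit and letter ranges combined give 62 characters:

```smalltalk
| alphanumerics upperToLower |
"Digits plus uppercase plus lowercase letters: 10 + 26 + 26"
alphanumerics := ($0 to: $9) , ($A to: $Z) , ($a to: $z).
alphanumerics size.          "62"
"ASCII 65 to 122, which includes the punctuation between Z and a"
upperToLower := $A to: $z.
upperToLower size.           "58"
upperToLower includes: $_.   "true"
```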
Recent password research has claimed that passphrases increase the number of combinations a brute-force attack must try, but they also make typing mistakes more likely, so it is good practice to increase the number of allowed failed attempts. This can be done in Application Security by evaluating:
" Set the maximum count of allowed fails per user during a period of time "
" Default is 40 "
settings maxUserFailCount: 5.

Monday, 2 February 2015

Pharo Smalltalk Scripts, part 1

These are some scripts and tips I have used in my daily development with Pharo Smalltalk over the last few years. I hope you find them useful:

Parsing XML with DOM

You can instantiate an XML DOM parser and parse a file with one line of code:
(XMLDOMParser parseFileNamed: 'fao_country_names.xml') 
  firstNode 
  allElementsSelect: [ : each | 
    each localName = 'geographical_region' ].
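To try the expression without the FAO file, you can parse an inline string instead. This sketch assumes XMLDOMParser also accepts a String through the class-side #parse: message of the XMLParser package, and uses a made-up document that only mimics the element name from the example above:

```smalltalk
| doc regions |
"Hypothetical sample document standing in for fao_country_names.xml"
doc := XMLDOMParser parse: '<countries>
	<geographical_region>Africa</geographical_region>
	<country>Kenya</country>
	<geographical_region>Europe</geographical_region>
</countries>'.
"Same selection as above: every element named geographical_region"
regions := doc firstNode allElementsSelect: [ :each |
	each localName = 'geographical_region' ].
regions size
```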

NeoCSV

You can quickly parse a CSV file using NeoCSV with just a snippet for most tasks:
(NeoCSVReader on: 'myfile.csv' asFileReference readStream)
 separator: Character tab; 
 do: [ : row | " do something with row " 
    row 
      first; 
      second; 
      third ]
or using a one-liner
'myfile.csv' asFileReference readStreamDo: [ : stream | 
  (NeoCSVReader on: stream) upToEnd ]
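NeoCSVReader works on any character stream, so you can also check a parsing setup against an in-memory string before pointing it at a file. A minimal sketch with made-up data; note that without extra configuration the header line comes back as an ordinary record:

```smalltalk
| csv rows |
"Made-up sample data with a header line"
csv := 'name,code
Argentina,AR
Lithuania,LT'.
"Read every record; each one is a collection of field strings"
rows := (NeoCSVReader on: csv readStream) upToEnd.
rows size.         "3 records, the header line included"
(rows at: 2) first.
```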

Lorem Ipsum

You already have the first paragraph of "Lorem ipsum" available in Pharo.
String loremIpsum

Profiling

If you are developing a UI, you can open the Pharo Time Profiler on any piece of code by enclosing it in:
 TimeProfiler new openOnBlock: [ 
   " Your code ... "
 ]
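If you only need a number rather than the full profiler window, Time class>>millisecondsToRun: measures the wall-clock time of a block. This is a different, simpler tool than TimeProfiler, and the loop below is just a placeholder workload:

```smalltalk
| ms |
"Wall-clock milliseconds spent evaluating the block; no UI is opened"
ms := Time millisecondsToRun: [
	1 to: 100000 do: [ :i | i printString ] ].
ms
```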

Open File Dialog

Opening a File Dialog for specific file type is a one-liner:
UIManager default chooseFileMatching: #('*.xml').

Spec

Spec is a relatively young UI specification library. You can prototype UIs easily with DynamicComposableModel, for example:
| view layout |
" Configure the Spec models "
view := DynamicComposableModel new
        instantiateModels: #(labelA LabelModel textA TextInputFieldModel labelB LabelModel textB TextInputFieldModel);
        extent: 500@200;
        title: 'Title';
        yourself.
" Configure the Spec layout "
layout := SpecLayout composed
        newColumn: [ : r | r
                add: #labelA; add: #textA;
                add: #labelB; add: #textB ];
        yourself.
" Set up the widgets "
view labelA text: 'A'.
view labelB text: 'B'.
" Open the Window "
(view openDialogWithSpecLayout: layout)
        centered;
        modalRelativeTo: World.

Morphic Window and Controls

If you need a basic container Morph window, just evaluate:
| s |
(s := SystemWindow labelled: 'Window') openInHand.
s addMorph: (StringMorph new
		contents: 'Comments:';
		color: Color black;
		yourself)
	frame: (0@0 corner: 1@0.2).
You can also open a single control, such as a text area, without a container:
| o ptm |
o := Object new.
ptm := PluggableTextMorph on: o text: #printString accept: nil.
ptm height: TextStyle defaultFont height + 6.
ptm acceptOnCR: true; openInHand.
It can also be instantiated differently, backed by a Workspace:
(PluggableTextMorph on: Workspace new
 text: #contents
 accept: #acceptContents:
 readSelection: nil
 menu: #codePaneMenu:shifted:) openInHand.

FTP

Udo Schneider recently wrote an FTP/WebDAV plugin for the Pharo FileSystem, which gives you an FTP client almost for free through the File Browser:
| fs |
fs := FileSystem ftp: 'ftp://ftp.mozilla.org'.
FileList openOn: fs workingDirectory.
" Close the connection once you are done browsing "
fs close.