Tag Archives: utility

Live Space Blog Export Utility

Live Space blog export utility
Ver. 1.1
Zhuotong Nan ([email protected])

Purpose

This tool exports all posts on a Windows Live Space to a single XML file; with Google Blogger's Import feature, these posts can then be imported into Blogger.
By analyzing the web pages instead of calling the traditional Weblog API, the tool avoids the usual post-count limits (20 posts read from Live Space, and 50 posts published per day on Blogger).

If you are a long-time Live Space user with many posts already on your space and want to move to Blogger for whatever reason (for example, because your space is blocked in mainland China), this tool can save you a great deal of effort. Besides, Blogger does offer more powerful features than Live Space.

What to do after generating the XML

Log in to your Blogger account, visit http://draft.blogger.com/home, and click Settings; you will see Import blog, Export blog, and Delete blog. Use Import blog to import the generated XML file into Blogger. The imported posts are marked as imported; from the dashboard you can publish these imported posts or delete them.

Requirements

.NET Framework 2.0 or later, available from the Microsoft website.
Here is the link for Framework 3.5:
http://www.microsoft.com/downloads/details.aspx?familyid=AB99342F-5D1A-413D-8319-81DA479AB0D7&displaylang=en

More help

For how to set each parameter in the UI, please visit http://nanzhuotong.blogspot.com.
You can also get help or submit bug reports on that site.
The latest version will be published there as well.

image
First run: fill in the required parameters

image
Running

image
About

Ver. 1.1
+ add a simple Chinese readme file.
* fix a bug where the program ran even when required info was not provided. (11/12/08)
+ include Microsoft.WindowsLive.Id.Client.dll in the distribution package. (11/8/08)
# passwords are now saved to the config file as MD5 hashes.
# add a much nicer About box
+ localized; Simplified Chinese supported. (11/7/08)

Ver. 1.0
+ initial release (11/7/08)

http://cid-0ea641a5a7f665a1.skydrive.live.com/embedrowdetail.aspx/Public/livespaceexport.v1.1.zip
MD5: da31bf89753721f47bd8d3d33aabd228

Move blog posts from Live Space to Google Blogger

I developed a small tool to move posts in bulk to Google Blogger. The README.txt file is attached below.

http://cid-0ea641a5a7f665a1.skydrive.live.com/embedrowdetail.aspx/Public/livespaceexport.v1.zip
md5 digest: 401b130f69a0d83f3696c6c07a6d0c0b

Live Space export utility
Ver. 1.0
Zhuotong Nan ([email protected])

PURPOSE

This tool exports all Live Space posts (without the 20-post limit, because it uses a different method to retrieve posts) and
generates an XML file in a format that can be imported into Blogger.

This tool is especially useful when you want to move to a newly established Blogger blog. If you have already written
hundreds of posts on your Live Space, it is impractical to move them all manually. Existing tools on the Internet
with a similar purpose have more constraints, for example a maximum of 20 posts from Live Space and 50 posts per day on the Blogger side.

The tool is designed to address those issues.

CONFIGURATION

Windows Live ID – your Live ID. Mine is [email protected]

Live Space user name – my space is http://nzt.spaces.live.com; in this case the user name is nzt.

Live Space secret word – go to your Live Space settings and enable email publishing; the secret word can be found there.

Start page to navigate all posts – sign in to your space; this is the link address leading to the summary list of the blog.
My start page looks like http://nzt.spaces.live.com/?_c11_BlogPart_BlogPart=summary&_c=BlogPart; replace nzt with
your own name. The tool uses this link to retrieve the full list of posts.

Use local buffered posts listing file – this option is grayed out on the first run. After you have downloaded the Live Space posts
list (the post contents are not downloaded at that point), the list is saved to a local file named blogposts.txt in the current
directory. By enabling this option on later runs, the tool skips downloading the posts list and goes straight to fetching the
post contents and producing the final XML file.

BlogID – your Blogger ID. When you log in to your Blogger account and click the Customize link, you will see
a URL in your browser address bar like http://www.blogger.com/rearrange?blogID=207104551370538866; the number string is
your ID (this one is actually mine, ^_^).

Blogger user name – generally your Gmail account
Blogger password – your Gmail password

XML file name – the posts from Live Space will be organized into this file. Import this file into Blogger using Import blog in
Blogger in draft. To reach the import function (at the time of this release), log in to your Blogger account, visit
http://draft.blogger.com/home, and click Settings on that page; you will see Import blog, Export blog, and Delete blog.

WHAT TO DO NEXT

Log in to your Blogger account, visit http://draft.blogger.com/home, and click Settings on that page; you will see
Import blog, Export blog, and Delete blog. Use Import blog to import the created XML file. Do not worry about existing
Blogger posts; they will be kept. The imported posts are tagged as imported, and you can decide to publish
or delete some or all of them.

HOW IT WORKS

Get the posts list from Live Space -> download all posts -> get the existing posts of your Blogger blog -> append the Live Space
posts to the existing Blogger posts to generate an XML file for import -> import into Blogger (thanks to Blogger in draft).
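
To make the flow concrete, here is a minimal sketch of the pipeline in C#. It is not the tool's actual code: the method names FetchLiveSpacePosts and FetchExistingBloggerPosts are placeholders for steps the tool performs internally, and the simplified <feed>/<entry> layout stands in for the real Blogger Atom import format.

// Minimal sketch of the export pipeline described above; not the tool's actual code.
// FetchLiveSpacePosts / FetchExistingBloggerPosts and the <feed>/<entry> layout are placeholders.
using System;
using System.Collections.Generic;
using System.Xml;

class Post
{
    public string Title;
    public string Content;
    public DateTime Published;   // the original Live Space time stamp is preserved
}

class ExportPipelineSketch
{
    // Placeholder: the real tool walks the summary page and downloads every post.
    static List<Post> FetchLiveSpacePosts()
    {
        List<Post> posts = new List<Post>();
        Post p = new Post();
        p.Title = "Hello";
        p.Content = "...";
        p.Published = DateTime.Now;
        posts.Add(p);
        return posts;
    }

    // Placeholder: the real tool reads the posts already on the Blogger blog.
    static List<Post> FetchExistingBloggerPosts()
    {
        return new List<Post>();
    }

    static void Main()
    {
        List<Post> all = FetchExistingBloggerPosts();
        all.AddRange(FetchLiveSpacePosts());     // append the Live Space posts to the existing ones

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        using (XmlWriter w = XmlWriter.Create("export.xml", settings))
        {
            w.WriteStartElement("feed");         // simplified stand-in for the Blogger Atom feed
            foreach (Post p in all)
            {
                w.WriteStartElement("entry");
                w.WriteElementString("title", p.Title);
                w.WriteElementString("published", p.Published.ToString("s"));
                w.WriteElementString("content", p.Content);
                w.WriteEndElement();
            }
            w.WriteEndElement();
        }
    }
}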

I did not implement the import step in this tool for network reasons: the created XML file might be large, and importing it can
take a long time even with a very good network connection. You decide when to do the import using the XML file.

IMPORTANT

The tool will not alter, delete, or add any post on your Live Space. Existing Blogger posts are exported, and the Live Space posts
are appended to them. The time stamps of the Live Space posts are kept.

If you only imported part of your Live Space posts into Blogger, for example because of a network problem, make sure you delete
them before doing a second import; otherwise there will be duplicate posts.

The tool only communicates with your Live Space and your Blogger blog.
For safety, please compare the MD5 of the downloaded zip package with the MD5 string on the author's Blogger.

REQUIREMENTS

.NET Framework 2.0 or above is required to run it; get it from the Microsoft website.
Here is the link for Framework 3.5:
http://www.microsoft.com/downloads/details.aspx?familyid=AB99342F-5D1A-413D-8319-81DA479AB0D7&displaylang=en

HELP

Visit http://nanzhuotong.blogspot.com for help and bug reports.
The latest available version will also be published on that website.

Ver. 1.0
+ initial release (11/7/08)

P.S. Please forgive my English and typos; I am not a native English speaker.

 

image
The first time the tool runs

image
The tool, properly configured

image
The start page of the summary list of posts; the page URL is required for the tool to work.

image
Enable email publishing and set the secret word; the secret word acts as a password to access the Live Space API.

image
Click Start! to run it

image
You can cancel the run at any time; note that in that case the formatted XML file is not completed.

image
After you have downloaded all the posts, go to Blogger in draft to reach the Settings link (different from the normal Blogger Settings link)

image
Following the Settings link in Blogger in draft, you can see the Import blog function. Follow its steps; good luck.

HtmlDecode Tool

It's a very simple tool that does HTML decoding. I developed it today because when I searched the Internet I could not find such a tool among so many search results. There are some websites with a similar conversion function, but I hate the ads on them, which make my laptop slow.

image

Look at the figure above. Put the text to be decoded into the upper textbox and press "Convert & copy" to do the decoding. The decoded string is shown in the lower textbox and copied to the clipboard when you press that button, so you can paste it anywhere else without selecting it and pressing Ctrl+C.
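
The decoding itself is a single framework call. Here is a minimal sketch of the decode-and-copy step (not the tool's actual source), using the .NET 2.0 classes HttpUtility.HtmlDecode and Clipboard.SetText:

// Minimal sketch of the decode-and-copy step; not the tool's actual source.
// Requires references to System.Web and System.Windows.Forms.
using System;
using System.Web;
using System.Windows.Forms;

class HtmlDecodeSketch
{
    [STAThread]   // the clipboard requires a single-threaded apartment
    static void Main()
    {
        string encoded = "&lt;b&gt;R &amp; D&lt;/b&gt;";
        string decoded = HttpUtility.HtmlDecode(encoded);  // -> <b>R & D</b>
        Clipboard.SetText(decoded);                        // copy the result to the clipboard
        Console.WriteLine(decoded);
    }
}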

http://cid-0ea641a5a7f665a1.skydrive.live.com/embedrow.aspx/Public/HtmlDecoded.zip

To run it, you need to install the .NET 2.0 runtime. Source code is also included; VC# 2008 Express Edition was used.

Kaixin001 parking-spot game bot

Version 1.0 builds on v0.2. Main features:
1. Automatically posts ticket slips;
2. Smart parking (automatically recognizes free spots and their colors; free spots get the lowest priority, whitelisted users get the highest);
3. Runs periodically on a timer;
4. Multi-threaded;
5. The main parameters and matching patterns are configurable;
6. Portable and safe (no installation required).

If it works well for you, please send me an email; my address is [email protected]

Feedback on test results is welcome.

Known issues:

"The application failed to initialize properly"
Install the .NET Framework 2.0 runtime package (download it here); no black screen will occur.
http://cid-0ea641a5a7f665a1.skydrive.live.com/embedrow.aspx/Public/KaixinWar.v1.zip

Excel column splitter

Author: Zhuotong Nan ([email protected])

The newer Excel 2007 supports many more columns than the older 2003 version, so you may face the situation where you want to convert to the old format but it cannot hold that many columns. Attached is the code I developed to deal with this situation: the script splits a sheet with a large number of columns into several sheets, each with a specified number of columns (a rough sketch of the splitting logic follows after the usage note below).

image 
A screen shot of this script

To use it, import the VBA source code into your Excel file, open the VBA editor, find the user form, and run it. Before running it, place the cursor on any cell of the data you want to split.
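
The heart of the script is just the column-chunking arithmetic: walk the columns in blocks of the chosen size and copy each block to a new sheet. A rough C# rendering of that idea is shown below; the VBA source in the attached package is the authoritative version.

// Rough C# rendering of the column-chunking idea used by the VBA script;
// the VBA source in the attached package is the authoritative version.
using System;
using System.Collections.Generic;

class ColumnSplitterSketch
{
    // Split a table of totalCols columns into groups of at most colsPerSheet columns.
    // Each (start, count) pair corresponds to one output sheet.
    static List<KeyValuePair<int, int>> SplitColumns(int totalCols, int colsPerSheet)
    {
        List<KeyValuePair<int, int>> chunks = new List<KeyValuePair<int, int>>();
        for (int start = 0; start < totalCols; start += colsPerSheet)
        {
            int count = Math.Min(colsPerSheet, totalCols - start);
            chunks.Add(new KeyValuePair<int, int>(start, count));
        }
        return chunks;
    }

    static void Main()
    {
        // e.g. 700 columns split into sheets of 256 columns (the Excel 2003 column limit)
        foreach (KeyValuePair<int, int> c in SplitColumns(700, 256))
            Console.WriteLine("sheet: columns {0}..{1}", c.Key + 1, c.Key + c.Value);
    }
}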

The author would like to thank Shugong Wang for his kind help on VBA scripting.

http://cid-0ea641a5a7f665a1.skydrive.live.com/embedrowdetail.aspx/Public/excel.splitter.script.zip

Convert ArcGIS raster to Surfer grid

Zhuotong Nan ([email protected])

Here is a class that implements the conversion of any raster supported by ArcGIS to Golden Software's Surfer ASCII grid format. The resulting Surfer grid can be read by Surfer version 6 or higher. One point to keep in mind: an ArcGIS raster stores data in row-major order with the upper-left corner as its origin (right and downwards positive), while a Surfer grid takes the lower-left corner as its origin (right and upwards positive).

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using ESRI.ArcGIS.Geodatabase;
using ESRI.ArcGIS.DataSourcesRaster;
using ESRI.ArcGIS.esriSystem;
using ESRI.ArcGIS.Geometry;

namespace RasterToSurferGrid
{
    /// <summary>
    /// A wrapper class for the Surfer ASCII grid file
    /// </summary>
    class SurferGrid
    {
        string id = "DSAA";   // Surfer ASCII grid identification string
        int nx;               // columns
        int ny;               // rows
        double xlo;           // the minimum X value of the grid
        double xhi;           // the maximum X value of the grid
        double ylo;
        double yhi;
        double zlo;
        double zhi;
        List<double> data;    // the grid is stored in row-major order, with the lowest row (minimum Y) first

        public SurferGrid(string rasterPath)
        {
            // open the raster dataset from its containing folder
            IWorkspaceFactory wf = new RasterWorkspaceFactoryClass();
            IRasterWorkspace wk = (IRasterWorkspace)wf.OpenFromFile(System.IO.Path.GetDirectoryName(rasterPath), 0);
            IRasterDataset ird = wk.OpenRasterDataset(System.IO.Path.GetFileName(rasterPath));
            IRaster raster = ird.CreateDefaultRaster();

            IRasterProps props = (IRasterProps)raster;
            nx = props.Width;
            ny = props.Height;
            IEnvelope env = props.Extent;
            xlo = env.XMin;
            xhi = env.XMax;
            ylo = env.YMin;
            yhi = env.YMax;
            //zlo = env.ZMin;
            //zhi = env.ZMax;

            data = new List<double>(nx * ny);

            // populate data: read the whole raster into one pixel block
            PntClass extent = new PntClass();
            extent.X = nx;
            extent.Y = ny;
            IPixelBlock datablock = raster.CreatePixelBlock(extent);
            PntClass orig = new PntClass();
            orig.X = 0;
            orig.Y = 0;
            raster.Read(orig, datablock);

            // ArcGIS row 0 is the top (maximum Y); Surfer expects the bottom row first,
            // so walk the rows from bottom to top
            for (int y = ny - 1; y >= 0; y--)
            {
                for (int x = 0; x < nx; x++)
                {
                    data.Add((float)datablock.GetVal(0, x, y));
                }
            }

            // get the max and min of the data values
            double max, min;
            max = data[0]; min = data[0];
            for (int i = 0; i < data.Count; i++)
            {
                max = max > data[i] ? max : data[i];
                min = min < data[i] ? min : data[i];
            }

            zlo = min;
            zhi = max;
        }

        public void WriteToFile(string outPath)
        {
            StreamWriter sw = new StreamWriter(outPath);
            // Surfer ASCII grid header: id, "nx ny", "xlo xhi", "ylo yhi", "zlo zhi"
            sw.WriteLine(id);
            sw.WriteLine("{0} {1}", nx, ny);
            sw.WriteLine("{0} {1}", xlo, xhi);
            sw.WriteLine("{0} {1}", ylo, yhi);
            sw.WriteLine("{0} {1}", zlo, zhi);

            // write the data values row by row, separated by spaces
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < ny; i++)
            {
                for (int j = 0; j < nx; j++)
                {
                    sb.AppendFormat("{0} ", data[i * nx + j]);
                }
                sb.Remove(sb.Length - 1, 1);   // drop the trailing space
                sb.AppendLine();
            }
            sw.Write(sb.ToString());
            sw.Close();
        }
    }
}

Attached is the source code, which you can compile on your own machine. The .NET Framework 2.0 and a valid ArcView (or higher) license are required to run this tool, and the ArcGIS assemblies for .NET are needed as well. After a successful compilation, typing RasterToSurferGrid at the command prompt shows the usage information.
http://cid-0ea641a5a7f665a1.skydrive.live.com/embedrowdetail.aspx/Public/RasterToSurferGrid.zip

Get Alexa ranking data for your site

南卓铜(Zhuotong Nan, [email protected])

Since the visit counters that websites set up themselves are sometimes unreliable, we usually rely on an authoritative third-party site to compare traffic. Alexa provides widely recognized ranking data. For example, visiting http://www.alexa.com/data/details/traffic_details/westdc.westgis.ac.cn shows the current ranking of the "Western Data Center" site (westdc.westgis.ac.cn).

Alexa offers a paid web service that gives access to its data, at roughly US$0.15 per 1,000 requests (see here). The fee is not high, and it covers many features.

As a programmer, however, sometimes you would rather challenge yourself: is there a free and legitimate way to obtain the ranking data, for example the figure in "westdc.westgis.ac.cn currently ranks 1,080,823" (May 06 2008)?

To protect its revenue, Alexa uses some tricks to prevent naive page scraping. Look at the HTML fragment for the ranking:

<span class="descBold"> &nbsp;<!--Did you know? Alexa offers this data programmatically.  Visit http://aws.amazon.com/awis for more information about the Alexa Web Information Service.--><span class="c669">1,</span><span class="cbf1">34</span>0<span class="cd05">80</span><span class="c9d1">,8</span><span class="c2e8">23</span></span>

Copying directly from the web page yields 1,34080,823 rather than the correct 1,080,823. This is because Alexa injects extra <span> tags to obfuscate the HTML; the CSS of those <span>s is set to display:none, so the browser still displays the correct number. Moreover, the obfuscating <span> tags are combined randomly.

The solution is to mimic what the browser renders, stripping away the useless information step by step until the ranking number is recovered:

a. Fetch the full HTML source and extract the fragment related to the ranking;
b. Download the obfuscation CSS sheet and collect the names of all CSS classes whose display property is none;
c. Using that class list, remove the corresponding <span> tags and the digits inside them from the HTML fragment;
d. Remove the remaining HTML tags;
e. Convert the result to a number and output it.

The following code demonstrates this approach. It is written in C# 2.0, compiles and runs in Visual Studio 2005, and uses regular expressions.

/* Purpose: to get Alexa ranking data by using C#
 * Author: Zhuotong Nan ([email protected])
 * Date: May 06 2008
 */
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace wml.stat
{
    class AlexaRanking
    {
        public static int Rank(string url)
        {
            int ret = -1;

            Uri uri = new Uri(url);
            string newUrl = "http://www.alexa.com/data/details/traffic_details/" + uri.Host;
            System.Net.WebClient wc = new System.Net.WebClient();
            string html = wc.DownloadString(newUrl);

            // pattern for obtaining the html codes in relation to the ranking data
            string htmlpattern = @" about the Alexa Web Information Service.-->(.+?)</span><!--";
            string snipet = Regex.Match(html, htmlpattern).Groups[1].Value;

            // get the css file which stores the css classes used for scrambling
            string cssUrl = "http://client.alexa.com/common/css/scramble.css";
            string cssfile = wc.DownloadString(cssUrl);

            // css class pattern for getting the list of CSS classes not displayed by the browser
            string cssclassPattern = @".(.*?) {";
            MatchCollection cssmc = Regex.Matches(cssfile, cssclassPattern);
            // css classes without display, forming regex patterns
            List<string> css_nodisp_patterns = new List<string>();
            foreach (Match m in cssmc)
            {
                css_nodisp_patterns.Add("<span class=\"" + m.Groups[1].Value
                    + "\">.*?</span>");
            }
            // remove those classes from the html snippet
            foreach (string p in css_nodisp_patterns)
            {
                snipet = Regex.Replace(snipet, p, "");
            }

            // remove the remaining span tags from what is left of the html snippet
            string tagPattern = "<[^>]*>";
            snipet = Regex.Replace(snipet, tagPattern, "");

            ret = Int32.Parse(snipet, System.Globalization.NumberStyles.AllowThousands);
            return ret;
        }

        static void Main(string[] args)
        {
            AlexaRanking.Rank("http://westdc.westgis.ac.cn");
        }
    }
}

This was implemented independently, but a later Google search turned up someone using a nearly identical approach, only implemented in PHP and producing slightly different final output; see http://plice.net/?p=10

Kappa tool

Zhuotong Nan (南卓铜,[email protected])

The Kappa-coefficient tools that can be found online, such as KappaStat, are all based on ArcView 3.x and were mostly developed by people doing land cover classification. They operate on a sampled point file and a classified polygon or grid file. When we want to apply Kappa to compare the similarity of two images, however, such software cannot be used directly. One workaround is to convert one of the images to a point file, for example with the ArcGIS RasterToPoint command, but there is no convenient way to do this for many image pairs. Because KappaStat has a graphical interface and is itself an ArcView 3.x extension, even writing Avenue scripts would not make this easy to automate.

So we wrote a small tool, Kappa.exe. It computes the Kappa coefficient between two raster files, as long as they have the same width and height (that is, both images are the same m×n array). The Kappa tool does not consider the rasters' spatial reference systems; it simply computes cell by cell. Users need to confirm themselves that the two images are in an appropriate spatial reference.

The Kappa tool's results have been verified against KappaStat, showing that the computation is reliable. The output includes the error matrix, the Kappa coefficient, and also the variance and the P statistic.
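
For reference, the Kappa coefficient itself is the standard Cohen's kappa computed from the error (confusion) matrix: kappa = (po - pe) / (1 - pe), where po is the observed agreement (the diagonal sum over the total) and pe is the agreement expected by chance (from the row and column totals). A minimal C# sketch of that part follows; it is not Kappa.exe's actual source, which additionally reports the variance and the P statistic.

// Minimal sketch of computing Cohen's kappa from an error (confusion) matrix;
// not Kappa.exe's actual source, which also reports the variance and P statistic.
using System;

class KappaSketch
{
    // m[i, j]: number of cells classified as category i in image 1 and category j in image 2
    // (m is assumed to be square: one row/column per category)
    static double Kappa(long[,] m)
    {
        int k = m.GetLength(0);
        double n = 0, diag = 0, chance = 0;
        double[] rowSum = new double[k];
        double[] colSum = new double[k];

        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
            {
                n += m[i, j];
                rowSum[i] += m[i, j];
                colSum[j] += m[i, j];
                if (i == j) diag += m[i, j];
            }

        for (int i = 0; i < k; i++)
            chance += rowSum[i] * colSum[i];

        double po = diag / n;           // observed agreement
        double pe = chance / (n * n);   // agreement expected by chance
        return (po - pe) / (1 - pe);
    }

    static void Main()
    {
        long[,] m = { { 20, 5 }, { 10, 15 } };   // a small two-category example
        Console.WriteLine("kappa = {0:F4}", Kappa(m));   // prints 0.4000
    }
}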

For raster support, the Kappa tool calls the ArcGIS ArcObjects raster API, so many raster formats are supported, including ESRI grid, ASCII grid, TIFF/GeoTIFF, JPEG, and so on.

The Kappa tool runs on the command line, which means batch processing can be achieved with a simple batch script. For example, the Windows batch code below computes Kappa for raster files with the same file names under two directories; the results are saved to the file kappa.txt.

@echo off
for %%i in (00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23) do (
  echo Hour %%i ...
  kappa.exe data1\rldas%%i data2\rldas%%i -k:.\kappa.txt
)
echo Done.

Command-line format: Kappa.exe [reference or observed image] [image to compare] {-l:log file} {-k:Kappa file}

[] means required and {} means optional. For an ESRI grid, specify the image by its directory name; for a raster dataset in a personal geodatabase (stored in an Access database), specify it as geodb.mdb\raster; for a raster dataset in a file geodatabase, specify it as geodb.gdb\raster. If a log file is given, whatever is printed on screen is also saved to the log file; if a Kappa file is given, the Kappa coefficient alone is saved to that file. These two switches are useful for batch processing.

Requirements

.NET Framework 2.0
ArcGIS Desktop 9.2 with an ArcView license or higher

Platform

It runs on Windows. However, since the tool is written in C# and calls the relevant ArcGIS ArcObjects APIs, if ArcGIS Desktop 9.2 were installed on Linux, the tool should be compilable for the Linux platform with the Mono runtime.

For binaries and source code, please contact me by email.

Pm v2.25

A small program I wrote with Qt for recording the excellent English sentences you come across. I find it very helpful for accumulating English expressions for writing scientific papers. I usually keep it open while reading English papers, note down good, idiomatic expressions, and look them up later when I cannot recall them while writing. In my experience I often feel that I have seen a similar expression somewhere before but simply cannot remember it, and that is when this little tool comes in handy. There is no bulk import feature; friends have asked for it, and it would not be hard to implement, but I do not want to, because the tool is meant to help you improve your English, and forcing you to type each sentence once is part of how it works. Please do not request features; this is not a commercial product, I do not have much time to improve it, and the current features are good enough.

Click here to download the Pm v2.25 installer.

MD5 sum: f9d813cbf325c3eb93beb35f1335e1b3

Finished a package for converting radar data to hydrological model time series

A Korean colleague here had earlier written a Python version that ran on Linux. I improved it on Windows and did some optimization. When the formal computation started, however, it turned out that one sub-basin of the Ohio basin (2,560 HRAP grid cells) would take an estimated 233+ days for ten years of hourly data and require more than 100 GB of disk space. So although the Python code worked, it was essentially impractical for a large basin. To get the job done, after discussing with Liang, I decided to combine GIS with my own code. After roughly two weeks of coding, several small tools are now finished, with a great deal of performance optimization (the previous version ran out of memory halfway through a conversion, so debugging took a lot of time).
I wrote an instruction document: going from the raw downloaded XMRG binary files to time series usable by a hydrological model such as VIC takes nearly ten steps. One interesting interpolation method is implemented (the same as in the earlier Python algorithm): when converting from the HRAP grid to the 1/8-degree grid, since each 1/8-degree cell covers several HRAP cells, the algorithm uses area weighting to compute the average precipitation of each 1/8-degree cell, as sketched below.
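
A minimal sketch of that area-weighted averaging step follows. It assumes the overlap areas between each 1/8-degree cell and the HRAP cells have already been computed (for example in GIS); the Overlap type and its fields are illustrative, not the package's actual data structures.

// Minimal sketch of area-weighted averaging from HRAP cells to one 1/8-degree cell.
// The Overlap type and its fields are illustrative, not the package's actual types;
// overlap areas with the target cell are assumed to have been computed beforehand.
using System;
using System.Collections.Generic;

struct Overlap
{
    public double Area;            // area of the HRAP cell that falls inside the target cell
    public double Precipitation;   // precipitation value of that HRAP cell
    public Overlap(double area, double precipitation)
    {
        Area = area;
        Precipitation = precipitation;
    }
}

class AreaWeightedAverage
{
    // Average precipitation of one 1/8-degree cell, weighted by overlap area
    static double CellAverage(List<Overlap> overlaps)
    {
        double weightedSum = 0, totalArea = 0;
        foreach (Overlap o in overlaps)
        {
            weightedSum += o.Area * o.Precipitation;
            totalArea += o.Area;
        }
        return totalArea > 0 ? weightedSum / totalArea : 0;
    }

    static void Main()
    {
        List<Overlap> overlaps = new List<Overlap>();
        overlaps.Add(new Overlap(12.5, 3.2));
        overlaps.Add(new Overlap(7.5, 4.0));
        Console.WriteLine("cell average = {0:F3}", CellAverage(overlaps));
    }
}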
The whole task now takes about two days, so it can be run on your own desktop machine rather than requiring a server with lots of memory.
There was no time for multi-threading or parallel optimization; asynchronous or parallel processing would presumably improve performance further.