Category Archives: Tech

Live Space blog export tool

Live Space blog export tool
Ver.1.1
Zhuotong Nan ([email protected])

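What it does

This tool exports all posts of a Windows Live Space to a single XML file, which can then be imported into Blogger through Google Blogger's Import feature.
Because it parses the blog pages rather than using the Weblog API, it avoids the post-count limits of the traditional approach (Live Space returns only 20 posts, and Blogger accepts only 50 posts per day).

If you are a long-time Live Space user with many posts and want to move to Blogger for whatever reason (for example, because your space has been blocked in mainland China), this tool can save you a great deal of work. Besides, Blogger does offer more powerful features than Live Space.

What to do after generating the XML

Log in to Blogger, visit http://draft.blogger.com/home, and click Settings, where you will see Import blog, Export blog and Delete blog.
Use Import blog to import the generated XML file into Blogger. The imported posts are marked as imported; from the dashboard you can then publish them
or delete them.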

Requirements

.NET Framework 2.0 or later, available from the Microsoft website.
Here is the link for Framework 3.5:
http://www.microsoft.com/downloads/details.aspx?familyid=AB99342F-5D1A-413D-8319-81DA479AB0D7&displaylang=en

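More help

For the settings of each parameter in the interface, visit http://nanzhuotong.blogspot.com
You can also get help or submit bug reports on that site.
New versions will be published there as well.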

image 
First run: fill in the required parameters

image
Running

image
About

Ver. 1.1
+ add a simple Chinese readme file.
* fix a bug where the program ran even when required info was not provided. (11/12/08)
+ include Microsoft.WindowsLive.Id.Client.dll in the distr pack. (11/8/08)
# passwords are now saved to the config file encrypted with MD5.
# add a much nicer About box
+ localized, Simplified Chinese supported. (11/7/08)

Ver. 1.0
+ initial release (11/7/08)

livespaceexport.v1.1.zip
MD5: da31bf89753721f47bd8d3d33aabd228

Move blog posts from Live Space to Google Blogger

I developed a small tool for moving posts in bulk from Live Space to Google Blogger. The README.txt file is reproduced below.

livespaceexport.v1.zip
md5 digest: 401b130f69a0d83f3696c6c07a6d0c0b

Live space export utility
Ver.1.0
Zhuotong Nan ([email protected])

PURPOSE

This tool exports all Live Space posts (without the 20-post limit, because it uses a different method to retrieve posts) and
generates an XML file in a format that can be imported into Blogger.

This tool is especially useful when you want to move to a newly created Blogger blog. If you have already written
hundreds of posts in your Live Space, moving them all by hand is impractical. The existing tools on the Internet
with a similar purpose have more constraints, for example a maximum of 20 posts from Live Space and 50 posts per day on the Blogger side.

The tool is designed to address those issues.

CONFIGURATION

Windows Live ID – your live id. Mine is [email protected].

Live space user name – my space is http://nzt.spaces.live.com, in this case the user name is nzt.

Live space secret word – go to your Live Space settings and enable email publishing. The secret word can be found there.

Start page to navigate all posts – sign in to your space; this is the address of the link leading to the Summary list of Blog.
My start page looks like http://nzt.spaces.live.com/?_c11_BlogPart_BlogPart=summary&_c=BlogPart; replace nzt with
your name. The tool uses this link to retrieve the list of all posts.

Use local buffer posts listing file – on the first run this option is always grayed out. Once the Live Space posts list has been downloaded (the post
contents have not been downloaded at that point), the list is saved to a local file named blogposts.txt in the current
directory. With this option enabled, the tool skips downloading the posts list and goes straight on to fetching
post contents and building the final XML file.

BlogID – your Blogger ID. When you log in to your Blogger blog and click the Customize link, you will see
a URL in your browser's address bar like http://www.blogger.com/rearrange?blogID=207104551370538866; the number string is
your ID (this one is actually mine, ^_^).

Blogger user name – generally your gmail account
Blogger password – gmail password

Xml file name – the posts from Live Space will be written to this file. Import this file into Blogger using the Import function of
Blogger in draft. To reach the import function, at the time of this release, log in to Blogger, visit
http://draft.blogger.com/home, then click Settings on that page; you will see Import blog, Export blog and Delete blog.

WHAT TO DO NEXT

Log in to Blogger, visit http://draft.blogger.com/home, then click Settings on that page; you will see
Import blog, Export blog and Delete blog. Use Import blog to import the generated XML file. Do not worry about existing
Blogger posts; they will be kept. The imported posts are given the imported tag, and you can decide whether to publish
or delete some or all of them.

HOW IT WORKS

get the posts list from Live Space -> download all posts -> get the existing posts of your Blogger blog -> append the Live Space posts
to the existing Blogger posts to generate an XML file for import -> import to Blogger (thanks to draft Blogger)

I did not implement the import step in this tool because of network considerations. The generated XML file may be large,
in which case the import takes a long time even on a very good connection. You decide when to run the import
using the XML file.

IMPORTANT

The tool will not alter, delete, or add any post in your Live Space. Existing Blogger posts are exported, and the Live Space posts
are appended to them. The time stamps of the Live Space posts are kept.

If only part of your Live Space posts were imported to Blogger, for example because of a network problem, make sure you delete them before
you run the import a second time; otherwise there will be duplicate posts.

The tool communicates only with your Live Space and your Blogger account.
For safety, please compare the MD5 of the downloaded zip package with the MD5 string on the author’s Blogger.
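
As a rough illustration of the MD5 comparison suggested above, the following Python sketch computes the digest of a downloaded file and compares it with a published string. It is not part of the tool; the function names are made up for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large downloads need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_md5):
    """Compare a file's MD5 with a published hex string.

    The comparison ignores letter case and surrounding whitespace.
    """
    return md5_of_file(path) == published_md5.strip().lower()
```

For instance, after downloading livespaceexport.v1.zip you could call matches_published_md5("livespaceexport.v1.zip", "401b130f69a0d83f3696c6c07a6d0c0b") and expect True only if the file is intact.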

REQUIREMENTS

.NET Framework 2.0 or above is required to run it. Get it from the Microsoft website.
Here is the link for Framework 3.5:
http://www.microsoft.com/downloads/details.aspx?familyid=AB99342F-5D1A-413D-8319-81DA479AB0D7&displaylang=en

HELP

Visit http://nanzhuotong.blogspot.com for help and bug reports.
The latest version will also be published on that website.

Ver. 1.0
+ initial release (11/7/08)

p.s. forgive my English and typos, I am not native English speaker.

 

image
the first time the tool runs

image
properly configure it

image
the start page of summary list of posts. the page url is necessary to make the tool work.

image
enable email publishing and set the secret word. The secret word acts as the password for accessing the Live Space API.

image
click start! to run it

image
you can cancel the run at any time. Note that in this case the formatted XML file is incomplete.

image
after all the posts have been downloaded, go to the Settings link in draft Blogger (it differs from the normal Blogger settings link)

image
follow the Settings link from draft Blogger to find the Import blog function. Follow its steps; good luck.

HtmlDecode Tool

It’s a very simple tool that does HTML decoding. I developed it today because I could not find such a tool among the many search results on the Internet. Some websites offer a similar conversion function, but I hate the ads on them, which slow my laptop down.

image

Look at the figure above. Put the text to be decoded into the upper textbox and press “Convert & copy” to decode it. The decoded string is shown in the lower textbox and copied to the clipboard at the same time, so you can paste it anywhere else without selecting it and pressing Ctrl+C.

HtmlDecoded.zip

To run it, you need to install the .NET 2.0 runtime. The source code is included; it was written with VC# 2008 Express Edition.
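
For readers who just want to see what HTML decoding does, here is a minimal Python sketch using the standard library's html.unescape. It is only an illustration of the conversion and is not taken from the tool's source; the wrapper name html_decode is an assumption for this example.

```python
import html


def html_decode(text):
    """Turn HTML entities such as &lt; and &amp; back into plain characters."""
    return html.unescape(text)


# Named entities and numeric character references are both handled.
print(html_decode("&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"))  # prints <b>Tom & Jerry</b>
```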

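Kaixin (开心网) parking-space game automation tool

Version 1.0, built on v0.2. Main features:
1. Posts tickets automatically;
2. Smart parking (automatically recognizes free parking spaces and colors; free spaces get the lowest priority and users on the whitelist the highest)
3. Runs periodically on a timer
4. Multithreaded
5. Main parameters and matching patterns are configurable
6. Portable (no installation needed) and safe

If you find it useful, please send me an email; my address is [email protected]

Feedback on test results is welcome.

Known issues:

"The application failed to initialize properly"
Install the .NET Framework 2.0 runtime; download it here. It will not cause the black screen.
KaixinWar.v1.zip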

XP Antispyware 2009 & alert balloon virus

Author: Zhuotong Nan ([email protected])

Description: An alert balloon keeps popping up with a message about antispyware (see the figure below). When you click it with either mouse button, a download dialog appears and starts downloading XP Antispyware 2009 to your machine. Once it is installed, the ‘antispyware’ appears to scan your computer and warns you that many spyware programs were found. This so-called XP Antispyware 2009 is not a real anti-spyware program; it is itself spyware. The scan copies a lot of spyware into various directories. Antivirus and anti-spyware software such as Kaspersky and Spybot can no longer run.

image

Solution:

1. Find the XP Antispyware 2009 entry in your Start menu and try to uninstall it.

2. Delete C:\Program Files\XP_AntiSpyware*

3. Run regedit to open the Registry Editor, and remove

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\brastk
HKLM\Software\microsoft\windows\currentversion\run\XP Antispyware 2009
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLS karna.dat
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run\SVCHOST.exe C:\WINDOWS\system32\drivers\svchost.exe (if it exists)

4. Run sfc /scannow to restore your protected system files, because the virus modifies beep.sys under your system32\drivers directory. Alternatively, copy beep.sys from a clean computer to the infected one. beep.sys exists in two directories, system32\drivers and system32\dllcache.

5. Reboot your computer in safe mode and remove the following files:

c:\windows\brastk.exe
c:\windows\system32\brastk.exe
c:\windows\karna.dat
c:\windows\system32\karna.dat

Check the registry again at the locations mentioned above.

At this point, Kaspersky and Spybot should run normally. Run them and do a full scan.

6. Reboot. Hopefully everything works now.

Excel column splitter

Author: Zhuotong Nan ([email protected])

Excel 2007 holds far more columns per sheet than the old 2003 version (16,384 versus 256). So you may find yourself wanting to convert a file to the old version when a sheet has more columns than it can hold. Attached is the code I wrote to deal with this situation. The script splits a sheet with a large number of columns into several sheets, each with a specified number of columns.

image 
A screen shot of this script

To use it, import the VBA source code into your Excel file, open the VBA editor, find the userform, and run it. Before running it, place the cursor on any cell of the data you want to split.
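
The splitting idea itself is simple. The Python sketch below is not the attached VBA script; it only shows the same idea on a table held as a list of rows, and the function name split_columns is made up for the example.

```python
def split_columns(rows, max_cols):
    """Split a table into several tables of at most max_cols columns each.

    rows is a list of rows (lists of cell values); every returned table
    keeps all rows but only one slice of the columns.
    """
    if max_cols < 1:
        raise ValueError("max_cols must be at least 1")
    width = max((len(row) for row in rows), default=0)
    return [
        [row[start:start + max_cols] for row in rows]
        for start in range(0, width, max_cols)
    ]


# A 2 x 5 table split into sheets of at most 2 columns gives 3 sheets.
sheets = split_columns([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], 2)
print(len(sheets))  # prints 3
```

With max_cols set to 256, each resulting table would fit the column limit of an Excel 2003 sheet.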

The author would like to thank Shugong Wang for his kind help on VBA scripting.

excel.splitter.script.zip

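Searching this space

The built-in search on Live Space no longer works well. This space is indexed by Google, usually within half an hour of an update, so you can use Google's advanced search instead. In Google, search for

site:nzt.spaces.live.com

followed by your actual search keywords. For example, to find the posts about GIS on this space, search for

site:nzt.spaces.live.com gis

google with example keywords

The result is shown in the figure above.

The same works for other people's spaces; just change the domain after site:.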

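Fitting a Gamma distribution with MATLAB

Zhuotong Nan ([email protected])

Task: we have a set of data whose histogram should be fitted with a Gamma distribution. The histogram has 20 bins spanning 0 to 100. The result looks like the figure below:

hist_dist_final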

Steps:

1. Load the data. The data file is data.for.hist.txt, a single column of 43 values with no header.

>> data=load('data.for.hist.txt')

data is a 43×1 vector.

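2. Open the Distribution Fitting Tool

>> dfittool

Set Display type to Density (PDF). Click Data… to open the Data dialog.

In this dialog, set Data to the 'data' vector and leave Censoring and Frequency as none. Set the data set name to dataset, then click Create Data Set.

Click Set Bin Rules and set Bin width to 5. Leave everything else unchanged and click OK to close Set Bin Rules. The histogram of data now uses a bin width of 5; in other applications, adjust the bins as the case requires. Close the Data dialog.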

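3. Fit the Gamma distribution

Click New Fit…. In the New Fit dialog, set Data to dataset and Distribution to Gamma, then click Apply to run the fit. Click Close to close the New Fit dialog.

You should now see the following:

image

We need to change the X range to match the histogram's maximum X of 100. Open Axes Limit Control from the Tools menu and set X Lower Limit and X Upper Limit to 0 and 100.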

image

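4. Export to a figure

Use Print to Figure in the File menu to export to a figure, then make the necessary changes to the figure.

>>figure(1)  (Note: if several figure windows are already open, the exported figure may have a number other than 1; check the n in "Figure n" in the figure window's title.)
>>legend hide
>>xlabel '1/32 degree, mm/day'
>>ylabel 'Frequency'

The result now looks like this: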

image

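5. Change the figure formatting

Put the figure into Tools -> Edit Plot mode and open View -> Property Editor.

Change the histogram's plot type from Line to Area and choose a suitable face color and edge color.

Set the x-axis range (X Limits) to 0 to 100.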

image

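The histogram now covers the fitted curve. Select the fitted curve, right-click and Cut it, then Paste it back in; this brings the fitted curve to the top.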

6. Modify the data

Note that the Y axis is still Density at this point. We need to convert the Y values to Frequency (count). Count = Density × bin width × data count; here the bin width is 5 and the data count is 43, so YData must be multiplied by 5 × 43 = 215.

Select the fitted curve,

>>ydata=get(gco,'YData')
>>ydata_215 = ydata*215
>>set(gco,'YDATA',ydata_215)

Select the histogram,

>>set(gco,'YDATA',get(gco,'YDATA')*215)
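
The relation Count = Density × bin width × data count can be checked with a small Python sketch. This is only an illustration and not part of the MATLAB workflow; the function names and the sample values are made up, and the histogram here is a plain equal-width binning that may differ in edge handling from what dfittool does.

```python
def histogram_counts(data, lower, upper, bin_width):
    """Count values in equal-width bins over [lower, upper]."""
    n_bins = int(round((upper - lower) / bin_width))
    counts = [0] * n_bins
    for x in data:
        if lower <= x <= upper:
            # A value equal to the upper limit goes into the last bin.
            idx = min(int((x - lower) // bin_width), n_bins - 1)
            counts[idx] += 1
    return counts


def counts_to_density(counts, bin_width):
    """Scale counts so that the histogram area sums to 1."""
    total = sum(counts)
    return [c / (total * bin_width) for c in counts]


def density_to_counts(density, bin_width, n_data):
    """Invert the scaling: count = density * bin width * data count."""
    return [d * bin_width * n_data for d in density]


sample = [1, 2, 7, 12, 12, 48, 99, 100]  # made-up values, not the post's data
counts = histogram_counts(sample, 0, 100, 5)
density = counts_to_density(counts, 5)
restored = density_to_counts(density, 5, len(sample))
```

With the post's numbers the scale factor is 5 × 43 = 215, which is the constant used in the MATLAB commands above.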

Adjust xlim and ylim

>>ylim([0 10])
>>xlim([0 100])

Return the figure to non-edit mode: toggle Tools -> Edit Plot off and close the Property Editor.

image

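The steps above change xlim and ylim several times; that is not always necessary and is done here only for demonstration.

7. Save the figure

>>print -dpng 'histogram_dist.png'

In the figure window, use File -> Save As… to save as a MATLAB Figure (.fig), which can be reopened later via File -> Open….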

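Conclusion

This post showed how to fit a Gamma distribution to the histogram of a data set in MATLAB. It covered common MATLAB commands, the Distribution Fitting Tool, the formatting and editing of figures, and the modification of the data of figure objects, along with practical tricks such as changing the z-order of objects in a figure.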

data.for.hist.txt

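Exporting an Excel 2007 chart as an image

Zhuotong Nan ([email protected])

To summarize:

1) Copy the chart in Excel 2007, paste it into Paint or another image editor, and save it as an image such as JPEG.

2) Write VBA code: ActiveChart.Export FileName:="image.gif", FilterName:="GIF". It can save as JPG, GIF and PNG. Be aware that writing and running VBA code in Excel may require changing the macro security settings. The image quality cannot be controlled.

3) Copy the chart, open PowerPoint, and paste it in with Paste Special…; the available options include EMF, PNG, GIF and JPEG. In PowerPoint, right-click the pasted image and choose Save as Picture… to save it in the format you need.

4) Save the worksheet containing the chart as HTML; a .gif image of the chart is generated along with it.

5) Use a third-party add-in such as PUP 7, although it costs money.

Do you have any other tips?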

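An approach to converting between large numbers of time-series snapshot files and time-series files

Zhuotong Nan ([email protected])

Imagine the following scenario. We have a spatial grid file with a resolution of 30×30 m covering 30×30 km², so with one value per cell it holds 1,000,000 values (assume type double). Each grid file can be regarded as one snapshot of the study area's time series. The whole time series is hourly over 11 years (1997-2007), which amounts to 96,408 grid files. Our task is to convert all the snapshot files into one time-series file per grid cell (1M files in total). Likewise, we also need the conversion in the other direction, from the time-series files of all cells to snapshot files.

Keeping all the files open at once is impossible, because file handles are a scarce resource and each process may only have a limited number of files open (512, say). The usual iterative approach is to open one snapshot (grid) file, take the value of one cell, open that cell's time-series file, write the time and the value, and close both files. That means opening and closing files 1M × 96,408 times. Disk I/O is well known to be very expensive, so this approach takes an extremely long time to finish (a week or even several weeks, depending on the disk's read speed), while memory is barely used.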
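
To get a feel for these numbers, here is a small illustrative Python sketch (not part of the original program) that recomputes them from the stated grid size and period. The variable names are assumptions for the example.

```python
from datetime import datetime

# 30 km x 30 km area at 30 m resolution: 1000 x 1000 cells.
cells = (30_000 // 30) ** 2

# Hourly steps from the start of 1997 to the end of 2007 (includes two leap years).
period = datetime(2008, 1, 1) - datetime(1997, 1, 1)
hours = int(period.total_seconds() // 3600)

# The naive approach opens and closes a time-series file once per cell per snapshot.
naive_open_close = cells * hours

print(cells)             # prints 1000000
print(hours)             # prints 96408
print(naive_open_close)  # prints 96408000000
```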

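On a server with a very large memory (64 GB, say), it may be possible to load all the files to be converted into memory at once, analyze them in memory, assemble the output format, and write everything out in one go. On an ordinary personal machine this does not work.

To speed things up, we adopt the following solution. For the conversion from time-series files to snapshot (grid) files: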

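Create n snapshot files in memory, with every cell filled with the missing-data value (this is where most of the memory goes)
Open one time-series file
    Read n lines from the time-series file (each line holds a time and the corresponding double value)
    Write the n values read to the corresponding positions in the n snapshot files
Close this time-series file and repeat for the next one
Write the n in-memory snapshot files out to disk
Repeat the whole process starting from position n+1, until the end of the series. (This is the key step.)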

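The last batch may not contain exactly n lines, so the program has to handle that. The value of n must be tuned to the memory available at run time. On a workstation with 2 GB of memory, n was set to 5000-10000. The more memory, the larger n can be and the better the performance.

Of course, several time-series files could be kept open at the same time to squeeze out more performance, at the cost of more complex iteration control. In my limited tests, opening several time-series files at once did not improve performance noticeably (understandably, since disk access is essentially sequential reading by heads driven by a single arm).

The point that needs care is how to jump quickly and accurately to position n+1 for the next batch. The time-series files are text files, and reading them sequentially hurts performance badly: in the last iteration, for instance, all the preceding lines would have to be traversed before reaching the required starting line. We need to open the files in binary mode and track the starting read position of each time-series file ourselves (the positions may differ between files because line lengths vary). In C#, a file opened with StreamReader does not give the exact current position through its base stream. We therefore wrote the TimeSeriesDataFile class. It is initialized with a file name and a starting read position; ReadLines returns a given number of data lines, and the CurrentPosition property gives the starting position for the next read.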

using System;
using System.Collections.Generic;
using System.Text;

namespace nzt.TimeSeries2Spatial
{
    /// <summary>
    /// Access time series data text file as binary.
    /// </summary>
    class TimeSeriesDataFile
    {

        private string _filepath;
        private System.IO.FileStream _fs;
        private long _lastpos;
        private long _startpos;
        private const int MAXLINELENGTH=50; //bytes, ensure it larger than max length of each line.

        public TimeSeriesDataFile(string filename, long startposition)
        {
            _filepath = filename;
            _startpos = startposition;
            Open();

        }

        private void Open()
        {
            _fs = new System.IO.FileStream(_filepath, System.IO.FileMode.Open, System.IO.FileAccess.Read);
            _lastpos=_fs.Seek(_startpos, System.IO.SeekOrigin.Begin);
        }

        /// <summary>
        /// Read a number of lines from stream beginning at startposition.
        /// </summary>
        /// <param name="count">Number of lines to be expected to return</param>
        /// <returns></returns>
        public string[] ReadLines(int count)
        {
            if (_fs == null) return null;

            byte[] buffer = new byte[MAXLINELENGTH * count];

            int bytesRead=_fs.Read(buffer, 0, buffer.Length);

            if (bytesRead == 0) return null;

            //we have data in buffer now.
            List<String> sb_list = new List<String>();
            int c = count;
            StringBuilder sb = new StringBuilder();
            int i;
            for (i = 0; i < bytesRead; i++)
            {
                if (buffer[i] != '\r' && buffer[i] != '\n')
                    sb.Append((char)buffer[i]);
                else if (buffer[i] == '\n')
                {
                    sb_list.Add(sb.ToString());
                    if (--c<=0) break;
                    sb = new StringBuilder();
                }
            }
            if (c>0 && sb.Length>0) sb_list.Add(sb.ToString());

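            // i is the index of the last newline consumed when the loop broke out early,
            // or bytesRead when the whole buffer was scanned.
            _lastpos += (i < bytesRead) ? i + 1 : bytesRead;
            // Re-sync the stream so that a further ReadLines call continues from the right place.
            _fs.Seek(_lastpos, System.IO.SeekOrigin.Begin);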

            return sb_list.ToArray();
        }

        public long CurrentPosition
        {
            get { return _lastpos; }
        }

        public void Close()
        {
            // _fs is null if Open() failed; the finalizer still calls Close().
            if (_fs != null) _fs.Close();
        }

        ~TimeSeriesDataFile()
        {
            Close();
        }

    }
}
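
The batching loop and the offset tracking described above can be sketched together in Python. This is a simplified illustration under stated assumptions, not a port of the author's program: each time-series line is assumed to be "timestamp value" with a timestamp that contains no spaces and is safe to use in a file name, the missing-data marker -9999.0 and the output naming are made up, and readline on a binary file replaces the fixed-size buffer of the C# class.

```python
import os

MISSING = -9999.0  # assumed missing-data marker


def read_lines(path, start, count):
    """Read up to count lines from byte offset start.

    Returns (lines, next_offset); the file is opened in binary mode so
    that tell() reports an exact byte position.
    """
    lines = []
    with open(path, "rb") as f:
        f.seek(start)
        for _ in range(count):
            raw = f.readline()
            if not raw:
                break
            lines.append(raw.decode("ascii").rstrip("\r\n"))
        return lines, f.tell()


def series_to_snapshots(series_paths, out_dir, n):
    """Convert per-cell time-series files into per-time snapshot files.

    Processes n time steps per pass and returns the number of snapshots written.
    """
    offsets = [0] * len(series_paths)  # next read position for each cell's file
    written = 0
    while True:
        # n in-memory snapshots, pre-filled with the missing-data marker.
        batch = [[MISSING] * len(series_paths) for _ in range(n)]
        stamps = [None] * n
        rows = 0
        for cell, path in enumerate(series_paths):
            lines, offsets[cell] = read_lines(path, offsets[cell], n)
            for k, line in enumerate(lines):
                stamp, value = line.split()
                stamps[k] = stamp
                batch[k][cell] = float(value)
            rows = max(rows, len(lines))
        if rows == 0:
            break  # every file is exhausted
        for k in range(rows):  # the last batch may hold fewer than n rows
            name = os.path.join(out_dir, "snapshot_%s.txt" % stamps[k])
            with open(name, "w") as out:
                out.write(" ".join(repr(v) for v in batch[k]) + "\n")
        written += rows
    return written
```

Each pass opens every time-series file once, so the number of open/close operations drops from one per value to one per file per batch.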

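The same idea applies to the conversion from snapshot (grid) files to time-series files. However, because snapshot files are usually small, a few hundred KB for example (by comparison, an 11-year hourly time-series file exceeds 2 MB), there is no need to work around StreamReader; each file can be loaded into memory in one go and located and parsed there.

If you have a better solution, please share it.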