Friday, May 29, 2015

Establishing why an object can't be pickled

I'm receiving an object, t, of type Object from an API. I am unable to pickle it, getting the error:

  File "p.py", line 55, in <module>
    pickle.dump(t, open('data.pkl', 'wb'))
  File "/usr/lib/python2.6/pickle.py", line 1362, in dump
    Pickler(file, protocol).dump(obj)
  File "/usr/lib/python2.6/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.6/pickle.py", line 313, in save
    (t.__name__, obj))
pickle.PicklingError: Can't pickle 'Object' object: <Object object at 0xb77b11a0>

When I do the following:

for i in dir(t): print(type(i))

I get only string objects:

<type 'str'>
<type 'str'>
<type 'str'>
...
<type 'str'>
<type 'str'>
<type 'str'>

How can I print the contents of my Object object in order to understand why it can't be pickled?

It's also possible that the object contains C pointers to Qt objects, in which case it wouldn't make sense to pickle it. But again, I would like to see the internal structure of the object in order to establish this.
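
To see what the object actually holds, one option is to iterate over its attributes (not just their names, which is all dir() gives you, hence the strings) and try to pickle each value individually. A sketch, assuming ordinary attribute access works on the object:

```python
import pickle

def find_unpicklable(obj):
    """Return (name, type, error) for each attribute that fails to pickle."""
    bad = []
    for name in dir(obj):
        if name.startswith('__'):
            continue
        try:
            value = getattr(obj, name)
        except Exception:
            continue          # some wrapped objects raise on attribute access
        if callable(value):
            continue          # skip methods; we want data attributes
        try:
            pickle.dumps(value)
        except Exception as exc:
            bad.append((name, type(value), exc))
    return bad
```

Attributes holding wrapped C pointers (e.g. Qt objects) would show up here with a PicklingError or TypeError, confirming the object isn't meant to be pickled.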

Difference between counting and totaling? [on hold]

For example: we need to calculate the total marks of students in a class and take an average. Will I need to use a counter first to store the marks?
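
In Python no manual counter is needed: sum() totals and len() counts. A minimal sketch with made-up marks:

```python
marks = [78, 85, 62, 91]        # hypothetical marks

total = sum(marks)              # totaling: add all the values together
count = len(marks)              # counting: how many values there are
average = total / float(count)  # float() keeps the division exact on Python 2
```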

Finding count of distinct elements in DataFrame in each column

I am trying to find the count of distinct values in each column using Pandas. This is what I did.

import pandas as pd

df = pd.read_csv('train.csv')
# print(df)

a = pd.unique(df.values.ravel())
print(a)

It counts unique elements in the DataFrame irrespective of rows/columns, but I need to count for each column with output formatted as below.

policyID              0
statecode             0
county                0
eq_site_limit         0
hu_site_limit         454
fl_site_limit         647
fr_site_limit         0
tiv_2011              0
tiv_2012              0
eq_site_deductible    0
hu_site_deductible    0
fl_site_deductible    0
fr_site_deductible    0
point_latitude        0
point_longitude       0
line                  0
construction          0
point_granularity     0

What would be the most efficient way to do this, as this method will be applied to files larger than 1.5 GB?


Based upon the answers, df.apply(lambda x: len(x.unique())) is the fastest.

In[23]: %timeit df.apply(pd.Series.nunique)
1 loops, best of 3: 1.45 s per loop
In[24]: %timeit df.apply(lambda x: len(x.unique()))
1 loops, best of 3: 335 ms per loop
In[25]: %timeit df.T.apply(lambda x: x.nunique(), axis=1)
1 loops, best of 3: 1.45 s per loop
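
For reference, the per-column pattern can be sketched on a small frame (the column names here are made up); it produces a Series in the layout shown above:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'y', 'y'], 'c': [0, 0, 0]})

# apply() runs the function once per column, giving one distinct count each
counts = df.apply(lambda col: col.nunique())
```

Recent pandas versions also offer df.nunique(), which computes the same result in a single call.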

Algorithm equivalence from Matlab to Python

I've plotted a 3-D mesh in Matlab with the little M-file below:

[x,n] = meshgrid(0:0.1:20, 1:1:100);

mu = 0;
sigma = sqrt(2)./n;

f = normcdf(x,mu,sigma);

mesh(x,n,f);

I tried to obtain the same result using Python and its corresponding modules, with the code snippet below:

import numpy as np
from scipy.integrate import quad
import matplotlib.pyplot as plt

sigma = 1

def integrand(x, n):
    return (n/(2*sigma*np.sqrt(np.pi)))*np.exp(-(n**2*x**2)/(4*sigma**2))

tt = np.linspace(0, 20, 2000)
nn = np.linspace(1, 100, 100)  

T = np.zeros([len(tt), len(nn)])

for i,t in enumerate(tt):
    for j,n in enumerate(nn):
        T[i, j], _ = quad(integrand, -np.inf, t, args=(n,))

x, y = np.mgrid[0:20:0.01, 1:101:1]

plt.pcolormesh(x, y, T)

plt.show()

But the Python output differs considerably from the Matlab one, and is in fact unacceptable. I'm afraid I'm misusing functions such as linspace, enumerate or mgrid...

Does anybody have any idea?

PS. Unfortunately, I couldn't insert the output plots into this thread!

Best

..............................

Edit: I changed the linspace and mgrid intervals and switched to the plot_surface method... The output is 3-D now, with suitable accuracy and smoothness.
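
For what it's worth, the Matlab script also has a direct vectorized translation that avoids the double loop and quad entirely: normcdf(x, mu, sigma) corresponds to scipy.stats.norm.cdf(x, loc=mu, scale=sigma). A sketch on the same grid as meshgrid(0:0.1:20, 1:1:100):

```python
import numpy as np
from scipy.stats import norm

# Same grid as the Matlab meshgrid(0:0.1:20, 1:1:100)
x, n = np.meshgrid(np.linspace(0, 20, 201), np.arange(1, 101))

mu = 0
sigma = np.sqrt(2) / n

f = norm.cdf(x, loc=mu, scale=sigma)  # element-wise normal CDF over the grid
```

f can then be passed to plot_surface for the 3-D mesh.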

Add values to a class in Python

Let's say I have the following in Python:

class Test():
    def __init__(self):
        self.value1 = 1
        self.value2 = 2

    def setvalue1(self, value):
        self.value1 = value

So one can set value1 on an instance by doing:

t = Test()
t.setvalue1('Hola')

or

t.value1 = 'Hola'

So far so good. My problem is I would like to set the values by reading them somewhere else so for instance I could have the following:

A = [['Value1','Hola'],['Value2','Adios']]

I would like to be able to run something that will do (in pseudo code):

for each in A:
    Test.each[0] = A[1]

Is this possible? Thanks so much!
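
Yes: this is exactly what setattr is for; it sets an attribute whose name is only known as a string. A self-contained sketch (note the names in A are lowercased here to match the attribute names value1/value2):

```python
class Test(object):
    def __init__(self):
        self.value1 = 1
        self.value2 = 2

t = Test()
A = [['value1', 'Hola'], ['value2', 'Adios']]

for name, value in A:
    setattr(t, name, value)  # equivalent to t.value1 = 'Hola', etc.
```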

Sorting todo.txt lines by their properties

I'm after some simple code to organize my todo.txt, which uses Gina Trapani's syntax: contexts are preceded by @, projects by +, and priorities are marked (A), (B), etc. A task can have multiple contexts and projects.

What I would like to achieve is to first sort the lines by context; within a context block, lines should be ordered by project, and lines with priorities come first within the project.

My code until now:

import os
import sys
import re

# Configuration
todo_path = notepad.getCurrentFilename()

def ordered_set(inlist):
    out_list = []
    for val in inlist:
        if not val in out_list:
            out_list.append(val)
    return out_list

class Todo:
    def __init__(self, priority, context, project, due, task, cdate):
        self.__priority = priority
        self.__context = context
        self.__project = project
        self.__due = due
        self.__task = task
        self.__cdate = cdate

    def __len__(self):
        parts = ' '.join([str(self.__priority), ' '.join(self.__context),
                          ' '.join(self.__project), str(self.__due),
                          str(self.__task), str(self.__cdate)])
        return len(re.sub(' +', ' ', parts + '\n'))

    def priority(self):
        return self.__priority

    def context(self):
        return self.__context

    def project(self):
        return self.__project

    def due(self):
        return self.__due

    def task(self):
        return self.__task

    def cdate(self):
        return self.__cdate

def BuildTodos():
    global todos
    todo_file = open(todo_path, 'r')
    raw_todos = todo_file.readlines()
    todo_file.close()
    todos = []

    for item in raw_todos:
        item = item.strip("\n")
        todos.append(item)
    console.write("Loaded Todos\n")
    for idx, item in enumerate(todos):
        words = item.split(' ')
        priority = [word for word in words if re.match(r'^\([A-Z]\)', word)]
        context = [word for word in words if word.startswith('@')]
        project = [word for word in words if word.startswith('+')]
        due = [word for word in words if word.startswith('due:')]
        task = [word for word in words
                if not re.match(r'^\([A-Z]\)', word)
                and not word.startswith('@')
                and not word.startswith('+')
                and not word.startswith('due:')
                and not re.match(r'[0-9]{4}-[0-9]{2}-[0-9]{2}', word)]
        cdate = [word for word in words if re.match(r'[0-9]{4}-[0-9]{2}-[0-9]{2}', word)]
        todos[idx] = Todo(priority, context, project, due, task, cdate)
    console.write("Built Todos\n")
    todos.sort(key=lambda t: t.context())
    # ----------------
    # HELP NEEDED HERE
    # sort the lines by context and within the block of contexts lines should be
    # ordered by projects and lines with priorities comes first in the project.
    # ----------------

def OutTodos():
    for t in todos:
        console.write(re.sub(' +', ' ',
                             ' '.join(t.priority()) + ' ' + ' '.join(t.context()) + ' ' +
                             ' '.join(t.project()) + ' ' + ' '.join(t.due()) + ' ' +
                             ' '.join(t.task()) + ' ' + ' '.join(t.cdate()) + '\n'))

console.clear()
BuildTodos()
OutTodos()

Example todo.txt file, contains utf-8 characters (!):

(A) @personal +study +python organize todo.txt áőúíéá
(A) Schedule annual checkup +Health áőúíéá
(B) Outline chapter 5 +Novel @Computer áőúíéá
(C) Add cover sheets @Office +TPSReports áőúíéá
Plan backyard herb garden @Home áőúíéá
Pick up milk @GroceryStore áőúíéá
Research self-publishing services +Novel @Computer áőúíéá
Download Todo.txt mobile app @Phone áőúíéá

I'm racking my brain over how to construct this sort so as not to end up with a monster. My guess would be to iterate over the todos list with cascading ifs, but having no experience with sorting and list manipulation in Python, I'm looking for advice.
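
One way to avoid cascading ifs is a single sort with a tuple key: Python compares tuples lexicographically, so (contexts, projects, priority) yields exactly context first, then project within a context, then priority within a project. A self-contained sketch on plain lines (the '{' sentinel is simply the character after 'Z', so lines without a priority sort last within their project):

```python
import re

lines = [
    "Research self-publishing services +Novel @Computer",
    "(A) @personal +study +python organize todo.txt",
    "Pick up milk @GroceryStore",
    "(B) Outline chapter 5 +Novel @Computer",
]

def sort_key(line):
    words = line.split()
    contexts = sorted(w for w in words if w.startswith('@'))
    projects = sorted(w for w in words if w.startswith('+'))
    m = re.match(r'\(([A-Z])\)', line)
    priority = m.group(1) if m else '{'   # '{' sorts after 'Z': no-priority last
    return (contexts, projects, priority)

for line in sorted(lines, key=sort_key):
    print(line)
```

The same tuple-key idea transfers directly to the Todo class accessors in a todos.sort(key=...) call.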

Having issues with a corrupt encryption error

I'm trying to encrypt every line of the file test.txt; the function encrypt() outputs the text all jumbled up. This is for an end-of-year project in my computer science class. My problem is that when I try to run the code "backwards" with decrypt_all, it doesn't work and the file is still corrupted.

from Crypto.Cipher import XOR
import base64
import os

def encrypt(key=None, plaintext=None):
    if key == None:
        key = "This_is_my_hidden_key"
    cipher = XOR.new(key)
    return base64.b64encode(cipher.encrypt(plaintext))
def decrypt(key=None, ciphertext=None):
    if key == None:
            key = "This_is_my_hidden_key"
    cipher = XOR.new(key)
    return cipher.decrypt(base64.b64decode(ciphertext))


#####Run below to encrypt all files in folder and each sub folder#######

def encrypt_all(UselessVariable = None):
    root = os.getcwd()
    path = os.path.join(root, "targetdirectory")
    x=0
    for path, subdirs, files in os.walk(root):
        for name in files:
            openfile = os.path.join(path, name)
            print openfile
            try:
                with open(openfile, 'r+') as a:
                    encrypted_text = []
                    for line in a:
                        encrypted_text.append(decrypt(None, line))
                    open(openfile,"w").close()
                    for text in range(len(encrypted_text)):
                        a.write((str(encrypted_text[text]))+ '\n')
            except IOError as e:
                print 'Operation failed: %s' % e.strerror
            x+=1
        print ""
    print x



#####Run below to decrypt all files in folder and each sub folder#######

def decrypt_all(UselessVariable = None):
    root = os.getcwd()
    path = os.path.join(root, "targetdirectory")
    x=0
    for path, subdirs, files in os.walk(root):
        for name in files:
            openfile = os.path.join(path, name)
            print openfile
            try:
                with open(openfile, 'r+') as a:
                    encrypted_text = []
                    for line in a:
                        encrypted_text.append(decrypt(None, line))
                    open(openfile,"w").close()
                    for text in range(len(encrypted_text)):
                        a.write((str(encrypted_text[text])))
            except IOError as e:
                print 'Operation failed: %s' % e.strerror
            x+=1
        print ""
    print x
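
Two things stand out in the posted code: encrypt_all actually calls decrypt() on each line, and processing line by line is fragile because the decoded XOR output can itself contain newline bytes, so the lines read back during decryption no longer match the lines originally written. A whole-file round trip sidesteps both. A sketch in Python 3 syntax, with a plain repeating-key XOR standing in for Crypto.Cipher.XOR:

```python
import base64

KEY = b"This_is_my_hidden_key"

def xor_bytes(data, key=KEY):
    # Repeating-key XOR: applying it twice restores the original bytes
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt_file(path):
    with open(path, 'rb') as f:
        data = f.read()                       # whole file, not line by line
    with open(path, 'wb') as f:
        f.write(base64.b64encode(xor_bytes(data)))

def decrypt_file(path):
    with open(path, 'rb') as f:
        data = f.read()
    with open(path, 'wb') as f:
        f.write(xor_bytes(base64.b64decode(data)))
```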

Active Shape Models: matching model points to target points

I have a question regarding Active Shape Models. I am using the paper of T. Cootes (which can be found here).

I have done all of the initial steps (Procrustes Analysis to calculate mean shape, PCA to reduce dimensions) but am stuck on fitting.

This is the situation I am in now: I have calculated the mean shape with points X and have also calculated a new set of points Y that X should move to, to better fit my image.

I am using the following algorithm, which can be found on page 23 of the paper previously linked:


(algorithm image from page 23 of the paper)


To clarify: x̄ is the mean shape calculated with Procrustes Analysis, and Φ is the matrix containing the eigenvectors calculated with PCA.

Everything goes well up to step 4. I can calculate the pose parameters and invert the transformation onto the points Y.

However, in step 5, something strange happens. Whatever pose parameters are calculated in step 3 and applied in step 4, step 5 always results in almost exactly the same vector y' with very low values (one of them being 1.17747114e-05, for example). So whether I calculated a scale of 1/10 or 1/1000, y' barely changes.

This results in the algorithm always converging to the same value of b, and thus in the same output shape x, no matter what the input set of target points Y are that I want the model points X to match with.

This sure is not the goal of the algorithm... Could anyone explain this strange behaviour? Somehow, projecting my calculated vector y in step 4 into the "tangent plane" does not take into account any of the changes made in step 4.


Edit: I have some more reasoning, though no explanation or solution. If, in step 5, I manually set y' to consist only of zeros, then in step 6, b equals the matrix of eigenvectors multiplied by our mean shape. And this results in the same b I always get (since y' is always a vector with very low values).

But these eigenvectors are calculated from the mean shape using PCA... So what's expected, I think, is that no change should take place?
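
For reference, steps 5-6 can be sketched in NumPy with toy data (made-up dimensions; xbar is the mean shape and Phi the orthonormal eigenvector matrix, following the paper's x = x̄ + Φb model). If y' barely changes, it's worth verifying that the y entering the projection really carries the pose transform from step 4:

```python
import numpy as np

# Toy sizes: 10-D shape vectors, 3 retained PCA modes (made-up data)
xbar = np.ones(10) / np.sqrt(10)   # unit-norm mean shape
Phi = np.eye(10)[:, :3]            # orthonormal eigenvector matrix

y = xbar + Phi @ np.array([0.5, -1.0, 2.0])   # a shape mapped into the model frame

# Step 5: project into the tangent plane of xbar:  y' = y / (y . xbar)
y_prime = y / np.dot(y, xbar)

# Step 6: update the model parameters:  b = Phi^T (y' - xbar)
b = Phi.T @ (y_prime - xbar)
```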


Can re.findall() return only the part of the regex in parens?

Looping through some data, I want to capture strings of numbers that appear as page IDs (with more than one per line). However, I only want to match number strings that are part of a particular URL, and I DON'T want to record the URL, just the number.

I am currently using re.findall to identify the right URLs, and then re.sub to extract the number strings.

views = re.findall(r"/view/\d*?.htm", line)
for view in views:
    view = re.sub(r"/view/(\d+).htm", r"\1", view)
    pagelist.append(view)

Is there a way to do something like

views = re.findall(r"/view/(\d*?).htm", r"\1", line)   #I know this doesn't work

where the original findall() only returns the part of the match in parens?
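
Yes. When the pattern contains exactly one group, re.findall returns only the group's contents, so the URL matching and the number extraction collapse into one call (note the escaped dot, which the original pattern leaves as a wildcard):

```python
import re

line = "see /view/123.htm and also /view/4567.htm here"

# findall returns just the parenthesized group for each match
pagelist = re.findall(r"/view/(\d+)\.htm", line)
```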

How to install the Lasagne package with Python on Windows

I'm new to Python and I'm running some scripts on Python 3.4. I'm getting the following error: ImportError: No module named 'lasagne'. Does someone know how to install this package for Python?

Using Postgres through ODBC in Python 2.7

  • I have installed Postgres.app and started it.
  • I have pip installed pypyodbc
  • I have copied the hello world lines from the pypyodbc docs, and received the error below. Any ideas what the issue might be?

Here is my code

  from __future__ import print_function
  import pypyodbc
  import datetime
  conn = pypyodbc.connect("DRIVER={psqlOBDC};SERVER=localhost") 

And I receive this error:

File "/ob/pkg/python/dan27/lib/python2.7/site-packages/pypyodbc.py", line 975, in ctrl_err
  err_list.append((from_buffer_u(state), from_buffer_u(Message), NativeError.value))
File "/ob/pkg/python/dan27/lib/python2.7/site-packages/pypyodbc.py", line 482, in UCS_dec
  uchar = buffer.raw[i:i + ucs_length].decode(odbc_decoding)
File "/ob/pkg/python/dan27/lib/python2.7/encodings/utf_32.py", line 11, in decode
  return codecs.utf_32_decode(input, errors, True)
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-1:   truncated data

What am I doing wrong?

Do I need to somehow initialize the DB/tables first? It is a weird error if that is the issue.

Different behavior of same regular expression in python and java

Firstly, my apologies, as I don't know regular expressions that well.

I am using a regular expression to match a string. I tested it in the Python command-line interface, but when I ran it in Java, it produced a different result.

Python execution:

re.search("[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US", "9.5 D(M) US");

gives the result as:

<_sre.SRE_Match object; span=(0, 11), match='9.5 D(M) US'>

But the Java code

import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTest {
    private static final Pattern FALLBACK_MEN_SIZE_PATTERN = Pattern.compile("[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US");

    public static void main(String[] args) {
    String strTest = "9.5 D(M) US";
    Matcher matcher = FALLBACK_MEN_SIZE_PATTERN.matcher(strTest);
        if (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}

gives the output as:

5 D(M) US

I don't understand why it is behaving differently.
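
A likely culprit is the fragment [\\.[0-9]+]? : the square brackets make it a character class, not a group, and Java's regex flavor supports nested classes and unions inside [...], so the two engines parse that fragment differently. Writing the optional decimal part as a real group behaves the same in both; a Python sketch of the corrected pattern:

```python
import re

# '(\.[0-9]+)?' is an optional group, not an accidental character class;
# '[MW]' replaces '[M|W]', which also matches a literal '|'
pattern = r"[0-9]*(\.[0-9]+)?[^0-9]*D\([MW]\)\s*US"

m = re.search(pattern, "9.5 D(M) US")
```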

Python time comparison at midnight

I have to save times in AM/PM format, but I am having trouble deciding how to enter the midnight time.

Suppose the span is 9 PM to 6 AM the next morning. I have to divide it on a day-by-day basis, like this:

t1 = datetime.datetime.strptime('09:00PM', '%I:%M%p').time()

t2 = datetime.datetime.strptime('12:00AM', '%I:%M%p').time()

t3 = datetime.datetime.strptime('06:00AM', '%I:%M%p').time()

Now I want to know whether t2 should be

12:00 AM or 11:59 PM

If I use 12:00 AM then I can't compare, because 9 PM > 12 AM as bare times; but 11:59 looks odd, or maybe it is the right way.
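
A common way out is to compare full datetimes rather than bare times: attach real dates so "next morning" is genuinely later, and midnight can stay 12:00 AM. A sketch:

```python
import datetime

day = datetime.date(2015, 5, 29)  # any anchor date for the first day

t1 = datetime.datetime.combine(day, datetime.time(21, 0))            # 9:00 PM
t2 = datetime.datetime.combine(day + datetime.timedelta(days=1),
                               datetime.time(0, 0))                  # 12:00 AM, next day
t3 = datetime.datetime.combine(day + datetime.timedelta(days=1),
                               datetime.time(6, 0))                  # 6:00 AM, next day
```

Now t1 < t2 < t3 holds without resorting to 11:59 PM.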

PHP's array_slice vs Python's splitting arrays

Some background

I was having a go at the common "MaxProfit" programming challenge. It basically goes like this:

Given a zero-indexed array A consisting of N integers containing daily prices of a stock share for a period of N consecutive days, returns the maximum possible profit from one transaction during this period.

I was quite pleased with this PHP algorithm I came up with, having avoided the naive brute-force approach:

public function maxProfit($prices)
{
    $maxProfit = 0;
    $key = 0;
    $n = count($prices);

    while ($key < $n - 1) {
        $buyPrice = $prices[$key];
        $maxFuturePrice = max( array_slice($prices, $key+1) );          
        $profit = $maxFuturePrice - $buyPrice;

        if ($profit > $maxProfit) $maxProfit = $profit;
        $key++;
    }
    return $maxProfit;
}

However, having tested my solution, it seems to perform badly, perhaps even in O(n²) time.

I did a bit of reading around the subject and discovered a very similar Python solution. Python has some quite handy abilities which allow slicing an array with the a[s:e] syntax, unlike PHP, where I used the array_slice function. I decided this must be the bottleneck, so I did some tests:

Tests

PHP array_slice()

$n = 10000;    
$a = range(0,$n);

$start = microtime(1);
foreach ($a as $key => $elem) {
    $subArray = array_slice($a, $key);
}
$end = microtime(1);

echo sprintf("Time taken: %sms", round(1000 * ($end - $start), 4)) . PHP_EOL;

Results:

$ php phpSlice.php
Time taken: 4473.9199ms
Time taken: 4474.633ms
Time taken: 4499.434ms

Python a[s : e]

import time

n = 10000
a = range(0, n)

start = time.time()
for key, elem in enumerate(a):
    subArray = a[key : ]
end = time.time()

print "Time taken: {0}ms".format(round(1000 * (end - start), 4))

Results:

$ python pySlice.py 
Time taken: 213.202ms
Time taken: 212.198ms
Time taken: 215.7381ms
Time taken: 213.8121ms

Question

  1. Why is PHP's array_slice() around 20x less efficient than Python?
  2. Is there an equivalently efficient method in PHP that achieves the above and thus hopefully makes my maxProfit algorithm run in O(N) time?
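
On question 2: slicing (in either language) isn't needed at all. Tracking the minimum price seen so far gives a true O(N) single pass; a Python sketch:

```python
def max_profit(prices):
    best = 0
    min_price = float('inf')
    for p in prices:
        min_price = min(min_price, p)    # cheapest buy seen so far
        best = max(best, p - min_price)  # best profit selling at today's price
    return best
```

The same two-variable loop ports to PHP directly, with no array_slice call (which copies the tail of the array on every iteration, hence the quadratic behaviour).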

Showing a sprite without a group

I have two sprites in a class for easier control (specifically: a tank turret and suspension). If I launch the program it works without any errors, but it doesn't show anything. I also tried to put both sprites in a group in the class, but it threw the error TypeError: draw() missing 1 required positional argument: 'surface'. How should I display my tank without disassembling my group?

Reading multiple data from a text file

I am trying to read two pieces of data from a single text file. Here is how the file looks:

PaxHeader/data-science000755 777777 777777 00000000262 12525446741 015207 xustar00armourp000000 000000 18 gid=1050026054
17 uid=488147323
20 ctime=1431779590
20 atime=1431779720
38 LIBARCHIVE.creationtime=1431719347
23 SCHILY.dev=16777218
24 SCHILY.ino=110226037
18 SCHILY.nlink=4
data-science/000755 Äâ{Ä>ñ F00000000000 12525446741 013547 5ustar00armourp000000 000000 data-science/PaxHeader/merged-sensor-files.csv000644 777777 777777 00000000214 12525446724 021646 xustar00armourp000000 000000 18 gid=1050026054
17 uid=488147323
20 ctime=1431779590
20 atime=1431779720
23 SCHILY.dev=16777218
24 SCHILY.ino=110226038
18 SCHILY.nlink=1
data-science/merged-sensor-files.csv000644 Äâ{Ä>ñ F00016452751 12525446724 020164 0ustar00armourp000000 000000 MTU, Time, Power, Cost, Voltage
MTU1,05/11/2015 19:59:06,4.102,0.62,122.4
MTU1,05/11/2015 19:59:05,4.089,0.62,122.3
MTU1,05/11/2015 19:59:04,4.089,0.62,122.3
MTU1,05/11/2015 19:59:06,4.089,0.62,122.3
MTU1,05/11/2015 19:59:04,4.097,0.62,122.4
MTU1,05/11/2015 19:59:03,4.097,0.62,122.4
MTU1,05/11/2015 19:59:02,4.111,0.62,122.5
MTU1,05/11/2015 19:59:03,4.111,0.62,122.5
MTU1,05/11/2015 19:59:02,4.104,0.62,122.5
MTU1,05/11/2015 19:59:01,4.090,0.62,122.4
MTU1,05/11/2015 19:59:00,4.093,0.62,122.4
MTU1,05/11/2015 19:58:59,4.112,0.62,122.5
data-science/PaxHeader/weather.json000644 777777 777777 00000000214 12525446741 017610 xustar00armourp000000 000000 18 gid=1050026054
17 uid=488147323
20 ctime=1431779590
20 atime=1431779720
23 SCHILY.dev=16777218
24 SCHILY.ino=110226039
18 SCHILY.nlink=1
data-science/weather.json000644 Äâ{Ä>ñ F00000000766 12525446741 016112 0ustar00armourp000000 000000 {"1431388800":"75.4","1431392400":"73.2","1431396000":"72.1","1431399600":"71.0", "1431403200":"70.7","1431406800":"69.6","1431410400":"69.0","1431414000":"68.8","1431417600":"69.2","1431421200":"67.9","1431424800":"68.6","1431428400":"68.7","1431432000":"72.1","1431435600":"76.2","1431439200":"80.1","1431442800":"80.7","1431446400":"80.9","1431450000":"83.3","1431453600":"84.5","1431457200":"85.1","1431460800":"87.0","1431464400":"84.2","1431468000":"84.4","1431471600":"83.0","1431475200":"81.1"}

So basically I want to get the values like below

MTU, Time, Power, Cost, Voltage
    MTU1,05/11/2015 19:59:06,4.102,0.62,122.4

as separate pandas frame and then another frame for the below dictionary.

{"1431388800":"75.4","1431392400":"73.2","1431396000":"72.1","1431399600":"71.0", "1431403200":"70.7","1431406800":"69.6","1431410400":"69.0","1431414000":"68.8","1431417600":"69.2","1431421200":"67.9","1431424800":"68.6","1431428400":"68.7","1431432000":"72.1","1431435600":"76.2","1431439200":"80.1","1431442800":"80.7","1431446400":"80.9","1431450000":"83.3","1431453600":"84.5","1431457200":"85.1","1431460800":"87.0","1431464400":"84.2","1431468000":"84.4","1431471600":"83.0","1431475200":"81.1"}

I can manually cut and paste these two portions into separate files and read them in, but I want to automate it using a regex. I think I know how to write the regex, but while reading the whole file as text, I am seeing the following values.

So I did this:

f=open("file",'r').read()
print(f)

'PaxHeader/data-science\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00000755 \x00777777 \x00777777 \x0000000000262 12

These are the first few lines of the file. I'm not sure why I see \x00 a lot. Is it because of some spaces or some non-recognised characters?

Any idea how to get the desired result?

Thanks
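
The \x00 runs are a giveaway: this is an uncompressed tar archive (the PaxHeader/SCHILY lines are tar metadata, and the nulls are header padding). Rather than regexing the raw text, the tarfile module can extract the two members directly. A self-contained sketch that builds a tiny stand-in archive and reads a member back (the member name is taken from the listing above):

```python
import io
import tarfile

# Build a tiny in-memory tar standing in for the real file
payload = b"MTU, Time, Power, Cost, Voltage\nMTU1,05/11/2015 19:59:06,4.102,0.62,122.4\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tar:
    info = tarfile.TarInfo('data-science/merged-sensor-files.csv')
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read the member back; pd.read_csv(io.BytesIO(csv_bytes)) can then parse it,
# and the same goes for the weather.json member via the json module
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    csv_bytes = tar.extractfile('data-science/merged-sensor-files.csv').read()
```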

Find duplicate element in a list of lists [on hold]

I am looking for Python ideas for the following problem.

Given a list of lists...

[[20, 21, 22], [17, 18, 19, 20], [10, 11, 12, 13]]

If there is a duplicate element that is common between any or all the lists, return True. If all of the elements are unique, return False.

In the example above, 20 is common and would return True. The example below would return False because all the numbers are unique between the lists.

[[20, 21, 22], [17, 18, 19], [10, 11, 12, 13]]

Lastly, testing for duplicates in an individual list is not needed because the numbers are always sequential.

FYI - this problem will be used to optimize an airline crew member's monthly schedule. Each list represents a 3-, 4-, or 5-day airline trip, and trips can't overlap.

BTW - this problem is not an assignment but a personal quest to work less and get paid more :) Sorry it was unclear. I tried a brute-force method which works, but was hoping for a more elegant, Pythonic method. I appreciate all the responses, as they are leading me into new areas of Python programming.
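
Since duplicates within a single list can't occur, it is enough to flatten everything and compare the total count against the size of the set:

```python
def has_overlap(trips):
    # Flatten all trips; a set drops duplicates, so any shrinkage means overlap
    flat = [day for trip in trips for day in trip]
    return len(flat) != len(set(flat))
```

For the examples above, has_overlap([[20, 21, 22], [17, 18, 19, 20], [10, 11, 12, 13]]) is True and the second list of lists gives False.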

How to explicitly control task schedule of Spark exactly?

I tried to implement a parallelized image-processing technique using Spark. Unlike conventional Spark jobs with millions of tasks, I only want to split the image across the number of workers (machines) I have and let each worker process one image patch. So one image patch is one task; if I have 12 image patches, I have 12 tasks. The question is how to explicitly control the assignment of tasks to workers. What currently happens is that if I parallelize the image patches, Spark often sends several patches to one or two workers and leaves the others idle. I tried setting the Spark properties spark.cores.max and spark.default.parallelism, but it didn't seem to help. The only way to spread the tasks across workers as evenly as possible is to enlarge the second parameter of SparkContext.parallelize - numSlices. Here is the code:

img = misc.imread('test_.bmp')
height, width = img.shape
divisions, patch_width, patch_height = partitionParameters(width, height, 2, 2, border=100)

spark = SparkContext(appName="Miner")
# spark.setSystemProperty('spark.cores.max','1')
spark.setSystemProperty('spark.default.parallelism','24')

broadcast_img = spark.broadcast(img)

start = datetime.now()
print "--------------------", divisions
# run spark
run = spark.parallelize(divisions, 24).cache()
print "--------------- RDD size: ", run._jrdd.splits().size()
result = run.map(lambda (x, y): crop_sub_img(broadcast_img.value, x, y, patch_width, patch_height, width, height)) \
                .map(lambda ((x, y), subimg): fastSeg.process(subimg, x, y)) \
                .collect()

img = cat_sub_img(result, width, height)
end = datetime.now()

print "time cost:", (end-start) 

As you can see, I only have four patches in divisions. divisions is a list of tuples holding the x and y coordinates of each image patch. Only when I set numSlices to a high value, 24, which far exceeds the actual number of tasks in divisions, are most of the workers used. But that seems unreasonable. If I set it to 4, all tasks are sent to a single worker! There must be some way to control how many tasks one worker accepts. I am not familiar with Spark's internals. Can anyone help? Thanks.

One thought is that the image size is too small for one worker, so Spark assumes one worker can handle it and sends everything there.

Override a package method

If I import a package, let's say networkx, how do I override one method inside it so that it's the one called by every other function in the package?

Example :

import networkx as nx

def _draw_networkx_nodes(G, pos,
                        nodelist=None,
                        node_size=300,
                        node_color='r',
                        node_shape='o',
                        alpha=1.0,
                        cmap=None,
                        vmin=None,
                        vmax=None,
                        ax=None,
                        linewidths=None,
                        label=None,
                        **kwds):
     print 'OK'

nx.draw_networkx_nodes = _draw_networkx_nodes

nx.draw(G, pos)

I want the draw method to call other methods that will in turn call my overridden function.
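
Whether the override is picked up internally depends on how the package looks the function up: assigning to a module attribute works when internal callers resolve the name through the module's namespace at call time, and the assignment should generally target the defining module (for networkx, likely networkx.drawing.nx_pylab rather than just the package alias). A toy stand-in module shows the mechanics without needing networkx installed:

```python
import types

# A toy module: draw() resolves draw_nodes through the module's globals
mod = types.ModuleType('mod')
exec("""
def draw_nodes():
    return 'original'

def draw():
    return draw_nodes()   # late lookup in the module namespace
""", mod.__dict__)

def patched_draw_nodes():
    return 'patched'

# Analogous to nx.draw_networkx_nodes = _draw_networkx_nodes
mod.draw_nodes = patched_draw_nodes
```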

Python - Pyramid and matplotlib - Cannot Have More Than One View Output an SVG?

I am developing a Python Pyramid application where I intend to create more than one SVG image charting statistics with pie charts. In my testing, one SVG view works correctly, but as soon as I add a second SVG output view and a second SVG image is loaded (the order of SVG image loads doesn't matter), whether directly through its view or through another view that references it, the SVG images are combined in all further requests to load an SVG file. This appears to be a bug somewhere in the Python stack, as it seems memory is not cleared properly (primarily in the case of more than one SVG file; see further details below). Also note that after enough image/page loads a TclError is encountered.

Since I was using SVG in a more detailed application with many more views, I am reproducing this in a minimized/reduced application to show it isn't caused by anything extra I'm doing; this code is generated straight from the Pyramid alchemy template, and database calls are not involved (the database is actively utilized in my more detailed application). This application only has 3 views, of which the first is part of the original template. I am also adding DEBUG logging to make it clear that there is no internal calling of the other SVG view.

Some of the view code is based on "Matplotlib svg as string and not a file", primarily for the use of StringIO. Note that since a pie chart is needed, that is the main reason my code differs from the code in the referenced question. I find the issue is essentially the same whether I use StringIO or cStringIO; in my code I am using cStringIO.

The full application code is available at: http://ift.tt/1HAeOYJ

Code From First SVG View:

import cStringIO
import logging

from pyramid.response import Response
from pyramid.view import view_config

log = logging.getLogger(__name__)

@view_config(route_name='view_test_svg')
def test_svg_view(request):
    # Full module import is not allowed by Pyramid
    #from pylab import *
    # Do individual required imports instead
    from pylab import figure, axes, pie, title, savefig
    log.debug('In test_svg_view')
    figure(1, figsize=(6,6))
    ax = axes([0.1, 0.1, 0.8, 0.8])
    labels = ['Frogs', 'Hogs', 'Dogs', 'Logs']
    fracs = [15, 30, 45, 10]
    explode=(0, 0.05, 0, 0)
    pie(fracs, explode=explode, labels=labels,
                                autopct='%1.1f%%', shadow=True, startangle=90)
    title('Raining Hogs and Dogs', bbox={'facecolor':'0.8', 'pad':5})
    imgdata = cStringIO.StringIO()
    savefig(imgdata, format='svg')
    imgdata.seek(0)
    svg_dta = imgdata.getvalue()
    # Close the StringIO buffer
    imgdata.close()
    return Response(svg_dta, content_type='image/svg+xml')
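
The combining behaviour is consistent with pylab's implicit "current figure" state being shared across requests: figure(1) reuses global figure 1, and nothing ever closes it, while the Tk traceback shows an interactive backend being pulled in under a web server. A sketch of the same pie chart using matplotlib's object-oriented API with the non-interactive Agg canvas, which keeps no module-level state between requests:

```python
import io

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

def render_pie_svg(fracs, labels):
    fig = Figure(figsize=(6, 6))
    FigureCanvasAgg(fig)                     # non-interactive canvas, no Tk
    ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
    ax.pie(fracs, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
    buf = io.BytesIO()
    fig.savefig(buf, format='svg')           # local figure, nothing global
    return buf.getvalue()                    # bytes for the Response body
```

Each request builds and discards its own Figure, so two SVG views cannot bleed into each other.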

Python Version: Python 2.7.5

Python Package Configuration (Primary Packages Only)

  • pyramid-1.6a1-py2.7
  • matplotlib-1.4.3-py2.7-win32

Steps Taken To Reproduce:

  1. pserve pyramidapp.

Command: pserve development.ini --reload

Starting server in PID 4912.
serving on http://0.0.0.0:6543

  2. Load http://localhost:6543/test.svg

Note this works properly

DEBUG [pyramidapp.views:22][Dummy-2] In test_svg_view

Step 2 image

  3. Load http://localhost:6543/test2.svg

Note this "combines" both SVG files together

DEBUG [pyramidapp.views:45][Dummy-3] In test2_svg_view

Step 3 image

  4. Load http://localhost:6543/test.svg

Note this works exactly like test2.svg, with the correct title, since they are also of similar length, and now images are combined in this view as well

DEBUG [pyramidapp.views:22][Dummy-4] In test_svg_view

Step 4 image

  5. Rehost the application and only load http://localhost:6543/test2.svg

Note this works properly for first load as this view was loaded before test.svg this time

DEBUG [pyramidapp.views:45][Dummy-2] In test2_svg_view

Step 5 image

Tracelog when using Control+C to terminate the pserve process

Error in sys.exitfunc:
Traceback (most recent call last):
  File "--python_path--\lib\atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "--python_path--\lib\site-packages\matplotlib-1.4.3-py2.7-win32.egg\ma
tplotlib\_pylab_helpers.py", line 89, in destroy_all
    manager.destroy()
  File "--python_path--\lib\site-packages\matplotlib-1.4.3-py2.7-win32.egg\ma
tplotlib\backends\backend_tkagg.py", line 588, in destroy
    self.window.destroy()
  File "--python_path--\lib\lib-tk\Tkinter.py", line 1789, in destroy
    for c in self.children.values(): c.destroy()
  File "--python_path--\lib\lib-tk\Tkinter.py", line 2042, in destroy
    self.tk.call('destroy', self._w)
_tkinter.TclError: out of stack space (infinite loop?)
^C caught in monitor process

Important: After enough SVG image loads the following is encountered:

The only way to fix this currently is to restart pserve. Also note that views such as my_view load properly as long as SVG images are not referenced or used by those views.

Another important note: as long as only one SVG file (i.e. http://localhost:6543/test.svg) is loaded for the entire lifetime of pserve, that image can seemingly be reloaded/refreshed an unlimited number of times without any apparent issue and without encountering the following:

_tkinter header

_tkinter.TclError
TclError: out of stack space (infinite loop?)
Traceback (most recent call last)
  File "--python_path--\lib\site-packages\pyramid_debugtoolbar-2.0.2-py2.7.egg\pyramid_debugtoolbar\panels\performance.py", line 69, in noresource_timer_handler
    result = handler(request)
  File "--python_path--\lib\site-packages\pyramid-1.6a1-py2.7.egg\pyramid\tweens.py", line 20, in excview_tween
    response = handler(request)
  File "--python_path--\lib\site-packages\pyramid_tm-0.11-py2.7.egg\pyramid_tm\__init__.py", line 94, in tm_tween
    reraise(*exc_info)
  File "--python_path--\lib\site-packages\pyramid_tm-0.11-py2.7.egg\pyramid_tm\__init__.py", line 75, in tm_tween
    response = handler(request)
  File "--python_path--\lib\site-packages\pyramid-1.6a1-py2.7.egg\pyramid\router.py", line 145, in handle_request
    view_name
  File "--python_path--\lib\site-packages\pyramid-1.6a1-py2.7.egg\pyramid\view.py", line 527, in _call_view
    response = view_callable(context, request)
  File "--python_path--\lib\site-packages\pyramid-1.6a1-py2.7.egg\pyramid\config\views.py", line 384, in viewresult_to_response
    result = view(context, request)
  File "--python_path--\lib\site-packages\pyramid-1.6a1-py2.7.egg\pyramid\config\views.py", line 506, in _requestonly_view
    response = view(request)
  File "c:\projects\python\pyramid\pyramidapp\pyramidapp\views.py", line 55, in test2_svg_view
    savefig(imgdata, format='svg')
  File "--python_path--\lib\site-packages\matplotlib-1.4.3-py2.7-win32.egg\matplotlib\pyplot.py", line 578, in savefig
    draw()   # need this if 'transparent=True' to reset colors
  File "--python_path--\lib\site-packages\matplotlib-1.4.3-py2.7-win32.egg\matplotlib\pyplot.py", line 571, in draw
    get_current_fig_manager().canvas.draw()
  File "--python_path--\lib\site-packages\matplotlib-1.4.3-py2.7-win32.egg\matplotlib\backends\backend_tkagg.py", line 350, in draw
    tkagg.blit(self._tkphoto, self.renderer._renderer, colormode=2)
  File "--python_path--\lib\site-packages\matplotlib-1.4.3-py2.7-win32.egg\matplotlib\backends\tkagg.py", line 24, in blit
    tk.call("PyAggImagePhoto", photoimage, id(aggimage), colormode, id(bbox_array))
TclError: out of stack space (infinite loop?)
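A likely mitigation sketch (an assumption, not stated in the report): the traceback shows the views going through the stateful pyplot interface, which drags in the TkAgg GUI backend and a shared figure manager inside web worker threads. Rendering each request with matplotlib's object-oriented SVG canvas avoids that shared state entirely; the render_svg helper below is hypothetical.

```python
# Sketch: render an SVG per request without pyplot or any GUI toolkit.
import io

from matplotlib.figure import Figure
from matplotlib.backends.backend_svg import FigureCanvasSVG


def render_svg():
    fig = Figure()                 # a fresh figure per request, no global state
    ax = fig.add_subplot(111)
    ax.plot([0, 1, 2], [0, 1, 4])
    canvas = FigureCanvasSVG(fig)  # SVG canvas instead of the Tk backend
    buf = io.BytesIO()
    canvas.print_svg(buf)          # write SVG bytes without touching Tk
    return buf.getvalue()
```

Because each call builds its own Figure, two views can no longer "combine" into one image the way test.svg and test2.svg do above.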

Cronjob: Python script not writing to file

I'm working on Raspbian and wrote a Python script which communicates via RS232 with some hardware related to the physical I/O states of the Raspberry Pi. It also writes to a logfile.

Everything works fine when I start the script from the command line: pi@raspberrypi ~/scripts $ python steppercontrol.py

I added the script as a cronjob (sudo crontab -e):

SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games:/usr/games
@reboot /usr/bin/python /home/pi/scripts/steppercontrol.py

The script works and is running after reboot, but the logfile is not written. syslog gives the following:

cat /var/log/syslog | grep CRON
May 29 12:05:16 raspberrypi /USR/SBIN/CRON[2106]: (CRON) info (No MTA installed, discarding output)
May 29 12:17:01 raspberrypi /USR/SBIN/CRON[2456]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 29 13:17:01 raspberrypi /USR/SBIN/CRON[2509]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)

The file permissions should be OK:

pi@raspberrypi ~/scripts $ ls -lh
total 16K
-rwxr-xr-x 1 pi pi 3.1K May 27 12:55 steppercontrol.py
-rwxrwxrwx 1 pi pi  249 May 29 12:05 stepperlog

IMO it's not related to Python itself. I also could not manage to redirect stdout from the script (run as a cronjob) to a file. I am lost; here is my script:

Btw: it's my first Python script and generally I am not very good with Linux, but Raspbian and Google make things easy ;-)

import serial
import time
import pifacedigitalio as p
import datetime

# function to read data by busy waiting
# timeout is enable, non blocking
def getData( p ):
   "get data busy waiting"

   d = ''
   if p.inWaiting() <= 0:
     return d

   time.sleep(0.3)
   while p.inWaiting() > 0:
      d += p.read(1)
   return d
# end of function

# main program logic
# init serial communication
port = serial.Serial("/dev/ttyUSB0", bytesize=serial.EIGHTBITS, baudrate=9600, stopbits=serial.STOPBITS_TWO, timeout=1.0)
p.init()
for i in range(0,8):
   p.digital_write(i,0)
   p.digital_write_pullup(i, 1)

logfile = open('/home/pi/scripts/stepperlog','a')
i = datetime.datetime.now()
logfile.write(str(i) + " script started \n")
print(str(i) + " script started \n")

# query hello world string and write answer to screen
port.write("?ver\r")
d = getData(port)
print(">> " + d + "\n")
port.write("!a\r")

# setup stepper drive
port.write("!axis 1 0 0\r")            # disable all axes except the X-axis
port.write("!pitch 1 1 1\r")           # set pitch of rod ... needed?
port.write("!cur 1 1 1\r")             # set current per motor to 1 A
port.write("!accel 0.5 0.5 0.5\r")     # set acceleration to value in m/s^2
port.write("!velfac 0.1 0.1 0.1\r")    # reduce speed by factor 1/10

pinList = [0,0,0,0,0,0]
prevSelection = -2

while 1:
   for i in range(0,6):
      pinList[i] = p.digital_read(i)
      p.digital_write(i,pinList[i])
      #print(">> I/O " + str(i) + " : " + str(pinList[i]))

   speed = 0;
   curSelection = -1

   if pinList[0] == 1:      # position 1
      speed = 5;            # move down fast 5mm/s
      curSelection = 0
   elif pinList[1] == 1:    # position 3
      speed = -0.1;         # move up 50 um/s
      curSelection = 1
   elif pinList[2] == 1:    # position 5
      speed = -0.2;         # move up 100 um/s
      curSelection = 2
   elif pinList[3] == 1:    # position 7
      speed = -0.3;         # move up 100 um/s
      curSelection = 3
   elif pinList[4] == 1:    # position 9
      speed = -0.4;         # move up 100 um/s
      curSelection = 4
   elif pinList[5] == 1:    # position 11
      speed = -5;           # move up fast 5 mm/s
      curSelection = 5

   calcspeed = float(speed)
   calcspeed *= 10/1.36                # factor 10/100 corresponds to speed reduction from above !

   if curSelection != prevSelection:
      i = datetime.datetime.now()
      logfile.write(str(i) + " " + str(prevSelection) + " " + str(curSelection) + " " + str(speed) + " " + str(calcspeed) + "\n")
      print(str(i) + " " + str(prevSelection) + " " + str(curSelection) + " " + str(speed) + " " + str(calcspeed))
   prevSelection = curSelection

   speed = "%.10f" % calcspeed         # float to string
   port.write("!speed" + speed + "\r")

   wait = 0.1
   time.sleep(float(wait))
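One likely culprit (a guess, not confirmed by the question): the log path is already absolute, but the writes are buffered and the long-lived file handle is never closed inside the endless loop, so under cron nothing may ever reach disk. A minimal sketch that opens, writes and closes (and thereby flushes) per message:

```python
import datetime

# absolute path from the question; cron's working directory is not the script dir
LOG_PATH = '/home/pi/scripts/stepperlog'


def log(msg, path=LOG_PATH):
    # open/append/close per message: the line hits disk even though the
    # main loop never terminates and never closes a long-lived handle
    with open(path, 'a') as f:
        f.write('%s %s\n' % (datetime.datetime.now(), msg))
```

Alternatively, keeping the single handle but calling logfile.flush() after each write should have the same effect.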

Python Numpy Sort rows

Very new to Python, so please bear with me. I am trying to sort an array that I have imported into Python with numpy.sort:

 guy = numpy.sort(sasBody, axis=-0)

The first column is a column of strings, so I would like to sort the array alphabetically. The problem I am having is that while it does sort the first column, the numbers from the other columns are no longer connected to their correct first-column counterparts.

What am I doing wrong?
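A common fix sketch (toy data assumed): numpy.sort sorts each column independently, which is what scrambles the rows. To keep rows together, compute the ordering of the first column with argsort and index the whole array with it:

```python
import numpy as np

sasBody = np.array([['banana', '3', '4'],
                    ['apple',  '1', '2'],
                    ['cherry', '5', '6']])

order = np.argsort(sasBody[:, 0])   # indices that sort the first column
guy = sasBody[order]                # reorder entire rows with those indices
```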

How can I only display ForeignKey in a ModelForm, not make it editable?

I have the following Models and Forms:

#models.py
class NetworkDevice(models.Model):
    user = models.ForeignKey(User)
    device_name = models.CharField(_('device name'), max_length=100)
    ...

#forms.py
class NetworkDevicesForm(ModelForm):
    class Meta:
        model = NetworkDevice
        fields=('user', 'device_name',...)

'...' are some fields I left out, since they are not important for this. I want to create a formset based on my ModelForm:

#views.py
in some view:
    network_device_formset = modelformset_factory(models.NetworkDevice,  
         extra=0, form=NetworkDevicesForm, fields=(
         'user', 'device_name', ...))

And I display it like this in my template:

<form action="{% url 'some:action'  %}" method="post">
{% csrf_token %}
{{ devices_formset.management_form }}
<table>
{% for form in devices_formset %}
    {% if forloop.first %}
    <thead>
        <tr>
            {% for field in form.visible_fields %}
            <th>{{ field.label }}</th>
            {% endfor %}
        </tr>
    </thead>
    {% endif %}
{% endfor %}

    <tbody>
        {% for form in devices_formset %}
        <tr>
            {% for field in form %}
            <td>{{ field }}</td>
            {% endfor %}
        </tr>
        {% endfor %}
    </tbody>
</table>
<input type="submit" value='{% trans "Save" %}'/>
</form>

Now this will display my ForeignKey with an HTML select tag. I don't even want to show all the choices there, however; I just want to display the key for the corresponding instance. I can disable the select tag:

class NetworkDevicesForm(ModelForm):
    class Meta:
        model = NetworkDevice
        fields = ('user', 'device_name', ...)
        widgets = {'user': widgets.Select(attrs={'readonly': True,
                                                 'disabled': True})}

But then I get validation errors on the user field. I guess I could override the validation somehow, but this would still render all the options for the foreign key in the generated HTML. Is there no way to just display the value in my template without making it editable in the ModelForm? Some magic like:

    <tbody>
        {% for form in devices_formset %}
        <tr>
            <td>{{ form.user }}</td>
            {% for field in form %}
            <td>{{ field }}</td>
            {% endfor %}
        </tr>
        {% endfor %}
    </tbody>

Except that {{ form.user }} is not working. Can I access that somehow in the template? I hope I was clear about what I want to do and that it's possible.

Is there a way to cache datasets on the client side using Python/Anaconda/Bokeh?

I have a Python Bokeh project; generating the plots works nicely, but it takes a while. As I understand it, the Bokeh server generates a model based on the data which is pushed (or pulled), and the graph generation then occurs on the client side using JavaScript. Is there a way to instruct Bokeh to cache as much as possible on the client side, ideally per document? Is there an API for this? Are there other caching strategies available in Bokeh?

Formset can_delete doesn't work with default values

When I set a default value for an integer field in my model class, my formset row can't be deleted until I fill in all the fields of my empty extra row (I understand this is because is_valid() only passes if the extra form is either completely empty or completely filled in). If I fill in everything in the extra fields and choose the row to be deleted, it works perfectly fine.

I want to make the fields hours, days and calendar_end read-only, so that they are calculated from data in other fields and then shown in my formset (not dynamically, but after adding a new row).

I have a couple of problems here:

If I don't use default values in my class, the formset won't allow me to add a new form without filling in the hours, days and calendar_end fields.

If I use default values for them, can_delete doesn't work until I complete all the fields of my extra form.

I hope it can be done without redirecting to another page.

Here's my view function.

def calend_graph_create(request):

AuthorFormSet = modelformset_factory(CalendGraph, fields=('project_name','work_name','work_amount','unit_name','employee_name','hours','days','approve_need','approve_time','calendar_start','calendar_end',),extra =1, can_delete=True)

for form in AuthorFormSet(): 
    form.fields['hours'].widget.attrs['readonly'] = True #can i do that?

if request.method == 'POST':
    if formset.is_valid():
        formset.save()
        return HttpResponseRedirect("/calend-graph-create/")

    args={}
    args.update(csrf(request))
    args['formset'] = formset
    return render_to_response("tasks/calend-graph-create.html",args)

else:
    formset = AuthorFormSet()

#starting to calculate hours, days, and calendar end
entries= CalendGraph.objects.all()
app_entries= Approve.objects.all()
for each in entries:
    temp=0
    if each.approve_need == True:
        for time in app_entries:
            if each.work_name == time.name_ap:
                temp=temp+ time.time_ap
                each.approve_time=time.time_ap
    temp=temp+each.work_amount
    each.hours=temp
    each.days=temp / 8
    each.calendar_end=each.calendar_start + timedelta(days=each.days)

for each in entries:
    each.save()


args={}
args.update(csrf(request))
args['formset'] = formset
return render_to_response("tasks/calend-graph-create.html",args)

TypeError: in method 'new_Frame', expected argument 2 of type 'int'

import wx
class MainWindow(wx.Frame):
    def _init_ (self, parent, title):
        wx.Frame. __init__(self, parent, title=title, size=(200, 100))
        self.control = wx.TextCtrl(self, style=wx.TE_MULTILINE)
        self.CreateStatusBar()

        #setting up the menu

        filemenu = wx.Menu()

        menuAbout = filemenu.Append(wx.ID_ABOUT, "About", "information about the use of this program")
        menuExit = filemenu.Append(wx.ID_EXIT, "Exit", "Exit this program")

        menuBar = wx.MenuBar()

        menuBar.Append(filemenu,"File")
        self.SetMenuBar(menuBar)
        self.Bind(wx.EVT_MENU, self.OnAbout, menuAbout)
        self.Bind(wx.EVT_MENU, self.OnExit, menuExit)

        self.Show(True)

    def OnAbout(self,e):
        dlg = wx.MessageDialog(self, "A small text editor", "About sample     editor", wx.OK)
        dlg.ShowModal()
        dlg.Destroy()

    def OnExit(self,e):
        self.Close(True)
    app = wx.App(False)
    frame = MainWindow(None, "sample editor")
    app.MainLoop()
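A sketch of the likely cause, using hypothetical stand-in classes so no wx is needed: the method above is named _init_ with single underscores, so Python never treats it as the constructor. The parent's __init__ runs instead, and the title string lands in a slot where wx.Frame expects an int id, which matches the "expected argument 2 of type 'int'" error.

```python
# Base stands in for wx.Frame's (parent, id, ...) constructor signature.
class Base(object):
    def __init__(self, parent, ident=-1):
        self.ident = ident                  # plays the role of wx's int id slot


class Broken(Base):
    def _init_(self, parent, title):        # single underscores: never called
        self.title = title


class Fixed(Base):
    def __init__(self, parent, title):      # double underscores: a real constructor
        Base.__init__(self, parent)
        self.title = title
```

Renaming _init_ to __init__ (and unindenting the three app/frame/MainLoop lines out of the class body) should make the sample run.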

Split array filled in a for loop by a function calculating values from another split array

I got stuck on a for-loop problem.

I have a signal as an array and I split it up into multiple epochs.

times = np.arange(0, duration, 1 / sfreq)
nse1 = np.random.rand(times.size) * nse_amp
x =  amp * np.sin( 2 * np.pi * 200 * times            ) + nse1
x2 = np.array_split(x,epochs)

I do this a second time for a y-signal. Say my signal x has shape (100,); then my split array for 2 epochs should have shape (2, 50).

Now I want to use a function in a for loop to calculate a value for each segment of my split array. Something like:

for i in range(0,epochs):
    Rxy[i], freqs_xy[i] = mlab.csd(x2[i], y2[i], NFFT=nfft, Fs=sfreq)

So I will get an array for Rxy of shape (2, 50)

Hope you get what I want to do.

Greetings, Daniel
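One way to structure such a loop (a sketch) is to collect the per-epoch results in a list and stack them once at the end, instead of assigning into an unallocated Rxy[i]. mlab.csd is replaced by a placeholder element-wise product here so the sketch runs without matplotlib:

```python
import numpy as np

sfreq, duration, epochs = 100.0, 1.0, 2
times = np.arange(0, duration, 1 / sfreq)
x = np.sin(2 * np.pi * 5 * times)
y = np.cos(2 * np.pi * 5 * times)
x2 = np.array_split(x, epochs)
y2 = np.array_split(y, epochs)

Rxy = []                  # grow a Python list per epoch, stack once at the end
for xi, yi in zip(x2, y2):
    Rxy.append(xi * yi)   # placeholder for mlab.csd(xi, yi, NFFT=..., Fs=sfreq)
Rxy = np.vstack(Rxy)      # shape (epochs, samples_per_epoch)
```

With the real mlab.csd, each call returns (Pxy, freqs); appending the Pxy values and stacking afterwards follows the same pattern.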

Differences between each F1-score values in sklearns.metrics.classification_report and sklearns.metrics.f1_score with a binary confusion matrix

I have (true) boolean values and predicted boolean values like:

y_true = np.array([True, True, False, False, False, True, False, True, True,
       False, True, False, False, False, False, False, True, False,
        True, True, True, True, False, False, False, True, False,
        True, False, False, False, False, True, True, False, False,
       False, True, True, True, True, False, False, False, False,
        True, False, False, False, False, False, False, False, False,
       False, True, True, False, True, False, True, True, True,
       False, False, True, False, True, False, False, True, False,
       False, False, False, False, False, False, False, True, False,
        True, True, True, True, False, False, True, False, True,
        True, False, True, False, True, False, False, True, True,
       False, False, True, True, False, False, False, False, False,
       False, True, True, False])

y_pred = np.array([False, False, False, False, False, True, False, False, True,
       False, True, False, False, False, False, False, False, False,
        True, True, True, True, False, False, False, False, False,
       False, False, False, False, False, True, False, False, False,
       False, True, False, False, False, False, False, False, False,
        True, False, False, False, False, False, False, False, False,
       False, True, False, False, False, False, False, False, False,
       False, False, True, False, False, False, False, True, False,
       False, False, False, False, False, False, False, True, False,
       False, True, False, False, False, False, True, False, True,
        True, False, False, False, True, False, False, True, True,
       False, False, True, True, False, False, False, False, False,
       False, True, False, False])

I'm using the following imports

from sklearn.metrics import f1_score, classification_report, confusion_matrix

Confusion matrix looks like:

print(confusion_matrix(y_true, y_pred))

[[67  0]
 [21 24]]

I'm doing:

print("f1_score: %f" % f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))

I get:

f1_score: 0.695652
             precision    recall  f1-score   support

      False       0.76      1.00      0.86        67
       True       1.00      0.53      0.70        45

avg / total       0.86      0.81      0.80       112

I see four values of f1-score (0.695652, 0.86, 0.70, 0.80). I wonder what the differences between these values are and how each is calculated.
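All four numbers follow from the confusion matrix [[67, 0], [21, 24]]; a short sketch reproducing them in pure Python (0.695652 and 0.70 are the same number at different precision, and the report's "avg / total" row is the support-weighted average of the per-class scores):

```python
# Confusion matrix layout: [[tn, fp], [fn, tp]] with True as the positive class.
tn, fp, fn, tp = 67, 0, 21, 24


def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)


f1_true = f1(tp / (tp + fp), tp / (tp + fn))    # ~0.6957: f1_score's default, the True class
f1_false = f1(tn / (tn + fn), tn / (tn + fp))   # ~0.8645: per-class score for False

support_false, support_true = tn + fp, fn + tp  # 67 and 45
f1_avg = (f1_false * support_false + f1_true * support_true) / (support_false + support_true)
# f1_avg ~0.7967, shown as 0.80 in the report
```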

"Request should have succeeded, but was 400 Bad Request" error in robotframework-httplibrary

I am using robotframework-httplibrary to automate my web service API, which takes a header and a request body. If I test these APIs manually using a REST client, they work fine; however, they fail from my Robot Framework test case.

I suspect the error is in the test case syntax or in the interpreter, Python 2.7 (maybe it is not supported).

Below is the test script:

*** Settings ***
Library    HttpLibrary.HTTP

*** Test Cases ***
Test Create Process
    Set Request Header    clientID: vg_site1
    Set Request Body      {"               "}
    POST                  http://localhost:portno/application
    Response Status Code Should Equal    200 OK

Output: Request should have succeeded, but was "400 Bad Request"

Pandas copy values from other dataframe

Pandas dataframe df1 contains a list of values A

df1 = pd.DataFrame({'A':['a','a','b']})

   A
0  a
1  a
2  b

Dataframe df2 can be seen as a mapping from values in A to values in B

df2 = pd.DataFrame({'A':['a','b'], 'B':[2,3]})

   A  B
0  a  2
1  b  3

I want to apply the mapping to df1. The working version I have is the one below, but I feel there is potential for improvement, as I find my solution unreadable and I am unsure how it would generalize to MultiIndexes:

df2.set_index('A').loc[df1.set_index('A').index].reset_index()
   A  B
0  a  2
1  a  2
2  b  3

I could also convert df2 to a dictionary and use the replace method, but that does not convince me either.
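A more readable equivalent (a sketch, assuming the standard pandas API) is a left merge, which keeps df1's rows and order and attaches B from the mapping; for the single-column case, Series.map does the same thing:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['a', 'a', 'b']})
df2 = pd.DataFrame({'A': ['a', 'b'], 'B': [2, 3]})

# left merge: one output row per df1 row, B looked up from df2
out = df1.merge(df2, on='A', how='left')

# single-column variant: map A through a Series indexed by the mapping keys
out2 = df1.assign(B=df1['A'].map(df2.set_index('A')['B']))
```

merge also generalizes naturally to joining on several key columns at once (on=['A', ...]).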

Reorganizing the data in a dataframe

I have data in the following format:

data = 
[
  {'data1': [{'sub_data1': 0}, {'sub_data2': 4}, {'sub_data3': 1}, {'sub_data4': -5}]},
  {'data2': [{'sub_data1': 1}, {'sub_data2': 1}, {'sub_data3': 1}, {'sub_data4': 12}]},
  {'data3': [{'sub_data1': 3}, {'sub_data2': 0}, {'sub_data3': 1}, {'sub_data4': 7}]},

]

How should I reorganize it so that, when I save it to HDF via

a = pd.DataFrame(data, columns=map(lambda x: x.name, ['data1', 'data2', 'data3']))
a.to_hdf('my_data.hdf')

I get a dataframe in the following format:

            data1       data2     data3
_________________________________________
sub_data1   0           1           1
sub_data2   4           1           0
sub_data3   1           1           1
sub_data4   -5          12          7

update1: after following the advice given below, saving to an HDF file and reading it back, I got this, which is not what I want:

       data1                        data2                      data3   
0      {u'sub_data1': 22}           {u'sub_data1': 33}          {u'sub_data1': 44}   
1      {u'sub_data2': 0}            {u'sub_data2': 11}          {u'sub_data2': 44}   
2      {u'sub_data3': 12}           {u'sub_data3': 16}          {u'sub_data3': 19}   
3      {u'sub_data4': 0}            {u'sub_data4': 0}           {u'sub_data4': 0}   
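One way to get the desired shape (a sketch) is to collapse each column's list of single-key dicts into one dict per column first, and then build the frame from a dict of dicts, so the sub_data keys become the index:

```python
import pandas as pd

data = [
    {'data1': [{'sub_data1': 0}, {'sub_data2': 4}, {'sub_data3': 1}, {'sub_data4': -5}]},
    {'data2': [{'sub_data1': 1}, {'sub_data2': 1}, {'sub_data3': 1}, {'sub_data4': 12}]},
    {'data3': [{'sub_data1': 3}, {'sub_data2': 0}, {'sub_data3': 1}, {'sub_data4': 7}]},
]

flat = {}
for item in data:
    for col, subs in item.items():
        merged = {}
        for d in subs:
            merged.update(d)   # collapse [{'sub_data1': 0}, ...] into one dict
        flat[col] = merged

# dict of dicts: outer keys become columns, inner keys become the index
a = pd.DataFrame(flat, columns=['data1', 'data2', 'data3'])
```

With scalar cells like this, a.to_hdf('my_data.hdf', 'mykey') should round-trip cleanly instead of storing dict objects.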

numpy: difference between NaN and masked array

In numpy there are two ways to mark missing values: I can either use NaN or a masked array. I understand that using NaNs is (potentially) faster, while masked arrays offer more functionality (which?).

I guess my question is: if/when should I use one over the other? What is the use case of np.nan in a regular array vs. a masked array?

I am sure the answer must be out there but I could not find it...
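A small illustration of the practical difference: with NaNs you must reach for the nan-aware functions (np.nanmean, np.nansum, ...), and NaN only exists for float dtypes, while a masked array's ordinary methods already skip masked entries and work for any dtype:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
m = np.ma.masked_invalid(a)       # same data, NaN positions masked

plain_mean = a.mean()             # NaN poisons the plain mean
nan_mean = np.nanmean(a)          # NaN route: needs the nan-aware function
masked_mean = m.mean()            # masked route: the ordinary method suffices
```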

using more than one linestyle in the same trend line with matplotlib

I want to plot a line using the solid format string 'k-' and, after a certain value on the axis, continue the same line as dashed ('k--'), or vice versa; I want to show the dashed part as an extension of the solid line. One way to do this is to treat them as two individual plots with different linestyles. I have attached a figure of an example. Just wondering if there is any other way to do this! enter image description here

Python NumPy: How to fill a matrix using an equation

I wish to initialise a matrix A using the equation A_i,j = f(i,j) for some f (it's not important what f is).

How can I do so concisely, avoiding two nested for loops?
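One concise option (a sketch with a made-up f) is numpy.fromfunction, which evaluates f once on whole index arrays rather than per element, so f must be written in terms of vectorized operations:

```python
import numpy as np


def f(i, j):
    # hypothetical example f; i and j arrive as full index arrays
    return i + 2 * j


A = np.fromfunction(f, (3, 4))   # A[i, j] == f(i, j), no explicit loops
```

For an f that only works on scalars, np.vectorize(f) can be passed instead, at the cost of a Python-level loop under the hood.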

Unknown Matplotlib error

I get the following warning and I cannot work out what it means or how I should fix it:

python-2.7.4_scipy0.13/lib/python2.7/site-packages/matplotlib/axes.py:2757: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=0, top=0.0
  + 'bottom=%s, top=%s') % (bottom, top))

The code is:

plt.scatter(chart_dict[chart][0], chart_dict[chart][1], c=colours[count], alpha=1.0, label=chart, lw = 0)
plt.tight_layout()
plt.ylabel(titles_markers[2])
plt.xlabel(titles_markers[1])
plt.yscale('log')
plt.grid(b=True, which='major', color='b', linestyle='-')
plt.title(titles_markers[0])
plt.legend()
plt.savefig(os.path.join(directory, titles_markers[0].replace(' ', '_')+'.png'))

how to delete a duplicate column read from excel in pandas

Data in Excel:

a  b  a  d
1  2  3  4
2  3  4  5
3  4  5  6
4  5  6  7

ipdb> df = pd.io.excel.read_excel(r"sample.xlsx", sheetname="Sheet1")
ipdb> df
   a  b  a.1  d
0  1  2    3  4
1  2  3    4  5
2  3  4    5  6
3  4  5    6  7

How do I delete the column "a.1"? When pandas reads the data from Excel, it automatically renames the second a to a.1.

I tried df.drop("a.1", index=1), but this does not work.

I have a huge Excel file which has duplicate names, and I am interested in only a few of the columns.
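A sketch of two options (on a toy frame mirroring the question): drop addresses columns via axis=1 (the index= keyword addresses rows, which is why the attempt above failed), or simply select only the columns of interest:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]],
                  columns=['a', 'b', 'a.1', 'd'])

dropped = df.drop('a.1', axis=1)   # axis=1 targets columns, not rows
subset = df[['a', 'd']]            # or keep just the columns you need
```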

How to save counter of entries in str and show it in Text field?

Using the example below, I want to count the number of entries and show a string with the running count in the Text field, for example:

1: number of entries(1)
21: number of entries(2)
...

Where should I put my counter variable?

import random
from tkinter import *

class MyApp(Frame):

    def __init__(self, master):
        super(MyApp, self).__init__(master)
        self.grid()
        self.create_widgets()        

    def create_widgets(self):

        Label(self, text = 'Your number:').grid(row = 0, column = 0, sticky = W)
        self.num_ent = Entry(self)
        self.num_ent.grid(row = 0, column = 1, sticky = W)
        Button(self, text = 'Check', command = self.update).grid(row = 0, column = 2, sticky = W)
        self.user_text = Text(self, width = 50, height = 40, wrap = WORD)
        self.user_text.grid(row = 2, column = 0, columnspan = 3)

    def update(self):

        guess = self.num_ent.get()
        guess += '\n'
        self.user_text.insert(0.0, guess)

root = Tk()
root.geometry('410x200')
root.title('Entries counter')
app = MyApp(root)
root.mainloop()
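One natural place for the counter is an instance attribute: initialised once in __init__ and incremented inside the update handler, so it survives between button clicks. A GUI-free sketch of that idea (the running-total reading of the example output is an assumption):

```python
class EntryCounter(object):
    def __init__(self):
        self.count = 0      # instance attribute: persists across update() calls

    def update(self, guess):
        # mirrors MyApp.update: build the line that would go into the Text widget
        self.count += 1
        return '%s: number of entries(%d)\n' % (guess, self.count)
```

In the MyApp class above, that translates to setting self.count = 0 in __init__ and doing self.count += 1 inside update() before building the inserted string.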

sum zeros and ones by another vector in python

I have the following data array m:

import numpy as np
a = [[1],[0],[1],[0],[0]]
b = [[1],[0],[1],[0],[0]]
c = d = [[1],[0],[1],[0],[0]]
m = np.hstack((a,b,c,d))
m
array([[1, 0, 1, 1],
       [0, 0, 0, 0],
       [1, 1, 1, 1],
       [0, 0, 0, 0],
       [0, 1, 0, 0]])

I have the following vector prior

prior = [0.1,0.2,0.3,0.4]

I now want to create a new vector of length 5, where each row of m is summed according to this scheme:

if 1 then add 1/prior

if 0 then add 0.1*1/prior

so for the first row in m we would get

(1/0.1)+(0.1*1/0.2)+(1/0.3)+(1/0.4) = 16.33

the second row is

(0.1*1/0.1)+(0.1*1/0.2)+(0.1*1/0.3)+(0.1*1/0.4) = 2.083

m should be the basis, and numpy may be used (perhaps .sum(axis=1))?
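A vectorised sketch with numpy.where, using the displayed m and prior from the question: broadcasting compares each row of m against prior, picks 1/prior where the entry is 1 and 0.1/prior where it is 0, and .sum(axis=1) collapses each row:

```python
import numpy as np

m = np.array([[1, 0, 1, 1],
              [0, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0],
              [0, 1, 0, 0]])
prior = np.array([0.1, 0.2, 0.3, 0.4])

# per-cell weight: 1/prior for ones, 0.1/prior for zeros (prior broadcasts per row)
weights = np.where(m == 1, 1.0 / prior, 0.1 / prior)
result = weights.sum(axis=1)
```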

Tensor product of N matrices in C++ and Armadillo

I am new to C++ and I want to write a program to calculate the tensor product of N Armadillo matrices a1, a2, a3, ..., aN. Of course, Armadillo can calculate a tensor product, but only of two matrices at a time, kron(a1,a2); if we want the tensor product of all of them, we have to write:

kron(kron(kron(a1,a2),a3),...) 

I want to introduce a function which does the same work. I wrote it in Python as:

import numpy as np
import scipy.sparse as sp

def tens(*args):
    for i, j in enumerate(args):
        if i == 0:
            output = j
        else:
            output = sp.kron(output, j)
    return output

and it works, but when I translate the program to C++, I think it should take the following shape:

#include <iostream>
#include <armadillo>

using namespace std;
using namespace arma;

cx_mat tens(cx_mat argc,...)
{
cx_mat output;
for( cx_mat x = 0; x < argc; x++ ) 

        if (x == 0)
            output = argc[0];
        else
            output = kron(ttt, argc[x]);

return output;
}

and to test it for a1, a2, a3, we can append the main function to the end of the program:

int main()
{
cx_mat ii(2,2,fill::eye);
cx_mat ee = ii.col(0); // extract a column vector
cx_mat gg = ii.col(1);

tens(ee,gg,gg);
return 0;
}

I think the problem is in the 'if' and 'else' statements. Can anybody guide me?
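As an aside, the accumulation pattern the Python version implements is a left fold, which functools.reduce expresses directly; a sketch using numpy's dense kron (scipy's sparse kron follows the same pattern), which also mirrors the fold a variadic C++ version would perform:

```python
from functools import reduce

import numpy as np


def tens(*mats):
    # left fold of kron over the arguments: kron(kron(kron(a1, a2), a3), ...)
    return reduce(np.kron, mats)
```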

Time series: Mean per hour per day per Id number

I am a somewhat beginner programmer learning Python (+pandas) and hope I can explain this well enough. I have a large time-series pandas DataFrame of over 3 million rows and initially 12 columns, spanning a number of years. It covers people taking a ticket from different locations denoted by Id numbers (350 of them). Each row is one instance (one ticket taken). I have searched many questions, like counting records per hour per day and getting an average per hour over several years, but I run into trouble when including the 'Id' variable. I'm looking to get the mean number of tickets taken for each hour, for each day of the week (Mon-Fri), per station.

I have the following, with the datetime set as the index:

    Id          Start_date  Count  Day_name_no
    149 2011-12-31 21:30:00      1            5  
    150 2011-12-31 20:51:00      1            0  
    259 2011-12-31 20:48:00      1            1  
    3015 2011-12-31 19:38:00     1            4  
    28 2011-12-31 19:37:00       1            4  

Using groupby and Start_date.index.hour, I can't seem to include the 'Id'.

My alternative approach is to split the hour out of the date, giving the following:

    Id  Count  Day_name_no  Trip_hour
    149      1            2         5
    150      1            4         10
    153      1            2         15
    1867     1            4         11
    2387     1            2         7

I then get the count first with:

Count_Item = TestFreq.groupby([TestFreq['Id'], TestFreq['Day_name_no'], TestFreq['Hour']]).count().reset_index()

     Id Day_name_no Trip_hour   Count
     1  0           7          24
     1  0           8          48
     1  0           9          31
     1  0           10         28
     1  0           11         26
     1  0           12         25

Then use groupby and mean:

Mean_Count = Count_Item.groupby([Count_Item['Id'], Count_Item['Day_name_no'], Count_Item['Hour']]).mean().reset_index()

However, this does not give the desired result, as the mean values are incorrect. I hope I have explained this issue clearly. I'm looking for the mean per hour, per day, per Id, as I plan to cluster the dataset into groups before applying a predictive model to them.

Any help would be appreciated, and if possible an explanation of what I am doing wrong, either in my code or in my approach.

Thanks in advance.

I have edited this to try to make it a little clearer. Writing a question with a lack of sleep is probably not advisable. A toy dataset that I start with:

    Date        Id     Dow Hour Count
    12/12/2014  1234    0   9   1
    12/12/2014  1234    0   9   1
    12/12/2014  1234    0   9   1
    12/12/2014  1234    0   9   1
    12/12/2014  1234    0   9   1
    19/12/2014  1234    0   9   1
    19/12/2014  1234    0   9   1
    19/12/2014  1234    0   9   1
    26/12/2014  1234    0   10  1
    27/12/2014  1234    1   11  1
    27/12/2014  1234    1   11  1
    27/12/2014  1234    1   11  1
    27/12/2014  1234    1   11  1
    04/01/2015  1234    1   11  1

I now realise I would have to group by the date first and get something like:

    Date         Id    Dow Hour Count
    12/12/2014  1234    0   9   5
    19/12/2014  1234    0   9   3
    26/12/2014  1234    0   10  1
    27/12/2014  1234    1   11  4
    04/01/2015  1234    1   11  1

And then calculate the mean per Id, per Dow, per Hour, to get this:

    Id  Dow Hour    Mean
    1234    0   9   4
    1234    0   10  1
    1234    1   11  2.5

I hope this makes it a bit clearer. My real dataset spans 3 years, has 3 million rows and contains 350 Id numbers.
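The two-step aggregation described above can be sketched as two groupbys on the toy data (column names as in the edited question; pandas assumed): first sum counts per actual date, then average those daily totals over the dates:

```python
import pandas as pd

df = pd.DataFrame({
    'Date':  ['12/12/2014'] * 5 + ['19/12/2014'] * 3 + ['26/12/2014']
             + ['27/12/2014'] * 4 + ['04/01/2015'],
    'Id':    [1234] * 14,
    'Dow':   [0] * 9 + [1] * 5,
    'Hour':  [9] * 8 + [10] + [11] * 5,
    'Count': [1] * 14,
})

# step 1: tickets per Id, per day-of-week, per hour, per actual date
daily = df.groupby(['Date', 'Id', 'Dow', 'Hour'])['Count'].sum().reset_index()

# step 2: average the daily totals over the dates
mean = (daily.groupby(['Id', 'Dow', 'Hour'])['Count']
             .mean().reset_index().rename(columns={'Count': 'Mean'}))
```

On this toy frame the result matches the desired output: 4 for (Dow 0, Hour 9), 1 for (Dow 0, Hour 10) and 2.5 for (Dow 1, Hour 11).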

Run a setup.py from another Python script, path issue, installing in calling script dir?

First: I've found how to call a script from within another script in Python, and the call works perfectly well, but here's the problem I'm running into:

In order to easily install my web app (Bottle) on another server, I packed the mod_wsgi and PyMySQL source files inside a /redist directory. What I'm trying to achieve is a kind of "setup.py" file that will launch /mod_wsgi/setup.py install and do the same with the PyMySQL setup file.

Here's what I'm doing for PyMySQL for example :

subprocess.call("python3 ./redist/PyMySQL/setup.py install", shell=True)

The installation runs fine, BUT I end up with /build, /dist and /PyMySQL.egg-info folders in my app directory, and when I try to launch anything that imports PyMySQL, it tells me the module doesn't exist.

If I install it manually (from my terminal, I mean, like cd /redist/PyMySQL/ and then py3 setup.py install), it works great, and the import then works...

Any idea? Am I doing something wrong?

Thanks in advance :)
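
One likely cause (an assumption, since the full layout isn't shown): subprocess.call inherits the caller's working directory, so setup.py builds into the calling app's directory. Running it with cwd pointed at the package directory would match the manual cd workflow:

```python
import subprocess

# A sketch of the likely fix: run each setup.py from its own directory
# (via cwd=...), so build artifacts land there instead of in the app root.
# Paths are illustrative.
def install_package(pkg_dir):
    return subprocess.call(["python3", "setup.py", "install"], cwd=pkg_dir)

# install_package("./redist/PyMySQL")
# install_package("./redist/mod_wsgi")
```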

Editing XML files with a batch (.bat) program

I'm trying to edit XML files in a batch script.

This is my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<task name="analyse">
   <taskInfo taskId="21a09311-ade3-4e9a-af21-d13be8b7ba45" runAt="2015-05-20 13:48:50" runTime="5 minutes, 53 seconds">
      <project name="13955 - HMI Volvo Truck PA15" number="e20d51c0-71dc-4572-8f9b-4c150bf35222" />
      <language lcid="1031" name="German (Germany)" />
      <tm name="ENG-DEU_en-GB_de-DE.sdltm" />
      <settings reportInternalFuzzyLeverage="yes" reportLockedSegments="no" reportCrossFileRepetitions="yes" minimumMatchScore="70" searchMode="bestWins" missingFormattingPenalty="1" differentFormattingPenalty="1" multipleTranslationsPenalty="1" autoLocalizationPenalty="0" textReplacementPenalty="0" />
   </taskInfo>
   <file name="VT MAIN TRACK_PA15_Default_DE-DE_20150520_102527.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="55" characters="755" placeables="3" tags="0" />
         ' Replace the value words="55" with "0"
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="2" words="20" characters="0" placeables="0" tags="0" />
         'Cut the value words="20" and replace it with 0
         <repeated segments="17" words="34" characters="293" placeables="2" tags="0" />
         'Add the cut value (20) to the current value (34), so the new value is words="54"
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </file>
   <file name="VT MAIN TRACK_PA15_Default_DE-DE_20150523_254796.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="67" characters="755" placeables="3" tags="0" />
         ' Replace the value words="67" with "0"
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="2" words="35" characters="0" placeables="0" tags="0" />
         'Cut the value words="35" and replace it with 0
         <repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
         'Add the cut value (35) to the current value (54), so the new value is words="89"
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </file>
   <batchTotal>
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="139" characters="755" placeables="3" tags="0" />
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="0" words="0" characters="0" placeables="0" tags="0" />
         <repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </batchTotal>
</task>

general notes:

  • the <task> is the root element (end element </task>)
  • the important part here is to modify a few tags inside each <file> ... </file> section
  • there can be X occurrences of <file>*</file>

What I need:

For each <file> element, I would like to:

  • In <inContextExact>, set the words attribute to 0

    <inContextExact ... words="55" ... /> => <inContextExact ... words="0" ... />

  • In <crossFileRepeated>, set the words attribute to 0

    <crossFileRepeated ... words="20" ... /> => <crossFileRepeated ... words="0" ... />

  • In <total>, set the words attribute to a value calculated by my own logic

    <total ... words="1462" ... /> => <total ... words="??" ... />

I would really appreciate an example of processing XML files from a batch script.
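
Batch itself has no XML parser, so a common pattern is to have the .bat file call a small script. As an illustration only (not cmd-native), here is a sketch in Python's xml.etree implementing the three edits; the "total" calculation is a placeholder, since the real logic wasn't specified:

```python
import xml.etree.ElementTree as ET

# Element names taken from the file above; the "total" formula is an assumption.
def fix(path):
    tree = ET.parse(path)
    for f in tree.getroot().iter('file'):
        a = f.find('analyse')
        ice = a.find('inContextExact')
        cfr = a.find('crossFileRepeated')
        rep = a.find('repeated')
        total = a.find('total')
        # Add the crossFileRepeated word count to repeated, then zero it out.
        rep.set('words', str(int(rep.get('words')) + int(cfr.get('words'))))
        cfr.set('words', '0')
        ice.set('words', '0')
        # Recompute total words as the sum of the other counts (assumed logic).
        total.set('words', str(sum(int(e.get('words'))
                                   for e in a
                                   if e is not total and 'words' in e.attrib)))
    tree.write(path)

# From a .bat file this could be invoked as, e.g.:
#   python fix_analyse.py report.xml
# with a small "fix(sys.argv[1])" entry point added.
```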

Python: "ValueError: can't format dates this early" on one PC, works on other

I have a Python script that works perfectly on my dev PC. Both machines are Windows 7 with the same Python version (2.7.9). However, on the target machine I get:

ValueError: can't format dates this early

The error seems to come from pywin32 module.

The code uses a 3rd party library invoked by pywin32:

raw = win32com.client.Dispatch("MyLib.MyClass")

and then fails later on:

acq_time = raw.GetCreationDate()

Now I'm lost as to why this works on my PC and not on the target machine. Both have a "corporate install" of Windows 7, i.e. the same regional and date-time settings.

What is the issue? Can anyone guide me on how I might resolve it?

Use list comprehension without iteration variable [duplicate]

This question already has an answer here:

I wonder if there is a way to use e.g. a list comprehension without an iteration variable if I do not need it? For example, in this sample of code:

a = [random.randrange(-10, 11) / 10 for i in range(100)]

I get a warning "Local variable 'i' value is not used". Is there any variant of the list comprehension construct without iteration variable?
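
A common convention (not a special syntax) is to name the unused variable `_`, which most linters treat as intentionally ignored:

```python
import random

# Using "_" signals the value is deliberately unused; most linters
# (and the warning above) accept it.
a = [random.randrange(-10, 11) / 10 for _ in range(100)]
```

There is no comprehension form that omits the variable entirely; `_` is the idiom.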

Python pyperclip module: ArgumentError

I'm trying to do some work in Python with text that I store on the clipboard. I'm using the pyperclip module. I installed the module from cmd with pip as usual:

pip install pyperclip

It seems to install fine, because import pyperclip runs without any problem. But when I try to run any piece of code using it, even the example of usage given in the documentation:

pyperclip.copy('The text to be copied to the clipboard.')
pyperclip.paste()

I get an ArgumentError:

ArgumentError: argument 1: <class 'TypeError'>: wrong type

I'm a noob in Python, which may be one of the reasons I'm clueless about how to solve this.

Finding count of distinct elements in DataFrame in each column

I am trying to find the count of distinct values in each column using Pandas. This is what I did.

import pandas as pd

df = pd.read_csv('train.csv')
# print(df)

a = pd.unique(df.values.ravel())
print(a)

It counts unique elements in the DataFrame irrespective of rows/columns, but I need to count for each column with output formatted as below.

policyID              0
statecode             0
county                0
eq_site_limit         0
hu_site_limit         454
fl_site_limit         647
fr_site_limit         0
tiv_2011              0
tiv_2012              0
eq_site_deductible    0
hu_site_deductible    0
fl_site_deductible    0
fr_site_deductible    0
point_latitude        0
point_longitude       0
line                  0
construction          0
point_granularity     0

What would be the most efficient way to do this, as this method will be applied to files larger than 1.5 GB?


Based upon the answers, df.apply(lambda x: len(x.unique())) is the fastest.

In[23]: %timeit df.apply(pd.Series.nunique)
1 loops, best of 3: 1.45 s per loop
In[24]: %timeit df.apply(lambda x: len(x.unique()))
1 loops, best of 3: 335 ms per loop
In[25]: %timeit df.T.apply(lambda x: x.nunique(), axis=1)
1 loops, best of 3: 1.45 s per loop
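
For what it's worth, recent pandas versions also expose this as a single call, DataFrame.nunique() (assuming your pandas is new enough to have it; older versions need the apply() forms timed above):

```python
import pandas as pd

# Small illustrative frame.
df = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'y', 'y']})

# Per-column distinct counts in one call.
print(df.nunique())
```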

How to change numbers around in a list (python)

I have been working on this for 3 hours now, but I have no clue how to do it.

Can anyone help me with this?

values = [1, 2, 3, 4, 5]

temp = values[0]

for index in range (len(values) -1):
    values[index] = values [index]

values[len(values)-1] = temp
print values

I want the printed values to be [2, 3, 4, 5, 1], i.e. the list rotated by one position, by simply changing what's in the brackets.
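
For what it's worth, the loop above copies each element onto itself; reading from the *next* index (or slicing) produces the rotation. A sketch:

```python
values = [1, 2, 3, 4, 5]

# Fix for the loop above: copy each element from the next index.
temp = values[0]
for index in range(len(values) - 1):
    values[index] = values[index + 1]
values[len(values) - 1] = temp
print(values)  # [2, 3, 4, 5, 1]

# Or, more idiomatically, with slicing:
rotated = [1, 2, 3, 4, 5]
rotated = rotated[1:] + rotated[:1]
```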

Why is for _ in range(n) slower than for _ in [""]*n?

Testing alternatives to for _ in range(n) (to execute some action n times, even if the action does not depend on the value of n), I noticed that another formulation of this pattern is faster: for _ in [""] * n.

For example:

timeit('for _ in range(10^1000): pass', number=1000000)

returns 16.4 seconds;

whereas,

timeit('for _ in [""]*(10^1000): pass', number=1000000)

takes 10.7 seconds.

Why is [""] * 10^1000 so much faster than range(10^1000) in Python 3?

All testing done using Python 3.3
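
One thing worth checking first: in Python, `^` is bitwise XOR, not exponentiation, so 10^1000 is a small number and both snippets loop far fewer times than the expression suggests. A quick sanity check:

```python
# In Python, ^ is bitwise XOR, not a power operator.
print(10 ^ 1000)   # 994 -- so both timings above loop 994 times
print(10 ** 3)     # 1000 -- ** is exponentiation
```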

Location of CSS files after Python install

I'm new to Python, but I installed a liveblog based on Python 3.x. Right now I'm trying to find the CSS file with FileZilla, but I can't find it.

I can access the file with a browser; this is the file I'm searching for: goo.gl/I4IHyf

The problem is that I don't see the content or lib directory over FTP.

How to check whether a user was created successfully in Django

I was trying Django, and I used this code to create a user:

from django.shortcuts import render
from django.http import HttpResponse
from django.contrib.auth.models import User    

def register(request):
    user = User.objects.create_user('John', 'lennon@thebeatles.com', 'johnpassword')
    user.last_name = "James"
    user.is_active = True
    status = user.save()
    return HttpResponse(status)

My question is: how can I check whether a user was created successfully, and display an error message if I am unable to create one? When I run this code, it creates a user, but user.save() returns None.

Thanks

mercredi 6 mai 2015

How can I create dynamic objects in VBA for creating MSXML

I am looking for a less verbose way of creating hierarchical repeating XML with VBA and MSXML. At the moment I have a spreadsheet that lays out my XML levels like so:

    Intro
    Level 1 - Exam Section
    Level 2 - Exam Section
    Level 2 - Exam Section
    Level 2 - Exam Section
    Level 2 - Exam Section
    Level 1 - Exam Section
    Level 2 - Exam Section
    Level 2 - Exam Section
    Exit

This is held in a Variant in VBA, and the order is maintained.

I have the following code, which works fine. The Intro and Exit creation are fine, as they are different from the section nodes. But the section nodes repeat: rather than using If logic to test which level I am at and repeating the same code five times (making it a nightmare to maintain), is there a simpler way for me to iterate the levels and append each one to its parent?

We can assume that if there is a Level 1 and then a Level 2, the parent of that Level 2 is the preceding Level 1.

'loop the sections to add to dom structure - this uses a range in the spreadsheet called ExamSections
    For i = LBound(varExamSections) To UBound(varExamSections)

    'dont add blank values from the spreadsheet
        If varExamSections(i, 1) <> "" Then

            If varExamSections(i, 1) = "Intro" Then

                'add intro section element
                Set introSection = dom.createElement("section")
                assessment.appendChild introSection
                introSection.setAttribute "ident", varExamSections(i, 2)

                'add intro section proc extension
                Set sectionprocExtension = dom.createElement("sectionproc_extension")
                introSection.appendChild sectionprocExtension

                'add intro section control
                Set SectionControl = dom.createElement("sectioncontrol")
                sectionprocExtension.appendChild SectionControl
                SectionControl.setAttribute "end_assessment", "false"

            End If

            If varExamSections(i, 1) = "Level 1 - Exam Section" Then

                'add level section element
                Set sectionLvl1 = dom.createElement("section")
                assessment.appendChild sectionLvl1
                sectionLvl1.setAttribute "ident", varExamSections(i, 2)

                If varExamSections(i, 3) <> "" Then
                    sectionLvl1.setAttribute "title", varExamSections(i, 3)
                End If


            End If

            If varExamSections(i, 1) = "Level 2 - Exam Section" Then

                'add level section element
                Set sectionLvl2 = dom.createElement("section")
                sectionLvl1.appendChild sectionLvl2
                sectionLvl2.setAttribute "ident", varExamSections(i, 2)

                If varExamSections(i, 3) <> "" Then
                    sectionLvl2.setAttribute "title", varExamSections(i, 3)
                End If


            End If

            If varExamSections(i, 1) = "Level 3 - Exam Section" Then

                'add level section element
                Set sectionLvl3 = dom.createElement("section")
                sectionLvl2.appendChild sectionLvl3

                sectionLvl3.setAttribute "ident", varExamSections(i, 2)

                If varExamSections(i, 3) <> "" Then
                    sectionLvl3.setAttribute "title", varExamSections(i, 3)
                End If

            End If

            If varExamSections(i, 1) = "Level 4 - Exam Section" Then

                'add level section element
                Set sectionLvl4 = dom.createElement("section")
                sectionLvl3.appendChild sectionLvl4

                sectionLvl4.setAttribute "ident", varExamSections(i, 2)

                If varExamSections(i, 3) <> "" Then
                    sectionLvl4.setAttribute "title", varExamSections(i, 3)
                End If


            End If

            If varExamSections(i, 1) = "Level 5 - Exam Section" Then

                'add level section element
                Set sectionLvl5 = dom.createElement("section")
                sectionLvl4.appendChild sectionLvl5

                sectionLvl5.setAttribute "ident", varExamSections(i, 2)

                If varExamSections(i, 3) <> "" Then
                    sectionLvl5.setAttribute "title", varExamSections(i, 3)
                End If

            End If

            If varExamSections(i, 1) = "Exit" Then

                'add level section element
                Set sectionExit = dom.createElement("section")
                assessment.appendChild sectionExit

            End If


            'exam_reporting_groupref.setAttribute "linkrefid", varReportingGroups(i, 2)

        End If

    Next i

Any help would be greatly appreciated.
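
Since the level number fully determines the parent (a level-N section hangs off the most recent level N-1 section), one simpler structure is a "most recent section per level" array instead of five named variables. The idea, sketched in Python with xml.etree for brevity (in VBA this would be a sections(0 To 5) array of IXMLDOMElement, with the level parsed from the text; row data here is assumed):

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rows mirroring the spreadsheet: (kind, ident, title).
rows = [
    ("Intro", "id0", ""),
    ("Level 1 - Exam Section", "id1", "Part A"),
    ("Level 2 - Exam Section", "id2", "A.1"),
    ("Level 2 - Exam Section", "id3", "A.2"),
    ("Exit", "id9", ""),
]

assessment = ET.Element("assessment")
last = {0: assessment}            # level -> most recently created node

for kind, ident, title in rows:
    m = re.match(r"Level (\d+)", kind)
    if m:
        level = int(m.group(1))
        parent = last[level - 1]  # a level-N node hangs off the last level N-1
        section = ET.SubElement(parent, "section", ident=ident)
        if title:
            section.set("title", title)
        last[level] = section
    else:
        # Intro / Exit attach directly to the root.
        ET.SubElement(assessment, "section", ident=ident)

print(ET.tostring(assessment).decode())
```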

Convert a list of all folders and subfolders of drive C:\ to an XML file in C#

I want to make a small program in C# that loads a list of the paths of all folders and subfolders of any drive, e.g. "C:\", into an XML file or any other file type. The goal is that this file will then serve another program I have to write, which will search it and open a particular folder.

Is it possible to do this?

Thanks,
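
Yes, it is possible. As an illustration of the approach only (sketched in Python; the C# version would pair Directory.EnumerateDirectories with XDocument), a recursive walk writing each folder path into a flat XML list:

```python
import os
import xml.etree.ElementTree as ET

# Walk every folder under root_path and write the paths as a flat XML list.
def folders_to_xml(root_path, out_file):
    root = ET.Element("folders", root=root_path)
    for dirpath, dirnames, _ in os.walk(root_path):
        for d in dirnames:
            ET.SubElement(root, "folder", path=os.path.join(dirpath, d))
    ET.ElementTree(root).write(out_file, encoding="utf-8", xml_declaration=True)

# folders_to_xml("C:\\", "folders.xml")  # whole drive; may need error handling
```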

FusionCharts XT hover effect only works on half the chart

I've got a multi-series chart which only shows animation on the left side of the chart. The showTooltip works well as long as it's on the left side, but when you move over the anchors on the right side, it doesn't work.

Does anyone know this error?

Repeat rows for multiple group items in XSLT 2.0

I want to achieve this in XSLT 2.0: for the same company, the actor rows should repeat, and if an item has multiple attributes, e.g. university name, the values should be comma-separated.

    CompanyName  Location  City    Profession  Name          University
    ABC          Mumbai    Mumbai  Actor       Harry Potter  Delhi, Mumbai
    ABC          Mumbai    Mumbai  Actor       Wilma Smith   Mumbai
    XYZ          Viman     Pune    Doctor      Ajay Singh    Pune

Please suggest an XSLT.

How to find a '<' character which is not a markup tag in an XML string using Java?

The XML below is converted into a String. I have to find the '<' character which is part of the actionComment element's value.

<actionTakenTaskCollectionRoot>
  <actionTakenTask actionTakenTaskId="8a8080844cd55b0b014cd5f783ea0692">
    <actionComment>a **<** b</actionComment>
  </actionTakenTask>
</actionTakenTaskCollectionRoot>

How to use BI Reporting Tools

I have just now deployed a SpagoBI server / Studio and want to generate some reports with charts with MS SQL Server as my backend database.

I was told that we need to create an XML file for our query against the database. I am not sure how to do that, as it is my very first time.

Kindly guide.

XSLT infinite template recursion in PHP

I am using the XSLT processor available in PHP to transform an XML file. When my XML file is a small sample everything is OK, but when I try to process a file with something like 1000 lines I get this error:

Warning: XSLTProcessor::transformToXml(): xsltApplyXSLTTemplate: A potential infinite template recursion was detected. You can adjust xsltMaxDepth (--maxdepth) in order to raise the maximum number of nested template calls and variables/params (currently set to 3000)

My xml file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://ift.tt/1dDT9n4">
  <teiHeader xml:lang="en" />
  <text>
    <body>
      <div type="chapter" n="1">
        <p>
          <s xml:id="e_1">In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since.</s>         
        </p>
        <p>
          <s xml:id="e_2">"Whenever you feel like criticizing any one," he told me, "just remember that all the people in this world haven't had the advantages that you've had."</s>
        </p>
        </body>
  </text>
</TEI>

and my XSLT treatment is:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://ift.tt/tCZ8VR" 
xmlns:exsl="http://exslt.org/common"
xmlns:set="http://exslt.org/sets"
xmlns:tei="http://ift.tt/1dDT9n4"
extension-element-prefixes="exsl set">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="tei:div">
    <xsl:call-template name="split-chapter">
        <xsl:with-param name="nodes" select="tei:p/tei:s"/>
    </xsl:call-template>
</xsl:template>

<xsl:template name="split-chapter">
    <xsl:param name="nodes"/>
    <xsl:param name="limit" select="300"/>
    <xsl:param name="remaining-nodes" select="dummy-node"/>
    <!-- 1. Calculate the total length of nodes -->
    <xsl:variable name="lengths">
        <xsl:for-each select="$nodes">
            <length>
                <xsl:value-of select="string-length()" />
            </length>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="total-length" select="sum(exsl:node-set($lengths)/length)" />
    <!-- 2. Process the chapter: -->
    <xsl:choose>
        <!-- If chapter is too long and can be shortened ... -->
        <xsl:when test="$total-length > $limit and count($nodes) > 1">
            <!-- ... try again with one node less. -->
            <xsl:call-template name="split-chapter">
                <xsl:with-param name="nodes" select="$nodes[not(position()=last())]"/>
                <xsl:with-param name="remaining-nodes" select="$remaining-nodes | $nodes[last()]"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <!-- Otherwise create a chapter with the current nodes ... -->
            <div type="chapter" n="{@n}" length="{$total-length}" >
                <!-- ... list the paras participating in this chapter ... -->
                <xsl:for-each select="$nodes/parent::tei:p">
                    <p>
                        <!-- ... and process the nodes still left in each para. -->
                        <xsl:apply-templates select="set:intersection(tei:s, $nodes)"/>
                    </p>
                </xsl:for-each>
            </div>
            <!-- Then process any remaining nodes. -->
            <xsl:if test="$remaining-nodes">
                <xsl:call-template name="split-chapter">
                    <xsl:with-param name="nodes" select="$remaining-nodes"/>
                </xsl:call-template>
            </xsl:if>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

How to create an XML file with the default format

I'm a beginner in XML, and I have this information:

box1 -> name, colour, from

box2 -> name, weight

box3 -> name, colour, from, weight

and I want to make one XML file like this:

<boxName>name1
    <boxColour>colour1</boxColour>
    <boxFrom>from1</boxFrom>
</boxName>
<boxName>name2
    <boxColour>colour2</boxColour>
    <boxWeight>weight2</boxWeight>
</boxName>
<boxName>name3
    <boxColour>colour3</boxColour>
    <boxFrom>from3</boxFrom>
    <boxWeight>weight3</boxWeight>
</boxName>

I created my XML in this form:

TiXmlDocument doc;
TiXmlDeclaration* decl = new TiXmlDeclaration("1.0", "utf-8", "");
doc.LinkEndChild( decl );
TiXmlElement* element = new TiXmlElement("boxName");
doc.LinkEndChild(element);
TiXmlText* text = new TiXmlText("name1");
element->LinkEndChild(text);
TiXmlElement* element2 = new TiXmlElement("boxColour");
TiXmlElement* element3 = new TiXmlElement("boxFrom");
TiXmlText* text2 = new TiXmlText("colour1");
TiXmlText* text3 = new TiXmlText(from1);
element->LinkEndChild(element2);
element->LinkEndChild(element3);
element2->LinkEndChild(text2);
element3->LinkEndChild(text3);
doc.SaveFile( "XML.xml" );

The problem is that the number of boxes is unknown, and each box may have 1, 2, 3 or more children, but the format for each box and its information is the same (as above).

Please help me to create the XML file.

Thanks
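
The repetition can be handled with one loop over box records that emits only the fields present. A sketch of that idea (in Python's xml.etree for brevity; the TinyXML version would create a TiXmlElement per present field the same way; note I've added a wrapper element, since multiple root elements would not be well-formed XML):

```python
import xml.etree.ElementTree as ET

# Each box is a dict with an arbitrary subset of fields (names assumed
# from the target XML above).
boxes = [
    {"name": "name1", "colour": "colour1", "from": "from1"},
    {"name": "name2", "colour": "colour2", "weight": "weight2"},
    {"name": "name3", "colour": "colour3", "from": "from3", "weight": "weight3"},
]

root = ET.Element("boxes")   # wrapper added so the document is well-formed
for box in boxes:
    e = ET.SubElement(root, "boxName")
    e.text = box["name"]
    # Emit only the child elements whose field is present.
    for field, tag in (("colour", "boxColour"),
                       ("from", "boxFrom"),
                       ("weight", "boxWeight")):
        if field in box:
            ET.SubElement(e, tag).text = box[field]

print(ET.tostring(root).decode())
```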

Upgraded WinForms application not overwriting XML file even when set to "Always Copy"

I have a WinForms application, say StackOverflow.exe. When I upgrade from version 1.1 to version 1.2, the installer is not overwriting an XML file (say plugin.xml) from the output content folder.

Note: plugin.xml is overwritten every time I close StackOverflow.exe.

Aitoc layered navigation in 1-column layout

I want to include Aitoc Layered Navigation Pro in my 1-column layout on my Magento site. I've been successful in adding the normal layered navigation by inserting this line of code in the content block of catalog.xml

I thought that this would be enough to include the layered navigation pro, but this doesn't seem to be the case. Does anyone know how I can include this?

I am a little stuck writing an XSLT to remove extra rows and duplicate nodes, so I need your help.

My XML looks like following:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<Rowsets CachedTime="" DateCreated="2015-05-05T19:27:06" EndDate="2015-05-05T19:27:06" StartDate="2015-05-05T18:27:06" Version="14.0.0 Build(802)">
    <Rowset>
        <Columns>
            <Column Description="DateTime" MaxRange="0" MinRange="0" Name="DateTime" SQLDataType="93" SourceColumn="DateTime"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LI132.PV" SQLDataType="6" SourceColumn="10LI132.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LQ132.PV" SQLDataType="6" SourceColumn="10LQ132.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10TI112.PV" SQLDataType="6" SourceColumn="10TI112.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LI135.PV" SQLDataType="6" SourceColumn="10LI135.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LQ132.PV" SQLDataType="6" SourceColumn="10LQ132.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LI127.PV" SQLDataType="6" SourceColumn="10LI127.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10TI112.PV" SQLDataType="6" SourceColumn="10TI112.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LQ127.PV" SQLDataType="6" SourceColumn="10LQ127.PV"/>
        </Columns>
        <Row>
            <DateTime>2015-05-05T18:27:06</DateTime>
            <A>55465.359375</A>
            <B>1808040</B>
            <C>-331.424926757812</C>
            <D>-74553.75</D>
            <B>1808040</B>
            <F>-10100.994140625</F>
            <C>-331.424926757812</C>
            <G>-445363.5625</G>
        </Row>
        <Row>
            <DateTime>2015-05-05T18:27:06</DateTime>
            <A>NA</A>
            <B>NA</B>
            <C>NA</C>
            <D>NA</D>
            <B>1808040</B>
            <F>NA</F>
            <C>NA</C>
            <G>NA</G>
        </Row>
        <Row>
            <DateTime>2015-05-05T18:27:06</DateTime>
            <A>NA</A>
            <B>NA</B>
            <C>NA</C>
            <D>NA</D>
            <B>NA</B>
            <F>NA</F>
            <C>-331.424926757812</C>
            <G>NA</G>
        </Row>
    </Rowset>
</Rowsets>

I want my resultant XML as following:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<Rowsets CachedTime="" DateCreated="2015-05-05T19:27:06" EndDate="2015-05-05T19:27:06" StartDate="2015-05-05T18:27:06" Version="14.0.0 Build(802)">
    <Rowset>
        <Columns>
            <Column Description="DateTime" MaxRange="0" MinRange="0" Name="DateTime" SQLDataType="93" SourceColumn="DateTime"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LI132.PV" SQLDataType="6" SourceColumn="10LI132.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LQ132.PV" SQLDataType="6" SourceColumn="10LQ132.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10TI112.PV" SQLDataType="6" SourceColumn="10TI112.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LI135.PV" SQLDataType="6" SourceColumn="10LI135.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LQ132.PV" SQLDataType="6" SourceColumn="10LQ132.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LI127.PV" SQLDataType="6" SourceColumn="10LI127.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10TI112.PV" SQLDataType="6" SourceColumn="10TI112.PV"/>
            <Column Description="" MaxRange="0" MinRange="0" Name="_10LQ127.PV" SQLDataType="6" SourceColumn="10LQ127.PV"/>
        </Columns>
        <Row>
            <DateTime>2015-05-05T18:27:06</DateTime>
            <A>55465.359375</A>
            <B>1808040</B>
            <C>-331.424926757812</C>
            <D>-74553.75</D>
            <F>-10100.994140625</F>
            <G>-445363.5625</G>
        </Row>
    </Rowset>
</Rowsets>

Please note that the nodes, etc. are dynamically generated, so I cannot hard-code them in the XSLT.

Let me know if you have any ideas to solve my issue.

Thanks in advance.
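
For reference, the intended cleanup can be sketched procedurally (Python/ElementTree here purely to illustrate the logic; the actual solution would be an XSLT): drop any Row containing "NA" values, and drop repeated element names within each surviving Row.

```python
import xml.etree.ElementTree as ET

# Keep only Rows with no "NA" values; within a Row, keep the first
# occurrence of each element name (drops the duplicate B and C columns).
def clean(root):
    for rowset in root.iter('Rowset'):
        for row in list(rowset.findall('Row')):
            if any(c.text == 'NA' for c in row):
                rowset.remove(row)        # drop sparse duplicate rows
                continue
            seen = set()
            for c in list(row):
                if c.tag in seen:
                    row.remove(c)         # drop duplicate-named columns
                seen.add(c.tag)
    return root

# Tiny illustrative input in the same shape as the question's XML.
xml = ('<Rowsets><Rowset>'
       '<Row><A>1</A><B>2</B><B>2</B></Row>'
       '<Row><A>NA</A><B>NA</B><B>3</B></Row>'
       '</Rowset></Rowsets>')
root = clean(ET.fromstring(xml))
print(ET.tostring(root).decode())
```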