[dscng-dev] import_dsc_dat_files.py assertion error

Bedrich Kosata bedrich.kosata at nic.cz
Thu Jun 14 19:27:08 CEST 2012


Hi Thomas,

I cannot see a duplicate in this output -
http://thomas.dupas.be/dscng_duplicates.txt. Please note that the time
is stored in two separate fields in this output - time and minute - which
together form the actual time (the reason is that this table is in fact
a view over data that are stored internally in an array).
If the data had indeed been duplicated, the SUM and GROUP BY would have
hidden it anyway. If you would really like to check for duplicate data,
you should run a simple select without grouping and with the server_id
column:

SELECT time AT TIME ZONE 'UTC',server_id,minute,value1,count
FROM dscng_data2d
WHERE time AT TIME ZONE 'UTC' >= '2012-06-14 13:59:00+02:00'
  AND time AT TIME ZONE 'UTC' <= '2012-06-14 14:59:00+02:00'
  AND data_type_id = 25;
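
To illustrate why a grouped query cannot reveal duplicates, here is a small self-contained sketch. It is only an illustration under assumptions: it uses an in-memory SQLite table named data2d as a simplified, hypothetical stand-in for the dscng_data2d view (the real one lives in PostgreSQL), and the rows are made up. As a variant of the plain ungrouped select, it groups on the full key including server_id and keeps only keys that occur more than once.

```python
import sqlite3

# Hypothetical, simplified stand-in for the dscng_data2d view (SQLite, not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data2d (time TEXT, server_id INTEGER, minute INTEGER,"
    " value1 TEXT, count INTEGER)"
)
rows = [
    ("2012-06-14 12:00", 1, 59, "A", 10),
    ("2012-06-14 12:00", 1, 59, "A", 10),  # the same row imported twice
    ("2012-06-14 12:00", 2, 59, "A", 7),
]
conn.executemany("INSERT INTO data2d VALUES (?, ?, ?, ?, ?)", rows)

# Dashboard-style query: one row per (time, minute, value1).
# The duplicate is invisible here; it only inflates the sum.
grouped = conn.execute(
    "SELECT time, minute, value1, SUM(count) FROM data2d"
    " GROUP BY time, minute, value1"
).fetchall()

# Duplicate check: group on the full key, including server_id,
# and report every key that occurs more than once.
duplicates = conn.execute(
    "SELECT time, server_id, minute, value1, COUNT(*) FROM data2d"
    " GROUP BY time, server_id, minute, value1 HAVING COUNT(*) > 1"
).fetchall()

print(grouped)
print(duplicates)
```

The grouped result contains a single row per minute no matter how often a row was imported, whereas the second query lists the repeated key for server 1.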

As to the question about the order of import - provided there is no bug,
it should not matter in which order you import your data. DSCng should
even skip data from previous imports, so it should be possible to run the
import on the same data several times. The only limitation is that only
one import script should be running at any time, because the speed
optimisations in the code are not compatible with parallel imports.
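
If you start imports from cron or from a loop, one way to make sure that only one runs at a time is a lock file around the import call. The following is a minimal sketch, not something DSCng ships: ImportLock, try_import and the lock file name are hypothetical, and the body of the import is a placeholder.

```python
import os
import tempfile


class ImportLock:
    """Exclusive lock file; a second holder gets FileExistsError."""

    def __init__(self, path):
        self.path = path
        self.fd = None

    def __enter__(self):
        # O_EXCL makes the creation fail if the file already exists.
        self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(self.fd, str(os.getpid()).encode())
        return self

    def __exit__(self, exc_type, exc, tb):
        os.close(self.fd)
        os.remove(self.path)
        return False  # do not swallow exceptions raised by the import


def try_import(lock_path):
    """Return True if the (placeholder) import ran, False if the lock was held."""
    try:
        with ImportLock(lock_path):
            return True  # the real import would run here
    except FileExistsError:
        return False


# Usage: hypothetical lock location in a scratch directory.
lock_path = os.path.join(tempfile.mkdtemp(), "dscng_import.lock")
print(try_import(lock_path))
```

Note that a lock file of this kind stays behind if the process is killed, so after a crash it has to be removed by hand before the next import.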

Best regards

Beda

On 06/14/2012 05:05 PM, Thomas Dupas wrote:
> Hi Beda,
>
> Thanks for the answer. Shortly after my mail I executed the select query, which showed many duplicate entries for one minute,
> presumably caused by the failed imports.
>
> I do notice that it's easy to get duplicate entries in that table.
> For example:
> I started by importing all data for all nodes from May/2012, without stopping the dsc cron job. Worked without any issues, dashboard worked etc.
> Afterwards the June/2012 data was imported node by node through a for loop, with the cron job stopped. This caused duplicate records again, see http://thomas.dupas.be/dscng_duplicates.txt
> Does DSCng expect all data to be imported in consecutive order? I haven't tried to import all June/2012 data in one batch yet.
>
> Br,
>
> Thomas
> ________________________________________
> From: dscng-dev-bounces at lists.nic.cz [dscng-dev-bounces at lists.nic.cz] on behalf of Bedrich Kosata [bedrich.kosata at nic.cz]
> Sent: Thursday 14 June 2012 14:25
> To: dscng-dev at lists.nic.cz
> Subject: Re: [dscng-dev] import_dsc_dat_files.py assertion error
>
> Hi Thomas,
>
> thanks for reporting these problems.
> The first problem would probably be a rounding error where something
> like 1.00000000001 is considered higher than 1 and thus the assertion
> fails. I will fix it asap.
> As to the second problem, this is probably a result of the previous
> crash - some statistics are written to the db at the end of
> an import, and because of the crash they are out of sync (this is one of
> the things that need to be improved before a preview release).
> Please run the debug.py script in the root directory and let me know if
> it helps.
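
A note on the rounding problem quoted above: one possible shape of a fix is to clamp the progress fraction before the range check. This is only a sketch and not the actual patch. clamp_fraction and its eps tolerance are hypothetical, and format_percent_bar here is a simplified re-creation that shares only its name with the function in the traceback.

```python
def clamp_fraction(value, eps=1e-9):
    """Clamp a progress fraction to [0, 1], tolerating tiny rounding overshoot."""
    if not -eps <= value <= 1 + eps:
        # Anything further out is a real bug, not a rounding artefact.
        raise ValueError("progress fraction out of range: %r" % value)
    return min(1.0, max(0.0, value))


def format_percent_bar(percent, width=20):
    """Hypothetical text progress bar; not the DSCng implementation."""
    percent = clamp_fraction(percent)
    filled = int(round(percent * width))
    return "#" * filled + " " * (width - filled)


# Floating-point division can land slightly above 1.
overshoot = (0.1 + 0.2) / 0.3
print(overshoot > 1)
print(format_percent_bar(overshoot))
```

The tolerance keeps the check useful: a value such as 1.5 still raises an error, while an overshoot in the last few digits is treated as 100%.
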
>
> Best regards
>
> Beda
>
>
>
> On 06/14/2012 01:53 PM, Thomas Dupas wrote:
>> FYI, in case somebody else ever has the same issue:
>> there seems to be a conflict if dsc is still running/writing at the same time.
>> Run the script again 2-3 times until it completes correctly.
>>
>> The dashboard still doesn't work here though
>> "Overall traffic: Error occured when fetching data: Internal Server Error
>> Rcode chart: Error occured when fetching data: Internal Server Error"
>>
>> Any clues as to which tables I should check?
>> apache error log contains:
>>
>> "DEBUG:dsc_storage:Getting available timespan: 0.001486
>> DEBUG:dsc_storage:Getting available timespan: 0.001532
>> DEBUG:dsc_storage:SELECT: SELECT server_id,SUM(count)
>>               FROM dscng_data2d
>>               WHERE time = '2012-06-14T13:00:00+02:00'::timestamptz AT TIME ZONE 'UTC' AND minute = 59 AND count != -1
>>               AND data_type_id = 16
>>               GROUP BY server_id;
>> DEBUG:dsc_storage:Getting available timespan: 0.000695
>> DEBUG:dsc_storage:Query preparation: 0.0105571746826
>> DEBUG:dsc_storage:SELECT time AT TIME ZONE 'UTC', minute,value1,SUM(count) FROM dscng_data2d WHERE time AT TIME ZONE 'UTC'>= '2012-06-14 12:59:00+02:00' AND time AT TIME ZONE 'UTC'<= '2012-06-14 13:59:00+02:00' AND data_type_id = 25 AND count != -1  GROUP BY time,minute,value1 ORDER BY time,minute;
>> DEBUG:dsc_storage:Query execution: 0.00430297851562
>> DEBUG:dsc_storage:Result processing: 0.00108504295349
>> DEBUG:dsc_storage:Time points: 0
>> ERROR:django.request:Internal Server Error: /dscng/json/data_type_detail_dt/
>> Traceback (most recent call last):
>>     File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 111, in get_response
>>       response = callback(request, *callback_args, **callback_kwargs)
>>     File "/var/www/dscng/dsc/plots/views.py", line 380, in get_data_type_detail_dt_json
>>       data_dict.append({'count': data[0].get(subtype, 0),
>> IndexError: list index out of range
>> ERROR:django.request:Internal Server Error: /dscng/json/overall_traffic/
>> Traceback (most recent call last):
>>     File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 111, in get_response
>>       response = callback(request, *callback_args, **callback_kwargs)
>>     File "/var/www/dscng/dsc/plots/views.py", line 319, in get_overall_traffic_json
>>       qps = storage.get_overall_traffic(timepoint) / 60.0
>> TypeError: unsupported operand type(s) for /: 'NoneType' and 'float'
>> "
>>
>> Br,
>>
>> Thomas
>>
>> ________________________________________
>> From: dscng-dev-bounces at lists.nic.cz [dscng-dev-bounces at lists.nic.cz] on behalf of Thomas Dupas [thomas at dupas.be]
>> Sent: Thursday 14 June 2012 12:21
>> To: dscng-dev at lists.nic.cz
>> Subject: [dscng-dev] import_dsc_dat_files.py assertion error
>>
>> Hi,
>>
>> I'm using the latest git version, where import_dsc_dat_files.py errors out at the very last moment.
>> At first sight it can't handle the transition from 99.9% done / 0.1% remaining to 100% done / 0% remaining.
>>
>> "# DSCng - importing original DSC .dat files
>>
>> * Importing into database 'dsc'
>> * Logging into 'dscng_import-2012-06-14_112109.log'
>>
>> #################### 0:23:19, 3.29 s/dir, 1052 kB/s, 99.9%, ~0:00:01 to go
>> Traceback (most recent call last):
>>     File "import_dsc_dat_files.py", line 1547, in <module>
>>       main()
>>     File "import_dsc_dat_files.py", line 1544, in main
>>       show_progress=True, stat_out_stream=stat_out_stream)
>>     File "import_dsc_dat_files.py", line 1447, in import_dirs
>>       log_to_progressbar(format_percent_bar(done_part) + " " + \
>>     File "import_dsc_dat_files.py", line 1479, in format_percent_bar
>>       assert 0 <= percent <= 1
>> AssertionError"
>>
>> When querying some tables in the database I can already see some data, but I can't get any useful output in the web interface.
>> I only see the list of servers in the left column.
>>
>> Br,
>>
>> Thomas Dupas
>> _______________________________________________
>> dscng-dev mailing list
>> dscng-dev at lists.nic.cz
>> https://lists.nic.cz/cgi-bin/mailman/listinfo/dscng-dev
>


